AI+AR based Framework for Guided Visual Diagnosis of Equipment

Teresa Gonzalez Diaz; Xian Yeow Lee; Huimin Zhuge; Lasitha Vidyaratne; Gregory Sin; Tsubasa Watanabe; Ahmed Farahat; Chetan Gupta

doi:10.36001/phmconf.2024.v16i1.3909

Published Nov 5, 2024

Teresa Gonzalez Diaz

Hitachi America

Xian Yeow Lee

Hitachi America R&D

Huimin Zhuge

Hitachi America R&D

Lasitha Vidyaratne

Hitachi America R&D

Gregory Sin

Hitachi America R&D

Tsubasa Watanabe

Hitachi America R&D

Ahmed Farahat

Hitachi America

Chetan Gupta

Hitachi America R&D

Abstract

Automated solutions for effective support services, such as failure diagnosis and repair, are crucial to maintaining customer satisfaction and loyalty. However, providing consistent, high-quality, and timely support is difficult. In practice, customer support usually requires technicians to perform onsite diagnosis, but service quality is often degraded by a shortage of expert technicians, high turnover, and a lack of automated tools. To address these challenges, we present a novel solution framework that aids technicians in performing visual equipment diagnosis. We envision a workflow in which the technician reports a failure and prompts the system to automatically generate a diagnostic plan that includes parts, areas of interest, and necessary tasks. The plan is used to guide the technician through augmented reality (AR), while a perception module analyzes and tracks the technician’s actions to recommend next steps. Our framework consists of three components: planning, tracking, and guiding. The planning component automates the creation of a diagnostic plan by querying a knowledge graph (KG). We propose to leverage Large Language Models (LLMs) to construct the KG, accelerating the extraction of parts, tasks, and relations from manuals. The tracking component enhances 3D detections by using perception sensors together with a 2D nested object detection model. Finally, the guiding component reduces process complexity for technicians by combining 2D models and AR interactions. To validate the framework, we performed multiple studies to: 1) determine an effective prompting method for the LLM to construct the KG; and 2) demonstrate the benefits of our 2D nested object detection model combined with an AR model.
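As a rough illustration of the planning step described in the abstract, the sketch below shows how a diagnostic plan (parts, areas of interest, tasks) might be assembled by querying a small knowledge graph stored as triples. It is not the authors' implementation: the triples, the relation names (`involves_part`, `located_in`, `requires_task`), the failure label, and both function names are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, of the kind that might be
# extracted from an equipment manual. None of these come from the paper.
TRIPLES = [
    ("no_cooling", "involves_part", "compressor"),
    ("no_cooling", "involves_part", "condenser_fan"),
    ("compressor", "located_in", "rear_lower_panel"),
    ("condenser_fan", "located_in", "rear_lower_panel"),
    ("compressor", "requires_task", "check_start_relay"),
    ("condenser_fan", "requires_task", "inspect_fan_blades"),
]


def build_index(triples):
    """Index triples by (subject, relation) for direct lookup."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index


def diagnostic_plan(failure, index):
    """Return one plan step per part linked to the reported failure."""
    plan = []
    for part in index.get((failure, "involves_part"), []):
        plan.append({
            "part": part,
            "areas": index.get((part, "located_in"), []),
            "tasks": index.get((part, "requires_task"), []),
        })
    return plan


if __name__ == "__main__":
    kg_index = build_index(TRIPLES)
    for step in diagnostic_plan("no_cooling", kg_index):
        print(step)
```

A failure with no matching triples yields an empty plan, which a real system would presumably need to handle by falling back to a human expert or a broader search.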

How to Cite

Gonzalez Diaz, T., Lee, X. Y., Zhuge, H., Vidyaratne, L., Sin, G., Watanabe, T., Farahat, A., & Gupta, C. (2024). AI+AR based Framework for Guided Visual Diagnosis of Equipment. Annual Conference of the PHM Society, 16(1). https://doi.org/10.36001/phmconf.2024.v16i1.3909

Keywords

Automated diagnostic systems, knowledge graphs, relation extraction, large language models, scene understanding, object detection, augmented reality

Issue

Vol. 16 No. 1 (2024): Proceedings of the Annual Conference of the PHM Society 2024

Section

Technical Research Papers

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

The Prognostic and Health Management Society advocates open-access to scientific data and uses a Creative Commons license for publishing and distributing any papers. A Creative Commons license does not relinquish the author’s copyright; rather it allows them to share some of their rights with any member of the public under certain conditions whilst enjoying full legal protection. By submitting an article to the International Conference of the Prognostics and Health Management Society, the authors agree to be bound by the associated terms and conditions including the following:

As the author, you retain the copyright to your Work. By submitting your Work, you are granting anybody the right to copy, distribute and transmit your Work and to adapt your Work with proper attribution under the terms of the Creative Commons Attribution 3.0 United States license. You assign rights to the Prognostics and Health Management Society to publish and disseminate your Work through electronic and print media if it is accepted for publication. A license note citing the Creative Commons Attribution 3.0 United States License as shown below needs to be placed in the footnote on the first page of the article.

First Author et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
