Large Language Model Agents as Prognostics and Health Management Copilots

Published Nov 5, 2024
Sarah Lukens, Lucas H. McCabe, Joshua Gen, Asma Ali

Abstract

Amid concerns of an aging or diminishing industrial workforce, the recent advancement of large language models (LLMs) presents an opportunity to alleviate potential experience gaps. In this context, we present a practical Prognostics and Health Management (PHM) workflow and self-evaluation framework that leverages LLMs as specialized in-the-loop agents to enhance operational efficiency without subverting human subject matter expertise. Specifically, we automate maintenance recommendations triggered by PHM alerts for monitoring the health of physical assets, using LLM agents to execute structured components of the standard maintenance recommendation protocol, including data processing, failure mode discovery, and evaluation. To illustrate this framework, we provide a case study based on historical data derived from PHM model alerts. We discuss requirements for the design and evaluation of such “PHM Copilots” and formalize key considerations for integrating LLMs into industrial domain applications. Refined deployment of our proposed end-to-end integrated system may enable less experienced professionals to back-fill existing personnel at reduced costs.
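To make the agent-based workflow concrete, the minimal Python sketch below shows one way such a pipeline could be organized, with separate LLM agents handling the structured steps named in the abstract (data processing, failure mode discovery, recommendation drafting, and self-evaluation) and a human reviewer as the final gate. This is an illustrative assumption, not the paper's implementation: the class, function, and prompt wording (PHMAlert, preprocess_agent, run_copilot, and so on) are hypothetical, and the LLM call is left as a user-supplied function.

```python
# Illustrative sketch (not the authors' implementation): a minimal pipeline in which
# LLM "agents" handle the structured steps of a maintenance-recommendation workflow
# triggered by a PHM alert, with a human subject matter expert reviewing the output.
from dataclasses import dataclass, field
from typing import Callable, List

LLM = Callable[[str], str]  # any text-in / text-out completion function


@dataclass
class PHMAlert:
    asset_id: str
    sensor_summary: str                      # condensed trend description from the PHM model
    work_order_history: List[str] = field(default_factory=list)


def preprocess_agent(llm: LLM, alert: PHMAlert) -> str:
    """Normalize noisy alert context (abbreviations, shorthand) into clean text."""
    prompt = ("Rewrite the following maintenance context as clear sentences, "
              "expanding abbreviations:\n" + alert.sensor_summary + "\n"
              + "\n".join(alert.work_order_history))
    return llm(prompt)


def failure_mode_agent(llm: LLM, context: str) -> str:
    """Propose candidate failure modes consistent with the processed context."""
    return llm("List the most plausible failure modes for this evidence:\n" + context)


def recommendation_agent(llm: LLM, context: str, failure_modes: str) -> str:
    """Draft a maintenance recommendation for human review."""
    return llm("Given the evidence and candidate failure modes below, draft a "
               "maintenance recommendation for review by a subject matter expert.\n"
               f"Evidence:\n{context}\nFailure modes:\n{failure_modes}")


def evaluation_agent(llm: LLM, recommendation: str) -> str:
    """Self-evaluate the draft for clarity, actionability, and traceability."""
    return llm("Score this recommendation 1-5 for clarity, actionability, and "
               "traceability to the cited evidence, with a one-line rationale:\n"
               + recommendation)


def run_copilot(llm: LLM, alert: PHMAlert) -> dict:
    """Chain the agents for a single alert; the returned dict goes to a human reviewer."""
    context = preprocess_agent(llm, alert)
    modes = failure_mode_agent(llm, context)
    draft = recommendation_agent(llm, context, modes)
    review = evaluation_agent(llm, draft)
    return {"context": context, "failure_modes": modes,
            "draft_recommendation": draft, "self_evaluation": review}


if __name__ == "__main__":
    # Stub LLM so the sketch runs without network access; swap in a real model call.
    echo_llm: LLM = lambda prompt: f"[model output for: {prompt[:60]}...]"
    alert = PHMAlert(asset_id="PUMP-101",
                     sensor_summary="vib trend up on NDE brg, temp stable",
                     work_order_history=["Replaced NDE bearing 2022-03"])
    for step, text in run_copilot(echo_llm, alert).items():
        print(step, "->", text)
```

In practice the stubbed echo_llm would be replaced by a call to a hosted or local model, and the draft recommendation would remain subject to human approval, consistent with the in-the-loop framing above.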

How to Cite

Lukens, S., McCabe, L. H., Gen, J., & Ali, A. (2024). Large Language Model Agents as Prognostics and Health Management Copilots. Annual Conference of the PHM Society, 16(1). https://doi.org/10.36001/phmconf.2024.v16i1.3906

Keywords

LLMs, GenAI, Technical Language Processing, Prescriptive Analytics, Large Language Models, Maintenance Recommendations, Knowledge Extraction, Copilot

Section
Industry Experience Papers
