Evaluating the Performance of ChatGPT in the Automation of Maintenance Recommendations for Prognostics and Health Management



Published Oct 26, 2023
Sarah Lukens, Asma Ali


Until now, automation of maintenance recommendations for Prognostics and Health Management (PHM) has been a domain-specific technical language processing (TLP) task applied to historical case data. ChatGPT, Bard, GPT-4 and Sydney are a few examples of generative large language models (LLMs) that have received significant media attention for their proficiency in natural language tasks across a variety of domains. Preliminary exploration of ChatGPT as a tool for generating maintenance recommendations has shown promise in its ability to generate and explain engineering concepts and procedures, but the precise scope of its capabilities and limitations remains uncertain. We know of no established performance criteria for formally measuring how well ChatGPT performs as a tool for industrial use cases. In this paper, we propose a methodology for evaluating the performance of LLMs such as ChatGPT on the task of automating maintenance recommendations. Our methodology identifies performance criteria relevant to PHM, such as engineering criteria, risk elements, human factors, cost considerations and corrections. We examine how well ChatGPT performs when tasked with generating recommendations from PHM model alerts and report our findings. We discuss the strengths and limitations to consider in adopting LLMs as a computational support tool for prescriptive PHM, as well as the associated risks and business case considerations.

How to Cite

Lukens, S., & Ali, A. (2023). Evaluating the Performance of ChatGPT in the Automation of Maintenance Recommendations for Prognostics and Health Management. Annual Conference of the PHM Society, 15(1). https://doi.org/10.36001/phmconf.2023.v15i1.3487



Technical Language Processing, Prescriptive Analytics, ChatGPT, Large Language Models, maintenance recommendations, knowledge extraction

Industry Experience Papers
