Cleaning Maintenance Logs with LLM Agents for Improved Predictive Maintenance
Abstract
Maintenance logs serve as the backbone of data-driven Predictive Maintenance (PdM) systems, providing the information used to create and label datasets for training survival analysis and machine learning (ML) models. However, because personnel enter information into maintenance logs manually, and maintenance tracking systems allow varying degrees of flexibility in how entries are recorded, service records often contain errors. Currently, equipment maintenance records are cleaned manually by experts such as data scientists or reliability engineers. This task is time-consuming and often fails to remove noise from the data entirely. In this paper, we propose using large language model (LLM)-based agents to automate the cleaning of maintenance logs. We provide an implementation that allows the agents to perform data cleaning, as well as metrics to assess the agents' performance. Finally, we compare the performance of several LLMs on this task. Our empirical results indicate that LLM-based agents are a promising solution for improving the quality of the datasets used in PdM systems and, ultimately, for developing more reliable and useful predictive maintenance models.
Keywords: Predictive Maintenance, Maintenance Logs, Data Cleaning, Data Quality, Large Language Models
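The abstract mentions metrics for assessing the agents' cleaning performance but does not define them here. As a purely illustrative sketch, assuming the convention common in data-cleaning evaluation of scoring repairs cell by cell against a manually cleaned ground-truth log, precision, recall, and F1 of an agent's corrections could be computed as below. The function name `repair_scores`, the `(record_id, column)` cell keys, and the example log values are hypothetical; this is not the paper's metric definition.

```python
from collections.abc import Hashable
from typing import Dict, Tuple

# Hypothetical cell key: (record_id, column_name) -> cell value.
Cell = Tuple[Hashable, str]


def repair_scores(
    dirty: Dict[Cell, str],
    cleaned: Dict[Cell, str],
    truth: Dict[Cell, str],
) -> Tuple[float, float, float]:
    """Cell-level precision, recall, and F1 of an agent's repairs.

    Assumptions (illustrative only): `truth` covers every cell in `dirty`;
    a cell missing from `cleaned` is treated as left unchanged.
    """
    # Cells the agent changed.
    repaired = {c for c in dirty if cleaned.get(c, dirty[c]) != dirty[c]}
    # Cells that were actually wrong in the original log.
    errors = {c for c in dirty if truth[c] != dirty[c]}
    # Changed cells whose new value matches the ground truth.
    correct = {c for c in repaired if cleaned[c] == truth[c]}

    precision = len(correct) / len(repaired) if repaired else 0.0
    recall = len(correct) / len(errors) if errors else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall > 0
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Toy maintenance-log cells (hypothetical values).
    dirty = {(1, "part"): "compresor", (1, "date"): "2021-13-01", (2, "part"): "filter"}
    truth = {(1, "part"): "compressor", (1, "date"): "2021-01-13", (2, "part"): "filter"}
    cleaned = {(1, "part"): "compressor", (1, "date"): "2021-13-01", (2, "part"): "filters"}
    print(repair_scores(dirty, cleaned, truth))  # → (0.5, 0.5, 0.5)
```

In the toy example the agent fixes the misspelled part name, misses the swapped date, and introduces a new error in a correct cell, so half of its changes are right and half of the true errors are fixed. Precision penalizes harmful edits while recall penalizes missed errors, which is why both are usually reported together.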

This work is licensed under a Creative Commons Attribution 3.0 Unported License.