Predicting Maintenance Actions from Historical Logs using Domain-Specific LLMs
Abstract
Maintenance logs of complex, specialized equipment capture problem–action records that are essential for building predictive maintenance solutions but remain difficult to utilize due to their terse, abbreviation-heavy style. This work provides the first systematic benchmark and domain-adaptation study of large language models (LLMs) for predicting maintenance actions from free-text problem descriptions in the MaintNet aviation dataset. We evaluate a range of proprietary and open-source LLMs under zero-shot and few-shot prompting, and additionally fine-tune selected open models for supervised evaluation. Experiments are conducted on both raw-abbreviation and expanded versions of the dataset, using lexical (ROUGE, BLEU) and semantic (cosine similarity, BERTScore) metrics. Results show that GPT-4o achieves the strongest semantic alignment, while the instruct version of Gemma-3-4B leads in lexical overlap. Few-shot prompting boosts weaker models disproportionately, narrowing the gap with stronger baselines. Fine-tuning delivers the largest gains: the instruct versions of Gemma-3-4B, LLaMA-3.2-3B, and Phi-4-mini improve BLEU by up to 90% and ROUGE-2 by 30%. Notably, the fine-tuned Gemma-3-4B surpasses GPT-4o across multiple metrics, demonstrating the effectiveness of domain-specific adaptation. These findings highlight the potential of fine-tuned LLMs to leverage unstructured aviation logs for building reliable maintenance systems.
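As an illustration of the evaluation setup described in the abstract, the following is a minimal sketch of few-shot prompt construction over problem–action records and of scoring a predicted action with one lexical metric (ROUGE-2) and one semantic metric (embedding cosine similarity). It is not the authors' implementation: the example records, the prompt template, the encoder choice (all-MiniLM-L6-v2), and the use of the rouge_score and sentence_transformers libraries are assumptions made for this sketch only.

```python
# Illustrative sketch only (not the paper's code).
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

# Hypothetical MaintNet-style problem–action records (abbreviation-heavy).
examples = [
    ("L/H BRAKE WORN BEYOND LIMITS", "REPLACED L/H BRAKE ASSY"),
    ("NOSE TIRE PRESSURE LOW", "SERVICED NOSE TIRE TO SPEC"),
]
query_problem = "R/H NAV LIGHT INOP"
logged_action = "REPLACED R/H NAV LIGHT BULB"   # ground-truth action from the log
predicted_action = "REPLACED R/H NAV LIGHT"     # stand-in for an LLM prediction

# Few-shot prompt: prepend worked examples, then ask for the next action.
shots = "\n\n".join(f"Problem: {p}\nAction: {a}" for p, a in examples)
prompt = f"{shots}\n\nProblem: {query_problem}\nAction:"

# Lexical overlap: ROUGE-2 F1 between the prediction and the logged action.
scorer = rouge_scorer.RougeScorer(["rouge2"], use_stemmer=True)
rouge2_f1 = scorer.score(logged_action, predicted_action)["rouge2"].fmeasure

# Semantic alignment: cosine similarity of sentence embeddings.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
emb = encoder.encode([logged_action, predicted_action], convert_to_tensor=True)
cosine = util.cos_sim(emb[0], emb[1]).item()

print(prompt)
print(f"ROUGE-2 F1: {rouge2_f1:.3f}  cosine similarity: {cosine:.3f}")
```

In the paper's setting, BLEU and BERTScore would be computed analogously over the full test split, and the same prompt format could be used for zero-shot evaluation by omitting the worked examples.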
Keywords
domain-specific models, fine-tuning, maintenance, aviation, historical logs
References
Altuncu, M. T., Mayer, E., Yaliraki, S. N., & Barahona, M. (2018). From text to topics in healthcare records: An unsupervised graph partitioning methodology. arXiv preprint arXiv:1807.02599. https://arxiv.org/abs/1807.02599
Jarry, G., Delahaye, D., Nicol, F., & Feron, E. (2020). Aircraft atypical approach detection using functional principal component analysis. Journal of Air Transport Management, 84, 101787. https://doi.org/10.1016/j.jairtraman.2020.101787
Akhbardeh, F., Desell, T., & Zampieri, M. (2020). MaintNet: A collaborative open-source library for predictive maintenance language resources. arXiv preprint arXiv:2005.12443. https://arxiv.org/abs/2005.12443
Akhbardeh, F., Desell, T., & Zampieri, M. (2020, December). NLP tools for predictive maintenance records in MaintNet. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations (pp. 26-32). https://aclanthology.org/2020.aacl-demo.5/
Akhbardeh, F., Alm, C. O., Zampieri, M., & Desell, T. (2021, August). Handling extreme class imbalance in technical logbook datasets. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 4034-4045). https://doi.org/10.18653/v1/2021.acl-long.312
Payette, M., Abdul-Nour, G., Meango, T. J. M., Diago, M., & Côté, A. (2025). Leveraging failure modes and effect analysis for technical language processing. Machine Learning and Knowledge Extraction, 7(2), 42. https://doi.org/10.3390/make7020042
Sundaram, S., & Zeid, A. (2025). Technical language processing for Prognostics and Health Management: Applying text similarity and topic modeling to maintenance work orders. Journal of Intelligent Manufacturing, 36(3), 1637-1657. https://doi.org/10.1007/s10845-024-02323-4
Akhbardeh, F., Zampieri, M., Alm, C. O., & Desell, T. (2022, June). Transfer learning methods for domain adaptation in technical logbook datasets. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 4235-4244). https://aclanthology.org/2022.lrec-1.450
Brundage, M. P., Sexton, T., Hodkiewicz, M., Dima, A., & Lukens, S. (2021). Technical language processing: Unlocking maintenance knowledge. Manufacturing Letters, 27, 42-46. https://doi.org/10.1016/j.mfglet.2020.11.001
Kelm, B., Haas, P. H., Jochum, S., Margies, L., & Müller, R. (2025). Enhancing assembly instruction generation for cognitive assistance systems with large language models. Procedia CIRP, 134, 7-12. https://doi.org/10.1016/j.procir.2025.03.010
Meunier-Pion, J. (2024, June). Natural language processing for risk, resilience, and reliability. In PHM Society European Conference (Vol. 8, No. 1, p. 4). https://doi.org/10.36001/phme.2024.v8i1.3956
Naqvi, S. M. R., Varnier, C., Nicod, J. M., Zerhouni, N., & Ghufran, M. (2021, December). Leveraging free-form text in maintenance logs through BERT transfer learning. In International Conference on Deep Learning, Artificial Intelligence and Robotics (pp. 63-75). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-98531-8_7
Naqvi, S. M. R., Ghufran, M., Varnier, C., Nicod, J. M., Javed, K., & Zerhouni, N. (2024). Unlocking maintenance insights in industrial text through semantic search. Computers in Industry, 157, 104083. https://doi.org/10.1016/j.compind.2024.104083
Vidyaratne, L., Lee, X. Y., Kumar, A., Watanabe, T., Farahat, A., & Gupta, C. (2024, June). Generating troubleshooting trees for industrial equipment using large language models (LLM). In 2024 IEEE International Conference on Prognostics and Health Management (ICPHM) (pp. 116-125). IEEE. https://doi.org/10.1109/ICPHM61352.2024.10626823
Lukens, S., McCabe, L. H., Gen, J., & Ali, A. (2024, November). Large language model agents as Prognostics and Health Management copilots. In Annual Conference of the PHM Society (Vol. 16, No. 1). https://doi.org/10.36001/phmconf.2024.v16i1.3906
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., ... & Rush, A. M. (2020, October). Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 38-45). https://doi.org/10.18653/v1/2020.emnlp-demos.6
Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., ... & Vasic, P. (2024). The Llama 3 herd of models. arXiv preprint arXiv:2407.21783. https://arxiv.org/abs/2407.21783
Gemma Team, Kamath, A., Ferret, J., Pathak, S., Vieillard, N., Merhej, R., ... & Iqbal, S. (2025). Gemma 3 technical report. arXiv preprint arXiv:2503.19786. https://arxiv.org/abs/2503.19786
Abouelenin, A., Ashfaq, A., Atkinson, A., Awadalla, H., Bach, N., Bao, J., ... & Zhou, X. (2025). Phi-4-mini technical report: Compact yet powerful multimodal language models via Mixture-of-LoRAs. arXiv preprint arXiv:2503.01743. https://arxiv.org/abs/2503.01743
Adler, B., Agarwal, N., Aithal, A., Anh, D. H., Bhattacharya, P., Brundyn, A., ... & Zhu, C. (2024). Nemotron-4 340B technical report. arXiv preprint arXiv:2406.11704. https://arxiv.org/abs/2406.11704
Qwen Team. (2024). Qwen2 technical report. arXiv preprint arXiv:2407.10671. https://arxiv.org/abs/2407.10671