ACE – Automating Causal Extraction: Leveraging Large Language Models for Bowtie Diagram Generation in Failure Analysis
Abstract
This paper investigates whether open-source, instruction-tuned large language models (LLMs) can automate the generation of Bowtie diagrams from Failure Mode and Effects Analysis (FMEA) documentation. Three pipelines are developed: Retrieval-Augmented Generation (RAG), Optical Character Recognition (OCR)-based extraction, and a vision-enabled dual-LLM approach. Each is designed to handle both structured FMEA tables and unstructured narrative text. Three models (Mistral, Qwen-2.5, and LLaMA-3) are evaluated using Sobol sensitivity analysis, stochasticity experiments, and expert Likert scoring on narrative outputs. With strict schema-constrained prompts, models frequently achieve Node and Edge F1 scores above 0.8 on tabular data. Outputs were identical across repeated runs under fixed settings. Sobol analysis shows that prompt strictness and prompt type are the dominant drivers of Bowtie quality, whereas decoding parameters have a negligible effect. On unstructured narrative text, all models struggled, producing hallucinated nodes, incorrect role assignments, and diagrams that deviated from expert references. The results establish a working approach for automating Bowtie generation from FMEA tables and identify the specific obstacles to extending this to narrative sources.
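As an illustration of the Node and Edge F1 scores mentioned in the abstract, the following is a minimal sketch of one way such scores could be computed. It assumes exact matching on case- and whitespace-normalized labels, with nodes represented as (label, role) pairs and edges as (source, target) pairs. The matching rule, the role names, the function names, and the example data are all assumptions made for illustration; the abstract does not specify how the paper matches nodes or edges.

```python
def normalize(label):
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(label.lower().split())


def f1_score(predicted, reference):
    """Set-based F1 between predicted and reference items (exact match)."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # nothing expected, nothing produced
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def bowtie_f1(pred_nodes, pred_edges, ref_nodes, ref_edges):
    """Return (node_f1, edge_f1) for a predicted Bowtie against a reference Bowtie."""
    node_f1 = f1_score(
        {(normalize(label), role) for label, role in pred_nodes},
        {(normalize(label), role) for label, role in ref_nodes},
    )
    edge_f1 = f1_score(
        {(normalize(src), normalize(dst)) for src, dst in pred_edges},
        {(normalize(src), normalize(dst)) for src, dst in ref_edges},
    )
    return node_f1, edge_f1


# Hypothetical example data (not taken from the paper).
ref_nodes = [
    ("seal wear", "cause"),
    ("loss of containment", "top_event"),
    ("fluid leak", "consequence"),
]
ref_edges = [
    ("seal wear", "loss of containment"),
    ("loss of containment", "fluid leak"),
]
pred_nodes = [
    ("Seal wear", "cause"),
    ("Loss of containment", "top_event"),
    ("fire", "consequence"),  # hallucinated consequence
]
pred_edges = [
    ("Seal wear", "Loss of containment"),
    ("Loss of containment", "fire"),
]

node_f1, edge_f1 = bowtie_f1(pred_nodes, pred_edges, ref_nodes, ref_edges)
print(f"Node F1 = {node_f1:.3f}, Edge F1 = {edge_f1:.3f}")
```

Under this exact-match assumption, a single hallucinated node lowers both scores, because every edge touching that node also fails to match. A softer matching rule (for example, semantic similarity between labels) would score the same output differently.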
Keywords
Bowtie Diagrams, Large Language Models, FMEA, Prompt Engineering, RAG, Failure Analysis, Bowties

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
The Prognostics and Health Management Society advocates open access to scientific data and uses a Creative Commons license for publishing and distributing any papers. A Creative Commons license does not relinquish the author’s copyright; rather, it allows them to share some of their rights with any member of the public under certain conditions whilst enjoying full legal protection. By submitting an article to the International Conference of the Prognostics and Health Management Society, the authors agree to be bound by the associated terms and conditions including the following:
As the author, you retain the copyright to your Work. By submitting your Work, you are granting anybody the right to copy, distribute and transmit your Work and to adapt your Work with proper attribution under the terms of the Creative Commons Attribution 3.0 United States license. You assign rights to the Prognostics and Health Management Society to publish and disseminate your Work through electronic and print media if it is accepted for publication. A license note citing the Creative Commons Attribution 3.0 United States License as shown below needs to be placed in the footnote on the first page of the article.
First Author et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.