A Model Based Approach to Extract Health Information from Textual Data



Published Oct 28, 2022
Diego Mandelli, Congjian Wang


In current nuclear power plants (NPPs), a large amount of condition-based data is generated and stored to assess and monitor component health and performance. This data can be either numeric (e.g., pump vibration data) or textual (e.g., condition reports that assess component health). While component health can be assessed from numeric data with a wide variety of methods, extracting information from textual data remains a challenge. Natural language processing (NLP) methods are starting to be deployed in current NPPs, mainly to filter out incident reports (IRs) that are not safety related by employing supervised machine learning methods. However, these methods do not provide the quantitative information that might be contained in IRs. This paper presents an approach to extract information from textual data (e.g., IRs, maintenance reports) that is based on NLP data analytics methods coupled with model-based systems engineering (MBSE) models. NLP methods are employed to perform syntactic and semantic analyses. Syntactic analysis examines the grammatical structure of a sentence; it includes part-of-speech (POS) tagging (i.e., identification of the grammatical category of each word, e.g., noun, verb), named entity recognition (i.e., identification of text entities, e.g., names, dates, events), and relation extraction (e.g., coreference resolution). Semantic analysis, on the other hand, is designed to analyze the logical structure of a sentence. Through a specific set of rules, our methods can identify whether a sentence contains health information about a component (e.g., degraded performance, anomalous behavior) or a causal relationship between two events (i.e., a cause-effect pair). An innovative element of our approach is that semantic analysis relies on MBSE models to identify links between textual elements.
MBSE models are diagrams designed to represent system and component dependencies (from both a form and a function point of view). In our approach, MBSE models emulate system engineers' knowledge of component and system architecture. This paper presents in detail how NLP methods and MBSE models are integrated. A few analysis examples focusing on centrifugal pumps are presented.
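
To make the idea of rule-based extraction coupled with a system model more concrete, the following is a minimal sketch in Python, not the authors' implementation. Everything in it is an assumption made for illustration: the `MBSE_MODEL` dictionary is a toy stand-in for a real MBSE diagram, the `HEALTH_TERMS` and `CAUSAL_CUES` lists are hypothetical keyword rules, and plain regular expressions replace the POS tagging and named entity recognition that a full NLP pipeline would provide.

```python
import re

# Hypothetical stand-in for an MBSE model: component -> parts it contains.
MBSE_MODEL = {"pump": ["impeller", "shaft", "bearing", "seal"]}

# Hypothetical keyword rules; a real rule set would be far richer.
HEALTH_TERMS = ["degraded", "leaking", "vibration", "worn", "cracked"]
CAUSAL_CUES = ["because of", "due to", "caused by"]


def _mentions(term, text):
    """True if `term` occurs as a whole word or phrase in `text` (case-insensitive)."""
    return re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE) is not None


def find_entities(text):
    """Return, sorted, the model components and parts named in the text."""
    names = set(MBSE_MODEL)
    for parts in MBSE_MODEL.values():
        names.update(parts)
    return sorted(name for name in names if _mentions(name, text))


def find_health_terms(text):
    """Return the health-related keywords found in the text, in list order."""
    return [term for term in HEALTH_TERMS if _mentions(term, text)]


def causal_pair(sentence):
    """Split the sentence on the first cue in CAUSAL_CUES that matches.

    Returns (cause, effect), assuming the pattern '<effect> <cue> <cause>',
    or None when no cue is present.
    """
    text = sentence.strip().rstrip(".")
    for cue in CAUSAL_CUES:
        match = re.search(rf"\b{re.escape(cue)}\b", text, flags=re.IGNORECASE)
        if match:
            effect = text[: match.start()].strip()
            cause = text[match.end():].strip()
            return cause, effect
    return None


def linked_in_model(entity_a, entity_b):
    """True if one entity is a part of the other according to MBSE_MODEL."""
    return (entity_b in MBSE_MODEL.get(entity_a, [])
            or entity_a in MBSE_MODEL.get(entity_b, []))
```

For a sentence such as "Pump vibration increased due to a worn bearing.", `causal_pair` separates the effect ("Pump vibration increased") from the cause ("a worn bearing"), `find_entities` maps each side to a model element, and `linked_in_model("bearing", "pump")` confirms that the two elements are connected in the toy model. The model lookup is the step that stands in for the engineering knowledge an MBSE diagram would supply.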

How to Cite

Mandelli, D., & Wang, C. (2022). A Model Based Approach to Extract Health Information from Textual Data. Annual Conference of the PHM Society, 14(1). https://doi.org/10.36001/phmconf.2022.v14i1.3249



NLP, health assessment

Technical Research Papers