A Semantic Similarity Model to Compare Heterogeneous Data Sources to Augment Engineering Data with New Failure modes in Automotive Industry



Published Jun 29, 2021
Dnyanesh Rajpathak, John Cafeo


In industry, timely exposure of the symptoms and failure modes observed during fault events gives system engineers key signals for taking the corrective actions needed to arrest product defects effectively. A real-life ontology-based semantic similarity system is employed to automatically compare engineering data (in the form of design failure mode and effects analysis, DFMEA, data) with the field repair data collected during the product warranty period.

The complexity of engineering data and the overwhelming volume (hundreds of millions of data points) of field repair data make identifying new symptoms and failure modes from first principles impractical. Typically, engineering data is recorded using technical vocabulary, e.g. ‘unstable electric contact’ or ‘Seat Belt comfort per MVSS208’, whereas field repair data is highly unstructured. Consequently, we observe the following types of noise in the field repair data: abbreviated text entries, inconsistent use of vocabulary (‘seat buckle is damaged’ vs ‘buckle unlatching’), and incomplete text entries. More importantly, the limited mental mapping capacity of a human agent restricts the discovery of new symptoms and failure modes from industrial-scale data. Not surprisingly, text mining and semantic similarity are gaining serious attention for their ability to automatically discover the knowledge assets buried in unstructured text by training machines to compare and link high volumes of data.

In our approach, the key constructs (e.g. symptoms, failure modes) in the data are first annotated using the domain ontology. These constructs are then used to form pairs of terms and pairs of tuples, from which the pair-to-pair and tuple-to-tuple semantic similarity are computed, respectively. Finally, the text-to-text semantic similarity is calculated by combining these two similarity scores. It is used to determine whether new symptoms or failure modes from the field repair data can be used to augment the DFMEA data.
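
The three stages described above (ontology annotation, term-level and tuple-level similarity, and a combined text-to-text score) can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the toy `ONTOLOGY` dictionary, the use of Jaccard overlap as the term-level score, the best-match positional overlap as the tuple-level score, and the equal-weight combination in `text_similarity` are all hypothetical stand-ins, not the authors' method.

```python
from itertools import product

# Hypothetical toy ontology: surface phrase -> (construct type, canonical concept).
ONTOLOGY = {
    "seat buckle": ("part", "seat_belt_buckle"),
    "buckle": ("part", "seat_belt_buckle"),
    "damaged": ("symptom", "damaged"),
    "unlatching": ("symptom", "unlatching"),
    "unstable electric contact": ("failure_mode", "unstable_contact"),
}


def annotate(text):
    """Return the set of (type, concept) annotations found in the text."""
    text = text.lower()
    found = set()
    # Match longer phrases first so "seat buckle" wins over "buckle".
    for phrase in sorted(ONTOLOGY, key=len, reverse=True):
        if phrase in text:
            found.add(ONTOLOGY[phrase])
            text = text.replace(phrase, " ")
    return found


def term_similarity(a, b):
    """Assumed term-level score: Jaccard overlap of annotated concepts."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def make_tuples(annotations):
    """Pair every part with every symptom or failure mode in one text."""
    parts = [c for t, c in annotations if t == "part"]
    issues = [c for t, c in annotations if t in ("symptom", "failure_mode")]
    return set(product(parts, issues))


def tuple_similarity(a, b):
    """Assumed tuple-level score: average best positional match of a's tuples in b."""
    ta, tb = make_tuples(a), make_tuples(b)
    if not ta or not tb:
        return 0.0

    def overlap(x, y):
        return sum(p == q for p, q in zip(x, y)) / len(x)

    return sum(max(overlap(x, y) for y in tb) for x in ta) / len(ta)


def text_similarity(text_a, text_b, weight=0.5):
    """Assumed combination: weighted average of the two scores."""
    a, b = annotate(text_a), annotate(text_b)
    return weight * term_similarity(a, b) + (1 - weight) * tuple_similarity(a, b)


score = text_similarity("seat buckle is damaged", "buckle unlatching")
```

In this sketch the two example entries share the same part but differ in symptom, so they score well below an identical pair. A low score against every DFMEA entry would flag a field-repair symptom as a candidate new symptom; the cut-off for that decision is not stated in the abstract and would have to be chosen separately.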

The proposed method is implemented as a prototype tool, and its performance is validated using real-life data from the automobile domain. On average, our system achieves F1 scores of 0.75 and 0.78 in discovering new symptoms and identifying synonym symptoms, respectively, and F1 scores of 0.72 and 0.68 in discovering new failure modes and identifying synonym failure modes, respectively. The fault detection rate is improved by 35%, and the fault isolation rate by 40.5%.

How to Cite

Rajpathak, D., & Cafeo, J. (2021). A Semantic Similarity Model to Compare Heterogeneous Data Sources to Augment Engineering Data with New Failure modes in Automotive Industry. PHM Society European Conference, 6(1), 10. https://doi.org/10.36001/phme.2021.v6i1.2887



Model-based diagnostics, PHM for Automotive, Rail, Marine, Wind and Energy

Technical Papers