Processing of Condition Monitoring Annotations with BERT and Technical Language Substitution: A Case Study

Published Jun 29, 2022
Karl Lowenmark, Cees Taal, Joakim Nivre, Marcus Liwicki, Fredrik Sandin

Abstract

Annotations in condition monitoring systems contain information regarding asset history and fault characteristics in the form of unstructured text that could, if unlocked, be used for intelligent fault diagnosis. However, processing these annotations with pre-trained natural language models such as BERT is problematic because out-of-vocabulary (OOV) technical terms result in inaccurate language embeddings. Here we investigate the effect of OOV technical terms on BERT and SentenceBERT embeddings by substituting technical terms with natural language descriptions. The embeddings were computed for each annotation in a pre-processed corpus, with and without substitution. The K-Means clustering score was calculated on sentence embeddings, and a Long Short-Term Memory (LSTM) network was trained on word embeddings with the objective of recreating the output of a keyword-based annotation classifier. With technical language substitution, the K-Means score for SentenceBERT annotation embeddings improved by 40% at seven clusters, and the labelling capacity of the BERT-LSTM model improved from 88.3% to 94.2%. These results indicate that the substitution of OOV technical terms can improve the representation accuracy of the embeddings of the pre-trained BERT and SentenceBERT models, and that pre-trained language models can be used to process technical language.
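
As an illustration of the substitution step described in the abstract, the following minimal sketch replaces technical terms with natural language descriptions before an annotation is passed to a language model. The substitution table, the function name `substitute_technical_terms`, and the example annotation are hypothetical and chosen only for illustration; they are not taken from the paper, whose actual term list and pre-processing may differ.

```python
import re

# Hypothetical substitution table: illustrative entries only,
# not the term list used in the paper.
SUBSTITUTIONS = {
    "BPFO": "outer ring bearing fault frequency",
    "BPFI": "inner ring bearing fault frequency",
    "env": "envelope signal",
}


def substitute_technical_terms(text, table=SUBSTITUTIONS):
    """Replace whole-word technical terms with plain-language descriptions."""
    # Try longer terms first so a long term is preferred over a shorter one.
    terms = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    # Case-insensitive lookup from matched term to its description.
    lookup = {term.lower(): desc for term, desc in table.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


# Hypothetical annotation text, written for this example.
annotation = "High BPFO in env, replace bearing"
print(substitute_technical_terms(annotation))
```

The substituted text, rather than the raw annotation, would then be embedded with a pre-trained model. Matching on word boundaries is a design choice made here so that ordinary words containing a term, such as "envelope", are left unchanged.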

How to Cite

Lowenmark, K., Taal, C., Nivre, J., Liwicki, M., & Sandin, F. (2022). Processing of Condition Monitoring Annotations with BERT and Technical Language Substitution: A Case Study. PHM Society European Conference, 7(1), 306–314. https://doi.org/10.36001/phme.2022.v7i1.3356

Keywords

Condition Monitoring, BERT, SentenceBERT, K-Means, Long Short-Term Memory

Section
Technical Papers