Processing natural language and extracting relevant information in deeply technical engineering domains remains an open challenge.
On the other hand, manufacturers of high-value assets often deliver product services throughout the equipment life, supporting maintenance, spare parts management, and remote monitoring and diagnostics for issue resolution. As a result, they have access to a substantial amount of textual data containing technical cases with a certain engineering depth.
This paper presents a case study in which various Artificial Intelligence algorithms were applied to historical technical cases to extract know-how that can help technicians approach new cases.
First, the work process and the available data are presented; the focus is on the outbound communication delivered by the technical team to the site operators, which is structured in three main paragraphs: event description, technical assessment, and recommended actions.
The work proceeded in two parallel streams. The first concerned the analysis of event descriptions and technical assessments, aiming to detect recurring topics. The second concerned the analysis of the recommended actions that technical support delivered to site operators over the years, in order to create a library that enables statistical data analysis and quality check review and serves as a starting point for further AI/NLP developments.
Text preprocessing was applied to both streams. It consisted of defining standard and domain-specific entities and stopwords, then identifying and removing them; creating acronym and synonym maps to disambiguate context; splitting the recommended actions into sentences; and finally lemmatizing the text. For every text, the output of the preprocessing was a series of keywords.
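The preprocessing steps described above can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption: the lookup tables (`STOPWORDS`, `ACRONYMS`, `SYNONYMS`, `LEMMAS`) are hypothetical toy examples, lemmatization is reduced to a dictionary lookup, and entity detection is omitted.

```python
import re

# Hypothetical lookup tables for illustration only; real maps would be
# curated for the specific engineering domain.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "was", "on", "in", "for", "at"}
ACRONYMS = {"gt": "gas turbine", "lo": "lube oil"}
SYNONYMS = {"substitute": "replace", "verify": "check"}
LEMMAS = {"checked": "check", "bearings": "bearing", "replaced": "replace",
          "filters": "filter", "tripped": "trip"}


def split_sentences(text):
    """Split after sentence-ending punctuation that is followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?;])\s+", text) if s.strip()]


def preprocess(text):
    """Turn raw text into a list of keywords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    keywords = []
    for tok in tokens:
        # Expand acronyms first; an expansion may yield several words.
        for word in ACRONYMS.get(tok, tok).split():
            word = LEMMAS.get(word, word)      # toy lemmatization by lookup
            word = SYNONYMS.get(word, word)    # map synonyms to one canonical term
            if word not in STOPWORDS:
                keywords.append(word)
    return keywords


actions = "Replace the filters. Verify LO pressure."
keyword_lists = [preprocess(s) for s in split_sentences(actions)]
print(keyword_lists)
```

In this sketch each recommended-action sentence becomes its own keyword list, which matches the sentence-level granularity the text describes for that stream.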
Unsupervised learning algorithms were then applied. First, features were extracted using bag-of-words (TF-IDF) and embedding models (Word2Vec, Doc2Vec, BERT) to transform the data from the language domain into points in an n-dimensional feature space. Next, different unsupervised algorithms were combined with these representations to split the data into homogeneous groups: LDA, K-means, spectral clustering, Affinity Propagation, and HDBSCAN.
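As an illustration of one such combination, the following sketch pairs TF-IDF features with K-means using only the Python standard library. It is not the pipeline used in the study: the smoothed IDF formula, the L2 normalization, the fixed initial centroids (`init_indices`), and the toy keyword lists are all assumptions chosen to keep the example small and deterministic.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of keyword lists. Returns (vocabulary, L2-normalized vectors)."""
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}  # smoothed IDF
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_indices, iters=20):
    """Lloyd's algorithm with explicitly chosen initial centroids."""
    centroids = [list(points[i]) for i in init_indices]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy keyword lists standing in for preprocessed technical cases.
docs = [
    ["bearing", "vibration", "high"],
    ["vibration", "bearing", "alarm"],
    ["lube", "oil", "pressure", "low"],
    ["oil", "pressure", "drop"],
]
vocab, X = tfidf(docs)
labels = kmeans(X, init_indices=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

Swapping the feature extractor (for example, averaged word embeddings instead of TF-IDF) while keeping the clustering step fixed is one way to produce the kind of combinations the text mentions.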
The combinations between language modeling and clustering were evaluated using the Silhouette score and visual analysis.
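For reference, the Silhouette coefficient of a point is (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is the lowest mean distance to the points of any other cluster; the score is the average over all points. The sketch below is a plain-Python version under assumptions of our own: Euclidean distance, a score of 0 for singleton clusters, and toy 2-D points rather than the study's data.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient; requires at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # Mean intra-cluster distance; p's distance to itself adds 0 to the sum.
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # Mean distance to the nearest other cluster.
        b = min(sum(euclidean(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # compact, well-separated groups
bad = silhouette_score(points, [0, 1, 0, 1])   # groups that mix distant points
print(round(good, 3), round(bad, 3))
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest points assigned to the wrong group, which is why the score is usually read together with a visual inspection.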
To validate their effectiveness, the developed NLP algorithms were integrated into the software application currently used by technical support to perform the service. In addition, a dedicated app was developed to show trending topics and retrieve insightful information.
Finally, an outlook on the open technical challenges and on the future prospects of NLP applications in the work process is given.
Keywords: NLP, information extraction, product service, clustering, topic modeling, BERT, word2vec
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
The Prognostics and Health Management Society advocates open access to scientific data and uses a Creative Commons license for publishing and distributing any papers. A Creative Commons license does not relinquish the author's copyright; rather, it allows authors to share some of their rights with any member of the public under certain conditions while enjoying full legal protection. By submitting an article to the International Conference of the Prognostics and Health Management Society, the authors agree to be bound by the associated terms and conditions.