Anomaly Detection and Fault Classification in Multivariate Time Series Using Multimodal Deep Models

In gear fault diagnosis, many analytical methods require extensive domain expertise, and automation remains challenging because fault diagnosis tasks are diverse. To address these limitations, we propose a novel PHM algorithm that integrates out-of-distribution detection and representation learning. Features are first extracted using envelopes and the fast Fourier transform (FFT). Representation learning then employs Transformers and self-supervised learning to obtain meaningful representations. The latent space values are used for out-of-distribution detection through kNN and for classification, with out-of-distribution data identified at approximately 99% accuracy. Our approach enhances the automation of gear fault diagnosis and remains effective across diverse, previously unencountered problems.


INTRODUCTION
Gear fault diagnosis, an area of enduring research interest, has seen the development of various analytical methods (Žvokelj, Zupan, and Ivan, 2016). Despite their efficacy, many of these established techniques require profound domain expertise and do not lend themselves to automation, because fault diagnosis tasks are diverse (Lu, Wang, Qin, and Ma, 2017).
To overcome these limitations, we propose a novel PHM algorithm based on out-of-distribution detection and representation learning.
First, we conducted feature extraction using envelopes and the fast Fourier transform (FFT). Subsequently, for representation learning, we employed Transformers and self-supervised learning to obtain meaningful representations. Following this, we used the latent space values for both out-of-distribution detection through kNN and classification. The results showed that out-of-distribution data points were identified with a probability of approximately 99%. Classification accuracy was also significantly higher with out-of-distribution detection than without it.
In terms of practical contributions, our approach handles out-of-distribution cases effectively, ensuring robust performance across a diverse array of problems that may not have been encountered during training.

PROPOSED METHOD
In this study, we propose a gear failure classification method based on out-of-distribution detection, set in the context of predictive maintenance for industrial machinery.

Figure 1 The Proposed Method
The approach encompasses a multi-step process integrating time series and frequency domain data analysis, representation learning, outlier detection, and classification.

Feature Extraction
We used two feature extraction techniques: envelope and FFT.
Because the recordings differ in length, each was divided into 1-second segments, and preprocessing combined FFT and envelope detection (the magnitude of the complex signal resulting from the Hilbert transform) for each axis.

Ryu Gunwoo & NohYoon Seong. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
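The segmentation and feature extraction step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the sampling rate argument `fs`, and the choice to return the envelope and the FFT magnitude as two separate arrays are assumptions made here, since the text does not specify how the two features are combined. The envelope is computed as the magnitude of the analytic signal, obtained with an FFT-based Hilbert transform.

```python
import numpy as np

def analytic_envelope(x: np.ndarray) -> np.ndarray:
    """Envelope as the magnitude of the analytic signal (Hilbert transform via FFT)."""
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    # Weights that keep DC (and Nyquist), double positive frequencies,
    # and zero out negative frequencies.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h, axis=-1))

def extract_features(signal: np.ndarray, fs: int):
    """Split a (n_samples, n_axes) recording into 1-second windows.

    Returns (envelopes, spectra) with shapes
    (n_windows, n_axes, fs) and (n_windows, n_axes, fs // 2 + 1).
    """
    n_windows = signal.shape[0] // fs            # drop the incomplete tail
    windows = signal[: n_windows * fs].reshape(n_windows, fs, -1)
    windows = windows.transpose(0, 2, 1)         # (n_windows, n_axes, fs)
    envelopes = analytic_envelope(windows)       # time-domain feature
    spectra = np.abs(np.fft.rfft(windows, axis=-1))  # frequency-domain feature
    return envelopes, spectra
```

Dropping the incomplete final window is one simple way to handle recordings of different lengths; padding would be an alternative.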

Representation Learning
Convolutional Neural Networks (CNN) were applied to the preprocessed data in both the time and frequency domains to extract features and patterns. The data was then masked and fed into a transformer encoder, which forces the model to predict the masked elements from contextual information and thereby learn the structure of the sequential data (Zerveas, Jayaraman, Patel, Bhamidipaty, and Eickhoff, 2021). The self-attention mechanism enhances the model's ability to capture complex dependencies in sequential data. A pretext task through a fully connected layer further refines the model's understanding of the data.
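The masked pretext objective can be illustrated independently of the network itself. The sketch below shows only the masking and the loss restricted to masked positions; the encoder that produces `prediction` is omitted. The function names, the masking ratio of 0.15, the use of zeros as the mask value, and independent element-wise masking are all assumptions chosen for illustration, not details taken from the paper.

```python
import numpy as np

def mask_sequence(x, mask_ratio=0.15, rng=None):
    """Hide a random subset of entries of x (seq_len, n_features).

    Returns the masked copy and a boolean mask (True = hidden).
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) < mask_ratio
    x_masked = np.where(mask, 0.0, x)   # hidden entries are replaced by zero
    return x_masked, mask

def masked_mse(prediction, target, mask):
    """Mean squared error computed only on the hidden positions."""
    if not mask.any():
        return 0.0
    diff = (prediction - target)[mask]
    return float(np.mean(diff ** 2))
```

Because the loss ignores the visible positions, the model is rewarded only for reconstructing values it could not see, which is what pushes it to use context.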

Out-of-Distribution Detection and Classification
After training the representation, we extracted the output of the last hidden layer and removed the data of certain fault levels to generate a combination of trained OOD detectors. Each OOD detector used a k-nearest neighbor (kNN) approach (Sun, Ming, Zhu, and Li, 2022) and set its OOD threshold at the 75th percentile of the distances from the training data, where both the data in the distribution and the OOD data in the test set were within the distribution. We then removed the data that the generated detectors determined to be OOD and used a kNN classifier to predict the defect level.
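A kNN-based OOD detector with a percentile threshold can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: the class and function names, `k=5`, the L2 normalization of embeddings, the exclusion of the self-distance when scoring training points, and the "flag if any detector flags" ensemble rule are all choices made here. Only the 75th-percentile threshold and the use of kNN for both detection and classification come from the text.

```python
import numpy as np

def _unit(x):
    """L2-normalise each row (assumed preprocessing of the embeddings)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)

def knn_distance(queries, reference, k, exclude_self=False):
    """Distance from each query to its k-th nearest neighbour in reference."""
    d = np.linalg.norm(queries[:, None, :] - reference[None, :, :], axis=-1)
    d.sort(axis=1)
    # When queries are the reference points themselves, column 0 is the
    # zero self-distance, so the k-th true neighbour sits at column k.
    return d[:, k if exclude_self else k - 1]

class KnnOodDetector:
    def __init__(self, k=5, percentile=75.0):
        self.k = k
        self.percentile = percentile

    def fit(self, train):
        self.train_ = _unit(np.asarray(train, dtype=float))
        scores = knn_distance(self.train_, self.train_, self.k, exclude_self=True)
        # Threshold at the given percentile of training kNN distances.
        self.threshold_ = np.percentile(scores, self.percentile)
        return self

    def is_ood(self, x):
        scores = knn_distance(_unit(np.asarray(x, dtype=float)), self.train_, self.k)
        return scores > self.threshold_

def ensemble_is_ood(detectors, x):
    """Flag a sample if any detector flags it (one possible voting rule)."""
    return np.any([d.is_ood(x) for d in detectors], axis=0)

def knn_classify(x, train, labels, k=5):
    """Majority vote over the k nearest training points (integer labels >= 0)."""
    d = np.linalg.norm(x[:, None, :] - train[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in labels[nearest]])
```

In use, samples for which `ensemble_is_ood` returns `True` would be set aside, and `knn_classify` would be applied to the remainder. A 75th-percentile threshold flags roughly a quarter of the training-like data by construction, so the choice trades missed OOD samples against discarded in-distribution ones.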

Figure 2 Representation Learning Result
The legend indicates the fault level.
Figure 2 shows the representation extracted from the transformer encoder, visualized using t-SNE. Although the model contains no information about torque and RPM, the data is effectively clustered by fault level. The overall accuracy is approximately 0.78, which is 0.12 higher than when out-of-distribution detection was not considered. Furthermore, considering static covariates (RPM, torque) resulted in higher accuracy than excluding them. This observation, coupled with the superior performance compared to using the time domain or the frequency domain alone, confirms the advantages of the proposed method.

CONCLUSION
In conclusion, our paper introduces a methodology for gear degradation analysis, emphasizing the need to consider multidomain features and out-of-distribution issues. By combining advanced techniques such as CNN analysis, Transformer encoding, and a customized classification approach, this methodology significantly advances our understanding of gear degradation.

Table 1 Accuracy of each OOD detector
Table 1 shows the accuracy of each OOD detector, obtained by removing the data for one level at a time in an ablation manner. Each detector may perform slightly worse on its own, but since the OOD decision is made by an ensemble of all detectors, the accuracy can be as high as 0.9994 with high probability.
The data was divided into a Training set, a Validation set, and a Test set with a ratio of 7:1:2. Ablation studies were conducted on the Training set and Validation set by removing one label at a time during experiments to consider the out-of-distribution problem. Subsequently, the experiments were repeated seven times, and the average accuracy of multi-class classification was reported in Table 2.
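The 7:1:2 split and the leave-one-label-out ablation can be sketched as follows. The helper names, the random shuffling, and the seed argument are assumptions introduced for illustration; the text specifies only the ratio and that one label is removed at a time.

```python
import numpy as np

def split_indices(n, ratios=(0.7, 0.1, 0.2), seed=0):
    """Shuffle indices 0..n-1 and split them into train, validation, test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(ratios[0] * n))
    n_val = int(round(ratios[1] * n))
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])          # remainder goes to the test set

def leave_one_label_out(labels):
    """Yield (held_out_label, keep_mask) for each distinct label."""
    for label in np.unique(labels):
        yield label, labels != label
```

Each `keep_mask` selects the training and validation samples for one ablation run, so the held-out label plays the role of unseen, out-of-distribution data. Repeating the whole procedure with different seeds and averaging the accuracies would correspond to the repeated experiments described above.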