Similarity-based Feature Extraction from Vibration Data for Prognostics

Many prognostics techniques depend on estimating and then forecasting health indicators that reflect the overall health or performance of an asset. For vibration data, health indicators are typically calculated by combining various vibration measurements with derived features extracted from time-, frequency- or time-frequency-domain analysis. However, selecting or handcrafting good features is labor-intensive. Deep learning models, on the other hand, might be able to learn health indicators automatically from vibration data, but they require large amounts of training data, which are typically hard to obtain from real assets. In this paper, we propose an innovative similarity-based feature extraction method for vibration data that can be used to learn health indicators and estimate the remaining useful life of equipment. The method learns a set of representative frequency-spectrum templates for both normal and failure states, and then calculates similarity-based features between new vibration data and the learned templates. These features are used to estimate health indicators, which are then extrapolated to estimate the future health condition of the asset and its remaining useful life. The proposed method has been tested on the PRONOSTIA bearing dataset provided by the FEMTO-ST Institute and achieved higher accuracy in estimating the remaining useful life of bearings than other studies. The results demonstrate the effectiveness of the proposed method for assets with limited training data.


INTRODUCTION
Prognostics is concerned with predicting the future health or performance of an asset and estimating its remaining useful life. This task is referred to as Remaining Useful Life (RUL) estimation. Accurate RUL prediction prevents unexpected failures, eliminates costly repairs, and thereby increases asset availability.
Vibration measurement has been widely studied for prognostics of industrial equipment, especially rotating machinery and equipment with bearings, motors and gearboxes, based on time-, frequency- or time-frequency-domain analysis (Plante et al., 2015; Doguer & Strackeljan, 2009; Singhal & Khandekar, 2013; Al-Badour et al., 2011). Some implementations aim to develop Health Indicators (HIs) from vibration data that represent the degradation trend from normal condition to failure, based on domain expert knowledge or feature extraction methods. In (Doguer & Strackeljan, 2009), time-domain features extracted from vibration measurements are used to detect roller bearing surface defects. Frequency-domain analysis is effective for stationary signals. In (Plante et al., 2015), by comparing the frequency spectrum of the vibration data for three fault cases (unbalance, mechanical looseness and bearing fault) with that of a healthy motor, the specific natural frequencies corresponding to each fault condition are identified. In (Al-Badour et al., 2011), wavelet analysis is investigated for feature extraction from rotating machinery with non-stationary vibration measurements.
However, different features work for different problems. For a new problem, constructing a health indicator by selecting good features from all extracted features may require laborious preliminary analysis, and handcrafting health indicators or features requires domain knowledge that can be hard to obtain in real applications. Deep learning, on the other hand, is gaining popularity because of its robustness and superior accuracy when trained with large amounts of data. With deep learning methods, health indicators can be constructed automatically from vibration data. In (Zhang et al., 2020), implementations of various deep learning algorithms on vibration data for bearing fault diagnostics were reviewed, including Convolutional Neural Networks (CNN), Deep Belief Networks (DBN) and Recurrent Neural Networks (RNN). However, deep learning methods require large amounts of training data, which are typically hard to obtain from real assets.
In this paper, we propose an innovative similarity-based feature extraction method for vibration data that can be used to learn health indicators for prognostics. The method uses the Fast Fourier Transform (FFT) to compute frequency spectra and learns a set of representative spectrum templates for both normal and failure states. Then, for a new snapshot of vibration data, similarity measures are calculated based on the correlation coefficient and Euclidean distance between the new snapshot and the learned templates. These similarity-based features reflect the deviation of the new vibration snapshot from vibrations in normal and failure conditions. The features are used to estimate health indicators with a model of multiple Long Short-Term Memory (LSTM) networks. The health indicators are then extrapolated to estimate the future health condition of the asset and its remaining useful life. The proposed method is explained in Section 2.
The proposed method is then tested on the PRONOSTIA bearing dataset provided by the FEMTO-ST Institute for RUL estimation (Nectoux et al., 2012). The PRONOSTIA dataset has been a popular benchmark for RUL estimation since its use in the PHM 2012 data challenge. The winner of the PHM 2012 data challenge presented three methods with different extracted features, including spectral kurtosis, various time-frequency-domain features obtained with the wavelet transform, and human-defined features derived from thorough inspection of the data (Sutrisno et al., 2012). Another health indicator, the weighted minimum quantization error, crafted by fusing multiple features, is used in a model-based RUL prediction method with a particle-filter-based algorithm (Lei et al., 2016). In (Guo et al., 2017), related-similarity features are extracted to construct health indicators using an RNN, but only the similarities to the normal state are considered. In (Chen et al., 2019), the frequency spectrum of the vibration measurements is used directly as input to an RNN-based encoder-decoder framework to estimate a health indicator and then predict RUL. In the proposed method, similarity-based features are extracted from the frequency domain by considering both the normal and failure states, and various typical time-domain features are also included. Since the learning data in the PRONOSTIA dataset is limited, with only two run-to-failure bearings for each of the three operating conditions, we construct an ensemble model containing multiple LSTM networks to estimate the health indicator from the extracted features. The RUL predictions produced by our method for the PRONOSTIA bearing dataset are more accurate than those of state-of-the-art studies. The case study on the PRONOSTIA bearing dataset is described in Section 3.

METHOD
In this paper, we propose an innovative similarity-based feature extraction method for vibration data that can be used to learn health indicators and estimate the remaining useful life. This section is organized as follows. Section 2.1 introduces the similarity-based feature extraction method, which serves as data preprocessing for the subsequent prognostic model. Section 2.2 describes the prognostic model with LSTM networks for health indicator estimation. In the model testing stage, the trained HI estimation model is applied to estimate the health indicator for any snapshot of vibration data from a new device. The estimated health indicators are then used to predict the remaining useful life of the new device, as described in Section 2.3.

Feature Extraction
In the proposed feature extraction method, we first learn a set of representative frequency-spectrum templates for both normal and failure states using the Fast Fourier Transform (FFT). Then, for a new vibration snapshot, similarity measures are calculated based on the correlation coefficient and Euclidean distance between the spectrum of the new snapshot and the learned templates. These similarity measures serve as similarity-based features that reflect the deviation of the new vibration snapshot from vibrations in normal and failure conditions. Vibration data is usually measured continuously from an asset at a predefined sampling frequency and processed as a sequence of snapshots, where each snapshot corresponds to the vibration data measured in a predefined time window. The sampling frequency for vibration measurements can vary over an extremely wide span, from the hertz level up to the megahertz level. A higher sampling frequency covers a wider frequency range, which provides more information for building a prognostic model. Typically, the sampling frequency ranges from a few hundred hertz to a few hundred kilohertz. When the sampling frequency is relatively high, using the original frequency spectrum directly as features results in a high-dimensional feature space and may degrade the performance of prognostic models. Therefore, we instead use similarity measures computed on the original frequency spectrum as features; they carry information about the deviation from the normal state and the approach to the failure state.
Given continuous vibration measurements from a run-to-failure asset, the initial period is usually considered the normal condition and the period approaching failure is considered the failure condition. To extract similarity-based features from the frequency spectrum, the representative spectrum templates for the normal and failure conditions are first learned by averaging the spectra of a number of vibration snapshots measured in the normal and failure conditions, respectively. The spectrum templates for the normal condition, $F_n$, and the failure condition, $F_f$, are two series:

$$F_n = [f^n_1, \dots, f^n_i, \dots, f^n_m], \qquad F_f = [f^f_1, \dots, f^f_i, \dots, f^f_m],$$

which span the frequency range below half of the sampling frequency, per the Nyquist theorem.
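The template-learning step can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: each snapshot's spectrum is taken as the one-sided FFT magnitude normalized by the snapshot length, all snapshots are assumed to have the same length, and the helper names `magnitude_spectrum`, `spectrum_template` and `learn_templates`, as well as the `n_normal`/`n_failure` split of a run-to-failure sequence, are hypothetical.

```python
import numpy as np

def magnitude_spectrum(snapshot):
    """One-sided FFT magnitude spectrum of a 1-D vibration snapshot.

    np.fft.rfft keeps only the bins from 0 Hz up to half the sampling
    frequency (the Nyquist frequency), matching the range of F_n and F_f.
    """
    x = np.asarray(snapshot, dtype=float)
    return np.abs(np.fft.rfft(x)) / len(x)

def spectrum_template(snapshots):
    """Average the spectra of several equal-length snapshots into one template."""
    spectra = np.stack([magnitude_spectrum(s) for s in snapshots])
    return spectra.mean(axis=0)

def learn_templates(run_to_failure, n_normal, n_failure):
    """Learn normal and failure templates from one run-to-failure sequence.

    Assumption (hypothetical split): the first n_normal snapshots represent
    the normal state and the last n_failure snapshots the failure state.
    """
    F_n = spectrum_template(run_to_failure[:n_normal])
    F_f = spectrum_template(run_to_failure[-n_failure:])
    return F_n, F_f
```

For plotting, `np.fft.rfftfreq(len(snapshot), d=1/fs)` returns the frequency of each bin for a sampling frequency `fs`.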
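For a new vibration snapshot, similarities are measured between its frequency spectrum $F = [f_1, \dots, f_i, \dots, f_m]$ and the spectrum templates for the normal ($F_n$) and failure ($F_f$) conditions, respectively. Two types of similarity are computed: the Pearson correlation coefficient and the Euclidean distance. The Pearson correlation coefficient is a normalized statistical measure of the linear correlation between two series, shown here for the normal template (Eq. 1):

$$\rho(F, F_n) = \frac{\sum_{i=1}^{m} (f_i - \bar{f})(f^n_i - \bar{f}^n)}{\sqrt{\sum_{i=1}^{m} (f_i - \bar{f})^2}\,\sqrt{\sum_{i=1}^{m} (f^n_i - \bar{f}^n)^2}} \qquad (1)$$

where $\bar{f}$ and $\bar{f}^n$ are the means of $F$ and $F_n$. The Euclidean distance is a common measure of the point-wise distance between two series that does not account for any lag or distortion (Eq. 2):

$$d(F, F_n) = \sqrt{\sum_{i=1}^{m} (f_i - f^n_i)^2} \qquad (2)$$

The similarities to the failure template, $\rho(F, F_f)$ and $d(F, F_f)$, are defined in the same way with $F_n$ replaced by $F_f$.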
When vibration data is measured at a high sampling frequency, the full spectrum is divided into a number of equal-width sub-frequency bands, and the similarities are calculated for each sub-band.
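The similarity features, including the sub-band split, can be sketched as follows. This is an illustrative NumPy version, not the authors' code: the function names are hypothetical, `np.array_split` is assumed as the way to cut the spectrum into contiguous (nearly) equal sub-bands, and returning a correlation of 0 for a constant sub-band is an assumed convention to avoid division by zero.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series (Eq. 1)."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    # Assumed convention: a constant series has no defined correlation, use 0.
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def euclidean(a, b):
    """Point-wise Euclidean distance between two equal-length series (Eq. 2)."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def similarity_features(F, F_n, F_f, n_bands=1):
    """Similarity features between a spectrum F and the normal/failure templates.

    The three spectra are split into n_bands contiguous sub-bands; for each
    sub-band the features are [corr(F, F_n), dist(F, F_n), corr(F, F_f), dist(F, F_f)].
    """
    feats = []
    for f, fn, ff in zip(np.array_split(F, n_bands),
                         np.array_split(F_n, n_bands),
                         np.array_split(F_f, n_bands)):
        feats += [pearson(f, fn), euclidean(f, fn), pearson(f, ff), euclidean(f, ff)]
    return np.array(feats)
```

With this layout, a spectrum split into `n_bands` sub-bands yields `4 * n_bands` similarity features per snapshot.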
In addition to the similarity-based features, eight typical statistical time-domain features are included for prognostics. Given a snapshot of one-dimensional raw vibration data $v$, the eight statistical time-domain features are calculated from the absolute values of $v$, as defined in Table 1.
In Table 1, $E$ denotes the expectation operator.
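Table 1 is not reproduced in this text, so the sketch below only illustrates the general pattern of computing statistics on $|v|$. The eight statistics chosen here (mean, standard deviation, RMS, peak, skewness, kurtosis, crest factor, shape factor) are a hypothetical selection of commonly used vibration features and may differ from the set actually defined in Table 1; the function name is also hypothetical.

```python
import numpy as np

def time_domain_features(v):
    """Eight illustrative statistics of the absolute vibration amplitude |v|.

    Hypothetical feature selection, not necessarily the set in Table 1.
    Assumes a non-constant, non-zero snapshot so no denominator is zero.
    """
    a = np.abs(np.asarray(v, dtype=float))
    mean = a.mean()                                # E[|v|]
    std = a.std()                                  # population standard deviation
    rms = np.sqrt(np.mean(a ** 2))                 # root mean square
    peak = a.max()                                 # peak amplitude
    skew = np.mean((a - mean) ** 3) / std ** 3     # skewness
    kurt = np.mean((a - mean) ** 4) / std ** 4     # kurtosis (non-excess)
    crest = peak / rms                             # crest factor
    shape = rms / mean                             # shape factor
    return np.array([mean, std, rms, peak, skew, kurt, crest, shape])
```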

Health Indicator Estimation using LSTM Networks
Figure 1. Unfolded representation of a general example with two LSTM layers followed by one dense layer. The input is a data sample with a sequence of data points, and the output is the final estimate of the associated health indicator.

After feature extraction, we estimate the health indicator from the features using LSTM networks. We assume that the training dataset includes a set of devices $[D_1, \dots, D_i, \dots, D_m]$, each with one run-to-failure sequence of features. To train the HI estimation model with LSTM networks, the training data sequences from all devices are formatted into a set of data samples and labels:

$$X = \{X^i_t\}, \qquad Y = \{y^i_t\}, \qquad i = 1, \dots, m,\ t \le T_i,$$

where $t$ is the time step and $T_i$ denotes the failure time, which also represents the lifespan of the $i$th device. Each data sample $X^i_t$ has shape $L \times R$:

$$X^i_t = \begin{bmatrix} x^i_{t-L+1,1} & \cdots & x^i_{t-L+1,R} \\ \vdots & \ddots & \vdots \\ x^i_{t,1} & \cdots & x^i_{t,R} \end{bmatrix},$$

where $L$ is the lag (number of historical data points) considered in learning the current health indicator and $R$ is the number of extracted features. The label $y^i_t$ is the health indicator at time $t$, defined as the degradation percentage using time information: $y^i_t = t / T_i$. For each device, the health indicator is 0 at the initial point, meaning no degradation and possibly no chance of failure, and 1 at the end of its run-to-failure experiment, representing a high probability of failure. After data preparation, the HI estimation model takes the feature data $X$ as input and $Y$ as labels to learn to estimate the health indicator from the extracted features.
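The sample construction described above can be sketched with NumPy as a sliding window over each device's feature sequence. This is a minimal sketch under assumptions: the feature sequence is indexed by time steps $t = 0, \dots, T_i$, windows start only once $L$ rows are available (no padding, so the first label is $(L-1)/T_i$ rather than 0), and the helper names `make_samples` and `make_dataset` are hypothetical.

```python
import numpy as np

def make_samples(features, L):
    """Build (X, y) samples from one device's run-to-failure feature sequence.

    features: array of shape (T + 1, R), one row of R features per time step
              t = 0, ..., T, where T is the failure time of the device.
    Returns X of shape (n, L, R), where each X[k] holds the L most recent
    feature rows up to time t, and y of shape (n,) with y[k] = t / T.
    """
    features = np.asarray(features, dtype=float)
    T = len(features) - 1
    X, y = [], []
    for t in range(L - 1, T + 1):          # assumption: no padding at the start
        X.append(features[t - L + 1 : t + 1])
        y.append(t / T)                    # degradation percentage label
    return np.stack(X), np.array(y)

def make_dataset(devices, L):
    """Concatenate the samples of all training devices into one (X, Y) pair."""
    parts = [make_samples(f, L) for f in devices]
    return np.concatenate([p[0] for p in parts]), np.concatenate([p[1] for p in parts])
```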
The HI estimation model is composed of multiple LSTM layers followed by multiple fully connected (dense) layers. This deep recurrent network allows the model to learn temporal dependencies vertically and complex relationships between different features horizontally, for different fault modes and degradation modes. To show how the model works, we take a general example of two LSTM layers followed by two dense layers, as shown in Figure 1.
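To make the data flow of this example concrete, the sketch below implements an untrained forward pass in plain NumPy: two stacked LSTM layers, two dense layers, and a single output unit. It is an illustration, not the authors' model. Layer sizes, the random initialization, the function names and the linear output unit are hypothetical; sigmoid is used in the dense layers only because Table 4 lists Sigmoid as the activation function. Training would normally be done with a deep learning framework; in a Keras implementation, the input-connection dropout described in the next paragraph would likely correspond to the `dropout` argument of `keras.layers.LSTM` (as opposed to `recurrent_dropout`).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(n_in, n_hidden, rng):
    """Random weights for one LSTM layer; gate blocks are stacked as (i, f, g, o)."""
    return {"W": rng.normal(0, 0.1, (n_in, 4 * n_hidden)),
            "U": rng.normal(0, 0.1, (n_hidden, 4 * n_hidden)),
            "b": np.zeros(4 * n_hidden)}

def lstm_layer(x_seq, p):
    """Run one LSTM layer over x_seq of shape (L, n_in); return hidden states (L, n_hidden)."""
    n_hidden = p["U"].shape[0]
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    outputs = []
    for x in x_seq:
        z = x @ p["W"] + h @ p["U"] + p["b"]
        i, f, g, o = np.split(z, 4)                 # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def init_params(R, n_lstm1, n_lstm2, n_dense1, n_dense2=16, seed=0):
    """Hypothetical layer sizes; n_dense2=16 follows the second dense layer in the text."""
    rng = np.random.default_rng(seed)
    return {"lstm1": init_lstm(R, n_lstm1, rng),
            "lstm2": init_lstm(n_lstm1, n_lstm2, rng),
            "W1": rng.normal(0, 0.1, (n_lstm2, n_dense1)), "b1": np.zeros(n_dense1),
            "W2": rng.normal(0, 0.1, (n_dense1, n_dense2)), "b2": np.zeros(n_dense2),
            "w_out": rng.normal(0, 0.1, n_dense2), "b_out": 0.0}

def hi_forward(X, params):
    """Forward pass for one sample X of shape (L, R): 2 LSTM -> 2 dense -> HI."""
    h1 = lstm_layer(X, params["lstm1"])                # full sequence feeds LSTM 2
    h2 = lstm_layer(h1, params["lstm2"])[-1]           # keep the last hidden state
    d1 = sigmoid(h2 @ params["W1"] + params["b1"])     # first dense layer
    d2 = sigmoid(d1 @ params["W2"] + params["b2"])     # second dense layer
    return float(d2 @ params["w_out"] + params["b_out"])  # assumed linear output unit
```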
Because of the model's complexity, we apply dropout (Srivastava et al., 2014) during training to prevent overfitting. Dropout is applied to the input connections of the LSTM nodes in each LSTM layer: the data on the input connection to each LSTM block is excluded, with a given probability, from node activation and weight updates. The model is trained with the mean square error (MSE) cost function in Eq. 3:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{m} \sum_{t} \left( y^i_t - \hat{y}^i_t \right)^2 \qquad (3)$$

where $\hat{y}^i_t$ is the estimated health indicator for sample $X^i_t$ and $N$ is the total number of training samples.

Remaining Useful Life (RUL) Prediction
All equipment naturally degrades over time. RUL prediction is usually needed once the device has operated for a certain period of time and failure has become possible.
In the proposed method, we use historical vibration measurements to predict the future end-of-life time, from which the remaining useful life of the device is deduced.
In RUL prediction, historical health indicators are first calculated from historical vibration snapshots through the feature extraction and the trained HI estimation model described above. Then, a polynomial curve is fitted to the historical health indicators to capture the degradation mode. Finally, future health indicators are predicted by extrapolating the fitted polynomial curve, and failure is predicted to happen when the predicted health indicator reaches the failure threshold. In the proposed method, the failure threshold of the health indicator is set to 1. A simplified representation of the RUL prediction is shown in Figure 2.
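The fit-and-extrapolate step can be sketched with `np.polyfit` and `np.polyval`. This is a minimal sketch, not the authors' code: the polynomial degree (2), the search `horizon` (a multiple of the observed history length), the grid resolution and the function name `predict_rul` are hypothetical choices, and the first crossing of the threshold is located on a dense time grid rather than by solving for the polynomial's roots.

```python
import numpy as np

def predict_rul(times, hi, degree=2, threshold=1.0, horizon=10.0, n_grid=100_000):
    """Fit a polynomial to historical HIs and extrapolate to the failure threshold.

    times, hi: historical time stamps and estimated health indicators.
    Returns the predicted RUL measured from the last observed time, or None if
    the fitted curve does not reach the threshold within `horizon` times the
    observed history length (hypothetical search limit).
    """
    times = np.asarray(times, dtype=float)
    coeffs = np.polyfit(times, np.asarray(hi, dtype=float), degree)
    t_last = times[-1]
    span = t_last - times[0]
    future = np.linspace(t_last, t_last + horizon * span, n_grid)
    crossed = np.nonzero(np.polyval(coeffs, future) >= threshold)[0]
    if crossed.size == 0:
        return None
    return float(future[crossed[0]] - t_last)   # time to first threshold crossing
```

For example, with $t = 0, \dots, 10$ and a historical HI of $0.005\,t^2$, the fitted curve reaches 1 at $t = \sqrt{200} \approx 14.14$, so the predicted RUL is about 4.14 time units.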

CASE STUDY ON IEEE PHM 2012 PROGNOSTIC CHALLENGE DATASET (PRONOSTIA DATA)
The proposed method is tested on the PRONOSTIA bearing dataset provided by the FEMTO-ST Institute for remaining useful life prediction (Nectoux et al., 2012). The process consists of two stages: training and testing. In training, the input data consists of the sequences of vibration snapshots measured from multiple run-to-failure bearings in the learning set, and the outputs are the representative spectrum templates for each operating condition and the trained HI estimation model.
In the testing stage, the input data is the sequence of historical vibration snapshots measured from new bearings in the test set. The HI estimation model learned in the training stage is applied to estimate health indicators for the historical vibration data. A curve-fitting algorithm is then applied to capture the degradation mode from the historical health indicators, and the fitted curve is extrapolated to predict future health indicators. Failure is predicted to happen when the future health indicator reaches the failure threshold. Finally, the remaining useful life of the new bearings in the test set is predicted.

PRONOSTIA data
The PRONOSTIA data, a benchmark vibration dataset for bearing failure prognostics, was used in the IEEE PHM 2012 prognostic challenge. It contains real experimental data measured during the whole lifespan of bearings. The experimental platform of PRONOSTIA is discussed extensively in (Nectoux et al., 2012). There are multiple causes of bearing failure, including inner race, outer race and ball defects, improper lubrication, etc. In the PRONOSTIA experiments, a bearing failure could be caused by one or more types of failure, which reflects real-life situations. The platform has two accelerometers measuring the vibration along the horizontal and vertical directions, respectively. In the data challenge, three operating conditions were considered:
• Condition 1: 1800 rpm and 4000 N
• Condition 2: 1650 rpm and 4200 N
• Condition 3: 1500 rpm and 5000 N

As shown in Table 2, six run-to-failure bearing datasets were provided to the participants as learning data to build the prognostic models. For the 11 bearings in the test set, the vibration data is truncated and provided only up to some point before failure. The task is thus to use the learned prognostic model to predict the failure time and remaining useful life of the 11 test bearings. The learning set is quite small, with only two run-to-failure bearings under each operating condition. Meanwhile, the lifespans of the bearings are widely spread, varying roughly from 1.5 to 7.8 hours, as shown in Table 3. This makes it challenging to learn a model that accurately predicts RUL.

Scoring Function
To evaluate the effectiveness of the proposed method, we calculate the Percent Error (PE) of the RUL prediction for each test bearing, defined in Eq. (4):

$$\%Err_i = 100 \times \frac{RUL^i_{real} - RUL^i_{predict}}{RUL^i_{real}} \qquad (4)$$

where $RUL^i_{real}$ and $RUL^i_{predict}$ are the real and predicted RUL for the $i$th test bearing, respectively. To compare the proposed method with related studies on the same dataset, the mean and standard deviation (SD) of the percent errors over all test bearings are computed.
In the IEEE PHM 2012 Prognostic Challenge, underestimates and overestimates of the RUL are treated differently by the scoring function in Eq. (5), where $A_i$ is the score for the $i$th test bearing calculated from its PE ($\%Err_i$):

$$A_i = \begin{cases} \exp\left(-\ln(0.5) \cdot \%Err_i / 5\right) & \text{if } \%Err_i \le 0 \\ \exp\left(+\ln(0.5) \cdot \%Err_i / 20\right) & \text{if } \%Err_i > 0 \end{cases} \qquad (5)$$

The score is 1 when PE is 0 (the predicted RUL exactly equals the real RUL). A non-zero PE adds a penalty that decreases the score. Early prediction of failure ($\%Err_i > 0$, i.e., failure is predicted to happen earlier than it actually occurs) receives a smaller penalty than late prediction.
The overall score of the RUL prediction result is defined as the average of the scores of all 11 test bearings, as shown in Eq. (6):

$$\text{Score} = \frac{1}{11} \sum_{i=1}^{11} A_i \qquad (6)$$

Figure 3, which is Fig. 15 in (Nectoux et al., 2012), depicts the scoring function.
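Equations (4) to (6) translate directly into a few lines of Python. The sketch below is illustrative (the function names are hypothetical); it follows the sign convention above, in which a positive percent error means failure was predicted early.

```python
import numpy as np

def percent_error(rul_real, rul_pred):
    """Percent error of an RUL prediction (Eq. 4); positive means early prediction."""
    return 100.0 * (rul_real - rul_pred) / rul_real

def bearing_score(err):
    """Per-bearing score A_i (Eq. 5): late predictions (err <= 0) are penalized
    more steeply (scale 5) than early predictions (err > 0, scale 20)."""
    if err <= 0:
        return float(np.exp(-np.log(0.5) * err / 5.0))
    return float(np.exp(np.log(0.5) * err / 20.0))

def overall_score(rul_real, rul_pred):
    """Average of the per-bearing scores over all test bearings (Eq. 6)."""
    errs = [percent_error(r, p) for r, p in zip(rul_real, rul_pred)]
    return float(np.mean([bearing_score(e) for e in errs]))
```

The asymmetry is easy to check: a late prediction with a PE of -5% and an early prediction with a PE of +20% both score 0.5.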

Representative Spectra
The PRONOSTIA bearing data covers three operating conditions. We assume that bearings under the same operating condition have similar failure modes, while bearings under different operating conditions might have different failure modes. We therefore learn the representative spectra for each operating condition separately. The frequency spectrum spans the range up to 12.8 kHz, which is half of the 25.6 kHz sampling frequency. We normalize the representative spectra before calculating the similarities between a new spectrum and the representative spectrum templates. The normalized representative spectra for operating conditions 1, 2 and 3 are shown in Figures 4, 5 and 6, respectively.

Training of HI Estimation Model
In the model training stage, hyperparameter optimization is first needed to determine a good set of hyperparameters that control the learning process for an accurate mapping from input data to output. Grid search is a traditional hyperparameter optimization technique that scans exhaustively through a predefined grid in the hyperparameter space and finds the set of hyperparameters that produces the best performance (Hsu et al., 2003). Performance is usually measured by evaluating the model accuracy on a held-out validation set.
When implementing the proposed deep learning model for health indicator estimation, we use grid search to determine the optimal hyperparameters, which include the number of nodes in each of the two LSTM layers, the number of nodes in the first dense layer, the dropout rate, the optimizer, the activation function and the batch size. In the grid search, the model is trained six times for each set of parameters, each time leaving one of the six training bearings out as the validation set. The average validation mean square error (MSE) is used as the performance metric, and we choose the set of parameters with the smallest average MSE.
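The leave-one-bearing-out grid search can be sketched as below. This is an illustrative outline rather than the authors' code: `grid_search_lobo` is a hypothetical name, and `train_and_validate` is a hypothetical callback standing in for training the LSTM model on the remaining bearings and returning its validation MSE.

```python
import itertools
import numpy as np

def grid_search_lobo(bearings, grid, train_and_validate):
    """Grid search with leave-one-bearing-out validation.

    bearings: list of per-bearing datasets (opaque to this function).
    grid: dict mapping hyperparameter name -> list of candidate values.
    train_and_validate(params, train_set, val_set) -> validation MSE
        (hypothetical callback that would train the HI estimation model).
    Returns the best parameter dict and its average validation MSE.
    """
    names = list(grid)
    best_params, best_mse = None, np.inf
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        mses = []
        for k in range(len(bearings)):                 # each bearing held out once
            val_set = bearings[k]
            train_set = bearings[:k] + bearings[k + 1:]
            mses.append(train_and_validate(params, train_set, val_set))
        avg = float(np.mean(mses))
        if avg < best_mse:
            best_params, best_mse = params, avg
    return best_params, best_mse
```

Because every candidate is trained once per held-out bearing, the cost grows as (number of grid points) × 6 training runs, which is manageable here only because the learning set is small.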
The determined parameters are shown in Table 4. For the Adam optimizer, the Keras default setting is used, with a learning rate of 0.001, beta_1 = 0.9 and beta_2 = 0.999. The number of nodes in the second dense layer is set to 16.

Table 4. Determined hyperparameters
Hyperparameter        Value
Activation function   Sigmoid
Optimizer             Adam
Batch size            128
After determining the hyperparameters for a model structure, the model can be trained with the learning data. In model training, the learning data should be split into a training set and a validation set to evaluate the model's generalizability. However, the PRONOSTIA learning data is small, as mentioned in Section 3.1: it has six bearing datasets in total, two for each operating condition. We use all six bearings to train an ensemble of six individual models, each trained by leaving one of the six bearings out as the validation set and using the remaining five bearings as the training set. The health indicator estimated by the ensemble model is the average of the health indicator values from the six individual models. Ideally, the health indicator is directly related to the degradation of the device, so it should have a monotonic trend for an operating device without any maintenance or repair. In the proposed method, the estimated health indicator from the ensemble model is corrected to ensure monotonicity: if a succeeding health indicator is smaller than the preceding one, it is interpolated using the preceding health indicator. Examples of the health indicator estimation results for bearings in the learning data are shown in Figures 7, 8 and 9.
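The ensemble averaging and the monotonicity correction can be sketched in NumPy as follows. The correction rule here, replacing any value that drops below its predecessor with the preceding value (equivalently, a running maximum via `np.maximum.accumulate`), is an assumed reading of the interpolation step described above; the function names are hypothetical.

```python
import numpy as np

def ensemble_hi(predictions):
    """Average HI sequences from the individual models (shape: n_models x n_steps)."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)

def enforce_monotonic(hi):
    """Replace any HI that drops below its (corrected) predecessor with that value.

    Assumed interpretation of the correction; equivalent to a running maximum.
    """
    return np.maximum.accumulate(np.asarray(hi, dtype=float))
```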

Testing: RUL Prediction Results for Test Set
The bearings in the test set have truncated vibration data that stops at some point before failure. The task is to estimate the remaining time until failure based on historical vibration measurements. We apply the learned HI estimation model to the test bearings to estimate health indicators from the provided historical vibration data. By fitting the estimated historical health indicators, we learn a predictive curve that captures the historical health condition and degradation mode of the bearing. Future health indicators are predicted with the learned curve, and failure is predicted to occur when the future health indicator reaches a predefined failure threshold, which is set to 1 in this case study. Finally, the RUL is predicted as the time between the last point of the historical vibration data and the predicted failure point.
The estimated historical health indicators, learned predictive curves and RUL prediction results for the test bearings are shown in Figure 10 for the five bearings under operating condition 1, and in Figure 11 for the six remaining bearings under operating conditions 2 and 3. The real and predicted RUL from the proposed method for the 11 test bearings are shown in Table 5.
We also compared the proposed method with state-of-the-art algorithms in the literature (Sutrisno et al., 2012; Lei et al., 2016; Guo et al., 2017; Chen et al., 2019) on the same dataset by calculating the PE of the RUL estimate for each test bearing, as shown in Table 6. The mean and SD of the PE, and the score produced by the scoring function in Eqs. 5 and 6, are shown in the bottom three rows of Table 6. The proposed method produces the highest score and the lowest mean PE with the smallest SD.
The goal of RUL estimation is to predict the remaining time until failure so that maintenance or repair actions can be scheduled in time to prevent the failure, which helps reduce equipment downtime and improve productivity. Therefore, early prediction (failure predicted to happen earlier than it actually occurs) is usually preferred over late prediction. As shown in Table 6, our predicted RUL is smaller than the real RUL for all bearings except Bearing3_3, indicating that the proposed method tends to make early predictions.
The similarity-based features are extracted from the frequency spectrum, while time-domain information from vibration measurements can also help in estimating the health indicator and predicting the remaining useful life. Our proposed method therefore uses the combination of similarity-based features and the eight statistical time-domain features described in Section 2.1. To demonstrate the effectiveness of the similarity-based features, we apply the health indicator learning and remaining useful life prediction process using only the similarity-based features; the results are shown in Table 7. With the similarity-based features alone, the percent error and score are better than those of some of the related works shown in Table 6. In the proposed method, combining time-domain features with similarity-based features from the frequency domain provides more comprehensive information and further improves model performance.

Figure 10. RUL prediction result for bearings in operating condition 1 in the test set.

Figure 11. RUL prediction result for bearings in operating conditions 2 and 3 in the test set. The legend for this figure is the same as the legend in Figure 10.

CONCLUSION
When estimating the remaining useful life of equipment, it is important to capture time-dependent degradation patterns from long-term sequences of measurements. In this paper, a similarity-based feature extraction method is proposed that compares the vibration with learned representative templates of the normal and failure states, respectively. The similarity-based features reflect degradation information by considering the deviation of the current condition from the normal and failure conditions. An HI estimation model is then learned by training LSTM networks on the extracted features. In the case study on the PRONOSTIA dataset, the proposed method achieves higher accuracy than multiple representative works on the same dataset, producing the smallest mean percent error with the smallest standard deviation. Furthermore, the proposed method tends to predict failure early, which is desirable in prognostics because it leaves time to prevent the failure.
The proposed RUL estimation method can be applied for prognostics in industries where vibration data are measured. Vibration data are commonly measured from rotating components (such as bearings) or systems (such as assembly manipulators). The proposed method can be trained with a few failure cases and applied to predict the remaining time to failure at a given time point using historical vibration measurements.