A Novel feature extraction for anomaly detection of roller bearings based on performance improved Ensemble Empirical Mode Decomposition and Teager-Kaiser Energy Operator

Although the ensemble empirical mode decomposition (EEMD) method has been successfully applied in various applications, features extracted using EEMD may fail to detect anomalies in roller bearings, especially when the anomalies include small defects. In this study, a novel feature extraction method is proposed to detect the state of roller bearings. Performance improved EEMD, a reliable adaptive method for calculating an appropriate noise amplitude, is applied to decompose the acceleration signals into zero-mean components called intrinsic mode functions (IMFs). Then, three-dimensional feature vectors are created by applying the Teager-Kaiser energy operator (TKEO) to the first three IMFs. The novel features obtained from the healthy bearing signals are used to construct the separating hyperplane with a one-class support vector machine (SVM). To validate the proposed method, a number of operating conditions (shaft speed and load) are considered to generate the data (vibration signals) by means of an assembled test rig. It is shown that the proposed method can successfully identify the states of new samples (healthy and faulty). The uncertainty of the model prediction is investigated by computing the Margin and the number of support vectors. The proposed method creates a less complex (smaller fraction of support vectors) and more reliable (higher Margin) hyperplane than the EEMD method.


INTRODUCTION
Since roller bearings constitute one of the most important elements of rotating machines, early fault diagnosis of roller bearings is extremely important, especially for high-speed, automatic and precise machines. Thus, many research efforts have focused on fault diagnosis and detection of roller bearings.
Several signal processing techniques exist to decompose a signal and extract informative features for roller bearings. Randall and Antoni (2011) broadly treated the background of some powerful diagnostic methods for roller bearings in a very useful tutorial paper. Empirical mode decomposition (EMD) is another relatively recent technique, a so-called self-adaptive, data-driven technique for analyzing multi-component nonlinear and non-stationary signals and breaking them down into elementary modes called intrinsic mode functions (IMFs) (Huang et al., 1998). However, this technique still has some drawbacks, such as the mode mixing problem. Ensemble empirical mode decomposition (EEMD) is a more recently developed method aimed at solving the mode mixing problem (Wu & Huang, 2009). Although the EEMD has been successfully applied to damage detection of roller bearings (Lei et al., 2013), it is shown that there are still some cases in which it is not able to recognize the introduced novelties.
In this study, a new feature extraction method is proposed for novelty detection, based on performance improved EEMD and the Teager-Kaiser energy operator (TKEO). In traditional EEMD, the amplitude of the noise added to the original signal is a predefined constant value, whereas in the performance improved EEMD (PIEEMD) proposed by the authors (Tabrizi et al., 2015A), the amplitude of the added noise is adaptively computed for each data point, as explained in section 2.1.
The Teager-Kaiser energy operator (TKEO) is a nonlinear operator able to track the energy and to identify the instantaneous frequencies and instantaneous amplitudes of signals. Teager (1980) first proposed the TKEO for modelling nonlinear speech production. Kaiser (1990) applied it to single time-varying signals for simultaneous modulation of amplitude and frequency. As it detects a sudden change of the energy stream without any a priori assumption about the data structure, it can be used for vibration-based condition monitoring (non-stationary signals). Junsheng et al. (2007) applied the TKEO to each IMF decomposed by the EMD to extract the instantaneous amplitudes and frequencies. Then envelope spectra were obtained using spectrum analysis to look for the characteristic frequencies of damaged roller bearings. Li, Fu & Zhang (2009) applied the TKEO to the original vibration signals, and characteristic frequencies were extracted from the envelope spectra. Li, Zhang & Tang (2009) implemented a novel method to recognize faults of roller bearings based on the Teager-Huang transform (THT) introduced by Cexus & Boudraa (2006). All of those studies addressed the identification of a large damage size (a groove 1 mm deep and 1.5 mm wide). Feng et al. (2011) used the Fourier spectrum of the Teager energy to identify the characteristic frequency of faulty bearings (very large defect sizes: 2 mm diameter and 1 mm depth). Liu et al. (2013) presented an approach to bearing fault diagnosis based on the TKEO and the Elman neural network. The wavelet packet was used to reduce the noise existing in the Teager energy signal, and then feature vectors were extracted from the Teager spectrum.
Rodriguez et al. (2013) transformed the vibration signal to the Teager-Kaiser domain and characterized it with statistical and energy-based measures. The diagnosis was performed with a neural network and the least squares support vector machine (LS-SVM). Kwak et al. (2014) applied the TKEO in combination with minimum entropy deconvolution (MED) to detect a defective roller bearing in terms of kurtosis.
_____________________
Ali Tabrizi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
There are various pattern recognition methods, such as the artificial neural network (ANN) and the support vector machine (SVM), the latter introduced by Vapnik (1995). The SVM is a relatively new computational learning method based on statistical learning theory, which has been applied successfully in numerous applications (Widodo & Yang, 2007). It can solve the learning problem with a small number of samples. Thus, taking into account that acquiring sufficient faulty samples is often not feasible in practice, the SVM has been used successfully in a number of fault diagnosis problems. Since in many diagnostic applications only a single type of data (the healthy one) is available, the one-class SVM proposed by Scholkopf et al. (2000) can be adopted for anomaly detection.
In this study, a new feature extraction method is proposed to detect the state of roller bearings. The signal is decomposed using performance improved EEMD. Then, three-dimensional feature vectors are created by applying the TKEO to the first three IMFs. The feature vectors of the healthy bearing signals are used as input for the one-class SVM to construct the separating hyperplane. It is shown that the proposed method can successfully identify the states of new samples (healthy and anomalous ones). A number of healthy and faulty acceleration signals are analyzed to verify the feature extraction proposed in this study.
The methodology is introduced in two parts: feature extraction in section 2 and pattern recognition in section 3. In the feature extraction section, the performance improved EEMD method and the Teager-Kaiser energy operator are introduced in sections 2.1 and 2.2, respectively. The one-class SVM, the pattern recognition method used in this study, is introduced in section 3. The procedure of the novel feature extraction method is explained in section 4. The experimental setup and the data-acquisition process are presented in section 5. The application of the new approach to the acquired data and the results are discussed in section 6. Finally, the paper concludes after some discussion in section 7.

FEATURE EXTRACTION METHODS
In this study, informative features are introduced, which are extracted by applying the Teager-Kaiser energy operator to the IMFs obtained using performance improved EEMD. These methods are explained in the following sections.

Performance improved Ensemble empirical mode decomposition (EEMD)
The EEMD repeatedly decomposes the original signal with added white noise into a series of IMFs by applying the original EMD process, and treats the means of the corresponding IMFs over the repetitions as the final EEMD decomposition result. The decomposition steps of the EEMD can be summarized as follows:
1. To add a random white noise signal $n_j(t)$ to the acquired original signal $x(t)$:
$$x_j(t) = x(t) + a\,n_j(t) \qquad (1)$$
where $j = 1, 2, \ldots, M$, $a$ is the amplitude of the added white noise and $M$ is the pre-determined number of trials.
2. To decompose the obtained signal $x_j(t)$ into IMFs using EMD:
$$x_j(t) = \sum_{i=1}^{I_j} c_{i,j}(t) + r_j(t) \qquad (2)$$
where $c_{i,j}(t)$ represents the i-th IMF of the j-th trial, $r_j(t)$ represents the residue of the j-th trial and $I_j$ is the number of IMFs of the j-th trial.
3. To calculate the ensemble mean of the corresponding IMFs of all trials as the final result:
$$c_i(t) = \frac{1}{M}\sum_{j=1}^{M} c_{i,j}(t) \qquad (3)$$
where $i = 1, 2, \ldots, I$ and $I$ is the minimum number of IMFs among all the trials.
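The three EEMD steps can be sketched in a few lines of Python. This is only an illustration of the ensemble-averaging logic, not the authors' implementation: the function names, the `decompose` argument and the default seed are assumptions, the noise amplitude is passed as an absolute value, and `toy_decompose` is a deliberately simple stand-in (a moving-average split, not EMD) so that the sketch runs without an EMD library.

```python
import random
from typing import Callable, List

Signal = List[float]


def eemd(x: Signal, decompose: Callable[[Signal], List[Signal]],
         noise_amplitude: float, trials: int, seed: int = 0) -> List[Signal]:
    """Ensemble-average the components of `trials` noisy copies of x."""
    rng = random.Random(seed)
    all_imfs = []
    for _ in range(trials):
        # Step 1: add white Gaussian noise of the chosen amplitude.
        noisy = [v + noise_amplitude * rng.gauss(0.0, 1.0) for v in x]
        # Step 2: decompose the noisy copy (EMD in the paper).
        all_imfs.append(decompose(noisy))
    # Step 3: keep the components common to all trials and average them.
    n_imfs = min(len(imfs) for imfs in all_imfs)
    return [
        [sum(imfs[i][k] for imfs in all_imfs) / trials for k in range(len(x))]
        for i in range(n_imfs)
    ]


def toy_decompose(x: Signal) -> List[Signal]:
    """Stand-in for EMD (NOT EMD): 3-point moving average plus the detail."""
    n = len(x)
    smooth = [(x[max(k - 1, 0)] + x[k] + x[min(k + 1, n - 1)]) / 3.0
              for k in range(n)]
    detail = [a - b for a, b in zip(x, smooth)]
    return [detail, smooth]
```

In practice `decompose` would be replaced by a real EMD routine that returns the IMFs of its input, and the number of trials would be much larger than in a toy run.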
Adding the noise aims to affect the extrema of the original signal so that the intermittency of the components is removed. Rather than adding noise with a predefined constant amplitude (such as 0.2 of the standard deviation of the signal in the traditional EEMD method), which might not effectively change some extrema, an adaptive method is used to improve the performance of the EEMD (Tabrizi et al., 2015A). After adding a random white noise, the SNR definition (Eq. (4)) is applied and the amplitude value for each data point of a sample is obtained using Eq. (5) for each ensemble trial $j = 1, \ldots, M$. With an appropriate value for the SNR, the extrema of the original signal are influenced adequately.
Tabrizi et al. (2015A) showed that the performance improved EEMD achieves better damage detection results. A simulated signal and its decomposition results using the EEMD and the performance improved EEMD methods are shown in Figures 1 and 2, respectively. As can be seen, the IMF obtained using the performance improved EEMD (PIEEMD) is more similar to the high-frequency component.
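As a rough illustration of fixing the noise level through an SNR instead of a constant amplitude, the sketch below scales a white Gaussian noise sequence so that a target SNR in dB is met. It is not the per-data-point rule of Eq. (5): it assumes the common power-ratio definition SNR = 10 log10(P_signal / P_noise) applied to the whole sample, and the function name and seed are hypothetical.

```python
import math
import random


def noise_for_snr(x, snr_db, seed=0):
    """White Gaussian noise scaled so that 10*log10(P_x / P_noise) == snr_db.

    Assumption: average power over the whole sample is used, which differs
    from the per-data-point amplitude of the performance improved EEMD.
    """
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in x]
    p_signal = sum(v * v for v in x) / len(x)
    p_raw = sum(v * v for v in raw) / len(raw)
    scale = math.sqrt(p_signal / (p_raw * 10.0 ** (snr_db / 10.0)))
    return [scale * v for v in raw]


# Example: noise for a sine sample at the 10 dB level used later in the paper.
signal = [math.sin(0.1 * k) for k in range(200)]
noise = noise_for_snr(signal, 10.0)
p_x = sum(v * v for v in signal) / len(signal)
p_n = sum(v * v for v in noise) / len(noise)
achieved_snr_db = 10.0 * math.log10(p_x / p_n)
```

A lower SNR gives stronger noise and therefore a stronger perturbation of the extrema of the signal.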

Teager-Kaiser energy operator (TKEO)
The conventional energy of a signal is the sum of the squared absolute values of the signal over a time interval, which is not an instantaneous measure of energy. Kaiser (1990) observed from a second-order differential equation that the energy required to generate a simple sinusoidal signal varies with both amplitude and frequency. In order to estimate the instantaneous energy of a signal x(t), the Teager-Kaiser energy operator (TKEO) is used as an energy tracking operator as follows (Maragos, 1993A):
$$\psi[x(t)] = \dot{x}^2(t) - x(t)\,\ddot{x}(t)$$
where $\dot{x}(t)$ and $\ddot{x}(t)$ are the first and the second time derivatives of x(t), respectively. For a discrete time signal x(n) (where n is the discrete time index), using differences to approximate the derivatives, the TKEO can be written as:
$$\psi[x(n)] = x^2(n) - x(n-1)\,x(n+1)$$
As only three consecutive samples are needed at any instant to estimate the TKEO, it adapts to instantaneous changes in the signal and can resolve transient events.
It has merits such as low computational cost, high time and frequency resolution, and adaptability to instantaneous features.
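A minimal sketch of the discrete operator, psi[x(n)] = x(n)^2 - x(n-1) x(n+1), is given below. The function name `tkeo` and the test tone are illustrative assumptions. For a pure tone A cos(omega n), the operator is constant and equal to A^2 sin^2(omega), which shows how it responds to both amplitude and frequency.

```python
import math


def tkeo(x):
    """Discrete TKEO, psi[x(n)] = x(n)^2 - x(n-1) * x(n+1), for n = 1 .. N-2."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Pure tone: the operator output should equal A^2 * sin(omega)^2 at every sample.
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n) for n in range(50)]
energy = tkeo(tone)
expected = A ** 2 * math.sin(omega) ** 2
```

Because each output sample uses only three input samples, the two end points of the record have no TKEO value and the output is two samples shorter than the input.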
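The instantaneous frequency $\omega(t)$ and instantaneous amplitude $a(t)$ at any time instant of the signal $x(t)$ are obtained from the TKEO by energy separation as follows (Maragos, 1993B):
$$\omega(t) \approx \sqrt{\frac{\psi[\dot{x}(t)]}{\psi[x(t)]}}, \qquad |a(t)| \approx \frac{\psi[x(t)]}{\sqrt{\psi[\dot{x}(t)]}}$$
For discrete-time signals, they can be represented in terms of the discrete TKEO of the signal and of its sample differences.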
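One standard discrete energy separation algorithm is DESA-2, sketched below as an illustration; whether the paper uses this particular variant is an assumption. With y(n) = x(n+1) - x(n-1), DESA-2 estimates the frequency as 0.5 arccos(1 - psi[y(n)] / (2 psi[x(n)])) in rad/sample and the amplitude as 2 psi[x(n)] / sqrt(psi[y(n)]). The function names are hypothetical.

```python
import math


def tkeo(x):
    """Discrete TKEO, psi[x(n)] = x(n)^2 - x(n-1) * x(n+1), for n = 1 .. N-2."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x):
    """DESA-2 estimates of frequency (rad/sample) and amplitude, n = 2 .. N-3.

    Samples where either operator value is not positive are returned as NaN.
    """
    psi_x = tkeo(x)                                          # psi_x[k] is at n = k + 1
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]  # y[k] is at n = k + 1
    psi_y = tkeo(y)                                          # psi_y[k] is at n = k + 2
    freqs, amps = [], []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]                                    # align both at n = k + 2
        if px <= 0.0 or py <= 0.0:
            freqs.append(float("nan"))
            amps.append(float("nan"))
            continue
        arg = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freqs.append(0.5 * math.acos(arg))
        amps.append(2.0 * px / math.sqrt(py))
    return freqs, amps


# A constant-amplitude tone should give back its own frequency and amplitude.
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n + 0.5) for n in range(64)]
freqs, amps = desa2(tone)
```

The frequency estimate is unambiguous only up to a quarter of the sampling rate (omega below pi/2), which is a known limit of this variant.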

PATTERN RECOGNITION (ONE-CLASS SUPPORT VECTOR MACHINE)
In order to construct a pattern recognition model for novelty detection, only one class of data (features extracted from healthy bearing signals) is used to create the one-class SVM model. It constructs a hyperplane around the data such that its distance to the origin is maximal among all possible hyperplanes, and classifies new samples belonging to other possible classes as anomalies (Scholkopf et al., 2000).
The Margin is defined as:
$$\text{Margin} = \frac{\rho}{\lVert w \rVert}$$
In real problems, an exact line dividing the data is often not obtainable and the decision boundary may be curved. Ignoring a few outlier data points (using slack variables) creates a smoother boundary. To separate the data set from the origin, the following quadratic program must be solved (Scholkopf et al., 2000):
$$\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\lVert w \rVert^2 + \frac{1}{\nu N}\sum_{i=1}^{N}\xi_i - \rho \quad \text{subject to} \quad (w \cdot x_i) \ge \rho - \xi_i,\ \ \xi_i \ge 0$$
where $w$ and $\rho$ are the weight vector and the offset parameterizing the hyperplane, and $N$ is the number of training samples.
$\xi_i$ is the slack variable and $\nu$ is the regularization parameter, which represents an upper bound on the fraction of outliers (training errors) and a lower bound on the fraction of support vectors (SVs) with respect to the number of training samples. It takes values between 0 and 1 and controls the effect of outliers (the hardness or softness of the boundary around the data). The decision function used to label new samples as healthy or outliers (anomalies) is as follows:
$$f(x) = \operatorname{sgn}\big((w \cdot x) - \rho\big)$$
The SVM can also be applied to non-linear classification by mapping the data onto a high-dimensional feature space, where linear classification becomes possible. A non-linear vector function $\Phi(x) = (\phi_1(x), \ldots, \phi_l(x))$ is used to map the n-dimensional input vector x onto an l-dimensional feature space, so that the decision function becomes:
$$f(x) = \operatorname{sgn}\big((w \cdot \Phi(x)) - \rho\big)$$
By applying the kernel function as the inner product of the mapping functions, $k(x, y) = \Phi(x) \cdot \Phi(y)$, it is not necessary to explicitly evaluate the mapping in the feature space.
Various kernel functions could be used, such as the linear, polynomial and Gaussian radial basis function kernels:
$$k(x, y) = x \cdot y, \qquad k(x, y) = (\gamma\, x \cdot y + c)^d, \qquad k(x, y) = \exp\!\big(-\gamma \lVert x - y \rVert^2\big)$$
As the kernel function defines the feature space in which the training set is classified, the selection of an appropriate kernel function is very important.
Introducing the Lagrange multipliers $\alpha_i$, we obtain the dual problem as:
$$\min_{\alpha}\ \frac{1}{2}\sum_{i,j}\alpha_i \alpha_j k(x_i, x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le \frac{1}{\nu N},\ \ \sum_{i=1}^{N}\alpha_i = 1 \qquad (17)$$
If $\nu$ approaches 0, the upper bounds on the Lagrange multipliers tend to infinity, so the second inequality constraint in Eq. (17) becomes void. As the penalization of errors becomes infinite, the problem reduces to the corresponding hard margin algorithm.
For the positive, non-zero multipliers $\alpha_i$ that do not reach the upper bound (support vectors $x_i$) we have:
$$\rho = (w \cdot \Phi(x_i)) = \sum_{j}\alpha_j k(x_j, x_i)$$
Accordingly, the non-linear decision function for labelling new samples is represented as follows ($\alpha_i$ represents the positive, non-zero multipliers, whose samples $x_i$ are called support vectors):
$$f(x) = \operatorname{sgn}\Big(\sum_{i}\alpha_i k(x_i, x) - \rho\Big)$$
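The kernel form of the decision function in the one-class SVM of Scholkopf et al. (2000), f(x) = sgn(sum_i alpha_i k(x_i, x) - rho), can be evaluated directly once the multipliers are known. The sketch below assumes a Gaussian RBF kernel, and the support vectors, multipliers, offset and gamma are made-up illustrative values, not results from the paper.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel, k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def decision(x, support_vectors, alphas, rho, gamma):
    """Return +1 (healthy side of the hyperplane) or -1 (anomaly)."""
    score = sum(a * rbf_kernel(sv, x, gamma)
                for sv, a in zip(support_vectors, alphas)) - rho
    return 1 if score >= 0.0 else -1


# Hypothetical trained model: two support vectors with multipliers summing to 1.
support_vectors = [(0.50, 0.30, 0.20), (0.60, 0.25, 0.15)]
alphas = [0.5, 0.5]
rho, gamma = 0.9, 10.0

label_near = decision((0.55, 0.28, 0.17), support_vectors, alphas, rho, gamma)
label_far = decision((0.10, 0.10, 0.80), support_vectors, alphas, rho, gamma)
```

Only the support vectors enter the sum, which is why a smaller fraction of support vectors gives a less complex decision function.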

METHODOLOGY
The goal of this study is to evaluate the performance of the proposed feature extraction algorithm in detecting the condition of a roller bearing.
The fault diagnosis method for the traditional EEMD technique is as follows (Tabrizi et al., 2014):
1. To collect the acceleration signals of the healthy and defective bearings at three different external loads and two shaft speeds.
2. To apply the EEMD method to decompose the vibration signals into IMFs. The first m IMFs, which include the most dominant fault information, are chosen to extract the features.
3. To calculate the total energy $E_i$ of each of the first m IMFs:
$$E_i = \int_{-\infty}^{+\infty} |c_i(t)|^2\,dt, \qquad i = 1, \ldots, m$$
where $c_i(t)$ is the i-th IMF.
4. To create a feature vector with the energies of the m selected IMFs:
$$T = [E_1, E_2, \ldots, E_m]$$
5. To normalize the feature vector:
$$T' = [E_1/E,\ E_2/E,\ \ldots,\ E_m/E]$$
where $E = \left(\sum_{i=1}^{m} |E_i|^2\right)^{1/2}$.
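Steps 3 to 5 of the traditional procedure can be sketched as follows, with the integral replaced by a sum over samples. The function name and the tiny example IMFs are assumptions for illustration only.

```python
import math


def energy_features(imfs, m=3):
    """Normalized energies of the first m IMFs.

    Each energy is the sum of squared samples of one IMF; the vector of
    energies is divided by its Euclidean norm, so the result has unit norm.
    """
    energies = [sum(v * v for v in imf) for imf in imfs[:m]]
    norm = math.sqrt(sum(e * e for e in energies))
    return [e / norm for e in energies]


# Toy "IMFs" with energies 2, 4 and 1.
example_imfs = [[1.0, -1.0], [2.0, 0.0], [0.0, 1.0]]
features = energy_features(example_imfs)
```

The normalization removes the overall signal level, so only the distribution of energy among the IMFs is kept as a feature.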
In contrast, the proposed feature extraction is implemented in the following steps:
1. To decompose the signal using the performance improved EEMD (PIEEMD) with SNR = 10 dB (Tabrizi et al., 2015A).
2. To apply the TKEO to the first m IMFs of each signal.
3. To calculate the sum $TKE_i$ of the TKEO of each of these IMFs.
4. To create a feature vector with the sums of the calculated TKEO:
$$T_{TKE} = [TKE_1, TKE_2, \ldots, TKE_m]$$
5. To normalize the feature vector:
$$T'_{TKE} = [TKE_1/TKE,\ TKE_2/TKE,\ \ldots,\ TKE_m/TKE]$$
where $TKE = \sum_{i=1}^{m} TKE_i$.
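Steps 2 to 5 of the proposed procedure can be sketched in the same way. The sketch assumes that the IMFs are already available (for example from a PIEEMD routine), that the discrete form psi[x(n)] = x(n)^2 - x(n-1) x(n+1) is used, and that the normalization divides by the plain sum of the TKEO sums; the function names and the synthetic tones standing in for IMFs are hypothetical.

```python
import math


def tkeo(x):
    """Discrete TKEO, psi[x(n)] = x(n)^2 - x(n-1) * x(n+1), for n = 1 .. N-2."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def tkeo_features(imfs, m=3):
    """Sum of the TKEO of each of the first m IMFs, divided by their total."""
    sums = [sum(tkeo(imf)) for imf in imfs[:m]]
    total = sum(sums)  # assumed non-zero; the TKEO can be negative in general
    return [s / total for s in sums]


# Three synthetic tones standing in for the first three IMFs.
params = [(1.0, 0.9), (0.5, 0.4), (0.2, 0.1)]          # (amplitude, rad/sample)
tones = [[a * math.cos(w * n) for n in range(100)] for a, w in params]
features = tkeo_features(tones)
```

For these tones each TKEO sum is proportional to A^2 sin^2(omega), so the high-frequency, high-amplitude component dominates the feature vector.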
Finally, the training procedure of the one-class SVM is carried out using the normalized feature vectors obtained so far. 80% of the healthy samples are used for training and the rest (the remaining healthy samples and all faulty data) are taken as the test samples. Once the training procedure has been successfully performed, the parameters are held fixed and applied to the test samples to identify the different working conditions and fault patterns. Cross validation is used to optimize the parameters of the pattern recognition method.
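The data split described above can be sketched as follows. The random shuffling, the seed, the function name and the label convention (+1 healthy, -1 faulty) are assumptions, since the text does not state how the 80% are selected.

```python
import random


def split_samples(healthy, faulty, train_fraction=0.8, seed=0):
    """Train on a fraction of the healthy samples only; test on everything else.

    Returns (train, test), where test is a list of (sample, label) pairs with
    label +1 for healthy and -1 for faulty samples.
    """
    rng = random.Random(seed)
    shuffled = list(healthy)
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    train = shuffled[:n_train]
    test = [(s, 1) for s in shuffled[n_train:]] + [(s, -1) for s in faulty]
    return train, test


# Example with placeholder sample identifiers instead of feature vectors.
healthy_ids = list(range(30))
faulty_ids = list(range(100, 110))
train, test = split_samples(healthy_ids, faulty_ids)
```

No faulty sample is ever seen during training, which is what makes this a novelty detection setting and not a two-class problem.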

EXPERIMENTS
The bearing data set (acceleration signals) was collected under various operating conditions using the test rig (Figure 3).

RESULTS AND DISCUSSIONS
An acquired acceleration signal, its first three IMFs (decomposed using EEMD) and the TKEO of those IMFs are shown in Figure 4. Applying the methodology to the signals, the normalized energies of the IMFs for the EEMD method (using only the first three elements of the feature vectors (Tabrizi et al., 2014)) and the normalized TKEO sums for the proposed method are shown in Figures 5 and 6, respectively. 0.3 of the standard deviation of each original signal is used as the appropriate amplitude of added noise in the traditional EEMD method (Tabrizi et al., 2015B). As can be seen in Figure 5, the EEMD method produces confusion between healthy and faulty samples for the lighter defect size (150 microns). In view of this, the novel feature proposed in this study is applied to check whether it can improve the detection performance. As can be seen in Figure 6, the healthy and faulty samples are perfectly separable. Thus, a higher success rate in labelling new samples is expected.
Tables 1 and 2 show the classification results (for shaft speed = 200 and 300 Hz) obtained using the one-class SVM. The results are highly dependent on the classification parameters.
The optimal values of the classification parameters ($\nu$ and the kernel parameter) obtained by cross validation are presented for each method. The success rates obtained using the proposed feature extraction are higher, and in some cases the differences are considerable. For example, for the condition of 300 Hz speed and 1.8 kN load, the proposed two-step technique improves the test success rate by 23.1%. The uncertainty of the model prediction is investigated by computing the Margin and the number of support vectors (Tables 3 and 4). Using EEMD, the bearing condition can be perfectly recognized for a single working condition only (Speed = 200 Hz and load = 1.4 kN). In this condition, the fraction of SVs is 8/24, whereas with the new method the complexity of the hyperplane decreases because it is defined by a lower fraction of SVs (5/24). Furthermore, the Margin created by the EEMD is 0.999305, while with the proposed method the Margin is improved to 1.146190. This means that the proposed feature extraction generates a less complex and more reliable hyperplane. Thus, the uncertainty of the model in identifying the state of new samples would be lower than with the traditional EEMD.
In all operating conditions, higher Margins are obtained when the proposed method is adopted to construct the hyperplane, which indicates more reliable classification. It achieves perfect success rates in most cases, except for two operating conditions (Speed = 300 Hz, load = 1.4 and 1.8 kN). Even in these conditions, the success rates are higher than those of the EEMD. In the load = 1.8 kN condition, there exists only one misclassified sample, which is a healthy sample labelled as a faulty bearing (false alarm). In fault diagnosis, avoiding the classification of a faulty sample as a healthy one is more important than avoiding a false alarm.
When the $\nu$ parameter approaches zero, the problem resembles the corresponding hard margin algorithm, since the penalization of errors becomes infinite (Eq. (17)). As can be seen in Tables 1 to 4, in some cases the hyperplane constructed from the EEMD features appears to be hard-margin because of the very low $\nu$ and the very small number of SVs. For example, for the condition corresponding to the speed of 200 Hz and the applied load of 1.8 kN, the $\nu$ parameter value is 0.05 and the achieved number of SVs is only 2. This indicates a hard-margin condition in which only a few outliers can determine the boundary, which makes the classifier significantly sensitive to noise in the data. Increasing the $\nu$ parameter to create a soft-margin model would reduce the training accuracy considerably. In contrast, all the models constructed with the proposed feature extraction method are soft-margin SVMs and more reliable.
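The link between a small regularization parameter and a small number of SVs can be checked with a short calculation. In the one-class SVM of Scholkopf et al. (2000), the multipliers satisfy 0 <= alpha_i <= 1/(nu N) and sum to 1, and nu is a lower bound on the fraction of SVs. The value N = 24 used below is an assumption (the training set size suggested by the SV fractions 8/24 and 5/24 reported for another operating condition); the parameter value 0.05 is the one quoted in the text.

```latex
% Assumed: N = 24 training samples; quoted: \nu = 0.05
\nu N = 0.05 \times 24 = 1.2
\;\Rightarrow\; \#\mathrm{SV} \ge \lceil 1.2 \rceil = 2,
\qquad \#\mathrm{outliers} \le \lfloor 1.2 \rfloor = 1
\\[4pt]
\alpha_i \le \frac{1}{\nu N} = \frac{1}{1.2} \approx 0.83,
\qquad \sum_{i} \alpha_i = 1
\;\Rightarrow\; \text{at least two non-zero } \alpha_i
```

Under this assumption, the 2 SVs reported for that condition are the minimum number the constraints allow, which is consistent with the near hard-margin behaviour described above.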
In order to detect the larger defect size (450 microns), the proposed feature extraction method is applied and perfect classification success rates are achieved for all operating conditions. As can be seen in Figure 7, the healthy and faulty samples are perfectly separable, even for the condition where the states of the bearing were not detected perfectly for the smaller defect size (Speed = 300 Hz and load = 1.4 kN).

CONCLUSIONS
Applying the EEMD does not lead to perfect anomaly detection in the case of a small defect size (150 microns). However, it is shown that the proposed feature extraction method (based on performance improved EEMD and the normalized TKE) is a powerful method for detecting even the smallest damage level (150 microns), so that it can classify the samples perfectly in various operating conditions. It creates a less complex (smaller fraction of SVs) and more reliable (higher Margin) hyperplane than the EEMD method. For the larger defect size (450 microns