Rolling Element Bearing Fault Diagnosis based on Deep Belief Network and Principal Component Analysis

Rolling element bearings are critical components in industrial rotating machines. Bearing faults and failures can degrade machine performance or even cause catastrophic damage. Bearing fault diagnosis is therefore essential to the safe and reliable operation of systems. For bearing condition monitoring, acoustic emission (AE) signals are attracting increasing attention because they are more sensitive than the widely used vibration signals. In bearing fault diagnosis and prognosis, feature extraction is a critical and difficult task that usually involves complex signal processing and computation. Moreover, the features depend strongly on the characteristics, operating conditions, and type of data. As operating conditions change and data complexity increases, traditional diagnosis approaches become insufficient for feature extraction and fault diagnosis. To address this problem, this paper proposes a fault diagnosis approach based on a Deep Belief Network (DBN) and Principal Component Analysis (PCA) using AE signals. The proposed approach combines the advantages of deep learning and statistical analysis: the DBN automatically extracts features from the AE signal, and PCA reduces their dimensionality. Different bearing fault modes are then identified by a least squares support vector machine (LS-SVM) using the extracted features. An experimental case study on a tapered roller bearing is conducted to verify the proposed approach. The experimental results demonstrate that the proposed approach has excellent feature extraction ability and high fault classification accuracy.


INTRODUCTION
Rolling element bearings are key components in mechanical systems. They are subject to various stresses, transmissions, and shocks, which may cause bearing faults and eventually lead to system breakdown. Degradation of the bearing condition directly affects the performance of the system. To prevent unexpected system failure and reduce maintenance cost, bearing fault diagnosis is needed to detect faults as early as possible.
A rolling element bearing is composed of rolling elements, an inner race, an outer race, and a cage. Faults can appear on any component and can be roughly divided into local faults and distributed faults (Zhang et al., 2008; Cerrada et al., 2018). Local faults are single localized defects, such as pitting, scratches, cracks, and holes. Distributed faults are irregularities of the bearing structure, such as misalignment of the shaft or races, eccentric races, off-size rolling elements, and surface roughness. These faults can have many causes, such as overheating and load, improper installation, and imperfect manufacturing. Distributed faults cause excessive contact force and friction, which eventually lead to local faults. When a bearing operates under fault conditions, it produces characteristic signals in the form of sound, vibration, energy, or acoustic emission.
With the advancement of machine condition monitoring techniques, many different types of signals, such as vibration, acoustic emission, and ultrasound, have been used for diagnosis (Rai & Upadhyay, 2016; Zhang et al., 2010; Li et al., 2018). Among these signals, acoustic emissions are transient elastic waves generated by a rapid release of localized stress energy caused by deformation or a defect within or on the surface of a material (Al-Ghamd & Mba, 2006). Compared with the most widely used vibration signals, AE signals have two main advantages: 1) they are insensitive to mechanical disturbances and noise caused by different operating conditions; 2) they are sensitive to fault size, which allows earlier fault detection than vibration signals. These advantages make AE signals promising for bearing fault diagnosis.
Over the past decades, considerable effort has been devoted to bearing fault diagnosis. Existing works are mainly divided into signal processing based approaches and learning based approaches (Cerrada et al., 2018; S. Guo, Yang, Gao, Zhang, & Zhang, 2018). Signal processing based approaches rely on feature extraction to obtain a fault indicator that is related to the fault mode and fault state. Zhao et al. (S. Zhao, Liang, Xu, Wang, & Zhang, 2013) applied an approach based on Empirical Mode Decomposition and Approximate Entropy to detect different fault modes. Khanam et al. (Khanam, Tandon, & Dutt, 2014) employed the Discrete Wavelet Transform (DWT) to decompose signals and estimate ball bearing faults. The Hilbert transform (Bujoreanu, Monoranu, & Olaru, 2014), the matching pursuit method (Cui, Zhang, Zhang, Zhang, & Lee, 2016), spectral and statistical analysis (Gerber, Martin, & Mailhes, 2015), and envelope analysis (Sun, Guo, & Gao, 2015) are also effective signal processing techniques in bearing monitoring.
In practical applications, however, fault characteristic signals are often corrupted by noise, which makes feature extraction and fault diagnosis difficult. No generic feature extraction method is available across different monitoring signals and fault modes. As a result, feature extraction is ad hoc for each system, and the process is time-consuming, requires complex signal processing techniques, and needs extensive human involvement. These limitations severely hinder the development and application of signal processing based diagnosis approaches.
Learning-based approaches, on the other hand, aim to learn signal patterns that are related to different fault modes or fault levels (Wang, Xiang, Zhong, & Zhou, 2018). As supervised learning processes, they require data that include fault samples and their labels. The process involves feature extraction and feature classification. Feature extraction can be conducted in the time domain, the frequency domain, and transform domains, using, for example, statistical parameters, the signal energy of Intrinsic Mode Functions (IMFs) from Empirical Mode Decomposition (EMD), the Discrete Wavelet Transform (DWT), or the Hilbert-Huang transform. Once the features are extracted, a classifier is designed to separate the features obtained under different fault conditions for fault detection. Widely used classifiers include neural networks, Support Vector Machines (SVM), and Bayesian estimation, among others.
Recently, following the success of deep learning in image recognition and speech processing, many deep learning based fault diagnosis approaches have been proposed for a range of applications (Cococcioni, Lazzerini, & Volpi, 2013; Tang et al., 2018; Chen & Li, 2017). Guo et al. proposed an approach based on the continuous wavelet transform scalogram (CWTS) and a convolutional neural network (CNN) for rotating machinery fault diagnosis (S. Guo, Yang, Gao, & Zhang, 2018). Qin et al. presented a fault diagnosis method for planetary gearboxes of wind turbines based on an optimized DBN and an improved logistic sigmoid unit (Qin, Wang, & Zou, 2019). These applications show that deep learning has great potential in feature extraction and data mining.
Although these learning based approaches achieve good performance in some respects, they are insufficient for bearings under complex operating conditions and for the continuous increase in the volume and complexity of monitoring data. As a result, diagnosis accuracy can be affected. Another limitation is that these approaches still involve signal transformation or manual feature extraction, so the strong feature extraction and learning ability of deep learning cannot be fully exploited. Motivated by these limitations, this paper proposes an approach based on deep learning and statistics. The proposed approach integrates DBN and PCA to extract features from raw data and can therefore avoid complex signal processing and the corresponding feature extraction. LS-SVM, an extension and improvement of SVM, is an effective pattern recognition approach. It has been widely used in fault diagnosis and has shown excellent fault identification capability. For this reason, it is employed in this paper to process the extracted features for diagnosis. The experimental results show that it can achieve high accuracy.
This paper is organized as follows. A brief introduction to DBN, PCA, and LS-SVM is presented in Section 2. Section 3 describes the fault diagnosis steps of the proposed approach. Experiments are presented, and the results are analyzed and visualized, in Section 4 to demonstrate the performance of the proposed approach. Finally, Section 5 provides concluding remarks and some future research directions.

DBN principle
A DBN can be regarded as a special neural network constructed from multiple Restricted Boltzmann Machines (RBMs) (G. Zhao et al., 2017). Fig. 1 shows the schematic representation of a DBN with two hidden layers. A DBN has a strong capability for capturing representative information from raw time series data. The output of the learning process is a set of extracted features, which can be used as the input of supervised learning algorithms for classification or regression in fault diagnosis.
An RBM is a special probabilistic model derived from the Boltzmann machine. The energy of a joint configuration (v, h) is given by the energy function (1):

$$E(v, h) = -\sum_{j} a_j v_j - \sum_{i} b_i h_i - \sum_{i}\sum_{j} h_i w_{ij} v_j \qquad (1)$$

where $v_j$ and $a_j$ are the binary state and bias of the j-th element of the visible vector, $h_i$ and $b_i$ are the binary state and bias of the i-th element of the hidden vector, and $w_{ij}$ is the weight of the connection between the visible layer and the hidden layer. The joint distribution over the visible and hidden units is defined as

$$p(v, h) = \frac{1}{Z} e^{-E(v, h)} \qquad (2)$$

where Z is the partition function, which can be described as

$$Z = \sum_{v}\sum_{h} e^{-E(v, h)} \qquad (3)$$

In this structure, connections exist only between the visible layer and the hidden layer. The neurons within the same layer are independent of each other. The conditional probabilities of the hidden and visible units are given as

$$p(h_i = 1 \mid v) = \frac{1}{1 + \exp\left(-b_i - \sum_{j} w_{ij} v_j\right)} \qquad (4)$$

$$p(v_j = 1 \mid h) = \frac{1}{1 + \exp\left(-a_j - \sum_{i} w_{ij} h_i\right)} \qquad (5)$$

The learning process of a DBN can be divided into two stages: pre-training and fine-tuning (Hinton, Osindero, & Teh, 2006).
In the pre-training process, the RBMs are trained layer by layer in an unsupervised manner. The forward pre-training process can be regarded as a construction and reconstruction process using Eq. (1). After all the RBMs in the DBN are pre-trained, the fine-tuning step is applied to the DBN using the backpropagation algorithm (G. Zhao et al., 2018). In this fine-tuning process, the weights and biases of every layer are adjusted continuously until the error becomes smaller than a predefined threshold. The trained DBN model is obtained after the fine-tuning step and can be used to describe the fault dynamics.
As mentioned earlier, the training procedure includes pre-training and fine-tuning. The pre-training stage aims to extract features automatically based on its learning rules. In pre-training, the stacked RBMs are trained layer by layer using a greedy learning algorithm (Hinton et al., 2006). This is an unsupervised training process.
Given the training input data and the initialization parameters, the first hidden layer can be trained greedily using (4) and (5). This process is the positive phase. The learning rule follows the gradient of the log probability of the given training data with respect to the parameters: maximizing this log probability is equivalent to minimizing the divergence between the distribution defined by the model and that of the given training data.
Based on the Contrastive Divergence (CD) algorithm (Hinton et al., 2006), the parameters of the DBN are adjusted during training with a learning rate γ ∈ [0, 1], which controls the learning speed.
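As an illustration of the CD idea described above, a single CD-1 update for a binary RBM can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the training code used in the paper: the function name `cd1_step`, the mini-batch layout, the single Gibbs step, and the use of reconstruction probabilities instead of sampled visible states are all illustrative choices.

```python
import numpy as np


def sigmoid(x):
    """Logistic function used for the RBM conditional probabilities."""
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(v0, W, a, b, lr, rng):
    """One CD-1 parameter update for a binary RBM (illustrative sketch).

    v0 : (batch, n_visible) training batch with values in [0, 1]
    W  : (n_hidden, n_visible) weights, W[i, j] links hidden i and visible j
    a  : (n_visible,) visible biases
    b  : (n_hidden,) hidden biases
    lr : learning rate (gamma in the text)
    """
    # Positive phase: hidden probabilities given the data, then a binary sample.
    ph0 = sigmoid(v0 @ W.T + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(v0.dtype)

    # Reconstruction of the visible layer, then hidden probabilities again.
    pv1 = sigmoid(h0 @ W + a)
    ph1 = sigmoid(pv1 @ W.T + b)

    # Difference between data-driven and reconstruction-driven statistics.
    n = v0.shape[0]
    W = W + lr * (ph0.T @ v0 - ph1.T @ pv1) / n
    a = a + lr * (v0 - pv1).mean(axis=0)
    b = b + lr * (ph0 - ph1).mean(axis=0)
    return W, a, b, pv1
```

Repeating this step over mini-batches trains one RBM; in a stacked setting, the hidden probabilities of a trained RBM would then serve as the input data of the next one.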
The fine-tuning process optimizes the pre-trained network parameters. This supervised learning process further adjusts the network to improve classification accuracy. A conjugate gradient algorithm is applied to fine-tune the trained parameters using the labeled data. In this step, all parameters are updated simultaneously until the fine-tuning threshold is reached. The trained DBN model is obtained once these two steps are finished.

PCA principle
Principal component analysis is a classical statistical analysis approach that extracts the principal components of a data set.
PCA can be regarded as a transformation that projects the original data onto a new space of lower dimension (Chang et al., 2008). Given the original data vectors $x_i$ ($i = 1, \ldots, m$), the covariance matrix of the data can be calculated as:

$$C = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu)(x_i - \mu)^T$$

where $\mu$ is the mean of the data vectors, $\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$. Assuming that each original data vector is n-dimensional, the eigenvalue problem of the covariance matrix can be described as:

$$C u_j = \lambda_j u_j, \quad j = 1, \ldots, n$$

where $\lambda_j$ are the eigenvalues of the covariance matrix, sorted in descending order, and $u_j$ are the corresponding eigenvectors.
To keep the first k eigenvectors ($k < n$), which correspond to the k largest eigenvalues, let

$$U = [u_1, u_2, \ldots, u_k]$$

The principal components of the original data can then be computed as the orthogonal transformations of $x_i$:

$$z_i = U^T x_i$$

The obtained components are called principal components. Dimensionality reduction is achieved by keeping only the first several eigenvectors. In this paper, the distribution of the extracted features is assumed to be Gaussian, and PCA is used mainly for its dimensionality reduction capability.
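The eigen-decomposition route described above can be sketched in NumPy as follows. The function names `pca_fit` and `pca_transform` are hypothetical, and the sketch assumes that the training mean is subtracted before projection, a common convention that the text does not state explicitly.

```python
import numpy as np


def pca_fit(X, k):
    """Fit PCA on X (m samples in rows, n features in columns).

    Returns the training mean, the n-by-k projection matrix whose columns are
    the eigenvectors of the k largest eigenvalues, and those eigenvalues.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    C = Xc.T @ Xc / X.shape[0]            # covariance with 1/m normalization
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest
    return mu, eigvecs[:, order], eigvals[order]


def pca_transform(X, mu, U):
    """Project data onto the retained eigenvectors (centering is assumed)."""
    return (X - mu) @ U
```

Keeping `mu` and `U` from the training features and reusing them in `pca_transform` for new data mirrors the role of the projection matrix that is obtained during training and applied again at test time.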

LS-SVM principle
LS-SVM is an extension of SVM. It replaces the inequality constraints in SVM with equality constraints, which transforms the quadratic programming problem in SVM into a set of linear equations. Compared with traditional SVM, LS-SVM has higher operation efficiency and solution accuracy.
Given a training dataset $(x_i, y_i)$, with $x_i$ being the input vector and $y_i$ its corresponding output label, we define χ as the feature map that projects the input vector into a new feature space. A hyperplane in that space is described by w, the weight vector that sets the orientation of the hyperplane, and the bias b. LS-SVM based identification can be regarded as an optimization problem (Liu, Bo, & Luo, 2015) with objective function J and tradeoff coefficient γ, in which each sample $x_i$ is projected into a high-dimensional space by the nonlinear mapping χ. To minimize the objective function, the first step is to define the corresponding Lagrange function, in which $\beta_i$ is the Lagrange multiplier of the i-th constraint. Eliminating w and the error variables e from the optimality conditions yields a set of linear equations in b and β, where $y = [y_1, y_2, \ldots, y_N]^T$; $l = [1, 1, \ldots, 1]^T$; $\beta = [\beta_1, \beta_2, \ldots, \beta_N]^T$; $\Phi_{i,j} = \chi(x_i)^T \chi(x_j)$; and I is the identity matrix. Finally, the LS-SVM classification decision model is expressed through a kernel function; the Radial Basis Function (RBF) kernel is used in this paper, and σ is the kernel bandwidth.
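For orientation, one standard LS-SVM formulation that is consistent with the symbols listed above (y, l, β, Φ, I, γ, σ) can be written as follows. This is a textbook form given as an assumption; the formulation in Liu, Bo, and Luo (2015) may differ, for example by using the classification constraint $y_i[w^T\chi(x_i) + b] = 1 - e_i$ or a different scaling of the kernel bandwidth.

```latex
\begin{aligned}
&\text{Hyperplane:} && w^{T}\chi(x) + b = 0 \\
&\text{Problem:} && \min_{w,b,e}\; J(w,e) = \tfrac{1}{2}\,w^{T}w + \tfrac{\gamma}{2}\sum_{i=1}^{N} e_i^{2}
  \quad \text{s.t.}\quad y_i = w^{T}\chi(x_i) + b + e_i \\
&\text{Lagrangian:} && L(w,b,e,\beta) = J(w,e) - \sum_{i=1}^{N}\beta_i\left[w^{T}\chi(x_i) + b + e_i - y_i\right] \\
&\text{Optimality:} && w = \sum_{i=1}^{N}\beta_i\,\chi(x_i),\quad \sum_{i=1}^{N}\beta_i = 0,\quad
  \beta_i = \gamma e_i,\quad w^{T}\chi(x_i) + b + e_i - y_i = 0 \\
&\text{Linear system:} &&
  \begin{bmatrix} 0 & l^{T} \\ l & \Phi + \gamma^{-1} I \end{bmatrix}
  \begin{bmatrix} b \\ \beta \end{bmatrix} =
  \begin{bmatrix} 0 \\ y \end{bmatrix} \\
&\text{Decision:} && y(x) = \operatorname{sign}\!\left(\sum_{i=1}^{N}\beta_i\,K(x, x_i) + b\right),\quad
  K(x, x_i) = \exp\!\left(-\frac{\lVert x - x_i\rVert^{2}}{2\sigma^{2}}\right)
\end{aligned}
```

Substituting the first and third optimality conditions into the fourth gives $(\Phi + \gamma^{-1} I)\beta + l\,b = y$, which together with $l^{T}\beta = 0$ is exactly the linear system shown.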
Bearing fault diagnosis is a multi-class classification problem, in which the classification model is constructed by combining multiple two-class SVM classifiers. In this paper, the one-against-one SVM strategy is applied to identify the different fault modes.
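A compact sketch of a binary LS-SVM with an RBF kernel, combined by one-against-one voting, is given below. It assumes the linear-system form of LS-SVM with labels coded as +1 and -1 and a kernel of the form exp(-||x - x'||^2 / (2σ^2)); the function names are hypothetical, and details such as tie-breaking in the vote are illustrative choices rather than the paper's implementation.

```python
from itertools import combinations

import numpy as np


def rbf_kernel(A, B, sigma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve the LS-SVM linear system for labels y in {-1, +1}."""
    N = X.shape[0]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.asarray(y, dtype=float)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, multipliers beta


def lssvm_decision(X_train, b, beta, X_new, sigma):
    """Real-valued decision function; its sign gives the predicted class."""
    return rbf_kernel(X_new, X_train, sigma) @ beta + b


def ovo_fit(X, labels, gamma, sigma):
    """Train one binary LS-SVM for every pair of classes."""
    models = {}
    for p, q in combinations(np.unique(labels), 2):
        mask = (labels == p) | (labels == q)
        Xs = X[mask]
        ys = np.where(labels[mask] == p, 1.0, -1.0)
        models[(p, q)] = (Xs, *lssvm_fit(Xs, ys, gamma, sigma))
    return models


def ovo_predict(models, X_new, sigma):
    """Majority vote over all pairwise classifiers (first class wins ties)."""
    classes = sorted({c for pair in models for c in pair})
    index = {c: i for i, c in enumerate(classes)}
    votes = np.zeros((X_new.shape[0], len(classes)), dtype=int)
    for (p, q), (Xs, b, beta) in models.items():
        s = lssvm_decision(Xs, b, beta, X_new, sigma)
        votes[s >= 0, index[p]] += 1
        votes[s < 0, index[q]] += 1
    return np.array(classes)[votes.argmax(axis=1)]
```

With six fault modes, a one-against-one scheme of this kind trains 6 × 5 / 2 = 15 binary classifiers, each of which sees only the samples of its two classes.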

DBN-PCA-LSSVM BASED BEARING DIAGNOSIS
This research presents a bearing diagnosis approach using AE signals. The proposed approach includes offline modeling and online testing processes. In the modeling process, acoustic emission signals collected from bearings are pre-processed and fed into the initialized DBN, which extracts features automatically layer by layer.
The detailed steps of the diagnostic procedure are as follows:
Step 1: Partition the time series data into segments based on the data sampling rate and the roller bearing operating state, and divide the data into training data and testing data.
Step 2: Define the DBN structure, and train the DBN using the training data set. In this step, the number of hidden layers, the number of neurons in each hidden layer, the learning rate, and the initial weights should be defined. The training process finishes when the performance meets the predefined requirements or the number of iterations reaches the threshold.
Step 3: Apply PCA to the features extracted by the DBN ($[x_1, x_2, \ldots, x_n]$) to reduce their dimensionality. This step yields the low-dimensional training features ($[z_1, z_2, \ldots, z_m]$, $m < n$) and a projection matrix, which is reused in the testing process.
Step 4: Optimize the LS-SVM using the training labels and the low-dimensional features obtained in Step 3 to identify the different bearing fault modes.
Step 5: Apply the trained DBN model, the PCA eigenvectors, and the trained LS-SVM to the testing data set, and analyze the performance on the test data.
The proposed DBN-PCA-LSSVM based bearing diagnostic approach can extract features from raw data automatically, classify the fault mode, and estimate the fault severity. It avoids complex signal processing and human involvement in feature selection and extraction, which makes it more broadly applicable and easier to extend to other applications.
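Step 1 of the procedure above can be sketched as follows. The sketch assumes non-overlapping, fixed-length segments and a purely random split; the function names `segment_signal` and `random_split` are hypothetical, and the window length and split ratio are left as parameters rather than taken from the text.

```python
import numpy as np


def segment_signal(signal, window):
    """Cut a 1-D signal into non-overlapping segments of `window` samples.

    Trailing samples that do not fill a complete segment are discarded.
    """
    signal = np.asarray(signal)
    n_segments = len(signal) // window
    return signal[: n_segments * window].reshape(n_segments, window)


def random_split(segments, train_fraction, rng):
    """Randomly assign segments to a training set and a testing set."""
    idx = rng.permutation(len(segments))
    n_train = int(round(train_fraction * len(segments)))
    return segments[idx[:n_train]], segments[idx[n_train:]]
```

Each row returned by `segment_signal` would then serve as one input vector of the DBN, so the window length fixes the size of the visible layer.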

Acoustic emission data preparation
The bearing used to verify the proposed approach is a tapered roller bearing: Timken LM501310 cup and LM501349 cone. The structure of the rolling bearing is shown in Fig. 4 (Zhang et al., 2008), and the main geometric parameters are listed in Table 1. The data were collected from the AE sensors at a sampling rate of 50 kHz. The experimental tests were conducted for different fault sizes at three rotating speeds and three loads, which gives nine data sets for each fault size. The details of the collected AE data are described in Table 2. For each test, the fault size is given by its depth and width, measured in microns, to indicate fault severity. To utilize the data efficiently, the fault size is defined as the sum of width and depth. Based on this fault size definition, the fault modes are defined from F-1 (healthy) to F-6, as shown in Table 3. To show the feature extraction performance, principal component analysis (PCA) is applied to the output of each layer.
Only the first three principal components are visualized in 3-D space to make the feature extraction performance clear. Fig. 5 visualizes the raw AE data, while Fig. 6 visualizes the features extracted by the DBN.
The misclassified samples mainly appear in F-4 and F-6: 104 samples from F-4 are misclassified as F-6, and 75 samples from F-6 are misclassified as F-4. These results are consistent with Fig. 6, in which F-6 (cyan samples) and F-4 (black samples) overlap more than the other modes. As an example, one sample of F-6 is incorrectly classified as F-4. Based on the outputs of the DBN, the probabilities that this sample belongs to F-4 and F-6 are 0.5909 and 0.4087, respectively. Neither probability is large enough for the classifier to make a strong decision, and in this case misclassification occurs. The potential causes, and more advanced approaches for making the correct classification, will be investigated in future work.
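The misclassification counts above can be put in proportion with a short calculation. The sketch assumes that the counts refer to the testing set and that each fault mode contributes 40% of its 2250 samples, that is, 900 samples, to that set, as stated with the training parameters; the variable names are illustrative.

```python
# Testing samples per fault mode: 40% of 2250 (integer arithmetic).
test_per_mode = 2250 * 40 // 100

# Reported confusions between the two most overlapping modes.
confused = {("F-4", "F-6"): 104, ("F-6", "F-4"): 75}
rates = {pair: count / test_per_mode for pair, count in confused.items()}

for (true_mode, predicted_mode), rate in rates.items():
    print(f"{true_mode} -> {predicted_mode}: {rate:.1%}")
# F-4 -> F-6: 11.6%
# F-6 -> F-4: 8.3%

# Margin between the two leading class probabilities of the example sample.
p_f4, p_f6 = 0.5909, 0.4087
margin = round(p_f4 - p_f6, 4)
remaining = round(1.0 - p_f4 - p_f6, 4)  # probability left for the other modes
print(margin, remaining)
```

Under these assumptions, roughly one in nine F-4 testing samples is assigned to F-6. In the example sample, the two probabilities differ by about 0.18 and together account for almost all of the probability mass, so the ambiguity lies between these two modes only.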

CONCLUSIONS AND FUTURE WORK
This paper presents a DBN-PCA-LSSVM based rolling element bearing fault diagnosis approach. In the proposed approach, a DBN is developed to extract features automatically from raw sensor data, PCA is applied to reduce the dimensionality of the extracted features, and LS-SVM is used to identify the different fault modes. A rolling bearing fault diagnosis experiment is presented to validate the proposed approach.
The contributions of this paper are as follows: 1) By combining PCA with DBN, an integrated, accurate, and intelligent fault diagnosis approach is proposed, in which DBN and PCA are used for bearing feature extraction and LS-SVM is used for fault diagnosis. This integration takes full advantage of the strong feature learning ability of DBN and the statistical analysis of PCA.
2) Features with different dimension sizes are analyzed to find the feature size for LS-SVM that achieves the highest classification accuracy.
With a DBN structure designed, the raw AE data are fed into the DBN, and features are extracted automatically for fault diagnosis. PCA is applied to reduce the dimension of the extracted features, which improves the accuracy and efficiency of fault classification. The experimental results show that this approach can achieve high diagnostic accuracy. From this perspective, the proposed DBN-PCA-LSSVM based approach provides a generic solution that can be applied to a variety of systems.
Compared with traditional signal processing and machine learning based approaches, the proposed method does not require complex signal processing techniques or human involvement.
Future research will mainly focus on optimizing the structure of the proposed approach and analyzing the causes of misclassification, in order to enhance fault diagnosis accuracy and efficiency.
Table 5. Fault diagnosis results

Figure 1. The structure of a DBN with two hidden layers

Figure 2. Structure of a Restricted Boltzmann Machine

Fig. 5 shows that the raw data of all the fault modes overlap severely. In other words, it is very difficult to classify the different fault modes and severities from the raw data. As the data pass through the DBN, the degree of overlap in the outputs of successive layers decreases. At the output layer, the extracted features are almost completely separated, as visualized in Fig. 6.

Figure 8. Diagnosis results obtained by DBN-PCA-LSSVM with different dimension sizes for PCA

Table 1. The geometric parameters of the bearing

Table 2. AE testing data description
For example, for C1 in Table 2, with fault width and depth of 35.33 and 2.46, respectively, the fault dimension is measured as 35.33 + 2.46 = 37.79, as shown in Table 3.

Table 4. Training parameters
For bearing fault diagnosis, the input vector of the DBN should contain the information from one full cycle of rotation. For this reason, based on the lowest rotating speed, the input vector size is set to 3750. The DBN structure is 3750-1500-600-200. Other training parameters are given in Table 4. Each fault mode has 2250 samples, of which 60% are used for training and the remaining 40% for testing. To train the DBN fairly, each raw data set is separated into several segments, which are randomly selected to construct the training set and the testing set.
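The numbers above can be cross-checked with a few lines of arithmetic. The sketch combines the 50 kHz sampling rate of the AE data with the 3750-sample input vector; the shaft speed it prints is implied by those two figures under the one-revolution-per-window assumption and is not quoted from Table 2. The function name and the count of six fault modes (F-1 to F-6) are the only other inputs.

```python
def rpm_for_one_revolution(fs_hz, n_samples):
    """Shaft speed at which n_samples at rate fs_hz span exactly one turn."""
    return fs_hz * 60.0 / n_samples


fs_hz = 50_000        # AE sampling rate
window = 3750         # DBN input vector size

window_ms = 1000 * window / fs_hz
implied_rpm = rpm_for_one_revolution(fs_hz, window)
print(window_ms, implied_rpm)  # 75.0 800.0

# Data set sizes implied by the 60/40 split (integer arithmetic).
samples_per_mode = 2250
n_modes = 6
train_per_mode = samples_per_mode * 60 // 100
test_per_mode = samples_per_mode - train_per_mode
print(train_per_mode, test_per_mode)                      # 1350 900
print(train_per_mode * n_modes, test_per_mode * n_modes)  # 8100 5400
```

A 3750-sample window therefore lasts 75 ms and covers at least one revolution for any shaft speed of 800 rpm or higher, which is why sizing the window for the lowest speed also covers the faster ones.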

Table 5 analyzes the misclassification results of each fault mode.