A transfer learning-based rolling bearing fault diagnosis across machines

Currently, most studies focus on training an excellent deep learning model based on sufficient labeled data collected from machines. However, in real applications, it is costly or impractical to obtain massive labeled data for model training. Therefore, in this paper, a Transfer Learning (TL)-based fault diagnosis method is proposed to transfer the model learnt from one machine (source domain) to another one (target domain). The training process uses labeled source data and unlabeled target data, which is very promising for real industrial applications. Within this transfer learning-based fault diagnosis framework, a cyclic spectral correlation analysis method is first introduced to obtain order-frequency maps, which remove the influence of speed variation and reveal the hidden cyclic frequencies of the signals. Then, the Dynamic Adversarial Adaptation Network (DAAN) is introduced to transfer label information across machines. The proposed cross-machine fault diagnosis method is applied to two rolling element bearing datasets collected from two different test rigs. Experimental results demonstrate the effectiveness and superiority of the proposed method compared with state-of-the-art approaches.


INTRODUCTION
As key components of rotating machinery, rolling bearings are widely used in aircraft, high-speed trains, wind turbines, etc. Once a bearing fails, the equipment can no longer operate normally, and accidents may even occur. As an important part of Prognostics and Health Management (PHM), fault diagnosis plays a crucial role in ensuring the safe operation of machinery.
Bearing fault diagnosis is mainly based on vibration signals, because they are sensitive to weak faults (Yong et al., 2016). The traditional fault diagnosis chain mainly includes several steps: signal acquisition, signal denoising, feature extraction, and decision making. One well-known approach is to select the bearing fault-related frequency band for signal demodulation (Randall, 2011). Antoni et al. (2006) proposed the fast Kurtogram to select the frequency band associated with bearing faults, which has been studied and widely applied in bearing fault diagnosis. Recently, Cyclic Spectral Correlation (CSCorr) has gained momentum in the condition monitoring community, because it is able to reveal hidden periodicities of second-order cyclostationarity (Antoni, J., 2007), such as weak bearing signals, which are often buried in noise or masked by other stronger signals. Mauricio et al. (2018) explored the application of CSCorr in the condition monitoring of planetary gearboxes under varying speed conditions and obtained promising diagnostic results.
Thanks to the evolution of Machine Learning (ML) technology, ML-based machinery fault diagnosis methods have developed significantly. In general, the statistical features of vibration signals are first extracted and then fed into ML models, such as Support Vector Machine (SVM) (Widodo et al., 2007), k-nearest neighbor (Lu et al., 2021), naive Bayes (Zhang et al., 2018), etc., to obtain diagnostic results. In the past decade, Deep Learning (DL) technology has attracted much attention from researchers in computer vision (Voulodimos et al., 2018), medical image segmentation (Hesamian et al., 2019), and speech recognition (Nassif et al., 2019). Owing to their excellent ability to extract features from large amounts of data, DL models have been extensively studied in the field of PHM, including Convolutional Neural Network (CNN) (Chen et al., 2019), Long Short-Term Memory (LSTM), etc.
Currently, most DL models are trained on one dataset collected from a single machine for fault diagnosis (Wang et al., 2019). In real-life applications, it is still very impractical to train an effective deep learning model for each machine, as collecting enough labeled data covering various operating conditions and various fault types is very time-consuming and costly. Therefore, a natural idea is to leverage the label information of one machine to improve the models' diagnostic performance on other machines. However, directly re-applying a model to other machines decreases the performance considerably. The main reason is the distribution mismatch between the two machines, which is also referred to as the domain shift issue (Ganin et al., 2016).
Dandan Peng et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Transfer Learning (TL) can address the domain shift problem and transfer label information across domains successfully (Long et al., 2017). Inspired by the adversarial training idea of the Generative Adversarial Network (GAN) (Creswell et al., 2018), Ganin et al. (2016) proposed a deep transfer learning network, called Domain Adversarial Neural Network (DANN), in which the marginal distributions of the source domain and the target domain are aligned in a shared feature space to solve the domain shift issue. It is an unsupervised domain adaptation method and is widely applied in transfer learning tasks among different working conditions (Guo et al., 2018; Mao et al., 2020; Guo et al., 2021). Besides the marginal distribution alignment between domains, the conditional distribution alignment also contributes to the adaptation (Yu et al., 2019). Therefore, Yu et al. (2019) developed the Dynamic Adversarial Adaptation Network (DAAN) to extract domain-invariant features by aligning the marginal and conditional distributions together. It has been demonstrated that these deep TL models perform well for transfer among different working conditions, but they may fail in transfer learning tasks across machines.
In this paper, a novel TL-based fault diagnosis framework across machines is proposed. Firstly, the CSCorr method is applied to the vibration signal to obtain an order-frequency two-dimensional (2D) map, which not only removes the influence of speed variation but also reveals the hidden cyclic frequencies of bearing signals. Then, DAAN is introduced to align the marginal (global) and conditional (local) distributions between machines and thus transfer the model trained on one machine to another one. The training data is composed of labeled source data and unlabeled target data, which avoids the difficulty of labelling target data. The proposed TL-based fault diagnosis method is validated on two different bearing datasets, and the experimental results confirm its effectiveness on fault diagnosis tasks across machines.
The remaining part of this paper is organized as follows. Section 2 reviews the basic theories of CSCorr and DANN. The proposed fault diagnosis framework is illustrated in Section 3. Section 4 demonstrates the effectiveness of the proposed method through extensive experiments, and finally some conclusions are given in Section 5.

BACKGROUND THEORY
This section reviews the basic theory of Cyclic Spectral Correlation (CSCorr) and Domain Adversarial Neural Network (DANN).

Cyclic Spectral Correlation
Cyclic Spectral Correlation (CSCorr) (Antoni, J., 2007) is a very powerful tool in the field of bearing fault diagnosis. A bearing vibration signal can be described as a second-order cyclostationary signal, i.e., a signal whose autocorrelation function is periodic in time, as defined in Eq. (1).
Here, $R_X$ is the autocorrelation function, $E$ represents the expected value, $x(t)$ is the signal with the time variable $t$, $*$ denotes the complex conjugate, $T$ is the cyclic period and $\tau$ is the lag parameter. The cyclic autocorrelation function describes the periodicity of the second moment of the signal.
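For reference, one common way of writing a periodic autocorrelation function with exactly the symbols listed above is sketched below. The lag convention (a one-sided lag $t - \tau$ rather than a symmetric $t \pm \tau/2$) is an assumption, since different references use different conventions.

```latex
R_X(t, \tau) = E\{x(t)\, x^{*}(t - \tau)\} = R_X(t + T, \tau)
```

The second equality is the cyclostationarity condition itself: shifting the time variable by one cyclic period $T$ leaves the autocorrelation unchanged.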
CSCorr reveals the hidden modulation signal (the cyclic frequency $\alpha$) and its carrier frequency (the spectral frequency $f$) (Mauricio et al., 2020). It is a bi-frequency representation of cyclic and spectral frequencies. The spectral frequency highlights the carrier component of impulses, and the cyclic frequency highlights the second-order periodicity of impulses. Specifically, it measures the correlation between two frequency components of the signal at $f$ and $f + \alpha$. The statistical descriptor of CSCorr can be described as Eq. (2).
Here, $X_W(f)$ is the Fourier transform of the signal $x(t)$ over the time interval $W$; $f$ is the spectral frequency, dual to time $t$, and $\alpha$ is the cyclic frequency, dual to the time lag $\tau$.
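The idea of correlating the spectral components at $f$ and $f + \alpha$ can be illustrated with a minimal NumPy sketch. It is not the estimator used in this paper: it assumes a constant-speed signal (no order tracking), non-overlapping segments of length `seg_len`, and cyclic frequencies restricted to multiples of `1 / seg_len`, so that no phase correction between segments is needed. The function name `cyclic_spectral_correlation` and the toy signal are hypothetical.

```python
import numpy as np

def cyclic_spectral_correlation(x, seg_len, max_shift):
    """Averaged cyclic periodogram on non-overlapping segments.

    Returns csc[m, k], an estimate of E{X_W(f_k) X_W*(f_k + alpha_m)} / W,
    with f_k = k / seg_len and alpha_m = m / seg_len (cycles per sample).
    """
    x = np.asarray(x, dtype=float)
    n_seg = len(x) // seg_len
    segments = x[: n_seg * seg_len].reshape(n_seg, seg_len)
    spectra = np.fft.fft(segments, axis=1)        # X_W(f) for every segment
    csc = np.empty((max_shift + 1, seg_len), dtype=complex)
    for m in range(max_shift + 1):
        shifted = np.roll(spectra, -m, axis=1)    # X_W(f + alpha_m), circular in f
        csc[m] = np.mean(spectra * np.conj(shifted), axis=0) / seg_len
    return csc

# Toy signal: white noise amplitude-modulated at the cyclic frequency 8 / 256.
rng = np.random.default_rng(0)
seg_len, n_seg, m0 = 256, 200, 8
n = np.arange(seg_len * n_seg)
x = (1.0 + np.cos(2 * np.pi * m0 * n / seg_len)) * rng.standard_normal(n.size)

csc = cyclic_spectral_correlation(x, seg_len, max_shift=20)
strength = np.abs(csc).mean(axis=1)   # average magnitude per cyclic frequency
```

In this toy case `strength` is large at index 0 (the ordinary power spectrum) and at index 8 (the hidden modulation), and small at unrelated cyclic frequencies, which is the behaviour the bi-frequency map is meant to expose.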

Convolutional Neural Networks
Convolutional Neural Networks (CNN) are mainly composed of convolution layers, pooling layers and activation functions. The convolution layer convolves the input features with a set of convolution kernels; the number of kernels equals the number of output feature maps. The ReLU activation function (Nair et al., 2010) follows to increase the nonlinear feature learning ability of the network. Subsequently, a pooling layer is stacked, typically using one of two algorithms: max pooling or mean pooling. Pooling reduces the feature dimensions, which accelerates network training and helps prevent overfitting. After multiple stacked convolution and pooling layers, the output features are fed to the fully connected layer and the Softmax layer, which maps them to the range (0, 1) to form the predicted probability distribution.
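The layer sequence described above can be traced in a small NumPy forward pass. This is only a sketch with random weights and a single convolution stage; the input size, the number of kernels and the three output classes are illustrative assumptions, not the architecture used in the experiments.

```python
import numpy as np

def conv2d(image, kernels):
    """Valid 2D convolution (cross-correlation, as in most DL libraries).
    image: (H, W); kernels: (K, kh, kw); output: (K, H-kh+1, W-kw+1)."""
    n_k, kh, kw = kernels.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((n_k, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[:, i, j] = np.sum(kernels * patch, axis=(1, 2))
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(fmaps, size=2):
    """Non-overlapping max pooling applied to each feature map."""
    k, h, w = fmaps.shape
    h, w = h - h % size, w - w % size
    blocks = fmaps[:, :h, :w].reshape(k, h // size, size, w // size, size)
    return blocks.max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())       # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))               # stand-in for a 2D input map
kernels = rng.standard_normal((4, 3, 3)) * 0.1      # 4 kernels -> 4 feature maps
features = max_pool(relu(conv2d(image, kernels)))   # shape (4, 7, 7)
weights = rng.standard_normal((3, features.size)) * 0.1   # fully connected layer
probs = softmax(weights @ features.ravel())         # probabilities over 3 classes
```

Four kernels produce four feature maps, pooling halves each spatial dimension, and the Softmax output sums to one.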

Domain Adversarial Neural Network
In the DANN objective, $\theta_f$ is the parameter of the feature extractor $G_f$; $\theta_y$ and $L_y$ are the parameter and the loss function of the label classifier $G_y$, respectively; $\theta_d$ and $L_d$ are the parameter and the loss function of the domain discriminator $G_d$, respectively; $d_i$ is the domain label (0 or 1), corresponding to the source domain or the target domain; and $\lambda$ is a trade-off parameter. The training goal of DANN is to minimize the loss of the label classifier $G_y$ so that $G_f$ can extract discriminative features, while maximizing the loss of the domain discriminator $G_d$ to obtain domain-invariant features.
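An objective consistent with these symbols and with the stated training goal can be written as follows. This follows the usual DANN formulation rather than a form confirmed by this text; the sample counts $n_s$, $n_t$, the sets $D_s$, $D_t$ and the class label $y_i$ are notational assumptions.

```latex
L(\theta_f, \theta_y, \theta_d) =
  \frac{1}{n_s} \sum_{x_i \in D_s} L_y\big(G_y(G_f(x_i)),\, y_i\big)
  - \frac{\lambda}{n_s + n_t} \sum_{x_i \in D_s \cup D_t} L_d\big(G_d(G_f(x_i)),\, d_i\big)
```

The minus sign encodes the adversarial goal: $\theta_f$ and $\theta_y$ are updated to minimize this expression, while $\theta_d$ is updated to maximize it, i.e., to minimize the domain classification loss $L_d$.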

PROPOSED METHOD
In the fault diagnosis field, it is difficult to train a deep learning model for each machine due to the lack of labels and data. Therefore, this paper proposes a TL-based bearing fault diagnosis method across machines, as shown in Figure 1. Firstly, CSCorr is adopted as a signal preprocessing method to highlight the hidden cyclic frequencies of bearing signals, yielding 2D order-frequency maps. Then, the DAAN domain adaptation method is used to achieve domain transfer, taking the CSCorr maps as input. In addition to aligning the global feature distributions, the DAAN method also aligns the local subdomain (each fault class) distributions between the two domains (Yu et al., 2019). To this end, it introduces a class-wise domain discriminator module into the original DANN network (Ganin et al., 2016). The DAAN method includes four modules: the feature extractor $G_f$, the label classifier $G_y$, the global domain discriminator $G_d$ and the class-wise domain discriminator $G_d^c$.

Label Classifier
The label classifier $G_y$ implements the classification task on the source domain dataset. In front of the label classifier, there is a feature extractor $G_f$, built on ResNet18 (Targ et al., 2016). $G_f$ aims to extract discriminative and domain-invariant features across domains. These deep features are then input to the label classifier, which consists of a fully connected layer and a Softmax layer. The number of neurons in the fully connected layer is equal to the number of bearing health classes $C$. The Softmax layer gives the predicted probability that the sample $x_i$ belongs to class $c$. The loss function $L_y$ of $G_y$ is the cross-entropy loss function, defined in Eq. (4).
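As a worked illustration of the cross-entropy loss on labeled source samples, the sketch below computes the mean negative log-probability of the true class from raw classifier outputs. The logits, the labels and the choice of three classes are made-up values; only the formula itself is standard.

```python
import math

def softmax(logits):
    m = max(logits)                               # stabilise the exponentials
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_logits, labels):
    """Mean negative log-probability of the true class over a labeled batch."""
    losses = []
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)

# Three classes (C = 3) and two labeled source samples (illustrative values).
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [0, 2]
loss = cross_entropy(batch_logits, labels)

# A classifier that outputs equal scores for all classes has loss ln(C).
uniform_loss = cross_entropy([[0.0, 0.0, 0.0]], [1])
```

The loss falls below `ln(C)` as soon as the classifier assigns more than `1 / C` probability to the correct class, which is the case for both samples here.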

Global Domain Discriminator
The idea of the global domain discriminator $G_d$ in DAAN comes from the domain discriminator in DANN. $G_d$ is used to align the global distributions of the source and the target domains. As shown in Figure 1, the features obtained from the feature extractor $G_f$ are input into $G_d$. $G_f$ tries to confuse $G_d$ as much as possible, so that $G_d$ cannot tell whether a feature comes from the source domain or the target domain, and thus domain-invariant features between the two domains are learnt. $G_d$ is composed of three fully connected layers, with the numbers of neurons set to 1024, 1024 and 2, respectively. The activation function in the global domain discriminator is ReLU. Each fully connected layer is followed by a dropout layer to avoid network overfitting. The loss function $L_g$ of the global domain discriminator is computed from the domain classification loss $L_d(\cdot)$ over the source and target samples, where $L_d(\cdot)$ is the cross-entropy loss function and $d_i$ is a domain label (0 or 1), corresponding to the source or target domain.
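A forward pass through a discriminator with this layer layout, together with the domain cross-entropy, might look like the following NumPy sketch. The input dimension of 512 (a typical ResNet18 feature size), the weight initialisation, the dropout rate and the choice to apply dropout to the hidden layers only are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
FEATURE_DIM = 512   # assumption: size of the pooled ResNet18 feature vector

def init_layer(n_in, n_out):
    return rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out)

layers = [init_layer(FEATURE_DIM, 1024), init_layer(1024, 1024), init_layer(1024, 2)]

def discriminator(features, drop_prob=0.0):
    """Three fully connected layers (1024, 1024, 2) with ReLU and dropout."""
    h = features
    for idx, (w, b) in enumerate(layers):
        h = h @ w + b
        if idx < len(layers) - 1:               # hidden layers only (assumption)
            h = np.maximum(h, 0.0)              # ReLU
            if drop_prob > 0.0:                 # inverted dropout, training mode
                mask = rng.random(h.shape) >= drop_prob
                h = h * mask / (1.0 - drop_prob)
    return h                                    # two logits: source vs. target

def domain_loss(logits, domain_labels):
    """Mean cross-entropy between predicted and true domain labels."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(domain_labels)), domain_labels].mean()

source = rng.standard_normal((8, FEATURE_DIM))  # stand-ins for G_f outputs
target = rng.standard_normal((8, FEATURE_DIM))
feats = np.vstack([source, target])
d = np.array([0] * 8 + [1] * 8)                 # 0 = source, 1 = target
logits = discriminator(feats)                   # evaluation mode, no dropout
loss_g = domain_loss(logits, d)
```

With untrained weights the discriminator cannot separate the domains, so `loss_g` is close to `ln(2)`; this is also the value the feature extractor pushes towards during adversarial training.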

Network Training and Optimization
It can be seen from Figure 1 that the total loss function of the network includes the losses of the label classifier, the global domain discriminator, and the class-wise domain discriminator. The total loss function of the network combines these terms, where $\lambda$ is a trade-off parameter; $L_y$ is the loss of the label classifier; $L_D$ is the loss of the domain discriminator, which consists of the losses of the global domain discriminator and the class-wise domain discriminator; and $w$ is a weight parameter. When $w=0$, DAAN degenerates into DANN and only the global distribution is aligned. When $w=1$, the local subdomain distribution of each fault class dominates, while the importance of the global distribution of both domains decreases. Therefore, the value of $w$ can be dynamically adjusted according to the actual application. The Adaptive Moment Estimation (Adam) (Kingma et al., 2014) optimization method is then used to update the network parameters. The advantage of Adam is that the effective learning rate at each iteration stays within a bounded range, which keeps the parameter updates relatively stable. The gradient direction must be reversed during back propagation, which is done automatically by the Gradient Reversal Layer (GRL) (Yu et al., 2019).
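How the weight $w$ trades off the two discriminator losses can be made concrete with a short sketch. It assumes the convex combination $(1 - w) L_g + w L_l$ for $L_D$ and the adversarial form $L_y - \lambda L_D$ for the total loss, which match the two limiting cases described above; the function names and the loss values are hypothetical.

```python
def domain_discriminator_loss(loss_global, loss_classwise, w):
    """L_D as a weighted mix of the global and class-wise discriminator losses."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (1.0 - w) * loss_global + w * loss_classwise

def total_loss(loss_y, loss_global, loss_classwise, lam, w):
    """Label classifier loss minus the weighted domain loss (adversarial form)."""
    return loss_y - lam * domain_discriminator_loss(loss_global, loss_classwise, w)

# Illustrative loss values only; they are not taken from the experiments.
loss_y, loss_g, loss_l, lam = 0.40, 0.69, 0.65, 1.0
dann_like = total_loss(loss_y, loss_g, loss_l, lam, w=0.0)    # global term only
local_only = total_loss(loss_y, loss_g, loss_l, lam, w=1.0)   # class-wise term only
balanced = total_loss(loss_y, loss_g, loss_l, lam, w=0.5)
```

In an implementation that uses a gradient reversal layer, the same effect is usually obtained by minimizing $L_y + \lambda L_D$ and letting the layer flip the sign of the gradient that reaches the feature extractor.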

Dataset Description
Dataset I: This dataset comes from the LMSD section of KU Leuven. The test rig is shown in Figure 2. Two rolling element bearings are installed on the shaft, namely the healthy bearing and the test bearing. Accelerometers are mounted on the housing of the test bearing to collect vibration signals. A motor drives the shaft, on which a disk is mounted. Three test bearings are considered: a healthy bearing, a bearing with a small spall in the inner race, and a bearing with a mild spall in the inner race, as shown in Figure 3.
Dataset II: To achieve model transfer across machines, a public bearing dataset (Huang et al., 2018) is introduced in addition to Dataset I. The test rig is shown in Figure 4. Two bearings are mounted on the shaft: the healthy bearing and the test bearing. Accelerometers are used for signal collection, and an encoder measures the rotational speed of the bearing. The dataset includes four speed conditions (increasing speed, decreasing speed, increasing then decreasing speed, and decreasing then increasing speed). Under each speed condition, three repeated experiments were carried out. Three test bearings (a healthy bearing, a bearing with an inner race fault and a bearing with an outer race fault) are used in the experiments. The duration of each signal is 10 s, and the sampling frequency is 200 kHz.

Results and Discussion
This section validates the effectiveness of the proposed transfer learning method on the cross-load, cross-speed, and cross-machine model transfer tasks. Besides the proposed method, a CNN fault diagnosis model and a DANN transfer learning model used in (Peng et al., 2021) are considered as baseline methods.
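The models are all written in Python 3.6 with the deep learning framework PyTorch and run on an Ubuntu 16.04 system with a GTX 2080 GPU. Accuracy is used to evaluate the network performance. It can be expressed as:
$$\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}} \times 100\%$$
where $N_{\text{correct}}$ is the number of correctly classified test samples and $N_{\text{total}}$ is the total number of test samples.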

Transfer Learning Among Load Conditions
The effectiveness of the proposed method is validated on cross-load model transfer tasks on Dataset I. 50% (681) of the samples captured under the unbalanced load condition are used for training and the rest (679) are used for testing. Similarly, under the balanced load condition, 50% (759) of the samples are used for training and the rest (757) are used for testing. The dataset from one load condition is considered as the source domain, and the dataset from the other load condition as the target domain. It is worth noting that, for the CNN baseline, a model is trained on the training set of the source domain, and the trained model is then directly applied to the test set of the target domain. For the DANN and the DAAN methods, a model is trained using the labeled training set of the source domain and the unlabeled training set of the target domain, and the trained model is then tested on the test set of the target domain.
The experimental results are shown in Table 1. Clearly, the proposed method exhibits the best transfer performance among the three methods on both cross-load model transfer tasks. Specifically, when transferring from the unbalanced load to the balanced load, the accuracy of the proposed method reaches 99.33%, which is 8.49% and 7.29% higher than that of the CNN and the DANN, respectively. On the other transfer learning task, the performance of the proposed method improves by 9.19% compared with that of DANN. Therefore, on the cross-load model transfer tasks, the proposed method shows a very competitive model transfer ability.

Transfer Learning Among Speed Conditions
In this section, Dataset I is used to validate the effectiveness and the superiority of the proposed method in terms of model transfer ability among speed conditions. The dataset under one of the speed conditions is used as the source domain, and the dataset under another speed condition is used as the target domain. Similarly, in each speed condition, 50% (around 25) of the data is used for training and the remaining data (around 25) is used for testing. The experimental results are shown in Table 2. On each cross-speed model transfer task, the proposed method significantly improves the transfer performance of the network compared to the other two methods. The last row of Table 2 gives the average diagnostic accuracy over all cross-speed transfer tasks for the three methods. The average diagnostic accuracy of the proposed method reaches 80.20%, which is 23.81% higher than CNN and 21.49% higher than DANN. This indicates that DAAN can learn discriminative domain-invariant features and thus improve the model transfer ability among speed conditions.

Transfer Learning Across Machines
The effectiveness of the proposed method on the cross-machine model transfer task is validated based on Dataset I and Dataset II. One dataset is regarded as the source domain, and the other dataset as the target domain. There are 1516 samples in total for Dataset I and 1296 for Dataset II. Similarly, 50% of each dataset is used for training, and the remaining 50% is used for testing. The cross-machine model transfer results of the three methods are shown in   . It can be clearly observed from Figure 5 that the features of the different classes are mixed together, and the features of the same class from the two domains are not mapped to the same region of the feature space. However, with the DAAN method, the features of the healthy class (blue and red dots in Figure 6) of the source domain and the target domain are completely mapped to the same region of the feature space, and so are the features of the inner race fault (green and cyan dots in Figure 6) of the two domains. Moreover, the features of the healthy class and the inner race fault class are completely separated, regardless of the source domain or the target domain. This further confirms the importance of the class-wise domain discriminator in DAAN.

CONCLUSION
This paper proposes a transfer learning-based bearing fault diagnosis method across machines. The method consists of two steps. Firstly, the vibration signal is transformed into an order-frequency map using the cyclic spectral correlation, which highlights the fault characteristic information of the measured signal. Subsequently, the DAAN domain adaptation method is introduced to map the global feature distributions of the two domains, as well as the local feature distributions of each fault class, to a common feature space, thereby improving the model transfer ability across machines. The proposed transfer learning-based fault diagnosis framework exhibits excellent diagnostic performance on cross-load, cross-speed, and cross-machine bearing fault diagnosis transfer tasks.