Vibration Signal Decomposition using Dilated CNN

Vibration sensors have gained increasing popularity as valuable tools for Prognostics and Health Management (PHM) applications, enabling early detection of mechanical failures in industrial machines. Vibration signals comprise two main sources of information: periodic vibrations from components, phase-locked to the rotating speed (e.g., gears), and non-deterministic broadband vibrations associated with bearings, structure, and background noise. In PHM applications, it is important to decompose vibrations into these two sources to optimize the use of different diagnostic methods for each signal component. In practice, the decomposition should be cost-effective by working without supplementary information about system operating conditions and kinematics.
Existing methods of vibration source separation commonly rely on an auto-regression (AR) model of vibrations and employ adaptive filtering techniques to estimate its parameters. However, their accuracy degrades for complex geared vibrations, which contain numerous periodic components and therefore require a long filter to achieve high frequency resolution in component separation.
To address these challenges, we propose a new method that utilizes dilated Convolutional Neural Networks (CNNs) instead of adaptive filtering to improve the accuracy of decomposing complex vibration signals, all without the need for any supplementary information.
To evaluate the performance of the new method, we conducted experiments using both simulated signals and real-world vibrations. The simulation results demonstrate improved accuracy in signal decomposition when our method is used instead of adaptive filtering. Additionally, when applied to real vibrations, the new method significantly enhances bearing failure detection through accurate isolation of bearing-related vibrations.
This study reveals the potential of our new method in various PHM applications requiring highly accurate diagnostics and prognostics in complex geared vibrations, particularly when supplementary information about operating conditions and system kinematics is unavailable.


INTRODUCTION
The application of vibration sensors, especially wireless ones, is gaining momentum in the field of industrial condition monitoring. Tiboni et al. (2022) demonstrated that this is primarily due to the sensors' ability to detect mechanical failures at early stages.

Vibration Signal Components
Vibration signals can be highly complex, particularly in geared machines that operate under varying speeds and loads, as noted by Feng et al. (2018), Zimroz et al. (2014) and Gildish et al. (2022).
Even when speed and load remain constant over a short period of time, the vibration signal typically comprises a combination of signals generated by two sources, as described by Antoni et al. (2004). The first source is generated by sub-systems, such as gears, that are phase-locked to the operating speed, resulting in periodic components within the signal (the periodic or deterministic part of the signal). The second source comprises components that are not phase-locked (the broadband or non-deterministic part of the signal), such as bearings that experience rolling and slipping due to varying loads, as well as structure-related vibrations, as demonstrated by Gildish et al. (2022, June). Additionally, background and measurement noise contribute to the second part of vibrations. Under such conditions, the digitized vibration signal can be expressed as the sum of periodic and non-deterministic stationary processes, as shown in the equation below:
$x(n) = p(n) + r(n)$ (1)
where $p(n)$ and $r(n)$ represent periodic and stationary processes, respectively.
According to Wold's theorem, as published in Priestley et al. (1981), such a decomposition of the signal into its periodic and non-deterministic components always exists and is unique.
In a periodic process, any value $p(n)$ can be predicted based on its past values $p(n - m)$, where $m > 0$. In contrast, predicting the future values of a non-deterministic process $r(n)$ solely from its past values always leaves a random prediction error.
In this study, we will isolate these two components using our new method, even in the absence of rotating speed measurements and system kinematics.

AR Model of Vibrations
Antoni et al. (2004) demonstrated that for a periodic signal $p(n)$, the optimal predictor of any value, in terms of minimizing the mean squared prediction error, is a linear combination of its past values. The predictor for values of the process $x(n)$ is equivalent to the estimator of the periodic component $p(n)$ of the signal. This predictor can be represented in an autoregressive form, as shown below:
$\hat{p}(n) = \sum_{k=1}^{N} h(k)\, x(n - \Delta - k)$ (2)
where the predictor $\hat{p}(n)$ estimates the value $p(n)$ by utilizing $N$ previous values of the signal. The time delay $\Delta$ guarantees that the non-deterministic component remains uncorrelated, as expressed by the condition $E\{r(n)\, r(n - m)\} = 0, \ \forall m > \Delta$. Once estimated, the model coefficients $h(k)$ represent the parameters of the auto-regression (AR) model with a length of $N$.
Once the predictor $\hat{p}(n)$ is calculated, the difference between the signal and the predictor represents the estimate of the non-deterministic component of the process: $\hat{r}(n) = x(n) - \hat{p}(n)$. Antoni et al. (2004) demonstrated that achieving high accuracy in estimating the periodic part of vibration signals requires an AR model of considerable length, particularly when dealing with noise and a large number of periodic components.
The selection of the AR model order N is crucial in this context. On one hand, a long model is necessary to resolve a large number of closely spaced periodic components. On the other hand, the model should be short enough to avoid overfitting and to allow the coefficients to be estimated reliably within the limited signal length.
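The delayed linear predictor described above can be illustrated with a minimal sketch. It fits the coefficients by ordinary least squares rather than by an adaptive filter, which is a simplification chosen here for brevity; the function name `ar_decompose`, the test signal, and the chosen order and delay are illustrative assumptions, not values from the study.

```python
import numpy as np

def ar_decompose(x, order, delay=1):
    """Split x into a predictable (periodic) estimate and a residual.

    Fits p_hat(n) = sum_{k=1..order} h[k-1] * x(n - delay - k)
    by ordinary least squares (an assumption; not an adaptive filter).
    """
    x = np.asarray(x, dtype=float)
    start = delay + order  # first sample that has a full history
    # The row for sample n holds x(n-delay-1), ..., x(n-delay-order).
    rows = [x[n - delay - order:n - delay][::-1] for n in range(start, len(x))]
    A = np.array(rows)
    h, *_ = np.linalg.lstsq(A, x[start:], rcond=None)
    p_hat = np.zeros_like(x)
    p_hat[start:] = A @ h      # the first `start` samples are left at zero
    r_hat = x - p_hat          # residual: estimate of the non-deterministic part
    return p_hat, r_hat, h

# Hypothetical test signal: two sinusoids plus white noise.
rng = np.random.default_rng(0)
n = np.arange(2000)
p = np.sin(2 * np.pi * 0.05 * n) + 0.5 * np.sin(2 * np.pi * 0.12 * n + 1.0)
x = p + 0.1 * rng.standard_normal(n.size)
p_hat, r_hat, h = ar_decompose(x, order=40, delay=1)
```

With only two sinusoids a short model suffices; the trade-off discussed above appears when many closely spaced components force the order to grow toward the number of available samples.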

Related Works
The methods presented below enable the decomposition of vibrations into the periodic and non-deterministic components. Antoni et al. (2004) and Randall (2004) utilized the periodic nature of vibrations to construct an adaptive filter that extracts the periodic part by exploring the AR model of the signal and proposed a new adaptive filtering algorithm. Dixit et al. (2017) have summarized the main adaptive filtering methods used for estimating AR model parameters. The primary concept behind filtering is to minimize the error between the current signal values and the values predicted by the filter.
The advantage of these methods is that they do not require any supplementary information regarding system operating conditions and kinematics. However, when a high frequency resolution is needed to estimate complex geared vibrations with numerous spectrum components related to the periodic part of the signal, the length of the filters increases significantly.
Randall et al. (2011) introduced a method for extracting periodic components known as cepstrum-based extraction.
The cepstrum is a logarithmic representation of the spectrum. By transforming the signal into the cepstrum domain, the harmonics corresponding to the rotating speed are mapped to specific positions in the quefrency domain (analogous to frequency in the cepstrum domain), enabling their extraction. While this method does not require knowledge about the system's kinematics, it is not accurate, since it relies on the presence of a significant number of shaft harmonics from each shaft in the spectrum, which is typically not the case.
An alternative approach to extracting periodic contents is proposed by Groover et al. (2005), Braun (2011), Peeters et al. (2005 and 2007) and Gildish et al. (2022, June). The signal is resampled to a consistent angular basis for each system shaft, followed by synchronous averaging to extract rotating speed multipliers related to the periodic component of vibrations. Although computationally expensive, this method shows high accuracy in dealing with complex geared vibrations containing multiple periodic sources. Subsequently, Abboud et al. (2016) and Abboud et al. (2019) extended this method to handle non-stationary scenarios involving changing rotating speeds and loads. However, these methods require measurement of the system's rotating speed and knowledge of the system's kinematics. Both can be costly to obtain and may not be available in numerous PHM applications.
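The synchronous averaging step can be sketched as follows, assuming the angular resampling has already been done so that every shaft revolution spans the same integer number of samples. The function name `synchronous_average`, the shaft orders 3 and 7, and the noise level are hypothetical choices for illustration only.

```python
import numpy as np

def synchronous_average(x, samples_per_rev):
    """Average a signal over whole revolutions.

    Assumes x is already sampled on a uniform angular grid with an
    integer number of samples per revolution.
    """
    x = np.asarray(x, dtype=float)
    n_rev = len(x) // samples_per_rev
    if n_rev == 0:
        raise ValueError("signal is shorter than one revolution")
    revs = x[:n_rev * samples_per_rev].reshape(n_rev, samples_per_rev)
    return revs.mean(axis=0)

# Hypothetical example: two shaft orders buried in unit-variance noise.
rng = np.random.default_rng(1)
samples_per_rev, n_rev = 200, 100
angle = 2 * np.pi * np.arange(samples_per_rev) / samples_per_rev
one_rev = np.sin(3 * angle) + 0.3 * np.cos(7 * angle)
x = np.tile(one_rev, n_rev) + rng.standard_normal(n_rev * samples_per_rev)
avg = synchronous_average(x, samples_per_rev)
```

Averaging over 100 revolutions reduces the variance of uncorrelated noise by roughly a factor of 100, but the sketch makes the cost visible: `samples_per_rev` must be known, which is exactly the speed and kinematics information the proposed method avoids.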
The literature review clearly indicates that existing methods do not offer an accurate solution for decomposition of the complex vibration signals in situations where measurements of rotating speed and system kinematics are unavailable.
This paper is structured as follows: Section 2 describes the proposed method, while Sections 3 and 4 present the evaluation results using simulations and real data, respectively. The conclusions are summarized in Section 5.

Contribution
The proposed method improves the decomposition of vibration signals into sources compared to existing adaptive filtering methods. It is an unsupervised method that does not require any additional information about operating conditions or system kinematics. The new method improves bearing fault detection at early stages by accurately removing high-energy gear vibrations without requiring any additional system information. This advancement is expected to reduce costs in PHM systems. Additionally, recent advancements in edge AI hardware enable practical implementation in devices with ultra-low power constraints, expanding its potential applications.

Algorithm Flowchart
The proposed method is outlined in Figure 1. For the estimation of the periodic part from vibrations, a dilated Convolutional Neural Network (CNN) is utilized, as detailed in section 2.2.
The new method decomposes vibration signals into periodic and non-deterministic vibrations by leveraging the AR assumption, as shown in equation (2). This assumption allows the periodic signal component to be predicted through a linear combination of its $N$ previous values, effectively separating it from the non-deterministic component.
The proposed approach predicts the signal value $x(n)$ by utilizing the previous $N$ values as inputs to the dilated CNN model, where the AR model order $N$ corresponds to the model's receptive field. The optimization of the model parameters aims to minimize the error $\hat{r}(n)$ between the predicted values $\hat{p}(n)$ and the actual values $x(n)$, which corresponds to the non-deterministic part of the signal. The configuration parameters of the model consist of the depth (number of layers), kernel size, and dilation factor in each layer.
The method tackles the challenge posed by adaptive filtering methods, where a long filter length (AR model order) is required to accurately predict complex vibrations with numerous periodic components and noise. In contrast, the number of parameters to be optimized in the dilated CNN is significantly lower, as demonstrated further in section 2.2.3.
Figure 1. Flowchart illustrating the proposed method of the periodic vibrations estimation
In the following sections, we will discuss the application of dilated Convolutional Neural Networks (CNNs) for decomposing vibration signals into the two aforementioned sources.

Dilated CNN
The application of dilated CNN in time series forecasting was initially introduced by Borovykh et al. (2018) to expand the receptive field of filters in 1D convolutions. This approach originally employed a non-linear activation function for time series prediction. Subsequently, the method has been adapted and utilized in various domains, such as financial forecasting in Li et al. (2021) and time series forecasting in smart grid applications in Mishra et al. (2021). The problem of adaptive filter length is resolved by allowing a substantial increase in the dilated CNN receptive field while keeping the number of optimization parameters very small. This is achieved by extending the network depth and adjusting its kernel size.
In general, in CNNs, a non-linear activation function is applied to the output of each layer. However, in our study, we employ a linear CNN approach since the optimal predictor for the periodic components needs to be linear, as demonstrated in equation (2). As demonstrated by Borovykh et al. (2018), when considering a one-dimensional vibration signal $x(n)$ and a CNN with $L$ layers, the input in each layer is obtained from the output of the previous hidden layer and can be expressed as follows:
$y_l(n) = (w_l *_{d_l} y_{l-1})(n)$, with $y_0(n) = x(n)$
where $y_l(n)$ is the output of layer $l$, the operator $(*_{d_l})$ refers to the dilated convolution, and $d_l$ and $w_l$ represent the dilation factor and weights (or kernel) of layer $l$, respectively. In contrast to the regular convolution, in dilated convolution the filter is applied to every $d_l$-th element of the input vector. This enables the model to effectively learn connections between distant data points, facilitating the efficient capture of long-range dependencies within the input signals.
In this study we utilize an architecture comprising $L$ layers of dilated convolutions, with the dilation factor increasing by a factor of 2 in each subsequent layer:
$d_l = 2^{l-1}, \quad l = 1, \dots, L$
An example of a three-layer dilated CNN is shown in Figure 2.
Figure 2. Example of dilated CNN with 3 layers
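To make the stacking of linear dilated convolutions concrete, the following sketch applies three layers with dilation factors 1, 2 and 4 to a unit impulse. The causal form y(n) = sum_j w[j] x(n - d j), the zero padding before the first sample, and the kernel values are assumptions made for illustration; trained weights would of course differ.

```python
def dilated_causal_conv(x, w, d):
    """y(n) = sum_j w[j] * x(n - d*j); samples before n = 0 are treated as zero."""
    return [
        sum(w[j] * x[n - d * j] for j in range(len(w)) if n - d * j >= 0)
        for n in range(len(x))
    ]

def linear_dilated_cnn(x, kernels):
    """Apply layers 1..L with dilation 2**(l-1) and no activation function."""
    out = list(x)
    for layer_index, w in enumerate(kernels):
        out = dilated_causal_conv(out, w, 2 ** layer_index)
    return out

# Three layers, kernel size 2, arbitrary illustrative weights.
kernels = [[1.0, 0.5], [1.0, -0.5], [1.0, 0.25]]
impulse = [1.0] + [0.0] * 15
response = linear_dilated_cnn(impulse, kernels)
support = max(i for i, v in enumerate(response) if v != 0.0) + 1
```

Because no activation is used, the whole stack behaves as a single FIR filter: here six weights produce an impulse response spanning eight samples, which is the sense in which depth enlarges the receptive field without a matching growth in parameters.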

Signal Decomposition
The predicted values of the signal, obtained using $L$ layers of the proposed linear dilated CNN, can be expressed as follows:
$\hat{p}(n) = (w_L *_{d_L} (w_{L-1} *_{d_{L-1}} \cdots (w_1 *_{d_1} x)))(n)$
where $\hat{p}(n)$ is the estimated periodic part of the signal $x(n)$.
The filter weights at layer $l$, represented by $w_l$, have a length defined by the kernel size, and the dilation factor at each layer increases by a factor of 2: $d_l \in [2^0, 2^1, \dots, 2^{L-1}]$.
The model is trained as follows:
1. Each signal is partitioned into a training set and a validation set, with a split ratio of 80:20.
2. The mean square error (MSE) between the predicted and actual signal values is calculated after each epoch for both the training and validation sets.
3. Early stopping is used to avoid overfitting by stopping the training when the running mean of the validation set MSE no longer improves.
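The training steps above can be sketched without committing to a particular deep learning framework. The chronological split, the running-mean window, and the patience value are assumptions, since the text does not specify them; `split_train_val` and `early_stopping_epoch` are hypothetical helper names.

```python
def split_train_val(signal, train_ratio=0.8):
    """Split a signal into training and validation parts (80:20 by default).

    A chronological split is assumed here; the source does not state
    how the samples are assigned to each set.
    """
    cut = int(round(len(signal) * train_ratio))
    return signal[:cut], signal[cut:]

def early_stopping_epoch(val_mse, window=5, patience=3):
    """Return the epoch at which training stops.

    Training stops once the running mean of the validation MSE has not
    improved for `patience` consecutive epochs (both values are assumed).
    """
    best = float("inf")
    stale = 0
    for epoch in range(len(val_mse)):
        recent = val_mse[max(0, epoch - window + 1):epoch + 1]
        running_mean = sum(recent) / len(recent)
        if running_mean < best:
            best = running_mean
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_mse) - 1

# Hypothetical validation curve that flattens out after a few epochs.
history = [1.0, 0.5, 0.3, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]
stop = early_stopping_epoch(history, window=2, patience=2)
```

Smoothing the validation MSE with a running mean makes the stopping decision less sensitive to epoch-to-epoch fluctuations than comparing raw values would be.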
From a signal processing perspective, the utilization of dilated CNNs replaces a single filter, as commonly employed in adaptive filtering, with a multi-scale filter bank $[w_1, w_2, \dots, w_L]$. This approach is similar to Wavelet CNNs introduced by Fujieda et al. (2018), which aid in enhancing the spectrum resolution for estimating periodic components. The advantage of using linear dilated CNNs lies in their easier optimization process, as they involve neither pooling layers nor the more complex architecture of Wavelet CNNs.

Receptive Field of Dilated CNN
An adaptive filter of length $N$ (receptive field) is equivalent to a single-layer dilated CNN with a dilation rate of 1 and a kernel size of $N$. This makes it easy to compare these two methods.
The main advantage of utilizing a dilated CNN instead of adaptive filtering can be illustrated by comparing the number of parameters that need to be optimized. Assuming an equivalent receptive field $N$ (AR model order) in both cases, we can compare the number of parameters involved.
The receptive field of a dilated CNN is defined in Araujo et al. (2019) and can be generally expressed as follows:
$N = \sum_{l=1}^{L} \left( d_l (k_l - 1) \prod_{i=1}^{l-1} s_i \right) + 1$
where $L$ is the number of layers, $d_l$ and $k_l$ are the dilation factor and kernel size, respectively, in layer $l$, and $s_l$ is the stride of layer $l$.
In our study, we employ the following definitions to showcase the advantages of the method, without sacrificing generality. It is important to note that in the future, the parameters can be optimized according to specific application requirements.
We adopt a specific configuration where the dilation factor is increased by a factor of 2 for each layer, the stride is constant and equal to 1, and the kernel size is the same for all the layers:
$d_l = 2^{l-1}, \quad s_l = 1, \quad k_l = k, \quad l = 1, \dots, L$
Thus, the simplified expression for the receptive field of the dilated CNN can be given as follows:
$N = (k - 1)(2^L - 1) + 1$
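As a quick check of the receptive field expressions, the sketch below evaluates the general sum (dilation times kernel span times the product of preceding strides, plus one) and compares it with the closed form for the doubling-dilation, unit-stride, constant-kernel configuration. The function names are illustrative.

```python
def receptive_field(kernel_sizes, dilations, strides):
    """General receptive field: 1 + sum_l d_l * (k_l - 1) * prod_{i<l} s_i."""
    field = 1
    jump = 1  # product of the strides of all preceding layers
    for k, d, s in zip(kernel_sizes, dilations, strides):
        field += d * (k - 1) * jump
        jump *= s
    return field

def receptive_field_doubling(kernel_size, depth):
    """Closed form for d_l = 2**(l-1), stride 1, constant kernel size."""
    return (kernel_size - 1) * (2 ** depth - 1) + 1

depth, kernel_size = 8, 60
general = receptive_field(
    [kernel_size] * depth,
    [2 ** layer for layer in range(depth)],
    [1] * depth,
)
closed_form = receptive_field_doubling(kernel_size, depth)
```

Both routes give the same value because the dilations form a geometric series summing to 2**depth - 1.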

New Method Advantages
In complex vibration signals containing numerous periodic components, it is crucial to have a sufficiently large receptive field to ensure accurate predictions. To achieve the desired receptive field, we have two options: either optimize $N$ parameters, as done in adaptive filtering, or optimize $L \cdot k$ parameters in the case of the dilated CNN.
The advantages of utilizing the dilated CNN can be observed in the table below, where different values of $N$ are considered (between 1000 and 15000) and correspond to the number of parameters to be optimized in the regular adaptive filtering.
To demonstrate the advantages, the model depth is set to $L = 8$, and the kernel size $k$ is calculated using equation (10).
For instance, when the receptive field $N$ is equal to 15000, the dilated CNN requires only 480 parameters to be optimized, which is approximately 31 times fewer than the 15000 parameters of the equivalent adaptive filter.
Reducing the number of parameters provides notable benefits in terms of parameter optimization and allows for expanding the receptive field while keeping the optimization parameters minimal. This helps prevent model overfitting, unlike classical adaptive filtering, where insufficient signal length hinders accurate estimation of filter parameters for longer filter lengths.
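The parameter saving can be reproduced with a few lines of arithmetic. The sketch assumes the doubling-dilation configuration with unit stride and a constant kernel size, and it rounds the kernel size up to the next integer; the rounding convention is an assumption, as the source does not state it.

```python
def kernel_size_for(receptive_field, depth):
    """Smallest constant kernel size whose receptive field covers the target.

    Inverts N = (k - 1) * (2**depth - 1) + 1 and rounds up (assumed convention).
    """
    span = 2 ** depth - 1
    return -(-(receptive_field - 1) // span) + 1  # integer ceiling division

def parameter_count(receptive_field, depth):
    """Number of weights in a dilated CNN with the given depth."""
    return depth * kernel_size_for(receptive_field, depth)

target = 15000
cnn_params = parameter_count(target, depth=8)
filter_params = parameter_count(target, depth=1)  # depth 1 is the plain adaptive filter
ratio = filter_params / cnn_params
```

For a receptive field of 15000 this gives a kernel size of 60 at depth 8, that is 480 weights against 15000 for the single-layer filter.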

METHOD EVALUATION IN SIMULATIONS
The purpose of the simulation is to assess the benefits of using the new approach compared to the regular adaptive filtering when estimating periodic signal components from vibrations.
During the evaluation, we consider the dilated CNN model with depth=1 and dilation=1 as an approximation of the existing adaptive filtering, where the filter length is equal to the receptive field $N$ of the model. Consequently, the performance of the new method will be assessed by comparing it to this model configuration.
To evaluate these advantages, a dataset consisting of 1000 signals was simulated. Each signal within the dataset contains both periodic and non-deterministic components. Evaluation criteria were calculated for each signal and then averaged over the entire dataset.

Signals Simulation
The simulation of signals similar to vibrations was carried out according to the following specifications:
1. Time Duration: Each signal had a duration of 1 second.
2. Sampling Frequency: The signals were sampled at a frequency of 24 kHz.
3. Periodic Components: To simulate signals resembling real vibrations, the periodic part of each signal was generated as a sum of 100 sinusoidal signals with specific properties. The frequency of each sinusoid was randomly generated from a Beta[2, 2] distribution to ensure a higher likelihood of peaks appearing in the middle of the signal spectrum, similar to real-world signals. The phase of each sinusoid was uniformly distributed in the range of [-π, π]. The amplitudes of the sinusoids were drawn from a uniform distribution ranging from 0 to 1.
4. The non-deterministic part of the vibration signals was simulated as noise with a normal distribution and SNR=10dB to ensure that the noise variance is sufficiently high to evaluate the method in a noisy environment.
5. Both the periodic and non-deterministic parts were convolved with a structure function simulated as a second-order transfer function with 5 poles. The resonance frequencies of the system were randomly drawn from a uniform distribution between 0 and fs/4, where fs=24000Hz represents the sampling frequency.
6. To maintain consistency and comparability, the generated signals were normalized to the range between -1 and 1.

Figure 3 presents an example of simulated vibrations in both the time and frequency domains. The top graph illustrates the periodic and non-deterministic components of the signal. In the bottom graph, the spectrum reveals peaks that correspond to the periodic part, while the background noise represents the non-deterministic aspect of the vibrations.
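The simulation recipe above can be sketched as follows. Several details are assumptions: the Beta-distributed frequencies are scaled to the range from 0 to the Nyquist frequency, the SNR is defined as the ratio of the variances of the two parts, normalization divides by the peak absolute value, and the convolution with the structure function (step 5) is omitted because its exact form is not recoverable from the description. The function name `simulate_signal` is hypothetical.

```python
import numpy as np

def simulate_signal(rng, fs=24000, duration=1.0, n_sines=100, snr_db=10.0):
    """Simulate one signal: a sum of sinusoids plus Gaussian noise.

    Returns the time axis and the normalized periodic and noise parts.
    The structure transfer function of step 5 is intentionally left out.
    """
    t = np.arange(int(fs * duration)) / fs
    # Beta(2, 2) frequencies, scaled to [0, fs/2] (assumed scaling).
    freqs = rng.beta(2.0, 2.0, size=n_sines) * fs / 2
    phases = rng.uniform(-np.pi, np.pi, size=n_sines)
    amps = rng.uniform(0.0, 1.0, size=n_sines)
    periodic = (
        amps[:, None] * np.sin(2 * np.pi * freqs[:, None] * t + phases[:, None])
    ).sum(axis=0)
    # Noise power chosen so that var(periodic) / var(noise) matches the SNR.
    noise_power = periodic.var() / 10 ** (snr_db / 10)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=t.size)
    scale = np.max(np.abs(periodic + noise))  # peak normalization (assumed)
    return t, periodic / scale, noise / scale

t, periodic, noise = simulate_signal(np.random.default_rng(0))
signal = periodic + noise
snr_db = 10 * np.log10(periodic.var() / noise.var())
```

Returning the two parts separately keeps the ground-truth periodic component available for scoring an estimate against it.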

Evaluation Criteria
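The proposed method was evaluated based on the following criteria, where $p(n)$ and $\hat{p}(n)$ are the actual periodic part and the periodic part estimated from vibrations by the dilated CNN, respectively:
• The Mean Square Error (MSE), estimated over the $M = 24000$ samples of every signal: $\mathrm{MSE} = \frac{1}{M} \sum_{n=1}^{M} \left( p(n) - \hat{p}(n) \right)^2$
• The Coefficient of Determination ($R^2$): $R^2 = 1 - \frac{\sum_{n=1}^{M} \left( p(n) - \hat{p}(n) \right)^2}{\sum_{n=1}^{M} \left( p(n) - \bar{p} \right)^2}$, where $\bar{p}$ is the average value of the actual periodic part of the signal.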
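Both criteria are standard and can be computed directly from the actual and estimated periodic parts. The sketch below uses plain Python and the usual definitions of the mean square error and the coefficient of determination; the function names and the four-sample example are illustrative.

```python
def mse(actual, estimated):
    """Mean square error between the actual and estimated periodic parts."""
    return sum((a - e) ** 2 for a, e in zip(actual, estimated)) / len(actual)

def r_squared(actual, estimated):
    """Coefficient of determination: 1 - residual sum / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Tiny hypothetical example with a single mismatched sample.
actual = [1.0, 2.0, 3.0, 4.0]
estimated = [1.0, 2.0, 3.0, 5.0]
example_mse = mse(actual, estimated)
example_r2 = r_squared(actual, estimated)
```

An R² of 1 means a perfect estimate, while values near 0 mean the estimate is no better than predicting the mean of the periodic part.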

Model Configuration
The evaluation of the method involved different configurations of the dilated CNN, compared against adaptive filtering whose length was equal to the receptive field of the model for a consistent comparison:
• Several model depths were employed, including 1, 2, 4, and 8 layers. The depth of 1 served as a baseline for comparison, since its performance corresponds to that of adaptive filtering.
• The receptive field $N$ of the dilated CNN, which also corresponds to the length of the adaptive filter used for comparison, was adjusted within the range of 1500 to 5000 samples. The minimum of 1500 was chosen to provide a minimum kernel size of 6 samples, as follows from equation (10).
• Simulations of different receptive fields and model depths were conducted to assess the model's performance. The kernel size of the model is uniquely defined by its depth and receptive field size following equation (7): $k = \frac{N - 1}{2^L - 1} + 1$, where $N$ is the receptive field and $L$ is the model depth.
• Model optimization was performed using the AdamW algorithm.
• The model optimization metric was MSE.

Decomposition Results
The simulation results for varying receptive fields and model depths are summarized in Figure 5. For each combination of receptive field and model depth, the kernel size was recalculated using equation (10).
Figure 5. Simulation results evaluated using two criteria: $R^2$ (top) and MSE (bottom). The model with depth=1 approximates the adaptive filtering approach.
Both evaluation criteria yield similar results:
• The blue graph's behavior reveals the drawback of adaptive filtering (model depth=1). As the receptive field grows, its performance declines due to the increased number of parameters and limited signal size. Consequently, over-fitting becomes a concern in this estimation scenario.
• When the dilated CNN is used, the estimation accuracy improves as the model depth increases, particularly for larger receptive field values. This demonstrates the improvement of the new method over the existing one.
• When the receptive field is small, the behavior of the estimation varies noticeably with increasing model depth. The model with depth=8 underperforms other models for receptive fields ranging between 1500 and 3200 samples. However, it begins to outperform them starting from 3500 samples. The reason is that the kernel size depends on both the receptive field and depth, and as the model depth increases, the kernel size becomes too small. Further research is needed to optimize the kernel size for different applications.
• The performance of all models deteriorates as the receptive field increases. However, models with larger depth exhibit better performance for larger receptive fields, which demonstrates the advantages of the new method.

Simulations Conclusions
The simulation results highlight the advantages of the new method for decomposing vibration signals:
• The utilization of dilated CNNs instead of adaptive filtering improves the decomposition across a wide range of receptive fields through adjustment of kernel size and depth.
• Increasing the model depth yields significant improvement in decomposition accuracy for larger receptive fields.
• The performance of the models with small depth tends to degrade as the receptive field increases due to the limited signal length for optimizing parameters.
• The new method does not enhance performance for small receptive fields because the kernel size becomes excessively small as the depth increases, which negatively affects performance.

EXPERIMENTAL RESULTS
This section evaluates the new method for enhancing early-stage detection of bearing faults using real vibration data from an offshore 5MW wind turbine. The experimental setup involved the WT-HUMS (Wind Turbines Health and Usage Management System) developed by RSL Electronics for recording the vibrations and rotating speed data, as illustrated in Figure 6. The analysis employed a sensor installed on the gearbox output, near the generator, capable of sensing both the gearbox and generator components. The sensor had a sampling frequency of 24 kHz, and each recording had a duration of 1 second.
To ensure data quality, a validation procedure was implemented to select only vibration recordings in which the rotating speed remained stable.The stability criterion was defined as a maximum change of 2% in rotating speed during the recording.
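The stability criterion can be expressed as a small filter over the recorded speed samples. The sketch interprets "maximum change of 2%" as the spread between the highest and lowest speed relative to the mean speed, which is an assumption; the function name and the example speed values are hypothetical.

```python
def is_speed_stable(speed_samples, max_change=0.02):
    """Return True if the rotating speed varies by at most `max_change`.

    The change is measured as (max - min) / mean, an assumed reading
    of the 2% criterion.
    """
    lowest, highest = min(speed_samples), max(speed_samples)
    mean_speed = sum(speed_samples) / len(speed_samples)
    return (highest - lowest) / mean_speed <= max_change

# Hypothetical speed traces in rpm for two recordings.
stable_recording = [1500.0, 1510.0, 1505.0]
unstable_recording = [1500.0, 1560.0, 1530.0]
kept = [
    rec for rec in (stable_recording, unstable_recording) if is_speed_stable(rec)
]
```

Only recordings that pass this check would be forwarded to the decomposition, in line with the constant-speed assumption behind the signal model.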
The objective of the experiment was to enhance the detection of generator bearing faults by isolating periodic and broadband vibrations using various configurations of dilated CNNs. The parameters of the dilated CNN were adjusted in a similar manner as defined in the simulations (see section 3.3). An example of the extraction of the periodic and the non-periodic parts is shown in Figure 7 and Figure 8.

CONCLUSIONS AND RECOMMENDATIONS
In this study, we propose a new method using dilated CNN to accurately decompose vibration signals into periodic and non-deterministic components. Our method eliminates the need for system kinematics and rotating speed measurements.
Simulations and experiments on real vibrations from faulty wind turbine generator bearings demonstrate significant improvements compared to conventional adaptive filtering techniques. The study also highlights the importance of model configuration, receptive field, and depth. Future research should focus on optimizing the method for different scenarios, exploring different dilation and kernel options, and considering the benefits of non-linear models for improved estimation.

Figure 3. Example of simulated vibration signal (orange) along with its periodic component (blue). The top and bottom graphs correspond to snapshots from the time and frequency domains, respectively.
In the top graph of Figure 4, we present an example of simulated periodic vibrations (orange) alongside the periodic component extracted using the dilated CNN (blue) in the time domain. The bottom graph displays the spectrum, revealing peaks that correspond to the simulated periodic component (orange) and the estimated periodic component (blue) of vibrations.

Figure 4. The periodic component of a vibration signal, as depicted in the previous figure, was extracted using a dilated CNN.

Figure 6. Measurement system architecture and sensor locations. Vibration acquisition was handled by a Main Processing Unit, which transmitted the recorded data to a Ground Station.

Figure 7. Evaluation of the method for real vibration signals in time (top) and frequency (bottom) domains. Top graph: the signal (orange) and the extracted periodic component (blue). Bottom graph: the signal spectrum (orange) and the extracted non-periodic component (blue).

Figure 8. Zoomed version of the previous example

Figure 9. Spectrum of the original signal (black) and the extracted periodic component obtained from real vibrations. The model with depth=1 approximates adaptive filtering.

Figure 10 illustrates the enhancement in bearing fault detection at early stages achieved by utilizing the new method. The figure displays the envelope FFT of the generator bearing BPFI frequency after removing the high-energy periodic component from the signal. The peak amplitude associated with bearing faults is maximized when employing the dilated CNN with the maximum depth. These results exemplify the advancement in early bearing diagnostics facilitated by the new method.
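For readers unfamiliar with the envelope FFT, the sketch below shows one common way to compute it: the analytic signal is built in the frequency domain, its magnitude gives the envelope, and the spectrum of the mean-removed envelope exposes the modulation frequency. This is a generic illustration and not necessarily the processing chain used in the study; the sampling rate, the 1000 Hz carrier, and the 100 Hz modulation standing in for a fault frequency are hypothetical.

```python
import numpy as np

def envelope_spectrum(x, fs):
    """Return frequencies and amplitudes of the envelope spectrum of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    # Build the analytic signal: keep DC, double positive frequencies,
    # zero the negative ones.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[1:n // 2] = 2.0
        gain[n // 2] = 1.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * gain)
    envelope = np.abs(analytic)
    envelope = envelope - envelope.mean()  # drop the DC term
    amplitude = np.abs(np.fft.rfft(envelope)) * 2.0 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amplitude

# Hypothetical amplitude-modulated tone: 1000 Hz carrier, 100 Hz modulation.
fs = 4096
t = np.arange(fs) / fs
x = (1.0 + 0.5 * np.cos(2 * np.pi * 100 * t)) * np.cos(2 * np.pi * 1000 * t)
freqs, amplitude = envelope_spectrum(x, fs)
peak_frequency = freqs[np.argmax(amplitude)]
```

In a bearing application the carrier role is played by a structural resonance excited by the fault impacts, so removing strong gear components beforehand helps the fault-related peak stand out.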

Figure 10. Example of envelope FFT at bearing BPFI fault frequency at early stages of the inner race defect for different model depths. The model with depth=1 approximates the adaptive filtering approach.