RUL Estimation Enhancement using Hybrid Deep Learning Methods

The turbofan engine is one of the most critical aircraft components. Its failure may introduce unwanted downtime, expensive repair, and degraded safety performance. It is therefore essential to detect upcoming failures accurately by predicting the future health state of turbofan engines as well as their Remaining Useful Life (RUL). The use of deep learning techniques to estimate RUL has seen growing interest over the last decade. However, hybrid deep learning methods have not yet been sufficiently explored. In this paper, we propose two hybrid methods combining Convolutional Auto-Encoder (CAE), Bi-directional Gated Recurrent Unit (BDGRU), Bi-directional Long Short-Term Memory (BDLSTM), and Convolutional Neural Network (CNN) to enhance RUL estimation. The results indicate that the hybrid methods achieve highly reliable RUL prediction accuracy and significantly outperform the most robust predictions in the literature.


INTRODUCTION
As an essential part of the aircraft, the turbofan engine is a complex and sophisticated system; its safety and reliability are indispensable. Any unexpected breakdown in the engine before its overhaul can lead to a severe accident that may cost millions in lost human lives, pollution, costly repairs, etc. (Saxena, Goebel, Simon, & Eklund, 2008). Maintaining aircraft engine reliability therefore presents a challenge: reducing engine downtime and maintenance costs without jeopardizing safety, while ensuring engine availability. According to statistics, aircraft engine maintenance accounts for approximately 70% of whole life cycle costs (Guo, 2015). Consequently, it is essential to integrate an optimal maintenance strategy that detects upcoming degradation by predicting the engine's future health state, preventing unplanned downtime, and reducing maintenance costs.

Ikram Remadna et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
In turbofan engine maintenance, several intelligent maintenance strategies have evolved from traditional ones (Gouriveau, Medjaher, Ramasso, & Zerhouni, 2013). Traditional maintenance is either reactive (fixing or replacing an engine component after its breakdown is detected) or proactive (scheduling maintenance tasks on the assumption of a certain level of performance degradation, whether maintenance is essential or not). Both approaches are inefficient and unable to eliminate faults or to anticipate them (Ding & Kamaruddin, 2015). An intelligent maintenance strategy, referred to as predictive maintenance, coordinates the scheduling of maintenance tasks based on fault diagnosis, fault prognosis, and RUL estimation.
Prognostics and Health Management (PHM) has emerged as a predictive maintenance process that offers several advantages. Its main functions are fault detection, fault isolation, and failure prognostics, which allows predicting the RUL and, finally, making appropriate maintenance decisions (Atamuradov, Medjaher, Dersin, Lamoureux, & Zerhouni, 2017). Predicting the RUL with high accuracy plays a critical role in the PHM process, since inaccurate estimation may cause unexpected catastrophic failures. Recently, various RUL prediction methods have been proposed, which can be categorized into three main approaches (Gouriveau, Medjaher, & Zerhouni, 2017): (1) the physics model-based approach, (2) the data-driven approach, and (3) the hybrid approach. The physics model-based approach uses a mathematical model, typically a set of differential or algebraic equations, which is very useful for predicting RUL when the available failure data are insufficient; however, it requires extensive physical background and knowledge. The data-driven approach, in contrast, models the degradation and estimates the RUL of a machine when enough failure data are available. The ease of collecting monitoring data from many industrial systems has motivated many researchers to use data-driven models for RUL estimation. Finally, the hybrid approach integrates the physics model-based and data-driven approaches to estimate the RUL.
Extracting performance degradation features from multi-sensor data is a critical technical problem for complex systems, as the data grow high-dimensional, which significantly impacts prediction performance. Unfortunately, the popular practice of designing features manually for complicated domains requires much human labour, much information can be lost, and performance cannot be guaranteed (Mierswa & Morik, 2005). Additionally, traditional feature extraction methods such as Principal Component Analysis (PCA) (Demšar, Harris, Brunsdon, Fotheringham, & McLoone, 2013) and Linear Discriminant Analysis (LDA) (Sharma & Paliwal, 2015) project the original high-dimensional space onto a low-dimensional space; PCA does so in an unsupervised manner by maximizing the variance retained in the data. However, both suffer from being based on linear projection. Therefore, non-linear feature extraction methods have been exploited to learn useful low-dimensional features, such as ISOMAP (Tenenbaum, De Silva, & Langford, 2000) and Locally Linear Embedding (LLE) (Roweis & Saul, 2000). The major problem with these techniques is that they have a predetermined way of extracting local relationships among data samples, which may be inaccurate in a low-dimensional space.
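To make the linear-projection limitation concrete, here is a minimal numpy sketch of what PCA computes: an orthogonal linear projection of mean-centered data onto its top-k principal directions via SVD. The sensor count (21) and sample size are illustrative, not taken from the dataset.

```python
import numpy as np

def pca_project(X, k):
    """Project X (n_samples x n_features) onto its top-k principal
    components via SVD of the mean-centered data -- a purely linear
    projection, which is exactly the limitation discussed above."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # scores, shape (n_samples, k)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 21))   # e.g. 21 sensor channels (illustrative)
Z = pca_project(X, 2)
print(Z.shape)                   # (100, 2)
```

Because the projection is linear, any degradation pattern lying on a curved manifold is flattened, which motivates the non-linear methods (ISOMAP, LLE) and, ultimately, the deep-learning feature extractors used in this paper.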
Since 2006, a branch of machine learning called Deep Learning (DL) has been developed to treat highly non-linear and varying data in their raw form without any human labour (LeCun, Bengio, & Hinton, 2015). DL is characterized by a deep hierarchical structure in which several processing layers are stacked to automatically learn high-level representations from large-scale data that are ultimately useful for improving prediction or classification. Various deep learning algorithms, including Long Short-Term Memory (LSTM), Deep Neural Networks (DNN), Auto-Encoders (AE), Deep Belief Networks (DBN), and Convolutional Neural Networks (CNN), have been proposed and have outperformed conventional machine learning algorithms. Today, researchers have shown the success of these DL architectures in many fields, including computer vision (Voulodimos, Doulamis, Doulamis, & Protopapadakis, 2018), natural language processing (Collobert & Weston, 2008), machine health monitoring (Zhao et al., 2019), and diagnostics in healthcare (Belaala, Bourezane, Terrissa, Al Masry, & Zerhouni, 2020).
Most current RUL estimation architectures incorporate only a single method such as CNN, DNN, or LSTM. Accordingly, the idea of a hybrid method and the application of a parallel multi-model emerged to leverage the power of different models that capture various information at different time intervals, ultimately achieving more accurate predictions, which had not previously been addressed. Recently, CNN has achieved promising results in RUL prediction, exploited to capture spatial features without considering the time-series correlation of the data. Conversely, BDLSTM or BDGRU is capable of capturing bi-directional temporal dependency features from sensor data. This paper proposes two hybrid methods based on these promising architectures. The first proposed hybrid method uses CAE as a feature extractor combined with two temporal modelling tools running simultaneously in a parallel way (referred to as BDGRU-BDLSTM). The second hybrid architecture blends CNN and BDGRU simultaneously to capture local and temporal features directly from raw sensory data, instead of just using CNN for feature extraction (referred to as CNN-BDGRU). The public NASA C-MAPSS (Commercial Modular Aero-Propulsion System Simulation) dataset is applied to verify the superiority and effectiveness of the proposed methods. The principal contributions of this research are summed up as follows:
• End-to-end deep network structures for RUL prediction of a turbofan engine are proposed and discussed in this paper to provide a comprehensive comparison of different types of deep learning-driven approaches.
• A multi-model helps to leverage the power of different models: two DL models are integrated in a parallel manner to capture various information at different time intervals.
• Extracting succinct and useful degradation features using the unsupervised Convolutional Auto-Encoders method from multi-sensor data with complex correlations.
• A detailed case study demonstrating the superiority of proposed hybrid models is presented.
The remainder of the paper is structured as follows: Section 2 reviews recent applications of deep learning models on the C-MAPSS dataset. Section 3 describes the proposed hybrid methods for RUL estimation of a turbofan engine. The experimental results and discussions that demonstrate the effectiveness and superiority of the proposed hybrid models are presented in Section 4. Finally, we close the study with conclusions and future work in Section 5.

LITERATURE REVIEW
The C-MAPSS dataset has been presented and applied to evaluate various effective DL methods for aircraft engine RUL estimation in recent years. In this section, we survey works that have applied deep learning methods to the C-MAPSS dataset to tackle the task of RUL estimation. The selected works, presented in the following excerpts, applied either CNN, LSTM, or DNN using auto-encoders.

CNN
Within the deep learning architecture, the first implementation of a CNN for RUL estimation of aircraft engines was proposed in (Babu, Zhao, & Li, 2016), where the input data are segmented into sliding windows and afterwards normalized. The CNN structure's ability to learn a higher-level abstract representation of the multi-channel time series through its convolutional and average-pooling layers is shown. A linear regression layer is attached at the top to perform RUL predictions. The results showed the superiority and effectiveness of the CNN model over other machine learning models such as the Multilayer Perceptron (MLP), the Support Vector Machine (SVM), and the Relevance Vector Machine (RVM).
In a similar study, Li et al. (Li, Ding, & Sun, 2018) proposed a novel deep CNN-based approach for RUL forecasts of aircraft turbofan engines. The authors employ a time window strategy for data processing to improve feature extraction via the deep CNN. The normalized sensor data are directly utilized as the model inputs. Besides, they use the dropout technique to prevent overfitting. This model achieves more accurate RUL estimation and a lower Root Mean Square Error (RMSE) than 13 other data-driven methods. The authors also highlighted that optimum performance is achieved with 5 convolution layers and a time window length of 30.
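The time window strategy described above can be sketched as follows: a run-to-failure sequence is cut into overlapping fixed-length windows (stride 1) so each window becomes one training sample. The channel count (24 = 21 sensors + 3 settings) and sequence length are illustrative assumptions.

```python
import numpy as np

def sliding_windows(signal, win_len=30):
    """Segment a (T x m) multivariate run-to-failure sequence into
    overlapping windows of length win_len with stride 1.
    Returns an array of shape (T - win_len + 1, win_len, m)."""
    T, m = signal.shape
    return np.stack([signal[t:t + win_len] for t in range(T - win_len + 1)])

X = np.random.default_rng(1).normal(size=(200, 24))  # illustrative engine history
W = sliding_windows(X, 30)                           # window length 30, as in Li et al.
print(W.shape)  # (171, 30, 24)
```

Each window is typically labelled with the RUL at its last cycle, so one engine history yields many supervised samples.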

RNN and its variants
The Recurrent Neural Network (RNN) retains internal memory to process sequential data. However, RNNs suffer from the vanishing gradient problem on long input sequences, so they cannot retain earlier information beyond the most recent steps. (Hsu & Jiang, 2018) proposed an LSTM to address the RUL prediction problem for turbine engines, which is able to effectively extract temporal dependencies from historical data. Liao et al. (Liao, Zhang, & Liu, 2018) used an LSTM relying on the bootstrap procedure for uncertainty estimation of the RUL; the bootstrap method is a good solution for obtaining prediction uncertainty without assuming any sensor data distribution. The proposed approach achieved higher accuracy than the CNN and LSTM models discussed in (Babu et al., 2016). Another LSTM-based study exploited long-term dependencies among sensor time-series signals to predict RUL; grid search was applied to tune the hyperparameters and thereby obtain the best network structure, and the method showed enhanced performance compared to other methods in the literature. Another variant of LSTM, used in (Wang, Wen, Yang, & Liu, 2018), is the Bi-directional LSTM, which can learn bi-directional temporal dependencies from sensor data for aircraft engine RUL estimation; it captures long-range information in both the future (forward) and past (backward) contexts of the input sequence simultaneously. In another study, a bi-directional LSTM model was presented in (Zhang, Wang, Yan, & Gao, 2018) for identifying system degradation and subsequently predicting the RUL; the proposed model consists of two BDLSTM layers and achieved promising results compared to the LSTM, bi-directional RNN (BDRNN), MLP, and CNN results reported by (Babu et al., 2016). Another main variant of RNN recently utilized for RUL estimation, the Gated Recurrent Unit (GRU), an enhanced LSTM-like model with fewer parameters, is presented by (Chen, Jing, Chang, & Liu, 2019).
The authors proposed a new approach for RUL estimation of a nonlinear degradation process, using Kernel PCA (KPCA) as the first phase for dimensionality reduction and nonlinear feature extraction. The second phase uses GRU to prevent the problem of long-term dependency and allows each recurrent unit to adaptively extract dependencies of different time scales.

DNN using auto-encoders
In addition to the CNN and RNN architectures, the AE is another main structure; it is essentially a feature extractor that reduces the dimensionality of condition monitoring data in an unsupervised manner. Many studies have shown the advantage of using an AE alongside another machine learning method for estimating the RUL of a turbofan engine (Song, Shi, Chen, Huang, & Xia, 2018) (Ma, Su, Zhao, & Liu, 2018). Song et al. (Song et al., 2018) proposed a hybrid model integrating the advantages of the AE and the bi-directional LSTM to enhance RUL prediction accuracy. The main idea is that the encoding part of the AE (the bottleneck) acts as input for the BDLSTM to produce the expected output. The results demonstrate that the combination of AE and BDLSTM outperformed other methods such as MLP, CNN, LSTM, BDLSTM, and Autoencoder-LSTM. Ma et al. (Ma et al., 2018) also proposed a novel end-to-end deep architecture based on a stacked sparse Auto-encoder (SAE) and logistic regression; this study utilized a grid search procedure to optimize the hyper-parameters of the SAE model.
Inspired by these previous studies, the idea of a hybrid approach applying a parallel multi-model emerged to leverage the power of different methods, which have high potential to boost prognostic accuracy, instead of relying on a single model such as CNN, DNN, or LSTM. Accordingly, capitalizing on the recent success of DL, this paper presents a framework driven by an end-to-end ML system that introduces two new hybrid RUL prediction approaches to capture various information over different time intervals. Previous studies have shown the advantage of using an AE alongside another machine learning method for estimating the RUL of a turbofan engine, but the AEs reported in the literature use fully connected layers, whereas the CAE model has a promising ability for feature extraction and dimensionality reduction through convolutional layers. Building on the Convolutional Auto-Encoder's power, the first proposed hybrid model adopts the CAE in the aero-engine prognostic problem to automatically extract useful features with a high level of abstraction. These CAE features serve as inputs to train two temporal modeling tools simultaneously in a parallel manner (referred to as the BDLSTM path and the BDGRU path), which can capture more robust features and eventually predict the RUL. Although the CAE has been applied to different tasks, this is its first use in the RUL estimation problem for turbofan engines.
For a comprehensive comparison, the second hybrid architecture is proposed differently from what has been reported in the literature: it runs CNN and BDGRU models simultaneously in parallel paths to capture local and temporal features directly from raw sensory data, instead of just using a CNN. The outputs from both paths (CNN and BDGRU) are concatenated to obtain the target RUL. The GRU has emerged as an enhanced LSTM-like model with fewer parameters, improving training speed and model performance.
Besides, we use a BDGRU for bi-directional temporal feature extraction while preventing the long-term dependency problem. The superiority of the two proposed hybrid models is demonstrated using the public NASA C-MAPSS dataset, by comparison with all their counterparts and with the most robust results in the literature.

PROPOSED HYBRID DEEP LEARNING MODELS
This section introduces the hybrid deep learning approaches proposed in this research for the RUL prediction of an aircraft engine. Figure 1 describes the proposed framework for RUL estimation, which comprises two main stages. In the training stage, which is completely offline, the historical data obtained from the sensors flow through the components of the training stage, and ultimately the degradation model for RUL estimation is constructed based on deep learning methods. In the prediction stage, conducted online, the current data are stored in the dataset and processed to obtain normalized sequence data, to which the trained model is applied to predict the RUL. Based on the RUL values, a maintenance action is applied to the system at the exact scheduled moment.

Problem Formulation
Considering that there are N machines of the same type, such as the turbofan engine, each engine i is run to its end of life over T_i cycles, recorded by multiple sensors. Mathematically, the whole dataset can be defined as D = {(X_i, Y_i)}, i = 1, ..., N. Thus, X_i denotes the gathered sensor measurement matrix of engine i, and Y_i corresponds to its equipment operation cycles, as shown in Eq. 2 and Eq. 3, respectively:

X_i = [x_1, x_2, ..., x_{T_i}],    (2)
Y_i = [1, 2, ..., T_i],    (3)

where T_i is the total number of operation cycles of the i-th engine. The RUL is the time between the current cycle y_t, after degradation onset t_d has been detected, and the failure time T_i. It can be described as follows (Javed, Gouriveau, & Zerhouni, 2017):

RUL(y_t) = T_i − y_t,  t_d ≤ y_t ≤ T_i.

To address this non-linear mapping, deep learning methods are proposed in this paper: each model takes X_i as input and produces the estimated RUL' as output, trained to minimize the error between the predicted RUL' and the observed target RUL at time t.
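The target construction implied by the formulation above can be sketched in a few lines: for an engine with a known total life of T_i cycles, the RUL label at each cycle decreases linearly to zero at failure. (This is the plain linear target; the function name is illustrative.)

```python
def rul_labels(T_i):
    """Target RUL for each cycle t = 1..T_i of an engine with total life
    T_i cycles: RUL(t) = T_i - t, decreasing linearly to zero at failure."""
    return [T_i - t for t in range(1, T_i + 1)]

print(rul_labels(5))  # [4, 3, 2, 1, 0]
```

These per-cycle labels are what the networks are trained to regress against the corresponding sensor windows.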

Convolutional Auto-encoder with BDGRU-BDLSTM Hybrid Model
As illustrated in Figure 2, our proposed hybrid model comprises two main stages. The first is the CAE, which is used to automatically extract performance degradation features while lowering the dimension of the multiple sensors in an unsupervised manner. The second stage is the temporal modeling tool, which combines BDGRU and BDLSTM models simultaneously, in a parallel way, to provide the RUL estimation. The full details of the two main stages are described as follows.
A. Stage 1: CAE module
The CAE architecture consists of two parts, an encoder and a decoder, which are symmetrical and reversed structures. The encoder network comprises six convolution layers with the same filter size (10×1) and one max-pooling layer. Precisely, in this work, two pairs of convolution layers are stacked, where the numbers of filters are set to 8 and 18, followed by one max-pooling layer with filter size (1×2). The third and fourth convolution layers consist of 32 and 64 filters, respectively. After every pair of convolution layers, a dropout layer is added to reduce overfitting and avoid repeatedly extracting identical features. To obtain a unique feature map, the number of filters in the final convolution layer is set to one. All convolution layers use the Rectified Linear Unit (ReLU) as the activation function. Furthermore, zero-padding is used to keep the feature map size unaltered. Un-pooling and de-convolution operations are used in the decoder part of the CAE to reconstruct the input, instead of the convolution and max-pooling operations used in the encoder part.
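The shape behaviour of the encoder's stacked "same"-padded convolutions and max-pooling can be illustrated with a plain numpy forward pass. Only the first block of the described architecture is sketched (filter length 10, 8 then 18 filters, pooling by 2); the weights are random placeholders and the input dimensions are illustrative.

```python
import numpy as np

def conv1d_same(x, n_filters, k=10, rng=None):
    """'Same' 1-D convolution with zero-padding followed by ReLU, applied
    along the time axis of x (T x c_in). Weights are random placeholders."""
    rng = rng if rng is not None else np.random.default_rng(0)
    T, c_in = x.shape
    W = rng.normal(scale=0.1, size=(k, c_in, n_filters))
    left, right = (k - 1) // 2, k - 1 - (k - 1) // 2
    xp = np.vstack([np.zeros((left, c_in)), x, np.zeros((right, c_in))])
    out = np.stack([(xp[t:t + k, :, None] * W).sum(axis=(0, 1)) for t in range(T)])
    return np.maximum(out, 0.0)  # ReLU, shape (T, n_filters)

def maxpool1d(x, p=2):
    """Max-pooling over the time axis with pool size p."""
    T = (x.shape[0] // p) * p
    return x[:T].reshape(-1, p, x.shape[1]).max(axis=1)

x = np.random.default_rng(2).normal(size=(64, 24))  # illustrative input
h = conv1d_same(conv1d_same(x, 8), 18)              # stacked conv layers: 8 then 18 filters
h = maxpool1d(h)                                    # pooling halves the time length
print(h.shape)  # (32, 18)
```

Zero-padding keeps the time length unaltered through the convolutions, so only the pooling layer reduces the temporal dimension, exactly as described for the encoder.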

B. Stage 2: BDGRU-BDLSTM hybrid module
The useful features learned by the CAE are used as input to the multi-model structure, which can maintain good generalization performance. The proposed multi-model is a temporal modeling tool based on the combination of BDGRU and BDLSTM models running simultaneously. This combination aims to obtain more robust features and eventually predict the RUL. Both paths (BDGRU and BDLSTM) share the same configuration, where two layers are stacked with 50 nodes each. The layers use the hyperbolic tangent (tanh) as the activation function. The dropout technique is applied with a rate of 0.25 per layer.
Finally, to estimate the RUL, the outputs from both paths are concatenated and fed into a fully connected layer; this layer has one neuron and uses the exponential activation function.
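The fusion step just described is small enough to sketch directly: concatenate the two 50-node path outputs and apply a single-neuron dense layer with an exponential activation, which guarantees a positive RUL estimate. The weights here are toy values, not trained parameters.

```python
import numpy as np

def fusion_rul(h_gru, h_lstm, w, b):
    """Concatenate the two path outputs and apply a one-neuron dense layer
    with exponential activation, yielding a strictly positive RUL estimate."""
    h = np.concatenate([h_gru, h_lstm])
    return np.exp(h @ w + b)

h_gru, h_lstm = np.ones(50), np.ones(50)   # toy 50-node path outputs
w, b = np.zeros(100), 0.0                  # toy weights: exp(0) = 1
print(fusion_rul(h_gru, h_lstm, w, b))     # 1.0
```

The exponential output unit is a natural choice here because the RUL target is non-negative by definition.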

CNN-BDGRU Hybrid Model
As shown in Figure 3, the proposed hybrid model is based on the combination of CNN and BDGRU in a parallel manner for regression. The CNN path acts as a spatial feature extractor, while, simultaneously, the BDGRU path is utilized for bi-directional temporal feature extraction. Although there is no correlation between the two pathways, their outputs are concatenated to obtain the overall RUL prediction. Specifically, the BDGRU structure is designed to handle each sequence in two directions: the forward GRU cells for prediction, and the backward direction for smoothing the prediction and relieving the noise impact. Below, we detail the structure of the three major components of our hybrid model.
A. The CNN path
In our proposed model, the CNN is exploited to capture spatial features by stacking convolutional kernels. It is composed of five convolution layers, all with the same filter size (10×1). The first and second convolution layers consist of 18 filters each, while the third and fourth layers contain 32 filters each. To obtain a unique feature map, the final convolution layer uses a single filter to fuse all the previous feature maps. ReLU is applied along with zero-padding in all convolution layers. In this way, a high-level representation is obtained for each raw collected feature.

B. The BDGRU path
The BDGRU is selected to learn the long-range dependencies among features. In this path, the forward and backward directions are computed in two separate GRUs independently; their outcomes are fused and passed to the next layer. Two BDGRU layers are stacked with the same configuration as in the first proposed method. Besides, the BDGRU and the GRU share the same cell architecture, which allows addressing the vanishing gradient problem. Furthermore, the hidden state of the BDGRU cell is the concatenation of the two directional states:

h_t = [→h_t ; ←h_t],

where → and ← symbolize the forward and backward processes, respectively.
C. The fusion path
The final prediction at each time step is achieved by concatenating the outputs from both paths (the CNN path and the BDGRU path). This fusion layer has one neuron and uses the exponential activation function.
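As a minimal numpy sketch of the bidirectional hidden-state concatenation above: a standard GRU cell (update gate z, reset gate r) is run forward over the sequence and again over the reversed sequence, and the two hidden states are concatenated per time step. Dimensions and weights are illustrative, and bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_pass(xs, Wz, Wr, Wh, Uz, Ur, Uh):
    """One direction of a GRU over sequence xs (T x d_in); returns the
    hidden state at every step (biases omitted for brevity)."""
    h = np.zeros(Uz.shape[0])
    hs = []
    for x in xs:
        z = sigmoid(Wz @ x + Uz @ h)              # update gate
        r = sigmoid(Wr @ x + Ur @ h)              # reset gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
        h = (1 - z) * h + z * h_tilde
        hs.append(h)
    return np.stack(hs)

def bdgru(xs, params):
    """Bidirectional hidden state: forward pass over xs concatenated with a
    backward pass over the reversed sequence, per time step."""
    fwd = gru_pass(xs, *params)
    bwd = gru_pass(xs[::-1], *params)[::-1]
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(3)
d_in, d_h, T = 4, 6, 10  # toy dimensions
params = [rng.normal(scale=0.1, size=(d_h, d_in)) for _ in range(3)] + \
         [rng.normal(scale=0.1, size=(d_h, d_h)) for _ in range(3)]
H = bdgru(rng.normal(size=(T, d_in)), params)
print(H.shape)  # (10, 12): hidden size doubles through concatenation
```

Note that the output feature dimension is 2 × d_h, which is why a bidirectional layer with 50 nodes per direction feeds 100 features to the next layer.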

EXPERIMENT STUDY
In this section, the performance of the deep learning-driven prognosis approaches is evaluated on a prognostic benchmarking problem. First, the C-MAPSS simulated turbofan engine dataset is described in Section 4.1, followed by the main sections on data pre-processing, the evaluation metrics, and the prediction procedure. Finally, the detailed results and comparisons with several architectures, providing a comprehensive examination of the proposed models, are discussed in Section 4.5.
Besides, all experiments are carried out on a personal computer with an Intel Core i5-5200U (2.20 GHz) Central Processing Unit (CPU), 6 GB of Random Access Memory (RAM), and the 64-bit Microsoft Windows operating system. The programming language is Python 3.5 with the deep learning library Keras.

NASA C-MAPSS dataset description
NASA's C-MAPSS datasets are widely used by scientists in the field of RUL prediction. The main gas turbine engine modules include the fan, low-pressure compressor (LPC), high-pressure compressor (HPC), high-pressure turbine (HPT), low-pressure turbine (LPT), and nozzle, as shown in Figure 1. The C-MAPSS datasets represent the deterioration of gas turbine engines. C-MAPSS comprises four fleets organized into four sub-datasets with varying numbers of operating and fault conditions. Each sub-dataset is separated into training and testing sets, as seen in Table 1.

Data pre-processing
The subsets FD001 and FD003 exhibit some constant sensor measurements and constant operational settings throughout the lifetime of the engine, which are not useful for RUL estimation. In FD002 and FD004, however, all three operational settings and all sensor measurements can provide useful information about the deterioration of a turbofan engine. Consequently, unlike the works (Li et al., 2018), (Ma et al., 2018), and (Wang et al., 2018), which excluded the three operational settings and selected 14 sensors out of the 21, in the proposed methods all three operational settings and all sensor measurements are picked as input features for all sub-datasets. The goal is to avoid designing features manually by proposing flexible models of an end-to-end ML system using deep learning.
It is essential to prepare the data before training the models. Therefore, the data normalization, masking, and padding phases are used in this study.

Data Normalization
Owing to the differing ranges of the feature scales, several normalization methods have been proposed to bring all features to the same scale (Patro & Sahu, 2015). The Min-Max normalization, given in Eq. 8, is used to map the raw features into the range [0, 1]:

x'_i = (x_i − Min(x_i)) / (Max(x_i) − Min(x_i)),    (8)

where x_i is the time sequence of the i-th sensor's measurements, Min and Max are the minimum and maximum values of x_i over its range, and x'_i is the normalized input data. Figure 4 shows an illustration of the FD002 testing data before and after normalization.
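As a direct sketch of the Min-Max normalization described above, applied per sensor channel:

```python
import numpy as np

def min_max(x):
    """Map each sensor channel of x (T x m) into [0, 1]:
    x' = (x - min) / (max - min), computed per column."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)

x = np.array([[2.0, 10.0],
              [4.0, 30.0],
              [6.0, 20.0]])
print(min_max(x))  # column 1 -> [0, 0.5, 1], column 2 -> [0, 1, 0.5]
```

In practice the minima and maxima are computed on the training set and reused for the test set, so both are mapped with the same transformation.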

Masking and padding
The engines have cycles of varying lengths; hence, sequences shorter than the maximum cycle length in the whole dataset are padded with zeros to obtain the same length. Consequently, zero-masking is used in the training phase to record whether a time step is real or just padding.
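The padding-plus-masking step can be sketched as follows: variable-length engine histories are zero-padded to a common length, and a boolean mask records which time steps are real so the model can ignore the padding. Sequence shapes are illustrative.

```python
import numpy as np

def pad_and_mask(sequences, max_len):
    """Zero-pad variable-length (T_i x m) sequences to max_len and return
    a boolean mask marking real time steps (True) vs padding (False)."""
    m = sequences[0].shape[1]
    X = np.zeros((len(sequences), max_len, m))
    mask = np.zeros((len(sequences), max_len), dtype=bool)
    for i, s in enumerate(sequences):
        X[i, :len(s)] = s
        mask[i, :len(s)] = True
    return X, mask

seqs = [np.ones((3, 2)), np.ones((5, 2))]   # two engines, lengths 3 and 5
X, mask = pad_and_mask(seqs, 5)
print(X.shape, mask.sum(axis=1))  # (2, 5, 2) [3 5]
```

In Keras this corresponds to zero-masking on the padded value, so padded steps do not contribute to the recurrent computation or the loss.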

Evaluation Metric
In this work, the RMSE and a scoring function are used to evaluate the models' performance. The RMSE, defined in Eq. 9, measures the effectiveness of the RUL prediction methods; it penalizes early (underestimated) and late (overestimated) predictions equally:

RMSE = sqrt( (1/N) Σ_{i=1}^{N} d_i² ),    (9)

where N represents the total number of data samples and d_i = RUL'_i − RUL_i is the error between the predicted value RUL'_i and the true RUL for the i-th test sample. The scoring function adopted by the International Conference on PHM data challenge is shown in Eq. 10:

s = Σ_{i=1}^{N} (e^{−d_i/13} − 1) for d_i < 0;    s = Σ_{i=1}^{N} (e^{d_i/10} − 1) for d_i ≥ 0.    (10)

This scoring function takes into account the impact of maintenance costs: a higher penalty is imposed when the RUL is overestimated, since under such an estimate the maintenance would be scheduled after the appropriate time.
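Both metrics can be implemented directly; the coefficients 13 and 10 below are the standard PHM08 challenge scoring constants for early and late predictions, respectively.

```python
import numpy as np

def rmse(pred, true):
    """Root Mean Square Error: penalizes early and late errors equally."""
    d = np.asarray(pred) - np.asarray(true)
    return np.sqrt(np.mean(d ** 2))

def phm_score(pred, true):
    """PHM08 scoring function: asymmetric exponential penalty, harsher on
    late predictions (d > 0, RUL overestimated) than early ones (d < 0)."""
    d = np.asarray(pred) - np.asarray(true)
    return float(np.sum(np.where(d < 0, np.exp(-d / 13) - 1, np.exp(d / 10) - 1)))

pred, true = [48.0, 52.0], [50.0, 50.0]
print(rmse(pred, true))  # 2.0
```

Note the asymmetry: an error of +2 cycles (late) contributes more to the score than an error of −2 cycles (early), while both contribute identically to the RMSE.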

Prediction procedure
For both proposed methods, the C-MAPSS sub-datasets are first pre-processed: the data are normalized and padded. Next, the training sub-datasets are split into training and validation sets; 80% of the engines in each sub-dataset are randomly selected for training, while the remaining 20% are designated as the validation set.

CNN-BDGRU Training procedure
The flowchart of the proposed CNN-BDGRU is described in Figure 5. The hybrid model receives as inputs the normalized training set, with the RUL values adopted as the target outcomes. In the training process, a gradient-based optimization algorithm adjusts the weights of the network by minimizing the objective function. Specifically, the Root Mean Square propagation (RMSprop) optimizer is used to train the model, with the learning rate set to 0.001 to achieve stable convergence (Ruder, 2016). Besides, the Mean Square Error (MSE) serves as the loss function, which is expressed as

MSE = (1/N) Σ_{i=1}^{N} d_i².    (11)

The maximum number of training epochs is 3000. For each training epoch, the samples are segmented into mini-batches.
To avoid overfitting, the early stopping regularization technique is introduced. Its principal idea is that when the validation performance stops improving, the training process is discontinued.
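The early stopping rule just described can be sketched as a simple patience loop over the validation losses; the patience value here is an illustrative assumption, not the paper's setting.

```python
def early_stopping(val_losses, patience=10):
    """Return the epoch at which training would stop: once `patience`
    epochs pass with no improvement over the best validation loss so far.
    The patience value is an illustrative assumption."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch      # improvement: reset the counter
        elif epoch - best_epoch >= patience:
            return epoch                        # no improvement for `patience` epochs
    return len(val_losses) - 1                  # ran to the epoch limit

losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74]
print(early_stopping(losses, patience=3))  # 5
```

In Keras this behaviour corresponds to an early-stopping callback monitoring the validation loss; the weights from the best epoch are typically restored afterwards.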
Finally, the testing data samples are fed into the trained hybrid model to estimate the RUL and obtain the RMSE accuracy on the test set.

CAE with BDGRU-BDLSTM Training procedure
The training of the proposed hybrid CAE with BDGRU-BDLSTM model consists of two modules: the CAE model in the first phase and the BDGRU-BDLSTM method in the second phase. Both modules use the RMSprop optimizer and the MSE as the cost function. The whole training process of the proposed deep hybrid CAE with BDGRU-BDLSTM model is summarized in Figure 6.
The whole CAE network is trained in an unsupervised manner: it takes the normalized training set as input and reconstructs it, with the encoder part representing the more robust deterioration features. The CAE network's weights are updated iteratively during training by a gradient-based optimization algorithm minimizing the reconstruction error (MSE), expressed in Eq. 11, where d_i = X'_i − X_i is the error between the reconstruction X'_i and the original input X_i for the i-th sample. Besides, the samples are grouped into mini-batches for each training epoch, with a limit of 1000 training epochs.
After the CAE network's training process, the second step is to train the multi-model (BDGRU-BDLSTM) for RUL estimation, with the encoder parameters frozen during this step. The extracted CAE features are fed to the multi-model (BDGRU-BDLSTM) as inputs, and the RUL values of the training set are used as the target outputs. Backpropagation Through Time (BPTT) is the training algorithm applied to update the weights, minimizing the error using RMSprop with the learning rate set to 0.001. Furthermore, the MSE is utilized as the loss function, expressed in Eq. 11, where d_i = RUL'_i − RUL_i. Finally, in the operating phase, the convolutional encoder is used jointly with the BDGRU-BDLSTM model for RUL estimation, and the RMSE accuracy is obtained on the test set.

Prediction performance
In this section, the prediction results obtained from applying each of the proposed hybrid models to the turbofan engine datasets are presented. The purpose of this paper is to make a thorough comparison of the different DL approaches for RUL prediction. The actual and predicted RUL values during the whole lifetime of two randomly selected engines, out of several testing engine units across the four datasets (i.e., FD001-FD004), are depicted for both methods in Figures 7-10.
It is worth noting that the RUL prediction results for all engine units over the four sub-datasets are precise; in particular, the RUL estimation in the last cycles of an engine unit is more reliable and closer to the true RUL than in the early cycles. Besides, it can be observed that when an engine's RUL is large, the prediction error is noticeably higher than for engines with smaller RUL (as shown in Figure 7(b), engine 47). The reason is that as the engine degradation approaches failure, the fault features increase and can be extracted by the proposed methods, yielding better prediction results. The engine's RUL decreases linearly with time once degradation samples are available. Moreover, accurate estimation in the late period of the engine life cycle plays a crucial role in enhancing operational reliability and system availability, maintaining workplace safety, and reducing maintenance costs.
According to Figure 11, we can easily observe from the distribution of the box plots that the proposed hybrid model (CAE with BDGRU-BDLSTM) generally performs well on all four sub-datasets, in particular on FD002 and FD004, which are very complicated and for which existing models typically fail to provide accurate predictions. The CAE with BDGRU-BDLSTM model also achieves good results on FD001 and FD003, the simplest sub-datasets. Table 2 shows the results of both proposed hybrid models in terms of RMSE and score values, where IMP is the improvement of the proposed CAE with BDGRU-BDLSTM model over the CNN-BDGRU model, defined as IMP = (1 − (CAE with BDGRU-BDLSTM / CNN-BDGRU)) × 100. From the IMP values, we can observe that the CAE with BDGRU-BDLSTM hybrid model consistently obtains lower RMSE values than the CNN-BDGRU model, improving the RMSE by 14.208%, 6.83%, 3.967%, and 5.537% for FD001, FD002, FD003, and FD004, respectively. In terms of score values, the proposed CAE with BDGRU-BDLSTM hybrid model achieved a lower score than the CNN-BDGRU model on FD001, FD003, and FD004, while on FD002 its score was slightly higher (worse). The IMP in terms of score values is around 13.06%, 9.56%, and 4.19% for FD001, FD003, and FD004, respectively. The validation error follows the same trend as the training error; both decrease over successive epochs and become nearly constant in the late period, which indicates there is no overfitting problem. It is also noted that the loss of the CAE with BDGRU-BDLSTM hybrid model is more stable than that of the CNN-BDGRU hybrid model, as depicted in Figure 12. The learned representations are visualized in Figures 13 and 14, which indicate that the proposed methods were able to discover the hidden features.
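The IMP metric defined above is a one-liner; the input values below are illustrative placeholders, not the actual Table 2 numbers.

```python
def imp(hybrid_metric, baseline_metric):
    """Relative improvement (%) of the CAE with BDGRU-BDLSTM model over
    CNN-BDGRU: IMP = (1 - hybrid/baseline) * 100, per the definition
    accompanying Table 2. Positive values mean the hybrid is better
    (lower RMSE or score)."""
    return (1 - hybrid_metric / baseline_metric) * 100

# Illustrative values only (not the paper's actual Table 2 numbers):
print(round(imp(12.0, 14.0), 2))  # 14.29
```

A negative IMP would correspond to the FD002 score case, where the CAE with BDGRU-BDLSTM model scored slightly worse than CNN-BDGRU.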

Computational Cost Analysis
The time complexity of both proposed methods (CAE with BDGRU-BDLSTM, CNN-BDGRU) is discussed in this section. The complexity of the pooling and Fully Connected (FC) layers accounts for only 5-10% of the overall computational time (He & Sun, 2015); their complexities are therefore not included in the total time complexity of the proposed models. The CNN-BDGRU complexity per time step can be calculated as the sum of the complexities of the convolutional layers and the BDGRU layers; for the whole training process, the total time complexity scales with the input length (x) and the number of epochs (e). To determine the time complexity of the proposed method that integrates the CAE with BDGRU-BDLSTM, we need to compute the time complexity of the convolutional layers, the BDLSTM layers, and the BDGRU layers. Its overall time complexity is therefore estimated as in Eq. 14, as a function of the input length over the whole training process.
We can conclude that the computation time of the CNN-BDGRU model is less than that of the second proposed model (CAE with BDGRU-BDLSTM).

Compared with other approaches
Various popular prognostic methods are used for comparison purposes, including DNN, RNN, LSTM, GRU, and CNN (as shown in Figure 15). We tried different structures for these methods and picked the best ones, as follows: 1) DNN : contains two hidden layers of 50 neurons each; one output neuron is attached for RUL estimation.
2) RNN : consists of two recurrent layers with 50 hidden units each. Dropout with a rate of 0.25 is employed in each RNN layer.
3) LSTM : is implemented with a configuration similar to the RNN method to extract long-term dependencies. The average performance of DNN, RNN, LSTM, GRU, and CNN on each C-MAPSS sub-dataset is reported in Figure 15 as an RMSE box plot. Among all methods, DNN and RNN performed worse (higher RMSE) than the remaining methods on all four sub-datasets. CNN achieved slightly lower RMSE values than the other baselines on the single-operating-condition datasets, i.e., FD001 and FD003. On the other hand, GRU and LSTM achieved lower RMSE than the other baselines on the multiple-operating-condition datasets, i.e., FD002 and FD004. These results demonstrate the power of our proposed models, which achieve noticeably lower (better) average RMSE values on all subsets than the other architectures. The performance obtained on FD002 and FD004 shows slightly lower RMSE prediction accuracy, because these sub-datasets are more complicated than FD001 and FD003.
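As a sanity check on the size of the DNN baseline, its parameter count follows directly from the layer widths listed above; the 24-dimensional input (21 sensor measurements plus 3 operational settings) is an assumption based on the standard C-MAPSS channel count:

```python
def dnn_params(input_dim: int, hidden: int = 50) -> int:
    """Parameter count (weights + biases) of the DNN baseline:
    input -> 50 -> 50 -> 1 output neuron for RUL estimation."""
    first = input_dim * hidden + hidden   # input layer -> hidden layer 1
    second = hidden * hidden + hidden     # hidden layer 1 -> hidden layer 2
    out = hidden * 1 + 1                  # hidden layer 2 -> RUL output
    return first + second + out
```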
To analyze the results in more detail and to demonstrate the power of the proposed CAE as an advanced feature-extraction method, three different feature sets are compared. The first consists of the raw data with normalization only, the second is constructed with the PCA method, and the last is created with the proposed CAE method.
For PCA, the principal components explaining 99% of the data variance were chosen as most appropriate in this study; the original features are reduced to 15 principal components.
Considering that X_p ⊂ X_m, where m is the number of original features and p is the number of principal components, with p < m. The curve of the cumulative sum of variance against the number of principal components for FD003 using the PCA method is shown in Figure 16. Figure 17 shows the distribution box plots of the testing RMSE of the multi-model BDGRU-BDLSTM with the different feature sets. The proposed BDGRU-BDLSTM multi-model combines the BDGRU and BDLSTM models simultaneously, in a parallel manner, to predict the RUL. "None" indicates that the normalized raw data were used as input.
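The selection of the number of principal components reaching 99% cumulative explained variance can be sketched with a plain SVD; a minimal NumPy sketch, not the exact pipeline used in the experiments:

```python
import numpy as np

def n_components_for_variance(X: np.ndarray, threshold: float = 0.99) -> int:
    """Smallest number of principal components whose cumulative explained
    variance ratio reaches `threshold` (0.99 in this study)."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centered data give the component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    ratios = s**2 / np.sum(s**2)
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)
```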
Overall, the BDGRU-BDLSTM model trained on the normalized raw data showed the worst performance over the C-MAPSS datasets. Interestingly, the performance of BDGRU-BDLSTM improved with the feature-extraction methods. Among them, the CAE learns more robust features than the remaining methods, yielding the best (minimum) RMSE values with the BDGRU-BDLSTM layers on all sub-datasets.

Effect analysis
To demonstrate the effectiveness of multiple-model DL techniques, we present the effect of combining the two DL methods CNN and BDGRU sequentially versus in parallel, as shown in Figure 18. The comparison is quantified using the RMSE of the RUL prediction, and we can notice that the parallel CNN-BDGRU combination achieved promising results compared to the sequential one. To verify the validity of the CAE with BDGRU-BDLSTM structure, further experiments are conducted for comparison purposes, merging the CAE once with BDGRU alone and once with the multi-model BDGRU-BDLSTM. According to Figure 19, we can observe that the combination of the CAE with the multi-model BDGRU-BDLSTM achieved good results on all C-MAPSS sub-datasets.
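The structural difference between the two fusion schemes can be stated abstractly: in the sequential hybrid the CNN output feeds the BDGRU, while in the parallel hybrid both branches see the same input window and their outputs are merged before the regression head. A shape-agnostic sketch; the stand-in callables below are purely illustrative, not the actual network layers:

```python
def sequential_fusion(x, cnn, bdgru):
    """Sequential hybrid: the BDGRU consumes the CNN's feature maps."""
    return bdgru(cnn(x))

def parallel_fusion(x, cnn, bdgru, merge):
    """Parallel hybrid: CNN and BDGRU each process the raw input window;
    their outputs are merged (e.g. concatenated) before regression."""
    return merge(cnn(x), bdgru(x))

# Illustrative stand-ins for the two branches and the merge step.
cnn = lambda xs: [2 * v for v in xs]      # toy "feature extractor"
bdgru = lambda xs: [v + 1 for v in xs]    # toy "temporal model"
merge = lambda a, b: a + b                # list concatenation
```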

Comparison with the latest works
Much scholarly research has been reported on the C-MAPSS sub-datasets, which have been used in more than 60 publications. Recent studies on the C-MAPSS dataset are taken into account for comparison to show the power of the proposed models. Table 3 summarizes the results of recent advanced DL methods on the RUL estimation problem over all C-MAPSS sub-datasets. It shows that the proposed hybrid methods achieve promising results compared to the recent studies on all sub-datasets, quantified using the RMSE and Score metrics. Especially on the complicated datasets FD002 and FD004, the Score and RMSE prediction accuracy obtained by both methods exceeds that of the existing methods. On FD003, however, the DCNN method (Li et al., 2018) and the Deep Bidirectional LSTM (Wang et al., 2018) report slightly better Score and RMSE values than both our proposed hybrid methods. Nevertheless, our proposed methods use all three operational settings and all sensor measurements as input without manually designing features, unlike the works of (Ma et al., 2018), (Li et al., 2018), and (Wang et al., 2018), which picked 14 sensor channels and excluded the three operational settings. Among these recent studies, it is worth mentioning that our CAE-based method is the first attempt to adopt a CAE in the aero-engine prognostics problem, extracting useful features that serve as inputs to two separate and parallel pathways (referred to as the BDLSTM path and the BDGRU path) to obtain more robust features. Furthermore, the proposed methods exhibit superior performance over most existing methods by capitalizing on the recent success of multiple-model deep learning techniques and on the power of Convolutional Auto-Encoders to automatically extract useful features with high-level abstractions from complex raw data.

CONCLUSION
To leverage the power of various methods, the idea of hybrid methods emerged, ultimately enhancing prediction and yielding more accurate estimates. Firstly, we proposed a CAE combined with a temporal modeling tool that joins the BDGRU and BDLSTM models in a parallel manner for degradation feature extraction and RUL prediction. We found that the CAE is better suited for feature extraction and reduction than conventional approaches. Secondly, a hybrid architecture consisting concurrently of CNN and BDGRU models was developed and applied to capture local and temporal features for RUL estimation. The GRU has emerged as an improved LSTM model with fewer parameters, increasing the efficiency and speed of the training stage. Besides, to extract bidirectional temporal features and prevent the long-term dependency problem, we used a BDGRU. The evaluation results of the proposed hybrid models indicate significant improvements over their counterparts and over the most robust results in the literature in terms of RMSE on the public NASA C-MAPSS dataset. We pointed out that the CAE with BDGRU-BDLSTM reliably outperforms our CNN-BDGRU model on FD001, FD002, FD003, and FD004, with RMSE improvements of around 14.208%, 6.83%, 3.967%, and 5.537%, respectively. As future work, we intend to incorporate an automated method that detects the fault time step of each engine, to tackle the problem of the RUL being ill-defined during healthy operation. Due to the black-box nature of the proposed DL models, we also aim to address their lack of interpretability and transparency using attention mechanisms.

BIOGRAPHIES

Soheyb Ayad is an associate professor at the Computer Science Department of the University of Biskra (Algeria). He is also a member of the LINFI laboratory. He is interested in Networking, Ad hoc Networks, Wireless Sensor Networks, Cloud Computing, Cloud Robotics, Web Services, the Semantic Web, the Internet of Things, and Predictive Maintenance. He is the author of several international publications.

Noureddine Zerhouni holds a doctorate in Automatic-Productivity from the National Polytechnic Institute of Grenoble (INPG), France, obtained in 1991. He was a lecturer at the National School of Engineers (ENI, UTBM) in Belfort. Since 1999, he has been a Professor at the National School of Mechanics and Microtechnics (ENSMM) in Besançon. He conducts his research in the Automatic Control department of the FEMTO-ST Institute in Besançon. His research areas are related to the monitoring and maintenance of production systems. He is also an expert in adult education in the areas of process improvement and project management.

A.1 Computational Complexity
The complexity of the CNN layers is calculated as:

O( Σ_{i=1}^{d} n_{i-1} · s_i² · n_i · m_i² )

where i is the index of a convolutional layer, d is the number of convolutional layers, n_i is the number of filters in the i-th layer, n_{i-1} is the number of input channels of the i-th layer, s_i is the spatial size of the filter, and m_i is the spatial size of the output feature map.
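The sum above can be evaluated numerically for a given stack of layer shapes; a small helper (the layer shapes used in the test values are hypothetical):

```python
def conv_complexity(layers):
    """Per-input time complexity of d convolutional layers,
    sum_{i=1}^{d} n_{i-1} * s_i^2 * n_i * m_i^2 (He & Sun, 2015).
    Each entry is (n_prev, s, n, m): input channels, filter size,
    number of filters, and output feature-map size."""
    return sum(n_prev * s**2 * n * m**2 for (n_prev, s, n, m) in layers)
```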
Considering that the LSTM is local in both space and time, its storage complexity per time step does not depend on the input sequence length (Hochreiter & Schmidhuber, 1997). We conclude that the LSTM's complexity per time step and per weight is O(1). Therefore, the overall complexity of all LSTM layers per time step is:

O( Σ_{i=1}^{d} W_i )

where W_i is the number of weights of the i-th LSTM layer and d is the number of LSTM layers. The time complexities of the GRU and FC layers are similar to that of an LSTM, while the runtime complexity of a BDLSTM or BDGRU layer is doubled.
where:
For LSTM or GRU: W = K·H + K·C·S + H·I + C·S·I
For FC: W = I·H + H·K
Here, I is the number of input units, K is the number of output units, H is the number of hidden units, C is the number of memory cell blocks, and S is the size of the memory cell blocks.
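The weight-count formulas above translate directly into code; a sketch with arbitrary example sizes:

```python
def lstm_weights(I, H, C, S, K):
    """W = K*H + K*C*S + H*I + C*S*I for an LSTM or GRU layer:
    I input units, H hidden units, C memory cell blocks of size S,
    K output units (doubled for a bidirectional layer)."""
    return K * H + K * C * S + H * I + C * S * I

def fc_weights(I, H, K):
    """W = I*H + H*K for a fully connected layer with I inputs,
    H hidden units, and K outputs."""
    return I * H + H * K
```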