Fusion and Comparison of Prognostic Models for Remaining Useful Life of Aircraft Systems



INTRODUCTION
This paper provides a detailed preliminary literature review and comparison of different prognostic approaches, summarizing the taxonomy and methodology of the forecasting methods. It also briefly introduces the maintenance concept and condition-based maintenance (CBM).
Prognostics and health management (PHM) technology is an actively developing field and an effective means of transforming the maintenance support mode and improving system safety, reliability, and economic affordability. On the one hand, implementing prediction and health management can significantly reduce the risk of flight accidents and further improve the system's safety and mission success rate. Through enhanced fault diagnosis, faults can be detected and isolated quickly and precisely, with a high fault diagnosis rate and a low false alarm rate. Early warnings can be given at the early stage of fault occurrence; through prediction, the future development of faults and damage can be anticipated and assessed, and repairs can be made before hazardous phases are reached. On the other hand, PHM technology enables the rational use and cost-effective maintenance of complex equipment.
As a core technology of PHM, prognosis is also the most challenging and far-reaching one. It aims to predict the failure event and the remaining useful life (RUL) of a component or system and to support operation planning and maintenance decisions. RUL prediction is mainly based on condition monitoring combined with historical record databases, using artificial intelligence (AI), case-based reasoning, and other techniques to learn and express system degradation patterns and to predict the development and propagation of faults or damage. System performance degradation trends are subsequently evaluated (Ren et al., 2017).

Shuai Fu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The RUL prediction principle is shown in Figure 1. It is founded on the current health status of the equipment, the working environment and load, status monitoring sensor information, and so forth. This information is combined with physical failure models, historical performance degradation data, and fault diagnosis and repair information to estimate how long components or systems will continue to work. This supports decisions about maintenance and operation planning.
Figure 1. Prediction principle of RUL

Prognostic maintenance, also known as CBM, PM, or simply prognostics, is the ability to observe the condition of the equipment and to plan and perform maintenance accordingly before a critical failure (Kai Goebel et al., 2017). Maintenance philosophies are the mix of strategies that ensure an item works as expected when needed. They are classified into two categories: reactive maintenance (unplanned) and proactive maintenance (pre-planned).
In reactive maintenance, tasks are performed after a fault condition occurs. Emergency maintenance is performed to avoid the severe consequences of failures. In preventive maintenance, tasks are performed regularly at fixed intervals, which are determined from historical data without input from the individual devices themselves. The equipment is serviced routinely, regardless of whether maintenance is actually required. Both reactive and preventive maintenance are costly. Components are often exposed to degradation processes that can be observed using today's advanced sensor technology and addressed by inference and prediction models. Adaptive planning of maintenance operations is the essential feature that distinguishes PM from preventive maintenance. PM can generally be divided into two categories: CBM and reliability-centered maintenance (RCM). RCM attempts to accomplish two tasks: first, to analyze and classify the types of defects (e.g., Failure Modes and Effects Analysis), and second, to evaluate the impact of maintenance plans on system reliability (Kothamasu et al., 2009).
CBM is maintenance performed when the need arises and is considered part of the broader and newer field of PM, where new AI technologies and connectivity capabilities are put into practice. The acronym CBM is more commonly used to describe "condition-based monitoring" than maintenance itself (Jardine et al., 2006). Although CBM reduces replacement costs, maintenance time, and system downtime, it has limitations such as high installation costs, unpredictable maintenance periods, and massive organizational changes to existing monitoring setups. CBM builds on this background and incorporates related disciplines such as prognostics and health management and integrated vehicle health management (IVHM).
Diagnostics and prognostics are the two main disciplines of CBM. Their development is shown in Figure 2. Diagnostics consists of identifying an asset's degradation and current state and revealing its cause and location; it is a relatively mature field compared to prognostics. Its purpose is to stop the system and schedule maintenance tasks after an anomaly is detected, or to instruct the system to perform other actions. Typically, early failures follow a slow decay path, and observing this progression is more valuable than detecting errors past a critical point. Furthermore, diagnostics is a prerequisite for prognostics (Vogl et al., 2019). Prognostics predicts the future fitness of a system or component by projecting its current health state forward to a fault state and predicting the remaining useful life. It is considered one of the most complex and critical enabling technologies of CBM (Behbahani et al., 2014; Kim et al., 2021).

PROGNOSTICS LITERATURE REVIEW
Prognostics can be analyzed in four categories (Vachtsevanos et al., 2007), whereas a hybrid model, which is not shown, implies a fusion or combination of the other forms:
• Data-Driven Models
• Physics-Based Models
• Knowledge-Based Models
• Hybrid Models
The details and literature of each prognostic approach are given in the following subsections, and the classification of RUL prediction is shown in Figure 3.

Data-Driven Models
The data-driven method predicts the remaining service life mainly from equipment condition monitoring data and from parameters measured over the normal-to-failure degradation processes of similar components or systems. This method relies on sensor data, which it converts into relevant information and performance degradation models; it does not need to consider complex physical failure mechanisms. There are many methods and models for data-driven RUL prediction, and some literature further subdivides them. For example, Ochella and Shafiee et al. (2020) divided data-driven methods into traditional numerical and machine learning methods. Conventional numerical methods include linear regression and Kalman filtering, while machine learning methods mainly use intelligent algorithms such as neural networks, decision trees, and support vector machines. Tsui et al. (2015) summarized data-driven diagnosis and prediction techniques and divided data-driven prediction techniques into independent incremental process-based models, Markov chain-based models, filter-based models, proportional hazards models, and threshold regression models. Zhang et al. (2015) divided data-driven methods into random coefficient models, AI, and trend-based methods. Some subclasses in these classifications overlap, such as degradation trajectory extrapolation methods and artificial intelligence methods. Many studies use machine learning algorithms to predict the future state through multi-step-ahead and extrapolated predictions.
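As a minimal illustration of the trend extrapolation idea mentioned above, the following sketch fits a straight line to a health indicator by ordinary least squares and extrapolates it to a failure threshold. The function names, the sample values, and the threshold of 0 are hypothetical choices made for this example; they are not taken from any of the cited studies.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rul_by_extrapolation(cycles, health, threshold):
    """Extrapolate the fitted trend to the failure threshold.

    Returns the cycles remaining after the last observation,
    or None if the fitted trend is not decreasing.
    """
    slope, intercept = fit_line(cycles, health)
    if slope >= 0:
        return None
    failure_cycle = (threshold - intercept) / slope
    return max(0.0, failure_cycle - cycles[-1])


# Hypothetical health indicator sampled every 10 cycles.
cycles = [0, 10, 20, 30, 40]
health = [1.00, 0.95, 0.90, 0.85, 0.80]
remaining = rul_by_extrapolation(cycles, health, threshold=0.0)
print(f"Estimated RUL: {remaining:.1f} cycles")
```

A linear trend is the simplest possible choice; real degradation is often nonlinear near failure, which is one reason the literature turns to filtering and machine learning methods.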
NASA Ames Research Center has released an engine operation-to-failure simulation dataset to help researchers explore and develop data-driven RUL prediction technology. The data set was generated by running a large number of engine performance and degradation simulations on CMAPSS (Commercial Modular Aero-Propulsion System Simulation), NASA's simulation platform for civil aviation propulsion systems, following the engine damage propagation modeling proposed in (Saxena et al., 2008). It gives researchers a convenient way to test and validate data-driven RUL prediction methods, so it is widely used in many studies (Chao et al., 2021; García Nieto et al., 2015; Heimes, 2008; Javed et al., 2015).

Physics-Based Models
The prediction method based on the physics-based model (PBM) requires an in-depth analysis of the performance degradation process and the physical failure mechanism, drawing on knowledge of the mechanical dynamics, structural characteristics, and material characteristics of the equipment. Physics-based models can be functions or differential equations derived from traditional physical failure principles, such as fatigue cracks, wear, and corrosion. Prediction methods based on physical models are generally used for failure and remaining useful life predictions at the critical component or system level. For specific failures, models are designed as functions of component damage, such as cracks and chips, and of loads or stresses. Establishing physics-based models usually requires an in-depth analysis of the failure and its mechanism and careful consideration of the physical, chemical, aerodynamic, and even thermal processes experienced by the component. The complexity and difficulty of modeling and analysis have limited the use and promotion of this method, as can be found in (Celaya et al., 2011; Daigle & Goebel, 2013; Luo et al., 2008).

Knowledge-Based Models
Knowledge-based prediction methods can be used when it is difficult to establish a system or component failure model. This form of predictive model has low complexity and only requires historical failure data or the maintenance recommendations given during the design of the aircraft system for components under the same operating conditions. Typically, the acquired failure data are fitted to statistical distributions such as Poisson, Exponential, Weibull, and Log-Normal. The Weibull distribution is the most widely used because it can be applied to various situations, including early failure in the 'bathtub curve' (Mudholkar et al., 2009).
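To illustrate why the Weibull distribution can cover different regions of the bathtub curve, the sketch below evaluates the standard two-parameter Weibull reliability, hazard rate, and mean time to failure. The shape and scale values are hypothetical and are not fitted to any data from this paper.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability of surviving beyond time t: R(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape/scale) * (t/scale)^(shape-1), t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_mttf(shape, scale):
    """Mean time to failure: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


# Hypothetical parameters: shape < 1 gives a decreasing hazard (early failures),
# shape = 1 a constant hazard (random failures), shape > 1 an increasing hazard (wear-out).
for shape in (0.5, 1.0, 3.0):
    early = weibull_hazard(100.0, shape, scale=1000.0)
    late = weibull_hazard(900.0, shape, scale=1000.0)
    print(f"shape={shape}: h(100)={early:.6f}, h(900)={late:.6f}, "
          f"MTTF={weibull_mttf(shape, 1000.0):.1f}")
```

The single shape parameter switches the hazard between decreasing, constant, and increasing behavior, which corresponds to the three regions of the bathtub curve.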
Knowledge-based prediction methods are based on the distribution of event records for similar components, equipment, or systems and use historical failure data to estimate the overall characteristics of the object (for example, mean time between failures (MTBF), mean time to failure (MTTF), and reliability operation probability). Because statistical methods and reliability analysis are widely used in this approach, it is also called the prediction method based on statistics and reliability theory. However, this method only provides predictive evaluations based on the overall reliability index and lacks information on individual failures or health states. Furthermore, maintenance personnel are more concerned with the actual health and remaining useful life of specific components or subsystems of the equipment currently in operation than with the overall reliability-related metrics of similar equipment (Muller et al., 2008).

Hybrid Models
This section focuses on reviewing hybrid forecasting methods for RUL forecasting. Much of the PHM literature focuses on hybrid approaches that combine data-driven and physics-based models. In this hybrid approach, physics-based models (e.g., particle filters) incorporate fundamental principles (e.g., Paris' law of crack growth), and model parameters are identified and updated using measurement data. The RUL is obtained by projecting an estimate of the system's internal state into the future until an error threshold is reached. Data-driven models in the hybrid approach were used for anomaly detection to initiate the RUL prediction process, to estimate internal system states based on measurement data, and to replace system degradation models in physical model-based predictions.
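The projection step described above can be sketched with Paris' law, da/dN = C (ΔK)^m, using the common approximation ΔK = Y Δσ sqrt(π a). The sketch below only propagates a crack with fixed parameters until a critical length is reached; it does not include the measurement-based parameter update. All numerical values, the geometry factor Y = 1, and the function name are assumptions made for illustration.

```python
import math


def cycles_to_critical_crack(a0, a_crit, c, m, delta_sigma,
                             y=1.0, step=100, max_cycles=10_000_000):
    """Forward-Euler integration of Paris' law da/dN = C * (dK)^m.

    a0, a_crit  : initial and critical crack lengths in metres
    c, m        : Paris constants (hypothetical values in the example)
    delta_sigma : stress range in MPa
    y           : geometry factor in dK = y * delta_sigma * sqrt(pi * a)
    Returns the number of load cycles until a >= a_crit, or None if the
    crack has not reached a_crit within max_cycles.
    """
    a, n = a0, 0
    while a < a_crit:
        if n >= max_cycles:
            return None
        delta_k = y * delta_sigma * math.sqrt(math.pi * a)
        a += c * delta_k ** m * step  # crack growth over `step` cycles
        n += step
    return n


# Hypothetical example: 1 mm crack growing to 10 mm under a 100 MPa stress range.
rul_cycles = cycles_to_critical_crack(a0=1e-3, a_crit=1e-2, c=1e-11, m=3.0,
                                      delta_sigma=100.0)
print(f"Cycles until critical crack length: {rul_cycles}")
```

In a hybrid scheme, the constants C and m would be re-estimated as new crack measurements arrive, and the projection would be repeated from the updated state.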
Hybrid predictive models that combine data-driven and physics-based models also include methods to connect results from separate models to improve RUL estimates.
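One simple way to connect results from separate models is inverse-variance weighting, sketched below. It assumes that each model returns an independent RUL estimate with a known variance; this is an illustrative choice and not the specific fusion method of any cited work. The function name and the numbers are hypothetical.

```python
def fuse_estimates(estimates):
    """Inverse-variance weighted fusion of independent estimates.

    estimates: list of (mean, variance) pairs, one per model.
    Returns the fused (mean, variance).
    """
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    fused_mean = sum(w * mean for w, (mean, _) in zip(weights, estimates)) / total
    return fused_mean, 1.0 / total


# Hypothetical RUL estimates in cycles: a physics-based model with a large
# variance and a data-driven model with a smaller one.
physics_based = (100.0, 400.0)
data_driven = (120.0, 100.0)
mean, variance = fuse_estimates([physics_based, data_driven])
print(f"Fused RUL: {mean:.1f} cycles, variance {variance:.1f}")
```

The fused variance is never larger than the smallest input variance, which is the sense in which combining models can reduce prediction uncertainty under these assumptions.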
Hybrid forecasting methods that utilize different approaches are explicitly reviewed based on various combinations of the three forecasting models mentioned above.

The Dataset
The

Preprocessing Data
Labeling the different flight phases is the first preprocessing step applied to the data set. Labeling the flight phases across all operational cycles helps in monitoring the health state of the different sub-modules, and steady-state data are much easier to handle than transient data. Climb and descent are more likely to be transient operations because the altitude changes continuously. As the altitude changes, the ambient air around the engine changes as well, so pressure and temperature values differ at different altitudes. Rotor speeds also change during climb or descent, whereas cruise is more likely to be steady-state operation. With the flight phases classified, the health degradation across altitude changes can be seen and smoothed more accurately, as shown in Figure 6.
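A minimal sketch of such a labeling step is shown below. It assumes the phases are separated by the rate of change of altitude; the threshold of 5 ft/s, the one-second sampling interval, and the function name are hypothetical, and the paper does not state the exact rule that was used.

```python
def label_flight_phases(altitudes_ft, dt_s=1.0, rate_threshold=5.0):
    """Label each sample as 'climb', 'cruise', or 'descent'.

    The altitude rate (ft/s) is a backward difference; the first sample
    reuses the rate of the first interval. Rates within +/- rate_threshold
    are treated as steady state (cruise).
    """
    n = len(altitudes_ft)
    labels = []
    for i in range(n):
        if n < 2:
            rate = 0.0
        elif i == 0:
            rate = (altitudes_ft[1] - altitudes_ft[0]) / dt_s
        else:
            rate = (altitudes_ft[i] - altitudes_ft[i - 1]) / dt_s
        if rate > rate_threshold:
            labels.append("climb")
        elif rate < -rate_threshold:
            labels.append("descent")
        else:
            labels.append("cruise")
    return labels


# Hypothetical altitude trace in feet, sampled once per second.
trace = [1000, 1020, 1040, 1040, 1041, 1020, 1000]
print(label_flight_phases(trace))
```

Real sensor traces are noisy, so in practice the altitude would usually be smoothed before differencing, and short label flips would be merged into the surrounding phase.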
Sixteen randomly selected flight samples were used for a test, with the results depicted in Figure 7. This test confirmed the existence of flight phases consistent with those seen in the preceding figures.

Feature Engineering
The operation of a gas turbine is based on the Brayton cycle, in which the turbine drives the compressor. An expert would recognize that the temperature drop across the high-pressure turbine (HPT) is a dependable indicator of the overall health of the HPT module, since the turbine's work output for a given flow rate depends on that temperature drop. This makes it possible to examine the correlation between the temperature drop and the RUL (or the flight number) documented in the dataset. The relationship between the temperature drop in the HPT and the RUL (or flight cycle) of the HPT subsystem can be observed in Figure 8. This relationship serves as an indicator of the health trajectory of the HPT subsystem. In summary, when the temperature progressively falls from 0 degrees and the color transitions from red to indigo, the RUL declines correspondingly. All run-to-failure data are assumed to start from a healthy state. The initial health state is assigned a value of 1, and the health condition at the point of failure is assigned a value of 0. The health status is assumed to decline linearly from 1 to 0 as time progresses. This linear degradation is used to facilitate the fusion of sensor readings. Engine behavior varies across the different phases of a flight. Group summary statistics are therefore used to extract features from the data, with a range of summary statistics computed for each label group.
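The two steps described above, assigning a linear health label from 1 to 0 and computing summary statistics per label group, can be sketched as follows. The function names, the chosen statistics, and the sample temperature-drop values are hypothetical and serve only as an illustration.

```python
import statistics
from collections import defaultdict


def linear_health(n_cycles):
    """Health label per cycle, declining linearly from 1 (first) to 0 (last)."""
    if n_cycles < 2:
        raise ValueError("need at least two cycles to define a decline")
    return [1.0 - k / (n_cycles - 1) for k in range(n_cycles)]


def group_summary(values, labels):
    """Summary statistics of `values` for each distinct label (e.g. flight phase)."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    summary = {}
    for label, vals in groups.items():
        summary[label] = {
            "mean": statistics.fmean(vals),
            "std": statistics.stdev(vals) if len(vals) > 1 else 0.0,
            "min": min(vals),
            "range": max(vals) - min(vals),
        }
    return summary


# Hypothetical temperature-drop samples labelled by flight phase.
values = [1.0, 2.0, 3.0, 10.0, 20.0]
phases = ["climb", "climb", "climb", "cruise", "cruise"]
print(linear_health(5))
print(group_summary(values, phases))
```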
Fluctuations of time-series statistical measures, including the mean, standard error of the mean, standard deviation, skewness, variance, minimum, and range, have been investigated extensively. A large set of statistics for the different components was collected with the aim of identifying the most reliable predictors, as shown in Figure 9. Here, the health indicator of machine $j$ is $y_j$, and the predicted health indicator of machine $j$, determined using the second-order polynomial model identified in machine $i$, is $\hat{y}_{j,i}$.
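One common formulation of such a similarity measure, assumed here for illustration rather than taken from the paper, uses the 1-norm of the residual between the observed health indicator $y_j$ of machine $j$ and its prediction $\hat{y}_{j,i}$ from the model of machine $i$ as a distance, and maps that distance to a score between 0 and 1. The index letters $i$ and $j$ and the exponential form are assumptions.

```latex
\begin{align}
  d(i,j) &= \lVert y_j - \hat{y}_{j,i} \rVert_1 \\
  \mathrm{score}(i,j) &= \exp\!\left(-d(i,j)^2\right)
\end{align}
```

Under this form, a training machine whose model reproduces the observed indicator exactly has a distance of 0 and a score of 1, and the score decays toward 0 as the residual grows.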
The similarity score is computed by Eq. (2). Given a single ensemble member from the validation data set, the model locates the 50 closest ensemble members in the training data set. From these 50 ensemble members, it fits a probability distribution and uses the median of this distribution as the RUL estimate.

Any component's performance deteriorates over time. RUL is the number of operational cycles that a component has left before reaching a critical failure threshold.
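The nearest-member procedure can be sketched as follows. This is a simplified illustration under several assumptions: each training member is represented by its predicted health-indicator curve over the observed window together with its known remaining life, the similarity score is taken as exp(-d^2) with d the 1-norm distance, and the sample median of the nearest members is used directly instead of the median of a fitted probability distribution. All names and numbers are hypothetical.

```python
import math
import statistics


def similarity_score(observed, predicted):
    """Score in (0, 1]: exp(-d^2), where d is the 1-norm of the residual."""
    distance = sum(abs(o - p) for o, p in zip(observed, predicted))
    return math.exp(-distance * distance)


def estimate_rul(observed, training_members, k=50):
    """Median remaining life of the k training members most similar to `observed`.

    training_members: list of (predicted_curve, remaining_life) pairs.
    """
    scored = sorted(
        ((similarity_score(observed, curve), rul) for curve, rul in training_members),
        key=lambda pair: pair[0],
        reverse=True,
    )
    nearest = [rul for _, rul in scored[:k]]
    return statistics.median(nearest)


# Hypothetical validation member and four training members.
observed = [1.0, 0.9, 0.8]
training = [
    ([1.0, 0.90, 0.80], 40),
    ([1.0, 0.92, 0.82], 50),
    ([1.0, 0.88, 0.79], 30),
    ([1.0, 0.50, 0.10], 5),
]
print(estimate_rul(observed, training, k=3))
```

With k set to 3 in this toy example, the dissimilar fourth member is excluded before the median is taken.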
The paper presents and evaluates several approaches using a simulated data set that includes the degradation histories of a small fleet of 10 turbofan engines.These engines have diverse and unknown starting health conditions.Various data-driven methodologies such as K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Trees, and Neural Networks (NN), along with similar architectures, are used for the purpose of conducting RUL estimates.
A significant discrepancy exists between the predicted RUL and the actual RUL when the machine is in a state of intermediate health. In this instance, the top curves lie close together in their first stages and then diverge as they approach the point of failure. Consequently, two distinct modes emerge in the distribution of RUL. The accuracy of the RUL assessment improves further when the machine is close to failure. Figure 11 displays the estimated RUL for a single unit sample, represented by its probability density function, when a 90% confidence interval is used. The training process was conducted using the AMD Ryzen 5 PRO 4650U processor, which is equipped with Radeon Graphics and runs at a clock speed of 2.10 GHz. The system was also equipped with 16 GB of RAM. The average rate of prediction is 8700 observations per second, with a maximum limit of 100 splits. The feature ranking algorithms utilize ANOVA to identify the top 30 characteristics from a pool of 259.
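ANOVA-based feature ranking can be sketched with the one-way F-statistic, which compares the variance between class means with the variance within classes for each feature. The sketch below is a plain-Python illustration with hypothetical feature names and data; it is not the toolchain used to produce the reported results.

```python
def anova_f_score(values, labels):
    """One-way ANOVA F-statistic of one feature against class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(feature_columns, labels, top_n):
    """Names of the top_n features ordered by decreasing F-statistic."""
    scores = {name: anova_f_score(column, labels)
              for name, column in feature_columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical data: one feature separates the two classes, the other does not.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "informative": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
    "noisy": [1.0, 8.0, 2.0, 9.0, 3.0, 7.0],
}
print(rank_features(features, labels, top_n=1))
```

Selecting the top 30 of 259 features would correspond to calling `rank_features` with `top_n=30` on a dictionary of 259 columns.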
The results indicate that weighted KNN, ensemble bagged tree, and cubic SVM yield the highest levels of accuracy, with validation accuracies of 98.1%, 97.9%, and 97.8%, respectively. The comprehensive prediction outcomes are visually depicted in Figure 12. The results obtained in this paper diverge considerably from the previous findings when the author expands the feature selection from 30 to 100 out of a total of 259. The average values for the validated root mean squared error (RMSE), R-squared, and validated mean squared error (MSE) are 0.069, 0.99, and 0.0048, respectively. The optimized outcome is presented in Table 3.
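For reference, the error measures named above follow their standard definitions, which the sketch below computes for a pair of actual and predicted sequences. The function name and sample values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Standard MSE, RMSE, MAE, and R-squared for paired sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    r_squared = 1.0 - (mse * n) / ss_total
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r_squared}


# Hypothetical health-indicator values.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(regression_metrics(actual, predicted))
```

Because RMSE is the square root of MSE, the reported values of 0.069 and 0.0048 are consistent with each other (0.069 squared is approximately 0.0048).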
It is important to realize that leaving out the physics can have a large effect on the subcomponent models of complex systems with limited sensor information, even when a lot of data is available. In other words, there is no significant structural ambiguity at the level of the whole system, because the physics of how the parts fit together and how the system is assembled as a whole is well understood. The main reason for the discrepancy between the response of the system model and the real system is the misrepresentation of the physical processes occurring at the subcomponent level inside the system. It is possible to estimate the system's overall performance regardless of the level of precision the subcomponent models display. A potential area for future study is the calibration of models that takes structural model uncertainty into account, while considering the advantages of using larger datasets and improving understanding within specific knowledge domains.

CONCLUSION
Few studies have used a hybrid approach that incorporates knowledge-based, data-driven, and physics-based models together. It is potentially beneficial to fuse all types of information (e.g., domain knowledge, maintenance feedback, condition data, and physics) and leverage the strengths of all kinds of models to better manage the prediction uncertainty. Despite the opportunities brought by the hybrid approach, challenges remain in: how to design the fusing method to integrate heterogeneous information; how to aggregate results from different competing models (e.g., using regression or a Bayesian framework); and how to utilize data-driven models to reduce the prediction uncertainty (e.g., using data-driven models to estimate the measurement model to replace a system model).
Prognostics is one of the most challenging disciplines in condition-based maintenance. This paper summarized the prognostic strategies used for carrying out condition-based maintenance and predicting the remaining useful life. The next step is to continue researching novel hybrid predictive methodologies and applying them to aircraft systems, possibly with larger datasets and other algorithms.

Figure 2. Development of Diagnostics and Prognostics

Figure 3. Classification of RUL prediction

Orsagh et al. (2003) studied the diagnosis and prediction of turbine engine bearings. They combined the Yu-Harris model with a crack growth model, using the Yu-Harris model before crack initiation and the crack growth model afterwards, and discussed methods for randomizing the models. Prakash & Samantaray et al. (2016) conceived a method for predicting the remaining service life of spur gears based on a physical failure model. Jacome et al. (2019) addressed the low durability of proton exchange membrane fuel cells (PEMFC) under automotive load cycling (ALC) by applying physics-based prognostics and presented a review of model-based methods for PEMFC prognostics under ALC.
Figure 4. Hybrid prognostics models (Liao & Köttig, 2014)

Each dataset contains the simulated results of aircraft engines as second-by-second flight data from up to 100 flights or until engine failure, whichever comes first. Each unit experiences flights of a certain duration, indicated by flight class, and enters an abnormal degradation state at random according to the file number and the specified failure type. The data set provides the following:
• Generic airflow cycle measurements along the engine length, such as total temperature, total pressure, and flow.
• 2 rotor speeds, compressor stall margins, and some operational parameters, e.g., Mach number, altitude, throttle resolver angle, current cycle count, and flight class.
• A binary health state indicator and a RUL label.
In addition to the engine thermodynamic model, the data set includes an atmospheric model capable of operating at
• Altitudes from sea level to 40,000 ft,
• Mach numbers from 0 to 0.90, and
• Sea-level temperatures from -60 to 103 °F.
A commercial aircraft goes through a well-defined mission, which consists of the flight phases of land idle, takeoff, climb, cruise, descent, and landing. In this dataset, only the climb, cruise, flight idle, and descent information is included, as shown in Figure 5.

Figure 5. Flight phases of a single flight cycle

Figure 8. Relationship between RUL and the HPT temperature drop as health indicator

Figure 9. The health status of all members in the ensemble transitions from a value of 1 to 0, with varied rates of degradation

To create a comprehensive health indicator, the sensor data are multiplied by their respective weights, so that the data collected from the various sensors are integrated into a unified health indicator. The health indicator is then smoothed with a moving average filter. Figure 10 illustrates the fluctuations of the health indicator for the training data and validation data.
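The fusion and smoothing steps can be sketched as a weighted sum of sensor readings followed by a moving average. The weights, the trailing (causal) window, and the function names below are assumptions made for illustration; the paper does not specify how the weights were obtained or which window was used.

```python
def fuse_sensors(readings, weights):
    """Weighted sum of sensor readings for each cycle.

    readings: list of rows, one row of sensor values per cycle.
    weights : one weight per sensor (hypothetical values in the example).
    """
    return [sum(w * x for w, x in zip(weights, row)) for row in readings]


def moving_average(values, window):
    """Trailing moving average; early samples use the shorter history available."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical readings from two sensors over four cycles.
readings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
weights = [0.5, 0.25]
indicator = fuse_sensors(readings, weights)
print(indicator)
print(moving_average(indicator, window=2))
```

One way to obtain such weights would be to regress the linear health label on the sensor readings of the training units, although other choices are possible.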

Figure 11. RUL result of one unit sample

Comparative analyses of the efficacy of different models based on a range of measures are presented. Table 2 presents the performance evaluation of several data-driven methods for predicting failure time.

Figure 12. Confusion matrix of failure mode classification

To assess how well the RUL model works, predictions are made using 50%, 70%, and 90% of a validation dataset. The results indicate that as the sample size increases, the distribution of errors becomes more concentrated around zero, with fewer outliers, as seen in Figure 13.

Figure 13.RUL prediction error examined by employing varying percentages of each member within the validation ensemble

Table 1. Overview of Data Sets

Table 2. Classification results of different algorithms

Table 3. Results evaluated by RMSE, MSE, R-Squared, and MAE