Methodology on Establishing Multivariate Failure Thresholds for Improved Remaining Useful Life Prediction in PHM

Prognostic approaches commonly try to predict the Remaining Useful Life (RUL) based on machine health status by either directly establish a mapping or setting up a failure threshold to determine the End-of-Life (EoL). On the one hand, determining a failure threshold is crucial but subjective for most reported cases. Machine operation risks, which are intuitional but difficult to quantify, can be used to bridge the gap between prediction and determining a multivariate failure threshold. On the other hand, historical machine life information is rarely considered together with the condition indicators for such prognostic tasks. Building multivariate failure thresholds based on quantifiable operation risks for prognostic tasks is the general topic that is rarely studied due to the following challenges: 1) How to quantify operation risks under multiple variables? 2) How to determine the multivariate failure thresholds? 3) How to make reliable extrapolations of future conditions? To address these questions, as the extension of our previous work (Jia, Li, Wang, Li, & Lee, 2020), this paper proposes 1) a Gaussian Copula model-based risk quantification method to determine multivariate failure thresholds, and 2) a Similarity enhanced Blackwellized Particle Filter (RBPF) to predict future system conditions. Two examples of establishing tri-variate and bivariate failure thresholds are given. The proposed methodology is validated on the aero-engine RUL prediction task based on the C-MAPSS dataset from the PHM society data competition 2008. The result suggests that the proposed methodology has better explainability and practicability with comparable prediction capability.


INTRODUCTION
Prognostics and Health Management (PHM) methodologies and techniques have been widely studied in academia and practiced by the industry in recent years. The prognostics seek to predict the RUL based on machine operating status described by measurements from condition monitoring (CM). One popular way is to build a direct mapping (e.g., a deep learning regression model) from machine status to RUL (Khan & Yairi, 2018;Y. Wang, Zhao, & Addepalli, 2020). The other primary method type, such as Stochastic Process Method (SPM), Similarity-Based Method (SBM), and State-Space Method (SSM), could primarily rely on a failure threshold to determine the EoL Haoshu Cai, Xiaodong Jia, et al., 2020;Jia et al., 2019). The study to resolve systematically setting up failure thresholds for prognostic tasks are rarely found since, in most cases, such threshold is set empirically by human experience or based upon some simple statistic distributions (P. Wang & Coit, 2007). As emphasized in our previous work , operation risk is an essential criterion for critical systems and essential input of maintenance planning. However, it is not easy to be quantified based upon operating conditions. It would be practical for practitioners to set up a failure threshold to determine EoL based on a quantifiable operation risk.
Multivariate survival analysis could be applied to quantify operation risk based on reliability and operating condition, which generally takes life and its covariates to analyze the expected duration of time until machine failure. Effective models include multivariate distribution, competing risks models (Prentice et al., 1978), Cox Proportional Hazards Model (Breslow, 1975), and Copula Model (Clayton, 1978). Our previous work  proved that COX Proportional Hazard Model has the over-dominant issue and multivariate Weibull distribution has the independence assumption of input variables. Since machine life can be correlated to some indicator of aging over time, the independence assumption does not always hold.
To address these challenges, this study proposes 1) a Gaussian Copula model-based risk quantification method to determine multivariate failure thresholds, and 2) a Similarity enhanced Blackwellized Particle Filter (RBPF) to predict future system conditions. The proposed Gaussian Copula Model could establish the joint effects of life and multiple variables. Life is essentially used in reliability analysis, and other variables can be health indicators or raw sensory readings. Based on the proposed risk model, a systematic methodology for RUL prediction is proposed. Two cases of establishing bi-variate and tri-variate failure thresholds are demonstrated and discussed. The proposed RUL prediction methodology is validated on the aero-engine RUL prediction task based on the C-MAPSS dataset from the PHM society data competition 2008.
The rest of the paper is organized as follows. Section 2 gives a literature review of the Copula Model and its application. Section 3 elaborates the proposed methodology. Section 4 illustrates the benchmarking results based on the public datasets. The conclusions are given in Section 5.

LITERATURE REVIEW
As most widely used to depict dependence relations in economics (Trivedi & Zimmer, 2006), clinical chemistry and finance (Cherubini, Luciano, & Vecchiato, 2004), quantifying risks in insurance (McNeil, Frey, & Embrechts, 2015), and establishing process control chart (Busababodhin & Amphanthong, 2016), Copula models create a link between multivariate joint distributions and univariate marginal distributions (Glidden, 2000;Nelsen, 2007;Patton, 2012). Regular Copula Models can be classified into Tstudent Copula (from which Gaussian Copula is derived) and Archimedean Copulas, which include Frank, Gumbel, Clayton (Clayton, 1978), and Joe Copula distinguished by different Copula functions (Kumar, 2019;Nelsen, 2007). As the most prominent Copula model, Gaussian Copula could easily expand to high dimensionality (more than three) compared with the most asymmetric bivariate Copulas. However, Gaussian Copula could only model linear correlation and not account for tail dependence (Klein, Kneib, Marra, & Radice, 2020). Such tail dependence can be modeled using Clayton in bivariate cases. In a general categorization scale, copula models can be classified as fully parametric, semiparametric (Chen & Fan, 2006) and nonparametric (Genest & Segers, 2009). Maximum Likelihood Estimation (MLE) or multi-stage MLE can be directly used to estimate model parameters for the parametric models such as Gaussian Copula and T-student Copula (Lindskog, 2000). As for parametric models such as Clayton and Frank, numerical optimization of log-likelihood function is expected to estimate model parameters (Bouyé , Durrleman, Nikeghbali, Riboulet, & Roncalli, 2000).
Bivariate Copula is more common and abundantly studied than high-dimensional Copula models, limiting the choice when multiple variables exist. A suitable Copula model is generally case-sensitive (Busababodhin & Amphanthong, 2016). However, it still provides a powerful tool to build the joint influence of life and other health indicators towards risk quantification. As reviewed in , methodologies of determining failure threshold rarely take risk into consideration. Some proposed methods are very complex and inflexible (Jiang, 2010) (Javed, Gouriveau, & Zerhouni, 2013). To address these challenges, this study proposes a Gaussian Copula model-based risk quantification methodology to assist in establishing multivariate failure threshold for prognostic tasks. The workflow of the proposed methodology and its application on C-MAPSS Engine RUL prediction is shown in Figure 1.

Methodology Overview
1) Obtain HI from health assessment: As suggested in , Principal Component Analysis is conducted on Run to Failure (R2F) readings of sensor #2, #3, #4, #11, #17, which are selected using Fisher distance and Mann-Kendall test. The obtained 1-dimensional principal component is treated as the health indicator.

2) Train Fault Detection and Classification Model:
The C-MAPSS dataset consists of two failure types: FAN degradation and High Pressure Compressor (HPC) degradation, which can be easily clustered by the Kmeans method. The obtained label is used for training a Fault Detection and Classification (FDC) model, which is further applied to determine the failure threshold model for testing data.

3) Train Risk Quantification Model:
The risk quantification model calculates the operation risk given the machine operation time and HI. Two models are trained for FAN degradation and HPC degradation failure mode, respectively. The multivariate failure threshold can be obtained based on the given risk value.
The detailed procedure and discussion will be given in Section 3.2.

4) Predict Future HI:
This step extrapolates the partially degraded HI trajectories from the testing samples using a similarity enhanced RBPF. The future HI values are estimated together with the prediction uncertainties. The EoL can be determined by applying the risk-based failure threshold. Details of HI prediction are given in section 3.3.

Train Risk Quantification Model
The system operation risk can be addressed in terms of failure time (reliability) and its covariates (other indicators of operation status) in the presence of dependence. Dependent failure times are typical when system components degrade over time. As for some critical systems, safe is defined as when the machine is reliable in terms of time and healthy in terms of functionally. Risk can be therefore defined as not being able to survive and remain healthy. The mathematical expression of the risk in this study is described in Eq.(1).
Where denotes the Life and 1 , … , are indicators (crucial process measurement) of operation status, respectively, , 1 , … , represents the current time and current indicators' value. (•) is the survival function meaning the joint probability of survival. The risk is defined as 1 − (•).
Variables' dependencies affect ways of calculating the joint probability of survival are classified. The joint cumulative distribution function (d.f.) can be calculated simply as the product of the marginal cumulative d.f. when all variables are independent, which is however, not always valid. The degradation tendency of some components could have strong correlations with machine life. Some degradation pattern can also be reflected by different sensors. The potential complex dependencies make the independent assumption of indicators and time not always hold when calculating the joint probability. Copula model provides a simple but effective solution to calculate such complex joint probability.
, then the copula model of can be defined as Eq. (2). ( 1 ( 1 ), 2 ( 2 ), … , ( )) is a multivariate d.f. with margins 1 , 2 , … , . To construct such a multivariate d.f. using a copula model, one can 1) specify a marginal univariate distribution for each variable and 2) choose a copula to provide a correlation structure between variables.
( 1 , 2 , … , ) (2) In this case, the Gaussian copula model is used to construct the joint distribution due to its simplicity (Gaussian assumption) and high dimensional capability (for tri-variate and higher). The general form of Gaussian copula is shown in Eq. (3), in which is the correlation parameter. To train and test a Gaussian copula model, the probability-integral transformations (to uniformity) should first be performed to map the raw variables into the unit cube [0,1] (Lindskog, 2000). Afterward, the model parameter can be estimated based on samples by maximizing Eq.(4) using MLE. In this case is the density function of Gaussian distribution for each variable. The obtained model can be future used to calculate any joint probability as shown in Eq.(5) according to the Sklar properties (Sklar, 1996) ( 1 , 2 , … , ; ) (4) ( , ) = (1, … ,1, , , 1, … ,1) (5) To illustrate the training procedure and discuss the key features of the proposed risk quantification model, a trivariate example is shown based on life, sensor #2 and sensor #4 (denoted as 'Sensor 1' and 'Sensor 2' for simplicity) from C-MAPSS dataset. These two representative sensors are selected based on the distribution disparity at healthy and faulty operation status as shown in Figure 2 (a) and (b). The faulty data generally follow a normal distribution.
− 1, 2 ( 1 , 2 ) − 1, ( 1 , ) − 2, ( 2 , ) The operation risk can be calculated using Eq.(6) in which ( ) = ( < ) is the marginal cumulative d.f. of life at . 1, 2 ( 1 , 2 ) is the joint cumulative d.f. of 1 and 2 at ( 1 , 2 ). 1, 2, ( 1 , 2 , ) is the joint cumulative d.f. of , 1 and 2 at ( , 1 , 2 ). The sensor readings at EoL are collected and used to train the tri-variate Gaussian copula model. Based on the copula model, the marginal d.f. and bivariate joint cumulative d.f. can be obtained according to Eq.(5). The risk value can be therefore calculated given a vector of input [ 1 , 2 , ]. Based on this risk quantification model, a multivariate failure threshold appeared as a ( ) dimensional surface at different risk values from [0,1]. Figure 4 shows the failure threshold of FAN degradation at risk level 0.1, 0.3, 0.5, 0.7, 0.9, 0.99 respectively. The EoL can be easily determined for an extrapolated degradation trace. Several conclusions can be drawn from Figure 3: 1) As the tolerable risk increase, the threshold surface expands in all directions, covering more EoL data used for training; 2) the shape of the threshold surface is dominated by one variable on the rims and jointly determined by all variables at the central area; 3) The percentage of training EoL data within the threshold is lower than the risk value, indicating even at the risk of 0.99, there is still substantial historical machines alive. This is caused by the strict definition of risk which is the contrary event of 'healthy and alive'. This definition suits the critical system or critical failure mode where one failure could lead to dangerous consequences. It can also be addressed that each sensor is equally weighted with life in this model, which can be too strict for some cases. The proposed risk quantification method could be a powerful tool to determine multivariate failure threshold when considered sensors are equally critical and indicate the system's health status in a certain manner. Otherwise, to avoid such tight assumptions, applications could consider health assessment in advance and build a bi-variate risk quantification model rather than a high-dimensional risk quantification model.
In practice, the standard of choosing tolerable risk varies from case to case and largely depends on expert experience. When historical R2F data is available, a suitable risk level could be determined by the cross-validation strategy, where the data is split in multiple folds for training and validating the risk quantification model. Since the EoL information is available, practitioners could quickly obtain the corresponding risk level for further use. To apply this methodogy for data competition tasks like the C-MAPSS RUL prediction in the case study, the risk level could be tuned as a hyperparameter for better prediction accuracy. In this case, 75% is selected to achieve the most balanced prediction performance on all four datasets.
To obtain suitable failure thresholds for the C-MAPSS RUL prediction, this study built a bi-variate risk quantification model based on HI and life, as shown in Figure 2 (c) and (d).

RUL Prediction using Similarity Enhanced RBPF
RBPF is used in the proposed work for making robust extrapolation of future HI with uncertainty based on partially degraded data. Compared with the traditional Particle Filter and Kalman Filter, RBPF is capable of carrying out analytical marginalization to enhance filtering and extrapolating performance (Sä rkkä , Vehtari, & Lampinen, 2007).
= ⋅ exp( ⋅ ) + Eq.(7) is built to regulate an exponential degradation trend where , , are the unknown parameters and is the observed HI at time . The RBPF can be therefore built as Eq.(8) to predict HI, in which is estimated HI at , , are model parameters at . The , , and in Eq. (8) follow Gaussian distribution with zero-mean and whose variances are tuned to the best model performance.
The implementation of the RBPF includes two steps: filtering and extrapolating. During the filtering step, the unknown model parameters = { , , , , }are estimated based on the partial degradation. Based on the estimated at current time , the prediction can be obtained by extrapolating the state equation (3 rd equation) in Eq.(8).
For each testing sample, the initialization of could significantly impact extrapolation integrity and uncertainty. To make a reasonable initialization, a similarity matching based approach is introduced to obtain critical RBPF parameters , and , . The workflow of the proposed Similarity enhanced RBPF is as following: 1) Estimate { s } =1,..., for each historical R2F record using MLE, where = { , } and denotes the number of historical R2F records.
2) For each partially degraded testing sample, matching a similar R2F trace and record its . The R2F trace is considered similar when a window's Euclidian distance to the testing data is smaller than a small number . A set of similar parameters{ s } =1,.., is obtained for each sample.

RESULTS & DISCUSSIONS
The proposed methodology for risk quantification and failure threshold determination is validated on the RUL prediction task on the C-MAPSS dataset. The prediction performance of proposed methods shows superiority over benchmarked methods from some recent literature in terms of Root Mean Squared Error (RMSE) and Score calculated using Eq.(9). The benchmarked methodologies include deep Convolutional Neural Network (DCNN) and Long Short-Term Memory (LSTM) from (Li, Ding, & Sun, 2018), Deep Belief Networks (DBN), Multi-Objective Deep Belief Networks Ensemble (MODBNE), Random Forest (RF), Gradient Boosting (GB), Support Vector Machine (SVM) and Least absolute shrinkage and selection operator (LASSO) reported from (Zhang, Lim, Qin, & Tan, 2016), and RULCLIPPER from (Ramasso, 2014). As noticed by literature, a constant value (set to 125) is assigned as target RUL before the degradation period. When is applied, the largest prediction RUL is limited to 125. For illustration, the prediction result of 75% risk level is summarized in Table 2 and Table 3 together with the obtained best result.     Figure 6 and Figure 7 illustrate how the proposed bi-variate failure threshold determines the EoL and predict RUL using the RBPF based extrapolations. Some significant findings are as follows: 1) the failure threshold determination model of FAN and HPC degradation have a significant difference in terms of life, which justify the importance of FDC before quantifying operation risks; 2) the extrapolated HI has reasonable exponential shape and uncertainty range, suggesting the effectiveness and necessity of applying the similarity enhanced RBPF; 3) the prediction uncertainty of HI can be easily converted to RUL prediction uncertainty with the failure thresholds as contour lines; 4) the EoL point has a high change appearing in the complex joint, which shows the necessity of depicting the correlation of HI and life and justified the capability of copula model for such tasks; 5) the ground truths can be spotted in the rational middle of all risk levels, indicating the effectiveness of the proposed methodology.
The effects of varying tolerable risk on the prediction score are illustrated in Figure 8. It can be seen from (a)-(d) when is not applied, an optimal score is found for FD001 to FD004 when the risk levels are 76%, 86%, 47%, 68%. This suggests that practitioners can flexibly obtain such a useful failure threshold based on practice operation risk. This risk value can statistically reflect the probability of survival under healthy status. Comparatively, the optimal scores are found on relatively higher risk levels when is applied. This is because generally removes the penalty of scores when a longer RUL is predicted, which could accept a higher risk level. The risk level for each dataset to achieve the best score is summarized in Table 2 and Table 3 which compare the RMSE and score performance of the proposed method against benchmark methods with = 0 and 125.
As seen from Table 2 where is not applied, the proposed model has comparable score performance compared with the best among other methods on FD001 and FD002. It can be highlighted that the prediction Score of FD002 is 3869.2, which is significantly better than all the other methods. When is applied, as suggested in Table  3, the proposed methodology holds comparable score performance with all the benchmarked deep learning methods throughout FD001 to FD004. The major advantage of the proposed methodology over deep learning is its explainability, practicability, and lower requirement on data volume. The overall performance on each dataset is much better in RMSE and score compared with our previous work in . However, the proposed methodology does not show better accuracy on FD001 and FD003 than the benchmarked deep learning methods. This is because the training data for building the risk quantification model and the similarity base are unbalanced where more historical records are available under multiple operation conditions (6 conditions in FD002 and FD004). This could introduce a bias towards multiple conditions where the machine degrades faster. Therefore, the overall prediction of RUL on FD001 and FD003 is conservative. Generally, the score performance is slightly worse on FD003 and FD004 than FD001 and FD002 due to the combination of multiple failure modes and operation conditions. The FDC model's misclassification could select wrong model for failure threshold determination, causing conservative or radical prediction results. The assumption of exponential degradation could make some parameters very sensitive to addressing the HI curve before the degradation started. This could also cause radical prediction on the HPC failure model when selecting a relatively larger risk level. The proposed method can be future improved using a more accurate FDC model and complex degradation function for RBPF extrapolation. Building a local model for each dataset could also increase prediction performance by addressing the data imbalance issue. However, this would increase modeling complexity and reduce the data used for training each model.

CONCLUSION
This paper proposes a Gaussian Copula model based multivariate risk quantification model to determine the failure threshold for RUL prediction tasks. A systematic methodology for RUL prediction is proposed based on the failure threshold determination method and the similarity enhanced RBPF extrapolator. The work discussed the pros and cons of the proposed risk quantification method with a tri-variate and a bi-variate example. The obtained bi-variate failure threshold is further validated on the aero-engine RUL prediction task using the C-MAPSS dataset. Several essential conclusions are achieved: 1) The proposed risk quantification model can easily quantify machine operation risk based on multiple input factors without variables' independence assumption.
2) The proposed failure threshold can effectively convert the predictive distribution of HI into the predictive distribution of RUL.
3) The tolerable risk can be flexibly tuned accordingly. If low risk can be tolerated, the life indicator dominates the threshold. If higher risk can be tolerated, the health indicator dominates the threshold. 4) The similarity enhanced RBPF could make extrapolation more reliable. It can still work well with limited reference data. 5) For a particular case, The RUL prediction performance could reach its optimal by varying the tolerable risk in RUL prediction tasks. As for future work, the bivariate Copula model can be updated to those with better tail character (such as Clayton, Joe, etc.) to enhance the tails dependence of life. The definition of risk can be further studied to suit for the nonessential case with multiple variables. The proposed methodology can be further validated on the newly published C-MAPSS 2 dataset from the 2021 PHM Conference Data Challenge.