Development of Multivariate Failure Threshold with Quantifiable Operation Risks in Machine Prognostics

Prognostics and Health Management (PHM) is attracting the attention from both academia and industry due to its great potential to enhance the resilience and responsiveness of the equipment to the potential operation risks. In literature, many methodologies are proposed to predict the Remaining Useful Life (RUL) of the equipment. However, there are two major challenges that limit the practicality of these methodologies. 1) How to generate a quantifiable Health Indicator (HI) to represent the operation risks? 2) How to define a reasonable failure threshold to predict RUL? To answer these two questions, this paper proposes a novel methodology for failure threshold determination with quantifiable operation risk in machine prognostics. In the proposed methodology, Fisher distance and Mann-Kendall (MK) test are firstly used to extract useful sensors based on which HI is estimated by applying Principle Component Analysis (PCA). Then, RaoBlackwellized Particle Filter (RBPF) is employed to obtain the HI prediction and the uncertainties. Afterwards, a Bivariate-Weibull-distribution-based risk quantification model is designed to quantify the cumulative risk over time and over the increase of HI. The failure threshold, which is the ending point of the RUL, varies over different users and applications depending on the level of risk they want to tolerate. The validation of the methodology is based on the C-MAPSS data from the PHM data competition 2008 hosted by PHM society. The results validate the effectiveness of the proposed risk quantification method and its potential application on machine prognostics.


INTRODUCTION
Remaining Useful Life (RUL) prediction has been extensively studied in the literature, and many different methodologies are proposed to fulfill the prediction task. These methodologies include but not limit to Regression-Based Method (RBM), Random Coefficient Method (RCM), Stochastic Process Method (SPM), State-Space Method (SSM) and Similarity-Based Method (SBM), as being summarized in Haoshu Cai, Xiaodong Jia, et al., 2020;Jia et al., 2019).
Based on the literature survey, it is found that the failure threshold in most RUL prediction studies is often set empirically by referring to the engineering practices. There still lack a systematic approach to define failure threshold with quantifiable operation risk. This is important because 1) the predicted Health Indicator (HI) cannot be effectively converted to RUL without an effective failure threshold (N. P. Li, Lei, Lin, & Ding, 2015). 2) a quantifiable failure threshold is the key for a prognostic model to interface with the maintenance planning (Camci, 2014(Camci, , 2015. This is essentially because the maintenance planning models expects the take the quantified risk as input to calculate the direct and indirect failure cost (Camci, 2014(Camci, , 2015. Without a tunable failure threshold with quantifiable risk, it is difficult to build friendly interface between prognostic models and maintenance planning models. 3) A failure threshold with quantifiable risk will allow different manufacturers or users to tune the tolerable risk based on their practical needs and application scenarios.
As a branch of survival analysis, multivariate survival analysis models covariates and life indicators as clusters to analysis expected duration of time until machine failure.
Commonly seen approaches to model the variation between life indicators and other factors are Cox Proportional Hazards Model (Breslow, 1975), Clayton Copula Model (Clayton, 1978) and competing risks models (Prentice et al., 1978), etc. One major focus of multivariate survival is to address the correlation between life indicators and health indicators, which is sometimes picky for dependency measure selection and background knowledge (Georges, Lamy, Nicolas, Quibel, & Roncalli, 2001). In multivariate survival analysis applications, improper dependency measures can render the life indicator over-dominant in the model.
To address this challenge, this study proposes a risk quantification model to assist the optimization of failure threshold (FT). The proposed model considers the joint effects of life indicator and health indicator, where the life indicators is essentially used in reliability analysis, and the health indicator is derived from sensory readings. The proposed risk model falls somewhat between reliability analysis and HI based failure statistics. Based on the proposed risk model, a systematic methodology for RUL prediction is proposed, and the effects of varying tolerable risk on the predictive distribution RUL are studied and discussed based on the well-known C-MAPSS data about aero-engines degradation (Jia, Huang, Feng, Cai, & Lee, 2018).
The rest of the paper is organized as follows. Section 2 gives an overview of the previous studies about the failure thresholding. Section 3 elaborates the proposed methodology. Section 4 illustrate the benchmarking results based on the public datasets. The conclusion remarks are given in Section 5.

REVIEW
In machine prognostics, a unit is often considered to have failed when its health indicator crosses a certain failure threshold(X. Li et al., 2020;Yang et al., 2020). Determination of certain threshold is largely depended on experience. Other than empirical determination, such threshold can also be determined from data-driven approaches for both continuously monitored degradation and discretely monitored degradation. (Javed, Gouriveau, & Zerhouni, 2013) proposed a Fuzzy Clustering method determined by Subtractive-Maximum Entropy to dynamically determine the FT. (R.-y. Jiang, 2010) developed an online and offline alarm threshold determination methodology to optimize the inspection scheme in condition-based maintenance (CBM). The alarm threshold is optimized by trade off cost and machine degradation whose information is obtained and updated during last inspection. As for continuous monitoring, rather than a predetermined FT, uncertain FT or probabilistic FT are usually determined with adequate historical or peer information (Wang & Coit, 2007). (Emami-Naeini, Akhter, & Rock, 1988) use nonlinear inequalities to build a sensor threshold selector for fault detection and classification (FDC). The selector is based on errors from a statistical FDC model, signal noise properties and filters properties. (Chehade, Bonk, & Liu, 2017) develops a convex quadratic formulation which consider both historical and real time degradation data to online estimate the FT for each machine. An adaptive failure threshold method is proposed by (Hua, Zhang, Xu, Zhang, & Xu, 2013) to estimate reliability when there is degradation performance on a machine. The threshold is determined by estimating conditional probability density function using a sliding window based dynamic kernel estimation method. (Javed et al., 2013) also proposed an Extreme Learning Machine based method for dynamic threshold decision in continuous state prediction task. (L. Jiang, Feng, & Coit, 2011) developed a failure threshold determination model for dependent failure process. The method considers the dependency of coexisting soft and hard failure for both diagnosis and prognostics.
The reviewed FT determination methods are hardly based upon risks the operator would directly want to shoulder. Some autonomous FT determination methods are hard to adjust according to additional constraints and some other methods my require large computation resources. To address these challenges and expand the horizon of FT determination in machine prognostics, this study proposes a risk quantification methodology to assist the optimization of FT.

Overview
The proposed methodology includes 3 three major steps, as shown in Figure 1. 1) Health Indicator (HI) estimation: this step includes the selection of trended features that are useful for RUL prediction and the estimation of the embedded degradation trend. In this study, the feature selection is based on Fisher distance and Mann-Kendall (MK) test. The estimated HI is given by performing Principal Component Analysis (PCA) on the selected sensor readings.

2) HI Prediction:
The purpose of HI prediction is to predict the development of HI in future time horizons. Rao-Blackwellized Particle Filter (RBPF) is employed to obtain the prediction and the uncertainties, as shown in Figure 1.

3) Risk Quantification:
The risk quantification model is designed to quantify the cumulative risk over time and over the increase of HI. The mathematical expression of the risk in this study is described as: Where , denotes the Life and Health Indicator, respectively, , ℎ represents the current time and current HI value. ( , ℎ) is the joint probability of survival when = and = ℎ. The risk is defined as 1 − ( , ℎ).
In the following discussion, the primary processing steps in the proposed methodology are elaborated.

HI Estimation
Health indicator of aero-engine system is usually estimated according to sensor selection from monitoring sequences(H. Xu, Hou, Li, & Zheng, 2018).The results for feature selection are shown in Figure 2. The Fisher distance in Figure 2(a) measures the usefulness of the sensory data to distinguish the healthy and faulty. In this study, the mean of the first 15 operations cycles are specified as healthy, and the last 15 operation cycles prior to the maintenance are defined as faulty. Based on the preliminary screening according to Fisher distance, sensors 2, 3, 4, 11, 17 are identified as useful sensors for detection.
The MK test further evaluates the presence of a gradual trend in the sensory data throughout the machine degradation. If the MK test is passed, there is a trend in the sensory data. In Figure 2(a), the pass rate of the MK test is utilized as an indicator to show the trend-ability of the data, which indicates that sensors 2, 3, 4, 11, 17 are trend-able. Therefore, in this study, the HI estimation is primarily obtained by performing PCA on the sensor 2, 3, 4, 11, and 17 .

HI Prediction
The RBPF that is specified in Eq. (2) is employed to predict the HI. Compared with other particle filter methods, RBPF is employed because it can depict the prediction uncertainty, and it is efficient to do the computation .
Where , denotes the samples of the unknown model parameters. The is the state vector, which represents the estimated HI at time . The is the observed HI at time .
, , and in Eq.
(2) are assumed to follow Gaussian distribution with zero-mean. The variance of , , and are tuned to the best performance.
The third equation in Eq.(2) regulates an exponential degradation trend, and it is obtained by discretizing the exponential model below: (3) Where , , are the unknown parameters in the exponential model and is the observed HI at time .
The application of the RBPF includes the filtering process and the prediction process. The objective of performing filtering on the partial degradation is to estimate the unknown model parameters = { , , , , } . Based on the estimated at current time , the prediction can be obtained by extrapolating the state equation (3 rd equation) in Eq. (2).
, , which are standard deviation of and , are estimated based on the historical Run-to-Failure (R2F) data.
In the proposed method, the following processing steps are followed: 1) Estimate { } =1,..., for each historical R2F record, where denotes the number of historical R2F records.

2) Estimate
, based on historical data.
3) For each testing unit with partial degradation data, initialize the filtering process by using entries in { } =1,..., . At the end of the filtering process, different filtering results at current time are obtained, which is denoted as { } =1,…, . 4) Extrapolate the future trend by using entries in { } =1,…, as the starting point of the HI prediction.
To convert the predicted HI distribution into RUL distribution, the failure threshold is required. A systematic way to obtain a failure threshold with quantified risk is still discussed in the literature. The discussion in the next subsection fills in this gap by utilizing a Bivariate Weibull model.  Figure 3 shows the risk surface and the iso-risk lines that are given by the proposed Bi-variate Weibull model. The iso-risk lines in Figure 3 can be used as a failure threshold by specifying the tolerable risk level. If a 5% risk line is adopted, it implies the user has a 95% probability of not encountering unexpected failure (Generazio, 2007). Typically, the optimal risk level can be obtained based on cost-benefit analysis in practice. Once the optimal risk is obtained, the failure threshold is fixed based on the bivariate model in Figure 3. In this study, the failure threshold is the ending point of the HI prediction. The definition of the risk is given in Eq.(1). The joint survival in Eq.(1) is obtained from the joint distribution Prob( , ℎ), which is a Bivariate Weibull distribution as in below:

Risk Quantification
The parameters in the Bivariate Weibull density function = { , , , , , } which represent shape, scale and location parameter of life indicator and health indicator respectively. is estimated by maximizing the likelihood function from the marginal pdf ( , ). In the Bivariate Weibull distribution model, the life indicator , and the health indicator ℎ are assumed to be independent. This assumption has significantly simplified the challenge of estimating multivariate PDF (Probability Density Function), and it is valid in most applications. Weibull distribution is widely used for life distribution in reliability engineering due to its flexibility. It is generally believed dozens of life samples would be sufficient to estimate a 3-parameter life Weibull distribution in practice. More samples would improve such estimation accuracy. Expert experience can also be applied by defining the distribution of HI and life when reference history is insufficient.
Unlike the multivariate survival analysis, the proposed Bivariate risk model considers the life indicator and the health indicator as random variables with equal importance.
In multivariate survival analysis, the HIs are modeled as predictors rather than random variables, and this renders the life indicator over-dominant in the model. To give an example, the risk surface and iso-risk lines given by the Cox Proportional Hazard Model, which is a typical model for multivariate survival analysis, are shown in Figure 4. It clearly indicates that the life indicator is over-dominant in the model, and the iso-risk lines cannot be utilized as the failure threshold for RUL prediction.

RESULTS & DISCUSSIONS
In this paper, the proposed methodology is illustrated in the aero-engine RUL prediction problem using C-MAPSS dataset. The details of the dataset is summarized in Table 1, where Fan degradation and Higher Pressure Compressor (HPC) degradation are two failure modes in different datasets. The data processing, HI prediction and risk quantification are deployed sequentially following the procedures from Figure 1. The prediction performance is validated using RMSE and Score calculated using Eq.(5). As noticed from literature, a constant value is assigned as target RUL before degradation period (X. Li, Ding, & Sun, 2018). In this case, is set to 125 for result evaluation. The selection of risk level for estimating RUL depends on whether is applied or not. When is not applied, a 75% risk level is selected for relatively conservative prediction. When is applied, the largest prediction RUL is limited to 125 therefore the risk level can be selected as large as possible. In this case 95% risk level is selected when is applied. To compare and evaluate the performance of the proposed method, some benchmarked methodologies are summarized from literature: DCNN and LSTM from (X. Li et al., 2018), DBN (Deep Belief Networks), MODBNE (Multi-Objective Deep Belief Networks Ensemble), RF (Random Forest), GB(Gradient Boosting), SVM (Support Vector Machine) and LASSO reported from (Zhang, Lim, Qin, & Tan, 2016), and RULCLIPPER from (Ramasso, 2014).  Figure 6 give two examples of the proposed methodology. The significant findings are summarized as following: 1) The filtered HI is a noise-free process; 2) The predicted HI follows an exponential trend; 3) The prediction uncertainty of HI can be effectively converted to the uncertainty of RUL by adopting the iso-risk line as failure threshold . 4)  lower risk is chosen. However, the effects of HI on the RUL distribution are dominant when a higher risk is chosen. 5) The RUL ground truth in Figure 5 and Figure 6 indicate that a tolerable risk within [0.9, 0.99] can give rather good accuracy. However, it is important to note that the tolerable risk varies over different applications and over different units. Figure 7 shows the typical prediction trajectory of a training unit under the risk level 25%, 95% and 50% (where best prediction is achieved). The findings can be summarized as following: 1) With the increasing length of history, the predicted RUL gradually converges to the ground truth; 2) the prediction uncertainty bound is gradually shrinking to a very narrow range; 3) the prediction uncertainty bound is relatively narrow at all prediction horizon when selecting lower risk level compared with higher risk level; 4) the overall predicted RUL under higher risk level is larger than the lower risk level. These findings justify the importance of risk quantification in RUL prediction.
The effects of varying tolerable risk on the prediction score are illustrated in Figure 7. It is found that an optimal score is found for FD001 and FD002 when varying the risk. In contrast, there is no optimal score for FD003 and FD004. This is essentially because FD003 and FD004 contains two degradation types: HPC degradation and FAN degradation, and FD001 and FD002 only have HPC degradation. This finding further implies that the risk model should be different for different failure modes.
The RUL prediction result under the risk level 70% with not applied and risk level 95% with applied for FD001 to FD004 is compared with benchmarked methods from literature in Table 2. It can be highlighted that 1) the prediction Score and RMSE of FD001 and FD002 very good and are compariable with other benchmarked methodologies; 2) the RMSE of FD002 with is 15.51, which is significantly better than any other benchmarked methods; 3) the general prediction performance under the same risk level are significantly different among FD001 to FD004; 4) the performance of proposed methodology is largely depend on the selection of risk level. By using the proposed risk quantification method, the prediction accuracy can be further optimized.
(a)  Figure 8 shows the trajectories of predicted RUL under different risk level when different length of operation history is given. The predicted RUL values tend to be smaller at low risk level because the failure threshold is set lower when a low risk level is given which result in early termination of degradation extrapolation. This justify the effectiveness of proposed risk quantification method. The RBPF prediction model is justified by that the uncertainty band gradually shrinks and predicted RUL converge to ground truth with training history increase. It should be addressed that prediction uncertainty and prediction threshold are different where the former is determined by prediction model while the latter is not. Prediction uncertainty is related to the historical data on itself while failure threshold is chosen based on tolerable risk in this case. Failure threshold will determine the end of life (end point of RUL prediction) but will not affect prediction uncertainty bound. In the C-MAPSS case, the degradation mode of training units can be detected and classified using PCA and Kmeans method as shown in Figure 9. Indexed by the FDC result, the risk surface of FAN degradation and HPC degradation can be compared from a) and b). It can be concluded that 1) the risk contour patterns are different between two types of failure; 2) the potential life under FAN degradation is much higher compared with the HPC degradation units under the same risk level; 3) the failure HI for FAN degradation is relatively smaller than the HPC degradation. These suggest that the prediction accuracy based on risk surface trained from all failure modes cannot at least perform high accuracy in predicting FAN degradation units RUL by giving relatively conservative (smaller) RUL estimation. Therefore, the proposed method has very poor prediction performance on FD003 and FD004. This also explains why the optimal risk level for FD003 and FD004 is 99% or 100% since nearly all of the predicted RUL is too small. It is important to build the risk quantification model with regard to different failure modes and conduct FDC before estimating the RUL. Another reason for less prediction accuracy can be the inferring method RBPF is simply randomly initialized. The prediction performance can be further improved by applying some enhanced methodologies for RBPF initialization.

CONCLUSION
This paper proposes a Bi-variate model, which can be easily extended to the multivariate case, to define the failure threshold with quantifiable operation risk. The model considers the joint effect of life indicators and health indicators by using a Bi-variate Weibull model. Based on the risk model, a systematic methodology for RUL prediction is proposed, and several important conclusions are achieved: 1) The proposed risk model can effectively convert the predictive distribution of HI into the predictive distribution of RUL.
2) If low risk can be tolerated, the reliability information estimated from life indicator should be primarily considered.
If higher risk can be tolerated, the HI based failure threshold will be more meaningful. This indirectly implies the HI based RUL prediction can potentially extend the equipment life in some non-critical applications.
3) The RUL prediction results can reach its optimal performance by varying the tolerable risk in RUL prediction tasks.
4) In RUL prediction, it is crucial to pro-diagnose the equipment failure, and the risk model for different failures should be different.
As for future works, multivariate risk models will be explored by removing the independence assumption in the current model. Besides, adopting the proposed method to interface prognostic models and maintenance planning models will be explored.