A Framework to Rank Prognostics Health Indicators with Application to Brake Rotors

This study presents a framework to assess the effectiveness of health indicators (HIs) used to monitor the state of health (SOH) in a brake rotor health monitoring system. The following criteria were used to rank the health indicators: (i) Identifiability: correlation of the HI with the ground truth (GT); (ii) Compactness: mean of the standard deviation of the estimated SOHs; (iii) Robustness to noise factors: an HI is considered robust when it meets all functional and customer requirements under all operating conditions and its performance is not degraded by variations in the environment, operating conditions or other noise factors; (iv) Monotonicity: the degree to which the HI trends monotonically as the fault level increases from the healthy baseline to the most severe fault. Monotone HIs are preferred because they are likely to generalize better to data not used in development; and (v) Estimation error: the average relative error between the GT and the prediction obtained from the regression analysis. Results showed that this framework can be applied to several HIs derived from time and frequency analysis of various sensor signals used to monitor the health of brake rotors. The top HIs selected by this framework provided the best performance in detecting degraded brake rotors, as evidenced by higher classification scores.


INTRODUCTION
The advantages of prognostics and health management (PHM) include increased safety and reliability, reduced collateral damage, lower logistics costs, fewer unnecessary service visits and more efficient maintenance scheduling. Metrics are needed to quantify the performance of PHM systems to ensure that the developed solution meets the performance requirements. This is especially true in safety-critical settings, where accurate PHM is crucial to maintaining system safety. Performance metrics can also aid in developing PHM algorithms by enabling comparison of alternative approaches.
For clarity, in this paper the term Health Indicator (HI) is defined as a "feature" derived from raw signals. In other publications and nomenclature, the term Condition Indicator (CI) is used to refer to the same "feature". The many choices involved in constructing an HI (e.g. choice of signal, pre-processing, manipulation, post-processing, fusion and calibration) can affect the performance of the algorithm and result in variations in the outcome of a prognostic solution. The number of HIs considered during algorithm development can easily reach the hundreds, and without defined performance metrics it would be tedious to go through all of them manually and decide which one is best, especially where an improvement in one factor comes at the cost of another. Therefore, it is valuable to define objective metrics, link the relevant ones to customer and technical requirements, and use an objective framework to select the top HIs.
In the automotive industry, there are various stakeholders within a vehicle health management (VHM) organization that may define requirements for a PHM system. These include, for example, engineers, fleet managers, service personnel, customers, regulatory bodies, and program managers. VHM has different goals and expectations for each stakeholder, and it is important to assess the needs of PHM solutions from these different perspectives. Program managers have to maintain high customer expectations and evaluate the cost-benefit tradeoffs of PHM. Fleet managers need to monitor fleet health to enable mission planning with minimal down time. Engineers must design components and systems to meet performance requirements that likely include a maximum acceptable rate of failure. Service personnel need to schedule maintenance often enough to prevent in-service failures, but not so often that it results in expensive and unnecessary maintenance. Early fault detection and isolation is key in addressing all of these perspectives (Saxena, Celaya, Saha, Saha, and Goebel 2010).
Hamed Kazemi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
There are a number of research works on HI construction and on evaluating the suitability of HIs for prognostics (see Lei, Li, Guo, Li, Yan, and Lin (2018) for a review). Various prediction performance metrics have been introduced in the literature, covering a wide range from algorithm performance to computational and cost-benefit metrics (Vachtsevanos (2003), Banks and Merenich (2007), Saxena, Celaya, Balaban, Goebel, Saha, Saha, and Schwabacher (2008), Saxena, Celaya, Balaban, Saha, Saha, and Goebel (2009), Feldman, Kurtoglu, Narasimhan, Poll, and Garcia (2010), Coble (2010), Yang, Habibullah, Zhang, Xu, Lim and Nadarajan (2016)). Different researchers have used and suggested different metrics depending on the application, the end user and the requirements. Some of these performance metrics are in competition with each other (Coble 2010). Basic metrics are often transformed into more complex ones, which reduces their chance of being adopted by practitioners; as a result, there is no standardized set of metrics for comparing algorithms.
One of the main challenges when developing a PHM system is in identifying a set of health indicators (HIs) that may be used to assess component health. An HI is any feature that is designed to represent the health condition of the unit under test. These are parameters extracted from pre-processed signals and may employ any number of signal processing techniques (e.g. detrending, filtering, time-synchronous averaging, domain transforms, calculating RMS, kurtosis, skewness, envelope, order analysis, etc.).
In this paper we first introduce a framework to rank various health indicators used for early fault detection and identification. It is important to note that the proposed framework is applicable to the health monitoring of chassis components (e.g. brake rotors, wheel bearings, etc.). The goal is to formulate a framework that can be used to select the best candidates from a large set of HIs that may be used to detect faults in a component. The large set of HIs under study may include HIs generated from different signals, as well as HIs generated from the same signal using different pre- or post-processing techniques. Emphasis is placed on quantifying algorithm performance.
The criteria proposed to rank HIs include identifiability, variability, sensitivity to noise factors, monotonicity and estimation error.
Then, a case study of a brake rotor health monitoring system is presented, in which the HI ranking framework was applied to hundreds of health indicators derived from time and frequency analysis of three signals of interest for monitoring the health of brake rotors. Processing techniques such as derivative, detrend, variance, envelope, order analysis, and correlation were applied to these signals in both the time and phase domains to generate HIs that differentiate between healthy and faulty rotors. The HI ranking framework is applied to identify the best HIs in terms of source signal and signal processing method, enabling selection of the best HI for deploying an early fault detection algorithm.

MATERIALS AND METHODS
In order to identify a set of metrics that can be used to assess the performance of an HI, we must first understand the application and determine what characteristics of an HI are desired. For the brake rotor health monitoring system, if the goal is to formulate the problem as a regression problem and estimate the rotor thickness variation, then a good HI is one that can estimate the Health Stage (HS) of the brake rotor with the smallest error. Deviations from the identity, in the form of either bias or variance, will degrade this HI. An HI with high variance but low bias will still be useful for estimating HS, albeit with some degree of error. An HI with high bias but low variance may not be as useful for estimating HS but may serve as an excellent feature for a classification algorithm if the HI maintains monotonicity with increasing health states.
On the other hand, we can also consider the characteristics of a bad HI. Obviously, an HI that is constant will not be useful for any fault detection or health state estimation. At the very least, an HI must have statistically different values for different health states. With these qualitative ideas in mind, we can describe the five metrics we propose for quantifying HI performance.

Performance Metrics
The following criteria could be beneficial in characterizing the performance of an HI:

Identifiability:
This metric measures the correlation between HI and HS. If an HI is a good indicator of HS, the correlation between the two will be high. This metric captures the ability of the HI to identify different HSs, so an HI with higher correlation to the HS is desired. Note that this is the most basic indication that an HI will be promising for detecting faults, as correlation directly measures whether a change in the HS results in a change in the HI, a necessary feature for fault detection. The Identifiability metric was calculated by obtaining the Pearson correlation coefficient between arrays of HI and HS using Eqn. 1,
$$r = \frac{\sum_{i=1}^{n} (HI_i - \overline{HI})(HS_i - \overline{HS})}{\sqrt{\sum_{i=1}^{n} (HI_i - \overline{HI})^2 \, \sum_{i=1}^{n} (HS_i - \overline{HS})^2}} \qquad (1)$$
where $\overline{HI}$ and $\overline{HS}$ represent the mean of HI and HS, respectively, and $m$ and $n$ represent the length of HI and HS, respectively (with $m = n$, since the samples are paired).
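The identifiability calculation can be illustrated with a minimal sketch, assuming the HI and HS samples are paired lists of equal length. The function name `identifiability` and the choice to return 0 for a constant input are assumptions made for illustration, not part of the original study (which used MATLAB).

```python
import math


def identifiability(hi, hs):
    """Pearson correlation coefficient between paired HI and HS samples."""
    if len(hi) != len(hs) or len(hi) < 2:
        raise ValueError("hi and hs must be paired and contain at least two samples")
    mean_hi = sum(hi) / len(hi)
    mean_hs = sum(hs) / len(hs)
    # Numerator: co-variation of HI and HS about their means.
    cov = sum((a - mean_hi) * (b - mean_hs) for a, b in zip(hi, hs))
    # Denominator terms: variation of each array about its own mean.
    var_hi = sum((a - mean_hi) ** 2 for a in hi)
    var_hs = sum((b - mean_hs) ** 2 for b in hs)
    if var_hi == 0 or var_hs == 0:
        # Assumption: a constant array carries no information, so report 0.
        return 0.0
    return cov / math.sqrt(var_hi * var_hs)
```

An HI that rises in proportion to HS scores 1, an HI that falls in proportion scores -1, and a constant HI scores 0 under the stated assumption.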

Monotonicity:
Another significant requirement of a good HI is monotonicity. As the fault level increases, so should the HI. While this is partially captured by correlation, a more direct metric of deviation from monotonicity helps identify HIs with strong correlation at extreme values but weak correlation at intermediate values. Note that monotonicity is mostly an important quality of an HI for an irreversible component degradation process, in which system damage is assumed to be cumulative. Although monotonicity is generally expected, it is not guaranteed in every system. If a component undergoes self-healing, the appropriate HI to model such behavior would be non-monotonic (Coble and Hines 2009); however, this is not a valid assumption for most systems. Some systems may also show localized self-recovery, or undesirable noise that cannot be eliminated may lead to non-monotonic behavior. In addition, for intermittent faults, monotonicity may not be required. Therefore, one should factor in the underlying assumptions about the degradation before applying this metric. The formulation of monotonicity is given by Eqn. 2. In Eqn. 2, $N$ is the total number of health states at which experimental data is collected, $HS_i$ is the ground-truth health state of experiment $i$, and $\{HI_i\}$ is the set of health indicators calculated for experiment $i$. Note that $HS$ is sorted so that $HS_i$ is larger than $HS_{i-1}$. Writing $\Delta_i$ for the change in median HI between experiments $i-1$ and $i$, the indicator $\mathbb{1}_{\Delta_i < 0}$ is 1 only when $\Delta_i < 0$; otherwise, it is 0. Since $HS$ is sorted in increasing order, $\mathbb{1}_{\Delta_i < 0}$ should be 0 if $\Delta_i > 0$. This metric was derived to penalize negative slopes between median HIs when plotted as a function of HS. A value of 0 indicates a perfectly monotone HI, and any non-zero value is a measure of deviation from monotonicity.
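The idea of penalizing negative slopes between median HIs can be sketched as follows. This is an illustrative form only: it assumes the penalty is the sum of the magnitudes of the negative slopes between consecutive medians, and the exact weighting in Eqn. 2 may differ. The function name `monotonicity_penalty` and the dictionary input layout are hypothetical.

```python
from statistics import median


def monotonicity_penalty(hi_by_hs):
    """Sum of the magnitudes of negative slopes between consecutive median HIs.

    hi_by_hs maps each ground-truth HS level to the list of HI values
    measured at that level. A return value of 0 means the median HI never
    decreases as HS increases.
    """
    levels = sorted(hi_by_hs)  # HS sorted in increasing order
    medians = [median(hi_by_hs[level]) for level in levels]
    penalty = 0.0
    for i in range(1, len(levels)):
        slope = (medians[i] - medians[i - 1]) / (levels[i] - levels[i - 1])
        if slope < 0:  # indicator: only negative slopes contribute
            penalty += -slope
    return penalty
```

Using the median rather than the mean keeps a few outlying tests at one health state from creating a spurious negative slope.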

Linearity:
The ideal HI is one that accurately estimates HS with no bias. An HI may have high identifiability and monotonicity, but if it yields a high error bias versus the HS (for example, by having a logarithmic trend vs. HS), it will not present as much value as an HI with ideal linearity.
The formulation used for linearity is given by Eqn. 3.
This captures the deviation of the median HI estimates from the ideal HI.
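A minimal sketch of such a deviation measure is given here, assuming the HI has already been regressed into the same units as HS so that the ideal response is the identity line HI = HS. The relative-deviation form, the function name `linearity_deviation` and the input layout are assumptions for illustration and may not match Eqn. 3 exactly.

```python
from statistics import median


def linearity_deviation(hi_by_hs):
    """Mean relative deviation of the per-level median HI from the ideal HI = HS.

    hi_by_hs maps each nonzero ground-truth HS level to the HI values
    observed at that level. A value of 0 means every median lies exactly
    on the identity line.
    """
    if not hi_by_hs:
        raise ValueError("at least one health state is required")
    deviations = [
        abs(median(values) - level) / abs(level)
        for level, values in hi_by_hs.items()
    ]
    return sum(deviations) / len(deviations)
```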

Compactness:
While identifiability, monotonicity and linearity capture the tendency of an HI to trend with HS, they are missing an assessment of the consistency of an HI for a given HS. Ideally, HI estimates for a constant HS will have a small variance. It has been observed that HIs often display heteroskedasticity, and variance tends to grow as the underlying fault worsens. Therefore, the measure of variability we employ is the average relative variance of the HI, captured by Eqn. 4.
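One way to express a relative spread that is averaged over health states is sketched here. As an assumption, the spread at each health state is taken as the population standard deviation divided by the magnitude of the mean HI at that state; the precise definition in Eqn. 4 may differ. The name `compactness` and the input layout are hypothetical.

```python
from statistics import mean, pstdev


def compactness(hi_by_hs):
    """Average relative spread of the HI across health states (lower is better).

    hi_by_hs maps each ground-truth HS level to the HI values observed at
    that level. Dividing by the level mean keeps the growth of spread with
    fault severity from dominating the score.
    """
    if not hi_by_hs:
        raise ValueError("at least one health state is required")
    ratios = []
    for values in hi_by_hs.values():
        centre = mean(values)
        if centre == 0:
            raise ValueError("relative spread is undefined for a zero-mean HI")
        ratios.append(pstdev(values) / abs(centre))
    return sum(ratios) / len(ratios)
```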

Estimation Error:
Finally, an overall summary metric of the predictive power of an HI is the mean absolute percentage error (MAPE) between an HI's estimation and the ground-truth HS. A lower estimation error indicates a more suitable HI. The formulation for estimation error is presented in Eqn. 5.
$$\mathrm{MAPE} = \frac{100}{K} \sum_{k=1}^{K} \left| \frac{HS_k - HI_k}{HS_k} \right| \qquad (5)$$
Here, $k$ is the index of a single output sample from the algorithm, $K$ is the number of output samples, $HI_k$ is the calculated health indicator, and $HS_k$ is the ground-truth health state for sample $k$.
In some senses, this metric combines the linearity and compactness metrics: an HI that scores well in estimation error must also score well in both linearity and variability, and vice versa. It must be recognized, however, that an HI with strong compactness but poor linearity may still be a very strong candidate, especially for a binary fault detection algorithm in which separation of health states is important but prediction of the health state is not. Such a result may indicate that a regression is required to map the HI to a more linear form.
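The estimation error can be illustrated with a short sketch of the standard MAPE calculation, assuming paired lists of HI estimates and nonzero ground-truth HS values. The function name `estimation_error` is hypothetical.

```python
def estimation_error(hi, hs):
    """Mean absolute percentage error between HI estimates and ground-truth HS."""
    if len(hi) != len(hs) or not hi:
        raise ValueError("hi and hs must be non-empty and of equal length")
    if any(truth == 0 for truth in hs):
        raise ValueError("MAPE is undefined when a ground-truth value is zero")
    total = sum(abs(truth - est) / abs(truth) for est, truth in zip(hi, hs))
    return 100.0 * total / len(hi)
```

For example, estimates of 110 and 90 against a ground truth of 100 in both cases give an error of 10 percent, even though the mean estimate is unbiased.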

Robustness/Sensitivity to Noise Factors:
Control and noise factors affect the response of a system. It is important to select an HI that will give high PHM system performance in all possible operating conditions. In automotive applications, for example, an HI is required to exhibit strong performance regardless of road surface, outside temperature, vehicle age and condition, and passenger load. SOH estimates across two levels of each noise factor were compared using a paired-sample t-test, or a Wilcoxon signed-rank test if the normality assumption was violated. In a recent study we proposed methods to quantify the robustness of HIs as we investigated the impact of tire type, tire pressure, and vehicle mass on a brake rotor health monitoring system (Kazemi, Garner, Drame, Du, and Sadjadi 2021). HIs that are robust to noise factors will have higher overall performance, and thus it is important to choose features or HIs that are robust.
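The paired comparison across two levels of a noise factor can be sketched with the standard library alone. This sketch covers only the paired t statistic; because the standard library has no t distribution, the critical value is supplied by the caller (for example, 3.182 for a two-sided test at the 0.05 level with 3 degrees of freedom). The function names and the pass/fail rule are assumptions for illustration, and the Wilcoxon alternative is not shown.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(level_a, level_b):
    """Return (t, degrees of freedom) for paired SOH estimates at two noise levels."""
    if len(level_a) != len(level_b) or len(level_a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(level_a, level_b)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    centre = mean(diffs)
    if spread == 0:
        # Identical differences: no shift gives t = 0, any shift is unbounded.
        t = 0.0 if centre == 0 else math.copysign(math.inf, centre)
    else:
        t = centre / (spread / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def is_robust(level_a, level_b, t_critical):
    """Treat the HI as robust when the paired difference is not significant."""
    t, _ = paired_t_statistic(level_a, level_b)
    return abs(t) < t_critical
```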
Other metrics that were considered but not included in this study were: computational cost, memory cost, number of interfaces and inputs, and calibration complexity. These metrics are important considerations for production implementation of an HI but are not related specifically to HI performance and are therefore out of the scope of this framework.

Normalization Approach
In the interest of comparing multiple health indicators to select the best candidate, it is useful to normalize all metrics to a similar scale so they can be compared. This section explains the normalization approach that was used to scale and shift the metrics used for ranking HIs to the range from zero to one, where 0 is the least suitable HI and 1 is the ideal HI. The metrics can then be averaged, possibly with different weighting, to calculate an overall relative performance score for each HI.
Of the five metrics proposed, only identifiability is bounded by definition. This metric can be simply normalized by the classic approach in Eqn. 6 below, which maps the maximum value to one and the minimum value to 0, with a linear relationship in-between. This is the simplest way to normalize this metric but other normalization approaches could also be applied.

$$x_{norm} = \frac{x - \min(x)}{\max(x) - \min(x)} \qquad (6)$$
The other four metrics are bounded below by zero and unbounded above, and the goal is to minimize them. Eqn. 6 is not suitable for normalizing these metrics because they are unbounded: any HIs with extremely poor results (and therefore large values) will skew the normalization. A slight adaptation that limits the influence of a wide tail on these metrics is applied, as described in Eqn. 7. In this normalization approach, a percentile threshold $p \in [0, 100]$ is chosen such that any value falling in the worst $p$ percent will be mapped to zero. A perfect score of zero will be mapped to 1, and anything between zero and the threshold will be linearly scaled between 0 and 1. Figure 1 shows a comparison of the two normalization approaches for the monotonicity metric. The plot of the raw metric demonstrates the problem, in which the outlier at 50 skews the normalization by Eqn. 6 to give nearly all HIs a perfect score of 1. Normalization by Eqn. 7 yields a more evenly distributed set of scores, with any HI in the bottom third ($p = 33$) of monotonicity score taking a normalized score of 0.
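The two normalization approaches and the weighted averaging step can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold is taken as the (100 - p)th percentile of the raw values with linear interpolation, and the function names, the handling of degenerate inputs and the weighting scheme are hypothetical rather than taken from the study.

```python
def normalize_bounded(values):
    """Min-max scaling: minimum maps to 0, maximum maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: no spread, no ranking information
    return [(v - lo) / (hi - lo) for v in values]


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def normalize_minimized(values, p):
    """Map 0 to 1 and the worst p percent to 0, scaling linearly in between."""
    threshold = percentile(values, 100 - p)
    if threshold <= 0:
        return [1.0 if v == 0 else 0.0 for v in values]
    return [max(0.0, 1.0 - v / threshold) for v in values]


def overall_score(scores, weights=None):
    """Weighted average of the normalized metric scores of a single HI."""
    weights = weights or [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```

With raw values `[0, 1, 2, 3, 50]` and `p = 25`, the threshold falls at 3, so the outlier at 50 scores 0 without compressing the scores of the remaining HIs.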

Case Study - Application in Brake Rotor Prognostics
In our previous work (Kazemi, Du, Drame, Dixon, and Sadjadi (2019), Du, Mai, Kazemi, and Sadjadi (2020), Jalaliyazdi, Garner, Sadjadi, and Kazemi (2021)), we developed a methodology to monitor the health of brake rotors and generated health indicators to differentiate between a healthy brake rotor and a rotor degraded by thickness variation. In this paper, we apply the HI ranking framework to that study to demonstrate its effectiveness in selecting the best performing features. The following briefly discusses the brake rotor fault detection methodology, the experimental setup, fault injection and data collection, and the features generated from the signals of interest.

Experimental Setup and Data Collection
To create faulty rotors, healthy rotors were machined to produce varying levels of thickness variation. "First-order" and "second-order" rotor thickness variation (RTV) fault profiles were generated. A first-order RTV profile has one maximum thickness and one minimum thickness, so the thickness vs. angle curve approximately resembles a single period of a sinusoid. A second-order profile has two maxima and two minima, approximately resembling two periods of a sinusoid. In total, 25 faulty rotors were created. Over 2000 vehicle-level road tests were conducted, and data was collected using multiple GM production vehicles. In total, 2359 data sets were generated. 165 test cases were conducted with healthy rotors (i.e. RTV = 5-15 µm), and the remainder of the tests were performed with faulty rotors with varying levels of RTV in the range of 20 to 180 µm.

Figure 2 Explored Health Indicators
The main signals of interest that were recorded were Master Cylinder Pressure (MCP), Longitudinal Acceleration (AX), Vehicle Speed (VS), Wheel Speed (WS), and Brake Pedal Position (BPP). CANalyzer was used to record CAN signals at the rate of 100 Hz. Data was analyzed using MATLAB 2017b.

Health Indicators
Various combinations of pre-processing (e.g. detrend, derivative, phase domain transformation) and post-processing (e.g. variance, envelope, skewness, kurtosis, amplitude of average order spectrum) methods (Kazemi et al. 2019) were applied to the MCP, WS sensor and AX signals in both the time and frequency domains, which resulted in hundreds of health indicators. Regression analyses were then performed on these features to model the wheel RTV. To keep the focus on the HI ranking framework, and for illustration purposes, only the results from regressing the HIs to front left wheel RTV are presented.
Figure 2 summarizes the explored signal processing methods. Examples of the features generated include: the median difference between the upper and lower envelope of the detrended MCP during braking; the local peak of the order amplitude spectrum of the detrended AX at various harmonics (e.g. orders 1, 2, 3, etc.) during braking and non-braking actions; and the envelope of the wheel angular velocity during braking.
The two parameters needed to calculate each of the metrics introduced in Section 2.1 are HI and HS. HI is the feature or health indicator discussed in this section, and HS is the health stage of the ground truth, which was defined as the maximum RTV of the four brake rotors on the vehicle in each test case.

RESULTS AND DISCUSSION
Since robustness to noise factors was a strict functional requirement, we only considered HIs that were robust to the noise factors of tire type, tire pressure and passenger weight. This eliminated several HIs and narrowed the set down to 264 HIs. Figure 3 shows the overall results of applying the HI ranking framework to the 264 HIs developed to determine the SOH of the brake rotors. The normalized performance metrics for identifiability, linearity, monotonicity, variability, and relative estimation error are shown for all of the HIs, sorted in order of importance based on the average of the metrics used to rank them. The top performing HI was determined to be the total peak value of the average order spectrum of the detrended MCP. A plot of HI versus Health Stage (HS) for this feature is shown in Figure 4, in which the raw HIs, the median of the distribution of the HIs, and the ideal response that predicts the RTV of the wheel are presented. Results showed that the peak of the average order spectrum of the detrended MCP signal outperformed other HIs by having a higher correlation to the GT, less variability, a stronger monotonic trend and lower estimation error. Performance analysis of the algorithm confirmed that this HI provided better separation between healthy and faulty rotors.
An example of an HI, the kurtosis of the wheel angular speed in the phase domain, which showed the most linear response, is illustrated in Figure 5. Even though this HI has the perfect normalized score of 1 for linearity, it has an overall average score of 0.84 and is ranked 55 out of the 264 analyzed HIs. Figure 7 shows the plot of HI vs. HS for the 25th percentile (Wheel Speed maximum correlation with AX) and the 50th percentile (the peak order of the average order spectrum for wheel speed at order 5) performers, respectively.
Visual inspection reveals that the performance of the HI in tracking the RTV of the wheel deteriorates, which is in line with the poor performance of these HIs on the metrics used to rank them. In particular, Identifiability and Relative Error were significantly worse for these HIs compared to the top performing HI. The performance metrics for four HIs are summarized in Table 1. The best overall performing HI, the peak amplitude of the average order spectrum of MCP, showed a normalized score of 1 for the four categories of Identifiability, Monotonicity, Variability and Estimation Error.

Figure 8 Correlation analysis between various performance metrics used
It appears that Monotonicity and Estimation Error played the most significant role in choosing an HI that performs better in detecting faulty brake rotors. To gain more insight into the degree of similarity between the metrics used, and to answer the question of whether a subset of these metrics can be used, we performed a correlation analysis between them, which is presented in Figure 8.
As can be seen, identifiability and relative error are strongly correlated. In addition, linearity and monotonicity are highly correlated. This may indicate that only one of the metrics in each of these two categories can be used. Note, however, that this may not be the case for all HIs under investigation. All 264 HIs considered in this study had been linearly regressed to estimate the HS. Had this preprocessing step not been taken, it is likely that the correlations between linearity and monotonicity, and between identifiability and relative error, would both be reduced.
The three metrics that were selected were Monotonicity, Identifiability and Robustness. Top performing HIs based on the average of these three metrics showed excellent classification performance in detecting thickness variation levels of 20 μm and larger, as evidenced by the ROC area of 1 illustrated in Figure 9. Note that in our case study, the proposed approach to detect degraded rotors is formulated as a classification problem, and therefore the ROC detection score was used to evaluate the choice of HIs. Figure 9 suggests that the proposed metrics can be used in selecting HIs and features for early fault detection. Depending on the application, one or more of these metrics could be useful in selecting HIs to perform detection, classification, regression, etc.

Figure 9 Top performing HIs selected from the framework act as the best classifiers in detecting healthy and faulty rotors

CONCLUSION
Health indicator performance measures are needed when developing diagnostics and prognostics solutions for various chassis components (e.g. brake rotors, wheel bearings, etc.). It is advantageous to have a tool to select top performing health indicators and to help compare competing models. We developed a framework that can be used for this purpose and tested it on an application related to brake rotor health monitoring. We showed that there is a strong link between the HIs selected by the framework and the performance of those HIs in accurately estimating the health of the brake rotor system. Results showed that Identifiability, Compactness, Monotonicity, Linearity, Estimation Error and Robustness (sensitivity to noise factors) can be used in selecting features and indicators to determine the state of health of a brake rotor system.