Review of PHM Data Competitions from 2008 to 2017: Methodologies and Analytics

Data-driven approaches have recently gained popularity in the Prognostics and Health Management (PHM) community because of their scalability, reconfigurability and reduced development cost. As these approaches have flourished, the data competitions hosted by the PHM Society over the last ten years have contributed a valuable repository of public resources for benchmarking and improvement. To better define the directions for future development, this paper reviews the cutting-edge PHM methodologies and analytics based on the data competitions over the last decade. The goal of PHM and its major research tasks are first stated; the methodologies and analytics for PHM practice are then summarized in terms of failure detection, diagnosis, assessment and prediction, and the applications of PHM in various industrial sectors are highlighted. The data competitions of the last ten years serve as examples and case studies to support the ideas presented in this paper. Based on these discussions, the current challenges and future opportunities are pointed out, and concluding remarks at the end of the paper summarize the current achievements and anticipate future trends.


INTRODUCTION
PHM, as an emerging engineering discipline, mainly aims to detect, diagnose and predict machine failures (Lee et al., 2014). An effective PHM system is expected to provide early detection and isolation of incipient fault precursors, and subsequently to predict the future propagation of machine failures and the remaining useful life (RUL). Over the past decade, the use of artificial intelligence tools or data-driven approaches to fulfill PHM tasks has gained popularity due to their simplicity, scalability and reduced development cost (Jia, Jin, et al., 2018). Compared with physics-based models, data-driven approaches require less domain knowledge and are flexible in consolidating expert experience. Moreover, once a data-driven model is properly trained, using it is computationally more efficient than using a complex physical model. More importantly, standardized toolboxes can be developed for data-driven models, which can significantly accelerate the development cycle; users can quickly grasp how to use these tools after a short period of training. Despite these merits, further development of data-driven tools for PHM needs an open community and a sufficient amount of public data for benchmarking.
As data-driven approaches have flourished, the data competitions hosted by the PHM Society since 2008 have contributed a valuable repository of public resources for benchmarking and improvement, including many open-source datasets and successful engineering applications. Across these ten years of competitions, a wide range of PHM research topics and engineering applications has been investigated. It is therefore timely to review the achievements of the past 10 years and to discuss future opportunities.
To this end, this paper reviews the cutting-edge PHM methodologies and analytics based on the data competitions over the last decade. The goal of PHM and its major research tasks are first stated; the methodologies and analytics for PHM practice are then summarized in terms of failure detection, diagnosis, assessment and prediction, and the applications of PHM in various industrial sectors are highlighted. The data competitions of the last ten years serve as examples and case studies to support the ideas presented in this paper.
The rest of this paper is organized as follows. Section 2 revisits the data competitions of the past 10 years and highlights the major PHM research tasks by reviewing them. Section 3 summarizes and reviews the methodologies and analytics for PHM, using the data competitions of the past 10 years as case studies. Section 4 presents the lessons learned and the future trends. Concluding remarks are given in Section 5.

Major tasks in the data competitions
By reviewing the last 10 years' data competitions, the major research tasks in PHM can be summarized as in Figure 1. These major tasks are as follows.
Detection aims to identify whether a failure has occurred in an engineering system, without knowing the root cause. In a detection problem, a binary outcome is expected to indicate whether a failure has occurred.
Diagnosis aims to pinpoint the one or several root causes of the detected failures, so that corrective actions can be arranged accordingly. In a diagnosis problem, a specific failure type is expected to be assigned to the detected failure.
Assessment aims to evaluate the risk or health level of a machine based on its recent behavior. For machine life prediction, assessment is often employed to describe the machine degradation process. For fault detection, a failure is detected when the risk exceeds a predefined threshold.
Prognosis mainly predicts the future health states and the remaining useful life of the system.

Figure 1. Major research tasks in PHM
A summary of the research tasks in the last 10 years' data competitions is presented in Table 1, and a more detailed review of the data competitions is presented in the Appendix. It is found that fault detection and diagnosis are normally required at the same time. This is because fault detection only raises an alarm for a potential failure without recommending any maintenance action, whereas fault diagnosis links the underlying problem to a set of observable symptoms so that a detailed repair procedure can be followed. However, fault diagnosis is not necessary for some simple devices, such as the anemometers (PHM Society 2011), because simply replacing the parts fixes the problem once a failure is detected.
Prognosis and Health Assessment (HA) are core research topics in PHM, and they are often preferred for engineering systems that have a slow degradation process, such as batteries (IEEE 2014), cutting tools (PHM Society 2010) and aircraft engines (PHM Society 2008). What these systems have in common is that the machine degradation state can be inferred as a monotonic trend by modeling the operational data. Based on this degradation trend, the RUL and the future degradation can be further predicted. Prognosis and HA therefore greatly enhance information transparency for optimizing operation and maintenance strategies, which is especially useful for geographically distributed assets and highly automated systems.
The study of Virtual Metrology (VM) in PHM Society 2016 aims to predict the Material Removal Rate (MRR) in the semiconductor Chemical Mechanical Polishing (CMP) process (Jia, Di, et al., 2018). In the semiconductor industry, VM is a key enabler of advanced process control, allowing it to better account for the degradation of the materials in use and the drift of machine condition during the manufacturing process (Jia, Di, et al., 2018; Kao, Cheng, Wu, Kong, & Huang, 2011). In Figure 1, the research task in PHM Society 2016 is listed as others, since it does not belong to any of the previously mentioned research tasks in PHM.

A brief review of the data driven models
By reviewing the data competitions of the last 10 years, the commonly used data-driven models for PHM investigation are summarized in Table 2. Table 2 lists the learning models for each major PHM task and describes the model specifications by giving the model inputs and outputs in the training and testing phases.
In this paper, the methodologies for PHM are summarized in three sub-groups: M1: the (semi-)supervised learning models for PHM; M2: the unsupervised health assessment and fault detection; M3: the unsupervised RUL prediction and health prediction.
In methodology M1, labeled training data samples or data clusters are employed to establish a function or mapping relationship between the input feature matrix and the desired output labels. In the testing phase, the trained models are deployed to label the testing samples and indicate the machine health conditions. Methodology M2 evaluates the machine health and detects potential failures in an unsupervised fashion. Normally, the data-driven models in this methodology fulfill two major tasks: (1) output a risk score based on the known baseline (healthy data) to indicate the machine health or risk level quantitatively; (2) raise an alarm for a potential machine failure when the risk level exceeds a predefined threshold. Methodology M3 mainly predicts the future machine health and the remaining useful life (RUL) without knowing the underlying degradation pattern for supervised learning. In the current literature, the learning tasks in M3 include two main steps: (1) learn the underlying degradation trend of the machine from the run-to-failure (R2F) data used for model training; (2) predict the future machine health based on the recent machine behavior and the prior knowledge of the machine degradation pattern. The latter step is usually carried out by time series extrapolation.
It is also worth mentioning that the training input features and the testing input features in Table 2 should have the same dimensionality. Depending on the learning task and the engineering problem, both can be either an individual data sample (data vector) or a set of samples observed in a certain time window (data distribution). Normally, the training labels are continuous or categorical numbers that describe certain health-related information.
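As a rough illustration of the two tasks in methodology M2, the following Python sketch scores samples against a healthy baseline and raises an alarm above a threshold. The standardized Euclidean distance, the 99% quantile threshold, the function names and the toy data are all assumptions made for illustration; they are not taken from any competition entry.

```python
import math
import statistics


def fit_baseline(healthy):
    """Per-feature mean and standard deviation of the healthy (baseline) samples."""
    columns = list(zip(*healthy))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    return means, stds


def risk_score(sample, means, stds):
    """Standardized Euclidean distance of one sample from the baseline."""
    return math.sqrt(
        sum(((x - m) / s) ** 2 for x, m, s in zip(sample, means, stds))
    )


def pick_threshold(healthy, means, stds, quantile=0.99):
    """Threshold taken as an upper quantile of the baseline risk scores."""
    scores = sorted(risk_score(s, means, stds) for s in healthy)
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]


# Hypothetical healthy data: six samples with two features each.
healthy = [[1.0, 10.0], [1.1, 10.2], [0.9, 9.8],
           [1.0, 10.1], [1.05, 9.9], [0.95, 10.0]]
means, stds = fit_baseline(healthy)
threshold = pick_threshold(healthy, means, stds)

# Task (1): risk score for recent observations; task (2): alarm on threshold.
recent = [[1.0, 10.0], [1.6, 11.5]]
scores = [risk_score(s, means, stds) for s in recent]
alarms = [score > threshold for score in scores]
```

In this toy setting the first recent sample sits at the baseline mean and the second lies far outside it, so only the second raises an alarm. A residual-based model or a different distance could be substituted without changing the overall flow.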

METHODOLOGY & ANALYTICS
In this section, the three methodologies described in the previous section are detailed. The PHM data competitions in Table 1 are used as examples to illustrate the ideas.

(Semi-)Supervised learning models for PHM
The methodology M1 is outlined in Figure 2. In the training phase, the learning models are trained by taking the training feature matrix and the label information as input. For the different engineering problems in Table 2, the learning models can be regression algorithms, classification or clustering techniques, as shown in Table 3. In the testing phase in Figure 2, the label information for the unlabeled testing samples is computed, and the results from different models can be further fused by multiple strategies, as shown in Figure 2. The (semi-)supervised learning methodology covers the majority of the learning tasks in PHM, which include M1.a~g as in Table 1 and Table 3.
The methodology M1.g predicts the RUL of the machine by directly establishing a functional relationship between the monitoring data and the remaining operation cycles. In this application scenario, the training data contain several run-to-failure (R2F) datasets for model training. For an individual training sample in the R2F data, the remaining operation cycles of the machine can be obtained simply by counting the number of operation cycles left before machine failure. This methodology simplifies the RUL prediction problem significantly. However, its shortcoming is also obvious: the degradation trend of the machine is assumed to be linear over operation cycles, which may seriously limit the prediction accuracy. As reported in PHM Society 2008 and 2012, this supervised RUL prediction is found to be less accurate than the more advanced filtering techniques discussed later. Nevertheless, the method remains valuable for its simplicity and efficiency, and it can be used to establish a baseline prediction accuracy for further improvements.
Comparing the distance-based and the residual-based approaches, the distance-based approaches normally take vectors as input and compute an outlier score for each individual input vector. In the residual-based approach, the model can take both vectors and matrices (or distributions) as input, so that the residuals or the statistics of the residuals can demonstrate the deviation of recent observations from the baseline. If the recent observation clearly deviates from the baseline, then a failure is detected and a larger risk score is assigned. In both the distance-based and the residual-based approach, the threshold for fault detection can be tuned with the Receiver Operating Characteristic (ROC) curve by accounting for the tradeoff between the fault detection rate (FDR) and the false alarm rate (FAR).
The methodology M3 for unsupervised prognosis is outlined in Figure 4. Different from the supervised prognosis M1.g in Figure 2, the pattern of the degradation trend (DT) of the machine is not known beforehand, and it may not be linear over operation cycles or time. Therefore, the flow chart in Figure 4 first needs to build an unsupervised HA model to uncover the underlying degradation pattern of the machine. The HA model in Figure 4 utilizes the methodology M2 in Figure 3 to derive the DT of the machine from the R2F data in the training set. In the prediction step, three different prediction methods are available: the similarity-based approach, the regression or curve fitting approach, and the state space model (SSM).
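For contrast with the unsupervised flow, the labeling step of the supervised methodology M1.g can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each R2F record is reduced to one hypothetical scalar feature per cycle, the last entry is taken as the failure cycle, and an ordinary least-squares line stands in for whatever regressor a participant might actually use.

```python
def rul_labels(n_cycles):
    """Remaining cycles before failure for each cycle of one R2F record."""
    return [n_cycles - 1 - k for k in range(n_cycles)]


def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical R2F records: one scalar health feature per operation cycle.
r2f_records = [
    [0.10, 0.22, 0.29, 0.41, 0.50],
    [0.05, 0.20, 0.35, 0.52],
]

# Build the training set by counting the cycles left in each record.
features, labels = [], []
for record in r2f_records:
    features.extend(record)
    labels.extend(rul_labels(len(record)))

# Map the feature directly to the remaining cycles, then query a new value.
slope, intercept = fit_line(features, labels)
predicted_rul = slope * 0.30 + intercept
```

Because the labels decrease by exactly one per cycle, the linear-degradation assumption noted in the text is built into the training targets themselves, which is why this approach is best treated as a baseline.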

Unsupervised health assessment and fault detection
The similarity-based approach employs the historical DTs in the training library as simulations to predict the future degradation and RUL. The advantages of this approach are its efficiency and simplicity. However, it requires a significant number of R2F datasets to obtain a reasonably accurate prediction, and it fails to express the uncertainty of the prediction.
Another simple approach for prognosis is to extrapolate the DT using curve fitting or regression techniques. In this scenario, the DT obtained from the HA module is treated as a time series, and a mapping relationship is established between the time index and the health value or confidence value. Commonly used time series extrapolation methods include Auto-Regressive Moving Average (ARMA) models, support vector regression (SVR), Gaussian Process Regression (GPR), etc. Although these regressors work well for simple cases in which the DT of the machine can be represented by a certain basis function, their prediction accuracy deteriorates quickly for larger prediction horizons.
To further enhance the prediction accuracy in more complex situations, Kalman filters (KF) and particle filters are commonly employed. In the literature, these filtering techniques are usually used together with a parameterized state space model (SSM) for long-term prediction. As summarized in (J.K. Kimotho, Meyer, & Sextro, 2014), these parameterized SSMs include the commonly used exponential, logarithmic, log-linear, linear and polynomial models.
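The curve fitting route can be illustrated with a short Python sketch that fits an exponential basis function to a degradation trend and extrapolates it to a failure threshold. The log-linear least-squares fit, the failure threshold of 1.0, the function names and the synthetic noise-free data are all assumptions chosen for illustration; this is not ARMA, SVR, GPR or a filtering method, and it assumes a positive, increasing health index.

```python
import math


def fit_exponential(times, health):
    """Least-squares fit of health ~ a * exp(b * t) via a line fit on log(health)."""
    logs = [math.log(h) for h in health]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    stt = sum((t - mean_t) ** 2 for t in times)
    stl = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    b = stl / stt
    a = math.exp(mean_l - b * mean_t)
    return a, b


def predict_rul(a, b, current_time, failure_threshold):
    """Cycles until the fitted curve crosses the threshold (0 if already crossed).

    Assumes b > 0, i.e. the degradation index grows over time.
    """
    time_of_failure = math.log(failure_threshold / a) / b
    return max(0.0, time_of_failure - current_time)


# Synthetic degradation trend observed over ten cycles (illustrative only).
times = list(range(10))
health = [0.1 * math.exp(0.2 * t) for t in times]

a, b = fit_exponential(times, health)
rul = predict_rul(a, b, current_time=times[-1], failure_threshold=1.0)
```

With real, noisy DTs the extrapolated crossing time becomes increasingly sensitive to the fitted parameters as the horizon grows, which is consistent with the accuracy loss at long horizons noted above.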

Others
The prediction of MRR in PHM Society 2016 does not fit the major tasks of PHM. However, this prediction task is important for advanced process control, since it allows the controller to account for machine degradation when setting the recipe parameters (Di, Jia, & Lee, 2017). The analytics used in PHM Society 2016 are mainly regression techniques, and the engineering problem behind this data competition resembles that of PHM Society 2010 on milling machine cutter wear estimation; the former is a virtual metrology problem and the latter is a virtual sensing problem.
Virtual metrology (VM) and virtual sensing are closely related but differ in emphasis. Virtual metrology is normally quality oriented and aims to enhance product quality by identifying the important quality indicators. In semiconductor fabrication, for example, VM models are widely used to identify faulty wafer runs. A VM model regards the health indicator as a hard-to-measure quantity and predicts it from easy-to-measure process variables. This concept resembles the idea of virtual sensing, which also aims to estimate a hard-to-measure quantity from easy-to-measure variables. However, virtual sensing usually requires real-time online implementation, and it has been extensively discussed for decades.

DISCUSSION AND PROSPECTS
The methodologies and analytics reviewed in this paper are mainly data-driven approaches for PHM applications. These methods are highly scalable and can be easily replicated across different engineering applications. The main contribution of this review is to give readers a systematic overview of the current data-driven or machine learning models in PHM applications. The mapping between the machine learning tasks and the major PHM tasks is established and reviewed.
Although these data-driven models are now widely studied in PHM, several open topics still need to be explored further in the future.

• The presence of multiple or dynamic working regimes. A residual-clustering-based methodology is proposed in (Siegel, 2013) to explore this topic, using robotic arms and a wind turbine drive train as examples to illustrate the effectiveness of the approach.
• Data quality is another important topic that needs further investigation. A toolbox is expected that allows users to decide quickly whether their data hold value for PHM investigation. Related discussions can be found in (Y. Chen, 2012; Y. Chen, Zhu, & Lee, 2013), which mainly investigate the diagnosability of the system. More recently, (Jia et al., 2017; P. Li et al., 2018) propose a systematic methodology to evaluate the data suitability for PHM from the aspects of data detectability, diagnosability and prognosability.
• Fleet-based prognostics is another important topic to explore. It applies when a large amount of data is available from a fleet of similar machines. The historical data from the machine fleet can help establish a strong database for data mining, but how to rely on the fleet data for health prognosis is still an open question for the PHM community.
• Prognostics-based maintenance strategy optimization is important for converting health-related information into value. Prognostics-based maintenance scheduling for an offshore wind farm is investigated in (Van Horenbeek, Van Ostaeyen, Duflou, & Pintelon, 2012), where the added value of a prognostics-based maintenance policy is justified.

CONCLUSION
In this paper, the PHM data competitions from 2008 to 2017 are revisited. The methodologies and analytics employed for these PHM problems are reviewed and summarized. Based on the discussion in this paper, the methodologies for PHM are grouped into three categories: (1) M1: the methodology for (semi-)supervised learning for PHM, as shown in Figure 2; (2) M2: the methodology for unsupervised HA and fault detection, as shown in Figure 3; (3) M3: the methodology for unsupervised health prognosis, as shown in Figure 4. After reviewing the methodologies and analytics, the lessons learned from these data competitions are pointed out and the future trends of PHM are briefly discussed.

Figure 2. Methodology for the (semi-)supervised PHM
The methodology M1.a addresses the health assessment problem using regression techniques. A good example is the PHM Society 2010 data competition, in which the participants were asked to evaluate the cutter wear in milling machines. In the competition, the cutter wear was measured by a LEICA MZ12 microscopy system and was given in the training data for model construction. For the result submission, the participants were asked to build a data-driven model from the training data to replace the expensive photographing device for tool wear measurement. In this investigation, the monitoring data consist of vibration, force and acoustic emission signals. The analytics applicable to this investigation are tabulated in Table 3. Most of the algorithms are regression techniques that map the monitoring data to the tool wear indices. In the literature (SreerupaDas, Hall, Herzog, Harrison, & Bodkin, 2011; Peel, 2008), rather accurate estimates are achieved with these regression algorithms.
The methodology M1.b aims to evaluate the operation risks with known healthy and faulty data samples. In this application, probabilistic classifiers such as logistic regression (LogiReg) and Naive Bayes (NB) classifiers are usually employed. For this type of classifier, the training labels are categorical integers, but the testing output indicates the probability that the testing sample belongs to a certain class. For risk assessment and fault detection, the training labels are normally binary, representing healthy and faulty. The testing output of these models indicates the operation risk. By thresholding the risk indicators properly, machine failures can be further detected. Examples of methodology M1.b include the LogiReg used to assess the engine degradation in PHM Society 2008 (Tianyi Wang, 2010) and the NB classifier utilized in PHM Society 2013 to detect a non-nuisance case (Katsouros, Papavassiliou, & Emmanouilidis, 2013).
The methodologies M1.c~f can be discussed together, since fault detection and diagnosis (FD&D) are normally required together in practice. The analytics commonly used for FD&D are clustering or classification techniques. The major difference between clustering and classification is that the label information is assigned to individual data sample for

Figure 3. Methodology for unsupervised HA and FD
The methodology M2 is outlined in Figure 3. In the setting of methodology M2, the baseline data, which represent the healthy behavior of the machine, are utilized to establish a baseline model. In the testing phase, the expected outcome of the model includes: (1) a health/risk score to demonstrate the machine health or risks. For the machine degradation with a trend over time, the estimated degradation trend (DT) is

Table 1. Research tasks for the data competition 2008-2017

Table 2. Data driven models for PHM research tasks

Table 3. Examples and analytics for the methodology M1

Table 4. Examples and analytics for the methodology M2