Methods to Improve the Prognostics of Time-to-Failure Models

Machine learning (ML) methods for prognostics and health management (PHM) have begun to be developed directly at the platform level of autonomous and autonomic systems. Remaining-useful-life (RUL) estimation, also known as time-to-failure (TTF) estimation, using streaming sensor data is critical for PHM because it helps decide and schedule appropriate courses of action (COAs). This work casts the RUL-estimation problem as a classification problem over a finite time horizon. Rather than using a winner-take-all rule, we propose a top-K estimator that uses the RUL values corresponding to the K largest probabilities yielded by the classifier. The top-K RUL values can be used to drive the execution of conservative or aggressive PHM strategies, or be tracked over time to develop robust RUL estimators that leverage the history of RUL estimates. The performance of the proposed RUL estimators is illustrated on a dataset from NASA's Prognostics Center of Excellence.


Edward Baumann et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. DISTRIBUTION A. Approved for public release: distribution unlimited.

INTRODUCTION
Modern manned and unmanned vehicles are composed of complex systems that must work together to, at a minimum, transform a fuel source into propulsion, provide navigation capabilities, and provide a rudimentary set of safety features. A malfunction of any individual component within these systems may trigger a cascading failure that could jeopardize the safety of the system and its ability to accomplish its intended purpose. Failures can be broadly characterized as obsolescence, catastrophic, or degradation failures. As parts pass their manufacturing end-of-life period, the lack of replacement parts forces subsystems or entire platforms to become obsolete. Sudden failures, such as the loss of a stall sensor during takeoff, have immediate catastrophic consequences. Degradation occurs when the functionality of a system becomes gradually compromised; left unattended, degradation failures can lead to catastrophic system failures. System degradation can be identified by continuous monitoring of the system state as part of a system-wide Condition-Based Maintenance (CBM) strategy. CBM can be further extended to forecast the future system state, as in Condition-Based Maintenance Plus strategies, via advanced Prognostics and Health Management (PHM) applications.
Prognostics is the process of correlating and processing sensor data to estimate the RUL, also known as the time-to-failure (TTF), based on the history of system states (Patil, Das, Goebel, & Pecht, 2008). Detecting current and future machinery degradation can be used to predict the RUL in support of autonomous and autonomic systems. Autonomy allows missions to be carried out without direction from humans, whereas autonomicity allows the mission to be completed under self-managed operation (Sterritt, 2009). Both autonomy and autonomicity require adequate situational awareness of the system's own state to make effective mission decisions.
Traditional RUL estimation approaches have been based on statistical survival analysis, which attempts to characterize the probability of survival of a system up to a given time (Kumar & Klefsjö, 1994). This family of methods often models the survival probability with a Weibull distribution, whose parameters must be estimated for a given system, and uses its expected value as a RUL estimate (Jing & Min, 2016). ML and artificial intelligence methods are widely used in autonomous and autonomic PHM systems for RUL estimation and fault detection (van der Laan & Rose, 2011). Many different prognostic measures have been applied, with varying degrees of success, to determine the RUL of individual systems and whole platforms. Their operational success has depended largely on the failure modes considered and the sensor capabilities available (Saxena et al., 2008).
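As a minimal sketch of the survival-analysis approach, the Python snippet below evaluates a two-parameter Weibull model, assuming its shape $k$ and scale $\lambda$ have already been estimated for the system. It computes the survival function $S(t) = \exp(-(t/\lambda)^k)$ and the expected time to failure $\lambda\,\Gamma(1 + 1/k)$. The mean residual life at a given age, $\int_t^\infty S(u)\,du / S(t)$, is included as a commonly used conditional variant; it is an assumption of this sketch, not a step stated in the text, and the function names are hypothetical.

```python
import math

def weibull_survival(t: float, shape: float, scale: float) -> float:
    """S(t) = Pr(T > t) for a two-parameter Weibull distribution."""
    return math.exp(-((t / scale) ** shape))

def weibull_mean_life(shape: float, scale: float) -> float:
    """Expected time to failure E[T] = scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)

def weibull_mean_residual_life(age: float, shape: float, scale: float,
                               horizon_factor: float = 20.0,
                               steps: int = 20000) -> float:
    """E[T - age | T > age] = integral_age^inf S(u) du / S(age).

    The improper integral is truncated at age + horizon_factor * scale and
    approximated with the trapezoidal rule.
    """
    upper = age + horizon_factor * scale
    h = (upper - age) / steps
    total = 0.5 * (weibull_survival(age, shape, scale)
                   + weibull_survival(upper, shape, scale))
    for i in range(1, steps):
        total += weibull_survival(age + i * h, shape, scale)
    return h * total / weibull_survival(age, shape, scale)
```

For $k = 1$ the model reduces to an exponential distribution, whose mean residual life equals $\lambda$ at every age (memorylessness), which makes a convenient sanity check.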
The advent of advanced data collection and processing capabilities in a variety of autonomous systems has motivated the development of ML methods that use time-series data for RUL estimation; see (Aggarwal et al., 2018) and references therein. Two common ML-based approaches to RUL estimation are: (1) to cast RUL estimation as a regression problem in which a single output represents the predicted RUL of the system; or (2) to build a classifier in which each class represents a RUL estimate for the system. The process of using ML for RUL estimation can be represented as a sequence of steps comprising data collection, data preprocessing, and classification. As depicted in Fig. 1 for the case in which a classifier is used, the input data comprise measured time-series data segments. These data are first processed to remove redundant information, reduce noise, and extract aggregate features useful for classification. Next, the data are normalized and standardized to mitigate the numerical instabilities that often affect the training of the classification block. Finally, the preprocessed data are passed to the classification block, which yields a classification label for the time-series data segment. Although many classification algorithms have been developed, such as decision trees, support vector machines, and K-nearest neighbors (Hastie, Tibshirani, & Friedman, 2017), applying ML methods to time-series data remains challenging because of the high dimensionality inherent to the data. Long short-term memory (LSTM) networks, a type of recurrent neural network, have recently yielded state-of-the-art results for classification and forecasting of time-series data (Vincent & de Brebisson, 2015; Sezer, Gudelek, & Ozbayoglu, 2020).
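The preprocessing stage of Fig. 1 can be sketched as follows, assuming a raw recording is stored as a (time, sensors) NumPy array. The moving-average smoother, the window length, the stride, and the z-score standardization are illustrative choices, not the specific steps used in this work.

```python
import numpy as np

def moving_average(series: np.ndarray, width: int) -> np.ndarray:
    """Per-sensor moving-average smoother applied to a (time, sensors) array."""
    kernel = np.ones(width) / width
    return np.stack([np.convolve(series[:, j], kernel, mode="valid")
                     for j in range(series.shape[1])], axis=1)

def segment(series: np.ndarray, window: int, stride: int = 1) -> np.ndarray:
    """Split a (time, sensors) array into (num_segments, window, sensors) segments."""
    starts = range(0, series.shape[0] - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])

def standardize(segments: np.ndarray, mean=None, std=None):
    """Z-score each sensor; compute mean/std from these segments unless both are given."""
    if mean is None or std is None:
        mean = segments.mean(axis=(0, 1))
        std = segments.std(axis=(0, 1))
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero for constant sensors
    return (segments - mean) / safe_std, mean, std
```

Returning the training-set mean and standard deviation lets the same statistics be reused when standardizing test segments.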
In this paper, we model the RUL estimation problem as a classification problem over a finite time horizon. The framework developed in this paper can be applied to any classifier whose output is a probability distribution over classes. Each classification label corresponds either to a RUL estimate, if a failure is predicted to occur within the time horizon considered, or to a no-failure-expected indicator otherwise. Rather than considering a single point RUL estimate, our approach uses the top-K (K largest) probabilities to identify the most likely RUL values. Our studies reveal that a RUL estimator that always selects the RUL value associated with the class assigned the largest probability by the classifier, i.e., one using the winner-take-all rule, can yield inconsistent results over time. The top-K RUL estimator provides a top-K probability profile whose RUL values can be averaged to mitigate the estimation ambiguity of the winner-take-all approach. Lower and upper RUL bounds can also be obtained by applying minimum and maximum operations to the top-K RUL values. These bounds are used to develop aggressive and conservative PHM policies and play a critical role in the timely and effective execution of fault-mitigation behaviors.
The main contributions of this work are as follows: (i) it proposes an approach to RUL estimation based on a classification framework that combines multiple single-point estimates into a family of estimators, supporting the selection of different mitigation behaviors according to the aggressiveness desired for the PHM application; (ii) it develops a family of model-comparison metrics that capture false-negative and false-positive counts and allow control over criticality from both the RUL-estimation and autonomy-software-integration perspectives; and (iii) it proposes a method for processing RUL estimates to mitigate inconsistencies across sequential estimation periods.
The paper is organized as follows. Section 2 defines the general problem. Section 3 introduces a quantized RUL estimator and develops a family of top-K RUL estimators. Section 4 discusses how to evaluate multi-class temporal predictions. Section 5 presents numerical results of the proposed method applied to a real data set using an LSTM classifier. Section 6 provides a method for the sequential processing of RUL estimates. Finally, Section 7 concludes the paper and discusses future research directions.

PROBLEM FORMULATION
We consider a system with $S$ sensors, where each sensor generates a time series of measurements $(x_{s,t} : t = 1, 2, \ldots)$ of arbitrary length, with $x_{s,t} \in \mathbb{R}$ denoting the measurement obtained from the $s$-th sensor at time index $t$. Measurements are taken synchronously across all sensors at a fixed sampling interval $T_s \in \mathbb{R}_+$. Let $x_s := [x_{s,1}, \ldots, x_{s,T}]^\top \in \mathbb{R}^T$, with $T \in \mathbb{N}$, denote a vector containing the time-series data measured by sensor $s$ over a window of $T$ samples, and let $X := [x_1, \ldots, x_S] \in \mathbb{R}^{T \times S}$ denote the system-wide measurements over the same sampling window, where $(\cdot)^\top$ is the transpose operator. Each $X$ is associated with a tuple $(t_f, \nu)$, where $t_f \in \mathbb{R}_+ \cup \{\infty\}$ denotes a censoring random variable and $\nu \in \mathbb{R}_+$ a reference time at which $X$ was measured. If a system failure occurred, $t_f$ denotes the time at which it occurred. Otherwise, $X$ is said to be censored and $t_f$ is set to $\infty$. In the latter case, a system failure is still expected to occur, but its exact occurrence time remains unknown.
The RUL for $X$ is defined as
$$y := t_f - \nu, \tag{1}$$
with $y \in \mathbb{R}_+ \cup \{\infty\}$. Since RUL predictions over long time horizons tend to be unreliable in practice, we define the censored RUL over a fixed prediction time horizon $T_p$ as
$$\gamma(y) := \begin{cases} y, & y \le T_p, \\ \infty, & y > T_p, \end{cases} \tag{2}$$
where the choice of $T_p$ is application-domain dependent. Its value should capture the dynamics of the degradation process that precedes a system failure and the reaction time the system requires to trigger appropriate fault-mitigation procedures. To simplify notation, we use $\bar{y} = \gamma(y)$ as a shorthand for censored RUL values.
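A minimal sketch of how RUL targets could be derived for each measurement window, assuming the RUL is the difference between the failure time $t_f$ and the measurement time $\nu$, and applying the horizon-based censoring described above: values beyond $T_p$, and censored trajectories with $t_f = \infty$, map to $\infty$. The function names are hypothetical.

```python
import math

def rul(t_f: float, nu: float) -> float:
    """RUL of a window measured at time nu; a censored trajectory (t_f = inf) yields inf."""
    return t_f - nu

def censored_rul(y: float, T_p: float) -> float:
    """Keep y when the failure falls within the horizon T_p; otherwise return inf."""
    return y if y <= T_p else math.inf
```

Python's float arithmetic gives `math.inf - nu == math.inf`, so censored trajectories need no special case.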
Given a training set $\mathcal{X} := \{(X_m, \bar{y}_m)\}_{m=1}^{M}$ with $M$ training examples, where $X_m$ denotes the $m$-th sensor-data matrix and $\bar{y}_m$ its corresponding censored RUL value, our goal is to learn a mapping
$$h : \mathbb{R}^{T \times S} \to \{-1, 1\} \times \mathbb{R}_+, \qquad h(X) := \big(\theta(X; W_1),\, f(X; W_2)\big). \tag{3}$$
For a new $X$, $h$ identifies whether a failure would occur within the immediate time horizon defined by $T_p$ and provides an estimate of the RUL. A classification label $\theta(X; W_1) \in \{-1, 1\}$ identifies impending failures, with $\theta(X; W_1) = 1$ indicating that the RUL estimate for $X$ is in the interval $[0, T_p]$ and $\theta(X; W_1) = -1$ that it is in the interval $(T_p, \infty)$. RUL estimates are given by the function $f(X; W_2)$. The sets $W_1$ and $W_2$ denote the learnable parameters of $h$.
Once $W_1$ and $W_2$ have been learned, the censored RUL prediction for $X$ is
$$\hat{\bar{y}} := \begin{cases} f(X; W_2), & \theta(X; W_1) = 1, \\ \infty, & \theta(X; W_1) = -1. \end{cases}$$
Characterizing $h$ requires one to tackle a joint binary classification and regression problem. The classification problem identifies whether a failure will occur within the interval $[0, T_p]$. If a failure is deemed to occur within $[0, T_p]$, the regression problem estimates the corresponding RUL. Otherwise, the RUL is set to $\infty$ to indicate that no failure is expected to occur within the $T_p$ prediction horizon.
Estimates $\hat{W}_1$ and $\hat{W}_2$ of $W_1$ and $W_2$ can be obtained as the solutions of the optimization problems in Eqs. (4a) and (4b), where $\|\cdot\|_2$ denotes the $\ell_2$-norm operator. The regression problem in Eq. (4b) uses only the training pairs in $\mathcal{X}$ whose $\bar{y}_m$ is finite.

A QUANTIZED RUL ESTIMATOR
In this section we introduce a joint formulation for identifying whether a system failure will occur over a fixed time horizon and, if so, estimating the corresponding RUL. Rather than using the two-step approach described in Eq. (4), we formulate the RUL estimation problem as a classification problem with $N + 1$ classes, namely $\mathcal{C} := \{1, \ldots, N + 1\}$. The interval $(0, T_p]$ is divided into $N$ subintervals. Although other partitioning strategies are possible, for ease of presentation we consider an equal-length, non-overlapping partitioning of $(0, T_p]$ in which the $n$-th subinterval is $((n - 1)T_p/N,\, nT_p/N]$. The RUL estimate associated with the $n$-th subinterval is defined as the largest value in the interval, i.e., $d_n = nT_p/N$. The $(N + 1)$-th class identifies the situation in which no failure will occur within the time horizon $(0, T_p]$.
To train a classifier, we map the censored RUL values in $\mathcal{X}$ to class labels for every training example as follows:
$$\phi_m := \begin{cases} \sum_{n=1}^{N} n \, \mathbb{1}\{\bar{y}_m \in ((n-1)T_p/N,\, nT_p/N]\}, & \bar{y}_m \ne \infty, \\ N + 1, & \bar{y}_m = \infty. \end{cases} \tag{5}$$
Note that the sum in the first row of Eq. (5) always yields exactly one nonzero term, which corresponds to the index of the subinterval that $\bar{y}_m$ belongs to. Given the modified training set $\mathcal{X}_C := \{(X_m, \phi_m)\}_{m=1}^{M}$, we seek to learn a mapping $\Omega : \mathbb{R}^{T \times S} \to \mathcal{P}$, where $\mathcal{P} \subset \mathbb{R}^{N+1}$ denotes the set of probability vectors $p \in \mathbb{R}^{N+1}$ over the elements of $\mathcal{C}$, and the random variable $\Phi$ represents the classification label. Thus, the $n$-th entry of $p$ corresponds to the probability of $n$ being the correct label for $X$, i.e., $p_n := \Pr(\Phi = n \mid X; \Omega)$, where the dependence of $p_n$ on $\Omega$ is shown explicitly.
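The label construction can be sketched in Python as follows, assuming the equal-length partitioning of $(0, T_p]$ described above. Placing $\bar{y}_m = 0$ (a failure at the measurement time) in class 1 is an assumption of this sketch, since 0 lies outside $(0, T_p]$; the function name is hypothetical.

```python
import math

def class_label(y_bar: float, T_p: float, N: int) -> int:
    """Map a censored RUL value to a class label in {1, ..., N + 1}.

    Labels 1..N index the subintervals ((n-1)T_p/N, nT_p/N];
    label N + 1 means no failure is expected within the horizon (y_bar = inf).
    """
    if math.isinf(y_bar):
        return N + 1
    # Indicator sum: exactly one subinterval contains y_bar when 0 < y_bar <= T_p.
    label = sum(n for n in range(1, N + 1)
                if (n - 1) * T_p / N < y_bar <= n * T_p / N)
    return max(label, 1)  # assumption: y_bar = 0 is placed in the first subinterval
```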
A maximum a posteriori (MAP) estimator can be used to choose a classification label for a new $X$ as $\hat{\phi} = \arg\max_{n} \Pr(\Phi = n \mid X)$. Once available, a class-label estimate $\hat{\phi} \in \mathcal{C}$ can be mapped to a RUL estimate via the mapping $f : \mathcal{C} \to \mathbb{R}_+ \cup \{\infty\}$, defined as
$$f(n) := \begin{cases} d_n, & n \le N, \\ \infty, & n = N + 1. \end{cases} \tag{6}$$
Although single-point RUL estimators based on $p$ can be developed, such estimators can yield inconsistent results as the number of classes grows large. Intuitively, slowly occurring degradation can cause the entries of $p$ corresponding to neighboring RUL values to be similar. This is an artifact of the arbitrary partitioning of $(0, T_p]$ into $N$ subintervals, which can yield class overlap. Thus, a small perturbation in $X$ can cause it to be assigned the label of a different class. The next section introduces a family of RUL estimators that use the RUL values corresponding to the top-K entries of $p$ to mitigate the effect of inconsistent classification labeling on the RUL estimates.
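The winner-take-all (MAP) rule and the class-to-RUL mapping can be sketched as below, assuming $p$ is a NumPy vector whose entry at 0-based index $n - 1$ holds $p_n$. Note the index shift between the 1-based labels in the text and Python's 0-based arrays; the function names are hypothetical.

```python
import math
import numpy as np

def rul_from_class(n: int, T_p: float, N: int) -> float:
    """Class n <= N maps to d_n = n * T_p / N; the no-failure class N + 1 maps to inf."""
    return n * T_p / N if n <= N else math.inf

def map_rul_estimate(p: np.ndarray, T_p: float, N: int) -> float:
    """Winner-take-all RUL estimate from a probability vector over N + 1 classes."""
    n_hat = int(np.argmax(p)) + 1  # convert 0-based index to 1-based class label
    return rul_from_class(n_hat, T_p, N)
```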

RUL Estimation via a Top-K Classifier
In classification problems with a large number of classes, the traditional top-1 decision rule can yield inconsistent results. For probabilistic classifiers, this behavior manifests as multiple entries $p_n$ of large and similar magnitude. In the context of RUL estimation, inconsistent RUL estimates can impair the ability of the system to trigger effective mitigation behaviors. Late triggering of mitigation behaviors can fail to prevent a system failure, while early triggering can detrimentally impact the tasks being executed by the system.
In this section we propose a top-K classifier that uses the K largest entries of $p$ as a proxy to select the top-K most likely RUL values $\{d_{i_1}, \ldots, d_{i_K}\}$. Top-K classifiers have been used in image processing to develop robust image classifiers (Chang, Yu, & Yang, 2017). Top-K classification rules are well motivated for classifiers trained by minimizing the cross-entropy loss, which has been shown to be top-K calibrated for any K (Lapin, Hein, & Schiele, 2016, Prop. 4). A top-K calibrated classifier will, in the limit of infinite training data, achieve the Bayes-optimal top-K classification error. Other loss definitions specifically tailored for top-K classification with efficient numerical optimization characteristics can also be considered (Berrada, Zisserman, & Kumar, 2018; Lapin, Hein, & Schiele, 2015).
As noted above, many different classifiers could be used; however, LSTM-based models have been shown to extract temporal relationships well and to provide effective RUL predictions, especially when trained using the cross-entropy loss (Zheng, Ristovski, Farahat, & Gupta, 2017). An LSTM classifier is used for the numerical tests in Section 5.
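A fundamental question in this case is how the RUL values corresponding to the top-K classification labels should be used to construct a RUL estimator. We propose the following top-K RUL estimators:
$$d^{\mathrm{RUL}}_{K,\mathrm{mean}} := \sum_{k=1}^{K} w_k\, d_{i_k}, \tag{7a}$$
$$d^{\mathrm{RUL}}_{K,\min} := \min_{k \in \{1, \ldots, K\}} d_{i_k}, \tag{7b}$$
$$d^{\mathrm{RUL}}_{K,\max} := \max_{k \in \{1, \ldots, K\}} d_{i_k}, \tag{7c}$$
where the weights $\{w_1, \ldots, w_K\}$ satisfy $w_k \in [0, 1]$ for all $k$ and $\sum_{k=1}^{K} w_k = 1$.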

Algorithm 1 RUL Estimator via Top-K Classifier
Require: A mapping $\Omega : \mathbb{R}^{T \times S} \to \mathcal{P}$ and K.
1: Let $X \in \mathbb{R}^{T \times S}$ denote a new sensor-data matrix.
2: Compute $p = \Omega(X) \in \mathbb{R}^{N+1}$.
3: Let $\{p_{i_1}, \ldots, p_{i_K}\}$ denote the top-K entries of $p$ and $\{d_{i_1}, \ldots, d_{i_K}\}$ their corresponding RUL values.
4: Set $P_K = \sum_{k=1}^{K} p_{i_k}$ and compute $w_k := p_{i_k}/P_K$ for all $k$.
5: Compute $d^{\mathrm{RUL}}_{K,\mathrm{mean}}$ via Eq. (7a).
6: return RUL estimate $d^{\mathrm{RUL}}_{K,\mathrm{mean}}$.
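A minimal NumPy sketch of Algorithm 1 that also returns the minimum and maximum of the top-K RUL values (the aggressive and conservative estimators discussed below). It assumes the RUL value of each class is supplied in an array `d`, with `np.inf` for the no-failure class; the function name `top_k_rul` is hypothetical.

```python
import numpy as np

def top_k_rul(p: np.ndarray, d: np.ndarray, K: int) -> dict:
    """Top-K RUL estimators from a classifier output p = Omega(X).

    p: probability vector over the N + 1 classes.
    d: RUL value associated with each class (np.inf for the no-failure class).
    """
    idx = np.argsort(p)[::-1][:K]          # indices i_1, ..., i_K of the K largest entries
    p_top, d_top = p[idx], d[idx]
    w = p_top / p_top.sum()                # w_k = p_{i_k} / P_K
    return {
        "mean": float(np.dot(w, d_top)),   # convex combination of top-K RUL values
        "min": float(d_top.min()),         # aggressive: shortest top-K RUL
        "max": float(d_top.max()),         # conservative: longest top-K RUL
    }
```

For example, with $N = 5$, $T_p = 125$, `d = [25, 50, 75, 100, 125, inf]`, and `p = [0.05, 0.40, 0.30, 0.15, 0.05, 0.05]`, choosing K = 3 gives a weighted mean of about 67.6, a minimum of 50, and a maximum of 100. If the no-failure class is among the top-K entries, this sketch simply propagates $\infty$.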
Equation (7a) computes a convex combination of the top-K RUL estimates. Equation (7b) defines a more aggressive RUL estimator (i.e., one that yields the shortest RUL) than Eq. (7a). Equation (7b) can be justified in cases where the early execution of corrective behaviors may not cause a significant penalty to the system goals when compared with the occurrence of a failure, such as the in-flight failure of an engine. Equation (7c) defines a more conservative RUL estimator that is applicable when the impact of the failure on the system can be tolerated for a period of time or, in the case of an autonomous platform, when completing the ongoing activity is more important than avoiding a platform failure. Such a conservative estimate gives the system more incentive to schedule fault-mitigation behaviors with minimal impact on the system goals. An underlying assumption is that the top-K classifications will converge over time to a single value as the failure becomes more imminent.
The RUL estimators in Eq. (7) can be further extended as follows:
• Setting $w_k = p_{i_k}/P_K$ for all $k$, with $P_K := \sum_{k=1}^{K} p_{i_k}$.
• Estimating a failure occurrence interval $[d^{\mathrm{RUL}}_{K,\min},\, d^{\mathrm{RUL}}_{K,\max}]$ via Eqs. (7b) and (7c).
Algorithm 1 summarizes the proposed RUL estimation algorithm via a top-K classifier using Eq. (7a) and the first bullet above. Similar algorithms can be obtained for Eqs. (7b) and (7c) by modifying Steps 4 and 5 of Algorithm 1 appropriately. Section 4 discusses various metrics for assessing the performance of the estimators in Eq. (7).

Implications of the Selection of the K Value
The choice of the parameter K affects the performance of the top-K RUL estimator. The appropriate choice of K is influenced by the specific classifier design, the resolution of the RUL estimator defined by $T_p/N$, and the dynamics of the degradation within the time-series data. K may be tuned via a cross-validation procedure using the evaluation metrics proposed in Section 4. For a classifier with inconsistent RUL estimates, one would expect an initial decrease in classification error as K is increased, which reverses once K crosses a threshold. The number of target classes also plays a role in the selection of K: the relative importance of each class increases as the number of classes decreases. Thus, a RUL estimator using a classifier with a larger number of classes may benefit from a larger K.
The probability mass function (PMF) $p$ yielded by the classifier can be used to characterize its output. If the PMF is unimodal, as shown in Fig. 2a, the RUL values associated with the top-K probabilities will be very close to the MAP estimate and the improvement will be minor. If the PMF is multimodal, as shown in Fig. 2b, the benefit of using the top-K probabilities increases.
The following methods for selecting K are proposed as potential starting points:
• Static K value: K is selected via a cross-validation procedure that follows the classifier's validation process. The RUL-estimator error can be used as an indicator to identify the point at which increasing K no longer improves the quality of the RUL estimator. An example of using a static K value is demonstrated in Section 5.
• Dynamic K value: A dynamic K value can be used to overcome artifacts in $X$ and can, thus, outperform a static choice of K. K can be updated using the entropy of the classifier PMF, $H := -\sum_{n=1}^{N+1} p_n \log(p_n)$, which characterizes the amount of uncertainty in the outcome of a random variable drawn from the distribution. A high (low) entropy value indicates a more (less) uncertain, i.e., flatter (more peaked), distribution and suggests the selection of a large (small) value for K.
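One way to realize the dynamic-K idea is sketched below. The text does not specify how $H$ should be mapped to K, so the linear rule on the normalized entropy $H/\log(N+1)$ and the bounds `k_min` and `k_max` are hypothetical choices.

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy H = -sum_n p_n log p_n (natural log, with 0 log 0 := 0)."""
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())

def dynamic_k(p: np.ndarray, k_min: int = 1, k_max: int = 5) -> int:
    """Hypothetical rule: grow K linearly with the normalized entropy H / log(N + 1)."""
    h_norm = entropy(p) / np.log(len(p))  # 0 for a one-hot PMF, 1 for a uniform PMF
    return int(round(k_min + h_norm * (k_max - k_min)))
```

A peaked PMF therefore keeps K near `k_min`, while a flat or multimodal PMF pushes K toward `k_max`.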

EVALUATION METRICS FOR QUANTIZED RUL ESTIMATORS
Classification performance metrics such as accuracy, precision, recall, and classification error can be used to assess the performance of the RUL estimators in Eqs. (6) and (7). These metrics can be extended to the top-K RUL estimators in Eq. (7) by mapping the corresponding estimate to $\mathcal{C}$ via
$$\bar{\phi} := \begin{cases} \sum_{n=1}^{N} n \, \mathbb{1}\{d^{\mathrm{RUL}}_K \in ((n-1)T_p/N,\, nT_p/N]\}, & d^{\mathrm{RUL}}_K \le T_p, \\ N + 1, & \text{otherwise}, \end{cases} \tag{8}$$
where $d^{\mathrm{RUL}}_K$ denotes one of the estimators in Eq. (7). These metrics summarize the performance of the classifier while presuming that all classes are equally important, and they can be used to drive the selection of tuning parameters or of the type of classifier implemented. Although they are valid single-point metrics, they do not take into account the temporal aspect of the RUL estimation problem or the fact that failing to correctly estimate low-value RULs is more critical than failing to predict high-value RULs.
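The mapping back to $\mathcal{C}$ can be computed in closed form under the equal-length partitioning: an estimate in $((n-1)T_p/N,\, nT_p/N]$ belongs to class $n = \lceil d N / T_p \rceil$. The sketch below uses that identity, sends estimates above $T_p$ (including $\infty$) to class $N + 1$, and computes accuracy. Clamping an estimate of 0 into class 1 is an assumption, and the function names are hypothetical.

```python
import math
import numpy as np

def estimate_to_class(d_hat: float, T_p: float, N: int) -> int:
    """Class whose subinterval ((n-1)T_p/N, nT_p/N] contains d_hat; N + 1 beyond T_p."""
    if math.isinf(d_hat) or d_hat > T_p:
        return N + 1
    return max(1, math.ceil(d_hat * N / T_p))  # assumption: 0 is clamped into class 1

def accuracy(true_labels, est_labels) -> float:
    """Fraction of samples whose estimated class equals the true class."""
    return float(np.mean(np.asarray(true_labels) == np.asarray(est_labels)))
```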
A confusion matrix captures the error distribution of the classifier per class. It can be applied to both binary and multi-class classification problems when the true classification labels are available. For a binary classification problem, the confusion matrix shows four classification counts, namely true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), as shown in Fig. 3. A TP (TN) indicates a sample in the positive (negative) class that was classified correctly, and an FP (FN) a sample in the negative (positive) class that was classified as positive (negative). The confusion matrix can be extended to the multi-class case as follows (Krüger, 2016); see Fig. 4. For each row $n \in \mathcal{C}$, the confusion matrix $E \in \mathbb{N}^{(N+1)\times(N+1)}$ comprises a $1 \times (N+1)$ vector whose $n'$-th entry is $E_{n,n'} := \sum_{m : \phi_m = n} \mathbb{1}\{\hat{\phi}_m = n'\}$, i.e., the number of samples with true label $n$ that were assigned label $n'$. The entries of the $n$-th row of $E$, with the $n$-th entry removed, correspond to the FN count for class $n$. Similarly, the entries of the $n$-th column of $E$, with the $n$-th entry removed, correspond to the FP count for class $n$. Let $\mathbf{1}$ denote a vector of ones of appropriate dimension, $\mathrm{diag}(E)$ the $(N+1) \times (N+1)$ matrix with the main-diagonal entries of $E$ on its main diagonal and zeros elsewhere, and $(\cdot)^\top$ the transpose operator. Then the $(N+1) \times 1$ vector $\alpha := (E - \mathrm{diag}(E))\mathbf{1}$ captures the FN count profile and the $(N+1) \times 1$ vector $\beta := (E - \mathrm{diag}(E))^\top \mathbf{1}$ captures the FP count profile yielded by $h$ (as defined in Eq. (3)).
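The confusion matrix and the FN/FP profiles $\alpha$ and $\beta$ can be sketched with NumPy as follows, assuming 1-based class labels as in the text (row = true class, column = estimated class). The function names are hypothetical.

```python
import numpy as np

def confusion_matrix(true_labels, est_labels, n_classes: int) -> np.ndarray:
    """E[n-1, n'-1] counts samples with true class n that were labeled n' (1-based labels)."""
    E = np.zeros((n_classes, n_classes), dtype=int)
    for n, n_hat in zip(true_labels, est_labels):
        E[n - 1, n_hat - 1] += 1
    return E

def fn_fp_profiles(E: np.ndarray):
    """alpha = (E - diag(E)) 1 (FN per class); beta = (E - diag(E))^T 1 (FP per class)."""
    off_diag = E - np.diag(np.diag(E))
    ones = np.ones(E.shape[0], dtype=E.dtype)
    return off_diag @ ones, off_diag.T @ ones
```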
Quantized RUL estimators can be compared on the basis of these two profiles and their accuracy score $A \in [0, 1]$ through, e.g., the Euclidean distance between $\Theta := (\|\alpha\|_2, \|\beta\|_2, 1 - A)$ and the ideal score tuple $(0, 0, 0)$. This approach, however, ignores the temporal aspect of the RUL estimation problem and the fact that a false-negative estimate that predicts a RUL smaller than the true RUL is preferable to one that predicts a RUL larger than the true RUL. The former case would give the system a chance to react to an impending failure, while the latter would not.
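A sketch of the score-tuple comparison, assuming $\alpha$, $\beta$, and the accuracy $A$ are already available; a smaller distance indicates a better estimator. The function name is hypothetical.

```python
import numpy as np

def score_distance(alpha, beta, accuracy: float) -> float:
    """Euclidean distance between Theta = (||alpha||_2, ||beta||_2, 1 - A) and (0, 0, 0)."""
    theta = np.array([np.linalg.norm(alpha), np.linalg.norm(beta), 1.0 - accuracy])
    return float(np.linalg.norm(theta))
```

For example, $\alpha = (1, 0, 2)$ and $\beta = (1, 2, 0)$ with $A = 0.5$ give a distance of $\sqrt{10.25} \approx 3.20$.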
In the context of RUL estimation, given a class $n$, all FNs assigned to classes $n' > n$ should be weighted more heavily than those assigned to classes $n' < n$. This can be achieved for each $n$ by using a masking function defined entry-wise as
$$g_n(n') := \begin{cases} \lambda_1, & n' < n, \\ \lambda_2, & n' \ge n, \end{cases} \tag{9}$$
with scalars $0 < \lambda_1 < \lambda_2$. Let $G := [g_1, \ldots, g_{N+1}]^\top$, with $g_n := [g_n(1), \ldots, g_n(N+1)]^\top$, denote the resulting masking matrix. Then one can define an adjusted FN profile $\alpha_{\mathrm{adj}} := [G \circ (E - \mathrm{diag}(E))]\mathbf{1}$, where $\circ$ denotes the Hadamard product. A similar argument suggests that, for a given class $n$, FPs assigned to classes indexed by $n'$ with $n' < n$ should be weighted more heavily, since they convey an unnecessary sense of urgency for action to the system. With these observations, it is possible to define adjusted FN and FP profiles $\alpha_{\mathrm{adj}}$ and $\beta_{\mathrm{adj}}$. The tuple $(\|\alpha_{\mathrm{adj}}\|_2, \|\beta_{\mathrm{adj}}\|_2, 1 - A)$ can then be used to assess the quality of a quantized RUL estimator through its Euclidean distance from $(0, 0, 0)$, as before.
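The adjusted FN profile can be sketched as below, building the masking matrix $G$ from Eq. (9) with 0-based indices. Only $\alpha_{\mathrm{adj}}$ is implemented because the text defines its mask explicitly; the weights `lam1` and `lam2` are free parameters, and the function name is hypothetical.

```python
import numpy as np

def adjusted_fn_profile(E: np.ndarray, lam1: float, lam2: float) -> np.ndarray:
    """alpha_adj = [G o (E - diag(E))] 1 with G[n, n'] = g_n(n')."""
    if not 0 < lam1 < lam2:
        raise ValueError("expected 0 < lam1 < lam2")
    size = E.shape[0]
    true_cls = np.arange(size)[:, None]   # row index n (true class)
    est_cls = np.arange(size)[None, :]    # column index n' (estimated class)
    # g_n(n') = lam1 if n' < n (RUL underestimated), lam2 if n' >= n (RUL overestimated)
    G = np.where(est_cls < true_cls, lam1, lam2)
    off_diag = E - np.diag(np.diag(E))
    return (G * off_diag).sum(axis=1)     # Hadamard product followed by row sums
```

With $\lambda_1 = \lambda_2 = 1$ the mask would reduce to the unweighted profile $\alpha$; the strict inequality keeps late estimates penalized more.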

NUMERICAL TESTS
To place this problem in a real-world context, we consider an autonomous platform monitoring a number of subsystems throughout a mission. This section illustrates the top-K RUL estimation framework proposed in this paper using a specific classifier implementation applied to the turbofan data obtained from NASA's Prognostics Center of Excellence (PCoE) (Ramasso & Saxena, 2014).

Turbofan Dataset Description
To provide a numerical demonstration of the top-K RUL estimation framework developed in this work, we use the turbofan data provided by NASA's Prognostics Center of Excellence (PCoE) (Ramasso & Saxena, 2014). This dataset was originally used for a data challenge circa 2008 and was then released for public access and the development of data-driven models for predictive analytics. The training data consist of run-to-failure trajectories for turbofan engines, while the test data set is composed of partial run-to-failure trajectories. The goal of the data challenge was to identify the remaining useful life at the end of each trajectory. In our case, the classifier is a deep neural network that takes the turbofan data as input and outputs a probability for each potential output class.

LSTM-based Classifier Description
Inspired by the work in (Chaoub, Voisin, Cerisara, & Iung, 2021), we chose an LSTM-based classifier to process the turbofan time-series data. LSTMs are a type of recurrent neural network that use "computational gates" with feedback connections to control the information flow across the network and thereby remember information at different time scales. Our LSTM classifier comprises LSTM cells and two multilayer perceptron (MLP) blocks. The initial MLP receives the raw sensor data and transforms it into a feature representation for the LSTM cell. This initial MLP consists of three dense layers and learns useful representations of the normalized raw data; hyperbolic tangent activation functions are used between the dense layers. The LSTM cell processes the data across the sequence length of the given trajectory and captures structural dependencies in the output of the first MLP block. The LSTM output is then passed to the second MLP, which uses hyperbolic tangent activation functions between its dense layers but not after its output layer. The final layer of the second MLP provides an array of dimension (sequence length) × (number of classes), from which class predictions for each time step of the sequence can be extracted. The final MLP layer is followed by a softmax layer that maps the output logits of the MLP to values in [0, 1] that can be interpreted as probabilities. For a given $X$, the trained LSTM classifier defines $\Omega$, and the output of its softmax layer corresponds to $p$ over the quantized RUL horizon.
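To make the data flow concrete, the following NumPy sketch runs a forward pass through an untrained stack of an MLP, an LSTM, a second MLP, and a softmax, using random weights. All layer sizes, the gate ordering, the weight initialization, and the absence of a tanh after the first MLP's last layer are assumptions; training is not shown. The output holds one probability vector $p$ per time step.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Random (untrained) dense-layer parameters: weight matrix and bias vector."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

def mlp(x, layers):
    """Dense layers with tanh between consecutive layers and none after the last."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm(x, W, U, b, hidden):
    """Unrolled LSTM over a (T, features) input; gates: input, forget, output, candidate."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x_t in x:
        z = x_t @ W + h @ U + b                        # pre-activations, shape (4 * hidden,)
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell-state update
        h = sigmoid(o) * np.tanh(c)                    # hidden state (LSTM output)
        outputs.append(h)
    return np.stack(outputs)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: S sensors, F features, H LSTM units, C = N + 1 classes
S, F, H, C = 14, 32, 64, 6
mlp_in = [dense(S, F), dense(F, F), dense(F, F)]       # first MLP: three dense layers
W_lstm, b_lstm = dense(F, 4 * H)
U_lstm, _ = dense(H, 4 * H)
mlp_out = [dense(H, H), dense(H, C)]                   # second MLP: no tanh after output

def classify_trajectory(X):
    """Map a (T, S) trajectory to a (T, C) array; row t is the PMF p at time step t."""
    features = mlp(X, mlp_in)
    hidden_seq = lstm(features, W_lstm, U_lstm, b_lstm, H)
    return softmax(mlp(hidden_seq, mlp_out))
```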
In the next section an LSTM is used together with Algorithm 1 to estimate the RUL for several data trajectories from the turbofan dataset.

Numerical Tests on Turbofan Dataset
The LSTM classifier described in Section 5.2 receives full engine run-to-failure trajectories as inputs and predicts the RUL at each time step. The engine trajectories have different lengths, and over-sampling the minority classes would destroy the time-series nature of the data. To mitigate this, only trajectories whose lengths lie within one standard deviation of the mean sequence length (206 ± 40 cycles) were selected, leaving 179 trajectories. Of these, 40 were held out for testing and the remaining 139 were used for training. Each sensor data column was min-max normalized over all 179 trajectories pooled together before the data were split back into individual trajectories. During both training and testing, only one trajectory is passed to the model at a time.
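The trajectory selection and pooled normalization can be sketched as follows, assuming each trajectory is a (cycles, sensors) NumPy array. The population standard deviation and the one-standard-deviation window are assumptions of this sketch; the train/test split is omitted, and the function names are hypothetical.

```python
import numpy as np

def select_by_length(trajectories, n_std: float = 1.0):
    """Keep trajectories whose length is within n_std standard deviations of the mean."""
    lengths = np.array([len(t) for t in trajectories])
    mu, sigma = lengths.mean(), lengths.std()
    return [t for t, n in zip(trajectories, lengths) if abs(n - mu) <= n_std * sigma]

def pooled_minmax(trajectories):
    """Min-max normalize each sensor column over all trajectories pooled together,
    then split the result back into per-trajectory arrays."""
    stacked = np.concatenate(trajectories, axis=0)
    lo, hi = stacked.min(axis=0), stacked.max(axis=0)
    scaled = (stacked - lo) / np.where(hi > lo, hi - lo, 1.0)
    split_points = np.cumsum([len(t) for t in trajectories])[:-1]
    return np.split(scaled, split_points)
```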
For the purpose of demonstration, only the full run-to-failure sequences, i.e., those traditionally used as training data, were considered, so that the gold RUL (the true RUL at each prediction point) is known for evaluation purposes. The goal of the classifier is to predict the RUL at each cycle of a given test sequence until failure occurs. The gold RUL values for a given test sequence decrease linearly to zero in each case. The softmax function is applied to the logits at each sequence step, resulting in an array over 252 classes (the total number of possible predictions) with a time horizon of 125 cycles. Therefore, $T_p = 125$ is considered to be the end of the prediction horizon.
The set of top-K probabilities and the corresponding RUL values can then be extracted per time index $t$, as shown in Fig. 5, where only one set of sample trajectories for the top-3 probabilities is shown. The performance of the RUL estimators is assessed via
$$\mathrm{RMSE}(\hat{d}) := \sqrt{\frac{1}{T}\sum_{t=1}^{T}\big(\hat{d}(t) - d(t)\big)^2}, \tag{10}$$
where $T$ represents the sequence length (prediction time horizon), $\hat{d}(t)$ the estimated RUL value at time $t$, and $d(t)$ the true RUL value at time $t$. Equation (10) defines the trajectory RUL-estimate root-mean-squared error (RMSE) for the estimator $\hat{d}$. Notably, in this test, using the largest probability to choose the RUL estimate is better than using the second-largest probability. However, using the third-largest probability is the best choice when measured via the trajectory RMSE, which in this test yielded 5.496, 5.562, and 5.180 for the RUL estimates corresponding to the first, second, and third largest probabilities, respectively. Fig. 6 shows the improvement of the top-3 RUL estimator in Eq. (7a) over one that uses the class associated with the largest probability to estimate the RUL. Most of the RUL predictions along the trajectory were improved when the top-3 RUL estimator was used. Further, Fig. 7 shows the minimum and maximum RUL predictions at each prediction point along the trajectory. Table 1 shows the 10 best trajectory projections based on the RMSE, where 40 testing trajectories were used. The Trajectory number in Table 1 … yielded by $d_{i_1}$. As the RUL estimates yielded by the top-3 probabilities are evaluated against the gold RUL, it is clear that the top probability is not always the best choice. Note that in most cases the RUL estimators in Eq. (7) yielded a better RMSE than the MAP estimate $d_{i_1}$.
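The trajectory RMSE of Eq. (10) is a one-liner in NumPy; the sketch below assumes the estimated and gold RUL sequences are aligned and finite over the prediction horizon. The function name is hypothetical.

```python
import numpy as np

def trajectory_rmse(d_hat, d_true) -> float:
    """Root-mean-squared error between estimated and gold RUL sequences of equal length T."""
    d_hat = np.asarray(d_hat, dtype=float)
    d_true = np.asarray(d_true, dtype=float)
    return float(np.sqrt(np.mean((d_hat - d_true) ** 2)))
```

Evaluating this function for each candidate estimator (e.g., $d_{i_1}$, $d_{i_2}$, $d_{i_3}$, and the estimators in Eq. (7)) on the same trajectory yields one row of a comparison like Table 1.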
To compare models using the metrics described in Section 4, two classifier models were considered. The two models share a common architecture and training data but use a different number of training epochs: the first model (Model 1) was trained for 95 epochs, while the second model (Model 2) was trained for 35 epochs. Both models were evaluated on six full-engine trajectories. The resulting confusion matrices are shown in Fig. 8.
The $\|\alpha\|_2$ and $\|\beta\|_2$ values for each model are computed, and the resulting tuples are shown in Table 2 along with their Euclidean distances from the ideal score tuple. Both the confusion matrices …

Table 1. Trajectory RMSE values obtained using the RUL estimates in Eq. (7) and estimators that always choose the RUL values corresponding to each of the first, second, and third largest probabilities. Each estimator was applied sequentially to each entry of the time-series trajectory as defined in Eq. (10). The best RUL estimates obtained per trajectory are highlighted in green.

SEQUENTIAL RUL ESTIMATOR
Sequential methods incorporate prior RUL estimates to mitigate the impact of inconsistent outcomes due to "instantaneous" noise and anomalous sensor data. Time-series forecasting algorithms (TSFAs) can be used to generate history-based predictions of the RUL that can then be combined with the outcome of the top-K RUL estimator, as shown in Fig. 9.
Other methods, such as Vandermonde polynomial extrapolation, can yield a RUL prediction by fitting a polynomial to a set of past RUL estimates and extrapolating it to obtain the future RUL value. Extrapolation-based methods make fewer assumptions about the dynamics and distribution of the data but may require a larger set of RUL estimates for fitting. A demonstration of the sequential RUL estimator described in this section is outside the scope of this paper.
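A minimal sketch of Vandermonde polynomial extrapolation, assuming a short history of past RUL estimates indexed by time. The polynomial degree is a user choice and the function name is hypothetical; `np.vander` builds the design matrix, `np.linalg.lstsq` solves the least-squares fit, and `np.polyval` evaluates the fitted polynomial (both `np.vander` and `np.polyval` use highest-power-first ordering).

```python
import numpy as np

def vandermonde_extrapolate(times, rul_estimates, degree: int, t_next: float) -> float:
    """Fit a degree-`degree` polynomial to past RUL estimates and evaluate it at t_next."""
    t = np.asarray(times, dtype=float)
    V = np.vander(t, degree + 1)  # columns: t^degree, ..., t, 1
    coeffs, *_ = np.linalg.lstsq(V, np.asarray(rul_estimates, dtype=float), rcond=None)
    return float(np.polyval(coeffs, t_next))
```

A low degree (e.g., 1 or 2) is usually preferable, since high-degree polynomials tend to extrapolate erratically beyond the fitted window.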

CONCLUSION AND FUTURE WORK
This paper proposed methods to address the drawbacks of the traditional classifiers used for RUL estimation. The RUL of the platform is considered to be either ∞, in the case of no detectable degradation or of degradation over a horizon too long to be estimated accurately, or the time to a future failure. This RUL estimation problem was cast as a general classification problem for which a MAP estimator can be developed. Although this estimator can yield acceptable results, it becomes sensitive to small perturbations and outliers as the number of classes considered by the classifier increases. To mitigate this problem, a method that considers the top-K probabilities, instead of just the largest one, to estimate the RUL was proposed. The value of K may be fixed or may vary based on the dynamics of the system. Three different top-K estimators were proposed. The weighted-average estimator yielded the best estimates in terms of RMSE. The minimum-value estimator supported mission-critical assessments (e.g., platform safety) such that repairs can be accomplished prior to failure. The maximum-value estimator placed higher emphasis on the completion of mission objectives (i.e., trying to accomplish as much as possible prior to the execution of an appropriate fault-mitigation behavior).
Next, an approach for assessing and comparing RUL estimators based on a confusion matrix was developed. Typical metrics such as accuracy, precision, and recall are informative only as long as the classifier output is perfectly correlated with the true RUL.
In real-world cases, two models may differ vastly in terms of RUL estimation, with one off by a single minute at a given estimation time and the other off by an hour, yet both could score similarly on accuracy, precision, and recall. Combined with a custom masking function that penalizes late predictions (those that would occur after a platform failure), a metric for comparing RUL estimators was proposed. Finally, the proposed methods were evaluated on the well-known turbofan dataset from NASA's PCoE to demonstrate the benefits of the top-K RUL estimator.
Future work is expected to develop heuristics for the dynamic selection of the top-K values based on their proximity to the largest values. Numerical evaluation of Kalman-filter-based RUL prediction and tracking, as well as the dynamic selection of K, are areas that could yield additional benefits for improving predictive analytics. Additionally, we plan to consider classifiers whose training objective captures the fact that a subsequent top-K decision rule is used for RUL estimation.