Auxiliary Particle Filter for Prognostics and Health Management

Accurately predicting the remaining useful life (RUL) of a system is a crucial factor in prognostics and health management (PHM). This paper introduces an auxiliary particle filter (APF) model, which has the advantages of dynamically updating the model parameters and being optimized in computational speed for prognosis applications in real engineering problems. The development of particle filter (PF) in the recent decade focused on increasing the PF model’s complexity to solve more difficult problems. However, the added complexity negatively impacts the computational speed. The number of particles is commonly reduced to compensate for this increased computational burden, but this significantly reduces the accuracy of PF’s posterior distribution. The developed APF model can estimate unknown states and model parameters at the same time with a large number of particles. This algorithm was demonstrated with a dataset from an electric motor accelerated aging experiment. The results show that this model can quickly and accurately predict the RUL and is robust to measurement noise.


INTRODUCTION
Prognostics and health management (PHM) is a process to assess, predict, and maintain the health condition of components, systems, and structures.It facilitates system health degradation models using monitoring data.PHM started to gain attention from both industry and academia in the 1970s (Pecht, 1991).It has proven to be an efficient way to reduce operational costs, increase system availability, and decrease system risks (Z.Zhang et al., 2018).
Remaining Useful Life (RUL) is defined as the reasonable time remaining on a component or a system that still performs at the desired specification (Si et al., 2011).RUL estimation is a critical factor in PHM.Accurately predicting the RUL prevents corrective maintenance, reduces system downtime, and transitions the maintenance strategy from time-based maintenance to condition-based maintenance.This can directly improve operational efficiency and reduce lifecycle costs (Jardine et al., 2006).
There are a large variety of simulations and modeling tools available for PHM purposes.Depending on the data available and knowledge of the predicted system, different methods can be deployed.The prognostic algorithms fall into two categories: experience-based approaches and data-driven approaches.Experience-based methods require system physics to be well-understood.Fuzzy systems (Alamaniotis et al., 2014) or expert systems (Q.Zhang et al., 2015) compare the similarity between an observed situation and predefined failure trends.Those experience-based models correlate expert knowledge and engineering experience with the measurements to estimate RUL.The model is usually straightforward to interpret.Experience-based approaches can yield accurate and reliable predictions if the system and its degradation are well understood.However, in most systems, especially complex systems, fault modes, and physical degradation processes are not well-understood, limiting the usage of experience-based approaches.
Data-driven approaches do not necessarily require a physical understanding of the system, making them more flexible for RUL estimation.Regression-based methods are commonly used, especially in industry, because they are easy to interpret.However, these kinds of methods require a monotonic assumption or stepwise approximation to find RUL (Park & Bae, 2010).This model may generate a conservative estimation if the degradation process is not always monotonic (Si et al., 2011).Deep learning methods (Li et al., 2018) use neural networks to solve problems with a massive dataset.Neural network-based models require a significant amount of data to fine-tune the model.Lacking proper data is a universal phenomenon in the research of PHM.It is hard to guarantee the robustness and accuracy of neural network-based models without sufficient data.Stochastic process models, such as Wiener processes (Tseng & Peng, 2004), Gamma processes, and inverse Gaussian processes (Wang & Xu, 2010) have straightforward mathematical calculations and the physical meanings are easy to understand.However, the numeric computation requires a long run time, limiting the usage for online prognostics.Markovian-based models assume that the future degradation state depends only on the current state.However, these models rely on a strong assumption that each state is conditionally independent.Filtering-based methods (Orchard & Vachtsevanos, 2009), such as Kalman Filter and Particle Filter (PF) solve state-space models for the unobserved degradation process with observed measurement data.
In recent decades, PF has drawn a lot of attention because it can be applied in a broad range of engineering problems (Baraldi et al., 2015;G. Kim et al., 2018;Zio & Peloni, 2011).This Bayesian-based framework is designed for nonlinear and non-gaussian state space model filtering (Gordon et al., 1993).As PFs become more common for prognostic modeling, many modifications are being made for different applications.The typical model simulates or approximates the model parameters before the model initialization step and subsequently does not change the predicted model parameter.This method relies on a strong assumption that the model parameter is static.In real engineering problems, equipment operating conditions, such as environmental temperatures, moisture, and atmospheric pressure, may differ every day.The changing operational conditions might lead to a variable model parameter.Even though the model parameter is static in some cases, the prediction results heavily depend on how accurately the parameter is estimated beforehand, which increases the model uncertainty (Jouin et al., 2016).
The augmented PF (Hu et al., 2015) considers the model parameters as state vectors and uses another state transition function to estimate the model parameters.This method dynamically estimates the model parameter and is applicable to engineering problems with variable model parameters.However, there is no way to quantify the accuracy of the model parameters' state transition functions.
In this paper, an Auxiliary Particle Filter Prognostic Model (APFPM) is proposed to estimate states and parameters simultaneously and apply it in prognostics.This method also treats model parameters as states.First, it performs a classical PF, and then it samples the model parameters in an auxiliary step.In the auxiliary step, model parameters are sampled from a distribution, and the posterior is updated.The weight is then calculated based on the new posterior.
One drawback of PF is it requires more computational resources than other filters.Based on the classical PF model, a significant number of evolved PF models have appeared in the last 20 years.A risk sensitive PF-based prognostics model (Orchard et al., 2010) incorporates a cost function in the importance distribution to generate more particles in the regions with higher cost (possibility).It best fits systems with an anticipated sudden change in the operating conditions.Rabiei et al. (2018) also optimized the PF algorithm with relative entropy; Yu (2017) proposed a prognostic model based on logistic regression and PF; Daroogheh et al. (2017) used neural networks to train the PF model.However, the recent efforts to modify PFs continue to increase the complexity of the model, which makes it more challenging to execute without further increasing the computational cost.
The number of particles is a critical PF parameter.The rate of convergence of the approximate posterior probability distribution is inversely proportional to the square root of particle numbers (Bain & Crisan, 2009).In other words, the more particles, the more accurately the filter approximates the posterior distribution (Elvira et al., 2017).However, increasing the number of particles increases the computational burden significantly.When the model is complex, it is common to sacrifice the number of particles to manage the computational load.Daroogheh et al. (2017) only uses 150 particles for simulation, while Yu (2017) uses 500 particles.Recent applications of PFs in prognostics introduced more degrees of freedom into the model, which significantly slow down the simulation.Researchers have decreased the number of particles to compensate for the extra complexity, but it also reduced the simulation accuracy.
The most time-consuming part of PF is resampling.This paper compares different resampling methods and selects the systematic resampling for APFPM.It significantly reduces the time complexity compared with other common resampling methods, which makes APFPM capable of estimating the RUL using more than 100,000 particles.
This model was demonstrated with data from an accelerated aging experiment on three-phase electrical motors.Induction motors are widely applied in the industry due to their robustness, reliability, and low maintenance requirements (Djeddi et al., 2007).Bearing failure is the most common failure mode for the electric motor, and it happens due to excessive load, high temperature, lousy lubrication, etc. (Gnanaprakasam & Chitra, 2014).A series of accelerated run-to-failure aging experiments was conducted to simulate degradation and extremely noisy measurement data were collected by low-precision sensors (Sharp, 2012;Barbieri et al., 2015).
The paper is organized as follows.Section 2 introduces the methodology of APFPM used for RUL estimation.Section 3 describes the experimental dataset and prognostic parameter generation process.Section 4 presents the results of applying APF to estimate RUL of the experimental dataset and compares it with naïve PF.Section 5 summarizes this work and highlights areas of future work.

METHODOLOGY
Hidden Markov Models (HMM), or state-space models, are widely used for RUL estimation.A more detailed review of HMM can be found in (Cappé, 2005).They can be characterized by the following framework: (1) where Two sequential steps can solve this problem: the prediction step and the update step.In the prediction step, the prior state pdf ,(! !|( #:! ) can be estimated using the Chapman-Kolmogorov equation: The update step uses Bayes Rule to calculate the posterior state pdf ,(! !|( #:! ) ; the recursive factorization of the posterior state pdf is given as: However, the integrals in Eq. (3) and Eq. ( 4) are difficult to solve analytically.Filtering techniques provide an alternative approach to approximate their solutions.A PF is a Monte Carlo method with the capability to solve nonlinear and non-Gaussian systems.

State Estimation of Particle Filter
A PF uses a set of N particles x !& , 6 = 1, ⋯ , 8 to approximate the posterior state pdf ,(! !|( #:! ) with weight 9 !& , where Andrieu et al., 2001): Particles and their corresponding weights are obtained with importance sampling (Andrieu et al., 2001;H. Kim et al., 2000).Samples !!& are drawn from an importance density A(! !|( #:! ) and the weights are defined as: The probability ,>! !& |( #:! @ is estimated with Bayes Rule Eq. ( 7) and the importance density A>! !& |( #:! @ is presented in Eq. (8): Combining Eq. ( 7) and Eq. ( 8), the weight can be updated from Eq. ( 6) to Eq. ( 9) in a recursive format: The simplest form is to use the prior distribution ,( as the importance density, yielding: However, a more reliable importance density was proposed in (Pitt & Shephard, 1999): where the mean of ,( To derive the importance density, the particle index i serves as an auxiliary variable (Chen et al., 2005).This method allows new particles to generate based on the current measurement and particles from the previous timestep.The weight is then estimated with this auxiliary method as However, traditional PF methods rely on a strong assumption that the model parameters are static.Equipment operational conditions, such as environmental temperatures, moisture, and atmospheric pressure, may differ every day in real engineering problems.The changing operational conditions might lead to a variable model parameter.Even if the model parameter is static in some cases, the prediction results still heavily depend on how accurately the parameter is estimated beforehand, which can increase the model uncertainty (Jouin et al., 2016).

Parameter Estimation of Particle Filter
In some cases, not only is it necessary to estimate degradation states but it is also necessary to calculate the unknown parameters.Eq. ( 1) and Eq. ( 2) become Eq.( 13) and Eq. ( 14) if both states and parameters are to be estimated: given D as the model parameter.The joint posterior distribution can then be written as Eq. ( 15) following Bayes' theorem: There are two different approaches to define ,(D !|( #:!"# ) in Eq. ( 15).One way is to incorporate the Markov Chain Monte Carlo (MCMC) method into the PF to estimate the state and provide likelihood information for MCMC to approximate parameters (Andrieu et al., 2010).The joint MCMC and PF approach adds significant computational complexity to the model, which is not suitable for many industrial applications.
The other approach is to treat unknown parameters as states and use APF to estimate both states and parameters (J.Liu & West, 2001;Storvik, 2002).An et al. (2012) proposed to keep the distribution of model parameters unchanged during the prediction step to simulate the evolution of the model parameter.It is simple to implement, but the drawback of this method is that it is not applicable for variable parameter estimation.A proper way to sample the model parameter is required.
A Gaussian random walk method (Chen et al., 2005) is proposed that adapts parameters into new data: where F ! ~H(0, J ! ) is drawn from a Gaussian distribution with zero mean and J ! covariance matrix.Thus, ,(D !|( #:!"# ) in Eq. ( 15) can be approximated in the framework of PF as: We can define K !"# as covariance matrices of all weighted particles.The covariance in Eq. ( 17) is thus K !"# + J ! , which increases over time.This results in more diffused particles.
A kernel smoothing method (J.Liu & West, 2001) uses the shrinkage rule to force the particles to be closer to their mean.The Kernel location L !"# & is specified as Eq. ( 18) with smoothing factor ℎ ∈ (0,1) where D ̅ !"# is the parameter mean at time k-1.Thus, ,(D !|( #:!"# ) in ( 15) is approximated as The mixture probability in Eq. ( 19) has a mean of D ̅ !"# and a covariance matrix K !"# .This kernel smoothing method prevents the covariance from increasing and is ideal for parameter estimation.The weight is estimated as

Sequential Importance Resampling (SIR)
Particle degeneracy is the main problem encountered for the naïve PF model.After a few iterations, more and more weights tend to be zero, effectively leaving only a few particles.Degeneracy occurs when some particles lose track of the actual degradation state, which results in increased weight variance and a more skewed importance weight distribution (Andrieu et al., 2001).
Sequential Importance Resampling (SIR) is the solution to particle degeneracy (Doucet et al., 2000).A resampling algorithm samples a new set of particles x !+ for L = 1, ⋯ , 8 to replace x !& .After the particles are resampled, the particle weights are set to 1/N.
There are various resampling algorithms.A good algorithm should select a representative population of particles with high probability and give enough consideration to low probability particles so that nonlinear behavior can be detected as well.Another essential metric to select a resampling algorithm is that the algorithm should be computationally efficient.
Multinomial resampling (Doucet et al., 2000) is a secure algorithm that is commonly used in many PF tutorials.It computes the cumulative distribution function (CDF) of the weights.A random number between 0 and 1 is generated uniformly.For each random number, this algorithm uses a binary search in the CDF of weights to find where the corresponding weight is.The corresponding particle's array with the selected weights is chosen to form the new array of particles, x !+ .This process has to be repeated N times until all spaces in x !+ are filled.
Stratified resampling (Andrieu et al., 2001;Kitagawa, 1996) divides the weight CDF into N spaces, and then selects one particle from each space randomly.This method ensures that particles with significant weight are appropriately sampled, and all chosen particles are separated by R0, !" S. Systematic resampling (Kitagawa, 1996) is very similar to stratified resampling.It divides the weight CDF into N spaces with a random offset and ensures that all selected particles are apart precisely by # ' .
Residual resampling (J. S. Liu & Chen, 1998) calculates a particles' weight index using the weight of each particle divided by the normalized weight N-1.For particles with a weighted index greater than 1, they are sampled t times, in which t equals the integer part of the weight index.It is not guaranteed to select N particles; the remaining spaces of !!+ are filled with other resampling methods mentioned above.
The most significant disadvantage of PFs is that it is computationally expensive.Resampling is the most timeconsuming part of PF algorithms.Time complexity is usually introduced in computer science to describe an algorithm's computational complexity by counting the number of elementary operations performed.It utilizes the big O notation to represent the worst-case scenario computational complexity.This notation was adopted to calculate the time complexity of each resampling algorithm.
Multinomial resampling requires generating N random numbers and performs a binary search in space for N times; therefore, it has O(T log T) time complexity per iteration.Since the sampling is entirely random, it cannot guarantee that large weights are adequately sampled.Stratified and systematic resampling have a similar sampling distribution in that all parts of particle spaces are sampled, and large weights are sampled more often.However, stratified resampling also requires the generation of N random numbers and performs a binary search for N times, which makes it have the same time complexity as multinomial resampling.Systematic resampling only needs to generate one random number and compares N times only at each space boundary.It has a time complexity of O(n), which is much faster than either the multinomial or stratified resampling method.The only drawback is that systematic resampling samples particles uniformly, which is not as good as stratified resampling in ensuring more resampling for higher weights.Residual resampling sacrifices some stochasticity to ensure that all large weights are sampled.Experiments showed that half of the particles are determined by weight index (Hol et al., 2006), and the other half of the particles have to be determined using other resampling methods.Residual resampling does not distribute samples across the entire CDF evenly and cannot guarantee sampling for reasonably large particles.It is hard to tell the time complexity of residual resampling.It depends on which resampling algorithm is used to deal with the remaining spaces after heavier weights are resampled.
Most PF-based prognostic model researchers use a multinomial resampling method, which is the slowest method.Yu (2017) used residual resampling, and Dong et al. (2014) proposed a resampling procedure based on support vector regression (SVR).The SVR method usually requires O(N 3 ) time complexity, which is much slower than most of the resampling methods, but optimization may be used to speed up the SVR resampling process.It is extremely important to use fast resampling method to boost the computational speed.
In most cases, it is not always necessary to resample at each iteration.It is only required to resample occasionally, when the weight variance exceeds a certain threshold.Effective sample size 8 ,-- (Doucet et al., 2000;Kong & Liu, 1994) is used in Eq. ( 21) to measure the weight variance.When the effective sample size is below a predefined threshold, resampling is performed.

Prognosis Using Particle Filter
Prognostics describes the long-term degradation of the system.PFs can predict the future degradation trajectory and the corresponding particle distribution for each time step.This application goes beyond the traditional filtering method's horizon because there are no new measurements for Bayesian updating.The predicted posterior PDF ,(! ./0, D ./0|( #:. ) at time X + Y is calculated from the last updated posterior PDF ,(! ., D .|( #:. ) and propagated using the state equation until time X + Y .The computation of ,(! ./0, D ./0|( #:. ) is shown (Doucet et al., 2000): Orchard et al. ( 2008) provided the method of computing the above equation, but it is too computationally expensive to solve the above equation numerically.Doucet et al. (2000) proposed to neglect the error generated by particles from time t to X + Y. Therefore, the posterior PDF can be estimated at time X + Y using ,(! ./0, D ./0|( #:. ) confidence interval is recorded when 2.5 percentile and 97.5 percentile of particles exceeds the threshold.The predicted RUL distribution can be calculated as in Eq. ( 24) (Zio & Peloni, 2011):

Auxiliary Particle Filter Prognostics Model Framework
The Auxiliary Particle Filter Prognostics Model (APFPM) framework allows prognostics with an unknown physical model and disparate data sources.The assumptions of APFPM require the following information to be available as model input: ( (2) A measurement transition model that maps the observed measurements to the degradation state x.This step can be done in data extraction and manipulation.Raw data are collected from sensors and need to be processed and fused into a prognostic parameter, also known as a health index.
The prognostic parameter simplifies the measurement transition model and can be considered to be a degradation state.
(3) A failure threshold that defines the boundary between the acceptable working state and the failure state.When the degradation prediction crosses the failure threshold, the system is considered to be failed.The RUL is estimated by subtracting current time from the TOF.
(4) All historical measurements and their corresponding measurement time from 0 to time t.Measurements are process data that can be accessed with the condition monitoring system.For online prognostics, it includes all historical measurements and new measurements.
Measurements are mapped into degradation states via measurement transition models.
The ) can be estimated.
As mentioned in Section 2.2, in the auxiliary step, an auxiliary integer variable i is sampled from the range of [1, N], such that New model parameters D ! ) are then sampled from the a .4normal component kernel density using a Gaussian distribution H> Particle states ! !) can then be sampled.It is important to note that !!
) is sampled from the state transition function using , because the final weight, calculated with Eq. ( 20), is required to test the accuracy of both state and model parameters.Since the new parameter is sampled in the auxiliary step, the weight needs to verify that those parameters are sampled correctly.

Now ,((
) can be estimated to further calculate particle weight 9 ! ) using Eq. ( 20).Particle weights are normalized and then be used to calculate the effective number of particles, as in Eq. ( 21).If the effective number of particles falls below a pre-defined threshold, SIR is performed, and a systematic resampling method is used.
In the prognosis step, the posterior distribution is calculated with Eq. ( 23).For simplicity, it is assumed that the error generated by particles from time X to X + Y is negligible.There is another important assumption in the prognosis step.It is also assumed that the model parameter is changing in the model training step, but the model parameter cannot be updated without a new measurement.The model parameter used in the prognosis step is fixed to the last estimated model parameter.Thus, in the prognosis step, it is assumed that the model parameter does not change substantially from the last estimated model parameter so that the RUL does not deviate significantly from the true value.The pseudo-code of APFPM is here: Sample new D ! ) from ith normal component of kernel density using Gaussian distribution.

Experimental Data Set Description
Accelerated degradation experiments were performed with a series of five horsepower (HP), three-phase, and 220-volt squirrel cage induction motors (Sharp, 2012).Those motors were baked in an oven at 160°C for three days.The motors were then placed in a condensation chamber in a sealed container for a total of 48 hours at 100% humidity.Then those motors were placed back in the oven for a 24-hour heating process to get rid of any residual moisture, also at 160°C.Following the second heating process, the motors were cooled and run for an hour with data collected from an array of sensors.A single cycle took slightly over one week, and the motor underwent thermal and moisture induced degradation.The accelerated degradation testing was adapted from the studies of Upadhyaya et al. (1997) , 1974).The detailed experiment plan can be found in (Barbieri et al., 2015).The list of collected dataset variables is shown in Table 2.

Table 2. Raw Data Variables from Motor Testbed
The data were collected at 10,240Hz for two seconds every 15 minutes in a one-hour collection period.Thus, four data files were obtained in one experimental cycle.In the analysis, data from five different motors were used.Each motor was run-to-failure, and the number of tests (degradation cycles) for each motor is reported in Table 2.In the analysis, there are a few data files that are incredibly faulty due to data collection anomalies.Those outliers were excluded from future data analysis.The number of cycles in the cleaned test to failure data is also shown in Table 3.Each test represents a two-second snapshot of data collection.

Feature Extraction and Prognostic Parameter Generation
Both time and frequency domain features were extracted from the original data.Time-domain features were extracted for all variables.These features are: variance, kurtosis, mean, skewness, rooted mean square (RMS), and the ratio of peak magnitude to RMS.Variance and kurtosis of motor voltage z and kurtosis of motor voltage x were excluded because they have a few huge peaks, which affected the performance of later Principal Component Analysis (PCA) analysis.
Frequency domain features were extracted for X and Y direction vibrational data.Given the possible bearing fault band, as shown in Table 4, peak tracking for these three frequency ranges were extracted (Barbieri et al., 2015).RMS of both X and Y directions on these three faulty frequency ranges were also calculated.Other frequency domain features, including peak to peak, shape factor, impulse factor, and margin factor, were also extracted.There are in total 97 different features extracted from the data set; The second step used PCA to reduce the dimensionality further.PCA is a linear technique that maps variables to a low-dimensional space.It is a useful procedure to reduce the dimensionality of collinear data without losing much information.The first Principal Component (PC) contains the maximum variance of the data set, and the next PC always has less variance than the preceding PC.Remaining variables from IRT were mapped to form PCs, and 18 PCs are retained that contained 90% of the variance.
A Genetic Algorithm (GA) is a stochastic optimization method that has successfully been applied to generate prognostic parameters (Coble & Hines, 2009).A GA was used to construct an optimal prognostic parameter through the maximization of three prognostic metrics: monotonicity, prognosability, and trendability and prognostic parameter metrics score is calculated (Coble, 2010).First, a Savitzky-Golay filter (Savitzky & Golay, 1964) was used to smooth the data.Finally, the output from GA was normalized between 0 and 1 to form the prognostic parameter.Fig. 2 shows the prognostic parameter of 5 motors.The diagram of prognostic parameter generation steps can be found in Fig. 3. From the generated prognostic parameter, the degradation process starts from the beginning of the lifecycle.Different motors degrade at different rates from each other, but in general, they follow a similar degradation path.The prognostic parameter is not very smooth and linear, which makes a PF an ideal method to model this data.
Monotonicity is the most direct and relevant metric for degradation of induction motors.It is expected that these components degrade over time and have no self-healing.The IRT selection phase eliminated a majority of variables that do not have a strong monotonic trend over time.From another aspect, if IRT were skipped and PCA was applied to the entire 97 variables, the degradation-irrelevant variance would be taken into consideration when mapping those variables into PC spaces.It would not only affect the useful variance in the generated PCs but also force the model to contain more PCs to capture (potentially irrelevant) information.Table 6 shows a comparison of prognostic parameter metric scores with a few different data processing methods using the same motor data.It shows that combining IRT, PCA, and GA can generate the best prognostic parameter for the current data set from the choices considered.

PARTICLE FILTER MODEL FOR PROGNOSIS
There is no well-recognized physical model to simulate a complex system like motors.In this research an exponential model is selected to model the degradation: where a and b are unknown model parameters.According to Eq. ( 13), the degradation model Eq. ( 27) can be rewritten in the following recursive format: In the prognostic parameter generation step, all parameters are normalized to have a starting point of 1, indicating the system is completely healthy.West, 2001).The likelihood function of measurement, shown in Eq. ( 29), assumes that the measurement noise * ! in Eq. ( 14) is normally distributed The number of particles was chosen to be 100,000, compared with other research that use 30 to 5,000 particles.Computational speed for multinomial, residual, and systematic resampling method was tested using our dataset.It showed that systematic resampling is around 500 times faster than multinomial.Systematic resampling is also significantly quicker than residual resampling, which is why a systematic resampling algorithm was used in our model.
Models were trained for each of the five motors separately using a leave one out method.When one is trained, the failure data from the remaining four motors are used to calculate the failure threshold.The failure threshold was determined at the 50 th percentile among the four remaining motors' prognostic parameter value at failure.
The mean and standard deviation of the weighted particles were also calculated for each iteration to determine the simulated degradation trajectory.A 95% confidence interval was also calculated.Relative accuracy (RA) is a metric to estimate the accuracy of RUL.It is a measure of the error in prediction relative to the actual value at a specific time, as shown in Eq. ( 30) where \ 6 is the predicted RUL, and \ . is the true RUL.The prediction is more accurate when RA is close to 1.
Precision Index (PI) is another metric that can be applied to estimate the precision of RUL, which measures the relative width of prediction interval.PI can be defined with PI = B\ 6,$.9:; − \ 6,$.$*;B \ 6 (31) where \ 6,$.9:; and \ 6,$.$*; are the upper and lower bound of predicted RUL's 95% confidence interval.The prediction is more precious when PI is close to 0.
In practice, it is not as important to acquire accurate RUL in early life as it is towards the end of life.α-λ performance (Saxena et al., 2010) is a binary metric that evaluates whether the accuracy at a specific time falls within a defined bound.By setting up a bound α equals to 20%, it can be determined if the RUL predictions fall within the bound at λ-fraction of the entire life.It requires the predictions to stay within a cone of accuracy, which is roughly loose at the beginning of life but more stringent as the system approaches the end of life.
A series of simulations were performed using different numbers of measurements to train the model.Fig. 5 shows the α-λ performance of the RUL estimation of Motor 2 as an example case.The horizontal axis "lifetime of learning" indicates the percentage of measurement data in the entire life cycle that was used for APF training.RA and PI of all five motors through their fractional lifetime can be found in Fig. 6 and Fig. 7, respectively.The prognostic parameter for Motor 3 started to flat out in the middle of life cycle and showed a prolonged degradation until around cycle 90, followed by a dramatic decrease in value (Fig. 2).In the model training step, measurement after 0.80time index, which is approximately 80 cycles, has not been used to train the APF.Thus, the model could not catch this change.Besides, this Bayesian model is not designed to handle a sudden and significant change in states, which caused the model to have a bigger span of particle distribution as indicated by high PI value.
APFPM is benchmarked with naïve PF-based prognostic model.In the benchmark, resampling is set to occur in each iteration, instead of waiting until the effective number of particles drop by a half.This approach ensures that both models resample at similar times.The naïve PF model is also initialized the same way as APFPM on Motor 2, without worrying about the auxiliary part and kernel smoothing technique.The naïve PF model utilized the multinomial resampling method, as it is the most commonly seen method in the literature.Both models were run fourteen times with different breakpoints to estimate RULs on a 2015 Macbook Pro with 16GB memories and 2.2GHz Quad-core Intel Core i7 processers.APFPM model finishes the calculation in 19.628 seconds, while the naïve PF model took 471.036seconds to finish.Since 100,000 particles are used to train the model, APFPM is 24 times faster than naïve PF, which is close to their theoretical difference: 16 times, calculated by their time complexity difference <(> ?@A >) <(>) .
Fig. 8 shows the α-λ performance of RUL estimation of Motor 2 using naïve PF model.By comparing it with Fig. 5, it is clear that the naïve PF model has a bigger particle distribution span than APFPM in each data point.  ).However, a high ,(( !|! ! ) ) does not guarantee a more accurate model parameter.Thus the selection of the model parameter is rather random, causing the distribution to be not ideally normal.
In comparison, the parameter distribution of APFPM is strictly normal and resampled using a kernel smoothing method to avoid deviation from the mean value.This algorithm guarantees the parameter and particle to be constrained in a tighter distribution.It is why the APFPM has a smaller 95% confidence interval of particles at each datapoint, seen by comparing the confidence intervals in Fig. 4 and Fig. 15.
The benchmark result also indicates that naïve PF model is more sensitive to a sudden change in the data, while APFPM is more robust to such changes.Following a rapid drop in the measurement data after the 35 th cycle, the trend of naïve PF is dragged downward and predicts an early failure even with very good model initialization.This makes naïve PF model more sensitive to noise, and it is easy for naïve PF model to pick up measurement noise and cause overfitting.Conversely, APFPM requires a good initialization and it takes more iterations to correct itself if there is a change in the process.Thus APFPM is more suitable for situations that have noisier measurement data, smaller time steps, or the process is unlikely to change very often.

CONCLUSION
This paper introduced an auxiliary particle filter-based prognostic framework, APFPM.This APF model can estimate both unknown states and parameters at the same time using a large number of particles.This method is demonstrated using a dataset obtained from an accelerated motor degradation experiment.
Preprocessing of motor degradation data allows it to fuse raw experimental data into prognostic parameters.This paper introduced a data fusion method that combines IRT, PCA, and GA algorithms from a set of features extracted from the original data.It proves that the combination of these three algorithms generates prognostic parameter with highest score in monotonicity, prognosability, and trendability.
APFPM learns the degradation pattern using a prognostic parameter and further estimates the RUL by extrapolating the degradation curve until it crosses a predefined failure threshold.The results showed that this model could accurately predict the RUL.
The initialization step of APFPM requires particles and parameters to be set in a range.By learning the degradation curve, the particles and parameters can automatically adjust themselves to fit the curve using SIR.The kernel smoothing method is applied in sampling parameters to ensure the newly sampled parameters are close to their mean.It does not result in the increase of particle covariance.
Different resampling methods are compared based on the time complexity and the resampling performance of both particles with big weights and particles with reasonably big weights.The systematic resampling method is favored because of its low computational time complexity, which is extremely important in particle filter-based simulations.The systematic resampling method also has good performance on resampling particles with different weights.
A benchmarking between naïve PF-based prognostic model and APFPM is conducted and the use case of two models are discussed.Although APFPM requires an auxiliary step in each iteration, it is still significantly faster than naïve PF because systematic resampling method is deployed.The particles in APFPM are more concentrated in the distribution than naïve PF and the distribution of particle parameters is guaranteed to be normal.APFPM is not sensitive to noise in the prognostic parameter, which makes it more suitable for systems with more measurements and is unlikely to change much.
23) Each particle state !./0& is calculated by applying the degradation state equation from ! ./0"#& .At time t, when the last measurement data ( . is processed by the model, the PF model converts from training to the prognosis step.The parameter D .& becomes unchanged in the prognosis step, and the degradation model propagates until all particles exceed a predefined threshold.The time of failure (TOF) can be determined when the mean of the weighted particles C ! & exceeds the threshold, while a 95%

Fig. 4
Fig.4shows the predicted degradation compared with the actual prognostic parameter.The model was trained until the 75th cycle.The mean of the particles in the training step, representing the degradation trajectory, is shown with the red

Figure 4 .
Figure 4. Motor 2 prognostic parameter with accelerated degradation using APFPM Although the prognostic parameter is noisy and nonmonotonic, the simulated APF trajectory is much smoother.The particles are sparse at the beginning of the cycle, indicated by the broad range of 95% confidence interval.They are more closely clustered later on.This means that the resampling algorithm selects a representative population of particles with high probability.

Figure 5
Figure 5. α-λ performance of RUL estimation of Motor 2 using APFPM Fig.8shows the α-λ performance of RUL estimation of Motor 2 using naïve PF model.By comparing it with Fig.5, it is clear that the naïve PF model has a bigger particle distribution span than APFPM in each data point.Fig.9and Fig.10shows the particle and parameter distribution of naïve PF model at cycle 20, 50 and 70, respectively.In comparison with APFPM, Fig.11and Fig.12shows the particle and parameter distribution of APFPM at cycle 20, 50, and 70, respectively.As the particle and parameter distribution span is extremely narrow at later stage, separate plots of particle and parameter distribution of APFPM at cycle 70 is shown in Fig.13and Fig.14, respectively.A comparison between prognostic parameter with simulated degradation of naïve PF model is shown in Fig.15using the first 70 measurements as training data.

Figure 8
Figure 8. α-λ performance of RUL estimation of Motor 2 using PF

Table 1 .
Resampling method summaries 1) A degradation state transition model to estimate the evolution of states.PF-based models use a physical model to estimate state posteriors.Dynamic updating of states and model parameters adds a lot of flexibility to the model.However, a rough estimate of the degradation state transition model from either visual inspection or regression is required.

Table 3 .
Table 5 gives a complete list of extracted features.Tests to failure data for each motor

Table 4 .
Bearing Fault and Corresponding FrequencyDimensionality reduction techniques reduce the number of variables and thus the complexity of the problem.Two-step dimensionality reduction methods were utilized in this research.The first step was to perform a feature Importance Ranking Test (IRT).Variables with high ranks can be retained.Feature importance is ranked by monotonicity based on the algorithm shown in Eq. (26) because monotonicity is the most intuitive feature for degradation trend.

Table 5 .
List of features extracted for analysis Figure 1.Importance Ranking of all variables extracted from raw data

Table 6 .
Comparison of Prognostic Parameter Metrics Score with Different Data Processing Methods By assuming an arbitrary uncertainty, the state !$ is uniformly sampled from [0.85, 1.15].By roughly calculating the model parameter, it is around 0.013, that is in the range of [0.01, 0.02].Then the parameter x $ is uniformly sampled from [0.01, 0.02].Model measurement noise standard deviation s was set to be 0.1.Process noise & ! was already incorporated in the uncertainty of the model parameter so that it can be ignored.The smoothing factor h was set to 0.2, as suggested by (J.Liu &