Probabilistic Prognosis with Dynamic Bayesian Networks

This paper proposes a methodology for probabilistic prognosis of a system using a dynamic Bayesian network (DBN). Dynamic Bayesian networks are suitable for probabilistic prognosis because of their ability to integrate information in a variety of formats from various sources and give a probabilistic representation of the system state. Further, DBNs provide a platform naturally suited for seamless integration of diagnosis, uncertainty quantification, and prediction. In the proposed methodology, a DBN is used for online diagnosis via particle filtering, providing a current estimate of the joint distribution over the system variables. The information available in the state estimate also helps to quantify the uncertainty in diagnosis. Next, based on this probabilistic state estimate, future states of the system are predicted using the DBN and sequential or recursive Monte Carlo sampling. Prediction in this manner provides the necessary information to estimate the distribution of remaining use life (RUL). The prognosis procedure, which is system specific, is validated using a suite of offline hierarchical metrics. The prognosis methodology is demonstrated on a hydraulic actuator subject to a progressive seal wear that results in internal leakage between the chambers of the actuator.


Background
The rise of complex and costly systems for use in extreme environments has resulted in new challenges in maintenance, planning, decision-making and monitoring for these systems.To reliably execute the missions they were designed for, these systems must be meticulously maintained.Traditional schedule-based maintenance results in unnecessary system downtime and the potential for serious problems to develop between routine maintenance.The alternative, condition-based maintenance (CBM) (Jardine, Lin, & Banjevic, 2006), monitors systems as they operate, alerting personnel when faults occur.Maintenance is performed on-demand, resulting in less downtime and lower costs.Maintenance and operational decision-making, such as resource allocation, may further benefit from knowing the prognosis of a system.A system's prognosis is a measure of its fitness to perform a prescribed task.Quantitatively, prognosis is commonly expressed through the remaining useful life (RUL).RUL quantifies the amount of time until a system reaches some failure criterion, e.g.fault magnitude or performance metric crosses a threshold or system is no longer operable.Ideally, the uncertainty in RUL is quantified by estimating the distribution of RUL, resulting in a probabilistic prognosis.Importantly, probabilistic prognosis assesses the outlook for a specific instantiation of a system, or a particular unit under test (UUT).Measurement data updates the belief about the present state and RUL of the particular UUT.In this way, probabilistic prognosis differs from probabilistic reliability analysis, which aggregates data to obtain a probabilistic reliability estimate for a population as opposed to an individual.Such population statistics may be suitable for tasks such as system design, but less helpful for operational and maintenance decisions that focus on individual units, as is the case in CBM.Prognosis, on the other hand, tracks the health of an individual unit.
A prognosis methodology should thus have several important characteristics.It should provide a distribution of RUL as opposed to a point estimate, thus accounting for the uncertainty coming from many sources i.e. natural variability, information uncertainty, and model uncertainty (note: information includes measurements).In most situations, prognosis about the future is based on diagnosis of the current state; therefore it should account for uncertainty in diagnosis (i.e.uncertainty in damage existence, location and size, and sensor performance uncertainty).It should allow easy transition between situations when measurements are available and when they are unavailable.Finally, the methodology should survive rigorous validation.
Hybrid methodologies combine multiple approaches, i.e., a combination of data-driven and model-based approaches.E.g.Kozlowski (Kozlowski, 2003) uses Autoregressive moving average (ARMA) models, neural networks, and fuzzy logic for estimation of the state of health, state of charge, and state of life of batteries.
DBNs are probabilistic graphical models with diagnostic and prognostic capabilities.They have shown promise in several recent applications.Bartram and Mahadevan (Gregory Bartram & Mahadevan, 2013) have presented an application of DBNs to prognosis of a hydraulic actuator at the PHM conference.Dong and Yang (Dong & Yang, 2008) use DBNs combined with particle filtering to estimate the RUL distribution of drill bits in a vertical drilling machine.While very useful, particle filtering is not the only inference method available for prognosis.Jinlin and Zhengdao (Jinlin & Zhengdao, 2012) use DBNs modeling discrete variables and the Boyen-Koller algorithm for prognosis.Tobon-Mejia et al. (Tobon-Mejia, Medjaher, Zerhouni, & Tripot, 2012) use mixtures of Gaussian HMMs (a form of DBN) to estimate the RUL distributions for bearings.The junction tree algorithm is used for exact inference.The prognosis methodology is validated using the hierarchical metrics proposed by Saxena et al. (Saxena, Celaya, Saha, Saha, & Goebel, 2010).

Motivation
While the preceding literature review identifies several prognosis approaches, prognosis is still an emerging research area with room for much additional work.The combination of DBN and particle filtering has many qualities that are attractive for prognosis: 1) The graphical representation of a problem provided by DBNs aids understanding of interactions in a system.
2) DBNs provide a probabilistic model of the system that accounts for uncertainty due to natural variability, measurement error, and modeling error.
3) DBNs can integrate different types of information that may be encountered during prognosis (including expert opinion, reliability data, mathematical models, operational data, and laboratory data) into a unified system model.4) DBNs can update the distributions of all variables in the network when observations are obtained for any one or more variables.This allows the most recent system measurements to be accounted for in prognosis.
5) The probabilistic state estimate generated during particle filtering contains information about diagnosis uncertainty that can be used in prognosis and decision making.6) Particle filtering algorithms can take advantage of parallel computing, thus reducing computation time.
Additionally, prognosis methodologies reported in the literature are mostly application-specific.There is a need for a prognosis methodology that can make use of heterogeneous information and be applied to a wide range of problems.

Contributions
This paper proposes a framework for probabilistic prognosis.The methodology advances the use of DBNs in prognosis by integrating heterogeneous information.Further, the DBNbased methodology addresses the need for a general prognosis framework for developing validated prognosis methodologies for any system.The methodology provides a means for quantifying diagnosis uncertainty that can be used in prognosis and decision making.
The DBN is constructed from prior information, including physics-of-failure models.The DBN is a store of prior information, but also provides the means for integrating current measurements into a probabilistic estimate of the current state of a system.In this paper, particle filter-based inference is used for diagnosis, and forward sampling in the DBN is used for recursive prediction.Diagnosis uncertainty is quantified using the probabilistic state estimate of the system that results from particle filtering.Sample-based estimates of detection and isolation probabilities as well as density estimates of system damage parameters are developed based on marginal distributions in the state distribution.This information can be used in decision making for improvement of the health monitoring system.The probabilistic state estimate also provides a seamless transition from diagnosis to future state prediction using recursive sampling.The ability of the methodology to estimate RUL is validated using metrics from Saxena et al. (Saxena et al., 2010).The methodology is illustrated for a hydraulic actuator with a seal leak.
The remainder of this paper is organized as follows.Section 2 details the proposed prognosis methodology, including system modeling, diagnosis, prediction, and validation.In Section 3, the proposed methodology is demonstrated on a hydraulic actuator system with a progressive internal leak.Section 4 discusses conclusions and future work.

PROPOSED PROGNOSIS FRAMEWORK
The challenge of prognosis is to minimize the uncertainty in the estimated distribution of RUL given constraints on available information about the system, operating environment and loading conditions, computational resources, and time horizon.In this paper, a DBN-based prognosis framework is proposed.The prognosis framework first constructs a DBN-based system model using heterogeneous information sources.Expert opinion, reliability data, mathematical models, and operational and laboratory data are used in the construction of the DBN model.In particular, inclusion of physics-of-failure models is important in prognosis.The evolution of phenomena such as cracking, wear, and corrosion play a large role in determining the health of a mechanical system.The system model is used for diagnosis to obtain information about the current state of the system.A recursive Monte Carlo sampling then predicts future system states and estimates the RUL distribution.Finally, the prognosis capability of the resulting system model, diagnostic, and predictive algorithms are validated using a four-step hierarchical procedure.The prognosis procedure is shown in Fig. 1.

Dynamic Bayesian Networks
A dynamic Bayesian network is the temporal extension of a static Bayesian network (BN).A static BN, also referred to as a belief network and directed acyclic graph (DAG), is a probabilistic graphical representation of a set of random variables and their conditional dependencies.Variables are represented by nodes (vertices) and conditional dependence is represented by directed edges.Unconnected nodes are conditionally independent of each other.The acyclic requirement means that no paths exist in the graph where, starting at node xi, it is possible to return to node xi.
A DBN describes the joint distribution of a set of variables x on the time interval [0, ∞).This is a complex distribution, but may be simplified using the Markov assumption.The Markov assumption requires only the present state of the variables x t to estimate x t+1 , i.e. p(x t+1 | x 0 , …, x t ) = p(x t+1 | x t ) where p indicates a probability density function and bold letters indicate a vector quantity.A linear relationship between x t+1 and x t is implied by this assumption.Additionally, the process is assumed to be stationary, meaning that p(x t+1 | x t ) is independent of t.This approach to modeling DBNs is developed by Friedman et al. (Friedman, Murphy, & Russell, 1998).
DBNs provide a flexible modeling framework, allowing integration of expert opinion, reliability data, mathematical models (including state-space, surrogate, and physics-offailure models), existing databases of operational and laboratory data, and online measurement information.Bartram and Mahadevan (G. Bartram & Mahadevan, S., 2013) have proposed a methodology for the integration of such heterogeneous information into DBN system models.In the next section, that discussion is extended to consider physics-of-failure models, which are of particular importance in prognosis.
Figure 2 shows a cantilever beam with a potential crack and/or damage at the support.The damage at the support renders a portion of the support ineffective, thus reducing the Validation stiffness of the system and increasing the magnitude of deflections.The variable amplitude load causes the crack, if present, to grow, resulting in increased downward deflections.Figure 3 shows a DBN of the system.Variables for the DBN are described in Table 1.The DBN encodes four system states (healthy, damaged support, cracked, damaged support and cracked) by using binary variables to indicate the presence of each fault.Gaussian Process (GP) regressions model the stress intensity factor and crack length, and a linear regression models the deflection of the beam.The parameters of the DBN may be inferred offline using synthetic data.

Physics-of-Failure Models
A key distinction between a system model capable of diagnosis and one capable of prognosis is that a prognostic model estimates the evolution of damage in the future while a diagnosis model only needs the ability to infer the current state of damage.Diagnostic procedures based on fault signatures or pattern recognition are examples of this.While they may be able to detect and isolate damage, quantification can be done using least-squares based estimation, they do not necessarily have any ability to model progressive damage mechanisms such as crack growth, wear, and corrosion.One of the challenges of prognosis is to develop accurate and comprehensive physics-of-failure models.These damage mechanisms are complex, varying with system design and dynamics, and can interact in many ways.The cantilever beam DBN includes physics of failure models for the three faulty system states through two mechanisms.First, the linear regression for predicting beam deflection changes coefficients depending on the health state of the system.Second, parameters of Paris' law for crack growth as well as crack length are predicted using GP regression.The GP takes the place of direct use of the crack growth law, but the parameters SIF and a are still retained within the DBN.The material dependent properties m and C, while also potential sources of uncertainty, have been taken as constants in this example.

Diagnosis
Diagnosis is the process of detecting and isolating damage in a system and quantifying the magnitude of damage.When the probability of a fault occurring crosses the detection threshold, a fault isolation procedure finds fault candidates for further analysis.To isolate candidate faults, statistical  In the context of prognosis, diagnosis (or more specifically, filtering) provides the initial conditions for prognosis of a UUT.The initial condition for prognosis has a large impact on the accuracy and precision of the RUL distribution.As such, it is important to understand and account for diagnosis uncertainty.
Uncertainty in diagnosis is due to natural variability, measurement error, model error, error due to approximate inference, and any approximations in optimization or least squares procedures used to estimate fault magnitudes.Sankararaman and Mahadevan (Sankararaman & Mahadevan, 2011) propose a methodology for quantifying the uncertainty in diagnosis.This is an integral part of the diagnosis procedure, and it is modified in this paper to accommodate a particle filter (PF) based diagnosis procedure.
Diagnosis can be performed using a DBN model of the system to estimate the state of the system as measurements, z t , become available.The simplest approach is to expand ("unroll") the two time-slice network (Figs.3a and 3b) and compute the states of all the unobserved variables in the system, x t , including faults, using a standard inference technique such as the clique tree algorithm (Koller & Friedman, 2009).However, this is generally a computationally intractable problem (Boyen & Koller, 1998).As a result, approximate inference based on Bayesian recursive filtering is pursued.

Bayesian Recursive Filtering
The procedure for updating the belief about the system state as new information becomes available is called Bayesian recursive filtering.Bayes' theorem is the engine for performing the update.Diagnosis of a dynamic system may be achieved by maintaining the joint distribution over the system variables, parameters, and faults and as new noisy measurements become available via Bayesian recursive filtering.The joint distribution provides the best estimate of whether faults have occurred and what values system parameters and responses may have.This joint distribution is commonly called the belief state   (x  ).  (x  ) = (x  |z : ) ), where (x  |z : ) is the distribution over the variables x  estimate given all previous measurements z : .The belief state estimate includes estimates of the states of faults and system parameters, whose states are otherwise unknown.Equation 2, derived from Bayes' theorem, is the engine for belief state updating: where ( +1 | +1 ) is the likelihood of the measurements, ( +1 | 1: ) is the prior state estimate at time t, ( +1 | 1: ) is the normalizing constant, and  +1 ( +1 ) is the posterior state estimate at time t.

Particle Filtering
Under certain assumptions, such as when the system is linear and Gaussian, the belief state  + (x + ) will be of a known parametric form and computationally efficient solutions to the filtering problem (e.g.Kalman filter, extended Kalman filter, unscented Kalman filter) are available.Outside such assumptions, a computationally feasible method for inference in the DBN is found in particle filtering, a form of sequential Monte Carlo based on Bayesian recursive filtering (see e.g.Chen [42]).
Particle filtering is a method for approximating the distribution of the belief state with a set of samples and weights.1. Cantilever beam DBN (Fig. 3) model variables the basic sequential MC by weighting point masses (particles) according to their importance sampling density, thus focusing on the samples that affect the posterior belief state the most.A comprehensive tutorial on particle filters is given by Ristic et al. (Ristic & Arulampalam, 2004) and in Koller and Friedman (Koller & Friedman, 2009).
In this paper, an algorithm for systems with multiple operating modes (Andrieu, Davy, & Doucet, 2003) that extends the auxiliary particle filter (APF) (Pitt & Shephard, 1999) is used.This auxiliary particle filter has an auxiliary step where samples  ̃ +1 are drawn based on    .Weights for each particle are then calculated based on  +1 .A resampling step, which reproduces particles in proportion to their likelihood (thus eliminating particles of negligible weight), determines a set  ̃  that consists of the    that produce the  ̃ +1 with the largest weight. ̃  are then used to sample   +1 .Resampling may be performed again.Compared to a basic SIS filter with a resampling step, the APF can more closely estimate the true state provided that process noise remains low.Regularization (Ristic & Arulampalam, 2004) is also applied to prevent sample degeneracy.

Fault Diagnosis and Diagnosis Uncertainty Quantification
When using a particle filter, the belief state itself provides the information necessary for fault detection, isolation, and damage quantification.The marginal distribution over combinations of the discrete fault indicator variables is a multinomial distribution, whose parameters are easily calculated from the particles representing the current belief state.Given m fault indicator variables that can take on values of true or false, there are  = 2  combinations of faults, including the healthy condition.The  ℎ combination at the  ℎ cycle has an expected probability The probability of any fault (probability of damage) is then where  0  is the probability of that no faults occur.When  0  is greater than some threshold, an alert may be issued to a decision maker and a prognosis procedure may be triggered.The remaining    ( ≠ 0) are the isolation probabilities of each fault combination.From the belief state,  +1 ( +1 ), the marginal distributions over damage parameters may be constructed from the particles and their weights.
The probabilities pt that parameterize a multinomial distribution are themselves uncertain and follow a Dirichlet distribution.Based on the Dirichlet distribution, the variance of The uncertainty in pt is directly dependent on the number of samples,   .With the detection and isolation probabilities and their corresponding uncertainties as well as estimates of the distributions of damage parameters known, a decision maker is better able to assess the criticality of damage and the appropriate actions to make to balance safety and cost concerns.
Using a particle filtering algorithm, estimates of the health state of the beam are inferred from measurements of the deflection.These estimates are probabilistic and naturally provide estimates of the SIF and crack length.

Prediction
In probabilistic prognosis, possible future states of the system are generated and the remaining useful life (RUL) distribution, (), of the particular unit under test (UUT) is estimated.RUL is the amount of time a UUT is usable until corrective action is required and may be measured in hours, minutes, cycles, etc. Measurements are unavailable and the system model is assumed to be valid under future operating conditions.Prediction can be initiated at any time in the life of a system based on the last available state estimate obtained during diagnosis.However, in this paper, the time of prognosis, tP, the first time point for which a prognosis estimate is obtained, is after the time of fault detection, tD.This ensures that prognosis effort is expended only once a known cause of failure has been identified.Figure 4 illustrates these important prognosis time indices.
One approach to prediction when performing particle filtering on a DBN is a basic sequential Monte Carlo starting with the last available belief state estimate from diagnosis.This belief state estimate is a probabilistic initial condition for prediction that accounts for uncertainty in diagnosis.
Starting with the last belief state estimate (with measurements available), particles are recursively sampled through the two time slice DBN until some termination criterion is met, such as (() = 0) is above some target threshold.Thus, there are   trajectories of the variables of interest beginning at time t, {()} =1   .Each trajectory consists of a series of predictions for the variables of interest, () = {(|), ( + 1|), … (|)}, where the end of prediction (EoP) is the time index of the last prediction before the end of life (EoL) is reached.The EoP depends on frequency and occurs after the end of useful predictions (EoUP), which is the last time index before the required logistics lead time makes corrective active infeasible (and updating of the RUL unnecessary) Particle weights are fixed from the last available measurement, as there is no basis for updating the weights.This results in a particle-based approximation of RUL (similar to the belief state approximation), using the last available set of weights, that accounts for uncertainty in diagnosis and prediction.When a new measurement is obtained, a new RUL distribution may be estimated.
To further tailor the prognosis to a particular UUT, the conditional probability distributions in the DBN may be updated as measurements become available.This may be performed via Bayesian updating of the distributions.If a conjugate prior is available, the update can be performed analytically.Otherwise, techniques such as Markov chain Monte Carlo (MCMC) may be required.
The RUL distribution is sensitive to many aspects of the problem.The initial state estimate provided by the diagnosis algorithm must be accurate.As such, the filtering algorithm and number of particles are important algorithmic design decisions.These decisions also involve a tradeoff between accuracy and computational effort, which must be considered.Optimal sensor placement and improved sensor reliability also impact the accuracy of the diagnosis.
The accuracy of predictive models, including those for inputs (loads) and physics-of-failure models, is another large source of uncertainty in the RUL estimate.Because the prediction is recursive with no measurements available to correct the prediction, errors in prediction compound quickly and must be minimized.
Three additional factors that affect the diagnosis (and prognosis) of a system are the sensitivity of the system measurements to damage, noise in the training and measurement data, and the sampling rate i.e. frequency at which measurements are obtained.If the sensitivity to damage is low and the noise large, diagnosis may be difficult or impossible.Noise in the training and measurement data obscures the true system state and contributes uncertainty to state estimates in diagnosis and prognosis.A low sampling frequency can deteriorate the quality of state estimates in systems that depend on previous states, due to nonlinear dynamic systems as in the actuator system considered in this paper.The state estimates amount to a linearized approximation of the state across the sampling interval (reciprocal of sampling frequency).As sampling rate decreases, the interval becomes wider and the linear approximation worsens.A higher sampling rate decreases the sampling interval and reduces the error due to linearization, thus improving state estimates.In the cantilever beam example, once a crack is diagnosed at tD, estimation of the RUL commences.The last filtered system state estimates of the fault states, load, stress intensity factor, crack length, and deflection serve as initial conditions for predictions.
Figure 5 shows a possible distribution of crack length at tD. Possible trajectories of these system variables are then simulated, one for each particle, providing in an ensemble of trajectories.Figure 6 shows possible trajectories of the crack length over the next 60 load cycles.A zero value on the horizontal axis indicates the distribution of crack lengths at time tD (the distribution of these samples corresponds to Fig. 6).The failure criterion for the beam is its crack length being above a threshold value, indicated by the horizontal red line in Fig. 6.Each particle has a unique crack length and growth trajectory, resulting in a particular number of load cycles until exceedance of the critical crack length.This results in a distribution of the RUL, depicted as a histogram in Fig. 7, with median RUL also shown.

Measurement Gaps
Systems may experience periods of times where measurements are unavailable.This may be a result of the system configuration, availability of measurements, failure of sensing systems, or the desire to have system state estimates at a higher rate than the available measurements.For example, offline inspection data may be available for an aircraft on the ground, while onboard sensing provides a steady stream of information about temperature, altitude, windspeed, pressure, etc.These onboard measurements may only be available for portions of a flight (perhaps during cruising but not takeoff or landing).
Using the same recursive sampling used for RUL estimate, predictions may be produced and used to fill in the information gaps.When a measurement becomes available, the particle filtering algorithm is used to update the last predicted system state.The particle filter update may be performed as long as at least one measurement is available.

Prognosis Validation
Prognosis validation is essential to establish confidence in the RUL estimate.Many sources of uncertainty, including modeling errors, sensor faults, noisy data, and unpredictable loading conditions and operating environments, strongly affect prognosis.Therefore, validation of a prognosis procedure must be done using quantitative performance metrics.These metrics must be carefully chosen, as many issues arise when evaluating prognosis algorithms, such as time scales or the ability to improve accuracy as more measurements are obtained (Saxena et al., 2010).Saxena et al. (Saxena et al., 2010) propose a four-metric hierarchical test to evaluate a prognosis algorithm.This hierarchical test assumes that prognosis will improve as more measurements become available.Together, these four metricsthe prognostic horizon,  −  accuracy, relative accuracy, and convergence -provide a means for testing and comparing prognostic algorithms.
The first two metrics examine the accuracy of the RUL estimates by determining the probability p that the RUL estimate is between ± of the ground truth RUL.This probability p is compared to a threshold value, β.It is desirable for p to be greater than β.The primary difference between the first two metrics is in how  is determined, which results in a stricter test for the second metric than the first.
Prognostic horizon (PH) indicates the time at which RUL estimates using a particular prognostic algorithm for a particular system are within acceptable limits.The upper and lower limits are the ground truth RUL plus or minus a constant α, which is some percentage of the EoL value.PH is the difference between the true EoL time and the time when the prognostic algorithm attains this acceptable level of accuracy ( > ).As this is a validation metric, the true EoL is known.A longer PH is indicative of a better prognostic   algorithm.Figure 8a provides a visual representation of prognostic horizon.
Prognostic horizon maintains upper and lower bounds that are always the same distance from the true RUL.The second validation metric,  −  accuracy, utilizes a stricter criterion that gradually tightens the limits about the RUL estimate (Figure 8b).Additionally, the accuracy of the RUL is considered at time   , where 0 ≤  ≤ 1,   =   + (  −   ), and   is the time at which a prognosis estimate is first obtained.This metric reflects the idea that, as more information is collected about the system, the RUL estimate is expected to improve, and thus the accuracy requirement for the RUL estimate should become more stringent.The  −  accuracy is equal to 1 when the increasingly stringent accuracy requirements are met, and zero otherwise.
In step three, the relative accuracy (RA) of the prognostic algorithm is calculated.
where (t i ) is the non-negative prediction accuracy, EoUP is the end of useful prediction, and P is the time at which the prognosis algorithm makes its first prediction.End of useful prediction is the time after which corrective action cannot be performed before EoL.A high rate of convergence is better and leads to a larger PH.
Using ground truth data, the four validation metrics indicate the overall suitability for prognosis of the system model represented by the DBN and the Monte Carlo inference algorithm.This subsumes many design decisions for the system model, including which variables to include or exclude and their conditional probability distributions, graph structure of the network, training point selection, frequency of updates, number of samples to use in inference, etc.In particular, the numerical example in Section 3 investigates the sensitivity of the validation metrics to sampling rate.

Summary of Proposed Prognosis Framework
This section presented a framework for probabilistic prognosis, and illustrated portions of the framework with a cantilever beam example.Figure 9 summarizes the framework.DBNs are used as a system modeling paradigm due to their ability to handle uncertainty and to integrate many types of information, both in the offline model construction phase and the online belief state updating phase.
For prognosis, it is of particular importance to model complex physics-of-failure phenomena and integrate such models into the DBN.After the DBN model is established, the model is used for tracking the state of a particular UUT.Particle filtering is used to update the belief state as new measurements are obtained.Uncertainty in the state estimate (diagnosis) is quantified, and when a fault is detected, estimation of RUL via recursive prediction begins.The result is an estimate of the distribution of RUL.Section 2.4 considers the situation when there are gaps in the availability of measurements.
When a prognosis procedure (DBN model of system combined with available measurements and filtering algorithm), is designed for a particular system, it is then validated using the 4 step hierarchical procedure outlined in Section 2.5.

COMPUTATIONAL EXAMPLE
A hydraulic actuator system was considered to demonstrate the proposed methodology.Such a system is often used to manipulate the control surfaces of aircraft.The system consists primarily of three subsystems: a hydraulic actuator, critical center spool valve, and an axial piston pump (Fig. 10).First, expert opinion is invoked to determine the scope of the problem, variables and faults to model, and establish the DBN structure.Next, reliability data is drawn upon to determine the conditional probabilities for the faults.The mathematical model of the system is used to generate predictions of the system variables.The predictions are treated similar to operational and laboratory data and used to train a regression model for estimating the reduction in seal orifice area, which is equivalent to the seal leakage area.

Expert Opinion
Expert opinion was considered first to define the basic parameters of the problem.Seven state variables and six discrete faults were selected to model the behavior of the system.
A generic initial structure for the DBN is first selected (Fig. 11) based on expert opinion.This generic two time slice structure consists of the set of faults, F, model parameters, θ, system state, x, and measurements, z.In this structure, faults cause changes in system parameters, which then cause changes in system responses, which are observed.F contains the persistent variables in the DBNtheir future values depend upon their present values.The observations, z, while not connected across time slices, are nonetheless not independent across time slices, but correlated via θ.
Table 2 lists the faults considered in the actuator system and the parameter affected by that fault (the faults are described further in Section 3.1).For each fault, a binary variable that indicates the presence of the fault is added to the network at time t and t + 1.Similarly, a Gaussian variable is added at time t and t + 1 for each affected parameter.Links are drawn   pointing from faults to affected parameters.The parameters are assumed to have Gaussian distributions, whose mean and variance depend on the health state of the system.The leakage area parameter is a special case, as wear and leakage are assumed to be present at all times.The discrete variable for leakage area indicates that the leakage area has increased beyond some threshold value.
Parameters from the current time step and initial conditions from the previous time step are input into a physics-based model of the actuator, which estimates the system responses, assumed to be Gaussian variables.Measurements are then connected to the corresponding system response.Links are also drawn between like faults at time t and t + 1 and like parameters at time t and t + 1.Finally, a Gaussian variable is added at time t and t + 1 for each measurement available.The resultant DBN is shown in Fig. 12 with parameters described in Table 2.

Published Reliability Data
The DBN model of the system should be able to simulate multiple faults and predict system behavior multiple steps into the future for the model to be a useful diagnosis and prognosis tool.The overall failure rate for an actuator may be determined by estimating the base failure rate and making empirical corrections for temperature and fluid contamination (Naval Surface Warfare Center, 2011).The RIAC Databook (RIAC Automated Databook, 2006) and the Handbook of Reliability Prediction Procedures for Mechanical Equipment (HRPPME) (Naval Surface Warfare Center, 2011) give failure rates for many mechanical systems.For illustration of the methodology, a handful of the possible faults for the actuator system are considered in this paper.Table 3 lists the faults that have been considered, the subsystem where they are located, and the parameters affected by those faults.
Failure rates from the literature were then used to calculate the probability of each fault occurring.These probabilities correspond to parameters of the discrete fault indicator variables in the DBN.

Mathematical Behavior Models
Several mathematical models are used in this example.A physics-based model of a hydraulic actuator, described by Kulakowski et al. (Kulakowski, Gardner, & Shearer, 2007) and Thompson et al. (Thompson, Pruyn, & Shukla, 1999) with Eqs.9-19, is integrated into the DBN through a deterministic conditional probability distribution (Koller & Friedman, 2009).This model has been implemented in the Matlab Simulink environment.The model is run recursively in 0.25 second increments (4 samples per second).Each increment has a constant load value determined by a load model.When an abrupt fault occurs, it is programmed into the model at the beginning of the load interval for which it occurs.
The actuator is shown in Fig. 10.The parameters and variables for this system are described in Table 4.The relationships among these quantities are as follows: ̇  =   (13) ̇  =  1   +  0   +  0   (14) For demonstration of the prognosis methodology, the load is simulated using an ARIMA (autoregressive integrated moving average) model.In reality the load on a flight control actuator is depends on many variables related to the dynamics of the aircraft and the desired flight path (see Mahulkar et al. (Mahulkar, McGinnis, Derriso, & Adams, 2010), Karpenko andSepeheri (Karpenko &Sepehri, 2003), andMcCormick (MacCormick, 1995)).Finally, the physics-of-failure model for the seal leak is considered.The seal leakage volume is modeled as in Section 2.2.The wear rate is treated as a random variable whose distribution is learned from laboratory data.The volume of material removed is divided by the contact length of the seal (Fig. 13) to arrive at the leakage area required for use in Eqs.9-19.

Parameter/variable
Generally, seal leakage is due to wear caused by friction between the seal and piston, which removes seal material and allows fluid to pass between the chambers of the actuator.
The cross section of the actuator in Fig. 10 is shown in Fig. 14.The surface area of the seal is ( 2 2 −  1 2 ) , where r1 and r2 are the inner and outer radii of the seal, respectively.
The volume of material removed from the seal per cycle depends on the friction force and sliding distance per cycle and may be calculated by where   is the wear rate of the seal in mm 3 /N/m, F is the frictional force on the seal, d is the total sliding distance, and t refers to the load cycle.
For the seal shown in Fig. 13, where L is the contact length of the seal and P is pressure, the leakage area (considered in Eqs.16-29 in Thompson et al. (Thompson et al., 1999)) based on the volume of material removed is assumed to be   = ()  ⁄ .
While wear is a continuous process, in this paper the occurrence of wear is modeled as a binary event, where modeling begins when the leakage area has reached a value that has detectable The occurrence is modeled using an empirically derived seal failure rate, which modifies an experimentally determined base failure rate for the seal.Details of deriving the failure rate are available in (Naval Surface Warfare Center, 2011).
The wear rate itself varies with factors such as the age of the seal, temperature, contaminants in the fluid.The load experienced by the seal also varies as does the velocity of the actuator.As a result, the volume of material removed and leakage area are nonlinear functions of seal age, temperature, and contaminant concentration.However, for the sake of demonstration, it is assumed that the wear rate is steady, which is possible outside of the initial wear-in phase and under constant environmental conditions.

Operational and Laboratory Data
Operational and laboratory data appear as historical databases and online measurement data.The data is organized in a time-lagged database as in Bartram and Mahadevan (G. Bartram & Mahadevan, S., 2013).This format facilitates structure and parameter learning in the DBN.For the actuator model, the values of system parameters ws β, ps and aleak are estimated.Because the data is synthetic, 20:1 signal to noise ratio (SNR) Gaussian white noise was added to all system response and load data to mimic measurement error.Further, as operational data becomes available during diagnosis, that data is used to update the distributions of the actuator parameters and responses as well as to estimate the parameters of the ARIMA model used in load history simulation.

Diagnosis
Diagnosis is performed as described in Section 2.3.The actuator was operated for 20 seconds with a leak occurring after 6 seconds.At this point, the system has already reached the steady state.Synthetic measurement data was generated using the Simulink model for two cases.In the first case, the data was not resampled (i.e. the sampling frequency remained In the second case, data was resampled at 2 samples per second.This was done to compare the effect of sampling rate on diagnosis and prognosis.The system responses and load were assumed to be measurable while the system parameters including wear rate and leakage area were assumed to be unobservable.Inference via particle filtering (Ns = 250) was performed on the DBN to obtain filtered estimates of the system state.
After obtaining the state estimate at cycle t, the probability of damage was calculated as in Section 2.2.If the probability of damage exceeded 95%, an alarm was triggered.The fault was then isolated and quantified.Figure 15 shows maximum a posteriori (MAP) estimates of the system responses against their measured values for the 2 samples/sec series (not shown are responses for the 4 samples/sec series that appear similar).
It is seen that the MAP system responses track the measured values closely.Figure 16 shows the MAP estimates of the system parameters for the 2 samples/sec series, including the leakage area, and load against the ground truth values.This figure shows how the leakage area changes with time and how well the filtering procedure can infer the value of the leakage area.The system responses in Fig. 15 are sensitive to changes in the supply pressure and leakage area.However, experimentation with the model revealed that these responses were insensitive to changes in the fluid bulk modulus.Changes in bulk modulus, however, may result in effects such as changes in wear rate that have not been included in the ground truth model.In both Fig. 15 and Fig. 16, the good estimates may be attributed to the use of an accurate physicsbased model, but also to the use of synthetic measurement data whose sampling rate matched the internal sampling rate of the generative model.Thus, the performance of filtering is favorably biased.

Diagnosis Uncertainty
Diagnosis uncertainty was quantified during filtering as in in Section 2.3.Figure 17 shows the damage probability as it evolves with time for the 2 samples/sec case.The damage probability passes the alarm threshold 0.5 seconds after the actual occurrence of the fault.
At the time of detection (t = 18 cycles), all particles for the 2 samples/sec series have a value of 1.038E-8 for the leakage area.This is in contrast to a ground truth value of 9.339E-9.
Note that all particles have the same value due to resampling.These leakage area values are the initial condition for prognosis starting at t = 18. Figure 17 may be used to signal an alarm that a fault has occurred.In some cases, this alarm can be used to initiate RUL estimation or other tasks such as inspection.

Prediction
After alerting the presence of a leak above the threshold of 7.578E-9 m 2 , estimation of the RUL distribution was performed as in Section 2.4.The RUL distribution assumes a failure threshold for leakage area of 4.123E-8 m 2 .The RUL distribution at the 25 th cycle is shown in Fig. 18 for the 2 samples/sec series.

Computational Effort
The above diagnosis was performed with 12 Intel Nehalem processor cores working in parallel.Each belief state update in diagnosis takes approximately 30 seconds.Similarly, each prediction step in prognosis takes approximately 30 seconds.The majority of the computational effort is directed towards solving the differential equations for the actuator in the Matlab Simulink environment.
Figure 16.MAP estimates of system parameters and load with ground truth and measured values (2 samples/sec)

Prognosis Validation
By continuing to estimate the new RUL distribution as new measurements become available, the performance of the prognostic algorithm may be evaluated.In Fig. 19, median RUL estimates are plotted against the ground truth RUL with +/-α bounds for the 2 samples/sec series.The +/-α bounds are selected to be +/-15% of the ground truth EoL about the current ground truth RUL. Figure 20 indicates whether the probability of the RUL estimate being between the +/-α bounds at a particular time is greater than a threshold value, taken as β = 0.6.From this information, it is also determined that the prognostic horizon (PH) is about 1.25 seconds.To determine the PH, the first time the probability mass of the RUL distribution lying between the +/-α bounds exceeds and remains above the threshold of β = 0.6 is at t = 13.5 sec is determined.Because the EoL is t = 14.75 sec., PH = 14.75-13.5 = 1.25 seconds.Similarly, Fig. 21 and Fig. 22 show the same plots for the 4 samples/sec series and have a PH of 6.9 seconds.The disparity in PH may be attributed to the sampling rate, as all else is the same.
+/-α bounds that narrow as the EoL approaches are considered for both sampling rates in Fig. 23  show the λ-α accuracy, which is a binary value that indicates whether the probability of the RUL estimate being between the +/-α bounds at a particular time is greater than a threshold value, taken as 0.8 here.The λ-α accuracy is mostly zero for the 2 sample/sec series, indicating that the RUL estimate is too diffuse to pass this test.However, the 4 sample/sec series performs reasonably well, although it reaches EoL prematurely.The higher sampling rate results in much better performance on this metric.Even at 4 samples/sec, performance is not ideal.This is likely due to the limited number of samples (250) used in particle filtering.

Discussion
The DBN-based methodology successfully integrates heterogeneous sources of information to diagnose the system and estimate RUL.Particle-filter based inference provides a seamless method for switching between probabilistic diagnosis and prediction while facilitating uncertainty quantification.
The prognosis validation results indicate that the methodology provides reasonable median estimates of RUL, even as the RUL density estimates are sometimes diffuse.The sampling rate of the measurements is a large factor in whether or not prognosis is successful.Inclusion of inspection data may reduce the uncertainty in the leak area estimate and thus the RUL estimate.The accuracy of prognosis, of course, will vary depending on the system, available information, measurement noise, loading conditions, and environmental conditions.
Computational effort is a persistent issue in particle-based methodologies, affected by the complexity of the system, models involved, simplifying assumptions, filtering algorithms, etc.The prognosis methodology described in this paper is flexible with respect to these decisions, so computational effort will vary.Reductions in computational effort may be achieved by using reduced-order models (e.g.linearized model of actuator), feature selection and dimensional reduction techniques, and improved particle filtering techniques (e.g.Rao-Blackwellized particle filter (Koller & Friedman, 2009)).Additionally, the ability to massively parallelize a particle filter using modern central processing units (CPUs) and graphics processing units (GPUs) provides the potential for greatly decreased computation times.The actuator problem would see immediate decreases in computational time with additional processor cores.
Thus far, the methodology has only been demonstrated using synthetic data, and needs to be tested further using real-world data.Further, more complex physics-of-failure models should be considered.

CONCLUSION
A methodology for DBN-based probabilistic prognosis that integrates heterogeneous information sources and diagnosis uncertainty is presented in this paper.First, expert opinion is used to establish the system definition and basic assumptions.Available reliability data is used to calculate conditional probabilities for fault indicator variables for damage at the support and a crack.Operational and laboratory data are organized in a time-lagged database and used for estimating actuator parameters including the wear rate.The system DBN model is used in online diagnosis via particle filterbased inference.Two sampling rates are considered to show the effects of sampling rate on prognosis.The particle-based state distribution also provides information on diagnosis uncertainty (probability of detection, isolation, and quantification).The particles resulting from filtering integrate seamlessly into a sequential Monte Carlo predictive procedure, used for estimating RUL distribution.The prognosis results are validated using a four-step hierarchical procedure.In the future, the methodology needs to be extended to systems of larger dimension and realistic data, thus requiring feature selection, dimensional reduction, more efficient inference, and massive parallelization using multicore CPUs and GPUs.Further, the full value of diagnosis uncertainty and RUL estimates obtained through the proposed prognosis methodology is realized during riskinformed decision-making for operations, inspection, maintenance, repair, and retirement.

Figure
Figure 1.Proposed Prognosis Methodology

Figure 4 .
Figure 4. Prognosis time indices: r*(t) is the ground truth RUL, tEoUP is the end of useful prognosis, dashed line depicts median r(t).

Figure 7 .
Figure 7. Distribution of remaining useful life.

Figure 6 .
Figure 6.Crack length predictions 60 cycles into the future.

Figure 5 .
Figure 5. PDF of crack length at time of detection.

Figure 9 .
Figure 9. Diagram of proposed prognosis methodology

Figure 15 .
Figure 15.MAP estimates and measured values of actuator position and velocity, servovalve position and velocity, and pressure in each actuator chamber (2 samples/sec) and Fig. 24 for λ = 0.5 and α = 0.20.λ = 0.5 considers the accuracy of the RUL estimate halfway between the time of prognosis and end of life, termed tλ.The bottom plots of Fig. 23 and Fig 24.

Figure 25 and
Figure25and Fig.26show the relative accuracy of the RUL density estimate based on the median RUL value, and shows that the median values are accurate.The 2 sample/sec series again lags the 4 sample/sec series in performance.Based on the relative accuracy and excluding any zero values, the convergence is estimated to be 19.95 for the 4 sample/sec series.Convergence is not calculated for the 2 sample/sec

Table 3 .
List of faults and affected parameters and subsystems

Table 4 .
Model parameters and variables for a spool valve and hydraulic actuator