A Gentle Correctness Verification of the Theory of Uncertain Event Prognosis to Compute Failure Time Probability

Prognosis of the time of first occurrence of events, particularly failures, is a problem that has been studied and analyzed from different perspectives, being referred to as the “first-hitting time” or “first-passage time” problem, among other names. Within the Prognostics and Health Management community, as well as in other disciplines, the probabilistic benchmark for computing the Time-of-Failure consists of Monte Carlo simulations where the failure event is defined as the moment when the health of the monitored system, typically described by a health indicator, crosses a predetermined threshold that denotes a region known as hazard zone. More recently, this problem was formalized through the introduction of new probability measures based on the concept of uncertain event, which correct widely used expressions for these purposes, and extend the typical threshold criterion for declaring a failure (or other event) to a broader notion of uncertain event likelihood. Following a step-by-step approach supported by code programmed in Python language, this paper verifies the correctness of the aforementioned probability measures, illustrating with a simple example how exactly the same results are obtained using either these new probability measures or the universally accepted benchmark.


INTRODUCTION
Prognosticating the time at which an event will occur in the future is a concurrent problem for a wide scientific spectrum, including: physics (Nyberg, Ambjörnsson, & Lizana, 2016), chemistry (Szabo, Schulten, & Schulten, 1980), biology (Hu, Cheng, & Berne1, 2010), neurobiology (Tuckwell, 1988), epidemiology (Lloyd & May, 2011), psychology (Navarro & David Acuña-Ureta et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Fuss, 2009), finance (Bakshi & Panayotov, 2010), economy (Abbring & Salimans, 2021), reliability engineering (Pieper, Dominé, & Kurth, 1997), and Prognostics and Health Management (PHM) (Daigle & Goebel, 2013), to name some examples (Redner, 2001). It can often be found in the literature under the name of first-hitting time or first-passage time. Research in the literature often shows thresholds to declare the occurrence of events (Redner, 2001), although the concept of non-threshold-triggered events under the notion of uncertain hazard zones had been introduced years ago in failure prognosis (Orchard & Vachtsevanos, 2009). The probability distribution for the occurrence time of non-threshold-triggered events was finally rigorously formalized in a recently published work (Acuña-Ureta, Orchard, & Wheeler, 2021). This problem is central in the Prognostics and Health Management (PHM) engineering discipline, although, unlike other disciplines, there are applications where the difficulty of requiring to solve the problem in real-time is added, which turns the prognostic problem into a very challenging task. Consequently, the scientific articles that refer to the study of faults in systems show a noticeable inclination to address aspects of anomaly detection and fault diagnosis, in much greater proportion than the amount of articles that refer to fault prognosis.
The PHM engineering discipline is born and is strongly motivated by applications (Goebel et al., 2017). Adding to this the growing popularity of machine learning methods, most of the research in PHM has been oriented to the use of these datadriven techniques to monitor the health condition of systems. Due to these and other reasons, research in more fundamental aspects in terms of theory still needs to be addressed and matured. The drawback of not proceeding on a solid theoretical basis is that mathematical guarantees and fundamental limits of performance cannot be established. Furthermore, reflection on research results seen purely from the point of view of intuition can lead to biased conclusions, as evidenced earlier in (Acuña & Orchard, 2018) regarding failure prognosis.
In this article, fundamental PHM issues are addressed under an approach that seeks to be pedagogical and invites reflection, which are: What are the theoretically correct ways to calculate the probability distribution of the failure time when prognosticating? How can this calculation be carried out? Can we verify the correctness of the Theory of Uncertain Event Prognosis (Acuña-Ureta et al., 2021) in light of the answers to the previous questions? Different ways to calculate the probability distribution of the time of occurrence of a future system failure are presented and analyzed in Section 2, where the inconsistency of one of them (frequently used) is also exposed and it is explained how it leads to to wrong results based on a biased intuition. Section 3 explains step by step, with a concrete example of fatigue crack growth, how to implement an approach based on the Theory of Uncertain Event Prognosis and it is verified that it leads to correct results when compared with the ground truth given by definition. Finally, conclusions are provided in Section 4.

TIME-OF-FAILURE PROBABILITY DISTRIBUTION
There are two widely known ways used in PHM-related articles to characterize the time of occurrence of a future event in prognostic routines: Whereas some authors prefer mathematical definitions, others prefer to use a more intuitive approach. However, intuitive approaches may present theoretical inconsistencies and may not coincide with the results obtained from strict mathematical analysis. A third form has emerged (Acuña-Ureta et al., 2021), developed with mathematical rigor, that generalizes the prognostic problem and leads to the same results that are obtained by definition. This third way generalizes the problem using the concept of uncertain events. These three different approaches are now presented and discussed for clarity purposes.

Prognostics: Definition of Time-of-Event
Following the same definition widely accepted in various disciplines of engineering and science, the time of occurrence of an event is defined from the hit with a threshold experienced by a variable of interest for the first time. In the context of PHM, you would have: 1. A health indicator Y k (random variable subjected to sources of uncertainty) as a variable of interest.
2. A thresholdx, which determines which values of the health indicator correspond to a fault condition and which do not.
3. An event E, which would be the occurrence of a system failure, which is triggered once the health indicator crosses the failure threshold coming from an operational health condition.
With all these elements, the time τ E in which the failure event E occurs for the first time in the future, considering k p as the present time, can be defined as (Redner, 2001;Daigle & Goebel, 2013) The wide and rigorous acceptance of this definition, adopted even in the community of applied mathematics, poses a reliable base as a benchmark to be able to validate or refute any other type of characterization.
It is very important to note that this definition conceptually and mathematically describes τ E as a random variable, since the system failure depends on a health indicator that depicts a stochastic process provided its evolution is subject to sources of uncertainty. Nonetheless, this definition does not explicitly describe the probability distribution of τ E . The default method for computing its statistics is through Monte Carlo simulations, which is described later in Section 3.1.

Prognostics: An intuitive yet biased approach
Consider that the probability distribution for the failure indicator of a system at the current time, which we denote as k p . Under this scenario, it is not straightforward to deduce how the probabilities of τ E are calculated, the latter being defined according to Equation (1). Figure 1. Illustration of probability mass quantification crossing a threshold in a system with monotonic degradation.
For now, a first step towards calculating the probabilities of τ E would be to propagate the uncertainty about the health indicator into the future, as shown in Figure 1.
Suppose that the health indicator Y k corresponds to the length of a crack. We could define that when a certain length, or threshold, is reached, then a system failure event occurs.
Therefore, a graph can be constructed with the probability mass crossing the threshold at each time in the future, as shown at the bottom of Figure 1. Note that the graph describes an increasing curve starting at zero and reaching the value one. Naturally, the first intuition would be to establish that this curve corresponds to the Cumulative Distribution Function (CDF) of τ E . Mathematically, this could be expressed as: Up to this point the intuition seems to make sense, although you need to understand the assumption behind it. For the aforementioned probability curve to be increasing, it is required that the dynamics of the health indicator be monotonic.
In other words, monotonic degradation is a requirement. But what would happen if this assumption did not hold, i.e., if the system experiences regenerative phenomena? This new situation is illustrated in Figure 2.
Figure 2. Illustration of probability mass quantification crossing a threshold in a system that presents regenerative phenomena.
In this situation, the probability curve is not increasing, so it cannot be interpreted as a CDF. If it is not the CDF of τ E , then what is it? We know for a fact that, for example, systems such as batteries present regenerative phenomena, and even so they have time of failure and therefore a CDF associated with it. Is it that in those cases this is not the way to calculate that CDF? What would be the methodology then? Clearly there is a valid question here, and unfortunately it has not been answered by the community yet. Elaborating an answer would require really understanding how the equality established in Equation (2) is justified, as well as the assumption of monotonic degradation. However, as far as we know, there is no mathematical proof to support this calculation (this is why we refer to it as "intuitive"), although it has been widely applied in articles dealing with the prognostic problem, specially in particle-filtering-based prognostic frameworks. This approach to calculate the probability distribution of τ E does not need to have a mathematical proof to be refuted. It would be enough to show a counterexample and empirically verify that it is not correct, even under the monotonic dynamics hypothesis that it requires. Let us assume the following linear system that characterizes a health indicator Y k : where k p = 0, a = 1.2 is a fixed constant, and ω k ∼ N (0, σ ω 2 ) is Gaussian white noise with standard deviation σ ω = 0.1. We can calculate the resulting probability distribution of τ E following the intuitive method hereby presented and compare it with the benchmark that is established by definition in Section 2.1. This can be done with great precision, since the health indicator is represented by a Gaussian linear system, so the probability mass that crosses the threshold can be calculated analytically. The results are shown in Figure 3 and show that the results differ one another, from which it is concluded that the intuitive approach must be ill-defined, and obeys a biased intuition.

Prognostics: Theory of Uncertain Event Prognosis
Although τ E is well defined in Section 2.1, that definition does not explicitly indicate how the probability of event E (system failure in this context) occurring in time k is calculated, that is, the calculation of P(τ E = k), k > k p , is not made explicit. In (Acuña-Ureta et al., 2021), expressions for performing this computation in both continuous-time and discrete-time are derived and rigorously demonstrated; in addition, the notion of "uncertain event" is introduced in the context of prognostics. All this gives rise to what we will know as Uncertain Event Prognosis Theory. Next, we will present in a very simple way how this calculation is carried out for the discrete-time case. Later in Section 3, simulations explained step by step are carried out in order to verify empirically that the expressions of this theory lead us to exactly the same results that are obtained by definition, according to what is established in Section 2.1.
A qualitative notion of event, denoted by E, is required to obtain a probabilistic expression for its future occurrence time. In PHM, we can define A binary stochastic process {E k } k∈N is defined so that, at each time k, it indicates whether or not the event E occurs with a certain probability. Thus, where E c denotes the non-occurrence of E. Suppose that k p ∈ N is the present time and it is sought to determine the moment in the future at which E will occur for the first time, i.e., we want to characterize τ E = τ E (k p ). This time instant can be formally defined as In other words, under this formulation the random variable τ E indicates Time-of-Failure. Considering that the occurrence of E at time k depends on the health indicator, denoted as y k , the probability distribution of τ E is given by (Acuña-Ureta et al., 2021): . . . p(y kp+1:k )dy kp+1:k .
In simple words, this expression states that the probability of system failure occurring for the first time in a future instant of time k, k > k p , can be computed from averaging all the pos-sible future evolution for the health indicator, evaluating how likely it is that there is system failure at time k and that it has not occurred before. Two elements are required to compute Equation (9): • Joint probability density of future health indicators p(y kp+1:k ), which is given by the dynamic model of the system.
• An uncertain event likelihood function P (E k = E|y) : This probability function can be interpreted as how likely is the system to incur failure at time k, given a health condition y ∈ Y.

GENTLE VERIFICATION OF FAILURE TIME PROBA-BILITY DISTRIBUTION COMPUTATION
This section explains step by step how the computation of the probability distribution of the time of occurrence of a future event is carried out in two different ways: the former presented in Section 2.1, which follows a classic approach where the event is triggered because a variable of interest crosses a threshold, and the latter, where the Theory of Uncertain Event Prognosis presented in Section 2.3 is adopted using an uncertain event likelihood function. If both ways are equivalent, they logically should lead to the same results.
We start by importing -part of-some libraries, such as numpy, matplotlib and scipy, which allow us to use some basic math functions, to plot and to work with Gaussian probability distributions, respectively: #### Import libraries import numpy as np import matplotlib.pyplot as plt from scipy.stats import norm Without lost of generality and in order to maintain a pedagogical tone, in this section we will use as a case study a simple fatigue crack growth model for prognostics, which is one-dimensional and lacks exogenous inputs. The dynamic model for the case study is as follows: where ω k ∼ N (0, σ 2 ω ) and ν k ∼ N (0, σ 2 ν ) are white Gaussian noise, and C, β and n are fixed constants. The variable x k denotes the actual crack length, whereas y k denotes a measurement of x k , that is corrupted with noise.
A criterion must be defined to declare the occurrence of an event, which in this context will be understood as a failure event. It will then be assumed that such an event is defined as follows: E = "Measured crack length greater than or equal tox", wherex corresponds to a maximum tolerance for the length of a crack before declaring a failure event. On the other hand, when prognosticating, we start from a present time or cycle that we denote by k p , and end in a horizon time or cycle, which we denote by k h . Therefore, parameters must be defined for the model and for the simulations: Making a parallel between the parameters declared in the script and the mathematical notation, we have that • C: Equivalent to C.
• n: Equivalent to n.
• kp: Equivalent to k p .
• kh: Equivalent to k h .
• cycles: List of cycle numbers, starting from cycle k p and ending at k h .
• x: List to store actual crack lengths as a function of the cycle number.
• y: List to store measured crack lengths as a function of the cycle number.
• N: Equivalent to N , the number of Monte Carlo simulations.
We can also define functions for Equation (10), of state transition, and for Equation (11), of measurement: The function state transition implements Equation (10). It receives the crack length in cycle k as an argument, i.e. x k , takes a realization of the noise ω k , and returns a new crack length for the next cycle k + 1, i.e. x k+1 . Similarly, the function measurement implements Equation (11). It receives as an argument the length of the crack in cycle k, i.e. x k , takes a realization of the noise ν k , and returns the length of the crack that would be recorded for cycle k, i.e. y k , assuming that a sensor with precision σ ν was used.
With all the elements exposed, we proceed below to explain step by step how to prognosticate the probability distribution for the future occurrence time of the event E (see Equation (12)) using Monte Carlo simulations under two different approaches: a classic one based on hitting a threshold and another based on the notion of uncertain event.

Prognosis conceiving threshold-triggered events
The most classic approach, and the one that constitutes the default benchmark to characterize the probability distribution of the time of future occurrence of an event, is to proceed by definition as explained in Section 2.1. Nonetheless we need a computational method to compute the probability distribution. The most standard procedure consists on performing Monte Carlo simulations and assume that the event is triggered by the hit of a variable of interest with a threshold. Recall that the time of the future event triggered by a threshold is defined mathematically as In simple words, this expression establishes that τ E = τ E (k p ) is the time or cycle greater than k p in which a variable of interest, the measured crack length Y k in this case study, reaches for the first time a value greater than or equal tox, the threshold. Naturally, τ E is a random variable since Y k is a random variable. As a consequence, there is a probability distribution associated with τ E .
In various scientific disciplines, including the PHM community, this mathematical expression is conceptually understood and simulations are performed to try to characterize the probability distribution of τ E based on this conceptualization, which are explained below. The future event can occur at any moment between k p and k h , and each moment will have an associated probability of occurrence. Therefore, we define a list (or ar-ray) ToE prob threshold where these probabilities are stored.
Figure 4. Illustration of a single possible trajectory for the measured crack growth. As soon as the length of the crack hits the threshold, i.e. y k ≥x, for any k ≥ k p , the event E is declared to have occurred at cycle k. If we repeat this procedure N times (with N → +∞), a histogram can be constructed, which would approximate the probability distribution of τ E with arbitrary precision. Note that the curve is not increasing since it is corrupted by measurement noise.
To calculate the probability distribution of τ E , N possible ways (with N → +∞) in which the measured length of the crack can evolve are simulated, to which we refer to as trajectories. Figure 4 illustrates the simulation of a single trajectory. Each of these simulations are iterated in the outer for loop, associating each trajectory with a particular value taken by the variable i. The evolution of each trajectory is then built within the inner for loop indexed by k, that indicates a cycle. Each simulated trajectory leads to a time or cycle in which the threshold is hit for the first time, that in the script is checked with the if condition. By storing this information, a histogram can be constructed to characterize the probability distribution that is sought.

Prognosis conceiving uncertain events
This section explains how to compute the probability distribution of the future occurrence time of an event (system failure) using the Theory of Uncertain Event Prognosis presented in Section 2.3, taking the previous fatigue crack problem as an illustrative example. Since we already have the dynamic model given by Equations (10)-(11), we only need to define the uncertain event likelihood function: Hence, Equation (9) can be written as . . . p(y kp+1:k )dy kp+1:k .
It can be noted that while maintaining the notion of a threshold-triggered event, the Theory of Uncertain Event Prognosis still explicitly provides a way to calculate the probability distribution of τ E . Moreover, it can be shown that implementing Equation (15) is equivalent to following the procedure in Section 3.1. However, going a little further, we will work on the notion of an event so that it is conceptually triggered without a threshold. Let us note that where z ∼ N (0, 1), and Φ µ k ,ση (·) depicts the CDF of a Gaussian distribution of meanx and standard deviation σ ν .
Figure 5. Illustration of a single possible trajectory for the actual crack growth. There is no longer threshold to hit, but there is a likelihood for the declaration of occurrence of the event E at each cycle. The probability with which one of these trajectories triggers the event at cycle k for the first time is equal to the probability of event occurrence at time k multiplied by the probability of non-occurrence of the event in previous cycles. If we repeat this procedure N times (with N → +∞), we can average the results to approximate the probability distribution of τ E with arbitrary precision. Thus, Note that the definition of the uncertain event likelihood function in turn produces an important difference between Equation (15) and Equation (20). In Equation (15), there is integration with respect to the future trajectory of the measured crack length y k (health indicator) and the event is declared once this variable exceeds the thresholdx, as shown in Figure 4. In contrast,in Equation (20) there is integration with respect to the future trajectory of the actual crack length x k and the event has a likelihood of occurrence (hence the concept of the uncertain event likelihood function), as shown in Figure 5.
The emergence of a likelihood function in this case is because the measurement noise is no longer simulated, as it is in the noisy trajectory for y k shown in Figure 4. In contrast, the uncertainty of the measurement noise (its precision) is now analytically incorporated into the calculations, so the region that denotes the hazard zone with red color, now seems to have a color hue, as shown in Figure 5. The more intense the red color, the greater the probability of occurrence of a system failure event.
As in Section 3.1 above, we proceed with the standard method of Monte Carlo simulations to compute Equation (20). However, we need to define a new function that expresses our uncertain event likelihood function: def uel_funtion(x): # uncertain event likelihood return norm.cdf (x, x_bar, sigma_v) Additionally, Equation (20) can be approximated with Monte Carlo simulations as Then we proceed to show how to compute the probability distribution of τ E . The future event can occur at any moment between k p and k h , and each moment will have an associated probability of occurrence. Therefore, we define a list (or array) ToE prob uelf where these probabilities are stored.
We use Monte Carlo simulations to approximate the probability distribution of τ E with Equation (21), so N possible ways (with N → +∞) in which the actual length of the crack can evolve are simulated, to which we refer to as trajectories. The i th trajectory is a sequence of possible actual crack lengths {x Each of these values is randomly computed using the function state transition and the value obtained in the previous cycle as argument. In the variable prob E we store Φx ,σν (x (i) k ), while in the variable prob nE past we store Figure 5 illustrates the simulation of a single trajectory. Each of these simulations are iterated in the outer for loop, associating each trajectory with a particular value taken by the variable i. The evolution of each trajectory is then built within the inner for loop indexed by k, that indicates a cycle. For each simulated trajectory there is a likelihood of event occurrence at a time k.

Comparison between both approaches
To verify that the methodology based on the Theory of Uncertain Events, shown in Section 2.3, effectively leads to exactly the same conclusions as proceeding by definition, as shown in Section 2.1, we proceed to obtain approximations with Monte Carlo simulations by executing the algorithms described in Sections 3.1 and 3.2. Results are generated at different numbers of simulations, denoted by N , in particular for N ∈ {10 2 , 10 3 , 10 4 , 10 5 }. These results are presented in Figure 6, where ToE prob threshold and ToE prob uelf are plotted against cycle.
From Figure 6 it can be seen graphically that, as the value of N increases, it is corroborated that both algorithms converge to the same probability distribution for τ E , thus ratifying the correctness of the Theory of Uncertain Event Prognosis in this case study. It can also be seen that there is an apparent slightly higher rate of convergence with the algorithm described in Section 3.2. This is because while in the algorithm in Section 3.1 the noise that affects the health indicator is simulated taking random realizations, in the algorithm in Section 3.2 that noise is analytically incorporated beforehand into the calculation.
It is important to remark that the Theory of Uncertain Event Prognosis is much more general and its correctness has already been mathematically proven regardless of the case study, since it is stated in a general and abstract way. However, through this specific comparison, the veracity of this result can be more evident, especially for those readers who have a more applied background and do not have advanced knowledge in mathematics that allows them to understand the theoretical background. Figure 6. Computation of the probability distribution of τ E using the algorithms shown in Sections 3.1 (in yellow) and 3.2 (in blue). As can be seen, both algorithms converge to the same result as N → +∞, thus verifying the correctness of the Theory of Uncertain Event Prognosis in this particular example.

CONCLUSION
Three different ways of calculating the time to failure of a system have been presented in this article; by definition (benchmark widely accepted in the scientific community in general), according to an intuitive approach, and finally through the Theory of Uncertain Event Prognosis. Taking into account the approach by definition as ground truth, it is shown that the second approach actually corresponds to a biased intuition, having valid conceptual arguments that question it and also evidencing that in practice the results it leads to do not match what establishes the ground truth. On the other hand, the Theory of Uncertain Event Prognosis is also tested. For this, Monte Carlo simulations are carried out, showing in a very simple way how to carry out an implementation in the Python programming language. It is graphically verified that the simulations do indeed converge to the same results as the ground truth.
In Section 3.2 it was shown how it is that a definition of the uncertain event likelihood function can be re-expressed, thus transforming the prognostic problem. Within the future re-search work, it is intended to explore how it is that these likelihood functions allow to assimilate uncertainty analytically. It is also intended to explore the potential computational savings that this can mean, especially for real-time applications.
Given that the Theory of Uncertain Event Prognosis was originally presented in a very abstract way with advanced mathematics, its potential has not yet been exploited. It was necessary to show its practicality with concrete implementations in order to establish a bridge between theory and practice, for which this article is expected to contribute.