A Hypothesis testing approach to Zero-Fault-Shot learning for Damage Component Classification

In condition monitoring, datasets are often asymmetric: for most machines being monitored, there is no labeled fault data, only nominal data. Deep learning and other neural network-based mechanizations have difficulty solving this type of problem, as they typically require a full set of labeled data, both nominal and faulted. Zero-Fault-Shot learning is a class of machine learning problems with no labeled fault training data. In this class of problems, only nominal data is used for knowledge transfer. In this paper, a mixed hypothesis testing and Bayes classifier is used to infer the type of fault and to provide information as to when maintenance should be performed. This is done without any fault data and demonstrates knowledge transfer from a set of nominal components, greatly reducing the cost of implementing and fielding a system.


INTRODUCTION
Condition monitoring facilitates predictive maintenance policies. For high-asset-value equipment, such as a helicopter, predictive maintenance improves safety and operational readiness/asset availability. Safety is improved by identifying component damage that requires maintenance. Scheduled maintenance and inspection may not identify damaged components prior to failure. For example, the M250C47B turboshaft engine has 10 shafts, 12 gears, and 26 bearings. The service life of the turbine is 2000 hours, but other components are "on condition." That is, periodic inspection, a chip detector, and oil analysis are used to determine the engine's condition and when an overhaul is performed. But these inspections may miss damage due to contamination, improper maintenance, overspeed, overtemperature, or overtorque. Vibration monitoring is an accepted practice used to identify component damage that occurs between overhauls or that is missed during an inspection. Being able to detect a damaged component, such as in the M250C47B engine, prior to failure clearly improves both safety and operational readiness.
Predictive maintenance, or the estimate of a Remaining Useful Life (RUL), allows operators to plan maintenance better. That is, the helicopter operator can turn an unscheduled maintenance event, such as a "Chips Light" (a chip detector triggers a cockpit warning that there are metal wear particles in the oil, which requires a maintenance action), into a planned maintenance event. Helicopters often have required inspections every 50 or 100 hours. Given a predictive maintenance indicator, the operator can order parts, ensure the correct skill set is available to perform the maintenance, and schedule the repair along with the already required inspection.
In some circumstances, condition monitoring and predictive maintenance may allow components to go "on condition" (a maintenance credit) or may extend the component time between overhauls (TBO). The ability to change an existing maintenance practice through condition monitoring could greatly reduce operations and maintenance costs. The process for extending TBO is covered in (SAE HR-1 Standards), while rules for airworthiness certification of a system to achieve a maintenance credit are outlined in AC 29-2C. However, achieving credit on an existing maintenance process is expensive and slow.
Alternatively, on new aircraft, the aircraft manufacturer may incorporate an inspection system that is based on condition monitoring. In effect, the manufacturer would develop two maintenance schedules (Airlines for America, 2018). The first maintenance schedule would incorporate condition monitoring to extend intervals, and the second schedule would be the fallback if condition monitoring system data were not available.
In all cases, the condition monitoring system (for rotorcraft, this is usually called a Health and Usage Monitoring System, HUMS) needs to be able to identify the faulted component and its RUL. The ability to additionally identify the failure mode can allow for better fault degradation models. For example, if it is known that the bearing fault is on the inner race rather than on a roller element or the outer race, it is possible to estimate the spall length (Medvedovsky, et al, 2022). Similarly, knowing the type of fault on a shaft can improve the outcome of the inspection. For example, knowing that a shaft is out of balance calls for a balancing tool, whereas a high 3/Rev would indicate a replacement of the Thomas coupling.
Given the expense of manual fault detection and the level of expertise required, a number of data-driven detection methods have been proposed to reduce these costs and automate the process of fault detection. As mentioned, most of the time there is limited or no labeled data available for training vibration-based fault detection. To address this, the concept of Zero-Shot Learning (ZSL) has been proposed. ZSL has been widely used in image classification for its ability to recognize objects from a new distribution (e.g., images taken at night instead of in the morning). In this case, ZSL means that, using nominal data from machine A, we classify a fault, such as a bearing spall, without any fault example, on both machine A and machine B (figure 1). For application in mechanical diagnostics, (Zhang, Wei, 2022) highlights the limited availability of fault datasets and the cost and time required to collect such data. As with other ZSL approaches, (Zhang, Wei, 2022) proposes a data augmentation strategy and transfer learning (Zhang et al 2022). In transfer learning, the diagnostic models reuse previously learned knowledge on the new diagnosis task, so that accurate fault identification can be achieved using only a few faulty samples.
For example, consider a bearing. There are at least four failure modes: cage, ball, inner race, and outer race faults. Bearing analysis can be performed using envelope analysis (Abboud, et al, 2017). Envelope analysis uses the spectrum of the absolute value of the Hilbert transform of a heterodyned signal. The condition indicator (CI) for an outer race fault would be the spectral power associated with the outer race frequency, the BPFO (Ball Pass Frequency Outer). To represent an inner race fault, the power spectrum at the BPFO would be "transferred" to the BPFI (Ball Pass Frequency Inner). There may be some scaling of the CI to account for the difference in the transfer function from the fault to the sensor, and for the fact that the inner race fault is modulated by the shaft rate (usually about half of the energy for a similar length spall).
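As a rough illustration of how an envelope-spectrum CI could be computed, the sketch below builds the analytic signal of a synthetic amplitude-modulated record and reads the envelope spectrum at one bin. It is a minimal sketch, not the processing used in this paper: the heterodyne and band-pass steps are omitted, the signal is synthetic, and the names `dft`, `envelope`, and `envelope_ci` are hypothetical helpers written with the Python standard library only.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, adequate for short records."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def envelope(x):
    """Magnitude of the analytic signal, built by zeroing negative frequencies."""
    n = len(x)
    spec = dft(x)
    half = (n + 1) // 2
    for k in range(1, n):
        if n % 2 == 0 and k == n // 2:
            continue  # keep the Nyquist bin unchanged
        spec[k] = 2 * spec[k] if k < half else 0.0
    analytic = [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                    for k in range(n)) / n
                for t in range(n)]
    return [abs(a) for a in analytic]


def envelope_ci(env, fault_bin):
    """Amplitude of the mean-removed envelope spectrum at one bin (hypothetical CI)."""
    n = len(env)
    mean = sum(env) / n
    coeff = sum((env[t] - mean) * cmath.exp(-2j * math.pi * fault_bin * t / n)
                for t in range(n))
    return 2.0 * abs(coeff) / n


# Synthetic record (assumption): a carrier at bin 60 whose amplitude is
# modulated at bin 8, standing in for a bearing fault frequency.
N = 256
signal = [(1.0 + 0.5 * math.cos(2 * math.pi * 8 * t / N))
          * math.cos(2 * math.pi * 60 * t / N) for t in range(N)]

env = envelope(signal)
cis = {k: envelope_ci(env, k) for k in (5, 8, 11)}
print(cis)
```

The CI at the modulation bin recovers the modulation depth, while neighboring bins stay near zero. Moving the same reading from one fault frequency to another is the kind of "transfer" described above.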
Figure 1 The suggested mechanization for zero-fault-shot learning.
However, for most applications, there is no fault data whatsoever. In these cases, an alternative approach to transfer learning is proposed. This paper uses a non-Gaussian hypothesis test to establish a health index threshold from which the individual CI thresholds are calculated. This, in turn, is used as the input of a Bayesian classifier for determining the fault class. The configuration data is then transferred to all other similar assets (e.g., Bell 407, 407GX, or 407GXi). This is, in effect, true ZSL, as it is based solely on nominal data, greatly reducing the cost of HUMS deployment.
The main contribution of this paper is that it demonstrates, using a mathematical model for the nominal case, that a statistical description of nominal data can be transferred across different machines to allow classification on both machine A, the training machine, and machine B, a machine on which we have no information, as illustrated in figure 1.

BASIS OF THE HYPOTHESIS TEST MODEL
Consider that many CIs are based on the spectrum. That is, the Fourier transform is applied to a time-domain signal, and the magnitude is taken. The CI for some frequency k, for example the BPFO, is then:

$$CI = M = \sqrt{X^2 + Y^2} \qquad (1)$$

where X and Y are the real and imaginary parts of the Fourier transform at k. For a nominal bearing, the distribution of the real and imaginary parts of the Fourier transform at k is Gaussian. Taking both parts as zero mean with a common standard deviation b, the distributions can be modeled as:

$$f(X) = \frac{1}{b\sqrt{2\pi}} e^{-X^2/(2b^2)}, \qquad f(Y) = \frac{1}{b\sqrt{2\pi}} e^{-Y^2/(2b^2)}$$

For the nominal condition, the real part X and the imaginary part Y are independent, such that P(X,Y) = P(X)P(Y). Then the joint probability function is:

$$f(X,Y) = \frac{1}{2\pi b^2} e^{-(X^2+Y^2)/(2b^2)}$$

This probability distribution function (PDF) is in terms of the real and imaginary parts of the Fourier transform. However, one needs the PDF of the magnitude M, as this is the CI in eq. 1.
If Φ is defined as the phase, then X = M cos(Φ) and Y = M sin(Φ). Setting dX dY = M dM dΦ, the joint probability function is:

$$f(M,\Phi) = \frac{M}{2\pi b^2} e^{-M^2/(2b^2)}$$

As phase is independent of magnitude and uniformly distributed, then:

$$f(M,\Phi) = f(M)\,f(\Phi), \qquad f(\Phi) = \frac{1}{2\pi}$$

As such, the probability distribution f(M) can be shown to be (Proakis, 1995):

$$f(M) = \frac{M}{b^2} e^{-M^2/(2b^2)}$$

This defines a Rayleigh distribution, representing the CI for a single failure mode, such as the inner race of a bearing. However, as there are multiple failure modes (as noted for a bearing: cage, ball, inner race, or outer race faults), the strategy is to define the health of the component as the normalized energy of the component CIs. That is, the health index (HI) is defined as a function of n CIs normalized by power:

$$HI = \frac{0.5}{crit}\sqrt{\mathbf{Y}^T\mathbf{Y}} \qquad (9)$$

where Y is a vector of CIs that have independent and identical PDFs, and crit is the critical value for a given probability of false alarm (PFA). This assumption is enforced through a whitening transformation using a Cholesky decomposition (Bechhoefer et al., 2011). The Cholesky decomposition of a Hermitian, positive definite matrix A results in A = LL*, where L is lower triangular and L* is its conjugate transpose. By definition, the inverse covariance is positive definite Hermitian, where Σ is the covariance of the CIs, such that:

$$LL^* = \Sigma^{-1}, \qquad \mathbf{Y} = L^*\,\mathbf{CI} \qquad (10)$$

The critical value (crit, eq. (9)) is taken from the inverse cumulative distribution function (ICDF) of the HI for a given PFA. For this paper, the PFA is 1e-6. As noted, CIs are assumed to have Rayleigh-like PDFs (e.g., heavily tailed). For gear CIs and bearing CIs (where magnitudes are biased by root mean square (RMS)), a transform is used to make the CI more Rayleigh-like. The transform "left shifts" the CI, for example, with a shift such that the 0.05 point of the CDF (cumulative distribution function) of the CI is assigned to 0.0.
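The whitening step can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' implementation: it factors the sample covariance itself (rather than its inverse) and solves a triangular system, which yields the same value of Y^T Y; the two-CI data set is synthetic, and `cholesky`, `whiten`, and `covariance` are hypothetical helper names.

```python
import math
import random


def cholesky(a):
    """Lower-triangular C with C C^T = a, for a symmetric positive definite a."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                c[i][j] = math.sqrt(a[i][i] - s)
            else:
                c[i][j] = (a[i][j] - s) / c[j][j]
    return c


def whiten(c, x):
    """Solve C y = x by forward substitution; cov(y) = I when cov(x) = C C^T."""
    y = []
    for i, row in enumerate(c):
        y.append((x[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def covariance(samples):
    """Unbiased sample covariance matrix of a list of equal-length vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    return [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
             for j in range(d)] for i in range(d)]


random.seed(0)
# Synthetic, correlated "CI" data (assumption): the second CI partly follows the first.
raw = []
for _ in range(5000):
    u, v = random.gauss(0, 1), random.gauss(0, 1)
    raw.append([2.0 * u, 1.5 * u + 0.5 * v])

sigma = covariance(raw)
c = cholesky(sigma)
white_cov = covariance([whiten(c, x) for x in raw])
print([[round(value, 6) for value in row] for row in white_cov])
```

After whitening, the sample covariance is the identity to within rounding, which is the independent, identically scaled condition the HI relies on.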
Note that the PDF for the Rayleigh distribution uses a single parameter, b, defining the mean µ = b(π/2)^0.5 and the variance σ² = (2 − π/2)b². As a result of applying the whitening transform, each CI has unit variance, and the value of b for each CI will then be:

$$b = \frac{\sigma}{(2 - \pi/2)^{0.5}}, \qquad \sigma = 1$$

such that:

$$b = \frac{1}{(2 - \pi/2)^{0.5}}$$

The HI in equation (9), the normalized energy of the CIs, can be shown to have a Nakagami PDF (Bechhoefer, He, Dempsey, 2011). The descriptive statistics for the Nakagami are η = n and ω = 2n/(2 − π/2), where n is the number of CIs used in the HI calculation.
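To make the threshold-setting step concrete, the following sketch computes a critical value from the Nakagami statistics quoted above (shape n, spread 2n/(2 − π/2)) for a PFA of 1e-6. It rests on assumptions: the survival function is written in its Erlang form, which is valid only for an integer number of CIs, the inverse is found by bisection rather than a library ICDF, and the HI scaling of 0.5 at the critical value is taken from the statement that the probability of exceeding an HI of 0.5 is the PFA. The function names are hypothetical.

```python
import math


def nakagami_sf(x, eta, omega):
    """P(X > x) for a Nakagami(eta, omega) variable with integer shape eta.

    X**2 is Gamma(shape=eta, scale=omega/eta); for an integer shape the
    survival function is the finite Erlang sum evaluated below.
    """
    t = eta * x * x / omega
    term, total = 1.0, 1.0
    for k in range(1, eta):
        term *= t / k
        total += term
    return math.exp(-t) * total


def critical_value(n_ci, pfa=1e-6):
    """Bisect for the value whose exceedance probability equals pfa."""
    eta = n_ci
    omega = 2.0 * n_ci / (2.0 - math.pi / 2.0)
    lo, hi = 0.0, 1.0
    while nakagami_sf(hi, eta, omega) > pfa:
        hi *= 2.0  # expand until the target probability is bracketed
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if nakagami_sf(mid, eta, omega) > pfa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def health_index(y, crit):
    """HI = 0.5 * ||y|| / crit for whitened CIs y (assumed scaling)."""
    return 0.5 * math.sqrt(sum(v * v for v in y)) / crit


crit3 = critical_value(3)  # e.g., a component monitored with three CIs
print(crit3, health_index([crit3, 0.0, 0.0], crit3))
```

For a single CI the Nakagami reduces to a Rayleigh, so the bisection result can be checked against the closed form b·sqrt(2 ln(1/PFA)).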
The HI provides actionable information about component health. The HI, as a hypothesis test, rejects the null hypothesis that the component is nominal. Measurements/acquisitions provide evidence of degradation and alert operations and maintenance personnel to the need for maintenance. From a maintainer's perspective:
• The HI reflects the current component's damage, where the probability of exceeding an HI of 0.5 is the PFA.
• A warning (yellow) alert is generated when the HI is greater than or equal to 0.75: maintenance should be planned.
• An alarm (red) alert is generated when the HI is greater than or equal to 1.0. Continued operations could cause collateral damage.
• The threshold-setting process ensures that the probability of a false alarm is exceedingly small when the HI reaches 1.
An HI value does not define a probability of failure for the component, nor does it imply that the component fails when the HI is 1.0, as the model is built around the alpha error (probability of false alarm), not the beta error (probability of missed detection). Instead, defining maintenance at an HI of 1 initiates a proactive policy to change operator behavior.
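The alerting rule described in this section can be written as a small mapping from HI to alert state. This is a minimal sketch: the thresholds come from the text, while the function name and the returned labels are hypothetical.

```python
def alert_state(hi_value):
    """Map a health index to the alert levels described in the text."""
    if hi_value >= 1.0:
        return "alarm"    # red: continued operation could cause collateral damage
    if hi_value >= 0.75:
        return "warning"  # yellow: maintenance should be planned
    return "nominal"


states = [alert_state(v) for v in (0.2, 0.75, 0.99, 1.0, 1.4)]
print(states)
```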

Zero-Shot Learning: Knowledge Transfer
In most ZSL strategies, given one fault, knowledge transfer is used to model a different type of fault.In this hypothesis testing model, learning is performed through the question: over a set of CIs, what would the alarm threshold be for CI i, given that the other CIs were at their nominal value?
In the process of calculating the CI covariance, it is a simple matter to calculate the mean value for each CI. Then, for each CI in the HI, an optimization problem is solved to find the CI value that corresponds to a warning or alarm limit. For example, consider the case of a high-speed bearing in a wind turbine. The mean values and offsets (the correction to make the PDF more Rayleigh-like) for the cage, ball, inner race, and outer race CIs of the nominal data, and the alarm limits (calculated by the optimization listed in figure 2), are respectively.
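One way to picture the per-CI threshold calculation is as a one-dimensional root search: hold every other CI at its nominal mean and find the value of CI i at which the HI reaches the warning or alarm level. The sketch below uses bisection for that search. It is an assumption-laden illustration rather than the optimization in figure 2: the whitening matrix, mean values, and critical value are made-up numbers, the Rayleigh offset is omitted, and `health_index` and `ci_limit` are hypothetical names.

```python
import math


def health_index(ci, whitening, crit):
    """HI = 0.5 * ||W ci|| / crit for a whitening matrix W given as rows."""
    y = [sum(w * c for w, c in zip(row, ci)) for row in whitening]
    return 0.5 * math.sqrt(sum(v * v for v in y)) / crit


def ci_limit(index, target_hi, means, whitening, crit, upper=1e6):
    """Value of CI `index` giving HI == target_hi with all other CIs at their means."""
    def hi_at(value):
        ci = list(means)
        ci[index] = value
        return health_index(ci, whitening, crit)

    lo, hi = means[index], upper
    if hi_at(lo) >= target_hi or hi_at(hi) < target_hi:
        raise ValueError("target HI is not bracketed by the search interval")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if hi_at(mid) < target_hi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical configuration for four bearing CIs: cage, ball, inner race, outer race.
means = [0.5, 0.3, 0.4, 0.6]
whitening = [[2.0, 0.0, 0.0, 0.0],
             [-0.3, 1.8, 0.0, 0.0],
             [0.1, -0.2, 2.2, 0.0],
             [0.0, 0.1, -0.4, 1.5]]
crit = 10.0

alarm_limits = [ci_limit(i, 1.0, means, whitening, crit) for i in range(4)]
warning_limits = [ci_limit(i, 0.75, means, whitening, crit) for i in range(4)]
print(alarm_limits, warning_limits)
```

Because the HI is a convex function of any single CI, there is exactly one crossing above the nominal mean whenever the nominal HI is below the target, so bisection is sufficient here.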

The Bayes Classifier
For a simple decision space, e.g., the bearing is nominal or the inner race is damaged, P(Hi|z) is the probability that Hi is true given the measured CI observation z (bold indicates that z could be a vector of CI data). The correct hypothesis is the one corresponding to the largest probability of the n possible states of the component. The decision rule is to choose H0 (the null hypothesis) if:

$$P(H_0|\mathbf{z}) > P(H_1|\mathbf{z}),\ P(H_2|\mathbf{z}),\ \ldots,\ P(H_m|\mathbf{z}) \qquad (12)$$

Otherwise, choose the decision (e.g., damaged bearing due to a fault mode) with the greatest P(Hi|z). The null hypothesis H0 represents the default case of a nominal component.
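As an example, consider the binary case, where the decision rule becomes:

$$\frac{P(H_1|\mathbf{z})}{P(H_0|\mathbf{z})} \underset{H_0}{\overset{H_1}{\gtrless}} 1 \qquad (13)$$

This notation means that if the ratio is greater than 1, the null hypothesis is rejected.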

Figure 2 Process Flow to Calculate Statistics for Bayes Classifier
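This is the maximum a posteriori probability criterion, where the selected hypothesis corresponds to the maximum of two posterior probabilities. Using Bayes' theorem:

$$P(H_i|\mathbf{z}) = \frac{p(\mathbf{z}|H_i)\,P(H_i)}{p(\mathbf{z})}, \qquad i = 0, 1 \qquad (14)$$

where P(Hi|z) is the probability of Hi based on the measured CI observations (e.g., parameter data) and P(Hi) is the prior probability of Hi, such that:

$$\frac{p(\mathbf{z}|H_1)\,P(H_1)}{p(\mathbf{z}|H_0)\,P(H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} 1 \qquad (15)$$

Rearranging terms, the test is then:

$$\frac{p(\mathbf{z}|H_1)}{p(\mathbf{z}|H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P(H_0)}{P(H_1)} \qquad (16)$$

Using (16), one can now define the likelihood ratio as l(z) = p(z|H1)/p(z|H0). Because the likelihood ratio is positive, the natural log can be taken. As the log is monotonically increasing, the log-likelihood ratio test becomes:

$$\ln l(\mathbf{z}) \underset{H_0}{\overset{H_1}{\gtrless}} \ln\frac{P(H_0)}{P(H_1)} \qquad (17)$$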

The Bayes Classifier Using the Normal Distribution
The classifier uses the Normal distribution with an n-dimensional decision space. This decision space describes the parameters associated with the ZSL algorithm.
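The null hypothesis, H0, is defined by the mean of the parameter vector space, m0, representing the mean CI values for the nominal component. The probability distribution function for the parameter vector z, given H0, is:

$$p(\mathbf{z}|H_0) = |2\pi\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}(\mathbf{z}-\mathbf{m}_0)^T \Sigma^{-1} (\mathbf{z}-\mathbf{m}_0)\right) \qquad (18)$$

where Σ is the covariance of the parameter vector, while an alternative hypothesis (e.g., for i = 1 to n − 1 fault types) is:

$$p(\mathbf{z}|H_i) = |2\pi\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}(\mathbf{z}-\mathbf{m}_i)^T \Sigma^{-1} (\mathbf{z}-\mathbf{m}_i)\right) \qquad (19)$$

The normalized distance squared between z and any m is:

$$d^2(\mathbf{z},\mathbf{m}) = (\mathbf{z}-\mathbf{m})^T \Sigma^{-1} (\mathbf{z}-\mathbf{m}) \qquad (20)$$

Substituting the distance function into (17) gives the log-likelihood ratio test:

$$d^2(\mathbf{z},\mathbf{m}_0) + 2\ln\frac{P(H_1)}{P(H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} d^2(\mathbf{z},\mathbf{m}_1) \qquad (21)$$

That is, the null hypothesis is rejected in favor of H1 when the normalized distance squared between z and m0, plus an offset that represents the log ratio of the hypothesis probabilities, is greater than the normalized distance squared between z and m1.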
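A classifier of this form can be sketched directly from normalized distances. The example below is illustrative only and assumes Normal class densities that share one covariance, so the determinant terms cancel; the class means, inverse covariance, and prior probabilities are made-up values, and `mahalanobis_sq`, `classify`, and `reject_null` are hypothetical names.

```python
import math


def mahalanobis_sq(z, m, inv_cov):
    """Normalized distance squared (z - m)^T inv_cov (z - m)."""
    d = [a - b for a, b in zip(z, m)]
    return sum(d[i] * inv_cov[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))


def classify(z, means, inv_cov, priors):
    """Return (most probable class index, posterior probabilities)."""
    scores = [math.log(p) - 0.5 * mahalanobis_sq(z, m, inv_cov)
              for m, p in zip(means, priors)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(weights)
    posteriors = [w / total for w in weights]
    return posteriors.index(max(posteriors)), posteriors


def reject_null(z, m0, m1, inv_cov, p0, p1):
    """Binary log-likelihood ratio test written in terms of distances."""
    offset = 2.0 * math.log(p1 / p0)
    return mahalanobis_sq(z, m0, inv_cov) + offset > mahalanobis_sq(z, m1, inv_cov)


# Hypothetical two-CI example: class 0 is nominal, classes 1 and 2 are fault modes.
means = [[1.0, 1.0], [4.0, 1.0], [1.0, 4.0]]
inv_cov = [[1.0, 0.0], [0.0, 1.0]]
priors = [0.9, 0.05, 0.05]

nominal_class, nominal_post = classify([1.2, 0.9], means, inv_cov, priors)
fault_class, fault_post = classify([3.9, 1.1], means, inv_cov, priors)
print(nominal_class, fault_class)
```

Picking the largest posterior and applying the pairwise distance test give the same decision in the binary case; the prior probabilities only shift the decision boundary toward the less likely class mean.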

EXAMPLE PROBLEM: AUX DUPLEX BEARING
When developing a HUMS for a helicopter type, such as the Bell 407GXi, the system must first receive a supplemental type certificate (STC) from the Federal Aviation Administration. At this point, the system can calculate the CIs for a component, but there is no configuration for fault detection. After two or three hours of flight time, the aircraft has generated CIs. In general, one wants enough data to estimate the covariance of the CIs (eq. 10).
After sales of additional kits, say three aircraft, the covariance, offset, mean value, and alert values are updated. With this small fleet, there is now a measure of the within-aircraft variance and the between-aircraft variance. When new aircraft systems are sold, such as the 407 (older analog aircraft) or the 407GX (first digital cockpit 407), they and all other aircraft with a similar transmission receive the same configuration. In this way, knowledge transfer of the configuration occurs.
In September 2020, a HUMS was installed on a new 407GXi. The system configuration was developed in March 2018 and later updated in January 2019. When a new HUMS is installed, there is a review of the system to check for correct functionality. The review showed, occasionally, large energies on the Auxiliary Duplex bearing on the transmission (figure 3). Note that figure 3 shows a full year's worth of data and that, initially, only a few hundred acquisitions were available.
Recall that the rule is that maintenance is scheduled when the HI is greater than 0.75, and maintenance is recommended when the HI is 1.0. This decision is based on the filtered HI (burnt red line in figure 3), as it is acknowledged that the data can be noisy. Using the Bayes classifier (eq. 21) with the configuration transferred from the original configuration a year prior (table 1), the decision classes are: 1 is nominal, 2 is a cage fault, 3 is a ball fault, 4 is an inner race fault, and 5 is an outer race fault.
Figure 4 shows the CIs used in the HI calculation in figure 3. One question is, why is the data so noisy? Consider that the Aux bearing is in the lowest part of the gearbox. As this was a new gearbox, a) it was unlikely that the bearing was damaged, and b) one could hypothesize that there were metal wear particles in the oil. As a ball rolls in its groove, metal particles are occasionally captured between the ball and the races. This causes the larger "peaks".
After approximately 120 hours of run time, the bearing HI trend was approaching 0.75. The decision was made to replace the gearbox oil and flush the gearbox (scheduled with an existing inspection). It was thought that this would remove the wear particles in the oil and reduce the HI/ball CI. This did work (see figures 3 and 4), but evidently, the damage had been done. After a month (index 2800), the bearing HI and ball CI increased. In figure 5, the probability of a ball fault is increasing. As the aircraft was coming in for an annual inspection (at index 3312), the decision was made to replace the bearing (figure 6). This was a particularly difficult bearing fault: had it been an inner or outer race fault, or had the rolling element been a roller rather than a ball, the CI and resulting HI would not have been so time-dependent and random.

CONCLUSION
Zero-Shot Learning is essential for a commercially successful monitoring system deployment. While seeded fault testing is a powerful aid for learning and system evaluation, it is not practical for most applications, as the expense and time required would be unacceptable. The lack of essentially any fault data renders most ZSL techniques impractical. This is especially true for aircraft, where the designed reliability of the systems, by definition, makes faults rare and the dataset "long-tailed." What is available to practitioners of condition monitoring is nominal data. What is presented is a ZSL approach based on hypothesis testing to define configuration statistics that can be used by a Bayes classifier, not only for component health determination but also for fault classification. This knowledge transfer is demonstrated in that the configuration was developed initially on three Bell 407GX aircraft, was applied to a new Bell 407GXi aircraft, and successfully detected a bearing fault.
Demonstration of ZSL will allow acceptance for aviation systems to go "on condition." Acceptance means that both manufacturers of HUMS and the aircraft Original Equipment Manufacturers that buy those systems will work toward a certification process to allow for maintenance credit. This will facilitate urban air mobility, which is more cost-sensitive to maintenance than other aviation sectors.

Figure 3 Aux Bearing Health from Sept 2020 to Sept 2021

Figure 4 Aux Bearing CIs

There were approximately 20 acquisitions per hour. In figure 5, the upper plot shows that the classifier identifies a damaged ball in the bearing. This is further confirmed in figure 4. The lower plot shows the probability of class 3 occurring in an hour. This represents the probability of a ball fault in the bearing.

Figure 5 Top: Decision Class from Bayes Classifier, Bottom: Probability of a Ball Fault

Figure 6 Aux Bearing with Ball Spall

This repair returned the bearing HI to a nominal value, and the aircraft subsequently flew six hundred hours with no problems.

Table 1 Calculated Statistics for Example Problem