Health Index Generation based on Compressed Sensing and Logistic Regression for Remaining Useful Life Prediction

Extracting suitable features from acquired data to accurately depict the current health state of a system is crucial in data-driven condition monitoring and prediction. Usually, analogue sensor data is sampled at rates far exceeding the Nyquist rate and therefore contains substantial amounts of redundancy and noise, imposing high computational loads on the subsequent and necessary feature processing chain (generation, dimensionality reduction, rating and selection). To overcome these problems, Compressed Sensing can be used to sample directly into a compressed space, provided the signal at hand and the employed compression/measurement system meet certain criteria. Theory states that during this compression step enough information is conserved such that a reconstruction of the original signal is possible with high probability. The proposed approach, however, does not rely on reconstructed data for condition monitoring purposes, but directly uses the compressed signal representation as feature vector. It is hence assumed that enough information is conveyed by the compression for condition monitoring purposes. To fuse the compressed coefficients into one health index that can be used as input for remaining useful life prediction algorithms and is limited to a reasonable range between 1 and 0, a logistic regression approach is used. Run-to-failure data of three translational electromagnetic actuators is used to demonstrate the health index generation procedure. A comparison to the time-domain ground truth signals obtained from Nyquist-sampled coil current measurements shows reasonable agreement, i.e. underlying wear-out phenomena can be reproduced by the proposed approach, enabling further investigation of the application of prognostic methods.


INTRODUCTION
The main objective of feature extraction in diagnostics and prognostics is to identify and isolate fault or degradation specific information in signals of technical applications (e.g. vibration, current, voltage, flow, ...) (Jardine, Daming Lin, & Banjevic, 2006). This usually involves A/D conversion at appropriate sampling rates and further processing via time, time-frequency or frequency-domain methods (Lei et al., 2018). The output of these methods generally contains the desired information in an already more condensed form, but also unwanted redundancies. A subsequent compression or dimensionality reduction has to be employed to get rid of those redundancies and preserve only essential information. Crucial for the success of the whole process are feature choice and feature quality, which in turn are considerably influenced by the system designer's experience and expertise. Furthermore, not only the type of methods used to generate and select appropriate features is important, but also how those methods scale to huge amounts of sampling data and how they make use of limited resources (e.g. CPU, network, storage, ...). As the acquired signals in many technical applications contain only a small portion of useful information, a lot of data is handled and stored that will be discarded anyway.
To overcome this problem, an approach termed Compressed Sensing (CS, sometimes Compressive Sampling) has been simultaneously proposed by (Candes, Romberg, & Tao, 2006) and (Donoho, 2006), which allows data compression and sampling to be performed in one step, leading to noticeably lower computational costs and data traffic. Its appealing mathematical properties and its simplicity led to a fast adoption in various research fields. Interest keeps increasing especially in image compression (e.g. medical imaging), radar and object tracking. Its application in condition-based maintenance is a novel and very promising approach, which aims to replace the classical and aforementioned feature processing (sampling, feature calculation, dimensionality reduction, rating) by just one "analogue-to-information" step, consequently reducing computational costs and complexity. Major drawbacks of existing studies employing CS for condition monitoring are twofold: (1) features are extracted from the reconstructed signal, hence just relocating computation time (Wang, Xiang, Mo, & He, 2015; Liu, Zhang, & Xu, 2017) and (2) feature extraction methods are applied directly to the CS coefficients right after compression (H. Ahmed, Wong, & Nandi, 2018). To the best of our knowledge, the approach presented in (Knoebel, Wenzl, Reuter, & Gühmann, 2017) and in this paper is the first attempt to directly use compressively sampled data for health index generation without further feature extraction.
The remainder of this paper is structured as follows: in Section II the application example is outlined, followed by a brief introduction to CS. Furthermore, its usage as feature generator as well as the health index generation process based on logistic regression is covered. Results are presented and discussed in Sections III and IV. The paper is concluded in Section V, where an outlook on work in progress is also given.

METHODS
In the following section the application example is explained and a brief introduction to CS and logistic regression is given. Furthermore, their joint usage for feature extraction is outlined.

Application
The health index generation approach is applied to run-to-failure datasets of translational electromagnetic actuators (TEA). A schematic cross-section drawing of such an actuator is depicted in Fig. 1. When a constant voltage $U_c$ is applied to the coil, magnetic flux and hence electromagnetic force builds up, causing the plunger to move in $z$ (Fig. 2). Once the supply voltage is switched off, a spring retracts the plunger to its initial position. During this back-and-forth movement, wear-out phenomena occur in the tribological bushing/plunger pairing, leading to an increase in friction forces. This effect is intensified by flux leakages and manufacturing imperfections, causing a misalignment of the whole assembly and hence unit-to-unit variability. The time it takes the plunger to reach its final position is called switching time $\tau$. It is generally used as a performance indicator by engineers and technicians, since an increase in friction directly leads to rising switching times. The exact value of $\tau$ can be evaluated either by using an oscilloscope or by a custom-tailored algorithm working on Nyquist-sampled coil current measurements, which finds the characteristic constriction representing the end of plunger movement. The aforementioned datasets each contain $I = 126000$ coil current and voltage measurements of $J = 10$ artificially aged units. For each unit $j$, a vector of ground truth data $\tau^j(t) = [\tau^j_0, \ldots, \tau^j_I]$ exists, containing the determined switching time $\tau^j_i$ at time instant $t_i$. The actuator is said to be defective once the sequence $\tau^j(t)$ crosses a predefined threshold within its specified lifetime. During the test all units were exposed to the same environmental conditions, i.e. ambient temperature and applied voltage $U_c = 24$ V.
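The failure criterion on the switching-time sequence can be illustrated with a short sketch. It assumes that a unit only counts as defective when the sequence stays above the threshold from some index onwards, so that isolated outliers do not trigger a failure. The function name and the example values are hypothetical and are not taken from the datasets.

```python
def first_permanent_crossing(tau, threshold):
    """Return the first index from which tau stays above threshold, else None."""
    idx = None
    for i, value in enumerate(tau):
        if value > threshold:
            if idx is None:
                idx = i          # candidate start of a permanent crossing
        else:
            idx = None           # sequence fell back below the threshold
    return idx


# Hypothetical switching times in ms; the single spike at index 2 is ignored.
tau_example = [60.0, 62.0, 85.0, 61.0, 70.0, 81.0, 83.0, 90.0]
failure_index = first_permanent_crossing(tau_example, threshold=80.0)
healthy_index = first_permanent_crossing([60.0, 61.0, 59.0], threshold=80.0)
```

A unit whose sequence never stays above the threshold returns `None`, which corresponds to reaching the specified lifetime without failure.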

Compressed Sensing
One fundamental assumption CS relies on is signal sparsity. Let $x \in \mathbb{R}^n$ be a sparse vector and $\Phi \in \mathbb{R}^{m \times n}$ an appropriate sensing matrix with $m < n$; then the compressed representation of $x$ is given by the underdetermined linear system
$$y = \Phi x \quad \text{with } y \in \mathbb{R}^m. \tag{1}$$
Ideally, $x$ has to be sparse with at most $k$ non-zero coefficients for CS to work (denoted as $k$-sparsity). The size of $\Phi$, i.e. the number of measurements $m$ to be taken, hereby strongly depends on sparsity and not on signal length $n$. However, most signals of technical applications are not directly sparse in time-domain and hence have to be represented in an appropriate orthonormal basis $\Psi \in \mathbb{R}^{n \times n}$ where sparsity can be achieved. This in turn leads to the system
$$y = \Phi x = \Phi \Psi c = \Theta c \tag{2}$$
with $c$ being the coefficient vector of $x$ on $\Psi$, i.e. $x = \Psi c$.
Usually, $c$ is "only" compressible, as an ideal $\Psi$ perfectly sparsifying $x$ is often unknown but could be learned at the expense of additional computational costs (Han, Jiang, Sun, Wang, & Yang, 2018). I.e., the support $\mathrm{supp}(c) = \{i : c_i \neq 0\}$ of $k$-sparse and compressible vectors is inherently different, as the latter contain lots of small and near-zero coefficients obeying a decreasing behaviour when sorted in descending order of magnitude,
$$|c_i| \leq \alpha \, i^{-\gamma} \tag{3}$$
with some constant values $\alpha$ and $\gamma$.
Hence, the faster $|c_i|$ is decaying, the more compressible the signal is. By thresholding near-zero coefficients and keeping only the $k$ largest ones, a compressible vector can be represented by its $k$-sparse approximation. This approach is often used when the signals under consideration are not compressible in standard bases, but essentially one is then performing transform coding and not Compressed Sensing. In contrast, the output of a proper CS system would be a compressively sampled vector $y$, which in turn is used for reconstructing the original signal of interest. For this step to be successful, CS requires not only sparsity (compressibility) of the signal but also some key properties of the measurement system itself, which follow from two premises: (a) the information contained in the signal is preserved and (b) it is ensured that two sparse signals $x_1, x_2 \in \Sigma_{2k}$ with $x_1 \neq x_2$, where $\Sigma_k = \{x : \|x\|_0 \leq k\}$ denotes the set of all $k$-sparse signals, do not give the same reconstruction result (Eldar & Kutyniok, 2012). In this context incoherence is one important property, measured by the mutual coherence $\mu(A)$ of a matrix $A \in \mathbb{R}^{m \times n}$, i.e. the largest absolute normalized inner product between any two distinct columns of $A$. Here a low mutual coherence is beneficial as it reduces the number of compressive measurements to be taken. Another property, guaranteeing recovery of $k$-sparse signals even under noisy conditions, is the Restricted Isometry Property (RIP) of order $k$. It is satisfied by $A$ if there exists a $\delta_k \in (0, 1)$ such that
$$(1 - \delta_k) \, \|x\|_2^2 \leq \|Ax\|_2^2 \leq (1 + \delta_k) \, \|x\|_2^2 \quad \text{for all } x \in \Sigma_k. \tag{4}$$
Generally it is very difficult to prove any of the discussed criteria for arbitrary matrices, as a combinatorial search over all $\binom{n}{k}$ column sub-matrices would be necessary. However, random matrices with entries drawn independently from $\mathcal{N}(0, \tfrac{1}{m})$ are known to meet the aforementioned requirements well and are hence employed as sampling matrix $\Phi$ in this study. The discrete cosine transform (DCT) is used as sparsifying basis $\Psi$, since the mutual coherence of the resulting sensing matrix $\Theta = \Phi \Psi$ is known to be small and the monitoring data (measured coil current) is compressible in the DCT basis.
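The quantities introduced in this section can be made concrete with a small numerical sketch. It builds an orthonormal DCT basis $\Psi$, a Gaussian sampling matrix $\Phi$ with entries drawn from $\mathcal{N}(0, 1/m)$, and the sensing matrix $\Theta = \Phi\Psi$, and then checks compressibility and mutual coherence. The dimensions, the random seed and the exponential test signal (a stand-in for a coil current profile) are assumptions made for illustration only.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT-II synthesis basis Psi, so that x = Psi @ c."""
    k = np.arange(n)[:, None]            # frequency index
    t = np.arange(n)[None, :]            # sample index
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    D[0, :] /= np.sqrt(2.0)              # rescale DC row for orthonormality
    return D.T                           # columns are the basis vectors


def mutual_coherence(A):
    """Largest absolute normalized inner product of two distinct columns."""
    G = A / np.linalg.norm(A, axis=0)
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)
    return gram.max()


rng = np.random.default_rng(0)
n, m, k = 256, 64, 10                    # assumed sizes, not from the paper
Psi = dct_basis(n)
Phi = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))   # entries ~ N(0, 1/m)
Theta = Phi @ Psi

t = np.linspace(0.0, 1.0, n)
x = 1.0 - np.exp(-5.0 * t)               # synthetic, smooth test signal
c = Psi.T @ x                            # DCT coefficients of x
y = Phi @ x                              # compressed representation, Eq. (1)

# k-sparse approximation: keep only the k largest-magnitude coefficients
keep = np.argsort(np.abs(c))[::-1][:k]
c_k = np.zeros(n)
c_k[keep] = c[keep]
energy_fraction = np.sum(c_k**2) / np.sum(c**2)
rel_error = np.linalg.norm(Psi @ c_k - x) / np.linalg.norm(x)
mu = mutual_coherence(Theta)
```

For this smooth signal nearly all energy sits in a handful of DCT coefficients, which is what "compressible" means here; real coil current data would have to be checked in the same way.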

CS in the academic context
As already mentioned, the scope of CS is reconstructing the sparse representation $x$ or $c$ of an analogue signal from its compressively sampled measurements $y$. I.e. one would have some sampling device that is able to directly acquire the signal of interest in its compressed form $y$ (where the critical task now would be to find $x$, or $c$ respectively). As such real-world sampling devices still show poor performance or are simply not yet available (Harms, Bajwa, & Calderbank, 2013), it is common practice in academia to artificially compress the previously Nyquist-sampled analogue signals using a known finite-dimensional CS system (H. O. A. Ahmed & Nandi, 2019). The same procedure is adopted for this study, allowing the investigation of the proposed health index generation approach in a more controllable setting.

Compressive Feature Generation
The aforementioned key requirements ensure information preservation during the compression step and hence enable signal reconstruction with high probability (Knoebel et al., 2017). In contrast to other studies employing CS for condition monitoring purposes, reconstruction is not of interest here. Instead, the condition monitoring approach presented in this paper is based on the question whether compressed signal representations directly contain the desired failure and deterioration specific information, rendering reconstruction procedures and succeeding computationally expensive feature processing methods redundant (see Fig. 3 for a schematic diagram). More precisely, the approach is based on the mathematical properties of the sampling system $\Theta$, which acts as an approximate isometry on the data. The key enabler is the strong relationship between RIP and the Johnson-Lindenstrauss lemma (Baraniuk, Davenport, DeVore, & Wakin, 2008), which basically states that a linear embedding of some points of interest in a lower-dimensional space is approximately distance preserving if the entries of the transformation matrix are drawn from appropriate distributions. This holds for the mapping given in Eq. (2), indicating that failure and wear-out specific signal characteristics survive the compression step.

Figure 3. Flow chart of the proposed approach (blocks: Signal, Compressed Sensing, $y$, Health Index Generation, Reconstruction). Instead of using the compressively acquired information for reconstruction of the original signal, it is directly used as feature vector for health index generation.
ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2019
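The distance-preservation argument behind the compressive feature generation can be checked numerically. The sketch below projects a few random points with a Gaussian matrix and compares pairwise distances before and after compression. The dimensions, the seed and the use of random points instead of measured signals are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, n_points = 1000, 250, 20           # assumed sizes for the illustration

X = rng.normal(size=(n_points, n))       # stand-in for time-domain signals
Phi = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))   # entries ~ N(0, 1/m)
Y = X @ Phi.T                            # compressed points, one per row


def pairwise_distances(Z):
    """Euclidean distance between every pair of rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.linalg.norm(diff, axis=-1)


upper = np.triu_indices(n_points, k=1)   # each unordered pair once
ratios = pairwise_distances(Y)[upper] / pairwise_distances(X)[upper]
worst_distortion = np.max(np.abs(ratios - 1.0))
```

If the ratios stay close to 1, points that are far apart in the original space (for example new versus worn-out behaviour) remain far apart after compression, which is the property the approach relies on.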

Reconstruction Algorithms
Signal reconstruction can e.g. be conducted via greedy search or combinatorial algorithms. For verifying the suitability of the selected measurement system, reconstruction is initially performed for some compressed measurements using $\ell_1$-minimisation (also known as basis pursuit),
$$\hat{c} = \arg\min_{c} \|c\|_1 \quad \text{subject to} \quad y = \Theta c. \tag{5}$$
For further information on algorithms and methods, the reader is referred to (Eldar & Kutyniok, 2012).
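As an illustration of the greedy alternative mentioned above, the following sketch recovers a sparse coefficient vector with orthogonal matching pursuit (OMP). This is not the basis pursuit procedure used in the study; OMP is swapped in here because it needs only a least-squares solver. The problem sizes, the seed and the coefficient values are assumptions.

```python
import numpy as np


def omp(Theta, y, max_atoms, tol=1e-10):
    """Orthogonal matching pursuit: greedy estimate of sparse c with y = Theta @ c."""
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:
            break                                    # measurements explained
        # column of Theta most correlated with the current residual
        idx = int(np.argmax(np.abs(Theta.T @ residual)))
        if idx in support:
            break                                    # no further progress
        support.append(idx)
        # least-squares fit of y on the columns selected so far
        coef, *_ = np.linalg.lstsq(Theta[:, support], y, rcond=None)
        residual = y - Theta[:, support] @ coef
    c_hat = np.zeros(Theta.shape[1])
    c_hat[support] = coef
    return c_hat


rng = np.random.default_rng(2)
n, m, k = 128, 64, 4                                 # assumed sizes
Theta = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))
c_true = np.zeros(n)
c_true[rng.choice(n, size=k, replace=False)] = [1.5, -2.0, 1.0, 2.5]
y = Theta @ c_true

c_hat = omp(Theta, y, max_atoms=2 * k)
recovery_error = np.linalg.norm(c_hat - c_true)
```

Allowing up to `2 * k` atoms makes the sketch tolerant of an occasional wrong selection, since the least-squares step assigns near-zero weight to superfluous columns once the true support is included.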

Health Index Generation
One major problem in data-based condition monitoring is the identification of an appropriate health index relating the current degradation state to a numeric value between 0 and 1 (degraded, healthy). Ideally, the health index should be monotonically increasing/decreasing so that it can be described by linear or nonlinear (quadratic or exponential) degradation models. Furthermore, physical interpretability is desirable, as the health state can then directly be related to the underlying wear-out phenomena. However, as the approach presented in this paper uses compressively sampled signal representations as features, the identification of a suitable health index is not straightforward. The underlying assumption is that in time-domain the measured current profiles (see Fig. 2) of defective units are very similar, if not identical, and that these characteristics map to the compressed space, making a distinction between new and deteriorated systems possible. To illustrate the assumption that compressed data conveys the necessary information to perform condition monitoring, Fig. 4 shows the percentage change between two instances in component lifetime for each of the n = 250 CS coefficients. It can be observed that advancing deterioration leads to deviations from a baseline of up to 500 to 1000 %, whereas a completely new system shows standard deviations below 10 % for the majority of CS coefficients. This (relative) change in coefficient magnitude can be used to capture the aforementioned failure and wear specific characteristics. To fuse all CS coefficients into one numeric value with reasonable range, a Logistic Regression (LogReg) approach, which has already been applied to similar problems (Spezzaferro, 1996; Caesarendra, Widodo, & Yang, 2010), is used.
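The relative coefficient change discussed above can be computed as follows. This is a minimal sketch: the exact normalisation behind Fig. 4 is not stated, so the change is assumed to be taken relative to the magnitude of the baseline coefficient, and the example vectors are hypothetical.

```python
import numpy as np


def percentage_change(y_ref, y_now):
    """Percentage change of each CS coefficient relative to a baseline vector."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_now = np.asarray(y_now, dtype=float)
    return 100.0 * (y_now - y_ref) / np.abs(y_ref)   # assumes no zero baseline entry


# Hypothetical compressed vectors of one unit at two points in its lifetime
y_baseline = np.array([2.0, -4.0, 0.5])
y_later = np.array([3.0, -2.0, 3.0])
change = percentage_change(y_baseline, y_later)
```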
LogReg relates a set of independent variables to a dichotomous dependent variable where the output has two states (1, 0). It is based on the general logistic model
$$P(y) = \frac{1}{1 + e^{-g(y)}} \tag{6}$$
where (6) can be interpreted as the probability of the dependent variable equalling 1. In the present context, the independent variables correspond to the CS coefficients $y$, and the dependent variable represents the probability $P(y)$ of the system under consideration being in a healthy state (i.e. the two states are defective or healthy). Employing the logit transform, which is defined as the logarithm of the odds (7), the nonlinear problem of (6) is transformed to the linear one given in (8), where $m$ is the size of the compressed vector $y$.
$$O(h) := \frac{P(y)}{1 - P(y)} \tag{7}$$
$$\ln O(h) = g(y) = \beta_0 + \sum_{i=1}^{m} y_i \beta_i \tag{8}$$
By plugging $g(y) = \beta_0 + \sum_{i=1}^{m} y_i \beta_i$ from (8) into (6), the desired health index can be calculated for each sample $y$. Using an Expectation Maximization (EM) approach, the regression weights $\beta_0$ and $\beta_i$ of (8) are identified based on reference data. To account for unit-to-unit variability, a LogReg model for each component $j$ is built based on the individual as-new condition and a generalized (fleet-specific) worn-out condition obtained from historical failure data $y_f$. Once the model is fully parametrised, the current health index at monitoring instant $t_i$ can be calculated for each component. As already described, an actuator is termed defective when the series $\tau^j(t)$ permanently crosses a predefined threshold value within its specified lifetime ($T_{tr} = 80$ ms, $N = 2 \cdot 10^6$ working cycles). This condition is met by the devices 1, 2, 4 and 10 at different time instances $t_i$ and working cycles $n_i$ respectively, which leads to the fleet-specific reference set of worn-out conditions. As the devices 3, 5, 6, 7, 8 and 9 reached their specified lifetime of $N = 2 \cdot 10^6$ cycles without crossing the threshold $T_{tr}$, they can be used to validate the HI generation process.
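The relation between the logistic model and its logit transform can be verified with a few lines of code. The sketch evaluates the health index for a given weight vector and confirms that the log-odds return the linear predictor. The weights and the sample vector are arbitrary illustrative values, not fitted parameters.

```python
import numpy as np


def linear_predictor(y, beta0, beta):
    """g(y) = beta0 + sum_i y_i * beta_i, the linear form of the logit."""
    return beta0 + y @ beta


def health_index(y, beta0, beta):
    """Logistic model P(y) = 1 / (1 + exp(-g(y)))."""
    return 1.0 / (1.0 + np.exp(-linear_predictor(y, beta0, beta)))


def logit(p):
    """Logarithm of the odds p / (1 - p)."""
    return np.log(p / (1.0 - p))


beta0 = 0.5                                   # arbitrary illustrative weights
beta = np.array([1.0, -2.0, 0.25])
y_sample = np.array([0.3, 0.1, -1.2])         # arbitrary compressed sample

g = linear_predictor(y_sample, beta0, beta)
hi = health_index(y_sample, beta0, beta)
g_back = logit(hi)                            # should reproduce g
```

Because the logistic function only approaches 0 and 1 asymptotically, any finite linear predictor yields a health index strictly inside the open interval (0, 1).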
For generating the necessary reference datasets, the following requirements can be summarised:
a) The initial state of component $j$ in compressed feature space is defined by the first $k$ measurements after initial operation, $Y^j_{new} = y(\tau^j_{0 \ldots k} \mid \tau^j_{0 \ldots k} < T_{tr})$.
b) The worn-out condition is assumed to be described in compressed feature space by a multivariate normal distribution.
The standardized Pearson residuals
$$r_i = \frac{y_i - \hat{\mu}_i}{\sqrt{\phi \, V(\hat{\mu}_i) \, (1 - h_i)}}$$
are checked for normality, with estimate value $\hat{\mu}_i$, measurement $y_i$ and variance function $V(\cdot)$, where $\phi = 1$ for binomial distributions. $h_i$ is a distance measure for the respective data point towards the centroid of its data space. The pseudo-code for the whole health index generation process is given in Alg. 1. Please note that the condition "Component is new" relates to the break-in phase described in the following section.
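The residual check can be sketched as follows, assuming the standardized Pearson residual in its usual form for generalized linear models, with the binomial variance function $V(\mu) = \mu(1 - \mu)$, $\phi = 1$, and $h_i$ taken as the leverage (diagonal of the weighted hat matrix). The design matrix, the probabilities and the function name are hypothetical and do not stem from the actuator data.

```python
import numpy as np


def standardized_pearson_residuals(X, y, mu):
    """Residuals (y - mu) / sqrt(V(mu) * (1 - h)) for a binomial model, phi = 1.

    X  : design matrix including the intercept column
    y  : observed labels
    mu : model probabilities for each row of X
    """
    V = mu * (1.0 - mu)                              # binomial variance function
    WX = np.sqrt(V)[:, None] * X                     # weighted design matrix
    hat = WX @ np.linalg.solve(WX.T @ WX, WX.T)      # weighted hat matrix
    h = np.diag(hat)                                 # leverages h_i
    return (y - mu) / np.sqrt(V * (1.0 - h)), h


rng = np.random.default_rng(4)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
weights = np.array([0.2, 1.0, -0.5])                 # assumed, not fitted
mu = 1.0 / (1.0 + np.exp(-(X @ weights)))
y = (rng.random(30) < mu).astype(float)              # simulated 0/1 outcomes

residuals, leverage = standardized_pearson_residuals(X, y, mu)
```

The resulting residuals could then be inspected in a normal probability plot; the leverages sum to the number of model parameters, which serves as a quick sanity check.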
Algorithm 1: Health Index Generation
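The body of Alg. 1 is not reproduced here. As a rough illustration of the overall procedure described in this section, the following sketch trains a logistic model on an as-new and a worn-out reference set labelled with 0.95 and 0.05 and then evaluates the health index of new samples. It is not the authors' algorithm: the weights are fitted by plain gradient descent on the cross-entropy instead of the EM approach mentioned in the text, and all data are synthetic.

```python
import numpy as np


def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))


def fit_logreg(Y, labels, lr=0.1, n_iter=5000):
    """Fit beta0 and beta by gradient descent on the soft-label cross-entropy."""
    n_samples, m = Y.shape
    beta0, beta = 0.0, np.zeros(m)
    for _ in range(n_iter):
        err = sigmoid(beta0 + Y @ beta) - labels     # gradient w.r.t. g(y)
        beta0 -= lr * err.mean()
        beta -= lr * (Y.T @ err) / n_samples
    return beta0, beta


def health_index(y, beta0, beta):
    return sigmoid(beta0 + y @ beta)


rng = np.random.default_rng(3)
m = 8                                                # assumed feature size
shift = np.linspace(0.5, 1.5, m)                     # hypothetical wear effect
Y_new = rng.normal(0.0, 0.1, size=(40, m))           # as-new reference set
Y_aged = shift + rng.normal(0.0, 0.1, size=(40, m))  # worn-out reference set

Y_train = np.vstack([Y_new, Y_aged])
labels = np.concatenate([np.full(40, 0.95), np.full(40, 0.05)])
beta0, beta = fit_logreg(Y_train, labels)

hi_new = health_index(Y_new, beta0, beta)
hi_aged = health_index(Y_aged, beta0, beta)
hi_mid = health_index(0.5 * shift, beta0, beta)      # halfway between states
```

With these labels the fitted health index is expected to settle near 0.95 for the as-new samples and near 0.05 for the worn-out ones, with intermediate states mapped in between.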

RESULTS
To visualize the proposed approach, datasets of three actuators are selected as examples. Two of these actuators (units 1 and 10) did not reach their specified lifetime, whereas unit 3 remained in a healthy state for the considered time span. Figure 5 shows the normal probability plot of the Pearson residuals of the logistic regression models of units 1, 10 (defective) and 3 (healthy). All three datasets show a slight tendency towards a skewed distribution on both ends of the data range, but match the normality assumption quite well, indicating a good fit of the models. Figures 6, 7 and 8 show the LogReg-based HIs (ranging from 1 to 0, i.e. 100 % to 0 % health) in comparison with the respective time series ground truth signals $\tau$ plotted over the whole $2 \cdot 10^6$ working cycles. The corresponding threshold value of $T_{tr} = 80$ ms is given as a dashed line for reference. In time-domain, a typical break-in behaviour can be observed for units 3 and 10 (depicted in boldface), which is characterised by an initial increase in switching time up to a local maximum with a succeeding drop to a local minimum. Despite unit 1 not showing the aforementioned behaviour, all calculated HIs are in a steady state from the start, indicating normal/healthy operation. Once the time-domain signal starts to break away, i.e. $\tau(t_i)$ exceeds the local maximum of the break-in phase, the calculated HIs start to decrease as well. The ground truth signals of units 1 and 10 permanently cross the threshold value at $\approx 1.74 \cdot 10^6$ and $\approx 0.68 \cdot 10^6$ cycles, indicating a failure. The corresponding HI of unit 10 reaches its minimum at $\approx 0.68 \cdot 10^6$ cycles as well, whereas the HI of unit 1 shows quite different behaviour and reaches a quasi steady state at around $N \approx 0.9 \cdot 10^6$ cycles. Unit 3 behaves as expected and shows a reasonable agreement between HI and ground truth drift.

DISCUSSION
By inspecting the results given in Fig. 7 and 8, it can be observed that the HI does not cover a range of (1, 0) but instead one of approximately (0.95, 0.05), even when the ground truth threshold is reached and the HI should be close to 0. However, this behaviour is expected, as values on both ends of the range are predetermined by the chosen training labels and thus limited to those values due to the mathematical properties of the logistic function/model (see assumption (d) of sec. 2.3). These characteristics also result in the HI showing steady-state behaviour during and after the break-in phase, although the ground truth values undergo some fluctuations. The missing break-in phase of unit 1 (Fig. 8) is most likely due to a properly aligned assembly, which results in lower as-new values compared to units 3 and 10. Generally, choosing the individual as-new condition $Y^j_{new}$ is critical for the overall system performance, since the amount of fluctuation "dampening" during the break-in phase strongly depends on how long the unit is considered to be new. By labelling the break-in phase as initial/new system behaviour, a compromise between allowing a certain amount of drift and providing enough sensitivity to degradation can be achieved. This sensitivity of the HI generation approach towards changes in the system's health is important in two ways: (1) no time delay between wear-out phenomena and the HI is desirable and (2) healing effects, which can occur in real systems, have to be depicted. Furthermore, the applied methods (especially CS as key enabler) show good robustness against noise due to their mathematical properties. When comparing the generated HI of unit 1 given in Fig. 8 with its ground truth signal, this advantage is directly visible. The very prominent outliers of $\tau(t_i)$ of unit 1 in the range of $\approx 1.5 \cdot 10^6$ cycles are caused by noise, which prevents the custom-tailored algorithm from reliably finding $\tau$ in the time-domain signal. To compensate for these algorithmically induced problems, a succeeding outlier treatment would be necessary, imposing additional computational loads and counteracting the savings made at preceding processing steps. To evaluate the validity of the generated HIs, the normalized cross-correlation and correlation coefficients of each ground-truth/HI pair were analysed, indicating a good agreement of both signals (see Tab. 1). As the Nyquist-sampled time-domain information would not be available in a CS-based system (unless the original signal is reconstructed), the ability to qualitatively reproduce the ground truth signal with the HI is of central interest to meet the requirements outlined in the introduction to sec. 2.3. However, an in-depth evaluation can only be realised by using both the HI and the ground truth signals as inputs for prognostic algorithms, where especially the performance regarding unit 1 is of interest.
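The agreement measures mentioned above can be computed as in the following sketch. The exact normalisation used for Tab. 1 is not stated, so mean removal and scaling to unit norm are assumed here, and the switching-time and health index curves are synthetic stand-ins.

```python
import numpy as np


def normalized_xcorr(a, b):
    """Cross-correlation of the mean-free, unit-norm versions of a and b."""
    a = a - a.mean()
    b = b - b.mean()
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return np.correlate(a, b, mode="full")


cycles = np.linspace(0.0, 2.0e6, 200)
wear = 1.0 / (1.0 + np.exp(-(cycles - 1.2e6) / 1.5e5))   # synthetic wear curve
tau = 60.0 + 30.0 * wear          # switching time in ms rises with wear
hi = 0.95 - 0.90 * wear           # health index falls with wear

r = np.corrcoef(tau, hi)[0, 1]                     # correlation coefficient
xc = normalized_xcorr(tau, hi)
peak_value = np.max(np.abs(xc))                    # cross-correlation peak
peak_lag = int(np.argmax(np.abs(xc))) - (len(hi) - 1)
```

Since the health index decreases while the switching time increases, a good match shows up as a correlation coefficient close to -1 and a cross-correlation peak of magnitude close to 1 at zero lag, i.e. without time delay between both signals.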

CONCLUSION AND FUTURE WORK
In this paper an approach towards health index generation from compressively sampled data is proposed. By directly using the compressed representation of an analogue signal as features, a dedicated and computationally expensive feature extraction and selection procedure is rendered redundant. Using a logistic regression approach, the current condition of the system can be fused into a health index with reasonable range, allowing a comparison with the time-domain ground truth. As both signals are in good agreement, it can be concluded that the underlying wear-out phenomena can be captured by the constructed health index. Especially when considering applications with limited resources, this approach offers great potential by reducing the necessary amount of storage space, computational power and time. However, defining reasonable failure states in the compressive feature space without the need for run-to-failure tests is still problematic and under research. Due to the stochastic nature of the generated health index, current and future work focuses on implementing a Wiener process based remaining useful life prediction approach with concurrent state and parameter estimation (Si, Hu, Chen, & Wang, 2011). Applied to the same run-to-failure experiments as used in this study, prediction results are encouraging and reasonable.

Figure 4. Percentage change in CS coefficients: new/ageing (black) and new/defective system (gray). It can be observed from the bar plot that all coefficients $y_i$ undergo some change during the component lifetime.
which is parametrised based on the fleet-specific reference $Y_{aged}$.
c) $Y^j_{new}, Y_{aged} \in \mathbb{R}^{m' \times m}$ with $m' \geq m$.
d) The training samples $\tilde{y}^j_{i,new}$ and $\tilde{y}_{i,aged}$ are drawn from $Y^j_{new}$ and $Y_{aged}$ and are labelled with (d-1) 0.95, corresponding to 95 % health, and (d-2) 0.05, corresponding to 5 % health.
The model fit is verified for each test object by checking the standardized Pearson residuals

Figure 5. Normal probability plot of the normalized Pearson residuals for the logistic models of units 1, 3 and 10.

Figure 6. Health index HI, switching time $\tau$ and threshold $T_{tr}$ of test unit 3. The break-in phase is depicted in boldface over the cycle range of 0 to $\approx 0.121 \cdot 10^6$.

Figure 8. Health index HI, switching time $\tau$ and threshold $T_{tr}$ of test unit 1.

Table 1. Cross-correlation peak values and correlation coefficients between $\tau$ and HI.