Interpretable neural network with limited weights for constructing simple and explainable HI using SHM data

Recently, companies all over the world have been focusing on the improvement of autonomous health management systems in order to enhance performance and reduce downtime costs. To achieve this, remaining useful life predictions have received remarkable attention. These predictions depend on a proper design process and on the quality of the health indicators (HI) generated from structural health monitoring sensors, assessed against previously established prognostic evaluation criteria. Constructing such HIs from noisy sensory data demands powerful models that enable the automatic selection and fusion of features taken from the relevant measurements. Deep learning models are promising for autonomously extracting features in scenarios with a huge volume of data, without requiring considerable domain expertise. Nonetheless, the features established by artificial neural networks are complicated to comprehend and cannot be regarded as physical system characteristics. In this regard, the goal of this paper is to introduce a new model: an interpretable artificial neural network that enables the automatic selection and fusion of features to construct the most appropriate HIs with remarkably fewer parameters. This model consists of additive and multiplicative layers that provide a feature fusion that better reflects the system's physical properties. Additionally, the weights are discretized in two ways: a) using a ternary form with values {-1, 0, 1}, and b) relaxing the aforementioned ternary form by rounding the weights to the first decimal point in the range [-1, 1]. Both discretization techniques can softly control the number of parameters that should be ignored. This trick guarantees interpretability for the neural network by extracting simple yet powerful equations representing the constructed HIs. Finally, the model's performance is evaluated and compared with other approaches using a practical case study.
The results show that the proposed approach's designed HIs are both interpretable and of high quality according to the HI evaluation criteria.


INTRODUCTION
The Health Indicator (HI) is an important index of a structure or engineering system that shows the state of the component's health so that suitable maintenance decisions can be taken. The sensory data collected by structural health monitoring (SHM) techniques can be used to extract the information needed to generate a HI. However, the raw nature of the initial data produced by SHM methods means it is often not directly useful. The design of HIs for diagnostic and prognostic purposes from often uninformative raw sensor data is indeed a challenging task, but a necessary one (Galanopoulos, Milanoski, Broer, Zarouchas, & Loutas, 2021). Besides its own advantages, such as interpretability and a more direct relationship to the component's damage (health) state, the HI can be imported into a prognostic model to forecast the remaining useful life (RUL). It should be highlighted that a higher-quality HI results in more accurate RUL predictions, which improves decision-making strategies.
The efficiency and reliability of a HI throughout service life significantly influence the performance of diagnostic and prognostic approaches (Loutas et al., 2019). HIs that are to be utilized for diagnostic and prognostic purposes should conform to the HI evaluation criteria. The three main and well-known prognostic criteria are Monotonicity (Mo), Prognosability (Pr), and Trendability (Tr). Mo reflects a general ascending or descending trend of the HI, while Pr quantifies the distribution of the HI failure values. On the other hand, Tr determines whether degradation trajectories of a specific system/structure/component possess the same underlying pattern (Nick Eleftheroglou, Zarouchas, Loutas, Alderliesten, & Benedictus, 2018). High Mo, Pr, and Tr scores for a designed HI imply that it is a very suitable index for use in a prediction model. Even a simple prognostic model, such as linear regression, can accurately predict RUL given high criteria scores. Calculating Tr and Pr requires the availability of two or more specimens/components, resulting in two or more HIs. If a function can produce HIs with high scores on the above-mentioned prognostic criteria for a collection of similar components, the function can be adopted for future maintenance decision-making.
There are two types of HIs that can be addressed: physical (pHIs) and virtual (vHIs) (Galanopoulos et al., 2021). The first are generated directly from physical measurements, such as static or dynamic strains, ultrasound, temperature, or a combination of these. In fact, the input signals gathered by SHM sensors, or simple combinations of them, can sometimes be directly considered as HIs based on the criteria scores, avoiding the need for an additional function or process to analyze and fuse the sensor signals; this is relatively rare, however, especially for complex systems and structures. The latter are typically transformed to produce desirable properties such as Mo, Tr, and Pr, which considerably improve prognostic efficiency (Hu, Youn, Wang, & Yoon, 2012; Wen, Zhao, Chen, & Li, 2021). However, another critical aspect of a HI that cannot be assessed by prognostic criteria is its interpretability. Several data-driven algorithms and models have been developed in recent years to provide good HI candidates. Yet, the resulting HI functions from data-driven models are often far too complicated to be comprehended. Moreover, the more interpretable a function is, the less prone it is to overfitting. As a result, the main contribution of the current work is the development of a model to address this challenge.
Common and conventional artificial neural networks (ANNs) use additive neurons, which means that the yields are added together after the inputs are multiplied by weights. As a result, the ability to multiply the inputs together is lost, especially in cases when multiple inputs are significant, such as SHM sensory signals. This mathematical operator may result in a simpler, more comprehensive, and more interpretable function rather than merely considering additive neurons. For the CMAPSS dataset, for example, the HI function proposed by Nguyen and Medjaher (2021) consists only of multiplication and division operators between the features, with no summing operator usage. To replace multiplication and division operators with only summing operators (if feasible), a greater number of weighted summation operators is most likely required, resulting in a more complex, uninterpretable, and incomprehensible HI function.
In the present work, multiplicative neurons and layers will be combined with additive ones to make a HI. In order to construct a straightforward yet effective equation for the HI and ensure interpretability for the ANN, the weights are discretized in two ways: first, using a ternary set, and second, softening the ternary set by rounding the weights to the first decimal digit. The proposed approach is investigated using the turbofan engine degradation simulation dataset released by the NASA Ames Prognostics Data Repository, which is extensively applied in the PHM area (Ramasso & Saxena, 2014). The findings will be compared with the results of PCA, KPCA, and genetic programming (GP). The rest of the paper is divided into three sections: Workflow, Results and discussions, and Conclusions.

WORKFLOW
First, the overall steps of pre-processing, de-noising, and division of the data into training, test, and validation sets are briefly discussed in the Data section. Then, the HI construction method is presented, which includes the additive neuron, the multiplicative neuron, discretized weights, and building the interpretable ANN. Finally, the health indicator evaluation criteria will be presented.

Data
In the present work, the CMAPSS (Ramasso & Saxena, 2014) dataset (the subset FD001) is used to validate the proposed approach. This dataset was developed by the C-MAPSS tool, which models different deterioration conditions of the fleet of engines from a baseline condition to the final failure in the training data and a duration before the end of life (EoL) in the test data. Except for the first and second columns, which are the ID and deterioration time steps for each engine, and the following three columns, which identify the engine operational parameters, the remaining 21 columns refer to the signals of 21 sensors.
Signals that are constant across all time steps can have a negative effect on data analysis. Thus, at first, data with the same upper and lower bounds are identified and eliminated. In this regard, 6 sensors (the 1st, 5th, 10th, 16th, 18th, and 19th out of 21) are withdrawn, while 15 remain. The signals are then de-noised to enhance the quality of the subsequent features and HI. In this regard, a regression by a polynomial function of degree four is employed (Nguyen & Medjaher, 2021). Following that, the smoothed signals (features) can be selected as HIs or extracted (feature extraction) and fused (feature fusion) to build a suitable HI. Finally, by importing the designed HIs into the prognostic models, RUL can be predicted.
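The de-noising step can be sketched with a per-signal polynomial fit (a minimal NumPy illustration; per-engine looping and column handling are omitted, and the toy data are ours):

```python
import numpy as np

def denoise_polyfit(signal, degree=4):
    """Smooth a 1-D sensor signal with a least-squares polynomial fit,
    a sketch of the degree-4 polynomial regression de-noising step."""
    t = np.arange(len(signal))
    coeffs = np.polyfit(t, signal, degree)
    return np.polyval(coeffs, t)

# Toy check: a noisy quadratic trend is recovered far better than the raw signal.
rng = np.random.default_rng(0)
t = np.arange(200)
clean = 1e-4 * t ** 2
noisy = clean + rng.normal(scale=0.05, size=t.size)
smooth = denoise_polyfit(noisy)
```

In practice, this fit is applied separately to each of the 15 remaining sensor columns of each engine trajectory.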
The subset FD001 in the CMAPSS includes 100 train and test trajectories each. To accurately investigate the prognostic criteria (Mo, Tr, and Pr), however, only the training dataset which includes data up to EoL can be used. Therefore, 20% of the training dataset is also taken into consideration and discussed as a validation portion that is not involved during model training.

HI construction method
Before introducing the proposed methodology for designing an appropriate HI, three popular methods are briefly outlined, and their resulting HIs will be compared with the current approach.
For health and performance indicators, principal component analysis (PCA) can be used to discover lower-dimensional representations of data. Nevertheless, whenever confronted with inhomogeneity and time-varying patterns of system/component degradation, PCA, which is based on a linear transformation of the original data, reveals its shortcomings. As a result, various PCA variants, such as Kernel-PCA (KPCA), PCA-based K-nearest neighbors (KNN), and PCA-based Gaussian mixture models (Thieullen, Ouladsine, & Pinaton, 2012), have been proposed to cope with the challenge of nonlinear data. Yet, the principal components (PCs) generated by the aforementioned PCA-based approaches are not explainable, which could be problematic in some situations. Furthermore, they are typically employed for diagnostic purposes (Ding et al., 2010; Yu, 2011), and hardly ever for prognostic ones that demand further signal processing techniques (Benkedjouh, Medjaher, Zerhouni, & Rechak, 2013; Mosallam, Medjaher, & Zerhouni, 2016). ANN and deep learning (DL) models can be used to autonomously create HIs in scenarios with a huge volume of data without requiring considerable domain expertise. Nonetheless, the features established by DL are complicated to comprehend and cannot be regarded as physical system characteristics. In this regard, a two-stage automated-HI-construction framework based on genetic programming (GP) was proposed, claiming that it requires minimal human involvement and facilitates the generation of interpretable HIs (Nguyen & Medjaher, 2021).
Making an ANN interpretable is not a straightforward task, as it depends on the specific domain. Constructing HIs has recently been shown to be effective by simply applying some mathematical operators (summation, multiplication) to the features extracted from sensory data (Nguyen & Medjaher, 2021). In this section, we introduce the idea of automatically constructing such mathematical operators inside the ANN to produce simple yet effective HIs without sacrificing the high accuracy that deep learning can offer. It should be noted that the ANN does not output the equation; rather, it is the equation itself.
In the present work, multiplicative and additive neurons are presented together with the goal of developing a HI, and the results are compared with the outputs of genetic programming (GP) (Nguyen & Medjaher, 2021), PCA, and KPCA models. Semi-supervised learning is used in this study to generate the HI by implicitly incorporating the HI evaluation metrics (Moradi, Broer, Chiachío, Benedictus, & Zarouchas, 2022). A hypothesized ideal HI function is defined using the prognostic criteria to generate targets (labels) for a supervised ANN to extract the HI function. The optimal function is a quadratic polynomial (HI_t = t²), defined by the usage time t (Moradi et al., 2022). The functions should be normalized using max-min normalization so that Pr is adopted as a recursive reconstruction method of the HI.

Looking Inside the ANN -Additive Neuron
An ANN consists of a collection of connected units called artificial neurons that are grouped together into layers. Signals pass through each layer as inputs. The outputs of one layer become the inputs of the next one. Given some inputs coming from the previous layer, the ANN's fundamental equation for each neuron separately is:

$$N_j^l = \sum_{i=1}^{n} w_{ij}^{l} x_i^{l-1} + b_j^l \qquad (1)$$

where $w_{ij}^{l}$ is the weight corresponding to the link between the $(l-1)$th layer's $i$th neuron and the $l$th layer's $j$th neuron, and $b_j^l$ is the bias of the neuron, which is added to shift the output of the neuron accordingly. The final output of the neuron is calculated by adding a nonlinearity via an activation function F(N), whose only constraint is to be differentiable at the points of interest. While the ANN is being trained, the weights and biases of each neuron, which represent the learnable parameters of the network, adjust their values by minimizing a loss function (or maximizing an objective function) through backpropagation (Buscema, 1998). Again, the loss function must be differentiable at the points of interest. This neuron is defined as additive since it uses the summation operator over the weighted inputs.
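In code, the additive neuron is essentially a one-liner (a minimal NumPy sketch; the tanh activation is purely illustrative):

```python
import numpy as np

def additive_neuron(x, w, b, activation=np.tanh):
    """Fundamental additive neuron: weighted sum of the inputs plus a
    bias, passed through a differentiable activation F."""
    return activation(np.dot(w, x) + b)

out = additive_neuron(np.array([0.5, 1.0]), np.array([1.0, -1.0]), 0.0)
```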
Applicable ANNs for constructing HIs demand thousands or even millions of parameters, which renders them powerless in terms of interpretability. Indeed, they are called black-box models, as it is impossible to interpret the equation that maps the inputs to the outputs. To extract a useful equation that could describe a HI, a small number of neurons and layers should be involved. We assume that more than two 8-neuron layers would again produce a large, physically unexplainable equation. At first glance, it appears impossible for an ANN to be trained with such a small number of parameters and produce accurate results. Indeed, even with small datasets, it will surely underfit the data. Nevertheless, adding physical properties to the ANN can solve that issue. For HI construction, the physical properties can be simple multiplications and summations among the extracted features, which are formed by the multiplicative and additive layers, respectively, as we will see in the next subsections.

Looking Inside the ANN -Multiplicative Neuron
Forcing the layers to produce such operators demands a modification of the fundamental equation of the neuron (Eq. (1)). Thus, as mentioned in (Durbin & Rumelhart, 1989), instead of having a typical additive neuron, we could have a multiplicative neuron by converting the summation step ($\sum_{i=1}^{n}$) into a multiplication step ($\prod_{i=1}^{n}$), with the weights as exponents in a product instead of weights in a sum. This modification demands a logarithmic activation on the inputs before feeding them to Eq. (1) and an exponential activation afterwards. Following these modifications, the equation for converting an additive neuron into a multiplicative one is as follows:

$$y_j^l = \exp\left( \sum_{i=1}^{n} w_{ij}^{l} \ln x_i^{l-1} + b_j^l \right) = e^{b_j^l} \prod_{i=1}^{n} \left( x_i^{l-1} \right)^{w_{ij}^{l}} \qquad (2)$$

In the literature, there is some ambiguity surrounding the term "multiplicative neuron" because it is also used for replacing the summation operator $\sum_{i=1}^{n}$ with a multiplication operator $\prod_{i=1}^{n}$, which slows down the training due to the derivatives that are needed for backpropagation (Schmitt, 2002) and is substantially different from the definition in Eq. (2) that is of our interest. Figure 1 demonstrates the conversion process from additive to multiplicative neurons. Using only these two kinds of activation functions helps the ANN avoid adding extra nonlinearities that could produce a complicated equation. An important remark here is that forcing these specific activation functions on the neurons limits their scalability, as the inputs must be positive for the logarithm to apply. Nevertheless, this is not a pitfall in the current work, since the inputs can easily be rescaled to the desired range. Finally, since the proposed multiplicative neuron comes naturally from the additive one, the convergence laws of neural networks are satisfied provided that the logarithm exists.

Figure 1. Additive and multiplicative neuron process.
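The corresponding multiplicative neuron, via the log/exp trick described above, can be sketched as:

```python
import numpy as np

def multiplicative_neuron(x, w, b):
    """Multiplicative neuron: log the (positive) inputs, take the
    weighted sum plus bias, then exponentiate -- equivalently
    e^b * prod_i x_i^{w_i}."""
    return np.exp(np.dot(w, np.log(x)) + b)

x = np.array([2.0, 4.0])
w = np.array([1.0, -1.0])                 # ternary-style exponents
val = multiplicative_neuron(x, w, 0.0)    # equals 2 / 4 = 0.5
```

With ternary exponents, each such neuron realizes a product/quotient of selected inputs, which is exactly the kind of term the HI equations contain.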

Discretized Weights
Learning with continuous weights is very advantageous, as the training is stable and the optimal solution can be found. However, this does not help in constructing compact equations for HIs, as the ANN architecture is usually complicated, with millions of weights. Even in extreme cases where only a few weights are non-zero, having continuous values with many decimal digits guarantees a complicated model. Unfortunately, learning in a continuous space is unavoidable, since training an ANN with discrete weights is impossible, as the gradients do not exist for back-propagation. A simple solution for obtaining a compact equation could be to round the weight values to the decimal of interest during testing, but this will negatively alter the outputs; in many cases, the ANN may even become ineffectual.
Ideally, we wish our weights to be discrete to a specific decimal point, or even integers, without reducing the accuracy. To address this challenge, ternary weights have been recently introduced (Deng & Zhang, 2022). Rather than rounding the weights to specific decimal digits, the idea is to train an ANN by converging the weights to specific values, in this case to {-1, 0, 1}, which also explains the term "ternary". Certainly, there are cases where we need weights to be somewhere between those three integers. This method does not force all the weights to become integers, but only a percentage of them, which can be controlled.
Because the full-precision weight space is too large to find an appropriate ternary solution, as mentioned in (Deng & Zhang, 2022), the continuous weight space should be restricted via tanh(w):

$$\tilde{w} = \tanh(w) \qquad (3)$$

Now, the weights are restricted to the hyperbolic tangent space, ranged in the desired [-1, 1]. This conversion works only with an additional term in the loss function:

$$\mathcal{L}' = \mathcal{L} + \lambda \sum_{l=1}^{L} \mathcal{L}_w^{(l)} \qquad (4)$$

$$\mathcal{L}_w^{(l)} = \sum_{i} f_a\!\left( \tanh\!\left( w_i^{(l)} \right) \right) \qquad (5)$$

where $\mathcal{L}$ is the Mean-Squared Error (MSE) loss between true and predicted outputs over the data points, $\lambda$ is a regularization constant, $L$ is the number of layers, and $a$ is the shape controller of the penalty $f_a$, whose minima lie at the ternary values. Those $\lambda$ and $a$ are additional hyperparameters that need tuning for training the ANN. By using the above transformations and loss functions, the gradients exist and are proven to attain minima at tanh(w) = -1, tanh(w) = 0, and tanh(w) = 1 when 0 < a < 2 (the proof can be found in (Deng & Zhang, 2022)). Another important fact from Eq. (5) is that the percentage of zeros in the trained ternary weights obtained by minimizing $\mathcal{L}'$ is positively related to $a$ and can be monitored to have more or fewer zeroed weights (sparsity control). This is really useful in cases where we have larger ANN architectures and we still wish to have compact equations for HIs by zeroing more weights (increasing $a$). The advantage of this modification to the weights and the additional term in the loss function is that the ANN is capable of making accurate predictions while also keeping the weights in their ternary form and controlling the percentage of them that should be equal to zero.
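To make the sparsity-controlled loss concrete, the following sketch uses an assumed penalty shape with minima at the ternary values of tanh(w); the exact function of Deng & Zhang (2022) is not reproduced here:

```python
import numpy as np

def ternary_penalty(w, a=1.0):
    """Illustrative regularizer with minima at tanh(w) in {-1, 0, 1}.

    NOTE: this particular shape, |v|^a * (1 - |v|)^(2 - a) with
    v = tanh(w), is an assumption chosen to demonstrate the idea of
    pulling weights toward the ternary set, with `a` as the shape
    controller."""
    v = np.abs(np.tanh(w))
    return float(np.sum(v ** a * (1.0 - v) ** (2.0 - a)))

def total_loss(y_true, y_pred, layer_weights, lam=0.1, a=1.0):
    """L' = MSE + lambda * sum of the per-layer ternary penalties."""
    mse = float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
    return mse + lam * sum(ternary_penalty(w, a) for w in layer_weights)
```

Weights already at the ternary values contribute (near-)zero penalty, while intermediate weights are penalized, so minimizing `total_loss` drives most weights toward {-1, 0, 1}.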

Building the Interpretable ANN
An ANN from its nature is a function approximator where a complex equation maps the input data to the desired output. Constructing appropriate HIs via an ANN demands millions of parameters, hence, retrieving the equation is impossible.
To create an interpretable ANN that could be translated into an expressive and compact equation representing a HI, it is necessary to reduce the number of parameters by retaining its efficiency at high levels. This interpretability is satisfied by combining the discretization of the weights, the sparsity control of the weights, and the utilization of both multiplicative and additive neurons. The weight discretization as well as the sparsity control keep only the important parameters of the ANN which converge to the predefined discrete values during training. Simultaneously, the aforementioned combination of neurons considers the physical properties that silently exist behind the features that construct a HI. Consequently, these tools can now recover the equation behind the ANN that expresses accurately the feature selection and fusion processes, i.e., the HI.
To clarify, many multiplicative/additive neurons within a layer form a multiplicative/additive layer, respectively. The general architecture of the ANN is shown in figure 2. At first, the inputs are fed into a multiplicative layer. Each neuron in the layer is a multiplication between the inputs with different weights and a bias according to Eq. (2). Having many neurons results in different ways of multiplying the inputs. Next, an additive layer with a single neuron sums the outputs of the multiplicative layer to produce the final output. Adding more neurons to the additive layer just makes the ANN more complex, and it is very possible to unnecessarily overuse some of the inputs. When a HI's equation is retrieved, terms that refer to a single input are frequently visible rather than a combination of them. For instance, if x1, x2, and x3 are the inputs, we may have an equation x1x2x3 + x1. Using only the outputs of the multiplicative layer as the feed into the additive layer, it is impossible to produce such an equation. Therefore, the most general architecture is to use the inputs in both the additive and multiplicative layers. As such, the outputs of the multiplicative layer are concatenated with the inputs and then fed into the additive layer.

Figure 2. The ANN architecture. The inputs are fed into a multiplicative layer and each neuron applies a multiplication operator. Then, the outputs are concatenated with the inputs and are driven into the additive layer, which consists of one neuron, and the output is received.
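The forward pass of this architecture can be sketched as follows (a minimal NumPy illustration; the shapes and the toy weights are ours):

```python
import numpy as np

def mult_layer(x, W, b):
    """Each row of W defines one multiplicative neuron:
    e^{b_j} * prod_i x_i^{W_ji} (inputs must be positive)."""
    return np.exp(W @ np.log(x) + b)

def interpretable_ann(x, W_mult, b_mult, w_add, b_add):
    """Multiplicative layer, concatenation with the raw inputs, then a
    single additive output neuron -- the architecture described above."""
    m = mult_layer(x, W_mult, b_mult)
    z = np.concatenate([m, x])
    return np.dot(w_add, z) + b_add

# Tiny example reproducing a form like x1*x2*x3 + x1:
x = np.array([2.0, 3.0, 1.5])
W_mult = np.array([[1.0, 1.0, 1.0]])      # one neuron: x1*x2*x3
b_mult = np.array([0.0])
w_add = np.array([1.0, 1.0, 0.0, 0.0])    # product term + x1
hi = interpretable_ann(x, W_mult, b_mult, w_add, 0.0)
```

With ternary weights, every term in the resulting equation is either a product/quotient of inputs or a single input, which is precisely why the concatenation is needed.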
The inputs of the ANN could be either a sequence of raw sensor data, a de-noised format, or some extracted features. The outputs are a sequence of points that form a HI and the trained ANN is the equation for constructing it. Because of varying sequence lengths for each sensor, a preprocessing step is needed before feeding them to the ANN. This step guarantees that the length of the time-series samples will be equal. There are two approaches to achieving this. The simpler one is upsampling by interpolation via adding more data points until every sequence is equal to the largest one. The estimation of those data points depends on the chosen interpolation technique. The second approach is to add pseudo-data points at the end of each sequence until it reaches its maximum length. This could be done via padding with a purposeless value. Then the preprocessed inputs could be fed into the ANN, and the output would be available. The sensitive part of this approach comes during the loss calculation. The padded lengths should be carefully removed to avoid biasing the backpropagation by these pseudo-values. Then, after the parameters are updated, the lengths should be padded again to proceed to the next forward pass. By using this technique, the second approach does not have any approximation step like the first one. However, the training time increases dramatically. In the current case, it was observed that both approaches generate similar results, of which the first one was selected since it is straightforward.
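The simpler, interpolation-based length equalization can be sketched as:

```python
import numpy as np

def upsample_to_length(seq, target_len):
    """Equalize sequence lengths by linear-interpolation upsampling
    (the first of the two approaches described above)."""
    old = np.linspace(0.0, 1.0, num=len(seq))
    new = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new, old, seq)

seqs = [np.array([0.0, 1.0]), np.array([0.0, 0.5, 1.0, 1.5])]
max_len = max(len(s) for s in seqs)
equal = [upsample_to_length(s, max_len) for s in seqs]
```

Every trajectory is stretched to the length of the longest one; the estimated intermediate points depend on the interpolation technique (linear here).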
Until now, the equation for constructing the HI was not completely expressive, since the weights could have any real value. Using Eqs. (3)-(6) during training, the majority, if not all, of the weights become ternary by moving towards the integers -1, 0, or 1. In practice, values can converge towards the desired ones, but they do not always match them exactly. In such cases, the values can be safely rounded during test time without reducing the accuracy. As can be seen in the results section, all of the weights become ternary when using a de-noised version of the sensor data, but this does not happen when using their raw version. In this last case, a few weights can lie anywhere within [-1, 1]; these can simply be rounded to the first decimal digit with a negligible reduction in accuracy, as long as most of them are in their ternary version. The cause of having some non-ternary weights after training is the messy raw signals. Thus, there is a trade-off between converting the weights to their ternary version and minimizing the loss, which depends on the regularization hyperparameter $\lambda$. Having a high $\lambda$ means that we prefer more ternary weights (a better minimization of the regularization term) and, consequently, a more compact equation, over optimal model predictions (a less optimal minimization of the MSE term). Luckily, we aim to build a HI that provides high criteria scores (Mo, Tr, and Pr) rather than merely exact target values, thus focusing more on creating compact equations.
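The test-time rounding described above can be sketched as (a hypothetical helper, not the authors' code):

```python
import numpy as np

def discretize(w, mode="ternary"):
    """Test-time discretization of trained tanh-restricted weights:
    full ternary rounding, or the relaxed first-decimal rounding used
    when some weights refuse to converge (e.g., on raw signals)."""
    v = np.tanh(w)
    if mode == "ternary":
        return np.rint(v)      # -> {-1, 0, 1}
    return np.round(v, 1)      # -> {-1.0, -0.9, ..., 0.9, 1.0}

w = np.array([-3.0, 0.3, 2.5])
tern = discretize(w)                   # [-1., 0., 1.]
soft = discretize(w, mode="relaxed")   # [-1., 0.3, 1.]
```

Weights that have converged close to the ternary values round losslessly; the relaxed mode only matters for the few stragglers in between.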

Health indicator evaluation criteria
A HI must fulfil a set of requirements in order to be accepted as a predictive parameter. Mo, Pr, and Tr (Coble & Hines, 2009), the three major criteria for evaluating a HI utilized in this work, are defined as follows:

$$Mo = \frac{1}{M} \sum_{j=1}^{M} \frac{\left| \sum_{k=1}^{N_j - 1} \mathrm{sgn}\!\left( HI_j(k+1) - HI_j(k) \right) \right|}{N_j - 1}, \qquad j = 1, 2, \ldots, M$$

$$Pr = \exp\!\left( - \frac{\mathrm{std}_j\!\left( HI_j(N_j) \right)}{\mathrm{mean}_j\!\left( \left| HI_j(1) - HI_j(N_j) \right| \right)} \right)$$

$$Tr = \min_{j,k} \left| \rho\!\left( HI_j, HI_k \right) \right|, \qquad \rho(x, y) = \frac{\mathrm{cov}(x, y)}{\sigma_x \, \sigma_y}$$

where $HI_j$ represents the vector of HI on the jth sample, M represents the number of samples monitored, and $N_j$ denotes the number of observations on the jth sample. sgn and $\rho$ are the sign and Pearson's correlation functions, respectively. The range of the three HI criteria is [0, 1], with 0 representing the lowest and 1 representing the best HI quality. The measurement times for $x$ and $y$ are denoted by $t_x$ and $t_y$, accordingly. The covariance is denoted by cov, while the standard deviations of $x$ and $y$ are denoted by $\sigma_x$ and $\sigma_y$, respectively. To account for all of the prognostic criteria at once, an objective function called "Fitness" (Eleftheroglou, 2020) is used:

$$\text{Fitness} = Mo + Pr + Tr$$

where the fitness score ranges across [0, 3], with 0 being the worst HI quality and 3 reflecting the optimum.
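The criteria and the fitness score can be sketched as follows; these are common formulations consistent with the definitions above (the exact expressions follow the cited works), and the trendability variant below resamples the trajectories to a common grid before taking the minimum pairwise Pearson correlation:

```python
import numpy as np

def monotonicity(his):
    """Mo: average absolute net sign of the increments per trajectory."""
    return float(np.mean([abs(np.sum(np.sign(np.diff(h)))) / (len(h) - 1)
                          for h in his]))

def prognosability(his):
    """Pr: spread of the failure values relative to the mean HI range."""
    ends = np.array([h[-1] for h in his])
    spans = np.array([abs(h[0] - h[-1]) for h in his])
    return float(np.exp(-np.std(ends) / np.mean(spans)))

def trendability(his, n=100):
    """Tr: minimum absolute Pearson correlation over resampled pairs."""
    grid = np.linspace(0.0, 1.0, n)
    r = [np.interp(grid, np.linspace(0.0, 1.0, len(h)), h) for h in his]
    cors = [abs(np.corrcoef(r[i], r[j])[0, 1])
            for i in range(len(r)) for j in range(i + 1, len(r))]
    return float(min(cors))

def fitness(his):
    """Fitness = Mo + Pr + Tr, in [0, 3]."""
    return monotonicity(his) + prognosability(his) + trendability(his)

# Two ideal quadratic trajectories score close to the optimum of 3.
his = [np.linspace(0, 1, 50) ** 2, np.linspace(0, 1, 80) ** 2]
score = fitness(his)
```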
It should be noted that these criteria can only be regarded when all degradation histories up to EoL are available (training dataset). Otherwise, Tr and Pr cannot be measured appropriately (e.g. test dataset).

RESULTS AND DISCUSSIONS
In this section, after comparing the raw and de-noised sensor signals in accordance with the HI evaluation criteria, HIs produced using the proposed model alongside the PCA, KPCA, and GP approaches are evaluated. To assist the ANN in converting its continuous weights into their ternary form, the weights were uniformly initialized in the range [-1, 1]. As will be discussed later, achieving appropriate results utilizing the raw sensor signals demands relaxing the ternary discretization to a softer version, where float discrete values with one decimal point bounded in the same range can be used. Figures 3 and 4 show the raw and de-noised data for the signals of 15 sensors for the train and test datasets, respectively. The results revealed that the de-noising process adopting 4th-degree polynomial regression was effective. The HI evaluation criteria, consisting of Mo, Pr, and Tr, were also calculated for the 15 sensors and reported in Table 1, demonstrating that the de-noising process improves the criteria scores. The test dataset has lower scores, which is reasonable considering that degradation trajectories up to EoL are not accessible. As a result, the scores for 20% of the training dataset, used as a validation portion not incorporated during model training, are reported in Table 2.

Health indicators (HIs)
The first principal components of the PCA and KPCA methods can be considered as HIs (see figure 5). These results were obtained by training the algorithms on the entire training dataset. As can be noticed, standardizing the data prior to applying the PCA and KPCA algorithms is quite effective for both raw and de-noised data. The de-noising process also leads to an improvement in the fitness scores, but not as much as the standardization process. According to Table 1, the best fitness score for raw inputs is 2.58 (sensor 8), and this score has been enhanced using the PCA algorithm after standardization up to 2.85 (10.47%). This score for de-noised inputs has been boosted from 2.91 (sensor 8) up to 2.94 (1%). However, the KPCA method was unable to enhance the quality of the HI with respect to both raw and de-noised inputs, implying that the CMAPSS data has a linear rather than a nonlinear relationship. Thus, for this dataset, a relatively suitable HI can be generated using PCA, and the findings argue that there is no need to develop complicated models for CMAPSS such as deep neural networks (which is nonetheless common practice, resulting in a tremendous number of publications). This is also valid for RUL prognosis, as a higher-quality HI yields more accurate RUL predictions. This argument could be attributed to the fact that the data is the outcome of a simulation process rather than reality, and several known equations were most likely employed in the simulation process (plus noise).
One of the limitations of the PCA and KPCA algorithms from the standpoint of HI, as previously stated, is the noninterpretability of the generated principal components. As a result, alternative, appropriate approaches to this challenge, such as two-stage GP (Nguyen & Medjaher, 2021), should be developed. The results of the proposed approach are described in the following paragraphs.
The proposed model, which employed the de-noised sensor values from the 4th-degree polynomial regression and was trained on 80% of the training dataset, yielded the following equation:

$$HI = 0.14 \, \frac{\bar{s}_5 \, \bar{s}_{15}}{\bar{s}_8 \, \bar{s}_9 \, \bar{s}_{10} \, \bar{s}_{14}} + 0.2 \qquad (11)$$

where $\bar{s}_i$ is the corresponding de-noised sensor i. The sensors that do not contribute to this equation have zeroed weights, whilst the rest have weights in {-1, +1}. Having only one multiplication between the de-noised sensors means that only one multiplicative neuron contributes to the additive layer, with a bias term of 0.14. The additive neuron has a bias of 0.2. Figure 6b (right) shows the constructed HIs for each sample of the validation set, resulting in high scores for the three criteria (monotonicity, trendability, and prognosability). Indeed, as shown in Table 5, the total criteria score is 2.9461, indicating that the ANN was able to efficiently combine the de-noised sensors to generate a higher criteria score than utilizing only the best sensor. Table 3 contains the ANN hyperparameters.
Directly applying the proposed model to the raw sensor data yielded Eq. (12), where si denotes the corresponding data of sensor i. As expected, the HI equation includes more terms when raw data is used instead of de-noised data, and it is also difficult to obtain efficient results when using only the ternary format of the weights. Indeed, some weights of the multiplicative layer needed to be float numbers, rounded to their nearest first decimal digit, to produce Eq. (12). The constructed HIs for each sample of the validation set are shown in Figure 6a (right). The criteria scores for the raw sensor data are lower than those of the de-noised version, as shown in Table 5, with a fitness criteria score of 2.7407. Again, the ANN was able to efficiently fuse the raw sensor data to produce a superior criteria score than if only the best sensor were used. The ANN hyperparameters are listed in Table 4. Because dealing with raw data requires searching within a larger space of weights, we doubled the number of neurons in the multiplicative layer. This adds complexity during training, but thanks to the sparsity control, we could remove the unwanted weights and again produce a compact equation. Therefore, by doubling the neurons, we also had to increase the two sparsity-control hyperparameters, to increase the number of zeroed weights and to emphasize this process more, respectively. In addition to the proposed approach's results, the results of the state-of-the-art work (the two-stage GP model (Nguyen & Medjaher, 2021)) are shown in Figure 6 (left) for comparison. It should be noted that the equation derived from the two-stage GP model is based solely on de-noised data, but we also applied it to raw data for comparison. The HI evaluation criteria for the validation set and the entire training set are shown in Tables 5 and 6, respectively.
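Both weight-discretization schemes can be sketched as simple post-processing of trained weights. The threshold values (`delta`) and the exact pruning rule below are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def ternarize(w, delta=0.3):
    # Strict scheme: map weights to {-1, 0, +1}. Magnitudes below the
    # threshold `delta` are zeroed (pruned); the rest keep only their sign.
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) < delta, 0.0, np.sign(w))

def round_first_decimal(w, delta=0.05):
    # Relaxed scheme: clip to [-1, 1], zero near-zero weights,
    # and round the survivors to the first decimal place.
    w = np.clip(np.asarray(w, dtype=float), -1.0, 1.0)
    w = np.where(np.abs(w) < delta, 0.0, w)
    return np.round(w, 1)

w = np.array([0.93, -0.12, 0.47, -0.71, 0.02])
print(ternarize(w))            # [ 1.  0.  1. -1.  0.]
print(round_first_decimal(w))  # [ 0.9 -0.1  0.5 -0.7  0. ]
```

Raising `delta` zeroes more weights, which is the "soft" sparsity control that shortens the extracted HI equation.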

Figure 5. The first principal component of the PCA (left column) and KPCA (right column), with and without standardization using zero-mean normalization, for both the raw and de-noised training datasets. Panels: (a) raw sensor data (without standardization); (b) raw sensor data (with standardization); (c) de-noised sensor data (without standardization); (d) de-noised sensor data (with standardization).
The proposed model employing the de-noised data has the highest fitness score (2.95) of all. Although PCA's HI achieves a close score (2.94), the resulting HI equation is difficult to interpret. The GP model also generates a high score (2.93), but its authors did not consider all of the inputs in the second stage (which is responsible for the feature fusion task) and instead chose the highest-quality inputs according to the feature extraction in the first stage. It should be noted that increasing the number of neurons and layers in the ANN could have resulted in even higher fitness scores, but at the cost of more complicated functions and less interpretability. As a result, the findings demonstrate that the proposed approach is superior in terms of both the highest score and interpretability.

Figure 6. HIs constructed by the proposed model (right column) and the two-stage GP model (left column), with (a) raw and (b) de-noised data, using a validation set (20% of the training set).

CONCLUSIONS
Designing a qualified HI that matches the evaluation criteria (monotonicity, trendability, and prognosability) while remaining interpretable for an engineering system/structure in PHM is a challenge. An ANN can be employed to fuse the SHM data in order to construct the desired HI. Making an ANN interpretable, on the other hand, is a difficult task that varies depending on the domain. In addition, most ANNs use only additive neurons: the inputs are multiplied by weights and the results are summed. As a consequence, the ability to multiply the inputs together is lost, even though it can lead to a more basic network and function. If only summing operators are used instead of multiplication and division (where feasible), a larger number of weighted-summation operators is required, resulting in a more complicated HI expression. Therefore, in the current study, both multiplicative and additive neurons were employed to generate the HI. The HI function was further simplified by using discretized (ternary) weights with sparsity control. Based on both the highest score and interpretability, the findings show that the proposed approach is superior.
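The combination of multiplicative and additive neurons described above can be illustrated with a minimal forward pass. The `x**w` encoding of ternary weights (+1 keeps an input as a factor, -1 divides by it, 0 drops it) and the toy shapes and values are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def hi_forward(x, W_mul, b_mul, w_add, b_add):
    """One multiplicative layer feeding one additive output neuron.

    W_mul : (n_mul, n_in) ternary weights; w = +1 keeps an input as a
            factor, w = -1 divides by it, w = 0 drops it (x**0 == 1).
    """
    mul_out = np.prod(x[None, :] ** W_mul, axis=1) + b_mul
    return float(w_add @ mul_out + b_add)

# Toy configuration: 4 inputs, 2 multiplicative neurons.
x = np.array([1.2, 0.8, 1.5, 2.0])
W_mul = np.array([[1, 1, 0, 0],    # x1 * x2
                  [0, 0, 1, -1]])  # x3 / x4
b_mul = np.array([0.0, 0.0])
w_add = np.array([1.0, -1.0])      # additive neuron sums the products
hi = hi_forward(x, W_mul, b_mul, w_add, b_add=0.2)
print(round(hi, 4))  # 1.2*0.8 - 1.5/2.0 + 0.2 = 0.41
```

With ternary weights, each multiplicative neuron reduces to a plain product (or ratio) of a few inputs, which is what makes the extracted HI equations compact and readable.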