Real time model-based diagnosis enabled by hybrid modeling

In this paper we propose a hybrid modeling approach for generating reduced models of a high fidelity model of a physical system. We propose machine learning inspired representations for complex model components. These representations preserve in part the physical interpretation of the original components. Training platforms featuring automatic differentiation are used to learn the parameters of the new representations using data generated by the high-fidelity model. We showcase our approach in the context of fault diagnosis for a rail switch system. We generate three new model abstractions whose complexities are two order of magnitude smaller than the complexity of the high fidelity model, both in the number of equations and simulation time. Faster simulations ensure faster diagnosis solutions and enable the use of diagnosis algorithms relying heavily on large numbers of model simulations.


INTRODUCTION
In model-based approaches, the diagnosis engine is provided with a model of the system, nominal values of the parameters of the model and values of some of its inputs and outputs. The main goal of a diagnosis engine is to determine from only this information the presence of a fault and to isolate it. There is a rich literature on model-based diagnosis results proposed independently by the artificial intelligence (de Kleer, Mackworth, & Reiter, 1992) and control (Gertler, 1998), (Isermann, 2005), (Patton, Frank, & Clark, 2000) communities. Modelbased diagnosis requires accurate models to detect and isolate faults in physical systems. For real-time diagnosis, such models need to simulate withing an allotted time interval. Typically, the more accurate models are, the more complex become and hence it takes more time to simulate them. Traditional model-based diagnosis include filters (e.g., Kalman Ion Matei et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. filter (Kalman, 1960), particle filter (Arulampalam, Maskell, & Gordon, 2002)), or optimization based-techniques that estimate a number of parameters whose deviation from their nominal values indicate the presence of a fault. These methods rely on model simulations either for one sample period (Kalman and particle filters) or for some time horizon (optimization based). The simulation time becomes even more stringent in the case of the particle filter where a possibly large number of particles require their own model simulations. One may argue the many system admit ordinary differential equations (ODEs) models as representations and a one time step forward propagation is not necessarily a complex operation. Although this is true for some physical systems, many others require differential algebraic equations (DAEs) as mathematical representations. DAE simulations require the use of the Newton-Rhapson algorithm that in turn requires the inversion of a Hessian matrix. The Hessian matrix size depends on the complexity of the model (e.g., number of equations and variables).
In this paper we propose a hybrid modeling approach to reduce the complexity of a high fidelity model of a physical system. The reduced complexity model is used by a diagnosisengine to detect and isolate system faults. Since we use the model for model-based diagnosis, here we care on the effects of the model complexity on the simulation time. The hybrid modeling approach is based on: (i) identifying the system components responsible for making the simulation time taking a long time, (ii) finding new parameterized representations for such components, and (iii) learn the parameters of the new components We make sure that the chosen representations preserve, at least in part, the physical meaning of the original physical components. Such a meaning is particularly useful in diagnosis since it points to a physical explanation of a faulty behavior.

PROBLEM DESCRIPTION
We consider physical systems whose behavior can be described by a set of DAEs of the form where x represents the state vector, u is a vector of inputs, and y is a vector of outputs. We consider parametric faults: faults that can be described through changes in system parameter values. Parametric faults do not impose significant constraints on the type of faults we can detect and isolate. Indeed, as shown in our previous work (Honda & et al., 2014;Minhas et al., 2014;Saha & et al., 2014), we can augment the physical model with fault modes inspired by the physics of failure. The physics-based fault augmentation process adds additional equations to the model. These new equations are dependent on parameters whose activation induces the simulated faulty behavior. The type of faults introduced are domain dependent. We can cover electrical (short, open connections, parameter drifts), mechanical (broken flanges, stuck flanges, torque losses due to added friction, efficiency losses), or fluid (blocked pipes, leaking pipes) domains.
Let F = {F 0 , F 1 , . . . , F L } denote a set of faults we would like to detect and isolate, where F 0 denoted the normal behavior. The diagnosis objective is to determine a classifier f : Y → {F 0 , F 1 , . . . , F L }, where Y is a set of observations of the system behavior, typically given as time series that are processed sample by sample (online) or as a batch (offline). We associate a set of fault parameters {θ 1 , . . . , θ L } to each of the fault modes with nominal values θ * i for i ∈ {1, . . . , L}. The classifier fault detection scheme is defined as a variation of the observations from their expected values. The fault isolation is based on the deviation of the fault parameters from their nominal values, i.e., θ i − θ * i ≥ i , where i is a fault specific threshold that can depend on the noise statistics for example. Several fault parameter deviations are simultaneously possible, hence there may be some ambiguity in the fault diagnosis. This case happens when the sensor measurements do not contain enough information to differentiate between distinct faults. The fault are tracked either online or offline using filters or optimization based parameter estimation techniques.

RAIL SWITCH MODEL
As a case study, we consider a rail switch system used for guiding trains from one track to another. The rail switch is composed of a servo-motor and a gear-mechanism for scaling the rotational motion and for amplifying the torque generated by the electrical motor. The rail load is composed by a mechanical adjuster, and tongue-rails. The rail switch model has 5522 equations, and the rail component on its own has 4768 equations. We note that the majority of the model complex-ity is concentrated on the rail model which was obtained by using a finite element analysis approach (FEA). In particular, each beam was approximated as a sequence of connected 2D mass-spring-dampers, where the springs and dampers oppose the rotational motion. Hence producing a reduced representation of this model improves its usability, especially in real time applications. The input for the rail switch signal is a reference signal for the servo-motor controller for each of the two direction of motions. The time horizon for each input reference signal is 7 sec. Using the high-fidelity model, it takes more than 7 sec to simulate the model for 14 sec, simulation time that includes the rail displacement in both directions. Our objective is to replace the rail component with a simpler representation, to significantly reduce the simulation time.

FAULT AUGMENTATION
In this section we describe the modeling artifacts that were used to include in the behavior of the system four fault operating modes: misaligned adjuster bolts (left and right), obstacle and missing bearings. These fault modes were reported to be of interest by a rail system operator we collaborated with. Obviously there are many other fault modes of interest at the level of the point machine for example. Such faults are more readily detected due to the rich instrumentation present at the servo-motor. Misaligned adjuster bolts: In this fault mode the bolts of the adjuster deviate from their nominal positions. As a result, the instant at which the drive rod meets the adjuster (and therefore the instant at which the switch rail starts moving) happens either earlier or later. For example, in a left-to-right motion, if the left bolt moves to the right, the contact happens earlier. The reason is that since the distance between the two bolts decreases, the left bolt reaches the adjuster faster. As a result, when the drive rod reaches its final position, there may be a gap between the right switch blade and the right stock rail. In contrast, if the left bolt moves to the left the contact happens later. The model of the adjuster includes parameters that can set the positions of the bolts, and therefore the effects of this fault mode can be modeled without difficulty. Figures  1 and 2 show a comparison between the nominal behavior and the misaligned left and right bolts, respectively on the motor current and angular velocity.
Missing bearings: To minimize friction, the rails are supported by a set of rolling bearings. When they become stuck or lost, the energy losses due to friction increase. A component connected to the rail was included to account for friction. This component has a parameter that sets the value for the friction coefficient. By increasing the value of this parameter, the effect of the missing bearings fault can be simulated. Figure 3 shows a comparison between the nominal behavior and the missing bearing behavior on the motor current and angular velocity.  Obstacle: In this fault mode, an obstacle obstructs the motion of the switch blades. In case the obstacle is insurmountable, a gap between the switch blades and the stock rail appears. The effect on the motor torque is a sudden increase in value, as the motor tries to overcome the obstacle. To model this fault we included a component that induces a localized, additional friction phenomenon for the switch blades. This component has two parameters: the severity of the fault and the position. For very high severity the switch blades cannot move beyond a certain position. Figure 4 shows a comparison between the nominal behavior and the obstacle present behavior on the motor current and angular velocity. Remark 4.1 The behavior of the angular velocity seems to be little affected by the presence of rail switch faults. This observation can be explain by the fact that the angular velocity is the controlled variable. Hence the motor will vary the torque (current) in an attempt to track the angular velocity reference.

ACAUSAL MODELING
Acausal models are physics based models typically constructed from first principles. Unlike the causal models used in signal processing and control, components of acausal models do not have inputs and outputs but ports (connectors) through which energy is exchanged with other components or the environment. This is the modeling formalism used in the Modelica (Fritzson, 2015) language or in Simscape. Ports are characterized by variables whose type determines how they are manipulated when two ports are connected. For example at a connection point, all flow variables sum up to zero (flow conservation), while all non-flow variables are equal. Examples of flow variables include current, force, torque while examples of non-flow variables include potential, velocity, angular velocity. Typically, the product between a flow and a non-flow variable has the physical interpretation of instantaneous power. The acausal modeling formalism is an instance of the more general port-Hamiltonian formalism (van der Schaft & Jeltsema, 2014). The behavior of acausal components is determined by a set of constitutive equations of the form f (x; w) = 0, rather than by a causal map (with or without memory). The vector of variables x can include port variables (flow, non-flow) and internal variables (states, algebraic variables), while w is a vector of component parameters.
We use acausal models to give simplified representations of the rail component initially constructed using a finite element approach which typically induces a higher complexity. To learn the parameters of the constitutive equations there are two main scenarios that can be considered. In the first scenario, we assume that we can directly measure the component variables. This has the advantage that we can in theory perform the model learning in isolation, without considering the entire model. For this approach to work we need to careful choose the model representation to avoid learning trivial models. The second scenario assumes we have only indirect information about the behavior of the model through measurements that do not include the rail component variables. In this case, the learning must include the entire rail-switch model and it is more computationally intense. Since we have access to the high fidelity model and hence we can directly measure every model variable, we consider the first scenario. We use two type of representations for the (acausal) rail component: causal 1 and acausal.
It the causal case we assume that some variables are inputs while other variables are outputs. This assumption is not adhoc. It comes from a causal analysis of the entire system model that produces causal relationships between the system variables. This causal analysis is typically performed before simulating a dynamical system represented as a DAE (Casella, 2011). Once the input/output variable assignment is done, we select a representation for the constitutive equations e.g., a neural network (NN), and move to the parameter learning step. Note that instead of assigning the component variables to an input/output category, we can try to learn the component parameters by assuming that all variables are inputs and the output is zero for all inputs. This approach can only work when considering the entire system model, case which introduces a regularization effect that prevents learning a trivial equation such as the constant zero map. Indeed, a zero map playing the role of a constitutive equation can make the system model unsimulatable due to a singular Jacobian of the system DAE.
In the acausal case the constitutive equations emulate physical laws. In what follows, we discuss different options for the constitutive equations that guarantee that the overall system model can be simulated. Since the behavior of the component can be fairly complex, we may need a large set of constitutive equations. To avoid arbitrary choices of constitutive equation maps, we propose using networks of generalized mass spring dampers (gMSD). In such a network, each node is a composition of one generalized mass, spring and damper in a parallel connection, and each link is a composition of one spring and damper. To ensure that the component modeled as a network of gMSDs does not destabilize the overall system model, we impose conditions on the gMSDs that ensure that the model can be simulated. Such a condition is dissipativity. A dissipative component cannot generate energy internally. A formal definition of a dissipative component is given in what follows.
We force a causal behavior for the acausal component.
We propose two types of maps for the gMSD. The first type if based on a polynomial representation as described in the following proposition.
Proposition 5.1 Consider a component represented as a network of gMSD where the behavior of the masses, springs and dampers are given by: respectively, where the scalars m i , d i and c i are nonnegative, and n is the polynomial order. Then the component is dissipative.
An alternative definition for the gMSD is given in the following proposition.
Proposition 5.2 Consider a component represented as a network of gMSD where the behavior of the masses, springs and dampers are given by: respectively, where m(·, ·, ·), k(·) and d(·, ·) are non-negative scalar functions. Then the component is dissipative.
Note that we have a lot of freedom with respect to how we can model the functions m(·, ·, ·), k(·) and d(·, ·). We can modeled them for example as NNs, where we make sure that the last layer imposes a non-negative output through a "ReLu" layer or by taking the square of the output of the last linear layer. Since the constitutive equations may contain differential equations, we will need to use learning platforms with ODE solving capabilities (e.g., Pytorch (Paszke & et al., 2017), TensorFlow (Abadi & et al., 2015), DAETools (Nikolić, 2016)), if the state derivatives are not measured.

HYBRID RAIL SWITCH MODEL
In this section we introduce several approaches for simplifying the rail switch component model. In addition to model simplification, we will also focus on preserving the physical interpretation of the reduced model, through appropriate choices of constitutive equation maps. We assume that we have access to the variables at the connection point between the adjuster and the rails. In particular, we assume we can directly measure the force F , position x, velocity v and acceleration a. We use the two modeling approaches: Causal approach: we determine a causal relation between the force, position, velocity an acceleration and use a causal map such as a NN to model the relation between them. The resulting component model is still acausal though, with an imposed variable dependence. Acausal approach: we model the rail component as a combination of generalized mass, spring, dampers as defined in Propositions 5.1 and 5.2. We will show that one mass-springdamper component is sufficient. The training data is generated by simulating the high fidelity rail switch model. The input signal is the current applied to the servo-motor, which correlated with a desired velocity profile. Typically, predetermined current trajectories are fed to the servo-motor to generate the rail motion. In our case, we will use random inputs to push the rail. We will record the force, position, velocity and acceleration trajectories and use them as training data. Each time series corresponds to a time interval of 100 sec, sampled at 0.1 sec. When appropriate, we use one time series for training or several of them.

Causal modeling
In this approach, we assign causality relations to the variables at the connection point between the adjuster and the rails. Since the serve-motor tracks a pre-specified speed pattern, our intuition should tell us that the position and velocity of the rails are set by the motor. This intuition is confirmed by a causal analysis performed by looking at the block lower triangular (BLT) transformation (Casella, 2011) that depicts the causal relations between the system variables 2 . Hence, we model the rail behavior by using a causal map F = g(u; w), where g : R 3 → R is a map described by a NN with one hidden layer g(u) where, the input u = [x,ẋ,ẍ] is a vector containing the position, speed and acceleration, the output F is the force, and } is the set of parameters of the map g. We have employed a two steps training process. In the first step we train the parameters of the map in isolation, considering the map g only. We used 15 time series containing trajectories of the force, position, speed and acceleration. We used the Keras (Chollet & et al., 2015) deep-learning training platform, proceeded by splitting the data into training (70%) and test (30%) data sets. We chose the hidden layer dimension to be 50, and trained the NN parameters using a decaying learning rate. The validation results are shown in Figure 5, where we depict the true vs. predicted output samples using as input the test data set. The MSE for the validation data is M SE test = 415.46. Although it may appear a large value, it must be interpreted relative to the values of the force used in training and validation, since the training data was not normalized to maintain the physical interpretation. We used the weights of the Keras model to implement a Modelica component with one port and the constitutive equation given by , where u = [x,ẋ,ẍ]. Next, we executed a fine tuning of the com-2 The BLT transformation is too large to be included in a plot. Figure 5. Validation of the learned model ponent parameters by performing a parameter learning step using the entire rail switch model. This way, the rest of the model equations are considered, adding an additional regularization effect. We chose a gradient-free optimization algorithm, namely the Powell algorithm, to avoid using gradient approximations. The Modelica rail switch model was converted into a functional mockup unit (FMU) (Blochwitz & et al., 2011), and integrated with the Powell algorithm in Python. Although gradient free algorithms are typically slow for a large number of variables, we did not have to run the algorithm for a large number of iterations since we used the Keras solution as initial parameter values. The result of this additional step was a 20% improvement of the loss function applied to the test data. The newly learned Modelica component has 8 equations.

Acausal modeling
We showed in the previous section how we can use causal maps inside acausal components. The advantage of the causal representation is that we can use main stream deep learning platforms to learn the parameters of the causal map. There is a significant disadvantage though: it is not clear if the obtained component is reusable. By reusability we understand the ability to use the component in different configurations and still behaving as expected. From a numerical perspective view, this means that we should be able to compute the acceleration when the force becomes the input (position and speed are state variables and considered known from the previous simulation step). The acausal modeling approach guarantees this. Using the observation that the rail opposes motion, we modeled the rail as a combination of a generalized mass-spring damper in a parallel connection. We use two types of gMSD models: polynomial and NN. We considered a linear mass model: F m = mẍ for both cases. In the polynomial case, we considered the following models for the spring and damper, respectively: The set of parameters we have to learn is w = {m, c 0 , c 1 , c 2 , d 0 , d 1 , d 2 , x f ix }. Unlike to previous section, we considered as input the force, and as outputs the position and velocity. The model parameters are the solution of the following constrained optimization problem: subject to: where t i are time samples of the time series. We chose to include only the position and velocity since as states they are automatically generated during the simulation process. The optimization problem used one time series only and used a nonlinear least square algorithm. We relied on the DAETools (Nikolić, 2016) Python package to implement the optimization algorithm since it provides access to the gradients of the cost function, hence gradient approximations are not needed. In particular we used the IDAS solver with default parameters that is endowed with sensitivity analysis. The resulting optimal parameters are as follows: c * 0 = 6.5 × 10 3 , c * 1 = 0.45, c * 2 = 4.15 × 10 4 , d * 0 = 5.96 × 10 2 , d * 1 = 0, d * 2 = 0, m * = 1.5 × 10 2 , s * 0 = 1.077. We repeated the learning process when the acausal rail model is represented using NN representations. In particular we chose as models for the spring and damper F c = c(x,ẋ) 2 (x − x f ix ) and F d = d(x,ẋ) 2ẋ , respectively, where c(x,ẋ) and d(x,ẋ) are modeled as a NN with one hidden layer of size 15 and tanh as activation function. Using the DAETool, we solved the following optimization problem: subject to: With the neural network representation we are able to recover a more detailed behavior for the speed. The number of equations of the rail component under the acausal polynomial and NN representations are 7 and 11, respectively.We validated the learned models by integrating them within the overall rail switch model. We generated 25 time series with random inputs for the servo-motor used for the four rail switch models: the high fidelity one, and three low fidelity corresponding to the causal NN, acausal polynomial and acausal NN represen-tations, respectively. An example of such time series that corresponds to the force applied to the rail is shown in Figure  6. In the case of the force, the acausal representations have roughly the same statistics, while in the causal case, the MSE has both the variances and mean comparable, but slightly smaller. This again should not be a surprise since the rail causal model is tailored for our scenario. In other words the model may be overfitted. In a different usage scenario, the casual representation may not even simulate. Hence we have a trade-off between accuracy and generalizability.

FAULT DIAGNOSIS
We use the high-fidelity model as the ground truth, and use the fault component and parameters to generate faulty behavior. Note that since the faults are not at the level of the rail component, there is no need to train surrogate models for the rail in each fault mode. For the adjuster bolt misalignment, we consider one at a time, 50 mm and 200 mm to the left and to the right bolt misalignment. Two parameters have been introduced in the adjuster model that allow for bolt misalignment modeling, whose nominal values are zero. In the case of the missing bearing fault, the missing bearing component introduces a viscous type of friction corresponding to a viscous coefficient of d = 5000 N s/m. If necessary we can model other type of friction models, e.g., Coulomb friction. The component responsible for the simulation of an obstacle has two parameters: the fault intensity and the obstacle location. The fault intensity dictates how much opposing force the obstacle generates against the rail motion induced by the motor. We model the opposing force as a localized viscous force. The localization of the force was achieved by allowing the viscous coefficient to be non-zero only in a neighborhood of the obstacle location. For example d can be modeled as x o is the obstacle location, 1(x) is the step function and δ is a small positive scalar. This means that d is non-zero only inside the interval [x o − δ, x o + δ]. The obstacle position is chosen at 10 cm from the left side initial position of the rail. The effects of the fault modes on the motor current and angular velocity for the chosen parameter were shown in Figures  1-4. The objective of the fault diagnosis is to detect which of the four fault modes is present by tracking the parameters of the fault modes. We consider the single fault scenario, that is only one of the four fault modes is active at some time instant. We can use simultaneous parameter tracking (all parameters of the fault model are tracked) or we can run in parallel tracking algorithms that estimate the parameters of one of the four fault modes only. In our case we would have four parallel algorithm. Based of the parameter deviation from their nominal values we declare the presence of a fault mode.

Optimization-based parameter estimation
Filtering based techniques either have constraints on the class of models they can work with (e.g., Kalman like filters), or they require significant computational resources (e.g., particle filter). In this work we estimated the fault parameters for each of the four fault modes using an optimization-based parameter estimation algorithm. The loss function was defined as the mean square error (MSE) between the simulated variables and the "observed" variables (motor current, motor angle and angular velocity). The observed variables were generated using the high-fidelity models and contain simulations over a time horizon of 14 sec containing both switch motions: left to right and right to left. The variables are sampled at 0.1 sec. The optimization algorithm requires loss function evaluations that in turn requires model simulations.
The model simulations were done using Functional Mockup Units (FMU) (Blochwitz & et al., 2011) representations of the Modelica models. We tested the optimization algorithm for the three versions of the reduced complexity models: causal NN, acausal NN and acausal polynomial representations of the rail. We tested the parameter estimation using several optimization algorithm including gradient-based and gradientfree algorithm. The best results were produced by the the differential evolution algorithm and they are presented in what follows. Since such an algorithm requires many loss function evaluations it is imperative for the model simulations to be fast. In average, the acausal polynomial, acausal NN and causal NN representations take 0.3 sec, 0.5 sec and 0.9 sec, respectively over the 14 sec time horizon. For the same time interval, the high fidelity model takes 7 sec. The FMUs were used in Python scripts implementing the parameter estimation algorithms. The model simulations were executed on PC with a 12 cores Intel Xeon 3.5 GHz CPU, with 64 GB of RAM. We recall that the starting position of the rail is 1 m, value dictated by the initial conditions of the motor and the positions of the different reference points in the rail model. We estimated simultaneously all the fault parameters as well.
The results for the three representations of the rail model are shown in Table 4. We obtained reasonable small MSE values, but it is more challenging to distinguish between the faults modes. Recalling that the obstacle was introduced at 1.1 m we can exclude the obstacle fault mode (the fault intensity is irrelevant outside the obstacle position). The parameter corresponding to the missing bearing fault mode has a value in the hundreds for two of the rail representations. Although they may appear not to have a significant impact on the behavior of the rail switch, without some prior information about what is a significant value it is difficult to draw a conclusion about this fault mode. The good news is that the left bolt fault parameter was reasonably well estimated. Although not zero, the right bolt fault parameter values are small enough to eliminate this fault mode as a possible source of faulty behavior.

CONCLUSIONS
We proposed a hybrid modeling approach to simplify a high fidelity model of a rail-switch system. In particular, we used simplified representations for the rail component using machine learning inspired models. The representations preserved the physical interpretation of the rail component. The model complexity of the model abstractions (i.e., number of equations) is reduced by two orders of magnitude. A similar reduction in the order of magnitude is obtained with respect to the simulation time of the rail switch model over a full motion cycle of the rail. The new model abstractions were used for the rail fault diagnosis. The rail switch model was augmented with additional behavior to include parameterized fault modes. An optimization based approach was used to estimate the fault parameters. We demonstrated that using algorithms that track separately the fault parameters of each of the four fault modes produce accurate diagnosis results. The MSEs and the parameter values are used by the diagnosis engine to produce a diagnosis solution.