Integration of Nonlinear Dynamics and Machine learning for Diagnostics of a Single-Stage Gear Box

The current study concerns diagnostics of a one-stage gearbox based on the integration of physics and machine learning. A physics-based model of this system is developed, and a nonlinear dynamic analysis is then performed. The accuracy of the model is validated by comparing fundamental phenomena observed in synthetic and experimental data. To address the diagnostics problem, synthetic data are generated for faulty and healthy conditions. Further, physics-informed features are extracted from the phase space of the dynamic system. It is shown that these features are highly informative about the health condition of the system. Their advantages over purely statistical features are also demonstrated by a feature ranking technique. Subsequently, they are used as inputs to a machine learning model that is developed and optimized for fault diagnostics. The performance of the proposed method is investigated from different aspects, e.g., the accuracy of fault classification, robustness to noise, and generalization to unseen scenarios.


INTRODUCTION
The competitiveness of the majority of engineering applications in industry is significantly influenced by cost-effective maintenance and operational safety. Major safety and cost repercussions can result from unexpected downtime and maintenance. Industrial systems that are highly developed and complicated require extremely expensive and sophisticated maintenance methods. For instance, in 2001, American factories spent more than $1.2 trillion on maintenance, of which up to half was lost due to inefficient maintenance (Heng, Zhang, Tan, & Mathew, 2009). Therefore, it is imperative to continually develop and improve existing maintenance algorithms to guarantee secure and effective day-to-day operations.
Ahmad Al Qawasmi, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Rotating machinery is one of the most crucial elements in industry. Since the sub-systems in rotating machines interact with each other in a nonlinear fashion, changes in any of these sub-systems can significantly affect the overall performance. The key parts of the majority of rotating machinery are bearings, gears, and shafts, and most failures and breakdowns are caused by flaws in these components. Fault diagnosis of practical rotating machinery is therefore a very significant issue that requires a comprehensive solution in order to achieve significant improvements in reliability and safety. Although there is a large and diverse literature, many of the diagnostic approaches in use are extremely ad hoc and heuristic, which prevents them from having a broad range of application. They are also ineffective for interdisciplinary complex systems, which are becoming more and more common in current technology.

Diagnostics and Prognostics Methodology
This section provides a brief overview of maintenance strategies and explains the rationale behind incorporating diagnostics and prognostics into maintenance. Additionally, it reviews cutting-edge techniques for fault diagnosis.
The main objectives of maintenance can be summarized as follows.
(1) Increasing revenue by optimizing machine efficiency and operating time.
(2) Cutting expenses by avoiding unnecessary maintenance and downtime as much as possible.
(3) Increasing safety by lowering the possibility of undiscovered defects.
The three categories of maintenance procedures in practice are breakdown maintenance, time-based preventative maintenance, and condition-based maintenance (CBM) (Bond, 2011).
The principle of breakdown maintenance, also known as reactive or corrective maintenance, is to take action as soon as a failure is noticed. Breakdown maintenance is the conventional maintenance approach and is employed by 56% of manufacturers. This method provides the maximum operational period between machine shutdowns, but because of unforeseen malfunctions, there is a considerable possibility of an emergency shutdown. The breakdown method is appropriate for low-cost machine applications that have minimal impact on production or worker safety. Time-based preventative maintenance involves regular inspection and maintenance regardless of the state of the system's health. In the U.S., 78% of manufacturers use this approach. The selection of a maintenance interval that offers a low failure probability between planned maintenance services is vital to the success of the preventative maintenance approach. An estimated 30% of preventative maintenance tasks are unnecessary, so this approach can waste maintenance effort. Furthermore, catastrophic breakdowns are not completely avoided by preventative maintenance.
These drawbacks have motivated the development of condition-based maintenance. The foundation of the CBM approach, sometimes referred to as predictive maintenance, is the execution of online analyses of the present state of the machine without interfering with regular machine operation. The CBM technique increases system dependability, reduces the likelihood of system failure by up to 70%, lowers maintenance expenses by 25%, and cuts down on the number of maintenance procedures by 50%, all of which diminish the impact of human error (Metrics for Assessing Maintenance Effectiveness, 2003).
Because of its financial benefits, CBM has attracted a lot of interest, as seen by the fact that 43% of US manufacturers utilize it. The most cost-effective technique, according to the Electric Power Research Institute (EPRI), is predictive maintenance or CBM ($9.00 per horsepower (hp) per year) (Metrics for Assessing Maintenance Effectiveness, 2003).

Condition-Based Maintenance
The primary components of the CBM process are shown in Figure 1. For a given system, sensors gather numerous signals without interfering with regular machine operation. However, these raw signals cannot be used directly to extract robust and useful information about the structure's health. They are therefore subjected to a feature extraction procedure in order to provide reliable and practical system knowledge. The ability to extract an effective collection of features that characterize the system response is essential for a reliable and accurate forecast of the system health state. These features should be informative and non-redundant, and should provide as much information as possible about the intrinsic dynamics of the system. Finally, a decision is made regarding the system's maintenance through diagnostics and prognostics based on the information that has been extracted.
A key part of the CBM method is condition monitoring, which is focused on tracking a system's present state and forecasting its future state while the system is in operation. The process of condition monitoring can be divided into three phases: fault detection, fault diagnostics, and fault prognostics. In condition monitoring applications, fault refers to any unexpected change in the dynamics of the system. Figure 2 demonstrates the condition monitoring phases and their connection with maintenance. Prognostics and diagnostics help decision-makers understand the state of the system. The benefit of condition monitoring is its exceptional capacity to treat issues before they escalate into significant failures, which would reduce the machine's typical lifespan.
Physics-based methods rely on the system's mathematical model for knowledge. The model may be parameterized and expanded to account for a variety of situations, including faults, various loading conditions or speeds, and regions of the system's response that have not yet been observed. The fundamental disadvantage of this strategy, however, is the difficulty of creating precise models for some complicated systems. Furthermore, since fully representative models cannot be created, models typically lack specificity. The biggest problem for diagnostic procedures is the requirement for effective methods that function most of the time; current engineering systems would not be adequately safe if they were placed into operation with ad hoc diagnostics that could capture only some of the infrequent failures. For these reasons, diagnostics remains a challenging problem.

Gear Diagnostics
Vibration and acoustic techniques are a primary emphasis in current fault diagnostics of gears because of the valuable insights they offer about the state of rotating equipment (Bajric, Zuber, & Isic, 2013). Due to the extremely nonlinear nature of faults and the intricate nonstationary dynamics, diagnosing gear problems is still a difficult task (Jardine, Lin, & Banjevic, 2006). Bifurcation and limit cycles, as well as multi-periodic, quasi-periodic, and chaotic responses, are a few phenomena that may be seen in nonlinear dynamic systems and are reported in industrial equipment (Jiang, Zhu, Li, & Peng, 2016).
Various techniques have been developed to study gear fault detection and diagnostics (Mohammed & Rantatalo, 2016; Yuan, He, & Zi, 2010; Li, Zhang, & Wu, 2017), and these can be time domain, frequency domain, or time-frequency domain methods. However, these methods do not always guarantee stable classification and/or lack generalizability for systems with complex nonlinear responses. For example, time domain techniques such as Time Synchronous Averaging (TSA) are ineffective for several kinds of gear faults, especially in the case of multiple simultaneous faults in different gears or in the early phase of faults (Kwuimy, Kankar, Chen, Chaudhry, & Nataraj, 2015). In addition, the TSA technique can be time-consuming and is often computationally expensive (Vachtsevanos, Lewis, Hess, & Wu, 2006). Although sideband frequency analysis (a frequency domain technique) detects faults in the gearbox, it falls short of isolating gear faults, as the faults may be located in other components of the gearbox (Peng, Yu, & Luo, 2011). The problem is that most conventional analysis and feature extraction techniques for gear diagnostics are linear because the engineering systems were originally designed to perform sufficiently in a linear regime (Abarbanel, 1996). However, the nature of these systems is unavoidably nonlinear, and nonlinearity creates additional complications; this has indeed been the focus of our work for many years (T. Mohamad, Nazari, & Nataraj, 2020; T. H. Mohamad, Chen, Chaudhry, & Nataraj, 2018; Samadani, Kwuimy, & Nataraj, 2013; Kwuimy & Nataraj, 2012). This is particularly true in applications such as machinery diagnostics; therefore, there is a need to extract robust features that are able to characterize the dynamics observed in a time series.
Most engineering systems operate in nonlinear regimes, and real systems exhibit many phenomena that can only be predicted by nonlinear models. Failure is certainly a nonlinear phenomenon: an estimated 89% of failure patterns are random (Aeronautics & Space, n.d.). This means that an overwhelming number of systems are not at risk of age-related failure and preventive maintenance is ineffective (Aeronautics & Space, n.d.).
Our overall objective is to provide reliable feature extraction techniques that can effectively capture the critical system dynamics caused by faults for a range of industrial applications. These techniques are suitable for intricate interdisciplinary systems and have broad applicability.
In this paper, we develop a hybrid diagnostic algorithm for dynamic systems using a combination of nonlinear dynamic analysis, physics, statistics, and artificial intelligence techniques. This nonlinear diagnostic technique is demonstrated for gear crack detection of an involute spur gear. The objective of this work is to utilize nonlinear characteristics in an artificial intelligence setting to detect cracks in gears using the Phase Space Topology method (PST).

MATHEMATICAL MODEL
In this section, the system response of a gearbox is numerically simulated with healthy and faulty gears. The one-stage mass-spring-damper 6-DOF model with involute spur gear tooth profile, shown in Figure 3, was adapted from Bartelmus (2001). The system is powered by an electric motor with torque $M_1$ and loaded with torque $M_2$. In a pair of meshing gears, the smaller gear, which is connected to the motor, is called the pinion, while the larger gear, which is engaged by the pinion and connected to the load, is called the gear. Bearings, which are attached to the gearbox casing, support the shafts on which the pinion and the gear are mounted.
The model is described by six equations of motion, Eqs. (1)-(6): the translational motion of the pinion and of the gear in the y direction, the rotational motion of the pinion and of the gear about the z axis, and the rotational motion of the motor and of the load about the z axis.
Figure 3. A schematic of the one-stage gearbox model.
All the parameters of the system are listed in Table 1, where $I$ represents mass moment of inertia, $M$ torque, $m$ mass, $k$ stiffness, $c$ damping, $y$ displacement, and $\theta$ angular displacement. The full list of symbols is given in the Appendix. In order to simulate the response of the system, two quantities need to be computed: the mesh stiffness $k_t$ and the mesh damping coefficient $c_t$. The mesh stiffness $k_t$ is discussed in detail in the next section. As is standard practice in the gear literature, the mesh damping coefficient $c_t$ is assumed to be proportional to the mesh stiffness (Xinhao, 2004) and is given by

$$c_t = \mu k_t,$$

where $\mu$ is a scale constant given in Table 1.

Mesh Stiffness Calculation
Gearbox dynamics is governed by the variation of the mesh stiffness in addition to the transition between single- and double-tooth-pair contact. Localized gear defects are usually reflected in geometry changes in the tooth. Consequently, these faults cause changes in the gear mesh stiffness. Therefore, it is important to model the gear mesh stiffness for various health conditions, i.e., defect-free and cracked tooth. Yang and Lin (1987) used the potential energy stored in the meshing system, including Hertzian energy, bending energy, and axial energy, to model the effective mesh stiffness analytically. Tian (Xinhao, 2004) modified the model to include the shear energy as well. The final expressions for calculating the effective mesh stiffness $k_t$ of defect-free gears and of gears with a cracked tooth can be found in (Xinhao, 2004).

Table 1. Main parameters of the gear system (Xinhao, 2004).
Torsional stiffness of the coupling: $k_c = k_p = k_g = 4.4 \times 10^{4}$ N m/rad
Damping coefficient of the coupling: $c_c = c_p = c_g = 5.0 \times 10^{5}$ N m s/rad
Radial stiffness of the bearing: $k_r = 6.56 \times 10^{7}$ N/m
Damping coefficient of the bearing: $c_r = 1.8 \times 10^{5}$ N s/m
Scale constant: $\mu = 3.99 \times 10^{-6}$ s
In order to develop a technique to diagnose a tooth crack, we assume a crack at the root of the pinion with depth $q$ along the tooth width. We consider two cases: (1) $h_c \ge h_r$ and $\alpha_1 > \alpha_g$, as shown in Figure 4(a), with depth $q = 1.3$ mm and angle $v = 45^\circ$; and (2) $h_c \ge h_r$ and $\alpha_1 \le \alpha_g$, or $h_c < h_r$, as shown in Figure 4(b), with depth $q = 3.1$ mm and angle $v = 45^\circ$. For both cases, the Hertzian contact stiffness remains unchanged, since the contact surface has no defect and the width $L$ is constant. The only stiffnesses that change due to the crack are the bending and shear stiffnesses. As the crack depth increases, the total mesh stiffness decreases within the double pair mesh duration. Additionally, the total mesh stiffness in both cracked tooth cases decreases compared to the defect-free condition within the double pair mesh duration.
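The effect of a crack on the mesh stiffness can be illustrated with a small sketch. It assumes the series-spring combination commonly used in potential-energy methods, in which the Hertzian, bending, shear, and axial compliances of a tooth pair add up and tooth pairs in simultaneous contact act in parallel. The function names `pair_stiffness` and `mesh_stiffness`, all numerical values, and the 40% stiffness reduction are hypothetical; the actual component stiffnesses follow from the expressions in (Xinhao, 2004).

```python
def pair_stiffness(k_h, k_b1, k_s1, k_a1, k_b2, k_s2, k_a2):
    """Stiffness of one tooth pair: component springs combined in series."""
    compliance = sum(1.0 / k for k in (k_h, k_b1, k_s1, k_a1, k_b2, k_s2, k_a2))
    return 1.0 / compliance


def mesh_stiffness(pairs):
    """Total mesh stiffness: tooth pairs in contact act in parallel."""
    return sum(pair_stiffness(**p) for p in pairs)


# Hypothetical component stiffnesses in N/m (illustrative values only).
healthy_pair = dict(k_h=5.0e9, k_b1=2.0e9, k_s1=3.0e9, k_a1=8.0e9,
                    k_b2=2.5e9, k_s2=3.5e9, k_a2=9.0e9)

# A crack in the pinion tooth lowers only its bending and shear stiffness;
# the 40 % reduction is an assumed figure for illustration.
cracked_pair = dict(healthy_pair,
                    k_b1=0.6 * healthy_pair["k_b1"],
                    k_s1=0.6 * healthy_pair["k_s1"])

k_healthy = mesh_stiffness([healthy_pair, healthy_pair])  # double-pair contact
k_cracked = mesh_stiffness([cracked_pair, healthy_pair])  # one cracked tooth

print(f"healthy: {k_healthy:.3e} N/m, cracked: {k_cracked:.3e} N/m")
```

Because the Hertzian and axial terms are left untouched, the drop in total stiffness in this sketch comes entirely from the bending and shear terms of the cracked tooth.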

PHASE SPACE TOPOLOGY METHOD
A phase space trajectory can be used to characterize the nature of the system in a qualitative fashion, as is done traditionally in nonlinear dynamics (Eckmann & Ruelle, 1985). Much work has been devoted to extracting information from these topological patterns in order to compare attractors (Carroll, 2015). The Phase Space Topology (PST) family of methods is, however, based on characterizing the phase space trajectories with quantitative measures. The PST family of methods was originated by Samadani et al. (Samadani, Kwuimy, & Nataraj, 2013).

Density-based orthogonal technique
In this section, we employ the density-based orthogonal technique (T. H. Mohamad & Nataraj, 2017a; T. H. Mohamad, 2021) to detect tooth cracks in gears. We solve Eqs. (1)-(4) numerically using the parameters given in Table 1 for the defect-free gear and for the two cases of the cracked gear. The time histories of the pinion velocity for the healthy and faulty cases are compared in Figure 5. With the introduction of the crack, the amplitude increases at several periods across the velocity signal. This is due to the drop in the mesh stiffness. The crack is larger in Case 3, and thus the effect is stronger.
To construct the phase space and to design the machine learning algorithm, the simulated system response is divided into $N$ segments, where $N$ depends on the length of the simulation and the window size. Selecting the window size is a key factor in the success of the classification algorithm. The correlation between a healthy and a faulty window was calculated, and window lengths below 0.045 seconds gave a high correlation, meaning that such short windows make the two conditions hard to distinguish. Thus, a window size of 0.05 seconds was selected to produce the phase space for the healthy and the faulty cases, as shown in Figure 6.
To produce a sufficient number of samples, the model was simulated for a time period of 5 seconds; however, to mimic a real-world environment in which a faulty system should shut down quickly, some classification trials in this study were run for shorter periods of time. A kernel density estimator is used to convert the phase space samples into a form from which informative features are easier to extract.
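The windowing and correlation check described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the sampling rate `fs`, the two sinusoidal test signals, and the helper names `split_into_windows` and `pearson` are assumptions introduced here.

```python
import math


def split_into_windows(signal, fs, window_s):
    """Cut a sampled signal into consecutive, non-overlapping windows."""
    n = round(window_s * fs)                      # samples per window
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length windows."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


fs = 1000                                         # assumed sampling rate, Hz
t = [i / fs for i in range(5 * fs)]               # 5 s of simulated time
# Hypothetical stand-ins for the healthy and faulty pinion velocity.
healthy = [math.sin(2 * math.pi * 40 * ti) for ti in t]
faulty = [h + 0.5 * math.sin(2 * math.pi * 95 * ti) for h, ti in zip(healthy, t)]

healthy_windows = split_into_windows(healthy, fs, 0.05)
faulty_windows = split_into_windows(faulty, fs, 0.05)
r = pearson(healthy_windows[0], faulty_windows[0])
print(len(healthy_windows), len(healthy_windows[0]), round(r, 3))
```

Repeating the correlation for several candidate window lengths gives a simple way to see at which length healthy and faulty windows start to differ.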
Consider $X = (x_1, x_2, \ldots, x_n)$, an independent and identically distributed sample drawn from a distribution with an unknown density function $f$. The shape of the density function is estimated by its kernel density estimator, given by

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),$$

where the hat indicates that it is an estimate and the subscript indicates that its value depends on $h$. Here, $h > 0$ is a smoothing parameter called the bandwidth, and $K(\cdot)$ is the kernel function, which satisfies the following requirements:

$$\int_{-\infty}^{\infty} K(u)\,du = 1, \qquad K(-u) = K(u) \ \text{for all } u.$$
There is a range of kernel functions that can be used, including uniform, triangular, biweight, triweight, Epanechnikov, and normal. Due to its conventional and convenient mathematical properties, we use the normal density function in our approach, defined as

$$K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}.$$

The kernel density estimator's performance depends primarily on the bandwidth parameter $h$. Small values of $h$ cause the density estimate to be undersmoothed, while large values lead to an oversmoothed density estimate. The optimal value for the bandwidth can be calculated using Silverman's rule of thumb (Silverman, 2018) for Gaussian kernel functions as

$$h = \left(\frac{4\hat{\sigma}^5}{3n}\right)^{1/5} \approx 1.06\,\hat{\sigma}\, n^{-1/5},$$

where $\hat{\sigma}$ is the standard deviation of the samples and $n$ is the number of samples. It is important to have the number of samples $n$ as high as possible for estimating the density distribution. However, it should be noted that increasing $n$ also increases the computation cost.
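A Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth can be sketched in a few lines. This is an illustrative one-dimensional example, assuming normally distributed test samples in place of the simulated phase space states; the function names are introduced here and are not taken from the study.

```python
import math
import random
import statistics


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth h = (4 sigma^5 / (3 n))^(1/5) for a Gaussian kernel."""
    sigma = statistics.stdev(samples)
    return (4.0 * sigma ** 5 / (3.0 * len(samples))) ** 0.2


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h):
    """Kernel density estimate: 1/(n h) * sum of K((x - x_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (len(samples) * h)


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(500)]   # hypothetical state samples
h = silverman_bandwidth(samples)
grid = [-5.0 + 0.05 * i for i in range(201)]
density = [kde(x, samples, h) for x in grid]
area = sum(density) * 0.05                             # Riemann-sum check
print(f"h = {h:.3f}, area under estimate = {area:.3f}")
```

The cost of evaluating the estimate grows with the product of the number of samples and the number of grid points, which is the computational trade-off mentioned above.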
In order to preserve as much information as possible, the density-based orthogonal technique can learn the representation of the phase space density to reconstruct it. This is achieved by approximating the phase space density distributions using a series of orthogonal bases where the coefficients of these bases are used as features. When the approximation matches the actual density distribution as shown in Figure 7, the polynomial coefficients arguably retain the most information present in the phase space. The extracted features are called density-based orthogonal features.
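One possible way to obtain such coefficients is to project the density onto Legendre polynomials, which are orthogonal on $[-1, 1]$. The sketch below assumes that the state variable has already been rescaled to $[-1, 1]$ and uses numerical projection with the trapezoidal rule; the study may use a different fitting procedure (for example least squares), and the bump function standing in for the density is hypothetical.

```python
import math


def legendre_values(x, degree):
    """Return [P_0(x), ..., P_degree(x)] using Bonnet's recurrence."""
    p = [1.0, x]
    for k in range(1, degree):
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    return p[:degree + 1]


def legendre_coefficients(f, degree, n_grid=2001):
    """Projection c_k = (2k + 1)/2 * integral of f(x) P_k(x) over [-1, 1]."""
    dx = 2.0 / (n_grid - 1)
    coeffs = [0.0] * (degree + 1)
    for i in range(n_grid):
        x = -1.0 + i * dx
        weight = 0.5 * dx if i in (0, n_grid - 1) else dx   # trapezoidal rule
        fx = f(x)
        for k, pk in enumerate(legendre_values(x, degree)):
            coeffs[k] += weight * fx * pk * (2 * k + 1) / 2.0
    return coeffs


def legendre_series(x, coeffs):
    """Evaluate the truncated Legendre series at x."""
    return sum(c * p for c, p in zip(coeffs, legendre_values(x, len(coeffs) - 1)))


def density(x):
    """Hypothetical smooth density on the rescaled state axis."""
    return math.exp(-4.0 * x * x)


features = legendre_coefficients(density, degree=10)   # candidate feature vector
max_err = max(abs(density(x) - legendre_series(x, features))
              for x in [-1.0 + 0.01 * i for i in range(201)])
print(len(features), f"max reconstruction error = {max_err:.2e}")
```

The coefficient vector `features` plays the role of the density-based orthogonal features; increasing `degree` trades a longer feature vector for a closer reconstruction.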
To approximate the phase space density, let $z$ be a state of the system and $y_d = \hat{f}_h(z)$ be its density computed using the kernel density estimator. $y_d$ is then approximated with Legendre orthogonal polynomials. However, it should be understood that the density estimation of the phase space may also be approximated using other orthogonal polynomials.

In this section, we develop a support vector machine (SVM) (Vapnik, 1998; Kappaganthu, 2010) as a classifier to separate the healthy and faulty involute gears. Eqs. (1)-(6) are numerically solved to obtain the pinion and gear responses using the parameters given in Table 1 for the healthy case and for the faulty cases with crack depths of 1.3 mm and 3.1 mm. The data samples are normalized, shuffled, and divided into a training set (70%) and a testing set (30%). The window size of the samples is optimized by calculating the correlation between healthy and faulty data. After many iterations, we choose a window size of 0.05 seconds with a simulation time of 5 seconds. An SVM with a radial basis function kernel is used to classify healthy and faulty cases. To optimize the feature extraction process, ten-fold cross-validation is utilized. Furthermore, forward feature selection is applied to enhance the SVM performance. Finally, the hyperparameters are optimized to make the SVM more robust.
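The classification steps listed above can be sketched with scikit-learn. This is an illustration under assumptions, not the configuration used in the study: the feature matrix is random synthetic data standing in for the density-based orthogonal features, and the number of selected features and the hyperparameter grid are arbitrary choices.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical feature matrix: only the first two features carry fault information.
rng = np.random.default_rng(0)
n_per_class, n_features = 100, 6
healthy = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
faulty = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
faulty[:, :2] += 3.0
X = np.vstack([healthy, faulty])
y = np.repeat([0, 1], n_per_class)            # 0 = healthy, 1 = faulty

# Shuffle and split into 70 % training and 30 % testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, shuffle=True, stratify=y, random_state=0)


def make_model():
    """Normalization followed by an SVM with a radial basis function kernel."""
    return make_pipeline(StandardScaler(), SVC(kernel="rbf"))


# Forward feature selection scored by ten-fold cross-validation.
selector = SequentialFeatureSelector(
    make_model(), n_features_to_select=2, direction="forward", cv=10)
selector.fit(X_train, y_train)
selected = selector.get_support()

# Hyperparameter search over an assumed grid, again with ten folds.
search = GridSearchCV(
    make_model(),
    param_grid={"svc__C": [0.1, 1.0, 10.0], "svc__gamma": ["scale", 0.1, 1.0]},
    cv=10)
search.fit(X_train[:, selected], y_train)
accuracy = search.score(X_test[:, selected], y_test)
print(f"selected features: {np.flatnonzero(selected)}, test accuracy: {accuracy:.3f}")
```

Placing the scaler inside the pipeline keeps the normalization statistics from leaking between cross-validation folds.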
The algorithm with a window size of 0.05 seconds is applied to the pinion and gear responses to classify the healthy and faulty cases with a crack depth of 1.3 mm and a simulation time of 5 seconds, which results in an accuracy of 99.5%, as shown in Table 2.
After achieving robust results, the algorithm was put under a new test to classify the healthy and faulty components with a  Table 3.
To experiment further, the ratio of healthy to faulty samples is manipulated, from 1:1 to 2:1 and 1:2. However, no significant difference was recorded resulting in similar accuracy.

Testing Under Noisy Signals
After achieving excellent accuracy under variable conditions, the robustness of the classifier in practical conditions is tested by adding white Gaussian noise to the pinion signal for both the healthy case and the faulty case with a crack depth of 3.1 mm, as shown in Figure 8. The signal-to-noise ratio (SNR) is varied from 20 dB down to 18.1 dB. The accuracy starts to drop at an SNR of 18.5 dB, where it is 81%, as shown in Table 4. With a further decrease in SNR to 18.1 dB, the accuracy of the algorithm drops below an acceptable level.
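Adding white Gaussian noise at a prescribed SNR amounts to scaling the noise variance to the signal power. The sketch below assumes that the signal power is taken as the mean square of the clean signal and uses a sinusoid as a hypothetical stand-in for the pinion response; the function names are introduced here for illustration.

```python
import math
import random


def add_white_gaussian_noise(signal, snr_db, rng):
    """Return the signal plus zero-mean white Gaussian noise at the requested SNR (dB)."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """SNR actually realized by one noisy sample of the signal."""
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)
# Hypothetical clean signal: 5 s of a 40 Hz sinusoid sampled at 10 kHz.
clean = [math.sin(2 * math.pi * 40 * i / 10_000) for i in range(50_000)]
for snr_db in (20.0, 18.5, 18.1):
    noisy = add_white_gaussian_noise(clean, snr_db, rng)
    print(snr_db, round(measured_snr_db(clean, noisy), 2))
```

Each noisy signal would then be windowed and passed through the same feature extraction and classifier as the clean data.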

CONCLUSIONS
In this work, we have developed a hybrid classifier to diagnose a one-stage gearbox by integrating nonlinear dynamics and machine learning techniques. A physics-based model of the gear system is used to generate the responses from which features are extracted. The features of the phase space of the gear and pinion responses are obtained by the density-based orthogonal technique. Subsequently, we developed an SVM algorithm to classify healthy and faulty gears and pinions. The faulty cases are considered with crack depths of 1.3 mm and 3.1 mm. We have achieved an accuracy of around 99% in all cases. The same algorithm is used to classify healthy and faulty cases with noisy signals. The algorithm performs well until the SNR is 18.4 dB, after which the accuracy drops considerably. Our future work aims at estimating the size of different cracks using regression techniques.

NOMENCLATURE
…: Vertical radial damping coefficient of the output bearings
$k_g$: Torsional stiffness of the output flexible coupling
$k_p$: Torsional stiffness of the input flexible coupling
$k_t$: Mesh stiffness
$k_{y1}$: Vertical radial stiffness of the input bearings
$k_{y2}$: Vertical radial stiffness of the output bearings
$m_1$: Mass of the pinion
$m_2$: Mass of the gear
$y_1$: Linear displacement of the pinion in the direction vertical to the teeth in mesh (the y direction)
$y_2$: Linear displacement of the gear in the y direction
$y_d$: Density computed using the kernel density estimator
$z$: The state of the system
$\theta_1$: Angular displacement of the pinion
$\theta_2$: Angular displacement of the gear
$\theta_g$: Angular displacement of the load
$\theta_m$: Angular displacement of the motor