Anomaly Detection of Servomotors Subject to Highly Accelerated Limit Testing

Companies utilize highly accelerated limit testing (HALT) to ensure efficient product development by accelerating loading conditions in the qualification process. The aim of qualitative accelerated testing such as HALT is to effectively and clearly identify early behavioral anomalies. To this end, this study utilizes machine learning techniques for detecting anomalies in servomotors subjected to HALT. Anomaly scores for each servomotor were calculated using the k-nearest neighbor algorithm, and the relationships among the components were assessed using a Gaussian graphical model. The result implies that the relationship between the components aids in the early detection of anomalies; machine learning successfully identified anomalous behavior before it was observed by inspection. The proposed approach of HALT with the machine learning algorithm supports prognostic health management of servomotors.


INTRODUCTION
In recent years, reliability has become a concerning issue for connected electronic products (Kwon, et al. 2016). The Society 5.0 initiative established in the 5th Science and Technology Basic Plan in Japan (2016-2021) proposes a human-centered society with a fusion system of cyber and physical spaces (Shiroishi, Uchiyama, and Suzuki, 2018). Thus, connected electronic products play an important role in the implementation of Society 5.0. There has been an increase in the number of connected electronic products owing to the expansion of the market for such products, and this has made it more difficult to control their reliability (Kwon, et al. 2016).
Accelerated testing is used to assess product reliability; the objective of accelerated testing is to identify potential design weaknesses, provide information on item dependability, or achieve necessary reliability/availability improvements as defined by IEC 62506 (International Electrotechnical Commission 2013). There are three accelerated test methods depending on the damage model: 1) Qualitative accelerated testing can identify the failure modes of the products. Further, product weaknesses can be addressed in the early stages of the design process for improving design quality. 2) Quantitative accelerated testing employs a cumulative damage-based life estimation model to determine the reliability of the product. 3) Quantitative time and event compressed testing estimates the lifetime of components where wear-out in active use is the major failure mode.
Reliability testing can assist in the verification and validation of product design; several standards (e.g., MIL-STD-810) have been developed for reliability testing. Some failure modes of products are identified during reliability testing; however, unknown failure modes can appear after the product is introduced to the market. Further, reliability testing requires a long time and limits competitive product development. For example, thermal cycling testing for electronic components takes several months because of the number of cycles required; however, some reliability issues remain even if products have been assessed through several reliability tests because the use environment of the product can be different from that assumed by the designer and the product structure can become more sophisticated, which makes it difficult to predict failure.
Highly accelerated limit testing (HALT) is a qualitatively accelerated method used for testing electronic units (Hobbs, 2000). Its basic concept was introduced as the phrase "highly accelerated life test" in 1988 (Munikoti and Dhar, 1998).
Originally, HALT was designed to accelerate a universally accepted standard testing method such as MIL-STD-810. However, stresses introduced by HALT were too high for using the same reliability model as that employed in traditional accelerated life testing methods (Gray and Paschkewitz, 2016). The IEC standard 62506 defines HALT as a "highly accelerated limit test," and its purpose is identifying specific failure modes of units by generating external stresses, which include temperature, vibration/shock, or a combination of vibration/shock and thermal cycling (International Electrotechnical Commission 2013). The HALT process comprises multiple stress steps for extracting operational and destructive limits to various stresses such as temperature and vibration. The operational limit is defined as a "soft" failure when the unit can still operate with the stress removed; the destructive limit, a "hard" failure when the unit requires repair. Functional tests were performed during HALT to determine the operational limit of the system.
HALT has been adopted for many products, and some companies utilize HALT to further improve the design process for their products (Gray and Paschkewitz, 2016; Prakash, 1998). For example, Allied Telesyn established a HALT facility at a New Zealand Research and Development Center; several product faults were identified before the product was introduced into the market (Gray and Paschkewitz, 2016). In addition to hardware faults, software faults such as abnormal light-emitting diode (LED) activity, switch tuning errors, and system crashes were also identified.
The failure modes identified under HALT vary according to different applications. Charki et al. (2011) conducted a statistical analysis of HALT to study the robustness of a product and discussed the recommendations on experimental conditions of HALT. Catelani and Ciania (2014) proposed a customized HALT process for avionics applications. Chen et al. (2013) performed HALT for DC/DC converters; they also considered additional functional stresses to identify the failure modes related to DC/DC converters. To this end, they monitored several key parameters such as output voltage, efficiency, and ripple for identifying the operating limits. Some failure modes can be identified by monitoring the key parameters. Li and Feng (2006) applied HALT to determine the thermal fatigue of solder joints in surface mount technology (SMT) main boards. Several procedures have been proposed to improve HALT to predict the lifetime of products (Aoki et al., 2019). However, unexpected failures occur during the HALT process because multiple environmental stresses are applied to main boards. Li and Feng concluded that HALT can quickly identify the operating limits of products, but it is difficult to develop accelerated models for predicting the lifetime of products with multiple stresses such as HALT. Thus, HALT is limited by its dependence on product characteristics and loading conditions.
A critical issue in the use of HALT is the identification of the failure modes of products, which is achieved based on the functional tests introduced under HALT. When potential failure modes are identified, the criteria of the functional test can be determined based on the physics of failure (Aoki et al., 2019;Pecht, 2009). Multiple stresses induced by HALT are different from those in the field environment, and there remains an uncertainty in detecting the weak point of a system. Sakamoto et al. compared failure modes observed at HALT with the results of the failure mode and effect analysis (FMEA) (Sakamoto, Hirata, and Shibutani, 2018); it was not easy to identify some failure modes using FMEA. Thus, there is a need for another effective approach for identifying the failure modes at HALT.
Anomaly detection is a key concern in functional tests (Pecht, 2009). In conventional reliability tests, such as thermal cycling, monitoring is optimized to predict the failure mode. However, various failure modes can be found during HALT, which makes anomaly detection ambiguous (Sakamoto, Hirata, and Shibutani, 2018). Further, functional test monitoring requires a high sampling rate because high-frequency random vibrations (up to 5000 Hz) are generated by the HALT shaking table. Thus, the analysis of large datasets is required to detect anomalies of the unit, and a data mining technique such as a machine learning algorithm is a critical component (Omar, 2015).
Given this context, our study presents the anomaly detection of a unit under HALT using a machine-learning algorithm. HALT was performed using a robot kit comprising 12 servomotors and sensors. The input/output voltage for each component was monitored during the test, and a machine learning technique was applied to monitor the data and identify the anomaly score of the unit.

EXPERIMENTAL PROCEDURE
In this study, the unit under test is a programmable robot kit (Rapiro, Kiluck) that comprises 12 servomotors, a light-emitting diode (LED) board, a distance-measuring sensor (GP2Y0A21YK), and an Arduino-compatible main board. A functional block diagram of the unit is shown in Figure 1. As shown in the figure, the unit uses six large and six small servomotors. The torques of the large and small servomotors at 4.8 V were 2.5 kgf-cm and 1.5 kgf-cm, respectively. Pulse width modulation (PWM) was used to control the servomotors; a PWM signal was processed using an 8-bit microcontroller (ATmega328P). The power supply was controlled by a three-terminal regulator (LD29150) and a DC/DC converter (OKL-T/6-W12). The regulator was connected to a microcontroller, a distance-measuring sensor, and an LED. The DC/DC converter was used to drive the servomotors. The unit was operated during HALT. In this study, the motion of the servomotors was programmed and controlled using a microcontroller.

Stress Steps Under HALT
HALT is used to identify the failure modes of the unit. The test comprised five stress steps: cold, heat, rapid thermal, vibration, and combined steps (International Electrotechnical Commission 2013). We follow a typical procedure based on a previous study (International Electrotechnical Commission 2013; Charki 2011).
1. The cold step starts at 0 °C, and the temperature decreases by 10 °C until the component exhibits anomalous behavior. The dwell time at each temperature is 10 min. The temperature at which failure occurs is set to the lower limit (LL).
2. In the heat step, the temperature increases with a thermal step increment of 10 °C, and the upper limit (UL) is defined as the temperature of failure.
3. The rapid thermal change test identified thermomechanical and/or functional failures because of rapid thermal changes. The upper and lower temperatures were set at (UL - 10 °C) and (LL + 10 °C), respectively, to avoid failure at the heat/cold steps because this test focused on the effect of temperature change on the unit. If failure modes did not appear, the maximum number of cycles was set to five.
4. The vibration step identified mechanical and/or functional failures attributed to six degrees of freedom (DoF) vibration. The increment in the vibration was 10 Grms, and the duration time was 10 min; the test was continued until failure occurred. The level of vibration for the failure was set to the vibration limit (VL). If no failure occurred, the maximum acceleration of the HALT chamber was set to VL.
5. The combined stress step identified failures attributed to thermal cycling and vibration. The thermal cycling and vibration conditions were determined using LL, UL, and VL.
This study used a Qualmark Typhoon 2.5 HALT chamber. The 700-mm-square shaking table produced 6-DoF vibrations in the unit. The root mean square of the acceleration signal (Grms), normalized to the value of acceleration due to gravity, was controlled during the vibration and combined steps. The frequency cut-off of Grms was 5 kHz. The rapid thermal change was more than 60 °C/min with liquid nitrogen and a nichrome wire heater.
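The step logic of the cold and heat steps, and the way the rapid thermal change range is derived from the two limits, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `is_anomalous` callback (standing in for the functional test during each dwell), the heat-step starting temperature of 30 °C, and the stop values of the profiles are hypothetical and not taken from the study.

```python
def step_stress_limit(start, step, stop, is_anomalous):
    """Walk a step-stress profile and return (limit, levels_visited).

    start, step, stop: stress levels in the unit of the step (e.g. deg C).
    is_anomalous: hypothetical callback standing in for the functional test
        performed during the dwell at each level.
    The limit is the first level at which an anomaly is seen, or None if the
    profile reaches `stop` without one.
    """
    levels = []
    level = start
    # A negative step walks downwards (cold step), a positive one upwards.
    while (step < 0 and level >= stop) or (step > 0 and level <= stop):
        levels.append(level)
        if is_anomalous(level):
            return level, levels
        level += step
    return None, levels


# Illustrative behavior only: pretend the unit misbehaves at or below -70
# and at or above 80 deg C.
ll, cold_levels = step_stress_limit(0, -10, -100, lambda t: t <= -70)
ul, heat_levels = step_stress_limit(30, 10, 120, lambda t: t >= 80)

# The rapid thermal change range backs off 10 deg C from each limit.
rapid_low, rapid_high = ll + 10, ul - 10

# Total dwell time of the cold step at 10 min per level.
cold_dwell_min = 10 * len(cold_levels)
print(ll, ul, rapid_low, rapid_high, cold_dwell_min)
```

With these illustrative inputs, the sketch backs off to a rapid thermal change range of -60 °C to 70 °C, which is the same back-off rule as in step 3.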

Test Conditions
The unit was placed in the center of the vibration table and fixed with two poles and a string, as shown in Figure 2. Figure 3 shows the test conditions of the servomotors. Two control boards were prepared to investigate the failures. One control board was placed in the unit under HALT, and the other, outside the chamber. The servomotors of the unit were classified into groups A, B, and C. In group A, the control board was outside the HALT chamber, and the servomotors (head and R hand) were inside the HALT chamber. The head was set to move right and left at a constant speed, and the R Hand was set to maintain a constant angle. It was possible to observe the change in the servomotor with respect to the experimental conditions under normal usage of the power supply portion by monitoring these power supply voltages.  In group B, the control board was inside the unit in the chamber, and the servomotor was outside the chamber, which corresponds to the R and L shoulder rolls. The R shoulder roll was set to move right and left at a constant speed, and the L shoulder roll was set to maintain a constant angle. At the position where these servomotors were installed in the chamber, a dummy servomotor that was not connected to the power source was installed; the shoulder and body were fixed with an adhesive (CEMEDINE Super X) so that the angle of the arm did not change. The laboratory room temperature was set to 25 °C; by monitoring the power supply voltage of group B, it was possible to observe the change in the power supply part with respect to the normal environmental conditions of the servomotor.
In group C, the control board and eight servomotors were placed inside the unit under HALT. The R shoulder pitch was set to move right and left at a constant speed, and the other servomotors were set to maintain a constant angle. It was possible to observe the change when the power supply part and the servomotor were simultaneously stressed by monitoring the power supply voltage of group C. Table 1 summarizes the experimental conditions for groups A-C. Visual inspection was performed during each step to identify anomalies in the motion of the unit. The voltages of the components were measured for all stress steps. A total of 15 voltage points, covering the 12 servomotors, the distance-measuring sensor, and the LED, as listed in Table 2, were monitored using a data collection system (National Instruments, NI9205). The sampling rate was 8000 samples/s (Sa/s). Further, the temperature was monitored during the thermal, rapid thermal change, and combined stress steps. T-type thermocouples were placed close to the control board and in the HALT chamber. In addition, acceleration was monitored during the vibration stress step; the accelerometers (ICP® accelerometers) were attached to the head, left shoulder, and right foot. The sampling rate of the thermocouples and accelerometers was 1 Sa/s.
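To give a sense of the data volume these sampling settings imply, the following back-of-the-envelope calculation counts the voltage samples collected during a single 10-min dwell. The channel count, sampling rate, and dwell time come from the text; the storage size of 8 bytes per sample is an assumption made only for this estimate.

```python
SAMPLE_RATE = 8000      # Sa/s per voltage channel (from the text)
N_CHANNELS = 15         # monitored voltage points (from the text)
DWELL_S = 10 * 60       # one 10-min dwell, in seconds
BYTES_PER_SAMPLE = 8    # assumption: each value stored as a 64-bit float

samples_per_channel = SAMPLE_RATE * DWELL_S
samples_total = samples_per_channel * N_CHANNELS
megabytes = samples_total * BYTES_PER_SAMPLE / 1e6

print(samples_per_channel)  # 4800000
print(samples_total)        # 72000000
print(megabytes)            # 576.0
```

Under that assumption, a single dwell already produces several hundred megabytes of raw voltage data, and a full stress step contains many dwells.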

Anomaly Detection for Servomotors
The anomaly detection of the unit was performed using a machine learning algorithm (Pecht, 2009; Ide 2015). The unit was operated during HALT, and the functional test monitored all servomotors with a high sampling rate. It is difficult to identify anomalies attributed to the stress steps directly from the raw monitoring data. Machine learning techniques can provide useful information by flagging outliers in the time-series data as anomaly scores.
Anomaly detection based on measured data is an effective approach for the prognostics of the unit. In this study, the k-nearest neighbor (kNN) algorithm was employed to detect the anomaly of each servomotor (He and Wang, 2007; Nesreen et al. 2010; Tian, et al. 2015; Kang, et al. 2016). kNN characterizes the observed data by its distance to a training dataset. k represents the number of nearest neighbors to be considered; the anomaly score of the i-th observed data is defined as (He and Wang, 2007)
$a(x_i) = -\ln k + M \ln \varepsilon_i + B$ (1)
Here, $M$, $\varepsilon_i$, and $B$ represent the dimension of the data, the distance between the observed data and the k-nearest neighbors of the normal dataset, and a constant determined by the characteristics of the dataset, respectively.
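A score of the form $-\ln k + M \ln \varepsilon + B$ can be read as a negative log-density under the standard kNN density estimate. The short derivation below is a sketch under that assumption; the symbols $N$ (number of normal training samples) and $C_M$ (volume of the unit ball in $M$ dimensions) are introduced here for illustration and do not appear in the original text.

```latex
p(x_i) \;\approx\; \frac{k}{N\, C_M\, \varepsilon_i^{M}}
\quad\Longrightarrow\quad
-\ln p(x_i) \;\approx\; -\ln k \;+\; M \ln \varepsilon_i \;+\; \ln\!\left(N\, C_M\right)
```

Under this reading, the constant term plays the role of $B$. For fixed $k$ and $M$, the score increases monotonically with the neighbor distance $\varepsilon_i$, so ranking observations by distance and ranking them by score give the same order.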
A sliding window technique was used to apply kNN to time-series data. Assume a time-series dataset $D = \{x_1, x_2, \ldots, x_T\}$ with a length of $T$. A subsequence of the time-series data with a length of $w$ was defined using a sliding window technique:
$s_t = (x_t, x_{t+1}, \ldots, x_{t+w-1}), \quad t = 1, \ldots, T - w + 1$ (2)
The training dataset $D_{tr}$ and test dataset $D$ were prepared for the kNN; $D_{tr}$ did not contain any anomaly data. For $D_{tr}$ and $D$, a subsequence time-series dataset was produced as shown in Eq. 2. For each subsequence $s_t$ of the test dataset $D$, the k-nearest subsequences were searched in the training dataset $D_{tr}$. The anomaly score was obtained from the Euclidean distance between $s_t$ and the k-nearest subsequences.
Further, anomaly detection was carried out using R, which is a free software environment for statistical computing and graphics. The package "rflann," which provides the R interface to the Fast Library for Approximate Nearest Neighbors, was used (Muja, Lowe, and Yee, 2017). The package "ff" was also used to manage large data with fast access (Adler, et al. 2008).
The data for the initial 4 s of the monitoring data were used as the training data for each servomotor. The variation in the initial anomaly scores was negligible. In this study, a simple setting was used: the sliding window length was set to 10, and k = 1. These parameters depend on the characteristics of the dataset and the computational performance. The score of each servomotor was calculated for each stress step.
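The sliding-window kNN procedure with a window length of 10 and k = 1 can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it uses a brute-force exact search in plain Python instead of the approximate search provided by "rflann", it reports the raw neighbor distance as the score, and the synthetic signal, the injected offset, and all function names are hypothetical.

```python
import math


def subsequences(series, w):
    """All length-w sliding windows of a 1-D series."""
    return [series[t:t + w] for t in range(len(series) - w + 1)]


def knn_distance(query, train_windows, k=1):
    """Euclidean distance from `query` to its k-th nearest training window."""
    dists = sorted(math.dist(query, tw) for tw in train_windows)
    return dists[k - 1]


def anomaly_scores(train, test, w=10, k=1):
    """Distance-based score for every sliding window of `test`."""
    train_windows = subsequences(train, w)
    return [knn_distance(q, train_windows, k) for q in subsequences(test, w)]


# Synthetic stand-in for one voltage channel: a clean periodic signal.
signal = [math.sin(2 * math.pi * t / 20) for t in range(300)]
# Inject an artificial offset late in the record to act as the "anomaly".
for t in range(260, 270):
    signal[t] += 1.5

train = signal[:200]   # initial, anomaly-free part used as training data
test = signal[200:]    # remainder of the record is scored

scores = anomaly_scores(train, test, w=10, k=1)
peak = max(range(len(scores)), key=scores.__getitem__)
print(len(scores), peak, round(scores[peak], 3))
```

Windows that contain only the clean signal have a near-zero distance to the training data, whereas windows overlapping the injected offset stand out. A brute-force search like this scales poorly with the number of windows, which is one reason an approximate nearest-neighbor library is attractive at a sampling rate of 8000 Sa/s.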

Anomaly Detection Between Servomotors
The failure modes of the servomotor could be affected by the components of the servomotor and other servomotors. The Gaussian graphical model (GGM) was used to characterize the relationships among the servomotors. The measurement points summarized in Table 2 are represented as nodes, and the relationships among the measurement points are represented by the precision matrix. We used a graphical lasso estimator to obtain the precision matrix. The precision matrix $\Lambda$ can be obtained by maximizing $\log \det \Lambda - \mathrm{tr}(S\Lambda) - \rho \|\Lambda\|_1$, where $S$ represents the sample covariance matrix and $\rho$ represents the regularization parameter. Further, the R package "glasso" was used for implementing the graphical lasso.
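Once a sparse precision matrix has been estimated, the graph follows directly from it: a zero off-diagonal entry means that the two measurement points are conditionally independent given the others, so no edge is drawn, and the strength of an existing edge can be expressed as a partial correlation. The sketch below illustrates only this reading step, not the graphical lasso estimation itself; the 3-node matrix and the function names are hypothetical.

```python
import math


def graph_edges(precision, tol=1e-8):
    """Node pairs (i, j), i < j, whose precision-matrix entry is non-zero."""
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(precision[i][j]) > tol]


def partial_correlation(precision, i, j):
    """Partial correlation of nodes i and j given all remaining nodes."""
    return -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])


# Hypothetical precision matrix: nodes 0 and 1 are linked, node 2 is isolated.
precision = [
    [2.0, -0.8, 0.0],
    [-0.8, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]

print(graph_edges(precision))                # [(0, 1)]
print(partial_correlation(precision, 0, 1))  # 0.4
```

A larger regularization parameter pushes more entries of the estimated precision matrix to exactly zero and therefore yields a sparser graph.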
The anomaly score for the GGM is defined as the change in the structure of the graph. The precision matrix changed when the servomotors exhibited anomalous behaviors. Changes in the precision matrix were assessed as anomaly scores by the Kullback-Leibler (KL) divergence, which gives the anomaly score of the i-th observed data. Anomaly detection was conducted in the same manner as kNN in Section 2.3.
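As an illustration of how a change in the precision matrix translates into a KL divergence, the sketch below evaluates the closed-form KL divergence between two zero-mean Gaussians specified by their precision matrices. It is a simplified, whole-graph variant restricted to two variables and is not the per-variable score used in the study; the matrices and function names are hypothetical.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]


def kl_gaussian_2x2(prec_a, prec_b):
    """KL( N(0, inv(prec_a)) || N(0, inv(prec_b)) ) for 2x2 precision matrices."""
    cov_a = inv2(prec_a)
    # trace(prec_b @ cov_a)
    trace = sum(prec_b[i][j] * cov_a[j][i] for i in range(2) for j in range(2))
    return 0.5 * (trace - 2 + math.log(det2(prec_a)) - math.log(det2(prec_b)))


# Hypothetical reference graph: the two nodes are coupled.
prec_ref = [[2.0, -0.8], [-0.8, 2.0]]
# Hypothetical test graph: the coupling has vanished.
prec_test = [[2.0, 0.0], [0.0, 2.0]]

print(kl_gaussian_2x2(prec_ref, prec_ref))   # no structural change: ~0
print(kl_gaussian_2x2(prec_ref, prec_test))  # edge lost: positive
```

The divergence is zero when the two models coincide and grows as the coupling between the nodes changes, which is the property that makes it usable as a score for structural change.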

Cold Stress Step
Servomotors in the chamber (groups A and C) showed anomalies during the cold stress step. At -70 °C, the initial positions of their shafts were observed to shift in the direction of +15°. The output voltages for the servomotors of groups B and C were unstable at -70 °C. Further, the servomotor of the L Shoulder Pitch was unstable below -10 °C; the lower limit was set to -70 °C. When the chamber returned to room temperature, groups A and B worked normally, which implies that the failure was a soft failure, and the lower limit was set as the lower operating limit.

Heat Stress Step
The behavior of the servomotors during the heat stress step was similar to that during the cold stress step. The shafts of the servomotors in groups A and C began to shift at temperatures above 40 °C, and the initial position of the shaft shifted in the direction of +9° at 80 °C. The heat stress step test was completed at 80 °C because of the heat resistance of the exterior resin of the unit. The servomotors worked normally when the chamber returned to 30 °C. The upper limit was set to 80 °C as the upper limit of operation.

Rapid Thermal Change Step
The rapid thermal change step was performed between -60 °C and 70 °C. The lower and upper temperatures were determined from the lower and upper limits presented in Sections 3.1 and 3.2. Similar behaviors to the cold and heat stress steps were observed, and the visual inspection did not reveal anomalous behavior.

Vibration Stress Step
The visual inspection revealed no functional failure during the vibration stress step. The unit was not fixed directly to the shaking table because the functions of the unit include walking. Consequently, the 6-DoF vibration did not act effectively as a stress on the unit. The vibration stress step was completed at 60 Grms, which is the limit of the HALT system used. The vibration limit (VL) was set to 60 Grms.

Combined Stress Step
The combined stress step was performed with five thermal cycles between -60 °C and 70 °C. The maximum vibration level was set to 60 Grms, and the increase in each step was set to 12 Grms. The shafts of the servomotors in groups A and C shifted as observed during the cold and heat stress steps. However, no functional failure occurred until the five combined cycles were completed.

Anomaly Scores of Each Servomotor by kNN
Anomaly scores under the cold stress step using the kNN algorithm are shown in Figure 4. The anomaly score was smoothed using the moving-average technique; the interval of the moving average was set to 1000. The anomaly scores increased with decreasing temperature. In all cases, the score increased drastically after -70 °C. The score of the R Hand (group A) was lower than those of the L Shoulder Roll (group B) and L Shoulder Pitch (group C). The score of the L Shoulder Pitch increased to over 0.02 at -70 °C, and these values were greater than those of groups A and B. Group A servomotors were controlled by a controller outside the HALT chamber, and group B servomotors were outside the HALT chamber, as shown in Figure 3. This implies that both the controllers and servomotors were stressed by decreasing temperature. The value of the anomaly score depended not only on the servomotor but also on the control board.
Anomaly scores under the heat stress step are illustrated in Figure 5. Anomaly scores increased with increasing temperature; the score of the R Hand (group A) was relatively low compared with those of the servomotors in groups B and C. The score of the L Shoulder Roll drastically increased at 70 and 80 °C. The peak value increased to over 0.1 and was higher than that of the L Shoulder Pitch (group C). The group B servomotors were outside the chamber. This implies that the interaction between the servomotor and controller is significant and cannot be overlooked. The score under the heat stress step was less than that under the cold stress step. Thus, the cold stress step was the more severe condition for the servomotors and the control board.
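The moving-average smoothing applied to the anomaly scores can be sketched as follows. The text states only the interval (1000 samples); whether the average was trailing or centered is not stated, so the trailing form below, along with the function name and the short example input, is an assumption made for illustration.

```python
def moving_average(values, window):
    """Trailing moving average.

    Until `window` samples are available, the average is taken over the
    samples seen so far, so the output has the same length as the input.
    """
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the sample that has just left the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


print(moving_average([1, 2, 3, 4, 5], 2))  # [1.0, 1.5, 2.5, 3.5, 4.5]
```

At 8000 Sa/s, an interval of 1000 samples corresponds to 0.125 s of data, so the smoothing suppresses sample-to-sample fluctuations while leaving the trend over a 10-min dwell intact.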
Anomaly scores during the rapid thermal change step are shown in Figure 6; the scores increase at lower temperatures. A drastic increase is observed in the scores when the temperature changes from lower to upper, as shown in Figure 6(a). The score gradually increased at the upper temperature. Further, the score decreased when the temperature was changed from upper to lower. The score of the R Hand (group A) was higher than those of groups B and C when the temperature changed from the lower to the upper limit. The gaps in the data observed during the rapid thermal change step were attributed to problems in the data collection system. These gaps were not defined as an anomaly event due to HALT because the visual inspection did not indicate an anomaly in the motion of the servomotors.
Anomaly scores under the vibration stress step are shown in Figure 7. The score was relatively low compared to the scores for cold, heat, and rapid thermal change steps. The anomaly scores under the combined stress step are shown in Figure 8. The trends of these scores are similar to those under the rapid thermal change step, as shown in Figure 6.
When servomotors were inside the HALT chamber, shaft deviation was observed at the upper and lower temperatures; shaft deviation depends on the specifications of the servomotor such as the thermal time constant. On comparing groups A and B, when the controller was stressed in the chamber (group B), the anomaly score was higher than that of group A. When the servomotor and the controller were inside the chamber (group C), the anomaly score was similar to that of group B; this implies the anomaly of the controller is dominant at the upper and lower temperatures.
Although the inspection did not find anomalous behavior, the kNN algorithm could identify different failure modes of servomotor drive systems under rapid thermal change steps. Further, the anomaly scores for the rapid thermal change and combined stress steps showed different trends from those for the cold/heat stress steps. The highest score was observed in group A when the temperature changed from lower to upper. Scores of groups B and C depended on the controllers in the HALT chamber because the controllers of groups B and C were subjected to stresses caused by HALT.
Training data for kNN were obtained from the initial data at the beginning of each step. An increase in the score depended on the upper and lower temperatures. The increase in the score of group A depended on the servomotor; the score increased with a rapid change in temperature. Thus, a possible cause of anomalous behavior during rapid temperature changes depends on thermo-mechanical stresses. When temperature changes, mechanical stresses are generated because of the deformation mismatch between the materials. A precise machine component such as a servomotor can be affected by rapid thermal changes.
The vibration stress step did not identify a failure mode because the unit was not directly fixed to the shaking table. Accelerations in the unit were lower than the acceleration of the shaking table. Thus, the anomaly score by kNN under vibration stress did not show a significant increase.

Anomaly Behavior Between Servomotors by Gaussian Graphical Model
As shown in the previous subsection 4.1, the interaction between the servomotors and controller cannot be overlooked. Gaussian graphical models were constructed using the 15 measurement points summarized in Table 2. Figure 9 shows the Gaussian graphical models observed under the cold stress step. The regularization parameter was set to 0.75. At the beginning of the cold step, as shown in Figure 9(a), there are two clusters: servomotors (X1-X12) and sensors (X13-X15). Figure 9(b) shows the graphical model at -70 °C of the cold stress step, where a soft failure is observed as mentioned in Section 3. The two servomotors of the head (X1) and right hand (X4) in group A are connected to each other and separated from groups B and C. The change in the topology of the graph implies that the interaction between components also changed.
The change in the topology of the graph was assessed using the Kullback-Leibler divergence. The anomaly score obtained by KL is plotted in Figure 10. The score of the L Shoulder Pitch (group C) increased at -50 °C. The score by kNN in Figure 4 increased gradually with a change in temperature, and it reached a peak value at -70 °C. The KL score thus indicated anomalous behavior earlier than the kNN score. The score of the R Hand (group A) was not sensitive to anomalous behaviors. The score of the L Shoulder Roll (group B) dropped at -60 °C and increased again at -70 °C. The KL score accumulates changes in the correlation between the variable of interest in the GGM and all variables that have a direct correlation with it. Therefore, the greater the number of variables that are directly correlated with the variable of interest, the greater the cumulative change and the greater the degree of anomaly. The KL score was sensitive to the change in the interaction between servomotors, and it could provide earlier warnings before the observed failure.

CONCLUSIONS
Voltages measured from electronic products subjected to HALT were analyzed using a kNN machine learning algorithm to detect not only anomalous behavior, but also the onset of anomalies. An anomaly score (metric) based on the time-series training data for each servomotor was calculated, and subsequence time-series data were defined using a sliding window technique and compared with the training data. This approach was verified through voltage monitoring and inspection.
The anomaly score can be used as the precursor of the failure modes for both the controller and the servomotor. In this study, servomotors were classified into three groups to analyze the failed components of the system. When the control circuit board was inside the HALT chamber (groups B and C), the anomaly score increased to over 0.2 and 0.1 during the cold and heat steps, respectively. However, when the control circuit board was outside the HALT chamber, anomalous behavior attributed to the servomotors was observed during the rapid thermal change and combined stress steps. These observations show that the anomalous behavior of the servomotors can be classified in part by the degree of thermal change. When both components were in the HALT chamber (group C), the trend of the anomaly score was like that of group B, wherein the control circuit board was subjected to HALT. An anomaly score of group C was higher than that of group B, which suggests that the anomaly score of group C included the anomalous behavior of both the servomotor and the control board.
In this study, the operating limit was defined by inspection (e.g., the deviation from the initial position of the shaft); however, the anomaly score increased before the operating limit. For example, although the cold stress step stopped at -70 °C based on the inspection of the voltage fluctuation of the servomotor and the physical observation of the shaft, the anomaly score increased to over 0.1, starting at -60 °C. Thus, the kNN algorithm indicated anomalous behavior prior to the physical observation of an anomaly.
Finally, the anomalous behavior between the components was assessed based on the change in the graph using a Gaussian graphical model. When a soft failure was observed at the operating limit of the cold stress step, the Gaussian graph indicated that the components for which the control board was outside the HALT chamber (group A) were isolated.
The changes in the Gaussian graph were assessed as anomaly scores using KL divergence. The KL score increased to over 0.2 at -50 °C, which was earlier than that observed by the kNN algorithm at -60 °C. This implies that the relationship between the components aids in the early detection of anomalies in servomotors.