Structure Fatigue Crack Length Estimation and Prediction Using Ultrasonic Wave Data Based on Ensemble Linear Regression and Paris’s Law

This paper presents methods for the 2019 PHM Conference Data Challenge developed by the team named "Angler". The Challenge aims to estimate the fatigue crack length of a type of aluminum structure using ultrasonic signals at the current load cycle and to predict the crack length at multiple future load cycles (multiple-step-ahead prediction) as accurately as possible. For estimating crack length, four crack-sensitive features are extracted from ultrasonic signals, namely, the first peak value, root mean square value, logarithm of kurtosis, and correlation coefficient. An ensemble linear regression model is presented to map these features and their second-order interactions to the crack length. The Best Subset Selection method is employed to select the optimal features. For predicting crack length, variations of Paris' Law are derived to describe the relationship between the crack length and the number of load cycles. The material parameters and stress range of Paris' Law are learned using the Genetic Algorithm. These parameters are updated based on the previous-step predicted crack length. After that, the crack length corresponding to a future load cycle number is predicted for either the constant amplitude load case or the variable amplitude load case. The presented methods achieved a score of 16.14 based on the score-calculation rule provided by the Data Challenge committees and were ranked third among all participating teams.


INTRODUCTION
Fatigue cracks are a common type of fault in structural systems. They contribute to about 90% of the failures of metallic structures (Campbell, 2018). If fatigue cracks are not detected and addressed early, they will jeopardize the long-term durability and reliability of structural systems (Wang, He, Guan, Yang, & Zhang, 2018). For example, the Eschede train accident was caused by a fatigue crack in one of the train wheels (Esslinger, Kieselbach, Koller, & Weisse, 2004). Therefore, it is vital to detect, estimate, and predict the crack progression in structural systems.
Ultrasonic signals are sensitive to crack initiation and propagation and are widely employed in structural health monitoring (Qing, Li, Wang, & Sun, 2019). The occurrence of cracks and different crack levels exhibit different ultrasonic signatures. Courtney, Drinkwater, Neild, and Wilcox (2008) found that a fatigue crack increases the nonlinearity of the bispectral responses of ultrasonic signals. They applied this finding to detect cracks in a steel steering actuator bracket. Lim, Sohn, DeSimio, and Brown (2014) isolated the crack-induced spectral sidebands of ultrasonic signals using time-frequency (TF) analysis. The isolated sidebands could be used to detect cracks in an aircraft fitting lug under various temperatures and load conditions. However, the above-mentioned techniques cannot estimate crack length quantitatively. To address this problem, data-driven methods are now widely used. Typically, crack-sensitive features are extracted from ultrasonic signals via signal processing techniques and then mapped to crack length estimates using data-driven models. For example, Liu et al. (2013) extracted a feature named the energy ratio change from ultrasonic signals. This feature was fed into an artificial neural network (ANN) to diagnose the crack length and location of plates. Lim, Sohn, and Kim (2018) also utilized an ANN to quantify the fatigue crack length of aluminum specimens, using as inputs two features extracted from ultrasonic signals, the specimen thickness, and the elapsed fatigue cycles.
Data-driven methods usually face a trade-off between model complexity and data amount. Simpler data-driven models often perform better when the data amount is limited. For example, He et al. (2013) built a linear regression model based on three features extracted from ultrasonic signals to quantify the crack length of riveted lap joints. For the 2019 PHM Conference Data Challenge, for which the data amount is limited, simple linear regression models may perform better than complex models to estimate the crack length.
In the above literature, the current or past crack length is estimated. We wish to further predict future crack length so that proper maintenance strategies can be scheduled in advance. The effect of load cycle number on crack propagation must be known to predict crack lengths for future load cycle numbers. In the last century, the well-known Paris' Law was developed to describe the growth rate of fatigue cracks under specific load conditions (Paris, 1961; Paris & Erdogan, 1963; Pook & Frost, 1973). The Paris' Law model and its variations can predict crack growth under constant or variable amplitude load if the material parameters are known. For constant-amplitude-load scenarios, it is easy to determine the stress intensity factor range in the Paris' Law model. For instance, an extension of Paris' Law considering the Wöhler S-N curve (Cui, 2002) was developed for predicting fatigue crack growth (Pugno, Ciavarella, Cornetti, & Carpinteri, 2006). Rajabipour and Melchers (2015) adopted a variation of Paris' Law for estimating the crack growth rate in metals in hydrogen-assisted fatigue cases. For variable-amplitude-load cases, it is difficult to determine the range of the stress intensity factor. Beretta and Carboni (2011) conducted a fatigue crack growth study for railway axles under variable load based on Paris' Law. However, the variable load was discretized into different load blocks, within each of which the load was still treated as constant. Such discretization may introduce relatively large errors into the predicted crack length (Huang, Torgeir, & Cui, 2008). In the above-mentioned studies, the material parameters in the Paris' Law model were assumed to be known. In practice, however, the material parameters are uncertain and may vary across experimental objects.
Therefore, in this study, an optimization framework based on the Genetic Algorithm (GA) is proposed to obtain the optimal material parameters and equivalent effect of the variable amplitude load, which enables the Paris' Law model to predict crack lengths under variable load scenarios.
The rest of the paper is organized as follows. The tasks of the 2019 PHM Conference Data Challenge are formulated in Section 2. Models for crack length estimation and prediction are detailed in Section 3 and Section 4, respectively. Conclusions are drawn in Section 5.

PROBLEM DESCRIPTION
For the 2019 PHM Conference Data Challenge, a fatigue experiment is conducted to investigate the crack propagation property of aluminum plates. Eight aluminum plate specimens (named T1 to T8) of the same type are used. For each specimen, a cyclic mechanical load is applied to generate fatigue cracks. Among these specimens, T1-T7 are subjected to the same load with a constant amplitude, while T8 is subjected to a different load with a variable amplitude. The frequencies of the constant and variable amplitude loads are both 5 Hz.
During the experiment, for each specimen, ultrasonic signals are collected at several different load cycles. To measure the ultrasonic signals, the mechanical load is paused and then immediately resumed after measurement. Two tests, named Run 1 and Run 2, are conducted under each cycle. A schematic illustration of the sensing mechanism for ultrasonic signals is shown in Figure 1 (an actuator and a receiver form a sensing pair). The actuator sends out a wave (actuated ultrasonic signal), which is measured by the receiver (received ultrasonic signal). When collecting the ultrasonic signals, an optical microscope is intermittently used to identify the location of cracks and to measure crack length. That is, each time ultrasonic signals are collected, real crack lengths are also measured.
The goal of the data challenge is to estimate and predict the crack length at specific cycles. Data (i.e., ultrasonic signals, real crack lengths, and load profiles) from six specimens (T1-T6) among the eight total specimens are used as training datasets. The data from the two remaining specimens (T7 and T8) are used for validation. Only the first few ultrasonic measurements of T7 and T8 are available for training. Two specific tasks are to (a) estimate the crack length at the current cycle and (b) predict the crack length for future cycles.
The major challenges for accomplishing these tasks include:
(a) Limited data amount. The training set only contains 37 (cycles/run) × 2 (runs) = 74 data samples. With such limited data, some powerful but complex models, like deep learning models (Goodfellow, Bengio, & Courville, 2016), may not perform well. To tackle this challenge, expert knowledge of ultrasonic signals and crack propagation mechanisms must be employed to develop proper data-driven methods.
(b) Variable and different load conditions. The load applied to T8 (variable amplitude load) is different from that applied to T1-T7 (constant amplitude load). This means that a model trained with data under constant load conditions will be tested under variable load conditions, which challenges the generalization ability of the developed models. To tackle this challenge, in the crack length estimation task, load-variation-independent features are used for T8. For the crack prediction task, the variable amplitude load is converted to an equivalent constant amplitude load.

CRACK LENGTH ESTIMATION
The flowchart of the developed method for estimating crack length is shown in Figure 2. Firstly, the ultrasonic signals are denoised with a band-pass filter. Then, the first wave package (FWP) will be truncated out from the filtered signals for feature extraction. Four features, namely, the first peak value, root mean square value, logarithm of kurtosis, and correlation coefficient, will be extracted from the FWP. Based on these features, a crack-detection algorithm is proposed to detect whether a crack has occurred at a specific cycle. If the crack does not occur, the crack length will be zero. If a crack occurs, an ensemble linear regression model will be employed to estimate the crack length. The cycle number serves as an additional feature when building the crack length estimation model for T7 in particular.

Band-Pass Filtering
A band-pass filter is applied to suppress white noise and unwanted impulses in the raw ultrasonic signals. A finite impulse response (FIR) filter is used to avoid phase distortion of the ultrasonic signals. The filter specifications are selected to retain the prominent frequency components and eliminate as much noise as possible. After analyzing the spectra of all ultrasonic signals, the following specifications were selected for the band-pass filter: first stop frequency = 50 kHz, first pass frequency = 100 kHz, second pass frequency = 500 kHz, and second stop frequency = 1000 kHz. Note that the sampling frequency is 20 MHz.
A sample ultrasonic signal of specimen T2 before and after filtering is shown in Figure 3. Figure 3 shows that unwanted impulses and noise are filtered out after applying the bandpass filter. Note that after filtering, the signals from Run 1 and Run 2 are in good agreement.
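The filtering step can be illustrated with a minimal pure-Python sketch of a linear-phase band-pass FIR filter. It is not the authors' implementation: the windowed-sinc design, the Hamming window, the tap count of 1201, and the cutoff frequencies of 75 kHz and 750 kHz (midpoints of the stated stop/pass edges) are all assumptions made for illustration.

```python
import math

def design_bandpass_fir(f_low, f_high, fs, numtaps):
    """Windowed-sinc band-pass FIR (Hamming window); odd numtaps gives linear phase."""
    if numtaps % 2 == 0:
        raise ValueError("numtaps must be odd for a symmetric FIR filter")
    mid = (numtaps - 1) // 2
    lo, hi = f_low / fs, f_high / fs  # cutoffs in cycles per sample
    taps = []
    for n in range(numtaps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * (hi - lo)
        else:
            ideal = (math.sin(2 * math.pi * hi * k)
                     - math.sin(2 * math.pi * lo * k)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (numtaps - 1))
        taps.append(ideal * window)
    return taps

def gain_at(taps, freq, fs):
    """Magnitude of the filter frequency response at a single frequency."""
    re = sum(h * math.cos(2 * math.pi * freq * n / fs) for n, h in enumerate(taps))
    im = sum(h * math.sin(2 * math.pi * freq * n / fs) for n, h in enumerate(taps))
    return math.hypot(re, im)

def apply_fir(taps, signal):
    """Filter the signal and remove the constant group delay (zero-phase output)."""
    mid = (len(taps) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(signal):
                acc += h * signal[idx]
        out.append(acc)
    return out

FS = 20e6  # sampling frequency stated in the text
# Assumed cutoffs: midpoints between the stop and pass edges (50/100 kHz, 500/1000 kHz).
TAPS = design_bandpass_fir(75e3, 750e3, FS, numtaps=1201)
```

Because the taps are symmetric, the delay is the same at every frequency, so compensating it leaves the wave packets aligned in time for the later truncation and correlation steps.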

Truncating
The received ultrasonic signals contain three parts, as shown in Figure 4: the synchronization part, the FWP, and the rest. The synchronization part is the synchronization signal of the actuated signal and is independent of the crack. The FWP contains the waves transmitted through the crack. The rest contains both the transmitted waves and reflected waves. Only the FWP is used in the following procedures, such as feature extraction.
The length of the FWP equals that of the actuation signal (and of the synchronization part). The location of the first peak (LFP) is used to locate the FWP. Specifically, the 150 sample points before and the 200 sample points after the LFP are identified as the FWP of a received ultrasonic signal. The start and end locations of the FWP can be determined as:
$$n_{\mathrm{start}} = \mathrm{LFP} - 150, \qquad n_{\mathrm{end}} = \mathrm{LFP} + 200.$$
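The truncation rule can be sketched as follows. The function name, the inclusive end index, and the assumption that the LFP index is already known are illustrative choices; the text does not specify how the first peak is located.

```python
def truncate_fwp(signal, lfp_index, n_before=150, n_after=200):
    """Return the first wave package: n_before samples before the first peak,
    the peak itself, and n_after samples after it (end index inclusive, assumed)."""
    start = lfp_index - n_before
    end = lfp_index + n_after
    if start < 0 or end >= len(signal):
        raise ValueError("FWP window falls outside the recorded signal")
    return signal[start:end + 1]
```

With these defaults the first peak always sits at offset 150 inside the returned window, which makes the first peak value easy to read off afterwards.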

Feature Extraction
When ultrasonic waves travel through a structure, they are reflected and scattered upon encountering cracks (Lim, Sohn, DeSimio, & Brown, 2014; He, Huo, Guan, & Yang, 2020). Due to the reflection and scattering of ultrasonic waves at the crack location, a part of the wave energy is dispersed. The longer the crack is, the more energy is dispersed (Wang, He, Guan, Yang, & Zhang, 2018; He, Huo, Guan, & Yang, 2020). As a result, the energy of the received ultrasonic signals decreases as the crack size increases. The wave peak value and the root mean square value both measure the energy of the received ultrasonic signals, but in different aspects. The wave peak value reveals the peak energy, while the root mean square value reveals the average energy.
The structural discontinuity caused by a crack distorts the wave shapes of the ultrasonic signals (He et al., 2013). The kurtosis reflects the peakedness of the received signals. As crack length increases, more abrupt jumps may be caused by the wave distortion, and thus the kurtosis is expected to increase accordingly. Also due to the wave distortion, the correlation coefficient between the baseline signal (the received signal without a crack) and the received signals with cracks will change (He et al., 2013). As the crack size increases, the distortion increases, and consequently the correlation coefficient is expected to decrease. Based on the above knowledge, four crack-sensitive features, namely, the first peak value ($F_1$), root mean square value ($F_2$), logarithm of kurtosis ($F_3$), and correlation coefficient ($F_4$), are extracted from the truncated FWPs. The extracted features of specimens T1-T6 with respect to crack length are shown in Figure 5.
Figure 5(a) shows the first peak values, i.e., the amplitude of the first peak of the FWP (as shown in Figure 4). The first peak values show a decreasing trend as crack length increases, except for some distorted cycles (i.e., the fourth cycle of T1 and the fifth cycle of T6). The reason for taking the logarithm of kurtosis is to scale the kurtosis values of different specimens to a similar range. The definition is as follows:
$$F_3 = \log\left(\frac{1}{L}\sum_{i=1}^{L}\left(\frac{s_i-\mu}{\sigma}\right)^{4}\right)$$
where $\mu$ and $\sigma$ are the mean and standard deviation of the FWP, respectively, $L$ is the length of the truncated signal $s$, and $s_i$ denotes the $i$th data point of the truncated signal. Figure 5(c) shows that the logarithm of kurtosis increases as the crack length increases, except for a distorted value at the fourth given cycle of T1.
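The four features can be computed from a truncated FWP as in the sketch below. The function names, the natural logarithm, the population (divide-by-L) standard deviation, and the default peak offset of 150 samples are assumptions; the text does not state the logarithm base or the variance normalization.

```python
import math

def rms(s):
    """Root mean square value of the FWP (average energy)."""
    return math.sqrt(sum(x * x for x in s) / len(s))

def log_kurtosis(s):
    """Logarithm of kurtosis; natural log and population variance are assumed."""
    n = len(s)
    mu = sum(s) / n
    var = sum((x - mu) ** 2 for x in s) / n
    fourth = sum((x - mu) ** 4 for x in s) / n
    return math.log(fourth / var ** 2)

def correlation(s, baseline):
    """Pearson correlation coefficient between an FWP and a baseline FWP."""
    if len(s) != len(baseline):
        raise ValueError("signals must have the same length")
    ms = sum(s) / len(s)
    mb = sum(baseline) / len(baseline)
    cov = sum((x - ms) * (y - mb) for x, y in zip(s, baseline))
    var_s = sum((x - ms) ** 2 for x in s)
    var_b = sum((y - mb) ** 2 for y in baseline)
    return cov / math.sqrt(var_s * var_b)

def extract_features(fwp, baseline_fwp, peak_index=150):
    """Return the four crack-sensitive features of one FWP.
    peak_index is the first-peak offset inside the FWP (150 under the truncation rule)."""
    return {
        "F1": fwp[peak_index],
        "F2": rms(fwp),
        "F3": log_kurtosis(fwp),
        "F4": correlation(fwp, baseline_fwp),
    }
```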

Figure 5(d) displays the correlation coefficients of FWPs with the corresponding FWP of the last zero-crack cycle of a specimen. The last zero-crack cycle means that the crack length is zero before it and that the crack is initiated afterward. The correlation coefficients show a decreasing trend, except for a distorted value at the fourth cycle of T5. Figure 5 shows that when the crack length increases from zero, features $F_1$ and $F_2$ decrease while feature $F_3$ increases. Before the crack is initiated, features $F_1$ and $F_2$ increase and feature $F_3$ decreases as the cycle number grows. This trend is observed only in specimens T3 and T5; for the other specimens, only one data sample with zero crack is provided, so the trend cannot be observed. To further support this observation, the relationships between the features ($F_1$, $F_2$, $F_3$) and the cycles of specimens T3 and T5 are plotted in Figure 6. The crack lengths at the first few cycle numbers are also marked in Figure 6. From Figure 6, we can clearly observe that the values of $F_1$ and $F_2$ increase before crack initiation and decrease after it, and vice versa for $F_3$.

Crack Length Estimation
An ensemble linear regression model is built to estimate the crack length when a crack is detected. The diagram of the proposed model is shown in Figure 7. Firstly, all the features are normalized to suppress the deviations among different specimens. To normalize the features, the first zero cycle of each specimen is selected as the reference for that specimen. Features $F_1$, $F_2$, and $F_3$ are normalized by dividing them by the corresponding feature values of the first zero cycle. Feature $F_4$ is inherently normalized because the correlation coefficient is calculated with respect to the first zero cycle.
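The per-specimen normalization can be sketched as below. The dictionary layout and the function name are hypothetical; only the rule itself (divide F1-F3 by the reference-cycle values, leave F4 untouched) comes from the text.

```python
def normalize_by_reference(cycles, reference_index=0):
    """Normalize F1-F3 of every cycle of one specimen by the reference cycle.

    cycles: list of dicts with keys "F1".."F4", ordered by cycle number.
    reference_index: position of the zero-crack reference cycle (assumed first).
    """
    ref = cycles[reference_index]
    normalized = []
    for row in cycles:
        new_row = dict(row)
        for key in ("F1", "F2", "F3"):
            if ref[key] == 0:
                raise ValueError("reference value of %s is zero" % key)
            new_row[key] = row[key] / ref[key]
        normalized.append(new_row)  # F4 is already relative to the reference
    return normalized
```

After this step the reference cycle of every specimen has F1 = F2 = F3 = 1, so specimens with different absolute signal levels become comparable.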
In addition to the features extracted from ultrasonic signals, the cycle number is a natural feature for crack length estimation (Rajabipour & Melchers, 2015). The cycle number is therefore taken as the fifth feature (named $F_5$) to estimate crack length, for T7 only and not for T8, because the load applied to T7 at each cycle is the same as that applied to T1-T6, whereas the load applied to T8 is different. The cycle number $F_5$ is normalized in the following way: $F_5 = (F_5 - F_{50})/25{,}000$. Here, $F_{50}$ denotes the cycle number of the last zero-crack cycle. For T8, only features $F_1$-$F_4$ are used, which are independent of the load conditions.
After the features are normalized, six linear regression models are built based on the normalized features and their interactions. In each model, one of the specimens T1-T6 is taken as the validation set, and the rest are used as the training set. While building these models, the Best Subset Selection (BSS) method (James, Witten, Hastie, & Tibshirani, 2013) is used to select effective features in each model. Three of the six models are selected and grouped as an ensemble to estimate the crack length for T7 and T8. For each model in the ensemble, two crack lengths (one from Run 1 and one from Run 2) are obtained for each cycle. The final estimated crack length for a cycle is the average, over the models, of the larger of each model's two estimates. The larger estimate is used to ease the asymmetric penalty (PHM Society, 2019). Because the load conditions applied to specimens T7 and T8 are different, separate models are built for them, following the same diagram shown in Figure 7.
[Table 1. Crack-detection algorithm. Only fragments of the table survive: "Beyond scope of the algorithm"; "Step 3: Update the new cycle when ultrasonic data are newly measured, repeat Step 2".]
The six linear regression models built for T7 are shown in Table 2. Here, the validation error is the mean absolute percentage error. Using Model 1 as an example, the detailed steps of building these models are listed below:
Step 1: Split training and validation sets. For Model 1, the training set includes T2-T6 and the validation set is T1.
Step 3: Build models. For $k = 1, 2, \ldots, p$, fit all candidate models that contain exactly $k$ features on the training set and keep the best of them as $M_k$.
Step 4: Select models. Select the single best model (that with the smallest validation error) among $M_1, M_2, \ldots, M_p$. The selected model is Model 1.
Repeating the above four steps with different training and validation sets, the other five models (with T2-T6 as the respective validation sets) are obtained accordingly, as shown in Table 2. Note that, as the crack may not grow linearly with load cycles (Pugno et al., 2006), the optimal order (exponent) of $F_5$ is also selected from 0 to 1 with a step size of 0.02 while implementing BSS.
In Table 2, the regression statistics, i.e., R-square and p-value, are also displayed to show how well these models fit. As the R-square values of these models are larger than 0.81 and the p-values are smaller than 2.2e-16, we believe that the patterns in the training data are well captured by these models.
Among these models, Models 1, 3, and 4, which have smaller validation errors, are selected to estimate the crack length of T7. Model 5 is not selected even though its validation error (11.19%) is smaller than that of Model 1 (13.14%). The reason is that the validation set of Model 5 contains only two effective data samples (crack length > 0, see Figure 5). Such a small validation set would make Model 5 biased toward T5, and the model may perform poorly when generalized to T7. Following the diagram in Figure 7, the crack length of T7 is obtained, as shown in Table 4.
For specimen T8, the same strategy as for T7 is taken, and the six resulting linear regression models and their statistics (i.e., R-square values and p-values) are shown in Table 3. Correspondingly, three models with much smaller validation errors, i.e., Models 1, 3, and 4, are selected to estimate the crack length for T8. Compared to T7, the validation errors of the models selected for T8 are generally larger. This makes sense because one more independent feature ($F_5$) is utilized for T7. The final estimated crack length for T8 is also given in Table 4. Table 4 shows that the estimated crack length is generally close to the real crack length, except at Cycle 40,167 of T7 and Cycle 70,000 of T8. For these two cycles, the real crack length is zero, which means the crack-detection algorithm outputs false alarms at these two cycles. This is possible because the crack-detection algorithm (Table 1) is developed based on extremely limited data (only a few data samples from two specimens, T3 and T5).
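The bookkeeping around the six models can be sketched as follows: the leave-one-specimen-out splits, the validation error, and the ensemble rule that averages the larger of the Run 1 and Run 2 estimates. The function names are hypothetical, and skipping zero-length samples in the error is an assumption (a percentage error is undefined for a true value of zero); the regression fitting and subset search themselves are not shown.

```python
def leave_one_specimen_out(specimens):
    """One (validation, training) split per specimen, e.g. Model 1 holds out T1."""
    return [(held_out, [s for s in specimens if s != held_out])
            for held_out in specimens]

def mape(actual, predicted):
    """Mean absolute percentage error over samples with a nonzero true length."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a > 0]
    if not pairs:
        raise ValueError("no samples with crack length > 0")
    return 100.0 * sum(abs(p - a) / a for a, p in pairs) / len(pairs)

def ensemble_estimate(run1_estimates, run2_estimates):
    """Average, over the selected models, of the larger of the two run estimates.

    run1_estimates[i] and run2_estimates[i] are model i's outputs for one cycle.
    """
    larger = [max(r1, r2) for r1, r2 in zip(run1_estimates, run2_estimates)]
    return sum(larger) / len(larger)
```

For example, with three models giving (2.0, 2.4, 1.8) mm on Run 1 and (2.2, 2.1, 2.0) mm on Run 2, the ensemble estimate is the mean of 2.2, 2.4, and 2.0 mm.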

CRACK LENGTH PREDICTION
Based on the crack lengths estimated in Section 3 for the current load cycle numbers, the crack lengths for future load cycle numbers are predicted using Paris' Law. This section first introduces the formula of Paris' Law. Next, two variations of the formula are derived to describe the relationship between crack length and load cycle number. Based on the first derived formula, the material parameters of the specimen are estimated using the GA (Lee, 2018). The crack length corresponding to a specific load cycle number can then be predicted using the estimated material parameters and the second derived formula.

Crack Propagation Stages
The relationship between crack growth rate and stress intensity factor can be divided into three stages, namely Stages 1, 2, and 3, as shown in Figure 8. Here, $da/dN$ is the crack growth rate, and $\Delta K$ is the stress intensity factor range.
In Figure 8, Stage 1 is the crack initiation stage, during which the crack grows slowly. In Stage 2, crack growth occurs at a medium rate, and the relationship between crack growth rate and stress intensity factor range can be represented with the Paris' Law model. In Stage 3, the crack grows at a high rate since it is near to a complete fracture. In many engineering applications, crack propagation and growth are mostly assumed to occur in Stage 2. For the data challenge, we also assume the crack is within Stage 2.

Paris' Law
Paris' Law (Paris & Erdogan, 1963) describes the relationship between crack growth rate and stress intensity factor range, and it can be expressed with Eq. (2):
$$\frac{da}{dN} = C\,(\Delta K)^{m} \tag{2}$$
where $a$ is the crack length, $N$ is the number of load cycles, $da$ is the fatigue crack length increment, $dN$ is the increase in the number of load cycles, $da/dN$ denotes the crack growth rate, $\Delta K$ is the stress intensity factor range, and $C$ and $m$ are material parameters of the specimens. How to obtain $\Delta K$, $C$, and $m$ is detailed later.

Stress Intensity Factor Range
The stress intensity factor range $\Delta K$ can be calculated by the following equation:
$$\Delta K = Y\,\Delta\sigma\,\sqrt{\pi a} \tag{3}$$
where $Y$ is the geometric factor of the specimen, which is usually assumed as $Y = 1$, and $\Delta\sigma$ is the applied stress range.
Specimens T1 to T7 are subjected to a fatigue load with a constant amplitude spectrum, while specimen T8 is subjected to a variable amplitude load. Schematics of these two kinds of load spectra are shown in Figure 9.
From Figure 9, for the case of constant amplitude load, the stress range can be calculated using the following equation:
$$\Delta\sigma = \sigma_{\max} - \sigma_{\min} \tag{4}$$
where $\sigma_{\max}$ and $\sigma_{\min}$ are the maximum and minimum applied stresses in a load cycle. For the variable amplitude load, the stress intensity factor range cannot be calculated in the same way as for its constant counterpart. Instead, an equivalent stress range, namely $\Delta\sigma_{\mathrm{eq}}$, can be adopted to represent the equivalent effect caused by the given variable amplitude load (Huang, Torgeir, & Cui, 2008). How to calculate the equivalent stress range will be explained later.

Specimen Material Parameters
The material parameters $C$ and $m$ of the two specimens are uncertain, but they vary within specific ranges. Specifically, for metallic materials, $m$ varies between 2 and 4, and $C$ varies from $1 \times 10^{-13}$ to $1 \times 10^{-11}$ (Li, Wang, & Gong, 2012). The exact values of $C$ and $m$ may differ between specimens. How to determine the values of $C$ and $m$ will be introduced in Section 4.3.

Derived Formulae of Paris' Law
To estimate the material parameters $C$ and $m$, one must resort to Paris' Law, since $C$ and $m$ are its two main parameters. With the original formula of Paris' Law in Eq. (2), it is difficult to obtain $C$ and $m$ because the crack growth rate $da/dN$ is hard to estimate accurately from experimental fatigue crack data. However, it is feasible to obtain the load cycle numbers and their corresponding crack lengths from experiments. To estimate $C$ and $m$ from these data, the original formula of Paris' Law should be rewritten in forms that relate the load cycle numbers to their corresponding crack lengths. To this end, based on Paris' Law in Eq. (2) and the formula for $\Delta K$ in Eq. (3), the relationship between the load cycle number increment $\Delta N_0$ and the crack length $a$ is obtained by separating variables and integrating, as shown in Eq. (5) (for $m \neq 2$):
$$\Delta N_0 = N - N_0 = \frac{a^{\,1-m/2} - a_0^{\,1-m/2}}{\left(1-\frac{m}{2}\right) C \left(Y\,\Delta\sigma\,\sqrt{\pi}\right)^{m}} \tag{5}$$
where $a_0$ is the initial crack length, $N_0$ is the corresponding load cycle number, $a$ is an arbitrary crack length, $N$ is the corresponding load cycle number, and $\Delta N_0$ is the cycle number increment from $N_0$ to $N$. Based on Eq. (5), the following Eq. (6) is derived to calculate an arbitrary crack length $a$ if the initial crack length $a_0$ and the load cycle number increment $\Delta N_0$ are known:
$$a = \left[a_0^{\,1-m/2} + \left(1-\frac{m}{2}\right) C \left(Y\,\Delta\sigma\,\sqrt{\pi}\right)^{m} \Delta N_0\right]^{\frac{1}{1-m/2}} \tag{6}$$
Equation (6) shows that if the material parameters $C$ and $m$ and the load cycle increment $\Delta N_0$ are known, then the crack length corresponding to a specific load cycle number can be predicted. On this basis, Eq. (6) will be adopted to predict crack length.
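The two integrated forms of Paris' Law can be sketched in a few lines. This assumes the standard closed forms obtained by integrating Eq. (2) with Eq. (3); the function names, the handling of the special case m = 2 (logarithmic form), and the infinite return value when the predicted crack grows without bound are illustrative choices. Units are left to the caller and must be consistent with those of C.

```python
import math

def _stress_term(C, m, delta_sigma, Y=1.0):
    """C * (Y * delta_sigma * sqrt(pi))**m, the factor multiplying a**(m/2)."""
    return C * (Y * delta_sigma * math.sqrt(math.pi)) ** m

def cycles_to_grow(a0, a, C, m, delta_sigma, Y=1.0):
    """Cycle increment needed to grow the crack from a0 to a (integrated Paris' Law)."""
    k = _stress_term(C, m, delta_sigma, Y)
    p = 1.0 - m / 2.0
    if abs(p) < 1e-12:            # m = 2: the integral is logarithmic
        return math.log(a / a0) / k
    return (a ** p - a0 ** p) / (k * p)

def crack_length(a0, delta_n, C, m, delta_sigma, Y=1.0):
    """Crack length after delta_n further cycles, starting from length a0."""
    k = _stress_term(C, m, delta_sigma, Y)
    p = 1.0 - m / 2.0
    if abs(p) < 1e-12:            # m = 2: exponential growth
        return a0 * math.exp(k * delta_n)
    base = a0 ** p + p * k * delta_n
    if base <= 0.0:               # for m > 2 the crack length diverges in finite cycles
        return math.inf
    return base ** (1.0 / p)
```

The two functions are inverses of each other, which gives a quick consistency check: feeding the output of `crack_length` back into `cycles_to_grow` should return the original cycle increment.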

Crack Length Prediction
For crack length prediction, there are two cases: constant amplitude load (specimens T1 to T7) and variable amplitude load (specimen T8). How to predict crack length for these two cases will be introduced separately.

Constant Amplitude Load Case
The procedure for crack length prediction with a constant amplitude load is shown in Figure 10. In total, $n$ crack lengths $a_i$ ($i = 1, 2, \ldots, n$) must be predicted at load cycle numbers $N_i$ ($i = 1, 2, \ldots, n$). The crack lengths are predicted one by one. The predicted crack length for load cycle number $N_i$ is used to predict the crack length for load cycle number $N_{i+1}$.
Firstly, the data are preprocessed to determine the initial dataset, namely the load cycle number N and crack length a. They will be used for predicting crack length for specimen T7.
The $N$ and $a$ values of specimens T1-T6 are shown in Figure 11. Because specimens T1-T7 were made by the same manufacturing process and were subjected to the same load conditions in the fatigue test, it is reasonable to assume that their crack progression trajectories are similar to each other. From Figure 11, it is seen that the longest crack length observed is 7.46 mm, and that the T4 dataset contains the most thorough information about crack growth in terms of load cycle numbers. We believe that T4 characterizes the most complete crack progression process of these specimens, and we assume that the relationship curve between load cycle number and crack length of specimen T7 follows the same pattern as that of specimen T4 once its crack has occurred, since they are subjected to the same load conditions. This helps to estimate the load cycle number of T7 corresponding to the maximum crack length of 7.46 mm.
Based on the estimated crack length of T7 shown in Table 4, the load cycle number at which the crack of T7 reaches 7.46 mm is calculated. Figure 12 shows the corresponding calculation method.
Polynomials are used to fit the known T4 data, and the coordinates of points A and B are obtained from the fitted curve of T4. Here, point A denotes the 2 mm crack length and its corresponding load cycle number on the fitted curve of T4, while point B denotes the 7.46 mm crack length and its corresponding load cycle number. We assume that the crack variation with load cycle of T7 is consistent with that of T4; that is, the crack of T7 should propagate along the curve A'B', which is equidistant to T4's fitted curve AB. In this way, the vertical coordinate difference between points B and B' is equal to that between points A and A'. Therefore, the vertical coordinate of B' (i.e., the load cycle number for a 7.46 mm crack length) is obtained. The coordinates of B' are calculated as Cycle 54,795 and crack length 7.46 mm.
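The equidistant-curve construction amounts to a constant shift in load cycle number. Written out, under the assumption that A' is the point of T7 with the same crack length as A (2 mm) and with $N_P$ denoting the load cycle number of a point P (notation introduced here for illustration):

```latex
N_{B'} - N_{B} = N_{A'} - N_{A}
\quad\Longrightarrow\quad
N_{B'} = N_{B} + \left(N_{A'} - N_{A}\right),
\qquad a_{B'} = a_{B} = 7.46\ \text{mm}
```

Here $N_A$ and $N_B$ are read from the fitted T4 curve, and $N_{A'}$ comes from the T7 estimates; the paper reports the resulting value $N_{B'} = 54{,}795$.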
The point (54,795, 7.46) is now used as a data point and added to the estimated dataset for specimen T7 (shown in Table 4). Based on our study, the first two data points of the estimated dataset of specimen T7 should be discarded since they deviate greatly from the relationship curve between the load cycle number and crack length of specimen T7. The initial dataset used for predicting crack length in T7 is shown in Table 5. Because $C$ and $m$ are uncertain, we adopt the GA to estimate their values. The data available for the GA are very limited. To augment the data, we fit the relationship curve between load cycle number and crack length for specimen T7 using polynomial curve fitting. After that, a dataset containing the load cycle increment $\Delta N$ and crack length $a$ is obtained and further used to estimate $C$ and $m$.
Based on Eq. (5), the optimization problem to be solved with the GA for estimating $C$ and $m$ is formulated as Eq. (7), where $\theta = (C, m)$ is the parameter vector to be estimated, $\Delta N$ is the load cycle increment, and $a_0$ is the initial crack length.
After solving the above optimization problem, the estimated values of C and m are substituted into Eq. (6). Then, the crack length corresponding to a specific load cycle number can be predicted. Table 6 gives the predicted crack length of T7 under different load cycles.
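The parameter search can be illustrated with a small genetic algorithm. This is a sketch under stated assumptions, not the authors' Eq. (7): the objective (mean squared relative error between the cycle increments predicted by the integrated Paris' Law and the observed ones), the log-scale encoding of C, the operators (truncation selection, blend crossover, Gaussian mutation, elitism), and all numeric settings are hypothetical. The search bounds follow the ranges quoted earlier for metallic materials.

```python
import math
import random

def cycles_to_grow(a0, a, C, m, delta_sigma):
    """Cycle increment from a0 to a under integrated Paris' Law with Y = 1."""
    k = C * (delta_sigma * math.sqrt(math.pi)) ** m
    p = 1.0 - m / 2.0
    if abs(p) < 1e-12:
        return math.log(a / a0) / k
    return (a ** p - a0 ** p) / (k * p)

def loss(theta, data, a0, delta_sigma):
    """Assumed objective: mean squared relative error of predicted cycle increments."""
    log_c, m = theta
    C = 10.0 ** log_c
    err = 0.0
    for delta_n, a in data:
        pred = cycles_to_grow(a0, a, C, m, delta_sigma)
        err += ((pred - delta_n) / delta_n) ** 2
    return err / len(data)

# Search bounds: log10(C) in [-13, -11], m in [2, 4].
BOUNDS = [(-13.0, -11.0), (2.0, 4.0)]

def run_ga(data, a0, delta_sigma, pop_size=40, generations=60, seed=0):
    """Return (best_theta, best_loss, per-generation best losses)."""
    rng = random.Random(seed)

    def fitness(theta):
        return loss(theta, data, a0, delta_sigma)

    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 4]
        children = [list(e) for e in elite[:2]]      # elitism: keep the two best
        while len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            w = rng.random()                          # blend crossover
            child = [w * x + (1.0 - w) * y for x, y in zip(p1, p2)]
            for i, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < 0.2:                # Gaussian mutation
                    child[i] += rng.gauss(0.0, 0.05 * (hi - lo))
                child[i] = min(hi, max(lo, child[i]))
            children.append(child)
        pop = children
    best = min(pop, key=fitness)
    return best, fitness(best), history
```

Searching over log10(C) rather than C keeps the two parameters on comparable scales, since C spans two orders of magnitude. Note that C and m are strongly correlated in Paris-type fits, so different (C, m) pairs can give nearly the same loss on a short crack-length range.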

Variable Amplitude Load Case
The procedure for predicting crack length of the variable amplitude load case (specimen T8) is shown in Figure 13. The major difference between the procedure for T8 and that for T7 is that the equivalent stress range due to variable load needs to be determined.
Likewise, the first step is to determine the initial dataset for predicting the crack length of specimen T8. Based on the provided $N$ and $a$ data of specimens T1-T6, the maximum crack length considered in the provided data is 7.46 mm. Because T8 has a variable amplitude load spectrum, which is different from those of specimens T1-T7, the relationship curve between the load cycle number and crack length of specimen T8 is not expected to follow the same pattern as those of the specimens with constant load spectra. Therefore, the relationship curve between load cycle number and crack length of specimen T8 should be built based on its own given data.
The estimated crack length for T8 at the first several cycles is presented in Table 4 in Section 3. When the first crack length of T8 is 1.76 mm, the corresponding number of load cycles is 7,000. When the first cracks appear in T1-T6, the load cycle is generally less than or almost equal to 6,000. The load cycles of T8 are relatively larger than those of T1-T6.
According to the typical a-N curve (Pugno et al., 2006), when the load cycle number is very large, the crack grows rapidly at a nearly linear rate. Therefore, it is reasonable to assume that the crack length of T8 increases linearly from the first two crack lengths to 7.46 mm. The load cycle corresponding to a crack length of 7.46 mm can be obtained by linear interpolation from any two of these three data points: {(7,000, 1.76), ...}. The estimated load cycle number corresponding to the maximum crack length of 7.46 mm serves as a part of the dataset. It is integrated into the data in Table 4 for specimen T8. The integrated dataset is used as the initial dataset for estimating the material parameters $C$ and $m$ and the equivalent stress range $\Delta\sigma_{\mathrm{eq}}$. The initial dataset used for predicting the crack length of T8 is tabulated in Table 7.
The GA is adopted to estimate $C$, $m$, and $\Delta\sigma_{\mathrm{eq}}$. Firstly, the relationship curve between $N$ and $a$ is fitted for specimen T8 via polynomial curve fitting. Then, a dataset containing the load cycle number increment $\Delta N$ and crack length $a$ is obtained.
The corresponding optimization problem is then solved with the GA, where $\theta = (C, m, \Delta\sigma_{\mathrm{eq}})$ is the parameter vector to be estimated, $\Delta N$ is the load cycle increment, and $a_0$ is the initial crack length.
After solving the above optimization problem, the estimated values of $C$, $m$, and $\Delta\sigma_{\mathrm{eq}}$ are substituted into Eq. (6). Then, the crack length corresponding to a specific load cycle number can be predicted. Table 8 gives the predicted crack length of T8 under different load cycles using the GA method.
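The linear step can be sketched as a two-point extrapolation in the (load cycle, crack length) plane. The function name is hypothetical, and the data points in the usage note are made-up values for illustration, not the T8 measurements.

```python
def cycle_at_length(point1, point2, target_length):
    """Load cycle number at which a straight line through two (cycle, length)
    points reaches target_length."""
    (n1, a1), (n2, a2) = point1, point2
    if a1 == a2:
        raise ValueError("the two points must have different crack lengths")
    cycles_per_unit_length = (n2 - n1) / (a2 - a1)
    return n1 + cycles_per_unit_length * (target_length - a1)
```

For instance, with the hypothetical points (1000, 1.0) and (2000, 3.0), the line gains 2 mm per 1000 cycles, so a length of 7.0 mm is reached at cycle 4000.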

SUMMARY AND CONCLUSION
To overcome the challenges of limited data amount and variable load conditions for structural crack length estimation and prediction, models based on physical knowledge and data-driven methods were proposed. For estimating crack length, four crack-sensitive and load-variation-independent features were extracted based on an understanding of the ultrasonic signals. Linear ensemble models were then built to estimate the crack length of the test specimens. For predicting crack length, the modified Paris' Law, a physical model, was employed to predict the crack length based on historical data. The GA was used to learn the parameters (e.g., the material parameters) of the model.
The estimated, predicted, and real crack lengths are summarized and listed in Table A in the Appendix. The estimated and predicted crack lengths are close to the real crack values and obtain a score of 16.14 using the data challenge score calculation rule (PHM Society, 2019). The proposed methods, based on ensemble linear regression and Paris' Law, provide a good reference for monitoring structural health given limited data and variable operation conditions.
In this work, a simple crack detection algorithm based on limited observation was employed. It made some false alarms when applied to T7 and T8. In the future, a better crack detection algorithm may be explored, like TF analysis (Courtney, Drinkwater, Neild, & Wilcox, 2008;Lim, Sohn, DeSimio, & Brown, 2014