Automatic Threshold Setting and Its Uncertainty Quantification in Wind Turbine Condition Monitoring System

Setting optimal alarm thresholds in vibration based condition monitoring system is inherently difficult. There are no established thresholds for many vibration based measurements. Most of the time, the thresholds are set based on statistics of the collected data available. Often times the underlying probability distribution that describes the data is not known. Choosing an incorrect distribution to describe the data and then setting up thresholds based on the chosen distribution could result in sub-optimal thresholds. Moreover, in wind turbine applications the collected data available may not represent the whole operating conditions of a turbine, which results in uncertainty in the parameters of the fitted probability distribution and the thresholds calculated. In this study, Johnson, Normal, and Weibull distributions are investigated; which distribution can best fit vibration data collected from a period of time. False alarm rate resulted from using threshold determined from each distribution is used as a measure to determine which distribution is the most appropriate. This study shows that using Johnson distribution can eliminate testing or fitting various distributions to the data, and have more direct approach to obtain optimal thresholds. To quantify uncertainty in the thresholds due to limited data, implementations with bootstrap method and Bayesian inference are investigated.


INTRODUCTION
Wind turbines are generally subject to aleatory uncertainty due to stochastic nature of the weather and the wind itself. In addition to the stochastic nature that a turbine may experience under normal condition (not experiencing any faults), the varying loads that a wind turbine experience makes monitoring its condition inherently challenging. However, having a condition monitoring system (CMS) dedicated to wind turbines is vital for an effective maintenance program. Such Kun Marhadi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. program can help ensure maximum uptime of the machine by minimizing downtime. An example of such system has been demonstrated by (Andersson, Gutt, & Hastings, 2007). Most CMS for wind turbine applications are based on vibration as described by (Tavner, 2012) and (Crabtree, 2011). A case study of using vibration monitoring to detect and diagnose a fault in the generator bearing of a wind turbine in a real industrial application has also been presented by (Marhadi & Hilmisson, 2013).
As explained by (Marhadi & Hilmisson, 2013), primary components monitored in wind turbines (for vibration based CMS) are the generator, gearbox, main bearings, and tower. Usually accelerometers are installed on these components, and there could be up to 10 accelerometers installed in a wind turbine. The data acquisition unit in a wind turbine usually collects vibration data continuously from each sensor. Different vibration measurements are considered in monitoring different components of a wind turbine. To monitor generator bearings for example, several measurements are used in different frequency ranges. The overall vibration RMS level, ISO RMS [10 -1000 Hz], high frequency band pass ), high frequency crest factor (HFCF), and several harmonics or orders of the running speed of the generator (e.g. 1X , 2X) are computed by the data acquisition unit continuously from each sensor. Depending on different failure modes or types of fault, there could be more measurements needed and computed from a sensor. To detect gear related problems in a gearbox for example, the tooth/gear mesh frequencies and sideband levels are usually computed in addition to other broad band measurements such as the ISO RMS.
It is widely practiced in the industry that each individual measurement is trended over time to detect certain failure modes. When the trend from a specific measurement (e.g. HFBP or ISO RMS) crosses over a predefined threshold, it will trigger an alarm or warning. Thus it is very important to set the thresholds correctly in order to minimize the number of false alarms. (Bechhoefer & Bernhard, 2007) explained that this practice in reality may generate many false alarms. How-ever, the implementation of this practice in the industry is very straight forward. Moreover, it is more intuitive and understandable for many vibration practitioners who normally evaluate vibration data to look at each individual measurement trend.
Since wind turbines are often located in remote locations or offshore, whose access are often difficult, it is important to be able to plan visits and maintenance of the turbines. (Marhadi, 2015) demonstrated that as a fault develops, having multiple levels of severity assessments, where the urgency of maintenance can be estimated at each severity level (lead time), can help plan maintenance. This is mainly based on trending individual measurement, where increasing trends of different vibration measurements are used as indicators of various stages of a fault development.
Given there could be up to 10 sensors installed in a turbine and the number of measurements computed from an individual sensor could vary from 3 to more than 10, the number of thresholds that needs to be set up is consequently large. It is impractical to set them manually. Considering that each wind turbine is unique like an individual, it is necessary to set the thresholds uniquely to each turbine. It will be even very inefficient if there are thousands of turbines with CMS whose thresholds need to be set manually. More importantly, setting a threshold is often a trade off between missing real alarms due to a fault development and having false alarms. Thus it is important to be able to set the thresholds at the optimum levels automatically with minimum number of adjustments over time. (Marhadi & Hilmisson, 2013) explained that some limits are determined based on statistics. It is often based on the assumption that the distribution of a vibration measurement follows the Normal (Gaussian) distribution. (Jablonski, Barszcz, Bielecka, & Breuhaus, 2013) discussed a methodology for automatic threshold calculation in a large monitoring system, including a wind turbine application. (Jablonski et al., 2013) showed that different data types or vibration measurements could have significantly different probability distributions from Gaussian. They investigated several distributions and their comparison in fitting various data types for threshold calculation. (Bechhoefer & Bernhard, 2005) have also presented a case where Gaussian distribution is not appropriate to describe the probability distribution of first order magnitude (1X) of a helicopter shaft. They further explained that it is important that the underlying distribution of a measurement is correct so that the threshold can be determined based on low probability of false alarm.
Earlier work to determine alarm threshold has been presented by (Cempel, 1990), where he investigated the thresholds estimation based on Chebyshev's inequality, Weibull and Pareto distributions. The work also showed its possible application in prognosis although it is more complicated, such as what (Cempel, 1987) showed. Later (Bechhoefer & Bernhard, 2004) described a methodology to set alarm thresholds that takes into account variance between aircraft and various aircraft state parameters (e.g. operating conditions). The work assumed that the underlying data for estimating thresholds have approximately Normal distribution. (Bechhoefer & Bernhard, 2005) further demonstrated that thresholds based on Gaussian statistic could yield greater false alarms than anticipated, and discussed using non-Gaussian distribution, such as Rayleigh distribution for analysis of shaft components.
Using a linear transformation to whiten different vibration data types or condition indicators, (Bechhoefer, He, & Dempsey, 2011) presented a method to set a threshold of gear health, also known as health indicator, based on probability of false alarm. The algorithm to define health indicator as a function of condition indicators was developed using three statistical models, namely order statistic, sum of condition indicators, and normalized energy. The models were developed with the assumption that the condition indicators follow Gaussian distribution or Rayleigh distribution.
In the aforementioned work, a lot of investigations were done to determine the most appropriate underlying distribution of the vibration data before a threshold is set. It is often necessary to fit several distributions to the data available, and to choose the most appropriate one based on a goodness-of-fit test, such as in (Jablonski et al., 2013). Rather than trying to fit various distribution functions, it could be more practical to choose a distribution function that can fit a family of distributions, such as Pearson family of distributions and Johnson family of distributions. Thus there are no needs to fit various distribution functions or to compare different thresholds set based on different distributions. This paper focuses on using Johnson family distribution as a unified approach to model a wide variety of distribution functions that describe various vibration data in wind turbine condition monitoring applications. The significance of this work is that by using Johnson distribution there is no need to test various distributions to find the most appropriate one to describe vibration data. Optimum thresholds that minimize false alarms can be set up based on only one distribution. This is later demonstrated in the paper. Considering the number of vibration measurements from a wind turbine, the number of wind turbines whose thresholds need to be set up, and the stochastic nature of wind turbine operating conditions that may necessitate classification of operating conditions as later described in the paper, this approach is very practical. Its implementation is simple without much computational efforts, and it can be implemented quickly in large scale industrial setting. For comparison purposes, other distributions are investigated, namely Normal and Weibull distributions. In condition monitoring system the data available are usually 2 sufficient for statistical analysis, however it is not necessarily true for wind turbine applications due to various seasons or wind conditions that a wind turbine can experience in a year. Ideally at least a whole year is necessary to collect data in order to reflect the true underlying distribution. However it is clearly impractical to collect a year data before CMS is applied with the correct thresholds. Although this condition is widely known, there has not been much work or study in this area. Another contribution of this paper is to present the effects of having limited data available (e.g. a few days, a few weeks, or a few months) in wind turbine thresholds setting and the possible false alarms generated. Bootstrap method and Bayesian inference are investigated for uncertainty quantification with possible industrial applications in mind.

JOHNSON FAMILY DISTRIBUTION
Johnson distribution is a family function that can fit different distribution shapes. It is not necessary to test different distributions that will give the best fit to a set of sample data because Johnson family distribution has the flexibility to fit data with a large range of different distribution shapes. A brief description of the Johnson distribution function is provided here.
Fitting data with Johnson distribution involves transforming a continuous random variable x,whose distribution is unknown, into a standard Normal (z) with mean 0 and variance 1 according to one of the four normalizing translations proposed by (Johnson, 1949). The general form of the translation is where z ∼ N (0, 1), γ and δ are shape parameters, λ is a scale parameter , and ξ is a location parameter. The translation functions that map different distributions to the standard Normal distribution in the Johnson distribution function are as follows: for lognormal family(S L ), ln y + y 2 + 1 for unbounded family(S U ), where y = x−ξ λ . If equation 1 is an exact normalizing translation of x to a standard normal random variable, the cumulative density function (CDF) of x is given by where Φ(z) denotes CDF of standard Normal distribution, for lognormal family(S L ), (−∞, +∞) for unbounded family(S U ), [ξ, ξ + λ] for bounded family(S B ), (−∞, +∞) for normal family(S N ).

(4)
The probability density function (PDF) of x is then given by where f (y) = df dy . For more information one can refer to (DeBrota, Dittus, Swain, Roberts, & Wilson, 1989).
There are four methods to estimate Johnson parameters (γ, δ, ξ, λ) as described by (DeBrota et al., 1988), namely: moment matching, percentile matching, least squares, and minimum L p norm estimation. The reader can refer to (DeBrota et al., 1988) for detailed description of each method. In this work, the moment matching method is used with implementation based on (Hill, Hill, & Holder, 1976) due to its simplicity in Scilab (Enterprises, 2012).
The moment matching method involves determining the family distribution first by the location of skewness, β 1 and kurtosis, β 2 in Figure 1. This figure represents the original identification chart published by (Johnson, 1949), with positive goes downward in the y-axis (β 2 ). The number of parameters to be estimated is then determined by solving a system of non-linear equations between the sample moments and the corresponding moments of the fitted distribution. A brief procedure of the method can be described as follows: 1. Calculate the moments of x : m 2 , m 3 and m 4 . 2. Calculate the skewness and kurtosis of x : β 1 ≡ m 2 3 /m 3 2 and β 2 ≡ m 4 /m 2 2 . 3. Use the chart in Figure 1 to determine the family or transformation function used.

THRESHOLD SETTING
An alarm threshold can be set based on a predetermined probability of false alarm (pf ). This value is essentially a design parameter that can be changed to suit the condition monitoring needs. In this work, the predetermined probability of false alarm is set at 10 −4 . Thus knowing the underlying probability distribution of the data, it is the same as finding the 99.99 percentile of the distribution or finding the inverse CDF, see equation 6. The inverse CDF of Johnson distribution in this work is computed using Scilab CASCI library, see (Enterprises, 2012).
Setting an alarm threshold involves collecting vibration data 3 Figure 1. Johnson distribution family identification chart.
over a period of time. Depending on how the data are collected, some preprocessing may be needed, such as outliers removal. Next, a probability distribution function is fitted to the data collected and its parameters are estimated. Based on the estimated parameters, a threshold is set following equation 6. Figure 2 illustrates the steps to determine an alarm threshold. In regards to industrial application, the steps are simple to implement.

Data collection
Fit a probability distribution function, p (estimate its parameters)

DATA COLLECTION FROM A WIND TURBINE
Data used in this study were taken from Generator Non Drive End of a 3 MW turbine. For a typical generator bearing monitoring performed by Brüel & Kjaer Vibro (B&K Vibro), there could be up to or more than 10 different vibration data or measurements generated from a sensor. For simplicity of this study, only ISO RMS and High Frequency Band Pass (HFBP) data are used for analysis. HFBP is usually used as early indicator of potential bearing related problems, and ISO RMS is usually used as general indicator of faults developing into a later stage. These two measurements or indicators can reflect the general conditions of generator bearings across all turbine types. For more specific problems, such as looseness or imbalance, other measurements or indicators are needed.
ISO RMS and HFBP are computed in the time domain (computing the root mean squared of the signal) after applying the appropriate filter settings. The sample length is set so that it captures approximately 10 revolutions of the generator rotation. The vibration is sampled at 25600 per second.
The data were collected for approximately two months while the turbine was running during its normal operating conditions and producing power at least above 100 kW. No known mechanical faults existed during the data collection period. The data were collected by the data acquisition unit on the turbine and sent every 5 minutes to a remote surveillance center. Data collection interval could actually vary in the real or commercial condition monitoring systems. It often depends on the choice of monitoring strategy of the machine.
As described by (Marhadi & Hilmisson, 2013), since a wind turbine operates over a wide range of speeds and loads, it is important to set thresholds within more or less the same operating condition. Thus changes in measured vibration levels are indeed due to developing faults, and not due to changing operating conditions. Typical B&K Vibro monitoring strategy for wind turbines is to divide the operating conditions of a wind turbine into 5 different operating power classes (OPC) based on the power produced by the wind turbine. For a 3 MW turbine, the power classes are as follow: 100 -700 kW (Class 1), 700 -1300 kW (Class 2), 1300 -2000 kW (Class 3), 2000 -2700 kW (Class 4), and 2700 -3200 kW (Class 5). Thus each measurement is classified based on in which operating condition it is taken. No data are recorded when the turbine operates below 100 kW or above 3200 kW. Figure 3 and 4 present the distributions of ISO RMS and HFBP taken over a period of approximately two months in two power classes. Through out the paper only data from the first two power classes are presented for better clarity and organization. Johnson, Normal, and Weibull distributions are fitted in each type of data for comparison. The figures show that even though the data type is the same (e.g. ISO RMS), however the distribution in different power classes can be significantly different. In this example, the Johnson family type that fits each data type is found to be bounded Johnson distribution (S B ).
The alarm thresholds were then computed following steps described in section 3. In this work, all data are assumed to be 4 Figure 3. Histogram of HFBP data with different distributions fit in 2 power classes. The false alarm rates of the whole data were computed when the thresholds set based on the whole data were used. The results are presented in tables 3 and 4. Thresholds based on Johnson and Weibull distributions generally result in the lowest false alarm rates. However, there are some thresholds that result in false alarm rates that are not within the specified probability of false alarm. Thresholds set based on Normal distribution are more likely to have higher false alarm rate. This shows the difficulty in fitting the most appropriate distribution to the data. For Normal distribution, it only has parameters that describe location and scale of a population. Thus it is very difficult to describe data that are not symmetric. Another example, the type of Johnson family fitted to the data is bounded (S B ) in all power classes for both HFBP and ISO RMS since the data determine this family to be the most suit-   (Marhadi, Venkataraman, & Pai, 2012). However, having the data determine the most appropriate family is more consistent with the objective of using Johnson distribution and should also be more accurate (due to more likely representing the data). In this way optimum and conservative estimate of the threshold can be achieved.

THRESHOLD CALCULATION BASED ON LIMITED DATA
Ideally, the vibration data collected to set alarm thresholds should reflect all normal operating conditions (without any mechanical faults and the turbine has gone through all possible weather and seasonal conditions) in order to set the thresholds effectively. This data collection may take up to a year, and it is clearly impractical. A more practical approach is to collect a month of vibration data (or even less than a month), 5 Table 3. HFBP false alarm rates (%) at 2 OPCs when thresholds are set based on the whole data. and set the thresholds based on the collected data.
Realistically, the turbine may not have gone through all normal operating conditions after a month of operation. Within almost two months of data collection with every 5 minutes interval of data recording, the numbers of vibration data in each OPC from the turbine used in this study are as follows: 3067 data in OPC 1, 1960 data in OPC 2, 1673 data in OPC 3, 1595 data in OPC 4, and 1719 data in OPC 5. The underlying question is: do these numbers reflect the operating conditions for the rest of the year? Experience has shown that thresholds can be set based on these data, but adjustments might be necessary after a couple of months. For all practical purposes the number of adjustments needs to be minimum.
To investigate the effects of having limited data (not enough data to capture all operating conditions of a turbine) in setting alarm thresholds, the vibration data collected from each OPC are re-sampled uniformly with the following numbers of samples: 720, 360, 180, and 90. It is assumed that all vibration data collected represent the overall operating conditions of the turbine. Another assumption is made that in a worst case scenario, vibration data from a turbine are collected and sent every hour (e.g. to reduce data collection). With this assumption, the vibration data available in this study represent approximately 3 months of data. Then the numbers of re-samples from these data represent 30 days, 15 days, 7.5 days, and 3.75 days of data. Although the numbers of samples look statistically sound, in reality, they may reflect only short periods of the turbine operational time (order of days).
Tables 5 and 6 present the false alarm rates when the thresholds set based on limited data are used or checked against the whole data available. As the number of data used to compute thresholds decreases, the false alarm rates can either increase or decrease. This indicates that the data available are crucial for thresholds setting. Smaller false alarm rates can be achieved if the sampled data are more representative of the actual distribution. Figures 5 and 6 give visual representations of how the distributions of sampled data could actually be different from the whole population.  (7)), which can be done to reduce fluctuation in the data and to provide smoother trending. In this study, α = 0.01 and Alarming can be done on the averaged data over time. As stated earlier, the averaged data are smoother and provide a clearer picture when a mechanical fault develops, e.g. by increasing vibration level over time. The false alarm rates are 6 Figure 6. Emperical CDF of ISO RMS from various sampled data in 2 power classes. zero in most cases (e.g. different number of samples to set thresholds) when the averaged data are checked against the computed thresholds. Trending the averaged data also ensures that the machine condition is indeed entering an abnormal condition when the trend crosses a threshold.
The data used to represent overall operating conditions of the turbine are indeed limited. However the main purpose of this exercise is to show the effects of having limited data. The study shows the importance of having data as representative as possible to the overall operating condition data.
Sampled data from wind turbines vary a lot in reality depending on when the data are collected, e.g. seasons and operating conditions. Thus for example thresholds set based on data sampled during winter time will be different from the ones set based on data sampled during summer time. The study rep- resents this reality because it is clearly impractical to collect one year data before setting thresholds. A fault could develop within a year, and it could be undetected if no appropriate thresholds are set. The next section describes possibilities of dealing with such limited data.

QUANTIFYING UNCERTAINTY IN LIMITED DATA
The previous sections have shown that in wind turbine applications, the number of available data can be statistically large, but not necessarily represent the actual distribution of the data or all operating conditions of a turbine. Having limited amount of data generally leads into uncertainty in choosing the appropriate probability distribution to fit the data. Moreover, even if the correct probability distribution is known, having limited amount of data that do not represent the actual population can results in wrong estimates of the distribution parameters. Thus the thresholds set based on these data could be either too low or too high (not optimum).
It is beneficial to quantify the uncertainty of thresholds (the confidence bounds) set based on limited data. This can be done by first quantifying the uncertainty of the statistical distribution parameters. Different methods are available, both analytically (e.g. maximum likelihood estimate) or based on re-sampling techniques (e.g. bootstrap) and Bayesian estimate. (Marhadi et al., 2012) have described that there have been no analytical methods to estimate uncertainties (confidence bounds) of Johnson distribution fitted to some data. methods, e.g. Bayesian inference. Although the implementation is simple, bootstrap method is known to have some limitations as described by (Chernick, 1999), such as problems with estimating extreme values and variance of a distribution that has a very large/infinite variance. For comparison and to overcome some limitations of bootstrap method, a Bayesian inference procedure is used to estimate the distribution of Johnson, Normal, and Weibull parameters and the resulting bounds of the thresholds.

Bootstrap Method
Bootstrap technique re-samples the sampled data of 720, 360, 180, and 90 with replacements, and obtains new sets of 720, 360, 180, and 90 data. After each sampling, the distribution parameters are estimated using the selected samples, and the thresholds are calculated based on the estimated parameters of the distributions. Due to sampling with replacement, some samples are repeated in the new selected set. Bootstrap sampling is applied 1000 times, and the statistical parameters estimated are computed for each sample set in 1000 bootstrap repetition.
For estimating Johnson distribution parameters, in each selection set the appropriate Johnson family distribution (S L , S B , S U , or S N ) is determined using moment values of the data in the selection set. The results of the bootstrap techniques are the 2.5 and 97.5 percentiles of the thresholds set based on each distribution studied. They provide lower and upper bounds of the thresholds with 95% confidence. This information provides flexibility for an engineer to choose the thresholds within the lower and upper bounds. The false alarm rates are then computed again as the lower and upper bound thresholds are used on the whole data available to simulate a real situation when only limited amount of data available to set thresholds. The results are presented in tables 7 to 10. As one may expect, the lower bound thresholds result in higher false alarm rates and the upper bound ones result in lower rates. Generally the upper thresholds set based on both Johnson and Weibull distributions result in low false alarm rate. The main concern is always whether the thresholds have been set optimally by choosing the most appropriate distribution describing the data. Since the underlying distribution of data collected is not always known beforehand, fitting Johnson distribution can be a general or middle ground solution. Some thresholds determined from limited data are very closed to the thresholds determined from the whole data (e.g. ISO RMS threshold in OPC 2 in figure 12). Some of them are higher or even lower than the thresholds determined from the whole data, but the upper and lower bounds enclose the thresholds from the whole data (e.g. HFBP thresholds in OPC 1 in figure 9). Comparing the bounds of thresholds based on Johnson and Weibull distributions, they are relatively in the same order of magnitude. However, Weibull distribution tends to result in larger upper bounds (e.g. see ISO RMS thresholds in figure 12). Thus choosing Johnson distribution as the basis of setting threshold in this case is a more conservative approach.
8 Figure 9. Confidence bounds of HFBP thresholds from bootstrapping various sampled data in 2 power classes. Thresholds are based on Johnson distribution.

Bayesian Inference
Bayesian inference is a statistical method that allows using observation data (x) to infer the unknown parameters (θ) of a distribution that may describe the data. The unknown parameters are represented as PDF. Bayes theorem allows to relate the condition probability distribution of the observed data (x) given the distribution parameters (θ), p(x|θ) to the condition probability of the parameter (θ) given the observation data (x), p(θ|x) as shown in equation 8, where p(θ|x) is the posterior PDF of θ given x, l(θ|x) = p(x|θ) is the likelihood of data x given θ, and p(θ) is known as the prior distributions of θ. The prior here reflects prior knowledge of θ before any data are considered.  The likelihood is the same as the PDF chosen to fit the data. The prior is usually subjective. The posterior distribution is then obtained by multiplying the prior and all the likelihood functions according to the number of observed data (n) as Implementation of Bayesian inference for each distribution studied in this work is slightly different. Each implementation will be briefly described.

Johnson Distribution
The inference of Johnson distribution parameters follows the procedure outlined by (Marhadi et al., 2012). Sampling the joint distribution function (posterior distribution) in equation 9 is often difficult and required using a Markov Chain Monte 9 Figure 11. Confidence bounds of HFBP thresholds from bootstrapping various sampled data in 2 power classes.
Thresholds are based on Weibull distribution. Carlo (MCMC) method. In (Marhadi et al., 2012), they used a Metropolis method to sample the posterior distribution. They also chose to use non-informative prior or flat prior, with an infinite interval. They reported that sampling the four parameters of Johnson distribution simultaneously could cause the Metropolis method fail to converge. It is more likely to achieve convergence by inferring only two parameters, namely γ and δ assuming the estimates for location and scale parameters (ξ and λ) are more accurate to obtain.
Following findings in (Marhadi et al., 2012), only γ and δ are inferred in this work. Based on the sampled data, Bayesian inference of Johnson S B , S L , S N or S U distribution can be performed. It is determined based on the moments of the data using moment matching method as described in section 2. Bayesian inference is performed with a random walk Metropolis method with 4000 burn-in iterations period and 2000 samples from the posterior distribution. The scale parameters (variance) of the proposal distribution/density (a bivariate Normal distribution with zero covariance) are adjusted so that acceptance rate between 30% to 50% can be achieved. For more details description of the Metropolis method, one can refer to (MacKay, 2003). It is found that even when only γ and δ are inferred in this work, convergence of the Metropolis method can be difficult to achieve when flat prior is used. Thus Normal priors for γ and δ are investigated. Again, prior is often subjective and could be subject to more detailed investigation in future work.
It is assumed that γ and δ are distributed according to Normal distribution. The means are assumed to be equal to the first estimates of γ and δ of the sampled data. The variance is difficult to estimate. However, after some trials and errors, it is found that standard deviations of 0.5 of the means (first estimates of γ and δ) could result in satisfactory convergence. Figure 13 shows

Normal Distribution
Bayesian inference for normal distribution has been discussed thoroughly by (Box & Tiao, 1973). The normal distribution has PDF in the form of The joint posterior distribution between σ and µ can be described as proportional to p(µ, σ)p(x|µ, σ 2 )p(s 2 |σ 2 ). The priors for µ and σ are approximately independent so that p(µ, σ) is approximately equal to p(µ)p(σ), where the prior for σ and µ are assumed to be p(µ) is proportional constant and p(σ) = 1/σ. Hence the posterior distribution can be shown to be where The conditional posterior distribution for µ and σ can then be described as which is a normal distribution with meanx and variance σ 2 /n, and It is shown in (Box & Tiao, 1973) that ns 2 /σ 2 has chi-square (χ 2 ) distribution with n degrees of freedom. Thus a Gibbs sampling (see for example (MacKay, 2003)) can easily be performed to obtain parameters µ and σ from their conditional posterior distributions. Similar to Bayesian inference of Johnson distribution described in the previous section, 2000 samples of the parameters are obtained after 4000 burn-in iterations period. The 2000 samples of the parameters are then used to determine thresholds based on Normal distribution. The 2.5 and 97.5 percentiles of the thresholds are determined to provide lower and upper bounds.

Weibull Distribution
In this work three-parameter Weibull distribution is used with PDF as follows where β is the shape parameter, η is the scale parameter, and τ is the location parameter. Bayesian inference of Weibull distribution in this study follows some of the work by (Green, Roesch Jr., Smith, & Strawderman, 1994). Parameters β and η are constrained to be positive, and thus Jeffrey's prior for positive parameters are adopted, where p(β) ∝ (1/β) and p(η) ∝ (1/η). Shape and scale parameters are usually more detrimental in setting alarm thresholds than the location parameter τ . Thus τ in this study is not inferred (constant at the estimated value of the data available). The posterior distribution is shown to be Similar to Bayesian inference of Johnson distribution, the posterior distribution in equation 17 is sampled using random walk Metropolis method with 4000 burn-in iterations period and then obtaining 2000 samples. The scale parameters of the proposal distribution (a bivariate Normal distribution) are adjusted so that acceptance rate between 30% to 50% can be achieved. The 2000 samples from Bayesian inference are then used to determine alarm thresholds based on Weibull distribution, and the 2.5 and 97.5 percentiles of the thresholds are determined to provide lower and upper bounds.
The false alarm rates are computed again as the lower and upper bound thresholds are used on the whole data available. The results are presented in tables 11 to 14. Comparing the results from Bayesian inference and from Bootstrap method, the false alarm rates are more or less in the same order of magnitude with the exception of thresholds based on Johnson distribution. Using lower thresholds of HFBP based on Johnson distribution result in significantly higher false alarm rates. This is because the lower bounds are very low as can be seen in figure 14. Figures 14 and 15 show the lower and upper bounds of the thresholds based on Bayesian inference of Johnson distribution using 90, 180, 360, and 720 data. In comparison to results from bootstrap, the bounds for HFBP are generally larger, with the lower bounds are generally much lower. These bounds results in much higher false alarm rates if they are used. Only in OPC 1 where HFBP thresholds from 360 and 720 data have much higher upper bounds than the bounds from bootstrap. These results could be due to the choice of prior, which is subject to further study. On the contrary, the bounds for ISO RMS are generally much tighter than bounds from bootstrap. The lower and upper bounds of thresholds based on Weibull distribution are also shown in figures 16 and 17. Comparing the bounds from Bayesian inference and from bootstrap method, they are in the same order of magnitude. However, bounds from Bayesian inference of Weibull distribution are slightly tighter than bounds from bootstrap. These results are encouraging to prevent setting thresholds too high.
Using Bayesian inference to quantify uncertainties in setting alarm thresholds is actually attractive when large quantity of historical data are available because the method facilitates learning. However there are still some challenges that need to be solved before it can be used in real industrial applications, such as having a faster/efficient method to sample the posterior distribution. In case of using an MCMC method, there is not yet a well established method to determine how many burn-in iterations are needed that guarantees convergence. Convergence could potentially be achieved after a long burn-in period that requires long computational time.
In regards to using Johnson distribution, proper selection of the priors still needs further investigation so that sampling the posterior distribution is computationally efficient, and the whole 4 parameters could possibly be inferred.
In the actual wind turbine condition monitoring at B&K Vibro, an alarm is not always generated when a measurement crosses a threshold in any power classes. A more complex system is implemented to prevent false alarms, see for example the work by (Marhadi & Hilmisson, 2013). This paper simply presents a general framework to set alarm thresholds automatically with possible industrial applications in mind, and how the uncertainties in setting the thresholds can be quantified when only limited data are available. The method could be useful not only in wind turbine applications, but also in other machineries.

CONCLUSION
A method to set alarm thresholds automatically based on fitting different distributions to vibration data has been presented. In particular, using Johnson distribution eliminates the need to test various distributions that could fit the collected data most appropriately. Thus it can prevent choosing incorrect distribution that may result in setting sub-optimal thresholds. Results in this study show that low false alarm rate can be achieved by utilizing Johnson distribution. The implementation is simple and straightforward, which should also be applicable in machineries other than wind turbines.
The problem of having limited data in wind turbines that may not represent the whole or most operating conditions of a turbine has been investigated based on bootstrap method and Bayesian inference. Lower and upper bounds of alarm thresholds are obtained using both methods, and the false alarm rates are investigated when these thresholds are used. These could provide information where to set the thresholds effectively. Bootstrap is generally simple to implement, while Bayesian inference has more complicated implementation. Results from both methods are comparable in most cases. However, initial results in this study suggest that Bayesian inference could potentially prevent from setting the thresholds too high once the challenges of its implementation can be overcome.
Future work may include investigation of the effectiveness of the method when it is actually implemented to a wide number of turbines to catch real mechanical faults. Comparison with other methods or the more established ones could be made in this way, and the effectiveness of each method can be validated. Future work may also include finding the most effective method to estimate Johnson distribution parameters other than the moment matching method used in this study. Comparison with more various distributions, such as Gaussian mixture distribution or Rayleigh distribution could also be made. Figure 16. Confidence bounds of HFBP thresholds from Bayesian inference of various sampled data in 2 power classes. Thresholds are based on Weibull distribution.