A Probabilistic Machine Learning Approach to Detect Industrial Plant Faults
Fault detection in industrial plants is an active research area as growing volumes of sensor data are collected throughout the industrial process. Automatic, data-driven approaches are in demand and are seen as a promising area of investment. This paper proposes a machine learning algorithm that predicts industrial plant faults using classification methods such as penalized logistic regression, random forests, and gradient boosted trees. A fault’s start time and end time are predicted sequentially in two steps, with each of the original prediction problems formulated as a classification problem. The algorithms described in this paper won first place in the Prognostics and Health Management Society 2015 Data Challenge.
fault detection, machine learning, random forest, PHM data challenge, data-driven method, gradient boosted tree, penalized logistic regression
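The two-step formulation summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The labeling scheme, the single difference feature, the toy signal, and all function names (`start_labels`, `end_candidates`, `fit_logistic_l2`, `two_step_predict`) are assumptions made for illustration only. A small L2-penalized logistic regression, fitted by plain gradient descent, stands in for the classifiers named above. Step one scores every time step as a possible fault start; step two, given that start, scores candidate end times within an assumed horizon.

```python
import math

def predict_proba(w, b, x):
    """Logistic model: P(y = 1 | x)."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_l2(X, y, lam=0.1, lr=0.1, epochs=500):
    """L2-penalized logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw = [lam * wj for wj in w]          # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xij in enumerate(xi):
                gw[j] += err * xij / n
            gb += err / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def start_labels(n_steps, faults):
    """Step 1 labels (assumed scheme): 1 if a fault starts at step t."""
    starts = {s for s, _ in faults}
    return [1 if t in starts else 0 for t in range(n_steps)]

def end_candidates(start, n_steps, faults, horizon=5):
    """Step 2 samples: candidate end times after a given start, with labels."""
    true_end = dict(faults).get(start)
    last = min(start + horizon, n_steps - 1)
    return [(t, 1 if t == true_end else 0) for t in range(start + 1, last + 1)]

def two_step_predict(signal, faults, horizon=5):
    """Toy example: trains and predicts on the same series for brevity."""
    n = len(signal)
    diff = [0.0] + [signal[t] - signal[t - 1] for t in range(1, n)]

    # Step 1: classify each time step as "fault starts here" or not.
    X1 = [[d] for d in diff]
    w1, b1 = fit_logistic_l2(X1, start_labels(n, faults))
    start = max(range(n), key=lambda t: predict_proba(w1, b1, X1[t]))

    # Step 2: given the predicted start, classify candidate end times.
    cands = end_candidates(start, n, faults, horizon)
    X2 = [[-diff[t]] for t, _ in cands]
    w2, b2 = fit_logistic_l2(X2, [label for _, label in cands])
    best = max(range(len(cands)), key=lambda i: predict_proba(w2, b2, X2[i]))
    return start, cands[best][0]

# Hypothetical sensor trace with one fault from step 3 to step 6.
toy_signal = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
toy_faults = [(3, 6)]
predicted = two_step_predict(toy_signal, toy_faults)
```

On this toy trace, `predicted` is `(3, 6)`. A realistic pipeline would train on separate labeled plant data and use richer sensor features, and any of the classifiers listed in the abstract could replace the logistic model in either step.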