Development of Short-Term Forecasting Models Using Plant Asset Data and Feature Selection



Published Jun 8, 2022
Cody Pradeep Ramuhalli Vivek Agarwal Nancy Lybeck Mike Taylor


Nuclear power plants collect and store large volumes of heterogeneous data from various components and systems. With recent advances in machine learning (ML) techniques, these data can be leveraged to develop diagnostic and short-term forecasting models to better predict future equipment condition. Maintenance operations can then be planned in advance whenever degraded performance is predicted, thus resulting in fewer unplanned outages and the optimization of maintenance activities. This enables lower maintenance costs and improves the overall economics of nuclear power.

This paper focuses on developing a short-term forecasting process that leverages a feature selection process to distill large volumes of heterogeneous data and predict specific equipment parameters. A variety of feature selection methods, including Shapley Additive Explanations (SHAP) and variance inflation factor (VIF), were used to select the optimal features as inputs for three ML methods: long short-term memory (LSTM) networks, support vector regression (SVR), and random forest (RF). Each combination of model and input features was used to predict a pump bearing temperature both 1 and 24 hours in advance, based on actual plant system data. The optimal inputs for the LSTM and SVR were selected using the SHAP values, while the optimal input for the RF consisted solely of the response variable itself. Each model produced similar 1-hour-ahead predictions, with root mean square errors (RMSEs) of roughly 0.006. For the 24-hour-ahead predictions, differences could be seen between LSTM, SVR, and RF, as reflected by model performances of 0.036 ± 0.014, 0.0026 ± 0, and 0.063 ± 0.004 RMSE, respectively. As big data and continuous online monitoring become more widely available, the proposed feature selection process can be used for many applications beyond the prediction of process parameters within nuclear infrastructure.
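As a rough illustration of two ingredients named in the abstract, the sketch below shows VIF-based feature pruning and the framing of an h-step-ahead prediction target scored by RMSE. It is not the authors' implementation: the data are synthetic, the column names (`flow`, `flow_copy`, `speed`) are hypothetical, the VIF cutoff of 10 is only a common rule of thumb, and samples are assumed to be hourly so that a horizon of 24 samples corresponds to 24 hours. SHAP-based selection and the LSTM, SVR, and RF models are not reproduced here.

```python
import numpy as np

def vif(X):
    """VIF of each column: 1 / (1 - R^2), with R^2 from regressing it on the other columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + other features
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def prune_by_vif(X, names, threshold=10.0):
    """Repeatedly drop the feature with the largest VIF until all are at or below threshold."""
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names

def make_horizon_pairs(X, y, horizon):
    """Pair inputs at time t with the response at time t + horizon (horizon in samples)."""
    return X[:-horizon], y[horizon:]

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic stand-in for plant data (assumption: hourly samples, hypothetical sensors).
rng = np.random.default_rng(0)
n = 500
flow = rng.normal(size=n)
speed = rng.normal(size=n)
flow_copy = flow + 0.01 * rng.normal(size=n)   # nearly collinear with flow
bearing_temp = 0.5 * flow + 0.3 * speed + 0.05 * rng.normal(size=n)

X = np.column_stack([flow, flow_copy, speed])
X_kept, kept = prune_by_vif(X, ["flow", "flow_copy", "speed"])

# Frame a 24-sample-ahead target and score a naive persistence forecast.
X_h, y_h = make_horizon_pairs(X_kept, bearing_temp, horizon=24)
persistence_rmse = rmse(y_h, bearing_temp[:-24])
print(kept, X_h.shape, y_h.shape, persistence_rmse)
```

The pruning step removes one of the two nearly collinear columns and keeps the independent one. The persistence forecast (predicting that the value 24 samples ahead equals the current value) is included only as a simple reference point against which a trained model's RMSE could be compared.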




short-term forecasting, support vector regression, long short-term memory, Shapley Additive Explanation, Variance Inflation Factor, Random Forest, Feature selection, feedwater and condensate system, nuclear

Technical Papers