Towards Developing a Novel Framework for Practical PHM: a Sequential Decision Problem solved by Reinforcement Learning and Artificial Neural Networks



Luca Bellani, Michele Compare, Piero Baraldi, Enrico Zio


The heart of prognostics and health management (PHM) is predicting the evolution of equipment degradation and, thus, the equipment's Remaining Useful Life (RUL). These predictions drive decisions on equipment Operation and Maintenance (O&M), which in turn influence the degradation evolution itself. In this paper, we propose a novel PHM framework based on the Sequential Decision Problem (SDP) formalism, Artificial Neural Networks (ANNs) and Reinforcement Learning (RL), which properly accounts for this feedback loop in optimal sequential O&M decision making. The framework is applied to a scaled-down case study concerning real mechanical equipment with PHM capabilities, and its performance is compared with that of traditional PHM.
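The feedback loop described in the abstract (degradation predictions drive O&M actions, which in turn shape degradation) can be cast as a Markov decision process and solved by reinforcement learning. As a purely illustrative sketch, not the paper's actual model, the toy below applies tabular Q-learning to an assumed five-level degradation chain with "continue" and "replace" actions; all states, costs, and transition probabilities are invented for illustration (the paper replaces the table with an ANN).

```python
import random

# Illustrative toy only (assumed model, not the paper's): a component
# degrades through discrete states 0..4, where state 4 means failure.
# Each step the operator either continues operation (earning revenue,
# risking failure) or replaces preventively (paying a cost, resetting
# the state). Tabular Q-learning learns the sequential O&M policy.

N_STATES = 5                 # degradation levels; state 4 = failed
CONTINUE, REPLACE = 0, 1
REVENUE, REPLACE_COST, FAILURE_COST = 1.0, 3.0, 10.0
P_DEGRADE = 0.3              # chance of degrading one level per step

def step(state, action, rng):
    """Return (next_state, reward) for the assumed degradation MDP."""
    if action == REPLACE:
        return 0, -REPLACE_COST
    if state == N_STATES - 1:        # operating a failed unit:
        return 0, -FAILURE_COST      # forced corrective replacement
    nxt = state + 1 if rng.random() < P_DEGRADE else state
    return nxt, REVENUE

def q_learning(episodes=2000, steps=50, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # Q[state][action]
    for _ in range(episodes):
        s = 0
        for _ in range(steps):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((CONTINUE, REPLACE), key=lambda x: q[s][x])
            s2, r = step(s, a, rng)
            # standard Q-learning temporal-difference update
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning()
# greedy policy per degradation state
policy = [max((CONTINUE, REPLACE), key=lambda a: q[s][a]) for s in range(N_STATES)]
```

Under these assumed costs the learned policy keeps a healthy unit running and replaces a failed one; tuning `REPLACE_COST` versus `FAILURE_COST` shifts the preventive-replacement threshold, which is the kind of trade-off the SDP formulation optimizes.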




PHM, Maintenance Planning, Artificial Neural Networks, Sequential Decision Problem, Reinforcement Learning
