Reinforcement Learning Control for Natural Circulation in a Marine Pressurised Water Reactor Cooling System


Published Jul 3, 2026
Felipe Montana, Will Jacobs, Visakan Kadirkamanathan, Gary Brooks, Andy Mills

Abstract

In safety-critical systems, the response to a fault can be a full system shutdown. While safe at a component level, this poses safety challenges for the system as a whole, requiring an additional system to manage the process. In pressurised water reactor (PWR) submarines, the loss of a coolant pump can force a shutdown by dropping the control rods, referred to as a SCRAM. To avoid this, one possible response is to use natural circulation, a degraded operating mode characterised by strongly non-linear system dynamics, to provide a limited level of functionality. Under these conditions, conventional model-based control approaches become difficult to apply, as the assumptions underlying nominal system models no longer hold. This paper investigates the feasibility of using reinforcement learning (RL) as a fault-response control strategy for systems operating under degraded and poorly modelled conditions. RL provides a data-driven framework capable of learning control policies directly from a black-box model or simulator, without requiring an explicit analytical model. However, when RL is applied in a safety-critical fault management context, understanding and validating the learnt control policy is essential. We analyse the policy learnt through RL by approximating it with a transparent surrogate model and by visualising the policy actions. We further assess the robustness of the policy to modelling errors, providing insight into its sensitivity to discrepancies between the simulated environment and the real system. The proposed approach is evaluated using a simplified submarine reactor cooling loop model that captures key features of fault-induced operation, including changes in system dynamics due to platform pitch and cascading faults. The results demonstrate the potential of reinforcement learning for interpretable control under faulted conditions.

How to Cite

Montana, F., Jacobs, W., Kadirkamanathan, V., Brooks, G., & Mills, A. (2026). Reinforcement Learning Control for Natural Circulation in a Marine Pressurised Water Reactor Cooling System. PHM Society European Conference, 9(1), 1–10. https://doi.org/10.36001/phme.2026.v9i1.4899

Keywords

Reinforcement learning, Safety-critical systems, Control system

Section
Special Session: PHM for Maritime Safety