Reinforcement Learning Control for Natural Circulation in a Marine Pressurised Water Reactor Cooling System

Felipe Montana; Will Jacobs; Visakan Kadirkamanathan; Gary Brooks; Andy Mills

doi:10.36001/phme.2026.v9i1.4899

Published Jul 3, 2026

DOI https://doi.org/10.36001/phme.2026.v9i1.4899

Felipe Montana

University of Sheffield

Will Jacobs

University of Sheffield

Visakan Kadirkamanathan

University of Sheffield

Gary Brooks

Rolls-Royce

Andy Mills

University of Sheffield

Abstract

In safety-critical systems, the response to a fault can lead to a system shutdown. While safe at a component level, this poses safety challenges for the system as a whole and requires an additional system to manage the process. In pressurised water reactor (PWR) submarines, the loss of a coolant pump can force a shutdown by dropping the control rods, referred to as SCRAM. To avoid this, a possible response is to use natural circulation, a degraded operating mode characterised by strong non-linearities in system dynamics, to provide a limited level of functionality. Under these conditions, conventional model-based control approaches become difficult to apply, as the assumptions underlying nominal system models no longer hold. This paper investigates the feasibility of using reinforcement learning (RL) as a fault-response control strategy for systems operating under degraded and poorly modelled conditions. RL provides a data-driven framework capable of learning control policies directly from a black-box model or simulator, without requiring an explicit analytical model. However, when RL is applied in a safety-critical fault management context, understanding and validating the learnt control policy is essential. We analyse the policy learnt through RL by approximating it with a transparent surrogate model and by visualising the policy actions. We further assess the robustness of the policy to modelling errors, providing insight into its sensitivity to discrepancies between the simulated environment and the real system. The proposed approach is evaluated using a simplified submarine reactor cooling loop model that captures key features of fault-induced operation, including changes in system dynamics due to platform pitch and cascading faults. The results demonstrate the potential of reinforcement learning for interpretable control under faulted conditions.
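The idea of approximating a learnt policy with a transparent surrogate can be illustrated with a minimal sketch. The abstract does not state which surrogate model the authors use, so the linear least-squares fit below is an assumption chosen for simplicity. The function `black_box_policy`, its weights, and the two-dimensional state are hypothetical stand-ins for a trained RL policy, not the paper's reactor model.

```python
import numpy as np


def black_box_policy(states):
    """Hypothetical stand-in for a learnt RL policy (assumption, not the paper's).

    Maps each 2-D state to one bounded action in (-1, 1).
    """
    weights = np.array([0.8, -0.5])  # illustrative values only
    return np.tanh(states @ weights)


def fit_linear_surrogate(states, actions):
    """Least-squares fit of actions ~ states @ coef + intercept."""
    design = np.hstack([states, np.ones((states.shape[0], 1))])
    theta, *_ = np.linalg.lstsq(design, actions, rcond=None)
    return theta[:-1], theta[-1]


def fidelity_r2(states, actions, coef, intercept):
    """R^2 of the surrogate against the policy's own actions (fidelity)."""
    predicted = states @ coef + intercept
    ss_res = np.sum((actions - predicted) ** 2)
    ss_tot = np.sum((actions - actions.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Query the policy on sampled states, then fit and score the surrogate.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(2000, 2))
actions = black_box_policy(states)

coef, intercept = fit_linear_surrogate(states, actions)
r2 = fidelity_r2(states, actions, coef, intercept)

print("surrogate coefficients:", coef)
print("surrogate intercept:", intercept)
print("fidelity (R^2):", r2)
```

The fitted coefficients can be read directly as the sensitivity of the action to each state variable, and the fidelity score indicates how much of the policy's behaviour the surrogate captures. A low score would suggest that a richer transparent model, or a piecewise fit over regions of the state space, is needed before the surrogate is used to interpret the policy.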

How to Cite

Montana, F., Jacobs, W., Kadirkamanathan, V., Brooks, G., & Mills, A. (2026). Reinforcement Learning Control for Natural Circulation in a Marine Pressurised Water Reactor Cooling System. PHM Society European Conference, 9(1), 1–10. https://doi.org/10.36001/phme.2026.v9i1.4899

Keywords

Reinforcement learning, Safety-critical systems, Control system

Issue

Vol. 9 No. 1 (2026): Proceedings of the European Conference of the PHM Society 2026

Section

Special Session: PHM for Maritime Safety

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
