Development of an end state vision to implement digital monitoring in nuclear plants

tained — all while scaling to both computational and storage demands. This paper summarizes an end state vision of how to shift from costly, labor-intensive preventative maintenance to cost-effective predictive maintenance.


INTRODUCTION
The nuclear industry faces the possibility of premature closures of nuclear power plants (NPPs) for economic reasons, despite the plants' excellent safety records and an industry-wide capacity factor of 93% (Bryce, 2020; Nuclear Energy Institute, 2021). This economic strain is due to large capital, operating, and fueling costs for legacy light-water plants, with the major cost driver being operations and maintenance (O&M) (Bragg-Sitton, Boardman, Rabiti, & O'Brien, 2020). The average total generating cost (i.e., fuel + capital + operating) was $29.37/MWh in 2020, of which $18.27/MWh stemmed from O&M costs (Nuclear Energy Institute, 2021). The O&M costs of nuclear energy are roughly three times those of competing gas turbines (U.S. Energy Information Administration, 2022). By reducing O&M costs, nuclear energy can become more economically competitive with other energy sources and could potentially avoid premature closures.
The United States Department of Energy Office of Nuclear Energy's Light Water Reactor Sustainability (LWRS) program is aiding this effort by researching technologies to improve reactor economics and reliability, sustain safety, and extend operational lifetimes (Office of Nuclear Energy, 2022). The major technical areas facing the domestic nuclear fleet include plant modernization, flexible plant operation and generation (FPOG), risk-informed systems analysis (RISA), materials research, and physical security. NPP sites collect and store large volumes of data gathered from various equipment and systems. These datasets typically include plant process parameters, maintenance records, technical logs, online monitoring data, and equipment failure data.
The collection of such data affords an opportunity to leverage data-driven machine learning (ML) and artificial intelligence (AI) technologies to provide diagnostic and prognostic capabilities within the nuclear power industry. From a maintenance standpoint, this can be achieved by leveraging data-driven algorithms that better diagnose potential faults within the system (Lei et al., 2020). Improved model accuracy can help reduce unnecessary maintenance, thus lowering the costs associated with parts, labor, and costly planned, forced, or extended outages. From an operations perspective, cost savings can be generated by shifting from routine-based monitoring to online monitoring by taking advantage of advancements in sensors and wireless communication technologies. Advancements in data storage, mapping, management, and analytics would inform the transition from onsite to cloud-based computing and storage services. Online monitoring would reduce the number of operator man-hours required for taking routine measurements, while cloud computing services would generate cost savings by reducing the amount of hardware needing to be purchased and maintained, all while scaling to both computational and storage demands. These technologies reflect modernization and digitization of plants and the shift from costly, labor-intensive preventative maintenance to cost-effective predictive maintenance.
The end state vision for the domestic light water reactor fleet, as presented in this paper, suggests utilization of wireless sensors and wireless technologies to reduce routine-based monitoring, application of online monitoring to improve diagnostic and prognostic capabilities for asset health assessment, data visualization to better inform the human in the loop (HITL) decision-making, and cloud computing to reduce the onsite computational and storage burden. This vision can be compared with prognostic and health management (PHM) systems used in other industries (e.g., the aerospace industry).
Online health monitoring and prognostics may not be well received in the nuclear field due to regulatory requirements, but they have been used for decades in aviation (Ranasinghe et al., 2022) and other industries. Vehicle health monitoring was used in the 1970s by the National Aeronautics and Space Administration to monitor aircraft with remote data acquisition units (Woodard, 2003). These data were fed to a command and control unit capable of performing vehicle-level analysis using fuzzy logic. In the 1980s, data were used for decision making in regard to current and future conditions using integrated vehicle health management (Ranasinghe et al., 2022). Integrated vehicle health management integrated sensors with AI to diagnose problems and recommend solutions to better inform maintenance and repair activities (Benedettini, Baines, Lightfoot, & Greenough, 2009). Health management systems in aerospace have continued to advance, evolving into the current state-of-the-art integrated system health management (Ranasinghe et al., 2022). Integrated system health management is a distributed system for storage, processing, and AI, and is used to evaluate component conditions, detect anomalies, identify a system's current and future states, and then recommend responses to mitigate failures (Figueroa, Holland, Schmalzel, & Duncavage, 2006). These health management systems have been used in operation and can serve as a benchmark for nuclear. However, the nuclear industry faces several unique challenges when implementing a PHM system, including communications (e.g., wireless sensor monitoring), digitization of data infrastructure (i.e., digitization of written operator and maintenance logs), regulations, cybersecurity, and the need for HITL.
This paper focuses on addressing challenges associated with different aspects of enabling digital monitoring in NPPs (namely, the application of advanced sensor technologies, particularly wireless sensor technologies, and data-science-based analytic capabilities) in order to advance online monitoring, predictive maintenance capabilities, visualization strategies for enhanced decision making, and the utilization of cloud computing. These objectives were accomplished via a structured approach from data collection to decision making, as shown in Figure 1. Some of the solutions to specific challenges discussed in this paper include:
1. A general methodology for techno-economic analysis (TEA) of wireless sensor modalities for monitoring equipment status, especially in balance-of-plant systems within NPPs (Manjunatha & Agarwal, 2022).
2. Application of data-science-based techniques for decision making and discovery in order to develop and evaluate integrative algorithms for making diagnostic/prognostic estimates of equipment condition, using structured and unstructured heterogeneous data distributed across space and time (i.e., analytics at scale), including new data collected from NPP wireless sensors.
3. Identification of tools and visualization schema to present the right information to the right person in the right format at the right time (Mortenson, Miyake, & Boring, 2021).
4. Validation of the developed approaches and algorithms by using independent data from an operating plant.
5. Evaluation of cloud-based resource requirements to enable cost savings related to data storage, system maintenance, and computation.
The research and development progress in addressing some of the above-mentioned challenges lays the foundation for enhancing plant performance (efficiency gains and economic competitiveness), as well as for potential implementation of standard online monitoring, data analytics, and enhanced decision making via cloud-based solutions, sparking a transition from preventative maintenance to predictive maintenance.
Figure 1. Steps for leveraging digital monitoring to enable cost-effective predictive maintenance for NPPs.
The rest of this paper is arranged as follows: Section 2 presents the TEA of wireless sensor modalities; Section 3 details online monitoring, including diagnostic and prognostic capabilities; Section 4 covers data visualization; Section 5 details the benefits of cloud computing; and Section 6 concludes by highlighting the paper's significance then discussing future work and challenges.

WIRELESS SENSOR TECHNO-ECONOMIC FRAMEWORK
Due to concerns about cybersecurity, power/battery sources, radiofrequency interference, and response times, the nuclear industry has yet to adopt wireless technology widely (Application of Wireless Technologies in Nuclear Power Plant Instrumentation and Control Systems, 2020). But to support online monitoring of plant components and assets, new wireless sensors and wireless communication technologies are required to replace old sensor technology and ensure a reliable flow of data. To enable the installation of wireless sensors on plant assets, a wireless architecture based on a distributed antenna system (DAS) to support multiple communication types was evaluated. Idaho National Laboratory studied a wireless network deployment strategy that would enable applications ranging from low- to high-power needs, low- to high-frequency ranges, and short- to long-range communications, as seen in Figure 2 (Manjunatha & Agarwal, 2022). The whole network topology is predominantly operated using a DAS long-term evolution system or wireless local area network system, since they can enable the following:
1. High bandwidth and data transmission rates, with low latency.
2. Prioritized data transmission, based on the required quality of experience/service.
3. Provision for most of the wireless technologies to have either a Wi-Fi or DAS system as their back-end network (e.g., a Bluetooth device can connect to Wi-Fi in the back end to upload its data to the internet).
4. Use of the system as a bridge between end devices (or other wireless technologies) and the internet or an outside network.
5. Easy network maintenance by assimilating all the networking technologies into a single network architecture.
A one-size-fits-all solution is not an option, considering the diverse range of wireless technologies, each with different quality of service, latency, and bandwidth requirements (Barker, 2017; Manjunatha & Agarwal, 2022). Therefore, a heterogeneous network may be highly desired. network capacity and coverage. Then, the operational expenditures (OPEX) and capital expenditures (CAPEX) are taken into consideration to ensure that the technologies are cost efficient. Economic performance indicators for this model include the total cost of ownership (TCO) and the net present value (NPV), the latter of which indicates the level of profitability of the deployed technologies. Network performance indicators include throughput and latency. Using this framework, different wireless network architectures can be compared to determine the best fit for a particular site. Christin, Mogre, and Hollick (2010) provided a detailed survey of quality of service and experience on wireless networks for industrial automation.
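The economic indicators named above can be illustrated with a minimal Python sketch. It assumes TCO is taken as up-front CAPEX plus undiscounted OPEX over the study horizon, and that NPV discounts a constant annual net benefit; both definitions, the function names, and all dollar figures are hypothetical placeholders rather than values from the cited framework.

```python
def total_cost_of_ownership(capex, annual_opex, years):
    """TCO as up-front CAPEX plus undiscounted OPEX over the horizon (assumed definition)."""
    return capex + annual_opex * years


def net_present_value(capex, annual_net_benefit, years, discount_rate):
    """NPV = -CAPEX + sum over t = 1..years of net benefit / (1 + r)^t."""
    discounted = sum(
        annual_net_benefit / (1.0 + discount_rate) ** t
        for t in range(1, years + 1)
    )
    return discounted - capex


# Hypothetical architectures: (CAPEX, annual OPEX, annual savings), in dollars.
candidates = {
    "wifi_only": (400_000, 60_000, 150_000),
    "das_lte": (900_000, 40_000, 260_000),
}


def compare(architectures, years=10, discount_rate=0.07):
    """Return the TCO and NPV of each candidate architecture."""
    results = {}
    for name, (capex, opex, savings) in architectures.items():
        results[name] = {
            "tco": total_cost_of_ownership(capex, opex, years),
            "npv": net_present_value(capex, savings - opex, years, discount_rate),
        }
    return results


if __name__ == "__main__":
    for name, indicators in compare(candidates).items():
        print(name, indicators)
```

A positive NPV suggests that the discounted savings outweigh the initial investment over the chosen horizon. In practice, these economic indicators would be weighed alongside the throughput and latency indicators.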
Additionally, wireless vibration sensors are utilized by plant sites to measure vibrations in rotating equipment (e.g., motors and pumps). These wireless vibration sensors could be uniaxial, biaxial, or triaxial. They can also feature different wireless connectivity characteristics, such as communicating information over a cellular gateway, enterprise Wi-Fi, or the 900 MHz band. Regarding the different wireless connectivity modes, cellular gateways and Wi-Fi show comparable performance, while the 900 MHz band offers a limited capability for transmitting a measured vibration spectrum or waveform. These vibration-monitoring instruments contain accelerometers that sense changes in the amplitude and frequency of dynamic forces that can impair rotating equipment. Identifying degradation at its onset by analyzing vibration measurements enables personnel to identify issues (e.g., imbalance, looseness, misalignment, or bearing wear) in assets prior to significant degradation and failure. This gives the plant more options and more time to respond, allowing for more effective resolutions.
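As an illustration of how a measured vibration waveform yields the amplitude and frequency information described above, the following sketch estimates the dominant spectral peak of an accelerometer signal with NumPy. The sampling rate, the synthetic signal, and the function name are assumptions made for this example; an actual sensor would supply its own waveform and sampling parameters.

```python
import numpy as np


def dominant_frequency(signal, sample_rate_hz):
    """Return (frequency in Hz, amplitude) of the largest non-DC spectral peak."""
    samples = np.asarray(signal, dtype=float)
    samples = samples - samples.mean()       # remove the DC offset
    window = np.hanning(samples.size)        # taper to limit spectral leakage
    spectrum = np.abs(np.fft.rfft(samples * window))
    freqs = np.fft.rfftfreq(samples.size, d=1.0 / sample_rate_hz)
    peak = int(np.argmax(spectrum[1:])) + 1  # skip the DC bin
    # Rescale the windowed magnitude back to a sinusoid amplitude.
    amplitude = 2.0 * spectrum[peak] / window.sum()
    return float(freqs[peak]), float(amplitude)


# Synthetic example (assumed values): a 30 Hz shaft-speed component,
# plus a smaller 120 Hz component, sampled at 1 kHz for 1 second.
sample_rate = 1000.0
t = np.arange(1000) / sample_rate
waveform = 1.0 * np.sin(2 * np.pi * 30 * t) + 0.3 * np.sin(2 * np.pi * 120 * t)

if __name__ == "__main__":
    print(dominant_frequency(waveform, sample_rate))
```

Tracking how such peaks grow or shift over time is one simple way in which a trend toward imbalance or bearing wear might be noticed. Production systems typically use richer spectral features.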

ONLINE MONITORING
Once the data have been collected, they must be analyzed to produce diagnostic and prognostic results. The data for this research centered on the condensate pumps (CPs) and condensate booster pumps (CBPs) within a boiling-water reactor's feedwater and condensate system (FCS). The FCS's primary purpose is to purify, preheat, and pump water from the main condenser back into the reactor vessel, making reliable operation of this system essential (NRC, 1998). The available sensor data were recorded for a 5-year period and include variables such as:
1. Generator gross load (MW)
2. Average feedwater flow rates (million gallons/second)
3. Temperatures from the feedwater pumps, CPs, CBPs, and associated motors (°C)
4. Pressures within the condenser, CPs, CBPs, and turbines (psig)
5. Current to the CP and CBP drive motors (amps).
These raw data signals were first preprocessed (i.e., via data cleaning, feature scaling, and pruning) (Li, Verhagen, & Curran, 2019), since raw data are subject to missing information, outliers, sensor and process noise, and different scales. This heterogeneous dataset, in combination with information from the computerized maintenance management system, was used to identify potential condition indicators for the CPs. The computerized maintenance management system contained maintenance logs that were considered the ground truth, as they recorded when plant components were repaired or replaced. In Figure 4, a temperature condition indicator based on the differences between the component's current temperature and the seasonal average was created to search for faults in the component's history. A fault (labeled in red) was found preceding the outage. This temperature condition indicator was combined with other process variables (e.g., gross load, flows, pressures, and currents) by using principal component analysis. With three principal components, 87% of the information (i.e., variation) within the dataset was captured. Support vector machines (SVMs) were used to search year by year for similar faults in the CP component's history. The accuracy of the fault labels ranged from 81.4% to 99.7%. This variability stems from incorrectly labeled faults, as well as from other faults that were not in the training data or to which the condition indicators were insensitive. One way to improve this accuracy is by expanding the training dataset.
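The temperature condition indicator described above can be sketched in a few lines of Python. This is only an illustrative reading of the idea: it assumes the seasonal average is approximated by a per-calendar-month mean and that a fault is flagged when the residual exceeds a multiple of the residual standard deviation. Neither choice, nor the function names and sample readings, comes from the study itself.

```python
from collections import defaultdict
from statistics import mean, pstdev


def seasonal_condition_indicator(readings):
    """readings: list of (month, temperature) pairs.

    Returns each temperature minus the average for its calendar month,
    a stand-in here for the seasonal average.
    """
    by_month = defaultdict(list)
    for month, temperature in readings:
        by_month[month].append(temperature)
    baseline = {month: mean(values) for month, values in by_month.items()}
    return [temperature - baseline[month] for month, temperature in readings]


def flag_faults(residuals, n_sigma=3.0):
    """Flag residuals larger than n_sigma standard deviations (assumed rule)."""
    sigma = pstdev(residuals)
    if sigma == 0:
        return [False] * len(residuals)
    return [abs(r) > n_sigma * sigma for r in residuals]


# Hypothetical readings: a stable winter month with one hot excursion,
# followed by a stable, warmer summer month.
readings = [(1, 40.0)] * 20 + [(1, 60.0)] + [(7, 55.0)] * 20

if __name__ == "__main__":
    indicator = seasonal_condition_indicator(readings)
    print(flag_faults(indicator))
```

Because the excursion itself is included when the monthly average is computed, the baseline is slightly biased. A more careful version would estimate the baseline from known healthy periods only.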
Another element of online monitoring is the prediction of future measurements and conditions pertaining to the FCS. For short-term forecasting, essential steps include problem formulation, data preprocessing, feature selection, model selection, model development, and model evaluation. The problem formulation included selecting the response variables and determining the prediction horizon (i.e., number of time steps ahead to predict). The feature selection included variance inflation factors (VIF) and Shapley additive explanations (also known as SHAP values). Variance inflation factors can determine which variables exhibit high multicollinearity (i.e., are highly correlated with one another). Multicollinearity can lead to variability in regression analyses, as the strong relationship between the independent variables distorts the relationship with the dependent variable (Daoud, 2018). Removing the highly correlated variables can eliminate this effect. SHAP values are based on a game-theoretic concept that considers each input feature as a "player" on a "team" of features that work together to influence the model's overall output (Booth, Abels, & McCaffrey, 2021). By determining the magnitude of the individual feature's influence on the model's output, SHAP values can be used as a feature selection tool by only selecting those features that significantly affect the model's outcome.
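To make the VIF screening step concrete, the sketch below computes the standard quantity VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing feature j on all remaining features. The synthetic features and the function name are assumptions for illustration. Any cut-off used to drop features (values such as 5 or 10 are common rules of thumb) is likewise an assumption, as the text does not state one.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows are samples, columns are features)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = []
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_samples), others])  # add an intercept
        coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coeffs
        ss_res = float(residual @ residual)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(float("inf") if r_squared >= 1.0 else 1.0 / (1.0 - r_squared))
    return vifs


# Synthetic example: feature c is nearly a copy of feature a, b is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)
features = np.column_stack([a, b, c])

if __name__ == "__main__":
    print(variance_inflation_factors(features))
```

In this example, the two nearly redundant features receive large VIFs while the independent one stays close to 1, so one of the redundant pair would be a candidate for removal.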
Three different ML methods (i.e., long short-term memory [LSTM] networks, support vector regression [SVR], and random forest [RF]) were employed to estimate two different forecast horizons (i.e., 1 hour and 1 day) for a pump temperature within the FCS. Figure 5 shows the 1-day-ahead pump temperature predictions using SHAP-determined inputs for LSTM, SVR, and RF models. The temperatures were anonymized to protect the NPP's identity. LSTM and SVR most closely adhered to the actual values being predicted, while RF showed significant deviations from time steps 700 to 1,000.
By accurately predicting future plant conditions and degradation, the appropriate maintenance actions can be scheduled so as to avoid unplanned downtimes and asset failures. More details about this short-term forecasting process can be found in .
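The notion of a prediction horizon used in the forecasting discussion above can be sketched as follows. The code pairs each window of past values with the value a fixed number of steps ahead, and it also evaluates a naive persistence baseline against which models such as LSTM, SVR, or RF could be compared. The function names, the number of lagged inputs, and the baseline are illustrative assumptions. If the data were sampled hourly (the sampling interval is not stated here), horizons of 1 and 24 steps would correspond to 1 hour and 1 day.

```python
def make_supervised(series, n_lags, horizon):
    """Pair each window of n_lags values with the value `horizon` steps
    after the last point of that window."""
    inputs, targets = [], []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        window_end = start + n_lags  # exclusive index
        inputs.append(list(series[start:window_end]))
        targets.append(series[window_end - 1 + horizon])
    return inputs, targets


def persistence_mae(series, n_lags, horizon):
    """Mean absolute error of predicting that the last observed value persists."""
    inputs, targets = make_supervised(series, n_lags, horizon)
    errors = [abs(window[-1] - target) for window, target in zip(inputs, targets)]
    return sum(errors) / len(errors)


# Toy series standing in for an anonymized pump temperature signal.
toy_series = list(range(10))

if __name__ == "__main__":
    X, y = make_supervised(toy_series, n_lags=3, horizon=2)
    print(X[0], y[0])
    print(persistence_mae(toy_series, n_lags=3, horizon=2))
```

A learned model is only worth deploying if it outperforms a simple baseline of this kind at the horizon of interest.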

VISUALIZATION
Following data collection and model building, one of the biggest challenges centers on HITL decision-making and model output visualization. Though ML may automate much of the tedious manual taskwork involved in checking parameters for changes, digital monitoring must ultimately convey information to human users or operators so as to enable them to make timely decisions. Mortenson et al. (2021) reviewed the need for data visualization at NPPs, described the underpinnings of the human visual system (which enables humans to decipher such visualizations), offered system design recommendations unique to visualizations in support of digital monitoring, and described the current visualization platforms available.
Figure 5. Predictions of pump temperature 1 day ahead, using the SHAP-determined input to LSTM, SVR, and RF models.
Visual considerations should include colors and salience, information density, Gestalt principles, and guided attention. Color can provide useful information about the details of an object (e.g., the ripeness of a fruit), but can also draw the viewer's attention to important details (e.g., a red stop sign). Information density pertains to issues concerning the spacing, clutter, or amount of information being visualized (Van den Berg, Cornelissen, & Roerdink, 2009). Crowding of information makes it harder to distinguish between objects or features in close proximity to one another. Gestalt principles are a series of common pattern perception techniques (e.g., similarity, symmetry, proximity, and continuation) useful for understanding how humans perceive visualizations. Guided attention investigates why humans can only focus on a few objects at a time within a given environment and how to effectively draw their focus to where it needs to be. Wolfe (2021) provided a detailed guide explaining visual processing and how understanding it can lead to effective visualizations and visual displays.
With regard to nuclear applications of data visualization, added emphasis should be placed on usability, saliency-at-a-glance, and error prevention (Mortenson et al., 2021). The following is a non-exhaustive list of recommendations that could aid in these goals:
1. Consequential information should clearly indicate if there is a significant deviation from an expected pattern.
2. Interdependent information should be grouped meaningfully (e.g., related signals may appear in the same visual box).
3. Dashboards displaying high-level information for monitoring plant conditions should also allow the ability to delve into additional lower-level information to provide context or specific historical data.
4. There should be a clear distinction between measured and predicted information, since all predictive information entails some level of uncertainty.
ML and AI algorithms can serve as powerful tools for predicting and diagnosing problems within an NPP environment, but unless this information is provided in a meaningful, actionable way to the HITL, the benefits of these technologies will not be fully realized.

CLOUD COMPUTING SERVICES
Through additional computing power and storage, cloud-based services afford many new opportunities for transitioning to an offsite centralized maintenance and diagnostics (M&D) center, but this also produces new challenges in terms of networking and security. Many of the features offered by cloud-based services (e.g., Microsoft Azure, Amazon Web Services, and Google Cloud) are similar to one another, and comparisons of these different service providers were provided by Wankhede, Talati, and Chinchamalatpure (2020). The services most relevant to NPPs (i.e., those pertaining to networking and security, storage and databases, and AI) were reviewed.
"Networking" covers all communications-related aspects occurring between onsite and cloud-based resources, including security, privacy, and redundancy measures. Connecting to online resources entails inherent security risks, as all the data must cross the internet. However, these data can be encrypted in a private tunnel as they are sent out. Encryption helps block attacks and prevent eavesdropping while the data travel over untrusted networks. Many cloud-based services also have distributed denial-of-service protection. "Storage and databases" includes data storage, upgrading, patching, backups, and monitoring. Storing large amounts of data, as is required for NPPs, can necessitate the purchasing and maintaining of expensive equipment. This burden of purchasing and maintaining storage and databases can be offloaded onto the cloud-based service. These databases also provide open-source frameworks for big data clustering and autoscaling. By properly managing and clustering the data, useful insights can be obtained while simultaneously reducing overall computational costs. Cloud-based services also provide off-the-shelf AI tools for automated ML, anomaly detection, computer vision, and natural language processing. When combined with their storage and databases, these AI services can provide an end-to-end ML lifecycle, from data labeling and preprocessing to building and training models to validation and deployment (Microsoft, 2022).
A more detailed description of the features, capabilities, and challenges of cloud-based services can be found in .

DISCUSSIONS AND CHALLENGES
Discussed here is a complete end-to-end vision for conducting online monitoring as data are collected wirelessly, stored, preprocessed, modeled, and visualized. Cloud computing offers potential cost-savings opportunities, as the responsibility of purchasing, updating, and maintaining IT-related equipment for storage and computing is offloaded to the cloud provider. Additional savings can be generated by having a single M&D center provide analysis for multiple plant sites, as the expenses of such analysis could then be shared among the participating sites. However, because these cost-savings opportunities must not come at the expense of availability or cybersecurity, added emphasis is placed on these features. Table 1 shows how each topic discussed in this paper contributes to the LWRS pathways. Many of these topics have cross-cutting research and contribute to multiple areas. All the topics contribute to plant modernization, as they utilize emerging technologies, digitization, and AI. FPOG activities are improved by adding infrastructure to enhance situational awareness as legacy plants may enter new operating states during load following, or by utilizing excess heat for secondary processes (e.g., high-temperature steam electrolysis) (Talbot et al., 2020). Online monitoring can improve RISA by minimizing measurement uncertainties, leading to enhanced safety and economic efficiencies (Office of Nuclear Energy, 2022).
The end state vision detailed in this paper for the digitization and online monitoring of LWR systems, with support from a centralized M&D center through cloud computing, can be compared to PHM frameworks proposed in aviation. Yang, Wang, and Zhang (2016) presented the idea of a PHM big data center utilizing cloud computing to mine flight data for information on experiences and failures, in combination with other sources of information (e.g., airports, repair factories, and spare parts warehouses). Ultimately, this system would provide support for decisions made by industry, government, and aviation participants. In similar fashion, the nuclear industry could adopt a centralized big data approach to aggregate and mine data for new solutions and to support decisions made by those with a vested interest.
However, nuclear still faces several unique challenges when implementing an online-based PHM system. These challenges include communications, digitization of data infrastructure, cybersecurity, and the necessary HITL. Communications includes all sensors and systems that are currently wired, as well as new wireless sensors to be installed. The result is a mixed system that must be resolved before uploading data to the cloud system. Legacy LWRs currently contain mostly wired sensors (e.g., resistance temperature detectors and pressure detectors that penetrate the containment building) (Westinghouse, 2005), while newer sensors such as vibration, acoustic, and valve position monitors are primarily wireless.
Digitization of data infrastructure involves the incorporation of operator notes and maintenance logs that may be handwritten or preserved in a non-standardized format. Digitization of these notes may be associated with additional hardware requirements (Hunton & England, 2021). Natural language processing may also be necessary to extract salient information from the digitized operator notes and maintenance logs in order to incorporate it into ML and AI models.
With cloud computing, there is an added cybersecurity concern, but this research falls under one of the Department of Energy's Office of Nuclear Energy research areas rather than an LWRS research initiative (Sandia National Laboratories, National Nuclear Security Administration, & Idaho National Laboratory, 2021). These cybersecurity concerns include security while uploading and downloading information to the cloud via the internet or other means, and also privacy, as this information is both sensitive and proprietary.
The current legacy LWRs must also inform the HITL, who ultimately takes the course of action. This may be mitigated in advanced reactors, as there is a push towards automation, but adequate and efficient informing of reactor operators will always be a concern in the current domestic fleet. ML and AI solutions must be able to accurately recommend courses of action and be transparent enough to explain how the ML model arrived at its decision when interrogated. Explainability of AI is crucial for widespread industry adoption of this technology.

Dr. Agarwal was awarded the 2019 Presidential Early Career Award for Scientists and Engineers, as well as the 2016 Laboratory Director Early Career Achievement Award. He also received the American Nuclear Society Human Factors, Instrumentation, and Control Division's Ted Quinn Early Career Award in 2019. He has authored 80 peer-reviewed publications. He has one U.S. patent and has co-authored a book chapter. He is a member of the American Nuclear Society.

ACKNOWLEDGMENT
Nancy Lybeck, Ph.D. is the department manager for the Instrumentation, Controls, Data Science department at Idaho National Laboratory. She graduated from Montana State University with a Ph.D. in mathematics, followed by a postdoctoral fellowship in industrial mathematics at the Center for Research in Scientific Computation at North Carolina State University. Prior to joining INL in 2010, she worked for Sentient Corporation, where she served as a principal investigator for small business innovation research projects focusing on prognostic health management. Nancy continued her work in prognostic health management at INL, where she also provides modeling and analysis support for a variety of projects.