Optimal Maintenance Policy for Corroded Oil and Gas Pipelines using Markov Decision Processes

This paper presents a novel approach to determine optimal maintenance policies for oil and gas pipelines degraded by internal pitting corrosion. The approach builds a bridge between Markov process-based corrosion rate models and Markov decision processes (MDP), which makes it possible to consider both short-term and long-term costs of pipeline maintenance operations. To implement the MDP, probability transition matrices are estimated for moving from one degradation state to the next in the pipeline degradation Markov process. A case study is implemented with four pipeline failure modes (i.e., safe, small leak, large leak, and rupture) and four maintenance actions (i.e., do nothing, adding corrosion inhibitors, pigging, and replacement), assuming perfect pipeline inspections. Monte Carlo simulation is performed on 10,000 initial pits using the selected corrosion models and assumed maintenance and failure costs to determine an optimal maintenance policy.


INTRODUCTION
Corrosion is the primary failure mechanism of oil and gas pipelines, and among the different corrosion mechanisms, pitting corrosion is the most common (Heidary & Groth, 2021). Therefore, finding an optimal maintenance policy for oil and gas pipelines undergoing pitting corrosion is an essential aspect of their integrity management (Kishawy & Gabbar, 2010), both to minimize the cost of unnecessary maintenance and unpredicted failures and to maximize the reliability of the pipelines.
An optimal policy should consider both the myopic and the long-term consequences, and MDP is a powerful modeling technique that considers both short-term and long-term risks. However, MDP has rarely been used to find optimal maintenance policies for systems that involve continuous degradation, for two main reasons: (1) a system's degradation state in a Markov process is not continuous but discrete, and (2) assigning the transition probabilities between states is usually subjective and not a trivial task (Sánchez-Silva, Frangopol, Padgett, & Soliman, 2016). The main contribution of this paper is directed at the second challenge. In order to conduct an MDP analysis, we propose a procedure to define the transition probabilities and calculate the probability transition matrix. This enables transitions between the states of a Markov process for pipeline degradation due to internal pitting corrosion.
Roohollah Heidary et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
In contrast to traditional reliability methods that rely on population data, prognostics and health management (PHM) approaches use data from the specific system or component to predict the remaining useful life (RUL) (Tsui, Chen, Zhou, Hai, & Wang, 2015). PHM methodologies are categorized into physics-of-failure (PoF) and data-driven approaches (Imanian & . The predictions of PoF approaches are more reliable when the models are calibrated with reliable data. However, because these models rest on approximations and simplifying assumptions, it is difficult to estimate the model parameters and validate the results when the degradation process is complex (e.g., pitting corrosion) (An, Kim, & Choi, 2015). Therefore, it is more practical to use data-driven approaches for integrity management of pipelines undergoing pitting corrosion (Shibata, 1996; Valor, Caleyo, Alfonso, Rivas, & Hallen, 2007). The pros and cons of different data-driven models for pipeline corrosion are discussed elsewhere (Heidary, Gabriel, Modarres, Groth, & Vahdati, 2018), where the models are ranked by their "appropriateness" and practicality. According to that ranking, Gamma process-based models (e.g., (Zhang & Zhou, 2014), (Heidary & Groth, 2020)) and Markov process-based models (e.g., (Valor et al., 2007; Caleyo, Velázquez, Hallen, Valor, & Esquivel-Amezcua, 2010)) are considered the best models for pipeline degradation due to pitting corrosion. An important advantage of Markov process-based degradation models is that, in conjunction with Markov decision processes, they can be used for maintenance optimization to quantify both the short-term and long-term risks and costs of a maintenance policy.
In this paper, a Markov process-based pitting corrosion model, which is a requirement for using MDP, is utilized to find an optimal maintenance policy. Since corrosion is a continuous degradation process, it is more appropriate to model the corrosion process with density transition rates rather than transition probabilities. However, extracting the probability transition matrix from density transition rates is challenging. In addition to assigning a proper probability transition matrix between states, another challenge lies in defining an appropriate reward or cost value for each state-action pair. A practical framework is proposed in this work to address these two issues. Finally, an optimal maintenance policy is estimated for a case study by using this framework.

PITTING CORROSION MODELING BY MARKOV PROCESS
The stochastic process $\{X(t), t \ge 0\}$ is a continuous-time Markov chain (Markov process) if, for all $s, t \ge 0$ and nonnegative integers $i$, $j$, $x(u)$, $0 \le u < s$:
$$P\{X(t+s) = j \mid X(s) = i,\ X(u) = x(u),\ 0 \le u < s\} = P\{X(t+s) = j \mid X(s) = i\} \tag{1}$$
where $X(t)$ represents the condition (state) of the system at time $t$.
Equation (1) is the Markovian property. A continuous-time stochastic process is a Markov process if it satisfies the Markovian property. Specifically, given that the system is in state $i$ at time $s$ ($X(s) = i$), the future states ($X(t+s)$) do not depend on the previous states ($X(u) = x(u)$, $0 \le u < s$). In addition, if $P\{X(t+s) = j \mid X(s) = i\}$ is independent of $s$, the Markov process is said to have homogeneous or stationary transition probabilities (Ross et al., 1996).
In Markov processes, the transition intensity $\lambda_{ij}$ between states $i$ and $j$ is defined such that the probability of a transition from state $i$ to state $j$ in an infinitesimal time interval $\Delta t$ is $\lambda_{ij} \Delta t$, and the probability of more than one transition in this time interval is negligible.
Provan and Rodriguez (Provan & Rodriguez III, 1989) were the first to model pitting corrosion with a non-homogeneous Markov process, and many researchers have used Markov processes to model pitting corrosion degradation since then. Some of those works are reviewed in (Valor et al., 2007; Caleyo et al., 2010), and the pros and cons of each are discussed elsewhere. In those works, the pipeline wall thickness is divided into equally spaced states, and by defining the last state(s) as the failure state(s), the reliability or availability of the pipeline is calculated at each time.
The methodology proposed in (Timashev, Malyukova, Poluian, & Bushinskaya, 2008) is used in this research to extract density transition rates between states of a pipeline segment. Timashev et al. (2008) assumed that pitting corrosion follows a homogeneous pure birth Markov process. This means that a transition is possible only from state $i$ to state $i+1$, and that the transition rates between states are time-independent. The differential equations that describe the pure birth Markov process have the form of Kolmogorov's forward equations (Eq. (2)). Given the probability of being in each state at time $t$, $P_i(t)$, the homogeneous density transition rates between states can be calculated by solving these differential equations sequentially.
$$\frac{dP_1(t)}{dt} = -\lambda_1 P_1(t), \qquad \frac{dP_i(t)}{dt} = \lambda_{i-1} P_{i-1}(t) - \lambda_i P_i(t), \quad i = 2, 3, \dots \tag{2}$$
$$P_i(t) = \frac{n_i(t)}{N} \tag{3}$$
Here $n_i(t)$ is the number of pits whose maximum depth is in the $i$-th state at time $t$, and $N$ is the total number of pits. By solving Eqs. (2) and (3), the transition intensities $\lambda_i$ can be calculated.
When in-line inspection (ILI) data is available, the number of pits in each state at time $t$ can be counted and $P_i(t)$ can be estimated by using Eq. (3). When inspection data for a specific pipeline is unavailable (non-piggable pipelines, or piggable pipelines without inspection data), a generic corrosion growth model for pipelines with similar operational conditions and material properties is needed. In the latter case, $N$ initial pits are propagated through a corrosion growth model (e.g., Eq. (15)) with Monte Carlo simulation. The number of pits in each state at time $t$ is counted and, by solving Eqs. (2) and (3), the transition rates between states can be estimated.
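The sequential solution described above can be sketched in a few lines. This is a minimal illustration rather than the procedure of Timashev et al. (2008): it assumes that all pits start in the first state at time zero, integrates the pure birth forward equations with a simple Euler scheme, and fits one rate at a time by bisection. The function names `forward_probs` and `fit_rates` and the pit counts are hypothetical.

```python
def forward_probs(rates, T, steps=2000):
    """Euler integration of the pure birth forward equations from P(0) = (1, 0, ..., 0).

    rates[i] is the exit rate of transient state i; returns P_i(T) for those states.
    """
    n = len(rates)
    p = [1.0] + [0.0] * (n - 1)
    dt = T / steps
    for _ in range(steps):
        nxt = []
        for i in range(n):
            inflow = rates[i - 1] * p[i - 1] if i > 0 else 0.0
            nxt.append(p[i] + dt * (inflow - rates[i] * p[i]))
        p = nxt
    return p


def fit_rates(target_probs, T, upper=50.0):
    """Fit the rates one state at a time.

    P_i(T) depends only on rates 1..i and decreases as rate i grows,
    so each rate can be found by bisection once the earlier ones are fixed.
    """
    rates = []
    for i, target in enumerate(target_probs):
        lo, hi = 0.0, upper
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if forward_probs(rates + [mid], T)[i] > target:
                lo = mid  # too much probability left in state i: raise the rate
            else:
                hi = mid
        rates.append(0.5 * (lo + hi))
    return rates


# Hypothetical counts n_i(T) of pits per state at T = 5 years (last state absorbing).
counts = [1500, 3000, 3500, 2000]
N = sum(counts)
probs = [c / N for c in counts]        # P_i(T) = n_i(T) / N
rates = fit_rates(probs[:-1], T=5.0)   # one rate per transient state
```

The last state needs no rate because it is absorbing; its probability is whatever mass has left the transient states.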

MARKOV DECISION PROCESS
A Markov Decision Process is a 4-tuple (S, A, P, R), where S is the finite set of states, A is the finite set of actions, P is the probability transition matrix, and R is the reward received or cost incurred for each state-action pair.
In the current context, at time step $t$, the maximum depth of a pit is assumed to be in state $s \in S$, and after taking maintenance action $a \in A$, the system moves to a new state $s'$ with transition probability $P(s' \mid s, a)$ and incurred cost $R(s, a)$. MDP is used here to find an optimal policy that trades off the myopic and long-term costs of the maintenance actions discussed later. It is also assumed that inspection is perfect (i.e., the system's state is determined with certainty at each inspection time). One way to compare policies is to compare their value functions (long-term reward or cost functions). Eq. (4) shows the value function for MDPs.
$$V^{\pi}(s) = E\left[\sum_{t=0}^{\infty} \gamma^{t} R\big(s_t, \pi(s_t)\big) \,\Big|\, s_0 = s\right] \tag{4}$$
where $\pi$ is the policy, which maps each state to an action, and $\gamma$ is the discount factor.
This equation defines an infinite-horizon problem, which is appropriate for pipeline maintenance because pipelines are usually designed for an infinite time horizon, and the optimal policy should be time-independent. A variety of methods (e.g., value iteration, policy iteration, and linear programming) have been developed to solve MDPs (Billinton & Allan, 1992).
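As an illustration of one of these solution methods, the following is a minimal value iteration sketch in cost-minimizing form. The two-state example, its transition probabilities, and its costs are hypothetical and are not taken from the case study; the function name `value_iteration` and the data layout (`P[a][s][s2]`, `R[s][a]`) are likewise assumptions made for the example.

```python
def value_iteration(P, R, gamma=0.95, tol=1e-10):
    """Minimise the expected discounted cost of an MDP.

    P[a][s][s2] is the probability of moving from s to s2 under action a,
    R[s][a] is the cost incurred for taking action a in state s.
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    while True:
        Q = [[R[s][a] + gamma * sum(P[a][s][s2] * V[s2] for s2 in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [min(q) for q in Q]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            policy = [q.index(min(q)) for q in Q]
            return V_new, policy
        V = V_new


# Hypothetical two-state example: state 0 = intact, state 1 = failed.
P = [
    [[0.9, 0.1], [0.0, 1.0]],   # action 0: do nothing
    [[0.9, 0.1], [0.9, 0.1]],   # action 1: replace (behaves like a new pipe)
]
R = [[0.0, 5.0],     # costs in state 0 for actions 0 and 1
     [10.0, 5.0]]    # costs in state 1 for actions 0 and 1
V, policy = value_iteration(P, R, gamma=0.9)
```

In this toy problem the resulting policy is to do nothing while the pipe is intact and to replace it once it has failed, with long-term costs of about 4.5 and 9.5 for the two states.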
In the following two subsections, assigning a proper probability transition matrix and reward function are discussed.

Probability Transition Matrix
To find the probability of being in each state at time $t$, the matrix multiplication method that is commonly used for discrete-time Markov chains can also be used for continuous-time Markov processes.
For discrete-time Markov chains, the probability of being in each state at the $n$-th time step, $P(n)$, can be obtained by the matrix multiplication method (Eq. (5)):
$$P(n) = P(0) \cdot PTM^{n} \tag{5}$$
In this equation, $P(0)$ is the initial probability vector, which gives the probability of being in each state at the start of the process, $PTM$ is the probability transition matrix for one time step, and $n$ is the number of time steps.
For continuous-time Markov processes, instead of one time step, a probability transition matrix can be defined for a small enough interval $\Delta t$ in which the probability of more than one transition is negligible. Proper selection of this interval plays a critical role and requires comprehensive knowledge of the system behavior. When this knowledge is not available, an approximation can be obtained by selecting an initial value for $\Delta t$ and then decreasing it until the difference between the results for two consecutive values of $\Delta t$ is within an acceptable tolerance (Billinton & Allan, 1992). Since the time that a Markov process spends in a state before making a transition into a different state is exponentially distributed (Ross et al., 1996), the number of transitions between two states in this interval follows a Poisson distribution. Therefore, the initial value for $\Delta t$ can be estimated by using Eq. (6).
$$P(N(\Delta t) = n) = \frac{(\lambda_m \Delta t)^{n} e^{-\lambda_m \Delta t}}{n!} \tag{6}$$
where $P(N(\Delta t) = n)$ represents the probability of $n$ transitions occurring in a finite time interval of length $\Delta t$. By solving Eq. (7), the initial value for $\Delta t$ can be estimated. Based on this equation, the probability of more than one transition in the time interval $\Delta t$ is equal to one minus the probability of zero or one transition in this interval.
$$P(N(\Delta t) > 1) = 1 - e^{-\lambda_m \Delta t}\,(1 + \lambda_m \Delta t) < P_{negl} \tag{7}$$
where $P_{negl}$ is a subjective threshold: if $P(N(\Delta t) > 1) < P_{negl}$, then $P(N(\Delta t) > 1)$ can be assumed to be negligible. In this equation, $\lambda_m$ represents the maximum transition rate between states. The maximum transition rate is selected because it gives the smallest interval, which is therefore also applicable to the transitions between the other states. The probability transition matrix (PTM) for this time interval, $PTM_{Int}$, is calculated using Eq. (8).
$$[PTM_{Int}]_{ij} = \begin{cases} 1 - \lambda_i \Delta t, & j = i \\ \lambda_i \Delta t, & j = i + 1 \\ 0, & \text{otherwise} \end{cases} \tag{8}$$
This equation indicates that for the pure birth Markov process, the probability of a transition from state $i$ to state $i+1$ is $\lambda_i \Delta t$, the probability of staying in state $i$ is $1 - \lambda_i \Delta t$, and all other probabilities are zero.
After finding the probability transition matrix for the time interval $\Delta t$, this matrix can be multiplied by itself $n$ times (Billinton & Allan, 1992). $PTM_{Int}$ is raised to the $n$-th power, as given in Eq. (9), to find the probability transition matrix for each inspection interval.
$$PTM = PTM_{Int}^{\,n}, \qquad n = \frac{\text{decision interval}}{\Delta t} \tag{9}$$
This procedure is used in this work to build a bridge between the Markov process-based pitting corrosion models and the MDP to find an optimal maintenance policy for pipelines degraded by pitting corrosion.
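The procedure of this subsection can be sketched as follows, under the stated Poisson criterion. The sketch only computes the initial value of the interval and does not perform the subsequent refinement by successively decreasing it. The rates are hypothetical, the last state is assumed to be absorbing, and the function names are illustrative; the interval is also shrunk slightly so that an integer number of steps fits the decision interval, which is a choice made here rather than one stated in the paper.

```python
import math


def initial_dt(max_rate, p_negl=1e-4):
    """Largest dt with P(N(dt) > 1) = 1 - exp(-x) * (1 + x) <= p_negl, x = max_rate * dt."""
    lo, hi = 0.0, 10.0
    for _ in range(100):               # bisection on x; the left side increases with x
        x = 0.5 * (lo + hi)
        if 1.0 - math.exp(-x) * (1.0 + x) > p_negl:
            hi = x
        else:
            lo = x
    return lo / max_rate


def interval_ptm(rates, dt):
    """One-interval PTM of a pure birth process; the last state is absorbing."""
    n = len(rates) + 1
    M = [[0.0] * n for _ in range(n)]
    for i, lam in enumerate(rates):
        M[i][i] = 1.0 - lam * dt
        M[i][i + 1] = lam * dt
    M[n - 1][n - 1] = 1.0
    return M


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def decision_ptm(rates, decision_interval=1.0, p_negl=1e-4):
    """Raise the one-interval PTM to the power n = decision_interval / dt."""
    n_steps = math.ceil(decision_interval / initial_dt(max(rates), p_negl))
    dt = decision_interval / n_steps   # shrink dt so that n_steps * dt is exact
    step = interval_ptm(rates, dt)
    ptm = step
    for _ in range(n_steps - 1):
        ptm = matmul(ptm, step)
    return ptm


rates = [0.4, 0.5, 0.3]                # hypothetical per-year transition rates
ptm = decision_ptm(rates)              # PTM for a one-year decision interval
```

Each row of the result sums to one, and the probability of remaining in the first state over one year is close to the exponential survival probability for the first rate, which gives a quick sanity check on the chosen interval.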

Risk-based Decision Making by Using MDP
Another important factor in using MDP is to define a proper cost matrix (R(s,a)), which indicates the incurred cost given a state-action pair. Defining this matrix without considering different cost aspects of the system performance makes the MDP analysis pointless or even misleading. This work uses a risk-based cost estimation framework to investigate the incurred cost of different failure modes of oil and gas pipelines under pitting corrosion.
According to (Valor, Caleyo, Alfonso, Vidal, & Hallen, 2014), there are four potential failure modes for pipelines due to pitting corrosion: safe, small leak, large leak, and rupture. Table 1 shows different conditions that lead to each of these failure modes.
In this table, $d$ represents pit depth, $thk$ is pipe wall thickness, $P_{op}$ is operating pressure, $P_{ft}$ is failure pressure for a defect of limiting depth d = 0.0009t, $P_f$ is failure pressure for a defect of depth $d$, and $P_R$ is rupture pressure for a defect of depth $d$. Using $0.8\,thk$ instead of $thk$ follows typical industry practice (Zhou, 2010).
Different failure pressure prediction models like B31G, B31G modified, RSTRENG and PCORRC are available. The PCORRC is used in this paper as an accurate model that needs minimum defect geometry information (Zhou, 2010) and is given by Eq. (10).
Here $\sigma_u$ represents the ultimate tensile strength of the pipe material, $D$ is the pipe diameter, $l$ is the pit length in the longitudinal direction, and the model error term, with mean 0.97 and standard deviation 0.105, is the one added to this model in (Leis, Stephens, et al., 1997). The rupture pressure model proposed in (Kiefner, Maxey, Eiber, & Duffy, 1973) is used in the current paper and is shown in Eq. (11).
where Q =
Monte Carlo simulation is used to estimate the probability of occurrence of each failure mode, given the state of the Markov model (maximum depth of the pit). With this simulation, the variability of different parameters (e.g., ultimate tensile strength, pipe diameter, and pipe thickness) is taken into account in the failure and burst pressure calculations.
The next step is to find the incurred cost value for each state-action pair, $R(s, a)$. For this purpose, the maintenance actions applicable to a specific pipeline system must be defined based on the available equipment and knowledge.
The incurred cost for each state-action pair can be estimated by Eq. (12).
$$R(s, a) = I + M_{s,a} + C_s \tag{12}$$
where $I$ represents the inspection cost, which is assumed to be independent of the state of the system; $M_{s,a}$ represents the maintenance cost (e.g., maintenance equipment, shutdown cost during the maintenance operation), which depends on the state and the action; and $C_s$ represents the risk-based failure cost given the state of the system. By definition, the total expected risk-based failure cost can be estimated by Eq. (13) (Modarres, Kaminskiy, & Krivtsov, 2016).
$$C_s = \sum_{f=1}^{F} P(\text{Mode}_f \mid s)\, C_f \tag{13}$$
where $F$ is the number of failure modes (in this case four, i.e., safe, small leak, large leak, and rupture), $P(\text{Mode}_f \mid s)$ is the probability of occurrence of the $f$-th failure mode given the state of the system, and $C_f$ is the failure cost given the failure mode (e.g., loss of production cost, loss of life or property cost, environmental cost). The flowchart of the proposed approach is shown in Figure 1, and its application is illustrated in the following case study.
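The cost structure described above, an inspection cost plus a maintenance cost plus the expected failure cost over the failure modes, can be assembled as in the sketch below. All numbers are hypothetical placeholders, not the values of Table 6 or Table 7, and the function names are illustrative.

```python
def risk_cost(mode_probs, mode_costs):
    """Expected failure cost of one state: sum over modes of P(mode | s) * C_f."""
    return sum(p * c for p, c in zip(mode_probs, mode_costs))


def cost_matrix(inspection, maintenance, mode_probs_by_state, mode_costs):
    """R[s][a] = inspection cost + maintenance cost M[s][a] + risk-based cost C_s."""
    return [[inspection + m_sa + risk_cost(probs, mode_costs) for m_sa in maint_row]
            for maint_row, probs in zip(maintenance, mode_probs_by_state)]


# All numbers below are hypothetical placeholders, not values from the paper.
mode_costs = [0.0, 100.0, 500.0, 2000.0]         # safe, small leak, large leak, rupture
mode_probs_by_state = [
    [1.0, 0.0, 0.0, 0.0],                        # shallow state: always safe
    [0.7, 0.2, 0.08, 0.02],                      # deeper state: some failure risk
]
maintenance = [
    [0.0, 2.0, 5.0, 50.0],                       # do nothing, inhibitor, pigging, replacement
    [0.0, 2.0, 5.0, 50.0],
]
R = cost_matrix(1.0, maintenance, mode_probs_by_state, mode_costs)
```

With these placeholder values the deeper state carries an expected failure cost of 100, so doing nothing there costs 101 once the inspection cost is included.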

CASE STUDY
This section implements the proposed framework in a case study to find an optimal maintenance policy for a pipeline under internal pitting corrosion. The characteristics of this pipeline are given in Table 2.
Pit depth growth is described by the power-law model in Eq. (14).
$$D_{max}(t) = k\,(t - t_0)^{\alpha} \tag{14}$$
where $D_{max}(t)$ is the maximum depth of the pit, $k$ is the proportionality factor, $t_0$ is the pitting initiation time, and $\alpha$ is the exponent. $k$ and $\alpha$ are functions of operational parameters (e.g., pH, temperature). This model was used in (Ossai et al., 2015) to find the relationship between operational conditions and the depth of internal pits. The model for pit depth growth extracted in (Ossai et al., 2015) is used as the input for this case study and is given in Eq. (15). Here $\beta_0$ represents the intercept, $\beta_j$ is the $j$-th regression coefficient, and $y_j$ is the $j$-th predictor variable (i.e., operational parameter) that affects internal pitting corrosion. A summary of the parameters and coefficients extracted in (Ossai et al., 2015) is given in Table 3 and Table 4.
The number and size of the pits on a pipeline depend on different parameters (e.g., the parameters in Table 3). The number of pits may vary from a few to thousands (Dann & Maes, 2018; Valor et al., 2015). In this study, the thickness of the pipe is divided into eight equally spaced states, and by applying Monte Carlo simulation, 10,000 initial pits are propagated based on the growth model given in Eq. (15). Then, the probability of being in each state at time T = 5 years is calculated by using Eq. (3). The transition rates between the states, calculated by solving the differential equations in Eq. (2), are given in Table 5.
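The counting step can be illustrated with the short sketch below. It is not the regression model of Eq. (15): it assumes a power-law growth of the form depth = k (T - t0)^alpha and draws k, alpha, and t0 from hypothetical distributions chosen only for illustration. The wall thickness, the distribution parameters, and the function name `simulate_state_probs` are likewise assumptions.

```python
import random


def simulate_state_probs(n_pits, T, thickness, n_states, seed=0):
    """Propagate pits to time T and return the fraction of pits in each depth state."""
    rng = random.Random(seed)
    counts = [0] * n_states
    width = thickness / n_states                 # equally spaced depth states
    for _ in range(n_pits):
        k = rng.lognormvariate(0.5, 0.6)         # hypothetical proportionality factor
        alpha = rng.uniform(0.4, 0.8)            # hypothetical exponent
        t0 = rng.uniform(0.0, 2.0)               # hypothetical initiation time, years
        depth = k * max(T - t0, 0.0) ** alpha    # assumed power-law growth
        state = min(int(depth / width), n_states - 1)   # deepest state absorbs the rest
        counts[state] += 1
    return [c / n_pits for c in counts]


# 10,000 pits, eight states, T = 5 years; the 10 mm wall thickness is a placeholder.
probs = simulate_state_probs(10_000, T=5.0, thickness=10.0, n_states=8)
```

The resulting vector plays the role of the state probabilities at time T from which the transition rates are then fitted.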
It is crucial to select a proper value for time T in this procedure. The minimum value of T is the time at which the pit depths are distributed over all states. The later in the lifetime of the pipeline this procedure is applied, the more accurate the results will be (Timashev et al., 2008). Therefore, the method proposed in this paper is more beneficial for aged pipelines that have some field inspection data available. This is the case for a majority of the pipeline systems currently in operation.
The small enough interval $\Delta t$ (i.e., one in which the probability of more than one transition is negligible) for this example is calculated using the procedure explained in Section 3.1. Using Eq. (9) and assuming a decision interval of 1 year, the PTMs for the assumed actions are estimated.
Some maintenance actions are commonly used to mitigate internal corrosion in oil and gas pipelines, including pigging, adding corrosion inhibitors, biocides, internal coating, cladding, cathodic protection, and process optimization.
Among these methods, the application of corrosion inhibitors to mitigate internal corrosion is the most-trusted method in the oil and gas industry and is necessary for the use of carbon steel (Papavinasam, 2013). An expert team should assess the applicability of each action for a specific pipeline, and from among the identified appropriate methods, the most cost-effective strategy would be determined by the MDP analysis.
For this case study, four maintenance actions are assumed: do nothing, add corrosion inhibitor, pigging, and replacement. $PTM_{Int}$ for the "do nothing" action is calculated by using the obtained $\lambda_i$, given in Table 5, and Eq. (8). Then, the PTM for "do nothing" is obtained by using Eq. (9). All rows of the PTM for "replacement" are equal to the first row of the PTM for "do nothing". It is more complicated to find an accurate pitting corrosion rate model in the presence of a maintenance action that directly affects the pitting corrosion rate. For some of these maintenance actions, there are instructions for simulating the effect of inhibitors on the corrosion rate in the laboratory (Papavinasam, 2013). For the "adding corrosion inhibitor" and "pigging" actions, the PTMs are obtained by subjectively modifying the PTM for "do nothing". To obtain more reliable results, field or lab corrosion data in the presence of these actions are required to estimate appropriate PTMs.
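The construction of the action-specific PTMs can be sketched as below. The replacement rule follows the description above. The scaling rule in `slowed_ptm` is only one hypothetical way to express a subjective modification for a mitigating action and is not the modification used in the case study; the example matrix and the factor 0.5 are placeholders.

```python
def replacement_ptm(do_nothing):
    """Every row equals the first row of the do-nothing PTM (the pipe restarts as new)."""
    return [list(do_nothing[0]) for _ in do_nothing]


def slowed_ptm(do_nothing, factor):
    """Hypothetical mitigation rule: scale each off-diagonal probability by `factor`
    (0 < factor <= 1) and return the removed probability mass to the diagonal."""
    out = []
    for i, row in enumerate(do_nothing):
        new = [p * factor for p in row]
        new[i] = 0.0
        new[i] = 1.0 - sum(new)
        out.append(new)
    return out


# Hypothetical three-state do-nothing PTM (last state absorbing).
do_nothing = [
    [0.70, 0.25, 0.05],
    [0.00, 0.80, 0.20],
    [0.00, 0.00, 1.00],
]
replacement = replacement_ptm(do_nothing)
inhibitor = slowed_ptm(do_nothing, factor=0.5)   # assumed halving of the progression
```

Both derived matrices remain valid transition matrices, which is the minimum requirement for using them in the MDP.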
In this study, it is assumed that the $M_{s,a}$ (maintenance cost given state and action) values are state-independent. However, on some occasions, it might be necessary to consider the dependency between the state of the system and the cost of the maintenance action. The maintenance costs and failure mode costs assumed in this work are shown in Table 6 and Table 7, respectively.
Based on these assumptions, an optimal maintenance policy for each section of the pipeline is calculated by both the value iteration and policy iteration methods, as shown in Table 8. Comparing the cost of this policy with the costs of several arbitrarily selected policies confirms that the policy identified by the MDP has the minimum cost. According to these results, the optimal maintenance policy is "do nothing" when the maximum depth of pits is in state 1, "add corrosion inhibitor" when it is in state 2, 3, 4, or 5, "pigging" when it is in state 6, and "replacement" when it is in state 7 or 8.
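The comparison against other policies can be illustrated by evaluating every stationary policy of a small problem and checking which one has the lowest long-term cost. The sketch below uses a hypothetical two-state, two-action example rather than the eight-state case study, and the function name `evaluate_policy` is illustrative.

```python
from itertools import product


def evaluate_policy(P, R, policy, gamma=0.9, sweeps=2000):
    """Iterative policy evaluation for a fixed policy.

    Repeats V(s) = R[s][a] + gamma * sum_s2 P[a][s][s2] * V(s2) with a = policy[s].
    """
    n = len(R)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [R[s][policy[s]]
             + gamma * sum(P[policy[s]][s][s2] * V[s2] for s2 in range(n))
             for s in range(n)]
    return V


# Hypothetical example: state 0 = intact, state 1 = failed.
P = [
    [[0.9, 0.1], [0.0, 1.0]],   # action 0: do nothing
    [[0.9, 0.1], [0.9, 0.1]],   # action 1: replace
]
R = [[0.0, 5.0],
     [10.0, 5.0]]

# Evaluate all four stationary policies and pick the cheapest one.
costs = {pol: evaluate_policy(P, R, pol) for pol in product(range(2), repeat=2)}
best = min(costs, key=lambda pol: sum(costs[pol]))
```

Here the cheapest policy is to do nothing in the intact state and replace in the failed state; exhaustive enumeration is only practical for small problems, which is why value or policy iteration is used in general.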

CONCLUSIONS
An approach is proposed in this paper to identify the optimal maintenance policy for aged oil and gas pipelines undergoing pitting corrosion by using Markov Decision Process (MDP) modeling. Although different Markov process-based corrosion rate models have been proposed for pitting corrosion, MDPs have rarely been used to find optimal maintenance policies for corroding pipelines. The main challenge in using MDPs for pipeline maintenance optimization is that estimating the probability transition matrix from the density transition rates between states is complex. This paper models the corrosion process as a pure birth Markov process.
The resultant density transition rates are used with the matrix multiplication method to find the probability transition matrix needed for the MDP analysis.
In addition, a risk-based cost estimation framework is used to find the expected failure costs. A case study is performed with four possible failure modes caused by pitting corrosion (safe, small leak, large leak, and rupture). Four maintenance actions are assumed to be applicable to a specific pipeline, and an optimal maintenance policy is determined by assuming hypothetical costs for these failure modes, maintenance actions, and inspections. This paper proposes a novel approach to fill the gap between Markov process-based corrosion rate models and Markov decision processes for integrity management of oil and gas pipelines degraded by pitting corrosion. A sensitivity analysis with respect to the costs and the maintenance plan is left for future work.

Table 8. Resulting optimal maintenance policy (actions: 1 = do nothing, 2 = add corrosion inhibitor, 3 = pigging, 4 = replacement).
State            1  2  3  4  5  6  7  8
Optimal action   1  2  2  2  2  3  4  4
It is also worth mentioning that the lack of real ILI data is a major challenge in the PHM of oil and gas pipelines. Hence, we highly recommend that the owners of oil and gas pipelines and operating companies collect operational conditions and inspection data and make them available in the public domain, so that researchers can validate new corrosion degradation models. Such collaboration would ultimately lead to a decrease in the number of unexpected failures and in unnecessary maintenance of oil and gas pipelines.