Data-Driven Diagnostics and Prognostics for Modelling the State of Health of Maritime Battery Systems - a Review

Battery systems are becoming an increasingly attractive alternative for powering ocean-going ships, and the number of fully electric or hybrid ships relying on battery power for propulsion and maneuvering is growing. In order to ensure the safety of such ships, it is of paramount importance to monitor the available energy that can be stored in the batteries, and classification societies typically require that the state of health of the batteries can be verified by independent tests, namely annual capacity tests. This paper discusses data-driven diagnostics for state of health modelling for maritime battery systems based on operational sensor data as an alternative approach. It presents a comprehensive review of different data-driven approaches to state of health modelling, and aims at giving an overview of the current state of the art. Furthermore, the various methods for data-driven diagnostics are categorized into a few overall approaches with quite different properties and requirements with respect to data for training and from the operational phase. More than 300 papers have been reviewed, many of which are referred to in this paper. Moreover, some reflections and discussions on what types of approaches can be suitable for modelling and independent verification of state of health for maritime battery systems are presented.


INTRODUCTION
There is currently a significant push for emission reduction and a change to more environmentally friendly technologies for maritime transport. Electric or hybrid ships using batteries are an attractive alternative for many shipping segments with significant environmental benefits and large potential for fuel, cost and emission savings.
Erik Vanem et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The safety of battery-powered ships is important. Fire and explosion are obvious risks, but another central aspect is ensuring that the available energy stored in the batteries is sufficient to cover the power demand. Loss of propulsion power in a critical situation can lead to serious accidents such as collision or grounding. Therefore, a reliable estimation and prediction of the actual available energy of a battery is crucial.
Battery systems age, meaning that the energy storage capacity degrades with calendar time and with charge/discharge cycles. The ageing process affects both the amount of charge that can be stored and the performance of the power delivery. For ships relying on energy from onboard battery systems, it is important to ensure that the capacity of the battery system is sufficient for the safe operation of the vessel at all times. Thus, accurate evaluation and verification of the capacity and performance of maritime battery systems is crucial to safe and sustainable operation of battery-powered ships. It is noted that other aspects of battery degradation may be equally important. For example, degradation affects not only the capacity but also fire safety and thermal runaway properties (Geisbauer, Wöhrl, Mittmann, & Schweiger, 2020; D. Ren et al., 2019).
Because SOH is safety critical, class societies typically require annual validation testing of battery State of Health (SOH) for ships utilizing battery systems for propulsion or manoeuvring purposes. There are many challenges with this approach, and data-driven approaches to SOH monitoring and prediction are believed to be attractive alternatives. From a practical point of view, the annual capacity test is time consuming and typically requires that the ship is taken out of operation for one full day per year. Moreover, the accuracy of the test is questionable due to several factors influencing the results, such as variability in loads, temperatures and Depth of Discharge (DOD). Maritime battery systems are typically designed for a 10-year lifetime while the ships are designed for 25-30 years. When battery systems are approaching their end of useful life (EOL), reliable estimation of SOH will become much more important, and making correct decisions on remaining useful life (RUL) will have great financial and safety implications.
This paper aims at describing the state of the art in data-driven methods for SOH estimation of maritime battery systems. It is based on a thorough literature survey, presented in (Vanem, Bertinelli Salucci, Bakdi, & Alnes, 2021), and outlines various approaches reported in the scientific and engineering literature for utilizing sensor data to estimate the effect of degradation on the available capacity of such battery systems.

Condition Monitoring for Battery Systems
With a rechargeable battery system, the amount of energy available will vary continuously as the battery is repeatedly charged and discharged, and the state of charge (SOC) is a measure of the extent to which the battery is charged relative to its capacity. That is, a fully charged battery will have SOC = 100% and a fully discharged battery will have SOC = 0%. The State of Latent Energy (SoLE) is a similar measure of the amount of available useful energy in the battery (Rozas, Troncoso-Kurtovic, Ley, & Orchard, 2021) that does not rely on a normalization constant.
The capacity of a battery to store energy will typically degrade over time, and the state of health (SOH) is a measure of the battery's capacity relative to its nominal capacity, that is, the initial capacity when the battery is new. Formally, the State of Health of a battery can be defined as
$$\mathrm{SOH} = \frac{C_{\mathrm{Available}}}{C_{\mathrm{Nominal}}} \qquad (1)$$
where $C_{\mathrm{Available}}$ denotes the available capacity of the battery and $C_{\mathrm{Nominal}}$ refers to the nominal capacity. It should be noted that alternative definitions of SOH exist, for example based on internal resistance, and that these are generally not identical. Although there is a correlation between capacity-based SOH and resistance-based SOH, it is important to be aware that SOH is not unambiguously defined. In this paper, the main focus is on available capacity and, unless otherwise noted, SOH should be taken to mean capacity-based state of health as defined in Eq. (1).
The cycle life or battery life of a rechargeable battery refers to the number of full discharge-charge cycles the battery can experience before its end of life, and many different factors influence the actual cycle life, including the rate and depth of the cycles and temperature. Alternatively, battery life can sometimes be described in terms of cumulative discharge (total amount of charge delivered by the battery over its lifetime) or equivalent full cycles (summation of partial cycles as fractions of full charge-discharge cycles).
Currently, all maritime battery suppliers are required to have a SOH estimation algorithm and to verify the SOH annually through in-situ capacity testing. As ship-to-shore connectivity has immensely improved over the past few years it is natural to evaluate whether a sensor-based monitoring system can both reduce downtime for the operator and improve the quality of the SOH verification.
Condition monitoring systems typically include diagnostics and prognostics. Within such a framework, state of health estimation corresponds to the diagnostics part, where a reliable estimate of state of health reflects the energy storage capacity of the battery at any given time. This would be influenced by the operating history of the battery system. Prognostics in this context would amount to predicting the remaining useful life of the battery or the time until the battery needs to be replaced. This would require a threshold to be specified for when the battery reaches its end of life, for example SOH below a specified limit, as well as some assumptions on the future operation of the system. Future degradation trends would then be predicted on this basis.

Battery Technologies and Terminology
Batteries have been around for a long time, and battery technologies have been continuously developed since the first electrochemical batteries were invented in the late 18th century. Fundamentally, an electrochemical battery cell must consist of an inner ionic channel allowing for transport of ions, two materials with interfaces where the exchange of electrons and chemical reactions can occur, and an outer electrical channel for transport of electrons. The materials where the exchange of electrons occurs are referred to as the positive and negative electrodes, and the ionic channel is referred to as the electrolyte. Currently, the most widespread type of rechargeable battery is the lithium-ion battery, and there exists a range of different chemistries with different characteristics. An overview of maritime and offshore battery systems can be found in (DNV GL, 2016).
For the purpose of establishing data-driven models to estimate state of health and predict remaining useful life, it should be acknowledged that the different chemistries may have very different characteristics with regards to how different factors influence the degradation. Furthermore, battery cells can be of different forms and types, and the most common lithium-ion cell types for maritime applications are cylindrical, prismatic and soft pouch cells. These form factors have different performance characteristics.
The C-rate is a measure of the rate at which a battery is being charged or discharged and is defined as the current through the battery divided by the current needed to discharge the battery's nominal rated capacity in one hour. It has the unit 1/h (per hour).
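As a small illustration of this definition, the following Python sketch computes the C-rate from a measured current and the nominal capacity. The function name and the 100 Ah / 50 A figures are hypothetical and chosen only for illustration.

```python
def c_rate(current_a: float, nominal_capacity_ah: float) -> float:
    """Return the C-rate in 1/h: current divided by the one-hour discharge current."""
    if nominal_capacity_ah <= 0:
        raise ValueError("nominal capacity must be positive")
    # Current (in A) that would deliver the nominal capacity (in Ah) in exactly 1 h.
    one_hour_current_a = nominal_capacity_ah / 1.0
    return abs(current_a) / one_hour_current_a


# Hypothetical example: a 100 Ah cell discharged at a constant 50 A.
rate = c_rate(50.0, 100.0)
# At a constant C-rate, a nominal full discharge takes 1 / rate hours.
hours_to_empty = 1.0 / rate
print(f"C-rate: {rate} 1/h, nominal discharge time: {hours_to_empty} h")
```

Under these assumed numbers the cell is cycled at half its one-hour current, so a nominal full discharge would take two hours.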
A battery consists of one or more connected electrochemical cells. Cells can be connected in series in order to increase the electric potential of the battery or connected in parallel in order to increase the capacity of the battery. More complicated configurations and architectures may involve several cells in different combinations of series and parallel connections. There may be uneven loads and temperatures on the different cells, and different cells within the same battery may experience different degradation trends. Often, lab testing is carried out on single cells, whereas operational sensor data from a battery system typically include measurements at both cell and system level.
A rechargeable battery will be operated in cycles consisting of charging, discharging and rest periods throughout its operational life. The charging and discharging cycles could be at different rates (C-rates) and depths (range of SOC), and the rest periods can occur at different states of charge. All of these influence the degradation of the battery. During normal operation the batteries will often be operated under variable conditions, with constantly varying rates and depths of the cycles and rest periods, and sensor measurements of battery voltage, current and temperature over time can describe the operational profile.
Modern batteries are equipped with a battery management system (BMS), which is very important for the safe operation of the battery, and also for optimizing the use of the battery (Weicker, 2014). A BMS should monitor the state of a battery at all times and protect the battery from operating outside its safe operating area to prevent accidents such as explosion or thermal runaway. It collects sensor measurements of basic parameters such as voltage, current and temperature and uses these to calculate and monitor various derived parameters and quantities such as state of charge and state of health.
Some factors that influence the degradation of a battery are well known, even though the degradation mechanisms may be different for different battery types and chemistries. The degradation leads to loss of capacity, power fade and increase in internal resistance. As a result of this, the terminal voltage and range of state of charge will be reduced as the battery degrades. Typically, degradation and capacity loss are ascribed to calendar ageing and cyclic ageing effects. A list of factors influencing battery health presented in (Balagopal & Chow, 2015) includes temperature, charging and discharging cycles, depth of discharge, overcharging, charge/discharge rate and calendar ageing. An overview of important battery degradation mechanisms as well as their causes and effects is also given in e.g. (Vetter et al., 2005; Birkl, Roberts, McTurk, Bruce, & Howey, 2017).
The degradation may not be similar at the beginning of life (BOL) and when approaching end of life (EOL). Typically, one expects to observe a so-called knee-point in the degradation curves, where a sudden change from relatively moderate degradation to a more aggressive degradation occurs towards the EOL. Maritime batteries should typically be replaced before a knee-point occurs in the degradation curves, to avoid swiftly deteriorating battery capacities during operation.
The availability of high-quality data with sufficient accuracy, resolution, relevance and completeness is important for developing and training data-driven models for state of health estimation. There are essentially two different ways of obtaining such data: measurements of battery systems during operation and laboratory experiments. Some limitations of actual measurements are that sufficiently long time series are scarce and that one may have only partial information about, and control of, the conditions under which the data are collected. Batteries are also typically replaced before their end of life, so data from the critical period when the batteries approach their end of useful life will often not be available. On the other hand, lab data can be collected under controlled, often idealized conditions, where for example temperatures and loads can be kept constant throughout the experiments. However, this may not be very representative of realistic operation of the batteries, with variable loads and operating conditions.

Classification Rules for Electric Ships
Electrical power systems have been used onboard ships for a long time, and recently fully electrical or hybrid ships depending on battery power for propulsion have become attractive for many ship segments. Ocean-going ships are subject to classification rules (DNV GL, 2020a), and DNV has an additional class notation, BATTERY, for battery-powered vessels (DNV GL, 2020b). The Battery(Power) class notation is required for all ships, all-electric or hybrid, that rely on battery power for propulsion, and the Battery(Safety) notation applies to all vessels with lithium-ion battery systems with an aggregated rated capacity of more than 20 kWh and not having the Battery(Power) notation. The annual capacity test is a requirement of the Battery(Power) notation.

LITERATURE SURVEY ON DATA-DRIVEN MODELS FOR SOH ESTIMATION
This paper aims at presenting the state of the art in data-driven models for state of health of maritime battery systems. The amount of literature on this topic is enormous and it seems an impossible task to cover all relevant papers and reports in the academic and engineering literature in detail. Notwithstanding, the literature survey presented herein is believed to give a fair overview of different approaches to data-driven modelling of the condition of batteries, with an emphasis on the more recent literature and on the various overall approaches that can be taken.
In the following, a high-level review of methods proposed in the scientific literature will be given, with a focus on data-driven methods based on sensor measurements from batteries in operation. An effort is made to group models in a few main categories, although some proposals may include elements from various categories. Typically, methods are grouped into experimental methods such as various forms of measurements, model-based methods relying on electrochemical or equivalent circuit models, and pure data-driven methods. However, the distinction is not always crisp, and a combination of techniques will typically be employed. See e.g. (... Gandiaga, et al., 2016; Ungurean, Cârstoiu, Micea, & Groza, 2017; Huixin, Qin, Li, & Zhao, 2020; Xiong, Li, & Tian, 2018; Lipu et al., 2018; Lucu, Martinez-Laserna, Gandiaga, & Camblong, 2018) for recent reviews on capacity estimation. Several review studies focus on prognostics and RUL estimation of lithium-ion batteries, see e.g. (Y. L. Wu, Fu, & Guan, 2016; Su & Chen, 2017; Saha, Goebel, & Christophersen, 2009).

Direct Measurement Techniques
Different approaches for more or less direct measurements of state of health exist and are proposed for online SOH estimation. Some of these can be based on continuous measurements such as time series of currents, voltages and temperatures, whereas others are based on measurements collected during particular experiments or procedures (Karlsen, Dong, Yang, & Carvalho, 2019). For example, the annual test currently required for maritime battery systems used for propulsion utilizes a coulomb counting technique and a controlled charging/discharging procedure. This is one approach to SOH verification, but the need for specific charging and discharging cycles under controlled environments, with constant temperature and C-rate, means that normal operations need to be disrupted for a period of time. Other measurement techniques also exist, see e.g. a more comprehensive overview in (Barai et al., 2019). Ideally, methods that can be used based on continuous measurements of variables that are routinely collected under normal operations without the need for specific instrumentation or procedures are preferable.

Coulomb Counting
Coulomb counting, also referred to as the current integration method, integrates the current to or from the battery during a full cycle to determine the capacity directly, according to the basic relation
$$Q = \int_{t_0}^{t_1} I(t)\,dt \qquad (2)$$
where $Q$ is the capacity, $I(t)$ is the current at time $t$, and $t_0$ and $t_1$ refer to the times of SOC = 0% and SOC = 100%, respectively. That is, the current is integrated over a full cycle from full to empty (or from empty to full) to count how much electric charge the battery can store. Often, the equation above is modified by also including the Coulombic efficiency, which is tacitly assumed to be unity in Eq. (2). One practical problem with this approach is that it requires a full charge/discharge cycle to be able to estimate the maximum capacity, and this is hardly ever experienced in actual normal operations. Moreover, the measurements need to be performed under controlled conditions, with constant, typically low, C-rate and a specific ambient temperature, and the method is therefore not directly applicable as an online method. In addition, subjecting the battery to full cycles between 0% and 100% may contribute to accelerated degradation, and such tests risk shortening the lifetime of the battery. The annual test for maritime battery systems is based on Coulomb counting and therefore requires taking the vessel out of service to perform a series of controlled charge and discharge cycles.
Capacity estimation can possibly be based on coulomb counting of deep cycles (not necessarily full), at reasonably homogeneous conditions with respect to C-rates and temperatures. The relationship between the total capacity $Q$ and the state of charge at times $t_1$ and $t_2$ is as follows, where the Coulombic efficiency $\eta$ is also included:
$$Q = \frac{\int_{t_1}^{t_2} \eta\, I(t)\,dt}{\mathrm{SOC}(t_2) - \mathrm{SOC}(t_1)} \qquad (3)$$
Note, however, that for this approach to be useful there is a need for accurate and reliable SOC estimates.
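A minimal Python sketch of partial-cycle Coulomb counting is given below, dividing the efficiency-weighted accumulated charge by the change in SOC over the same interval. It is only an illustration under stated assumptions: the function names, the trapezoidal integration, the sign convention (charging current positive), the SOC values and the 100 Ah nominal capacity are all hypothetical, and the SOC estimates are taken as given.

```python
def accumulated_charge_ah(times_h, currents_a, efficiency=1.0):
    """Trapezoidal integral of efficiency * I(t) dt in Ah (charging current positive)."""
    total = 0.0
    for k in range(1, len(times_h)):
        dt = times_h[k] - times_h[k - 1]
        total += efficiency * 0.5 * (currents_a[k] + currents_a[k - 1]) * dt
    return total


def capacity_from_partial_cycle(times_h, currents_a, soc_start, soc_end, efficiency=1.0):
    """Total capacity = accumulated charge / change in SOC (SOC as a fraction 0..1)."""
    delta_soc = soc_end - soc_start
    if abs(delta_soc) < 1e-6:
        raise ValueError("SOC change too small for a meaningful capacity estimate")
    return accumulated_charge_ah(times_h, currents_a, efficiency) / delta_soc


# Hypothetical data: constant 20 A charge for 2 h while SOC rises from 30% to 80%.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
currents = [20.0, 20.0, 20.0, 20.0, 20.0]
capacity = capacity_from_partial_cycle(times, currents, 0.30, 0.80)

nominal_capacity_ah = 100.0  # assumed nominal capacity
soh = capacity / nominal_capacity_ah
print(f"estimated capacity: {capacity:.1f} Ah, SOH: {soh:.2f}")
```

Because the SOC change sits in the denominator, any error in the SOC estimates carries over directly to the capacity estimate, which is why shallow cycles are rejected here.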
Estimation of SOH based on Coulomb counting of partial cycles is proposed in (Stroe, Knap, & Schaltz, 2018; Q. Yang et al., 2017), indicating that the reduced voltage range measurements are likely to underestimate the capacity fade, see also (Meng et al., 2019). Coulomb counting is also often proposed to be used together with other data-driven or model-based techniques. It is possible to include a current correction term in the Coulomb counting procedure to account for the fact that capacity generally decreases as discharge current (C-rate) increases (Z. Deng, Yang, Cai, & Deng, 2017). The Peukert equation describes the relationship between the discharge current ($I$) and the discharge time ($t$) by stating that $I^k t$ is a constant, where $k$ is the Peukert coefficient (Doerffel & Sharkh, 2006; Z. Deng, Yang, Cai, Deng, & Sun, 2016). However, this requires the battery to be discharged at a constant C-rate throughout the cycle (Doerffel & Sharkh, 2006), and also at constant temperature. Extensions of the Coulomb counting method are discussed in (Gismero, Schaltz, & Stroe, 2020).
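The Peukert relation mentioned above can be illustrated with a short Python sketch. Given one reference discharge at constant current, the constant value of I^k t is fixed and the discharge time at another constant current follows. The reference point (20 A for 5 h) and the coefficient k = 1.05 are hypothetical values chosen for illustration only.

```python
def peukert_discharge_time(current_a, ref_current_a, ref_time_h, k):
    """Discharge time at current_a, assuming I**k * t equals the constant
    fixed by the reference discharge (ref_current_a, ref_time_h)."""
    if current_a <= 0 or ref_current_a <= 0:
        raise ValueError("currents must be positive")
    return ref_time_h * (ref_current_a / current_a) ** k


# Hypothetical reference discharge: 20 A sustained for 5 h, i.e. 100 Ah delivered.
ref_current, ref_time, k = 20.0, 5.0, 1.05

time_at_100a = peukert_discharge_time(100.0, ref_current, ref_time, k)
capacity_at_20a = ref_current * ref_time   # charge delivered at the reference rate
capacity_at_100a = 100.0 * time_at_100a    # charge delivered at the higher rate
print(f"{capacity_at_20a:.1f} Ah at 20 A vs {capacity_at_100a:.1f} Ah at 100 A")
```

With k above one, the deliverable charge shrinks as the discharge current grows, which is the effect a current correction term in Coulomb counting is meant to capture.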

Hybrid Pulse Power Characterisation (HPPC) and Electrochemical Impedance Spectroscopy (EIS)
HPPC and EIS are methods to measure the electrochemical response to certain inputs. HPPC measures the cell voltage response to short high-current charge/discharge pulses, and EIS measures the frequency response of the battery by measuring the impedance over a range of AC input frequencies. EIS yields an impedance spectrum from which it is possible to estimate various battery characteristics, such as charge transfer resistance, capacitance and ohmic resistance, as different frequencies are associated with different mechanisms in the battery, and to relate these to state of health (Blanke et al., 2005; Tröltzsch, Kanoun, & Tränkler, 2006; Pérez, Benavides, Rozas, Seria, & Orchard, 2018). A passive impedance measurement technique is proposed in (Bohlen, 2008) to alleviate the need for specific hardware implementations, allowing the impedance spectrum to be estimated from arbitrary excitation signals by way of digital filters. See also (Howey, Mitcheson, Yufit, Offer, & Brandon, 2014) for an example of online EIS measurements. An extension of EIS to also study higher order harmonics and nonlinear responses is proposed in (Harting, Wolff, Röder, & Krewer, 2017; Harting, Schenkendorf, Wolff, & Krewer, 2018).
EIS measurements are used together with model-based approaches in (Kuipers et al., 2020;X. Wang, Wei, & Dai, 2019) and with data-driven approaches in e.g. (Y.  for SOH estimation and RUL prediction.

Incremental Capacity Analysis (ICA) and Differential Voltage Analysis (DVA)
ICA and DVA measure the change in charge (Q) and voltage (V) during charging/discharging and estimate the gradient curves, dQ/dV and dV/dQ, respectively, to determine changes in electrochemical properties. Such curves will typically exhibit features like plateaus and peaks that can be associated with different mechanisms and phases in the battery, and changes in these features can be ascribed to battery degradation. It is also possible to apply this method to partial charging curves, which is a huge advantage for online monitoring. However, two major challenges with this approach for online monitoring based on real-time sensor data are that a constant and low current is typically needed in order to acquire accurate curves, and that noisy, discrete data must be differentiated to obtain the IC (dQ/dV) and DV (dV/dQ) curves (Feng et al., 2020). An example of a charge-voltage curve and the corresponding IC (dQ/dV) curve is shown in Figure 1, illustrating that flat parts of the charge-voltage curve appear as peaks in the dQ/dV curve. Such curves may be estimated in different ways, including curve fitting, parametric models and machine learning methods (Y. Li, Abdel-Monem, et al., 2018; He, Wei, Bian, & Yan, 2020). Various smoothing techniques can also be applied to obtain smooth curves from noisy measurements (Jiang, Dai, & Wei, 2020). [...] (Weng, Cui, Sun, & Peng, 2013), linear models (Jiang et al., 2020). A current interrupt technique is introduced in (Fly & Chen, 2020) to evaluate the cell resistance in order to account for the effect of different C-rates in ICA.
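The numerical differentiation step can be sketched as follows in Python. This is a minimal illustration rather than any of the cited methods: the charge-voltage data are synthetic (a linear trend plus a logistic step centred at an assumed plateau voltage of 3.6 V, with added noise), and the moving-average window and central differences are arbitrary choices standing in for the smoothing techniques discussed above.

```python
import math
import random


def moving_average(values, window):
    """Centered moving average; the window shrinks symmetrically near the ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        h = min(half, i, len(values) - 1 - i)
        segment = values[i - h:i + h + 1]
        out.append(sum(segment) / len(segment))
    return out


def incremental_capacity(voltages, charges, window=11):
    """Return (V, dQ/dV) from smoothed charge data using central differences."""
    q_smooth = moving_average(charges, window)
    v_out, dq_dv = [], []
    for i in range(1, len(voltages) - 1):
        dv = voltages[i + 1] - voltages[i - 1]
        if dv > 0:  # skip points where the voltage does not increase
            v_out.append(voltages[i])
            dq_dv.append((q_smooth[i + 1] - q_smooth[i - 1]) / dv)
    return v_out, dq_dv


# Synthetic charging curve (assumption): Q rises steeply around 3.6 V,
# i.e. the voltage is nearly flat there while charge is accumulated.
random.seed(0)
voltages = [3.0 + 0.005 * i for i in range(241)]
charges = [
    10.0 * (v - 3.0)
    + 40.0 / (1.0 + math.exp(-(v - 3.6) / 0.02))
    + random.gauss(0.0, 0.05)
    for v in voltages
]

v_ic, ic = incremental_capacity(voltages, charges)
peak_voltage = v_ic[ic.index(max(ic))]
print(f"IC peak located near {peak_voltage:.3f} V")
```

In this synthetic case the IC peak falls at the plateau voltage, and tracking the position and height of such peaks over time is the kind of feature that could be related to degradation.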
A somewhat similar method based on charge and discharge data estimates the probability density function of voltages during a discharge cycle by way of kernel density fitting of discrete voltage measurements (Feng et al., 2013). This method is referred to as the pdf-method and is a simplified variant of ICA where the need to fit a curve to the charge/discharge data is eliminated. As with ICA, the probability density function will exhibit clear peaks around voltage plateaus, that is, voltages that occur more frequently during a charge or discharge cycle, and the idea is that the state of the battery can be inferred from these peaks. A fusion of Coulomb counting and differential voltage analysis is proposed in (S. Zhang, Guo, Dou, & Zhang, 2020) as a model-free approach to obtain SOH estimation from constant current discharge data.
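The idea behind the pdf-method can be sketched with a hand-written Gaussian kernel density estimate in Python. The voltage samples below are synthetic and assume equally spaced sampling in time during a constant-current discharge with a plateau at 3.3 V; the bandwidth and grid are arbitrary illustrative choices, not values from the cited work.

```python
import math


def gaussian_kde(samples, grid, bandwidth):
    """Gaussian kernel density estimate of the samples, evaluated on the grid."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        s = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in samples)
        density.append(norm * s)
    return density


# Synthetic discharge (assumption): voltage falls from 3.7 V to 2.9 V and
# lingers around 3.3 V, so samples taken at equal time steps cluster there.
n = 201
voltage_samples = [3.3 - 0.4 * (2.0 * i / (n - 1) - 1.0) ** 3 for i in range(n)]

grid = [2.9 + 0.01 * j for j in range(81)]
density = gaussian_kde(voltage_samples, grid, bandwidth=0.02)
peak_voltage = grid[density.index(max(density))]
print(f"density peak near {peak_voltage:.2f} V")
```

No curve is fitted to the charge-voltage data here. The density peak arises simply because the voltage spends more time near the plateau.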

Other Direct Measurement Techniques
A differential thermal voltammetry approach is proposed in (B. Wu et al., 2015), where voltage and temperature measurements in galvanostatic operations are used to model state of health. This allows shorter measurement time than slow rate cyclic voltammetry analysis (Stiaszny, Ziegler, Krauß, Schmidt, & Ivers-Tiffée, 2014; Stiaszny, Ziegler, Krauß, Zhang, & Ivers-Tiffée, 2014). A differential heat analysis based on measuring gradient heat flux and temperature after discharge is proposed for SOH estimation in (Murashko et al., 2019). State of health estimation based on fitting a parametric curve to the Ampere-hour throughput versus voltage curve is proposed in (Le & Tang, 2011).

State-Space Models with Observers
A different approach to battery modelling relies on models that approximate the battery dynamics. Typically, these may be referred to as state-space models, where sensor data can be used to estimate model parameters corresponding to underlying unobservable states using so-called observers such as variants of the Kalman filter or particle filters. Two main classes of such models are equivalent circuit models and electrochemical models. Such models may also be combined with experimental methods or direct measurements and data-driven methods to estimate state parameters and state of health. Pure data-driven models can also be established, such as the local model network (LMN) presented in (Hametner, Jakubek, & Prochazka, 2018), where a set of local regression models is used to establish a non-linear battery model. A Brownian motion model with drift was proposed in (Dong, Yang, Wei, Wei, & Tsui, 2020) to model battery degradation as a hidden state model based on observed health indices (that could be capacity loss or resistance increase).
Equivalent circuit models (ECM) describe the voltage-current characteristics of a battery by a model of an electrical circuit with different elements such as resistors and capacitors in different series and parallel configurations. Having established an ECM for the battery, the state of the battery is described by the battery model parameters. These are typically unobserved, but may be estimated based on measurements using various optimization techniques such as different variants of least squares methods. Various forms of constrained and regularized optimization may be employed to avoid unreasonable parameter estimates (Tian, Wang, Chen, & Fang, 2020), and forgetting factors can be used to avoid saturation problems by giving less weight to previous data compared to more recent ones (L. Chen, Lü, Lin, Li, & Pan, 2018). Model parameters typically change dynamically over time, and observers such as Kalman filters and particle filters can be used to dynamically update model parameters and unobserved model states. Extensions of the Kalman filter to handle non-linear state transition and observation models include the extended Kalman filter and the unscented Kalman filter, see e.g. (Plett, 2004a, 2004b, 2004c). The effect of temperature may be included in such models by coupling the ECM with an energy balance or thermal model, see e.g. (Karlsen et al., 2019; Bian, Liu, Yan, Zou, & Zhao, 2020). An ECM is used in  to adapt a state-space model to learn a polarising impedance surface used for capacity degradation modelling.
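To illustrate least-squares parameter estimation with a forgetting factor, the Python sketch below applies recursive least squares to the simplest possible ECM, a voltage source in series with one resistor, V = OCV - R0 * I with discharge current positive. This is an assumption-laden toy example and not the method of any cited paper: the model structure, the forgetting factor of 0.95, the initial covariance and the noise-free synthetic data (with R0 stepping from 2 to 3 milliohm) are all hypothetical.

```python
import math


def rls_update(theta, P, phi, y, lam=0.95):
    """One recursive least squares step with forgetting factor lam.

    theta: parameter estimate [OCV, R0]; P: 2x2 covariance matrix;
    phi: regressor [1, -I]; y: measured terminal voltage.
    """
    p_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
             P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = lam + phi[0] * p_phi[0] + phi[1] * p_phi[1]
    gain = [p_phi[0] / denom, p_phi[1] / denom]
    error = y - (phi[0] * theta[0] + phi[1] * theta[1])
    theta = [theta[0] + gain[0] * error, theta[1] + gain[1] * error]
    phi_p = [phi[0] * P[0][0] + phi[1] * P[1][0],
             phi[0] * P[0][1] + phi[1] * P[1][1]]
    # Dividing by lam inflates the covariance, which discounts older data.
    P = [[(P[i][j] - gain[i] * phi_p[j]) / lam for j in range(2)] for i in range(2)]
    return theta, P


theta = [0.0, 0.0]                  # initial guess for [OCV, R0]
P = [[1e4, 0.0], [0.0, 1e4]]        # large initial uncertainty
for k in range(300):
    current = 50.0 + 30.0 * math.sin(0.3 * k)   # varying load in A (synthetic)
    r0_true = 0.002 if k < 100 else 0.003       # resistance increase at k = 100
    voltage = 3.7 - r0_true * current           # noise-free terminal voltage
    theta, P = rls_update(theta, P, [1.0, -current], voltage)

ocv_est, r0_est = theta
print(f"OCV estimate: {ocv_est:.4f} V, R0 estimate: {r0_est * 1000:.4f} mOhm")
```

The forgetting factor lets the estimate follow the resistance step instead of averaging over the whole history. In practice the choice trades tracking speed against sensitivity to measurement noise, and a realistic ECM would include RC elements and an SOC-dependent open-circuit voltage.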
Electrochemical models typically consist of a simplified set of electrochemical equations that model the transport of charge between the positive and negative electrode in the battery cells based on the underlying physics. They describe the charge flows through the electrolyte and voltage drops at the cathode, anode and separator of the battery cells and typically include a set of differential equations, several model parameters, model states and some measurable model output. The model parameters are typically identified from battery dimensions and chemistry or are estimated based on data. Examples of such electrochemical models are given in (Bole, Kulkarni, & Daigle, 2014;Daigle & Kulkarni, 2013;Bi, Yin, & Choe, 2020;C. Lin, Xing, & Tang, 2017). Battery ageing and degradation can be modelled by changes in model parameters describing e.g. the internal resistance and charge capacity of the battery.

Regression Type Models
Regression models range from simple linear regression models, which assume a linear relationship between a set of explanatory variables and a response variable, to complex machine-learning (ML) regression models for more complicated and non-linear relationships. One advantage of complicated models is that more accurate models may be constructed when accounting for non-linearities. However, a parsimonious model can also be preferred, as it will be less likely to overfit training data and will be more easily interpreted. In general, in order to use regression type models there is a need for representative training data so that the model can learn the relationship between the input variables and the response. For batteries, this means that battery test data are needed where both the explanatory variables and the response are measured, typically based on laboratory tests. However, it is uncertain how representative the typical lab test data are for the degradation caused by more random duty cycles experienced in the field.
Simple linear regression models are proposed in (D. Wang, Kong, Yang, Zhao, & Tsui, 2020; Huang, Tseng, Liang, Chang, & Pecht, 2017; Severson et al., 2019; Tang et al., 2018), and different regression models for SOH based on polynomial functions of cycle number as the only variable and polynomial and exponential functions of fully discharged voltage and internal resistance are compared in (Tseng, Liang, Chang, & Huang, 2015). A kernel ridge regression model is suggested for SOH estimation in (Y. Li, Sheng, Cheng, Stroe, & Teodorescu, 2020). The relationship between capacity, accumulated charge and ranges of state of charge during cycling expressed in Eq. (3) is formulated as a regression problem in (Plett, 2011), where the total capacity is a regression coefficient between measured changes in state of charge (predictor) and accumulated charge obtained by Coulomb counting (response). A similar approach framing maximum capacity estimation as a total least squares problem is taken in (T. Kim et al., 2015), where a Rayleigh quotient-based algorithm is employed to estimate capacity recursively. The capacity of batteries is not measured directly by sensors and is therefore not available for each cycle in online battery data. If available at all, capacities will only be available for a limited number of cycles. This raises the need for semi-supervised learning, as addressed in (Yu, Yang, Wu, Tang, & Dai, 2020). In order to address the problem of insufficient training data, concepts of transfer learning and ensemble learning are incorporated in (S. Shen, Sadoughi, Li, Wang, & Hu, 2020).
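The regression view of capacity estimation can be sketched in a few lines of Python. The example below fits a line through the origin with accumulated charge as response and change in SOC as predictor, so that the slope is the capacity estimate. It uses ordinary least squares for brevity, which is a simplification: it ignores errors in the SOC changes, which is what the total least squares formulations mentioned above are designed to handle. The data are synthetic, generated from an assumed true capacity of 90 Ah with small hand-picked errors.

```python
def capacity_least_squares(delta_soc, charge_ah):
    """Slope of a least-squares line through the origin: charge = Q * delta_soc."""
    sxx = sum(x * x for x in delta_soc)
    sxy = sum(x * y for x, y in zip(delta_soc, charge_ah))
    if sxx == 0.0:
        raise ValueError("no SOC variation in the data")
    return sxy / sxx


# Synthetic segments (assumption): SOC changes as fractions (negative = discharge)
# and the corresponding Coulomb-counted charge in Ah, with small errors added.
true_capacity_ah = 90.0
delta_soc = [0.20, -0.35, 0.50, -0.15, 0.40, -0.60]
errors_ah = [0.30, -0.20, 0.10, 0.25, -0.30, -0.10]
charge_ah = [true_capacity_ah * x + e for x, e in zip(delta_soc, errors_ah)]

capacity_estimate = capacity_least_squares(delta_soc, charge_ah)
print(f"estimated capacity: {capacity_estimate:.2f} Ah")
```

Each operational segment with a known SOC change contributes one data point, so the estimate can be refined as more partial cycles are observed.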

Time-Series Models
Time-series models represent a different approach to modelling capacity fade. Rather than estimating capacity and state of health by regressing on some explanatory variables, time-series models estimate capacity based on previous observed capacities and model the serial dependence in observed capacities. Hence, based on a history of capacity measurements, current and future capacity values can be estimated. Notwithstanding several approaches where time-series models have been used for SOH prediction, such models generally project future values based on historical observations of the capacity, rather than regressing capacity on other explanatory variables. Hence, such models are believed to be more relevant for prognostics applications than for diagnostics, and such methods are deemed less relevant for estimating SOH of maritime battery systems based on sensor measurements.
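As a minimal illustration of projecting capacity from its own history, the Python sketch below uses a random walk with drift, one of the simplest time-series models, and extrapolates the drift to an end-of-life threshold. The capacity history, the 80 Ah threshold and the function names are hypothetical. Note that the approach presupposes a record of past capacity values, which is exactly what is typically missing for batteries in operation.

```python
import math


def drift_estimate(capacities):
    """Mean of successive differences: the drift of a random walk with drift."""
    if len(capacities) < 2:
        raise ValueError("need at least two capacity observations")
    return (capacities[-1] - capacities[0]) / (len(capacities) - 1)


def forecast(capacities, steps):
    """Point forecast `steps` observations ahead of the last one."""
    return capacities[-1] + steps * drift_estimate(capacities)


def steps_to_threshold(capacities, threshold):
    """Observations until the forecast reaches the threshold (None if it never does)."""
    if capacities[-1] <= threshold:
        return 0
    drift = drift_estimate(capacities)
    if drift >= 0:
        return None
    return math.ceil((capacities[-1] - threshold) / -drift)


# Hypothetical capacity history in Ah, one value per periodic capacity check.
history = [100.0, 99.5, 99.0, 98.5, 98.0]
remaining = steps_to_threshold(history, threshold=80.0)
print(f"forecast 4 checks ahead: {forecast(history, 4)} Ah, checks until EOL: {remaining}")
```

A constant drift cannot anticipate a knee-point, so a linear projection of this kind would tend to overestimate remaining life when degradation accelerates towards end of life.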

Survival Type Models
Survival and event history modelling is a separate branch of statistics that is used to model time-to-event data. If, for example, a battery's end of life is regarded as the event to be modelled, one could construct probabilistic models for the time until this event, determined by a set of covariates. However, one prerequisite for establishing such models is the availability of a sufficient amount of run-to-failure data, where the time until EOL is observed for a number of batteries or battery cells. Such data could typically be collected from similar batteries in operation to reflect realistic load profiles.
Survival analysis modelling is applied to lithium-ion batteries for end-of-performance modelling in (Y.-F. Wang, Tseng, Lindqvist, & Tsui, 2019). A trend-renewal process is used on accelerated testing data to predict end of performance. However, this model relies on observed capacity ratios for projecting capacity fade and estimating end of performance, and these will typically not be available for maritime battery systems.

Cumulative Damage Models
Cumulative damage models are often used for modelling of structural fatigue, where the structural deterioration is modelled as a cumulative sum of different load cycles. Fatigue life of a structure is typically given in terms of number of stress cycles of a specific amplitude. For structural components exposed to a complex, random sequence of loads, the fatigue damage can be estimated by reducing the complex loading to a series of simple cyclic loadings using techniques such as rainflow counting and then form a fatigue damage spectrum as a histogram of cyclic stresses. The degree of cumulative damage for each stress level can then be calculated from an S-N curve, that can be established based on laboratory tests. Often, simple parametric functions can be fitted to the test data to allow interpolation on the S-N curve.
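The damage summation described above can be sketched as follows. The sketch assumes that cycle counting (e.g. rainflow counting) has already produced a histogram of stress amplitudes, that the S-N curve follows a simple power law N = a * S^(-m), and that damage accumulates linearly according to the Palmgren-Miner rule. The text does not name a specific rule or curve shape, so these choices, the function names and the parameter values are illustrative assumptions.

```python
def cycles_to_failure(stress, a, m):
    """Power-law S-N curve: number of cycles to failure at a given amplitude."""
    return a * stress ** (-m)


def cumulative_damage(cycle_histogram, a, m):
    """Linear (Palmgren-Miner) damage sum over a histogram of counted cycles.

    cycle_histogram maps stress amplitude -> number of counted cycles.
    A damage sum of 1.0 corresponds to the predicted fatigue life.
    """
    return sum(
        count / cycles_to_failure(stress, a, m)
        for stress, count in cycle_histogram.items()
    )


# Hypothetical S-N parameters and cycle counts, for illustration only.
histogram = {100.0: 1000, 200.0: 500}
damage = cumulative_damage(histogram, a=1.0e12, m=3.0)
```

For these illustrative numbers the damage sum is approximately 0.005, i.e. about 0.5% of the fatigue life is consumed, with the 500 larger cycles contributing four times as much as the 1000 smaller ones.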
For battery cells, if one were able to construct curves or surfaces similar to S-N curves that determine the contribution to battery degradation from individual charge/discharge cycles of specified DOD/SOC range, temperature and C-rate, one could imagine that these could be used to calculate state of health based on experienced load profiles and some form of cycle counting such as rainflow counting. However, an extensive set of laboratory tests would presumably be needed, where run-to-failure tests would need to be performed for a number of different cycle amplitudes and conditions.
Cumulative damage-type modelling of battery degradation based on so-called load collectives is proposed in (Nuhic et al., 2013, 2018; You et al., 2016), and approaches based on rainflow counting are suggested in (B. Xu, Oudalov, Ulbig, Andersson, & Kirschen, 2018; S. Li, He, Su, & Zhao, 2020). One potential issue with cumulative damage models is that they rely on the complete operational history of the batteries. If there are long periods with missing data, the histograms, distributions or collectives may be biased and will miss information from the period where data are missing. Hence, this puts strict requirements on the reliability of the data collection procedures and on allowable downtime. Nevertheless, for complete time histories cumulative damage models are found to perform well and may be attractive alternatives for modelling battery degradation and state of health.

Empirical/Analytical Models
Some methods for SOH estimation are based on fitting empirical models to various measurement data. The aim of such models is to capture relationships between battery state of health and various stress factors, such as operation time, temperature and operational loads. These models are typically based on test data and the empirical relationships can be used during operation to model state of health and capacity loss of the battery.
The coulombic efficiency (CE) is used to establish a model for actual reversible capacity in (F. Yang et al., 2019). It is assumed that the coulombic efficiency describes the decrease in reversible capacity in successive cycles, $C_k = C_{k-1} CE_k$, where $C_i$ denotes the reversible capacity at cycle $i$ and $CE_i$ is the coulombic efficiency of cycle $i$. Then, assuming that the coulombic efficiency is constant over cycles, one arrives at the following by iterating over cycles from the initial capacity $C_0$: $C_k = C_0 (CE_1 CE_2 \cdots CE_k) \approx C_0 CE^k$, see also (Arachchige, Perinpanayagam, & Jaras, 2017). Hence, they propose a parametric model for reversible capacity in which $\alpha_0$ and $\alpha_1$ are considered model parameters, and CE is also regarded as a model parameter, reflecting that it is difficult to measure CE accurately. This model is compared to a simple empirical model based only on cycle number, $C_k = \beta_0 \sqrt{k} + \beta_1$, and is found to perform better.
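The effect of the constant-CE approximation above can be illustrated with a short numerical sketch. It compares the cycle-by-cycle recursion with the closed-form expression for a hypothetical coulombic efficiency of 0.9998 and a normalized initial capacity of 1.0. These values and the function names are assumptions chosen for illustration and are not taken from the cited papers.

```python
def capacity_by_iteration(c0, ce_per_cycle):
    """Apply C_k = C_{k-1} * CE_k for each cycle in turn."""
    capacity = c0
    for ce in ce_per_cycle:
        capacity *= ce
    return capacity


def capacity_closed_form(c0, ce, k):
    """Closed form C_k = C_0 * CE**k, valid when CE is constant over cycles."""
    return c0 * ce ** k


# Hypothetical values: normalized capacity, CE = 0.9998, 1000 cycles.
c_iter = capacity_by_iteration(1.0, [0.9998] * 1000)
c_closed = capacity_closed_form(1.0, 0.9998, 1000)
```

Both expressions give a normalized capacity of approximately 0.819 after 1000 cycles. This shows how sensitive the predicted fade is to small deviations of CE from one, and why treating CE as a fitted parameter rather than a measured quantity can be reasonable.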

DATA-DRIVEN MODELS FOR RUL PREDICTION
SOH estimation is an important part of battery diagnostics and can inform about the current state of the battery. RUL prediction, on the other hand, projects state of health into the future in a prognostic setting and relies on some assumption on future operating conditions and loads. Hence, even though some methods for SOH estimation can possibly also be used for RUL prediction, different types of approaches focus more specifically on the prognostic part of battery health estimation. Several reviews on RUL for batteries have been presented in the literature (L. Wu et al., 2016; Su & Chen, 2017), and approaches for RUL prediction include electrochemical models (Y. Zhang, Xiong, He, Qu, & Pecht, 2019b), equivalent circuit models (Y. Ma, Yang, Zhou, & Chen, 2019), empirical models (Sarasketa-Zabala et al., 2016), particle filters (L. Chen, Wang, et al., 2020; L. Chen, An, Wang, Zhang, & Pan, 2020), data transformations (Y. Peng et al., 2020), regression models (S. Zhang, Guo, & Zhang, 45; F. Yang, Wang, Xu, Huang, & Tsui, 2020; Wang, & Miao, 2020). Various health indicators are also proposed for use in prognostics of battery systems, see e.g. (Zhou, Huang, Chen, & Tao, 2016; Sun, Hao, Pecht, & Zhou, 2018).
Machine learning methods are used for identifying and predicting knee-points and knee-onset in capacity degradation curves in (Fermín-Cueto et al., 2020). Based on knee-point predictions, the cells' expected cycle lives are estimated and classified as short, medium or long.

Data Availability and Requirements
Data-driven models need training data to learn relationships between input variables and responses, and the availability of data determines both what types of models can be used and the accuracy of the model predictions. Often, training data are gathered by laboratory experiments and used to train a model that can be used in an operational setting. However, if a sufficient amount of operational data is available, it may also be possible to train models based on such data without requiring extensive laboratory testing, as suggested by e.g. (Lucu et al., 2020a, 2020b). The origin of the training data notwithstanding, available training data needs to be of sufficient quality and quantity, sufficiently representative, sufficiently complete and sufficiently relevant in order to train usable data-driven models, and the availability of such data is a crucial prerequisite for relying on data-driven models for battery capacity and state of health estimation.
Also, it is important to understand what type of operational data will be available throughout the lifetime of the battery system. It is safe to assume that data such as various current, voltage and temperature measurements will be available for the SOH algorithms, but temporal and spatial resolution may vary. Furthermore, the reliability and accuracy of derived quantities such as the state of charge will need to be assured. It needs to be determined whether the data automatically collected are sufficient, or if additional specific measurements are required, e.g. periodic tests with set load patterns and fixed conditions, or particular tests such as pulse tests and impedance or resistance measurements. From a practical point of view, it may be desirable to only rely on continuously measured data streams, but results could be improved if additional tests are carried out.
Data quality is a crucial issue for data-driven methods, and results can only be as good as the data allow. Many of the continuous variables will most likely be discretized in both time and value, and additional measurement noise will always be present. This could influence results to different degrees, and some denoising and preprocessing of the data will probably be needed. For example, methods based on ICA/DVA, which rely on the differentiation of discrete signals, will certainly need some type of smoothing to perform well. Hence, proper approaches to preprocessing and denoising of the data signals will need to be considered, as well as the actual data-driven models.
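The need for smoothing before differentiation can be illustrated with a minimal sketch of an incremental capacity (dQ/dV) calculation. It assumes sampled voltage and accumulated charge from a single charging segment, uses a simple centered moving average as the smoother, and skips intervals where the voltage change is too small to divide by safely. The function names, the window length and the threshold are hypothetical choices. More elaborate filters are commonly used in practice.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks symmetrically at the ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    n = len(values)
    smoothed = []
    for i in range(n):
        half = min(window // 2, i, n - 1 - i)
        segment = values[i - half:i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def incremental_capacity(voltage, charge_ah, window=5, min_dv=1e-4):
    """Return (mid-point voltage, dQ/dV) pairs from smoothed signals."""
    v = moving_average(voltage, window)
    q = moving_average(charge_ah, window)
    curve = []
    for i in range(1, len(v)):
        dv = v[i] - v[i - 1]
        # Skip near-flat voltage intervals, where the ratio is dominated by noise.
        if abs(dv) > min_dv:
            curve.append((0.5 * (v[i] + v[i - 1]), (q[i] - q[i - 1]) / dv))
    return curve


# Synthetic example: linear voltage and charge, so dQ/dV is constant (50 Ah/V).
volts = [3.0 + 0.01 * i for i in range(20)]
charge = [0.5 * i for i in range(20)]
ic_curve = incremental_capacity(volts, charge)
```

On real data the window length is a trade-off: a short window leaves quantization noise in the derivative, whereas a long window flattens the peaks that carry the degradation information.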
Additional factors that may be relevant for maritime batteries have not been well studied in the literature, such as the effect of humidity, airborne salinity, vibrations and the constant movement of the ship. Such information may not be available and it should be investigated to what extent such factors influence battery degradation.

Synthetic and Realistic Load Profiles
Some approaches to SOH modelling assume that batteries are used in a controlled way, at near-constant temperatures, with constant charge and discharge C-rates and systematically cycled within a specified voltage range. Indeed, training data obtained from laboratory tests will often be collected under such controlled conditions. However, for maritime battery systems, as well as for many other applications, this is hardly the case, and batteries are typically cycled only partially and under highly variable loads and environments (You et al., 2016).
Charging is often performed with a constant-current constant-voltage procedure, with deterministic rather than stochastic current and voltage profiles in the different steps. Hence, methods that consider features from charging profiles may be preferred to methods relying on discharge features. However, typical charging patterns may vary, and extensive use of partial fast-charging may deviate from normal charging routines performed under very similar conditions. This review has identified several approaches that extract features from partial charging curves, and such features are believed to be useful for estimating SOH for maritime battery systems. However, the effect of dynamically varying temperatures and currents must be taken into account also for features based on partial cycling data, and this may not be straightforward.

Statistical and Machine Learning Models
Some aspects to consider when selecting a statistical or a machine learning model are predictability and interpretability. Typically, more advanced machine learning models are more flexible and can accommodate complicated relationships between the input and output variables and may have higher predictive power. However, such models are often referred to as black box models in the sense that it is difficult to understand the predictions and difficult to interpret the relationship. Furthermore, complicated models may fail to generalize and are more prone to overfitting than more simplistic models.
Another aspect is how to handle uncertainty. Whereas some models provide predictive distributions, most machine learning models only give point estimates. Obviously, estimation of the uncertainty can be useful, but it often comes at a computational cost. Hence, selecting a statistical or a machine learning model for SOH estimation is a trade-off between accuracy, generalizability, interpretability and computational cost.
One aspect of missing data is that data streams will typically not contain capacity or SOH for all data points. Hence, models that can be applied with no or limited labelled data may be needed, indicating that methods from unsupervised or semi-supervised learning could be relevant.

Feature Extraction and Selection
Different modelling techniques require different types of features to explain battery degradation and different training data. For models to be useful it is also important that the selected features will be collected during operation. Hence, there is typically a need for features that can be extracted from data readily available from the battery management system, such as current, voltage and temperature measurements. From such raw data, derived features such as state of charge, number of cycles and rest time at different SOC/voltage levels can also be extracted. This review has shown that there are countless approaches to extracting features, sometimes referred to as health indicators, for SOH modelling, and the choice of features used to train the data-driven models may typically be more important than the actual type of statistical/ML model employed.
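As an illustration of a feature that can be derived from raw current and voltage measurements, the following sketch computes the charge accumulated while the voltage passes through a fixed window during charging. This is only one example of a possible health indicator. The function name, the voltage window and the sample data are hypothetical, and the sketch assumes time in seconds, current in amperes (positive when charging) and trapezoidal integration between samples.

```python
def charge_in_voltage_window(time_s, voltage, current_a, v_low, v_high):
    """Charge in Ah accumulated while voltage lies between v_low and v_high.

    Each sampling interval is assigned to the window by its mid-point voltage.
    """
    total_ah = 0.0
    for i in range(1, len(time_s)):
        v_mid = 0.5 * (voltage[i] + voltage[i - 1])
        if v_low <= v_mid <= v_high:
            dt_hours = (time_s[i] - time_s[i - 1]) / 3600.0
            mean_current = 0.5 * (current_a[i] + current_a[i - 1])
            total_ah += mean_current * dt_hours
    return total_ah


# Synthetic partial charge: constant 10 A, voltage rising from 3.6 V to 4.2 V.
times = [0, 600, 1200, 1800, 2400, 3000, 3600]
volts = [3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2]
amps = [10.0] * 7
feature_ah = charge_in_voltage_window(times, volts, amps, v_low=3.8, v_high=4.0)
```

In this synthetic case two of the six sampling intervals fall inside the window, giving roughly 3.33 Ah. A feature of this kind only requires that the charging segment covers the chosen voltage window, not a full cycle.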

Models Based on Complete Loading History vs. Snapshot Methods
Some of the models reviewed in this paper rely on the whole operating history of the battery cells in order to estimate SOH, whereas others estimate SOH based on brief snapshots. Cumulative damage models and empirical/semi-empirical models relating SOH to number of cycles and other stress factors such as temperature, C-rate and SOC swing are examples of the former. Regression models on features extracted from partial charging curves or incremental capacity curves are examples of the latter. Both approaches have advantages and disadvantages.
Cumulative damage models are attractive, since they can be used to model the accumulated degradation effect from the experienced operational profile. In essence, such models establish a relationship between the load profile or individual cycle and the change in SOH, $\Delta \mathrm{SOH}$. The actual SOH after $n$ cycles can then easily be estimated as $\mathrm{SOH}_n = \mathrm{SOH}_0 + \sum_{i=1}^{n} \Delta \mathrm{SOH}_i$, where $\mathrm{SOH}_0$ is the initial state of health. Moreover, if a future duty cycle can be assumed, such an SOH estimation model can also be used for prognostics and RUL prediction. However, one disadvantage of this approach is that the complete operational profile is needed, from the first to the current cycle. Periods of missing data will effectively render such models inaccurate. A possible remedy could be to impute values for missing data, but this is probably only possible for relatively short periods of missing data.
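The accumulation of SOH changes, and the way a gap in the history affects it, can be sketched as follows. The sketch assumes that a per-cycle change in SOH is already available from some degradation model (negative values for degradation) and represents a cycle with missing data by None. The function name and the numerical values are hypothetical.

```python
def cumulative_soh(soh_initial, delta_soh_per_cycle):
    """Accumulate per-cycle SOH changes.

    Returns a list of (soh, history_complete) tuples, one per cycle.
    A None entry marks a cycle with missing data: its contribution is
    unknown, so all later estimates are flagged as based on an
    incomplete history and will tend to overestimate SOH.
    """
    soh = soh_initial
    complete = True
    trajectory = []
    for delta in delta_soh_per_cycle:
        if delta is None:
            complete = False
        else:
            soh += delta
        trajectory.append((soh, complete))
    return trajectory


# Three cycles, with data missing for the second one.
trajectory = cumulative_soh(1.0, [-0.001, None, -0.002])
```

Here the final estimate is about 0.997, but it is flagged as incomplete because the degradation during the second cycle is unaccounted for. An imputation strategy would replace the None entries with estimated values before summation.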
Methods based on regular snapshots of the data streams are very attractive in the sense that they do not require access to continuous data streams or, alternatively, to accumulated data in the form of histograms or collectives representing the complete operation history. With such models, it would suffice to get batches of data at certain intervals, and if the models are able to reliably extract battery capacity and SOH from such snapshots, the cumulative effect since the previous batch would implicitly be estimated. Thus, if such models are found to perform well enough, they may be the preferred approach for SOH verification of marine battery systems.

SOH Estimation and RUL Prediction
Estimation of state of health (SOH) and prediction of remaining useful life (RUL) of batteries can be considered as two sides of the same coin. SOH estimation aims at describing the current degradation state of the battery, whereas RUL prediction projects future degradation of the battery until it reaches its end of life. Hence, both depend on a method for describing ageing as a function of various factors such as calendar time, cycle time and operating conditions related to temperature, C-rate and SOC levels. However, for RUL there is the additional need to predict future conditions and usage patterns. For battery systems operating under variable loads, this may be challenging and some additional assumptions need to be made. Some of the methods described above for SOH estimation cannot easily be adapted to predict RUL, and all methods based on direct measurements such as Coulomb counting, electrochemical impedance spectroscopy and incremental capacity analysis will be difficult to apply in a prognostics setting. Conversely, other methods will typically be more relevant for RUL prediction than for SOH estimation, for example different time-series models and survival models. Cumulative damage type approaches, where degradation is modelled based on cumulative effects of the load histories, could presumably be adapted and used also for RUL prediction, under some assumed future loading conditions. Also, many of the empirical capacity fade models could in principle be extended to predict remaining useful life of batteries, i.e. to predict when capacity crosses a predefined threshold.
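The extension of an empirical capacity fade model to RUL prediction, i.e. finding when capacity crosses a predefined threshold, can be sketched as follows. The sketch assumes that a fitted capacity model as a function of cycle number is available and that future operating conditions resemble those under which it was fitted. It uses a square-root fade model with hypothetical parameter values, and the function names and the end-of-life threshold are illustrative assumptions.

```python
import math


def remaining_cycles(capacity_model, current_cycle, threshold, max_cycles=100000):
    """Number of cycles from current_cycle until the model first predicts
    a capacity at or below threshold, or None if not reached by max_cycles."""
    cycle = current_cycle
    while cycle <= max_cycles:
        if capacity_model(cycle) <= threshold:
            return cycle - current_cycle
        cycle += 1
    return None


def sqrt_fade_model(cycle, beta0=-0.01, beta1=1.0):
    """Empirical model C_k = beta0 * sqrt(k) + beta1 (normalized capacity)."""
    return beta0 * math.sqrt(cycle) + beta1


# Hypothetical case: battery at cycle 100, end of life at 79.5% capacity.
rul = remaining_cycles(sqrt_fade_model, current_cycle=100, threshold=0.795)
```

With these illustrative parameters the threshold is first crossed at cycle 421, so the predicted RUL is 321 cycles. A simple search is used instead of an analytical inverse so that the same function works for any capacity model, including ones without a closed-form solution.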
The duty cycles and operating conditions of maritime battery systems will typically be unpredictable and depend on weather and sea state conditions, loading conditions and possibly different voyage lengths and routes or different operations. However, one plausible assumption could be that past operating history is representative of the future. This approach was suggested in (Nuhic et al., 2013). However, as pointed out in (Severson et al., 2019), degradation mechanisms are typically nonlinear, and degradation during early-life cell cycles may not be strongly correlated with degradation patterns in later cycles (Harris, Harris, & Li, 2017).

Cell vs. Module vs. Pack level
When establishing diagnostics methods for state of health estimation, one needs to consider whether to apply these at cell, module or pack (string) level. The heterogeneity of the cells within a module or a pack poses a challenge, since cells within a battery system will typically not degrade uniformly. Hence, methods to identify cell differences are relevant.
SOH estimation at cell level could be aggregated to pack level. For example, for cells connected in series with passive equalization, the available capacity of the entire string will be determined by the capacity of the single cell with the minimum capacity, and for series-connected cells with active equalization, the available pack capacity is given by the average of the cell capacities (Cordoba-Arenas, Onori, & Rizzoni, 2015). For parallel-connected cells the available capacity will be given by the average cell capacity times the number of cells. However, earlier studies have shown that battery pack lives are typically shorter than single cell lives due to other degradation mechanisms (Y. Zheng, Ouyang, Lu, & Li, 2015). Even though most studies focus on single cell data, several papers address SOH estimation of battery modules and packs, see e.g. (Diao, Jiang, Zhang, Liang, & Pecht, 2017; Dubarry et al., 2019).
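The aggregation rules described above can be written out as a small sketch. It assumes that individual cell capacities in Ah are known and implements only the three idealized cases mentioned in the text. The function names and the example values are hypothetical, and the sketch ignores the additional pack-level degradation mechanisms noted above.

```python
def series_passive_capacity(cell_capacities_ah):
    """Series string with passive equalization: limited by the weakest cell."""
    return min(cell_capacities_ah)


def series_active_capacity(cell_capacities_ah):
    """Series string with active equalization: average of the cell capacities."""
    return sum(cell_capacities_ah) / len(cell_capacities_ah)


def parallel_capacity(cell_capacities_ah):
    """Parallel connection: average capacity times number of cells (the sum)."""
    return sum(cell_capacities_ah)


# Three hypothetical cells that have degraded unevenly.
cells = [50.0, 48.0, 46.0]
passive = series_passive_capacity(cells)
active = series_active_capacity(cells)
parallel = parallel_capacity(cells)
```

For these three cells the series string delivers 46 Ah with passive equalization and 48 Ah with active equalization, whereas the parallel connection delivers 144 Ah. The example shows how a single weak cell dominates the passive case.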

Effect of Battery Chemistry
Modelling approaches for a range of different battery types and chemistries have been reviewed, without much emphasis on what type of batteries the various methods have been applied to. It has tacitly been assumed that the data-driven methods are, in most cases, agnostic to battery chemistry, and that different chemistries can be handled by changing the model parameters or re-training models with appropriate training data. However, it should be noted that some methods may not be easily transferred to other battery chemistries, so care should be taken when selecting a modelling approach for a particular battery type. For example, it is generally known that for lithium-iron-phosphate (LFP) batteries, there is a flat plateau in the SOC-OCV curve that renders voltage-based algorithms and incremental capacity analysis difficult to apply to such batteries (Z. Deng et al., 2016).

Verification and Validation
One important question for data-driven SOH estimation methods is to what extent they can be verified and validated to perform satisfactorily for the intended battery system. This may require a standardized platform and extensive testing data from actual degrading batteries, and the models would need to be verified and validated specifically for each case.
It is noted that some particular methods may be prone to systematic under- or overestimation of actual capacity. For example, SOH estimation based on Coulomb counting of partial cycles, a technique that is utilized for maritime battery systems today, is likely to underestimate the capacity fade and thereby overestimate the actual capacity (Stroe et al., 2018). Such systematic biases for specific approaches are important to understand and account for in order to obtain reliable estimates of state of health and capacity.

SUMMARY AND CONCLUSION
This paper has presented a thorough literature review of recent publications on data-driven state of health and capacity modelling of lithium-ion battery systems. More than 300 scientific papers have been reviewed and it is believed that this review gives a fair overview of the current state of the art in data-driven SOH modelling.
Data-driven methods for SOH modelling can be categorized into a few groups of approaches, i.e. direct measurement techniques, state-space models with observers, regression type models, time-series models, survival type models, cumulative damage models and empirical/analytical models. However, the distinction is not crisp, and several types of approaches are often combined. Some of these approaches are deemed more relevant for maritime battery systems than others. One desired feature is that a method only needs information contained in normal operational data. For example, time-series models require time-series of capacity measurements that will not be available, and survival type models need extensive lifetime data that cannot be expected to be available. State-space models, either electrochemical models or equivalent circuit models, are typically used in BMS for SOC and SOH estimation. However, from a class perspective the aim is to develop means for independent verification of the capacity/SOH estimation made by the BMS, and it may therefore be advisable to consider alternative modelling approaches. Hence, it is believed that a combination of direct measurement techniques, regression models, empirical models and cumulative damage models will be most relevant.
Direct measurement techniques include some approaches that require particular hardware and might not be suitable for online verification of SOH estimation. Moreover, direct capacity estimation based on Coulomb counting requires specific reference charge and discharge cycles, under specific conditions which will not be observed during normal operations. However, techniques based on partial charge or discharge information could be useful and will be explored further.
A large number of regression type models, ranging from simple linear regression models, to empirical/analytical models, to highly complex machine learning models have been proposed, establishing a relationship between capacity and different features extracted from the data. Perhaps more important than what type of regression model to use is the selection of features to use. Two fundamentally different approaches can be taken, herein referred to as snapshot and cumulative approaches. The cumulative approaches establish a relationship between various stress factors and capacity degradation, ∆C, whereas the snapshot approach establishes a relationship between observed features and actual capacity, C.
One disadvantage of cumulative models is the need for the full operating history of the batteries. If parts of the history are missing, it will not be possible to estimate actual capacity at a particular time. A huge advantage of snapshot models is that capacity can be estimated based on only parts of the continuous data stream. This is believed to be a very promising feature of a method employed for regular verification of online capacity estimation. However, such methods may require higher temporal resolution in the data in order to extract the necessary features. If such models can be reliably established, SOH can be verified based on regular batches of data. However, challenges remain with respect to how the influence of temperature, variations in state of charge and current can be incorporated into the models. Many challenges are still unresolved, and many of the reviewed papers explicitly state that the problem of online state-of-health estimation of lithium-ion batteries is far from being solved. Nevertheless, this paper gives some directions for further research on data-driven estimation of battery state of health, for the purpose of verifying capacity on maritime battery systems.

ACKNOWLEDGMENT
This work has partly been carried out within the DDD BATMAN project, supported by MarTERA and the Research Council of Norway.