Data‐driven snapshot methods leveraging data fusion to estimate state of health for maritime battery systems

The number of fully electric or hybrid ships relying on battery power for propulsion and maneuvering is growing. In order to ensure the safety of these ships, it is important to monitor the capacity that can be stored in the batteries, and classification societies typically require that this can be verified by independent tests—annual capacity tests. However, this paper discusses data‐driven alternatives based on operational sensor data collected from the batteries. There are different strategies for such data‐driven state of health (SOH) estimation. Some approaches require full operational history of the batteries in order to predict SOH, and this may be impractical due to several reasons. Thus, methods that are able to give reliable estimation of SOH based on only snapshots of the data streams are more attractive from a practical point of view. In this paper, data‐driven snapshot methods are explored and applied to degradation data from battery cells cycled in different laboratory tests. Hence, data from different sources are fused together with the aim of achieving better predictions. The paper presents the battery data show how relevant features can be extracted from snapshots of the data and presents data‐driven models for SOH estimation. It is discussed how such methods could be utilized in a data‐driven classification regime for maritime battery systems. Results are encouraging, and yields reasonable degradation estimates for nearly 40% of the tested cells, although the fusion of data from different laboratory tests did not improve results significantly. Results are greatly improved if data from the actual cell is included in the training data, and indicates that better results can be achieved if more representative training data is available.

Electric or hybrid ships using batteries have been an increasingly attractive alternative for many shipping segments, most notably ferries and offshore supply vessels, with significant environmental benefits and large potential for fuel, cost, and emission savings.Moreover, electrification of the ship fleet is completely aligned with societal and regulatory ambitions for emission reduction and a change to more environmentally friendly technologies for maritime transport.
The maritime industry has always been concerned with safety at sea, and the safety of battery-powered ships is no exception.The risk of fire and explosion are obvious for battery-powered ships, and these are controlled by risk mitigation measures and safety regulations.Another central aspect of the safety of electric ships is to ensure that the available energy stored in the batteries is sufficient to cover the demand for safely operating the vessel.Loss of propulsion power in a critical situation can lead to collision or grounding accidents with potentially severe consequences.Therefore, a reliable estimation and prediction of the actual available energy of a maritime battery system are crucial.
Battery systems are ageing, meaning that the energy storage capacity degrades by calendar time and by charge/discharge cycles.This degradation affects both the amount of charge that can be stored in the battery as well as the power that can be delivered.Battery degradation also affects fire safety and thermal runaway properties, 1,2  but this paper focuses on data-driven methods for state of health (SOH) estimation.Thus, the monitoring of the degradation of capacity for maritime battery systems based on sensor data will be addressed in this paper.
Classification societies typically require annual capacity testing for ships utilizing batteries for propulsion or maneuvering in order to ensure that the estimated SOH estimated by the battery management system (BMS) is accurate and reliable. 3-5There are some challenges with this approach, however, and data-driven methods to predict SOH is believed to be an attractive alternative if they can be demonstrated to work satisfactorily.From a practical point of view, the annual capacity test is time consuming and typically requires that the ship is taken out of normal operation for the duration of the test.Moreover, the accuracy of the test is questionable due to several factors influencing the results, such as variability in loads, temperatures, and depth of discharge (DOD).Maritime battery systems are typically designed for a 10-year lifetime while ships are normally designed for 25-30 years.Hence, the ship will typically outlive the onboard battery system, which may need to be replaced.When battery systems are approaching their end of useful life reliable estimation of SOH will become increasingly important, both from a safety point of view, but also from pure economical considerations.
A recent literature survey on data-driven models for SOH estimation presented an overview of various approaches and grouped them into a few generic categories. 6One important distinction that was made is between cumulative methods and snapshot methods.Other approaches to capacity monitoring include direct measurement techniques and state-space models with observers.
Cumulative methods refer to methods that rely on the full loading history of the batteries in order to predict current SOH.Such methods can relate information such as number of equivalent full cycles (EFC) or the total energy throughput the battery has experienced, combined with other stress factors such as temperature, Crate, and variations in state of charge (SOC) to maximum available capacity.In essence, such methods can model the accumulated degradation by establishing a relationship between the individual cycles and the change in SOH, that is, ΔSOH.The actual SOH after n cycles can then be estimated as the cumulative sum of such differences, that is, SOH n ¼ SOH 0 þ P n i¼1 ΔSOH i , where SOH 0 is a known initial capacity; typically 100%.Although this is an attractive approach, with potentially accurate and reliable results, it has some challenges.For example, the full operating history of the batteries is needed, and it will be challenging to handle large data gaps.For maritime battery systems onboard ocean-going ships, it may be difficult to guarantee uninterrupted data streams throughout the lifetime of the battery system.Moreover, for very large battery systems, the amount of data can easily be enormous, putting strict requirements on ship-to-shore connectivity, storage, and computational capacity for the data-driven models.One example of a cumulative data-driven method for SOH prediction is the battery.aitool 7 ; for other examples, see Ref. [8][9][10][11][12].
Snapshot methods, on the other hand, refer to methods that can make SOH predictions from just brief snapshots of the data without requiring the full cycling history.Such methods can utilize regression models where for example features extracted from partial charging or discharging curves or incremental capacity curves are used as covariates.Some examples of methods that exploit such features can be found in, for example, Ref. [13][14][15][16].It is noted that there are several challenges with this approach as well, and it may not be straightforward to account for the effects of varying conditions on the charge/discharge curves, for example, varying temperatures and current rates.Notwithstanding, with such models, it would be sufficient to receive batches of the data at regular intervals, which would be much more practical from a third-party verification point of view.Hence, if it can be demonstrated that such methods perform well enough, they may be the preferred approach for independent verification and validation of onboard SOH prediction routines.Such models include various regression-type models that would need to learn the relationship between the extracted features and SOH from a training dataset.The simple method proposed in Ref. [17] which is a linear regression model based on Coulomb counting and accounting for measurement uncertainties, do not need training data, however, but it is heavily dependent on accurate SOC estimation.In this paper, snapshot methods based on the raw data of measured currents, voltages, and temperatures rather than derived quantities such as SOC will be explored and applied to a novel dataset from laboratory battery cycling tests.This paper extends the approach presented in Ref. [18] which considered only one dataset from one set of experiments, by fusing data from different laboratory tests together in order to enlarge the training data set.Hence, all the results in this paper are new, and the fusion of data from different lab experiments is new compared to what was presented in Ref. [18].
The specific objectives of this paper are to establish data-driven snapshot methods for SOH estimation of maritime battery system and to investigate how they perform on cycling data for particular cell types.This is done by extracting relevant snapshot features from the data and exploring various regression models, from simple linear models to somewhat more complicated statistical models.The overall objective is to identify methods that are reliable enough to replace annual capacity tests for third-party verification of battery SOH within the context of ship classification.

| LABORATORY BATTERY CYCLING TESTS
Two different types of battery cells have been subject to cycling tests at both Fraunhofer's and Corvus' laboratories in order to generate degradation data.Two types of cylindrical 18 650 cells, that is, energy cells (henceforth denoted DDE or E) and power cells (henceforth denoted DDP or P), have been cycled according to specified test matrices, that is, within specified lower and upper voltage limits, with specified charge and discharge current rates, and at specified controlled temperatures.Varying these parameters for different cells yields different degradation rates.
In the experiments, the battery cells are cycled continuously according to these specifications, interrupted at regular intervals to perform check-ups and capacity measurements.These check-ups include pulse tests and charge and discharge capacity measurements by way of Coulomb counting over a deep cycle at low current rates.Hence, there will be observations of capacities at certain points in time for all cells.This is illustrated in Figure 1, which shows the measured capacities from these test procedures as a function of the number of EFC for the two types of cells and from the two laboratories.As can easily be observed, the degradation of the cells varies considerably according to how they have been cycled.It is noted that the cells in this experiment have been charged according to a constant-current-constant voltage (CCCV) scheme: the cells are charged with constant current until the cut-off voltage, where the cells continue to charge at constant voltage with a current that gradually decreases toward zero.However, the discharge procedures have been different in the two laboratories.At Fraunhofer's lab the discharge has also been with CCCV, whereas the discharge at Corvus' lab has been performed with constant power rather than constant current.Hence, the battery cells have been discharged somewhat differently, and therefore, only features from the charging curves have been extracted for use in this study.This is different from the study presented in Ref. [18] where also discharge features were included, but where data from only one lab experiment were used.

| DATA DESCRIPTION
Values of current, voltage, and temperature are sampled continuously, resulting in high-resolution time-series of these variables throughout the experiments.From these raw measurements, different derived variables can be calculated as well, such as cumulative throughputs, cycle counts, and EFC.Examples of time series of the raw measurements of current, voltage, and temperature are shown in Figures 2 and 3 for arbitrary cells from the two laboratory tests.According to the test-matrix, the cell from Frunhofer should be cycled between SOC = 50% and 10%, corresponding to voltage limits of approximately 3.70 and 3.23 V, respectively, with discharge and charge C-rates of 0.75 and 0.2, respectively, and at a temperature of 22 C.The regular cycling and the check-ups are easily discerned in the figure.It can also be observed that the cycling has been interrupted at certain times during the experiment.The cell from Corvus should be cycled between 3.6 and 4 V with a C-rate of 1.5 and at a temperature around 25 .Again, the regular cycling and capacity measurements are easily identified in the figures, and it can also be observed that for this particular cell, there seems to be some issues with the temperature measurements.
Similar measurements are collected from a total of 65 individual cells from the Fraunhofer experiments; 35 DDE cells and 30 DDP cells and 75 cells from the Corvus experiments; 29 energy cells and 46 power cells.This constitutes the two datasets that have been fused in this study for establishing data-driven models for state-ofhealth and capacity estimation.It is noted that data for some cells were discarded due to, for example, data quality issues or too short time-series, rendering data from 102 cells in the final dataset.

| Extracting charge and discharge curves
In order to establish data-driven snapshot methods for the prediction of SOH, the data need to be preprocessed, so that selected features can be extracted from the raw time series.First, different filters are applied in order to extract the charge and discharge curves from the regular cycling part of the data.This is done by first identifying where in the time-series regular cycling starts and ends, and then filtering out individual charge and discharge half-cycles by examining the currents and when the current changes sign.Additional checks and filters are applied in order to remove spikes and correct for erroneously categorized data.The individual charge and discharge cycles within each regular cycling period are then numbered, starting at 1 for the first cycles following a capacity measurement.In this study, features have then been extracted from the second charge curves after such a check-up.This choice is made based on two considerations: one wants features that are as close to known capacities as possible (i.e., close in time and EFC to the capacity measurements) and features far enough away  from the check-ups so that the transient effects of the pulse-testing on the battery cells have diminished.Considering the second set of cycles after a test is a trade-off between these two considerations.Examples of extracted charge and discharge curves for an arbitrary cell are shown in Figure 4.The different colors correspond to different regular cycling periods.The measured capacity preceding each period of regular cycling is also indicated in the figures.
Some interesting observations can be made from these plots.First, it is clearly seen that the charge and discharge curves change as the battery degrades.From the figure to the left, where only the second cycle after each test is shown, it can be seen that the curves change notably.From the middle figure, it is observed that also within a regular cycling interval, the charge and discharge curves change gradually.It is also observed that some of the charging curves behave slightly differently and do not start at the same voltage level.This happens to be from the first charge cycles, immediately following a check-up, and this supports the choice of using the second cycles.
When the individual charge and discharge half-cycles have been extracted from the time series, they need to be matched with results from the corresponding capacity tests.Results of this matching are illustrated in Figure 5, where the matching is illustrated in terms of both EFC and time for two arbitrarily selected cells.Vertical red lines indicate the EFC/time of the capacity measurement and blue vertical lines represent the EFC/time of the second charge cycle.It can be observed that the cycles and the capacity measurements are close in both EFC and time in most cases.In some cases, where the cycling has been interrupted for some reason, there might be some time between the test and the cycle, but they will still be close in EFC.At any rate, for the purpose of this study, it will be assumed that the capacity from the preceding capacity test is approximately the same as the actual capacity during the second charge/discharge cycle.

| Extracting Features from the charge and discharge curves
The next step is to extract particular features from these curves.Several alternative features can be used, and in this paper, features related to the current rate, temperature, and energy throughput between selected voltage ranges will be utilized.That is, for each selected cycle, the mean, minimum, and maximum temperature as well as the mean current are used as covariates.Moreover, the total energy throughputs between voltage ranges in steps of 0.1 V, as illustrated for an arbitrary cell in Figure 6, are used as additional explanatory variables (a similar idea was explored in 19 ).Simple linear interpolation has been used to estimate the cumulative throughput at the voltage limits (in terms of Ampere-hours [Ah]).It is noted that even though the constant-voltage phase of the charging cycles might contain information that can be related to the degradation state of the cells, features have only been extracted from the constant-current phase in this study.One reason for this is that these are the features deemed most likely to be found in data from battery systems in actual operation onboard ships.This is illustrated in Figure 6, where the black line represents the complete charge and the red points correspond to measurements during the constant current part, from which the voltagebased features are calculated.Other features suggested in the literature, include features from derivative curves and from probability density functions from time spent in different voltage ranges, see, for example, Ref. [13][14][15][16] and feature extraction is also discussed in Ref. [20][21][22].
In this way, a set of snapshot-based features are extracted from the time-series data and simple prediction models will be trained on these to estimate the capacity and subsequently the SOH of the battery cells.In total up to 19 features are collected from the charging curves and the overall dataset of extracted features contains 330 samples for the energy cells and 401 samples for the power cells.However, it should be noted that not all cells have information for all covariates.The various cells have been cycled between different voltage limits, and therefore have different subsets of the voltage-based features.Hence, the feature matrix is sparse, and this represents an additional challenge.It means that the effective number of samples available for training is reduced, and at different degrees for the different cells.For most of the models presented in the following, only complete cases are used for training.This means that values are needed in the training data for all covariates that are relevant for the cell to be predicted.This will be further elaborated in the analysis and results section of this paper.

| DATA-DRIVEN MODELS
A number of rather simple statistical models are employed in this study to predict the capacity of the battery cells based on snapshot features.The amount of training data is not sufficient to train more complicated machine learning models such as neural networks and deep learning.Moreover, simple prediction models have the advantage that they are more easily interpretable, and that they are less prone to overfitting.At any rate, the following data-driven models are explored in this study: F I G U R E 6 Energy throughput (Ah) across selected voltage ranges are calculated and used as explanatory variables (left), together with mean temperature (middle) and current (right).
• Linear regression (Linear) • Linear regression with missing covariates (Miss) • Generalized additive models (GAM) • Ridge regression (Ridge) • Least absolute shrinkage and selection operator regression (Lasso) • Multivariable fractional polynomial regression (MFP) Reference is made to standard textbooks for full mathematical descriptions of the various models, and in the following, a crude qualitative description will be given.
Linear regression models assume a linear relationship between the response variable y (capacity in this case) and the individual covariates where the β's are regression coefficients estimated from data and ε is an error term, typically assumed zero-mean Gaussian.Standard linear regression requires data for all relevant covariates.However, the linear regression with missing covariates model tries to account for missing covariate values by imputation: values of missing covariates can be predicted based on the available data and these predictions can be used in the regression model.Generalized additive models are predictive models based on fitting nonlinear functions to each individual covariate, typically using splines.Hence, more flexible relationships between the response variable and the covariates can be found.That is, the linear terms β i x i in equation ( 1) is replaced by terms of the form f i x i ð Þ, where f i can be nonlinear functions.
Ridge regression and lasso are linear regression models, where additional regularization constrains are applied in order to shrink the regression coefficients and avoid overfitting.The difference between ridge and lasso is how the penalty term is defined: in ridge regression, the penalty term is proportional to the square of the magnitude of the coefficients whereas for lasso the penalty is proportional to the absolute sum of the coefficients.Multivariable fractional polynomial regression is again similar to linear regression, but searches for an optimal polynomial transformation of the individual covariates.Such methods have previously been applied to battery data in Ref. [23].
Regression trees represent a different way of obtaining a prediction rule for the response variable.The input covariate space is subdivided into several nodes by making splits for selected covariates.Simple prediction models are then applied within each terminal node, typically the nodal mean.Such models are very flexible, but may be prone to overfitting.A random forest is a model that combines several regression trees, trained on different subsets of the data, and combines them into a random forest that makes predictions based on the ensemble of trees.This is normally found to reduce the risk of overfitting from individual trees.
Finally, support vector regression is based on finding a hyperplane in a higher dimensional space that can be used to make predictions.So-called support vectors are used to find this and are the data points closest to the hyperplane.For further details on these regression models, reference is made to, for example, Ref. [24].
In addition to these regression models, two types of ensemble predictions will be calculated.The first is simply the ensemble mean, that is, the mean prediction from all the individual models, and the second is a weighted ensemble prediction, where the weighted average of the individual predictions is calculated and the weights are calculated based on the relative root mean square error (RMSE) of the individual predictors.That is, the models that obtain a lower RMSE on the test data will obtain a higher weight.
It is also noted that for some of the cells, the amount of training data was insufficient to train some of the more complicated models such as the GAM and MFP models.In these cases, the models that cannot be estimated are replaced by the linear model (which is a special case of both the GAM and the MFP model).

| ANALYSIS AND RESULTS
The various regression models described above, as well as the ensemble models, are applied to predict the actual capacity of all the 102 energy and power cells for which cycling data are available.For each cell, two sets of predictions are made.First, all the data from similar cells (energy or power) are used as training data to fit the models, and these are then used to predict the capacity for the cells.Note that even though data from all similar cells are used for training, the training data will not be identical for each cell, since they have different sets of explanatory variables.Only the voltage ranges that are relevant for the cell in question are included in the training data.In the second set of predictions, data from the cell that is to be predicted is removed from the training data.This gives out-of-sample predictions that are more comparable to predictions on real operational data.
When applying the various regression models on data from all the cells it turns out that results are quite good for most of the cells.However, when data from the cell itself is removed from the training data, results are more varying.Predictions might be reasonably good for some cells but not so for many others.For some cells, some of the models perform well, whereas others might give poor predictions for the same cells.This is the case for both DDE and DDP cells, and there are no clear trend as to what models perform best (this will be further elaborated below).Some examples of predictions that were good for most of the models are shown in Figure 7.For each of the cells, predictions are compared with observed capacities for both the case where data from the cell itself were part of the training data and the case where the models were trained on data from other cells only.In both cases, error metrics in terms of RMSE and mean error (ME) are included in the plots.As can be observed, whether data from the actual cell are part of the training data or not has a significant impact on the prediction accuracy.Figure 8 shows some examples where some models yield reasonable results but where other models give poor predictions, and Figure 9 shows some examples where predictions were generally poor across models.
Similar results are obtained using the same predictive models but with only a subset of all the features described above.For example, analyses have been made where only the mean temperature is included, disregarding the temperature variation described by the minimum and maximum temperatures experienced during the charging and discharging.Another reduction in features that has been explored is to remove some voltage ranges, for example, the first and last voltage ranges in the charge and discharge curves.Although the numerical predictions vary from case to case, the overall observations remain the same, that is, the simple snapshot models are F I G U R E 7 Data-driven predictions based on snapshot features with reasonable accuracy for most models.
F I G U R E 8 Data-driven predictions based on snapshot features with reasonable accuracy for some models and poor for others.able to predict well the degradation for some cells, but not for all.

| Performance evaluation
The results above illustrate that the simple predictive models based on snapshot features may and may not perform satisfactorily for an arbitrary cell.In order to evaluate the model performance, there is a need to decide what level of accuracy is acceptable from a practical point of view.In current classification rules for electric ships, 25  it is stated that the annual tests should be within ±5% of the value presented by the BMS.Hence, it may be assumed that an error around 5% from data-driven models would be acceptable.Thus, for an energy cell with nominal capacity around 3.5 Ah, an RMSE below 0.175 Ah could be regarded as acceptable.Similarly, for a power cell with a nominal capacity around 2.5 Ah, an RMSE value below 0.125 Ah could be deemed acceptable.

| Performance across cells
One may want to evaluate the performance of the datadriven snapshot methods across the different cells, for example by calculating the average RMSE from the different predictive models.Then one could compare the average performance for each cell with the test parameters-voltage ranges, C-rates, and temperaturesof the cells to see which influence the results most.This is illustrated in Figure 10, for both the energy and power cells.The orange markers present the average RMSE for each cell when the data from that cell is included in the training data and the red markers present the results when data from that cell have been excluded from the training.The vertical bars correspond to the voltage range the cells have been cycled between (right axis), and the two colors of each vertical bar correspond to the Crate (leftmost color) and temperature (rightmost color) of the different cells, respectively, as indicated by the color legends in the plots.
Results from the models presented in this study indicate that 22 of the 49 energy cells (45%) have RMSE < 0.175 and that 16 of the 53 DDP cells (30%) have RMSE < 0.125.However, if the cell to be predicted has been included in the training data, then these ratios increase to 49/49 (100%) for the energy cells and 46/53 (87%) for the power cells.Although it is obviously not good enough that predictions are reasonably accurate only for some of the cells, it is encouraging to observe that the rather simple predictive models based on a few snapshot features perform acceptable on nearly half of the cells.Moreover, if the training data include data from the actual cells to be tested, the simple snapshot methods perform acceptable for more than 93% of the cells, and in fact for all of the energy cells.This indicates that the simple models are indeed able to model the dependencies between these covariates and the capacity of the cells, provided sufficient representative training data are available.

| Performance across models
One may also try to evaluate the performance across predictive models by comparing the average RMSE for all cells, as well as counting the number of times a particular model performs best and when it performs worst.This is summarized in Table 1.It is observed that there is no F I G U R E 9 Data-driven predictions based on snapshot features with poor accuracy for most models.clear winner.Looking at the mean RMSE for the results obtained when the predicted cell is included in the training data, it is observed that most models perform quite well.However, when these data are excluded from the training, all models do considerably worse, and the best candidate models (ignoring the ensemble predictions) F I G U R E 1 0 Average RMSE and test parameters for each cell.
T A B L E 1 Performance across models; average RMSE for all cells; number of times a model performs best and worst.

Linear
Missing would be regression tree, random forest, ridge regression, and lasso.Considering results not trained on the predicted cell only, it appears that the regression tree is the model which most often performs best.However, this also turns out to be one of the models that most often performs worst.On the other hand, ridge regression and lasso never perform worst, but also rather seldom perform best.Overall, it is difficult to select one model that performs best overall, and results vary considerably between cells.Possibly, an ensemble method could be regarded as the best approach, but this is also very sensitive to very wrong predictions from individual models.Indeed, the very high average RMSE for the MFP model is from a very few predictions that are completely wrong, probably as a result of extrapolating a polynomial fit.

| Importance of representative training data
One explanation for the varying results from the snapshot methods applied in this study could be the limited training data that have been available.Since the various cells have been cycled differently, the data for the different cells do not contain the same covariates, and only the ones relevant for the cell to be predicted could be included in the models.This leads to varying amount of training data for the different cells, and may in particular lead to lack of representative training data for some cells.For example, some cells have been cycled between 2.5 and 4.2 V and contain all voltage ranges, corresponding to 17 charge-based features.For these cells, only data from other cells that include all voltage ranges can be included in the training set, and this significantly reduces the amount of training data.In one example, this means a reduction in training samples from 261 to 44 complete cases.For cells cycled over a narrower voltage range, the number of features is reduced, and the amount of training data is effectively higher.For example, one cell cycled between 3.85 and 4.12 V contains 6 covariates with an effective training set of 96 complete cases; another cell cycled between 3.22 and 3.72 V contains 8 covariates with a training dataset of 86 samples.In all cases, the limited amount of representative training data is a possible explanation for varying results, and one possible remedy could be to enlarge the dataset by doing more laboratory tests.The sparsity of covariates in the training data is the main motivation to try out the missing covariate model.With such a model, rather than disregarding all samples without the full set of covariates the idea is to also use this additional information by imputing values for the missing covariates based on the existing ones.However, as it turns out, this is not successful for all cells, and this model often turns out to perform worst.This is presumably due to the number of missing covariates in many cases.If only one or a few covariates are missing in a sample, it may be possible to accurately impute the missing value.However, in many cases in this dataset, a large portion of the covariates are missing in many samples, and it is unrealistic to believe that one should be able to impute them all accurately.In some examples, the number of missing covariates is similar to the number of existing ones.
The importance of representative training data is also clearly seen by looking at the few duplicate cells.In the experiment, a few pairs of cells have been cycled according to the same test parameters.Hence, even though the data from the cell that are to be predicted are removed from the training set, data from another cell that have experienced a similar load pattern are available.In this study, duplicate cells are DDE057/DDE061, DDE068/ DDE072, DDP030/DDP031 and DDP038/040, and for all of these cells all the individual models perform very well even when the cell itself is not included in the training.This can also be seen by studying Figure 10.Again, this highlights the importance of representative training data to obtain good predictions.

| Possibilities for improvements
Although it is overall encouraging that the data-driven snapshot methods perform well for many of the battery cells, it is obviously not satisfactory that they fail to predict accurately for all cells.In the following, some ideas of how to improve the results are discussed.

| Extending the training data
The obvious solution for improving the data-driven methods is to extend the training data.As presented above, if a cell that has experienced similar loading history is included in the training data, as is the case with the duplicate tests, the models perform very well.Hence, this paper has fused data from different laboratory tests together in order to extend the training data.Nevertheless, even with a combined dataset from two laboratory tests, results could possibly be improved by adding even more training data from other tests, if available.
Another approach could be to, rather than expanding the cycling tests, to focus testing more on the data needed for training snapshot methods and to design the experiments somewhat differently.Typically, tests are performed according to a static test matrix, where one wants to investigate how different stress factors such as varying temperature, C-rate, and DOD influence the degradation.This will typically be features in cumulative damage models and such tests are useful for training such models.However, for snapshot methods, the features will typically not be these stress factors but will be extracted from the charge and discharge curves.In this case, it may not be necessary that the cells are cycled according to static test parameters, and it will be more important to ensure that all cells have a full set of covariates, for example that they are experiencing the same voltage ranges.One could assess which voltage ranges the battery system is expected to experience most frequently during normal operation, and focus on these voltage ranges, for varying temperatures and current rates, in the experiments.If all charge and discharge cycles contain these selected voltage intervals, the training data will be richer and could possibly lead to better predictions from snapshot methods even without extending the number of tests.One could also apply more dynamical loading in order to better explore the variation in the other covariates with a limited number of tests.Looking at the voltage ranges for the various cells indicated in Figure 10, it is obvious that it is not possible to find a subset of the voltage ranges that is included in all the cells for the available dataset.Thus, a lot of data are disregarded for each cell, and this way of testing is wasteful if the data are to be used for snapshot methods.

| Additional features
The models established in this study are based on a small set of covariates extracted from the measured time series of current, temperature, and voltage.In addition to mean current and mean, minimum and maximum temperatures, covariates related to the energy throughput at selected voltage intervals are the main explanatory variables.These have been extracted directly from the charge and discharge curves.It is possible that improved predictions can be achieved by using different or additional features.One example could be to try to extract features from derivative curves, such as dQdV or dVdQ curves.Such derivative curves are known to exhibit peaks at certain voltage levels, which may change in both location and amplitude as the battery cell degrades, and could be possible features in data-driven models.Examples of such derivative curves are shown in Figure 11, and it is clearly observed that the curves change character as the cell degrades.However, there is a lot of uncertainty associated with obtaining such curves, for example, related to the differentiation of noisy, discrete measurements as discussed in Ref. [26] and even though there are several approaches to estimate such curves, 27-30 it is far from straightforward to obtain smooth curves with welldefined properties.Moreover, the information extracted from the charge and discharge curves would undeniably contain the same information as the derivative cures, and it is not obvious that predictions would be improved by including additional features from the derivative curves.
Another way of extracting features from the charge and discharge curves is to fit parametric functions to the curves, and use the parameters as features.Two examples are polynomial or piecewise linear curves.However, it is not straightforward to obtain good fit, and for example with polynomial functions, one would typically require rather high-order polynomials to fit the curves well.

| Additional data preprocessing, finetuning, and different predictive models
Careful preprocessing of the data has been performed in order to extract the charge curves from the raw timeseries data and subsequently to extract the features from these curves.A number of filters have been applied to remove spikes in the data and to obtain error-free training data.However, more careful preprocessing could possibly improve the data quality and remove residual errors in the data.Examples of such errors are that only parts of the charge cycles are included in the data or that charge curves could be associated with wrong capacity measurement.It is always possible that such noise can be removed by additional preprocessing and yield improved results.
It is also possible to more carefully fine-tune the predictive models.Some of the models include several hyperparameters that needs to be specified.For many of the methods applied in this paper, hyperparameters are selected by cross-validation and grid-searches, but even more optimal values can possibly be found.Yet another option to explore is to use different predictive models.In this study, several simple data-driven models are employed, and more complicated machine learning models such as neural networks and deep learning models have not been tried out.However, it is believed that with the limited amount of training data available, it will be difficult to fit more complicated models, and even the more complicated models used in this study, most notably the GAM and the MFP models, are found to sometimes have difficulties.
Hence, notwithstanding several possibilities for improving the predictions by adding more features, more preprocessing, fine-tuning, and exploring different models, it is believed that the most important potential for improvement in this study is related to the available training data.Hence, further efforts will be directed toward integrating results from other tests in order to extend the training data.

| Snapshot methods for data-driven diagnostics of ships in operation
The objective of this study is to explore and establish data-driven models for diagnostics and SOH prediction of the maritime battery systems.If such methods can be demonstrated to yield reliable results, they may be used to replace annual capacity tests as a class requirement for electric and hybrid ships.In this context, snapshot methods are believed to be a much more attractive solution compared to cumulative damage models, which would require data from the whole operational history of the ship.This is difficult to guarantee for ships at sea, with possible intermittent and limited ship-to-shore connectivity and data storage capabilities.Moreover, for cumulative methods, it might be difficult to account for maintenance and possible replacement of individual modules in the battery systems.Snapshot methods, on the other hand, would only require mere snapshots of the data streams at regular intervals.
In addition to requirements on prediction accuracy and reliability, for snapshot methods to be accepted by classification societies as an alternative to annual capacity testing, a natural class requirement would be that sufficient and relevant training data are available.Hence, laboratory testing might be required prior to installation for the particular cell type.This is associated with a notable cost, but it is believed that test results could be re-used for different installations of the same or similar battery systems.Moreover, as elaborated above, it is believed that if experiments are carefully designed with snapshot methods in mind, it should be possible to obtain richer training data without prohibitively extensive and expensive laboratory testing.As has been demonstrated by this study, the quality and representativeness of the training data are crucial, and further research is needed in order to specify such data requirements within a datadriven classification regime.

| SUMMARY AND CONCLUSION
This paper has presented results from a study on snapshot methods for data-driven SOH modeling and monitoring of battery systems onboard ships in operation.If such models can be demonstrated to work well, they will represent a huge benefit to the maritime industry compared to annual capacity testing or data-driven modeling using cumulative damage methods.The paper has illustrated how charge curves extracted from raw measurements of currents and voltages are influenced by battery degradation and how relevant features can be identified from these.Moreover, a set of simple statistical models are explored for predicting the actual capacity of the batteries.These have been applied to a dataset resulting from fusing data from different laboratory tests together.
The overall results are encouraging, and demonstrate that simple statistical models based on a limited set of features easily obtained from sensor data are able to predict the degradation in battery capacity.With representative training data, illustrated in this study by including data from the cell to be predicted in the training data, the models yield reasonable results in more than 93% of the cases.If data from the cell are excluded from the training set, results are still reasonable for 37% of the cells.This is encouraging, but needs to be improved before such methods can be recommended as an alternative to current class requirements of annual capacity tests.
The results clearly illustrate the sensitivity of datadriven models in general, and snapshot methods in particular, to available training data.It is believed that lack of sufficient representative training data is the main explanation for the varying results, and further efforts will be directed toward extending the data set and more carefully designing experiments with snapshot methods in mind.In summary, the main findings from this study are that it is possible to extract a set of features for datadriven SOH estimation from snapshots of the data, and that it is possible to obtain reasonable results from rather simple statistical models.However, it remains challenging to get sufficiently accurate and reliable predictions for all cells, and this is believed to be mostly related to the quality and amount of relevant training data.

F I G U R E 1
Measured capacity as a function of EFC from different tests; from Fraunhofer's (top) and Corvus' (bottom) laboratories; energy (left) and power (right) cells.

F
I G U R E 2 Example of measured time series from the cell cycling at Fraunhofer; currents (top), voltages (middle), and temperature (bottom); full time series (left) and zooming in on one of the periods with regular cycling (right).

F I G U R E 3
Example of measured time series from the cell cycling at Corvus; currents (top), voltages (middle), and temperature (bottom); full time series (left) and zooming in on one of the periods with regular cycling (right).

F I G U R E 4
Extracted charge and discharge curves from the raw time series for arbitrary cells.Only the second curves after each test (left) and all curves (middle) from Fraunhofer tests and second curves from a cell tested at Corvus (right).F I G U R E 5 The extracted charge and discharge cycles in the raw time series have been matched with results from capacity measurements.In terms of EFC (left) and in terms of time (right).