Machine Learning‐Based Lifetime Prediction of Lithium‐Ion Cells

Abstract Precise lifetime predictions for lithium‐ion cells are crucial for efficient battery development and thus enable profitable electric vehicles and a sustainable transformation towards zero‐emission mobility. However, limitations remain due to the complex degradation of lithium‐ion cells, strongly influenced by cell design as well as operating and storage conditions. To overcome them, a machine learning framework is developed based on symbolic regression via genetic programming. This evolutionary algorithm is capable of inferring physically interpretable models from cell aging data without requiring domain knowledge. This novel approach is compared against established approaches in case studies, which represent common tasks of lifetime prediction based on cycle and calendar aging data of 104 automotive lithium‐ion pouch‐cells. On average, predictive accuracy for extrapolations over storage time and energy throughput is increased by 38% and 13%, respectively. For predictions over other stress factors, error reductions of up to 77% are achieved. Furthermore, the evolutionary generated aging models meet requirements regarding applicability, generalizability, and interpretability. This highlights the potential of evolutionary algorithms to enhance cell aging predictions as well as insights.


Note S1.1: Evolutionary Process
The core of our framework relies on an adapted version of the GPTIPS 2 framework for multi-gene symbolic regression via genetic programming developed by Searson et al. [1]. The GPTIPS 2 framework was chosen because it is open source, optimized for multi-gene symbolic regression, well established, highly customizable, and programmed in MATLAB, which is commonly used in vehicle development.
Its evolutionary process, which is summarized in Figure S5 (Supporting Information), starts with the creation and evaluation of an initial population. This is followed by a generational loop, which is iterated until a termination criterion is satisfied and the final generation's best individuals are returned. The generational loop consists of parent selection, reproduction, fitness evaluation and survivor selection. [2][3][4][5] While initialization originally relies exclusively on the random generation of trees, [1] we added the option to seed variable proportions of models, e.g. models based on domain knowledge. Seeding improves the initial population's fitness and hence potentially increases the algorithm's efficiency. Additionally, it can direct the optimization process towards regions of the search space containing good solutions and thereby enhance its effectiveness.
However, the opposite effect is also possible, since seeding narrows the search space; the influence of seeding on the algorithm's performance therefore depends on the investigated problem. [2,4,6] Furthermore, the adapted version only utilizes the ramped half-and-half method for the random generation of trees due to its advantages regarding diversity in comparison to the other tree creation methods available in GPTIPS 2. [1,4] For fitness evaluation, the gene weighting coefficients of the individuals are estimated by least squares regression. In GPTIPS 2, the fitness values are determined either under sole consideration of the RMSE on training/learning data (single-objective) or by non-dominated sorting (multi-objective). [1] Since the single-objective fitness function does not account for the multi-objective nature of predictive modelling, it is replaced by the scaled multi-objective fitness function Fitness1 in the modified version of the algorithm. This approach transforms multiple objectives into a single-objective problem by weighting the objective-specific evaluation functions for model error (RMSE on training/learning data) and model complexity (expressional complexity). It offers a computationally more efficient option for multi-objective fitness determination than non-dominated sorting. However, in comparison to non-dominated sorting, assuming the structure and weights of the combined fitness function can be disadvantageous, as it reduces the area of interest to a line of pre-defined form before the range of possible solutions is known (Figure S25, Supporting Information). [3,4,7] The non-dominated sorting algorithm takes into account the same inputs as the multi-objective fitness function: accuracy measured by the RMSE on training/learning data, and complexity quantified either by expressional complexity or by the simpler approach of node counting, which is not considered by the modified version of the algorithm.
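A scaled multi-objective fitness of this kind can be sketched as a weighted sum of normalized error and complexity terms. This is a minimal illustration, not the actual Fitness1 implementation; the function name, weights, and reference scales are our assumptions (in Fitness1 itself, the complexity term is weighted more heavily):

```python
def scaled_fitness(rmse, complexity, w_error=0.5, w_complexity=0.5,
                   rmse_ref=1.0, complexity_ref=100.0):
    """Collapse the two objectives (training RMSE and expressional
    complexity) into one score to be minimized. Normalizing each term
    by a reference scale makes the weighted contributions comparable."""
    return w_error * (rmse / rmse_ref) + w_complexity * (complexity / complexity_ref)
```

The fixed weights predefine the trade-off line between error and complexity before the range of possible solutions is known, which is exactly the drawback relative to non-dominated sorting noted above.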
[1] An individual is not dominated if it achieves the best fitness regarding one objective compared to all individuals of the population that share its fitness level regarding the other objective. If an individual is not dominated by any feasible solution, it is Pareto-optimal. In non-dominated sorting, all individuals are assigned a non-domination rank based on the worst-ranked individual they are dominated by. [2][3][4] In GPTIPS 2, individuals of the same non-domination rank are sorted randomly (secondary sorting). [1] For parent selection, only the tournament method is supported. Tournament selection is a computationally efficient solution for parent selection because it does not require knowledge of the entire population. Each parent is selected by an independent tournament, for which a defined number of individuals is randomly picked from the population. Out of these individuals, the one with the best fitness is selected. Depending on probabilistic choice, each tournament relies on one of the supported fitness evaluation methods. [1,4] Reproduction creates new individuals from previously selected parents in order to explore the space of possible solutions. In genetic programming, the genetic operators recombination, manipulation and cloning are applied mutually exclusively depending on probabilistic choice. [2,4] Recombination creates offspring that combine the genetic material of their parents. The recombination techniques offered by GPTIPS 2 are subtree crossover (low level) and gene crossover (high level). [1,2,4] The purpose of manipulation is the introduction of new genetic material by modification of existing individuals. [2,4] For manipulation, subtree mutation and five types of point mutation are available.
The point mutation techniques either exchange non-constant terminals (type 1) or modify constant terminals by adding values based on Gaussian distributions (type 2), by substituting them with randomly generated values (type 3), by setting them to zero (type 4) or by setting them to one (type 5). [1] While the other genetic operators modify genetic material to explore the solution space, cloning preserves genetic material to avoid a decline in fitness. Each offspring created by cloning is an exact copy of its parent. [2] The choice of a specific genetic operator is probabilistic. In GPTIPS 2, the corresponding probabilities remain constant throughout the evolutionary process. [1] However, the efficiency of an evolutionary algorithm can be improved by adjusting the probability of mutation to local solution space characteristics. [3] Therefore, we introduced an adaptive rate of mutation according to Rechenberg's 1/5 success rule. The modified algorithm adapts the mutation rate starting with the eleventh generation of each evolutionary process. For this, the previous ten generations are compared regarding their minimum RMSE on training/learning data. In case of more than two improvements during these generations, the probability of mutation is increased by an adaptation rate of 0.1. Correspondingly, the probability of mutation is decreased in case of fewer than two improvements. Compliance with the requirements of the evolutionary algorithm is ensured by scaling all other reproduction-related probabilities as well as by enforcing constraints.
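The adaptation step can be sketched as follows. The ten-generation window, the success count of two, and the adaptation rate of 0.1 follow the description above; the function name and the clamping to a valid probability range are our assumptions:

```python
def adapt_mutation_rate(p_mutation, best_rmse_history,
                        window=10, target_successes=2, adaptation_rate=0.1):
    """Adjust the mutation probability following Rechenberg's 1/5 success
    rule: count improvements of the best training RMSE over the last
    `window` generations and raise or lower the rate accordingly."""
    recent = best_rmse_history[-(window + 1):]
    successes = sum(1 for prev, curr in zip(recent, recent[1:]) if curr < prev)
    if successes > target_successes:
        p_mutation += adaptation_rate
    elif successes < target_successes:
        p_mutation -= adaptation_rate
    return min(max(p_mutation, 0.0), 1.0)  # keep a valid probability
```

With two successes in ten generations corresponding to the 1/5 ratio, stagnating runs reduce mutation while frequently improving runs increase it.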
The survivor selection of GPTIPS 2 relies on the replace-worst method. This allows for the survival of the fittest previously existing individuals. Even though this enables quick improvements of the population's fitness, it has to be considered that it can also lead to premature convergence. [1,4] Possible termination criteria are the number of generations, the computational time and the best individual's fitness. [1] While GPTIPS 2 selects the individual returned as best solution from all individuals of the final generation based on their RMSE on training/learning data, [1] the modified algorithm utilizes the selection mechanism described in Methods for champion selection during model creation and the selection mechanism described in Note S1.2: Hyperparameter Optimization (Supporting Information) for champion selection during hyperparameter optimization.
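The replace-worst survivor selection can be illustrated with a minimal sketch (data structures and function name are our assumptions; fitness is minimized):

```python
def replace_worst(population, fitness, offspring, offspring_fitness):
    """Merge parents and offspring, then keep the best len(population)
    individuals. The fittest existing individuals always survive, which
    speeds up convergence but can also make it premature."""
    ranked = sorted(zip(fitness + offspring_fitness, population + offspring),
                    key=lambda pair: pair[0])
    survivors = ranked[:len(population)]
    return [ind for _, ind in survivors], [fit for fit, _ in survivors]
```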

Note S1.2: Hyperparameter Optimization
Hyperparameters tune the behavior of learning algorithms. [8] The 21 hyperparameters summarized in Table S8 (Supporting Information) are considered for optimization due to their suspected relevance to the performance of the evolutionary algorithm utilized in our framework.
Our framework optimizes these hyperparameters via the MATLAB function bayesopt for Bayesian optimization. Further hyperparameters of the evolutionary algorithm are not optimized, as they are expected either to have no significant influence on its performance or to require further research. Therefore, they are left at their default settings. More details can be found in the documentation of GPTIPS 2, cf. [1].
The Bayesian optimization algorithm iteratively performs runs of the evolutionary algorithm on training data with varying settings. For this, the allowed values for each hyperparameter are defined by minimum, maximum and step size. In addition, constraint functions ensure that all hyperparameter variations meet the requirements of the evolutionary algorithm.
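Restricting a hyperparameter to its allowed values can be expressed with a small helper like the following sketch (the function name is illustrative; it clamps a proposed value to the [minimum, maximum] range and snaps it to the step-size grid):

```python
def snap_to_grid(value, minimum, maximum, step):
    """Clamp a proposed hyperparameter value to [minimum, maximum] and
    round it to the nearest multiple of the step size, so that only
    allowed grid points are evaluated by the optimizer."""
    steps = round((value - minimum) / step)
    snapped = minimum + steps * step
    return min(max(snapped, minimum), maximum)
```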
During the first five iterations, the hyperparameters are selected randomly in order to build an initial surrogate model. After each subsequent iteration, this Gaussian process surrogate model is updated by fitting it to all previous observations. Based on this probabilistic model, an acquisition function selects the next point for evaluation, cf. [9]. To reliably find good global solutions, which are also computationally efficient, the acquisition function expected-improvement-per-second-plus is chosen in combination with an exploration ratio of 0.6.
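The acquisition function itself is provided by MATLAB's bayesopt; as a rough illustration of the underlying idea, plain expected improvement for a minimization problem can be computed from the surrogate's posterior mean and standard deviation. This is a sketch of the base criterion only; the per-second weighting and the plus (exploration) modification are not shown:

```python
import math

def expected_improvement(mu, sigma, best_observed):
    """Expected improvement of a candidate point under a Gaussian
    posterior with mean `mu` and std `sigma`, for minimization."""
    if sigma <= 0.0:
        return 0.0
    z = (best_observed - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best_observed - mu) * cdf + sigma * pdf
```

Candidates whose posterior mean undercuts the best observation score highest, while the sigma term rewards exploring uncertain regions.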
For each iteration of the optimization algorithm, the final population of the corresponding evolutionary process is evaluated on validation data. The individual passing a filter regarding robustness (see Note S1.3: Robustness Filters for Champion Selection, Supporting Information) and achieving the minimum RMSEvalidation is selected as champion. The quality of an iteration is assessed by the predictive error of its champion model, quantified by RMSEvalidation. If no individual meets these basic requirements of robustness, the evolutionary run is restarted. Otherwise, the remaining individuals are compared by their RMSElearning on learning data. Only the best 25% of models are considered for champion selection as described in Methods.

Note S1.4: Adaptive Thresholds for Seeding / Apex Model Selection
The reliable selection of generalizable models during and after model creation of Stage One and Stage Two is particularly challenging since
- the information of the training and validation sets is already used during hyperparameter optimization and thus is known to the models,
- the test set is reserved for a-posteriori performance evaluations,
- the relatively small sample sizes usually available for lifetime prediction of lithium-ion cells discourage the additional hold-out of data sets for model selection.
In order to enable the selection of generalizable instead of overfitting models, this unavailability of unknown data needs to be compensated. For this, the concept of the crowd error RMSEcrowd is introduced. It assumes that the majority of evolutionary runs concludes with reasonable champion models and that overfitting champion models are outliers. The mechanism calculates the median prediction ỹ of all eligible models for the input data of the test sets (Stage One) or artificially generated grids (Stage Two). As shown in Figure S22, ỹ resembles the trend of the test data's response variable despite not using this information. Therefore, the comparison of a single model's predictions with ỹ in the form of RMSEcrowd can be used as an indication of a model's likelihood of overfitting while unknown data is not available. The crowd error is relevant for adaptive thresholds, which models need to pass for further evaluation.
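The crowd error can be sketched as follows (the function name and data layout are assumptions): the crowd prediction is the element-wise median over all eligible models' predictions, and RMSEcrowd measures one model's deviation from it:

```python
import math
from statistics import median

def crowd_error(predictions_j, all_predictions):
    """RMSE of one model's predictions against the element-wise median
    prediction of all eligible models (the 'crowd')."""
    n = len(predictions_j)
    crowd = [median(preds[i] for preds in all_predictions) for i in range(n)]
    return math.sqrt(sum((p - c) ** 2 for p, c in zip(predictions_j, crowd)) / n)
```

A model that agrees with the crowd has a small RMSEcrowd; an outlier, which the mechanism treats as a likely overfitter, stands out with a large value.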
In Stage One, models need to meet the requirement

RMSEcrowd,j ≤ 5 · median(RMSElearning) (S2)

where the median is taken over the RMSElearning values of all eligible models. This threshold filters out overfitting models, which contradict the general trend that RMSEtest, approximated by RMSEcrowd, decreases with RMSElearning.
To put increased focus on generalizability in Stage Two, the learning data's aging conditions are extended to artificial grids as illustrated in Figure S23. For calendar aging, the grid covers three evenly distributed temperatures and five evenly distributed SoCs limited by each factor's minimum and maximum values available in the learning data. For cycle aging, the grid comprises all available factor levels for temperature and -after the exclusion of minimum and maximum -all available levels of the remaining stress factors. These conditions are extrapolated over twice the maximum storage time or energy throughput observable in the learning data.
All eligible models j are evaluated by RMSEcrowd,j,a individually for each aging condition a of the artificial grids; additionally, the minimum RMSEcrowd,min,a and the standard deviation σcrowd,a across all models are determined per condition. A generalizable model is expected to perform well for various aging conditions. Thus, models have to comply with the requirement

RMSEcrowd,j,a ≤ RMSEcrowd,min,a + xthresh · σcrowd,a (S3)

for all aging conditions a of the artificial grid. Initially, the scaling factor xthresh is assigned a value of 0.5 to limit the number of models passing the threshold. However, as long as fewer than 10 models satisfy Equation S3 (Supporting Information), xthresh is iteratively increased by 0.5.
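This adaptive threshold can be sketched as the following loop (the function name and the dictionary layout of the per-condition crowd errors are assumptions):

```python
from statistics import pstdev

def pass_adaptive_threshold(crowd_errors, min_models=10, step=0.5):
    """crowd_errors: {model_id: [RMSEcrowd per aging condition]}.
    Starting with x_thresh = 0.5, relax the threshold in steps of 0.5
    until at least `min_models` models satisfy it for every condition."""
    n_cond = len(next(iter(crowd_errors.values())))
    mins = [min(errs[a] for errs in crowd_errors.values()) for a in range(n_cond)]
    stds = [pstdev([errs[a] for errs in crowd_errors.values()]) for a in range(n_cond)]
    x_thresh = step
    while True:
        passing = [m for m, errs in crowd_errors.items()
                   if all(errs[a] <= mins[a] + x_thresh * stds[a] for a in range(n_cond))]
        if len(passing) >= min(min_models, len(crowd_errors)):
            return passing, x_thresh
        x_thresh += step
```

Since the threshold grows without bound, the loop always terminates: in the worst case, every model eventually passes.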
The stepwise progression of the selection mechanism is exemplarily shown in Figure S24a (Supporting Information) for Stage One and Figure S24b (Supporting Information) for Stage Two of test case Cal Ext20/Int2.

Note S2: Benchmark Analysis
Due to the large variety of aging models and their dependency on the investigated stress factors, we refrain from a comprehensive review of aging models. Instead, only models that can be applied to our aging data are considered as benchmarks.

Note S2.1: (Semi-)Empirical Calendar Aging Models
Recent advances in modelling calendar aging were made by Hahn et al. [10] and Schimpe et al. [11]. The model proposed by Hahn et al. considers an initial loss of lithium during formation; the one by Schimpe et al. improves the SoC dependency by implementing an anode potential dependency. These approaches require additional information obtained by specific begin-of-life experiments, cf. [10,11]. However, to ensure a fair comparison between all investigated approaches, all models are required to use the same data. Therefore, these models are excluded from further evaluation.
As a result, the benchmark analysis of semi-empirical calendar aging models is limited to the models summarized in Table S1 (Supporting Information). These models represent the most common approaches for predicting calendar aging with a semi-empirical model. Their fitting parameters pi are determined by the MATLAB function lsqcurvefit for least squares regression. All investigated models show similar predictive performance in the calendar aging case study (Figure S13, Supporting Information). Of the relevant models, model 4 achieves the lowest predictive error RMSEtest averaged over all test cases (Table S1, Supporting Information).
Therefore, it is selected to represent (semi-)empirical approaches for calendar aging. In Table S6 (Supporting Information), the predictive errors RMSEtest of model 4 are compared with each test case's apex model of the genetic programming framework.
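In the paper, the fitting parameters are obtained with MATLAB's lsqcurvefit via nonlinear least squares. As a simplified, self-contained illustration, a typical semi-empirical power-law calendar aging term q = p1 · t^p2 (square-root-of-time aging corresponds to p2 = 0.5) can be fitted by linear least squares in log-log space. The function name and the restriction to this linearizable model form are our assumptions, not the paper's actual benchmark models:

```python
import math

def fit_power_law(t, q):
    """Fit q = p1 * t**p2 by ordinary least squares on log(q) vs. log(t).
    Only valid for strictly positive t and q."""
    x = [math.log(ti) for ti in t]
    y = [math.log(qi) for qi in q]
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    # slope of the log-log regression line is the exponent p2
    p2 = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
          / sum((xi - x_mean) ** 2 for xi in x))
    p1 = math.exp(y_mean - p2 * x_mean)  # intercept back-transformed
    return p1, p2
```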

Note S2.2: (Semi-)Empirical Cycle Aging Models
The higher complexity of cycle aging leads to a large diversity in investigated stress factors and resulting cycle aging models. [12][13][14][15] Therefore, most models are designed for a specific dataset and cannot be applied to other datasets.
As a result, only the models summarized in Table S2 (Supporting Information) are considered. One of these approaches requires an individual fit per aging condition instead of a single fit across all conditions and can therefore not make aging predictions over cell aging conditions. In contrast, ETPRegr is able to perform all common tasks of lifetime prediction (Figure S14, Supporting Information). As a result, ETPRegr is selected to represent (semi-)empirical approaches for cycle aging. In Table S7 (Supporting Information), the predictive errors RMSEtest of the ETPRegr model are compared with each test case's apex model of the genetic programming framework.

Note S2.3: Machine Learning Approaches
The machine learning approaches selected for the benchmark analysis are investigated in Python. For this, the software packages summarized in Table S3 (Supporting Information) are used.
In order to tune their hyperparameters comparably to our genetic programming framework, the Tree-structured Parzen Estimator algorithm implemented in the hyperopt Python package is applied, cf. [16].
Exceptions are Gaussian process regression and fast function extraction: For Gaussian process regression, the internal hyperparameter tuning module available in the used Python package is applied. Furthermore, no hyperparameters of the fast function extraction algorithm are tuned, as its authors claim that their algorithm is hyperparameter insensitive (cf. [17]). All remaining parameters are set to the standard settings of each package; the parameter search spaces are summarized in Table S3 (Supporting Information).
The network architecture of the multi-layer perceptron is defined by two hidden layers with rectified linear unit (ReLU) activation functions and dropout rates per hidden layer.
Analysis of the performance of all investigated algorithms regarding calendar and cycle aging prediction (Figure S15, Supporting Information) reveals that the different types of machine learning greatly differ in their strengths and weaknesses. However, of all investigated approaches, extreme gradient boosting achieves the lowest predictive error RMSEtest averaged over all test cases (Table S3, Supporting Information). Therefore, it is selected to represent established machine learning approaches in the case study. In Table S6 and Table S7 (Supporting Information), the predictive errors RMSEtest of extreme gradient boosting are compared with each test case's apex model of the genetic programming framework.

For the seeding investigation, domain-knowledge-based models, including the model by Hahn et al. [10] accounting for the lithium lost during formation Lilossform, are seeded together with a proportion of 20 % each. To enable a detailed evaluation of the influence of seeding on the evolutionary process, this investigation is conducted with an adjusted version of the genetic programming framework with simplified seeding/apex model selection mechanisms:
- The 25 best-ranked champion models of Stage One according to Fitness1 are selected as seeding models.
- The best-ranked champion model of Stage Two according to Fitness2 is selected as apex model.
By replacing the complex selection mechanisms with these simple rankings, this investigation allows for conclusions regarding the influence of seeding on the evolutionary process with minimal interference from the selection mechanisms.
The results of this investigation are shown in Figure S20 (Supporting Information).

Figure S5. Evolutionary process of the genetic programming framework: Following the creation and evaluation of an initial population, a generational loop is iterated until a termination criterion is satisfied and the final generation's best individuals are returned. Variation by reproduction combined with fitness-based selection drives this optimization process. Based on [2][3][4][5].

Functions for fitness evaluation define the area of interest of the optimization process they are used in. Therefore, it is desirable for the assumed structure and weights of a scaled multi-objective fitness function to approximate the Pareto front. [3] Our genetic programming framework uses two different scaled multi-objective fitness functions, which meet context-specific requirements: a) Fitness1 is used for model evaluation during and after each evolutionary process, enabling parent and survivor selection as well as champion model selection. To guide the optimization towards generalizable models with rather simple and flat genes, the complexity term is given more weight than the error term. Consequently, the tolerated margin of error is substantially increased for low complexities. b) Fitness2 is used for apex model and experiment selection. Since predominantly solutions of high quality take part in these selection processes, a narrower target range is required. For this, equal weights are applied to the complexity and error terms. This results in a reduced tolerance for high errors and complexities. Both functions are optimized by minimization.

Cyc Ext80 | Extrapolation over ETP | 10 | 10 | 80
Cyc Ext20/IntT | Extrapolation over ETP (Cyc Ext20) and interpolation between operating conditions with focus on the influence of T (Cyc IntT) | 37 | 31.5 | 31.5

Table S6. Comparison of (semi-)empirical benchmark model, machine learning benchmark approach and genetic programming apex models for calendar aging: Evaluation of calendar aging test cases by predictive error RMSEtest and relative improvement of the genetic programming framework (GP) in comparison to the (semi-)empirical benchmark model 4 and the machine learning benchmark approach extreme gradient boosting (XG Boost).

Table S7. Comparison of (semi-)empirical benchmark model, machine learning benchmark approach and genetic programming apex models for cycle aging: Evaluation of cycle aging test cases by predictive error RMSEtest and relative improvement of the genetic programming framework (GP) in comparison to the (semi-)empirical benchmark model ETPRegr and the machine learning benchmark approach extreme gradient boosting (XG Boost).

Table S8. Hyperparameters of the genetic programming algorithm considered in Bayesian optimization: A range of allowed values is defined for each hyperparameter in order to reduce the search space. During optimization, only values are considered which are a multiple of the step size and do not exceed the limits specified by minimum and maximum.

Hyperparameter | Minimum | Maximum | Step size