Economic model predictive control of Li‐ion battery cyclic aging via online rainflow‐analysis

Cycle identification via the rainflow‐algorithm is implemented online in a model predictive controller (MPC) for Li‐ion batteries. This is achieved by externalization of the cycle identification from the optimization problem. The limitation for cyclic aging estimation due to short prediction horizons is overcome by updating and utilizing a State of Charge memory. Furthermore, a comprehensive plant model for Li‐ion batteries is presented with novel submodels for calendric and cyclic aging. The novel MPC is implemented in the ACADO Toolkit and tested with the aforementioned plant model. Simulation results indicate that—even without tuning—the novel MPC clearly outperforms a rule‐based controller and an extensively tuned MPC from literature.


| Motivation and previous work
Battery energy storage systems are very well suited to absorb and release electrical energy with intermediate storage periods which can last over different time scales. This stationary and transient operation causes calendric and cyclic aging, respectively. Aging manifests in the decrease of charge capacity and the increase of internal resistance. 1 When a defined aging level is reached, the battery reaches its end-of-life and has to be replaced. Consequently, an important task of modern battery operation strategies is the economic balancing of the revenue from energy storage and the cost of aging. This is a classical optimal control problem (OCP). Model predictive control (MPC) enables the solution of this OCP on a moving horizon while respecting constraints on system states and inputs. 2 A crucial and difficult task in the setup of the MPC is the correct modeling of abovementioned aging mechanisms. Specifically, cyclic aging estimation requires cycle identification from the state of charge (SOC) trajectory. The absolute range of SOC excursion of a cycle is called Depth of Discharge (DOD), and has a strong influence on the amount of aging. 3 The most accepted method for cycle identification for aging estimation is the rainflowcounting (RFC) algorithm. 4 However, RFC is a discontinuous algorithm and does not have an analytical mathematical form. 5,6 Thus, until recently, RFC could not be implemented in online controllers of batteries, and could only be used for post-processing of measured and simulated data. Instead, typically only surrogates for the cycling have been utilized in the setup of controllers, as described in the following.
A fairly simple surrogate is the penalization of charge throughput which is implemented as a time integral of discharge current in Wang et al. 7 This method completely lacks the consideration of cycles. Thus, the disproportionate aging of big cycles is missed.
Some touch of cyclic behavior is captured by the simplistic definition of DOD as the difference of SOC between the beginning and the end of a control step in Yallamilli et al. 8 However, this method still lacks the identification of extrema and actual cycles. Thus, on the one hand, cycles which may be located within the considered control step are missed entirely. On the other hand, the aging effect of cycles which span over multiple control steps is highly underestimated.
This limitation to fixed evaluation samples is not present in He et al, 9 where the location of extrema is identified on a historic SOC trajectory and used in the control problem. However, here DOD is simply defined as the SOC difference between two adjacent extrema. This definition corresponds to Simple Range Counting, which is standardized in ASTM. 4 As a consequence of this definition, no nesting of small cycles within big cycles can be considered. Instead, a big cycle is split into two cycles by a nested small cycle, and thus will be underestimated again in terms of aging.
The same simplistic definition of DOD is used in Koller et al. 10 Unlike the preceding formulation, here identification of extrema is based on the predicted SOC trajectory, and is implemented by a switched linear dynamic subsystem within the MPC. Even though the above drawback of aging underestimation persists, this formulation can be classified as sufficiently advanced, and thus is utilized as comparison object in the present work.
Proper cycle identification by the rainflow algorithm in offline optimal control of a battery is presented in Shi et al 11 . Further, this work provides an excellent proof of convexity of the rainflow algorithm which guarantees reliable convergence of the considered OCP. However, computational times will be high due to the slowlyconverging gradient-descent method.
In order to enable online implementation, a simple control policy with upper and lower SOC thresholds is derived from above offline method in Shi et al 5 and Xu et al. 12 The distance of these SOC thresholds represents a DOD which is cost-optimal between mismatch penalty and aging cost. While this cost-optimal DOD can be calculated explicitly a priori, the actual threshold levels arise online. Benefits of this policy are its simplicity and a quantified optimality gap to the offline method. Furthermore, these thresholds enable carrying along reduced information about SOC history, whose benefit will be explained later. As drawbacks, the policy does not seem to support more realistic battery dynamics where SOC is not a simple integral of power, or other cost terms than mismatch penalty and cyclic aging. Thus, this policy is not suitable if also calendric aging should be considered, or if a hybrid energy system controller is pursued.
In Loew et al, 13 an MPC formulation has been presented which allows for RFC evaluation externally from the MPC algorithm, and the inclusion of its results into the MPC via time-varying parameters. Therefore, this formulation is referred to as Parametric Online Rainflow-counting (PORFC). In PORFC, cyclic aging is calculated based on SOC information from the prediction horizon of the MPC. However, cyclic aging is a long-term effect where SOC cycles are usually defined on much longer time spans. Therefore, in Loew et al, 14 PORFC has been combined with the concept of residue to systematically incorporate SOC history. In the same work, this concept has been applied successfully for MPC of a fatigue-affected wind turbine.

| Contribution
In the present work, these controller formulations for wind turbines 14 are extended, and adapted to the requirements and use cases of battery energy storage systems. To the best of the authors' knowledge, this introduces the first model predictive controller (MPC) that considers correctly and without approximation the cyclic aging of batteries.
As a by-product for battery modeling, novel submodels for calendric and cyclic aging are presented which improve consistency and accuracy.
The novel MPC formulation is compared to a conventional rule-based controller and the above mentioned state-of-the-art MPC from Koller et al 10 in a realistic market scenario, showing significantly better economic performance. Furthermore, the accuracy of the cyclic aging cost function, the degree of optimality, and the computational performance of the MPC are assessed in detail.

| Outline
In section 2, a battery plant model with specific focus on aging is presented. This model is the basis for the MPC formulation in section 3 whose key novelty is the direct implementation of cycle identification. In section 4, this MPC formulation is compared with state of the art controllers in a realistic market setting. Furthermore, key features of the MPC are analyzed and validated. In section 5, the work is concluded and an outlook is provided.

| BATTERY PLANT MODEL
The goal of this section is to present the approach used to model Lithium-ion batteries with particular focus on degradation. A grey-box modeling approach is used which involves first-principle models, data-fitted static maps and algorithmic parts. A comprehensive battery storage system model includes several individual sub-models, capturing the electrical, thermal, and aging behavior as shown in Figure 1. These sub-models will be presented in the following.

| Electrical model
A simple electrical equivalent circuit as shown in Figure 2 is chosen. 15

| Thermal model
The thermal model captures the dynamic behavior of cell temperature T (in [K]) due to internal heating and external cooling. The most commonly used thermal model is based on a lumped heat capacitance with the parameters heat capacity C H (in [J/K]) and cooling rate C R (in [J/(Ks)]). Ambient temperature T ambient is an external input. 16

| Degradation model
The degradation model captures the loss in charge capacity and the increase of internal resistance over time and usage. Typically, only the degradation of capacity is modeled, and used to express the degradation of resistance via (A5).

| Charge capacity
Typically, the charge capacity decreases over the lifetime of the battery. By convention, the amount of loss of charge capacity is assigned to two quantities: calendric capacity loss Q cal which occurs intrinsically at any time, and cyclic capacity loss Q cyc which occurs during usage of the battery. Consequently, the instantaneous charge capacity can be expressed by aforementioned quantities. The battery is considered to be at the end of its service life when its capacity Q has decreased to a predefined end-of-life capacity Q eol .  1 In the literature, this square-root dynamical behavior often is modeled in a way which does not suit continuous simulation. 1,17 Therefore, in the present work, the time-derivative of the square-root is proposed for seamless integration into the dynamic model. This term corresponds to a time-variance of the dynamical system. The shift time t shift is added to avoid numerical errors at t = 0. Its value can be very small. Alternatively, the shift time can also be used to account for the previous operation time duration of an already aged battery. The constant A cal As ffi ffi s p h i is derived from experimental data. Its unit has no physical meaning and simply results from the chosen mathematical model.
The temperature effect typically is represented by the Arrhenius law where A arrhenius is a model parameter obtained by experiments, E activation is the activation energy of the cell, and R gas is the Universal gas constant. The SOC effect is represented by a linear fit of experimental data via the parameters K 1 and K 2 .

Concept
Cyclic aging refers to the loss of battery capacity during dynamic operation by repeated charging and discharging.
The major stress factor for cyclic aging is the absolute range of magnitude of the individual SOC cycles, also referenced as "depth of discharge" DOD . 3,11 Other than that, current, temperature, and terminal voltage also affect cyclic damage, but are neglected in the present work. 1,18 The most accepted algorithm for cycle identification from SOC trajectories is the well-known Rainflow algorithm. 4 Identified cycles can be "full" or "half cycles," as visualized in Figure 3. A full cycle refers to a sequence of battery charge and discharge with identical initial and final SOC values. A half cycle refers to only one charging or discharging sequence. Furthermore, cycles can be nested in other cycles. The output of the rainflow algorithm is summarized in Table 1.

Damage and capacity loss
Lifetime of the battery is usually specified as the number of full cycles of a given cycle depth which the battery can provide before reaching end-of-life capacity Q eol . Nondimensional damage of an individual cycle of a specific depth is obtained by the reciprocal of the number full of cycles. For an increased cycle depth, the damage of the battery typically increases over-proportionately. 3 Based on experimental data at discrete DODs, this dependency is represented by a piece-wise linear interpolation Consequently, capacity loss of a specific cycle is obtained by multiplying the constant A cyc = Q nominal − Q eol . Cycle identification requires an entire SOC trajectory as input and therefore originally only can be performed F I G U R E 3 Exemplary result of a rainflow analysis. Sampled SOC trajectory (blue), extrema (red circles). There are three half cycles (green, purple, and yellow) and one nested full cycle (red) batch-wise. The total batch-wise cyclic capacity loss of an SOC trajectory is obtained via linear accumulation of damage of all N c identified cycles. 20 Due to batch-wise execution, unlike calendric aging, the cyclic aging model originally cannot be expressed as a time-continuous dynamic system. Instead, for the present battery model, a time-discrete cyclic aging model will be developed. As a basis for this, in the following, the concept of "residue" will be presented first.

Residue
The SOC values which are extrema, but have not contributed to a full cycle so far, form half cycles. These SOC values can later form a new half or full cycle with the newly acquired SOC samples of subsequent simulation time-steps. This new cycle will have a higher DOD, and thus higher damage, than the original half cycle. 21 Thus, it is important to preserve these half-cycle extrema and merge them with the new SOC samples before performing the next cycle identification. This set of extrema is referred in current work as "residue" SOC residue and plays a central role in accurate identification of cyclic damage. 21 One-step time-discrete model Based on the rainflow algorithm and the concept of residue, a one-step time-discrete model of cyclic aging has been proposed in Loew and Obradovic. 22 This means that by introducing residue as an additional "memory" state, with each new SOC sample an update of cyclic capacity loss can be computed. The procedure is presented in Algorithm 1.
In a mathematical way, time-discrete cyclic aging estimation is expressed by the update of the residue set by the update of cyclic capacity loss of full cycles and by the update of total cyclic capacity loss Thus, the time-discrete equations G cyc,Á represent the algorithmic steps which have been described in Algorithm 1.

| Time-discrete expression of continuous dynamics
Since the cyclic aging estimation is performed in timediscrete fashion, the other submodels of the battery model as well are expressed as time-discrete state updates: This is obtained by application of integrators G int to the respective ODEs. The time t k [s] denotes the absolute time of the current sampling instance k since production of the battery.

| CONTROLLER DESIGN USING PARAMETRIC ONLINE RAINFLOW-COUNTING
The battery model of section 2 consists of mainly continuous submodels; namely the electrical, thermal and calendric aging models. For those submodels, the calculation of sensitivities w.r.t. the control inputs, and the integration of the submodels into gradient-based optimal control is straightforward.
In contrast, the cyclic aging submodel contains the Rainflow algorithm which contains algorithmic branches and loops. Thus, a crucial property of the Rainflow algorithm is its discontinuous output-behavior. Furthermore, the number N c of identified cycles is not known before execution, but bounded by the number of extrema. Therefore, until recently, integration of the cyclic aging submodel into gradient-based Model Predictive Control has not been possible.

| Concept
In Loew et al, 13 the above mentioned obstacles for a direct implementation of RFC in MPC are overcome by the method of Parametric Online RainFlow-Counting (PORFC). In PORFC, all discontinuous parts of the cyclic aging estimation procedure are carried out in a preprocessing step before each execution of the MPC algorithm, as shown in in Figure 4. In order to base this externalized cyclic aging estimation on the same SOC trajectory like the MPC, the preprocessing step has to start with a predictive forward simulation using the same model, sampling and horizon length as the internal simulation of the MPC. Additionally to this prediction, the SOC history is incorporated in the preprocessing via a periodically updated residue SOC residue (see section 2.3.3).
The algorithmic workflow within the controller is as follows: • Simulation: The battery model is simulated over the prediction horizon using the currently measured statesx as initial states x 0 , and the current guess of the optimal F I G U R E 4 Control loop for battery plant simulator and MPC using PORFC control trajectory u guess . Relevant result is an SOC prediction over the entire horizon.
• Merge: The residue is merged with the SOC prediction.
• Rainflow: The Rainflow algorithm is used to identify SOC cycles over this merged trajectory. Consequently, it is assumed that the structure of identified cycles does not change within the upcoming optimization run. The term "structure" denotes here positions (k RFC max,c , k RFC min,c ) and weights (w RFC c ) of cycles. This assumption implies that the controllable extrema in the prediction horizon only can be shifted vertically by the optimization.
• Residue update: SOC cycles can be composed by SOC samples only from residue or prediction, or by a mixture of both. However, only controllable samples within the prediction horizon can be altered by the optimization. Especially the measured initial value at the beginning t 0 of the horizon cannot be altered and, therefore, is added to the residue. If a full cycle is detected entirely within the residue, both contributing values are discarded from the residue. The reason for this is that also in the future they will never anymore form a cycle with a sample from the prediction and, therefore, are irrelevant for the MPC.
• Time-varying parameters: Information from cycle identification is used to fill vectors of time-varying parameters, which are forwarded to the cost function of the MPC. Details on this step are provided in section 3.2.
• Optimization/MPC: In the cost function of the MPC, the parameters are used to time-continuously calculate the cyclic capacity loss over the horizon and accumulate it via integration. Finally, the optimization problem is solved and the resulting control variables are applied to the battery plant.
The simulation flow of the control loop in Figure 4 is as such: Measurement of the plant states and controller execution happen at a rather coarse sample time T cntrl . Within the preprocessing step of the controller, the forward simulation is performed with a fixed step size of T sim,preproc < T cntrl . The PORFC parameters are sampled on the coarse control grid T cntrl since parameters typically only can be inserted into the MPC problem on this granularity for current MPC frameworks. Within the MPCstep of the controller, Multiple-shooting is used where the shooting nodes are defined on the control grid T cntrl . Within each shooting interval, the battery model is simulated on the fine grid T sim,MPC = T sim,preproc . The optimized control variables and shooting states are output from the controller again at a sample time of T cntrl . The first entries of each control variable are applied to the plant.

| Time-varying parameters
Distribution of cyclic capacity loss over time: Since information from cycle identification is supposed to be forwarded to the MPC via parameters, which are timevarying over the prediction horizon, the total cyclic capacity loss has to be distributed over the prediction horizon, as visualized in Figure 5A.
Any particular half or full cycle always corresponds to exactly two "complementary" SOC samples. Consequently, the assumption is used that the capacity loss of a cycle can be evenly split over the corresponding two complementary sample instances.
Therefore, the capacity loss of each SOC cycle is split into two halves, which are allocated to the two contributing SOC samples k RFC max,c and k RFC min,c . For example, cycle 4 is formed by samples k = 5 and k = 8. Their capacity loss terms therefore are allocated to these samples, as shown by the yellow blocks in Figure 5A. This example also F I G U R E 5 PORFC approach for distribution of cyclic damage over time, and setup of SOC tracking goals. A, Distribution of damage per cycle to damage per sample. B, Piecewise constant mean SOCs as tracking goals shows an important property of the rainflow algorithm, which identifies cycle 4 even though it is interrupted by the nested cycle 2.
Although a SOC sample is not allocated uniquely to one identified cycle, it can at maximum be part of two cycles. 11,23 Thus, in a general case, it can be assumed that a particular SOC sample may belong to either none (when it is not an extremum), or one (when it is part of only one-half or full cycle), or two cycles (when it sits at a cycle junction point).
Setup of the parameters: Figure 5B visualizes the generation of the time-varying parameters. Since each SOC extremum belongs to one or two SOC cycles, 11 the Rainflow algorithm provides one or two mean SOC values per extremum. These mean SOC values (M1-M4) are considered as optimization-or tracking-goals for the current MPC-step. For a specific cycle, capacity loss appears twice, once at each of the corresponding sample indices. Thus, the equivalent capacity loss at each sample point has to be halved if both samples lie in the controllable prediction horizon. If, for a given controllable SOC sample, its complementary SOC sample is not controllable (residue or initial value), all damage is allocated to the controllable sample. Here, this is the case for cycle one, where SOC sample k = 0 is not controllable. Therefore, the weight is defined to with an additional dependency on the location of the complementary SOC sample.
Cycle mean values SOC PORFC m,c and cycle weights w PORFC c are collected in the parameter vector which is defined piecewise constant over the control intervals of the prediction horizon. A more detailed derivation and explanation can be found in Loew et al. 13

| Cyclic aging rate
Consequently, the cyclic capacity loss term of PORFC is defined. The capacity loss function is defined by a time-integral over two cost terms, which each represent one potential cycle-contribution of a SOC sample. In a time-differential form, the rate of cyclic capacity loss is expressed by: The cost terms are "switched on" by nonzero cycle weights w PORFC c1=2 . Here, the capacity loss by individual cycles c is defined based on (8) and on a power-function fitting of (7). The fitting by an analytic function with its parameters K 1 and K 2 is necessary to achieve a continuous optimization problem. Furthermore, DOD here is obtained based on the parametric mean SOC values SOC PORFC m,c . As an overview, Figure 6 shows the process flows of the novel online PORFC cyclic aging estimation and the conventional offline batch-wise procedure. As depicted, the main differences of PORFC w.r.t. the conventional procedure are the cycle evaluation per time sample, the usage of a rate of cyclic capacity loss and the accumulation via an integral.

| Optimization problem
The task of the present battery controller is to operate the battery at an economically optimal point between aging cost and power mismatch penalty. Thus the Economic Nonlinear MPC optimization problem is formulated as with the power reference P ref which is variable over the prediction horizon T horiz , the power mismatch penalty w P which can be variable over the horizon, and the aging penalty w B which is constant during the entire simulation. The aging penalty is calculated as with the battery purchase price C purch and the nominal Open Circuit Voltage U ocv,nominal . Each quantity is adjusted to SI-units. In order to avoid a linear cost term of power mismatch, all cost terms in (19) are squared individually. In practical tests, this significantly improved the convergence behavior. The cyclic aging rate _ Q PORFC cyc t ð Þ is obtained by the novel parametric ODE (16). The states of the controllerinternal model f(Á) are defined as x = (SOC, T, ΔQ cal ) T , and are obtained by the ODEs (1), (2), and (3). The state ΔQ cal is initialized to zero since it represents the added capacity loss within the horizon.
The control input u(t) = P(t) is defined by the piecewise-constant charge−/discharge power. The external input d(t) = T ambient (t) is the ambient temperature.
The constraints on the system states SOC and T, and on the input P are denoted as box constraints with respective upper and lower limits. For the power, the limits are time-varying, and depend on the power reference to enforce a meaningful power. For instance, in the case of a discharging reference (P ref (t) ≥ 0) only a positive power of maximum the power reference power can be chosen by the optimization. Outside of these limits following meaningless situations would occur: • A negative power would lead to more cyclic capacity loss than zero power, and additionally cause higher power mismatch penalty.
F I G U R E 6 Process flow of calculation of cyclic capacity loss using PORFC and using the conventional batch-wise procedure (9). The differences are highlighted in bold • A greater power than reference power would lead to more cyclic capacity loss than reference power, and would additionally cause power mismatch penalty.

| RESULTS, ANALYSIS, AND VALIDATION
In this section, the closed-loop behavior of the MPC of section 3 with the battery plant model of section 2 is presented and compared to other controllers in a realistic simulation case. Furthermore, the controller is assessed in terms of optimality, cyclic aging estimation, and computational time. A validation of the plant model with a special emphasis on the calendric and cyclic aging submodels can be found in section A.2.

| Comparison to other controllers in PJM market setting
This section presents a comparison of controllers for the use case of the US-American real-time ancillary service market of the transmission operator "PJM Interconnection LLC." After an overview on the PJM market setting, the controller configurations and the simulation results are provided.

| PJM market setting
In the PJM frequency regulation market, the system operator sends an Automatic Generation Control (AGC) based secondary frequency control signal every 2 seconds to the participating generating units. 24,25 The control signal has two components: the RegA signal is the low-pass and the RegD signal is the high-pass filtered component of the area control error (ACE). RegD exhibits very high ramp rates but is designed to have a zero mean over a certain duration. 24,25 Thus, the RegD signal is ideal for storage units and has been considered as the reference regulation signal r(t) for the present battery system. The economics for providing the frequency regulation service depend on two components 26 : • The power-based option fee which is paid to the battery operator based on the offered regulation power P offered .
• The energy-based mismatch penalty w P [$/Ws] which is paid by the battery operator depending on the mismatch between energy provided and energy requested by the system operator via the RegD signal.
These prices are provided to the generating entities every 5 minutes for the next 5 minutes interval. 24,25 The dynamic performance of the participating entity is further constrained by the system operator using a performance score δ [0, 1], where a value of 1 represents complete tracking of the RegD signal. 24,25,27 The participating entity must maintain a minimum performance score to be qualified for participation in the regulation market. Originally, this score is calculated based on tracking error, response delay, and correlation. 25 In the present work, response delay and correlation are neglected for simplicity. This leads to the simplified per- The key simulation parameters and input signals for the battery plant model and the frequency regulation market are summarized in Table 2.

| Customized PORFC MPC
In the following, the optimization problem (19) is customized to the PJM market setting. The reference power signal P ref (t) = P offered r(t) results from the offered regulation power and the unitless RegD signal. An additional constraint δ min ≤ δ(t) ≤ 1 for the performance score is added to prevent disconnection from the grid (see section 4.1.1). The MPC is implemented in the MPC framework ACADO Toolkit. 28

| Constraint-based controller
The reference controller is designed to follow the power reference and only deviate from it when the SOC-limits are reached. This controller therefore also can be considered as "rule-based."

| MPC with piecewise-affine cycle identification ("PWA MPC")
As a more sophisticated comparison object, the timediscrete MPC formulation of Koller et al 10 is implemented which focuses explicitly on optimal control of battery aging. Here, cycle identification is approximated via a piecewise-affine dynamical system. Therefore, this formulation is referenced as "PWA MPC" in the present work.
In the original work, a Mixed-integer Quadratic Programming (MIQP) formulation is used. In the present work, only the continuous decision on power is required. Therefore, a QP formulation is sufficient. This MPC is implemented in MATLAB using the fmincon optimization solver with two QP steps per MPC execution.
The cycle identification method of this formulation is designed according to the Simple Range Counting of ASTM 4 which is simplistic in comparison to Rainflow counting. Here, this is implemented by adding two extra states DOD = (DOD(k) charge , DOD(k) discharge ) T to the controller-internal system model, which represent instantaneous charging and discharging DODs, respectively. The evolution of these DOD states is governed by a piecewise affine (PWA) dynamical system The input to the PWA system is the change in SOC over the current simulation time step. The dynamics of DOD is defined by the system matrix and the input vector which both are switched by the net battery power P(k). The first two cases for the system matrix lead to growth of the instantaneous (dis)charging DOD. The third case diag(0, 0) results in a reset of the DOD states at each sign change of power. The fourth case holds the DOD constant. Figure 7 shows the evolution of the DOD states for a defined SOC trajectory. Here, following disadvantages of this PWA cycle identification method become apparent: • Cycle growth is terminated even by small nested cycles, for instance once at t = 1400 seconds and several times at t > 4000 seconds.
• The detected maximum DOD of up to 0.12 is much lower than the actual SOC excursion which by visual inspection is greater than 0.2 and defined by the SOC extrema at t = 1900 seconds and t = 5000 seconds.
For a comparison, a second plot in Figure 7 shows DODs which are obtained from the cycle identification method of the battery plant model of section 2 using the standard RFC algorithm. At each sample, that DOD is plotted which is related to the instantaneous SOC sample. Following advantageous behavior can be seen: • Continuation of cycle growth after completion of a small nested cycle, for instance at t = 1600 seconds.
• During discharge at t = 3050 seconds, the discharging DOD exhibits a jump in magnitude since the nested full cycle is completed and cycle identification switches to a former larger cycle. As per PJM regulation, 25 a minimum performance score of 0.4 is necessary to avoid disconnection from the grid.
• As expected, the final charging and discharging DODs related to the same nested cycle are equal, like for instance at t = 2700 seconds and t = 3050 seconds.
• The visible maximum SOC excursion of DOD = 0.2 is reached for a charging half cycle at t = 1900 seconds and for a discharging half cycle at t = 5000.
To conclude, the PWA cycle identification systematically detects too low cycle depths. Consequently, also cyclic capacity loss is underestimated. Furthermore, the present comparison highlights that for correct cycle identification it is necessary to memorize past SOC extrema, as it is done in the novel PORFC formulation.
The optimization problem for the PWA MPC is defined as subject to P k ð Þ = P charge k ð Þ + P discharge k ð Þ ð25bÞ x 0 = SOC 0 , DOD 0 = 0,0 ð Þ T ð25eÞ The decision variables for the OCP are charging P charge and discharging P discharge power which are applied F I G U R E 7 Evolution of DOD states for the PWA and the RFC cycle identification zero-order-hold over the N cntrl steps k of the prediction horizon. Their sum (25b) is the net power of the battery. Both powers are also the inputs to the SOC difference Equation (25d), where the charge/discharge efficiencies are set to η charge = 1 and η discharge = 1 in the present work. In (25e), the DODs are initialized to zero at the beginning of the prediction horizon for each new MPC step. Constraint (25f) enforces a SOC within its limits, and (25g, 25h) enforce meaningful (dis)charging powers analogously to (21). A complementary constraint (25i) is imposed on the system to prevent simultaneous charging and discharging control actions. The cost function (25a) contains three terms for aging costs and one term for power mismatch penalty. The aging cost terms quadratically penalize the present approximate DOD, the deviation of SOC from a SOC reference SOC PWA ref = 0:5 and the (dis)charge powers. Figure 8 shows the closed-loop simulation results for the constraint-based controller, the PWA MPC and the new PORFC MPC. For PORFC, the cases with ("PORFC-R MPC") and without ("PORFC MPC") utilization of the residue are shown. In order to achieve a fair comparison to the PWA MPC with its reduced capabilities, in the PORFC MPC the performance score constraint and the calendric aging cost function are turned off for the present simulation. Additionally, a fixed power deviation penalty of w P = 9.37 Á 10 −9 $/Ws is used.

| Comparison of simplified PORFC MPC to constraint-based controller and PWA MPC
The PORFC(−R) MPCs are parameterized using the physical and economical parameters in Table 2. In contrast, for the PWA MPC, profit-optimal aging weights (w PWA DOD = 10 − 4 ,w PWA SOC = 10 − 9 , w PWA P,wear = 10 − 20 ) are obtained by an extensive parameter study of 126 simulations of 10 minutes each using exactly the power profile of the present test. Interestingly, in essence, these weights switch off the SOC and power cost terms. This indicates that the remaining DOD cost has highest correlation to the true cyclic aging cost; despite abovementioned underestimation of DODs. The mismatch penalty is set to w PWA P = w 2 P = 8:78 Á 10 − 17 . In order to cause significant cyclic aging within the simulation, it is assumed that the evolution of SOC started from SOC = 0 before the current simulation time. Thus, the battery is operating within a big charging halfcycle already at the start of the simulation. This information is used to initialize the residue SOC set SOC residue = 0 of the PORFC-R MPC.
During the simulation, the SOC constraints are not reached. Thus, the power of the constraint-based controller is equal to the power reference. All MPCs deviate from the reference power to a certain extent. The PWA MPC frequently even chooses zero power. In contrast, the PORFC MPC stays very close to the power reference. The PORFC-R MPC follows the discharging reference to a certain extent but does not perform any further charging to avoid further increase of the high charging DOD from the past. This is caused by its awareness of the big charging half-cycle via the residue. Consequently, the PORFC-R MPC stays below the SOC = 0.5 level during the present simulation. However, it is not obvious why it does not follow the charging power at least up to this F I G U R E 8 Comparison of PORFC MPC to piecewise-affine MPC 10 and a rule based controller. For the MPCs: Prediction horizon 120 seconds, controller sampling time 1 second level without incurring high cyclic aging. This effect will be investigated in the future.
The PORFC-R MPC incurs highest cumulative power deviation penalty but also lowest cyclic aging penalty. The PWA MPC reaches intermediate values for both penalties. In terms of total penalty, the PWA MPC improves w.r.t. the constraint-based controller by 10%. The PORFC MPC reaches similar total penalty. The PORFC-R MPC clearly exhibits lowest total penalty which is 23% below the one of the constraint-based controller.
It should be noted that the PORFC(−R) MPCs achieve this performance after straightforward parameterization with given physical and economical parameters. In contrast, the PWA MPC has been extensively tuned specifically for the present case.
To put the penalties of the order O(0.01$/10 minutes) into perspective, it should be noted that the revenue (power-based option fee, see section 4.1.1) reaches up to an order O(10$/10 minutes). Profit is the revenue subtracted by the penalties. As a consequence, the variations in profit for different controllers remain fairly limited for the present PJM market setting.

| Comparison of full PORFC MPC to rule-based controllers
The above test involves several simplifications in the PORFC MPC in order to be comparable to the PWA MPC. Therefore, in the following, the PORFC-R MPC (with residue) is tested in a full configuration with these additional features: • Calendric aging cost function active.
• Performance score constraint active.
• Power deviation penalty factor variable over time.
Four controllers are compared: the "baseline PORFC-R MPC" with a performance score constraint of δ min = 0.4, an "unconstrained PORFC-R MPC" without performance score constraint δ min = 0, the constraintbased controller for power following, and a policy of applying zero battery power.
The present test is extreme in the sense that the variable power deviation penalty is at a very low level. Therefore, pure power following is far from being optimal, and the MPCs are expected to converge to strategies of low or even zero power.
As shown in Figure 9, during large time spans the PORFC-R MPCs do not apply the full reference power, and the baseline PORFC-R MPC hits but does not violate its lower power limit. The unconstrained PORFC-R MPC in general applies low power values to limit the DODs due to cyclic aging, and to remain close to SOC = 0.5 due to calendric aging. Thereby, its power deviation penalty almost reaches the one for zero-power policy. The baseline PORFC-R MPC exhibits an intermediate power deviation penalty. On the other hand, the MPCs are able to drastically reduce battery aging. Since the zero-power policy does not cause SOC cycles, its aging trajectory is at F I G U R E 9 Comparison of PORFC MPC including calendric aging, performance score constraint and variable power deviation penalty factor to two rule-based controllers. For the MPCs: Prediction horizon 300 seconds, controller sampling time 1 second a very low level and only reflects the unavoidable calendric aging.
The behavior of total penalty exhibits that little power following is advantageous, and that the intermediate strategies of the MPCs are superior to the rule-based controllers. More specific, the zero-power policy reduces total penalty by 50% w.r.t. the constraint based controller. The baseline PORFC-R MPC additionally reduces total penalty by 3%, and the unconstrained PORFC-R MPC by 10% w.r.t. the zero-power policy.
Furthermore, this test proves that the present PORFC-R MPC can handle time-varying penalty factors very well. It may be worth mentioning that the penalty factors are interpolated over time, in order to achieve smooth transitions, and to facilitate convergence of the optimization algorithm.

| Degree of optimality
In order to validate the operational performance of the PORFC-R MPC, a comparison to a "one-shot" OCP solution is presented in the following. The latter algorithm has full information about the reference power and penalty costs over the entire simulation duration. Its optimization horizon equals the simulation duration. The MPC problem is an approximation of this OCP, and is based on the formulation in section 4.1.2. Different horizon lengths are tested for the MPC. A standard power reference signal (constant and ramp) over a simulation time of 1 hour is used, as shown in Figure 10.
The OCP solution exhibits a reduced and constant charging power level already at the beginning of the simulation. Exactly in parallel to the reference ramp, the charging power goes down to zero and remains there while the reference power crosses zero. After this crossing the discharging power again increases in parallel to the reference until a new constant power level is reached. In absolute terms, this discharging power level is lower than the earlier charging power level. This difference can be explained by the longer time span of discharging in comparison to charging reference power. By decreasing the applied discharging power the corresponding cycle DOD remains limited.
The MPCs start charging from reference power and gradually converge to the OCP solution as the cycle DOD grows. This difference in initial behavior may be caused by the difference in cost function design, where the OCP penalizes absolute power mismatch and the PORFC-R MPC penalizes a square of power mismatch. Especially the point of reaching zero power is matched very well, although some spikes are visible whose cause is unclear. Due to their limited horizon, the MPCs initially again rather follow the discharging reference power. However, later on, their power drops even below the OCP solution in order to compensate for the already high DOD.
During the discharging phase, the MPCs with longer horizon approximate better the OCP solution. In terms of total economic penalty (mismatch + aging), OCP exhibits best performance, followed by the MPCs in descending order of horizon length. Worst performance results from following the reference power.
Note that the OCP does not converge toward zero power at the end, only because of the fact that its horizon ends at the end of the simulation time t = 60 minutes. In other words, such a constant non-negative power is an optimum control result only for limited time spans.
On a lower level, the degree of optimality of the optimization problems at separate MPC steps is quantified using the Karush Kuhn Tucker (KKT) values. 2 KKT values indicate how close the solution of the constrained optimization problem is to its closest local optimum. Figure 11 shows the KKT values over the duration of the PJM simulation of section 4.1. During most of the simulation time, the KKT values remain below 10 −3 . Therefore, the optimization problems can be considered as fairly converged.
F I G U R E 1 0 Validation of the PORFC-R MPC against a one-shot optimal control solution

| Comparison of PORFC in one-shot fashion to conventional cyclic aging calculation
In section 3.2, it has been explained how cyclic capacity loss is distributed and decoupled over time. In order to check, if this approximation introduces an error, it is compared to the conventional cyclic aging calculation (9). For PORFC, cyclic aging is calculated by Ð t k end 0 _ Q PORFC cyc t ð Þdt using (16). The process flows of the two approaches have been shown in Figure 6.
The present comparison is based on random SOC sets of k set = 10,000 samples each. Consequently, both cyclic aging functions are applied in one-shot to the full SOC sets k = [0, k set ] which results in two cyclic capacity loss values for each SOC set. The relative errors of 100 of those tests are depicted in Figure 12. For all the tests, it turns out that both approaches result in the same total cyclic capacity loss up to machine precision.
Therefore, it can be concluded that the preprocessing step and the cyclic capacity loss dynamics of PORFC do not introduce errors in terms of absolute capacity loss.

| Comparison of PORFC in movinghorizon fashion to conventional cyclic aging calculation
In section 3.1, the PORFC concept is explained of utilizing a residue to correctly estimate cyclic aging in a movinghorizon fashion. In order to validate this approach, in the present assessment, the dynamic evolution of cyclic aging over simulation time for PORFC is compared with the evolution for conventional cyclic aging calculation.
Here, PORFC is evaluated at each time step k on a moving horizon [k, k + N] of N = 100 samples, while the residue is carried along from previous time steps. In contrast, the conventional calculation is applied to all samples [0, k] up to the current one k. Figure 13 shows that in principle both approaches match w.r.t. the dynamics and the end values. The only difference lies in the leading time shift of 100 samples by PORFC. This, however, is expected since PORFC is evaluated on the prediction horizon and therefore outputs a future capacity loss.

| Length of SOC residue
In order for the formulated controller to be implemented on an embedded hardware, the arrays used should have a finite length such that a finite memory space can be allocated on the hardware. Specifically the length of the SOC residue vector, which is used in the preprocessing algorithm, is not clear a priori.
However, numerous practical tests have shown that its length typically remains well below 100 entries. As an example, in Figure 14 the length of the SOC residue even remains at around 10 during a closed-loop simulation of 18 hours. Table 3 shows the average computational time taken by the preprocessing and MPC step to generate the optimal control input for the plant at every simulation step (compare with flowchart of Figure 4). The averaging is done over 600 execution steps each. Results are shown for different combinations of controller sample times and lengths of the prediction horizon. F I G U R E 1 2 Relative error between PORFC cost function in one-shot fashion and conventional cyclic aging calculation As expected, the preprocessing causes less effort than the MPC execution, and an increase in prediction time leads to increases in both modules. The increase is conveniently sublinear for the preprocessing and only slightly superlinear for the MPC execution. In any case the average total computational time is significantly smaller than the sample time, assuring real-time implementability of the present controller. Figure 15A shows that in rare cases the computational times can be significantly higher than their mean values. This is especially true for the preprocessing step. In the present test, the maxima of preprocessing and MPC execution time do not occur at the same time since the maximum of total time is lower than the sum of the individual maxima. However, even if the maxima of preprocessing and MPC execution time occurred at the same time, the total computational time would remain below 50 ms. Figure 15B provides a further breakdown of computational cost within the preprocessing step. Obviously, the presimulation of the battery dynamics consumes most time. However, this part already has high numerical efficiency since it is based on a code-generated integrator from ACADO Toolkit. 28 Concluding, there remains only minor improvement potential in the array-handling of the parameter generation.

| Conclusion
A comprehensive Li-ion battery model including electrical, thermal, and aging submodels has been presented. For calendric aging, a novel time-variant term has been proposed for time-continuous simulation of the squareroot aging dynamics. For cyclic aging, cycle identification and damage accumulation have been cast into a novel time-discrete dynamical model.
Consequently, a MPC comprising an exact cyclic aging cost function has been formulated for Li-ion batteries. This cyclic aging cost formulation is enabled by • Performing a Rainflow-analysis prior to each MPC step based on a pre-simulation of the system dynamics. • Distributing and decoupling of cyclic capacity loss over time. • Memorizing a condensed set of SOC samples from past half cycles.
In a simplified market setting, the novel MPC incurs 13% less total penalty than a state-of-the-art MPC from literature and 23% less total penalty than a conventional rule-based controller. Total penalty here is the sum of power mismatch penalty and aging cost.
In a full market setting with additional constraints, the novel MPC exhibits its ability to adapt even to unusual economic situations, and to outperform manually chosen control policies.
A validation of the cyclic aging cost function shows that correct aging estimation in the MPC is maintained despite the distribution of capacity loss over time and the moving-horizon mode.

| Outlook
The promising results of the present study motivate the following future work: • Integration of the plant model and controller in an in-house hybrid power plant simulation suite.
• Testing with a real-life battery energy storage system.
• Application of the controller formulation to further domains where fatigue or aging play a crucial role for profitable operation.

Internal resistance
The internal resistance of a battery cell is function of SOC, cell temperature T and State of Health SOH. Here, the internal resistance is modeled as a product of the individual dependency functions. The dependency on temperature is modeled using a decreasing exponential function with the constants A r , B r , and C r. 15,29 The dependency on SOC is modeled using a piece-wise linear function based on available battery data. 30 Battery degradation leads not only to reduction in capacity but also to additional increase in the internal resistance. 31 The increase in internal resistance is observed to be approximately linearly proportional to the decrease in battery capacity. 1 Consequently, the dependency of internal resistance on State of Health is modeled as with the proportionality constant β degradation which is obtained from experiments.

Battery current and terminal voltage
By applying Kirchoff's voltage law in the battery electrical equivalent circuit shown in Figure 2 for a given power P, battery current I can be obtained by the solution of the underlying quadratic equation. The terminal voltage U terminal can be subsequently obtained by Verification and validation of the battery plant model

Exemplary simulation output
In order to qualitatively verify the behavior of the submodels, the battery plant model is simulated in open-loop and subject to power steps and ramps. The sample time of this time-discrete model here is set to 1 minute, and the total simulation time to 12 hours. The results of the simulation are shown in Figure A1. Each of the output plots can be divided into six sections (each of duration 2 hours) based on the power input type: In the first section, a discharging power is applied for 2 hours. As a result, the battery SOC decreases and the temperature increases due to resistive heating.
In the second section, an equal magnitude of charging power is applied to the battery for 2 hours which brings the SOC back to the initial value.
In the third section, the battery discharges at a higher power than that of the first section. Therefore, the rates of SOC decrease and temperature increase are higher than that in first section.
In the fourth section, a zero power allows the battery to cool down. Hence, the temperature asymptotically converges toward the ambient temperature. As expected, there is no change in SOC during this time.
In the fifth section, a charging power ramp is input to the battery which shows a gradual increase in SOC, and a smaller initial rate of temperature increase compared with that of section three, where a power step is applied.
Throughout the simulation, the charge capacity of the battery and therefore the State of Health is decreasing. This decrease can be understood better by looking at the individual behaviors of calendric and cyclic capacity loss: Calendric aging, The increase in calendric capacity loss follows a square root behavior with respect to simulation time. According to (3), the slope of this square root function is scaled depending on the temperature and SOC levels.
Cyclic aging, During section one, the battery only discharges, and thus the Rainflow algorithm only calculates one half cycle with linearly increasing cycle depth, and over-proportionally increasing cycle damage. As the SOC decreases stronger in section three compared to section one, the corresponding increase in cyclic capacity loss is also higher. For a constant SOC trajectory (as in section four), the cyclic capacity loss as well is constant because the battery does not exhibit any SOC cycle.
F I G U R E A 1 Open-loop simulation results for the battery storage system model. The signals are grouped into input, states, and intermediate variables Electrical submodel, During the entire simulation, the battery current follows the behavior of input power. The terminal voltage (A7) mainly is governed by the current, and secondarily by the SOC via the Open Circuit Voltage. Here, it is normalized by the nominal Open Circuit Voltage U ocv,nominal . The cell internal resistance (A2) has complex dependencies on temperature, SOC, and state of health as can be seen in the lowest subplot.

Calendric aging
The calendric aging submodel is validated by simulating the battery plant at defined combinations of SOC and ambient temperature. For each combination, the simulation duration is the corresponding expected lifetime. Thus, at the end of each simulation the end-oflife amount of capacity loss should be reached. These simulations are done at zero power such that there is no cyclic aging and thus the resultant total capacity loss is only due to calendric degradation. Furthermore, the SOC and temperature are fixed during the entire simulation.
The results for variations of SOC are summarized in Table A1, and for variations of ambient temperature in Table A2. All results show that exactly the expected capacity loss is reached.
As discussed in section 2, the rate of calendric capacity loss is scaled depending on the instantaneous temperature and SOC. Higher temperature or SOC increase the aging rate. The simulation output in Figure A2 verifies this behavior.

Cyclic aging
The cyclic aging submodel is validated by simulating the battery plant at defined alternating power magnitudes such that the SOC is cycling at fixed DODs. In order to obtain a fixed DOD for every cycle throughout the F I G U R E A 2 Validation of calendric capacity loss at different SOCs and temperature for zero power. A, Calendric capacity loss at different SOCs for a given temperature. B, Calendric capacity loss at different temperatures for a given SOC simulation, the dynamics of calendric aging and temperature are deactivated. For a specific DOD, the simulation is performed until the expected number of cycles to endof-life is reached. Thus, at the end of each simulation the end-of-life amount of capacity loss should be reached.
The results are summarized in Table A3. For all simulations, the expected capacity loss is obtained up to machine precision.