Economic nonlinear model predictive control of fatigue—Formulation and application to wind turbine control

In this article, the estimation of fatigue is implemented in the cost function of a gradient‐based model predictive controller (MPC). This is a challenging problem, because calculating fatigue leads to a non‐standard and discontinuous cost function. Based on a brief previous publication, in the present work the method is derived, explained, and assessed in detail. The key enablers of the proposed method are a sequential implementation of MPC, the periodic substitution of discontinuous aspects of the cost function by linear functions, and the assumption of a sufficiently infrequent switching of this substitution. Fatigue cost gradients are obtained efficiently by automatic differentiation. The method is implemented in an economic nonlinear model predictive controller (ENMPC), which optimally balances revenue and fatigue cost. This novel formulation is applicable to a very wide range of domains, and it is demonstrated here on the control of a wind turbine. The proposed ENMPC is fully parameterized by physical and monetary variables, and outperforms a conventional ENMPC based on the literature. The method is assessed by considering various metrics, including frequency spectra, damage estimation, switching, and gradient dynamics, which together provide useful insight into its main characteristics and an initial assessment of its performance.


Motivation
The phenomenon of fatigue is characterized as damage caused by cyclic loading. Fatigue failure of a component typically occurs after a large number of cycles even at rather low amplitudes and mean load values, well below the admissible ultimate ones. 1

Domain Relevant load state Unit Examples
Moderately loaded mechanical structures (elastic deformation) Mechanical stress Pa Towers and blades of wind turbines 2,3 Highly loaded mechanical structures (plastic deformation) Mechanical strain -Jet engine rotors 4 Power semiconductors (thermo-mechanical variations) Temperature K Insulated-gate bipolar transistors (IGBT) in power converters 5,6 Battery energy storage systems (loss of active material by charging and discharging) State of charge (SoC) % Lithium-ion cells 7,8 Fatigue is relevant to a variety of domains, where it is driven by different system states, as shown in Table 1. The present work considers the structural fatigue of a wind turbine tower. Thus, without loss of generality, mechanical stress is considered as the fatigue-driving quantity throughout this work. Fatigue has a large impact on the investment and operating costs of devices, and thus on economic profit. However, unfortunately fatigue damage cannot in general be completely avoided, and can only be mitigated, for example by active controls or passive design solutions. Model predictive control (MPC) enables the economically optimal management of systems by using predictions of their future response, 9,10 including stress at crucial spots in the device structure. Rainflow counting (RFC) is the standard algorithm used for the decomposition of stress time-series in the estimation of fatigue damage. It should be noticed that RFC is typically a post-processing activity: given a measured or simulated signal, RFC can be used to decompose it in a way that is amenable to the estimation of fatigue-induced damage. Unfortunately, however, RFC is a discontinuous branching algorithm, which is not per-se directly usable within an optimization context, including MPC. 2 Furthermore, RFC requires complete stress trajectories as input, and thus cannot be cast into standard stage or terminal cost functions. Consequently, previous work on MPC for fatigue reduction always has avoided using RFC, as presented in the following.

Previous work
Conventional MPC formulations for fatigue found in the literature can be classified according to the use of an indirect or a direct fatigue metric. With indirect fatigue metrics, instead of actual fatigue damage, only a damage-related value is considered and optimized, as illustrated in the left part of Figure 1. Two main families of methods have been described: • The penalization of deflection rate is a common approach for considering fatigue in MPC. In Gros et al. 11 and Evans et al., 12 this method is implemented for wind turbine towers by penalizing the deflection rate at the tower top. Since deflection correlates with stress, this can be interpreted as damping of stress oscillations. This method does not consider stress cycles per se, and thus cannot establish a direct link to fatigue. However, since this method has served as a common basis for comparison among several publications, it will be used here as a baseline reference under the label "Tower-tip-velocity penalization" (TTVP).
• In spectral methods, damage is approximated by empirical functionals depending on spectral moments of the predicted stress signal. 2 A severe limitation of this approach is the inherent assumption of a narrow-banded Gaussian process, which is often violated in practical cases.
In contrast, direct fatigue metrics evaluate actual fatigue damage, which can be readily converted to monetary fatigue cost, as illustrated in the right part of Figure 1. The main approaches within this family are as follows: • By the serialization of stress cycles, 3  method of ASTM. 14 This serialization is an extreme simplification of RFC, because nesting of stress cycles is not accounted for.
• In Barradas-Berglind et al., 15 hysteresis operators are used to adapt parameters of a cost function in the MPC. However, this cost function essentially penalizes only deflection rates, similarly to the TTVP approach. Additionally, the authors state that the resulting control problem is hard.
• In Collet et al., 16 fixed-point iterations are used to tune a parametric nonlinear relationship of stress variance to actual fatigue at each MPC step. Two advantages of this approach are a solid relation to fatigue and the combination of proven numerical methods. The main disadvantage relates to the numerous required fixed-point iterations, with a consequently high computational cost. 16 • In Luna et al., 17 a surrogate Artificial Neural Network is trained based on damage results from a large number of stress time series. This approach seems to be very promising in terms of correct damage estimation. However, it involves a high a priori effort for the training of the surrogate model, as well as a significantly increased computational load in the MPC. 17 In summary, all available indirect or direct fatigue metrics either do not accurately approximate fatigue damage, or result in high computational load. Since RFC is the de facto standard for fatigue estimation, a direct and non-simplified implementation of RFC in MPC is desirable and promises best results. To the authors' knowledge, the problem of rigorously implementing RFC within MPC was first solved by the same authors in Loew et al., 18 using a novel method termed Direct Online Rainflow Counting (DORFC). However, in Loew et al., 18 only the core idea of the method was mentioned with a minimum level of detail, and brief simulation results were shown.

Contribution
These gaps are closed in the present work. For the first time, a comprehensive presentation of DORFC is provided. The discussion covers the obstacles to the inclusion of RFC into MPC, the solution strategy, the required assumptions, and the inherent limitations of the resulting algorithm. Furthermore, mathematical and algorithmic details are provided here to facilitate a rapid implementation by interested readers. Finally, for the first time several interesting characteristics of DORFC are demonstrated with reference to a complex wind turbine control problem. These characteristics span over the domains of economic behavior, dynamic behavior, online damage estimation capabilities, switching behavior, and gradient dynamics.

Outline
The standard procedure for off-line fatigue estimation is presented in Section 2. Next, Section 3 discusses an implementation of nonlinear MPC (NMPC), which is suitable for fatigue control. Section 4 provides a detailed explanation of the DORFC formulation. In order to give practical insights into DORFC, the method is applied to a wind turbine control problem in Section 5. The DORFC MPC is compared to a conventional MPC in Section 6, leading to an extensive demonstration and quantification of its performance. Finally, Section 7 presents the main conclusions and findings, and directions for further research.

Overview and governing quantities
Fatigue impact of a given stress time-series (k) is characterized primarily by the amplitudes a,c of the N c stress cycles c that the signal contains. Methods for identification of these stress cycles are presented in Section 2.2. For an overview, the cycle-governing quantities are collected in Table 2.
In addition to the amplitudes, stress mean values m,c of the cycles have to be considered, since typically a positive stress mean increases fatigue, whereas a negative stress mean has the opposite effect. In fatigue estimation, the frequencies and actual shapes of stress cycles are neglected. Therefore, all reviewed methods for cycle identification include the reduction of the stress time histories to a series of peaks and valleys (extrema). The calculation of fatigue damage is then based on the identified cycles, as shown in Section 2.3.

Cycle identification
Cycle identification from stress time-series is straightforward if, for example, a simple sinusoid is analyzed. There, amplitude, mean value, and number of cycles are obvious. However, realistic stress trajectories are usually highly complex and contain stress cycles that can be nested.
In the ASTM-E1049 standard, 14 various methods are presented for the identification of stress cycles. These include methods for serial counting of stress-level crossings, stress extrema and stress ranges. For repetitive stress patterns, the Reservoir Method is a sophisticated and intuitive approach. 19 For general non-repetitive stress patterns, the Rainflow Counting algorithm (RFC) is considered as the most sophisticated method. Thus, RFC is chosen for the present work and is expressed as where the weight is set as w c = 1 if a full stress cycle is detected, whereas w c = 0.5 in the case of a half cycle. Full cycles are cosines of a full period, while half cycles are cosines of only a half period. Half cycles therefore represent either a rising or falling transient. As shown in more detail in Appendix A, RFC contains algorithmic branches and loops, resulting in a discontinuous output behavior. Furthermore, the number N c of identified cycles is not known before execution, but can only be bounded by the number of extrema.

Damage estimation
After the identification of stress cycles, the total damage of a stress time-series can be computed as shown by the standard damage program reported in Figure 2a.
The effect of the cycle stress mean on fatigue is considered by correcting the stress amplitude to an equivalent stress via the Goodman equation 20(p. 184) : where R m denotes the ultimate tensile stress. Woehler curves, which can be obtained by ad hoc material experiments, provide the number N fail of cycles to failure for specific equivalent stresses. Consequently, fatigue damage of one cycle is obtained by the reciprocal to the number N fail of cycles to failure, that is, Assuming linear and time-invariant damage accumulation, the Miner-Palmgren Rule 21 is used to compute the total fatigue damage of the stress time series as

NMPC FORMULATION AND IMPLEMENTATION
The original NMPC problem is an optimal control problem over an infinite time horizon. Because this problem usually cannot be solved in closed form, an approximative solution is obtained by optimization over a shorter, moving horizon [t 0 , t end ].

Dynamic system
The present work considers a continuous nonlinear systeṁ comprising the system states x(t), the control variables u(t), and external disturbances d(t). Fatigue is caused by stress (t), which can be one of the states or outputs of the system.

Economic formulation of revenue and fatigue
In order to provide meaningful optimal solutions, the concept of Economic NMPC (ENMPC) is pursued in the present work. An interesting discussion on economic nonlinear model predictive controller (ENMPC) can be found in Gros et al. 11 The core idea of ENMPC is the direct implementation of high-level operational objectives in the cost function. In many cases this is superior to the tracking of state trajectories often used in standard NMPC.
For the present application, the economic cost function J is expressed as This figure of merit is used to trade-off the maximization of revenue J revenue and the minimization of fatigue cost J fatigue by the controller. A weighting factor damage can be used to influence this trade-off. Revenue is formulated as a stage cost term revenue (x(t)) dt.
The formulation of fatigue cost will be presented in Section 4.3.

Formulation of the nonlinear program
The NMPC problem is transformed to a nonlinear program (NLP) by discretizing the time-continuous control variables u(t) into piecewise-constant optimization variables u i,j . Here, the type of control variable is determined by i ∈ N in and the control interval by j ∈ N u . The number of control inputs is denoted by N in and the number of control samples by N u = T horizon ∕T cntrl , with horizon length T horizon = t end − t 0 and controller sample-time T cntrl . The NLP is defined by the minimization of the cost function (6) subject to the system dynamics (5), box constraints on the optimization variables and nonlinear inequality constraints on the system states x k,j , which are sampled at the control intervals j ∈ N u . Here, the individual states are represented by k ∈ N x with the total number of states N x . The above hard nonlinear inequality constraints are sufficient in a setting without noise or plant-model mismatch, as true for the exemplary application of the present work. In the case of significant noise, soft constraints have to be employed to ensure feasibility. 22 In the present work, the Sequential Approach is applied, where the dynamics are solved in a single-shooting fashion before the optimization step. This results in a small-size optimization problem. 23 The simulated dynamics are expressed asẊ and involve four components: the ODEs of the reduced plant systemẋ, given by (5), and of the revenue terṁJ revenue , given by (7), as well as the variational differential equations (VDEs) for the state gradients dẋ du and the revenue gradients ḋJ revenue du . Note that the VDEs make up the major portion of the simulated dynamics since, for example, for the state gradients, each state has to be derived w.r.t. each individual control sample, and thus the dimension is dẋ du ∈ R N x ⋅N in ⋅N u . More details on the present formulation are provided in Löw et al. 24

Real-time implementation
Sequential Quadratic Programming (SQP) is a common method for solving NLPs. Here, the NLP is sequentially approximated by QPs, which is repeated until full convergence and can be very time-consuming. Thus, the present implementation is based on the Real-Time Iteration (RTI), 25 which is inspired by SQP but only solves one Quadratic Programming (QP) problem at each MPC step, as shown in Figure 3. RTI is based on the assumption that the optimization problems vary slowly over time. Thus, the QPs in subsequent MPC steps approximate an SQP. Solving only one QP per MPC step implies that full convergence is in general not achieved. However, starting from a good initialization at the first MPC step, each new operating point is close to the previous one, and each new optimization solution exhibits only limited suboptimality (see Diehl et al. 26 and Gros et al. 25 for further details).
Each QP problem is formulated at the current operating point (x * , u * , d * ), where the cost is approximated by a quadratic function as Here H * is the Hessian, and g * is the gradient of the cost function w.r.t. the optimization variables u. Each QP yields an update of the optimization variables. In the present work, the Hessian H * of the QP problem is approximated at each control step according to the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. 23,27 The BFGS method only requires the current optimization variables and cost gradients, and the Hessian does not have to be explicitly calculated.

Real-time Iteration
The gradients of the cost function (6) w.r.t. all optimization variables are written as and are obtained separately for revenue and fatigue cost. The gradients for the cumulative revenue stage cost write and are only required at the end t = t end of the prediction horizon. These gradients are obtained via the corresponding VDE in (11). The formulation of the fatigue cost gradients will be presented in Section 4.4.
For the present implementation, the algorithms for solving the dynamics and the QP will be stated in Section 5.3.

Obstacles
According to the literature, the following obstacles stand in the way of a direct implementation of RFC in MPC: Obstacle 1: RFC requires the entire stress trajectory over the prediction horizon as an input, 3 and thus cannot be cast into standard stage or terminal cost functions. Stage costs would comprise a summation of state-samples or a time-integral of state-trajectories over the prediction horizon. Terminal costs would be defined as functions of the state samples at the end of the prediction horizon. 28 Obstacle 2: It does not seem possible to derive analytical expressions of fatigue cost gradients g * fatigue , due to the RFC algorithm not having a "closed mathematical form" 2, [29][30][31] Obstacle 3: RFC is a highly nonlinear algorithm, 15 which exhibits discontinuous outputs because of its branches and loops (see Section 2.2).

Solution strategy
The above mentioned obstacles can be overcome by the combined application of three principles, which are labeled here "Separate", "Substitute", and "Switch seldom": "Separate": Obstacle 1, which amounts to a "non-standard cost function", can be overcome if in the MPC implementation the solution of the ODEs and of the QP are separated and serialized, as visualized in the upper part of Figure 4. This is enabled in the present MPC implementation by exploiting the following properties of the Sequential Approach (see Section 3.3): • Trajectories of states x and state gradients dx l (k)∕du i,j are obtained by an ODE solver using single-shooting, and thus are available before the QP solution (see Section 3.3).
• Cost gradients dJ fatigue (t end )∕du need to be determined only at the end of the prediction horizon, but not over the prediction horizon itself (see Section 3.4).
Via the insertion of code in between the QP steps shown in the lower part of Figure 4, the rainflow algorithm is executed (directly) without approximations and (online) within the MPC. This justifies the use of the terms Direct Online in the DORFC name of the algorithm.
"Substitute": Obstacle 2 refers arguably to an absence of a "closed mathematical form", which is questioned in the present work. As shown in Figure A1 in the Appendix, the algorithmic loops in RFC are always executed a finite number of times. Thus, RFC can actually be classified as a "closed" algorithm. Instead, in the present work, the argument of a non-closed form is interpreted to originate from the fact that the discrete execution structure of the RFC algorithm is highly dependent on the input, and thus is not known a priori.

Definition (Execution structure):
The execution structure is composed of the set of consecutive branch decisions.
The input-dependency of the execution structure originates from two properties of RFC: 1. The input stress time-series is reduced to extrema, which clearly depend on the stress time-series itself. 2. The branches and loops in the cycle-counting procedure depend on the locations and values of these extrema.
The input-dependent execution structure of the RFC algorithm still poses a very challenging obstacle, and probably still blocks any attempt at deriving gradients of RFC( ). However, for many practical applications (including the one shown in Section 6.4), it is reasonable to hypothesize that: • The execution structure only changes gradually over time.
• Consequently, the execution structure-and thus also the output cycle structure-can be assumed to remain fixed within each MPC step.

Definition (Cycle structure):
The cycle structure is composed of the number of cycles N c , cycle weights w c , and cycle samples (k max,c , k min,c ).
The assumption of a fixed structure allows for the substitution of RFC( ) by continuous expressions, and the transformation from a discontinuous to a continuous (and differentiable) damage program, as shown in Figure 2.
The substitution is performed for the stress mean m,c and stress amplitude a,c for separate cycles (see Table 2), by using their related maximum and minimum stress samples, which leads to These expressions are continuous and linear, and are assumed to be locally valid for one MPC step.
To conclude, each execution of DORFC consists first of the evaluation of the "Discontinuous fatigue cost program" based on RFC, and second of the evaluation of the "Fatigue cost gradient program" based on the continuous substitutes, as shown in Algorithm 1. The computation steps of the fatigue cost program will be specified in Section 4.3, and of the fatigue cost gradient program in Section 4.4.
"Switch seldom": Obstacle 3, which refers to the "discontinuous output" of the algorithm, is especially relevant for the present work, since the RTI method is used. In fact, RTI requires the cost function and its gradients to only change mildly for successive MPC-steps. This can be achieved if: Algorithm 1. MPC using DORFC Input: measurement of system states, initial guess of control variables Output: update of control variables while true do 1: Receive new initial states from measurement 2: ODEs: Solve dynamics (11) via single-shooting,using current initial states and control trajectories 3: DORFC(1): Evaluate "Discont. fatigue cost program", using rainflow algorithm (1) on stress trajectory 4: DORFC(2): Evaluate "Fatigue cost gradient program", using continuous stubstitutes (16) 5: QP: Set up and solve QP 6: Apply new control variables to plant 7: ODEs + DORFC: Execute steps 2: to 4:,using current initial states and new control trajectories 8: HESS: Update Hessian matrix via BFGS method,using old and new control trajectories and cost gradients end • The sample-time T cntrl of the controller is appropriately low w.r.t. the time constants of the system dynamics (5).
• The discontinuities in RFC switch sufficiently seldom.
This requirement of seldom switches can be linked to the concept of "average dwell-time" in switching cost functionals. 32 In a nutshell, this concept implies that an MPC with switching cost function can remain stable if, on average, there is a sufficient time period between the switching events. Since this requirement only holds "on average", a longer time period without switches may be followed by a time phase of frequent switches. A more precise theoretical analysis is out of scope of the present paper. However, studies using an exemplary system in Section 6 show that indeed changes in the execution structure of RFC happen infrequently enough over the simulation time.

Continuous fatigue cost program
In order to obtain monetary cost functions in the ENMPC, the damage programs are transformed into fatigue cost programs as shown in Figure 2 (programs (a) into (c), and (b) into (d)). This transformation requires a continuous expression for the cost J fatigue,c of separate stress cycles c w.r.t. the equivalent stress eq,c . The following criteria are proposed for this analytical expression: • A zero equivalent stress should result in a zero cost, that is, • There are no healing effects in standard fatigue models. Thus, negative fatigue costs should never be obtained for all possible values of the equivalent stress, that is, for every eq,c ∈ R + 0 . Notice that the Goodman equation (2) does not produce negative equivalent stresses, since the stress mean m,c < R m must not exceed the ultimate tensile stress limit.
• Due to the use of gradient-based optimization, the expression has to be continuously differentiable on eq,c ∈ R + 0 . • Due to the use of a QP formulation, the expression has to be convex, that is, for every ∈ [0, 1] and eq,c1 , eq,c2 ∈ R + 0 .
Polynomials are one class of analytical expressions that fulfills the above mentioned criteria, leading to the following expression for the cost of a cycle: J fatigue,c ( eq,c ) = w c N ∑ n=2 a n n eq,c (20) with a minimum order of n ≥ 2 and non-negative coefficients a n ≥ 0. The cost is weighted by w c = 1 if a full stress cycle is detected, and by w c = 0.5 in the case of a half cycle. The polynomial order and coefficients can be adjusted to various economic fatigue models, which for example can comprise: • Initial capital cost; • Opportunity cost of lost remaining lifetime; • Safety-margin for a guaranteed lifetime.
For the mechanical case, these cost models should be based on a relationship between applied stress and cycles-to-failure, such as the one provided by Woehler curves.
Following the linear damage accumulation approach by Miner Palmgren, 21 the total fatigue cost of a specific stress time series is obtained by the linear accumulation of fatigue cost of all detected cycles c, that is,

Fatigue cost gradient program
Differentiation of the continuous fatigue cost program yields the fatigue cost gradient program ( Figure 2, program (d) into (e)), which can be written as dJ fatigue with Jacobians Gradients of functions and algorithms ("programs") generally can be obtained by numerical differentiation, symbolic differentiation, or automatic differentiation (AD). Numerical gradients of nonlinear programs, for example, via finite differences, can be expensive and may result in loss of accuracy.
Symbolic gradients, on the other hand, are also unsuitable for the present problem: either a large number of symbolic gradients would have to be pre-computed offline for all possible execution structures of the rainflow algorithm, or the symbolic gradients would have to be composed online by computer algebra. The latter method would need to be executed at each controller step, which can be expensive for real-time applications.
In contrast, AD obtains exact numeric gradients online in a step-wise manner by following the chain rule. The forward execution of the original program ( Figure 2c) determines which steps are taken, that is, which execution structure of RFC can be currently observed. Using the numerical results of each step of the program, gradients are obtained either by Forward AD or by Reverse AD. Simply put, these two modes stem from different applications of the associative and distributive law, and differ in efficiency depending on the problem. In the present work the fatigue program only has one output, represented by total fatigue cost J fatigue , and a large number of input optimization variables u i,j . In such situations, Reverse AD is typically more efficient. 33 Accordingly, calculations are executed top-down as shown in the "Fatigue cost gradient program" in Figure 2e.
AD requires a library of gradients either for elementary mathematical operations or, as in the present case, for the chain elements of the differentiated program (22). In the following, analytical gradients for all chain elements are presented: • Since linear damage accumulation (21) A gradient w.r.t. weight w c is not required, since w c is part of the cycle structure and thus fixed for the current MPC step.
• Partial differentiation of the Goodman equation (2) w.r.t. the stress trajectory samples. It should be noted that each row of the resulting Jacobians will contain two non-zero entries, since each stress cycle c is defined by two stress samples. For instance, for c = 3 identified stress cycles and k = 6 stress samples, the Jacobian a ∕ will take the following form: • The stress gradients d (k)

EXEMPLARY APPLICATION
Performance of MPC with DORFC is evaluated using the reference wind turbine developed within IEA Wind Task 37, as sketched in Figure 5. A detailed presentation of the model is provided in Appendix B.
F I G U R E 5 Sketch of the LiDAR-equipped wind turbine model with ambient wind speed V w , generator torque T g , rotational speed of the rotor r , aerodynamic torque T Q , displacement of tower tip d T , aerodynamic thrust force F T , collective blade pitch angle b and tensile stress at tower root

Idealizations
The present work is focused on the assessment of the key characteristics of DORFC. In order to limit the complexity of the analysis, the following idealizations are used: • The simulation (plant) model is the same as the internal MPC model, that is, there is no model mismatch.
• All system states (B1) are assumed to be measured exactly and without noise.
• An exact prediction of the wind is assumed within the MPC horizon.
• The MPC is assumed to operate without execution delays. This implies that acquired measurement signals are synchronous with output control signals.
• No mechanical and electrical energy-losses are considered.
Particularly the first three hypotheses neglect noise, biases, and uncertainties, thereby eliminating the need for a state estimator and soft constraints in the MPC. A more realistic and complete simulation and controller setup without the above idealizations is part of ongoing research.

Cost functions
The primary objective of wind turbine operation below rated wind speed is to maximize the extraction of energy from wind, while respecting various system constraints. This is mainly achieved by adjusting the generator torque, while the variation of blade pitch angle typically is low or zero. Above rated wind speed, the goal is to track the rated electrical power, mainly by varying the pitch angle. However, perfect tracking is not possible due to the variability of the exciting wind. For instance, for a drop of wind speed, the pitch angle has to follow as rapidly as possible in order not to excessively affect energy extraction. This behavior can be achieved either by a power tracking cost function, 34 or by the combination of energy maximization and a power upper constraint. The latter was chosen in the present work. Consequently, the cost function (6) of power maximization and fatigue cost minimization is universally effective over all wind speed regions, from cut-in to cut-out wind speed.
The interaction of the rotor with the wind induces a thrust force that -among other effects-excites fore-aft oscillations of the tower. These oscillations result in cyclic stress and thus in fatigue at the tower base. To mitigate this effect, the turbine controller can be designed to achieve a second objective, that is, the minimization of fatigue. Clearly, fatigue damage occurs at other wind turbine components, and may drive their design. For simplicity, however, the present work only considers fatigue at tower base.
In the following, the above mentioned objectives are cast into the economic cost function (6) for the MPC formulations TTVP and DORFC: (1) For TTVP, energy-based cost terms are considered. Therefore, following Gros et al., 11 revenue is expressed as an integral of extracted aerodynamic power Fatigue cost is represented by the average kinetic energy of tower oscillations, which is obtained as using the lumped mass of the tower m T and a normalization by the horizon length T horizon .
for a fixed electricity price of J elec,kWh = 0.1 e/kWh. By the term at the denominator, the electricity price is converted to SI-units [e/(Ws)]. Fatigue cost is based on the Woehler curves log 10 ( eq,c ) = − 1 m (log 10 N fail − log 10 N slo ) + log 10 eq,slo , which are a function of the number of cycles to failure N fail . The specific curve for the present wind turbine tower is obtained by the Woehler parameters of Table 3, and it is displayed in Figure 6. The damage model is obtained from the Woehler curve shown in (3). Using this damage model and an exemplary initial capital cost (ICC) J machine = 4 ⋅ 10 6 e, the Woehler-based fatigue cost of separate stress cycles can be calculated to bẽ as shown in Figure 7. For the definition of the MPC problem, a polynomial fatigue cost function according to (20) is required. An approximation by a 5th-order monomial is used here, resulting in the expression J fatigue,c ( eq,c ) = w c a 5 5 eq,c (32) with coefficient a 5 = 6.79 ⋅ 10 −10 . Due to this parameterization, the approximation is always equal to or above the original Woehler-based fatigue cost function, as shown in Figure 7. The corresponding MPC formulation is termed in the following DORFC-5. Due to the use of QPs, a second cost function based on a 2nd-order monomial is also tested, resulting in the expression J fatigue,c ( eq,c ) = w c a 2 2 eq,c (33) with coefficient a 2 = 7.38 ⋅ 10 −5 . Using this value of the coefficient, the gradient of (33) matches the gradient of the Woehler-based fatigue cost function at the frequently occurring equivalent stress level of eq,c = 35 MPa. The corresponding MPC formulation is termed in the following DORFC-2.

Controller and simulation settings
For the present work, an in-house developed MPC implementation is used, where the dynamics are solved by a 4th-order Runge-Kutta integrator, and the QPs by the interior-point algorithm of quadprog. 35 A controller sample-time of T cntrl = 0.2 s, a fixed integrator step-time of T integr = 0.005 s and a prediction horizon of T horizon = 8 s are set. This horizon length was chosen to obtain several stress cycles within the prediction horizon. Considering that Lidars are capable of scanning at distances of 200 m and more in front of the turbine, 36,37 10 s long wind predictions are available without the need for extrapolation, even at high wind speeds of 20 m/s. The number of control variables within the horizon is N u = T horizon ∕T cntrl = 40, since the control horizon equals the prediction horizon. The present implementation is not capable of real-time execution with these settings, but achieves real-time performance with half the horizon length on a standard workstation equipped with an Intel® Core i7-6820HQ CPU and a 16 GB DDR4 RAM. 18 Each individual simulation is conducted using 645 s of turbulent hub-height wind data, where the first 30 s and the last 15 s are neglected in the post-processing phase.
Since the controller and plant models are identical in the present implementation and all measurements are exact and without delays, it was verified that all MPCs are able to almost exactly eliminate all tower oscillations when a smooth rotor-equivalent wind speed is used as input. To avoid this situation, the more turbulent hub-height wind speed signal is used, which essentially amounts to the introduction of a stochastic disturbance in the controller.

RESULTS AND DISCUSSION
Three different MPC formulations are applied to the exemplary wind turbine problem: the conventional TTVP, and the two versions DORFC-2 and DORFC-5 of the novel method, respectively with an approximate and a physically correct fatigue exponent of m = 2 and m = 5. For the strongly convex cost function of DORFC-5, the MPC initially exhibited a noisy behavior and deteriorated performance. This effect was eliminated by reducing the step length of the QP output from the standard Δu to 0.3Δu.

Economic controller behavior
The tuning of the fatigue weight damage and the economic behavior over different reference wind speeds has already been discussed in Loew et al. 18 Therefore, only a brief summary is provided here. The resulting fatigue cost J fatigue,real of the plant is obtained by post-processing via offline RFC. Fatigue damage is converted into fatigue cost by multiplying by an ICC of 4 ⋅ 10 6 e. Energy capture is converted into revenue by multiplying (after appropriate unit conversion) by an electricity price of 0.1 e/kWh. Profit is revenue minus fatigue cost.

Performance as a function of tuning weights
The present turbine has a rated wind speed of V w,rated = 9.8 m/s. Tuning of the controller was obtained by considering Design Load Case (DLC) 1.1 38 at a wind speed of 9 m/s. This wind speed was chosen because it is close to the rated one, and thus ensures frequent transitions between the partial and the full load operating regions. As shown in Figure 8, maximum profit is reached for TTVP at a fatigue weight of damage = 2000, for DORFC-2 at damage = 1, and for DORFC-5 at damage = 10. It should be noted that the weights for DORFC are in the order of O(1), demonstrating that the monetary cost formulation renders tuning easier or almost unnecessary.

6.1.2
Performance as a function of wind speed As shown in Figure 9, DORFC reaches higher revenue and profit for almost all wind speeds. However, just as in Loew et al., 18 these advantages remain below 1%. The reason mainly lies in the perfect match of the MPC-internal and plant models used in the present simulation setting. Because of this, all MPC formulations are able to suppress tower oscillations without significantly reducing energy capture. Furthermore, the low tower stress oscillations lead to a fatigue cost that is about 3 orders of magnitude smaller than the revenue. Both DORFC formulations exhibit a similar profit behavior. However, this is achieved by different strategies. While DORFC-2 increases revenue only to a moderate extent and decreases fatigue cost by up to 10%, DORFC-5 tolerates significantly higher fatigue in order to maximize revenue.
Regarding actuator usage, DORFC-2 exceeds TTVP in numerous velocity ranges, while DORFC-5 decreases generator torque travel by about 20% and blade pitch travel by about 10% on average.

Dynamic controller behavior
For the MPC without fatigue penalization, there are severe oscillations in the tower root stress, as shown in Figure 10. When the conventional cost formulation of TTVP is used, a good damping of turbine tower oscillations is achieved. By using DORFC, an unusual further modification of the stress trajectory is achieved: in fact, plateaus in the stress trajectory are visible in Figure 10, for example, at t = 82 s and t = 89 s. Although these plateaus only result in a minor reduction of stress cycle amplitudes, the effect on fatigue damage is strong due to the superlinear relation of stress to fatigue. These In order to verify this statement, the amplitude spectra of the stress at tower root are compared for different controllers in Figure 11. For DORFC, different fatigue exponents m and fatigue weights damage are presented. Each spectrum is based on an average over 12 Fourier-transformed simulation outputs of 600 s of length each. For this comparison, the base configuration of DORFC-2 is re-tuned to a fatigue weight of damage = 0.8, in order to achieve identical fatigue cost as TTVP.
For the MPC without fatigue penalization, a clear resonance peak is visible at the first fore-aft eigenfrequency f eigen,T,1 = 0.277 Hz of the tower, which is represented by the vertical dash-dotted line in Figure 11. All MPC formulations equipped with TTVP or DORFC are able to eliminate this peak.
However, despite TTVP and the base configuration of DORFC resulting in the same fatigue cost, their spectral behavior is quite different. In fact, TTVP exhibits a deep notch-filter-like decrease of amplitude around the first eigenfrequency of the tower. TTVP focuses on this frequency range, since it contains much stress content at high frequencies and thus at high stress rates, which are explicitly penalized in the cost function. In contrast, DORFC-2 and DORFC-5 exhibit a broader frequency range of moderate amplitude decrease. This behavior is even more pronounced in the additional spectrum plotted in the same figure for a higher fatigue weight damage = 10 and DORFC-2. This demonstrates that DORFC is focused on the reduction of stress amplitudes, irrespective of the frequency where they occur. This conclusion is in line with the above statement regarding the origin of stress plateaus.
To complete the picture of spectral behavior, it is discussed how the controllable frequency range is limited by the controller settings (see Section 5.3):

Validation of damage estimation
For a verification of the online damage estimation capability of DORFC, its online fatigue estimate J fatigue,estimated is compared to an a posteriori fatigue estimate J fatigue,real , which is obtained via the original RFC and represents the fatigue accumulated over the entire simulation. Similarly, the online estimate J fatigue,estimated is obtained via an accumulation over the simulation time [T sim,start , T sim,end ] of 600 s and is computed as Here, J fatigue (t cntrl , t end ) is the predicted fatigue cost (21) at a controller execution instance t cntrl within the simulation time and at the end of the respective prediction horizon t end . These individual fatigue costs are normalized by the number N u of control intervals within the MPC horizon. This normalization is required, since the predicted fatigue cost J fatigue (t cntrl , t end ) represents fatigue over the entire horizon T horizon , while the accumulation is over simulation time and thus requires only the average fatigue value over one controller execution step T cntrl = T horizon N u . In the following, the ratio J fatigue,estimated ∕J fatigue,real of estimated to real fatigue cost is assessed for variations of wind speed and fatigue weight.
6.3.1 Variation of wind speed Figure 12 shows the fatigue estimation ratio J fatigue,estimated ∕J fatigue,real over more than 200 simulations at different wind speeds. For both formulations DORFC-2 and -5, the estimation ratio increases with wind speed and spans about 2 orders of magnitude. The ideal ratio of 1 is met for DORFC-2 at low reference wind speeds, and for DORFC-5 at high reference wind speeds. This deserves some attention, as estimation ratios of 1 or more are not to be expected. As shown in Loew et al., 13 a significant portion of stress cycles has a time range that is longer than the present MPC horizon of 8 s. Therefore, since the MPC only sees a part of the cycle stress amplitude, it should underestimate their fatigue impact. Two mechanisms of overestimation may compensate this effect: • One reason for the high estimation ratio may be the presence of strong oscillations in the predicted stress signal, when an abrupt excitation (e.g., by a fast wind gust) enters the prediction horizon "from the right-hand side" at t = T end . These strong oscillations in reality do not occur at the turbine (plant), since within the subsequent QP steps the MPC generates suitable control signals to attenuate the excitation until it reaches the beginning of the horizon at t = t 0 . Thus, large cycles that only occur in the prediction lead to an overestimation of fatigue cost.
• Another reason, which only applies to DORFC-2, may be the approximation of fatigue cost by a monomial of exponent m = 2, as shown in Figure 7. This function overestimates fatigue cost for short cycles of low equivalent stress, and underestimates fatigue cost for long cycles. Thus, fatigue cost is overestimated for the rather small cycles that are visible in the prediction horizon.

Variation of fatigue weight
The fatigue weight damage is influencing only indirectly the computation of fatigue J fatigue,estimated in the MPC. Via the optimization, a higher fatigue weight typically leads to smaller stress cycles, and thus also affects the estimation ratio.
Consequently, the question arises if fatigue weight tuning implicitly aims for a certain estimation ratio. Figure 13 shows the fatigue estimation ratio J fatigue,estimated ∕J fatigue,real for different fatigue weights from the fatigue cost tuning setting of Section 6.1. The first observation is that for DORFC-2, the estimation ratio remains constant over various orders of magnitude of fatigue weight, and reaches a local maximum at the optimum weight of damage = 1. In contrast, DORFC-5 exhibits a higher slope and no extremum at the optimum weight of damage = 10. The second observation is that the estimation ratio at optimum fatigue weights deviates significantly from the value of 1 for both formulations. From these observations, it may be concluded that optimum tuning does not imply meeting a certain or ideal estimation ratio.

Seldom switching and continuous substitution
As shown in Section 4.2, the rainflow algorithm RFC( ) is substituted by the continuous expressions (16) for the calculation of gradients in DORFC. The underlying assumption is that the execution structure of RFC, and thus the output cycle structure, switches only seldom. In the following, the validity of this assumption is assessed over subsequent MPC steps and within individual MPC steps. The former is required for the applicability of the RTI approach.
For the present assessment, the controller sample time was extended to T cntrl = 0.25 s and the prediction horizon was extended to T horizon = 15 s, in order to accommodate even more stress cycles.

6.4.1
Assumption of seldom switching cycle structure The most crucial aspect of the cycle structure is the sample indices of maxima k max,c and minima k min,c of detected cycles. Figure 14 shows these indices for the three cycles of highest fatigue cost for every execution of DORFC over an operation period of 10 s. As shown in Figure 4, for each MPC step there are two evaluations of DORFC that occur simultaneously. In the present visualization, every second evaluation result of DORFC is shifted in time by one half controller sample time, which leads to a virtual DORFC sample time of T cntrl ∕2 = 0.125 s.  Within the initial four seconds, the positions of the detected cycles approximatively follow the evolution of operation time, which is an expected behavior. This corresponds to a shift towards the beginning of the prediction horizon, and thus a decrease of the respective sample indices by one for every MPC step. Correspondingly, the slopes are approximatively equal to −4 samples∕s. Another expected result is the null slope at the upper and lower borders of the horizon, where new cycles enter or exit the horizon. For example, at k = 1 the exit of the initially most damaging cycle from the prediction horizon can be observed by a switch from Maximum 1 over Maximum 2 and Maximum 3 to disappearance.
Continuity of the cycle identification is interrupted by only a few switching events, especially for the period 85 s < t < 88 s. However, it can be observed that maxima and minima maintain their respective nature, and are only assigned to different cycles. Thus, the directive to the optimization algorithm to decrease or increase the corresponding stress values will not change.

Substitution by continuous functions
In order to check the validity of the continuous substitution within one MPC step, the outputs of the continuous expressions before the QP step are compared to the outputs of RFC after the QP step, as shown in Figure 15.
In order to focus on the change in cycle structure, the comparison is based on a common stress trajectory from the second ODE evaluation within the MPC step. Thus, this stress trajectory is obtained using the same initial states but updated control variables u i,j w.r.t. the first evaluation of DORFC. This stress trajectory is then input into the continuous expressions ( m,c,1 , a,c,1 ) created before the QP step, and to the rainflow algorithm executed after the QP step. Figure 16 shows that the ratios between the outputs of the continuous expressions and RFC are typically very close to the ideal value of 1. All amplitude and stress mean results are deviating by less than 3%, except for one sample at t = 81.5 s. At this instance, stress amplitudes are slightly underestimated, whereas stress mean values are slightly overestimated by the continuous expressions.
In conclusion, the use of continuous substitutes of the rainflow algorithm is considered as justified and amply acceptable, at least as far as the present application is concerned.

Fatigue cost gradients
The results discussed so far show good performance of the MPC equipped with DORFC. However, switches in the fatigue cost gradient program can lead to excessive changes of the optimization problem for successive MPC steps. These changes can lead to convergence problems for the optimization algorithm, since RTI does not treat an approximate SQP in this case. Therefore, in the following, the continuity of the fatigue cost gradient is assessed, and the computation of the fatigue cost gradient is validated.
6.5.1 Assessment of the fatigue cost gradient dynamics Figure 17 shows time-series of the fatigue cost gradient dJ fatigue ∕du 1,1 w.r.t. to pitch angle (i = 1) in the first control interval of the prediction horizon (j = 1). Due to its position at the beginning of the prediction horizon, there is a high influence of this pitch control variable on the response of the system behavior, and thus on fatigue cost. For TTVP, almost sinusoidal oscillations of the fatigue cost gradient can be observed, at a frequency similar to the first eigenfrequency f eigen,T,1 of the tower.
For DORFC, significant variations in the fatigue cost gradient can be observed at higher frequencies. High-frequency content of the fatigue cost gradient signal could in principle stem from the discontinuity of the RFC algorithm. However, for the assessed time interval, the detected switchings in cycle structure visible in Figure 14 do not seem to correlate with the variations in the fatigue cost gradient signal.
Another reason for the presence of high frequencies in the fatigue cost gradient can be its inherent static nonlinearities (24) and (25). The output signal of a highly nonlinear function can contain a wider-band response than the input, 40 and especially it may contain higher harmonics. 41 Part of future research will focus on the reduction of this high frequency content.  For validating the AD procedure within DORFC, gradients of fatigue cost are calculated by FD, as shown in Figure 17. AD and FD are in good agreement if a fixed FD step-size of u 1,1 = 10 −12 rad is chosen. This step-size is sufficiently small, because the control variable of pitch angle usually varies in the order of u 1,1 = [10 −2 , 10 −1 ] rad. If a larger FD step-size of u 1,1 = 10 −3 rad is chosen, some information is lost in comparison to the locally exact AD output of DORFC. This can be seen for example, at t = 81 s and t = 84.5 s.

CONCLUSION AND OUTLOOK
The novel fatigue cost formulation Direct Online Rainflow Counting represents a fundamental advance in the control of fatigue. In the present work, for the first time, a comprehensive explanation and assessment of this formulation have been provided. Three obstacles have been identified in the literature, that seemed to prevent the direct implementation of fatigue estimation in MPC: • Non-standard cost function; • Non-closed form of the algorithm; • High nonlinearity due to discontinuities.
To overcome these obstacles, the new DORFC algorithm has been proposed, which works by • Separating the solution of system dynamics and optimization problem, in order to allow for the "surgical" inclusion of specialized code for the calculation of gradients; • Substituting the discontinuous algorithm by linear functions at each MPC step; • Switching the discontinuities only seldom, by choosing a sufficiently low controller sampling time, and by ensuring a sufficiently long "average dwell-time" between switching events.
The present work has also discussed how to embed DORFC within an MPC framework, how to define appropriate cost models, and how to compute the fatigue cost gradients.
Application to a wind turbine control problem has shown that the novel DORFC MPC can achieve slightly higher profit at significantly lower actuator usage in comparison to a conventional MPC. Additionally, finding an initial guess for the tuning weight is much easier for DORFC.
A comparison of the dynamic response has shown that DORFC is able to reduce stress amplitudes by introducing plateaus in the stress trajectory. Frequency spectra have shown that DORFC beneficially focuses on the reduction of stress amplitudes, irrespective of the frequency where they occur.
An assessment of online damage estimation capabilities in DORFC MPC has revealed that estimation ratios are at meaningful levels, and that optimal cost tuning does not seem to imply meeting a certain estimation ratio.
An investigation into the discontinuous behavior of the cost function has shown that switching events happen sufficiently seldom. Furthermore, high-frequency behavior in the fatigue cost gradient do not seem to correlate to switching. Finally, the correct implementation of the fatigue cost gradient has been verified by the FD method.
Further investigations on the method of DORFC are planned, and in particular the focus will be directed to: • Investigating the effects of uncertainties and robustness especially by considering the presence of noise in the measurements and of a plant-model mismatch; • The simultaneous consideration of fatigue cost for multiple components of the turbine; • The application to other devices, as for example batteries and power semiconductors. As a first step, a derived formulation from DORFC has been applied successfully to a complex hybrid energy system comprising a wind turbine and a battery energy storage system. 42

ACKNOWLEDGMENTS
The authors would like to thank the German Federal Ministry for Economic Affairs and Energy for funding the MIS-TRALWIND project. Furthermore, the authors would like to thank Elisabeth Gruber for her careful review of the manuscript. Tower-tip-velocity-penalization VDE Variational differential equation

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.  Figure A1. At the beginning of the algorithm, RFC receives as input a stress trajectory and extracts its reversals (extrema). Throughout the algorithm, reversals are read consecutively from left to right. Each new reversal is stored in an operational memory. From this memory, cycles are identified based on a triplet of reversals. The rainflow algorithm contains four main loops.

A.2 Example
The left part of Figure A2 shows an exemplary stress trajectory and its extrema. Using this series of extrema, the Three-Point Algorithm identifies three minor full cycles and six half cycles as shown in the right part of Figure A2 and in Table A1. The half cycle c = 5 has the most impact on fatigue and contains the nested full cycle c = 4.

B.1 System quantities and optimization variables
The states of the wind turbine are collected in the state vector x(t) = ( r (t), d T (t),ḋ T (t), b (t),̇b(t), T g (t) ) T , which is composed of the rotational speed of the rotor r (t), the fore-aft deflection of the tower tip d T (t) and its time derivative, the collective pitch angle of the blades b (t) and its time derivative, and the generator torque T g (t). Stress at tower root is not a state of this simplified model. However, it is assumed that the stress at tower root is linearly dependent on the tower tip deflection d T (t), resulting in (t) = K d T (t). The control variables for the subordinate controllers of the corresponding actuators are u(t) = ( b,d (t), T g,d (t) ) T where b,d (t) is the desired blade pitch angle and T g,d (t) is the desired generator torque. The control variables are defined as piecewise-constant time functions u 1,j = b,d,j and u 2,j = T g,d,j . This leads to the MPC optimization variable vector u = ( u 1 , u 2 ) T . Excitement and disturbance d(t) = V w (t) of the system stem from the ambient wind speed V w (t), which is assumed to be perfectly measured by a forward-looking LiDAR.

B.2 System dynamics
The generator and pitch actuator dynamics are modeled as linear first order (PT 1 ) and second order systems (PT 2 ), respectively. Nonlinearity in the system model stems from the aerodynamic torque T Q (t) = T Q ( r (t), V rel (t), b (t)), and the aerodynamic thrust force F T (t) = F T ( r (t), V rel (t), b (t)).
The aerodynamic loads T Q and F T are modeled via feed-forward neural networks, trained on samples at stationary operating points calculated by the aeroelastic simulator Cp-Lambda, 44 as presented in Löw et al. 24 The aerodynamic loads depend on the rotor-relative wind speed where the external rotor-effective ambient wind speed V w (t) is combined with the tower tip speedḋ T (t). In turn, torque and thrust affect the drive train dynamicṡr and the tower fore-aft dynamicsd as shown in Figure B1. The model parameters include the lumped inertia J tot of rotor and drive train, and the lumped mass m T , damping c T and stiffness k T of the tower.