Anderson-accelerated polarization schemes for fast Fourier transform-based computational homogenization

Classical solution methods in fast Fourier transform-based computational micromechanics operate on, either, compatible strain fields or equilibrated stress fields. By contrast, polarization schemes are primal-dual methods whose iterates are neither compatible nor equilibrated. Recently, it was demonstrated that polarization schemes may outperform the classical methods. Unfortunately, their computational power critically depends on a judicious choice of numerical parameters. In this work, we investigate the extension of polarization meth-ods by Anderson acceleration and demonstrate that this combination leads to robust and fast general-purpose solvers for computational micromechanics. We discuss the (theoretically) optimum parameter choice for polarization methods, describe how Anderson acceleration fits into the picture, and exhibit the characteristics of the newly designed methods for problems of industrial scale and interest.


State-of-the-art
For solving the (mechanical) corrector problem arising in periodic and stochastic homogenization problems, methods based on the fast Fourier transform (FFT), pioneered by Moulinec and Suquet, 1,2 were identified as fast, robust, and versatile. As they operate on a regular voxel grid, such schemes are directly compatible to modern imaging techniques, such as microcomputed tomography, and do not depend on creating interface-conforming meshes for complex microstructures. Based on a formulation that is naturally matrix-free, and, therefore, permits a memory-efficient implementation, handling industrial-scale microstructures with a large number of degrees of freedom is possible. A key ingredient for the solvers' computational efficiency in practice is the quality of available FFT implementations. 3 From the very beginning, FFT-based solvers supported nonlinear material models, and their range of applicability was extended subsequently. This encompassed characterizing polycrystalline metals micromechanically, including effective properties, 4,5 stress localization, 6 the formation of slip bands 7,8 and fatigue-lifetime estimation, 9 the mantle flow of geophysical minerals, 10 homogenization of the elastic 11,12 and rate-dependent 13 behavior of fiber-reinforced composites and the anisotropic thermoelastic behavior of explosive materials. 14 Recently, FFT-based solvers have entered concurrent 15,16 and data-driven 17,18 multiscale methods. Furthermore, multiphysics problems, such as phase-field damage, 19,20 brittle fracture, [21][22][23] ferroelectricity, 24 and thermomechanics, 25 may be successfully treated by FFT-based techniques, as well.
The wide range of applications just described, in turn, poses challenges for state-of-the-art FFT-based solvers. Indeed, modern FFT-based methods have to be flexible enough to support a wide variety of (nonlinear) material models, and to work robustly for different classes of microstructures. In this context, research on FFT-based algorithms has mostly focused on two key areas. First, it was realized that the original discretization by Moulinec and Suquet, 1,2 based on trigonometric polynomials, shows characteristic ringing artifacts and, more importantly, is ill-conditioned for porous materials. Motivated by these findings, finite difference 26,27 and finite element 28,29 discretizations were exploited in the context of FFT-based methods. Secondly, Moulinec-Suquet's original basic scheme 1,2 converges slowly for highly contrasted materials. The resulting shortcomings in performance, for example, for porous materials or perfect elastoplasticity, led to the development of solution algorithms with improved convergence behavior.
FFT-based solvers may be classified into two families. The first class of algorithms comprises extensions of the basic scheme of Moulinec and Suquet 1,2 and includes fast gradient methods, 22,30 conjugate gradient (CG) methods, [31][32][33] as well as Newton 34,35 and Quasi-Newton methods. 21,36,37 Typically, these algorithms operate on a compatible strain-field (or directly on a displacement-fluctuation field) and seek a solution to the Lippmann-Schwinger equation. By a dualization procedure, these techniques may also iterate on equilibrated stress fields. 38,39 The second group of solvers consists of the polarization-based methods. Pioneered by Eyre and Milton, 40 these primal-dual methods are based on a reformulation of the Lippmann-Schwinger equation and operate on a stress-polarization field, which is neither compatible nor in equilibrium. First introduced for conductivity problems, the Eyre-Milton method was extended to linear elasticity by Michel et al. 41 In the same paper, the authors proposed another polarization scheme, the so-called accelerated scheme, with applications to infinitely contrasted materials in mind. Monchiet and Bonnet 42 introduced a family of damped polarization-based fixed-point schemes and identified the Eyre-Milton method as a particular instance of these algorithms. Subsequently, Moulinec and Silva 43 revealed that, in fact, Michel et al. accelerated scheme was another member of this solver class. For general nonlinear material behavior, Schneider et al. 44 showed that polarization methods arise as the Douglas-Rachford operator-splitting method 45 applied to nonlinear elasticity. In particular, the well-established convergence theory for the Douglas-Rachford splitting could be transferred to polarization-based methods, clarifying the optimum choice of algorithmic parameters in the nonlinear setting. Based on these results, polarization-based methods proved to be very powerful, oftentimes outperforming the fastest strain-based solvers, see section 7 in Schneider et al. 44 Unfortunately, polarization-based schemes are highly sensitive to the choice of algorithmic parameters, limiting their suitability as general-purpose solvers.

Contributions
In this work, we study the combination of polarization methods and Anderson acceleration, producing a fast, flexible and versatile general-purpose FFT-based solver. Anderson acceleration 46 is a method for improving the convergence behavior of fixed-point iterations, where derivatives of the fixed-point mapping are not available. Based on a limited number (the so-called depth) of previous iterates, Anderson acceleration generates the next iterate based on a mixture of previous iterates, where the mixing coefficients solve an associated low-dimensional optimization problem. Anderson acceleration often leads to a substantial speed-up in applications, such as convective flow, 47 well-fracture, 48 radiation-diffusion, 49 computer graphics, 50 or microstructure generation. 51 Anderson acceleration may be interpreted as a multisecant Quasi-Newton method 52 and is "essentially equivalent" to GMRES for linear problems, see Walker and Ni. 53 Theoretical convergence assertions were only recently provided. 54,55 In FFT-based computational micromechanics, the Anderson-accelerated basic scheme was included as a solution algorithm in the AMITEX software package, 21 and compared with single-secant Quasi-Newton methods by Wicht et al. 37 Unfortunately, when applied to the basic scheme, Anderson acceleration is unable to unleash its full potential. Indeed, when applied to gradient descent (such as the basic scheme 30,35,56 ), Li and Li 57 proved that the convergence rate of an Anderson-accelerated gradient method does not improve upon the optimum convergence rate of plain gradient descent. This theoretical result is backed up by numerical experiments 37 in computational micromechanics.
Applying Anderson acceleration to polarization schemes appears much more promising. Indeed, most of the time, an optimally tuned polarization method is competitive or even outperforms the fastest strain-based solvers in terms of iteration count. 43,44,58 Thus, by relieving the user of the daunting task of identifying the optimum numerical parameters, the Anderson-accelerated polarization scheme turns into a general-purpose solver for FFT-based computational micromechanics. We wish to draw the reader's attention to recent applications 50,59,60 of Anderson acceleration to operator-splitting methods, which motivated the present work.
Please note that Shantraj et al. 61 investigated the combination of a nonlinear GMRES method 62 (which is equivalent to Anderson acceleration) and polarization methods in the setting of finite-strain crystal viscoplasticity, and reported the Anderson-accelerated basic scheme to outperform the Anderson-accelerated polarization methods. However, polarization methods are known to be less powerful at finite strains due to the nonconvexity of the problem. We refer to Kabel et al. [35, sec. 3.2.5] for computational experiments. Thus, the conclusions of Shantraj et al. 61 cannot be transferred to the small-strain setting. Furthermore, Shantraj et al. 61 consider the deformation gradient and a rescaled polarization field as iterates of their algorithm. However, a recent study by Ouyang et al. 60 demonstrates that it is preferable in terms of iteration counts and run-time to restrict Anderson acceleration to the lower-dimensional fixed-point iteration of the polarization. In the context of FFT-based micromechanics, this corresponds to accelerating the (damped) Eyre-Milton iteration, which is the approach we follow in this study.
This work is organized as follows. After recapitulating the basics of polarization methods, see Section 2.1, and Anderson acceleration, see Section 2.2, we present the resulting algorithm in Section 2.3. In Section 3, we perform numerical experiments to evaluate the performance of Anderson accelerated polarization methods and compare them to the fastest strain-based solution algorithms.

The Eyre-Milton equation and polarization schemes
This section provides a stream-lined presentation of polarization methods for FFT-based computational micromechanics at small strains, see Schneider et al. 44 as a general reference.
In the context of small-strain continuum mechanics, let a cuboid cell Q in R d be given, together with a heterogeneous strain energy density where d = 2, 3 denotes the spatial dimension and Sym(d) is the space of symmetric d × d matrices. In the following, we assume that w is measurable in its first variable and (twice) differentiable in the strain. For a general physically nonlinear hyperelastic material, w corresponds to the strain-energy density so that the stress operator computing the Cauchy stress tensor (x, ) at x in response to the applied (infinitesimal) strain ∈ Sym(d) is defined by the hyperelastic relation Alternatively, the energy w may arise as the incremental potential of a generalized standard material after time discretization and static condensation of internal variables, see Miehe. 63 Assuming vanishing nonequilibrium stresses, the condensed incremental potential permits the hyperelastic definition (2) of the stress operator. Note that, in this case, the energy w has no intrinsic physical meaning, as it depends on the chosen time-integration scheme and mixes the Helmholtz free energy and the dissipation potential of the material.
For the convenience of the reader, we suppress the x-dependency of the energy w and the stress operator in the following.
Introducing the space of periodic and mean-free displacement fluctuations we seek a solution u ∈ H 1 # (Q; R d ), which satisfies the static balance of linear momentum without volume forces for a prescribed macroscopic strain . The corresponding space of square-integrable stress-and strain-fields L 2 (Q; Sym(d)) is endowed with the inner product where vol (Q) denotes the volume of the cell Q. Assuming that the stress for vanishing strain is square-integrable, w is -strongly convex in its second variable and has an L-Lipschitz gradient the balance of linear momentum has a unique solution. 56 This permits to define the effective stress associated to the strain loading where the displacement fluctuation u ∈ H 1 # (Q; R d ) solves equation (3). For more general existence results for monotone operators 1 , which are not necessarily derived from a potential, we refer to chapter 22 in Bauschke and Combettes. 64 It can be shown 65 that for any displacement fluctuation field u solving Equation (3) and any reference stiffness C 0 , the total strain = + ∇ s u ∈ L 2 (Q; Sym(d)) solves the Lippmann-Schwinger equation where Γ 0 denotes Green's operator associated to the reference material C 0 , 66 a bounded linear operator on L 2 (Q; R d ).
Conversely, suppose that the strain field ∈ L 2 (Q; Sym(d)) solves the Lippmann-Schwinger Equation (6) for some reference stiffness C 0 , then we may find a displacement fluctuation field u ∈ H 1 # (Q; R d ), s.t. = + ∇ s u holds and u solves the balance of linear momentum (3), see, for instance, Schneider. 65 The Lippmann-Schwinger equation serves as the basis of successful numerical algorithms for solving the balance of linear momentum (3), see Wicht et al. 37 for a recent overview. Alternatively, we may investigate a formulation based on the polarization field P = ( ) + C 0 ∶ , that is, the Eyre-Milton equation 40 in terms of the operator a nonlocal reflection operator on L 2 (Q; Sym(d)), and the operator a nonexpansive and local operator on L 2 (Q; Sym(d)). More precisely, the operator Y 0 satisfies the reflection identity Y 0 • Y 0 = Id, formulated in terms of the identity operator Id on L 2 (Q; Sym(d)). Furthermore, Z 0 is well-defined, as the operator  → ( ) + C 0 ∶ is invertible due to the (strong) convexity of w and the nondegeneracy of the reference stiffness C 0 . For any solution of the Lippmann-Schwinger equation (6), the polarization field P = ( ) + C 0 ∶ solves the Eyre-Milton equation and vice versa, as a direct implication of the Eyre-Milton identity 1 In the present setting, -monotonicity of is implied by condition (4). a simple algebraic rewriting of the Lippmann-Schwinger equation [44, sec. 2]. For any damping parameter ∈ (0, 1], we may consider the damped Picard iteration associated to the Eyre-Milton equation which is called polarization scheme. 42,43 Under the hypotheses of this section, for any initial value P 0 ∈ L 2 (Q; Sym(d)), reference stiffness C 0 and damping parameter ∈ [0, 1), the iterative scheme (9) converges to a solution of the Eyre-Milton equation (7). This is a direct consequence of the identification [44, sec. 3] of the polarization scheme (9) as the Douglas-Rachford method, 45 and the tight linear convergence bounds for the Douglas-Rachford splitting established by Giselsson and Boyd, 67 see section 3.1 in Schneider. 68 Restricted to the class of reference materials proportional to the identity, explicit formulae for obtaining the optimum convergence rate are available. More precisely, if C 0 = 1∕s Id holds in terms of a positive number s, the distance of the iterates of (9) to the fixed point P * decreases by with the quantity see theorem 2 in Giselsson and Boyd. 67 The best convergence rate is achieved by setting s = 1∕ √ L and = 1, leading to the estimate At this point, some remarks are in order: 1. For simplicity of exposition, we restricted to the case where the stress operator derives from a convex potential. However, Giselsson 69 established linear convergence estimates for the Douglas-Rachford method in the context of monotone operator theory, which places fewer restrictions on . In fact, -monotonicity and L-Lipschitz continuity of are sufficient to prove linear convergence. 2. In practice, it can be difficult or, at least, computationally demanding to determine the optimum parameters for convergence. If the energy w is twice differentiable, the parameters and L may be obtained as the minimum and maximum eigenvalue of the algorithmic tangent field C alg = 2 w∕ 2 , respectively. For small-strain materials, the maximum slope of the stress operator is typically not larger than the maximum slope of the algorithmic tangent at zero strain. Thus, for example for elastoplasticity, L may be estimated from the maximum eigenvalue of the initial elastic stiffness, maximized for all x ∈ Q. Computing , on the other hand, may require an eigenvalue decomposition of C alg , which is computationally expensive. Hence, an approach should be identified, which minimizes how often and L are computed while preserving the convergence rate of the scheme, see Section 3.2 for further discussion. 3. The situation becomes even more difficult for stress operators which do not derive from a potential. In this case, the optimum choice of algorithmic parameters depends on the regularity conditions of the stress operator , such as L-Lipschitz continuity or 1/L-cocoercivity, see theorems 6.5 and 7.4 in Giselsson. 69 In practice, it is difficult to check these conditions for a given material law, further complicating the choice of s. 4. Although setting the damping parameter to zero is typically the theoretically optimum choice, 67 Owing to the explicit updates, the basic scheme suffers from a step-size restriction in order to retain stability. By contrast, due to the implicit nature of the updates, polarization schemes (9) are stable for any step size. In particular, much larger step sizes than for the basic scheme can be used. The latter phenomenon is responsible for the improved convergence speed of the polarization methods compared with the basic scheme. Moreover, the relaxation (9) of the fixed-point scheme by a factor may also be applied to the basic scheme. However, the resulting scheme will be equivalent to the basic scheme with a different step size. By contrast, for polarization methods, relaxation leads to more general methods.
Overall, we may conclude that the choice of the step size s and the relaxation parameter is not straight-forward. This is particularly bothersome, since the convergence rate of polarization-based schemes exhibits a strong sensitivity w.r.t. these parameters [44, sec. 7]. Thus, for problems where estimates for or L are not available, polarization-based schemes may perform poorly, see section 3.2 in Schneider, 36 limiting their usefulness as general-purpose solvers. In the following Sections (2.2) and (2.3), we shall discuss how Anderson acceleration (9) may counterbalance the slow convergence behavior of polarization-based schemes for suboptimal parameter choices.

Anderson acceleration for fixed-point iterations
Suppose a (nonlinear) operator F : X → X is given, mapping a Banach space X into itself, which is Lipschitz-continuous with Lipschitz constant < 1. For any initial value x 0 ∈ X, Banach's fixed point theorem 70 asserts that the iterative scheme converges to the unique fixed point x * of F with rate , that is, holds. Anderson acceleration, 46 sometimes also called Anderson mixing, is a method applicable to general fixed-point iterations (12). It aims at improving the convergence properties of the Picard iteration (12) for cases where derivatives of F are either not available or expensive to compute. Anderson acceleration depends on a nonnegative integer m called depth. For m = 0, it reduces to the original Picard iteration (12). For general depth m ≥ 0, to determine the next iterate x k + 1 , Anderson acceleration "mixes" the last m k + 1 iterates where m k = min(k, m) holds and the coefficients k ∈ R m k +1 are chosen to minimize the function where The formulation (13) involves applying the nonlinear operator F (m k + 1) times for each iteration step of Anderson mixing. As evaluating the operator F is typically the most expensive step, practical implementations are based on the (already) computed residuals r i instead, using the equivalent update formula In this way, the nonlinear operator F needs to be evaluated only once per Anderson iteration. In addition, if X is a Hilbert space, the minimization problem (14) simplifies to a quadratic programming problem for k which may be solved by where 1 _ is a vector of all ones in R m k +1 , A k is the symmetric positive semidefinite matrix and A † k is the Moore-Penrose pseudoinverse 71,72 of the matrix A k . In case the matrix A k is ill-conditioned and X is finite-dimensional, Fu et al. 59 recommend solving the optimization problem (14) based on a singular value decomposition (SVD) of the matrix However, this approach requires a higher memory footprint than the procedure based on the pseudoinverse. Furthermore, we did not encounter ill-conditioning of the matrix A k during our numerical experiments, see Section 3. This suggests that the SVD-approach may not be necessary for the problem at hand. At the end of this section, we wish to put Anderson acceleration into context, and report on recent convergence assertions.
Anderson acceleration may be interpreted as a Quasi-Newton method of multisecant type. 52 Applied to linear problems, Walker and Ni 53 showed that Anderson acceleration is "essentially equivalent" to GMRES with depth m. 73 Toth and Kelley 54 showed that Anderson acceleration does not decrease the convergence rate of linearly converging fixed point iterations. Furthermore, Evans et al. 55 showed that Anderson acceleration improves upon the convergence rate of linearly convergent fixed-point iterations, but not for those converging quadratically. However, some caution is advised for these results, because they assume that the coefficients k remain uniformly bounded (and uniformly bounded away from zero) in k. This assumptions is difficult to verify in practice, as it is an assumption on the Anderson acceleration procedure and not an assumption on the fixed-point mapping F. Furthermore, Anderson acceleration may also converge if the original mapping F was not contractive. 74 For stationary Anderson acceleration with fixed coefficients k , De-Sterck and He 75 provided convergence estimates for accelerating gradient-descent, drawing on the similarity to Nesterov's scheme 76 for m = 1. In numerical tests, the authors found that the convergence rate of the stationary version provides a rough performance estimate for the classical Anderson acceleration. Using a similar strategy, Wang et al. 77 investigated the speed up for accelerating ADMM, which may be interpreted as a dual version of the Douglas-Rachford splitting. 67 Of particular interest is the work of Li and Li. 57 They demonstrate that, when Anderson acceleration is applied to gradient descent and strongly convex functions with Lipschitz gradient, the convergence rate is not improved compared with an optimally tuned gradient-descent scheme. At first, this result appears discouraging because other methods, for instance, fast gradient solvers, 78 lead to an improvement of the convergence rate. However, the problem with convergence assertions for Anderson acceleration is its finite depth m. Suppose, for instance, we consider solving a symmetric linear system. Suppose that we obtain a sufficiently accurate solution with MINRES 79 in K steps. Then, choosing m ≥ K, GMRES(m) gives identical iterates as MINRES. As Anderson(m) is essentially equivalent to GMRES, it also converges as quickly as MINRES. However, this speed is not reflected in convergence rates, because these always consider infinite sequences.
In addition, the Li and Li 57 result may be interpreted in a positive way by noticing that it may be extremely hard to tune the parameters of gradient-descent schemes in an optimum fashion. Thus, Anderson acceleration may indeed lead to a benefit in practice, also for gradient descent. As Moulinec-Suquet's basic scheme 1,2 is essentially a gradient-descent method for stress operators with potential, 30,35,56 we may interpret the positive results of Gélébart's AMITEX solver, 21,80 who applied Anderson acceleration to the basic scheme, as a testament for this statement.
In this work, we shall follow a slightly different path by applying Anderson acceleration to the polarization scheme (9), and use it for avoiding tedious parameter calibration.

2.3
Application to polarization schemes ⊳ Return scalar and polarization residual Polarization schemes (9) are fixed-point methods (12) for the nonlinear mapping where we restricted to C 0 = 1 s Id for simplicity. Then, the operators Y 0 and Z 0 attain the form In the general setting of Section 2.1, for any s > 0 and ∈ [0, 1], the operator F s, is nonexpansive, that is, Lipschitz continuous with Lipschitz constant 1. If, furthermore, w is strongly convex, for any s > 0 and ∈ [0, 1), F s, is even contractive in view of the estimate (11). Unfortunately, it is not always apparent how to choose the parameter pair (s, ) to ensure fast convergence. Even though explicit values for (s, ) were listed, their practical determination may be expensive. Indeed, suitable constants and L may be read off from eigenvalue analyses based on the material tangents (x, (x)) if is continuously differentiable. However, if the voxel count is large, the sheer number of eigenvalue decompositions may be expensive per se. Thus, we apply Anderson acceleration to the contractive operator F s, (18), as discussed in Section 2.2. Performing a single step of the polarization scheme is summarized in Algorithm 1. Notice that we do not compute F s, (P), but its polarization residual P − F s, (P), because the latter enters in the Anderson matrix (17) and in the Anderson update (15).
Alongside we compute the (scalar) residual where ⟨ ( )⟩ Q denotes the average stress associated to the polarization P = ( ) + 1 s . The residual (19) measures the strain compatibility, the stress equilibrium and the average value of the strain in view of the identity [44, sec. 5] The residual (19) depends on the step size s, so some care has to be taken in comparing different solution schemes. However, this phenomenon is intrinsic, because conditions on the strain and the stress field have to be enforced, and the step size helps converting between the different physical units of strain and stress. if residual ≤ tol then 10: exit while loop 11: end if 12: update step size s ⊳ Step may be omitted for performance reasons 13: append P and R to the list  14: compute inner products of R with older R's from  15: update matrix A (17) ⊳ Only single row+column changes 16: determine by Equation (16) 17: compute new P by Equation (15) ⊳ Discard superfluous R's and K's from  18: end while 19: The most expensive step for nonlinear material behavior is evaluating the operator Z 0 (8). In computational practice, it is often more convenient to use the equivalent expression where D 0 is the reference compliance. Indeed, for inelastic material whose stress-strain relationship is governed by Hooke's law, the operator Z 0 may be computed by a standard call to the user-defined material law (with a modified stiffness) [44, sec. 6]. The average stress is easily computed as a byproduct. The algorithm is summarized in Algorithm 2, where a hat over a variable refers to the corresponding Fourier coefficients. The method may be implemented on 2(m + 1) strain-like fields. Implementations on the strain field, as in Grimm-Strehle and Kabel, 81 are not feasible because the iterates are not compatible.

General setup and organization
The Anderson-accelerated polarization-based schemes, abbreviated as A2DR (Anderson-accelerated Douglas Rachford), following Fu et al., 59 were implemented in an in-house FFT-based micromechanics solver, written in Python 3.7. Computationally expensive operations, such as applying Γ 0 , evaluating the material law and the Anderson update (15), were realized as Cython extensions using OpenMP parallelization. For applying the FFT, we use the FFTW library. 3 Throughout, we rely on the staggered-grid discretization. 27 To describe the action of the corresponding discrete Γ h operator, we introduce the complex vectors where denotes a frequency vector and h j and N j are the mesh spacing and voxel count in j-direction, respectively. Then, the associated symmetrized gradient operator D of the staggered-grid discretization has the Fourier-space representation whereD * [ ] is the Hermitian adjoint ofD[ ], Our convergence criterion for the polarization-based schemes reads using the residual defined in Equation (19). For the strain-based schemes, which serve as performance benchmarks for A2DR, we use Note that the criterion of the polarization-based schemes (22) checks the compatibility of the strain field and the deviation from the prescribed macroscopic strain in addition to the equilibrium of the stress field in (23), as each condition is only satisfied upon convergence. Unless explicitly stated otherwise, we solve to a tolerance of = 10 −5 . In case of multiple load steps, we use an affine linear extrapolation 2 as an initial guess for our solution field.
The computations in Section 3.2-3.5 were performed on a desktop computer with 32 GB RAM and a 6-core Intel i7-8700K CPU. The computations for Section 3.6 ran on a workstation with 512 GB RAM and two 12-core Intel Xeon(R) Gold 6146 CPUs.
These computational investigations are intended to demonstrate the power and versatility of A2DR, and are organized as follows. We start with a two-dimensional (2D) example in Section 3.2, which permits us to study the dependence on the involved algorithmic parameters. In three dimensions, studies with large depth are prohibited by memory constraints. In Section 3.3, we study a three-dimensional (3D) example with nonlinear constituents, but finite material contrast. The example serves as a standard benchmark for FFT-based solvers. 33,36 In Section 3.4, we study a linear elastic material including pores. Porous microstructures are known to be difficult for polarization methods, because the optimum step size s = 1∕ √ L is not sensible for = 0. In Section 3.5, we study a metal-matrix composite (MMC) undergoing ratcheting. This example is challenging for two reasons. For a start, the underlying material model is not a generalized standard material, as the material tangent is not symmetric. In particular, the convergence theory discussed in Section 2 does not apply. As a second challenge, the material tangent becomes increasingly ill-conditioned for increased loading. Last but not least, in Section 3.6, we study a polycrystalline microstructure. Such constitutive laws are notoriously expensive to evaluate. Thus, Newton-Krylov methods are usually the preferred choice for this type of problem, as iterations of the linearized problem require much less computational demanding than the nonlinear evaluation. Furthermore, the specific material law we utilize involves a softening behavior. In particular, the example is not covered by the available convergence theory.

Continuous glass-fiber reinforced polymer
As our first example, we consider polyamide 6.6 continuously reinforced by glass fibers with a volume fraction of 30%. The microstructure, see Figure 1, is modeled as a 2D periodic cell, generated by the adaptive shrinking cell algorithm of Torquato and Jiao. 82 The resulting structure contains 200 fibers and is resolved by 512 × 512 pixels.
The glass fibers are modeled as isotropic linear elastic and the polyamide matrix is governed by J 2 -elastoplasticity with isotropic hardening, see sec. 3.3 in Simo and Hughes. 84 Following Doghri et al., 83 we use a linear exponential hardening function for polyamide where 0 denotes the initial yield stress, k 1 is the asymptotic hardening modulus and k 2 = 0 − ∞ specifies the difference between the initial and saturated yield strength for k 1 = 0. The material parameters are listed in Table 1. Please note that a similar microstructure was considered by Wicht et al. 37 For the present article, however, twice the volume fraction, and four times the resolution of Wicht et al. 37 is investigated. In particular, we observe a more pronounced plastification caused by the higher filler fraction.
For this comparatively small 2D example, we investigate the convergence rate of A2DR with respect to the chosen depth m. In particular, we are interested in the sensitivity of the results with respect to the chosen algorithmic parameters. As shown in Section 2.1, the convergence rate of the nonaccelerated polarization-based schemes depends on the choice of the step size s and the damping parameter . Indeed, it was shown that, in practice, choosing a suboptimal step size may increase the necessary iteration counts by orders of magnitude, see sec. 7 in Schneider et al. 44 Hence, we aim to find a suitable depth m for which the dependence of performance on s and is eliminated or, at least, reduced.
For a start, we consider a linear problem, where both glass fibers and polyamide matrix are modeled as linear elastic. Using the formulation of mixed boundary conditions by Kabel et al., 85 the microstructure is subjected to 5% uniaxial extension in z-direction. For a fixed step size s = 2∕( + L), the required iteration counts and total run-times for A2DR for different values of and m are plotted in Figure 2(A).
As a general trend, we observe that the required number of iterations decreases up to a depth of m = 4 and stagnates afterwards. This decrease is not necessarily monotone, c.f. the plot for = 0.5. The total run-times follow a similar trend. Anderson acceleration introduces only a small overhead. In particular, for this section, it suffices to investigate either the Taking a look at the effect of the damping parameter for m = 0 (i.e., when Anderson acceleration is deactivated), we note that the iteration counts range from 205 for Monchiet-Bonnet's choice = 0.25 42 to 310 for = 0.5, corresponding to Michel-Moulinec-Suquet's accelerated scheme. 41 For depth m ≥ 4, the difference between these choices is largely eliminated, and A2DR converges in roughly 50 iterations for all damping factors considered. These results indicate that, in addition to the faster convergence, A2DR relieves the user of the task to select the damping factor carefully.
Next, we take a look at the influence of the step size s for a fixed damping factor of = 0.25, see Figure 2(B). Starting at m = 0, the slowest choice of s = 1/L requires about 10 times as many iterations to converge compared with the theoretically optimum choice s = 1∕ √ L. Activating Anderson acceleration reduces this performance gap significantly. Up to a depth of m = 3, the iteration counts decrease for all step sizes and stagnate afterwards. For the optimum step size the effect is least pronounced, with a decrease of 46 iterations for m = 0 to 35 iterations for m = 3. However, for all other step sizes the iteration counts are significantly reduced, leading to a factor of less than two between the slowest and fastest choice for m ≥ 3. Notably, the iteration count for s = 1∕ matches that of the theoretically optimum choice for m = 3 and is even slightly lower for higher depths.
In conclusion, we observe that the performance of A2DR with a depth of m = 4 is largely independent of the damping factor . The influence of the step size on performance does not vanish, but is significantly reduced compared with the classical polarization-based schemes without Anderson acceleration. In particular, step-size choices such as s = 1/L or s = 2∕( + L) become competitive using A2DR, making polarization-based schemes available for materials where is close or equal to 0, see Sections 3.4 and 3.5.
To check whether the results of the linear elastic setting carry over to nonlinear problems, we consider the case where the polyamide matrix is governed by J 2 -elastoplasticity. The boundary conditions are imposed as for the linear elastic case, that is, 5% uniaxial extension in z-direction, applied in 50 equidistant load steps. For computing the step size, and L are estimated in the first iteration of each load step based on the tangent field, see remark 2 in Section 2.1. The step size is then kept fixed for the remainder of the load step. Using this strategy, the effect of Anderson acceleration with respect to different damping factors and step-size choices is qualitatively similar to the linear elastic setting, see Figure 3(A,B). For depths m ≥ 4, the impact of the damping factor is largely eliminated and iteration counts stabilize. However, the difference between the investigated step-size choices is more pronounced in the nonlinear case. For the unaccelerated schemes, the slowest choice s = 1/L requires 33 times more iterations than the optimum choice s = 1∕ √ L. At a depth of m = 4, the factor between the slowest and fastest step size is reduced to 4.
A few synoptic remarks are in order. As an alternative strategy for computing the step size s, we investigated the approach of Schneider et al., 44 where and L are computed in every iteration and the step size s is subsequently updated based on the minimum value of and the maximum value of L over all past iterates. Upon Anderson acceleration, no positive effect on the convergence behavior was observed in practice, except for very large nonlinear load steps. Furthermore, the overall performance with respect to run-time suffered due to the overhead of computing and L. Thus, we use the simpler strategy of updating and L at the beginning of each load step for the remainder of the article. As a second remark, based on the results up to this point, choosing s = 1∕ seems to be competitive when using A2DR, as the resulting performance was often similar or better than for the theoretically optimum value of s = 1∕ √ L. However, whenever is unknown or zero, both, s = 1∕ and s = 1∕ √ L cannot be used. In addition, s = 1∕ was found to result in low convergence rates for high accuracy. Hence, we prefer s = 1∕ √ L where applicable. Larger values for the depth m, up to 200, were tested for A2DR, leading, however, to no further decrease of the iteration count. Thus, for the sake of readability, these results were omitted in the respective plots of this section. Interestingly, this strongly differs from the behavior observed for the Anderson-accelerated basic scheme, see sec. 4.2 in Wicht et al., 37 where iteration counts were found to decrease up to depths of m = 50 (albeit at the cost of slower overall performance, due to computational overhead). This further demonstrates the difficulty of finding the optimum step size of the basic scheme. As exemplified by adaptive step-size selection-schemes, for example, by Barzilai and Borwein 86 or Malitsky and Mishchenko, 87 a constant step size of s = 2∕( + L) does not yield the best possible performance for gradient descent when time to solution is the major objective. By contrast, using the optimum constant step size for polarization-based schemes appears to leave less room for improvement.

Short glass-fiber reinforced polymer
Motivated by the results for the 2D example of the last section, we compare the performance of A2DR to other modern FFT-based solvers, based on a larger 3D microstructure that serves as a recurring benchmark example for FFT-based solvers, see section 3.3 in Schneider 36 . More precisely, we consider a polyamide matrix, reinforced by 1140 glass fibers with an aspect ratio 30, filling 20% of the volume, see Figure 4. The microstructure was generated using the sequential addition and migration algorithm 11 and resolved by 256 3 voxels. The fiber orientation in the resulting microstructure is close to unidirectional with a second-order fiber-orientation tensor 88 of A = diag(0.8, 0.1, 0.1). Throughout, we use the material parameters listed in Table 1, as in Section 3.2. First, we restrict to the linear elastic problem with an applied loading of 5% uniaxial extension in fiber direction. We solve up to a high accuracy of = 10 −10 to get a better idea of the convergence rate of the investigated FFT-based solvers. We compare the performance of A2DR with = 0.25, m = 4 and the optimum step size s = 1∕ √ L, to Monchiet-Bonnet's scheme with s = 1∕ √ L, 42 the CG method, 31,32 the Barzilai-Borwein (BB) basic scheme 36,86 and the original basic scheme by Moulinec and Suquet. 2 Throughout, the optimum algorithmic parameters are used for the Lippmann-Schwinger solvers. To be precise, we choose s = 2∕( + L) as the reference material of the basic scheme and the CG method (where it does not matter 31 ). The BB method is initialized with s = 2∕( + L) as well and adaptively selects its step size after the first iteration. 36 In Figure 5(A), we see the excellent convergence rate of A2DR, reaching the prescribed tolerance with the lowest number of iterations among all investigated solvers. Most notably, the performance of A2DR and CG is nearly identical. Interestingly, for the same benchmark, see sec. 3.3 in Schneider, 36 the author already observed that the (nonaccelerated) Eyre-Milton method mirrored the performance of CG for low accuracy. Using Anderson acceleration, this advantage is preserved up to the investigated tolerance of = 10 −10 . Both the BB method and the Monchiet-Bonnet scheme exhibit similar convergence rates up to an accuracy of 10 −7 . Subsequently, the residual of the BB method decreases rapidly, leading to a lower final iteration count. Note, however, that this only a fortuitous byproduct of the inherently nonmonotone convergence behavior of the algorithm 2 . In the aforementioned numerical experiment in sec. 3.3 of Schneider, 36 the final iteration count of the BB method was very close to the one we observed for Monchiet-Bonnet's method, which is roughly 50% higher than the iteration counts of A2DR and CG. The basic scheme is not competitive, being an order of magnitude slower than the other investigated schemes.
Taking a look at the overall performance in terms of computation time, the ranking between the solvers changes slightly, see Figure 5(B). Essentially, the BB method and A2DR switch places, with one being slightly faster and the other being slightly slower than CG. This is due to the lower computational cost per iteration of the BB method, see Table 2. Using the complexity-reduction trick in sec. 6 of Schneider et al., 44 for all investigated algorithms, the computational effort of evaluating the material law, applying the Γ 0 -operator and computing the residual are very similar. Whereas an iteration of the BB method only requires a single inner product and one addition of two fields on top of the aforementioned steps, the A2DR update involving Equations (15)-(17) requires computing m + 1 inner products, solving a linear system of size m + 1 and summing m + 1 fields. As a consequence, the cost per iteration for A2DR(m = 4) ends up at being roughly 50% higher compared with the BB method. Last but not least, we consider the nonlinear problem with J 2 -elastoplastic matrix behavior. The uniaxial loading up to 5% uniaxial extension is applied in 50 equidistant steps. We add the Newton-CG method 34,85 to our list of investigated schemes in place of the linear CG method. To be precise, we use Dong's line search criteria 89 for controlling the step size of the Newton update and prescribe Eisenstat and Walker's 90 forcing term choice 2 as tolerance for the linear system, see Wicht et al. 37 Note that, for Newton-CG, we take the sum of Newton iterations and linear CG iterations for computing the iteration count. Taking a look at Figure 6, we see that the polarization-based schemes outperform the (Quasi-)Newton methods and the basic scheme. A2DR is fastest, followed by Monchiet-Bonnet's method whose run-time is 35% higher. Both, Newton-CG and BB, perform similarly, taking roughly twice as long as A2DR to finish.
In conclusion, we see that Anderson acceleration further improves the performance of the already powerful polarization-based methods for finitely contrasted materials. A depth of m = 4 emerges as a reasonable choice for both linear and nonlinear problems. Note that the improved performance of the Anderson-accelerated polarization-schemes comes at a price. Using A2DR with a depth of m = 4 requires the storage of 10 strain-like fields, not counting internal variables. Compared with one strain-field for the basic scheme, two for the BB method and 8.5 for Newton-CG (when storing the tangent-field), this represents a rather large memory footprint. This is exacerbated by the fact that polarization-based schemes do not permit a displacement-based implementation, which can roughly half the memory requirements of the aforementioned strain-based methods.

A sand-core microstructure
For this example, we investigate a sand-core microstructure, discretized by 256 3 voxels. The structure consists of 64 sand grains with a volume fraction of 58.58% held together by an inorganic binder with 1.28% volume fraction, see Figure 7. For a detailed treatment of the microstructure generation and the linear elastic properties of the material, we refer to Schneider et al. 91 The material parameters of the constituents are listed in Table 3. The sand-core microstructure represents a porous material for which is usually unknown. 95 Using the natural estimate = 0 makes the step size s = 1∕ √ L, which is optimal for the polarization-based schemes, not applicable. Hence, in numerical experiments, their performance for solving this type of problem was found to be poor, 36 exhibiting even lower convergence rates than the basic scheme. In contrast to fast gradient solvers 30,33 or (Quasi-)Newton methods, 34,36,37 this has prevented using polarization-based schemes as general-purpose algorithms.
In this context, we investigate whether A2DR can increase the efficiency of polarization-based schemes to competitive levels. To this end, the sand-core microstructure is subjected to 1% uniaxial extension in x-direction. We solve up to a tolerance of = 10 −10 to determine the convergence rate of the algorithms. We fix = 0.25 and consider the available step sizes for = 0, that is, the conservative choice s = 1/L and the optimum step size of the basic scheme s = 2∕( + L), that is, s = 2/L for = 0.
As for finitely contrasted media, Anderson acceleration substantially improves the performance for the investigated step-size choices, see Figure 8(A). In agreement with the results by Schneider, 36 Monchiet-Bonnet's method with the step size of the basic scheme slows down considerably over the course of the computation and fails to converge within 1000 iterations. By contrast, A2DR with the same step size exhibits a linear convergence rate, requiring less than 300 iterations to reach the prescribed tolerance. Using s = 1/L roughly triples iteration counts and run-times. Figure 8(B) reveals that, using Anderson acceleration, polarization-based schemes become competitive to the fastest available FFT-based solvers for porous media. More precisely, A2DR(m = 4, s = 2∕( + L)) ends up just between CG and the BB method in terms of iteration count, requiring 25% more than the former and 25% less than the latter. In terms of overall performance, it matches the BB method, with both methods running about 30% longer than CG.
To summarize, Anderson acceleration makes the computational power of polarization-based schemes available for treating porous microstructures, considerably broadening their range of application. The optimum step size of the basic scheme s = 2∕( + L) emerges as a decent general-purpose choice for materials with both finite and infinite contrast.

Metal-matrix composite under cyclic loading
For our next example, we investigate the cyclic behavior of a metal-matrix composite (MMC). The microstructure consists of 50 spherical ceramic particles with a volume fraction of 30% embedded in a metal matrix, see Figure 9(A). For the particle placement, we relied on the random sequential addition algorithm 96 and the resulting microstructure was resolved by 128 3 voxels. The ceramic inclusions are assumed to be linear elastic, whereas the material behavior of the matrix is governed by a J 2 -elastoplasticity model with kinematic hardening, see, for instance, Chaboche. 97,98 For simplicity, we neglect isotropic hardening, that is, we model the yield stress Y to be independent of the equivalent plastic strain p. The governing equations for the model are given by Hooke's law the Karush-Kuhn-Tucker conditions and a kinematic hardening law, defining the evolution of X, that is, the center of the elastic domain. Over time, a multitude of formulations has been proposed for the latter, see, for instance, the reviews by Abdel-Karim 99 or Kang 100 for an overview. For the present study, we choose the kinematic hardening law by Chaboche et al. 101 where the backstress X is decomposed into multiple parts, following the classical Frederick-Armstrong rule. 102 Using a backwards Euler time discretization, we rely on the fixed-point algorithm by Kobayashi and Ohno 103 for the implementation of the material model. Note that the tangent stiffness for this model is not symmetric. 103 For using the Newton-CG method, Kobayashi and Ohno 103 suggest using the symmetrized tangent when solving the linear system. Following their recommendation, we also estimate based on the symmetrized tangent, whereas L is fixed by the elastic stiffness of the materials. The material parameters for both constituents are listed in Table 4. We consider a cyclic uniaxial stress loading with a mean stress value of 100 MPa and an amplitude of 300 MPa. The loading is applied over four cycles, discretized by 30 equidistant steps per cycle. Each step is solved up to a tolerance of = 10 −5 . Note that, in contrast to Section 3.3, the prescribed hardening law leads to a stress operator which is neither derived from a potential nor strictly monotone. Indeed, in our numerical experiments, the lower bound of the tangent field quickly approached zero during plastification, preventing the use of the optimum step size s = 1∕ √ L. Hence, in combination with the nonmonotonic loading, the metal-matrix composite with kinematic hardening constitutes a challenging benchmark, which is not covered by the theoretical treatment of Section 2.
For evaluating the performance of the solvers, we fix the algorithmic parameters of A2DR to = 0.25, s = 2∕( + L) and m = 4 based on the results of the previous sections. Comparing the iteration counts and run-times of the different FFT-based solution schemes in Figure 10, we see that A2DR performs admirably. It requires the lowest number of total iterations and ties with the BB method for the fastest computation time. Monchiet-Bonnet's method, with the same algorithmic parameters as A2DR (except for m = 0), takes more than twice as long to converge, with an overall performance comparable to the Newton-CG method. Note that the effect of Anderson acceleration, while still significant, is less pronounced compared with the numerical experiment on a porous structure in Section 3.4. This is due to the cyclic loading, where plastic flow in the matrix material is only activated for high stress levels, see Figure 9(B). In the elastic loading and unloading steps, all solvers converge in a single iteration owing to the affine-linear extrapolation. Hence, the superior performance of A2DR is only realized in roughly half of all load steps. For the same reason, the basic scheme performs comparatively well for this problem, with a final iteration count just over three times higher than A2DR.

Directionally solidified NiAl-9Mo
For our final example, we turn to a directionally solidified NiAl-9Mo eutectic alloy used as a benchmark problem in Wicht et al. 39 . The considered microstructure with 84 unidirectionally aligned fibers with square cross-section was generated using the random sequential addition algorithm 96 and resolved by 1200 × 160 × 160 voxels. The fibers have an aspect ratio of 100 105 and take up 14% of the overall volume. Following Albiez et al., 106 where d and n denote the slip direction and the slip-plane normal, respectively,̇0 is the reference slip-rate and F is the yield stress. The operator ⊗ s denotes the symmetrized dyadic product, that is, d ⊗ s n = (d ⊗ n + n ⊗ d )∕2 and the shear stress in a slip system is computed by = ∶ d ⊗ s n . The matrix is assumed to behave perfectly plastic, that is, the yield stress is constant F = F 0 . For the molybdenum fibers, Albiez et al. 106 proposed the softening law in terms of the dislocation density , with maximum yield stress ∞ and a characteristic length parameter d. The authors used the storage-recovery model by Kocks and Mecking 108 for the evolution of the dislocation density, which permits expressing as an explicit function of the accumulated plastic sliṗ= upon integration. The parameters of the materials are listed in Table 5. Evaluating the material law  → ( ) for single-crystal elastoviscoplasticity is computationally expensive compared with other operations such as applying Γ 0 and the associated FFTs, see Eghtesad et al. 109 or sec. 4.4.2 in Wicht et al. 37 Typically, Newton methods enjoy the best performance for this type of problems, as the material law is evaluated only once per Newton iteration, whereas applying the material tangent is substantially cheaper. This is why a performance comparison with A2DR is of interest for this computationally demanding problem.
We consider a creep test, where a uniaxial stress loading with an amplitude of 250 MPa is applied in a single load step for 1 s. Subsequently, the loading is held constant for 120 s, subdivided into 120 equidistant load steps. Throughout, we solve to an accuracy of = 10 −5 . The number of load steps was chosen to obtain a sufficiently fine resolution of the strain rate over time and to ensure the positive definiteness of the tangent stiffness, which is not guaranteed a priori due to the softening law (25).
The different stages of creep behavior are reflected in the iteration counts of the solvers, see Figures 11(B) and 12.
During the first few load steps, a high number of iterations is required, due to the rapid load transfer from matrix to fibers. As the creep rate stabilizes, the affine linear extrapolation takes effect and the iteration count per load step reaches a minimum roughly between step 30 and 60. Subsequently, the softening of the fibers causes an increase in the effective creep rate and leads to a higher material contrast. Thus, the computational effort increases in the last 60 load steps. Comparing the investigated solution schemes, we observe that A2DR with the optimum step size s = 1∕ √ L closely matches the performance of the Newton-CG method. To be precise, A2DR is slower in the first 20 load steps and enjoys a slight advantage afterwards. Roughly around load step 70, both schemes break even in terms of total computation time. In the end, A2DR is even slightly faster overall than Newton-CG. Using the more widely applicable step size s = 2∕( + L) for A2DR doubles the total run-time compared with the optimum step size. Still, the slower option is about 30% faster than the BB method.

CONCLUSION
The present study was devoted to increasing the robustness and performance of polarization-based methods in FFT-based micromechanics, by applying Anderson acceleration, a general-purpose technique for accelerating fixed-point methods.
To demonstrate the usefulness of the proposed algorithm, we covered a wide spectrum of problems, including microstructures and material laws of varying complexity. To be more precise, in Section 3.2 and Section 3.3, we investigated finitely contrasted fiber-reinforced microstructures with elastic and J 2 -elastoplastic material behavior, which were covered by the theoretical treatment in Section 2.1. For this class of problems, the excellent performance of polarization-based methods, see Schneider et al. 44 and Schneider, 36 could be further improved using Anderson acceleration. With respect to the choice of algorithmic parameters, we found that, using a depth of m = 4, the influence of the damping parameter could be eliminated and the sensitivity with respect to the step size was drastically reduced. Whereas the theoretically optimum step size s = 1∕ √ L led to the best performance, when applicable, the step size of the basic scheme s = 2∕( + L) emerged as a viable alternative. In particular, when the strong convexity constant is unknown or tends to zero, s = 2/L can be readily estimated from the elastic stiffness of the constituent materials.
This enabled us to investigate examples which lie outside of the framework of strongly convex optimization and where polarization-based schemes typically struggle. For the porous sand-core structure in Section 3.4, a lower bound of the elastic stiffness was unavailable. In the problems of Sections 3.5 and 3.6, we considered computationally demanding material models which do not permit a potential-based formulation. Both cases incorporated material laws without a strictly monotone stress operator, with the latter example even including softening behavior. For all of these problems, Anderson acceleration led to substantial speed-ups compared with the classic polarization-based schemes, with A2DR being competitive to the fastest strain-based FFT-solvers.
Indeed, to optimize performance in FFT-based micromechanics, a judicious choice of the solution scheme is often inevitable. For instance, CG is the natural choice for linear elastic problems, inexact Newton-CG is hard to beat if the cost of evaluating the material law is much larger than applying the tangent and the BB method is excellent if the material law is cheap to compute, see Wicht et al. 37 In this study, we demonstrated that A2DR closely matches (or even beats) the performance of these schemes in each of their "ideal" settings. Thus, A2DR represents a robust and powerful solution scheme, which is close to optimal for a wide range of problems.
However, the excellent performance of the method is still accompanied by a large memory footprint. Future work may be devoted to investigating alternative vector-sequence acceleration techniques, 110,111 seeking methods with lower memory requirements which preserve the advantages of Anderson acceleration. Last but not least, the efficiency of polarization-based methods relies on the cheap evaluation of the nonlinear Z 0 -operator. Extending the complexity-reduction technique of Schneider et al. 44 to a wider class of materials would further increase the usefulness of A2DR as a general-purpose method.

ACKNOWLEDGMENTS
This work was supported by the German Research Foundation (DFG) within the International Research Training Group "Integrated engineering of continuous-discontinuous long fiber reinforced polymer structures" (GRK 2078). Matti Schneider gratefully acknowledges DFG-support in terms of the project SCHN 1595/1-1. The authors Daniel Wicht and Thomas Böhlke gratefully acknowledge the partial financial support by the DFG under the project "Lamellar Fe-Al in situ composite materials" (BO 1466/12-2). We thank the anonymous reviewers for their constructive remarks which helped improving the article. Open Access funding enabled and organized by ProjektDEAL.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.