Efficient assembly of linearized equations in nonlinear homogenization

A novel deformation gradient‐based Reduced Basis homogenization method is compared to the well‐established displacement‐based approach in the geometrically nonlinear context. Algorithmic complexity, the intrusiveness, and the simplicity of the each approach are addressed. It is found that the novel method is advantageous in most respects, especially in returning good approximations of the macroscopic tangent stiffness.


Problem setting
Let u(X) denote the displacement with respect to the Lagrangian coordinate X on the microscale. The deformation gradient is defined as F = (u + X) ⊗ ∇ X . A nonlinear, hyperelastic stored energy density function, W , relates the deformation gradient to the first Piola-Kirchhoff stress, P , via P = ∂W/∂F . The corresponding fourth-order stiffness tensor is denoted by C = ∂ 2 W/(∂F ) 2 . Assuming the macroscale to be separated from the microscale, analogous relations hold on the macroscale, i.e. P = ∂W /∂F and C = ∂ 2 W /(∂F ) 2 . The homogenization problem emanates from the fact that the macroscopic energy density, W , is generally unknown. We consider the quasi-static balance equations of linear and angular momentum at the microscale in the absence of body forces, Div(P ) = 0, complemented by periodic fluctuation boundary conditions. Note that the periodic boundary conditions may be replaced by arbitrary constraints fulfilling the Hill-Mandel condition. Thus, the scale transitioning relations hold, where • denotes the volume averaging operator of the microstructural domain. The inequality (2) 4 holds in general for heterogeneous microstructures and can be refined by C ≥ C, where the operator ≥ is to be understood in the spectral sense. The goal is to solve (1) efficiently for a prescribed macroscopic deformation gradient F , acting as the boundary condition, in order to quickly evaluate the effective material response P and C.
2 Reduced Basis methods

Deformation gradient-based Reduced Basis model
The Reduced Basis (RB) method presented in [1] is briefly recapped here.

Offline phase
The microscopic problem (1) is solved by means of the Finite Element Method (FEM) 1 for carefully chosen boundary conditions F (j) , j = 1, . . . , N s . These boundary conditions are subtracted from their corresponding solution deformation gradient fields, F (j) , in order to obtain N s deformation gradient fluctuation snapshots, F (j) (X; F (j) ) = F (X; F (j) ) − F (j) . More precisely, the snapshots consist of the values of F at each quadrature point within the microstructure. Reduced ansatz functions B (i) (X), i = 1, . . . , N F , constituting the RB, are constructed by means of a standard Proper Orthogonal Decomposition (POD) [2] of the set of snapshots with the usual truncation considerations, N F N s . Both the snapshots and the basis elements have the zero-mean property, F (j) = B (i) = 0.
The reduced ansatz functions B (i) are linear combinations of the snapshots F (j) . These fluctuation snapshots are linearly dependent on the respective displacement fields that solve (1). Therefore, the displacement fields resulting in the reduced ansatz functions B (i) may be visualized, see Figure 1.

Online phase
The RB approximation of a deformation gradient field solving (1) for arbitrary boundary condition reads where the coefficient vector ξ ∈ R NF depends on F via the energy minimization principle ξ = arg min ξ∈R N F W (F RB ) . This optimization task is performed by means of a (Quasi-)Newton scheme, approximating a root of the residual vector r F ∈ R NF by employing the Jacobian D F ∈ R NF×NF with the components where i, j = 1, . . . , N F , P RB = P (F RB ), and C RB = C(F RB ). After convergence, the effective material response is given by Comparing the residual (4) 1 with the weak form of (1) 1 , one can see that this RB method is a Galerkin procedure.

Displacement-based Reduced Basis model
The RB method based on an approximation of the displacement, u, is well-established, see for instance [3][4][5]. In the offline phase, displacement fluctuation snapshots are collected in a manner analogous to the procedure described in Section 2.1.1. An important difference is that these snapshots consist of values at nodes of the discretized microstructure. The nodal RB elements B u(i) (X), i = 1, . . . , N u , are identified by means of a standard POD of this set of snapshots. The online phase is based on the classical discretization arising from the weak form of (1) 1 , e.g. as in the FEM. It reads r FE (u) = 0, with r FE , u ∈ R N dof and N dof being the total number of degrees of freedom. The RB is stored in the matrix B u ∈ R N dof ×Nu which is used to approximate the nodal displacement vector u ≈ B u ξ u . The corresponding Galerkin projection of the Newton scheme reads Here, N dof , the number of nonzero entries of the generally sparse matrix K FE is bN dof . Here, the residual and the stiffness are projected globally, i.e. after assembly. Equivalently, one could apply the projection locally, i.e. at element level, and assemble "directly" (cf. [4]), Here, the index e = 1, . . . , N el denotes the element number, N e dof is the number of degrees of freedom of the element with index e, L e ∈ R N e dof ×N dof is the corresponding nodal index matrix, r e = r e (L e B u ξ u (k) ) is the element residual, and

Comparison of algorithmic complexities
The algorithmic complexities associated with a single Newton iteration for the u-RB with global (6) and local (7) assembly, and for the F -RB (4) are compared in Table 1. Therein, the generally sparse structure of K FE is considered. Concerning the local u-basis, the products L e B u are usually not computed as dense matrix-matrix multiplications but in a more sophisticated manner, such that the associated computational costs are neglected here. For the F -RB, N qp denotes the number of quadrature points X p , and w p are the corresponding quadrature weights. The basis at each quadrature point is stored in the matrix B F (X P ) ∈ R 9×NF . The stress P RB (X p ) ↔ P (X p ) ∈ R 9 and stiffness C RB (X p ) ↔ C(X p ) ∈ R 9×9 are treated likewise. RB assembly quantity complexity Notably, the complexities of the residuals and of the Jacobians depend similarly on the sizes N u and N F of the RB. An analysis of how these sizes might relate to each other, e.g. for achieving a certain accuracy, is postponed to future work. In the following, the dependence on N u and N F is neglected and only the other factors affecting the complexities are compared.
To this end, we consider the FE meshes of two exemplary cubic microstructures, see Figure 2. One consists of 16 3 =4096 linear hexagonal elements with 8 nodes and 8 quadrature points per element (microstructure A). The other (microstructure B) is made of 25633 quadratic tetrahedral elements with 10 nodes and 4 quadrature points per element. The quantities that are relevant for the complexity analysis are provided in Table 2. A B  In order to better judge on the content of Table 1, ratios of the quantities most relevant for the assembly of the Jacobian (highlighted by blue color) are given in Table 3. The values therein should be interpreted cautiously. After all, the complexity analysis presented here is asymptotic, incomplete, and no statements of actual computational costs or run times are made.

Comparison of the local and the global u-based Reduced Basis methods
In terms of the numerical values stated in Table 3, the u-based method with global assembly appears to be superior to the local approach. However, it is important to note that the opposite would hold if the sparsity pattern of K FE was not accounted for, i.e. if b was substituted by N dof . This is in line with the findings of [4], where the local approach consumed less CPU time than the global assembly with dense operations, by a factor of 5-10. Furthermore, the local assembly benefits from more efficient memory access operations. For caching reasons, it is advantageous to perform the projection operation on element level. In contrast to this, the sparse matrix-matrix product involving K FE involves adverse, scattered memory access. Generally, the actual overall performance depends on the specific implementation.
In order to give more sophisticated estimates on the computational efforts involved with each method, the assembly costs of r FE , K FE , r e , and K e need to be taken into account. Hence, some standard FE formulas are recited in equations (8)-(10).

Advantages of the F -based Reduced Basis method
The advantages of the suggested deformation gradient-based scheme become apparent with the values of the last two columns of Table 3. All of these values are greater than one, which is in strong favor of the F -based approach. This becomes even more obvious when considering the fact that, for the F -RB, all complexity factors are stated in Table 1, whereas the u-based approaches necessitate additional "overhead" operations, see Section 2.3.1.
While the operations associated with equations (8)-(10) may be performed in highly optimized manners, the F -RB is appealing in that it completely avoids them at all. In fact, no mesh information other than the quadrature weights of the undeformed mesh, w p , are contained in or reconstructed by the scheme (4). Note that the complexities of the u-based methods implicitly depend on N qp , too.
The F -method is restricted to operations on gradient fields, which are the only fields entering the constitutive equations of the hyperelastic (and more general) material laws. In fact, the constitutive models must not depend on the translation of material points given in terms of the displacements u, which is a physics-based motivation for the F -ansatz.
Also, the degree to which the F -RB method may be computed in parallel is both massive and straight-forward: the volume averaging operation in (4) is the sum in the representations of r F and D F in Table 1. The quadrature points may be accessed in arbitrary order. Therefore, and by considering the amount of quadrature points, N qp , the local material law and the subsequent algebraic operations may be conducted in a highly parallel manner, cf. [6].
As for the intrusiveness, all schemes are comparable, as the offline phases only require fields (displacements or gradients thereof) which are standard output data of common FE software. In our implementation, the gradients are stored during the assembly of r e and the fluctuations are computed in a post-processing step.
Another practical advantage of the novel deformation gradient-based method over the classical methods is its simplicity: a demonstrator tool containing a minimum working example of an F -basis has been published alongside [1], consisting of 50 lines of MATLAB/Octave code.
It is also worthwhile to notice the explicit representation of the effective tangent modulus (5) 2 . This is a key contribution to the field of hyperelastic finite strain homogenization, considering that previous methods have suggested to use a numerical perturbation method [3,5] or a rough approximation by considering only one phase of the heterogeneous microstructure [3].