Many‐scale finite strain computational homogenization via Concentric Interpolation

A method for efficient computational homogenization of hyperelastic materials under finite strains is proposed. Multiple spatial scales are homogenized in a recursive procedure: starting on the smallest scale, a few high-fidelity FE computations are performed. The resulting fields of deformation gradient fluctuations are processed by a snapshot POD, resulting in a reduced basis (RB) model. By means of the computationally efficient RB model, a large set of samples of the homogenized material response is created. This data set serves as the support for the Concentric Interpolation (CI) scheme, which interpolates the effective stress and stiffness. Then, the same procedure is invoked on the next larger scale, with this CI surrogating the homogenized material law. A three-scale homogenization process is completed within a few hours on a standard workstation. The resulting model is evaluated within minutes on a laptop computer in order to generate fourth-scale results. Open source code is provided.

The mechanical properties of a biological tissue modeled on seven scales were reported. 8 These and other works exemplify the applicability of homogenization techniques based on multiple, hierarchical separations of scales.
Multiple spatial scales are also studied within scenarios of fractal or self-similar microstructures. Recent examples include the analytical homogenization of the elastic properties in the presence of fractal pores 9 or (without an assumed separation of scales) fractal interfaces. 10,11 Such considerations are motivated, for example, by geology or fracture mechanics.
The present work investigates a computational method for the mechanical homogenization of a large number of spatial scales, hence the term "many-scale homogenization." It relies on the presence of exclusively hyperelastic constitutive laws on the smallest scale. There, no assumptions are made regarding the degree of anisotropy or nonlinearity of the material laws. Also, it allows for an arbitrary number of material phases, including voids, in any geometric layout, given periodic representative volume elements (RVEs). The latter may or may not differ on each scale. The case of similar RVEs on each scale relates this method to fractal schemes.
The proposed scheme is based on the idea of transitioning from computationally demanding, high fidelity methods to methods compromising some of the accuracy for efficiency. This transition is conducted in a staged manner on each scale, and many scales are processed successively. The scheme is illustrated in Figure 1. Starting on Scale 0, high fidelity finite element (FE) solutions are produced. These are then processed by means of a classical proper orthogonal decomposition (POD) which yields a reduced basis (RB). The latter is, in turn, used to evaluate the homogenized material response at numerous specifically chosen sampling sites within the strain space. This set of data serves as the support of the Concentric Interpolation (CI) scheme, which rapidly approximates the effective constitutive response. With this fast, purely numerical surrogate at hand, the next scale can be homogenized in a similar manner. Eventually, the resulting CI method approximates the material law that is relevant on the engineering Scale M.
The outline is as follows. In Section 2, essential notation is introduced and the multiscale problem as well as the RB scheme are recalled. Section 3 contains the many-scale homogenization algorithm with detailed comments on each step. Numerical examples are presented in Section 4. A summary and additional discussions are given in Section 5.

Notation
Coordinate-free descriptions of first and second order tensors are denoted by bold letters, for example, X, F. The order of a tensor is not related to the capitalization of its representative symbol. Fourth order tensors are written in blackboard bold symbols, for example, ℂ. Only the set of real numbers R, its subset R_+ = {x ∈ R : x > 0}, and the hypersphere S^5 = {x ∈ R^6 : ||x|| = 1} are exceptions to this rule. Real vectors and matrices are denoted by a single and a double underline, for example, X and F, respectively. Vectors are assumed to be columns, unless transposed, for example, Xᵀ.
In this work, all simulations are conducted in three spatial dimensions. Let Ω₀ ⊂ R^3 be the domain of the simply connected physical body in its undeformed state, and Ω ⊂ R^3 be the domain occupied by the deformed body. Points within these two sets are referred to as X ∈ Ω₀ and x ∈ Ω, respectively.
In order to measure the deformation, the deformation gradient F = ∂x/∂X and the right Cauchy-Green tensor C = FᵀF are introduced. The physical impossibility of material self-penetration implies the positivity of the determinant J = det(F) > 0, which in turn guarantees the unique existence of the polar decomposition F = RU. This yields the rotation tensor R, with the properties det(R) = 1 and RᵀR = I, where I is the identity tensor, as well as the symmetric positive definite right stretch tensor U. The Hencky strain E = log(U) is of utmost importance to the present work.
Hyperelastic material laws are characterized by elastic energy density functions W_F(F) or W_C(C), respectively, which are equal if their arguments correspond to each other. The F-energy density W_F gives rise to the first Piola-Kirchhoff stress tensor via differentiation, P = ∂W_F(F)/∂F, while the C-energy density W_C defines the second Piola-Kirchhoff stress tensor, S = 2 ∂W_C(C)/∂C. These two stress measures are convertible by the relation P = FS. The tangent moduli of the first and of the second Piola-Kirchhoff stresses read ℂ^F = ∂²W_F(F)/∂F∂F and ℂ = 4 ∂²W_C(C)/∂C∂C, respectively.
For spatial descriptions, the standard Cartesian basis e^(1), e^(2), e^(3) is employed. It is orthonormal with respect to the Euclidean inner (dot) product and the norm ||·|| induced thereby, that is, e^(i) · e^(j) = e^(i)ᵀ e^(j) = δ_ij (i, j = 1, …, 3) holds for the vectorized counterparts of the basis, where δ_ij denotes the Kronecker symbol. Components of tensorial quantities are with respect to the tensorial basis {e^(i_1) ⊗ … ⊗ e^(i_A)} (i_1, …, i_A = 1, 2, 3), where A ∈ {1, 2, 4} is the respective tensorial order. Thus, there are natural representations of tensors of first and second order as vectors and matrices, respectively, for example, X ↔ X ∈ R^3 or E ↔ E ∈ R^{3×3}. For vectors and matrices of arbitrary dimensions, the Euclidean norm and the Frobenius norm, respectively, are employed.
Likewise, the matrix variant of the fourth-order stiffness tensor, ℂ ↔ C ∈ R^{6×6} with both minor and major symmetry, employs the same basis. This basis is compatible with the Euclidean norm, that is, ||S|| = ||S|| holds for the stress and its vectorized counterpart. In general, ambiguity between the components of a second order tensor and of its vectorized counterpart is not possible since the former are indexed by two variables, whereas the latter has only one index variable. Another basis is used with regard to the vectorization of the symmetric Hencky strain E ↔ E ∈ R^6. For this purpose, a family of special bases

ℬ_{J*} = {Y^(1), …, Y^(5), Y^(6)(J*)}    (2)

is introduced. These bases are parametrized by the constant dilatational scaling factor J*, the meaning of which will be given towards the end of this subsection. The components E_1, …, E_6 of E are the coefficients of the ℬ_{J*}-representation of E, where the explicit dependence of the sixth basis element on the scaling factor, Y^(6) = Y^(6)(J*), is dropped for notational convenience. The equivalence (4) occasionally leads to a slight abuse of notation implying E = E, which can always be uniquely resolved from the context. Furthermore, the vector E is decomposed as

E = t N,   t = ||E||,   N = E/||E||,    (5)

where t ∈ R_+ and N ∈ S^5 ⊂ R^6 are called the magnitude and the direction in Hencky strain space, respectively. While the magnitude, t, is one possible scalar measure of strain, the direction in Hencky strain space, N, characterizes the kind of the strain. For instance, N = [1, 0, 0, 0, 0, 0]ᵀ ↔ Y^(1) corresponds to isochoric uniaxial extension and N = [0, 0, 0, 0, 0, 1]ᵀ ↔ Y^(6)(J*) corresponds to pure dilatation. This vectorized notation requires a particular choice of ℬ_{J*}. This choice will always be clear from the context. For simplicity, we only consider Hencky strains with the restriction t ≤ T_max. Therefore, the constant dilatational scaling factor, J*, determines the maximum and the minimum possible determinant of the associated stretch tensor, det(exp(E)) ∈ [(J*)^{−T_max}, (J*)^{T_max}]. In general, adaptions of T_max as well as of the normalization of the basis elements Y^(1), …, Y^(5) are admissible. For incompressible materials, it is reasonable to consider only the five-dimensional Hencky strain space spanned by the first five elements of Equation (2), to which the following theory can easily be adapted. 12 It is important to note that the magnitude, which is the modulus of the vector representation, is generally not the same as the norm of the matrix representation, that is, t = ||E|| ≠ ||E|| whenever J* ≠ exp(√3) and the dilatational component [0, 0, 0, 0, 0, 1] · N ≠ 0. While the quantity ||E|| might be of significance in other contexts, we only ever use t instead.
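To make the split (5) concrete, the following minimal Python sketch computes t and N from a given stretch tensor U. It uses a generic orthonormal basis of symmetric tensors as a stand-in for the family ℬ_{J*}; the function and argument names are illustrative only.

```python
import numpy as np
from scipy.linalg import logm

def hencky_magnitude_direction(U, basis):
    """U: (3, 3) symmetric positive definite stretch tensor.
    basis: (6, 3, 3) array of orthonormal symmetric basis tensors
    (placeholder for the family B_{J*} of Eq. (2))."""
    E = logm(U)                                              # Hencky strain E = log(U)
    coeffs = np.array([np.tensordot(Y, E) for Y in basis])   # E_i = Y^(i) : E
    t = np.linalg.norm(coeffs)                               # magnitude t = ||E||
    N = coeffs / t if t > 0 else np.zeros(6)                 # direction N on S^5
    return t, N
```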

Recap of two-scale homogenization
The starting point is the (quasi-)static formulation of the balance equations of linear momentum with respect to the reference configuration Ω₀,

Div_X(P) = B^ext,

where B^ext denotes the body forces. Complementarily, sufficient boundary conditions (BCs) are assumed to be provided, and the balance equations of angular momentum hold, F⁻¹P = PᵀF⁻ᵀ. The latter will not be addressed in the sequel for the sake of brevity. In many cases, the domain Ω₀ contains a very detailed and heterogeneous microstructure that must be resolved for a sufficiently accurate approximation of the solution of the above problem. On the one hand, the microscopic details considered here are of a much smaller characteristic length than that of the overall domain Ω₀. On the other hand, these microfeatures are still large enough to justify the assumptions of general continuum mechanics.
A well-known method to avoid an excessively fine discretization of the domain Ω₀ (e.g., with an FE mesh containing an overly large number of elements) is the method of asymptotic homogenization with assumed separation of scales. 13 For more details on this well-established procedure, the reader is referred to standard literature. 14

Multiscale problem
As mentioned in the Introduction, some realistic scenarios make multiple, hierarchical separations of scales desirable. Formally, this is achieved by iteratively stating the assumptions of the classical two-scale procedure. A rigorous derivation and mathematical analysis of this procedure can be found in the seminal works by Bensoussan et al 15 (Chapter 1, Section 8). An informal motivating description of this process is now stated briefly.
Consider a setting in which, after a single scale separation, the emanating RVE of the smaller scale, in turn, exhibits geometric features that are of a much smaller characteristic length than the RVE. Then, another scale separation leads to an even smaller scale with its own RVE and so forth. The total number of scales that is considered within this context is M + 1. We count the scales zero-based and in ascending order, that is, Scale 0 denotes the smallest scale and Scale M denotes the engineering scale, for example, the one on which we would like to resolve the original domain. The index 0 ≤ N ≤ M is reserved for reference to an arbitrary scale.
Quantities on Scale N are denoted by N overlines, for example, F(X) and F̄(X̄) are the deformation gradients on Scales 0 and 1, respectively. However, if N > 3 or if N is not specified, the notation is switched to the corresponding number in parentheses as a superscript, for example, (2)C. For instance, the characteristic lengths of all scales satisfy

l^(0) ≪ l^(1) ≪ … ≪ l^(M).

Furthermore, for each separated Scale N (0 ≤ N < M), there is an RVE, which is denoted by (N)Ω₀. The scale separation approach requires compatible kinematic BCs. Multiple such BCs are available, and they are characterized by the fact that they fulfill the Hill-Mandel condition. For more details and a discussion of some possible choices, the reader is referred to standard literature. 16 Here, periodic fluctuation BCs for the displacements are employed; they imply the kinematic scale coupling

⟨(N)F⟩ = (N+1)F,    (9)

employing the volume averaging operator

⟨·⟩ = (1/|(N)Ω₀|) ∫_{(N)Ω₀} (·) dV.    (10)

Moreover, the right-hand side of Equation (9), representing the scale coupling, induces a kinematic BC for the problem on Scale N. The complete set of coupled balance equations is stated in the following box:

Scale M:      Div_X((M)P) = (M)B^ext    (11)
Scale M − 1:  Div_X((M−1)P) = (M−1)B^ext,   (M)F = ⟨(M−1)F⟩    (12)
⋮
Scale 0:      Div_X((0)P) = (0)B^ext,   (1)F = ⟨(0)F⟩    (15)

The BCs on Scale M may be of any generally admissible type, for example, kinematic, static, or mixed. For each material point (N)X, the deformation gradient (N)F((N)X) acts as a parameter on Scale N − 1 (Chapter 1, Remark 8.5 of Reference 15), representing the corresponding BC. Hence, the problems (12) to (15) need to be solved in a nested manner. On Scale 0, the material laws of all constituents are assumed to be given explicitly. The enormous computational effort associated with the solution of Equation (11) becomes even more obvious when considering that multiple iterations (e.g., of the Newton method) are required on each scale. This exponential growth of the algorithmic complexity is long known and was even literally indicated in the denomination of the FE² method. 17 For notational clarity, the argument of the BC will be omitted in the sequel. The body forces on all lower scales are assumed to vanish, (0)B^ext = … = (M−1)B^ext = 0, in order for the asymptotic homogenization ansatz to remain valid.
As with the classical two-scale results, the hyperelasticity property is conserved by upscaling, that is, the characterizing energy density function on the next scale is the volume average of the respective function on the current scale. The same holds for all additive quantities, but explicitly not for those that are nonlinear with respect to the displacement field. The most relevant statements in this regard are given in the following box:

kinematics:      (N+1)F = ⟨(N)F⟩ (16),   (N+1)C ≠ ⟨(N)C⟩ (17)
energies:        (N+1)W_F = ⟨(N)W_F⟩ (18),   (N+1)W_C = ⟨(N)W_C⟩ (19)
stresses:        (N+1)P = ⟨(N)P⟩ (20),   (N+1)S ≠ ⟨(N)S⟩ (21)
tangent moduli:  (N+1)ℂ^F ≠ ⟨(N)ℂ^F⟩ (22),   (N+1)ℂ ≠ ⟨(N)ℂ⟩ (23)

The inequality (17) stems from the quadratic nature of the right Cauchy-Green tensor. From this, the inequality (21) follows, which can be shown by the corresponding form of the Hill-Mandel condition. The non-additivity of the stiffness tensors, (22) and (23), adds to the complexity of computational multi-scale homogenization. An accurate representation of the stiffness tensor on the subsequent scale is required in order to set up reliable multiscale FE simulations with low numbers of nonlinear Newton-Raphson iterations. The classical computational evaluation of (N+1)ℂ^F requires numerical perturbation of the corresponding stress 18 or a computationally very demanding Schur complement technique. 19 The authors have previously introduced an RB method 12 that can efficiently compute the effective stiffness tensor in a two-scale setting. This method will now be briefly recalled and formally generalized to multiple scales. It is crucial to the proposed computational many-scale homogenization method.
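For orientation, the numerical perturbation approach 18 amounts to a finite-difference approximation of the effective tangent. A minimal forward-difference sketch is given below; the routine stress_bar is a hypothetical placeholder for a full RVE solve returning the volume-averaged first Piola-Kirchhoff stress for a prescribed macroscopic deformation gradient.

```python
import numpy as np

def tangent_by_perturbation(stress_bar, F, h=1.0e-6):
    """Approximate C^F[i,j,k,l] = dP_ij/dF_kl by forward differences.
    stress_bar: callable (3,3) -> (3,3), hypothetical RVE solver."""
    P0 = stress_bar(F)
    C = np.zeros((3, 3, 3, 3))
    for k in range(3):
        for l in range(3):
            dF = np.zeros((3, 3))
            dF[k, l] = h                                    # perturb one component
            C[:, :, k, l] = (stress_bar(F + dF) - P0) / h   # difference quotient
    return C
```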

RB homogenization
The essentials of the RB homogenization method 12 are now briefly summarized. For more details, the reader is referred to the original open access source. An algorithmic comparison with competing alternative methods can be found in Reference 20. In the following, the transition from Scale N to Scale N + 1 is considered, that is, 0 ≤ N < M and M ≥ 1. In terms of Figure 1, the RB_N method is now described.
At the core of the RB ansatz is the approximation of the deformation gradient field by means of a reduced basis of fluctuation fields, {B^(1), …, B^(N RB)}; for this basis and its size, N RB, we omit the explicit N-dependence. The basis functions are assumed to be given at the moment; their construction is algorithmically described in the next section of this article. Thus, the BC and the vector of RB coefficients, ξ ∈ R^{N RB}, are included as parameters in the RB approximation

F(X; F̄, ξ) ≈ F̄ + Σ_{i=1}^{N RB} ξ_i B^(i)(X).    (24)

As a fundamental principle of mechanics, the energy integral is sought to be minimized,

W̄_F^RB(F̄) = min_ξ ⟨W_F(F(X; F̄, ξ))⟩.    (25)

From this and from the fact that the ansatz (24) is a low-dimensional approximation within the space of all deformation gradients, it follows (Section 2.3 of Reference 12) that

W̄_F^RB(F̄) ≥ W̄_F(F̄),    (26)

meaning that the solution to Equation (25) realizes the best approximation of the true effective energy density attainable within the reduced space. Once sufficient convergence is attained (see Equation (28) of Reference 12 for the residual), the effective stress P̄_RB and the tangent modulus ℂ^{F;RB} are readily available, cf. Appendix B of Reference 12. Next, these are converted to their equivalents with respect to the right Cauchy-Green tensor, yielding S̄_RB (27) and ℂ_RB (28). The formulation with respect to the second Piola-Kirchhoff stress and its tangent modulus is widely used for the implementation of material routines, for example, in the context of the FEM. Thus, this RB method may easily be wrapped for employment in such simulation software. This holds true even for commercial products, as long as a user-defined material interface is provided, which usually is the case. The computational effort for a single evaluation of the stress in Equation (27) is significantly reduced compared to the FEM, as the solution of the minimization task (25) depends on just N RB DOF. Usually, N RB is on the order of 20-100. Apart from this speed-up, a good approximation of the tangent modulus is given by Equation (28).
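A structural sketch of the minimization (25) is given below, assuming precomputed quadrature weights and basis values (all names are hypothetical); the actual implementation 12 solves the corresponding stationarity condition iteratively rather than calling a generic optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def rb_solve(F_bar, B, w, W_F, xi0=None):
    """Minimize <W_F(F_bar + sum_i xi_i B^(i))> over the RB coefficients.
    B: (n_qp, N_RB, 3, 3) basis values at the quadrature points,
    w: (n_qp,) quadrature weights normalized to sum to one,
    W_F: energy density for a single (3, 3) deformation gradient."""
    N_RB = B.shape[1]
    def energy(xi):
        F = F_bar + np.einsum('qikl,i->qkl', B, xi)      # RB ansatz (24)
        return float(np.dot(w, [W_F(Fq) for Fq in F]))   # volume average
    res = minimize(energy, xi0 if xi0 is not None else np.zeros(N_RB),
                   method='BFGS')
    return res.x, res.fun   # coefficients and effective energy, cf. (25)
```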
However, the volume averaging operator requires integration over the RVE (N)Ω₀, cf. (10). This operator is invoked multiple times per iteration of the minimization task (25). Thus, although the number of DOF is drastically reduced in comparison to the FEM, the original spatial complexity is still present through the number of integration points. For this reason, the unmodified RB method as described above is not suitable for online application in many-scale simulations. Several opportunities for further speed-up are discussed in the original paper. 12 For now, we settle for the moderate speed-up factor of 10-100 with respect to the FEM (not accounting for the setup phase of the RB), as the RB method is part of the offline phase of the proposed many-scale scheme.

COMPUTATIONAL HOMOGENIZATION VIA CI
The CI method was recently introduced by the authors 21 in its general form. Previously, it had been employed in the RNEXP method 22 in an ad hoc manner. It serves as another major building block for the proposed many-scale computational homogenization scheme. In terms of Figure 1, the RB method efficiently provides evaluations of the homogenized material law. These data are used as supporting data for the subsequent CI method. While the RB method as treated in this context is based on the fundamental physical principle of energy minimization, the CI method is a purely mathematical means of function approximation. However, its particular design is specialized to the characteristics of material laws and enables very efficient numerical treatment of material data, accounting for the usually rather sparse availability of the latter. The C/C++ implementation of the CI scheme employed in this study is provided as open source code. 23 The repository includes an example in which the hyperelastic material law ETM from Section 4.1 is interpolated. For this, compile demo_largestrain in the subfolder examples and then execute the corresponding binary in the bin folder.

Basic scheme
The CI method is a specialization of the established general kernel interpolation method. It is standard 24 that the approximation f̂ of a scalar function f : R^D → R by N supp symmetric positive definite kernel functions k_i takes the form

f̂(x) = k(x)ᵀ K⁻¹ p.    (29)

The kernel functions k_i(x) = k(x_i, x) are identical up to the locations of the support points, x_1, …, x_{N supp}. These functions constitute the components of the kernel vector k(x), whereas the kernel matrix has the components K_ij = k(x_i, x_j). The function values at the kernel centers define the components of the vector p, that is, p_i = f(x_i). For functions defined on the unit hypersphere, substituting the geodesic (i.e., angular) distance function, acos(x · y), for the argument of a radial kernel Φ is known as the spherical basis function ansatz (§6 of Reference 25). A popular choice for kernel functions is the Gaussian kernel, which then takes the form

k(x_i, x) = exp(−γ acos²(x_i · x)),    (30)

with the kernel width parameter γ. In the present work, attention is restricted to this kernel function, and the investigation of possible alternatives is not within the current scope. Since the kernel centers are points on a unit hypersphere, they are from now on denoted support directions, and accordingly their number is N supp dir. The CI method is an extension of this spherical interpolation on the hypersphere S^{D−1} to the surrounding Euclidean space R^D. This is achieved by first splitting the argument x ∈ R^D into its magnitude t and its direction N,

x = t N,   t = ||x||,   N = x/||x||.    (31)

Then, the spherical interpolation ansatz from above, classically acting exclusively on N ∈ S^{D−1}, is complemented by the introduction of a dependence of the vector of function values on the magnitude t ∈ R_+,

f̂(t, N) = k(N)ᵀ K⁻¹ p(t).    (32)

Here, it is assumed without loss of generality that f(0) = 0, which can always be ensured by shifting the function values. Accordingly, the components of the function value vector p(t), p_i(t) : R_+ → R, are chosen with the property p_i(0) = 0. More precisely, these functions are defined as piecewise linear polynomials with respect to the same set of support magnitudes along all directions. Hence, there is a distinct one-dimensional approximation of the function values along each of the N supp dir supporting directions.
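The spherical part of the scheme, Equations (29) and (30), fits in a few lines of numpy. The following sketch uses the Gaussian kernel of the geodesic distance as reconstructed above; it is an illustration only, not the authors' C/C++ implementation. 23

```python
import numpy as np

def kernel_matrix(dirs, gamma):
    """dirs: (n, D) array of unit support directions (kernel centers)."""
    G = np.clip(dirs @ dirs.T, -1.0, 1.0)        # pairwise dot products
    return np.exp(-gamma * np.arccos(G) ** 2)    # Gaussian of geodesic distance

def spherical_interpolate(N_query, dirs, weights, gamma):
    """Evaluate f_hat(N) = k(N)^T K^{-1} p, with weights = K^{-1} p."""
    d = np.clip(dirs @ N_query, -1.0, 1.0)
    k = np.exp(-gamma * np.arccos(d) ** 2)       # kernel vector k(N)
    return k @ weights

# One-time setup: weights = np.linalg.solve(kernel_matrix(dirs, gamma), p)
```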
The total number of function values necessary for the setup of the CI scheme is N supp dir N supp mag and can be adjusted according to the properties of the target function f. Also, the placement of the magnitude support points t_1, …, t_{N supp mag} may be adapted in a problem-specific manner. However, the choice of the support directions N^(1), …, N^(N supp dir) is strongly constrained by the choice of identical kernel functions of the kind (30) for each of the components of the kernel vector k(N), that is, all kernel functions share the same kernel parameter γ. Thus, the directions are sought to be asymptotically uniformly distributed. 21

Efficient implementation
The approximant f̂ from (32) can be implemented highly efficiently: by means of dedicated matrix-vector representations of the involved quantities, the inverse kernel matrix, K⁻¹, can be multiplied with static data during the setup phase. Then, each evaluation of f̂(t, N) reduces to the evaluation of the radial interpolation vector, p(t), the kernel vector, k(N), and some matrix-vector multiplications.
For the sake of simplicity, this is now exemplified assuming the radial interpolation vector, p(t) ∈ R^{N supp dir}, is linear on the whole domain R_+, that is, not just piecewise. Then, it has the discrete representation

p(t) = a + t b,    (33)

where the components a_i and b_i are the polynomial coefficients of the component p_i(t) = a_i + t b_i. Substituting into (32) yields

f̂(t, N) = k(N)ᵀ (ã + t b̃),   ã = K⁻¹ a,   b̃ = K⁻¹ b,    (34)

where ã and b̃ are computed once during the setup phase. In the actual implementation with piecewise linearity of p, the polynomial coefficients are stored in N supp mag sets, each of which is multiplied with the inverse kernel matrix in analogy to Equation (34) during the setup phase.
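In code, the setup/evaluation split for the globally linear case (33), (34) reads as follows; in the piecewise linear case, one such coefficient pair is stored and premultiplied per magnitude segment. This is a sketch only.

```python
import numpy as np

def ci_setup_linear(K, a, b):
    """Premultiply the polynomial coefficients by K^{-1} once, cf. (34)."""
    return np.linalg.solve(K, a), np.linalg.solve(K, b)

def ci_evaluate_linear(t, k_N, a_tilde, b_tilde):
    """f_hat(t, N) = k(N)^T (a_tilde + t * b_tilde); k_N is the kernel vector."""
    return k_N @ (a_tilde + t * b_tilde)
```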

Concentric Interpolation of material laws
FE software oftentimes provides an interface to user-defined material routines. In finite strain hyperelasticity, material laws are usually defined with respect to rotationally invariant kinematic descriptors, such as U or C = U². However, as the authors previously emphasized, 12 it is advantageous for a number of reasons to sample mechanical response functions on the space of Hencky strains, E = log(U). Additionally, the space of Hencky strains is isomorphic to R^6, cf. (4), allowing for the application of CI, f̂ : R^6 → R. In principle, the form of the interpolant, cf. Equation (32), is suitable for approximating the hyperelastic energy density function, W_C = W_C(exp(2E)). Furthermore, if sufficiently smooth one-dimensional radial interpolation functions, p_i, were used instead of piecewise linear polynomials, the stress, S, and its tangent modulus, ℂ, could be computed by means of differentiation. However, differentiation comes along with a significant increase of the number of linear operations (Section 5.3 of Reference 22). Moreover, the involved derivatives of the matrix exponential and of the matrix logarithm are highly nontrivial to implement. A different approach is pursued here for these reasons.
Instead of interpolating the energy density function, the second Piola-Kirchhoff stress, S ↔ S ∈ R^6, and its tangent modulus, ℂ ↔ C ∈ R^21, are directly interpolated over the space of Hencky strains, E ↔ E ∈ R^6. It is important to recall that the basis (1) is employed for the stress and the stiffness, while the Hencky strain is formulated with respect to the basis (2). The CI ansatz is modified into

f̂^(i)(t, N) = k(N)ᵀ K⁻¹ p^(i)(t),   i = 1, …, 27,    (35)

meaning that all 27 components (6 for the stress and 21 for the stiffness) are interpolated separately but simultaneously: the respective functional data is independently interpolated by the N supp dir distinct components of the vector p^(i)(t) (i = 1, …, 27). The magnitude, t, and the direction, N, of the query Hencky strain, E, are computed as defined in Equation (5).

(Figure 2: Sketch of concentric sampling of the Hencky strain space. For visualization purposes, only two of the six coordinate axes of ℬ_{J*} are considered. N mag samples are placed along N dir approximately uniformly distributed directions. The former characterize the load magnitudes, the latter correspond to the type of the load.)
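A possible vectorized realization of the modified ansatz (35) is sketched below, under the assumptions that the support magnitudes start at t = 0 and that the 27 component values are stacked into a single array; names and storage layout are illustrative.

```python
import numpy as np

def ci_material(t, N, dirs, K_inv, t_grid, P, gamma):
    """P: (N_mag, N_dir, 27) support values (6 stress + 21 stiffness
    components), with P[0] = 0 at t_grid[0] = 0. Returns a (27,) vector."""
    d = np.clip(dirs @ N, -1.0, 1.0)
    k = np.exp(-gamma * np.arccos(d) ** 2)              # kernel vector k(N)
    m = np.clip(np.searchsorted(t_grid, t), 1, len(t_grid) - 1)
    w = (t - t_grid[m - 1]) / (t_grid[m] - t_grid[m - 1])
    p = (1.0 - w) * P[m - 1] + w * P[m]                 # piecewise linear p(t)
    return k @ (K_inv @ p)                              # all 27 components at once
```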
By interpolating the components of the stress and of the stiffness separately, their consistency is generally lost. Thus, a possibly negative effect on the convergence behavior of the Newton-Raphson method is to be expected; this is investigated in the numerical section of this article. One way to attenuate this drawback in principle could be to incorporate FFT-based homogenization schemes 26,27 on the lower scales, as such schemes are generally less sensitive to the tangent modulus. Alternatively, automatic differentiation 28 (AD) of the energy density function W_C could possibly be employed for the evaluation of derivatives, similar to previous works. 22

Concentric Sampling of material laws
The CI method crucially depends on the concentric sampling (CS) method. The CS algorithm employed in this work is a slight modification of the original (see Algorithm 1 of Reference 12) and is essentially described by the following construction of Hencky strains, cf. Equations (4), (5):

E^(m,n) = t_m N^(n),   m = 1, …, N mag,   n = 1, …, N dir.    (36)

Here, a Hencky strain basis Y^(i) ∈ ℬ_{J*} (i = 1, …, 6) with fixed scaling factor J* is considered. Furthermore, a set of asymptotically uniformly distributed points on the five-dimensional unit hypersphere, N^(n) ∈ S^5 ⊂ R^6 ∼ span(ℬ_{J*}), n = 1, …, N dir, is assumed to be provided. Such point sets may be constructed, for example, by means of the MinimumEnergyPoints open source Matlab/Octave code provided by the authors. 29 These hyperspherical points are interpreted as directions within the Hencky strain space along which samples are placed. The distribution of the samples along each direction is given by the set of sampling magnitudes, 0 < t_1 < ⋯ < t_{N mag} = 1. A visualization in two dimensions is provided in Figure 2, where N dir = 5 and N mag = 3. When the Hencky strains obtained through CS are utilized as support points for the CI method, the nomenclature N dir = N supp dir and N mag = N supp mag is adopted. On the other hand, if the CS points are evaluation points of the CI scheme, this is denoted by N dir = N eval dir and N mag = N eval mag. Accordingly, when the resulting stretch tensors, exp(E^(m,n)), are applied as BCs to the boundary value problems (12) to (15), this is again denoted by N dir = N eval dir and N mag = N eval mag, regardless of the employed solution method, FE or RB.
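A compact sketch of the CS construction (36), with the direction set assumed to be precomputed (e.g., with MinimumEnergyPoints 29) and the basis passed in as a placeholder for ℬ_{J*}:

```python
import numpy as np
from scipy.linalg import expm

def concentric_samples(directions, magnitudes, basis):
    """directions: (N_dir, 6) unit vectors on S^5; magnitudes: (N_mag,);
    basis: (6, 3, 3) Hencky strain basis (placeholder for B_{J*})."""
    stretches = []
    for N in directions:
        for t in magnitudes:
            E_vec = t * N                               # E^(m,n) = t_m N^(n), cf. (36)
            E = np.einsum('i,ikl->kl', E_vec, basis)    # 6-vector -> 3x3 tensor
            stretches.append(expm(E))                   # stretch tensor U = exp(E)
    return stretches
```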
By means of this sampling approach, representative and unbiased sampling of the Hencky strain space and, thus, after exponentiation, of the physically meaningful part of the space of stretch tensors is realized:
• Representative means that the sampled space is covered in a manner that is dense with respect to a certain metric, given a limited number of discrete sampling points. In the case of material laws, the metric is the angular distance between different load directions, acos(N^(i) · N^(j)). Sampling the directions in an approximately uniform manner ensures that the resolution of the anisotropy of the present material law can be adjusted.
• Unbiased means that the resolution by which different regions of the domain are covered is either constant or deliberately adapted. The radial density of the samples in the CS method may be adjusted according to expected features of the present material law. For instance, if a significant change of the material response is expected at a certain load magnitude, for example, a transition from linear to nonlinear behavior, then a neighborhood of this critical point may be resolved more thoroughly.
• Physically meaningful, in the context of sampling of kinematic quantities such as the stretch tensor U = exp(E), means that the sampling domain is confined in such a way that all contained points correspond to realistic deformations that might actually occur in real-world problems. This is one of the main advantages of the CS method in contrast to classical approaches that directly sample the space of stretch tensors, right Cauchy-Green tensors, or deformation gradients with uniform Cartesian grids. 30,31 Such grid-based sampling methods are unable to keep the determinant J = det(U) within realistic bounds (e.g., 1 ± ε in the quasi-incompressible case) while also exploring isochoric regions of the respective kinematic state spaces to their whole physically meaningful extent, for example, more than 100% strain for rubber-like materials.

Many-scale homogenization: Overall algorithm
The proposed method is a many-scale solution to the multi-scale problem: it is emphasized that not just more than one scale but actually numerous scales are transitioned. As indicated in Figure 1, the lowest M scales of the general multi-scale problem, cf. Equations (12) to (15), are homogenized in a recursive manner. Then, the resulting numerical surrogate of the homogenized material law can be applied on the engineering scale. The complete procedure is formally stated in Algorithm 1; a thorough step-by-step description is given in the following. We omit the scale index N on top of some quantities for the sake of readability. For brevity, the scaling parameter J* is fixed throughout the whole algorithm, that is, across all scales, which conforms with the numerical experiments of the next section. If a query Hencky strain were ever out of the sampling range defined by T max and J*, the CI routine would still return a result by means of extrapolation.
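The recursive structure of Algorithm 1 can be summarized as follows; the function names are placeholders for the stages FE_N, POD_N, RB_N, and CI_N and do not correspond to the interface of the published code. 23

```python
def many_scale_homogenization(material_0, rve_meshes, M, params):
    """Structural sketch of Algorithm 1; all callables are hypothetical."""
    material = material_0                     # explicit laws on Scale 0
    for N in range(M):                        # homogenize Scales 0 .. M-1
        snapshots = fe_simulations(rve_meshes[N], material, params)   # FE_N
        basis = pod(snapshots, params['N_RB'])                        # POD_N
        support_data = rb_sampling(rve_meshes[N], material,
                                   basis, params)                     # RB_N
        material = setup_ci(support_data, params)  # CI_N: law for Scale N+1
    return material                           # surrogate applied on Scale M
```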

FE simulations
On Scale N, the FE method is invoked by a call to Algorithm 2, which is denoted FE_N. Kinematic periodic fluctuation BCs, defined in terms of stretch tensors corresponding to the Hencky strains (36), are applied. It is important to note the order in which the BCs are applied: the FE method generally converges quicker if only the load magnitude, t, is incremented between load steps and the kind of the load, that is, its direction N with respect to the Hencky strain representation, is kept constant. Furthermore, very simple parallelization of the overall workload is possible by running multiple instances of the same program with different sets of loading directions. For instance, if the number of available compute threads is 10 and N eval dir = 100, one might run 10 instances in parallel, each applying 10 different load directions successively.
The Newton iterations of the FEM are stopped when the maximum nodal residual is below 10⁻⁵ MPa·L² (where L is the side length of the cubic RVE) or if its ratio to the maximum reaction force is below 10⁻³. In the present study, the same FE stopping criterion is employed on all scales. This is in contrast to other works 32 that eased the stopping criterion on larger scales.
Convergence issues are usually expected for some evaluation directions above certain critical load magnitudes (see Section 3.2.5 of Reference 30). In practice, certain states of (large) deformation simply cannot be reached by a given FE model. This effect depends on the maximum magnitude T max, the properties of the mesh, the element stabilization method, the material laws employed on the current Scale N, and the scaling parameter J*, among other factors. Therefore, the number of available solutions is usually lower than the number of BCs provided by CS. In order to conservatively assess the practicality of the proposed method, no FE stabilization method is employed in the present study.
The converged fields of deformation gradients are post-processed by pointwise subtraction of the prescribed homogeneous deformation. Thus, the set of deformation gradient fluctuation fields, denoted snapshots, is computed and returned to the main routine.

Proper Orthogonal Decomposition
The set of deformation gradient fluctuation snapshots is processed by the POD Algorithm 3, which is dubbed POD_N. Snapshot POD methods of this kind have been successfully applied before. 33 A popular way of deciding when to truncate the emanating set of basis functions, that is, how large to choose the number N RB of basis functions, is to employ the Eckart-Young-Mirsky theorem. By this, the resulting projection error can be controlled based on the eigenvalues of the correlation matrix. However, the authors have experienced 12 a deteriorating effect in terms of the stress approximation error for N RB values greater than a certain optimum. This effect is interpreted as an artifact stemming from spurious displacement patterns (for instance, strongly oscillating fields) mainly located at the interfaces of high-contrast materials. The severity of this effect has yet to be investigated systematically.
For the moment, we settle for the empirically determined 12 number of N RB = 30 basis elements, which appears to give reasonably good approximation accuracy for both the energy density and the stresses. Also, the resulting speed-up compared with the FE method remains significant. The truncated reduced basis is returned to the main algorithm.
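For reference, a minimal snapshot POD in the spirit of Algorithm 3 may be sketched as follows; the handling of quadrature weights is an assumption and varies between implementations.

```python
import numpy as np

def snapshot_pod(S, n_rb=30):
    """S: (n_snap, n_dof) matrix whose rows are (quadrature-weighted)
    fluctuation snapshots. Returns (n_rb, n_dof) orthonormal modes."""
    C = S @ S.T                                      # snapshot correlation matrix
    lam, V = np.linalg.eigh(C)                       # ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]                   # reorder descendingly
    modes = S.T @ V[:, :n_rb]                        # lift to field space
    modes /= np.sqrt(np.maximum(lam[:n_rb], 1e-14))  # normalize the columns
    return modes.T
```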

RB simulations
Just like the FEM, the RB model is evaluated at kinematic BCs corresponding to Hencky strains that result from CS. In contrast to the FE model, however, the numbers N eval dir and N eval mag are chosen significantly larger for the call of RB_N (Algorithm 4).
As a rule of thumb, the number of BCs, N eval dir N eval mag, applied to RB_N can be greater than that of FE_N by the same factor as the speed-up of the RB with respect to the FEM. This way, the computational workload associated with these two stages of the overall homogenization process is comparable. Positive experiences were obtained by increasing N eval dir by a factor of 5 and N eval mag by a factor of 2, corresponding to a speed-up factor of approximately 10. The choice of these parameters strongly depends on the size of the RB, N RB. On the one hand, a smaller basis is advantageous in that a greater number of BCs can be applied, that is, a greater sampling resolution can be achieved. On the other hand, the quality of the sampling data is increased by a larger basis. It is a complex task to choose all parameters (of the RB but also of the overall method) in an optimal way. Importantly, the robustness of the RB method is greatly increased in comparison to the FEM. Hence, the maximum magnitude T max may be chosen much larger for the RB than for the FE model. However, one should keep in mind that the reliability of the RB results is then decreased, as the RB is evaluated far outside the domain of the snapshot data. Also, the enhanced robustness is still limited, and an occasional lack of convergence for (large) magnitudes of certain kinds of deformations is to be expected. (Algorithm 4, in brief: for each direction and each magnitude, the RB problem is solved; if no convergence is attained, S̄_RB and C̄_RB are linearly extrapolated in t from the magnitudes t_{m−2}, t_{m−1}, assuming convergence at least for m = 2, and the extrapolation counter N eval ext is incremented; the initial stiffness, that is, at t = 0, is subtracted from the current stiffness; finally, the CI support data is returned.) Convergence in this context is defined as the RB residual (see Equation (28) of Reference 12) reaching a norm of less than 5·10⁻⁷ MPa·L² (where L is the side length of the cubic RVE). Alternatively, convergence is assumed when the ratio of the residual to the current increment's initial residual is below 10⁻⁷.
In contrast to FE_N, where BCs that failed to converge simply reduced the number of snapshots, a lack of convergence is a more severe issue for RB_N: the outputs of this stage serve as inputs to the subsequent CI scheme. The latter necessarily requires data at all support points (under the current assumption of piecewise linear interpolants with identical support positions along all directions). We choose to linearly extrapolate incomplete RB data along the respective direction. The number of such extrapolated RB results is counted in N eval ext. Alternatively, any other means of data completion could be employed, for example, specific choices of the radial data functions p_i(t) from Equation (32).
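A minimal sketch of this completion step, assuming (as in Algorithm 4) that at least the first two magnitudes of each direction converged:

```python
import numpy as np

def complete_by_extrapolation(t_grid, values, converged):
    """values: (N_mag, dim) RB results per magnitude along one direction;
    converged: boolean mask. Missing entries are filled linearly in t."""
    vals = np.asarray(values, dtype=float).copy()
    for m in range(2, len(t_grid)):
        if not converged[m]:
            slope = (vals[m - 1] - vals[m - 2]) / (t_grid[m - 1] - t_grid[m - 2])
            vals[m] = vals[m - 1] + slope * (t_grid[m] - t_grid[m - 1])
    return vals
```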
Note that just like the FE method, the RB method is suitable for trivial parallelization by simply running multiple instances of the program at the same time, each with different sets of BCs.

User-defined material function based on Concentric Interpolation
As the last step before the completion of one homogenization loop in Figure 1, the CI_N scheme is set up according to Equation (35). The interpolant f̂ is incorporated in the user-defined material routine (Algorithm 5). Finally, the scale counter N is incremented by one, and Algorithm 1 proceeds correspondingly.

Goals
The main goal of the following considerations is to provide an example of a three-scale simulation based on Algorithm 1, cf. Figure 1. This is meant as a proof of concept only; thorough investigations of important details, such as the influence of the various parameters, are not within the current scope. The geometries of the RVEs on Scales 0 and 1 are identical. On Scale 2, the same geometry is again utilized. In this sense, the geometric setup resembles a fractal-like, self-similar structure. 34 If the geometry on Scale 2 is again considered an RVE and no body forces are applied, the resulting local stress field may be volume averaged and then interpreted as a fourth-scale homogenized quantity. This subsection ends with selected evaluations of such stresses on a hypothetical Scale 3.

Geometry and Scale 0 model
The geometry considered on all three scales is a cube with a centered spherical void, cf. Figure 3. The mesh contains 4399 second-order TET10 elements with a total of 7531 nodes, resulting in almost 22 600 DOF. The mesh, that is, the solid phase, occupies a volume fraction of 0.90 of the unit cube. For the material law on Scale 0, the extended tube model (ETM) (eq. (22) of Reference 35) together with a mixed quadratic-logarithmic volumetric model 36 is utilized. The material parameters are adopted from a comparative study (table III therein); the bulk modulus is chosen large in relation to the other parameters in order to approach quasi-incompressible behavior. A nonlinearity that is more pronounced for the ETM than for the widely used Neo-Hookean model is visible in Figure 4.

Three-scale setup
According to Algorithm 1 and Figure 1, the homogenization of Scale 0 and of Scale 1 is performed in a staged manner. All parameters employed throughout this homogenization process are listed in Table 1. The kernel width parameter γ is determined via optimization of the stresses S_CI with respect to validation stresses S_RB on a large set of validation Hencky strains: N eval dir = 1000, N eval mag = 100, J* = 1.2, T max = 1.0. For comparison, the optimization of the kernel parameter yields an optimal value of γ = 3.56 when the discrepancy between the stiffnesses C_CI and C_RB is minimized. As the stress approximation directly influences the balance equation, the result obtained from the stress optimization is employed. The same value of γ is used for the interpolation scheme CI_1 without further optimization.
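Such a calibration can be phrased as a one-dimensional optimization of the mean relative stress error over the validation set; the following sketch illustrates the idea with hypothetical helper names and an assumed search interval.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def optimize_gamma(build_ci, E_vali, S_vali, bounds=(0.5, 20.0)):
    """build_ci(gamma) -> callable mapping a Hencky strain to S_CI;
    E_vali, S_vali: validation strains and reference stresses.
    Names and the search interval are illustrative assumptions."""
    def mean_stress_error(gamma):
        ci = build_ci(gamma)                       # reassemble the interpolant
        errs = [np.linalg.norm(ci(E) - S) / np.linalg.norm(S)
                for E, S in zip(E_vali, S_vali)]
        return float(np.mean(errs))
    res = minimize_scalar(mean_stress_error, bounds=bounds, method='bounded')
    return res.x                                   # optimized kernel width
```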
Another noteworthy parameter is T max, which is 1.0 throughout Scale 0. However, it is reduced to T max = 0.3 for FE_1 due to severe convergence issues, cf. Section 3.5.1. The enhanced robustness of RB_1, cf. Section 3.5.3, allows for the choice of T max = 1.0 again. At this stage, approximately 14% ≈ N eval ext/(N eval dir N eval mag) of the applied BCs lacked RB convergence.

Hardware setup and runtimes
In order to accelerate the setup phase (or offline phase), the FE, POD, and RB computations were performed on a standard workstation with an Intel(R) Xeon(R) E5-2643 v3 CPU (12 threads) and 256 GB of RAM. The memory-intensive POD never came close to requiring all of the memory. An exact quantitative comparison of the various runtimes is not within the scope of the current work. Depending on the method and on the stage, different kinds of overhead data were created and stored during the multi-scale homogenization process, for example, visualization data, statistics, and debugging information. Also, important meta-parameters were chosen as educated guesses, for example, stopping criteria, the maximum number of Newton iterations, and the behavior in case of convergence failure. Furthermore, the FE and the RB simulations were conducted with multiple instances of the same program running in parallel, each with a different set of BCs. Most importantly, the computations for the current case study were not always conducted on a dedicated workstation. For these reasons, exactly reproducible quantitative runtime values are set aside for the moment. The overall wall time to conduct the whole three-scale homogenization procedure on the workstation, that is, for FE_0, POD_0, RB_0, FE_1, POD_1, and RB_1 together, was approximately 7 hours. The simulations on the third scale by means of FE_2 were conducted on a laptop computer and took approximately 2-3 minutes per load path.

Results
As the overall porosity of the structure increases with each homogenized scale, a scale softening effect is observed in the initial stiffnesses, that is, in the stiffness tensors at zero magnitude, t = 0, cf. Table 2.
Next, the geometrically identical structures on Scales 0, 1, and 2 are subjected to the same kinematic BC, a stretch tensor with the principal stretches λ_x = λ_z = 0.9035 and λ_y = 1.225. The resulting deformations and stress distributions are depicted in Figure 5. The scale softening effect is clearly visible, which is in accordance with Table 2.
We now address the question of how reduced bases of two scales compare to each other. Although they are returned from identical routines POD_0 and POD_1, their underlying snapshot data differs in terms of the material law and, most significantly, the parameter T max . Since the eigenvectors of the snapshot correlation matrix are ordered descendingly with respect to the corresponding eigenvalues, the returned N RB = 30 RB elements represent the most significant information contained in the snapshots. Out of this set, the first nine functions are depicted in Figure 6 for both scales.
A systematic comparison of the two bases is possible when considering the absolute correlations of the reduced bases, that is, the mutual projections of the RB elements onto each other, |⟨B^(i) · B̄^(j)⟩|. This is well-defined, cf. Equation (10), when considering that the geometries of the RVEs are identical, Ω₀ = Ω̄₀. The RB correlations are visualized in Figure 7. These correlation values reveal some similarities that one might expect from visual inspection of Figure 6, for example, B^(4) ↔ B̄^(6) and B^(6) ↔ B̄^(5). Moreover, some less obvious relationships are found, for example, B^(5) ↔ B̄^(4). One should bear in mind that the visualization in Figure 6 might appear differently if another clipping plane were chosen.
Next, the accuracy of RB_N and CI_N is compared against FE_N for N = 0 and N = 1. Recall the definition of the homogenized stress from Equation (20). For N = 0, a validation set of CS BCs is chosen with the parameters T max = 0.3, J* = 1.2, N eval dir = 150, and N eval mag = 10, that is, 1500 BCs in total. The same CS parameters are chosen for N = 1. However, for N = 1 this set is reduced to contain only those BCs for which FE_1 successfully converges, wherefore effectively 1437 BCs remain.

(Figure 7: Absolute correlations of the RB elements B^(1), …, B^(30) and B̄^(1), …, B̄^(30) of Scales 0 and 1, respectively. See also Figure 6.)

The set of relative stress errors err is considered. The corresponding empirical distribution functions are shown in Figure 8. One should keep in mind that FE_1 (and RB_1) utilizes CI_0 as a material law. Hence, the comparison in Figure 8 (right) is with respect to an approximate reference. In any case, it is observed that the error in the stress approximation is within acceptable bounds (≤ 8% on Scale 1, ≤ 1% on Scale 2). This has to be seen in the context of actual engineering applications, where many additional uncertainties come into play, such as the material models on Scale 0 and the geometry of the considered RVEs.
Eventually, a couple of four-scale simulations are conducted. Isochoric uniaxial extension BCs of the kind U = exp(t Y^(1)) with t ∈ [0, 1] are applied to the three Scales 0, 1, and 2. The respective FE methods are invoked and the resulting stress fields P(X), P̄(X̄), and P̿(X̿) are computed. Subsequently, these are volume averaged, leading to P̄ = ⟨P⟩, P̿ = ⟨P̄⟩, and (3)P = ⟨P̿⟩, which are quantities on Scales 1, 2, and 3, respectively (counted to base zero). The magnitudes of these effective stresses are depicted in Figure 9 (left). The number of corresponding Newton iterations is graphed in Figure 9 (right). By the number of iterations and by the maximum value of the magnitude t for which FE convergence is obtained, one can clearly see a trend of deteriorating FE convergence as more scales are homogenized, cf. Section 3.5.1. Also, the scale softening effect is again noticeable.
Further kinds of fourth-scale BCs of the form U = exp(t Y^(i)) are applied to Scale 2. The evolution of the volume-averaged stresses for the choices i = 1, 2, 3 is shown in Figure 10 (left), with the corresponding Newton iteration counts on the right.

4.2 Homogenized stress response of stiffening microstructure via CI

Goals
The aims of this numerical example are to
• prove the suitability of the CI method for the homogenization of a highly anisotropic RVE, that is, homogenize one scale only,
• modify the CI interpolant f̂ : R^6 → R^9 such that it interpolates the stress P as a function of the Hencky strain E,
• set up the CI interpolant directly on FE data, avoiding the offline stages POD and RB as far as possible,
• gain insights into the influence of the parameters N supp dir and γ on the accuracy of the stress interpolation,
• study the influence of the parameter N supp dir on the runtime of the CI scheme.
This subsection does not strictly employ Algorithm 1 but instead investigates the possible shortcut FE → CI. As pointed out in Section 2.2.2, the FE method is not nearly as well suited as the RB for providing the tangent modulus; hence, we now focus on the stress only.

Model
The RVE under consideration is depicted in Figure 11. The FE mesh of the noncubic RVE consists of quadratic TET10 elements with a total of 33 923 nodes, resulting in over 10^5 DOF. The hash-shaped inclusion phase has a volume fraction of 13.3% and is the source of significant geometric stiffening effects under certain BCs, as was investigated previously. 12 An isochoric Neo-Hookean material law together with a mixed logarithmic-quadratic volumetric part 36 is employed for both the inclusion and the surrounding matrix phase. While G_m = 10 MPa and K_m = 3000 MPa are chosen for the matrix, the inclusion is described by the constants G_i = 100 MPa and K_i = 3000 MPa, that is, the phase contrast is 10 for the shear modulus. These material parameters roughly correspond to a soft polymer matrix and a stiffer polymer inclusion. As before, it is important to emphasize that the choice of the hyperelastic material model is completely arbitrary and serves as an example only. The method is general with respect to this choice.

Design of the interpolation scheme
For this example, the domain of interest within the Hencky strain space is defined via T max = 1 and J* = 1.005. Initially, it is assumed that a sufficiently accurate interpolant P_CI of the effective stress P̄ should be attainable with N supp dir between 50 and 1000. The supporting magnitudes are chosen at N supp mag = 10 values, wherefore the total numbers of supporting points lie between 500 and 10 000. If, instead, regular Cartesian grids were employed, the same total numbers of points p^6 would imply as few as p ≈ 2.8 to p ≈ 4.7 points along each axis.

Evaluation and results
A validation set of response stresses P̄_FE is created at additional Hencky strain sampling sites (i.e., disjoint from the training sites), parametrized by N eval dir = 150 and N eval mag = 16, where the magnitudes are of the form t^(i) = i/16, i = 1, …, N eval mag. Of these 150·16 = 2400 evaluation points, 17 are excluded due to a lack of convergence of the FEM, cf. Section 3.5.1. Therefore, the validation set consists of 2383 FE-homogenized stresses corresponding to an incomplete set of concentric samples of Hencky strains.
In order to assess the accuracy of the interpolant P_CI compared to the result P̄ of the FEM, the relative error measure err_P, cf. Equation (41), is evaluated on the validation set. With this at hand, the quality of a given kernel parameter γ can be judged, cf. Figure 13. Next, the distribution of the function err_P is studied for these optimum values. To this end, Figure 15 shows the empirical distribution functions (EDFs) of this error quantity for each value of N supp dir and for each evaluation magnitude. In other words, the EDFs measure the probability, for each evaluation magnitude separately, of the error lying within certain bounds when evaluated at any of the N eval dir = 150 directions. One can observe that the mean of the error err_P is below 2% for as few as N supp dir = 100 supporting directions. Also, the individual EDFs are comparatively "smooth," that is, they do not exhibit significant jumps or plateaus. This confirms the lack of severe outliers of the error function (41).
It is worth noticing that, just as with the results depicted in Figure 13, the mean of the error seems to decrease monotonically with an increasing number of supporting directions, except for the transition from N supp dir = 200 to N supp dir = 250. This is very likely due to the additional error introduced by the RB, slightly deteriorating the quality of the data at the supporting points. Still, a further increase of the number of supporting directions leads, again, to a monotonic decrease of the mean error. In terms of the mean error, parity of the RB-based CI scheme with the finest FE-based CI scheme is only reached for N supp dir > 500. However, even with N supp dir = 1000, the RB-based CI scheme remains notably more prone to large maximum errors than the FE-based one with just one-fifth as many supporting points.
Finally, the average CPU time of the CI evaluations on the validation set is depicted in Figure 16 (using the optimal kernel parameters of Figure 13). For these computations, a standard laptop computer was used.

Summary
A novel method for the homogenization of many spatial scales in finite strain hyperelasticity was successfully demonstrated. The combination of moderate, simple parallelization on standard workstations and efficient numerical methods rendered the offline homogenization across three scales possible within a couple of hours. The online phase was conducted on a laptop computer within a few minutes of compute time, with many thousands of evaluations of the numerical surrogate for the upscaled material response performed per second. Qualitatively, the observed results matched the expectations, that is, nested scales with porous RVEs resulted in a significant scale softening effect. Also, a highly anisotropic RVE was homogenized with good accuracy.
The main challenge was the deteriorating convergence behavior on the larger scales. This is a widely known issue that, to the best knowledge of the authors, is present in most if not all computational multi-scale methods. Additionally, the interplay of the many parameters of the main algorithm does not contribute to user-friendliness.
The workflow can possibly be simplified and accelerated significantly if one settles for the interpolation of data generated directly by the FEM. This is especially viable when effective stiffnesses are not sought after. As far as usability is concerned, it must be emphasized that the scheme is non-intrusive in the sense that it can be utilized with any simulation software that provides an interface for user-defined material routines.

Interpolation given different strain measures
Sticking to the Hencky strain space for the sampling comes with many advantages. Most importantly, the Hencky strain space is isomorphic to the full space R^6, while the spaces of other deformation or strain measures are isomorphic to nonlinear manifolds embedded into R^6 (e.g., for C, U) or R^9 (e.g., for F) due to constraints on the tensors, such as positive definiteness or preserved orientation. More precisely, any point in R^6 yields an admissible Hencky strain, while for other deformation measures (independent of the chosen basis) additional admissibility checks are required. Furthermore, the Hencky strain allows for explicit control of the sampled determinant of F in a straightforward manner, which is a tricky procedure for other strain measures. It should be noted that regardless of which input strain is delivered, as long as the sampling is effected with respect to strain states corresponding to the CS points in the Hencky strain space, CI can be performed after transforming the input to the Hencky domain. Hence, the program calling the CI-based scheme does not need to work with the Hencky strain itself. Note also that directly sampling more familiar deformation measures (such as the right Cauchy-Green tensor) can yield points for which the requested samples cannot be obtained, as discussed in other works. 30 Therefore, the authors decided to use, and also to promote the use of, CS in the Hencky strain space and the related CI, although this choice is not intrinsically unique and explicitly not a rigorous requirement of CI.

Number of sampling directions
From the results of Section 4.2, it is concluded that the gains in accuracy from a larger number of supporting directions are partly offset by the additional modeling error of the RB scheme. Moreover, this error comes along with significant additional offline costs for the identification and evaluation of the RB. Future investigations of the shortcut of directly interpolating high-fidelity data, if available, might be fruitful. The trade-off with respect to the resolution of anisotropic effects may be a delicate choice. Note that anisotropy refers not just to explicitly given material laws but also to geometry-induced effects on higher scales.

Number and position of sampling magnitudes
Choosing the distribution of the N supp mag radial supporting points relies on empiricism at the current state of the art. Variations of the qualitative distribution as well as of the number N supp mag were treated exemplarily in previous works. 21,22 It was found that CI is rather insensitive to the distribution of the supporting points and that an increase of their number improves the quality of the interpolation. However, neither the positioning of the points nor the determination of their quantity can be rigorously guided at the moment, for example, by means of an error estimator.

Kernel parameter
Also, the findings of Section 4.2 suggest that the determination of the optimal value of the kernel parameter may be omitted if the value is chosen according to Equation (42). The error induced by this possibly sub-optimal value may be justifiable depending on the context. An optimization of the kernel parameter should generally be conducted on data that is primary to the balance equation, that is, on stresses, and not on tangent moduli or energy densities. However, fitting to modulus data yielded a comparable result in the present study.

Accuracy
Errors propagate exponentially through the scales. As far as errors stemming from the data-driven approach are concerned, these might be assessed by performing unreduced FE² simulations for a single scale transition. Concerning three or more scales, such unmodified schemes are practically impossible with meaningful spatial resolution on each scale, as the computational effort also grows exponentially with the number of scales. Furthermore, convergence issues with classical methods remain a serious hurdle, as was also experienced in the present work. Thus, the application of the RB method reduces the computational effort to a manageable level not just because of the numerical economy but also due to the increased numerical robustness. Because of the latter, larger load increments are possible and, as we experienced, certain load levels become computationally feasible at all. For growing load magnitudes, the accuracy of the CI model tends to decrease. This effect was investigated in previous works (fig. 12 of Reference 21). There, it was also shown that an increase of N supp mag generally improves the accuracy, with only very minor effects on the computational effort during the online evaluation of the surrogate model.

Generalization to other material models
In principle, a CI surrogate can be set up for any material model with exclusive state-dependency. Gradient elasticity, where the energy density function also depends on the second-order deformation gradient, W = W(F, ∇_X F), is a candidate belonging to this category. 38 In such a case, the interpolation domain should be reconsidered, since the choice of the Hencky strain space was not made with higher gradients in mind. Dissipative material laws might be covered, or at least assisted, by CI. For instance, standard viscoelastic material models 39 contain a hyperelastic term that can be treated directly by the CI method as in this article. History dependence might be treated by means of a decisive super-model switching between multiple CI schemes, each covering a certain region of the historic state space. However, the high dimensionality of this space would drastically increase the computational effort. To begin with, the numerical surrogate would be more complex. Even more severely, the state history would need to be tracked at each integration point on the current Scale N, as is the case in many classical dissipative material models involving internal variables. It could be an option for the super-model to retreat to unreduced FE simulations on Scale N − 1 in case the state evolves along a path leaving the CI-covered domain. 40 Future works might reveal alternative pathways for the consideration of dissipative effects, with notable contributions of CI for attaining computational efficiency.