High-performance computational homogenization of Stokes–Brinkman flow with an Anderson-accelerated FFT method

Fluid flow problems in bi-porous media have strong implications in a variety of industrial applications, such as nuclear waste disposal and composites manufacturing. Despite various numerical solutions proposed in the literature, the prediction of dual-scale flow remains a challenging task due to the requirement of fine meshes to resolve the complex microstructures of bi-porous media. This paper introduces a new numerical solver based on fast Fourier transform (FFT) for the Stokes–Brinkman problem to predict fluid flow in bi-porous media with local anisotropy. The novel FFT solver is derived from a velocity-form of the problem, in contrast to the stress-form proposed by a recent work. The Anderson acceleration technique is applied for the first time to the Stokes–Brinkman problem, leading to a substantial (orders of magnitude) improvement of the convergence rate. A range of detailed numerical examples are provided to validate the method against the analytical and literature results. Parametric studies are also demonstrated to aid in the selection of model parameters to achieve optimum numerical performance. With a 3D unit cell of a woven textile fabric, we demonstrate that the proposed method is capable of handling high-resolution simulations with strongly heterogeneous microstructures. Combined with a parallelized implementation over high-performance computing systems, our method demonstrates a new state-of-the-art in numerical solvers for the Stokes-Brinkman problem, in terms of


INTRODUCTION
micropores L 2 is assumed much smaller L 2 ∕L 1 ≪ 1, making the two length scales separable. The flow in Ω f is governed by the Stokes equation: where u and p denote the local velocity and pressure, respectively. is the dynamic viscosity of the fluid, The flow in Ω s is described by a modified Darcy's law with the Brinkman term 5, 6 : where e is Brinkman's effective viscosity. k s (x) is a 3 × 3 symmetric tensor denoting the local permeability tensor. Depending on the microstructure of the solid region, k s can be isotropic or anisotropic. The velocityũ and pressurep in the permeable solid Ω s should be interpreted in the sense of Darcy's flow, that is, they represent the effective quantities at a chosen length scale. The Brinkman equation was motivated by the fact that it can simultaneously approximate the Stokes equation (k s → ∞) and Darcy's law ( e = 0). Although it is often assumed e = in the literature, the physical meaning of the effective viscosity e is still an open question, and there is no commonly recognized way for determining its value. Some authors determined this parameter using either numerical calculation 26 or experimental measurement, 27 suggesting e > , whereas, others analytically demonstrated e < for channel-type porous medium. 28 The interested readers are referred to the dedicated papers for more comprehensive discussions, see for example, References 26-31. The Laplacian term e Δu in the Brinkman equation is also commonly expressed as ∇ ⋅ e ( ∇u + ∇ T u ) , which may potentially lead to a singularity at the interface between Ω f and Ω s if e ≠ is chosen (discussed later in the next section).
Kinematic conditions (either jump or continuity) need to be defined at the interface Γ between the two regions Ω f and Ω s . This has been subject to intensive research, [32][33][34] which claimed that a more accurate description requires a suitable jump condition at the interface. However, continuity conditions are also commonly used as a reasonable simplification in the literature, see for example, References 12,14,35,36. In the present work, we choose to apply the continuity in both stress and velocity, so that the application of the FFT method is relatively straightforward. A numerical treatment of a jump condition in the FFT method merits a dedicated investigation. The composite-voxel technique, able to deal with strong local variations like cracks in elasticity, 37 may be a relevant route to explore.

Problem with a unified framework
As proposed in References 12, the coupled problem can be written into a heterogeneous problem by introducing two coefficient functions (x) and (x) as Thus, the two-domain problem is unified into with u and p ′ periodic at (4) which is valid across the entire domain Ω = Ω f ∪ Ω s , hence only one equation system needs to be solved instead of two, simplifying the solution procedure. To complete this unified problem, we consider periodic boundary conditions, with the velocity u and the local fluctuation of pressure p ′ = p − G ⋅ x being periodic at . While the velocity continuity u =ũ at the interface Γ is naturally included within this unified equation, the stress continuity ( ∇u + ∇ T u ) ⋅ n = e ( ∇ũ + ∇ Tũ ) ⋅ n requires a further discussion. The unified equation can also be written in a form with stress and strain rate tensors: where and d = 1 2 ( ∇u + ∇ T u ) represent the stress tensor and strain rate tensor, respectively. I is the 3 × 3 identity matrix. In this equation, the term 2 e d represents the viscous shear stress of fluid in the porous solid. An underlying assumption of this form (viscosity times strain rate) is that the constitutive relation between fluid flow and stress in the porous solid region can be described by an equivalent Newtonian fluid with effective viscosity e and average velocity u. It is worth noting that when the local permeability is anisotropic, due to the microstructure at the lower scale, it is also likely that the effective viscosity is best described by an anisotropic tensor, for example, = −pI + U e ∶ d, where U e is an anisotropic fourth-order effective viscosity tensor. This is however out of the scope of the present work, and we assume an isotropic effective viscosity.
The two forms in velocity (Equation 4) and in stress (Equation 5) are not strictly equivalent when e ≠ , and they may differ in the stress continuity at the interface Γ. Since the stress is explicitly included in Equation (5), the stress form naturally ensures the stress continuity, for which the velocity form requires an additional term. In fact, the combination of the two first equations in Equation (5) yields which introduces the additional term 2∇ S u(x) ⋅ ∇ (x) comparing to Equation (4). The gradient ∇ (x) consists of a singularity at the interface: where Γ (x) is the Dirac distribution of Γ and n(x) is the unit vector normal to Γ, pointing from solid toward fluid. This singularity makes the numerical solution impractical when e ≠ . In this paper, after presenting a recently published FFT solver for the form of stress, 12 we propose a new method for the velocity form of Equation (4), which ensures the stress continuity only when e = . However, as will be shown in Section 4.1.1, this requirement can be loosened to some extend when the local permeability is relatively low.

Upscaling with Darcy's law
Macroscopic quantities, that is, permeability and resistivity, of the domain Ω can be defined by upscaling the above linear problem using Darcy's law at macroscale: where the macroscopic velocity U = ⟨u⟩ Ω is linearly dependent on the macroscopic pressure gradient G via the macroscopic permeability tensor K. Alternatively, the macroscopic resistivity tensor H (with H = K −1 ) is defined through the inverse of Equation (10): These two macroscopic quantities K and H can be calculated from numerical simulations of dual-scale flow in the porous domain Ω, governed by the unified equation in the form of either stress or velocity.
Such full-field simulations (boundary value problems) require boundary conditions (B.C.) and loading conditions (L.C.). As mentioned earlier, we choose periodic boundary conditions for the local variables u and p ′ . The loading conditions will be prescribed via either macroscopic velocity U or macroscopic pressure gradient G, which will be referred to as velocity-controlled or pressure-controlled loading conditions, respectively. Three independent simulations are required to predict the full 3×3 tensor K (or H) from Equation (8) (or Equation 9). In practice, macroscopic velocity (or pressure gradient) that aligns with coordinate axes are commonly chosen for the three simulations, that is, The homogenization procedure with either pressure control or velocity control is as below: • When a macroscopic pressure gradient G is prescribed, the full-field simulation predicts a macroscopic velocity U = ⟨u⟩ Ω , from which the permeability tensor is calculated with Equation (8). It should be noted that in this case, the local pressure p should be replaced by its local fluctuation p ′ in Equations (4) and (5). Accordingly, the macroscopic pressure gradient G should be added to the right-hand side.
• When a macroscopic velocity U is prescribed, the simulation predicts a macroscopic pressure gradient G = ⟨∇p⟩ Ω , from which the resistivity tensor is calculated with Equation (9).
In the present paper, we first recall an algorithm that solves Equation (5) under velocity control, 12 and then present an approach that solves Equation (4) under both velocity-control and pressure-control.

NUMERICAL SOLUTION BASED ON FFT METHOD
The unified Stokes-Brinkman problem needs to be reformulated into a form that is convenient for numerical solution based on the FFT method. One approach has been recently proposed by Mezhoud et al. 12 Inspired by the latter, we propose a new algorithm that works on the velocity form. Three algorithms are implemented in this paper: Algorithm 1 solves the stress-based formulation of Reference 12 which will be used as a baseline to compare with Algorithms 2 and 3, which solve the velocity-based formulation under velocity control and pressure control, respectively.

Formulation with stress
This formulation was proposed by Mezhoud et al. 12 For the sake of completeness, we list the key equations here. The problem to be solved is Equation (5), and the velocity control is used, prescribing a macroscopic velocity U of the domain Ω: ⟨u⟩ Ω = U. The authors introduced two quantities to formulate a polarization problem: where 0 and 0 are two constant parameters that are interpreted as reference material coefficients, which will be discussed in Section 3.3.1. Substituting Equation (10) into Equation (5), the heterogeneous problem is expressed as Considering the velocity u and the strain rate d as unknown, an iterative algorithm that can be built using the FFT method: The superscripts k and k + 1 denote the previous and current iteration steps, respectively. The expression of the 4th-order Green operator  0 can be derived by writing Equation (11) in the Fourier space (see Reference 12). With , an iterative procedure is established. As shown in Reference 12, this procedure can be simplified into the following form with no need to calculate the 4th-order Green operator:û where i 2 = −1, is the frequency vector, and the symbol ⋅ denotes a variable in the Fourier space. A macroscopic velocity is prescribed by settingû( = 0) = U in the Fourier space.Ê k is given for non-zero frequency vectors aŝ where s (its Fourier transformŝ) is the deviatoric part of the stress tensor The strain rate tensor d(x) can be calculated in the Fourier space aŝ Interested readers are referred to the original Reference 12 for a detailed description of the algorithm. It has been implemented in the present work and referred to as Algorithm 1, which is slightly different from that proposed in Reference 12, in terms of the convergence criterion. In our implementation, the residual is defined as the difference between two adjacent iterations This algorithm (Box 1) is used in the present paper as a baseline to compare with the velocity-based formulation proposed in the next section. For this reason, the definition of Equation (17) has been chosen to make a fair comparison.
It is worth noting that a different definition, based on the equilibrium equation, can be used: where E k is the inverse Fourier transform ofÊ k . This residual provides a measurement of the equilibrium equation, which has more physical significance. However, from a preliminary test, we observe similar convergence behavior using the two different residuals (see later in Section 4.1.2).
The iterative procedure predicts a local velocity field u at convergence. Volume integration of the equilibrium equation in Equation (5) over Ω gives the macroscopic pressure gradient where the divergence theorem has been used and the term related to stress vanished due to the antiperiodicity of the traction vector ⋅ n. Finally, the homogenized resistivity tensor can be calculated from Equation (9).
Convergence check: if k+1 < , then break It must be noted that our implementation may differ from the one in the cited Reference 12, who seemed to conduct a non-dimensionalisation of the governing equation. Therefore, the performance of Algorithm 1 presented in this paper may not exactly reflect that of the implementation of Reference 12 in some cases, for instance with very small values of local permeability k s . Nevertheless, we follow the same implementation procedure for all the algorithms in this work, to give them the common ground for a fair comparison.

Formulation with velocity
Inspired by the method of Mezhoud et al., 12 we derive a formulation that considers the velocity-form of the unified Stokes-Brinkman equation under either velocity control or pressure control. First consider a velocity control situation, the problem to be solved is Equation (4), with macroscopic velocity prescribed as ⟨u⟩ Ω = U. Introducing two reference material coefficients 0 and 0 , an auxiliary problem can be written as with the polarization vector defined as Applying the divergence operator to both sides of the equilibrium equation in Equation (19), we have Considering div u(x) = 0, the equation is simplified as Thus, the problem can be written into Passing the differential equations into the Fourier space, we have where Ξ is the space of frequency vectors for the Fourier transform. Here, the macroscopic velocity is prescribed through the velocity at zero-frequencyû(0) = U. The periodic boundary condition is intrinsically encoded into the algorithm via the Fourier transform. Assuminĝas known, the solution of the velocity field is given aŝ where Q = , same as in Equation (14). Now considerinĝis unknown and is a function of the velocity field and its Laplacian, Equation (25) formulates a fixed-point iterative algorithm as described in Box 2, which is referred to as Algorithm 2.

BOX 2 Iterative solution (Algorithm 2) of the Stokes-Brinkman problem based on the velocity formulation with the velocity-controlled loading condition.
The macroscopic pressure gradient G = ⟨∇p⟩ Ω can be obtained by integrating the equilibrium equation in Equation (4) over Ω: The homogenized resistivity tensor H is then calculated with Equation (9) from three independent full-field simulations.
Using the same technique, a solution procedure for the pressure-controlled loading condition can also be obtained. The corresponding auxiliary problem with the polarization term is written in the Fourier space as which gives the closed-form solution of the velocity field aŝ The corresponding algorithm is referred to as Algorithm 3, and described in Box 3. We choose to initialize the velocity field as u 0 (x) = 0. The same definition of residual (Equation 17) as in Algorithms 1 and 2, however, the difference for Algorithm 3 is that the macroscopic velocity is unknown and needs to be calculated at each iteration as

BOX 3 Iterative solution (Algorithm 3) of the Stokes-Brinkman problem based on the velocity formulation with the pressure-controlled loading condition.
With the pressure-controlled loading condition, the homogenized permeability tensor K is determined from Equation (8) via three independent full-field simulations.

Remark.
A potential simplification can be done on Algorithms 2 and 3, when the situation e = is considered. In this case, = 0 , hence, the polarization term simplifies to = ( − 0 I) ⋅ u k . As a result, there will be no need to calculate the Laplacian of velocity at each iteration.

Numerical implementation
Some practical details for the numerical implementations are presented in this section.

Reference material coefficients
The two reference material coefficients can significantly affect the convergence of the two algorithms. We follow the suggestion of Reference 38 to choose these two parameters as the algorithmic average of the heterogeneous field: where the i (i = 1, 2, 3) are the eigenvalues of . Although no mathematical demonstration is provided here, we will use some numerical tests in Section 4.2.1 to show this suggestion leads to the best convergence rates.
In the numerical implementation, each material point (voxel) of the solid region is assigned a local tensor = k −1 s and a local coordinate system . The quantity ( − 0 I) is then calculated within the global coordinate system.

Finite difference discretization
Let us consider a voxelised grid that discretizes the computation domain Ω. The discretisation scheme of the FFT methods is usually established via the definition of the frequency vector . The classical frequency vector, as used in Reference 12, is defined as where i = 1, 2, 3 representing the indices of the three coordinate axes. h i and N i respectively denote the voxel size and the number of voxels along each axis.
In addition, we also test a modified frequency vector, which can be derived using a rotated coordinate system, 39 a 1-point Gauss quadrature 40 or a local volume average formulation. 41 The modified frequency vector reads where the signs + and -in the factor term ± correspond to the translation from voxel vertices to voxel centres and vice-versa. Corresponding to this definition of frequencies, a first-order derivative would switch the location of the variable between the vertices and centres of the voxels; whereas, a second-order derivative makes the placement stay at its initial location. Note that we only use the classical frequency vector in Algorithm 1, as the latter is considered as a baseline to compare with the velocity-based algorithms (Algorithms 2 and 3). If the modified frequency vector is used, the placement of variables needs to be specified. In this work, we choose to place the velocity at the vertices of voxels, and the pressure at the centers of voxels. Accordingly, the polarization vector is also at the voxel vertices. Expressing Equation (24) for Algorithm 2 (as an example) over the discrete grid associated to the modified frequency vector, we have where the subscripts V and C indicate whether the variable sits at the vertices or at the centres. This gives the discrete Green operator asĜ This Green operator is valid for both Algorithms 2 and 3.
Remark on the Laplacian operator: When the modified frequency vector is being used, a consistent discretization framework requires a Laplacian operator || || 2 to ensure that the placement of the variable does not change after the operation. A simple product of || || 2 =̌⋅̌would satisfy the afore-mentioned condition. When choosing the Laplacian operator, another condition that requires to be verified is the incompressibility div(u) = 0. We prove this as follows.
, the divergence of velocity is calculated in the Fourier space asdiv u || || 2 = 0 for alľ, hence the incompressibility is verified.

Convergence acceleration
The Anderson acceleration technique 24,25 is adopted to improve the convergence behavior of Algorithms 2 and 3. We note that Algorithm 1 is a Newton type iterative algorithm (see Equation (13)), for which the Anderson acceleration is not suitable. 24 This acceleration technique has achieved great success in electronic structure computations, and start gaining popularity in other applications, such as solid mechanics. [42][43][44] Together with fixed-point algorithms, it is a generic approach and usually preferred over other methods, such as Newton's, when the analytical form of the Jacobian of the problem is not available or expensive to compute. 24 For linear problems like the present one, the Anderson acceleration is equivalent to the generalized minimal residual (GMRES) method. 24 It is reported in Reference 45 that the GMRES method well outperformed the conjugate gradient method for their FFT solver of Stokes equation, which further justifies the choice of the Anderson acceleration in the present work.
The main idea is to use the residuals and the estimate solutions of the previous m iterations (referred to as the depth of the Anderson acceleration) to find a better estimate solution for the (k + 1)-th iteration. Consider the generic form of a fixed-point algorithm u k+1 = F ( u k ) , with u the unknown variable and F an operator (linear or non-linear).
denote the estimate solutions obtained from the previous m iterations, and correspondingly . A new estimate closer to the exact solution at the (k + 1)-th iteration is calculated as where the coefficients i need to be determined at every n iterations (referred to as the increment of the Anderson acceleration), by solving a least-square minimisation problem: The closed-form solution of this problem can be obtained 44 : where a ij denotes the components of the m × m symmetric matrix defined as where [ * ] + stands for the Moore-Penrose pseudoinverse of matrix [ * ].
The memory usage may be a potential drawback of the Anderson acceleration. In the practical implementation, at each iteration we store 2 scalar fields ( , || || 2 ), 6 vector fields (u k−1 , u k ,û k , k ,̂k, ), and 1 second-order tensor field ( ).
The Green operatorĜ is calculated on the fly to reduce the memory usage. Using the similar implementation technique, Algorithm 1 stores 2 scalar fields ( , || || 2 ), 6 vector fields (u k−1 , u k ,û k ,̂k,Ê k , ) and 3 second-order tensor fields ( ,ŝ k , . This suggests that the basic Algorithms 2 and 3 consume less memory than Algorithm 1. However, if the Anderson acceleration is activated, (2m − 1) additional vector fields will be required by Algorithms 2 and 3: . Therefore, the depth of the Anderson acceleration must be chosen carefully if the memory usage is a concern. Furthermore, both the depth m and the increment n of the Anderson acceleration may affect the efficiency of the algorithms, which will be discussed in Section 4.2.2.

NUMERICAL EXAMPLES
All the three algorithms have been implemented in Python with the Numpy and Scipy libraries. Algorithms 2 and 3 have also been implemented into the massively parallelized platform AMITEX*, which will be used in the example in Section 4.3. First, with several benchmark problems, the proposed algorithms will be validated against the literature results. Then, parametric studies will be presented to discuss the effects of the reference material coefficient 0 and the depth and increment of the Anderson acceleration. Finally, a 3D problem involving 8 million voxels will be solved to demonstrate the numerical efficiency. Focus will be put onto Algorithms 2 and 3, while Algorithm 1 will be used as a reference for validation purpose. The effective viscosity is chosen as e = = 1 MPa ⋅ s in all simulations unless specified otherwise. The Anderson acceleration uses a depth m = 10 and an increment n = 10 unless specified otherwise.

4.1
Benchmark problems: model validation

Parallel channel flow with permeable walls
The parallel channel flow depicted in Figure 2. is used to check the capability of the proposed algorithm in predicting local velocity field. Analytical solution of this 1D problem can be derived following the procedure proposed in Reference 46 The corresponding 1D Stokes equation and 1D Brinkman equation are written as It is physically realistic to assume that dp dx = dp dx = G is a constant, with G the macroscopic pressure gradient along the x-axis.
We more convenient for solving this 1D boundary value problem analytically. The solution is expressed as where l c = √ e k s , and u d = − k s G.
The FFT simulations use a stripe discretized into 1 × N × 1 voxels, corresponding to the physical dimensions of 1/(a + b) mm, (a + b) mm and 1/(a + b) mm in the x, y and z directions. The parameter N controls the voxel size of the discretization. The solid region y ∈ [−b, 0] is assigned an isotropic permeability tensor k s I. The convergence criterion is set to = 10 −6 . We fix a = b = 0.1 mm and = 1 MPa ⋅ s to illustrate the effects of other parameters. The velocity field obtained from Algorithm 2 is scaled by a factor of G∕G sim to make sure the macroscopic pressure gradient is the same for all.
First, the effects of the choice of frequency vector and mesh density are shown in Figure 3. The FFT predictions from Algorithms 2 and 3 are the same and are in good agreement with the analytical solution especially when the mesh is refined. The modified frequency vector reduces the local fluctuation at fluid-solid interface, which is as expected.
Then, we test the applicability of our method to the situation e ≠ in Figure 3. We recall that the interface singularity (see Section 2.2) makes Algorithms 2 and 3 unable to always ensure the stress continuity at the fluid-solid interface. The effect of this limitation can be seen from the discrepancy between the FFT prediction and the analytical solution, especially for high values of the local permeability ( Figure 4A,D). However, the discrepancy becomes negligible when the local permeability becomes relatively low, for example, k s = 10 −8 mm 2 in the present case ( Figure 4C,F). It would be beneficial to conduct a more quantitative investigation on this interface continuity effect, taking into account the ratio e ∕ and the geometry dimension, yet it is out of the scope of this paper.

4.1.2
Permeable square with cylindrical hole The next benchmark problem is used to compare our method (Algorithms 2 and 3) to the baseline implementation (Algorithm 1). It consists of a permeable square with a circular hole, as depicted in Figure 5. The unit cell is discretized into N × N × 1 voxels, with physical dimensions of 1 mm × 1 mm × (1/N) mm. The effective viscosity in the solid region is set to be e = = 1 MPa ⋅ s. The solid region Ω s is assigned an isotropic permeability tensor k s I. The convergence criterion is set to = 10 −6 for all simulations. The classical frequency vector is applied to all the simulations presented in this section.  Figure 6, which suggests that both definitions lead to the similar convergence behavior of the iterative procedure, justifying the use of Equation (17) as the convergence criterion.
The mesh size dependency is then tested in Figure 7A for the three algorithms. The predicted homogenized permeability values from Algorithms 2 and 3 are the same and close to those from Algorithm 1. The converged values can be obtained at approximately N = 200. The number of iterations ( Figure 7B) required to achieve the convergence criterion is relatively independent of the mesh size, especially when the mesh is sufficiently refined, that is, N > 100 in the present case.
To provide a guideline for choosing the convergence criterion , we plot in Figure 8 the relative errors of the homogenized permeability as a function of the applied convergence criterion. The result indicates that = 10 −4 seems to be a sufficient criterion, although a smaller criterion may be designed for higher accuracy.
Other tests are performed with various values of k s . We set = 10 −6 for all the simulations. The predicted homogenized permeability values are plotted in Figure 9, as a function of a non-dimensional parameter k s ∕L 2 . The results are also compared with the estimate of Reference 47 which was derived from the Maxwell average of a Stokes-Darcy model: where denotes the volume fraction of the clear fluid region. The results suggest that the predicted macroscopic permeability exhibits good agreement between the three algorithms for different values of k s and R. The predictions from Algorithms 2 and 3 are almost identical. All three algorithms produce results consistent with the analytical solution from Reference 11 for k s ∕L 2 ≤ 10 −4 . For higher values of k s , the analytical solution exceeds its domain of applicability because the assumption of scale separability is not valid anymore.
Using this benchmark problem, we examine the effect of the local permeability on the convergence rate ( Figure 10). Overall, all the algorithms are more difficult to converge for smaller values of k s ∕L 2 . This is understandable due to the definition of = k −1 s . When k s → 0, → ∞, leading to numerical difficulties. This limit would largely depend on the machine precision and potentially the microstructure under investigation, which may differ from one to another. More detailed investigation is required on this matter, yet this is out of the scope of the present paper. The three

4.1.3
Transverse flow in dual-scale dry fiber reinforcements In this section, we compare the prediction of macroscopic permeability from our method to those obtained from an FEM solver in the literature. 3 Fibrous textile reinforcements are represented by two typical packing patterns of the fiber tows:  regular and hexagonal, as depicted in Figure 11. The dimensions were chosen to be consistent with those used in Reference 3, leading to a volume fraction of tows of 26.18% for both packing patterns. The unit cells are discretized with a voxel size of 0.01 mm. Simulations are conducted using Algorithms 2 and 3, with convergence criterion of 10 −6 . An isotropic permeability tensor k s I is assigned to the ellipse regions, with k s calculated from Gebart's formula 48 :

K xx ∕(ab) K yy ∕(ab)
where R f is fiber radius and is set to be 0.023 mm to be consistent with, 3 = 0.53 is the porosity in the tows. This leads to k s = 1.17175 10 −5 mm 2 . The predicted permeabilities for the two packing patterns are given in Table 1. The two FFT algorithms predict the same macroscopic permeabilities, and they show a reasonable consistency with the FEM solutions of Reference 3. The macroscopic permeabilities are clearly anisotropic due to the shape and packing of the tows, with K xx < K yy for the regular packing and K xx > K yy for the hexagonal packing. The discrepancy between the FFT solutions and the FEM solutions is higher in K xx for the regular packing and in K yy for the hexagonal packing. Note that the contrast between the global and local permeabilities is lower in the x-axis for the regular packing and in the y-axis for the hexagonal packing. This implies that the effect of different numerical treatments is more pronounced for a lower global-to-local permeability ratio k s ∕K. These numerical treatments may include not only the chosen PDE solver but also the discretisation scheme.

Anisotropic local permeability
To check if the model can handle the anisotropy of the local permeability tensor, we use the geometry of a square with a centered disk, as depicted in Figure 12 In both case 1 (closed fluid region problem) and case 2 (open fluid region problem), a strong correlation can be observed between the orientation angle and the velocity field, as shown in Figures 13 and 14, which suggests that the model can correctly take into account the effect of the anisotropy of local permeabilities. Furthermore, the velocity fields predicted from Algorithms 2 and 3 exhibit noticeable difference for ≠ 0 • in case 1. This is not surprising, because the two algorithms use different loading conditions.
The predicted macroscopic permeability tensors are compared in Tables 2 and 3, showing almost identical values predicted from both algorithms, despite the difference in velocity field. The effect of the angle that is consistent with intuition can be observed: as increases from 0 • to 45 • , K xx decreases, K yy increases while K zz remains unchanged; in addition, the non-diagonal components K xy and K yx increases with increasing . The effect of the angle is less pronounced in case 2 than in case 1, which is because the dominating flow is in the clear fluid region Ω f and not in the permeable solid region Ω s .

Parametric studies
The problem in Section 4.1.4 (with a discretization of 200 × 200 × 1) is now used to study the effects of some numerical parameters. The orientation angle is chosen as = 45 • . Only the flow direction (under velocity control or pressure control) along the x-axis is considered as an illustrative example.

4.2.1
Effect of reference material coefficient 0 The choice of the reference material coefficient 0 has a strong effect on the convergence rate of the algorithms. This is studied in this section using the two cases depicted in Figure 12, with different test values of 0 (denoted by test 0 ). The numbers of iterations for each test to achieve the convergence (tolerance <10 −6 ) are plotted in Figure 15. Algorithms 2 and 3 have also been tested with the Anderson acceleration using a depth m = 8 and an increment n = 4. TA B L E 3 Permeability tensor K calculated from the two algorithms for a permeable disk with the remainder as void (case 2). For all the algorithms with and without the Anderson acceleration, the optimum convergence rate is achieved when test 0 takes the value calculated from Equation (29). The results also demonstrate the efficiency of the Anderson acceleration, which not only reduces the number of iterations by one order or two of magnitude, but also prevents divergence in some cases for test 0 ∕ 0 < 1. When the Anderson acceleration is not applied, Algorithms 2 and 3 exhibit similar convergence rates as Algorithm 1, which has also been observed in Figure 10 for the isotropic porous medium problem.

Depth and increment of Anderson acceleration
The depth and the increment of the Anderson acceleration technique need to be properly chosen to achieve an optimum performance of the algorithms. As demonstrated in other applications, 24,44 increasing the depth m usually reduces the number of iterations, but increases the memory usage due to the storage of intermediate iterates. The optimal choice of the pair of depth m and increment n may be problem dependent, therefore, the recommendations given for other problems (e.g., m = 4 for elasticity problems 44 ) need to be verified in the case of dual-scale flow problem. This section presents such a study using the case of closed-fluid region problem depicted in Figure 12. Figure 16 shows the maps of computation performance (number of iterations and computation time) for different values of the two parameters. Overall, the two algorithms exhibit a similar behavior. The provide a reference, Algorithm 1 required 2404 iterations and took ∼80 s to reach the chosen tolerance = 10 −6 , which are greater/longer than Algorithms 2 and 3 with almost any choice of the two parameters. The minimum number of iterations is generally achieved as long as the increment is smaller than the depth. In contrast, the minimum computation time is achieved only within a small region, in the proximity of m = 8, n = 4. We recommend using an increment smaller than the depth, and when memory is a concern for large-scale problems, the pair can be reduced to m = 4 and n = 2, as will be used in the 3D example in Section 4.3. Figure 17 shows a linear dependency of the memory usage on the depth of the Anderson acceleration for both Algorithms 2 and 3. To provide a reference, the same simulation with Algorithm 1 used ∼22 MB memory. This is comparable to the memory required by Algorithms 2 and 3 with the depth m = 2∼8. When it comes to large-scale problems, the memory usage also needs to be considered for the choice of the depth of the Anderson acceleration. This linear dependency makes it possible to accurately determine the memory required for a given problem through some preliminary tests with coarse meshes.

3D flow in dual-scale textile fabrics
The last numerical example involves a more realistic microstructure -a woven fiber reinforcement. The unit cell has been generated using TexGen. 49 As shown in Figure 18A, it consists of three 1 × 1 twill woven layers, with nesting between layers simulated using the "maximum nesting" option in the software.
where the subscripts * , * , * denote the indices of the maximum, middle, and minimum components of e 1 in absolute values. Then, the unit vector e 2 is obtained with a simple normalization. The third coordinate axis e 3 can be obtained easily through the cross-product e 3 = e 1 × e 2 . An anisotropic permeability tensor k s is assigned to all yarn voxels. Three computations were conducted using different sets of k s values, being k s = diag ( 10 −6 , 10 −7 , 10 −7 ) , k s = diag ( 10 −4 , 10 −5 , 10 −5 ) and k s = diag ( 10 2 , 10 −5 , 10 −5 ) , respectively. The physical unit of the local permeabilities is mm 2 . The two first k s values are roughly in the same order of magnitudes for intact yarns with different fiber volume fractions, whereas the physical significance of the last k s value correspond to large gaps between fibers (potentially created by textile deformation during preforming), which act as highly permeable flow channels.
The proposed Algorithms 2 and 3 have been implemented on the platform AMITEX to make use of parallel computing. For the Anderson acceleration, we set the depth of 4 and the increment of 2, which give better convergence rates, based on some preliminary tests. The computations were performed on the ARCHER2 high-performance computing (HPC) system. Each of the three computations involves three simulations with different principal flow directions.
The velocity fields for the flow direction along the y-axis are shown in Figure 18. When the local permeability is low ( Figure 18B,C), the flow is mostly concentrated in the macro-channels between yarns. In contrast, faster flow along the fiber direction is observed inside yarns in Figure 18D, where the longitudinal component of k s is high, yet the flow in the transverse yarns remains slow due to the low permeability in that direction. This suggests that the anisotropy has been properly described in the present model.
The permeability tensors for the three configurations are given in Table 4. The non-diagonal components are non-zero, as a result of the anisotropy due to both the local permeability and the woven microstructure. Increasing k s leads to a higher permeability, which agrees with the intuition. Comparing the second and the third k s , a significant increase of permeability in the principal directions is produced due to the increase of the longitudinal component of k s . Table 5 lists the numbers of iterations for these simulations to achieve the tolerance = 10 −6 . The two algorithms exhibit the same trend: more iterations were required for a lower local permeability, and for the simulations with principal flow in the z-axis (transverse to the fabric plane). This again suggests that the convergence rate of the algorithms may be affected by the ratio between local and global permeability k s ∕K. Figure 19 shows the scalability test of Algorithm 2, running the configuration of k s = diag ( 10 −6 , 10 −7 , 10 −7 ) with different numbers of cores. A linear scalability is observed up to 128 cores for this problem of 8 million voxels. The fastest computation using 128 cores only took 268 s (total time). We note that each computation involved 3 simulations with TA B L E 5 Number of iterations to achieve the tolerance = 10 −6 for the permeability calculations of the woven textile. The three numbers in the parentheses correspond to the three simulations with loading direction in x-, y-and z-axes, respectively.

F I G U R E 19
Scalability test of the proposed algorithms (Algorithm 2 as an example here). The elapsed time ratio is defined as To provide a reference for comparison, a parallelized finite-element solver took 3 hours to solve a Stokes-Brinkman problem with 800,000 cells (8 million nodes) using 360 cores of a HPC system. 10

CONCLUSIONS
Efficient FFT based numerical solvers for the Stokes-Brinkman problem have been proposed to predict the permeability of dual-scale porous media with anisotropy. The proposed new algorithms were reinforced with the Anderson acceleration that reduced the number of iterations by 1 ∼ 2 orders of magnitude. The algorithms were validated against the literature and analytical results, in terms of macroscopic permeability and local velocity field. Furthermore, the new method was able to properly deal with the anisotropy related to local permeability and microstructure. Detailed parametric studies were provided to select the model parameters (reference material coefficient 0 and the depth and increment of the Anderson acceleration) to achieve optimum numerical performance. A 3D example of woven fiber textile, involving 8 million voxels, demonstrated the numerical efficiency of the proposed algorithms, showing a good scalability over HPC systems. We believe that the parallelized implementation in this work presents a new state-of-the-art of numerical solvers for the Stokes-Brinkman problem, in terms of computation capacity. This unique numerical solver will enable the solution of large-scale problems, accurately predicting the effective permeability of real microstructures identified from 3D imaging techniques.

ACKNOWLEDGMENTS
This work was supported by the Engineering and Physical Sciences Research Council (grant number EP/P006701/1), as part of the EPSRC Future Composites Manufacturing Research Hub, for which the author holds the Innovation Research Fellowship. The ARCHER2 UK National Supercomputing Service (https://www.archer2.ac.uk) was used for the woven fabric example. The author would like to acknowledge Dr. Lionel Gélébart for his help on the improvement of the algorithms and the implementation with AMITEX, Profs. Richard Butler and Chris Bowen for their help in preparing the manuscript. Acknowledgement also must go to the anonymous reviewers for their careful reading and in-depth comments, which have led to a significant improvement of the work.