Efficient stochastic finite element methods for flow in heterogeneous porous media. Part 2: Random lognormal permeability

Efficient and robust iterative methods are developed for solving the linear systems of equations arising from stochastic finite element methods for single-phase fluid flow in porous media. Permeability is assumed to vary randomly in space according to a given correlation function. In the companion paper, herein referred to as Part 1, permeability was approximated using a truncated Karhunen-Loève expansion (KLE). Here, the stochastic variability of permeability is modeled using lognormal random fields and the truncated KLE is projected onto a polynomial chaos basis. This results in a stochastically nonlinear problem, since the random fields are represented using polynomial chaos expansions whose terms are generally nonlinear in the random variables. Symmetric block Gauss-Seidel used as a preconditioner for the conjugate gradient (CG) method is shown to be efficient and robust for the stochastic finite element method.


INTRODUCTION
This article is concerned with the numerical simulation of flow through porous media. This is an example of a diffusion problem in which the properties of the medium, such as the permeability, are not known in any precise way due to the paucity and limited accuracy of available measurements. Quantifying the uncertainty in the model parameters by modeling them as random variables means that the governing partial differential equation (PDE) is stochastic rather than deterministic. When the stochastic PDE (SPDE) is equipped with suitable boundary conditions, which can themselves be defined as stochastic processes, its solution is also a stochastic process. In this case the mean and variance provide useful statistical information about the solution.
One of the most popular approaches for solving SPDEs is the Monte Carlo Method (MCM). This approach is based on selecting an ensemble of realizations of the random input parameters from a given distribution. A deterministic problem corresponding to each realization from the ensemble is solved and statistical quantities are computed from the set of solutions. This approach is simple to implement since it can be based on the use of existing PDE solvers. However, MCM is computationally very expensive: the sampling error decays like O(N_r^{-1/2}) in the number of realizations N_r, a rate that is slow, albeit independent of the stochastic dimension.
Consequently, MCM requires a large number of realizations to create meaningful statistics. This limitation has motivated researchers to investigate acceleration techniques for MCM to improve its performance.
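The O(N_r^{-1/2}) rate is easy to observe on a toy problem: estimating ⟨exp(g)⟩ for a single Gaussian variable g, whose exact value is exp(σ²/2). The sketch below (plain NumPy; parameter values are illustrative and not taken from this article) averages the absolute sampling error over repeated experiments, and a 100-fold increase in samples reduces the error by roughly a factor of 10.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
exact = np.exp(sigma ** 2 / 2)            # E[exp(g)] for g ~ N(0, sigma^2)

def mc_estimate(n):
    """Plain Monte Carlo estimate of E[exp(g)] from n realizations."""
    g = rng.normal(0.0, sigma, size=n)
    return np.exp(g).mean()

# Average the absolute error over repetitions to smooth the randomness
errs = {}
for n in (100, 10_000):
    errs[n] = np.mean([abs(mc_estimate(n) - exact) for _ in range(50)])

# 100x more samples buys roughly one extra digit: the N^{-1/2} rate
ratio = errs[100] / errs[10_000]
```

The rate is independent of how many random variables describe the input, which is MCM's strength, but the constant makes tight statistics expensive.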
Multilevel Monte Carlo combines multigrid concepts with traditional MCM. The acceleration in convergence is guaranteed as most of the MCM simulations are carried out on coarse grids while only a very limited amount of time is spent on finer grids. When applied to the solution of PDEs, 1,2 multilevel MC methods have been shown to be highly efficient for problems with rough coefficients (that is, spatial random fields with large variance or small correlation lengths). These types of problems, common to radioactive waste disposal applications, require a large number of random variables (typically in excess of 100 modes in a Karhunen-Loève expansion [KLE]) in probability space to accurately represent the variability of the spatial random field.
An alternative class of methods to MCM is based on the discretization of SPDEs by the stochastic finite element method (SFEM). However, in contrast to MCM, this approach can be computationally expensive since the discrete system is significantly larger, with the dimension of the problem growing combinatorially with the number of random variables used to describe the input spatial random field. This phenomenon is known as the curse of dimensionality.
Solution strategies for SFEM depend on the choice of basis functions for the discretization of the stochastic space. There are two popular choices. The first choice uses global tensor product polynomials. 3 The advantage of this approach is that it allows the global Galerkin system to be decoupled. However, this is restricted to problems for which the permeability is approximated by normal or uniform random fields. 4 There is no evidence that when the permeability is approximated by a lognormal random field (a very common assumption in the groundwater modeling community), the global Galerkin system can be decoupled. Furthermore, this approach has the disadvantage of the size of the stochastic space growing more rapidly than alternative methods such as polynomial chaos. Solution strategies for this choice involve iterative solvers based on Krylov subspace recycling techniques. 4 The second choice is based on the polynomial chaos (PC) method as outlined in the original work of Ghanem and Spanos. 5,6 In this approach, spectral representations of uncertainty in terms of multi-dimensional Hermite polynomials or polynomial chaos expansions are used to approximate both the model parameters and the solution. This enables the stochastic equations to be replaced by deterministic systems of PDEs which are then truncated and discretized. The original PC method was motivated by the Wiener chaos expansion in which Hermite polynomials are used to represent Gaussian random processes. The PC approach was later extended to generalized PC (gPC) where other sets of orthogonal polynomials are used to generate improved representations of more general random processes. This allows efficient methods to be constructed for problems for which the uncertainty in the model parameters cannot be represented using Gaussian random processes. The PC expansion is a projection of the input random variables onto the space spanned by the orthogonal polynomials. 
Therefore, the rate of convergence depends on the smoothness of the solution as a function of the input random variables. The optimal choice of polynomials, in the L² sense, is the set that is orthogonal with respect to the probability density function of the input random variables. Whereas MCM requires the solution of many independent PDEs over the same computational domain, PC methods generate large coupled systems of equations.
The large and highly structured linear system generated by the PC approach must be solved using Krylov subspace iterative solvers. An efficient implementation of SFEM, which does not require the assembly of the global stiffness matrix, uses a block-diagonal preconditioner (subsequently referred to as "mean-based preconditioner") for CG based on an incomplete factorization of the mean stiffness matrix. 7,8 Powell and Elman 9 replaced the incomplete factorization with a black-box algebraic multigrid (AMG) solver. Ernst et al 10 extended the implementation of the mean-based preconditioner to the solution of stochastic mixed finite element method (SMFEM) systems. Ullmann 4 proposed a Kronecker product preconditioner for the stochastically linear (Gaussian/uniform random fields) and nonlinear (lognormal random field) cases. The implementation of the Kronecker preconditioner was recently extended to the SMFEM 11 and significantly reduced the number of CG and MINRES iterations, resulting in faster convergence. However, it is more expensive to implement than mean-based preconditioners. A review of a large number of iterative solvers, including one-level iterative methods, multigrid methods, and multilevel methods (for the stochastic discretization), can be found in Rosseel and Vandewalle. 12 In this article, the PC approach is adopted. The derivation of the global Galerkin system is described and solution strategies that take full advantage of its characteristic block structure are proposed. Numerical experiments in Part 1 showed that a symmetric block Gauss-Seidel preconditioner for CG provides a competitive alternative to traditional mean-based preconditioners. Since information associated with the off-diagonal blocks of A is incorporated into the preconditioned system, the conditioning is improved. As a result, CG requires fewer iterations to converge.
This approach is particularly efficient for those cases in which the off-diagonal blocks of A hold significant information on the permeability, that is, problems with large values of the standard deviation σ.
It is evident from the literature and from our computational analysis that mean-based preconditioners cannot be robust with respect to the permeability since they only include, in the preconditioned system, information associated with the mean value of the spatial random field. The mean information is included in the blocks of the leading diagonal of the global stochastic system, whilst variations (representing the variability of the spatial random field) about the mean are contained in the off-diagonal blocks. When the latter contributions become important the mean-based preconditioner performs poorly simply because this information is not included in the preconditioned system. For the stochastically nonlinear case this situation is exacerbated by the fact that every block of the global system has non-zero entries.
To overcome this important limitation, we propose an alternative preconditioner for SFEM in which the off-diagonal blocks of the global system are included in the preconditioned system using a block symmetric Gauss-Seidel (bSGS) algorithm. The computational analysis clearly shows that block Gauss-Seidel algorithms, used either as a preconditioner for CG or as standalone solvers, are more efficient than mean-based preconditioners for the stochastically nonlinear case and deliver considerable CPU savings. Although for some of the test cases considered the standalone Gauss-Seidel solver was the best performing solver, its performance deteriorated faster than that of preconditioned CG for test cases with large standard deviation. Therefore, the main finding of this work is that CG equipped with a bSGS preconditioner should be used to solve SFEM systems.

THE MATHEMATICAL MODEL
The steady-state flow of water in porous media, whose material parameters are assumed to be uncertain, is described by a scalar second-order stochastic partial differential equation (SPDE). The parameter that is known with the greatest degree of uncertainty is the permeability or hydraulic conductivity, K, which describes the ease with which a fluid can move through pore spaces or fractures. The permeability is modeled as a random field. Let u denote the pressure head or mean potential and q the velocity or flux. We assume that the medium occupies a bounded domain D in ℝ², with Lipschitz continuous boundary Γ. Let Γ = Γ_D ∪ Γ_N with Γ_D ≠ ∅ and Γ_D ∩ Γ_N = ∅, where Γ_D and Γ_N denote the portions of Γ where Dirichlet and Neumann boundary conditions on u are prescribed, respectively. The tensor K is a 2 × 2 symmetric and uniformly positive definite tensor that represents permeability. Since the coefficients in this tensor can never be completely known at every point in a heterogeneous porous medium, we assume that K = K(x, ω) is a random field. Furthermore, only statistical properties of K are assumed to be known.
Let Ω be the set of random events, ℑ the minimal σ-algebra of the subsets of Ω, and P an appropriate probability measure; then (Ω, ℑ, P) denotes a probability space. Then K(x, ω) : D × Ω → ℝ is a family of random fields indexed by x such that, for a fixed ω ∈ Ω, K(⋅, ω) is a realization of the permeability and, for a fixed x ∈ D, K(x, ⋅) is a random variable.
We define the space of square-integrable random variables with respect to P, L²_P(Ω), as follows. Let ξ be a real-valued random variable on (Ω, ℑ, P) and suppose there exists a probability density function ρ : ℝ → ℝ such that the expected value of ξ is given by ⟨ξ⟩ = ∫_ℝ x ρ(x) dx. The mean value of the random field K at x ∈ D is ⟨K(x, ⋅)⟩, and the covariance of K at x, y ∈ D is cov(x, y) = ⟨(K(x, ω) − ⟨K(x)⟩)(K(y, ω) − ⟨K(y)⟩)⟩ = C(x, y).

Primal formulation
In this formulation, we seek a random field solution u(x, ω) : D × Ω → ℝ such that, P-almost surely,

−∇ ⋅ (K(x, ω)∇u(x, ω)) = f(x) in D, u = g(x) on Γ_D, K(x, ω)∇u ⋅ n = 0 on Γ_N, (1)

where n denotes the unit outward normal vector to Γ_N, g(x) is the prescribed constant pressure head on Γ_D, and f(x) is the source or sink term. Note that f(x) and g(x) could also be random fields, but in this paper they are assumed to be deterministic functions. The solution of (1) enables the mean and standard deviation of u to be determined everywhere in D. Once u has been determined, the velocity field q can be derived using Darcy's law. Define the solution space W and the test space W₀ to be tensor products of spaces of deterministic functions defined on D and stochastic functions defined on the probability space, respectively. Then the weak formulation of the primal problem is: find u ∈ W such that the stochastic variational equation (3) holds for all test functions in W₀. If the permeability tensor is strictly positive and bounded, that is,

0 < K_min ≤ K(x, ω) ≤ K_max < ∞ for all x ∈ D, P-almost surely, (5)

then the Lax-Milgram lemma can be used to establish that there exists a unique solution to problem (3).

Mixed formulation
An alternative formulation that allows us to derive accurate approximations for the fluxes is to reformulate the primal problem (1) as a first-order system by explicitly introducing Darcy's law. In this formulation, we seek the pair (u(x, ω), q(x, ω)) satisfying q = −K(x, ω)∇u together with the conservation equation ∇ ⋅ q = f, which constitutes problem (6). The solution of (6) gives the mean potential (or pressure) and normal fluxes together with information about their mean and standard deviation everywhere in D.
The solution spaces are the tensor product spaces Y = L²(D) ⊗ L²_P(Ω) and V = H(div; D) ⊗ L²_P(Ω). The weak formulation of the mixed problem (9) is posed over these spaces. There exists a unique solution to this problem provided that the bilinear forms are continuous and coercive and the inf-sup inequality is satisfied. 13 The permeability field is required to satisfy (5).

PERMEABILITY APPROXIMATION
To convert the stochastic primal and mixed formulations into deterministic problems, we need to represent the stochastic variability of the permeability tensor K(x, ω) by an appropriate set of independent random variables {ξ₁(ω), …, ξ_d(ω)}. In Part 1, 14 two approaches were described to represent the stochastic variability of K(x, ω). The first approach, herein referred to as coloured noise, assumes that the permeability varies randomly throughout D according to a given correlation function. In this case, the permeability coefficient is approximated using a KLE. 6,9,15,16 The permeability tensor K(x, ω) possesses a proper orthogonal decomposition

K(x, ω) = K̄(x) + Σ_{i=1}^{∞} √λᵢ φᵢ(x) ξᵢ(ω), (11)

where K̄(x) = ⟨K(x, ω)⟩ and {λᵢ, φᵢ(x)} is the set of eigenvalues and eigenfunctions of the covariance function C(x, y). Since C is nonnegative definite, the eigenvalues are real and nonnegative and we label them in decreasing magnitude λ₁ ≥ λ₂ ≥ ⋯.
When the correlation function is of exponential type and D is a rectangular domain, there exist closed-form solutions to the eigenvalue problem (12) (see Ghanem and Spanos, 6 for example). In this article, we make full use of the closed-form solutions and only random fields whose correlation function is of exponential or squared-exponential type are considered. Examples in which the eigenvalue problem is solved numerically can be found in Lu and Zhang, 17 and a description of numerical algorithms is provided in Ghanem and Spanos. 6 In such cases, the additional computational cost of solving the eigenproblem (12) needs to be considered.
In the coloured noise approach, K(x, ω) is approximated by a truncated KLE, K_d(x, ξ(ω)), where d is the number of terms retained in the KLE (11). Mean-square convergence of K_d(x, ξ(ω)) to K(x, ω) is guaranteed as d → ∞. The choice of d usually depends on the rate of decay of the eigenvalues. For example, in applications where the eigenvalues decay slowly due to small correlation lengths, d may have to be very large.
Xiu and Karniadakis 18 used uniform random variables, hence ensuring that K_d(x, ξ(ω)) is bounded between two positive values with probability 1. A consequence of this approach is that the random variables in (11) are not guaranteed to be independent, so this condition needs to be assumed explicitly. 19

The second approach, herein referred to as white noise, assumes that the permeability varies randomly and independently in D. This approach is often used to approximate parameters such as rainfall or groundwater recharge, which generally do not show strong spatial correlation. Although the permeability is spatially correlated, in practical applications the domain D can be decomposed into subdomains corresponding to hydrogeological units on which piecewise constant hydraulic properties are assumed to hold. Different statistical parameters can be assigned to different regions of D, thus reflecting the diverse hydraulic behavior of natural deposits. A coarse subdomain decomposition of D is performed and a continuous random field K_d^k is associated with each subdomain D_k, k = 1, …, N_D, so that K_d(x, ω) = K_d^k(x, ω) for x ∈ D_k, where N_D is the number of subdomains in D. Each subdomain D_k, which may be of irregular shape, is enclosed within a rectangular domain D′_k, that is, D_k ⊂ D′_k, for k = 1, …, N_D, chosen to be the smallest rectangle enclosing D_k. Although a KLE is implemented for each subdomain D_k, the eigenvalue problem (12) is solved with respect to D′_k. When the conductivity coefficient K(⋅, ω) is assumed to be a Gaussian process, the random variables ξᵢ in (11) are normally distributed. In these circumstances, the random variables have the desirable property of being uncorrelated and independent. However, this also makes problems (1) and (6) ill-posed since the diffusion coefficient is not bounded below and above by positive constants. 9 In fact, it is well known that the permeability is required to satisfy (5).
Although the Gaussian distribution has unbounded support, it can be shown that well-defined discrete solutions can be obtained if a relatively small variance is used.
From a mathematical point of view, the white noise approach possesses significant advantages with respect to single domain KLE-based approaches. For example, the linear systems are tridiagonal, 20 which means that block diagonal preconditioners can be constructed to solve these problems efficiently.

POLYNOMIAL CHAOS FOR LOGNORMAL RANDOM FIELD
Although Gaussian processes are commonly used to model uncertainty in engineering problems primarily due to their simplicity, in some cases, it is preferable to use a lognormal process, particularly when the quantity under consideration is constrained to be always positive. This is true when modeling the permeability or hydraulic conductivity of a porous medium. Lognormal random fields are very popular among physical scientists and modelers for various reasons. First, there are several studies, the data of which are summarized in References 21 and 22, that show that parameters such as hydraulic conductivity or transmissivity are often lognormally distributed. Second, although the lognormal distribution possesses an infinite upper bound, it only admits the positive part of the physical spectrum. This is obviously consistent with the physical properties of these parameters. There is substantial evidence, direct and indirect, that supports the view that the permeability is described by a lognormal distribution (Freeze 23 ). The random field  d (x, ( )) is a lognormal random field if the logarithm of  d is a Gaussian random field, that is, lognormal random fields are of the form exp(g(x, )) where g is a Gaussian random field. Lognormal random fields can be characterized by defining the mean field and covariance of the underlying Gaussian random field.
If we discretize the underlying Gaussian field g(x, ω) using the truncated KLE, g_d(x, ξ(ω)), then the lognormal permeability is approximated by K_d(x, ξ(ω)) = exp(g_d(x, ξ(ω))). There are advantages to using the truncated KLE in polynomial chaos. For example, the optimal mean-square convergence property of the KLE mentioned earlier means that the number of random variables required for a given accuracy is minimized, hence limiting the system size.
Since  d (x, ( )) is a random variable, it can be projected onto a polynomial chaos basis of order p where L k are deterministic functions derived from (14) and for which closed forms can be obtained algebraically (see References 4,6,[24][25][26] and k are multi-dimensional chaos polynomials in d random variables 1 , … , d , of degree less than or equal to p.

STOCHASTIC FINITE ELEMENT APPROXIMATION
The implementation of the spectral stochastic finite element method (SSFEM) for discretizing the weak formulation of the primal problem (3) involves separate discretizations of the deterministic and stochastic spaces. Following the Galerkin method, we define finite-dimensional subspaces of each. The deterministic space X₀ is discretized using the classical finite element basis functions φᵢ(x), i = 1, …, N_u, where N_u is the number of finite element nodes. These basis functions are piecewise linear on a partition Z_h of D defined by triangular finite elements Tᵢ, i = 1, …, N_e, where N_e denotes the number of finite elements. Here h denotes the discretization parameter and describes the size of the finite elements in Z_h. Let E_h be the collection of numbered edges eᵢ, i = 1, …, N_edge, where N_edge is the total number of edges in Z_h.
The stochastic space L²_P(Ω) is discretized by means of polynomial chaos of degree less than or equal to p in the d random variables ξᵢ, i = 1, …, d, where d is the number of terms retained in the KLE. The polynomial chaos basis for T_h is chosen so that the orthogonality condition ⟨ψᵢψⱼ⟩ = δᵢⱼ⟨ψᵢ²⟩ is satisfied. In this article, the probability measure corresponds to that of a d-dimensional normal distribution; hence, the basis for T_h consists of d-dimensional Hermite polynomials.
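The orthogonality condition can be verified numerically for a small multi-dimensional Hermite basis. The sketch below (illustrative, d = 2 and degree ≤ 2) builds the basis from tensor products of probabilists' Hermite polynomials and checks ⟨ψ_a ψ_b⟩ = δ_ab ⟨ψ_a²⟩ with tensorized Gauss-Hermite quadrature; the Gram matrix is diagonal with entries α₁!α₂!.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from itertools import product
from math import factorial

d, p = 2, 2
multi_indices = [a for a in product(range(p + 1), repeat=d) if sum(a) <= p]

nodes, weights = hermegauss(20)
weights = weights / np.sqrt(2.0 * np.pi)   # normalise to the N(0,1) measure
X1, X2 = np.meshgrid(nodes, nodes, indexing="ij")
W = np.outer(weights, weights)

def psi(alpha, x1, x2):
    """Tensor-product Hermite chaos polynomial He_a1(x1) * He_a2(x2)."""
    return (hermeval(x1, [0.0] * alpha[0] + [1.0]) *
            hermeval(x2, [0.0] * alpha[1] + [1.0]))

# Gram matrix <psi_a psi_b> computed by tensorized Gauss-Hermite quadrature
G = np.array([[np.sum(W * psi(a, X1, X2) * psi(b, X1, X2))
               for b in multi_indices] for a in multi_indices])
```

The six multi-indices here match the count C(2+2, 2) = 6 for the basis size.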

Linear system
To obtain the discrete linear system associated with the weak formulation (16), the mean potential u is approximated by the expansion (19), u_h = Σ_{i=1}^{N_u} Σ_{j=1}^{P} u_{ij} φᵢ(x) ψⱼ(ξ). Substituting the expansion (19) for u_h in (16) and choosing test functions w_h = φ_k(x) ψ_l(ξ), k = 1, …, N_u, l = 1, …, P, yields the linear system of equations (20), where A is a sparse matrix of size N_u P × N_u P with a block structure. The diagonal blocks of A are defined as tensor products of the mean stiffness matrix, K₀, and ⟨ψᵢ²⟩, that is,

A_ii = ⟨ψᵢ²⟩ K₀, i = 1, …, P, (22)

where K₀ is assembled from the mean value K̄(x) of the conductivity field K(x, ω). The off-diagonal blocks of A are tensor products of the stiffness matrices, K_l, with the coefficients of the polynomial chaos expansion, c_{i,j,l} = ⟨ψ_l ψᵢ ψⱼ⟩, i, j = 1, …, P, and l = 1, …, d, that is,

A_ij = Σ_l c_{i,j,l} K_l. (24)

Thus, the coefficient matrix A can be expressed in the tensor product form (see Powell and Elman 9)

A = Σ_k G_k ⊗ K_k, (26)

where the entries of the stochastic matrices are defined by [G_k]_{ij} = ⟨ψ_k ψᵢ ψⱼ⟩. It is evident that the sparsity of the global stochastic coefficient matrix A is governed by the coefficients of the polynomial chaos expansion, while the sparsity of the blocks of A is determined by the sparsity of the deterministic finite element stiffness matrix. The maximum degrees of the polynomial chaos expansions of u and K(x, ⋅) should be different. In fact, it can be shown 4,16,27 that for the Galerkin matrix A to be positive definite, all polynomials of degree less than or equal to 2p have to be included in the polynomial chaos expansion of K. It is only when this condition is satisfied that a full Galerkin projection of the polynomial chaos expansion of K is obtained. Following Ullmann, 4 the number of chaos polynomials used for the representation of K is N = (d + 2p)!/(d!(2p)!). It can be demonstrated 4,27 that the inner product ⟨ψ_k ψᵢ ψⱼ⟩ is non-zero in only finitely many cases. In fact, ⟨ψ_k ψᵢ ψⱼ⟩ = 0 for all ψ_k with total degree greater than 2p.
A consequence of this observation is that, given a fixed number of random variables d, the infinite polynomial chaos expansion of K automatically truncates itself as part of the stochastic Galerkin (SG) method (see Figure 1D). Hence, since the expansion truncates naturally, no error is incurred in the representation of K.
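The tensor-product structure is what makes a matrix-free implementation possible: block i of y = Ax is Σ_k Σ_j [G_k]_{ij} K_k x_j, so A = Σ_k G_k ⊗ K_k need never be formed. A small dense sketch (random symmetric stand-in matrices and illustrative sizes, not the paper's data) compares this against the explicitly assembled operator:

```python
import numpy as np

rng = np.random.default_rng(2)
P, Nu, nterms = 6, 50, 4           # illustrative sizes only

def sym(n):
    """Random symmetric stand-in for a stiffness / stochastic matrix."""
    M = rng.standard_normal((n, n))
    return M + M.T

G = [sym(P) for _ in range(nterms)]
K = [sym(Nu) for _ in range(nterms)]

def apply_A(U):
    """y = A x without assembling A; U has shape (P, Nu), one row per chaos mode."""
    return sum((Gk @ U) @ Kk for Gk, Kk in zip(G, K))

# Reference: the explicitly assembled Kronecker sum (only feasible when small)
A = sum(np.kron(Gk, Kk) for Gk, Kk in zip(G, K))
U = rng.standard_normal((P, Nu))
y_free = apply_A(U).ravel()
y_full = A @ U.ravel()
```

Storage then reduces to the K_k matrices and the non-zero entries of the G_k, which is exactly the strategy described in the implementation section below.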
To make this clearer, let us consider the case in which d = 3 and p_u = 3. According to (17), the size of the stochastic space for the approximation of u is P = 20. Now, the number of Kronecker products N can take the value 20 if the same maximum polynomial order, p_K, is used for the expansion of the lognormal conductivity coefficient. Alternatively, maximum polynomial orders of 4, 5, or 6 can be used, giving numbers of Kronecker products of 35, 56, or 84, respectively. Although any value of p_K can be used, it is only for p_K = 6 (p_K = 2p_u), which corresponds to N = 84, that a full Galerkin projection of the lognormal random field is obtained. Furthermore, only in this case is the global Galerkin matrix A guaranteed to be positive definite (see Remark 2.3.4 of Ullmann 4). Figure 1 illustrates the block sparsity pattern of A for different values of N. Note that if polynomials of maximum order p_K = 6 are used for the chaos expansion of the permeability tensor, then every block of A is non-zero (see Figure 1D). If polynomials of order higher than six are considered, the matrices G_k corresponding to degrees higher than 2p_u will have only zero entries.
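The basis sizes quoted in this example all follow from the binomial formula C(d + p, d):

```python
from math import comb

d, p_u = 3, 3
P = comb(d + p_u, d)                                   # chaos basis size for u
N = {p_K: comb(d + p_K, d) for p_K in (3, 4, 5, 6)}    # basis sizes for K
```

With d = 3 this reproduces P = 20 and N = 20, 35, 56, 84 for p_K = 3, 4, 5, 6.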

Implementation and solution strategies
The dimension of the global coefficient matrix A grows rapidly with p and d, which means that it is never completely assembled due to memory constraints. As originally observed by Ghanem and Kruger, 7 it suffices to store d + 1 matrices of size N_u × N_u corresponding to each K_k, k = 0, …, d, in (26), together with the non-zero entries of the stochastic matrices G_k. The discrete linear system (20) is solved using the preconditioned conjugate gradient (PCG) method. The first preconditioners developed for the iterative solution of this system were based on incomplete factorizations of the diagonal blocks of A. 7,8 We define the block-diagonal preconditioner 𝒫_bdiag, which retains the diagonal blocks of A, and the mean preconditioner 𝒫_mean, which is built from the mean stiffness matrix K₀ alone. Each PCG iteration involves the solution of a system of the form 𝒫z = r, where r is the residual vector. This requires the solution of P subsystems of equations, each of size N_u × N_u with coefficient matrix K₀. Powell and Elman 9 claimed that any efficient deterministic solver can be used for the solution of the P subsystems and proposed the use of one V-cycle of black-box AMG. The crucial advantage of using black-box AMG is that the computational cost of one V-cycle scales linearly with the number of deterministic unknowns.
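One application of the mean-based preconditioner amounts to P independent solves with K₀. A minimal dense sketch (a random SPD stand-in for K₀ and illustrative sizes; in practice K₀ is factorized once, or one AMG V-cycle is applied per block):

```python
import numpy as np

rng = np.random.default_rng(3)
P, Nu = 5, 40                                # illustrative sizes

M = rng.standard_normal((Nu, Nu))
K0 = M @ M.T + Nu * np.eye(Nu)               # SPD stand-in for the mean stiffness matrix

def apply_mean_preconditioner(r):
    """z = P_mean^{-1} r: P independent solves with K0, one per chaos block."""
    R = r.reshape(P, Nu)                     # one row per chaos block
    Z = np.linalg.solve(K0, R.T).T           # all P right-hand sides at once
    return Z.ravel()

r = rng.standard_normal(P * Nu)
z = apply_mean_preconditioner(r)
```

Because the blocks are decoupled, the P solves parallelize trivially, which is part of the appeal of mean-based preconditioning despite its lack of robustness in σ.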
It is observed that, when Gaussian random variables are employed, the preconditioned system is positive definite only when the variance and the order of the polynomials are small. 9 This is a consequence of the infinite support of the Gaussian distribution and the violation of condition (5) for the permeability tensor. Preconditioned CG breaks down when this condition is violated. Therefore, the use of Hermite polynomials is limited to problems with small variances.
For the SFEM method to be computationally efficient and competitive with respect to traditional sampling methods, the CG method needs to be equipped with robust preconditioners that are optimal with respect to h, d, p and especially . It is well known that the performance of preconditioners deteriorates for problems in which  has a large standard deviation. This is due to the fact that the off-diagonal blocks of A, which are usually ignored when constructing these preconditioners, become increasingly significant in this case.
To overcome this deficiency, a new preconditioner that fully exploits the block structure of A is proposed. This preconditioner ensures that important information contained in the off-diagonal blocks is captured when K has a large standard deviation. This is achieved through the addition of an inner iteration to the preconditioning operation, which essentially implements a full inversion of the global stiffness matrix A using a bSGS algorithm. The preconditioner, which to our knowledge has not been used in the SFEM context before, takes the form

𝒫_bSGS = (D + L) D⁻¹ (D + L)ᵀ,

where A = D + L + Lᵀ, with D the block-diagonal part of A and L its strictly lower block-triangular part. Note, however, that the preconditioner 𝒫_bSGS is neither assembled nor inverted directly. Each inner iteration requires the solution of P subproblems of size N_u × N_u using an appropriate fast solver such as UMFPACK or one V-cycle of AMG for the deterministic problem under consideration. The symmetry of the preconditioner is guaranteed by performing two iterative sweeps (forward and backward).
Two stopping criteria are used for the proposed algorithm. The inner iteration is terminated when the relative residual of the inner system falls below a tolerance τ = 10⁻⁸, unless a maximum number of iterations maxitb is reached. At this stage, the current approximation for z is taken as the preconditioned residual needed within the current CG iteration. In general, the bSGS preconditioner should decrease the number of CG iterations and, in particular, it should improve the iteration count for those problems for which the off-diagonal blocks of A are as significant as the diagonal blocks, that is, problems in which the spatial random field has a large standard deviation.

STOCHASTIC MIXED FINITE ELEMENT APPROXIMATION (SMFEM)
The approach to SMFEM for solving the mixed variational problem (6) is similar to the one presented in the previous section. However, the mixed finite element approximation requires the definition of subspaces for H(div; D) in addition to L²(D). We consider the lowest-order Raviart-Thomas space RT₀ as a suitable space for the approximation of the velocity solution, and M₀(K) is defined to be the space of piecewise constant functions.
The stochastic space L²_P(Ω) is discretized by means of polynomial chaos, and the spaces for the stochastic approximation follow accordingly. The discrete variational formulation of (9) is: find q_h ∈ V_h and u_h ∈ W_h such that the discrete equations (30) are satisfied.

Linear system
The potential u_h and flux (or velocity) q_h are expressed in terms of the expansions (31). Substituting for u_h and q_h using the expansions (31) into (30), we obtain the discrete linear system (32), where A is a sparse symmetric matrix of size N_edge P × N_edge P and B is a sparse block-diagonal matrix of size N_e P × N_edge P.
The structure of the saddle-point matrix C, where

C = [ A Bᵀ ; B 0 ],

is governed by the coefficients of the polynomial chaos expansion, while the sparsity of each of the blocks of C corresponds to the sparsity of the deterministic velocity matrix and the deterministic divergence operator. The diagonal blocks of A are given by (22) and the off-diagonal blocks by (24), with the deterministic matrices now assembled from the Raviart-Thomas velocity basis, and the diagonal blocks of B are tensor products of the deterministic divergence matrix B₀ with ⟨ψᵢ²⟩. When the permeability tensor is approximated in terms of a polynomial chaos basis of order p, A reduces to size N_edge p × N_edge p and is block tridiagonal, and B reduces to size N_e p × N_edge p. The diagonal blocks of A are again given by (22), while the off-diagonal blocks take a modified form.

Implementation and solution strategies
As for the SFEM, the coefficient matrix C is never assembled. In addition to storing d + 1 matrices of size N_edge × N_edge and the entries of the stochastic matrices G_k, we also store the matrix B₀ of size N_e × N_edge. The efficient solution of the saddle-point system (32) is an active field of research. 10,28,29 The approach adopted here follows the experience gained from solving the deterministic system: the system is solved using MINRES with a block-diagonal preconditioner (40) based on an approximation of the Schur complement by sparse direct or AMG methods, the stochastic preconditioner following from its deterministic version. Following the discussion for the second-order problem in Section 5.2, the preconditioner (40) is expected to be efficient only for problems in which the spatial random field is characterized by small or moderate standard deviation. A robust preconditioner with respect to σ for the mixed stochastic formulation remains an active area of research.

The test problem is taken from Powell and Elman. 9 Dirichlet conditions u(0, y) = 1 and u(1, y) = 0 are imposed on the vertical boundaries and homogeneous Neumann conditions ∇u ⋅ n = 0 are imposed on the horizontal boundaries. Thus, the dominant flow direction is from left to right. The first-order problem corresponding to the system of equations described in (6) is solved in this section.

COMPARISON OF SMFEM AND MCM
The spatial discretization uses a triangular mesh with h = 1∕64; thus, the number of unknowns given by the mixed formulation is the sum of the number of elements, N e = 8192, and the number of edges, N edge = 12 416. The stochastic space is discretized in a similar fashion to the one described for the SFEM case; that is, polynomial chaos expansions of order up to p u = 4 are used for both u and q.
Solution profiles for various values of p u and N r are shown in Figure 2. The profile of the mean velocity (x-component) is taken along y = 0.5, while that of the variance is taken along x = 0.5. An in-depth convergence study at the sampling point (0.5, 0.5) is reported in Table 1.

TA B L E 1 Convergence analysis of MCM and SMFEM for test problem with lognormal distribution
Polynomials of order two are sufficient to achieve convergence to the fourth significant digit for the mean whereas polynomials of order three are required to achieve the same level of accuracy for the variance. This is in agreement with the convergence rate of the mean and variance solution for the potential recorded for the second order problem.
It is apparent from the data in Table 1 that the Monte Carlo mean for the x-component of the velocity field does not converge for the sample sizes considered, which suggests that a larger sample is required to achieve the desired level of accuracy. Equally, the variance does not converge for the maximum sample size considered here. Table 1 also includes solution timings for MCM and SMFEM. Although the data show that SMFEM is more efficient than MCM, this conclusion cannot be generalized. In fact, the performance of preconditioned MINRES deteriorates significantly for conductivity fields with large standard deviations when lognormal distributions are used. Evidence of this is reported and discussed in Section 9.1.
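The slow Monte Carlo convergence seen in Table 1 reflects the O(N r^−1∕2) sampling rate noted in the Introduction. This is easy to reproduce in miniature: the sketch below (NumPy, with illustrative parameters not taken from the paper) estimates the mean of a single lognormal permeability value by plain sampling, so the statistical error shrinks only in proportion to the square root of the sample size.

```python
import numpy as np

# Lognormal permeability k = exp(g), g ~ N(mu, sigma^2); its exact mean is
# exp(mu + sigma^2/2). Illustrative parameters, not those of the test problems.
rng = np.random.default_rng(1)
mu, sigma = 0.0, 0.5
exact_mean = np.exp(mu + sigma**2 / 2)

errs = {}
for N in (100, 10_000):
    k = np.exp(mu + sigma * rng.standard_normal(N))
    errs[N] = abs(k.mean() - exact_mean)
# The statistical error decays like N**-0.5, independently of the stochastic
# dimension -- hence the very large sample sizes required in Table 1.
```

A hundredfold increase in the sample size buys only one extra decimal digit, which is why the timing gap with SMFEM in Table 1 is so pronounced.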

ITERATIVE SOLVERS FOR SFEM
The algorithms used for the numerical experiments are structured so that the coefficient matrix is never fully assembled. Only the nonzero entries of the polynomial chaos coefficients (appropriately indexed) and the N matrices associated with the polynomial chaos discretization of the spatial random field are stored. However, in order to guarantee the well-posedness of A, N has to be very large. Hence, the memory requirements of SG for the stochastically nonlinear case are significantly larger than for the corresponding linear case. In Part 1, it was shown that the versions of the block-diagonal and mean-based preconditioners based on an incomplete Cholesky factorization of K 0 were significantly less efficient than the AMG and UMFPACK versions; hence, only the latter two versions are considered. A symmetric Gauss-Seidel smoother is used for the AMG implementation. The simulations have been performed in serial within the MATLAB environment installed on the SRIF-3 Cluster machine (Merlin) at Cardiff University. Thus, the CPU timings reported in the following sections can be compared directly with those reported in Part 1.
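The matrix-free application of a coefficient matrix of Kronecker sum form, C = Σ k G k ⊗ K k, can be sketched as follows. This is an illustrative NumPy version (the paper's experiments used MATLAB, and all names here are hypothetical); it exploits the identity (G ⊗ K) vec(X) = vec(K X Gᵀ), so no Kronecker product is ever formed explicitly and only the small factors need be stored.

```python
import numpy as np

def kron_matvec(G_list, K_list, x):
    """Apply C = sum_k (G_k kron K_k) to a vector x without assembling C.

    x is partitioned into P stochastic blocks of n spatial dofs each;
    block i of the product is sum_j G[i, j] * (K @ x_j), which is computed
    for all blocks at once via the identity (G kron K) vec(X) = vec(K X G^T).
    """
    P = G_list[0].shape[0]        # number of stochastic blocks
    n = K_list[0].shape[0]        # spatial dofs per block
    X = x.reshape(P, n).T         # column j of X is the j-th block of x
    Y = np.zeros_like(X)
    for G, K in zip(G_list, K_list):
        Y += K @ X @ G.T          # accumulate one Kronecker term
    return Y.T.reshape(-1)        # re-stack the blocks into a vector
```

The storage cost is that of the N + 1 small matrix pairs rather than of the (P n) × (P n) matrix C, which is the point of the matrix-free structure described above.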

Test problem 1-variable
This test problem is described in Section 7. The conductivity coefficient is a lognormal field, the spatial variability of which is described in Section 7. The underlying Gaussian distribution has a constant mean equal to 1, and four different values of the standard deviation σ are considered. The discretization parameter is fixed at h = 1∕32. The size of the stochastic space and the total number of Kronecker products are as reported in Table 2, and the total number of unknowns corresponds to the entries for h = 1∕32 in Table 2.
The number of conjugate gradient iterations preconditioned with the block-diagonal preconditioner, N it, and the timings t CPU are reported in Table 3. Set-up times depend only on the size of the problem (which in this case corresponds to h = 1∕32) and are therefore approximately equal for all values of σ.
The results presented in Table 3 can be summarized as follows.

A KLE is performed for each subdomain, and the number d of terms retained in the expansion is the same for each subdomain. The case in which different values of d are used in each subdomain has not been considered and is a subject for further study. The same spatial model with l x = l y = 0.5 is used for each subdomain. The discretization parameter is fixed at h = 1∕32, and the size of the problem is given in Table 2. Iteration counts and timings for CG preconditioned with the block-diagonal of A are given in Table 4. For this test problem too, the set-up times are approximately equal for all values of σ. Observations similar to those highlighted in Section 8.1.1 apply here, and the exponential growth of N it and t CPU with increasing p is also observed. In addition, it appears that the deterioration in the performance of the preconditioner is due exclusively to the increase in the value of σ in each subdomain from case to case. In fact, in all four cases, the mean values D 1, D 2, D 3, D 4 are equal. This indicates that the preconditioned solver is robust with respect to discontinuities in the mean value of the permeability.

Block symmetric Gauss-Seidel preconditioner
In this section, the same test problems presented in Section 8.1 are solved using a bSGS preconditioner for CG. The algorithm used in the experiments is described in Section 5.2. A fixed number of iterations, maxitb, is used as the stopping criterion for the bSGS preconditioner, each iteration comprising a forward and a backward sweep. The experiments reported in the following sections are performed using maxitb = 1. The reason for this choice, together with an in-depth analysis of the performance of the bSGS preconditioner for several values of maxitb, is given in Section 8.2.3. The UMFPACK implementation of the bSGS preconditioner is straightforward and identical to the one used in Part 1 for the linear case. In contrast, the AMG implementation requires additional pre-processing. In fact, in contrast to the linear case, the tensor products G k ⊗ K k, k = 1, … , N, contribute several terms to the blocks on the leading diagonal of the coefficient matrix, depending on the values of d and p u. For example, fixing d = 2 and p u = 3, the contributions are shown in Table 5.
Note that the matrix G 1 is diagonal and the product G 1 ⊗ K 1 contains the mean information (it is omitted from Table 5). In the nonlinear case, the AMG grids have to be computed for each diagonal block of the global system. Thus, the AMG pre-processing is carried out P times and the grids are stored before the iterative solution process begins. Clearly, the preconditioner set-up time contributes significantly to the overall CPU cost of the solver. Table 6 reports the number of CG iterations and CPU times for test problem 2. The results presented in Table 6 can be summarized as follows: 1. The block symmetric Gauss-Seidel preconditioner yields a significant improvement in the number of CG iterations, which becomes more evident for large values of σ; 2. Comparison with the data of Table 3 (block-diagonal preconditioner) reveals that the Gauss-Seidel preconditioner is generally computationally cheaper, and the improvement in performance increases for larger values of σ; 3. The difference in performance between the exact and approximate versions of the preconditioner increases for larger σ. In fact, for σ = 0.9, the AMG solution times are approximately three times larger than those for UMFPACK.
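A minimal sketch of one application of a block symmetric Gauss-Seidel preconditioner of this kind (one forward plus one backward sweep over the P stochastic blocks, i.e., maxitb = 1) is given below in NumPy. It is an illustrative version, not the paper's MATLAB implementation: `diag_solve` stands in for the prefactorized (UMFPACK) or AMG-approximated solves with the diagonal blocks, and all names are hypothetical.

```python
import numpy as np

def bsgs_apply(A_blocks, diag_solve, r, maxitb=1):
    """Apply maxitb symmetric block Gauss-Seidel sweeps to approximate A^{-1} r.

    A_blocks[i][j] is the (i, j) n-by-n block of A; diag_solve[i] is a callable
    that solves a system with the diagonal block A_{ii} (direct or AMG).
    Each sweep comprises a forward and a backward pass over the P blocks.
    """
    P = len(A_blocks)
    n = r.size // P
    z = np.zeros_like(r)
    rb = [r[i * n:(i + 1) * n] for i in range(P)]
    zb = [z[i * n:(i + 1) * n] for i in range(P)]   # views into z
    for _ in range(maxitb):
        for i in range(P):                 # forward sweep
            s = rb[i].copy()
            for j in range(P):
                if j != i:
                    s -= A_blocks[i][j] @ zb[j]
            zb[i][:] = diag_solve[i](s)
        for i in reversed(range(P)):       # backward sweep
            s = rb[i].copy()
            for j in range(P):
                if j != i:
                    s -= A_blocks[i][j] @ zb[j]
            zb[i][:] = diag_solve[i](s)
    return z
```

Because A is block dense in the lognormal case, each sweep touches all P² blocks, which is why the off-diagonal information (the fluctuations about the mean) enters the preconditioner, unlike in the block-diagonal variant.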

8.2.2 Test problem 2-discontinuous-isotropic conductivity field

Table 7 reports the CG iteration counts and CPU times for test problem 3 using a bSGS preconditioner. As for test problem 2, a significant improvement in both N it and t CPU is achieved. A large saving in computational cost was recorded for higher polynomial orders and large σ; in fact, the CPU time is reduced by 60% compared with the results obtained using the block-diagonal preconditioner. A significant time reduction is equally achieved for lower polynomial orders and smaller values of σ: around 37% for p u = 2 and σ = 0.5, and around 53% for p u = 3 and σ = 0.7. As observed previously for the block-diagonal preconditioner, a discontinuous conductivity coefficient (jumps in the mean conductivity value at the subdomain boundaries) does not appear to worsen the preconditioner performance. It is, in fact, the standard deviation that has a significant negative impact on the performance of both the block-diagonal and bSGS preconditioners. Even using the bSGS algorithm, and therefore including the off-diagonal blocks of A (which retain information on the fluctuations about the mean), optimality of N it with respect to σ cannot be achieved.

Performance analysis
The numerical experiments presented show that CG equipped with a bSGS preconditioner is significantly more efficient than traditional mean-based preconditioners. This conclusion depends on the stopping criterion chosen for the inner preconditioned Gauss-Seidel algorithm. The results reported in the tables are obtained using a single iteration of the bSGS algorithm.
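For reference, a generic preconditioned conjugate gradient driver of the kind used with the bSGS preconditioner can be sketched as below. This is an illustrative NumPy version with hypothetical names (the paper's solvers ran in MATLAB); the preconditioner is passed as a callable, so either the block-diagonal or the bSGS variant, with any choice of maxitb, can be plugged in.

```python
import numpy as np

def pcg(matvec, b, precond, tol=1e-8, maxit=500):
    """Preconditioned conjugate gradients for SPD systems.

    matvec(v) applies the (never-assembled) coefficient matrix;
    precond(r) applies the preconditioner, e.g. one bSGS sweep.
    Stops when the residual norm drops below tol * ||b||.
    """
    x = np.zeros_like(b)
    r = b - matvec(x)
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, maxit + 1):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit
```

Note that a symmetric preconditioner (such as bSGS with its paired forward and backward sweeps) is required here, since CG assumes a symmetric positive definite preconditioned operator.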
The choice of maxitb can be optimized. Consider test problem 2 with fixed p u = 4. Simulations are performed increasing the value of maxitb until only one CG iteration is required for convergence. CG iteration count and timings for these experiments with d = 4 and d = 6 are reported in Table 8. Note that for this analysis, the UMFPACK version of the preconditioner was used.
The results reported in Table 8 suggest that the best solution times are not given by the same stopping criterion for all values of σ. In fact, it appears that for large standard deviations, σ = 0.7, 0.9, the best solution timings are obtained for small maxitb, whereas for σ = 0.3, 0.5, the best performance in terms of CPU time is obtained using moderate values of maxitb.
The case in which maxitb is large and t CPU is low corresponds to the situation in which convergence is obtained in one CG iteration. It is clear that, under these circumstances, the bulk of the computational work is performed by the preconditioner (bSGS) and very little by the main solver (CG). Given that the preconditioner should only serve as a means of improving the conditioning of the system matrix, the results showing just one CG iteration are not considered in the following analysis. On the other hand, this observation reveals that an independent Gauss-Seidel solver (symmetric or not) could be a very efficient alternative to Krylov subspace iterative schemes. In Section 8.3, results obtained using Gauss-Seidel solvers are reported for all test problems considered in this paper. Excluding the data associated with one CG iteration, Table 8 shows that, in general, a small number of internal iterations for the bSGS preconditioner is sufficient to achieve the best performance for all values of the standard deviation considered for this test problem. However, it is only for σ = 0.9 (d = 4, 6) and σ = 0.7 (d = 6) that the best performance is achieved using maxitb = 1. For σ = 0.5 (d = 4, 6) and σ = 0.7 (d = 4), the best performance is given by maxitb = 2, and for σ = 0.3 (d = 4, 6), by maxitb = 3. Figure 3A and B shows the number of CG iterations versus CPU times for maxitb = 1, 2, 3, 4, 5, 6 for d = 4 and d = 6, respectively. The figures highlight a clear linear relationship between the number of CG (preconditioned with bSGS) iterations, the computational time, and the standard deviation of the spatial random field for all values of maxitb. They also show that the best convergence rate is given by maxitb = 1 and, for this reason, it was chosen as the stopping criterion for the bSGS preconditioner.

Gauss-Seidel solvers
The performance analysis carried out on test problem 2 in the previous section revealed that, for small standard deviations (σ = 0.3 and σ = 0.5), the Gauss-Seidel algorithm used as a stand-alone solver could be a valid alternative to Krylov subspace solvers for the solution of SFEM systems with lognormal permeability. The same observation was made for the linear case in Part 1. Here, results using the symmetric (bSGS) and nonsymmetric (bGS) block Gauss-Seidel solvers are presented, and the circumstances under which Gauss-Seidel is more efficient than Krylov subspace solvers are explained. The symmetric Gauss-Seidel solver includes a forward and a backward sweep per iteration, and the algorithm is essentially the one used for the bSGS preconditioner; the nonsymmetric case includes only a forward sweep per iteration. In both cases, the stopping criterion requires the error norm to satisfy a specified tolerance.
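For a dense matrix, a stand-alone Gauss-Seidel solver of this kind reduces to repeated triangular solves. The sketch below is an illustrative NumPy version with hypothetical names, not the paper's block implementation; it shows the nonsymmetric (forward-sweep-only, bGS-like) variant with a residual-based stopping test standing in for the error-norm criterion described above.

```python
import numpy as np

def gs_solver(A, b, tol=1e-8, maxit=1000):
    """Stand-alone (forward) Gauss-Seidel iteration for A x = b.

    Splits A = (D + L) + U and iterates x <- (D + L)^{-1} (b - U x),
    stopping when the residual norm drops below tol * ||b||.
    """
    x = np.zeros_like(b)
    D_L = np.tril(A)          # lower triangle including the diagonal
    U = A - D_L               # strictly upper triangle
    for it in range(1, maxit + 1):
        x = np.linalg.solve(D_L, b - U @ x)   # one forward sweep
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, it
    return x, maxit
```

In the block setting of this paper, the triangular solve is replaced by block solves with the diagonal blocks of A (via UMFPACK), and the symmetric variant simply appends the corresponding backward sweep.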
Reorderings of the block structure of A aimed at reducing the bandwidth of the coefficient matrix are irrelevant for the lognormal case given that A is block dense. In our implementation, the structure as presented in Figure 1 is retained using the summation of progressive (i = 1, … , N) Kronecker terms. This ordering is the most natural as it represents the summation of decreasing modes obtained from the polynomial chaos expansion of the permeability (see Equation (14)).
The tolerance for the GS solvers is set to 10 −8 . In each table, we list iteration count N it and solution times t CPU for both bSGS and bGS. Only experiments using UMFPACK to invert the diagonal blocks of A are reported.  Table 9 lists iteration counts and timings for test problem 1. The findings of this table are summarized as follows:

Test Problem 1-variable
1. GS solvers are not optimal with respect to σ; 2. bSGS is computationally more efficient than CG preconditioned with bSGS only for small standard deviations; 3. The nonsymmetric Gauss-Seidel solver (bGS) is very efficient for small and moderate standard deviations; however, for large values of σ it is outperformed by CG preconditioned with bSGS; 4. As for the previous case, the bGS solver is consistently more efficient than the symmetric implementation.

8.3.2 Test problem 2-discontinuous-isotropic conductivity field

Table 10 lists iteration counts and timings for this test problem. Observations similar to those highlighted for test problem 2 can be drawn from the data presented in this table. Furthermore, the results show that a discontinuous permeability has no negative impact on the performance of the Gauss-Seidel solvers. This becomes evident on comparing N it for σ = 0.5 for this problem with that of test problem 2 (continuous permeability) for σ = 0.5. The CPU time for the block Gauss-Seidel preconditioners increases quadratically with respect to 1∕h; therefore, on a mesh with h = 1∕64, it is predicted that the CPU times in Table 10, which are for a mesh with h = 1∕32, would increase by a factor of four. So, for test problem 3, in which the conductivity field is discontinuous, the predicted CPU times on a mesh with h = 1∕64 with p u = 4 are 240 and 2120 seconds for d = 4 and d = 6, respectively. If these times are compared with the CPU times for MCM on the much simpler test problem 1 given in Table 1 (t CPU = 25 000 for N r = 40 000), then the SMFEM with the block GS preconditioner is at least an order of magnitude faster in terms of CPU time.

Comparison and conclusions
A large number of methods has been tested to identify the most efficient solver for the stochastic formulation of the diffusion problem. To identify the methods that are most efficient and robust with respect to h, σ, and discontinuous permeability coefficients, the data presented in the previous tables are summarized in Figures 4 and 5. Only the case p = 4 is considered, for d = 4, 6. The methods included in the figures are listed below.
Note that, for the AMG case, the time required to construct the grids and the smoother for the approximation is added to the solution times; the UMFPACK case does not require any set-up time. Figures 4 and 5 show that the conjugate gradient solver preconditioned with bSGS is the most efficient method for problems with medium to large standard deviation and discontinuous permeability. Gauss-Seidel solvers also perform well in these circumstances, and for small σ they are in fact the best-performing methods.
Mean-based preconditioners are, in general, not robust and efficient for SFEM with lognormal distributions. There is very little difference in terms of performance between the AMG and UMFPACK versions of the preconditioner.
The outcome of this analysis reveals that CG preconditioned with  bSGS is robust and efficient, performing well in all settings considered in this article and therefore should generally be used for the solution of SFEM with lognormal distributions. Gauss-Seidel solvers represent a viable alternative to Krylov subspace iterative methods.

Schur complement preconditioner
This section reports the performance of preconditioned MINRES for the system derived for the SMFEM (cf. Section 7). The preconditioner used is the one described in Section 6.2. The Schur complement is computed exactly (using, e.g., UMFPACK) or approximated using one V-cycle of the AMG code.

Conclusions
While it was concluded that the performance of MINRES equipped with the Schur complement preconditioner described in (40) is acceptable for the solution of the stochastic mixed formulation in the linear case (see Part 1), the same cannot be concluded for the nonlinear case. The experiments reported in Tables 11 and 12 show that the CPU cost is too large (30 hours to solve test problem 2 on a coarse mesh, h = 1∕32) for this method to be used with lognormal random fields. It becomes apparent that, for the nonlinear case, it is crucial to include the information contained in the off-diagonal blocks of the coefficient matrix in the preconditioned system. The Kronecker product preconditioner of Ullmann 4 offers this possibility. Very recently, Powell and Ullmann 11 extended its implementation to the nonlinear case, achieving a significant improvement in MINRES CPU cost. The authors also proposed H(div) preconditioning using augmenting schemes which, although dependent on the choice of the augmentation parameter, appear to achieve very promising results.

CONCLUSIONS
The performance of iterative methods for solving the linear systems of equations arising from stochastic finite element methods for single phase fluid flow in porous media is investigated. The uncertainty in the permeability is modeled using lognormal random fields and is characterized by the mean and standard deviation, which can vary spatially. In addition to being optimal with respect to the number of random variables in the KLE, the order of the polynomial chaos, and the finite element discretization parameter, the iterative solver is required to be optimal with respect to the mean and the standard deviation σ of the permeability, both of which may vary discontinuously throughout the domain. When the permeability possesses a large standard deviation, it may fail to satisfy the condition (5). However, the condition can be satisfied by transforming the Gaussian random field into a lognormal one by projecting the KLE onto the polynomial chaos of order at most p. 6,26 The coefficient matrix arising from the discretization of (1) and (6) using lognormal random fields becomes block dense and ill-conditioned, which makes the linear systems very difficult and expensive to solve. The conjugate gradient method was used to solve the linear systems of equations arising from the stochastic finite element discretization of the primal formulation. Block-diagonal, mean-based, and block Gauss-Seidel preconditioners were implemented to accelerate the convergence of the CG method. Two versions of the block Gauss-Seidel preconditioner were implemented: one based on a sparse direct solver and the other on AMG. For small values of σ, the preconditioner based on block Gauss-Seidel was found to be optimal with respect to h, d, and p. This preconditioner also considerably reduced the number of CG iterations in comparison to the block-diagonal preconditioner. For σ ≥ 0.5, this optimality is lost.
However, the block Gauss-Seidel preconditioner still outperforms the block-diagonal preconditioner, and the improvement is greater as the mesh is refined. Furthermore, the growth in the number of CG iterations as σ increases is more gradual than with the block-diagonal preconditioner. The introduction of spatially varying random fields does not degrade the performance of the preconditioners; in other words, the performance is no worse than for a constant random field with the largest standard deviation used in the discontinuous case. The outcome of this analysis reveals that the AMG version of the block Gauss-Seidel preconditioner is very efficient and the most robust solver considered for the solution of SFEM linear systems (linear case). However, although CG with a block Gauss-Seidel preconditioner performs better than with the block-diagonal and mean-based preconditioners, these methods are always outperformed by stand-alone Gauss-Seidel solvers when solving these systems.
For the mixed formulation, MINRES was used to solve the linear systems of equations arising from the SMFEM, with a preconditioner based on the Schur complement. For small values of σ, optimal convergence behavior was obtained with respect to h and d, with a weak dependence on p. However, the performance of the preconditioner deteriorated significantly for moderate to large values of σ. As for the primal formulation, the introduction of discontinuous conductivity coefficients had little impact on the performance of the preconditioner. However, SMFEM becomes more expensive than SFEM for this class of problems.
Although computationally more expensive than SFEM, the efficient solution of SMFEM is largely dependent on the preconditioner used with the chosen iterative solver. The numerical experiments showed that the Schur complement preconditioner is h-optimal not only when the complement is inverted exactly but also when it is inverted approximately using one V-cycle of AMG.
The experiments also showed that the preconditioner is not robust with respect to the permeability. The preconditioner that uses only the diagonal of the coefficient matrix A is robust when the permeability possesses small standard deviations but is generally inadequate for large standard deviations. In the latter case, the off-diagonal blocks of A become significantly more important, and these are not included in this preconditioner. This lack of robustness was also encountered when using the mean-based preconditioner for the solution of linear systems obtained by SFEM. The off-diagonal blocks of the coefficient matrix were successfully incorporated into a preconditioner by means of a symmetric block Gauss-Seidel algorithm. Unfortunately, owing to the structure of the coefficient matrix of the SMFEM system (specifically, the presence of a zero block), the same approach cannot be used for the Schur complement preconditioner.
The efficient solution of discrete linear systems obtained from stochastic mixed formulations is currently a very active research area. The Kronecker product preconditioner proposed by Ullmann 4 significantly reduces MINRES iteration counts. However, this does not always translate into improvements in CPU performance. In fact, it is shown that the Schur complement preconditioner performs better (in terms of CPU cost) than the Kronecker preconditioner for test problems in which the permeability possesses a large standard deviation. 4 More recently, Ullmann and Powell 30 introduced an approximation to the Schur complement as preconditioner that is robust with respect to the discretization parameters and only slightly sensitive to the statistical properties of the permeability.