Space and chaos‐expansion Galerkin proper orthogonal decomposition low‐order discretization of partial differential equations for uncertainty quantification

The quantification of multivariate uncertainties in partial differential equations can easily exceed any computing capacity unless proper measures are taken to reduce the complexity of the model. In this work, we propose a multidimensional Galerkin proper orthogonal decomposition that optimally reduces each dimension of a tensorized product space. We provide the analytical framework and results that define and quantify the low‐dimensional approximation. We illustrate its application for uncertainty modeling with polynomial chaos expansions and show its efficiency in a numerical example.


INTRODUCTION
The statistically sound treatment of modeled uncertainties in simulations comes with significant additional computational costs. Since a deterministic model can already be arbitrarily complex, the computation of statistics for general problems may soon become infeasible unless some kind of model reduction is involved.
In this work, we propose a multidimensional Galerkin proper orthogonal decomposition (POD) that can simultaneously and optimally reduce the physical dimensions of the model and the dimensions related to the uncertainties.
For the quantification of uncertainties in partial differential equation (PDE) models and their numerical discretization, one may distinguish two categories of solvers:1 sampling-based methods, notably the Monte-Carlo method, and Galerkin-type projection methods. In this work, we focus on the latter. For a basic explanation and relevant references on the Monte-Carlo method and its extensions see Reference 1, for an application to elliptic PDEs see Reference 2, and for a combination with stochastic collocation and tensor techniques see Reference 3. Galerkin-type methods for solving PDEs with uncertainties are also referred to as spectral stochastic methods and are based on a polynomial chaos expansion (PCE) of the random variables describing, for example, uncertain parameters or the candidate solution; see Reference 4 for a thorough introduction and References 5 and 6 for example applications.
Since every dimension of the multivariate uncertainty adds a dimension to the problem, a numerical discretization quickly becomes infeasible in terms of memory requirements, even if the dimensions are treated independently of each other. Several approaches have been proposed to overcome this complexity, such as sparse grids,9 the construction of reduced chaos expansions via, say, proper generalized decomposition (PGD)10,11 or principal component analysis or Karhunen-Loève expansions,12-14 or the use of particular tensor formats to reduce the data or to handle them more efficiently.15-18 In this work, we extend a Galerkin POD approach from space-time discretizations (as we considered in Reference 19 for optimization with PDE constraints) to general tensor product spaces including, in particular, those Hilbert spaces that are used for standard PCE formulations. This approach defines low-dimensional bases both for the spatial and for the uncertainty dimensions that optimally represent the data of a given PCE. As a result, we drastically reduce the overall dimension of the problem, which enables efficient uncertainty quantification and, prospectively, optimal control of systems with uncertain parameters.
Finding optimal representations for the dimensions is comparable to identifying low-rank tensor structures for the data, as treated in References 20 and 21. In contrast to these works, where a predefined structure is adaptively filled to approximate the solution, the proposed POD approach takes a given, possibly high-dimensional, data set and reduces it. The justification of this top-down approach is that the obtained reduction is optimally fitted to the given problem so that it can be used for further efficient explorations, mainly because this approach admits a direct interpretation of the bases for Galerkin discretizations. This relation to Galerkin projections defines the common ground with the PGD approaches,10 where, as in the other approaches mentioned in this paragraph, the optimal bases are constructed in an adaptive bottom-up fashion.
Most similar to our approach is the work of Reference 12 on reduced chaos expansions of coupled systems, where, basically, a Galerkin POD approach is used for two uncertainty dimensions. There, the authors start with a PCE of a bivariate random coefficient and obtain optimal bases via the left and right eigenvectors of a generalized eigenvalue problem involving a covariance matrix and a mass matrix. This approach via the eigenvectors of a covariance matrix is one way to define a POD basis (see, e.g., Reference 22), while the inclusion of the mass matrix provides optimality in the relevant discrete function spaces; see Reference 23. Our approach extends the scope of this work by introducing a tensorized formulation that allows for the reduction of multivariate uncertainties together with the spatial dimension in one framework.
The paper is organized as follows. In Section 2, we review the space-time Galerkin POD approach and how it extends to problems with an uncertainty dimension. Then we formulate the Galerkin POD for a product space of arbitrary dimensions and provide the POD compression algorithms and results. Next, in Section 3, we show that a PCE discretization exactly fits into this multidimensional Galerkin POD framework. In Section 4, we illustrate the use of the PCE and its POD reduction for a generic linear convection diffusion PDE. Finally, in Section 5, we provide a numerical example that shows the applicability and efficiency of this approach and show that a naive POD reduction based on random snapshots is not useful for PDEs with uncertain parameters.

MULTIDIMENSIONAL GALERKIN POD
We briefly recall the basic notions from our previous work,19 where we considered a Galerkin POD in the two dimensions space and time. The idea of locating space and time dependent functions that, for example, solve a partial differential equation, in the space-time outer product space L^2((0, T)) ⊗ L^2(Ω) naturally extends to functions that additionally depend on a random parameter α, that is, to the space-time-uncertainty product space L^2((0, T)) ⊗ L^2(Ω) ⊗ L^2(Γ, dP), where Γ is the domain of the random parameter and P is the associated probability measure; see, for example, Reference 20, where stationary problems are treated in this setup.
Also, the approach of considering the approximation in the product of the finite dimensional spatial subspace S ⊂ L^2(Ω) and temporal subspace T ⊂ L^2((0, T)) extends to approximating x in S ⊗ T ⊗ W, where W is the finite dimensional space that models a polynomial chaos expansion of the functions in L^2(Γ, dP).
And, finally, one may approximate a function x via its orthogonal projection onto the low-dimensional product Ŝ ⊙ T̂ ⊙ Ŵ, where the subspaces Ŝ, T̂, and Ŵ are chosen optimally with respect to x for given dimensions of the subspaces.
We now provide a general formulation of the product spaces, their discretization, and their optimal low-dimensional approximation. For i = 1, 2, …, N, let V_i be d_i-dimensional Hilbert spaces with inner product (⋅, ⋅)_{V_i} and mass matrix M_i ∈ R^{d_i × d_i}. We will use the formal vector Ψ_i of the basis functions of V_i to write, for example, M_i = (Ψ_i, Ψ_i^T)_{V_i}, with the inner product applied pointwise to the entries of the formal matrix Ψ_i Ψ_i^T. Finally, let L_i ∈ R^{d_i × d_i} be a factor such that L_i L_i^T = M_i. We consider the product space V = V_1 ⊗ V_2 ⊗ ⋯ ⊗ V_N of spaces of square integrable functions with the induced product inner product, where dμ_i denotes the measure associated with V_i. We represent a function x ∈ V via its expansion in the product basis or, equivalently, via the N-dimensional tensor X ∈ R^{d_1 × d_2 × ⋯ × d_N} of the coefficients, where ⊗ denotes the Kronecker product and where vec(X) ∈ R^{d_1 · d_2 ⋯ d_N} denotes the vectorization of a tensor X defined as the vector of all entries of X in reverse lexicographic order. The mode-1 matricization X^(1) that is used in the following relation is obtained if, roughly speaking, the first dimension of X is skipped in the vectorization and, instead, the resulting d_2 · d_3 ⋯ d_N vectors of dimension d_1 are used as columns of the matrix X^(1) ∈ R^{d_1 × (d_2 · d_3 ⋯ d_N)}. In the same way, the mode-k matricization X^(k) ∈ R^{d_k × (d_1 ⋯ d_{k−1} d_{k+1} ⋯ d_N)} is defined as the matrix whose columns are obtained from unrolling all dimensions except the k-th. Moreover, for a matrix L ∈ R^{d_* × d_k}, with d_* arbitrary, the k-mode tensor product L •_k X ∈ R^{d_1 × ⋯ × d_{k−1} × d_* × d_{k+1} × ⋯ × d_N} is defined as the tensor that is obtained by applying the inverse of the k-mode matricization operation to L X^(k).
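The tensor operations just introduced can be made concrete in a short numerical sketch. The following NumPy snippet (an illustration only; the function names are ours, not the paper's) implements the reverse lexicographic vectorization, the mode-k matricization, and the k-mode tensor product, using NumPy's Fortran ('F') ordering for the reverse lexicographic convention:

```python
import numpy as np

def vec(X):
    # reverse lexicographic vectorization: the first index varies fastest
    return X.flatten(order='F')

def matricize(X, k):
    # mode-k matricization: mode k becomes the rows, the remaining modes
    # are unrolled (first index fastest) into the columns
    return np.reshape(np.moveaxis(X, k, 0), (X.shape[k], -1), order='F')

def mode_mult(L, X, k):
    # k-mode tensor product: apply L along the k-th dimension of X,
    # that is, the inverse matricization of L @ X^(k)
    new_shape = [L.shape[0]] + [d for i, d in enumerate(X.shape) if i != k]
    Xk = L @ matricize(X, k)
    return np.moveaxis(np.reshape(Xk, new_shape, order='F'), 0, k)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
L = rng.standard_normal((7, 4))
Y = mode_mult(L, X, 1)
print(Y.shape)  # (3, 7, 5)
```

Note that `vec(X)` coincides with stacking the columns of the mode-1 matricization, which is the relation used in Lemma 1 below.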

Lemma 1.
For a function x ∈ V with its representation X as in (2), one has

‖x‖_V^2 = ‖L_1^T X^(1) (L_N ⊗ ⋯ ⊗ L_2)‖_F^2,

where X^(1) is the mode-1 matricization of the coefficient tensor X and ‖⋅‖_F^2 := ‖vec(⋅)‖_2^2 denotes the squared Frobenius norm of a tensor (or a vector).
Proof. We use the properties of the Kronecker product ⊗, the k-mode tensor product •_k, the vectorization operator vec, and the k-mode matricization operator (⋅)^(k), as introduced right before this lemma, to directly compute

‖x‖_V^2 = vec(X)^T (M_N ⊗ ⋯ ⊗ M_1) vec(X) = ‖(L_N ⊗ ⋯ ⊗ L_1)^T vec(X)‖_2^2 = ‖vec(L_1^T X^(1) (L_N ⊗ ⋯ ⊗ L_2))‖_2^2 = ‖L_1^T X^(1) (L_N ⊗ ⋯ ⊗ L_2)‖_F^2. ▪

By permutations of the tensor X, the dimension associated with any V_i can take the role of the first dimension, with L_i taking the place of L_1 in the formula of Lemma 1. To avoid technicalities, we will consider permutations that simply cycle through the dimensions. Therefore, we introduce the operator Π that cyclically permutes the dimensions of a tensor. Note that Π^N X = X and that, for matrices M (where N = 2), it holds that ΠM = M^T.
Corollary 1 (of Lemma 1). For any i ∈ {1, …, N}, the norm of x ∈ V can be expressed as

‖x‖_V^2 = ‖L_i^T (Π^{i−1} X)^(1) (L_{i−1} ⊗ ⋯ ⊗ L_1 ⊗ L_N ⊗ ⋯ ⊗ L_{i+1})‖_F^2.

With these expressions for the norm of a function x ∈ V related to a tensor X via (2), we can provide an interpretation of the higher-order singular value decomposition (HOSVD)24 in terms of low-dimensional space discretizations, which is the backbone of the POD.
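The norm identity of Lemma 1 can be checked numerically. The sketch below (for N = 3, under our reading of the reverse lexicographic ordering, in which the Kronecker factors appear in reversed order) verifies that the Gram-matrix expression and the matricized Frobenius-norm expression coincide:

```python
import numpy as np

rng = np.random.default_rng(1)
d = (3, 4, 5)
X = rng.standard_normal(d)

# random SPD mass matrices M_i with factors L_i such that L_i L_i^T = M_i
Ls, Ms = [], []
for di in d:
    A = rng.standard_normal((di, di))
    M = A @ A.T + di * np.eye(di)
    Ms.append(M)
    Ls.append(np.linalg.cholesky(M))

vecX = X.flatten(order='F')           # reverse lexicographic vectorization
# squared norm via the Kronecker-structured Gram matrix; with this vec
# convention the first factor's mass matrix appears last in the Kronecker product
G = np.kron(Ms[2], np.kron(Ms[1], Ms[0]))
norm2_gram = vecX @ G @ vecX

# squared norm via the matricized formula of Lemma 1
X1 = X.reshape(d[0], -1, order='F')   # mode-1 matricization
norm2_lemma = np.linalg.norm(Ls[0].T @ X1 @ np.kron(Ls[2], Ls[1]), 'fro')**2

print(np.isclose(norm2_gram, norm2_lemma))  # True
```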

Theorem 1. Given x ∈ . For any i ∈ {1, … , N} and for a correspondingd i ≤ d i , the space spanned bŷ
optimally approximates  i in the sense that x is best approximated in For any other i, one can apply Corollary 1 first. ▪ For the overall projection error between x and its projectionx ontô with i of dimensiond i as defined in Theorem 1, one has that where (i) k is the kth singular value of X (i) as they appear in the SVD for the definition of i . The estimate (3) follows directly from [Equation (24) in Reference 24] if one takes into account the scaling by the factors of the mass matrices. Note that while a single  i is optimally approximated by i by virtue of Theorem 1, the approximation of  by ∏ N i=1 i might not be optimal in the same sense; see the discussion in [p. 1267 in Reference 24].
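The construction of Theorem 1 and the error estimate (3) can be illustrated with the HOSVD of a random tensor. The sketch below assumes identity mass matrices (L_i = I), so that the POD basis for mode i reduces to the leading left singular vectors of the mode-i matricization; the truncation error of the multimode projection is then checked against the bound (3):

```python
import numpy as np

rng = np.random.default_rng(2)
d, dhat = (20, 15, 10), (5, 4, 3)   # full and reduced dimensions per mode
X = rng.standard_normal(d)

def matricize(X, k):
    return np.reshape(np.moveaxis(X, k, 0), (X.shape[k], -1), order='F')

# per-mode POD bases: leading left singular vectors of the matricizations
Us, svals = [], []
for i in range(3):
    U, s, _ = np.linalg.svd(matricize(X, i), full_matrices=False)
    Us.append(U[:, :dhat[i]])
    svals.append(s)

# project X onto the product of the reduced spaces, one mode at a time
Xhat = X.copy()
for i in range(3):
    P = Us[i] @ Us[i].T
    Xhat = np.moveaxis(np.tensordot(P, Xhat, axes=([1], [i])), 0, i)

# the truncation error is bounded by the sum of squared discarded singular values
err2 = np.linalg.norm(X - Xhat)**2
bound = sum((s[k:]**2).sum() for s, k in zip(svals, dhat))
print(err2 <= bound)  # True
```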

POLYNOMIAL CHAOS EXPANSION AS PRODUCT SPACE
Let α = (α_1, …, α_N) be a tuple of independent random variables α_i that take values in a domain Γ_i ⊂ R and that are distributed according to a probability measure dP_i. If ỹ is a function that depends on α, that for any value of α takes on values in a Hilbert space, say, L^2(Ω) for a domain Ω in R^2 or R^3, and that has a bounded variance with respect to α, one may approximate ỹ by a suitable expansion y in polynomials of α. Note that y is a random variable and that the expected value Ey ∈ L^2(Ω) of y is defined as the integral of y against the product measure dP_1 ⋯ dP_N. A finite dimensional approximation y to ỹ can be sought in a product space derived from a polynomial chaos expansion. Here, we will consider d_i-dimensional spaces spanned by the Lagrange polynomials λ_k^i of degree d_i − 1 defined through the distinct Gaussian quadrature nodes with respect to the measure dP_i; see Reference 25 for formulas and algorithms. With the corresponding quadrature weights, the quadrature rule is exact for polynomials up to degree 2d_i − 1. By virtue of this exactness, and since the correspondingly defined Lagrange polynomials form a nodal basis, that is, λ_k^i(γ_j^i) = δ_kj, they are orthogonal in the associated inner product. Remark 1. By the fundamental theorem of Gaussian quadrature, the nodes (6) are the roots of that polynomial of degree d_i that is orthogonal to all polynomials of degree d_i − 1 with respect to the inner product with the measure dP_i. For many probability measures dP, there exists a commonly preferred choice of the corresponding orthogonal polynomials, where the index also denotes the degree of the polynomial, which can be constructed by explicit recursion formulas; see, for example, [Tab. 4 in Reference 7]. We further note the direct relation of these recursions to the Jacobi matrix, whose eigenvalues and eigenvectors provide the quadrature nodes and weights; see Reference 25.
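The nodal-basis construction is easy to reproduce. The snippet below (for the uniform distribution on (−1, 1); the Gauss-Legendre weights are scaled by 1/2 to turn them into probability weights) checks the discrete orthogonality of the Lagrange polynomials and the exactness of the d-point rule up to degree 2d − 1:

```python
import numpy as np

d = 4
# Gauss-Legendre nodes/weights for the uniform probability measure on (-1, 1)
nodes, weights = np.polynomial.legendre.leggauss(d)
weights = weights / 2.0   # normalize so that the weights sum to one

# Lagrange polynomials through the quadrature nodes form a nodal basis:
# lam_k(nodes[j]) = delta_kj, hence they are orthogonal in the discrete
# (quadrature) inner product with (lam_k, lam_k) = weights[k]
def lagrange(k, x):
    out = np.ones_like(x, dtype=float)
    for j in range(d):
        if j != k:
            out *= (x - nodes[j]) / (nodes[k] - nodes[j])
    return out

gram = np.array([[np.sum(weights * lagrange(k, nodes) * lagrange(l, nodes))
                  for l in range(d)] for k in range(d)])
print(np.allclose(gram, np.diag(weights)))  # True

# the d-point rule integrates polynomials up to degree 2d - 1 = 7 exactly:
# E[x^6] = 1/7 for the uniform distribution on (-1, 1)
print(np.isclose(np.sum(weights * nodes**6), 1 / 7))  # True
```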

APPLICATION EXAMPLE
For a domain Ω ⊂ R^d, with d = 2 or d = 3, for a given right-hand side f ∈ L^2(Ω) and a given convection vector b, we consider the stationary convection-diffusion problem

−∇·(κ∇y) + b·∇y = f in Ω,    (8)

where we assume that the diffusivity coefficient κ depends on a random vector α = (α_1, …, α_N).
For the following derivation of the model equations, we assume homogeneous Dirichlet conditions or homogeneous Neumann conditions for the boundary. Nonzero boundary conditions can be included in standard ways. For well-posedness one may assume that there is at least one boundary patch with Dirichlet conditions, that the domain is bounded with a boundary smooth enough so that a Poincaré-type inequality holds, that b is sufficiently regular and bounded, and, most importantly, that κ is positive and bounded away from zero. If this is the case, then unique solvability of a weak formulation in the corresponding Sobolev spaces follows by standard arguments. In particular, for given f and b and κ > 0, we may well assume that system (8) has a unique solution y for any realization of α and, thus, y itself can be seen as a random variable depending on α.
As in standard finite element approaches, for every realization α, we locate the corresponding solution y in H^1_0(Ω) and require (8) to hold in the weak sense, namely

(κ(α)∇y, ∇v)_{L^2(Ω)} + (b·∇y, v)_{L^2(Ω)} = (f, v)_{L^2(Ω)}    (9)

for all v ∈ H^1_0(Ω). To account for the uncertainty, we seek the solution in the product space of the space variable and the uncertainty dimensions as in (4) and require (9) to hold in a weak sense also in the Γ_i inner products (see Sec. 3 in Reference 7), where now v is a test function from the product ansatz space. We may cluster the uncertainty dimensions Γ_i into Γ = Γ_1 × ⋯ × Γ_N and write the weak formulation over Ω × Γ. This weak formulation, both in the spatial and in the uncertainty domain, is readily discretized by finite elements in space and a chaos expansion with quadrature in the uncertainty as follows. For a finite dimensional approximation, let the FEM space V_0 be spanned by Ψ_0 (compare (1)) and let A ∈ R^{d_0 × d_0} be the discrete convection-diffusion operator, where the products and the application of the differential operators are understood component-wise.
To save space in the formal derivation of the equations that include the polynomial chaos expansions, we will formally use the strong differential operator A(α)y := −∇·(κ(α)∇y) + b·∇y. With that, with discrete ansatz spaces as in (5), and with the ansatz for the solution expanded in the product basis with coefficient tensor Y (see (2)), Equation (10) is discretized as a tensorized linear system, where the left-hand side, together with (11), factorizes into Kronecker products thanks to the linearity of the involved differential operators.
Next, we successively approximate the integrals with respect to the probability measures by the corresponding quadrature rules (see (7)). Since the Lagrange polynomials form a nodal basis, it holds that λ_k^i(γ_j^i) = δ_kj, so that the discretized equations decouple into an independent system for every combination of quadrature nodes. To derive the Galerkin POD reduced system, we replace Ψ_i by Ψ̂_i, for i = 0, 1, …, N, in (12). We assume that the reduced bases were obtained as proposed by Theorem 1. The derivation, however, works for any (reduced) basis.
For illustration, we consider the case N = 1, that is, the spatial dimension and a univariate uncertainty. Then, the reduced system coefficient matrix reads as in (14). Here, the operator Â is the POD projection of A, whereas the reduced uncertainty factors are defined through the given choice of Ψ_1 and the weights w; see Theorem 1. By orthonormality of the POD basis, the mass matrices in (14) reduce to identities. Accordingly, the reduced system does not decouple and, in this univariate case, requires the solution of a d̂_0 · d̂_1-dimensional system. The derivation of the general reduced multivariate system goes along the same lines and results in a possibly fully coupled system of dimension ∏_{j=0}^{N} d̂_j, which still can be prohibitively large. For such cases, one may consider leaving a certain dimension, say Γ_N, unreduced and rather solve d_N systems of size ∏_{j=0}^{N−1} d̂_j. Using this idea recursively, one can balance the number of systems against their size.
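The contrast between the decoupled full order system and the coupled reduced system can be sketched for N = 1. The problem data below (an affine surrogate operator with a hypothetical diffusivity κ(γ) = 2 + γ) are placeholders of our own choosing, not the paper's FEM matrices; the point is the assembly pattern of the d̂_0 · d̂_1-dimensional coupled system:

```python
import numpy as np

d0, d1, k0, k1 = 60, 7, 6, 5   # full/reduced spatial and uncertainty dimensions

# surrogate operator with uncertain diffusivity kappa(gamma) = 2 + gamma
# (an illustrative stand-in for the FEM convection-diffusion operator)
K = 2 * np.eye(d0) - np.eye(d0, k=1) - np.eye(d0, k=-1)   # 1D Laplacian stencil
C = 0.1 * (np.eye(d0, k=1) - np.eye(d0, k=-1))            # skew convection part
A = lambda g: np.eye(d0) + (2 + g) * K + C
f = np.ones(d0)

nodes, weights = np.polynomial.legendre.leggauss(d1)
weights = weights / 2.0   # probability weights for the uniform law on (-1, 1)

# full order: in the nodal (Lagrange) PCE basis the quadrature decouples the
# Galerkin system into d1 independent spatial solves A(gamma_j) y_j = f
Y = np.column_stack([np.linalg.solve(A(g), f) for g in nodes])

# reduced order: with reduced bases V (space) and W (uncertainty) the system
# couples into a single (k0*k1)-dimensional linear system, assembled blockwise
V = np.linalg.svd(Y, full_matrices=False)[0][:, :k0]
W = np.linalg.svd(Y.T, full_matrices=False)[0][:, :k1]
Ar = sum(w * np.kron(np.outer(W[j], W[j]), V.T @ A(g) @ V)
         for j, (g, w) in enumerate(zip(nodes, weights)))
fr = np.kron(W.T @ weights, V.T @ f)
yr = np.linalg.solve(Ar, fr)

# lift back to the quadrature nodes and compare with the full order solution
Yr = V @ yr.reshape(k1, k0).T @ W.T
rel = np.linalg.norm(Y - Yr) / np.linalg.norm(Y)
print(rel < 0.05)
```

Note how the weighted sum over the nodes destroys the block-diagonal structure as soon as the uncertainty basis W is no longer nodal.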

NUMERICAL EXAMPLE
In this section, we illustrate how the general ideas of Section 2, as they were made specific for PDEs with multivariate uncertainties in the parameter in Sections 3 and 4, can be applied for uncertainty quantification in numerical simulations. Motivated by [Example 3.1 in Reference 20], we consider a stationary convection diffusion problem as in (8) with uncertainty in the conductivity coefficient.
As the geometrical setup, let Ω ⊂ R^3 be a cylindrical domain of outer diameter D_o = 2 without its core of diameter D_i = 0.8, subdivided into four subdomains Ω_i, i = 1, 2, 3, 4, as illustrated in Figure 1. The height of the computational domain is H = 0.5.
To model the uncertainty in the conductivity coefficient κ, we assume that, independently on each subdomain Ω_i, κ is a random variable of a random parameter α_i, and we fix the inhomogeneity f accordingly; see Figure 2 for a snapshot of the solution for κ = 7.5 · 10^−4 being constant across the domain. Moreover, we introduce the quantity of interest Υ defined as the spatially averaged value of the solution y over the surface of a concentric annular ring of diameter 0.1 that is aligned with the inner boundary at the top surface of the domain; see Figure 1 for the arrangement.
For the spatial discretization, we use continuous and piecewise linear finite elements on a discretization of the domain by tetrahedra. Although the mesh is refined at the critical parts, namely the edges of the domain and the surfaces where Υ is computed or where the Dirichlet condition is applied, we need about 240,000 degrees of freedom for the spatial dimension to be within four digits of accuracy for some selected parameters; see Table 1. The values of interest of this numerical study are the expected value EΥ and the variance VΥ of the quantity of interest Υ, which we approximate by a PCE with various levels of refinement. Note: Here, κ = 7.5 · 10^−4, κ_min = 5 · 10^−4, κ_max = 10 · 10^−4, and κ = (κ_max, κ_min, κ_max, κ_min); the mesh used for the numerical study later and the seemingly correct digits are printed in bold letters.
In each uncertainty dimension, we apply a PCE with the same number d of Lagrange polynomials of degree d − 1 that, through the particular choice of the interpolation points, are orthogonal with respect to the inner product induced by the considered probability distribution; see Remark 1. We write pce[d] to refer to the PCE discretization of dimension d as well as to the expected value/variance of Υ based on this discretization.
Thus, the considered full order model computes the solution via the system (13) formulated in the outer product of the space spanned by the FEM basis Ψ_0 and the four PCE bases of dimension d represented by the corresponding orthogonal polynomials in the parameters α_n, n = 1, 2, 3, 4.    (16)
As can be seen in Tables 2 and 3, for computing the expected values and variances of Υ, convergence of the PCE discretization is achieved already for low dimensions. However, even though the computations are well parallelized, the computation times for the moderate PCE discretizations are already on the order of days; see Table 4.
This motivates the use of reduced-order models based on a small number of realizations to approximate EΥ and VΥ. As we will see, such an approach is capable of improving the estimate of a low-dimensional PCE discretization by one order of magnitude with little computational overhead.
We will investigate three approaches to compute projection bases to approximate the full order FEM simulation by a low-order approximation. Basically, every method computes a reduced approximation spaceΨ 0 to replace Ψ 0 in the ansatz space defined in (16). Practically, the computation of the reduced spaces comes with definitions of projection matrices, so that the reduced order FEM operatorÂ can be obtained from the full order operator A as in (15).
• pcePOD (PCE-based POD) - The snapshots are the full order solutions associated with the tensor grid of the two nodes per dimension that define the pce[2] polynomials, i = 1, …, 4, and the reduced-order basis Ψ̂_0 is computed as suggested by Theorem 1.
• wRB (weighted reduced bases) - As suggested in Reference 27, the basis Ψ̂_0 is computed by a greedy procedure that sequentially selects those vectors from a set of solutions that most improve the model approximation, taking into account the underlying probability distribution through weighting. Following Reference 27, we consider the approximation error weighted by the corresponding probability density function. However, the error is not estimated (which is one of the core features of reduced basis approaches) but computed directly. This means a possible overhead in computational effort but allows for unbiased evaluations of the approximation performance, as there is no additional estimation error. As the training set, we use 16 random realizations of α. To account for this randomness in the numerical setup, we repeat the experiment 10 times and report the median values.

F I G U R E 3 The difference in EΥ in the beta-2-5 distribution case computed via the pce[5] and the pcePOD approximation of dimension k'=6 on the basis of pce[2]

• rndPOD (random POD) - We use random realizations of α for computing a POD basis Ψ̂_0 to be used instead of Ψ_0 in the full order model (16). Again, to account for the involved randomness in the basis computation, we resort to the median value of 10 runs.
We set up the numerical experiments as follows: for varying dimensions k' of the reduced space Ψ̂_0, we compute the pce[5] approximation of Υ, EΥ, and VΥ by means of the reduced-order model and compare to the expected value and variance of Υ computed by a pce[5] discretization of the full order model based on (16). A view of the difference in the expected value over the full domain is given in Figure 3.
The computed results are laid out in Figures 4 and 5, which plot the errors, and in Figures 6 and 7, which show their distribution over 10 realizations (of the wRB and rndPOD approximations) for the two considered distributions of the random variables. There, we also compare to the error level of a full order pce[2] approximation, so that every reduced-order model (ROM) result that undercuts this level is easily identified as a real improvement in terms of accuracy.
For a measure beyond the comparison of the first and second moments, we employ the Kolmogorov metric that measures the maximum difference between the cumulative distribution functions of two random variables y_1 and y_2, that is,

d_K(y_1, y_2) := sup_{y* ∈ R} |F_{y_1}(y*) − F_{y_2}(y*)|,    (17)

where, for y* ∈ R, the cumulative distribution function F_{y_i} : R → [0, 1] describes the probability that the value of the random variable y_i is less than or equal to y*, i = 1, 2. For our experiments, we let Υ_pcePOD, Υ_rndPOD, Υ_wRB denote the approximations to the random variable Υ that were computed by means of pcePOD, rndPOD, and wRB, respectively, and compute the Kolmogorov metric to measure the distance to the full order solution Υ. The numerical evaluation of the metric was conducted for k'=8 for both considered distributions of α, using the empirical cumulative distribution function based on 10^6 samples and a subsequent linear interpolation to a common grid of 2000 points. The values of the measurements are displayed in Table 5. Again, the reported values for rndPOD and wRB are the median values of 10 realizations. Since the sampling introduces an extra layer of randomness to the evaluations, we have repeated the evaluations 5 times and report the median values. In addition, we have plotted the differences F_{Υ_pcePOD} − F_Υ, F_{Υ_rndPOD} − F_Υ, and F_{Υ_wRB} − F_Υ (the latter two for five realizations each) in Figure 8. The improvement in terms of computational effort can be inferred from the measured timings laid out in Table 4. We note that for these small POD dimensions, the effort for computing the POD modes (around 7 s) and computing the pce[5] approximation with the reduced-order models (around 0.35 s) is negligible compared to the time to compute the snapshot data or even to the evaluation of pce[5] with the full model; see Table 4.
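The sampling-based evaluation of the Kolmogorov metric (17) can be sketched as follows. The two normal samples stand in for Υ and a ROM approximation and are our own illustrative choice; for two unit-variance normals with means 0 and 0.5, the exact value of (17) is 2Φ(0.25) − 1 ≈ 0.197:

```python
import numpy as np

def kolmogorov_distance(s1, s2, grid_size=2000):
    # empirical CDFs of the two samples evaluated on a common grid; the
    # metric is the maximum absolute difference between the step functions
    lo = min(s1.min(), s2.min())
    hi = max(s1.max(), s2.max())
    grid = np.linspace(lo, hi, grid_size)
    F1 = np.searchsorted(np.sort(s1), grid, side='right') / len(s1)
    F2 = np.searchsorted(np.sort(s2), grid, side='right') / len(s2)
    return np.max(np.abs(F1 - F2))

rng = np.random.default_rng(4)
a = rng.normal(0.0, 1.0, 10**5)
b = rng.normal(0.5, 1.0, 10**5)
d_K = kolmogorov_distance(a, b)
print(abs(d_K - 0.197) < 0.02)  # True (up to sampling error)
```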
The presented results can be summarized and interpreted as follows:
• Concerning our proposed approach pcePOD, for both choices of the distribution of α, we observe that for k'=5 the error of the reduced-order model drops below the error of a direct pce[2] approximation. Further enrichment of the basis does not show a reliable improvement, which leads us to the conclusion that little accuracy is added by the basis vectors beyond k'=5.

F I G U R E 8
The estimated differences in the cumulative distribution function for Υ estimated by a full order FEM and pce[5] approximation and by the reduced order models pcePOD, wRB, and rndPOD for the beta-2-5 (left) and the uniform distribution (right) of α. Note that, for display purposes, not every data point is included in the plots, so that the peak values are not as high as reported in Table 5.

T A B L E 5 The Kolmogorov metric for the approximation of the random variable Υ (evaluated by a full order pce[5]) by the reduced order models, namely the different reduced FEM bases of size k'=8 by pcePOD, wRB, and rndPOD, and by an approximation with a pce[2] discretization with the full order FEM basis

• In contrast, the wRB and rndPOD approaches show a steady drop in the approximation error with increasing ROM dimensions, though at a higher median error level. Only for k'=15, 16 do these approaches show better results than the pce[2] approximation with the full order model.
• Since both wRB and rndPOD are based on a training set with random realizations, we checked whether the approximation can be improved by a larger training set. While for rndPOD a positive effect of a larger training set seems to be indicated by the data, for the wRB approximation some data points even suggest the reverse effect.
• Another observation with wRB and rndPOD is that the realization of the training set can influence the performance by about two orders of magnitude, as can be seen from the scatter plots of 10 realizations in Figures 6 and 7.
• We note that the definition of the bases is optimized toward the expected value of the solution; at least the bases in pcePOD and rndPOD are optimal in the corresponding inner product. Nonetheless, from the computation of VΥ by a pce[5] approximation with the ROM, we find that the approximation quality of VΥ follows that of EΥ.
• The differences in the cumulative distribution function (displayed in Table 5), as estimated via the Kolmogorov metric defined in (17), indicate that a lower order PCE approximation may well approximate the moments but seems to miss the actual distribution. On the other hand, with roughly the same effort as a lower order PCE, one may set up the reduced order models to estimate the random variable with an improved accuracy of a factor between 3 and 7. Here, again, the pcePOD approach clearly and reliably outperforms the wRB and rndPOD reduced order models. Moreover, as for the computation of the moments, using a larger set of training data for wRB and rndPOD did not show a visible improvement in accuracy.

F I G U R E 9 Link to code and data
The codes that set up, perform, and post process the numerical examples as well as the raw data of the presented cases are available as laid out in Figure 9.

CONCLUSION
The theory of multidimensional Galerkin POD naturally applies to problems with multivariate uncertainties and can be made tractable for numerical experiments by exploiting the underlying tensor structures.
We have confirmed that a multidimensional POD that includes polynomial chaos expansions of the candidate solutions leads to a significant efficiency gain in uncertainty quantification. For illustration, we considered a high-dimensional linear convection diffusion example with a multivariate uncertainty in the parameters. In comparison, a direct POD approach based on random realizations and a distribution-informed weighted greedy method for selecting a reduced basis from a training set proved similarly capable of improving the computation, though only at significantly higher dimensions of the reduced-order model.
In this study, we showed that very low-dimensional reduced-order models could even improve on the approximation of relevant statistical quantities by the training set. As a future direction, we note that accurate uncertainty quantification with a very low-dimensional model provides an efficient approach to the more challenging task of optimal control of uncertain PDE systems.

ACKNOWLEDGMENTS
This research was supported by the DFG through the Research Training Group Mathematical Complexity Reduction Project number 314838170. Open Access funding enabled and organized by Projekt DEAL.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.6874085, reference number zenodo.6874085.

ENDNOTE
* The numbers in the name for the beta distribution reflect the general way of defining the beta-(p, q) distribution on (0, 1) via the two parameters p and q.