Uncertainty quantification in coastal aquifers using the multilevel Monte Carlo method

We consider a class of density-driven flow problems. We are particularly interested in the problem of the salinization of coastal aquifers. We consider the Henry saltwater intrusion problem with uncertain porosity, permeability, and recharge parameters as a test case. The reason for the presence of uncertainties is the lack of knowledge, inaccurate measurements, and inability to measure parameters at each spatial or time location. This problem is nonlinear and time-dependent. The solution is the salt mass fraction, which is uncertain and changes in time. Uncertainties in porosity, permeability, recharge, and mass fraction are modeled using random fields. This work investigates the applicability of the well-known multilevel Monte Carlo (MLMC) method for such problems. The MLMC method can reduce the total computational and storage costs. Moreover, the MLMC method runs multiple scenarios on different spatial and time meshes and then estimates the mean value of the mass fraction. The parallelization is performed in both the physical space and stochastic space. To solve every deterministic scenario, we run the parallel multigrid solver ug4 in a black-box fashion. We use the solution obtained from the quasi-Monte Carlo method as a reference solution.


Introduction
Saltwater intrusion occurs when sea levels rise and saltwater moves onto the land.Usually, this occurs during storms, high tides, droughts, or when saltwater penetrates freshwater aquifers and raises the groundwater table.Since groundwater is an essential nutrition and irrigation resource, its salinization may lead to catastrophic consequences.Many acres of farmland may be lost because they can become too wet or salty to grow crops.Therefore, accurate modeling of different scenarios of saline flow is essential [1,65] to help farmers and researchers develop strategies to improve the soil quality and decrease saltwater intrusion effects.
Saline flow is density-driven and described by a system of time-dependent nonlinear partial differential equations (PDEs).It features convection dominance and can demonstrate very complicated behavior [74].
As a specific model, we consider a Henry-like problem with uncertain permeability and porosity.These parameters may strongly affect the flow and transport of salt.The original Henry saltwater intrusion problem was introduced by H.R. Henry in the 1960s (cf.[35]).The Henry problem became a benchmark for numerical solvers for groundwater flow (cf.[74,67,66,18].In [61], the authors use the generalized polynomial chaos expansion approximation to investigate how incomplete knowledge of the system properties influences the assessment of global quantities.Particularly, they estimated the propagation of input uncertainties into a few dimensionless scalar parameters. The hydrogeological formations typically have complicated and heterogeneous structures.These formations may consist of a few layers of porous media with various porosity and permeability coefficients (cf.[59,64]).Measurements of the layer positions and their thicknesses are only possible up to some error, and for the materials inside the layers, the average parameters are typically assumed.Thus, these layers are excellent 1 arXiv:2302.07804v1[math.NA] 13 Feb 2023 Notation QoI g quantity of interest g D computational spatial domain D 0 , D 1 , . . ., D L hierarchy of spatial meshes T 0 , T 1 , . . ., T L hierarchy of temporal meshes L number of levels s complexity h (or h), n spatial step size and number of spatial degrees of freedom on level τ (or τ ), r time step size and number of time steps on level m number of samples (scenarios) on level expectation and variance Θ multidimensional domain of integration in parametric space ω, ξ(ω) random event and random vector φ(x, ω) porosity random field K(x, ω) permeability random field ρ(x, ω) density random field q(t, x, ω) volumetric velocity D tensor field D = D(q): molecular diffusion and dispersion of salt κ(x) expectation of κ(x, ω) d physical (spatial) dimension c = c(t, x, ω) mass fraction of salt (solution of the problem) candidates to be modeled by random fields.Further, due to the nonlinearities in the problem, averaging the parameters does not necessarily lead to the correct mathematical expectation of the solution.
To model uncertainties, we use random fields.Uncertainties in the input data propagate through the model and make the solution (e.g., the mass fraction) uncertain.An accurate estimation of the output uncertainties can facilitate a better understanding of the problem, better decisions, and improved control and design of the experiment.
The following questions can be answered: 1. How long can a particular drinking water well be used (i.e., when will the mass fraction of the salt exceed a critical threshold)?
2. What regions have especially high uncertainty?
3. What is the probability that the salt concentration is higher than a threshold at a certain spatial location and time point?
4. What is the average scenario (and its variations)?
5. What are the extreme scenarios?
6. How do the uncertainties change over time?
Many techniques can quantify uncertainties.A classical method is Monte Carlo (MC) sampling.Although it is dimension-independent, it converges very slowly and requires many samples.This method may not be affordable for time-consuming simulations.Nevertheless, even up-to-date techniques, such as surrogate models and stochastic collocation, require a few hundred to a few thousand time-consuming simulations and assume a certain smoothness by the quantity of interest (QoI).
Another class of methods is the class of perturbation methods [17].The idea is to decompose the QoI with respect to (w.r.t.) random parameters in a Taylor series.The higher-order terms can be neglected for small perturbations, simplifying the analysis and numerics.These methods assume that random perturbations are small (e.g., up to 5% of the mean, depending on the problem).For larger perturbations, these methods usually do not work.
There are quite a few studies where authors model uncertainties in reservoirs (cf.[10,72]).Reconnecting stochastic methods with hydrogeological applications was accomplished in [7], where the authors analyzed a collaboration between academics and water suppliers in Germany and made recommendations regarding optimization and risk assessment.The fundamentals of stochastic hydrogeology and an overview of stochastic tools and accounting for uncertainty are described in [63].
The review [70] deals with hydrogeologic applications of recent advances in uncertainty quantification, probabilistic risk assessment, and decision-making under uncertainty.The author reviewed probabilistic risk assessment methods in hydrogeology under parametric, geologic, and model uncertainties.Density-driven vertical transport of saltwater through the freshwater lens on the island of Baltrum (Germany) is modeled in [57].
In [39], the authors examined the implications of transgression for a range of seawater intrusion scenarios based on simplified coastal freshwater aquifer settings.They stated that vertical intrusion during transgressions could involve density-driven convective processes, causing substantially greater amounts of seawater to enter the aquifer and create more extensive intrusion than horizontal seawater intrusion in the absence of transgression.
The quantification of uncertainties in stochastic PDEs could be a significant challenge due to a) the large possible number of involved random variables and b) the high cost of each deterministic solution of the governed PDE.The MC quadrature and its variance-reduced variants have a dimension-independent error convergence rate O(N − 1 2 ), and the QMC has the worst-case rate O(log M (N )N −1 ), where N is the number of samples, and M indicates the dimension of the stochastic space [47].The MC method is not affected by the dimension of the integration domain, such as collocations on sparse or full grid methods [2,50].A numerical comparison of other QMC sequences is presented in [58].
Construction of a cheap generalized polynomial chaos expansion-based surrogate model [76,42,41] is an alternative to the MC method.Some well-known functions, such as the multivariate Legendre, Hermite, Chebyshev, or Laguerre functions, have been taken as a basis [53,76].Surrogate models have pros and cons.The pros are that the model can be easily sampled once constructed, and all samples are almost free (much cheaper than sampling the original stochastic PDE).For some problem settings, sampling is unnecessary because the solution can be computed analytically (e.g., computing an integral of a polynomial).The nontrivial part of surrogate models is to define how many coefficients are needed and how accurately they should be computed.Another difficulty is that not every function can be approximated well by a polynomial.The MLMC methods do not have such limitations.
This work is structured as follows.Section 2 describes the Henry problem and numerical methods to solve it.The well-known MLMC method is reviewed in Section 3. Next, Section 4 details the numerical results, which include the numerical analysis of the Henry problem, computing different statistics, the performance of the MLMC method, and the practical performance of the parallel ug4 solver for the Henry problem [35,67] with uncertain coefficients.Finally, we conclude this work with a discussion in Section 5.
Our contribution: We investigate the propagation of uncertainties in the Henry-like problem.Assuming the porosity, permeability, and recharge are uncertain, we estimate the uncertainties in the density-driven flow.To reduce the high computing complexity, we applied the existing MLMC technique.We use the multigrid ug4 software library as a black-box solver, allowing us to solve the Henry problem and others (see more in [65]).We run all MLMC random simulations in parallel.To the best of our knowledge, we are unaware of any other studies where Henry's problem [35,67] was solved using MLMC methods with uncertain porosity, permeability, and recharge parameters.

Problem setting
In coastal aquifers, salty seawater intruding on the formation on one side (the seaside) displaces the pure water due to water recharge from land sources and precipitation from the other side.Due to its higher density, seawater mainly penetrates along the bottom of the aquifer.This process can achieve a steady state but may be timedependent due to the periodicity of the recharge or controlling the pumping rate from the wells.An accurate simulation of the salinization is vital for the prediction of water resource availability.However, the accuracy of such predictions strongly depends on the hydrogeological parameters of the formation and the geometry of the computational domain, denoted by D.
The aquifer D ⊂ R d , d ∈ {2, 3}, can be modeled as an immobile porous matrix filled with liquid phase-a solution of salt in water.Due to the nonhomogeneous density distribution, gravitation induces the motion of the liquid phase.This motion transports the salt, which is otherwise subject to molecular diffusion.
A straightforward but very demonstrative model of coastal aquifers is the so-called Henry problem, first considered in [35].In this two-dimensional setting, the aquifer is represented by a rectangular domain entirely saturated with the liquid phase (Fig. 1).The salty seawater intrudes from the right side, and pure water is recharged from the left.The top and bottom are considered impermeable.Analogous settings with partially saturated domains are considered in [69].
The mass conservation laws for the entire liquid phase and salt yield the following equations where φ : D → R denotes the porosity, K : is the mass fraction of the salt (or of the brine) in the solution, ρ = ρ(c) indicates the density of the liquid phase, and D(t, x) : [0, +∞) × D → R d×d denotes the molecular diffusion and mechanical dispersion tensor.For the velocity q(t, x) : [0, +∞) × D → R d , we assume Darcy's law: where p = p(t, x) : [0, +∞) × D → R is the hydrostatic pressure, µ = µ(c) denotes the viscosity of the liquid phase, and g = (0, . . ., 0, −g) T ∈ R d represents the gravity vector.Inserting ( 3) into (1-2) results in a system of two time-dependent PDEs in the unknowns c and p.This system should be closed with boundary conditions for c and p and an initial condition for c.
Following the classical setting in [35], for this variant of the Henry problem, we set and with a constant scalar D ∈ R, and the identity matrix I ∈ R d×d .Furthermore, we assume the isotropic permeability This setting is consistent with the problem setting in [74].However, we do not assume the Boussinesq approximation and keep the density variable for all terms.For the initial conditions, we set The boundary conditions are presented in Fig. 1(a).On the right side of the domain, we impose Dirichlet conditions for the c and p variables that represent the adjacent seawater aquifer: On the left side, we prescribe the inflow of fresh water: where e x = (1, 0) , and qin is a constant.For the classical formulation of the Henry problem, this value was set to qin = 6.6 • 10 −2 kg/s in [74] or qin = 3.3 • 10 −2 kg/s in [67,66].The Neumann zero boundary conditions are imposed on the upper and lower sides of D.
An example of c(t, x) and q(t, x) for the parameters from Table 1 is presented in Fig. 1(right).The dark red color corresponds to c = 1, and dark blue corresponds to c = 0. Due to its higher density, the saltwater intrudes Parameter Values and Units Description  In the classical formulation, this salt triangle initially increases over time but achieves a steady state (cf.[74,67,66]).However, the initial nonstationary phase may take significant time.Investigating this phase is especially important to understand the system behavior when changing the recharge.For this, in addition to the mean and variance, we consider the mass fraction at 12 points (listed below) and an integral value-the total amount of pure water (as in Eq. 23).The list of chosen points follows: The motivation is to consider points where the concentration variation is considerable.In addition, the mass fraction c at each point x is a function of time.These spatial points may help track salinity changes over time in groundwater wells and understand which areas in the aquifer are most vulnerable.Farmers can use this information to take action, such as decreasing salinity or adapting strategies by planting salt-tolerant crops.

Modeling porosity, permeability, and mass fraction
The primary sources of uncertainty are the hydrogeological properties of the porous medium-porosity (φ) and permeability (K) fields of the solid phase-and the freshwater recharge flux qx through the left boundary.The QoIs are related to the mass fraction c, a function of φ, K, and the recharge.We model the uncertain φ using a random field and assume K to be isotropic and dependent on φ: The distribution of φ(x, ξ), x ∈ D, is determined by a set of stochastic parameters ξ = (ξ 1 , . . ., ξ M , ...).Each component ξ i is a random variable depending on a random event ω.For concision, we skip ω and write ξ := ξ(ω).
The dependence in Eq. ( 10) is specific for every material.We refer to [54,55,16] for a detailed discussion.In the proposed model, we use a Kozeny-Carman-like dependence where the scaling factor κ KC is chosen to satisfy the equality K(E [φ])I = E(K), resembling the parameters of the standard Henry problem.The inflow flux is kept constant across the left boundary but depends on the stochastic variable q in .We also assume that the inflow flux is independent of φ and K.

Numerical methods for the deterministic problem
The system (1-2) is numerically solved in the domain D × [0, T ], where the symbol × denotes the Cartesian product.After the discretization of D using quadrilaterals of size h, we obtain D h .Equations (1-2) are discretized using a vertex-centered finite-volume scheme with a "consistent velocity" for the approximation of Darcy's law (3), as presented in [23,24,25].The degrees of freedom associated with D h are denoted by n.
There are two degrees of freedom per grid vertex: one for the mass fraction and another for the pressure.We use the implicit Euler method with a fixed time step τ for time discretization.The number of the computed time steps is r = T /τ .We use partial upwind for the convective terms (cf.[23]).Therefore, the discretization error is of the second order w.r.t. the spatial mesh size h.However, the diffusion in ( 2) is minimal compared with the velocity.For the grids in the numerical experiments, the observed reduction of the discretization error after grid refinement corresponds to the first order.Thus, we assume the first-order dependence of the discretization error w.r.t.h, which is consistent with the numerical experiments.Furthermore, the Euler method provides the first-order dependence of the discretization error w.r.t.τ .
The implicit time-stepping scheme provides unconditional stability but requires the solution to an extensive nonlinear algebraic system of the discretized equations with n unknowns in every time step.The Newton method is used to solve this system.Linear systems inside the Newton iteration are solved using the BiCGStab method (cf.[4]) preconditioned with the geometric multigrid method (V-cycle, cf.[32]).In the multigrid cycle, the ILU β -smoothers [33] and Gaussian elimination are used as the coarse grid solver.
To construct the spatial grid hierarchy D 0 , D 1 , . . ., D L , we start with a coarse grid consisting of 512 grid elements (quadrilaterals) and n 0 = 1122 degrees of freedom.This grid is regularly refined to obtain all other grid levels.After every spatial grid refinement, the number of grid elements is multiplied by a factor of four.Consequently, the number of degrees of freedom is increased by a factor of four (i.e., n ≈ n 0 • 2 d , d = 2; see Table 2).This hierarchy is used in the geometric multigrid preconditioner and MLMC method.We also construct the temporal grid hierarchy T 0 , T 1 , . . ., T L .The time step on each temporal grid is denoted by τ with τ +1 = 1 2 τ .The number of time steps on the th grid (level) is r +1 = 2r and r = r 0 2 , where r 0 is the number of grid points on T 0 .On the th level, the MLMC uses the grid D × T .Up to six spatial and time grids were used in the numerical experiments.
In the context of this work, it is critical to estimate the numerical complexity of the deterministic solver w.r.t. the grid level .The most time-consuming part of the simulation is the solution of the discretized nonlinear system.Typically, it is challenging to predict the number of Newton iterations in every time step, but in the numerical experiments, two iterations were sufficient to achieve the prescribed accuracy.Accordingly, the linear solver was called at most two times per time step.Furthermore, the convergence rate of the geometric multigrid method does not depend on the mesh size (cf.[33]).Hence, the computation complexity of one time step is O(n ), where n is the number of the degrees of freedom on the grid level .Therefore, the overall numerical cost of the computation of one scenario on grid level for r time steps is 3 Multilevel Monte Carlo Various numerical methods can quantify uncertainty, and every method has pros and cons.For example, the classical MC method converges slowly and requires numerous samples.To reduce the total computing cost, we apply the MLMC method, which is a natural idea because the deterministic solver uses a multigrid method (see Section 2.3).The MLMC method efficiently combines samples from various levels.Further, we repeat the main idea of the MLMC method.A more in-depth description of these techniques is found in [13,14,27,28,34,71,44].We let ξ(ω) and g(ξ) = g(ξ(ω)) represent a vector of random variables and the QoI, respectively, where ω is a random event.The MLMC method aims to approximate the expected value E [g] with an optimal computational cost.In this work, g could be c(t, x, ξ) in the whole domain or at a point (t, x) or an integral over a subdomain.The MLMC method constructs a telescoping sum, defined over a sequence of spatial and temporal meshes, = 0, . . ., L, as described next, to achieve this goal.Moreover, g, numerically evaluated on level , is denoted by g h ,τ , or, for simplicity, by just g , where h and τ are the discretization steps in space and time on level .Further, we assume that E [g h,τ ] → E [g] as h → 0 and τ → 0.
Furthermore, s 0 is the computing cost to evaluate one realization of g 0 (the most expensive one from all realizations).Similarly, s denotes the computing cost of evaluating g − g −1 .For simplicity, we assume that s for g − g −1 is almost the same as s for g .The number of iterations is variable; thus, the cost of computing a sample of g − g −1 may fluctuate for various realizations.
For a better understanding, we consider a two-level MLMC (cf.[28]) and estimate the optimal number of needed samples on both levels.The two-level MLMC has only two meshes: a coarse one and a fine one.The QoI E [g] can be approximated on the fine mesh by E [g 1 ] and on the coarse mesh by E [g 0 ].Furthermore, where g (j) 1 − g (j) 0 := g 1 (ξ j ) − g 0 (ξ j ), ξ j is a random vector, and m 0 and m 1 represent the numbers of quadrature points (numbers of samples/realizations) on the coarse and fine meshes, respectively.The total computational cost of evaluation ( 13) is m 0 s 0 + m 1 s 1 .The variances of g 0 and g 1 − g 0 are denoted by V 0 and V 1 , and the total variance is obtained by 0 use independent samples.By solving an auxiliary minimization problem, the variance is minimal if . Thus, with the estimates of the variances and m 0 , we can estimate m 1 .
The idea presented above can be extended to a case with multiple levels.Thus, we can find (quasi-) optimal numbers of samples m 0 , m 1 , . . ., m L .The MLMC method calculates E [g L ] ≈ E [g] using the following telescopic sum: In the above equation, level in the superscript ( , i) indicates that independent samples are used at each correction level.As increases, the variance of g − g −1 decreases.Thus, the total computational cost can be reduced by taking fewer samples on finer meshes.We recall that h = h 0 • 2 −2 and τ = τ 0 • 2 − .We assume that the average cost of generating one sample of g (the cost of one deterministic simulation for one random realization) is where d = 2 is the spatial dimension, and γ = 1 is determined by the computational complexity of the deterministic solver (ug4).We let V be the variance of one sample of g − g −1 .Then, the total cost and variance of the multilevel estimator in Eq. ( 14) are L =0 m s and L =0 V /m , respectively.For a fixed variance, the cost is minimized by choosing m to minimize the following functional for some value of the Lagrange multiplier µ 2 : To determine m , we take the derivatives w.r.t.m and set them equal to zero: After solving the obtained equations, we obtain To achieve an overall variance of ε 2 , that is, we substitute m with the computed m = µ V s , and obtain From the last equation, we obtain V s , and The total computational cost is (for further analysis of this sum, see [28], p.4).

Definition 1
We let In addition, Y := L =0 Y denotes a multilevel estimator of E [g] based on L + 1 levels and m independent samples on level , where = 0, . . ., L.Moreover, Y = m −1 m i=1 (g The mean squared error (MSE) is used to measure the quality of the multilevel estimator: To obtain an MSE smaller than ε 2 , we ensure that both are smaller than ε 2 /2.Combining this idea with a geometric sequence of levels in which the cost increases exponentially with the level while the weak error E [g L − g] and multilevel correction variance V decrease exponentially leads to the following theorem (cf.Theorem 1, p. 6 in [28]): Theorem 2 We let d denote the problem dimension.Suppose positive constants α, β, γ > 0 exist such that α ≥ 1 2 min(β, γd), and Then, for any accuracy ε < e −1 , a constant c 4 > 0 and a sequence of realizations {m } L =0 exist, such that and the computational cost is This theorem (see also [37,36,11,13,27]) indicates that, even in the worst-case scenario, the MLMC algorithm has a lower computational cost than that of the traditional (single-level) MC method, which scales as O(ε −2−dγ/α ).Furthermore, in the best-case scenario presented above, the computational cost of the MLMC algorithm scales as O ε −2 .
Using preliminary tests, we can estimate the convergence rates α for the mean (the so-called weak convergence) and β for the variance (the so-called strong convergence).In addition, α is strongly connected to the order of the discretization error (see Section 2.3), which equals 1, and precise estimates of parameters α and β are crucial to distribute the computational effort optimally.

Numerical Experiments
The goal is to reduce the total computational cost of stochastic simulations.We use the MLMC method to compute the mean value of various QoIs, such as c in the whole domain, c at a point, or an integral value (we call it the freshwater integral): where I(•) is the indicator function identifying a subdomain {x : c(t, x, ω) ≤ 0.012178}, meaning the mass of the fresh water at a time t.Each simulation may contain up to n = 0.5 • 10 6 spatial mesh points and a few thousand time steps (r = 6016 on the finest mesh).

Software and parallelization:
The computations presented in this work were performed using the ug4 simulation software toolbox (https://github.com/ug4/ughub.wiki.git)[60,73].This software has been applied for subsurface flow simulations of real-world aquifers (cf.[65]).The toolbox was parallelized using MPI, and the presented results were obtained on the Shaheen II cluster provided by the King Abdullah University of Science and Technology.Every sample was computed on 32 cores of a separate cluster node.Each simulation (scenario) was localized to one node to reduce the communication time between nodes.All scenarios were concurrently computed on different nodes.A similar approach was used in [41,42].Simulations were performed on different meshes; thus, the computation time of each simulation varied over a wide range (see Table 2).
Porosity and recharge: We assume two horizontal layers: y ∈ (−0.75, 0] (the upper layer) and y ∈ [−1, −0.75] (the lower layer).The porosity inside each layer is uncertain and is modeled as in Eq. ( 24): where Additionally, the recharge flux is also uncertain and is equal to where ξ 1 , ξ 2 , and ξ 3 are sampled independently and uniformly in   We observed that the variability (uncertainty) of the mass fraction might vary from one grid point to another.At some points (dark blue regions), the solution does not change.At other points (white-yellow regions), the variability is very low or high (dark red regions).In regions with high uncertainty, refining the mesh and applying the MLMC method make sense.
Before we run the MLMC method, we first examine the solution c(t, x) at 12 preselected points (see Eq. ( 9)). Figure 4   Example.In Fig. 5, we demonstrate the probability density function (pdf) of t * (ω) = min t {t : Q F W (t, ω) < 1.2} (left), and the pdf of t * (ω) = min t {t : Q F W (t, ω) < 1.7} (right).On average, after approximately 29 time steps (on the left) and six time steps (on the right), the volume of the fresh water becomes less than 1.2 and 1.7, respectively.The initial volume of the fresh water was 2.0.Example. Figure 7 (left) displays the evolution of the pdf of c(t, x, ω) at a fixed point x = (1.85,−0.95) in time t = {3τ, . . ., 48τ }.From left to right, the farthest left (blue) pdf corresponds to t = 3τ , the second curve from the left (red) corresponds to t = 4τ , and so on.In the beginning, t = 3τ , and the mass fraction c is low, about 0.15 on average.Then, with time, c increases and, at t = T = 48τ , is approximately equal to 1. Example.The next QoI is the earliest time moment when c(t, x), at fixed x = (1.85,−0.95), becomes smaller than the threshold value 0.9 (maximum is 1.0).Figure 7 (right) presents its pdf.On average, after t ≈ 10 time steps, the mass fraction becomes smaller than 0.9, but 40 time steps are needed in some scenarios.Next, we research how g − g −1 depends on the time and level.All graphics in Fig. 8 display 100 realizations of the differences between solutions computed on two neighbor meshes for every time point t i , i = 1 . . .48 (along the x-axis).The top left graphic indicates the differences between the mass fractions computed on Levels 1 and 0. The other graphics reveal the same, but for Levels 2 and 1, 3 and 2, 4 and 3, and 5 and 4, respectively.The largest value decreases from 2.5 • 10 −2 (top left) to 5 • 10 −4 .Considerable variability is observed for t ∈ [3,7] and t ∈ [8,25].Starting with t ≈ 30, the variability between solutions decreases and stabilizes.From these five graphics, we can estimate that the maximal amplitude decreases by a factor ≈ 2, at 0.015, 0.008, 0.004, 0.0015, and 0.0008.However, it is challenging to make a similar statement about each time point t.This observation makes it difficult to estimate the weak and strong convergence rates and the optimal number of samples correctly on each mesh level.They are different for each time t (and for each x).For some time points, the solution is smooth and requires only a few levels and a few samples on each level.For other points with substantial dynamics, the numbers of levels and samples are higher.Because g − g −1 is random, we visualize its mean and variance.Figure 9 demonstrates the mean (left) and variance (right) of the differences in concentrations g − g −1 , = 1, . . ., 5. On the left, the amplitude decreases when increases.A slight exception is the blue line for t ≈ 9, 10, 11 (right).A possible explanation is that the solutions g 0 or g 1 are insufficiently accurate.The right image presents how the amplitude of the variances decays.This decay is necessary for the successful work of the MLMC method.We also observe a potential issue; the weak and strong convergence rates vary for various time points t.Thus, determining the optimal number of samples m for each level is not possible (only suboptimal).
At the beginning t = 0, the variability is zero and starts to increase.We observe changes during a specific time interval, and then the process starts to stabilize after ≈ 45 time steps.The variability is either unchanging from level to level or decreases.Table 2 contains average computing times, which are necessary to estimate the number of samples m at each level .The fourth column contains the average computing time, and the fifth and sixth columns contain the shortest and longest computing times.The computing time for each simulation varies depending on the number of iterations, which depends on the porosity and permeability.We observed that, after ≈ 6016 s, the solution is almost unchanging; thus, we restrict this to only t ∈ [0, T ], where T = 6016.For example, if the number of time steps is r = 188 (Level 0 in Table 2), then the time step τ = T r = 6016 188 = 32 s.The time step τ is adaptive and changing from τ = 6016 128 = 32 s (very coarse mesh) to τ = 6016 6016 = 1 s (finest mesh).Starting with level = 2, the average time increases by a factor of eight.These numerical tests confirm the theory in Eq. ( 12), stating that the numerical solver is linear w.r.With estimates for each level, we can estimate the rates of α and β (Eqs.(21a)-(21b)) in weak and strong convergences.
The slope in Fig. 10 can be used to estimate the rates of the weak (left) and strong (right) convergences.The differences are indicated on the horizontal axis.We use computed variances V and computing times (work) s from Table 2 to estimate the optimal number of samples m and compute the telescopic sum from Eq. ( 15) to approximate the expectation.
Table 3 lists m for a given total variance Table 3: Number of samples m computed using Eq. ( 18) as a function of the total variance 2 .
After the telescopic sum is computed, we can compare the results with the QMC results.Figure 11 depicts the decay of the absolute (left) and relative (right) errors vs. levels along the x-axes.The 'true' solution was computed using the QMC method on the finest mesh level L = 5.

Conclusion
We investigated the applicability and efficiency of the MLMC approach for the Henry-like problem with uncertain porosity, permeability, and recharge.These uncertain parameters were modeled by random fields with three independent random variables.The numerical solution for each random realization was obtained using the well-known ug4 parallel multigrid solver.The number of required random samples on each level was estimated by computing the decay of the variances and computational costs for each level.These estimates depend on the minimization function in Eq. (17).
We also computed the expected value and variance of the mass fraction in the whole domain, the evolution of the pdfs, the solutions at a few preselected points (t, x), and the time evolution of the freshwater integral value.We have found that some QoIs require only 2-3 of the coarsest mesh levels, and samples from finer meshes would not significantly improve the result.Note that a different type of porosity in Eq. ( 24) may lead to a different conclusion.
The results show that the MLMC method is faster than the QMC method at the finest mesh.Thus, sampling at different mesh levels makes sense and helps to reduce the overall computational cost.Limitations.1.It may happen that the QoIs computed on different grid levels are the same (for the given random input parameters).In this case the standard (Q)MC on a coarse mesh will be sufficient.2. The time dependence is challenging.The optimal number of samples depends on the point (t, x) and may be small for some points and large for others.3. Twenty-four hours may not be sufficient to compute the solution at the sixth mesh level.Future work.Our model of porosity in Eq. ( 24) is quite simple.It would be beneficial to consider a more complicated/multiscale/realistic model with more random variables.A more advanced version of MLMC may give better estimates of the number of levels L and the number of samples on each level m .Another hot topic is data assimilation and the identification of unknown parameters [40,43,48,62].Known experimental data and measurements of porosity, permeability, velocity or mass fraction could be used to minimise uncertainties.
includes 12 subfigures.Each subfigure presents 600 QMC realizations of c(t, x) and five quantiles depicted by dotted lines.The dotted line at the bottom indicates the quantile 0.025.The following dotted line is the quantile 0.25, and the dotted line on the top indicates the quantile 0.975.All five quantiles from the bottom to the top are 0.025, 0.25, 0.50, 0.75, and 0.975, respectively.We observe that c at the final point t = T varies considerably.

Figure 5 :
Figure 5: The pdf of the earliest time point when the freshwater integral Q F W becomes smaller than 1.2 (left) and 1.7 (right).The x-axis represents time points.

Figure 11 :
Figure 11: Decay of the absolute (left) and relative (right) errors between the mean values computed on a fine mesh via QMC and on a hierarchy of meshes via MLMC at the fixed point (t, x, y) = (12, 1.60, −0.95).x-axis contains the mesh levels.

Table 1 :
Parameters of the considered density-driven flow problem

Table 2 :
t. n and r .Number of degrees of freedom n , number of time steps r , step size in time τ , average, minimal, and maximal computing times on each level .