Effects of non-Gaussian copula-based hydraulic conductivity fields on macrodispersion

Authors


Abstract

[1] This is an application of a spatial copula model that is fitted to a real world data set. The copula model allows modeling of pure spatial dependence independently of the marginal distribution. Using non-Gaussian copula models, it is demonstrated that the spatial dependence structure of the Borden aquifer is significantly non-Gaussian, despite the fact that the Borden aquifer is commonly thought of as a relatively homogeneous porous medium with a small variance of hydraulic conductivity. In addition to evaluating the spatial dependence structure of the Borden hydraulic conductivity data set using copulas, a goal of this study is to explore whether the structure of the hydraulic conductivity field influences a physical property, such as plume evolution as evaluated by second spatial central moments of concentration fields. For this comparison, two types of hydraulic conductivity fields were fitted to the Borden hydraulic conductivity data set: one with a Gaussian and the other with a non-Gaussian type of dependence. These two types of hydraulic conductivity fields were constructed such that their second-order spatial moments are identical, and hence they cannot be distinguished by semivariogram-based geostatistics. This paper illustrates that the spatial dependence structure of the Borden hydraulic conductivity data set is significantly non-Gaussian. Despite the fact that Borden is a relatively homogeneous porous medium, and despite the fact that the two types of spatial fields are not distinguishable by their variograms, the solute transport characteristics based on these two types of isotropic fields differ significantly in two-dimensional settings. The difference is less pronounced in three dimensions with anisotropy. It is postulated that non-Gaussian spatial dependence of hydraulic conductivity and a more skewed marginal distribution of hydraulic conductivity will have significant implications in other, more heterogeneous aquifers.

1. Introduction

[2] Over the last 30 years considerable progress has been made in incorporating the spatial variability of hydraulic conductivity fields, and the effects of this variability such as the variability of the groundwater velocity field, into stochastic models of solute transport (among many others: Dagan [1982, 1984, 1988, 1990]; Gelhar and Axness [1983]; Neuman and Zhang [1990]; Rubin [1990]; Burr et al. [1994]). In all these approaches, the spatial variability of the hydraulic conductivity field has been modeled with a Gaussian dependence structure. Among others, Gómez-Hernández and Wen [1998] pointed out the importance of non-Gaussian spatial structures for evaluating the above mentioned effects of spatial variability, in their case on groundwater travel times. Approaches exist for modeling non-Gaussian structures, for example via training images [Strebelle, 2002] or transition probabilities [Carle and Fogg, 1996]. Zinn and Harvey [2003] were able to model one type of non-Gaussian spatial dependence using a Chi-Square transformation, not a full copula model. This paper presents an approach where the dependence model, a multidimensional spatial copula, is fitted to the spatial dependence structure of real world data.

[3] Spatial copulas as introduced by Bárdossy [2006] provide full stochastic models to represent spatial dependence structures independently of marginal distributions. The distribution that fits best to the observed data can be used, and it can be nonsymmetric. Log-transformations, commonly used to ensure a symmetric marginal distribution of hydraulic conductivity (K) in variogram-based geostatistics, are not necessary. In this work, two copula models are fitted to a dependence structure based on a real world K data set from a well characterized field site located at the C.F.B. Borden, Canada [Sudicky, 1986]: a Gaussian and a non-Gaussian copula. Both are designed such that they are not distinguishable by second-order moments (their spatial covariance functions) and they have identical marginal distributions. The impacts of these two types of spatial dependence structures of K on solute transport behavior are tested in a series of detailed numerical tracer tests, evaluated using a Monte Carlo approach.

[4] A numerical tracer test is a tool used to evaluate the migration of solutes in heterogeneous aquifers [Frind et al., 1987; Smith and Schwartz, 1980; Naff et al., 1998]. A slug of conservative tracer is instantaneously injected and transported within a steady state flow field. The resulting concentration field is recorded and its spatial moments analyzed. By classical theory [Bear, 1972; Gelhar and Axness, 1983], the slope of the linear portion of the second central spatial moment of a concentration field can be related to the dispersion coefficient. The advective spreading of a solute migrating in the subsurface is influenced by the groundwater velocity field, which in turn is directly impacted by the spatial distribution of K. Due to this spreading, portions of the plume advance more rapidly than the average velocity in some zones, while in other zones migration rates are slower than the average velocity. This spreading phenomenon is commonly referred to as macrodispersion. Here, solute spreading is examined in both two- and three-dimensional high-resolution K fields.

[5] The evaluation of the solute migration is performed in a Monte Carlo style, thus enabling a statistical and hence more conclusive evaluation of the spreading process. Many realizations of K fields were generated for each type of spatial dependence of K, and for each K field, a numerical tracer experiment was conducted. For the purpose of this paper, local dispersion is reduced as much as numerically possible and the modeling domain is finely discretized in order to avoid numerical dispersion and oscillations.

[6] In the Borden aquifer, one of the most extensively studied aquifers of the world, hydraulic conductivity measurements were performed in great detail [Sudicky, 1986]. The Borden aquifer is considered to be relatively homogeneous with a multi-Gaussian spatial dependence structure. Traditional geostatistical analyses have been conducted on this data set, most notably by Sudicky [1986] and by Woodbury and Sudicky [1991]. Sudicky [1986] examined 32 cores spaced 1 m apart along two orthogonally intersecting lines AA' and BB', and subsampled each 1.75 m long core in 0.05 m intervals. A permeameter test was conducted on each subsample, resulting in 1152 measurements of K.

[7] The idea behind this paper is that if a non-Gaussian spatial dependence structure of K, as modeled by a fitted copula, leads to a solute transport behavior different from that obtained under Gaussian K or ln(K) assumptions, then this will have implications for solute transport in most other aquifers. Furthermore, if such deviations are found, other hydrogeological parameters that depend on the spatial structure of K could be impacted.

[8] The objective of this paper is twofold: (1) to use copulas as a non-Gaussian stochastic model to simulate spatially correlated random fields of K based on real world data, and (2) to analyze the effects of non-Gaussian dependence on solute plume evolution within such K fields.

[9] The organization of this paper is as follows. As a first step, the Borden data are analyzed geostatistically with copula-based methods and compared with traditional methods. A theoretical copula function is fitted to the data (section 2). This theoretical copula model of spatial dependence is used for the simulation of K fields, which are subsequently used as input for a steady state flow and transient solute transport model (section 3). The characteristics of the solute transport plume are analyzed (section 4) and conclusions are made (section 5).

2. Geostatistics of the Borden Aquifer Using Copulas

[10] In this section, a geostatistical analysis of the K data of the Borden aquifer is performed using copulas. Woodbury and Sudicky [1991] reported that outliers of low K found in the data set caused difficulties in estimating the variogram and led to nonlinear adjustments to fit an exponential theoretical variogram; they attributed the outliers to small lenticular lenses of low hydraulic conductivity material. Either a normal distribution or an exponential distribution was fitted to log K measurements. It will be seen that such adjustments, which remove extreme values from a data set, are not necessary when using copulas. In fact, when using copulas, the marginal distribution, which can include outliers, is treated independently of the spatial structure of K. Any distribution function, parametric or nonparametric, can be fitted to the sample of measurements. The distribution function that fits best to the data should be chosen to represent the data.

2.1. From Classical Geostatistics to a Copula Model

[11] Commonly, spatial dependence structures are modeled by a semivariogram $\gamma(h)$ (equation (1)) or a covariance function $\mathrm{cov}(h)$ (equation (2)), where h is the separation vector between two measurement locations and $z(x_i)$ is the value of the measurement at location xi. Hence, both $\gamma(h)$ and $\mathrm{cov}(h)$ are average variances of the measurement values for all pairs of points which are separated by a distance of h. The arithmetic average of the marginal distribution is denoted as $\bar{z}$, and the number of pairs for a certain separation distance is denoted as $n(h)$.

$$\gamma(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[ z(x_i) - z(x_i + h) \right]^2 \qquad (1)$$
$$\mathrm{cov}(h) = \frac{1}{n(h)} \sum_{i=1}^{n(h)} \left[ z(x_i) - \bar{z} \right] \left[ z(x_i + h) - \bar{z} \right] \qquad (2)$$
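
The two averaged measures can be illustrated with a minimal sketch, assuming a regularly spaced one-dimensional transect so that the separation is an integer number of sampling intervals. The function names and the sample values are hypothetical and are not taken from the Borden data set.

```python
def semivariogram(z, lag):
    """Empirical semivariance: half the mean squared difference of all pairs at this lag."""
    pairs = [(z[i], z[i + lag]) for i in range(len(z) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2.0 * len(pairs))


def covariance(z, lag):
    """Empirical covariance at this lag, centred on the arithmetic mean of the sample."""
    mean = sum(z) / len(z)
    pairs = [(z[i], z[i + lag]) for i in range(len(z) - lag)]
    return sum((a - mean) * (b - mean) for a, b in pairs) / len(pairs)


# Hypothetical K values (arbitrary units) along a transect, not Borden measurements.
k_values = [9.1, 8.7, 10.2, 11.5, 9.8, 7.4, 6.9, 8.8, 12.1, 10.6]
for lag in (1, 2, 3):
    print(lag, round(semivariogram(k_values, lag), 3), round(covariance(k_values, lag), 3))
```

Both functions collapse all pairs at one lag into a single number, which is why they cannot express a different strength of dependence for high and low values.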

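[12] Another possibility to model spatial dependence is the indicator variogram. Indicator variables $I(x; z_c)$ are defined using a cutoff value $z_c$ (equation (3)). The form of an indicator variogram is given in equation (4). Extreme values or skewed marginal distributions impact an indicator variogram less than a covariance function.

$$I(x; z_c) = \begin{cases} 1 & \text{if } z(x) \le z_c \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$
$$\gamma_I(h; z_c) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[ I(x_i; z_c) - I(x_i + h; z_c) \right]^2 \qquad (4)$$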

[13] For a Gaussian or symmetric spatial dependence, the indicator variogram corresponding to a threshold at the quantile q should be identical to the indicator variogram corresponding to a threshold at the quantile $1 - q$. Figure 1 shows empirical indicator variograms for the quantiles math formula and math formula, all of which are different, strongly indicating non-Gaussian dependence.

Figure 1.

Indicator variograms corresponding with thresholds chosen according to different quantiles based on the Borden K data set in horizontal direction.
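
The comparison of indicator variograms at complementary quantiles can be sketched as follows. This is an illustration only: it assumes a regularly spaced transect, a simple order-statistic threshold without interpolation, and hypothetical sample values and quantiles (0.2 and 0.8), none of which are taken from the paper.

```python
def indicator(z, cutoff):
    """Indicator variable: 1 where the value does not exceed the cutoff, else 0."""
    return [1 if value <= cutoff else 0 for value in z]


def indicator_variogram(z, cutoff, lag):
    """Empirical indicator semivariance for an integer lag."""
    ind = indicator(z, cutoff)
    n_pairs = len(ind) - lag
    return sum((ind[i] - ind[i + lag]) ** 2 for i in range(n_pairs)) / (2.0 * n_pairs)


def empirical_quantile(z, q):
    """Order statistic used as threshold for quantile q (no interpolation)."""
    ordered = sorted(z)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


# Hypothetical K values (arbitrary units), not Borden measurements.
k_values = [9.1, 8.7, 10.2, 11.5, 9.8, 7.4, 6.9, 8.8, 12.1, 10.6, 3.2, 3.5]
for q in (0.2, 0.8):
    cutoff = empirical_quantile(k_values, q)
    print(q, cutoff, [round(indicator_variogram(k_values, cutoff, lag), 3) for lag in (1, 2, 3)])
```

Under a symmetric dependence the two printed variograms would agree up to sampling noise; a systematic difference is the kind of signal discussed for Figure 1.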

[14] Note that the discussion related to indicator variograms is used here to demonstrate that a nonsymmetric type of dependence is seen in the data. However, the subsequent analysis is not based on indicator variograms but on much more general models using copulas, which can express a different degree of dependence in different quantiles. There is a relation between a theoretical copula model and indicator variograms [Bárdossy, 2006].

[15] Instead of these averaged measures described so far, the structure of spatially distributed parameters can be analyzed and modeled using copulas. Any multivariate distribution $F(t_1, \ldots, t_n)$ can be represented with a copula C [Sklar, 1959]:

$$F(t_1, \ldots, t_n) = C\left( F_1(t_1), \ldots, F_n(t_n) \right) \qquad (5)$$

where $F_i(t_i)$ represents the ith one-dimensional marginal distribution of the multivariate distribution.
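
In practice the marginal is removed by replacing each value with its empirical distribution value. A minimal sketch, assuming the common rank/(n + 1) convention (an assumption, not necessarily the convention used in this study) and hypothetical sample values, shows that the result does not change under a monotonic transformation such as the logarithm:

```python
import math


def pseudo_observations(z):
    """Map values to (0, 1) by rank / (n + 1); ties are broken by order of appearance."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    u = [0.0] * len(z)
    for rank, index in enumerate(order, start=1):
        u[index] = rank / (len(z) + 1.0)
    return u


# Hypothetical K values (arbitrary units), not Borden measurements.
k_values = [9.1, 0.4, 10.2, 28.0, 9.8, 7.4]
u_from_k = pseudo_observations(k_values)
u_from_log_k = pseudo_observations([math.log(k) for k in k_values])
print(u_from_k == u_from_log_k)
```

Because only the ordering of the values enters, a log-transformation of K leaves the copula-space values, and hence the dependence analysis, unchanged.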

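[16] Assuming that C is continuous, then the copula density $c(u_1, \ldots, u_n)$ can be written as

$$c(u_1, \ldots, u_n) = \frac{\partial^n C(u_1, \ldots, u_n)}{\partial u_1 \cdots \partial u_n} \qquad (6)$$

A bivariate copula density expresses a symmetrical dependence with respect to the minor axis $u_2 = 1 - u_1$ of the unit square, if

$$c(u_1, u_2) = c(1 - u_1, 1 - u_2) \qquad (7)$$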

[17] A Gaussian copula models this symmetric dependence; a family of non-Gaussian copulas representing nonsymmetrical dependence was introduced by Bárdossy [2006] and extended by Bárdossy and Li [2008]. Another overview with a focus on uncertainty estimates related to copula-based interpolation is given by Haslauer et al. [2008]. For a detailed explanation of how to analyze the spatial dependence structure using empirical bivariate copula densities and for estimating the parameters of a copula with a non-Gaussian dependence structure, the interested reader is referred to the above mentioned publications.

[18] Two measures that summarize information from the bivariate copula densities over multiple separation distances are the copula-based rank correlation (“Rank”, equation (8)), and the asymmetry of an empirical bivariate copula density (“Sym”, equation (9)). Both are measures that show certain aspects of the data and of simulated fields. Their properties are explained in the following paragraphs. Neither of the two measures is a model that might be used for simulation.

[19] “Rank” is a measure for the degree of the spatial dependence. For a pair of points with a short separation distance, the rank correlation is expected to be very high, that is close to unity. Its value will decrease to a value close to zero for large separation distances. The copula rank correlation is a measure comparable to the semivariogram or the spatial covariance function. It expresses the correlation coefficient of empirical bivariate copula densities for varying separation distances. The distance where the copula rank correlation reaches zero is effectively the range of traditional geostatistics [Bárdossy and Li, 2008].

[20] “Sym” is a measure for the symmetry of the empirical copula density function, indicating in which range of quantiles the density is strongest (equation (9)). High positive symmetry values indicate strong dependence for high quantiles and high negative symmetry values indicate strong dependence for low quantiles. A Gaussian-type dependence structure is fully symmetric in the sense of equation (7), and its measure of symmetry is zero. Gaussian dependence is most pronounced for extremes and equally strong in the extreme high and in the extreme low values. In a spatially distributed K field with Gaussian dependence structure, both the highest-K and lowest-K values have the strongest dependence and hence form isolated patches. The more positive the copula symmetry is, the higher the degree of dependence for high quantiles, the more patches of high-K zones form, and the higher the likelihood that connected and continuous zones of low K are present. Similarly, the more negative the copula symmetry, the higher the degree of dependence for low quantiles, and the higher the chance for presence of connected high-K zones that could rapidly advect solutes.

[21] Each of these measures is calculated for a given value and/or direction of the separation vector h. $F(z(x_i))$ is the value of the empirical distribution of the measurements z at location xi.

display math
display math
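
The behavior of the two measures can be explored with a minimal sketch. It assumes a Spearman-type rank correlation, 12 times the mean of (u − 1/2)(v − 1/2), and a third-order asymmetry, the mean of (u − 1/2)(v − 1/2)[(u − 1/2) + (v − 1/2)], in the spirit of Bárdossy [2006]; these forms are assumptions for illustration and may differ in detail from equations (8) and (9). Here u and v are the empirical distribution values at the two ends of a pair, and all function names are hypothetical.

```python
def lagged_pairs(u, lag):
    """Pairs of copula-space values separated by an integer lag on a regular transect."""
    return [(u[i], u[i + lag]) for i in range(len(u) - lag)]


def rank_correlation(pairs):
    """Assumed Spearman-type measure: close to 1 for strong dependence, 0 for none."""
    return 12.0 * sum((u - 0.5) * (v - 0.5) for u, v in pairs) / len(pairs)


def asymmetry(pairs):
    """Assumed third-order measure: positive if dependence is stronger in high quantiles."""
    return sum((u - 0.5) * (v - 0.5) * ((u - 0.5) + (v - 0.5)) for u, v in pairs) / len(pairs)


# Hypothetical pairs: tight agreement in high quantiles, loose agreement in low quantiles.
pairs = [(0.90, 0.92), (0.85, 0.88), (0.95, 0.93), (0.10, 0.40), (0.20, 0.05), (0.15, 0.35)]
flipped = [(1.0 - u, 1.0 - v) for u, v in pairs]   # inverted dependence
print(round(rank_correlation(pairs), 3), round(asymmetry(pairs), 4), round(asymmetry(flipped), 4))
```

Reflecting the pairs about the center of the unit square leaves the rank correlation unchanged and reverses the sign of the asymmetry, which is why the two measures carry complementary information.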

[22] The spatial dependence structure was analyzed by plotting the measures “Sym” (equation (9)) and “Rank” (equation (8)) over varying separation distances, which are compared with traditional measures, the semivariogram (equation (1)) and the covariogram (equation (2)) (see section 2.1). Figures 2 and 3 show these four measures in the horizontal and in vertical direction, respectively.

Figure 2.

Measures of spatial dependence (semivariance γ [(cm s$^{-1}$)$^2$], covariance “cov” [(cm s$^{-1}$)$^2$], rank correlation r [−], and symmetry “sym” [−]) over separation distance in horizontal direction for the Borden K data set.

Figure 3.

Measures of spatial dependence (semivariance γ [(cm s$^{-1}$)$^2$], covariance “cov” [(cm s$^{-1}$)$^2$], rank correlation r [−], and symmetry “sym” [−]) over separation distance in vertical direction for the Borden K data set.

[23] The semivariograms compare well with the traditional variogram published by Woodbury and Sudicky [1991]. There is no visible difference between the covariogram and copula rank correlation, which is attributable to the fairly symmetric marginal distribution. As expected from the empirical bivariate copula densities, the rank correlation is slightly higher for short separation distances in the vertical direction (∼0.7 m) than in the horizontal direction ( math formula). From the copula-based rank correlations, the anisotropy ratio in the horizontal-to-vertical integral scales is found to be math formula, which is slightly smaller than the previously estimated value of math formula [Woodbury and Sudicky, 1991].

[24] For short separation distances, the measure “Sym” is positive in both directions, reaching zero, and becoming negative in the horizontal direction. This means that for separation distances larger than ∼0.3 m in the vertical direction, the strongest dependence is in the low quantiles. This property is ignored here, because to our knowledge no copula model exists that can describe a behavior corresponding to a sign-change in the measure “Sym”. However, the error introduced is considered to be small because the important factor influencing microstructure is the fit between theoretical and empirical copula for short separation distances.

[25] From analyzing the semivariograms, covariograms and rank correlations separately for both cross sections AA' and BB' and jointly (red, blue, and green on Figures 2 and 3, respectively), it is evident that the joint behavior is a mixture of results from both cross sections. This phenomenon has been observed in the past (section 1, Woodbury and Sudicky [1991]) and is much more pronounced in the measure “Sym.”

[26] A purely Gaussian spatial dependence structure would require the symmetry measure to be zero, which was not observed. However, this deviation from zero could be by chance. Hence, a simple bootstrap method was employed: 500 data from the Borden data set were simulated using the Gaussian copula fitted to the Borden K data, and each time the symmetry was calculated for a given set of separation distances, both in vertical and in horizontal directions. The shaded areas on the symmetry distance diagrams provided on the bottom panels of Figures 2 and 3 indicate the zones within which, with 90% confidence, the symmetry should occur if the spatial dependence structure is Gaussian. For small separation distances in both horizontal and vertical directions, the symmetry of the Borden data set is outside of these bounds, indicating a 90% significant non-Gaussian spatial dependence in the Borden data set for short separation distances.
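
The logic of such a test can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it draws independent pairs from a bivariate Gaussian copula with a hypothetical correlation instead of simulating spatially correlated data at the measurement locations, and it assumes a third-order asymmetry measure in the spirit of Bárdossy [2006]. All names and numbers are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """Standard normal distribution function, mapping Gaussian values to copula space."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_copula_pairs(n_pairs, rho, rng):
    """Independent pairs (u, v) from a bivariate Gaussian copula with correlation rho."""
    pairs = []
    for _ in range(n_pairs):
        a = rng.gauss(0.0, 1.0)
        b = rho * a + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((normal_cdf(a), normal_cdf(b)))
    return pairs


def asymmetry(pairs):
    """Assumed asymmetry measure; its expectation is zero under Gaussian dependence."""
    return sum((u - 0.5) * (v - 0.5) * ((u - 0.5) + (v - 0.5)) for u, v in pairs) / len(pairs)


def symmetry_band(n_pairs, rho, n_boot=500, level=0.90, seed=0):
    """Central interval of the asymmetry over n_boot Gaussian-copula samples."""
    rng = random.Random(seed)
    values = sorted(asymmetry(gaussian_copula_pairs(n_pairs, rho, rng)) for _ in range(n_boot))
    lower = values[round((1.0 - level) / 2.0 * n_boot)]
    upper = values[round((1.0 + level) / 2.0 * n_boot) - 1]
    return lower, upper


low, high = symmetry_band(n_pairs=300, rho=0.6)
observed = 0.004  # hypothetical observed asymmetry at one separation distance
print(low, high, "outside band" if not (low <= observed <= high) else "inside band")
```

An observed value outside the band would be read as evidence against Gaussian dependence at that separation distance, at the chosen confidence level.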

2.2. Copula Models for the Borden K Data Set

[27] Copulas are capable of modeling a different degree of dependence for different quantiles, and thus contain more information compared to an average variance for a given lag distance, as in semivariograms. The different degree of dependence can be visualized best by examining bivariate copula density functions. Figure 4 shows two bivariate copula densities for two different separation distances both in horizontal (Figure 4a) and in vertical direction (Figure 4b). For larger separation distances, the structure of the bivariate copulas flattens out, ideally reaching a flat copula density at a separation distance corresponding to the “range” in traditional geostatistics. For shorter separation distances, the empirical bivariate copula densities show that the highest values of K, which are scaled to be close to unity in the copula-space unit square, do not have the same degree of dependence as the lowest values of K, which would be expected if the spatial dependence structure were Gaussian. Rather, the highest degree of dependence occurs in the highest quantiles, both in the vertical and horizontal directions. This means, for a pair of points separated by a short separation distance, that the probability of encountering another high-K value at the second point is high if a high-K value was also measured at the first point. There are two main differences between the horizontal and the vertical direction: First, the required separation distance to reach a flat copula structure is longer in the horizontal direction (anisotropy). Second, the maximum correlation is weaker in the horizontal direction.

Figure 4.

Bivariate copula densities for different separation distances and directions. Shown are (a) empirical horizontal, (b) empirical vertical, (c) corresponding theoretical bivariate copula densities.

[28] One possibility to model non-Gaussian dependence is to transform a Gaussian dependence structure nonmonotonically. Such a nonmonotonic transformation can be the “v transformation” as shown by Bárdossy and Li [2008], which transforms a standard normally distributed function Y into a nonsymmetric distribution X using two parameters m and k (equation (10)).

display math
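
A minimal sketch of such a nonmonotonic transformation is given below. It assumes a piecewise-linear “v” shape with a kink at m, a slope of k above the kink, and a slope of −1 below it; this form is an assumption made for illustration and should be checked against Bárdossy and Li [2008]. The parameter values are hypothetical and are not the fitted Borden values.

```python
import random


def v_transform(y, m, k):
    """Assumed v-shaped transformation: k * (y - m) above the kink at m, (m - y) below it."""
    return k * (y - m) if y >= m else m - y


rng = random.Random(42)
m, k = 1.0, 2.5   # hypothetical parameters, not the fitted Borden values
y_sample = [rng.gauss(0.0, 1.0) for _ in range(10000)]
x_sample = [v_transform(y, m, k) for y in y_sample]

# The transformed variable is nonnegative, and two different y values can map to the same x.
print(min(x_sample) >= 0.0, v_transform(0.0, m, k), v_transform(1.4, m, k))
```

Because values on both sides of the kink are folded onto the same range, the dependence of the transformed field differs between high and low quantiles even though the underlying field is Gaussian.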

[29] A theoretical copula model was fitted to the entire Borden data set using the maximum likelihood based approach described by Bárdossy and Li [2008]. Figure 5 shows the values of the optimized likelihood for various (m, k) pairs. The maximum likelihood was obtained for the combination math formula.

Figure 5.

Contour plot of the log-likelihood function L that was optimized for various parameter sets of (m, k).

[30] The simulations for the two types of spatial K fields, Gaussian and non-Gaussian, were chosen such that the two types cannot be distinguished by their second-order moments. This means they are identical in their rank correlations, which in the Borden case means, for practical purposes, that their variograms are identical. Figure 6 shows that the rank correlation based on the measurements (black lines) is identical to the average rank correlation for all of the Gaussian simulations (thick blue lines), and to the average of the non-Gaussian simulations (thick red lines). The thin blue and red lines indicate the rank correlations of each simulation. The underlying structure of both the Gaussian and the non-Gaussian fields is a superposition of common variogram types (nugget, exponential, spherical, Gaussian). There is strong agreement between empirical, Gaussian, and non-Gaussian semivariograms (Figure 7); similar agreement was observed between rank correlations.

Figure 6.

Rank correlation based on the measurements (black line), average rank correlation of 200 Gaussian simulations (thick blue line), and average rank correlation of the 200 non-Gaussian simulations (thick red lines). The thin blue and red lines indicate the rank correlation of each simulation.

Figure 7.

Semivariograms based on the measurements (black line), average semivariogram of 200 Gaussian simulations (thick blue line), and average semivariogram of the 200 non-Gaussian simulations (thick red lines). The thin blue and red lines indicate the semivariograms of each simulation.

[31] The best fitting theoretical correlation structure to the empirical rank correlation for the Gaussian fields is a superposition of multiple variograms given in the form “sill · type (range)”: 0.35 · nugget(0.0) + 0.33 · exponential(3.4) + 0.32 · gaussian(9.4) (section 2.2), which corresponds generally well with the model used by Sudicky [1986] and has an exponential component as stipulated by Ritzi and Allen-King [2007]. This structure has a relatively high nugget of 0.35, which is math formula higher than the one found by Sudicky [1986]. However, here the spatial structure is fitted to the empirical rank correlation, whereas earlier the structure was fitted to a standardized covariogram that did not include outliers. A larger nugget implies a larger error term to the spatial field and would reduce any effects the spatial field of K might have on solute transport.
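
The nested structure quoted above can be evaluated with a short sketch. The sills and ranges are those given in the text; the functional forms are assumptions for illustration, namely exp(−h/a) for the exponential component and exp(−(h/a)²) for the Gaussian component without any practical-range scaling factor, with h in the same length unit as the ranges. The function name is hypothetical.

```python
import math


def nested_correlation(h, nugget=0.35, c_exp=0.33, a_exp=3.4, c_gau=0.32, a_gau=9.4):
    """Correlation at separation h for nugget + exponential + Gaussian components."""
    if h == 0.0:
        return nugget + c_exp + c_gau          # all components contribute at zero lag
    exponential = c_exp * math.exp(-h / a_exp)
    gaussian = c_gau * math.exp(-((h / a_gau) ** 2))
    return exponential + gaussian              # the nugget drops out for any h > 0


for h in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(h, round(nested_correlation(h), 3))
```

The drop from 1 at zero separation to at most 0.65 for any positive separation is the effect of the nugget of 0.35 discussed above.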

[32] For the non-Gaussian K fields, a Gaussian structure had to be found such that their structure after the v transformation would be identical to the structure of the Gaussian fields and to the structure of the Borden aquifer. Such an involved procedure of generating fields whose spatial rank correlations are identical was deemed necessary because the results of the solute transport analysis are only then comparable between the different types of spatial dependence structures. The correlation structure math formula with mean math formula of the “underlying” field X is normal math formula (index “ug” for “underlying Gaussian”) and needs to have a spatial structure defined by a covariance structure such that the new correlation structure after the transformation, $R_v$, fits to the empirical correlation structure of the Borden data set. This is expressed in equation (11): A function f is required that maps the spatial dependence of the underlying Gaussian field, $R_{ug}$, to the correlation of the field with the non-Gaussian dependence, $R_v$, and that considers the two parameters of the v-copula, m and k. With the help of equation (12), Gaussian correlations $\rho$ can be converted to rank correlations R [Joe, 1997]. The best fitting correlation structure for the underlying Gaussian field ($R_{ug}$) that was used for the v-transformation is math formula. This part of the methodology assures that the spatial second-order moments of the Gaussian K fields and the fitted v copula-based K fields are identical. Only then does a comparison of solute transport characteristics make sense.

$$R_v = f\left( R_{ug}, m, k \right) \qquad (11)$$
$$R = \frac{6}{\pi} \arcsin\left( \frac{\rho}{2} \right) \qquad (12)$$
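
The conversion between Gaussian and rank correlations can be sketched with the standard relation for the bivariate normal distribution, R = (6/π) arcsin(ρ/2), together with its inverse. The function names are hypothetical.

```python
import math


def rank_from_gaussian(rho):
    """Spearman rank correlation of a bivariate normal pair with Pearson correlation rho."""
    return 6.0 / math.pi * math.asin(rho / 2.0)


def gaussian_from_rank(r):
    """Inverse relation: Pearson correlation that produces the rank correlation r."""
    return 2.0 * math.sin(math.pi * r / 6.0)


for rho in (0.0, 0.3, 0.6, 0.9, 1.0):
    r = rank_from_gaussian(rho)
    print(rho, round(r, 4), round(gaussian_from_rank(r), 4))
```

The two correlations differ only slightly over the whole range, so a structure fitted to rank correlations stays close to the corresponding Gaussian correlation structure.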

[33] An example of simulated K fields for the Gaussian dependence structure and for the non-Gaussian v copula-based dependence structure is shown on Figures 8 and 9, respectively. Both types of fields do not seem to differ by much visually, and their rank correlations are identical, although they are significantly different statistically. The Gaussian field has the strongest and equally strong dependence in the extreme highs and extreme lows. In spatially distributed fields, this dependence results in isolated patches with such extreme values. In the v copula-based fields, the dependence is less strong in both extremes, but still highest in the high quantiles. The connected zones occur for medium to low values. Anisotropy leads in both cases to layered structures.

Figure 8.

Example of a simulated Gaussian spatial dependence structure fitted to the Borden data set.

Figure 9.

Example of a simulated v copula spatial dependence structure fitted to the Borden data set.

2.3. Marginal Distribution of Hydraulic Conductivity

[34] One advantage of using copulas for spatial statistics is that the type of marginal distribution is arbitrary. It could even be a nonparametric distribution. In this paper, the marginal distribution of the 1152 measurements of K was described using a Weibull distribution function (equation (13)) with the parameters math formula and math formula which fitted best in a maximum likelihood sense among a variety of types of distribution functions. This is the distribution selected to produce the realizations of the K fields used in the simulations.
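
The separation of dependence and marginal can be illustrated by mapping copula-space values in (0, 1) to K through the inverse of the marginal distribution. The sketch below assumes the common two-parameter Weibull form with a scale and a shape parameter, F(K) = 1 − exp(−(K/scale)^shape); the parameter values are hypothetical and are not the fitted Borden values.

```python
import math


def weibull_cdf(k, scale, shape):
    """Two-parameter Weibull distribution function (assumed parameterization)."""
    return 1.0 - math.exp(-((k / scale) ** shape)) if k > 0.0 else 0.0


def weibull_quantile(u, scale, shape):
    """Inverse distribution function: maps a copula-space value u in [0, 1) to K."""
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


scale, shape = 10.8, 2.0   # hypothetical parameters, not the fitted Borden values
for u in (0.05, 0.5, 0.95):
    k = weibull_quantile(u, scale, shape)
    print(u, round(k, 2), round(weibull_cdf(k, scale, shape), 2))
```

A simulated field in copula space can be converted to a K field in this way for any choice of marginal, without changing its dependence structure.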

[35] The data set has a mild positive skewness of 0.81 and a fairly small variance in the natural logarithm of K of 0.39 (Table 1). For comparison, spatially distributed K fields used by Zinn and Harvey [2003] exhibit a much higher ln(K)-variance of 9. Generally, the range between the smallest and largest K values is ∼30 m d−1, which is considered to be a fairly small portion of the hydraulic conductivities encountered in geologic media (Table 1, Figure 10).

display math
Figure 10.

Histogram and fitted theoretical distribution function of the Borden K data set. Shaded in pink is 1 order of magnitude range.

Table 1. Descriptive Statistics Based on Various Units of K of the Borden Data Set
             cm s−1      m d−1    ln(m d−1)   lg(m d−1)
Mean         0.011134     9.62      2.11        0.92
Var.         0.000032    24.26      0.39        0.07
Std. dev.    0.005700     4.93      0.62        0.27
Min          0.000057     0.05     −3.02       −1.31
Max          0.032914    28.44      3.35        1.45
Range        0.032857    28.39      6.36        2.76
Skewness     0.81         0.81     −1.66       −1.66
Kurtosis     0.50         0.50      7.38        7.38

3. Groundwater Flow and Solute Transport Model

[36] Saturated groundwater flow through the constructed K fields was simulated under steady state conditions and the transport of a conservative solute body from an instantaneous release is used to contrast the spreading behavior for both the Gaussian and non-Gaussian structured K fields. The robust numerical model HydroGeoSphere [Therrien and Sudicky, 1996] is used to solve standard flow and advection-dispersion equations with careful attention paid to space and time discretization issues to ensure solution accuracy.

[37] The domain size was chosen to be long enough in all dimensions such that the plume analysis, especially the calculation of the macrodispersion coefficient at late time (section 4), is done before the plume touches a boundary of the domain. A maximum of 1% of the current maximum concentration at any boundary node was allowed. The grid spacing needed to fulfill two criteria. First, it was to be fine enough to avoid numerical dispersion and oscillation, which was ensured by grid refinement tests. Second, one spatial correlation length was to contain at least five grid elements in all dimensions. The analysis was performed in two dimensions ( math formula nodes, 200 Monte Carlo runs for each type of field) and three dimensions ( math formula million nodes, 100 Monte Carlo runs for each type of field). With the help of several test runs it was found that the domain size had to be at least ten, preferably twenty times the correlation length in the main direction of flow and at least six times the correlation length perpendicular to the main direction of flow. The three-dimensional analyses were considered necessary, especially in non-Gaussian K fields, due to the possibility of connected zones of high-K or low-K barriers to flow. Basic model parameters are listed for the two- and three-dimensional simulations in Table 2. The setup of the model is sketched on Figure 11.
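
The two geometric criteria can be checked with a small helper. This is a sketch only: the function name is hypothetical, the domain length of 75 m and the 375 elements are the three-dimensional values along x listed in Table 2, and the correlation length of 3.4 m is a hypothetical value used for illustration.

```python
def check_discretization(domain_length, n_elements, correlation_length,
                         min_domain_ratio, min_elements_per_length=5):
    """Check grid resolution and domain size against a correlation length."""
    spacing = domain_length / n_elements
    elements_per_length = correlation_length / spacing
    domain_ratio = domain_length / correlation_length
    return {
        "grid_spacing": spacing,
        "elements_per_correlation_length": elements_per_length,
        "domain_ratio": domain_ratio,
        "resolution_ok": elements_per_length >= min_elements_per_length,
        "domain_ok": domain_ratio >= min_domain_ratio,
    }


# Flow direction: at least ten correlation lengths are required in the domain.
result = check_discretization(domain_length=75.0, n_elements=375,
                              correlation_length=3.4, min_domain_ratio=10)
print(result)
```

The same helper can be applied perpendicular to the flow direction with a minimum ratio of six.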

Figure 11.

Sketch showing the dimensions (length l), the discretization (number of elements ne and grid spacing dx), the initial and boundary conditions for the three-dimensional domain for the numerical tracer tests.

Table 2. Parameters of the Numerical Models Used for Two and Three Dimensions
                              2D        3D
Domain Length [m]
  lx                          1000      75
  ly                          400       10
  lz                                    4

Number Elements [-]
  nx                          500       375
  ny                          200       50
  nz                                    200

Grid Spacing [m]
  dx                          2         0.2
  dy                          2         0.2
  dz                                    0.01

Hydraulic Gradient [-]
  i = dh/dl                   0.1       0.0045

Local Dispersivities [m]
  math formula                2         0.21
  math formula                0.2       0.021
  math formula                0.02      0.0021

[38] In the two-dimensional case, in addition to the Gaussian and the v transformed K fields, other transformations were performed to evaluate their effects. For comparison, a field that corresponds to a chi-shaped transformation (“chi”) and an extreme transformation (“extr”) were analyzed. For each transformation, the inverse transformation (indicated by a minus sign) was analyzed. For an inverse transformation, the order of the random sample of K values is reversed. Inverse transformations mean that the parameters of the copula stay identical, only the dependence is inverted. If a copula has strongest dependence in high quantiles, then the inverted copula has strongest dependence in the low quantiles. In all transformations, the final fields are not distinguishable by their first two moments.

[39] The flow model is a steady state system with constant head boundary conditions at the left and right boundaries. All other domain boundaries were chosen to be Neumann boundaries with zero flow conditions. The key component of the flow model is the spatially distributed K field.

[40] In the solute transport model, the initial concentration at all nodes was set to zero, except in a small zone at the left side of the domain, which was assigned a concentration of unity to represent a slug injection of solute. The K values within a radius of 5 grid lengths around the injection node were set to the geometric mean of all K values of the domain, to prevent the early time plume behavior from significantly influencing the travel times of the solute plume.

[41] The pore geometry in real porous media varies in space, which results in a heterogeneous flow field. Variability in the flow field causes an additional deformation of the transported solute plume. This spreading due to velocity variability at different scales is called dispersion. Two dispersion processes are important for this: hydrodynamic dispersion (mechanical dispersion plus diffusion) and macrodispersion.

[42] Structures in the conductivity field that are smaller than the grid cell size are not explicitly resolved by a numerical model. Therefore, dispersive effects on this Darcy scale are considered within the mathematical model. Dispersion processes are modeled to have the same influence on a solute plume as molecular diffusion due to Brownian motion, only on another scale: a spreading of the solute plume in every direction. Therefore, both processes are summarized in the hydrodynamic dispersion/diffusion coefficient Dlocal (equation (14)), where math formula is a characteristic length scale, v is the groundwater velocity in flow direction, and math formula is the molecular diffusion coefficient ( math formula),

display math

[43] Structures bigger than the grid size are resolved within the numerical model. Therefore, such structures will result in a heterogeneous flow field. This leads to dispersion on a greater scale, or macroscale, due to the same reasons as explained above. The total dispersion in a groundwater flow and solute transport model with a heterogeneous K structure is therefore the sum of macrodispersion and hydrodynamic dispersion (equation (15)).

display math

In this study, Dlocal was decreased as much as numerically possible. By doing so, the effects of the heterogeneous K field on solute transport stood out.

4. Analysis of Spatial Moments of Solute Plume

[44] This paper adopts the notation of Freyberg [1986] to describe the zeroth, first, and higher spatial moments, and their relation to plume evolution and spreading rates. Spatial moments are integrated measures to describe complex plume behavior. In three dimensions, the nth moment is defined in equation (16); the nth central moment is defined in equation (17), where math formula and math formula is the concentration at a location math formula and at time t; math formula is the porosity; math formula are spatial coordinates.

display math
display math

[45] The zeroth moment math formula (equation (16)) corresponds to the total solute mass in the system and the first standardized moment (equation (18)) corresponds to the coordinate math formula of the center of mass.

display math

[46] The ratio between second central moments and the zeroth moment is a measure for the bulk spatial spreading of the plume in each direction. The third and fourth standardized central moments are measures for the skewness and the kurtosis of the solute plume.
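A discrete, two-dimensional version of these moment definitions can be sketched as follows. The sketch assumes a regular grid with cell-centered concentrations and a constant porosity; the function name `spatial_moments`, the grid, the porosity value, and the Gaussian test plume are illustrative assumptions and not values from the study.

```python
import numpy as np

def spatial_moments(conc, x, y, porosity, dx, dy):
    """Zeroth moment, centroid, and second central moments of a 2-D plume.

    conc has shape (len(x), len(y)); x and y hold cell-center coordinates.
    """
    X, Y = np.meshgrid(x, y, indexing="ij")
    cell = porosity * conc * dx * dy            # solute mass per cell (per unit thickness)
    m00 = cell.sum()                            # zeroth moment: total mass
    xc = (cell * X).sum() / m00                 # first normalized moments: centroid
    yc = (cell * Y).sum() / m00
    sxx = (cell * (X - xc) ** 2).sum() / m00    # second central moments over zeroth moment
    syy = (cell * (Y - yc) ** 2).sum() / m00
    return m00, (xc, yc), (sxx, syy)

# Illustrative Gaussian plume on a regular grid (all numbers are made up).
x = np.arange(0.25, 60.0, 0.5)
y = np.arange(0.25, 20.0, 0.5)
X, Y = np.meshgrid(x, y, indexing="ij")
conc = np.exp(-0.5 * ((X - 30.0) / 4.0) ** 2 - 0.5 * ((Y - 10.0) / 2.0) ** 2)
m00, (xc, yc), (sxx, syy) = spatial_moments(conc, x, y, porosity=0.33, dx=0.5, dy=0.5)
```

For this test plume the centroid should be recovered near (30, 10) and the spreading measures near the squared standard deviations of the Gaussian (16 and 4).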

[47] The analysis of the spatial moments is critical to this study, because the second central spatial moments are linked to the macrodispersion tensor as described in equation (19). As the conservative tracer migrates through the K field, driven by a head gradient, it is expected that the second central spatial moment initially grows at a nonlinear rate but reaches a linear behavior at later time when the macrodispersion coefficient reaches an asymptotic value.

display math
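How a macrodispersion coefficient and a dispersivity might be read off the late-time slope can be sketched as below. The sketch assumes the commonly used relation in which the dispersion coefficient is half the time derivative of the second central moment normalized by the zeroth moment, and in which dispersivity is the dispersion coefficient divided by the mean velocity; whether this matches equation (19) term by term cannot be confirmed from the text here. The function name, the synthetic moment curve, and all numbers are hypothetical.

```python
import numpy as np

def macrodispersion_from_moments(t, sxx, v_mean, t_start):
    """Estimate D = 0.5 * d(sxx)/dt from the late-time linear part of sxx(t)."""
    late = t >= t_start
    slope, _ = np.polyfit(t[late], sxx[late], 1)   # least-squares line through late-time points
    d_macro = 0.5 * slope
    return d_macro, d_macro / v_mean               # dispersivity = D / v (assumed relation)

# Synthetic second-moment curve: nonlinear early growth, linear late growth.
t = np.linspace(0.0, 300.0, 61)                    # time in days (illustrative)
d_true, tau = 0.5, 20.0
sxx = 2.0 * d_true * (t - tau * (1.0 - np.exp(-t / tau)))
d_est, alpha_est = macrodispersion_from_moments(t, sxx, v_mean=0.25, t_start=200.0)
```

Restricting the fit to times after `t_start` mirrors the idea that the coefficient is only meaningful once the second moment grows linearly.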

[48] For every Monte Carlo simulation and for every time step, the spatial moments were calculated and later averaged to eliminate dispersive artifacts related to the “dispersion” in plume centroid positions across realizations as pointed out by Rajaram and Gelhar [1993].

[49] The set of Monte Carlo realizations for each type of K structure was analyzed to produce an average concentration plume and its uncertainty as described by the concentration variance. For multi-Gaussian fields, a symmetric variance plume is expected.
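The cell-by-cell averaging over realizations can be sketched as follows, assuming the concentration fields of all realizations at one time are stacked in a single NumPy array. The function name and the use of the unbiased sample variance are assumptions; the text does not state which variance estimator was used.

```python
import numpy as np

def ensemble_mean_and_variance(conc_realizations):
    """conc_realizations: array of shape (n_realizations, nx, ny) at one time."""
    mean_plume = conc_realizations.mean(axis=0)
    var_plume = conc_realizations.var(axis=0, ddof=1)   # unbiased sample variance per cell
    return mean_plume, var_plume

# Two tiny made-up "realizations" to show the shapes involved.
stack = np.stack([np.full((2, 3), 1.0), np.full((2, 3), 3.0)])
mean_plume, var_plume = ensemble_mean_and_variance(stack)
```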

4.1. Results of Solute Transport Behavior in Two Dimensions

[50] First, it has to be pointed out that the centers of mass of all plumes modeled in two dimensions move on average with almost identical velocity, independent of the type of spatial dependence structure that was used for simulating the K fields (Figure 12, bottom panel; m100). This makes the following results related to spatial moments comparable with each other. If the average first central moments of a conservative plume are identical, the effective K values that represent the K fields should be identical. It was thought that the best measure of an effective K in the given case is the K obtained when considering each Monte Carlo simulation as a Darcy experiment. The resulting empirical distributions of K based on the Gaussian, the v transformed, and the inversely v-transformed dependence structures are shown in Figure 13. Those distributions are fairly similar; however, an important trend can be seen: the distribution of K values based on the v transformed K field, the one that fits best to the Borden data, is slightly shifted toward higher K values compared to the Gaussian K structure. For lower quantiles, the v-transformed K values are slightly shifted toward lower hydraulic conductivities. This observation is comparable to observations made by Zinn and Harvey [2003].
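Treating one simulation as a Darcy experiment amounts to solving Darcy's law for K from the total discharge through the domain. A minimal sketch, with a hypothetical function name and made-up numbers, is:

```python
def effective_conductivity(total_flux, length, cross_section_area, head_drop):
    """Darcy's law solved for K: K_eff = Q * L / (A * dh)."""
    return total_flux * length / (cross_section_area * head_drop)

# Illustrative numbers only: discharge in m^3/s, lengths in m, area in m^2.
k_eff = effective_conductivity(total_flux=2.0e-6, length=10.0,
                               cross_section_area=4.0, head_drop=0.05)
```

Repeating this for every realization and sorting the resulting values would give an empirical distribution of effective K of the kind compared in the text.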

Figure 12.

(bottom) First (m100), (middle) second (m200), and (top) third (m300) spatial moments in principal direction of flow for two-dimensional simulations. The first moment indicates the position of the center of mass. The second moment is corrected for local dispersion and indicates the spreading of the plume. The third moment indicates the skewness of the plume.

Figure 13.

Empirical distributions of K obtained from running each two-dimensional Monte Carlo simulation as a Darcy experiment.

[51] The spreading of the plumes as measured by the second central spatial moment over time exhibits a typical behavior with nonlinear growth at early times and linear growth at late times [Freyberg, 1986], independent of which spatial structure was used for the simulation of the K field (Figure 12, middle, m200). However, the slopes of the linear portions of the graphs differ. Most importantly, the slopes based on the fitted Gaussian (thick dark blue) and the fitted non-Gaussian v copula (thick dark red) fields are different. This difference occurs despite the fact that the two fields are not distinguishable by second-order moments (their semivariograms) and despite the fact that the average velocities of their centers of mass are identical. The slope of the second central moment over time is milder for the v transformed non-Gaussian spatial dependence structure of K than for the Gaussian structure. Low- to medium-K values are connected in the non-Gaussian v copula model, leading to blocking of the spreading solute and smaller spreading in general, as indicated by a smaller value of macrodispersion derived from the slope of the linear portion (equation (19)): math formula and a math formula, which equates to dispersivities of 1.08 m and 0.74 m (Table 3).

Table 3. Average Dispersion and Dispersivity Values for Different Dependence Models and Their Inverted Structures, for Two-Dimensional Simulations (a)

                                      g      v      v-     chi    chi-   extr   extr-
math formula                          2.62   1.79   3.27   1.28   3.32   1.47   3.48
Difference to Gaussian math formula   0      −0.83  0.65   −1.34  0.7    −1.15  0.86
math formula                          1.08   0.74   1.35   0.53   1.37   0.61   1.44

(a) In all cases, math formula. The inverted structures are indicated by “-.”

[52] In addition to the two types of fields fitted to the Borden data (g and v), observations can be made from the other structures that were analyzed for comparison purposes. The dispersivities based on dependence models with strongest dependence in high quantiles (v, extr, chi) are smaller than the dispersivity values based on the Gaussian dependence structure ( math formula m). Similarly, the dispersivities based on dependence models with strongest dependence for low quantiles (v-, extr-, chi-) are math formula m. Table 3 gives an overview of the macrodispersivity values calculated based on Gaussian and v transformed dependence structures of K, for the two other types of non-Gaussian spatial dependence (“chi” and “extr”), and for the inverted structures. The inverted fields have more and on average bigger connected high-K zones, which act as “channels” or “windows” and transport on average more of the solute faster than average. The result is generally larger macrodispersion coefficients for spatial fields with copulas of negative symmetry. This rule could be violated for extremely skewed marginal distributions. The slopes of the second central moments are symmetric around the Gaussian graph for a given type of copula symmetry (Figure 12): the slopes of second spatial moments over time based on copulas with positive symmetry are smaller than those based on Gaussian copulas, and the slopes based on copulas with negative symmetry are larger.

[53] Figure 12 (top) shows the third central moment in main direction of flow, normalized by the second central moment. All curves flatten out after math formula250 d, but at different levels, which means that the mean concentration plume is distorted in a different manner for every underlying dependence model of K. The seven dependence models can be divided into three groups, as was the case for the second central moments. The third spatial moment with an underlying Gaussian K field reaches at late time a value of math formula. The third spatial moments with an underlying non-Gaussian structure with strongest dependence in high quantiles flatten out to a positive third spatial moment, and the ones with an underlying inverted non-Gaussian structure flatten out to a negative third spatial moment.

[54] A negative third moment (skewness) indicates that parts of the solute are temporarily trapped in low conductivity zones while the bulk of the solute plume moves further, enhanced by connected zones of high K (inverted non-Gaussian structures). On the contrary, for the underlying non-Gaussian structures where the solute is forced to flow through connected zones of medium to low K, the skewness tends to be positive. In the latter case, only some small parts of the solute mass advect faster than the bulk solute.

[55] The same phenomena can be observed by looking at average solute plumes at a given time and at plumes of variance of concentration at a given time. It becomes evident that the average solute mass after math formula is very similar, independent of the type of the underlying spatial dependence structure of K. Figure 14a shows the average plume based on a Gaussian K field at math formula and Figure 15a shows the average plume based on a non-Gaussian v copula K field at math formula. There are only minor differences visible, such as a slightly higher peak concentration at the very center of the plume in the case of the non-Gaussian v-copula-based underlying structure. However, the difference becomes evident in the variance “plumes” (Figures 14b and 15b). For the Gaussian dependence structure, the variance plume is symmetric around the center of mass, corresponding to a zero third spatial moment. The background values are quite certain, the location of the center of mass is a little less certain, and the location of the edges of the plume, where the highest concentration gradients occur, is least certain. In principle, this pattern also holds for the underlying non-Gaussian v-copula-based K field; however, the location of the upstream edge of the plume is less certain than the location of the downstream edge.

Figure 14.

Mean concentrations and variance of concentrations calculated at time math formula in each grid cell over 200 realizations for Gaussian dependence structure in the hydraulic conductivity fields.

Figure 15.

Mean and variance of concentrations calculated at time math formula in each grid cell over 100 realizations for a non-Gaussian dependence structure in the hydraulic conductivity fields.

[56] The analytical solution for concentration variance of Vomvoris and Gelhar [1990] was used to check the numerical solutions (equation (20)). To calculate the concentration variance math formula the following parameters were used: math formula, the variance of the ln(K) data (0.39); math formula, the flow factor of Gelhar and Axness [1983]; math formula, the mean concentration gradient along the longitudinal axis of the plume, where math formula was calculated using the standard three-dimensional solution of the advection-dispersion equation (Sudicky [1983], math formula from Figure 12, math formula (Table 3)); math formula, transverse dispersivity, was augmented with local diffusion as described in the work of Burr et al. [1994]; li, the integral scale of the ln(K) field in the ith direction when the field is described by a hole exponential covariance function. Here, an error was introduced since the effective range of an exponential covariogram was used and not a length fitted to a hole exponential covariogram. However, varying the length scales within reasonable ranges was not found to influence the results significantly.

display math

[57] Generally, the fit of the analytical solution of Vomvoris and Gelhar [1990] to average modeled concentration variances based on Gaussian K fields was good. However, the observed tailing in concentration variance when the spatial structure of K is non-Gaussian cannot be reproduced. This tailing is pronounced at math formula and weakens for larger times (Figure 16).

Figure 16.

Mean concentrations and concentration variances along principal direction of flow at math formula. Comparison between numerical results based on averages of 200 simulations based on spatially distributed K fields with “g” and “v” structures compared with analytical solutions of mean concentrations [Sudicky, 1983] and concentration variance [Vomvoris and Gelhar, 1990].

4.2. Test of Significance

[58] The results so far have indicated that there is a difference in macrodispersivity depending on the type of model used to simulate the spatial dependence of K: a Gaussian model and a non-Gaussian v-copula-based model were fitted to the Borden data set. Especially for the second moment in the interval math formula, which was used for the calculation of the dispersion coefficients, it was necessary to confirm that the differences were significant and not an artifact of chance. At any given time step, the spatial moments exhibit a positive skewness, and it is assumed that they are not normally distributed. Therefore, a Wilcoxon rank-sum test was applied to the second spatial moments in x direction of the Gaussian model and all v transformed models. The test is a nonparametric statistical procedure that tests whether two distributions math formula and math formula are based on the same population or differ from each other by a shift. The null and alternative hypotheses can be formulated in three ways:

[59] 1. H01: math formula against H11: math formula, ‘ math formula ’ for at least one x.

[60] 2. H02: math formula against H12: math formula

[61] 3. H03: math formula against H13: math formula, ‘ math formula ’ for at least one x.

[62] All tests showed similar and clear results. Therefore, just one example for the comparison of the Gaussian and the v transformed model that was fitted to the Borden data set is given here. Table 4 shows the critical p values for the null hypothesis H01, with math formula representing spatial moments of the Gaussian model and math formula the spatial moments of the v copula model. The p values (significance level) can be understood as the probability of making an error if the null hypothesis is rejected. Therefore, a low p value indicates that the null hypothesis has to be rejected. In this case, H01 is rejected with a low probability of error for all four time steps in the relevant interval math formula. It can also be seen that at very early times the null hypothesis is unlikely to be rejected. That is consistent with the data, as at early times the spatial moments are more or less the same.
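A rank-sum comparison of two samples of second spatial moments can be sketched as below. This is not the procedure used in the study: it is a self-contained illustration that uses the large-sample normal approximation without a tie correction, and it tests only one one-sided alternative (values in the first sample tend to be larger). The function names and the sample values are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                                   # extend the block of tied values
        mean_rank = 0.5 * (i + j) + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Return (z, p) for the one-sided alternative 'x tends to be larger than y'."""
    n, m = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n])                               # rank sum of the first sample
    mean_w = n * (n + m + 1) / 2.0
    sd_w = math.sqrt(n * m * (n + m + 1) / 12.0)     # normal approximation, no tie correction
    z = (w - mean_w) / sd_w
    p = 0.5 * math.erfc(z / math.sqrt(2.0))          # upper-tail probability P(Z >= z)
    return z, p

# Made-up second moments: "gaussian" values are all larger than "vcopula" values.
gaussian = [float(v) for v in range(11, 21)]
vcopula = [float(v) for v in range(1, 11)]
z, p = rank_sum_test(gaussian, vcopula)
```

In practice a statistics library routine with exact small-sample p values and tie handling would usually be preferred over this hand-written approximation.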

Table 4. Critical p Values of the Wilcoxon Rank-Sum Test for Spatial Second Moments at Different Times

Time [days]   Critical p Value for H01
1             0.283744661
20            0.382815922
50            0.017914135
75            0.004437013
100           0.026335570
125           0.001146688
150           0.000006460
175           0.000001279
200           0.000000299
233           0.000000071
267           0.000000029
300           0.000000869
3000.000000869

[63] To conclude, it can be stated that the spatial moments of the non-Gaussian v copula model are systematically smaller than the spatial moments of the Gaussian model. The tests showed that the spatial moments of the v, chi, and extr models are indeed significantly smaller than those of the Gaussian model. Likewise, the spatial moments of the v-, chi-, and extr- models are significantly bigger than those of the Gaussian model.

4.3. Results of Solute Transport Behavior in Three Dimensions

[64] The two-dimensional analysis presented so far is a depth-averaged model. Three-dimensional systems are more realistic because flow paths can be more connected. Analyses similar to those in two dimensions were performed in three dimensions, with the following adaptations: the hydraulic gradient in principal flow direction was adjusted to a value of 0.0045, which is the observed gradient at the Borden site. Additionally, the local dispersivities were decreased even further by 1 order of magnitude, the number of cells was increased significantly, and the dimensions of the domain were slightly decreased (Table 2). Vertical anisotropy was introduced (section 2.1). Four types of K fields were analyzed in three dimensions (“g”, “v”, “g10”, “v10”):

[65] 1. “Base case”: Two types of fields, one with a Gaussian (“g”) and one with a non-Gaussian v copula (“v”) dependence structure. These two types are the exact same spatial fields as the “g” and the “v” fields in the two-dimensional simulations.

[66] 2. “Increased variance”: Two types of fields, again with the same “g” and “v” spatial dependence structure, and also with the same mean K ( math formula), labeled “g10” and “v10”. The only difference is in the variance of the marginal distribution, which is increased by a factor of ten. This resulted in a variance of math formula instead of math formula and parameters in equation (13) of math formula and math formula. The two corresponding distribution functions are shown on Figure 17. This is a significant increase in the variance, but not unrealistic for naturally occurring media. It is a nice feature of copulas to treat the marginal distribution independently of the spatial structure.

Figure 17.

Distribution functions of the marginal distributions used. Both are of the type of equation (13), one with parameters fit to the Borden data set (labeled “original”), the other with the same mean but a 10 times increased variance (labeled “increased variance”).

[67] In the three-dimensional setting, with each simulation containing the same mass of solute, the average center of mass in the base case moves with identical velocity for both “g” and “v” K fields. The increased variance, with a bigger skewness toward small K values, leads to a slightly slower movement of the plume for both the Gaussian and non-Gaussian dependence structures (Figure 18, bottom).

Figure 18.

(bottom) First (m100), and (top) second (m200) spatial moments in principal direction of flow for three-dimensional simulations. The first moment indicates the position of the center of mass. The second moment is corrected for local dispersion and indicates the spreading of the plume.

[68] The spreading of the plumes as measured by second spatial moments in the base case is much less pronounced in three dimensions than in two dimensions. As in two dimensions, the dispersion based on the v transformed K field is smaller than the dispersion based on the Gaussian field (compare Figure 18 (top) with Figure 12 (middle)). For both “g” and “v” in the base case scenario, math formula, which equals a dispersivity value of 0.36 m. However, the scenario with the increased variance led to significantly increased macrodispersion coefficients, namely math formula ( math formula) in the case of the Gaussian K field, and math formula ( math formula) in the case of the v copula-based field. As in two dimensions, the dispersivity in three dimensions is smaller for a v copula-based structure than for a Gaussian-based structure. The dispersion coefficients and dispersivity values for the three-dimensional simulations are summarized in Table 5.

Table 5. Average Dispersion and Dispersivity Values for Different Dependence Models and Their Inverted Structures, for Three-Dimensional Simulations (a)

                                      g       v       g10     v10
math formula                          0.022   0.023   0.207   0.088
Difference to Gaussian math formula   0       0.001   0.185   0.066
math formula                          0.36    0.35    3.94    1.52

(a) In all cases, math formula. The inverted structures are indicated by “-.”

[69] The increased dispersion in three dimensions is attributed to more tortuous pathways of groundwater flow, especially more blocking of flow by small K values. Figure 19 visualizes these effects by showing a cross-section of a simulated three-dimensional block with the same copula-based K field but with the two different marginal distributions. A tenfold increase in the variance of the marginal distribution leads to an approximately tenfold increase in dispersivity if the spatial dependence structure of the distributed K fields is Gaussian. However, if the structure is not Gaussian, this relationship does not seem to hold. In the case of the structure of a v copula fitted to the Borden data with highest dependence in high quantiles, a tenfold increase in variance of the marginal distribution was found to lead to an increase in dispersivity by a factor of math formula.

Figure 19.

Two types of spatially distributed K fields, both with the same structure as modeled by the v copula fitted to the Borden data set but with different marginal distributions. (a) The field has the marginal distribution fitted to the Borden data set; (b) the field has the same spatial structure and the same mean, but a ten times increased variance. The different marginal distribution functions are shown in Figure 17.

[70] A Darcy experiment was also conducted in three dimensions. The empirical distributions of the effective K values obtained for each simulation run are plotted in Figure 20. With a more skewed marginal distribution (dotted lines), the difference in effective K values is much more pronounced than with a more symmetric marginal distribution (solid lines).

Figure 20.

Empirical distributions of K obtained from running each three-dimensional Monte Carlo simulation as a Darcy experiment.

5. Conclusions

[71] Copulas offer a flexible tool to model a variety of spatial dependence structures, including non-Gaussian structures. The objective of this study has been to analyze whether a spatial dependence structure as modeled by a non-Gaussian copula could lead to a different observable physical property than when the spatial dependence is modeled using Gaussian dependence. Two theoretical copula models, one with a Gaussian and one with a non-Gaussian spatial dependence, were fitted to a real world data set of hydraulic conductivity measured at the Borden site. The physical property that was analyzed was the spreading of a conservative solute under steady state saturated conditions. Solute transport is largely influenced by velocity variations, which were introduced by the finely discretized heterogeneous hydraulic conductivity field.

[72] The theoretical spatial copula models, particularly non-Gaussian copula models used in this paper, offer a flexible stochastic model that fits better than a Gaussian model to observations. Such models can parameterize a wide range of dependence structures, and are hence more general. With such a non-Gaussian v copula model, a method was shown that can be used to simulate non-Gaussian spatial fields which are not distinguishable by their first two moments, but they are distinguishable by their spatial copulas.

[73] Different spreading behavior of solute plumes in numerical tracer tests based on those different types of K fields was observed, despite the fact that the different types of spatial K fields are not distinguishable by classical geostatistical methods, and despite the fairly homogeneous setting of Borden with its small variance of K. In two dimensions, the dispersion coefficient obtained from Monte Carlo second central spatial moment analysis of the solute plumes was shown to be significantly different when K fields with a Gaussian spatial dependence were used as input compared to when K fields with non-Gaussian spatial dependence were used. In three dimensions, the dispersion coefficients were shown to differ less. Three-dimensional flow paths combined with anisotropy in vertical direction compared to horizontal direction compensate for much of the non-Gaussian dependence structure of Borden. However, only a fairly slight change in the marginal distribution, such that the marginal distribution is different from Borden but still within a possibly naturally occurring range and with the same mean, also leads in three dimensions to significantly different transport behavior between the Gaussian and non-Gaussian spatial dependence structures of Borden. This indicates that for a “well behaved” marginal distribution with small variance and small skewness, the spatial dependence structure might not be the most critical factor influencing solute transport behavior. However, the less well behaved a marginal distribution is, the more important the type of the spatial dependence structure of K becomes for predicting solute transport behavior, and even small deviations from a Gaussian dependence structure such as the Borden spatial dependence matter.

[74] The spatial structure of K as modeled by using second-order moments may not be sufficient in all cases to describe solute transport properties.

[75] Two spatial copula models, one Gaussian and one non-Gaussian, were used to model the spatial dependence structure of K of the Borden aquifer. This paper shows that the spatial dependence structure of hydraulic conductivity of the Borden aquifer is not Gaussian at a 95% significance level for the important short separation distances, in both horizontal and vertical directions. This result is even more important considering that the Borden aquifer is quite homogeneous and has a small variance of K. This conclusion is drawn from the results of a bootstrap algorithm, which is based on the Gaussian copula model fitted to the Borden K data set.

Acknowledgments

[76] This work was funded by the German Research Foundation under project Ba1150/12-1 and Ba1150/12-2. Support was given by the International Research Training Group NUPUS.
