A geostatistical approach to recover the release history of groundwater pollutants



[1] In this paper the problem of recovering the temporal release history of a pollutant is approached with a geostatistical methodology that analyzes the pollutant concentrations measured at a given time in the aquifer. The adopted methodology was developed by Snodgrass and Kitanidis [1997] for one-dimensional flow and transport. Here it is extended to the case of two-dimensional transport, and additional improvements with important consequences for technical applications are introduced. A numerical case study from the literature is used to assess the quality of the results and the performance of the algorithms with regard to (1) the plume sampling scheme, (2) the impact of concentration measurement errors, (3) the impact of errors in the estimated aquifer parameters, and (4) the erroneous identification of the hydraulic gradient direction. The new applications focus on the incorporation of nonpoint and multiple sources in order to quantify the relative legal liability of the different sources. The results of the numerical analysis show that the method provides a reasonable description of the release history and of the associated estimate error variance.

1. Introduction

[2] The increasing interest in environmental issues has led to greater attention to the quality of groundwater. The large number of pollution events that have occurred over the last two decades makes the protection and restoration of groundwater of utmost importance. Scientific efforts in the field of subsurface flow have primarily focused on flow and transport characteristics and on the corresponding parameter identification issues. Since 1990, increasing attention has been paid to the problem of recovering the release history of a pollutant, because the release history can be a useful tool for assessing how to share the costs of remediating a polluted area among the responsible parties [Skaggs and Kabala, 1994]. Moreover, knowledge of the release history gives information on the spread of future pollution and permits the improvement of remediation plans [Liu and Ball, 1999]. From a legal and regulatory point of view, it is also important to determine the release time period and the highest concentrations released [Snodgrass and Kitanidis, 1997].

[3] The problem of deducing a release history from concentration data measured at a limited set of locations in the aquifer belongs to the class of inverse problems whose solutions do not satisfy the mathematical requirements of existence, uniqueness, and stability. A few methods have been proposed in the literature for the recovery of a release history, using different approaches [Skaggs and Kabala, 1994; Snodgrass and Kitanidis, 1997; Woodbury et al., 1998]. Among these methods, we have found that the geostatistical approach (GA), set up by Snodgrass and Kitanidis [1997] for a 1-D flow and transport case, is a promising tool for further practical developments.

[4] This study deals with an extension of the GA and the subsequent testing of the new developments. Some preliminary results of the first improvement of the procedure, i.e., the extension from 1-D to 2-D, are given by Butera and Tanda [2001]. A brief description of the GA follows to introduce the proposed extensions; for more details on the method, see Snodgrass and Kitanidis [1997].

[5] According to Snodgrass and Kitanidis [1997], the restored release history s(t) can be considered a random process, defined through its probability density function and its statistical moments, as it cannot be determined without uncertainty due to the dispersion phenomena; s(t) can be usefully represented as an unknown random N × 1 vector, obtained from the discretization of the unknown function. The concentration data zi, observed at M locations at time T, are related to s(t) through the equation z = h(s, v) + r, where v is a vector that includes the aquifer parameters and r is the measurement error vector (bold characters denote vectors or matrices). Assuming that the aquifer parameters are known, h(s, v) reduces to h(s). For a conservative solute, the relation between the observed concentrations z and the solute input s is linear; it is therefore possible to write z = Hs + r, where H is the M × N transfer matrix.

[6] The methodology is applied in two steps. First, the structural analysis is performed to estimate the parameters of the random process: the mean coefficient vector (β) and the parameter vector of the covariance function (ϑ). The elements of the vector s(t) are then determined, and the error covariance matrix is computed (for details, see Kitanidis [1995, 1996] and Snodgrass and Kitanidis [1997]). Snodgrass and Kitanidis [1997] suggested applying a useful transformation, from s to s̃, to constrain the solution to be nonnegative, as the nature of the concentration requires: s̃ = α(s^(1/α) − 1), where α is a positive number, so that s̃ > −α. For the analysis of the statistical properties of s(t), given s̃, see Kitanidis and Shen [1996].
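As a concrete illustration of this power transformation and its inverse, the following Python sketch may help (the function names and the roundtrip check are ours, not part of the original method):

```python
import numpy as np

def forward_transform(s, alpha):
    """Power transform s_tilde = alpha * (s**(1/alpha) - 1).

    Maps nonnegative concentrations s to values s_tilde > -alpha, so
    that Gaussian machinery can operate on s_tilde while the
    back-transformed release history remains nonnegative.
    """
    return alpha * (np.power(s, 1.0 / alpha) - 1.0)

def inverse_transform(s_tilde, alpha):
    """Inverse transform: s = (1 + s_tilde/alpha)**alpha."""
    return np.power(1.0 + s_tilde / alpha, alpha)
```

Note that as α → ∞, α(s^(1/α) − 1) → ln s, so the power transform includes the log transform as a limiting case.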

2. Two-Dimensional Transport From a Point Source

[7] The 2-D description can be of remarkable interest in real field conditions when dealing with pollutant transport at the regional scale, which is characterized by a 2-D flow behavior. It is rather unusual for 3-D field data to be reliable and complete enough to fully describe the 3-D behavior of the pollutant plume; the 2-D description is therefore used as a technical approximation and, in this context, the present developments can be considered a consistent approach.

[8] We now consider the transport of a nonreactive contaminant in a 2-D confined saturated aquifer where the flow can be described on a horizontal xy plane. We assume a steady uniform flow in the x direction with effective velocity v and given constant dispersion coefficients. For a conservative contaminant, the advection-dispersion equation is given by

$$\frac{\partial C}{\partial t} + v \frac{\partial C}{\partial x} = D_x \frac{\partial^2 C}{\partial x^2} + D_y \frac{\partial^2 C}{\partial y^2} + F(t)\,\delta(x - x_0)\,\delta(y - y_0) \qquad (1)$$

where C(x, y, t) is the pollutant concentration in the groundwater, Dx and Dy are, respectively, the longitudinal and transversal dispersion coefficients, and F(t)δ(x − x0)δ(y − y0) is the source term representing the solute injection located at (x0, y0); F(t) is the contaminant mass discharge (i.e., mass per unit time) introduced into the aquifer. The source function F(t) is equal to the concentration history s(t) times the injected water discharge q(t): F(t) = s(t)·q(t). In the following, it is assumed that q(t) is small enough not to affect the uniform groundwater flow.

[9] For a contaminant spreading in an infinite plane from a point source located at x0 = 0, y0 = 0, with initial and boundary conditions C(x, y, 0) = 0 and C(x, y, t) → 0 as x → ±∞ and y → ±∞, the analytical solution of equation (1) can be written as a convolution integral:

$$C(x, y, T) = \int_0^T F(t)\, g(x, y, T - t)\, dt \qquad (2)$$

where

$$g(x, y, T - t) = \frac{1}{4\pi (T - t)\sqrt{D_x D_y}} \exp\!\left[-\frac{\left(x - v(T - t)\right)^2}{4 D_x (T - t)} - \frac{y^2}{4 D_y (T - t)}\right] \qquad (3)$$

is the kernel (or transfer) function: g(x, y, T − t) relates the mass discharge injected at time t to the concentration observed at time T in (x, y). The time t = 0, the lower limit of integration, is chosen before the beginning of the release (for instance, before the opening of the waste disposal site or of the polluting well). In order to extend the procedure outlined by Snodgrass and Kitanidis [1997] for the 1-D transport problem to the present 2-D case, each term of the H matrix is defined as follows:

$$h_{ij} = q\, g(x_i, y_i, T - t_j)\, \Delta t \qquad (4)$$

where t_j (j = 1, …, N) are the discretization times of the release history and Δt is the discretization interval,

or, in the constrained case, where the unknown s is replaced by s̃ (for details, see Snodgrass and Kitanidis [1997]),

$$z_i = \sum_{j=1}^{N} h_{ij} \left(1 + \frac{\tilde{s}_j}{\alpha}\right)^{\alpha} + r_i$$
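Under the discretization above, the transfer matrix can be assembled column by column from the point-source kernel. A minimal Python sketch (variable and function names are ours; a uniform time grid is assumed) could read:

```python
import numpy as np

def kernel_2d(x, y, tau, v, Dx, Dy):
    """2-D point-source kernel g(x, y, tau) for a source at the origin,
    evaluated at travel time tau = T - t (tau > 0)."""
    return (1.0 / (4.0 * np.pi * tau * np.sqrt(Dx * Dy))
            * np.exp(-(x - v * tau) ** 2 / (4.0 * Dx * tau)
                     - y ** 2 / (4.0 * Dy * tau)))

def build_H(xs, ys, T, t_grid, q, v, Dx, Dy):
    """Assemble the M x N transfer matrix, h_ij = q * g(x_i, y_i, T - t_j) * dt,
    for measurement locations (xs, ys) at time T and release times t_grid."""
    dt = t_grid[1] - t_grid[0]
    H = np.zeros((len(xs), len(t_grid)))
    for i, (x, y) in enumerate(zip(xs, ys)):
        tau = T - t_grid
        valid = tau > 0          # only past releases contribute
        H[i, valid] = q * kernel_2d(x, y, tau[valid], v, Dx, Dy) * dt
    return H
```

The observed concentrations then follow as z = H s (plus measurement error), which is the discrete counterpart of the convolution integral (2).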

2.1. Application to a Numerical Example

[10] To test the 2-D procedure we have developed a numerical example. The release history of a nonreactive pollutant spreading in a 2-D aquifer from a point source located at x0 = y0 = 0 is given by the following function:

$$s(t) = \exp\!\left[-\frac{(t - 130)^2}{50}\right] + 0.3\,\exp\!\left[-\frac{(t - 150)^2}{200}\right] + 0.5\,\exp\!\left[-\frac{(t - 190)^2}{98}\right] \qquad (5)$$

which has been used by several authors [e.g., Skaggs and Kabala, 1994; Snodgrass and Kitanidis, 1997] to test 1-D cases.

[11] The aquifer is assumed to be homogeneous with known velocity and dispersion coefficients. The synthetic pollution event is obtained through the numerical integration of equation (2) at T = 300, where s(t) is given by equation (5), q(t) = 1, v = 1, Dx = 1, and Dy = 0.1. All the quantities are made dimensionless.
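The generation of the synthetic data can be sketched as follows; the triple-Gaussian coefficients below are our reading of the benchmark function of Skaggs and Kabala [1994] and should be checked against the original source:

```python
import numpy as np

def release_history(t):
    """Triple-Gaussian benchmark release history (coefficients assumed
    from the Skaggs and Kabala [1994] 1-D test case)."""
    return (np.exp(-(t - 130.0) ** 2 / 50.0)
            + 0.3 * np.exp(-(t - 150.0) ** 2 / 200.0)
            + 0.5 * np.exp(-(t - 190.0) ** 2 / 98.0))

def concentration_at(x, y, T, v=1.0, Dx=1.0, Dy=0.1, q=1.0, nt=3000):
    """C(x, y, T) for a point source at the origin, evaluated by
    numerical quadrature of the convolution of the release history
    with the 2-D advection-dispersion kernel."""
    t = np.linspace(0.0, T * (1.0 - 1e-9), nt)  # keep tau = T - t > 0
    tau = T - t
    g = (1.0 / (4.0 * np.pi * tau * np.sqrt(Dx * Dy))
         * np.exp(-(x - v * tau) ** 2 / (4.0 * Dx * tau)
                  - y ** 2 / (4.0 * Dy * tau)))
    return float(np.sum(q * release_history(t) * g) * (t[1] - t[0]))
```

Sampling this function on the line y = 0 at T = 300 reproduces the kind of synthetic plume data used in sets A, B, and C below.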

2.2. Impact of Sampling Interval

[12] Assuming a sampling scheme limited to the line y = 0, three different sets of data are used to recover the release history. The investigations focus on the agreement between the true s(t) and the computed one, and on the range delimited by the 5% and 95% quantiles. Figure 1 illustrates the three data sets that were analyzed. Measurements are taken at uniform intervals between x = 0 and x = 300. Sets A, B, and C contain 11, 21, and 31 measurements, respectively. In the application of the geostatistical method, a Gaussian covariance function and a constant, but unknown, mean of the random process s(t) are assumed. The geostatistical method is applied in the constrained case. The results of the structural analysis for data sets A, B, and C are listed in Table 1; they are obtained starting from the same initial guess (ϑ0).

Figure 1.

Sampled concentration at T = 300, y = 0; case A, 11 data; case B, 21 data; case C, 31 data.

Table 1. Structural Analysis Results: Variance (ϑ1) and Correlation Length (ϑ2)

          ϑ1 Initial Value   ϑ2 Initial Value   ϑ1 Estimated   ϑ2 Estimated   Number of Iterations
Case A    0.4                20                 0.324          17.11          7
Case B    0.4                20                 0.320          15.32          12
Case C    0.4                20                 0.276          14.47          20

[13] The computed release histories (the median values of the pdf), together with the stripe delimited by the 5% and 95% quantiles, are depicted in Figure 2; set B seems to be the best compromise between the number of samples and a close fit to equation (5), as it shows an interquantile stripe whose width is comparable to that obtained from a larger amount of data (set C). Set B is used later in this study to check the performance of the geostatistical method in the presence of errors.

Figure 2.

The computed release history s(t) and the 95% and 5% quantile lines; case A, 11 data; case B, 21 data; case C, 31 data.

[14] In field cases, where the true solution is not known, techniques for assessing the reliability of the results have to be applied; good results can be obtained by applying a validation procedure (similar to the validation of kriging interpolation), i.e., using different subsets of the available data and comparing the release histories obtained from each.

2.3. Impact of Erroneous Measurements and Hydraulic Parameters

[15] The results shown in Figure 2 are obtained under optimal conditions of exact concentration measurements and hydraulic parameters. In field applications, however, measurement errors and uncertainties in the hydraulic parameters usually occur. It seems then appropriate to check the proposed method in the presence of measurement errors and inaccurate hydraulic parameters. To test the impact of these errors on the results, the two steps of the geostatistical procedure (i.e., the structural analysis and the calculation of the release function) are performed using different erroneous input values of the velocity or the dispersion coefficient, the hydraulic gradient direction and the concentration data. The impact of these errors on the estimated release history is discussed in the following.

2.3.1. Erroneous Velocity Intensities

[16] A 10% error in the velocity intensity has been assumed when computing the release history; Figure 3 illustrates the results obtained (the median values of s(t)) with v = 1.1 and v = 0.9 in equation (4). Figure 4 shows the estimate error variance of s(t) for the true and the erroneous velocity values. Figures 3 and 4 reveal that a velocity underestimation leads to an earlier estimated start of the release process, since advection alone must carry the plume to the observed positions. As a consequence of the earlier start, the release peaks are overestimated, since the dispersion process acts for a longer time. The uncertainty, quantified by the estimate error variance (Figure 4), increases significantly. The opposite occurs for v = 1.1, where a notable underestimation of s(t) is obtained, while the estimate error variance is comparable with that of the v = 1 case.

Figure 3.

The restored release history when the velocity value is erroneous.

Figure 4.

The estimate error variance behavior when the velocity value is erroneous.

2.3.2. Erroneous Dispersion Coefficients

[17] Errors in the dispersion coefficient values can be ascribed to the inaccurate evaluation of the dispersivity and/or the velocity which, for Peclet numbers greater than 100, determine the dispersion coefficient values [Bear, 1972].

[18] The tests are carried out considering a ±10% error on Dx and Dy (Dx = 1.1 or 0.9; Dy = 0.11 or 0.09). An erroneous longitudinal dispersion coefficient, when not linked to an error in the velocity, keeps the peaks and the saddle points of the computed release history in phase with the true solution (Figure 5); however, as shown in Figure 6, the estimate error variance increases when the longitudinal dispersion is overestimated (Dx = 1.1) and decreases when it is underestimated (Dx = 0.9). The impact of errors in the transversal dispersion coefficient Dy (not reported here for the sake of brevity) is remarkably lower, both on the computed release history and on the estimate error variance. This behavior can be ascribed to the location of the sampling points on the line y = 0, which reduces the impact of the Dy coefficient in equation (2).

Figure 5.

The restored release history when the longitudinal dispersion coefficient is erroneous.

Figure 6.

The estimate error variance behavior when the longitudinal dispersion coefficient is erroneous.

2.3.3. Errors in Flow Direction

[19] Errors in the identification of the main flow direction can also occur in real field applications. In the assumed reference coordinate system, which originates at the source point and has the x axis aligned with the flow direction, such errors can be treated as errors in the data coordinates. Assuming an error of γ degrees in the flow direction, two illustrative scenarios can be considered. In the first (Figure 7, case D), measurements are collected outside y = 0 but ascribed to y = 0, so the sampled values are lower than the true ones. In the second (Figure 7, case E), measurements are collected on y = 0 but ascribed to a line outside y = 0, so values higher than the true ones are detected.

Figure 7.

Erroneous main flow direction.

[20] These incorrect assumptions lead to remarkable discrepancies between the restored release history and the true one. Figure 8 shows the results obtained for an error of γ = 3°: as expected, in case D the computed release history remains lower than the true one, while in case E the concentrations of the release process are greatly overestimated; given the hydrodispersive parameters, the algorithm computes a higher pollution injection rate to justify the higher concentration data C(x, T). The same trend can be recognized in the estimate error variance (Figure 9).

Figure 8.

The restored release history with an error of 3° in the main flow direction: case D and case E.

Figure 9.

Estimate error variance of s(t) when the main flow direction is erroneous: case D and case E.

2.3.4. Measurement Errors

[21] To test the geostatistical methodology in the presence of measurement errors, which are to be expected in the real field due to the limited precision of the measurement devices and the influence of sampling conditions, the true concentration data sampled at T = 300 are perturbed by a random relative error, according to the following relationship:

$$z_n = C(x_n, T)\,\left(1 + \varepsilon\, \delta_n\right) \qquad (6)$$

where δn is a random number drawn from a standard Gaussian population, ε is the error amplitude, and the product εδn is the relative measurement error at location xn.
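The perturbation of the exact data can be sketched as follows (a hypothetical helper; the seeded generator is our addition, for reproducibility):

```python
import numpy as np

def add_relative_noise(c_true, eps, rng=None):
    """Perturb true concentrations with Gaussian relative error:
    z_n = C(x_n, T) * (1 + eps * delta_n), with delta_n ~ N(0, 1)."""
    rng = np.random.default_rng(0) if rng is None else rng
    delta = rng.standard_normal(c_true.shape)
    return c_true * (1.0 + eps * delta)
```

Applying this helper with ε = 0.01, 0.05, and 0.1 to the exact data set reproduces the three error levels considered below.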

[22] Three different error levels are considered: ε = 0.01 represents a very small data error, ε = 0.05 an acceptable level of inaccuracy, and ε = 0.1 a significant error. The erroneous data sets are shown in Figure 10 together with the exact ones. In Figure 11 the increased value of the first concentration peak in the release history computed from the data with ε = 0.1 can be ascribed to the positive errors in the x = 165 and x = 180 samples. Similar considerations hold for the data set with ε = 0.05, where, by contrast, negative measurement errors lead to an underestimation of the concentration peaks. As expected, as the error amplitude ε increases, the estimate error variance increases as well (Figure 12). Under these conditions the geostatistical procedure clearly points out the reduced reliability of the restored release history.

Figure 10.

Erroneously sampled data: ε = 0.01, ε = 0.05, and ε = 0.1.

Figure 11.

Restored release histories using noised concentration data.

Figure 12.

Estimate error variance of the release history computed using noised concentration data.

3. Two-Dimensional Transport From a Source Area

[23] In the real field, the point source assumption can be restrictive. To account for more realistic field conditions, we have extended the 2-D geostatistical procedure to handle a release that spreads from an area of finite size A0 (area source). The area A0 is assumed rectangular, with dimensions lx and ly in the x and y directions, respectively; the total pollutant mass rate injected into the aquifer can be computed as:

$$F(t) = f(t)\, A_0 = s(t)\, q'(t)\, A_0 \qquad (7)$$

The function f(t) stands for the pollutant mass rate per unit area, s(t) is the solution concentration history, and q′(t) is the water rate per unit area (infiltration rate). q′ is assumed constant in time, uniform over the area, and low enough to preserve uniform groundwater flow conditions. Given the initial and boundary conditions C(x, y, 0) = 0 and C(x, y, t) → 0 as x → ±∞ and y → ±∞, the concentration in the plume coming from an area release can be computed by integrating equation (2) over the area A0. Assuming a coordinate system originating at the centroid of the area A0, with the x axis aligned with the flow direction, and a uniform f(t), one obtains:

$$C(x, y, T) = \int_0^T f(t)\, g_0(x, y, T - t)\, dt \qquad (8)$$

with a kernel function g0(x, y, T − t) given by:

$$g_0(x, y, T - t) = \frac{1}{4}\left[\mathrm{erf}\!\left(\frac{x + l_x/2 - v(T - t)}{2\sqrt{D_x (T - t)}}\right) - \mathrm{erf}\!\left(\frac{x - l_x/2 - v(T - t)}{2\sqrt{D_x (T - t)}}\right)\right] \left[\mathrm{erf}\!\left(\frac{y + l_y/2}{2\sqrt{D_y (T - t)}}\right) - \mathrm{erf}\!\left(\frac{y - l_y/2}{2\sqrt{D_y (T - t)}}\right)\right] \qquad (9)$$

In addition to the hydraulic parameters of the aquifer, the dimensions lx and ly affect the g0 function; since the error function in (9) saturates at the extreme value 1, it is easy to verify that the influence of lx and ly is appreciable only if the source dimensions are comparable to the plume extension. The dimension ly can affect the analysis if ly/2 ≈ y or if ly > 4√(DyT0) (when ly/2 > y), while the dimension lx is significant when lx ≈ (x − vT0) or when lx > 4√(DxT0) (when lx/2 > (x − vT0)). Here T0 is the time elapsed since the start of the release. The geostatistical procedure outlined by Snodgrass and Kitanidis [1997] can be extended to the finite area source with a new formulation of the transfer matrix, according to the new expression of the kernel function (9).

[24] A numerical example has been performed to highlight the role of the injection area. A regional aquifer with uniform unit velocity, Dx = 0.5, and Dy = 0.05 is considered. The aquifer is subjected to a pollution process caused by a constant unit infiltration rate q′ over an area of dimensions 5 × 5 whose centroid is located at the origin of the coordinate system (the variables are made dimensionless). The mathematical expression (5) is used for the f(t) function, and a concentration field is computed using equation (8) at time T = 300. A set of 24 concentration measurements located on the line y = 0 is considered. The release history is computed under the two assumptions that the available concentration data came from a point source (false) and from a 5 × 5 area source (true). The computed F(t) functions are shown in Figure 13; note that, to obtain F(t) in the area source case, one has to multiply the function f(t) by the total infiltration area A0. The erroneous assumption of a point source leads to an underestimation of the amount F(t) of the released pollutant.

Figure 13.

Release function computed using different approaches to model the sources.

[25] Furthermore, this underestimation causes large errors in the forecasting of future groundwater pollution. In Figure 14 the plume developments (still shown on the line y = 0) at times T = 400 and T = 500 are depicted, as computed from the two release histories in Figure 13. This result suggests that the error in the description of the pollutant source geometry leads, in the present case, to a systematic underestimation of the forecasted solute concentrations.

Figure 14.

Forecasted concentration values obtained from release histories computed with a different source geometry.

4. Recovering the Release Histories of Two Independent Point Sources

[26] A regional aquifer can be polluted by several sources; owing to the advection-dispersion transport process, the plumes can overlap and merge into a single plume. There can be remarkable uncertainty in identifying the locations of the active sources and in distinguishing the relative contribution and timing of each of them. The problem of recovering the different release histories is investigated in this section. The geostatistical methodology can be very useful for this purpose and, in the following, it is developed through a simple but meaningful example.

[27] We now consider a regional aquifer with two independent polluting point sources located at P1(x1, y1) and P2(x2, y2), in a reference system with the x axis aligned with the main flow direction. The flow and the boundary conditions are the same as those considered in the previous section. Owing to the linearity of the advection-dispersion equation, the concentration values in the aquifer can be computed by superposition with the following expression:

$$C(x, y, T) = q \int_0^T s_1(t)\, g(x - x_1, y - y_1, T - t)\, dt + q \int_0^T s_2(t)\, g(x - x_2, y - y_2, T - t)\, dt \qquad (10)$$

where s1(t) and s2(t) are the unknown release histories of the two sources and, as in section 2, a constant unit discharge q is assumed to be injected at the point sources. The starting time t = 0 for the description of the two release histories must obviously be the same, and T is the time elapsed up to the sampling time.

[28] If the two processes s1 and s2 are described through N1 and N2 discrete values, equation (10) can be written in matrix notation as z = Hs, where z is the [M × 1] measurement vector, H the [M × (N1 + N2)] transfer matrix of the whole process, and s the [(N1 + N2) × 1] vector of the release histories. H and s can be conveniently written in block notation:

$$\mathbf{H} = \left[\, \mathbf{H}_1 \;\; \mathbf{H}_2 \,\right], \qquad \mathbf{s} = \begin{bmatrix} \mathbf{s}_1 \\ \mathbf{s}_2 \end{bmatrix} \qquad (11)$$

where H1 and H2 are the transfer matrices of the processes s1 and s2, respectively.

[29] The identification of the release histories is performed simultaneously, since it is impossible to apportion a priori the amount coming from each source. The covariance matrix Q of the process s is also a block matrix, with dimensions (N1 + N2) × (N1 + N2), and it depends on the covariances of the two processes; because the source releases are assumed independent, the cross covariance is zero:

$$\mathbf{Q} = \begin{bmatrix} \mathbf{Q}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{Q}_2 \end{bmatrix} \qquad (12)$$

Once the matrices H and Q and the vector z have been assembled, the procedure to identify s is the same as in section 1. It is possible to apply, as in the following example, the constrained-case methodology.
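The block structure described above can be assembled as in the following sketch (function and variable names are ours):

```python
import numpy as np

def assemble_two_source_system(H1, H2, Q1, Q2):
    """Build the joint transfer matrix H = [H1 H2] and the
    block-diagonal covariance Q = diag(Q1, Q2); the off-diagonal
    blocks of Q are zero because the two release processes are
    assumed independent (zero cross covariance)."""
    H = np.hstack((H1, H2))
    N1, N2 = Q1.shape[0], Q2.shape[0]
    Q = np.zeros((N1 + N2, N1 + N2))
    Q[:N1, :N1] = Q1
    Q[N1:, N1:] = Q2
    return H, Q
```

With H and Q assembled this way, the joint vector s of the two release histories is estimated exactly as in the single-source case.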

[30] A numerical application has been developed as an example: the two sources are located at P1(0, 0) and P2(100, 6); the hydraulic and dispersion parameters are vx = 1, vy = 0, Dx = 1, and Dy = 0.1. The release histories of the two sources are shown in Figure 15.

Figure 15.

Release history: (a) for source 1; (b) for source 2.

[31] It is assumed that 21 samples are taken on y = 0 and 21 samples on y = 4 at T = 300. The recovered release histories, together with the interquantile stripes, are shown in Figures 16 and 17 for sources 1 and 2, respectively. The results show excellent agreement with the true release histories.

Figure 16.

Source 1: the computed release history and the 95% and 5% quantile lines (two sources).

Figure 17.

Source 2: the computed release history and the 95% and 5% quantile lines (two sources).

[32] The developed procedure can easily be applied to identify the pollution fractions in the case of multiple sources, or when the most important sources lie inside a polluted area. In this way, the proposed methodology can also be a useful tool to locate the pollutant source itself: the user, not knowing the actual location of the source, can assume different candidate locations, and the procedure will identify the most probable ones as those with the highest release amounts.

5. Discussion and Conclusions

[33] The geostatistical procedure used to recover a release history, here developed in the framework of a 2-D transport process, shows a very good performance. The proposed case studies suggest that different behaviors can develop, depending on the nature of the errors in the concentrations or in the hydraulic parameters. In the case of noise-corrupted data, the geostatistical procedure is able to recognize their reduced reliability and reports an increased estimate error variance, which enlarges the stripe, delimited by the 95% and 5% quantiles, around the computed release history. On the contrary, errors in the hydraulic parameters influence the estimate error variance in different ways, by broadening or narrowing the uncertainty range. In the latter case, the underestimation of the dispersion coefficient can erroneously lead to a release history that seems very reliable because the calculated error variance is low. The method, in fact, is not able to recognize erroneous hydraulic parameters: the use of incorrect velocity and dispersion values is equivalent to the assumption of an incorrect transfer function, i.e., a wrong description of the transport process. Therefore an underestimation of the dispersion coefficients or an overestimation of the groundwater velocity lowers the effect of the dispersion mechanism and, consequently, reduces the estimate error variance. The GA has also been extended to overcome the limits of a single point pollutant source. This development considers the planimetric extension of the source and shows that the effective release area size affects the computation when the monitored plume size is comparable with the source extension. The source extent in the direction transversal to the groundwater flow should be considered more often in modeling: owing to the uniform direction of the main flow, the plume and the source extent in the transversal direction are closely related, and they remain comparable for a long time.

[34] We have also considered the case of multiple source points. The computed results of our numerical example suggest that it is possible to share the pollution quotas between the two sources, together with the confidence interval of the obtained release histories. The developed methodology could be extended to several sources, and it could be helpful in detecting the different pollution contributions inside a large area. The multiple source procedure can be used effectively as an investigation tool to identify the likely location of the polluting source inside a large unknown area, once the groundwater hydraulic parameters are known.

[35] An important question is whether the considered approach can effectively handle the heterogeneity of porous formations through the classical approach [Bear, 1972], i.e., by means of an equivalent homogeneous soil with appropriate constant dispersion coefficients. The stochastic approach [Dagan, 1989] has, in fact, pointed out that the dispersion coefficients in heterogeneous formations are not constant; e.g., for mildly heterogeneous 2-D formations and a uniform average flow, the longitudinal dispersion coefficient increases with the travel time and tends to an asymptotic value. We are at present working on improvements and modifications of the GA to obtain a satisfactory reconstruction of the release history in heterogeneous aquifers.


Acknowledgments

[36] This research was financed in part by the Italian Ministry of Education, University and Research (MIUR). The authors would like to thank the reviewers of the paper for their useful comments.