How to Find Aquifer Statistics Utilizing Pumping Tests? Two Field Studies Using welltestpy

We present a workflow to estimate geostatistical aquifer parameters from pumping test data using the Python package welltestpy. The procedure of pumping test analysis is exemplified for two data sets from the Horkheimer Insel site and from the Lauswiesen site, Germany. The analysis is based on a semi‐analytical drawdown solution from the upscaling approach Radial Coarse Graining, which enables to infer log‐transmissivity variance and horizontal correlation length, beside mean transmissivity, and storativity, from pumping test data. We estimate these parameters of aquifer heterogeneity from type‐curve analysis and determine their sensitivity. This procedure, implemented in welltestpy, is a template for analyzing any pumping test. It goes beyond the possibilities of standard methods, for example, based on Theis' equation, which are limited to mean transmissivity and storativity. A sensitivity study showed the impact of observation well positions on the parameter estimation quality. The insights of this study help to optimize future test setups for geostatistical aquifer analysis and provides guidance for investigating pumping tests with regard to aquifer statistics using the open‐source software package welltestpy.


Introduction
The knowledge of aquifer heterogeneity is essential for predicting subsurface flow and transport but is hardly considered in practice (Fiori et al. 2016;Rajaram 2016), resulting in erroneous and uncertain predictions of contaminant distributions. The variation of hydraulic conductivity over many orders of magnitude is a well-known phenomenon (Bear 1972), but its detection and modeling still remains a challenge. Spatial resolution is usually inhibited by data scarcity due to limited accessibility and high investigation costs.
Geostatistical descriptions of aquifer properties provide information on spatial variability in hydraulic quantities. A well-established approach is transmissivity, being the vertically integrated hydraulic conductivity, which has been approximated as a log-normally distributed random variable with a spatial correlation (Dagan 1989;Gelhar 1993). Following this approach, heterogeneity can be described with a few characterizing parameters: mean μ, log-variance σ 2 , and horizontal correlation length . The first two are essential to determine average flow velocity (Sánchez-Vila et al. 2006), which is the basis for reliable contaminant transport predictions for, for example, risk assessment and groundwater remediation. The strength of heterogeneity determines the spreading of solutes, thus providing σ 2 and can be used to approximate transport parameters such as dispersion coefficients (Dagan 1989).
A shortcoming of stochastic methods is the need for spatially distributed transmissivity data for parameter estimation. Classical geostatistical one-point and twopoint analyses rely on multiple point measurements, for example, from flowmeter, permeameter or direct push, causing high cost due to the large amount of necessary data. A promising opportunity to overcome this bottleneck is using pumping test data to achieve a first guess on heterogeneity parameters. In this line, we present the missing link in form of tools to infer aquifer statistics such as log-variance σ 2 and horizontal correlation length by interpreting pumping tests in heterogeneous media. Classical interpretation methods for pumping tests assume homogeneity, such as Thiem's or Theis' solutions. These methods provide estimates of average transmissivity when the setting allows flow to sample sufficient volume of the aquifer. However, they cannot cover the impact of heterogeneity close to the pumping well (Indelman and Abramovich 1994;Leven and Dietrich 2006). There is no constant transmissivity being representative for a heterogeneous media in the nonuniform flow setting of a pumping test. Distinct representative values emerge as function of the radial distance to the pumping well: the geometric mean T G = e μ for the far field and the harmonic mean T H = T G e −σ 2 /2 at the pumping well when considering a constant head boundary conditions there. Other boundary conditions lead to different representative T values (Sánchez-Vila et al. 2006). Zech et al. (2012Zech et al. ( , 2016 were the first to provide hydraulic head solutions, which reproduce the pumping test behavior in aquifers of log-normal random heterogeneity. They made use of the upscaling method Radial Coarse Graining to derive an effective transmissivity T eff (r) as function of the aquifer statistical parameters μ, σ 2 , and and depending on the radial distance r following the lines of Neuman et al. (2004). Zech et al. (2012) and Zech et al. (2016) further derived semi-analytical solutions for the pumping test drawdowns h eff (r, t) by solving the steady state and transient groundwater flow equation under the use of the effective transmissivity T eff (r). Both solutions h eff (r, t) quantify the effect of heterogeneity on the drawdown through the aquifer statistics μ, σ 2 , and . They can be interpreted as extensions of Thiem's and Theis' solutions to heterogeneous aquifers. However, tools are missing to apply the semi-analytical solutions to field data.
Closing the gap to applications, we provide implementations of these drawdown solutions h eff (r, t) for pumping tests. They are the first freely available forward models reproducing the effect of intermediate scale heterogeneity on drawdowns, particularly close to the pumping well where well flow is not ergodic. The key application of the code is the interpretation of pumping test data to infer aquifer statistics μ, σ 2 and representing aquifer heterogeneity at the scale of a few meter up to tens of meters related to the resolution of typical field pumping tests in fully penetrating wells. Although flow toward wells is inherently three-dimensional (3D) in heterogeneous media, drawdown signals are depth averaged thus providing depth-averaged aquifer characteristics (Sánchez-Vila et al. 2006;Zech et al. 2012).
We present a standardized workflow to estimate geostatistical parameters from pumping test data and demonstrate it for two field studies. We introduce the freely available open-source code welltestpy (Müller 2021) as a tool to manage and analyze pumping test data for quantifying heterogeneity aquifer. For each parameter, we specify its sensitivity toward test setup. Thus, reliability of parameter estimates can be inferred. Results provide valuable information on the design of future field campaigns in order to optimize parameter estimates.

Methods and Data welltestpy
The Python package welltestpy (Müller 2021) provides a workflow to infer parameters of subsurface heterogeneity from pumping test data. It contains routines to handle, visualize, process and interpret field exploration campaigns based on well observations. Figure 1 shows the package organization. The structure and functionalities of sub-routines is outlines in the Supporting Information where also technical details on code administration are discussed. welltestpy is compatible with Python 3.5 and higher, and supports Linux, MacOS and Windows. It can be installed with pip or conda: pip install welltestpy conda install -c condaforge welltestpy welltestpy is part of the GeoStat-Framework (https:// geostat-framework.org), which contains Python packages for geostatistical applications and subsurface flow and transport simulations. Currently, welltestpy is focused on pumping test campaigns but will be extended to other well based exploration techniques such as slug tests or dipole tests. Any test campaign, once stored in the welltestpy format ".cpm," can be handled with a few lines of Python-code, including the run of estimation procedures. For instance, the estimation results of this paper can be reproduced via: import welltestpy as wtp camp = wtp.load_campaign ("horkheim.cmp") estimation = wtp.estimate. ExtTheis2D( name="estimate", campaign= camp, generate=True) estimation.run() Details and tutorials on the use of welltestpy can be found in the online documentation under https://welltestpy .readthedocs.io.

Field Sites
We interpret pumping test data from two hydrogeological field sites in southern Germany: The Horkheimer Insel, close to the River Neckar in Heilbronn, is heterogeneous while the Lauswiesen site near Tübingen is more homogeneous. Key feature of both test campaigns for application here is the short distance of spatially distributed observations to the pumping well. Aquifer heterogeneity shows highest impact close to the pumping well. Thus, detection of heterogeneity characteristics, such as correlation length, requires drawdown observations close to the pumping well. In this regard, statistical estimates represent aquifer heterogeneity at the scale of the pumping test, being in the intermediate range in the order of a few tens of meters. Although both aquifers are unconfined, drawdowns show no signs of associated effects due to the short duration of pumping. Proof is given in diagnostic plots in the Supporting Information. Furthermore, both sites offer complementary estimates of aquifer statistics based on other characterization methods, such as flowmeter. We use these values for qualitative comparison given their dependence on method characteristics such as the particular support volume and spatial resolution.

Horkheimer Insel
The fluvial unconfined aquifer consists of unconsolidated sediments with poorly sorted gravel and sand and has an average saturated thickness of L = 3 m (Schad and Teutsch 1994). The prevailing hydraulic gradient is 0.1% toward the River Neckar. We make use of drawdown data of 4 × 2 transient pumping tests (Schad 1997). Each of the four pumping wells is used twice with different observation wells per test. Drawdown values are normalized to their pumping rate to combine pumping tests performed at the same well. Figure 2(left) shows the spatial distribution of wells and associated observation wells per test. The distances between wells range between 6.5 and 27 m. Schad (1997) presented an extended hydrogeological analysis, including hydraulic conductivity estimates from flowmeter, permeameter and grain size analysis. An average transmissivity of T = 3.1e − 2 m 2 /s is reported for the pumping tests analyzed with Theis' method. One and two-point statistical results depend on the type of measurement method: log-conductivity variance ranges from σ 2 = 1.57 for flowmeter, σ 2 = 2.32 for grain size analysis to σ 2 = 3.17 for permeameter. Detailed variogram analyses suggested vertical and horizontal correlation lengths of v = 0.15 − 0.25 m and h = 6 − 10 m, respectively.

Lauswiesen
The aquifer consists of unconsolidated, alluvial sediments with poorly sorted gravel and sand with a small amount of fines (<10%). The saturated thickness of the unconfined aquifer is 5 − 6 m, while the average hydraulic gradient is 0.2% to 0.3% parallel to the Neckar valley. The data set stems from a series of five transient pumping tests (Leven 2020

Extended Theis' Solution
We analyze pumping tests with regard to their information content on heterogeneity making use of the extended Theis' solution of Zech et al. (2016). We will briefly outline the concept, for a detailed method description we refer to Zech and Attinger (2015) or Zech et al. 2016. Starting point is a spatially distributed transmissivity T (x, y) being described by a log-normal distribution with characterizing parameters μ and σ 2 being the mean and variance of the underlying normal distribution of ln(T ). We consider a Gaussian spatial correlation function ρ(r) = exp − 1 2 (r/ ) 2 with the correlation length .
The extended Theis solution h eff (r, t) (Figure 3

Inverse Estimation and Parameter Sensitivity
Inverse estimation of heterogeneity parameters is based on type curve fitting. We minimize the root mean square error (RMSE) between the pumping test observations and the extended Theis' solution. For optimization, we use the shuffled complex evolution (SCE) algorithm (Duan et al. 1992), implemented in spotpy (Houska et al. 2015). The optimal guess delivers the parameter set of T G , σ 2 , , and S simultaneously. Table 1 shows the parameter ranges used during estimation. Mean transmissivity T G and storativity S are fitted in log scale since they vary over orders of magnitude (Bear 1972;Gelhar 1993). The extended Theis' solution enables testing a broad range of log-transmissivity variance values. The maximal length scales exceeds the diameters of the well setups at each site by a factor of two. The workflow comprises standardization of pumping test results by rescaling all drawdowns to a normalized pumping-rate of Q = 1 m 3 s and resampling of drawdown time series with quadratically increasing step size to account for the logarithmic time evolution. Details are provided in the Supporting Information.
We apply the Fourier amplitude sensitivity test (FAST) (Saltelli et al. 1999), implemented in spotpy (Houska et al. 2015), to infer the sensitivity of heterogeneity parameters during estimation. FAST is a variance based global sensitivity test, analyzing the Fourier series expansion of the target function. It computes sensitivities between 0 and 1 (low to high). Parameter interactions are taken into account by relating the ratios between values.
FAST is also used to analyze the sensitivity of each parameter toward the radial distance to the pumping well. Target metric here is the deviation of the extended Theis type curve from the Theis solution. This procedure is independent of the pumping test observations.

Estimated Parameters
Results of the type-curve fitting procedure for both sites are summarized in Figure 4. We estimated the geostatistical parameters (T G , σ 2 , ) and storativity S for each individual pumping test as well as in one fitting run combining all tests at a site in standardized form. The extended Theis' solution with the optimal parameter set, that is, from the combination of tests, is displayed along the drawdown observations in Figure 5. Time series observations are well covered by the extended Theis' solution. A comparison to the results of the classical Theis solution can be found in the Supporting Information.
Estimates of statistics ( Figure 4) vary between tests showing the natural local heterogeneity. Exceptions are estimates of σ 2 and for the tests at B2 and B5 of the Lauswiesen site, which correspond to an apparent homogeneous drawdown. The lack of contrast of drawdown behavior at the well and in the far field inhibits the estimation of σ 2 and . The mean over all individual test estimates (for each of the four parameters) is quasi the same as the estimate for the combined tests, proving the robustness of the estimation procedure.
The optimal parameter estimates at both sites are reasonably in line with results of previous geostatistical investigations based on other field methods ("Field Sites" section). Given the larger support volume of depth averaged drawdown signals, variances are expected to be smaller than those resulting from aquifer characterization methods at smaller scale. σ 2 Y = 0.5 for Lauswiesen and σ 2 Y = 2 for Horkheim relate well to those estimates from multilevel slug and flowmeter tests. Depth averaging hardly impacts horizontal correlation length. Thus, the estimates of = 13.1 m and = 9.7m from Lauswiesen and Horkheim pumping test data are in the same range as those of previous investigations. The slight increase of values can be related again to the larger support volume of pumping test, being appropriate to detect sedimentary connections at larger scales and distances. At the same time, estimates refer to the comparably small scale of the test settings with distances of a few tens of meters between wells. Thus, the characterized heterogeneity parameters refer to intermediate scale aquifer heterogeneity and should not be extrapolated to greater scale. Figure 6 shows the general sensitivity of parameter estimates toward the distance to the pumping well. The log-transmissivity mean and the log-storativity show a high sensitivity value over the whole domain as expected given their impact also on homogeneous aquifer drawdowns. Storativity controls the time evolution of the drawdown whereas mean transmissivity controls the spatial behavior. The log-transmissivity variance has the highest sensitivity close to the pumping well since strong deviations of transmissivity have the highest effect on drawdown where the head gradients are largest. This behavior is represented by the effective transmissivity at the pumping wellbeing the harmonic mean T H = T G exp(σ 2 ).

Sensitivity
The impact of the correlation length rises within the first meters and then stays constant at the same low level as the variance. characterizes the transition behavior of the effective drawdown from near well to far field ( Figure 3). Thus, its estimation depends on the σ 2 (near well) and T G (far field). This is also confirmed by the evolution of parameter estimates during optimization which is discussed in the Supporting Information. As a side effect, the length scale is getting more influential for higher values of variance. On the other side, if the drawdown is apparently homogeneous, that is, it can be represented by Theis' solution with T G , makes the estimation of a length scale impossible given the implication of zero variance. However, this behavior does not imply that the site is homogeneous. Instead, it can be a random effect that the local harmonic mean of transmissivity at the pumping well is identical to the geometric mean of the field. Figure 7 shows the individual sensitivities of each parameter estimate at the two field sites. Parameter sensitivities are similar among tests and at both sites. The high sensitivity of T G is due to the general sensitivity of the parameter and related to the amount of observations in the far-field. Storativity and σ 2 show medium sensitivity. The correlation length shows a very low sensitivity for these particular well settings. relies on drawdown observations in the transition zone to the far field,  particularly the distance of a few meters from the pumping well. Figure 2 reveals that both campaigns lack of observation wells located within less than 5 m distance.

Summary and Conclusion
We present a workflow to infer parameters of subsurface heterogeneity from pumping test data exemplified at two sites using the open source Python package welltestpy. Geostatistical analysis of transmissivity is based on the extended Theis' solution of Zech et al. 2016. Beside mean transmissivity and storativity, welltestpy allows gaining robust estimates of log-transmissivity variance σ 2 and horizontal correlation length . These parameters characterizing spatial variability are essential to improve predictions of groundwater flow and contaminant transport for, for example, risk assessment and remediation under uncertainty induced by the inherent heterogeneity of aquifer structure and properties.
The application of the workflow shows that statistical information of transmissivity can be successfully estimated from standard pumping test measurements at heterogeneous field sites. Estimates of mean transmissivity and storativity are as reliable as with Theis' solution. Estimates of log-transmissivity variance σ 2 and correlationlength require specific positioning of observation wells.
The sensitivity analysis indicates optimal locations of observation well to infer aquifer statistical parameters, providing valuable information for pumping test designs. Storativity and its sensitivity depends on the confinement of the aquifer. We expect a smaller radius of impact and higher sensitivity of storativity for unconfined conditions while the cone of depression will spread farther and more rapidly in confined systems. When the field study's focus is on reliably inferring statistical parameters, key points are: (1) tests at multiple pumping wells are necessary to infer the log-transmissivity variance σ 2 of the overall field, (2) reliable values of correlation length require observations close to the pumping well (i.e., a few meter distance). Observations at and close to the pumping well provide most information on heterogeneity since pressure gradients are the steepest here and drawdowns are deviate from mean behavior.
With pumping test being a standard field investigation method, the welltestpy-workflow provides a quick, simple and cost-efficient option to infer first estimates of aquifer statistics, particularly at highly heterogeneous sites. The workflow can be easily transferred and applied to other field campaigns considering the following key aspects: (1) the project objectives is on detecting aquifer heterogeneity statistics leading to a test design which allows observing drawdowns at and close to the pumping well ("transition zone") where observations are not reproduced by "homogeneous" solutions (e.g., that of Theis'); (2) the interest is in intermediate scale heterogeneity characteristics, for example, correlation length and variance of facies structures at the scale of a few up to several tens of meters, being in line with the scale and support volume of the observation method; (3) the test setting is adapted to confinement conditions with smaller transition zones for unconfined aquifers.