We study predictive modeling of groundwater transport that accounts for three mechanisms: mean advection, macrodispersion, and mass transfer. A general methodology is presented and applied to a series of nonsorbing tracer tests along multiple pathways on scales ranging from ca. 70 to 300 m, in a highly heterogeneous aquifer at Forsmark (Sweden). The mean water residence time cannot be predicted well using a simple water balance model. Longitudinal macrodispersivity λL (L) and a mass transfer parameter group are extrapolated from the control tracer experiments to yield accurate predictions of tracer discharge, once the mean water residence time is constrained. A relatively simple modeling framework based on Fickian macrodispersion and diffusion seems to be adequate for reproducing the tracer discharge in this complex and highly heterogeneous porous medium.
 Predictive modeling of groundwater transport by definition involves some type of scale extension and is subject to uncertainty [e.g., Dagan, 1987, 1989; Rubin, 2003; Carrera, 1993]. Although the accuracy of predictive modeling is important in many applications related to groundwater contamination [e.g., Berglund and Cvetkovic, 1995; Andricevic and Cvetkovic, 1996; Maxwell and Kastenberg, 1999; Kaluarachchi et al., 2000; de Barros et al., 2009; Molin and Cvetkovic, 2010; Molin et al., 2010], its experimental quantification on the field scale is still a major challenge. One of the relevant issues is understanding the impact of individual transport mechanisms (such as mean advection, macrodispersion, and mass transfer) on the accuracy of predictive modeling.
 Dispersion on the field scale has been studied experimentally either as a macroscopic process where a field-scale (macro) dispersion coefficient is inferred [Becker and Shapiro, 2000; Niemann and Rovey, 2000; Amerson and Johnson, 2003; Birk et al., 2005], or as a direct effect of aquifer heterogeneity in cases where sufficient, small-scale (local) hydraulic information is available [Julian et al., 2001; Peng et al., 2000; Salamon et al., 2007; Bianchi et al., 2011]. Even if field-scale transport models using macrodispersion or local-scale heterogeneity can reproduce tracer test outcomes on a given scale, field data is rarely available to test the accuracy of scale extension, which has been addressed primarily using numerical simulations [e.g., Peng et al., 2000; Tiedeman and Hsieh, 2004].
 The mean advection and macrodispersion are clearly important for the bulk movement and shape of tracer discharge (breakthrough, (M/T)) in aquifers; however, mass transfer can affect the transport significantly [Cvetkovic and Shapiro, 1990; Cvetkovic and Dagan, 1994; Cvetkovic et al., 1998; Carrera et al., 1998], in particular at later times [Haggerty et al., 2000; Cvetkovic and Haggerty, 2002; Haggerty et al., 2004]. Field-scale tracer transport that in the bulk is reasonably well described by advection and macrodispersion [Birk et al., 2005], is more accurately described if mass transfer is included [Geyer et al., 2007]. Diffusive mass transfer is even more important for accurate transport modeling if the tracers of interest are sorptive [Cvetkovic et al., 2007; Cvetkovic, 2010; Cvetkovic et al., 2010; Cvetkovic and Frampton, 2010].
 For characterizing diffusive mass transfer in the field, tracer tests need to be performed over relatively long times. Furthermore, for assessing the accuracy of upscaled tracer discharge, transport observations are required over several, successively larger scales. A kilometer-scale tracer test in a carbonate aquifer [Birk et al., 2005; Geyer et al., 2007], for instance, has been well reproduced by a simple advection-dispersion equation (ADE) model, but extension to larger scales could not be investigated in this case. The thoroughly characterized tracer test at Mobile (AL) [Molz et al., 1986] is on a ca. 30 m scale, with the main effort directed toward using hydraulic information to reproduce transport observations [Peng et al., 2000]. The well-known tracer tests at Mirror Lake (NH) have been carried out on a scale of ca. 40 m, whereas a tracer test that included contaminant degradation reported by Amerson and Johnson [2003] was on a ca. 100 m scale. In fact, tracer tests are typically carried out below a 100 m scale [Ptak and Teutsch, 1994; Ptak et al., 2004].
 The tracer tests conducted as part of the Macrodispersion Experiment (MADE) at Columbus (MS) are exceptional in several ways. The heterogeneity of hydraulic properties is relatively large and has been well characterized, both in terms of hydraulic conductivity and flowmeter measurements [Boggs et al., 1992; Bohling et al., 2012]. With the injection area of the original tracer tests at MADE of roughly 4 × 4 = 16 m^2, and the horizontal versus vertical integral scale area of 10 × 1.5 = 15 m^2 based on detailed flow measurements [Bohling et al., 2012], transport was strongly influenced by local conditions around the injection section; this has limited the test scale (the bulk of the plume was confined close to the injection boreholes) and has also complicated the interpretation of the test outcomes. Both models based on heterogeneous advection [Salamon et al., 2007; Fiori et al., 2013] and an advection-dispersion model combined with first-order exchange (dual porosity) [Harvey and Gorelick, 2000; Bianchi et al., 2011; Zheng et al., 2011] were shown to approximately reproduce transport observations. The MADE-related studies point to the need for tracer tests on larger scales and under ergodic conditions, such that processes can be discriminated and their effect on the accuracy of transport predictions on extended scales investigated.
 In this work, accuracy in predictive modeling of field-scale tracer discharge (mass release, breakthrough) in aquifers is addressed. We define accuracy and outline a methodology for evaluating it by combining field-scale experimental results and modeling. The methodology is implemented on a series of tracer tests performed in a highly porous and conductive granitic aquifer, on scales from ca. 70 to 300 m along three independent pathways. A “small”-scale test (70 m) is chosen as the “control experiment” and used for parameter estimation. Based on this estimation, the accuracy of predictive modeling of two additional experiments along two independent pathways and on significantly larger scales (outcomes) is studied, with a particular emphasis on elucidating the roles of advection, macrodispersion, and diffusive mass transfer.
2. Problem Formulation and Hypothesis
 The aquifer of interest is assumed heterogeneous with sufficient secondary porosity (or immobile zones) such that mass transfer takes place. The field-scale transport is driven by a mean hydraulic gradient that is specified in case of a natural gradient, or by a known pumping rate if radially convergent flow is considered. Our working hypothesis is that three transport mechanisms are dominant for predictive modeling: (i) advection following mean water movement; (ii) macrodispersion relative to the mean water movement and due to spatial variability of hydraulic properties; (iii) diffusive mass transfer between the primary porosity (mobile water) and the secondary porosity (immobile water), resulting in retention. As has been documented in the past, a critical issue for groundwater transport modeling in heterogeneous aquifers is ergodicity [Dagan, 1987, 1991; Andricevic and Cvetkovic, 1998; de Barros et al., 2009]. Model verification and accuracy of predictive modeling will typically require different approaches for ergodic and nonergodic transport conditions. In our following analysis, we shall assume that tracer injection is over a sufficiently large scale such that transport is approximately ergodic and expected outcomes will approximately coincide with single realization outcomes.
 The main question of this work is as follows: How accurately can we predict tracer discharge (breakthrough) on the field scale with respect to the three controlling mechanisms? In other words, can the three controlling transport mechanisms be extended from scale L1 to scale L2 > L1 in a statistically homogeneous aquifer, to yield accurate forecasts of tracer discharge, with accuracy suitably defined?
3. Field Experiments
 This study is based on observations from tracer tests performed as part of site investigations in a highly conductive fractured aquifer at Forsmark (Sweden) [Follin, 2008; Lindquist et al., 2008].
 A series of tracer tests were carried out with pumping from borehole HFM14 and with injection at several surrounding boreholes; the site area is shown in Figure 1a and the areal test configuration is shown in Figure 1b. The tests effectively constitute three independent pathways, specifically between boreholes HFM15→HFM14 (72 m), between HFM19→HFM14 (246 m), and between HFM13→HFM14 (297 m) (Figure 1b).
 Boreholes were designed to characterize a densely fractured deformation zone, with transmissivity in all boreholes being >10^−4 m^2/s; injection sections are in the range 10–15 m. Thus, if we think of an aquifer 10 m thick with a transmissivity of 10^−4 m^2/s, this corresponds to a hydraulic conductivity of roughly 0.001 cm/s, which is on the higher end of the semi-impervious aquifer classification or, equivalently, on the limit of a “good aquifer” [Bear, 2007].
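 The conversion from transmissivity to hydraulic conductivity quoted above can be written out as a short worked step. This is only a sketch under the idealization stated in the text (a uniform aquifer of thickness b = 10 m); the symbols K and b are introduced here for illustration.

```latex
K = \frac{T}{b}
  = \frac{10^{-4}\ \mathrm{m^2/s}}{10\ \mathrm{m}}
  = 10^{-5}\ \mathrm{m/s}
  = 10^{-3}\ \mathrm{cm/s}
```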
 Our focus here is on three tests with injection in boreholes HFM13 (tracer Terbium, distance 297 m, 14 m section, recovery 43%), HFM15 (tracer uranine, distance 72 m, 10 m section, recovery 89%), and HFM19 (tracer Dysprosium, distance 246 m, 14 m section, recovery 79%); distances between boreholes are measured at 20 m below sea level. Pumping at HFM14 is 350 L/min = 21 m^3/h. The injection rate is between 0.1 and 1% of the pumping rate; thus, a weak dipole flow regime is applied. Termination time is a few hundred hours for uranine, and 2000 h for Dy and Tb. Injection time Δt is 3 h for Tb and uranine, and 200 h for Dy, approximated as a step function. The measured breakthrough curves (BTCs) are given as normalized discharge for uranine, Dy, and Tb in Figure 2. Details of the hydraulic and tracer experiments can be found in Lindquist et al. [2008].
 Our control experiment of choice is the tracer test with uranine at L1 = 72 m, as this is the smallest scale of the three tests. The outcome tests against which we shall assess accuracy of predictive modeling are the remaining two tests carried out over greater distances (by about a factor of 4): Dy (L2 = 246 m) and Tb (L2 = 297 m).
 The relatively high transmissivity of the aquifers is due to dense fracturing. Clearly, the dense fractures that are also internally heterogeneous to some degree are anticipated to strongly disperse a tracer. Most significantly, with the test sections >10 m, the assumption of ergodicity (over the test section in both the injection and detection boreholes) appears reasonable.
 In typical applications, our main interest is solute discharge or flux-averaged concentration at specified locations (e.g., a compliance, control boundary, or plane) at some length scale L2. Hence, breakthrough curves (BTCs) will be the main focus of this work. Let J(L1) denote tracer mass discharge (M/T) observed on a scale L1 and J(L2) denote tracer discharge observed on a scale L2, where L2 > L1. Predictive modeling in the present context is forecasting J(L2), given the observation J(L1), in other words, extrapolating J from scale L1 to the scale L2. Our modeling framework is based on the flow paths and water travel times [Rainwater et al., 1987; Cvetkovic and Dagan, 1994; Cvetkovic et al., 1998; Fiori et al., 2002], incorporating mean advection, field-scale macrodispersion, and diffusive mass transfer.
 Let a tracer be released at a specified location and detected at another location at a distance L, using an injection and a detection (pumping) borehole. The tracer residence time probability density function h (1/T) at the observation borehole is expressed in the Laplace domain as [Cvetkovic et al., 1998; Cvetkovic and Haggerty, 2002]
$$\hat{h}(s) = \hat{f}\left[s + s\,\hat{g}(s)\right] \qquad (1)$$
where f is the water travel time probability density function at L, g is a memory function that characterizes the mass transfer process, s is the Laplace Transform variable, and the hat denotes Laplace Transform. The function h quantifies the normalized mass discharge J at L for unit pulse injection; in the absence of mass transfer, g = 0, and h = f, i.e., the tracer residence time is equivalent to the water residence time.
 Equation (1) is applicable for the case where mass transfer parameters are effective (constant) values along the flow path between the injection and detection boreholes. A variety of forms are available for the memory function g, for capturing single or multiple rate mass transfer [Cvetkovic and Haggerty, 2002; Cvetkovic, 2012]. Furthermore, the hydrodynamic transport quantified by the water residence time distribution, f, can range from Fickian to different forms of non-Fickian or anomalous transport. A powerful and general analytical model for capturing a wide range of transport behavior is the tempered one-sided stable (TOSS) density defined in the Laplace domain as [Cvetkovic and Haggerty, 2002; Cvetkovic, 2011a]
$$\hat{f}(s) = \exp\left[c\,a^{\alpha} - c\,(a+s)^{\alpha}\right], \qquad a = \frac{1-\alpha}{\zeta^{2}\,\bar{\tau}}, \qquad c = \frac{\bar{\tau}\,a^{1-\alpha}}{\alpha} \qquad (2)$$
where τ̄ is the mean and ζ is the coefficient of variation of the water residence time (T). The parameter a is a cutoff rate, c a scaling parameter, and α is an exponent in the range 0 < α < 1. Equation (2) can be reduced to all typical forms used for hydrological transport [Cvetkovic, 2011a], primarily by the choice of α. If α = 1/2, equation (2) is the solution of the advection-dispersion equation with injection and detection in the flux; for a→0, we recover the one-sided stable density expressing anomalous transport.
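 The reduction of the TOSS density to the ADE solution for α = 1/2 can be checked numerically. The sketch below is illustrative only: it assumes the TOSS transform has the form exp[c a^α − c (a+s)^α], with a and c chosen so that the density has a prescribed mean and coefficient of variation, and it compares this with the Laplace transform of the inverse Gaussian density (the flux-injection, flux-detection ADE solution). The function and argument names are hypothetical.

```python
import math

def toss_params(mean, cv, alpha):
    """Cutoff rate a and scaling parameter c giving the requested mean and CV.

    Assumed relations (from the first two cumulants of the TOSS density):
    mean = c*alpha*a**(alpha-1) and CV**2 = (1-alpha)/(mean*a).
    """
    a = (1.0 - alpha) / (cv ** 2 * mean)
    c = mean * a ** (1.0 - alpha) / alpha
    return a, c

def toss_laplace(s, mean, cv, alpha=0.5):
    """Laplace transform of the TOSS density (assumed form)."""
    a, c = toss_params(mean, cv, alpha)
    return math.exp(c * a ** alpha - c * (a + s) ** alpha)

def inverse_gaussian_laplace(s, mean, cv):
    """Laplace transform of the inverse Gaussian density with the same mean and CV."""
    shape = mean / cv ** 2
    return math.exp((shape / mean) * (1.0 - math.sqrt(1.0 + 2.0 * mean ** 2 * s / shape)))

if __name__ == "__main__":
    # Illustrative values only: mean 16 h, CV 0.9, Laplace variable in 1/h
    for s in (0.01, 0.1, 1.0):
        print(s, toss_laplace(s, 16.0, 0.9), inverse_gaussian_laplace(s, 16.0, 0.9))
```

 With α = 1/2 the two transforms agree to rounding error, while other values of α in (0, 1) keep the same mean and CV but change the shape of the density.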
 Rather than work with the entire BTC, we may define suitable measures that characterize J ∼ h. Each measure is obtained by applying an operator to the tracer mass discharge J(L2) on scale L2, with M being the number of measures used. For instance, a measure could be the peak discharge, peak arrival time, temporal moments, mass recovery at a given time, or fractional arrival time. Such measures have been used both in applications [e.g., Rainwater et al., 1987; Niemann and Rovey, 2000] and for more fundamental understanding and process discrimination [e.g., Cvetkovic, 2010]. The specific measures we shall use are defined in Appendix A.
 Combining equations (1) and (2) opens up a wide range of analytical modeling possibilities, from the ADE model to anomalous and a variety of non-Fickian forms (e.g., using α ≠ 1/2). Equations (1) and (2) therefore constitute a rich “model space” for model sensitivity studies. Our starting point here will be a retention model (g) that has been shown adequate for crystalline rock, and a hydrodynamic transport model (f) that is best understood and with as few parameters as possible, until discredited by observations; this is consistent with the general prevailing scientific principle [Popper, 1992].
 For the hydrodynamic component of transport, we shall assume that the advection-dispersion equation (ADE) is applicable with two independent parameters: the mean water residence time and the longitudinal macrodispersivity (λL). The mass transfer is modeled as Fickian diffusion into an infinite immobile zone [Carslaw and Jaeger, 1959; Neretnieks, 1980; Becker and Shapiro, 2000; Reimus et al., 2003; Reimus and Callahan, 2007; Cvetkovic et al., 2007, 2010; Cvetkovic, 2010], controlled by a single parameter group ψ. Tracer transport then depends on three parameters: the mean water residence time, λL, and ψ. The parameter group ψ will be approximately the same for all nonsorbing tracers considered in this study, since any difference would be due only to differences in diffusivity in water, which are comparatively small. In the case of sorbing tracers with strong differences in sorption coefficients, ψ will be tracer dependent. Details of the model used for evaluation of the tracer tests are given in Appendix B.
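 The three-parameter model described in this paragraph can be sketched in the Laplace domain. The sketch rests on assumptions that are not taken from Appendix B: that mass transfer enters by replacing s with s + s ĝ(s) in the water residence time transform, that diffusion into an infinite immobile zone corresponds to ĝ(s) = ψ/√s, and that the ADE with flux injection and detection gives an inverse Gaussian residence time density with squared coefficient of variation 2λL/L. All function and argument names are hypothetical.

```python
import math

def f_hat_ade(s, tau_mean, lam_L, L):
    """Laplace transform of the water residence time density for the ADE.

    Assumes an inverse Gaussian density with mean tau_mean and CV^2 = 2*lam_L/L.
    """
    cv2 = 2.0 * lam_L / L
    return math.exp((1.0 / cv2) * (1.0 - math.sqrt(1.0 + 2.0 * cv2 * tau_mean * s)))

def h_hat(s, tau_mean, lam_L, L, psi):
    """Tracer residence time transform with unlimited diffusive mass transfer.

    Assumes s*g_hat(s) = psi*sqrt(s), so that psi = 0 recovers the
    water residence time transform.
    """
    return f_hat_ade(s + psi * math.sqrt(s), tau_mean, lam_L, L)

if __name__ == "__main__":
    # Values quoted in the text for the uranine control test:
    # 16 h, lambda_L = 29 m, L = 72 m, psi = 0.045 h^-1/2 (s in 1/h)
    for s in (0.001, 0.01, 0.1):
        print(s, f_hat_ade(s, 16.0, 29.0, 72.0), h_hat(s, 16.0, 29.0, 72.0, 0.045))
```

 A breakthrough curve in time would follow from numerical inversion of h_hat, which is left out of this sketch. For s > 0 the transform with ψ > 0 is smaller than the one with ψ = 0, reflecting retention, while both equal 1 at s = 0 (mass is conserved).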
 Our methodological steps for assessing accuracy of predictive modeling are summarized as follows:
 1. From the measured BTCs J(L1) of the control experiment(s), infer key parameters (the mean water residence time, λL, and ψ) using the model for J.
 2. “Predict” tracer discharge for the outcome experiments at scales L2, i.e., J(L2) using the model for J.
 3. Compute accuracy index ω (equation (A2)) (or the absolute percentage error ε) in a stepwise fashion such that the impact of dominant mechanisms on the accuracy can be assessed. This last step will aid in understanding the impact and predictability of individual transport mechanisms.
5.1. Control Experiment
 First, we calibrate the three parameters (the mean water residence time, λL, and ψ) using equation (B8) for uniform mean flow (equation (B5)) and the normalized breakthrough data for uranine at the 72 m scale. The three parameters affect the computed BTC differently, with some overlap. A sensitivity analysis for this transport model, using the Bode sensitivity functions, was presented by Cvetkovic et al.
 The best model fit to the data for uranine is shown in Figure 3. The calibrated parameters are a mean water residence time of 16 h, λL = 29 m, and ψ = 0.045 h^−1/2. The calibrated longitudinal macrodispersivity is large, almost half of the transport domain, which is consistent with the high variability of hydraulic properties in this type of formation, both between and inside fractures; macrodispersion is also enhanced by a relatively large transport section that captures a significant part of the hydraulic variability.
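 For orientation, the size of the calibrated macrodispersivity relative to the control scale can be written out. The Péclet number Pe used here is the standard ratio of transport distance to dispersivity and is introduced only for illustration; it is not defined in the text.

```latex
\frac{\lambda_L}{L_1} = \frac{29\ \mathrm{m}}{72\ \mathrm{m}} \approx 0.40,
\qquad
\mathrm{Pe} = \frac{L_1}{\lambda_L} \approx 2.5
```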
 An independent estimate of ψ may be obtained as follows. Assume that Archie's law is applicable as De = θ^m Dw, where θ is the matrix porosity, m is an empirical exponent, and Dw is the diffusivity in water, with a typical value of around 10^−5 m^2/h [Cvetkovic et al., 2007]. The secondary (matrix) porosity of the test aquifer is known to be relatively high in this type of formation, around 1% [Selnert et al., 2007]. With m = 1.6 [Cvetkovic et al., 2007] and an active specific surface area of sf = 5000 m^−1 [Cvetkovic and Frampton, 2010, 2012], we obtain an estimate of ψ that is close to the calibrated value of 0.045 h^−1/2. Although the above retention parameters are not site specific, they do indicate that the calibrated ψ is in a reasonable and physically based range.
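 The arithmetic behind this independent estimate can be sketched as follows. The definition of the parameter group used here, ψ = sf √(θ De) for a nonsorbing tracer, is an assumption (Appendix B is not reproduced in this section); the input values are those quoted in the paragraph above, and the function name is hypothetical.

```python
import math

def psi_from_archie(theta, m, D_w, s_f):
    """Estimate the mass transfer parameter group psi (h^-1/2).

    Assumptions: Archie's law D_e = theta**m * D_w, and
    psi = s_f * sqrt(theta * D_e) for a nonsorbing tracer.
    """
    D_e = theta ** m * D_w              # effective diffusivity (m^2/h)
    return s_f * math.sqrt(theta * D_e)

if __name__ == "__main__":
    # theta = 1 %, m = 1.6, D_w = 1e-5 m^2/h, s_f = 5000 1/m (values from the text)
    print(psi_from_archie(theta=0.01, m=1.6, D_w=1e-5, s_f=5000.0))
```

 Under these assumptions the estimate is about 0.04 h^−1/2, which is of the same order as the calibrated 0.045 h^−1/2.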
 The three parameters inferred from the control experiment (a mean water residence time of 16 h, λL = 29 m, and ψ = 0.045 h^−1/2) are now the basis for predictive modeling of the outcome experiments with Dy and Tb along pathways in the same aquifer and on respective scales of 246 and 297 m.
5.2. Predictive Modeling
 Predictive modeling of the transport experiments with Tb and Dy simultaneously addresses two important issues: first, the applicability of the estimated parameters λL and ψ as transport/material properties of the heterogeneous aquifer, and second, the validity of the transport model (equation (B8)) for upscaling tracer transport from 72 m to 246 m and 297 m.
 As noted earlier, the mean water residence time is the most important transport parameter to estimate. Based on the information available in the form of transmissivities for the injection boreholes, we use the expression (C2) for extending the mean water residence time from scale L1 to scale L2; this expression combines water mass balance with equation (C1).
 Table 1 summarizes the predicted mean water residence times for the three pathways. The calibrated value for uranine is 16 h, compared to a value of around 14 h obtained from equation (C2), i.e., a reasonably close estimate. Using equation (C2) to predict the mean water residence time for Dy and Tb, however, we find significant deviations. Specifically, the solid curves in Figure 4, obtained from equation (B8) using the predicted mean water residence times (Table 1) with λL and ψ inferred from the uranine (control) 72 m experiment, deviate significantly from the observed BTCs. Evidently, equation (C2) underestimates the calibrated mean water residence times.
Table 1. Model Parameters Assuming Uniform Mean Flow (recoverable column headings: T × 10^4 (m^2/s); δ (effective) (m))
 Next, we explore whether the parameters λL and ψ estimated from the control experiment with uranine reflect properties of the aquifer that will be applicable even for the outcome experiments with Dy and Tb on significantly larger scales. The specific task is to see whether the outcome experiments with tracers Dy and Tb can be predicted with equation (B8) by calibrating the mean water residence time alone.
 In Figure 4, we show the BTCs of Dy and Tb following calibration of the mean water residence time; for Dy the calibrated value is 1.7 times the predicted value from equation (C1), and for Tb it is 2.2 times the predicted value, again using equation (C1) (Table 1). Once the mean water residence time is calibrated, equation (B5) yields a close representation of the outcome BTCs along both pathways. This indicates that the estimated parameters λL and ψ obtained on the 72 m scale, as well as the transport model (B5), are applicable in this aquifer for extending to scales of 246 and 297 m along independent pathways.
 In Figure 5, the accuracy index ωϕ of the predictive modeling is illustrated. The transport is thereby characterized with fractional arrival time, where the mass fraction ϕ is shown as the independent variable (x axis); note that the recovered mass fraction in the tests is below 70%, i.e., observations of the asymptotic part of the breakthrough curve are not available.
 The solid curves in Figure 5 are obtained by scale extension of the mean water residence time based on equation (C2). Once the mean water residence time is calibrated as summarized in Table 1, accuracy improves significantly (dashed curves in Figure 5). We observe a somewhat higher accuracy up to mass fraction ϕ = 50% for Dy, noting that this part of the BTC is most important, e.g., for risk assessment.
5.3. Radially Convergent Flow
 So far we have assumed uniform mean flow. In the following, we explore the consequences of assuming radially convergent flow by using equation (B8) with equation (B6). For a weak dipole, uniform and radially converging flow conceptualizations may provide suitable limiting cases; hence, a comparison between the two is of interest. Reimus et al. [2003], for instance, have compared transport parameters for advection, macrodispersion, and diffusive mass transfer inferred from a tracer test on a 30 m scale, assuming both a uniform and a radially converging flow regime. They found that the mean advection and macrodispersion parameters differed, whereas the retention parameter (corresponding to our ψ) was not affected.
 In Figure 6, the model BTCs for radially convergent and uniform flow using equation (B8) with equations (B5) and (B6) are compared with measurements. All the transport parameters are the same for the two flow regimes, as defined in Table 1. It is encouraging that both flow regimes provide a reasonable representation of the data when considering all three BTCs. The exception is uranine in the later part of the BTC for radially convergent flow (red curve, Figure 6), where deviations are apparent. Adjustments of the mean water residence time and macrodispersivity λL in the range 10–20% from the values in Table 1 could provide a closer fit between the model (red curves, Figure 6) and data for tracers Dy and Tb. However, the uranine BTC in Figure 6 is clearly better represented by the uniform flow regime, which under present conditions does seem to be more accurate and robust for predictive modeling of transport. Hence, we have used the parameters in Table 1 obtained by assuming uniform mean flow as the basis for our study.
6. Testing of Hypothesis
 As already stated, our working hypothesis is that at least three transport mechanisms need to be included for accurate predictive modeling: Advection, macrodispersion, and mass transfer. A simple means to test this hypothesis is to exclude the mechanism that under present conditions (nonsorbing tracers) appears to have the least impact, namely mass transfer, and compare the results to observed BTCs.
 The normalized tracer discharge is shown in Figure 7 for all three pathways, where we have set ψ = 0 in equation (B4). The mean water residence time and longitudinal macrodispersivity were recalibrated for uranine, the latter yielding λL = 49 m; these can be compared to 16 h and 29 m, respectively, when mass transfer was included (Figure 4). Comparing modeled BTCs in Figures 4 and 7, we see that for uranine the BTC is reproduced relatively well by a model that includes advection and macrodispersion only, although some deviation in the tail is apparent (black solid line in Figure 7).
 Modeled BTCs for Dy and Tb in Figures 4 and 7 are obtained in two ways. First, we use the new value λL = 49 m from the control experiment (uranine) together with the same mean water residence times for Dy and Tb as were calibrated in Figure 4, but now with ψ = 0 (dashed curves); this clearly shows the effect of mass transfer under present conditions. Second, λL = 49 m from the control experiment (uranine), assuming no mass transfer (ψ = 0), is again used, but now the mean water residence time is recalibrated for Dy and for Tb to obtain the best match with measurements (solid curves).
 The red and blue solid curves in Figure 7 clearly demonstrate that excluding mass transfer for nonsorbing tracers yields less accurate predictions, if the aquifer dispersion properties are assumed applicable on the outcome test scales, as was done for predictions in Figure 4. The modeled BTC for Dy (blue solid curve) is relatively close to the measured BTC up to the peak, but deviates notably beyond the peak. At larger scales and for longer advective times, the modeled BTC deviates even more (red solid curve in Figure 7, for Tb), both in the peak and tail.
 Comparison between Figures 4 and 7 confirms that including all three mechanisms of advection, macrodispersion, and mass transfer improves the accuracy of predictive modeling of groundwater transport under present conditions.
 The classical tracer experiments at the well-characterized sites such as Borden [Sudicky, 1986; Mackay et al., 1986], Cape Cod [LeBlanc et al., 1991; Garabedian et al., 1991], and MADE [Boggs et al., 1992; Zheng et al., 2011] have improved our confidence in using local-scale heterogeneity data for reproducing plume spatial evolution on scales up to say 50–100 m. Detailed K-measurements in these studies have been invaluable for assessing the applicability of stochastic theories; however, such measurements will rarely be available in applications, in particular for other than relatively shallow sedimentary aquifers. In cases where detailed measurements of K are not available, scale extension of transport from L1 to L2 > L1 is an important alternative. Furthermore, macrodispersion has been a prime focus of the classical field studies, whereas here focus on mass transfer is equally important. The Forsmark tracer tests presented in this paper therefore complement the existing groundwater transport field studies.
 Since the early 1980s, macrodispersion in groundwater transport has been an outstanding issue. The study by Gelhar et al.  was an attempt to summarize macrodispersion estimates versus scale from known tracer tests to that date. This summary subsequently served as a basis for a broader discussion on scaling of hydraulic properties in the subsurface [Neuman, 1994].
 In the summary of estimated macrodispersivities in Gelhar et al., three levels of reliability were noted: high, medium, and low. Of the estimates with high reliability, none exceeds a scale of 250 m, and the 250 m one is from the Cape Cod aquifer with a low macrodispersivity due to its mild heterogeneity. Of the other tests, the next scale for a test with high reliability is around 100 m from a sand/sandstone aquifer, also relatively homogeneous with a low macrodispersivity of around 1 m. Furthermore, in the tests summarized by Gelhar et al., there is no report on attempts to discriminate the potential effect of mass transfer on the macrodispersive process. Most significantly, no specific attempt has been documented in the summarized tests to address a significant extension of scale in one single formation, say on the order of a factor of 4 as discussed in the present study.
 The dispersivity value of 29 m that was estimated in the present work as applicable on scales 70–300 m after mass transfer was accounted for, is set into the context of the macrodispersivity estimates reported in Gelhar et al.  in Figure 8. The black and blue symbols are all the estimates with high reliability as reported by Gelhar et al. , where the blue symbols are values from the relatively homogeneous aquifers of Cape Cod and Borden. The green symbol is the estimate from the MADE site that is based on the second moment of the observed plume [Adams and Gelhar, 1992]. Note that Figure 8 is in effect a “window” of Figure 1 or 2 in Gelhar et al. .
 Figure 8 indicates that there is no particular pattern in the scattered data, as can in fact be expected given the variety of hydrogeological settings in which the tests have been carried out. The estimated values for the Forsmark site are on the higher end of the macrodispersivity values; however, given the complex and highly heterogeneous structure of the densely fractured aquifer [Follin et al., 2008], this is not surprising. Scales of the two sites with similar hydrogeology and level of heterogeneity, Borden and Cape Cod (90 and 250 m, respectively), are comparable to the pathway scales at the Forsmark site (70–300 m). Moreover, the macrodispersivities at Borden and Cape Cod are comparable in spite of different scales, similar to what we found at Forsmark. Yet the values of the macrodispersivities are very different, almost a factor 30 larger at Forsmark, which can only be attributed to the dramatically different hydrogeological structure and heterogeneity between the Borden/Cape Cod and Forsmark aquifers.
7.2. Limitations and Extensions
 Given the simplicity of the expression (C2), it is not surprising that predictions of the mean water residence time are relatively poor for a detailed reproduction of the outcome tests. In view of the structural complexity of the studied aquifer [Follin et al., 2008], it is in fact quite encouraging that the mean water residence time is reproduced by equation (C2) within a factor of only 2. Once the mean water residence time is calibrated, equation (B8) with λL = 29 m and ψ = 0.045 h^−1/2 upscales tracer transport by a factor of 4 with high accuracy, at least up to the observed mass fraction of <70%. This is presumably due to the fact that the aquifer is statistically homogeneous over the experimental scales (<300 m), but also because the test sections capture most of the variability (i.e., conditions are approximately ergodic). In this respect, good geological characterization of the experimental site was instrumental for understanding the possibilities and limitations of predictive modeling. Further study could reveal whether the limited hydraulic information available can be combined with numerical fracture network simulations [Frampton and Cvetkovic, 2011] to constrain λL around the observed range. Moreover, available independent information on the retention properties of the deformation zone can be analyzed statistically in order to independently infer mass transfer parameters and compare these to the calibrated values.
 In this study, we have presented a methodology for assessing the accuracy of predictive modeling of groundwater transport; the methodology builds on the general transport framework for travel times and tracer fluxes [Rainwater et al., 1987; Shapiro and Cvetkovic, 1988; Cvetkovic and Dagan, 1994]. It has been applied to tracer tests along multiple independent pathways conducted as part of the Forsmark site investigations.
 Based on obtained results, the following main conclusions are drawn:
 1. The simple mass balance model (C2) combined with an empirical law for porosity (equation (C1)) predicts the mean water residence time within approximately a factor 2 of the calibrated values; this yields an accuracy index for fractional arrival times in the range 40–50%.
 2. The transport model based on the ADE with matrix diffusion in uniform flow (equation (B5)) has been shown to be adequate and robust for predictive modeling of tracer transport in the tested heterogeneous aquifer. λL and ψ, estimated on a small scale (72 m) seem applicable on scales four times larger: Once the mean water residence time is constrained, the model (B8) predicts tracer discharge with accuracy index over 90% (at least up to 60% of mass recovery).
 3. The hypothesis that mass transfer needs to be included for upscaling and accurate predictive modeling has been verified: when mass transfer is excluded by setting ψ = 0, advection and macrodispersion alone reproduce the observed BTCs for the two observation pathways with significantly lower accuracy than when diffusive mass transfer is accounted for.
 Field-scale tracer tests in groundwater are always site specific; therefore, strictly, any tracer test study can verify or discredit a transport model only for the specific conditions considered. In view of the hydrogeological variety, different transport models will, in general, be required for different conditions. The fact that groundwater transport could be upscaled by a factor 4 in a highly heterogeneous aquifer of complex structure using a simple model (equation (B5)) is encouraging. It is consistent with other observations which indicate that under ergodic conditions and transport scales with sufficient plume development, the ADE provides a robust predictive modeling tool [e.g., Becker and Shapiro, 2000; Sudicky et al., 2010]. For accurate predictive modeling, the ADE will typically need to be coupled with an appropriate mass transfer mechanism, depending on the internal structure of the immobile porosity as conveniently expressed by alternative forms of the memory function g [Cvetkovic, 2012].
 Let $X_m$ denote the value of a measure of tracer discharge obtained from predictive modeling, and $X_o$ the same measure for the experimental outcome (observation). We define:
$$\varepsilon \equiv \frac{|X_m - X_o|}{X_o} \qquad (A1)$$
 Thus $\varepsilon = 0$ implies no error or maximum (full) accuracy with respect to $X_o$. A measure of accuracy complementary to $\varepsilon$ can be defined as $\omega \equiv 1 - \varepsilon$, such that $\omega = 1$ implies full accuracy and $\omega = 0$ zero accuracy in predictive modeling, relative to the actual outcome.
 If ϕ denotes a mass fraction of tracer discharge on scale L2, then we shall use the fractional arrival time tϕ for a few specified ϕ values as a set of measures $X$.
 With $t_\phi^o$ denoting the experimental outcome and $t_\phi^m$ the model prediction, we define
$$\omega_\phi \equiv 1 - \frac{|t_\phi^m - t_\phi^o|}{t_\phi^o} \qquad (A2)$$
as an “accuracy index” for fraction ϕ; tϕ is computed by solving the implicit equation
$$\int_0^{t_\phi} h(t)\,dt = \phi \qquad (A3)$$
where $h$ (B8) is the normalized tracer discharge.
 With the above definitions, one can now decide, depending on the context, what value of ω is a suitable accuracy limit, above which the modeling is considered accurate for the chosen measures (in our case, the fractional arrival times tϕ). For instance, one can set 75% as a limit, such that ω > 0.75 implies an accurate model. In such a case, modeled measures would be within a 50% interval (±25%) relative to the observation. This is clearly an arbitrary (subjective) definition and will depend on the scale and practical problem (or perceived risk related to the problem) at hand. Note that the error measure of equation (A1) corresponds to the “absolute percentage error” used for instance in economic forecasting [e.g., Armstrong and Callopy, 1992].
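 The accuracy index can be illustrated with a short sketch. It assumes the index takes the form ω = 1 − |t_model − t_observed| / t_observed, i.e., the complement of the absolute percentage error, and that the fractional arrival time is read off a sampled breakthrough curve by trapezoidal integration and linear interpolation. The function names and the sample values are hypothetical and serve only as an illustration; they are not data from the tracer tests.

```python
def fractional_arrival_time(times, discharge, phi):
    """Time at which the cumulative mass fraction first reaches phi.

    times      -- increasing sample times
    discharge  -- normalized tracer discharge at those times (1/T)
    phi        -- target mass fraction, 0 < phi < 1
    """
    cumulative = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Trapezoidal increment of the cumulative mass fraction.
        step = 0.5 * (discharge[i] + discharge[i - 1]) * dt
        if step > 0.0 and cumulative + step >= phi:
            # Linear interpolation inside the interval where phi is crossed.
            return times[i - 1] + dt * (phi - cumulative) / step
        cumulative += step
    raise ValueError("mass fraction phi is not reached within the record")


def accuracy_index(t_model, t_observed):
    """Assumed form: omega = 1 - |t_model - t_observed| / t_observed."""
    return 1.0 - abs(t_model - t_observed) / t_observed


# Hypothetical, uniformly sampled curves (illustration only).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
observed = [0.25, 0.25, 0.25, 0.25, 0.25]
modeled = [0.20, 0.20, 0.20, 0.20, 0.20]
omega_40 = accuracy_index(
    fractional_arrival_time(times, modeled, 0.4),
    fractional_arrival_time(times, observed, 0.4),
)
```

 With a limit of 0.75, a modeled arrival time within ±25% of the observed one would count as accurate under this definition.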
 Mass transfer is modeled as Fickian diffusion into an infinite immobile zone [Carslaw and Jaeger, 1959; Neretnieks, 1980; Becker and Shapiro, 2000; Shapiro, 2001; Reimus et al., 2003; Reimus and Callahan, 2007; Cvetkovic et al., 2007, 2010; Cvetkovic, 2010]; this is obtained by specifying the memory function g in the Laplace domain as a function of a single parameter ψ [Cvetkovic et al., 2007; Cvetkovic, 2010; Cvetkovic et al., 2010]:
$$\hat{g}(s) = \frac{\psi}{\sqrt{s}}, \qquad \psi \equiv s_f \sqrt{\theta D_e} \qquad (B1)$$
where sf (1/L) is the active specific surface area, De (L2/T) is the effective diffusion coefficient, and θ is an effective matrix porosity. A discussion on other forms of g and their relation to equation (B1) can be found, e.g., in Cvetkovic .
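 Our starting point is the advection-dispersion equation (ADE), whereby macrodispersion is modeled as a Fickian process with longitudinal macrodispersivity λL as the key parameter. In the case of uniform mean flow, we can then write the water residence time probability density function in the Laplace domain by setting α = 1/2 in equation (2) as [Cvetkovic and Haggerty, 2002; Cvetkovic, 2011a]
$$\hat{f}(s) = \exp\left\{\frac{L}{2\lambda_L}\left[1 - \sqrt{1 + \frac{4\lambda_L \bar{\tau}\, s}{L}}\right]\right\}$$
with $\bar{\tau}$ being the mean water residence time over the transport distance L and λL the longitudinal macrodispersivity.
 Equation (1) is in effect a transfer function. Let J0 denote the tracer injection function; then combining equations (1) and (B1) yields
$$\hat{J}(s) = \hat{J}_0(s)\,\exp\left\{\frac{L}{2\lambda_L}\left[1 - \sqrt{1 + \frac{4\lambda_L \bar{\tau}}{L}\left(s + \psi\sqrt{s}\right)}\right]\right\} \qquad (B4)$$
 The three parameters $\bar{\tau}$, λL, and ψ need to be inferred from the control experiment on the scale L1, and then used for predictive modeling of the experimental outcomes on the scale L2.
 If the injection is approximated as a finite pulse (step function) over duration Δt, i.e., $J_0(t) = (M_0/\Delta t)\,[H(t) - H(t - \Delta t)]$, where H is the Heaviside step function and M0 (M) is the total injected tracer mass, then we have from equation (B4) the simplest transport model for computing tracer discharge:
$$\hat{J}(s) = M_0\,\frac{1 - e^{-s\Delta t}}{s\,\Delta t}\,\exp\left\{\frac{L}{2\lambda_L}\left[1 - \sqrt{1 + \frac{4\lambda_L \bar{\tau}}{L}\left(s + \psi\sqrt{s}\right)}\right]\right\} \qquad (B5)$$
where Δt is known.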
 If radially converging conditions are assumed, the corresponding expression for the tracer discharge is [e.g., Becker and Charbeneau, 2000; Reimus et al., 2003]
where Ai is the Airy function, and
with χ being the distance L normalized by the borehole diameter.
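 The normalized tracer discharge in the time domain is obtained by numerical inversion of equation (B5) or (B6), i.e.,
$$h(t) = \frac{1}{M_0}\,\mathcal{L}^{-1}\left[\hat{J}(s)\right] \qquad (B8)$$
where $\mathcal{L}^{-1}$ denotes the inverse Laplace transform and $\hat{J}$ the tracer discharge in the Laplace domain.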
 In the present study, we use a standard numerical Laplace transform inversion package in Mathematica® for computing the normalized tracer discharge (equation (B8)). The normalized cumulative discharge is inverted for computation of fractional arrival times (equation (A3)).
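 The inversion step can be sketched in a few lines of Python. The sketch is not the Mathematica package used in the study: it swaps in the Gaver-Stehfest algorithm, which only needs the transform at real s > 0. The Laplace-domain expression is also an assumption, namely the standard ADE (inverse Gaussian) residence time density with the argument s replaced by s + ψ√s for unlimited matrix diffusion, multiplied by the transform of a unit-mass rectangular pulse. The function names and the values of the mean residence time, ψ, and Δt in the usage lines are hypothetical; only λL = 29 m and the 72 m scale are taken from the text.

```python
import math


def stehfest_weights(n=12):
    """Gaver-Stehfest weights V_1..V_n; n must be even."""
    half = n // 2
    weights = []
    for i in range(1, n + 1):
        total = 0.0
        for k in range((i + 1) // 2, min(i, half) + 1):
            numerator = k ** half * math.factorial(2 * k)
            denominator = (
                math.factorial(half - k) * math.factorial(k)
                * math.factorial(k - 1) * math.factorial(i - k)
                * math.factorial(2 * k - i)
            )
            total += numerator / denominator
        weights.append((-1) ** (i + half) * total)
    return weights


def invert_laplace(transform, t, n=12):
    """Approximate f(t) from its Laplace transform evaluated at real s > 0."""
    a = math.log(2.0) / t
    weights = stehfest_weights(n)
    return a * sum(v * transform((i + 1) * a) for i, v in enumerate(weights))


def discharge_laplace(s, tau, lam, psi, length, dt=0.0):
    """Assumed Laplace-domain normalized discharge (unit injected mass).

    tau: mean water residence time; lam: longitudinal macrodispersivity;
    psi: mass transfer parameter; length: transport distance;
    dt: injection pulse duration (0 means an instantaneous injection).
    """
    s_eff = s + psi * math.sqrt(s)  # s * (1 + g) with assumed g = psi / sqrt(s)
    root = math.sqrt(1.0 + 4.0 * lam * tau * s_eff / length)
    f_hat = math.exp(length / (2.0 * lam) * (1.0 - root))
    # Transform of a unit-mass rectangular pulse of duration dt.
    pulse = 1.0 if dt == 0.0 else -math.expm1(-s * dt) / (s * dt)
    return pulse * f_hat


def normalized_discharge(t, tau, lam, psi, length, dt=0.0):
    """Time-domain normalized discharge h(t) by numerical inversion."""
    return invert_laplace(
        lambda s: discharge_laplace(s, tau, lam, psi, length, dt), t
    )


# Hypothetical parameter values except lam and length (illustration only).
curve = [
    normalized_discharge(t, tau=100.0, lam=29.0, psi=0.05, length=72.0, dt=1.0)
    for t in (25.0, 50.0, 100.0, 200.0, 400.0)
]
```

 The Gaver-Stehfest method is chosen here only because it is compact and free of complex arithmetic; it loses accuracy for sharp fronts, where a contour-based method (for example Talbot or de Hoog) would be a more robust choice.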
 The movement of a tracer by mean flow is known to be a dominant transport mechanism. Analysis has shown quantitatively that transport, especially up to the peak of a BTC, is most sensitive to mean advection [e.g., Cvetkovic et al., 2010]. When calibrating model parameters to tracer test results, for example, even small deviations in the mean water residence time can imply comparatively large deviations between model and data.
 There are different strategies for predictive modeling of the mean water residence time. One strategy could be to incorporate all hydraulic information available and simulate groundwater flow under pumping conditions to determine the mean water residence time from the injection to the detection boreholes (provided that porosity is known). In our case, transmissivity measurements are available only at the boreholes, and no independent measurements of the flow porosity are available. We shall take advantage of the fact that our case study aquifer is (densely) fractured and that empirical relationships are available for estimating flow porosity based on transmissivity measurements.
 The so-called cubic law is a theoretical relationship (derived for homogeneous fractures) for relating (effective) aperture and transmissivity. It is well known from experiments that the cubic law is not accurate for reproducing observed water travel times. Issues related to different “effective” apertures (transport and hydraulic) have been discussed in the literature [e.g., Tsang, 1992]. Recently, a compilation of all hydraulic and tracer tests conducted in Swedish crystalline rock resulted in a new empirical relationship between an effective aperture relevant for transport and transmissivity [Hjerne et al., 2010]:
where T is given in (m2/s) and δ in (m). In the following, we shall utilize equation (C1) with the transmissivity measurements available for the injection boreholes. The pumping rate in the detection borehole Q = 21 m3/h can now be combined with the water balance equation to yield the expression for the mean water residence time as
 The pumping is done over a borehole section of length D. The flow porosity can be defined as the “open volume” (referred to as effective aperture) divided by the pumping section, i.e., δ/D, where δ(L) is the effective open length over the section D. Transmissivity for the aquifer is available as single values for the three flow path (test) directions as indicated in Figure 1b and Table 1.
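 The water balance estimate can be sketched as follows. Both functional forms are assumptions made for illustration. The residence time is taken as the water volume of a cylinder of radius L, thickness D, and flow porosity δ/D, divided by the pumping rate Q, which gives π L² δ / Q for radially converging flow. The empirical aperture law is written as a generic power law with placeholder coefficients a and b, which are not the values of Hjerne et al. [2010]. Only Q = 21 m3/h is taken from the text; the function names, the transmissivity, and the distance are hypothetical.

```python
import math


def aperture_from_transmissivity(transmissivity, a, b):
    """Assumed power law delta = a * T**b (T in m2/s, delta in m).

    The coefficients a and b are placeholders supplied by the caller.
    """
    return a * transmissivity ** b


def mean_residence_time(distance, aperture, pumping_rate):
    """Assumed radial water balance: tau = pi * L**2 * delta / Q.

    The result is in the time unit of pumping_rate (hours for m3/h).
    """
    return math.pi * distance ** 2 * aperture / pumping_rate


# Hypothetical inputs except the pumping rate Q = 21 m3/h.
delta = aperture_from_transmissivity(1.0e-5, a=1.0, b=0.5)
tau_hours = mean_residence_time(distance=100.0, aperture=delta, pumping_rate=21.0)
```

 Because the estimate scales linearly with δ, a factor of 2 uncertainty in the effective aperture carries over directly to the predicted mean water residence time.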
 This work was supported by the Swedish Nuclear Fuel and Waste Management Co. (SKB). The author is grateful to Sten Berglund (HydroResearch AB, Sweden), Jan-Olof Selroos (SKB, Sweden), and Sven Follin (SF GeoLogic AB, Sweden), who provided valuable comments that have improved the original version of the manuscript. The author is also grateful to three anonymous reviewers who provided constructive comments and suggestions.