## 1. Introduction

[2] Predictive modeling of groundwater transport by definition involves some type of scale extension and is subject to uncertainty [e.g., *Dagan*, 1987, 1989; *Rubin*, 2003; *Carrera*, 1993]. Although the accuracy of predictive modeling is important in many applications related to groundwater contamination [e.g., *Berglund and Cvetkovic*, 1995; *Andricevic and Cvetkovic*, 1996; *Maxwell and Kastenberg*, 1999; *Kaluarachchi et al*., 2000; *de Barros et al.*, 2009; *Molin and Cvetkovic*, 2010; *Molin et al*., 2010], its experimental quantification on the field scale is still a major challenge. One of the relevant issues is understanding the impact of individual transport mechanisms (such as mean advection, macrodispersion, and mass transfer) on the accuracy of predictive modeling.

[3] Dispersion on the field scale has been studied experimentally either as a macroscopic process where a field-scale (macro) dispersion coefficient is inferred [*Becker and Shapiro*, 2000; *Niemann and Rovey*, 2000; *Amerson and Johnson*, 2003; *Birk et al*., 2005], or as a direct effect of aquifer heterogeneity in cases where sufficient small-scale (local) hydraulic information is available [*Julian et al*., 2001; *Peng et al*., 2000; *Salamon et al*., 2007; *Bianchi et al*., 2011]. Even if field-scale transport models using macrodispersion or local-scale heterogeneity can reproduce tracer test outcomes on a given scale, field data are rarely available to test the accuracy of scale extension, which has therefore been addressed primarily using numerical simulations [e.g., *Peng et al*., 2000; *Tiedeman and Hsieh*, 2004].

[4] Mean advection and macrodispersion are clearly important for the bulk movement and shape of tracer discharge (breakthrough, with dimensions M/T) in aquifers; however, mass transfer can affect the transport significantly [*Cvetkovic and Shapiro*, 1990; *Cvetkovic and Dagan*, 1994; *Cvetkovic et al*., 1998; *Carrera et al*., 1998], in particular at later times [*Haggerty et al*., 2000; *Cvetkovic and Haggerty*, 2002; *Haggerty et al*., 2004]. Field-scale tracer transport that in the bulk is reasonably well described by advection and macrodispersion [*Birk et al*., 2005] is more accurately described if mass transfer is included [*Geyer et al*., 2007]. Diffusive mass transfer is even more important for accurate transport modeling if the tracers of interest are sorptive [*Cvetkovic et al*., 2007; *Cvetkovic*, 2010; *Cvetkovic et al*., 2010; *Cvetkovic and Frampton*, 2010].

[5] For characterizing diffusive mass transfer in the field, tracer tests need to be performed over relatively long times. Furthermore, for assessing the accuracy of upscaled tracer discharge, transport observations are required over several successively larger scales. A kilometer-scale tracer test in a carbonate aquifer [*Birk et al*., 2005; *Geyer et al*., 2007], for instance, has been well reproduced by a simple advection-dispersion equation (ADE) model, but extension to larger scales could not be investigated in this case. The thoroughly characterized tracer test at Mobile (AL) [*Molz et al*., 1986] is on a ca. 30 m scale, with the main effort directed toward using hydraulic information to reproduce transport observations [*Peng et al*., 2000]. The well-known tracer tests at Mirror Lake (NH) were carried out on a scale of ca. 40 m, whereas a tracer test that included contaminant degradation reported by *Amerson and Johnson* [2003] was on a ca. 100 m scale. In fact, tracer tests are typically carried out below a 100 m scale [*Ptak and Teutsch*, 1994; *Ptak et al*., 2004].

[6] The tracer tests conducted as part of the Macrodispersion Experiment (MADE) at Columbus (MS) are exceptional in several ways. The heterogeneity of hydraulic properties is relatively large and has been well characterized, in terms of both hydraulic conductivity and flowmeter measurements [*Boggs et al*., 1992; *Bohling et al*., 2012]. Because the injection area of the original tracer tests at MADE (roughly 4 × 4 = 16 m^{2}) is comparable to the area spanned by the horizontal and vertical integral scales (10 × 1.5 = 15 m^{2}, based on detailed flow measurements [*Bohling et al*., 2012]), transport was strongly influenced by local conditions around the injection section; this not only limited the test scale (the bulk of the plume was confined close to the injection boreholes) but also complicated the interpretation of the test outcomes. Models based on heterogeneous advection [*Salamon et al*., 2007; *Fiori et al*., 2013] as well as advection-dispersion models combined with first-order exchange (dual porosity) [*Harvey and Gorelick*, 2000; *Bianchi et al*., 2011; *Zheng et al*., 2011] were shown to approximately reproduce transport observations. The MADE-related studies point to the need for tracer tests on larger scales and under ergodic conditions, such that processes can be discriminated and their effect on the accuracy of transport predictions on extended scales investigated.
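
The comparison of source size and correlation scale made above for MADE can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function name `source_to_correlation_ratio` is hypothetical, and treating the product of the two integral scales as a single "correlation area" is a simplifying assumption rather than a formal ergodicity criterion. A ratio near one indicates that the source samples roughly one correlation area, so local conditions at the injection section can dominate the plume.

```python
def source_to_correlation_ratio(source_dims, integral_scales):
    """Ratio of the source area to the area spanned by two integral scales.

    Both arguments are (length, length) pairs in meters. The ratio is a
    rough, assumed indicator of how many correlation areas the source covers.
    """
    source_area = source_dims[0] * source_dims[1]
    correlation_area = integral_scales[0] * integral_scales[1]
    return source_area / correlation_area


# Values quoted in the text for MADE: 4 x 4 m source, 10 x 1.5 m integral scales
ratio = source_to_correlation_ratio((4.0, 4.0), (10.0, 1.5))
print(f"source area / integral-scale area = {ratio:.2f}")
```

With the quoted values the ratio is 16/15, i.e., close to one, which is consistent with the strong influence of local conditions noted for the MADE tests.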

[7] In this work, accuracy in predictive modeling of field-scale tracer discharge (mass release, breakthrough) in aquifers is addressed. We define accuracy and outline a methodology for evaluating it by combining field-scale experimental results and modeling. The methodology is implemented on a series of tracer tests performed in a highly porous and conductive granitic aquifer, on scales from ca. 70 to 300 m along three independent pathways. A “small”-scale test (70 m) is chosen as the “control experiment” and used for parameter estimation. Based on this estimation, the accuracy of predictive modeling is studied for two additional experiments (the “outcomes”), conducted along two independent pathways and on significantly larger scales, with a particular emphasis on elucidating the roles of advection, macrodispersion, and diffusive mass transfer.