Contaminant transport through regional-scale natural geological formations typically exhibits several “anomalous” features, including direction-dependent spreading rates, channeling along preferential flow paths, trapping of solute in relatively immobile domains, and/or the local variation of transport speed. Simulating these plume characteristics can be computationally intensive using a traditional advection-dispersion equation (ADE) because anomalous features of transport generally depend on local-scale subsurface properties. Here we develop an alternative simulation approach that solves the full nonlocal, multidimensional, spatiotemporal fractional-order ADE with variable coefficients in a Lagrangian framework using a novel non-Markovian random walk method. This model allows us to simulate anomalous plumes without the need to explicitly define local-scale heterogeneity. The simple model accurately simulates the tritium plume measured at the extensively characterized MADE test site.
The dispersion of dissolved solutes through natural porous media is usually “anomalous” in that it does not follow Fick's 2nd law [e.g., Berkowitz et al., 2006]. Typical anomalous features of large-scale, multi-dimensional plumes include 1) different growth rates along different directions (due to an inherently anisotropic depositional or structural geologic environment), 2) “heavy,” non-Gaussian leading edges and irregular channeling of the plume front, 3) solute sequestration in relatively immobile rock, and 4) local variation of either mean transport speed or dispersion rate. Simulating these large-scale plume characteristics using a traditional local (Fickian) advection-dispersion equation (ADE) can be computationally intensive, if not technically impossible, due to the influence of decimeter-scale rock properties that often have very long-range correlation [Zheng and Gorelick, 2003]. Nonlocal equations [e.g., Haggerty et al., 2000; Benson et al., 2001; Berkowitz et al., 2006] have been developed to characterize the anomalous dispersion at a much coarser scale, but their successful application to field plumes has been limited to one dimension (1-d). This study develops a solution of the nonlocal, multi-dimensional, spatiotemporal fractional ADE (fADE) with spatially dependent transport coefficients in a Lagrangian framework using a novel random walk method. This method allows us to simulate many of the characteristics of anomalous dispersion that are missed by a 1-d nonlocal model (such as direction-dependent spreading rates) without the burden of explicitly representing the fine-scale subsurface heterogeneity in a numerical model. We are not aware of any other nonlocal method with the same capability.
 We evaluate our model by simulating the tritium plumes measured at the highly heterogeneous Macrodispersion Experiment (MADE) test site [Rehfeldt et al., 1992] where the above four main anomalous features are conspicuous and cannot be captured fully by previous transport models. It is noteworthy that Lu et al.  extended Benson et al.'s  1-d space-only fADE to capture the 3-d MADE tritium plumes; however, their model restricted heavy-tailed motion to 1-d, did not account for transfer of mass to a relatively immobile phase, and did not account for the spatial variation of transport parameters. Their preliminary model did not adequately simulate the observed resident concentrations. The present study allows us to overcome all of the limitations of Lu et al.'s  model.
Pioneering research on nonlocal theories, experiments, and computational models for transport through hierarchical porous media (with discretely or continuously evolving heterogeneity) can be found in the volume edited by Cushman [1990, pp. 1–6]. See also work by Neuman  and Cushman et al.  for expositions on nonlocal dispersive models with generalized spatiotemporal memory kernels. Anomalous transport in 1-d can be characterized by nonlinear growth of the centered second moment, and/or non-Gaussian leading or trailing edges of a plume emanating from a point source. These characteristics develop because of nonlocal dependence on either past conditions (time) [e.g., Cushman et al., 1994] or far upstream (space) concentrations [e.g., Cushman and Ginn, 2000]. The temporal nonlocality can be physically attributed to mass transfer of solute between relatively immobile and mobile phases [Haggerty et al., 2000] and transport in segregated regions of high and low permeability [Berkowitz et al., 2006]. The space nonlocality may be caused by the long-range dependence of aquifer structure and high variance of transport parameters [Fogg et al., 2000]. The space and time fADEs have been shown by Benson et al.  and Schumer et al. , respectively, to accurately model the space and time nonlocal properties of solute transport. Their models were limited to 1-d and constant parameters. By combining these two techniques and relaxing their limitations, we solve the following novel, spatiotemporal nonlocal transport model:
where C = C(x, t) is the scalar concentration, β is the capacity coefficient, 0 < γ ≤ 1 is the order of the Caputo time fractional derivative, v is the velocity vector, D is the dispersion coefficient, $H^{-1}$ is the inverse of the scaling matrix H providing the order and direction of the fractional derivatives, and M = M(dθ) is the mixing measure [Meerschaert et al., 2001]. Model (1) is a multiscaling space-time fADE, the scaling limit of random jumps (with finite mean and infinite variance) with independent, infinite mean, random waiting times (see Meerschaert and Scheffler  for the 1-d case with constant v and D).
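For orientation, a form consistent with the symbol definitions above and with the constant-coefficient reductions quoted elsewhere in the text can be sketched as follows. This is an assumption-laden sketch, not the authors' displayed equation: it takes v and D as constant, and it does not attempt to show how spatially variable coefficients enter the operators.

```latex
\frac{\partial C}{\partial t}
  + \beta \, \frac{\partial^{\gamma} C}{\partial t^{\gamma}}
  = -\mathbf{v} \cdot \nabla C
  + D \, \nabla_{M}^{H^{-1}} C
```

Setting β = 0 recovers the space-only form $\partial C/\partial t = -\mathbf{v}\cdot\nabla C + D\nabla_M^{H^{-1}} C$ used in the 2-d verification case.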
 The physical model (1) can simulate many anomalous plume traits due to its embedded terms. These terms arise from the highly heterogeneous (subgrid) velocity field. The eigenvalues of the scaling matrix H are the scaling coefficients 1/α of the growth process in the mobile phase (where 1 < α ≤ 2 and α is direction-dependent). The mixing measure M(dθ) defines the shape and skewness of the plume in d-dimensions. The amount and rate of mass partitioning to a relatively immobile phase is described by γ and β. The time-fractional derivative term in (1) represents a fractal distribution of first-order transfer rates [Schumer et al., 2003]. Finally, the strength of the nonlocal plume spreading is allowed to vary with the local-scale heterogeneity of transport coefficients, so the measurements of v and D can be used to condition directly the nonlocal model (1).
Equation (1) is a generalization of previous fractional or local transport models. For example, in forced flow in 1-d, $H^{-1} = 1/H = \alpha$, M(+1) = 1, M(−1) = 0, and (1) without mass partitioning reduces to the commonly used 1-d fADE $\partial C/\partial t = -v\,\partial C/\partial x + D\,\partial^{\alpha} C/\partial x^{\alpha}$. When the motion process is multi-dimensional Brownian motion, $H^{-1} = 2I$, and the classical 2nd-order ADE is recovered. When v and D are constant and there is no mass partitioning, (1) reduces to the multiscaling space-only fADE proposed by Meerschaert et al. . Without the time-fractional term, (1) reduces to the space-only fADE developed by Zhang et al. . Note that the space fractional derivative is equivalent to a convolution-Fickian flux with a space-dependent kernel that decays as a power law [Cushman and Moroni, 2001, equations (1) and (8)].
 A non-Markovian random walk (RW) method is developed to approximate (1), since no analytical solution is available, and the Lagrangian method is more efficient for large transport problems compared to the Eulerian method. The RW is also the only known way to solve the nonhomogeneous, multiscaling, space-only fADE [Zhang et al., 2006]. There are two main steps in this RW scheme. First, while a particle is in motion between sticking events, it follows a multiscaling compound Poisson process [Zhang et al., 2006]
where the left-hand side denotes the particle location at the actual clock time t; n is the number of random jumps by time t; $R_i$ is a random jump length with $P(R_i > r) \sim r^{-1}$; the jump direction is a random unit vector drawn from the CDF of the mixing measure M(dθ); and the jump length and jump direction are independent. The jump component of $R^{H}$ along the eigenvector belonging to the kth eigenvalue $1/\alpha_k$ of H may also carry information about the local dispersion strength D(x) via [Zhang et al., 2006]
where k represents the direction of the kth eigenvector of H, Θ = 1 if $\partial D/\partial x_k > 0$ and −1 otherwise, $dS_{\alpha}$ and $dS_{\alpha-1}$ denote the random noises underlying independent α-order and (α − 1)-order standard Lévy motions, respectively, and $t_{m,i}$ is the motion (or operational) time. Note that $t_{m,i}$ is an exponentially distributed random variable and can be treated as constant only at late time (D. A. Benson et al., Stochastic model for multi-rate mobile/immobile contaminant transport, submitted to Water Resources Research, 2007). Also note that the drift due to a constant or variable v can be added conveniently to the particle motion [see Zhang et al., 2006].
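The jump construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: H is taken to be diagonal in the coordinate axes (so that $R^{H}$ scales axis k by $R^{1/\alpha_k}$), the mixing measure is discrete, and the node directions and weights are hypothetical. The scaling by D and the time step, the drift from v, and the correction for variable D(x) are all omitted.

```python
import numpy as np

def sample_jumps(n, alphas, directions, weights, rng):
    """Draw n multiscaling jump vectors R**H * theta, with H = diag(1/alpha_k)."""
    alphas = np.asarray(alphas, dtype=float)          # one alpha_k per axis
    directions = np.asarray(directions, dtype=float)  # unit vectors: nodes of M
    weights = np.asarray(weights, dtype=float)        # weight of each node of M
    # Jump length with P(R > r) = 1/r for r >= 1 (inverse-CDF sampling).
    r = 1.0 / (1.0 - rng.random(n))
    # Jump direction drawn from the discrete mixing measure.
    idx = rng.choice(len(weights), size=n, p=weights / weights.sum())
    theta = directions[idx]
    # R**H for diagonal H: axis k is stretched by R**(1/alpha_k).
    scale = r[:, None] ** (1.0 / alphas[None, :])
    return scale * theta

# Hypothetical three-node mixing measure in 2-d (illustrative values only).
rng = np.random.default_rng(0)
nodes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
jumps = sample_jumps(20000, [1.1, 1.5], nodes, [0.6, 0.3, 0.1], rng)
```

With this construction the jump component along axis k has a power-law tail of index $\alpha_k$, which is what produces direction-dependent spreading rates.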
In the second step, the individual waiting time $t_{im,i}$ of each particle at time step i is assigned randomly as a Lévy-stable noise scaled by the capacity coefficient β
so the clock time is the sum over all steps, $t = \sum_i (t_{m,i} + t_{im,i})$. This has a form similar to the first term on the right hand side of (3), and represents solute dispersion in time. At any observation time, a particle is either in motion or trapped, so we can directly distinguish between the mobile and immobile concentrations.
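The waiting-time step can be illustrated with a short Python sketch. The assumptions here are ours, not the paper's: the stable noise is taken to be a standard one-sided stable variable S of index γ (Laplace transform exp(−s^γ)), generated with Kanter's formula, and the waiting time attached to a motion time $t_m$ is taken to be $(\beta t_m)^{1/\gamma} S$. The function names and the mean motion time are hypothetical.

```python
import numpy as np

def standard_stable_subordinator(gamma, size, rng):
    """Kanter's formula: S > 0 with E[exp(-s*S)] = exp(-s**gamma), 0 < gamma < 1."""
    u = rng.uniform(0.0, np.pi, size)
    e = rng.exponential(1.0, size)
    return (np.sin(gamma * u) / np.sin(u) ** (1.0 / gamma)
            * (np.sin((1.0 - gamma) * u) / e) ** ((1.0 - gamma) / gamma))

def immobile_times(t_mobile, beta, gamma, rng):
    """Waiting time per step, assuming the scaling (beta * t_m)**(1/gamma) * S."""
    t_mobile = np.asarray(t_mobile, dtype=float)
    s = standard_stable_subordinator(gamma, t_mobile.shape, rng)
    return (beta * t_mobile) ** (1.0 / gamma) * s

rng = np.random.default_rng(1)
t_m = rng.exponential(0.5, 1000)            # exponential motion times (assumed mean)
t_im = immobile_times(t_m, beta=0.05, gamma=0.35, rng=rng)
clock_time = np.cumsum(t_m + t_im)          # clock time after each step
```

Because the waiting times have infinite mean for γ < 1, a single long trapping event can dominate a particle's clock time, which is the mechanism behind the heavy trailing edge.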
Exact verification of the RW solution is impossible due to the lack of any other solution. We have checked the RW using numerous simple cases where other numerical or analytic solutions can be found. Two examples are shown in Figure 1. For a 1-d case with Gaussian jumps and constant v and D (which can also be solved numerically by the fast Fourier transform (FFT) method [Schumer et al., 2003]), the RW approximation of (1) matches the FFT solution (Figure 1a). The trailing edge due to trapping is evident compared to the Gaussian solution of the traditional ADE. Second, the RW solution of (1) for a 2-d case with weak mass partitioning closely approximates the FFT solution of the corresponding simplified fADE $\partial C/\partial t = -\mathbf{v}\cdot\nabla C + D\nabla_{M}^{H^{-1}} C$ (Figures 1b and 1c). The channeling of plumes along two preferential flowpaths defined by the mixing measure is apparent. A further example with γ = 0.6 is used (Figure 1d) to show the strong effect of mass partitioning on solute transport in 2-d (note that this problem cannot be solved by other methods).
3. Application to the MADE Plumes
 The natural-gradient tracer tests conducted at the Columbus Air Force Base in northeastern Mississippi, commonly known as the MADE test site, have sparked continued interest due to the strong influence of “high” subsurface heterogeneity on solute transport [Adams and Gelhar, 1992]. Anomalous dispersion was observed, including a heavy leading edge, a near-source peak, direction-dependent scaling rates [Meerschaert et al., 2001], local variation of transport speed [Lu et al., 2002] and loss of mobile mass [Adams and Gelhar, 1992]. The positively skewed plumes cannot be captured by the classical ADE with a coarse scale flow field [Adams and Gelhar, 1992]. Others have attempted to simulate the tracer plumes using many different methods in the last decade [see Molz et al., 2006]. The commonly accepted conclusion is that the classical ADE with a decimeter-scale flow field [Zheng and Gorelick, 2003], or the dual-domain approach with an unknown resolution of flow field [Molz et al., 2006], may capture the high-dimensional, anomalous plumes at the MADE site. The practicality of the first method is questionable due to the huge amount of aquifer information and computational power required. The tradeoff between the dual-domain approach and the discretization of the flow field remains an open question, although Schumer et al.  show that a single-rate, dual-domain model is inadequate.
Here we apply the nonlocal model (1) and its RW solution to simulate the MADE-2 tritium plumes. We start with a 2-d model because the plume shows little or no growth in the vertical direction and calibration is easier than in the 3-d case. Four “snapshots” of the plume were measured at 27, 132, 224, and 328 days after injection. The mass along each (multinode) observation well is integrated vertically and then kriged to obtain a continuous, 2-d horizontal concentration map for each snapshot.
Model parameters are either estimated or calibrated. First, the multiscaling rates were estimated along the longitudinal and lateral directions as 1.1 and 1.5, respectively, using the measured plume centered second moments [Meerschaert et al., 2001]. Second, following the 1-d model built by Schumer et al.  we estimate γ = 0.35 and β = 0.05 days$^{-0.65}$ by fitting the measured $^{3}$H mobile mass (Figure 2a). These values are similar to those obtained by Schumer et al.  for a previous bromide plume. The remaining parameters, including M(dθ), v, and D, can be calibrated. The mixing measure can be calibrated using the observed concentration profiles, or much more simply, it can be estimated visually. We build a space-dependent, three-node mixing measure (Figure 2b), based on the general appearance of the observed plume and preliminary calibrations. For simplicity, we assume constant v, limited to the range estimated by Benson et al. . All four snapshots are simulated, where particles are released at the injection wells (representing the injection of tritiated water), and then move according to (2) and are trapped according to (4). All model boundaries are open to solute motion. Here the initial and boundary conditions can be handled conveniently by the random walk approach, in a manner similar to that used for the boundaries of the ADE [see also Zhang et al., 2006]. However, for other types of conditions, such as a prescribed (fractional-order) flux or a reflective boundary, a random walk scheme different from that of the ADE is needed. Two calibration targets are used: ln(C) at all observation wells (Figure 3) and the positively skewed, fan-shaped appearance of the plume.
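One simple way to obtain first estimates of γ and β from mobile-mass data is a log-log regression. The sketch below is not necessarily the procedure used for Figure 2a. It assumes that the mobile mass fraction has the Laplace-domain form 1/(s + βs^γ), so that at late time it declines as $t^{\gamma-1}/[\beta\,\Gamma(\gamma)]$. The function name is hypothetical, and the mass fractions are synthetic values generated from that asymptote at three of the snapshot times, not measurements.

```python
import numpy as np
from math import gamma as gamma_fn

def fit_gamma_beta(times, mobile_fraction):
    """Fit ln(m) = (gamma - 1) * ln(t) - ln(beta * Gamma(gamma)) by least squares."""
    slope, intercept = np.polyfit(np.log(times), np.log(mobile_fraction), 1)
    g = slope + 1.0
    beta = np.exp(-intercept) / gamma_fn(g)
    return g, beta

# Synthetic check: build mass fractions from assumed parameters, then recover them.
t_days = np.array([132.0, 224.0, 328.0])
m_synthetic = t_days ** (0.35 - 1.0) / (0.05 * gamma_fn(0.35))
g_fit, beta_fit = fit_gamma_beta(t_days, m_synthetic)
```

With real data the early-time points would need to be excluded, since the power-law form holds only asymptotically; note also that β carries units of days$^{\gamma-1}$.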
 The simulated concentration at day 27 has a much heavier leading edge than the actual one (not shown here). This discrepancy is discussed by Benson et al. . The initial tracer injection may have significantly altered the ambient flow direction and led to more radial flow. The simulated concentrations at the following three snapshots capture the main behavior of measured plumes (with the second and the last snapshots shown in Figure 3). Compared with the model of Lu et al. , the improvement of model fitting is significant.
The simulated mass recovery is similar to the measured values, with over-recovery at the beginning and under-recovery at the end due to the mass partitioning to relatively immobile (fine-grained) facies. Also note that the calibrated v ($v_x$ = 0.30 m/day, $v_y$ = 0) and D ($D_x$ = 0.30 + 0.00143x m$^{1.1}$/day, $D_y$ = 0.30 m$^{1.5}$/day) are consistent with the analysis of the hydraulic conductivity (K) statistics by Benson et al. . The increase of D is also consistent with the analysis of the depositional history and the measured K data that suggest that the central portion of the test site may have different deposits with higher permeability compared to the source area [Rehfeldt et al., 1992].
 The spatiotemporal nonlocal, multiscaling fADE model (1) is a tool to characterize and predict realistic plumes, without the burden of explicitly characterizing the small-scale heterogeneity. Simple random walk algorithms can be developed to approximate (1) with variable transport parameters and mixing measures, enhancing the flexibility and computational efficiency of the model.
The application of the model (1) may be difficult in a typical environmental project due to the lack of detailed K data and/or measurements of the plume evolution. The applicability of the nonlocal transport model can be improved greatly if one can predict the main parameters (especially the space and time nonlocal parameters) given certain heterogeneity information. The space scale index α can be estimated given the K statistics [Benson et al., 2001], where both high variance and long-range dependence of K need to be present. A preliminary test also shows that the mixing measure for the MADE plumes may be predicted by assuming an ancient braided river system: first calculate the proportion of stream segments, and then flow, in each direction for a superposition of multiple sine waves. The mixing measure depends only on the ratio of paleochannel wavelength (λ) to amplitude (A). The average of best fit mixing measures at the MADE site is approximately λ/A = 10–20 (Figure 2c). In addition, the time scale index γ can be related to the distribution of low-K deposits [Zhang et al., 2007]. Future work is needed to test the predictability of all parameters.
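The first part of the paleochannel calculation, the proportion of stream segments in each direction, can be sketched for a single sine wave. This is a simplified illustration under our own assumptions: the channel centerline is y = A sin(2πx/λ), segments are weighted by arc length, the direction bins are 10° wide, and the function name is hypothetical. The superposition of multiple sine waves and the conversion from segment proportions to flow are not attempted here.

```python
import numpy as np

def sine_channel_direction_pdf(wavelength_to_amplitude, n_bins=18, n_pts=100000):
    """Arc-length-weighted distribution of segment directions for one sine wave."""
    amp = 1.0 / float(wavelength_to_amplitude)          # amplitude with lambda = 1
    x = np.linspace(0.0, 1.0, n_pts, endpoint=False)    # one full wavelength
    slope = 2.0 * np.pi * amp * np.cos(2.0 * np.pi * x) # dy/dx along the channel
    angle = np.arctan(slope)                            # direction relative to x-axis
    seg_len = np.sqrt(1.0 + slope ** 2)                 # arc length per unit x
    hist, edges = np.histogram(angle, bins=n_bins,
                               range=(-np.pi / 2, np.pi / 2), weights=seg_len)
    return edges, hist / hist.sum()

edges, weights = sine_channel_direction_pdf(10.0)
```

In this sketch the result depends only on λ/A, since the largest deviation from the mean direction is arctan(2πA/λ); smaller ratios spread the weights over a wider fan of directions.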
This work was supported by the National Science Foundation under DMS–0539176, EAR–0748953, EAR–0749035, and the Department of Energy under DE-FG02-07ER15841. We also thank the workshop “Stochastic Transport and Emergent Scaling on Earth's Surface” sponsored by the National Center for Earth-surface Dynamics (NCED) under EAR-0210914 and the University of Illinois under EAR-0636043. This paper does not necessarily reflect the views of the NSF, DOE, or NCED.