How Fluvial Deposits Distort Aquifer Recharge Estimated From Groundwater Age: Insights From a Landscape Evolution Model

Sustainable management of groundwater relies on robust estimates of aquifer fluxes. As such, recharge can be estimated from groundwater age using analytical models assuming a homogeneous aquifer. We propose to assess how this assumption affects recharge estimates in fluvial deposits. First, we use CHILD, a landscape evolution model, to generate plausible deposits after tens of thousands of years of river evolution. Then, we use the fractional packing model to compute porosity and permeability of unconsolidated sediments assuming a mixture of a coarse and a fine fraction. Finally, we use PFLOTRAN to simulate groundwater flow and mean age over two thousand years of spatially uniform recharge. This reproduces the boundary conditions underlying Vogel's analytical model to estimate recharge rates. Our results are based on 20,000 realizations from varying inputs: grain sizes, aggradation rate of the river, porosity, recharge, etc. For most models, the mean estimate of the recharge rate remains below an absolute error of 25%. But fluvial heterogeneities can lead to local errors ranging from close to zero to several thousand percent, which means that estimates may vary drastically depending on sampling locations. A sensitivity analysis using the δ‐importance measure and CUSUNORO curves exposes how fluvial processes influence the range and distribution of the error. For instance, a high bank erodibility and small coarse grain size under low aggradation rates favor the reworking of deposits, which increases heterogeneity and leads to high errors. This work paves the way for a more reliable estimation of recharge by considering the sedimentary context.


of 23
Age is an abstract concept in the case of water, and three main definitions can be formulated (McCallum et al., 2015;Suckow, 2014). The idealized or advective age (a) is the time needed by a molecule of water to reach the sampling location from the recharge zone. While this definition is convenient to estimate fluxes, advective age is unmeasurable: samples often include water from different flow paths moving at different speeds that mix through diffusion and dispersion. Thus, a water sample does not have an age, but a distribution of ages. The mean age (b) uses the average of that distribution as age of the sample. An environmental tracer that follows the exact same behavior as water, that is, same response to transport, mixing, and chemical reactions, would be an ideal tracer. But this is rarely the case. The tracer age (c) is the age returned by an environmental tracer, and its average over a water sample corresponds to the mean age only in the case of an ideal tracer.
Age and recharge are linked through analytical models corresponding to specific aquifer geometries and properties: for instance, Vogel (1967)'s model focuses on unconfined aquifers and assumes homogeneous aquifer properties. The validity of this assumption can be questioned considering the prevalence of geological heterogeneities in the subsurface. Several studies have shown that sedimentary heterogeneities lead to the mixing of flow paths with different ages that perturbs the estimation of groundwater age (e.g., Broers, 2004;McCallum et al., 2014;Varni & Carrera, 1998;Weissmann et al., 2002). But little attention has been given to their impact on recharge rate estimates. Kozuskanich et al. (2014) study perturbations to Vogel's model by placing horizontal lenses in a two-dimensional aquifer. While this framework lays out the impact of heterogeneities, it ignores the strong effect of considering three dimensions on flow and reactive transport (Burnett & Frind, 1987;Steward, 1998) and negates the non-random and diverse distribution of heterogeneities. Such simplified approach restrains our view of the relationship between the sedimentary context and the recharge rate estimates.
Similarly to Kozuskanich et al. (2014), we aim at assessing the impact of heterogeneous sedimentary deposits on the reliability of recharge estimates from Vogel's model. But we propose to explore the opposite approach: a three-dimensional framework (Section 2) based on a landscape evolution model to simulate plausible fluvial heterogeneities, thus uncovering the sedimentary context of the aquifer, and extensive simulations to unravel the relationship between the sedimentary context and errors in recharge rate estimates based on a sensitivity analysis using 20 ,000 realizations (Section 3 and 4). To preserve the tractability of the study and keep the focus on the impact of sedimentary heterogeneities, we limit ourselves to the following setting: • Unconsolidated deposits from single-threaded, sand-bedded meandering rivers at the scale of a single channel belt, because of their sharp heterogeneities even over short distances (e.g., Bridge, 2003;Jordan & Pryor, 1992;Miall, 2014) and their importance as aquifer systems (e.g., Galloway & Sharp, 1998;Magee, 2009;Sharp, 1988). • Hypothetical ideal tracer to compute a mean age rather than a tracer age, which would introduce further bias into the estimation of recharge. Since only an age value is required in Vogel's model, we do not compute higher-order moments of the age distribution. • Unconfined aquifer at steady state with a flat topography and a spatially uniform recharge to stay as close as possible to Vogel's aquifer conceptualization. As such, no other boundary condition, for instance a river, is included. Even so, this hydrogeological setting mirrors existing settings around paleochannels (e.g., Magee, 2009;Samadder et al., 2011;Sinha et al., 2013).

Methods
Our framework to simulate unconfined aquifers in unconsolidated fluvial deposits follows three steps: (a) simulating the deposits using CHILD, a landscape evolution model ; (b) simulating the hydraulic properties using the fractional packing model (Koltermann & Gorelick, 1995); (c) simulating groundwater flow and mean age using PFLOTRAN (Hammond et al., 2011). Two extra steps aim at estimating the recharge rate using Vogel's model (Vogel, 1967) and at understanding how fluvial heterogeneities influence the absolute error in the recharge estimates using a sensitivity analysis.

Simulating River Evolution and Deposits
While fluvial deposits can seem randomly distributed in the subsurface, they portray the lateral migration and vertical aggradation of the river that shaped them through erosion and deposition. Reproducing that evolution emerges as an obvious strategy to reproduce plausible distributions of fluvial deposits, but simulating all the 3 of 23 physical processes behind it quickly becomes intractable, especially over geological timescales. Landscape evolution models such as the Channel-Hillslope Integrated Landscape Development Model, more widely known as CHILD Tucker, Lancaster, Gasparini, Bras, & Rybarczyk, 2001), exploit empirical and simplified physical laws to reproduce efficiently the wide range of processes driving landscape evolution.
CHILD uses a triangular irregular network of nodes to represent the landscape surface. Such irregular scheme avoids some potential biases of regular grids (Braun & Sambridge, 1997) and allows horizontal displacements such as lateral migration. To limit complexity and interpolation biases, deposits are recorded on a curvilinear grid whose horizontal coordinates are fixed and regular . Our setting simulates the section of a river starting from a 12 by 8.4 km, flat domain with a slope of 3 × 10 −4 m/m. The river inlet stands in the middle of the upper side of the domain, at an elevation of 3.6 m, while the outlet stands in the middle of the opposite side. Flow coming out of the inlet corresponds to a fixed drainage area of 2,000 km 2 and varies depending on rainfall. The stratigraphic grid that records the deposits is only defined over a 10 by 6.2 km sub-domain to avoid border effects ( Figure 1). We focus on four of CHILD's processes to vary the fluvial deposits: stochastic rainfall, aggradation, lateral migration or meandering, and overbank deposition.
Instead of simulating continuous rainfall, CHILD includes a stochastic process based on the Poisson pulse rainfall model (Eagleson, 1978;Tucker & Bras, 2000). In this model, rainfall becomes a series of discrete storm events of constant rainfall intensity separated by inter-storm events without rainfall. Storm duration, storm rainfall intensity, and inter-storm duration follow exponential probability functions with their mean value as parameter. Those mean values can vary in time, introducing larger trends in the climatic variability ( Figure 1). Inlet flow and runoff resulting from rainfall over the domain are routed downslope using the steepest-descent method.
In CHILD, vertical aggradation is driven by the river's elevation, which is implemented as a boundary condition: inlet and outlet elevation can vary up and down through time to depict externally imposed aggradation and incision phases (Figure 1). This geometrical approach provides a more straightforward solution than attempting to include the processes driving aggradation . We randomly draw time series of aggradation rates so that the inlet reaches 16 m in less than 75,000 years of simulation to limit the computational burden. The inlet cannot go below 2 m, so that the river cannot incise the initial topography, which helps setting geologically consistent boundary conditions for the subsequent flow simulations.
Lateral migration of the river bends relies on the topographic steering mechanism (Dietrich & Smith, 1983), in which variations in river-bed topography due to point bars steer the flow toward the outer bank, thus modifying the bank shear stress and driving erosion (Lancaster, 1998): bank erodibility ∕ per unit stress ; bank shear stress ∕ 2 ; lateral unit vector (dimensionless) .
The bank shear stress depends on river bed sediments, river discharge, and river channel geometry (for more details see Lancaster, 1998). Channel migration ensues by assuming that the sediment supply is sufficient for deposition on the inner bank to preserve the channel width, a common approach in sedimentology to limit computational costs (e.g., Bradley & Tucker, 2013;Howard, 1996;Lopez et al., 2008). Erosion and deposition develop on an adaptive mesh by removing nodes after erosion along the outer bank and adding nodes after point bar deposition on the inner bank. While that model greatly simplifies meandering processes, it has been shown to develop realistic meandering patterns (Lancaster, 1998;Lancaster & Bras, 2002), which ultimately is what matters here.
During meandering, CHILD can handle two grain sizes of sediments: a coarse fraction deposited in point bars along the meanders, and a fine fraction deposited in the floodplain. Overbank deposition relies on the observation that the sedimentation rate in floodplains decreases exponentially with the distance from the river channel (Howard, 1992(Howard, , 1996Mackey & Bridge, 1995;Pizzuto, 1987). The model behind it, adapted from where sed vertical deposition rate of fine sediments at a floodplain node [ ∕ ] ; overbank deposition rate constant overbank distance-decay constant [ ] . The elevation of floodwater depends on discharge following Leopold and Maddock (1953)'s power-law hydraulic-geometry equations, which describe river geometry in general: at-a-station depth scaling exponent (dimensionless); Runoff, river geometry, and meandering are updated after each storm event, while overbank deposition only occurs in flood events when storm intensity is high enough that H w > H b .

Simulating Hydraulic Properties
CHILD's output is a three-dimensional curvilinear grid that contains the fraction of coarse sediments deposited on top of an initial regolith ( Figure 1). An initial step interpolates this fraction into a regular grid of cell size 50 × 50 × 0.5 m (Figure 2), which makes subsequent flow simulations faster and more stable, both appealing qualities when handling thousands of realizations. It also cuts the topography, which is deemed flat in Vogel's model, so our study can focus on the impact of fluvial heterogeneities.
Flow simulation requires hydraulic properties of porosity and permeability, which can be determined using the fractional packing model (Koltermann & Gorelick, 1995;Figure 2). This model applies to a binary mixture of unconsolidated sediments, precisely CHILD's output. It stems from the ideal packing model, which is based on the principle that mixture decreases porosity: 1. Starting from coarse grains only, adding fine grains fills the space between the coarse grains and decreases the porosity. 2. When the fraction of fine grains is equal to the porosity of the coarse fraction, fine grains completely fill the space between the coarse grains, leading to a porosity minimum. 3. After that threshold, the fine fraction dominates the packing, and as it increases the mixture becomes more homogeneous and the porosity increases.
However, this ideal model tends to underestimate porosity. The fractional packing model is an empirical adjustment to fit laboratory measurement of porosities at various confining pressures.
The fractional packing model also includes a model of permeability based on the Kozeny-Carman equation (Koltermann & Gorelick, 1995):   Figure 1 using the fractional packing model.

of 23
When the fraction of fine grains is less than the porosity of the coarse fraction, the representative grain diameter is the geometric mean grain diameter. When the fraction of fine grains is greater than the porosity of the coarse fraction, the representative grain diameter is the harmonic mean grain diameter.
The regolith can consist of former alluvium or saprolite whose hydraulic properties are the same as the fine sediments deposited by CHILD. It acts as a low permeability layer before the basement, which corresponds to the bottom of the regular grid and is impermeable.

Simulating Groundwater Flow and Mean Age
Our boundary and initial conditions mimic those of Vogel's model ( Figure 3): uniform recharge from the top, impervious bottom and sides except for the downstream side of CHILD's setting, which is the only possible exit for the flow. A simulation lasts 2,000 years, which should be long enough to reach steady state in all cases. Transient groundwater flow and age are simulated using PFLOTRAN, a parallel subsurface flow and reactive transport framework (Gardner et al., 2015;Hammond et al., 2011). PFLOTRAN's parallelism and scalability are invaluable when aiming for thousands of realizations. Groundwater flow is simulated using Richards' equation, which applies the mass conservation equation to a variably saturated, single-phase flow: 8 of 23 groundwater flux porosity, here from the fractional packing model (dimensionless); intrinsic permeability, here from the fractional packing model The relative permeability comes from the Mualem relative permeability function based on the van Genuchten saturation function.
Mean groundwater age is simulated using Goode (1996)'s equation, which applies the mass conservation equation to the concept of age mass: ϕ porosity, here from the fractional packing model (dimensionless); s saturation The age mass is the product of a water parcel's mass in a control volume with the mean age of that water. Unlike traditional approaches, which only model the effects of advection -so are limited to an advective age -this formulation also includes the effects of dispersion and mixing, without which groundwater age simulation is biased (Goode, 1996).

Estimating Recharge Rate
Vogel's model (Vogel, 1967) is an analytical model commonly used to estimate recharge rate from groundwater age (e.g., Gilmore et al., 2016;McMahon et al., 2011;Wood et al., 2015), which we apply to each cell of the simulation grid: In theory, this model only applies to unconfined homogeneous aquifers of constant thickness with uniform recharge, although in practice those assumptions are unlikely to be all met. It is also known as the exponential model, because the frequency distribution of groundwater age with respect to depth under this configuration is exponential.
The absolute error quantifies the offset between the apparent recharge estimated with Vogel's model and the actual recharge ( Figure 4): where absolute error (percent); apparent recharge rate actual recharge rate, here 25 mm/yr [ ∕ ] .
Thus, Equation 8 quantifies the impact of fluvial heterogeneities on the estimates.

Sensitivity Analysis
By simulating the processes behind fluvial heterogeneities, we can go beyond the statistical analysis of the estimation error and its extent, and begin a geological analysis of the link between heterogeneities and error. But the coupled model described in the previous sections is an intricate maze of processes and properties, making the relationship between inputs and output intractable through the model. A more straightforward approach, sensitivity analysis, evaluates how much each input contributes to the value of the output, giving us a glimpse at the internal machinery of the model. Following Borgonovo et al. (2017), we divide our analysis in three main purposes: parameter prioritization, interaction quantification, and trend identification.
Parameter prioritization aims at identifying the input that contributes the most to the variability of the output. Global sensitivity measures quantify those contributions by fixing an input to a given value, measuring the reduction in variability, and repeating the operation for different fixed values. The prevalent variance-based sensitivity analysis measures the contribution of a given input to the variance of the output (Sobol', 1993): where Sobol' index; output; input i from a set of n inputs; ∼ all the n inputs except i.

of 23
Behind this approach lies the assumption that the second central moment, that is, the variance, is enough to describe the variability of the output. But all the moments of a variable provide information about its distribution, so eliminating any of them inevitably leads to a loss of information. The δ moment-independent sensitivity analysis overcomes this issue by comparing areas below probability density functions instead of variances (Borgonovo, 2007;Plischke et al., 2013). It measures the shift in the output distribution due to a given input: where δ-importance measure; output; input i from a set of n inputs; separation between two probability density functions; f Y density of the output; Interaction quantification aims at determining whether inputs have additional effects beside their individual variations. Without interactions between the inputs, all the variations in the output come from the individual effects of the variations in the inputs. Thus, the sum of the global sensitivity measures can indicate the relevance of interactions between the inputs (Borgonovo, 2007;Saltelli et al., 2000): where Sobol' interaction index; δ-importance interaction measure; output; input i from a set of n inputs.
Both measures can be interpreted as the fraction of the output's variation that can be explained by interactions.
Finally, trend identification aims at determining if the output increases or decreases when an input increases. CUSUNORO curves form a compact visualization to determine the importance of each input and trends (Plischke, 2012, 2019). CUSUNORO stands for cumulative sums of normalized reordered output: where cumulative sum for output value k out of m values; output value; order permutation of input X i : mean of the output.
When the CUSUNORO curve is positive, the output increases when the input used for ordering increases. When the CUSUNORO curve is negative, the output decreases when the input used for ordering increases. This behavior highlights trends and non-linear relationships. And the higher the extrema of the curve, the stronger the impact of the input, giving another way to assess parameter prioritization.

Application and Results
Our results come from simulating 20,000 aquifers using CHILD, the fractional packing model, and PFLOTRAN as described in the previous section, and from a sensitivity analysis to better understand the impact of the sedimentary context on the recharge estimates.

Capturing the Sedimentary Context Through Simulation Inputs
Our simulation framework comes close to a hundred parameters (see Table S1 in Supporting Information S1 for the complete list of input parameters), and a realization can take tens of hours to run. Thus, including all the parameters in the sensitivity analysis is computationally infeasible. We focus instead on a handful of parameters ( Figure 5) that govern the fluvial processes and directly affect hydraulic properties and recharge.
All the inputs are sampled from uniform distributions that aim at exploring a parameter space as large as possible while enabling river meandering. Rainfall parameters come from estimates of the parameters for the Poisson pulse rainfall model over the continental United States (Hawk & Eagleson, 1992), which cover a large variety of climates and are often used in studies using CHILD (e.g., Tucker, 2004;Istanbulluoglu & Bras, 2005;Attal et al., 2008). While we keep storm and inter-storm durations constant, the mean storm rainfall randomly varies over time periods between 500 and 5,000 years with a correlation of 0.75 between two successive values to limit 13 of 23 abrupt variations. The fine grain diameter corresponds to silt on Wentworth (1922)'s scale (0.0039-0.0625 mm), and the coarse grain diameter to sand (0.0625-2 mm). The river inlet elevation varies over random time periods ranging from 500 to 5,000 years based on a uniform distribution of aggradation rates from −1 to 1 m/kyr, a negative value meaning incision, following the model of . As stated in Section 2.1, the river inlet elevation must reach 16 m before 75,000 years without going below 2 m, which leads to an average thickness for a channel belt in a meandering setting (Gibling, 2006). Three other parameters control the sedimentary output of CHILD: the bank erodability, the overbank deposition rate constant, and overbank distance constant, whose values derive from other studies using CHILD (e.g., Lancaster et al., 2003;. The fine and coarse grain porosities are defined relatively to the minimum porosity, and cover the large range of porosities in unconsolidated sediments. The recharge rate is based on a global data set of recharge estimates (Moeck et al., 2020), and is meant to capture the diversity of potential values while facilitating reaching steady state with a limited computational burden.
More than 20,000 realizations were necessary to accommodate for failed ones. PFLOTRAN's computation time was set to 10 hr at most, and about 0.76% of the realizations did not converge in time. Their input values show that failure tends to occur at low fine grain diameters or low aggradation rates with a high variability. CHILD ends up with a lower success rate, with 1.60% of the realizations crashing or failing to complete under 50 hr. It appears that low aggradation rates with a high variability favor failure, so does a high bank erodability. This relates to issues in CHILD's adaptive mesh, which has trouble to keep up when river migration induces too much change in the mesh. In both cases, the failure rates are low enough to prevent any bias in the input distributions, which remain uniform ( Figure 5), and they should have no impact on our findings.
Even though flow and age are simulated over 2,000 years, the mean groundwater age never reaches that age. It even stays below 1,000 years for the most part ( Figure S1 in Supporting Information S1). This suggests that most of the simulation domain reaches steady state in all the realizations.

Impact of Sedimentary Heterogeneities on Recharge Estimates
Our simulation framework ends with three-dimensional realizations of estimates of recharge rate from Vogel's model, from which we determine an absolute error from the actual recharge rate. Those realizations differ in the amount and extent of their heterogeneities through their porosity and permeability and in their recharge rate, and two distinct behaviors stand out.
The former floodplains simulated by CHILD are filled with homogeneous fine sediments, which lead to errors in estimated recharge rate below 25%. A discrepancy of a few millimeters per year usually exists, which can be attributed to several aspects: first, the behavior of Vogel's model along the impervious boundaries is unlikely to hold in practice; second, permeable deposits can perturb flow paths over several kilometers inside the floodplain (Figure 3), which favors mixing and drives the mean groundwater age further away from the advective age of Vogel's model; finally, small numerical errors should be expected from the spatial and temporal discretization of the simulation, which aims at finding a balance between speed and precision. The channel belts, however, are filled with a more or less heterogeneous mixture of coarse and fine sediments, which lead to apparent recharge rates from 1 mm/yr to more than 10,000 mm/yr in a few realizations ( Figure S1 in Supporting Information S1). This translates to asymmetric differences with the actual recharge rate: while the recharge rate tends to be underestimated, overestimates can reach far higher values ( Figure 6). All the realizations contains absolute errors above 25%, with errors above 100% being frequent, some realizations even displaying errors above 10,000%. When averaged over all the pixels of a realization, 96.585% of the estimates fall below a 25% error despite those remarkable extrema. Nevertheless, standard deviations are high, indicating a high variability of estimates in a realization. In the field, recharge estimates rely on sampling the tiny fraction of an aquifer accessible through well bores, so practitioners should anticipate widely different results depending on where they sample because of sedimentary heterogeneities.
A similar analysis on the absolute error leads to similar conclusions. The channel belts pull the mean errors above 25% and the standard deviations remain high, pointing at the variability caused by heterogeneities. Those two variables, error mean and error standard deviation, seem to increase with the channel belt (see Figure S2 in Supporting Information S1), suggesting that rivers with a more extensive lateral migration lead to more fluvial heterogeneities and higher errors. The fraction of a realization with an absolute error higher than 25% suggests a 14 of 23 similar link between fluvial processes and absolute error. A sensitivity analysis should provide a more complete and quantitative look at that relationship.

Relationship Between Sedimentary Context and Estimation Error
Sensitivity analysis methods require unidimensional inputs and outputs. We summarize time-series inputs from CHILD using a mean and standard deviation to get a sense of their central value and associated variability ( Figure 5). We do the same with the realizations, using the mean absolute error and the standard deviation of the absolute error ( Figure 6). In addition, we use the fraction of a realization with an error higher than 25% as a glimpse at the spatial extent of the error. To assess the soundness of the sensitivity analysis, we add an extra input unrelated to the outputs: a dummy variable randomly sampled between 0 and 1. Figure 6. Statistical summary of the difference between apparent and actual recharge rate and of the associated error for the 20,000 realizations. Cumulative distributions and summary statistics are given for each realization. The values for realization # 12,053 illustrate the shape of the cumulative distributions. Figure S1 in Supporting Information S1 shows the same summary about the mean groundwater age and the apparent recharge rate. Figure S2 in Supporting Information S1 depicts the relationship between error mean, error standard deviation, and error fraction and the spatial distribution of the errors in the realizations.
δ-importance measures, Sobol' indices, and CUSUNORO curves attribute similar relevance to the inputs, which underlines the robustness of the parameter prioritization (Figure 7 and Figure S4 in Supporting Information S1). Only two of the inputs selected for sensitivity analysis show no impact on the outputs: the overbank distance decay constant and the overbank deposition rate constant. This lack of impact of overbank deposition comes from the simplification of the depositional processes: a single grain size is deposited in the floodplains, which leads to homogeneous overbank deposits that fit best the assumptions of Vogel's model. This highlights the fundamental principle of what shapes the error: deviation from the exponential distribution with depth of groundwater age expected by Vogel's model.
At the lowest level, deviation from the ideal distribution of groundwater age depends on the deposition of heterogeneous sediments. This is made clear by the CUSUNORO curves of the bank erodibility and, to a lesser degree, of the mean storm rainfall. When those parameters increase, lateral migration increases, so the river can wander over a larger domain, widening the channel belt and increasing fluvial heterogeneities. This results in higher and more widespread errors in recharge estimates (see error mean and fraction in Figure 7). Reworking of existing deposits plays a major role in increasing the error, as shown by the aggradation rate (see error mean and fraction of its average and deviation in Figure 7): a higher rate speeds up vertical aggradation, leaving less time for the river to rework deposits and spread heterogeneities, while periods of lower aggradation or incision favor the reworking of previous deposits, increasing fluvial heterogeneities and the error. Part of the coarse grain diameter's behavior matches this interpretation: larger diameters decrease lateral migration, leading to lower, less widespread errors (see error fraction in Figure 7). Another part does not (see the non-linear behavior of the error mean in Figure 7), highlighting the dual role of the coarse grain diameter as input to CHILD and to the fractional packing model. At the same time, the CUSUNORO curves of the fine grain diameter do not show any behavior consistent with slowing down lateral migration. This suggests two things: first, fine grains have a smaller impact on fluvial processes than coarse grains, which is consistent with the non-linear relationship between grain diameter and lateral migration; second, the deposition of heterogeneous sediments alone is insufficient to perturb the distribution of groundwater age.
Indeed, groundwater age is contingent on flow behavior, which is first driven by the hydraulic properties of the deposits. The minimum porosity affects coarse and fine deposits alike, and, contrary to the coarse and fine porosity ranges, it shows little impact on the outputs (Figure 7). Thus, deviation from the ideal distribution of groundwater age also depends on the contrast in the hydraulic properties of the deposits. As shown by the CUSUNORO curves of the coarse and fine porosity ranges (Figure 7), this contrast affects the amplitude of the error in recharge estimates and its extent. This highlights the disparate impact of the hydraulic contrast in the deposits: on one side, a high contrast leads to areas of very low or very high ages at any depth, which leads to high errors locally; on the other side, a lower contrast means that more age transfer is possible between fine and coarse deposits, which leads to lower but more homogeneous and widespread errors ( Figures S5 and S6 in Supporting Information S1). While the end results look similar, porosity and permeability act differently on the distribution of groundwater age and the error. A higher porosity means slower fluxes through the deposits, and different porosity values between fine and coarse deposits mean different groundwater ages and higher errors. At the same time, a higher fine porosity range or lower coarse porosity range increases the ratio of fluid volume between fine and coarse deposits, which leads to more age transfer toward the coarse deposits (Bethke & Johnson, 2002) and higher errors (see their opposite curves for the error mean and fraction in Figure 7). Together with the grain diameters, the porosity ranges defines the permeability for the fine and coarse deposits. Variations in permeability mean that flow paths diverge from the horizontal and parallel paths assumed in Vogel's model, leading to variable flow velocities and higher errors.
Finally, the CUSUNORO curves of the recharge rate ( Figure 7) highlight a last level influencing the deviation from the ideal distribution of groundwater age: the overall flow velocity. Indeed, a higher recharge rate means faster fluxes, which lowers groundwater ages and lessens the impact of heterogeneities, so decreases the error. As shown by the CUSUNORO curve of the error mean, that effect is non-linear and dampens as the recharge rate increases.
In the end, fluvial processes are at the root of the contrasts in the hydraulic properties that shape the estimation error, but those contrasts can be amplified or attenuated by the properties of the deposits and the fluxes into the aquifers. Those intricate effects suggest significant interactions between the inputs, which is confirmed by both δ I and S I (Figure 7 and Figure S4 in Supporting Information S1), but to different degrees. δ I remains close to 16 of 23 Figure 7. Sensitivity analysis between some inputs to CHILD, the fractional packing model, and PFLOTRAN and three outputs summarizing 20 ,000 threedimensional realizations of absolute error in recharge estimation. The analysis is based on the δ-importance measures and the CUSUNORO curves. Figure S4 in Supporting Information S1 provides Sobol' indices. ECDF: Empirical Cumulative Distribution Function of inputs.
17 of 23 or below 0.35 for all the outputs, which suggests the absence of strong coupling or extreme non-linearities and support the robustness of the parameter prioritization (Liepmann & Stephanopoulos, 1985). S I remains above 0.35 for all the outputs, which suggests the opposite. Scatter-plots between inputs and outputs show no extreme non-linearities (see Figure S3 in Supporting Information S1). Moreover, Sobol' indices tend to perform poorly when the inputs are randomly drawn, which is even more true for interaction quantification because it requires many more realizations than parameter prioritization. But the sampling schemes that aimed at improving their reliability are unadapted to time-series inputs and failed realizations. Thus, the results from S I must be taken with caution.

Discussion
Our results highlight the clear impact of fluvial heterogeneities on recharge rate estimates, but also its non-randomness: the variability and distribution of the error can be explained by the sedimentary context.

Characterizing the Validity of Analytical Models
A valuable aspect of numerical modeling comes from the ease of creating what-if scenarios for exploratory analysis and hypothesis testing. This helps build insight on geological processes whose impact can far exceed human lifespan, which is usual in landscape evolution (Kirkby, 1996). Analytical models that deal with groundwater age rely on numerous assumptions that are unlikely to hold true in practice. Numerical scenarios to test those assumptions can give us valuable insights on the applicability of such models.
Vogel's model relies on the Dupuit-Forchheimer approximation and several other assumptions that have a tight control on its theoretical applicability: unconfined aquifer, uniform recharge, homogeneous aquifer properties, constant aquifer thickness, and horizontal extent larger than the thickness. We violate two of those assumptions to different degrees, and with different impacts. Fluvial systems develop on sloped surfaces, and usually the channel slope is lower than the valley slope. In our setting, it leads to slight differences of thickness along the aquifers in the fluvial deposits. But those differences seem to have a minimal impact on recharge rate estimates, as hinted by the lack of large errors correlated with the slope in homogeneous deposits (Figure 4 and Figures S2, S4, and S6 in Supporting Information S1). This suggests that Vogel's model can bear minor violations of its assumptions. Further testing would be required to more precisely quantify the impact of thickness variations.
Here, we have focused on the assumption of homogeneity by considering plausible fluvial heterogeneities. Although those heterogeneities differ in type and distribution with Kozuskanich et al. (2014), we reach similar findings when it comes to permeability contrasts increasing the error. In addition, our work suggests that circulation between less permeable and more permeable deposits plays a part in increasing the overall error. Due to the comprehensive sampling of potential heterogeneities, the mean absolute error is an order of magnitude higher than reported in Kozuskanich et al. (2014), and maximum errors are even two orders of magnitude higher. This finding has potential ramifications for recharge estimation, which usually relies on just a few samples, often collected from within the channel belt where error variability is the highest. It implies that estimates could vary considerably depending on sampling locations, and that an enormous number of sampling locations would be required to have any confidence in a mean recharge estimate.
Of course models are never reality, and our error estimates cannot be directly translated to real applications. But our work strengthens a wide body of literature emphasizing the importance of sedimentology in hydrogeology (e.g., de Marsily et al., 2005;Fogg, 1986;Huggenberger & Aigner, 1999), and is consistent with observations from the field. For instance, Stimson et al. (2001) have shown how groundwater flow and age are controlled by sedimentary heterogeneities in an alluvial fan of Cochabamba Valley, Bolivia: off-axis fine deposits lead to longer residence time than the coarser deposits along the fan's axis, which also leads to changes in water chemistry. Suckow et al. (2020) have shown how a homogeneous model is inappropriate to estimate recharge in the Hutton Sandstone, a confined aquifer of the Surat Basin, Australia: the Hutton Sandstone was deposited by fluvial systems feeding deltas to a lake, forming permeable channels in a matrix of less permeable floodplain and lacustrine deposits, which requires a dual porosity model. More generally, uncertainty is often left out of recharge estimation, but comparison of different estimation methods (Coes et al., 2007;McMahon et al., 2011) and attempts at quantifying uncertainties (Crosbie et al., 2010(Crosbie et al., , 2018 show a variability greater than an order of 18 of 23 magnitude. Although other factors can explain those variations, subsurface heterogeneities have been shown to exert a strong hold on recharge (Fu et al., 2019;Hartmann et al., 2017).
That being said, we have shown that not all settings are equal, even when focusing on a single geological context: deposits from a meandering, alluvial river. Thus, understanding the sedimentary context can help assess the confidence in the recharge rate estimates. Some parameters have a clear physical meaning and a clear impact, which includes the grain diameters and their control over the variability and spatial distribution of the permeability. Other parameters are more abstract. The bank erodibility is one of those, because in practice it includes the actual erodibility of the deposits and the effect of other components such as vegetation (Lancaster, 1998). Such parameters are hard to measure, and an analysis of the sedimentary context should focus on migration and aggradation rather than on those parameters. Heterogeneities increase the absolute error, so the main question should be: did the fluvial system deposit and preserve heterogeneities?

Predicting Recharge From Groundwater Age
The strength of analytical models comes from their conceptual simplicity and their ease of use: Vogel's model only requires four parameters, three of them forming the basis of any hydrological study. But this simplicity comes at a cost too easily and too often neglected: the validity of Vogel's model leans on five assumptions about the hydraulic and geological context of the aquifer. Understanding the local geological context is key to assess to what extent the assumptions behind analytical models are met. It should also help target the most homogeneous areas of an aquifer for groundwater sampling, which could improve the confidence in the recharge rate estimates. Geomodeling can help fill the gap between the data to get a better view at the three-dimensional architecture of the heterogeneities and at the related uncertainty. Aside from assessing the tolerance to breaking models' assumptions, studies such as this one could help characterize the distribution of groundwater age depending on the spatial distribution of heterogeneities. The distribution of groundwater age is the main input to the lumped parameter models used to determine groundwater age from environmental tracer concentrations (Małoszewski & Zuber, 1982), but usually the age distribution is based on simplified aquifer geometries similar to that of Vogel's model.
What lays behind Goode (1996)'s work is that estimating recharge using groundwater age is not a flow problem, it is a transport problem. This explains the influence of heterogeneities, which must be taken into account, as in any transport problem (de Marsily et al., 2005). From that perspective, several authors have advocated against using groundwater age, and in favor of using environmental tracer concentrations in numerical flow and transport simulations instead (e.g., Kozuskanich et al., 2014;Suckow, 2014;Turnadge & Smerdon, 2014). The increase in computing power and software efficiency certainly makes this approach more and more practical (Gardner et al., 2015). But the modeling burden is a radical opposite to analytical models (Sanford, 2011). Hence the importance of analyzing the geological context: if the assumptions of analytical models are met, there is no need for further numerical modeling and, if they are not, the geological interpretations required for numerical modeling are already available.
Indeed, numerical modeling for reactive transport requires a three-dimensional conceptualization of the geological heterogeneities to build realizations of their geometries. The increase in computing power has also enabled the practical use of landscape evolution models and stratigraphic forward modeling (e.g., Bianchi et al., 2015;Huang et al., 2015;Koltermann & Gorelick, 1992). But those models are difficult to parameterize and condition to data, and they are usually applied to regional rather than local modeling. This means that applying CHILD in a real case study is difficult. Fortunately, geostatistical methods offer an intermediate solution between equivalent homogeneous properties and stratigraphic forward modeling. Weissmann et al. (2002) have used Gaussian simulations in the context of groundwater age, but Gaussian simulations have long been criticized for their lack of geological realism and the ambiguity of their inputs (e.g., Gómez-Hernández & Wen, 1998;Journel & Zhang, 2006;Zinn & Harvey, 2003), even in the groundwater age realm (McCallum et al., 2014). The more recent multiple-point simulations better reproduce geological heterogeneities while conditioning various types of data as well (e.g., Mariethoz & Caers, 2014;Tahmasebi & Sahimi, 2016;Zahner et al., 2016). Models like CHILD could provide the training images required by such techniques and help infuse sedimentary knowledge into transport simulations.

Using an Integrated Approach to Explore Groundwater Systems
This work complements Kozuskanich et al. (2014) by analyzing extensive simulations from an integrated framework reproducing the geological processes behind heterogeneities in three dimensions. Taking into account those processes guarantees that the spatial distribution of the heterogeneities results from a coherent geological scenario rather than statistical randomness, which affects groundwater flow (e.g., Tetzlaff & Schafmeister, 2007;Webb & Anderson, 1996;Xu et al., 2021). Without data to fit, there is no boundary to the number and types of processes that can be integrated in such framework, except for the computational burden of having to generate thousands of realizations over geological timescales.
From this perspective, our setting remains quite simplistic. For the sake of speed and stability, we do not fully leverage the advantages of having a lateral migration model integrated in a landscape evolution model, for instance by accounting for losses of water between rainfall and runoff , by starting from a non-flat topography such as a valley (Lancaster, 1998), or by simulating the processes driving aggradation and incision using a less geometrical approach (Lancaster, 1998;. CHILD itself could be improved to better represent fluvial processes and their impact, for instance by adding more than two grain diameters, which should change flow behavior in the floodplains. Currently, the floodplains are filled with fine sediments only, but in reality they would be more heterogeneous, with some of the coarser grains being deposited there as well. Another important process ignored here is avulsion, after which rivers change course entirely. This as well should change the floodplains, and would allow us to to simulate thicker aquifers. Alternatives to CHILD that include some of those aspects could also be explored (e.g., Lopez et al., 2008;Pyrcz et al., 2009;. We could also go one step further in the hydrogeological setting. For instance, the porosity of the different grain types could be based on variations of grain shape rather than being randomly drawn. And now that the impact of sedimentary heterogeneities is established, other hypotheses of Vogel's model could be relaxed as well. On top of starting from non-flat topography in CHILD, we could preserve the topography before simulating groundwater flow and age. We could introduce other heterogeneities, such as heterogeneous recharge rates due to spatially or temporally varying rainfall, soil properties, or land cover. We could estimate groundwater age from tracer concentrations instead of simulating it, which would add another layer of bias (e.g., Weissmann et al., 2002;Larocque et al., 2009;McCallum et al., 2014McCallum et al., , 2015. And, in reality, sediment deposition and subsurface flow occur in parallel rather than sequentially, so coupling all the simulation steps would bring us closer to the ideal setting. But such capabilities remain in their infancy (Partington et al., 2017).
But one must remain careful not to switch from complex to complicated. More processes represented with more details mean more parameters, including some that can blur the interpretability of the analysis. This is true when using a few demonstrative realizations, because those have to be defined first, and when using exhaustive simulations, because three-dimensional outputs are hard to analyze, and simplifying them only shows part of the story. As a result, trends in the relationship between inputs and outputs are hard to identify from scatterplots alone (see Figure S3 in Supporting Information S1). From this perspective, the CUSUNORO curves tell a story consistent with the sedimentary context and the fluvial processes involved, and with the observations of Kozuskanich et al. (2014). They constitute a powerful visualization technique that provide more insight than the usual parameter prioritization, although a deeper, more tailored analysis should lead to a better understanding of the impact of some parameters-especially the grain diameters and the porosity ranges-on groundwater age and estimation error. While interactions between inputs appear obvious, they remain hard to quantify using a sensitivity analysis, because they require many realizations or specific input sampling schemes. As such, software stability and efficiency are becoming essential in hydrogeology, and more generally in geoscience.

Conclusions
Hydrogeological modeling is often reductionist in the sense that it considers the physics of the processes occurring today, mainly groundwater flow and reactive transport, and neglects the geological processes that built up aquifers. But what is happening today depends on what happened yesterday, forming a flow of causal relationships between all those processes. By simulating the fluvial processes that lead to an unconfined aquifer, we have exposed considerable local errors when estimating recharge using groundwater age and Vogel's model. Those errors depend on the contrast in the hydraulic properties of the sediments. High river migration and low 20 of 23 aggradation lead to more heterogeneous deposits, which favor such contrast. Porosity and permeability amplify or attenuate the impact of heterogeneities, mainly through their contrast between different sediment fractions rather than through their absolute values. Non-linear effects and interactions seem to dominate their impact on groundwater age, whose behavior, as highlighted in other studies, differs from that of groundwater flow. Recharge influences the impact of that contrast in hydraulic properties, and errors become less pronounced for high recharge. This suggests that Vogel's model should be carefully used in settings of high river migration, low aggradation, high porosity or permeability contrast, or low recharge, and that numerical flow and transport simulations should be favored when possible.
In such settings, taking into account the heterogeneities appears essential to have any confidence in the recharge estimates. Simulating them from a physical perspective over geological timescales remains challenging, but individual models exist and we already have the computing resources to aim for a more holistic approach. This requires more multidisciplinary research to capitalize on the knowledge of other fields. One is computer science, to make better use of high-performance computing and GPUs. The other is sedimentology, to better reproduce sedimentary processes and assess their impact on hydrogeological processes, and to better understand the sedimentary context of aquifers, which could help assess the confidence in our estimates by better exploiting existing data. Other contexts could also be explored to expand this work, for instance confined aquifers, but this only gives us a qualitative insight at the credibility of analytical models. The next question should be how to actually improve the estimates, either by adapting this framework and applying it to real cases instead of what-if scenarios, or by developing new analytical or statistical models to estimate recharge rates.