Toward catchment hydro-biogeochemical theories

Department of Civil and Environmental Engineering, The Pennsylvania State University, University Park, Pennsylvania College of Earth, Ocean, and Atmospheric Science, Oregon State University, Corvallis, Oregon Laboratory of Ecohydrology ENAC/IIE/ECHO, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland Center for Applied Geosciences, University of Tübingen, Tübingen, Germany Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden Earth and Environmental Systems Institute, Department of Geosciences, The Pennsylvania State University, University Park, Pennsylvania Department of Environmental Systems Science, ETH Zürich, Zürich, Switzerland Department of Geography, University of Zurich, Zurich, Switzerland

. Urbanization, agriculture, droughts, and disturbance (fire, insects, and disease), can significantly alter subsurface structures and therefore exert far-reaching influence on catchment functioning in the short term and catchment evolution in the long term. Here, we present examples that illustrate these connections but are not intended to be exhaustive.
Example 1: Land use changes may shortcut water and solute export from land to aquatic systems Agricultural crops now constitute the largest biome on Earth, occupying one third of the global ice-free land area (Ellis & Ramankutty, 2008;Ramankutty et al., 2018;Ramankutty, Evan, Monfreda, & Foley, 2008). Sprawling agricultural lands imply the replacement of deeper-rooted trees by shallower-rooted crops (Fan, Miguez-Macho, Jobbágy, Jackson, & Otero-Casal, 2017;Jackson et al., 1996) and tile drains and pipes in the shallow subsurface (Blann, Anderson, Sands, & Vondracek, 2009;King et al., 2015). Irrigation and flow regulation can also have unexpected impacts on local flow paths and global water footprint (Jaramillo & Destouni, 2015). These modifications may reduce vertical connectivity and recharge to aquifers, resulting in shallower flow pathways and rapid shunting of water to streams Smith et al., 2018). The consequences of this "short-circuiting" of subsurface structures remain poorly characterized and understood ( Figure 2). For example, structural alteration modifies the delivery of reactive solutes such as nitrate and dissolved oxygen and therefore changes redox conditions. Shallow soils, with the presence of O 2 , typically have limited denitrification (conversion of nitrate to gases such as N 2 ), whereas denitrification can occur faster and to a larger extent in deep subsurface zones with limited O 2 (Kolbe et al., 2019a;Tesoriero, Terziotti, & Abrams, 2015). Thus, one may infer that short-circuiting flow paths via shallower soils may reduce the extent of nitrate removal via denitrification and contribute to eutrophication problems. Urban watersheds, on the other hand, represent a different type of human perturbation. They are often characterized by impervious surfaces that facilitate surface runoff and sewer pipes F I G U R E 1 A conceptual diagram of a headwater catchment as a natural integrator of hydrological flow paths and biogeochemical processes. The flow paths depend on external hydroclimatic forcings and internal catchment structures above ground (e.g., vegetation, topography) and below ground (e.g., distribution of porosity, permeability). Reactive materials are illustrated here using organic carbon (OC, red) that decreases in abundance with depth and dissolving minerals that increase with depth. These physical and chemical structures also dictate the contact time of water with reactive materials, regulating both rates of biogeochemical transformations and concentrations of reaction products (e.g., DOC, Ca, Si) that enter the stream that enhance rapid subsurface flow (Grimm et al., 2008). These unique, rapid flow characteristics can similarly channel nutrients and carbon quickly to urban streams (Grimm et al., 2008;Kaye, Groffman, Grimm, Baker, & Pouyat, 2006).

Example 2: Deeper roots developed in warming climates may deepen flow paths
While land-use changes may reduce vertical connectivity and short-circuit water and solute fluxes, changing climate can have multi-faceted, possibly diverging effects on hydrological and biogeochemical processes. Warming-related disturbance, including fire, insects, and disease, could change root density and rooting depth. For example, fires may increase or decrease root biomass of tall grasses, depending on fire frequency (Johnson & Matchett, 2001). Woody encroachment into grasslands and salt marshes has become a global phenomenon as a result of changing climate (Saintilan, Wilson, Rogers, Rajkaran, & Krauss, 2014;Schreiner-McGraw et al., 2020;Veach, Dodds, & Skibbe, 2014). Because roots of woody plants penetrate deeper than those of grasses, they can generate deeper macropores (Beven & Germann, 2013). Similarly, prolonged and severe droughts in a warming climate may trigger the deepening of roots that tap groundwater at depth (Brunner et al., 2015;Canadell et al., 1996;Fan et al., 2017;Jackson et al., 1996), potentially enhancing recharge to the deeper groundwater (Barnhart et al., 2016;Berghuijs, Woods, & Hrachowitz, 2014;Carroll, Deems, Niswonger, Schumer, & Williams, 2019;Godsey, Kirchner, & Tague, 2014;Hartmann, Gleeson, Wada, & Wagener, 2017;Van Loon et al., 2016). In addition, this enhanced groundwater recharge, enriched in DOC and CO 2 (g) from the shallow subsurface, can promote rock weathering by introducing acidity to deeper, more reactive parent rocks (Andrews & Schlesinger, 2001;Mottershead, Baily, Collier, & Inkpen, 2003;Wen, Sullivan, Macpherson, & Li, 2020). Deeper penetration of nitrogen can also stimulate mineral oxidation at depths that are often O 2 -depleted and nutrient-limited (Martin, 2017;Perrin, Probst, & Probst, 2008). At the Konza tallgrass prairie, Kansas (USA), woody encroachment has persisted for decades (Veach F I G U R E 2 Deepening and shallowing of roots and flow paths may develop under changing climate and human domination in the Anthropocene. (a) Deepening flow paths may develop during droughts and woody encroachments. They can enhance the deeper penetration of water and reactive solutes (e.g., O 2 , CO 2 , NO 3 ), accelerate rock weathering at depth, and produce cations and dissolved inorganic carbon (DIC). (b) Shallowing flow paths can expand as a result of impervious surfaces and pipes in urban areas and shallow-rooted crops and engineered drainage in agriculture lands. These structures can short-circuit water fluxes from soils to streams, minimizing interactions with reactive minerals at depth. These changes may alter water cycles, gas fluxes (CO 2 , N 2 ) and solutes, ultimately modulating climate-water-human interactions and feedbacks et al., 2014), possibly explaining decreasing discharge (Nippert & Knapp, 2007), increasing groundwater CO 2 concentrations (Macpherson et al., 2008;Tsypin & Macpherson, 2012), changing N cycles (Kemp & Dodds, 2002), and more dynamic concentration-discharge relationship at the woody-encroached sites.

| Why do we need hydro-biogeochemistry theories?
These coupled processes dictate not only water storage and release, but also fluxes of reactive gases such as CO 2 and methane back into the atmosphere, ultimately providing feedbacks to climate. At longer time scales, changes in rooting depth and subsurface structure can reduce or amplify rates of channel incision, alter the depths of groundwater tables, and dictate the evolution of catchment geomorphology Harman & Cosans, 2019;Rempe & Dietrich, 2014;Riebe, Hahm, & Brantley, 2017;. There is mounting evidence that these changes, once thought to occur at the time scales of centuries to millennia, are now occurring at much shorter time scales (months, years, or decades) (Caplan et al., 2019;Hirmas et al., 2018;Robinson et al., 2019). This is particularly the case for carbonate and shale, which underlie about 20 and 25% of Earth's land area, respectively (Martin, 2017;Suchet, Probst, & Ludwig, 2003) and are highly responsive to changes in environmental conditions (Beaulieu, Godderis, Donnadieu, Labat, & Roelandt, 2012;Gaillardet, Calmels, Romero-Mujalli, Zakharova, & Hartmann, 2019;Sullivan et al., 2019).
These physical flow and biogeochemical transformation processes are naturally coupled in catchments and in natural systems generally. Our understanding of these systems, however, often focuses on particular processes of interest to individual disciplines. For example, the catchment hydrology community has made tremendous progress in developing transit time theory. In parallel, the soil biogeochemistry and geochemistry fields have advanced reaction kinetic theories substantially (Brantley, 2008;Davidson & Janssens, 2006;Lasaga, 1984;Sierra et al., 2015), mostly in well-controlled, small-scale batch, or column systems or plot-scale warming experiments. These traditional reaction kinetic theories, with simplified flow systems, however have shown limitations in describing observations in natural systems Maher, Steefel, DePaolo, & Viani, 2006;White & Brantley, 2003), as will be discussed in more details later. This divergence between traditional theories and observations in natural systems highlights the limitations in our understanding of how hydrology interacts with biogeochemistry, and underscores the need for integrated theories of biogeochemical reaction kinetics that reflect hydrological controls at scales beyond well-mixed reactors and wellcontrolled experiments. Yet currently such integrated theories do not yet exist. As a result, we generally cannot answer questions about earth surface system responses to climate and human forcings. For example, how does subsurface structure regulate hydrological and biogeochemical processes under different climate, geology, vegetation, and land use conditions? And how do unique catchment structures dictate their responses to changing perturbations? This paper advocates for developing hydro-biogeochemistry theories that can offer an integrated, mechanistic view of hydrological and biogeochemical processes at the catchment scale. Full appreciation of the breath and depth of changing earth surface systems requires broad integration across many earth and environmental science disciplines including climatology, ecology, geomorphology, geology, hydrology, and biogeochemistry, among others. Delving into this wide range of disciplines is well beyond the scope of the paper. We argue however, water movement and biogeochemical transformation are generally at the core of catchment and CZ functioning. We briefly review two existing, closely related theories, discuss emerging concepts toward integration, identify gaps in data collection and model development, and propose a road map toward integrating hydro-biogeochemical theory.
2 | EXISTING THEORIES 2.1 | The transit time theory in hydrology Catchment hydrology has revolved around a few key concepts and theories, including hydrological connectivity, water storage and discharge relationships, and transit time theory (Kirchner, 2009;McGuire & McDonnell, 2006;Wohl et al., 2019). These theories have been carefully studied and are closely related (Ali & Roy, 2009;; van Meerveld, Kirchner, Vis, Assendelft, & Seibert, 2019;Western, Blöschl, & Grayson, 2001). Among these, transit time theory is the most relevant to understanding biogeochemical transformations. It has been well known that rates of biogeochemical transformations depend on the contact time between water and reactive materials, although the form of the dependence is often not clear. Transit time theories aim to answer the questions "How old is stream water?" and "How old is groundwater?" Catchment hydrologists originally attempted to answer these questions using analyses of discharge data (hydrographs) from stream gauging stations, some of which date back to the 1800s. The unit hydrograph method (Linsley, Kohler, & Paulhus, 1958;Sherman, 1932) estimates the magnitude, timing, and duration of stormflow assuming a linear and time-invariant runoff response, whereas recession curve analysis reveals nonlinear relationships between discharge and storage (Karlsen et al., 2019;Kirchner, 2009;Nathan & McMahon, 1990;Wittenberg, 1999). In recent decades, however, stable water isotopes ( 2 H and 18 O) and other passive tracers have revealed that stormflow originates primarily from "old water" stored in the subsurface, instead of "new" event water from recent precipitation (Kirchner, 2003;McDonnell, 1990;Rodhe, 1987;Rodhe, Nyberg, & Bishop, 1996). Streamflow responds promptly to precipitation because the propagation of pressure signals (celerity) is substantially faster than the transport of actual water molecules (velocity) (Beven, 1981;Beven, 2004;Kirchner, Feng, & Neal, 2000;McDonnell & Beven, 2014).
Transit time is one key metric of catchment hydrology that has emerged from this body of research. Transit time (also travel time) is defined as the difference between the time a water parcel enters a catchment by precipitation and the time it exits, either at the catchment outlet or as water vapor via evapotranspiration (McGuire et al., 2005;McGuire & McDonnell, 2006). The understanding of transit time has evolved from a time-invariant concept assuming temporal stationarity to the recent development of time-variant transit time distributions and StorAge Selection functions, recognizing the importance of temporally variable storage and fluxes (Botter, Bertuzzo, & Rinaldo, 2011;Harman, 2015;Remondi, Kirchner, Burlando, & Fatichi, 2018;Rinaldo et al., 2015;van der Velde, Torfs, Zee, & Uijlenhoet, 2012). The need to summarize a distribution of transit times into simple statistics can facilitate intercatchment comparison through their mean transit times, estimated from time series of stable water isotopes or other passive tracers (e.g., Cl) in precipitation and streamflow. However, determining the mean of a skewed distribution can be problematic and misleading, as the mean is strongly influenced by the tail of the distribution and cannot be reliably constrained by the available travel-time tracers (Seeger & Weiler, 2014;Stewart, Morgenstern, & McDonnell, 2010). Recently, the young water fraction, that is, the fraction of streamflow younger than 2-3 months, has been shown to be less affected by aggregation errors and temporal nonstationarity (Jasechko, 2016;Kirchner, 2016a;Kirchner, 2016b). Its relatively modest data requirements make it potentially useful for intercatchment comparisons. Even more recently, new water fractions (Kirchner, 2019)-as distinct from young water fractions-have emerged as a tool to quantify the fraction of water moving through the catchment on time scales of hours, days, or weeks. In other words, we can use transit time distributions, young water fractions, and new water fractions to distinguish the hydrologic functioning of different catchments and their response to external forcings across diverse climates, topographic settings, and soil conditions (Birkel, Soulsby, & Tetzlaff, 2011;Godsey et al., 2010;Harman, 2015;Knapp, Neal, Schlumpf, Neal, & Kirchner, 2019;McGuire et al., 2005;Rinaldo et al., 2015;Rodhe et al., 1996;Soulsby, Tetzlaff, Rodgers, Dunn, & Waldron, 2006;Tetzlaff et al., 2009;Tromp-van Meerveld & McDonnell, 2006a; Tromp- van Meerveld & McDonnell, 2006b). The water-isotope based approaches, however, face many challenges . Among these, one relevant to biogeochemical processes is that they do not resolve spatial heterogeneity and therefore do not quantify the contact time of water with reactive materials, a key determinant of biogeochemical transformations.

| Biogeochemical reaction kinetic theories
Kinetic theories of water-rock interaction delineate the dependence of rock weathering (mineral dissolution and precipitation) rates on intrinsic rock properties such as reactivity and surface area, and environmental conditions including water chemistry and its distance from thermodynamic equilibrium. The most commonly used reaction kinetic theories are those built upon the foundation of the transition state theory (Brantley, 2008;Helgeson, 1968;Helgeson, Murphy, & Aagaard, 1984;Lasaga, 1984;Lasaga, 1998). These theories have guided the development of reaction rate laws used for the prediction of mineral dissolution and precipitation rates. The parameters of these rate laws are typically inferred from time series of measured solute chemistry in well-mixed laboratory systems under controlled geochemical conditions (Berner, 1978;Blum & Stillings, 1995a;Plummer, Parkhurst, & Wigley, 1979). Weathering rates estimated from these rate laws, however, are often 2-5 orders of magnitude higher than those inferred from observed weathering profiles that record the distributions of elemental concentrations over depth (Blum & Stillings, 1995b;Drever & Clow, 1995;Navarre-Sitchler & Brantley, 2007;Richards & Kump, 2003;Sverdrup, Warfvinge, Blake, & Goulding, 1995;White, 1995;White & Brantley, 1995;White & Brantley, 2003). For reactive transport models (RTMs) to reproduce the reaction fronts observed in weathering profiles, the effective surface area of reacting minerals often has to be set orders of magnitude smaller than the measured bulk surface area (Heidari, Li, Jin, Williams, & Brantley, 2017;Moore, Lichtner, White, & Brantley, 2012).
The kinetic theory of microbe-mediated redox reactions, including reactions relevant to soil respiration, the decomposition of organic matter, and denitrification, is rooted in enzyme kinetic theory. It follows the form of Monod or Michaelis-Menten equations (Michaelis & Menten, 1913;Monod, 1949) that take into account concentrations of electron donors such as organic carbon (OC) and electron acceptors such as oxygen and nitrate. The biogeochemical redox ladder describes the sequential order of electron acceptors being used (Van Cappellen & Gaillard, 1996). Parameters in the rate law, including half-saturation constants that quantify the affinity of an enzyme to a particular OC compound, are often indicators of its molecular structure-its degradability and the height of the reaction activation energy barrier (Davidson & Janssens, 2006). Several complications arise when these rates are used in natural soil environments. One is that OC in soils and shallow organic-rich layers is typically a mixture of thousands of different molecules, making it challenging to quantify their reaction rates. Second, although the temperature dependence of reaction rates can, in theory, be described by the Arrhenius equation, the temperature sensitivities of different OC molecules differ such that their collective dynamics are nonlinear and remain elusive.
Discrepancies also arise from fundamental differences between laboratory systems and natural systems. In wellmixed laboratory systems, heterogeneities and transport limitations are reduced as much as possible. In natural systems, physical and chemical heterogeneities are omnipresent (Erlandsson et al., 2016;Bao et al., 2014;Zhu, 2009). Water finds its way via paths of least resistance (or highest conductivity); and reactive materials occur heterogeneously. Consequently, the well-mixed assumption used to derive traditional rate laws cannot be justified beyond local scales. Also, precipitation is intermittent and temperatures vary across seasons. Variables such as temperature and soil moisture have long been shown to influence reaction rates in natural systems (Crowther et al., 2016;Bond-Lamberty & Thomson, 2010;Trumbore & Czimczik, 2008;Manzoni & Porporato, 2009;Yan et al., 2018a;. These divergences between traditional theories from controlled, small-scale systems and field observations highlight the limitations in our understanding of how hydrology interacts with biogeochemistry, and underscore the need for an integrated theory of biogeochemical reaction rates that reflects hydrological controls in spatially heterogeneous, natural systems at scales beyond well-mixed reactors.

| Connecting reactive and exposure times with biogeochemical reactions
Transit time distributions and young water fractions allude to key linkages between physical flow, biogeochemical reactions, and solute export (Hrachowitz et al., 2016). However, they represent the time water spends in a catchment irrespective of whether and how long it is in contact with reactive materials. Quantification of biogeochemical transformations at the catchment scale will require measures for contact times with reactive materials. Concepts and quantifications have emerged in the hydrogeology and reactive transport literature that connect transit time measures to biogeochemical reactions.
In a homogeneous subsurface system where permeability and reactive minerals are uniformly distributed, the relevant time scale for advection is the mean residence time, that is, the average time that water spends in a system. This mean residence time is similar to the mean transit time discussed earlier. The time scale of diffusion or dispersion, whichever is dominant, is the time scale needed for these processes to homogenize a system. Although dispersion is often the dominant process, diffusive time scales can become important at small spatial scales and may determine the rates of interphase mass transfer. Defining the characteristic time scales of reactions is tricky and depends on the type of reactions. Only for first-order decay is there a clear concentration-independent time scale of reaction. In single-substrate Michaelis-Menten kinetics, the characteristic time of a reaction depends on the concentrations of the substrates. In bioreactive systems, we may have to separate the time scale of the microbe-mediated reactions from the time scales of biomass growth and decay. In denitrification, some authors attribute the reactive time scale to the time for the depletion of nitrate or OC, which in itself depends on several interacting processes (Ocampo, Oldham, & Sivapalan, 2006;Pinay et al., 2015;Zarnetske, Haggerty, Wondzell, & Baker, 2011). In mineral dissolution, the characteristic time is typically set as the time to reach equilibrium (Maher, 2011;Salehikhoo & Li, 2015), which is itself complicated because the equilibrium concentration of a particular solute depends on water chemistry such as pH.
Natural systems are ubiquitously heterogeneous, which further complicates both transport and reaction processes. In heterogeneous groundwater systems, the mean age of water in storage will typically be much older than the mean age of water leaving the system (Berghuijs & Kirchner, 2017), for the simple reason that the highest-conductivity parts of the subsurface have the youngest waters and will supply the largest fraction of streamflow (because they flow fastest), and the lowest-conductivity parts of the subsurface will supply a smaller fraction of streamflow and will have the oldest waters (because they flow slowest). This heterogeneity in water ages and fluxes has important implications for chemical reaction rates. As an example, Figure 3 shows a system where fast-dissolving carbonate (magnesite, red) is embedded in slow-dissolving, high-permeability sandstone (quartz, blue) (Wen & Li, 2017;. Only a small fraction of water flows through the reactive zones. Because of the low permeability of the reactive zone, it is the fraction of old water with long residence (travel) times that is in contact with dissolving carbonate minerals and determines the domain-scale dissolution rates. With large carbonate clusters, flowing water rapidly channels through the nonreactive, high-permeability zones, leaving only a small fraction of old water in contact with carbonate. Dissolution rates in heterogeneous systems, therefore, depend not only on intrinsic mineral properties, but also on the spatial patterns of permeability in soils and rocks that regulate water partitioning between reactive versus nonreactive zones, relative to the spatial patterns of reactive minerals that determine reactive interfaces. Because reaction products have to diffuse out of low-permeability reactive carbonate, the diffusion process can become rate-limiting such that the domain-scale reaction rates largely hinge on the rate of diffusion, instead of the reaction kinetics themselves. System-scale dissolution rates in cases with different spatial patterns can be encapsulated into an upscaled rate law equation that quantifies the time to reach carbonate dissolution equilibrium, and the relative magnitude of reactive versus mean transit times is dictated by the spatial organization of carbonate. Such spatial organization can reduce system-scale dissolution rates by up to two orders of magnitude compared to corresponding homogeneous cases for fast reactions such as carbonate dissolution , but by less than 1% for relatively slow reactions such as silicate dissolution (Jung & Navarre-Sitchler,-2018a;Jung & Navarre-Sitchler, 2018b).
With tightly coupled reaction and transport processes, domain-scale reaction rates therefore depend on the relative magnitude of transport and reaction time scales. The Damköhler number, the ratio of the characteristic time scales of transport (advection, dispersion, and diffusion) versus those of reactions, is commonly used to quantify coupled reaction/transport processes (Maher, 2011;Salehikhoo & Li, 2015). The Damköhler number, however, has mostly been used in homogeneous systems. A system-wide Damköhler number cannot take into account the effects of spatial heterogeneity and the fact that the mean transit time differs from the water contact time with reacting materials. To take account F I G U R E 3 (a) Simulation output illustrating the relevance of spatial heterogeneity to physical flow and carbonate dissolution. Water flows through a cross-section of a heterogeneous rock with unevenly distributed, low-permeability carbonate (magnesite) within a highpermeability sandstone quartz domain. The average cluster length of carbonate here is 2 cm. The simulation is at an average flow velocity of 2.7 m/day. The subpanels in (a) illustrate (from left to right) spatial distributions of permeability (κ), magnesite volume percentage, steadystate flow velocity (v), Mg as a product of magnesite dissolution, and local dissolution rates in individual grid blocks (r MgCO3 ). (b) the probability distribution of local residence time in individual grids:τ a,nr,i for nonreactive sandstone grids (blue),τ a,r,i for reactive carbonate grids (red), and combined overall distributionτ a,i. Reprinted with permission from Wen & Li, 2017 of spatial heterogeneity, reactive transit times are needed to quantify water contact time with reactive materials. This measure has emerged in the literature as exposure time (Glassley, Simmons, & Kercher, 2002;Ocampo et al., 2006). The exposure time has been quantified as the travel-time integral of a water particle while it experiences interactions with reactive materials. In the variant of "cumulative relative reactivity," the binary distinction between nonreactive and reactive materials has been replaced by a nonnegative scalar expressing the local reactivity of the aquifer material relative to a reference reactivity (Loschko, Wöhling, Rudolph, & Cirpka, 2016). The exposure time can reproduce patterns of dissolved oxygen and nitrate in physically and chemically heterogeneous aquifers and predict decreases in denitrification potentials as OC becomes depleted over time (Loschko, Wöhling, Rudolph, & Cirpka, 2018). Computationally, the exposure time can replace spatial coordinates and reduce the dimensionality of reactive transport simulations in heterogeneous systems (Ginn, 1999;Loschko et al., 2018;Seeboonruang & Ginn, 2006). The ratio of the reaction time (to reach equilibrium or substrate depletion) versus exposure time (or cumulative reactive transit time) is equivalent to a Damköhler number in heterogeneous systems (Ocampo et al., 2006;Oldham, Farrow, & Peiffer, 2013).
Exposure times (or reactive transit times) and Damköhler numbers have been examined mostly in groundwater systems where the distributions of flows have often been assumed to be at steady state. This general principle can be useful in dynamic, nonstationary catchment systems, except that these measures will need to be expanded to incorporate the time-dependent water dynamics. A potential framework to address nonstationary reactive transit time distributions would be to extend the StorAgeSelection theory to express the dynamic selection of both transit and exposure (reactive) times (Botter et al., 2011;Botter, Bertuzzo, & Rinaldo, 2010;Harman, 2015;Rinaldo et al., 2015). This can be challenging because it implies quantifying not only flow paths, but also spatial patterns of reactive materials, or their characteristics. At the catchment scale, the reactive transit time can span orders of magnitude. The mean transit time of water in a catchment may be a year but the reactive time with OC may be only 10 days because OC mostly resides in highly permeable, well-connected shallow zones where the transit time is relatively short (Figure 1). Conversely, the reactive time with weathering rocks might be 10 1 -10 4 years because unweathered parent rock resides in deeper, low permeability zones with slow flow. For carbon decomposition that occurs largely in organic-rich, shallow, and permeable topsoils, the catchment-scale rates may hinge particularly on the younger end of the transit time distribution (Barnes, Butman, Wilson, & Raymond, 2018;Kirchner, 2016a;Kirchner, 2019). In contrast, catchment-scale rates of weathering and denitrification may depend more on the older water and long tail of the transit time distribution. Because means of transit time distributions often lean toward the older time, they can be useful predictors of weathering rates and geogenic solutes.
Catchment-scale reaction rates therefore hinge not only on mean transit time, but also on the distribution of reactive transit time. For example, concentrations of base actions and silica in groundwater and spring waters have been shown to correlate positively with groundwater age dated by CFC-11, CFC-12, and 3 H/ 3 He ratio (Burns et al., 2003;Rademacher, Clark, Hudson, Erman, & Erman, 2001) and 14 C (Frisbee et al., 2013), indicating that weathering in aquifers may not be at equilibrium as previously thought (Maher, 2011). Concentrations of base cations and weathering rates have also been shown to correlate positively with water transit times that mostly reflect local climate variables rather than landscape characteristics (Zapata-Rios et al., 2015). In any case, distributions of reactive transit times are probably more revealing than a single number, such as the mean transit time for an entire catchment. However, mean transit times, and other single quantities at the catchment scale, are often convenient to use in comparisons across catchments.

| The shallow versus deep flow paths
Existing work at the catchment scale has alluded to connections between flow paths and biogeochemical transformations. The physical properties of the subsurface often vary substantially with depth. Shallow soils have orders of magnitude higher permeability compared to deeper, unweathered bedrock (Welch & Allen, 2014). This implies that the majority of infiltrated water flows through the uppermost layers and has a relatively young age, that is, a few months (Berghuijs & Kirchner, 2017;Jasechko, Kirchner, Welker, & McDonnell, 2016). In contrast, deeper aquifers store much older water with ages from years to millennia. Chemically, shallow soils are weathered materials that are often depleted in geogenic solutes such as Ca and Mg Jin et al., 2010;McIntosh et al., 2017) but are enriched with organic materials consisting of tree roots, leaves, and microbes (Barnes et al., 2018;Billings et al., 2018;Herndon et al., 2015;Jobbágy & Jackson, 2000). Parent rocks reside in the deeper subsurface with negligible presence of organic material but abundant geogenic solutes. These subsurface structures develop over geological time scales and are an emergent outcome of interactions between climate, bedrock lithology, and vegetation conditions Riebe et al., 2017).
These physical and chemical contrasts can impart geochemical signatures on shallow versus deep water flow paths, ultimately manifesting themselves in stream chemistry. As an example, in the Svartberget catchment in northern Sweden, isotope hydrograph separation indicates the dominance of older "pre-event" water in streamflow as rainfall events increase flow by orders of magnitude (Bishop, Grip, & O'Neill, 1990;Rodhe, 1987). Different solutes respond to this flow increase distinctly, with Ca decreasing by more than a factor of two and dissolved organic carbon (DOC) increasing by more than a factor of three, whereas solutes such as Cl remain relatively constant (Hruška, Laudon, Johnson, Köhler, & Bishop, 2001). The depth profiles of reactive solutes indicate significant variations in soil water compositions and flow pathways ( Figure 4). As the water table rises into shallow soils at high flow and progressively activates more superficial flow paths, the streamflow reflects more of the low-Ca and high-DOC water that is the signature of the newly "connected" near-surface soils. The concentration of Cl varies little over depth (not shown) and therefore remains relatively constant (Bishop, Seibert, Köhler, & Laudon, 2004). An analytical solution that integrates vertical distributions of solute concentrations and lateral fluxes can explain the variable stream chemistry under these diverse flow conditions ). In the Shale Hills Critical Zone Observatory (SSHCZO) in Pennsylvania (USA), shallow soil water is characterized by low Dissolved Inorganic Carbon (DIC) from clay weathering and low pH from high soil respiration (Jin et al., 2014), and the groundwater chemistry is characterized by high DIC and divalent cations arising from contact with carbonate (Herndon, Steinhoefel, Dere, & Sullivan, 2018;Sullivan, Hynek, et al., 2016).
At the catchment scale, these distinct subsurface physical and geochemical structures can translate into different patterns of concentration-discharge (C-Q) relationships (Armfield et al., 2019;Radke et al., 2019). Multiple forms of C-Q relationships exist (Moatar, Abbott, Minaudo, Curie, & Pinay, 2017;Musolff, Schmidt, Selle, & Fleckenstein, 2015), with the widely used one being the power-law relationship C = aQ b . The exponent b is the slope of a log-log plot of concentration versus discharge, reflecting whether solute concentrations increase, decrease, or remain mostly constant as flow increases. In the Svartberget example, DOC would exhibit C-Q curves with b > 0, the so-called flushing behavior, Ca would have b < 0 showing dilution patterns, and Cl would have a near-zero b value, exhibiting chemostatic behavior. These are, in fact, the patterns that are observed in Coal Creek, a high-elevation, snow-dominated catchment in Colorado, USA (Zhi et al., 2019).
Stream water composition at different times often represents a mixture of two end-member source waters at different depths: shallow soil water enriched in biogenic solutes and deeper groundwater with abundant geogenic solutes. As a catchment becomes increasingly wet, the proportion of shallow water with high DOC and nutrient content typically increases. This may be why many biogenic solutes, such as DOC and nitrate, exhibit mostly flushing patterns (Botter, Li, Hartmann, Burlando, & Fatichi, 2020;Zarnetske, Bouda, Abbott, Saiers, & Raymond, 2018). Simulations exploring diverse conditions show that the power-law exponent b values are governed by the concentration contrasts between shallow versus deeper waters. A general relationship linking the dependence of b on shallow versus deeper concentration ratios can explain C-Q patterns for more than 13 solutes in three catchments under diverse climate, soil, and topography conditions ( fig. 11 in Zhi et al., 2019). The same relationship (with different parameters) can also explain nitrate export patterns in more than 200 watersheds under different land use conditions across USA . In a global water chemistry data analysis comparing the relative controls of C-Q patterns from above (climate) and below (depth of solute generation), the depth of solute generation has been found to be predominant (Botter et al., 2020). Recent literature has seen a surge in exploring quantitative, causal linkages between C-Q relationships, transit time, and reaction rates (Barnes et al., 2018;Basu et al., 2010;Benettin et al., 2015;Benettin, Fovet, & Li, 2020;Musolff, Fleckenstein, Rao, & Jawitz, 2017;van der Velde, de Rooij, Rozemeijer, van Geer, & Broers, 2010), although more in-depth studies will be needed to further develop a general, quantitative framework for these connections.

| A ROAD MAP TOWARD INTEGRATED THEORIES
As illustrated in these emerging concepts, developing integrated theories at the intersections of hydrology and biogeochemistry requires data and models that can illuminate water's journey and travel time through the invisible subsurface, quantify its interactions with reactive materials, and make the connection between these transport and reaction processes. This section identifies gaps and directions in data collection, and advocates for exploring the full potential of process-based models and data analysis tools.

| Data scarcity in the era of "big data"
Observations underlie all scientific discoveries, yet data scarcity has been one of the biggest challenges facing cross-disciplinary integration for environmental science. The past decades have witnessed rapid advances in data collection technology and an unprecedented flood of earth surface data from satellite remote sensing (Hrachowitz et al., 2013;McCabe et al., 2017). Detailed, high-resolution data on earth surface characteristics have now become available at extraordinary speeds and quantities (Howden, Burt, Worrall, Whelan, & Bieroza, 2010). For many rivers in the USA, water quantity and quality data are available dating back to the 1950s; and Sweden has recently celebrated 50 years of coordinated national aquatic monitoring (Fölster, Johnson, Futter, & Wilander, 2014). Large collaborative research networks have been formed to collect consistent, comparable data sets across the world (Baatz et al., 2018;Weintraub et al., 2019). In this era of "big data," it may sound surprising to say that Earth system science is still data-limited. But, most of the available data are gathered for readily accessible locations in developed, affluent countries, many of which are in temperate climates. Even in affluent countries, there is an alarming trend of discontinuing gauging stations with over 30 years of continuous monitoring (see USGS water quality website) (Laudon et al., 2017;Lorenz & Kunstmann, 2012), and most of the world's streams have never been gauged (Hrachowitz et al., 2013). To move toward hydro-biogeochemical integration, data need to be collected beyond traditional measurements within individual disciplines, to enable multi-faceted views of water's flow paths, residence times, and magnitude of biogeochemical transformation. Intensive, long-term measurements will need to be balanced against extensive spatial coverage, as we cannot measure everything everywhere all the time (Kirchner & Neal, 2011; Figure 5).

| Using reactive solutes to reveal invisible subsurface flow paths
Concentrations and ratios of reactive solutes can provide "fingerprints" of waters' pathways and interactions with their surroundings, in particular when different flow paths have distinct chemical signatures. Water chemistry is often used to reveal the rock lithology that rivers and streams are derived from (Gaillardet, Dupré, Louvat, & Allègre, 1999;Ross, Nippgen, Hassett, McGlynn, & Bernhardt, 2018), and water isotopes can give important clues to the seasonal origins of the waters themselves Jasechko, 2019;Sprenger et al., 2019). Rock types such as carbonates, silicates, and evaporites have characteristic ratios of HCO 3 /Na and Ca/Na (Figure 2b). Waters derived from carbonates are high in both ratios, whereas waters from evaporites are low in both ratios due to their high Na concentrations. The proximity of water chemistry to these end members indicates the lithological origin and mixing ratios. Such combined use of reactive solutes and water isotopes originated half a century ago in efforts to quantify groundwater contributions to river runoff (Pinder & Jones, 1969; Figure 6).
As discussed earlier, water chemistry in shallow, permeable soils is often distinct from the chemistry in deeper, unweathered rocks. This unique depth-dependent characteristic often determines the compositions of end-member source waters and can translate into C-Q relationships that reflect these characteristics. The power-law exponent b in C-Q relationships has been used to infer the chemical contrasts in shallow and deeper zones (Ameli et al., 2017;Zhi et al., 2019;. These examples echo early studies of C-Q relationships using end-member mixing analysis (Johnson, Likens, Bormann, Fisher, & Pierce, 1969;Langbein & Dawdy, 1964). They also underscore the significance of the much less observable vertical connectivity of reactive materials compared to the visible river network and ecohydrological structure expressed in earth surface data (Krause et al., 2017;Rinaldo, Rodriguez-Iturbe, Rigon, Ijjasz-Vasquez, & Bras, 1993;Widder et al., 2014). These C-Q relationships can be used together with water isotopes and tracers, as well as groundwater flux methods (Kalbus, Reinstorf, & Schirmer, 2006;Moeck et al., 2017) to infer flow paths and exposure time. For C-Q relationships that show relatively monotonic increasing or decreasing trends, stream concentrations at high flow and low flow can be used to infer corresponding soil water and groundwater compositions, and a further link to transit time (Kirchner, 2016b;Neal et al., 1990;Zhi et al., 2019).
It should be noted that C-Q relationships are not always monotonic and do not always reflect shallow versus deep end members. Hysteresis loops are commonly observed, potentially indicating the presence of more than two end-members, or heterogeneities in subsurface physical and chemical structure. For example, roots, rock fragments and fractures often generate preferential flow paths that depart from the general pattern of decreasing permeability (and flow) over depth (Beven, 2018;Germann, 1990). Karst landscapes are known for developing high-flow conduits at depth (Hartmann, Goldscheider, Wagener, Lange, & Weiler, 2014;Husic et al., 2019). Chemically, the abundance of reactive materials in specific locations that are connected to streams at different times can also determine the size of the hysteresis loop (Evans & Davies, 1998). The recent advent of high-frequency (15 min), automated stream chemistry F I G U R E 5 A road map toward integrated hydro-biogeochemical theories. Transit time theory and biogeochemical reaction theory have been developed in their respective disciplines. Developing hydro-biogeochemical theories will require integration of these existing theories; it will also require expanding measurements beyond stream channels and into the subsurface, and cleverly using process-based and datadriven model tools, all within the framework of hypothesis testing measurements can advance our understanding of C-Q relationships and spatial heterogeneity. As water tables rise and fall connecting pore water from different depths and locations to streams, high-frequency stream chemistry data potentially reflect highly spatially resolved soil water chemistry (Burns et al., 2019;Duncan, Band, & Groffman, 2017;Riml, Campeau, Bishop, & Wallin, 2019). High-frequency C-Q patterns, therefore, may reflect distinct wetness conditions and flow paths that connect streams to various parts of the subsurface with differentiated chemistry (Knapp, von Freyberg, Studer, Kiewiet, & Kirchner, 2020). This may explain why C-Q patterns of high-frequency data for different events diverge and why patterns in low and high frequency data differ (Burns et al., 2019); they often have distinct antecedent wetness conditions such that connected water flow paths differ in each event. High-frequency measurements, therefore, can potentially be used for space-time substitution to infer the high-resolution distribution of reactive materials over depths and across different parts of catchments. The challenge is that spatial distributions of reactive materials may be highly variable, such that the C-Q hysteresis loops are highly variable among storms of different sizes, or even among similar-sized storms but with varying antecedent conditions.

| Moving beyond stable water isotopes
One of the limitations of stable water isotopes as estimators of residence time for subsurface water is that they cannot quantify waters older than a few months to a couple of years. This has been increasingly recognized in the hydrology and hydrogeology communities (Jasechko, 2019;Sprenger et al., 2019;Staudinger et al., 2019). The hydrogeology community has used a wide spectrum of tracers for older groundwater dating (Figure 7), including radioactive tritium ( 3 H, with a half-life time t 1/2 of 12.3 yr), radioactive krypton ( 85 Kr t 1/2 10.8 yr) produced during reprocessing of nuclear rods, CFCs and SF6 (Abbott et al., 2016;Bethke & Johnson, 2008;Lu et al., 2014). The use of CFC's was able to discern a sharp and unexpected discontinuity between sub-annual and decadal water residence times at the boundary between transient and permanent saturation in the soil profile (Kolbe, Marçais, de Dreuzy, Labasque, & Bishop, 2020). Cosmogenic radionuclides with even longer half-lives have also been used, including 39 Ar (t 1/2 = 239 yr), 14 C (t 1/2 = 5,730 yr), 81 Kr, and 36 Cl (t 1/2 = 229,000 and 301,300 yr, respectively) (Bauer, Fulda, & Schäfer, 2001;Jasechko, 2016;Jasechko et al., 2017;Lu et al., 2014;Visser et al., 2019). As an example, 14 C is generated by neutron bombardment of nitrogen in the upper atmosphere (Geyh, 2000), dissolves in atmospheric water droplets and decays over time, and thus can act as an atomic clock as it travels with water. Concentrations of 14 C in groundwater can be compared to historical atmospheric 14 C abundance to quantify transit times. The use of these radioactive tracers is often challenged by confounding factors such as nonunique anthropogenic and geologic sources, variation in background concentrations, mixing in the atmosphere, and complex equilibrium dynamics in the unsaturated zone (Lu et al., 2014). Various examples however have illustrated their utility in quantifying the age of groundwater (Jasechko, 2019) and in differentiating solutes from different origins (Campeau et al., 2019). F I G U R E 6 Mixing lines in water isotopes and reactive solutes. (a) dual-isotope plot with data from Brooks et al. (2010) contrasting the isotopic compositions of groundwater and xylem water. Groundwater follows the meteoric line while xylem water reflects soil water that has been fractionated by evaporation and therefore deviates from the meteoric water line. (b) Mixing diagrams using Na-normalized molar ratios in the 60 largest rivers on earth (Gaillardet et al., 1999). Open and filled circles are for rivers with TDS (total dissolved solids) >500 mg/L and TDS <500 mg/L, respectively. River chemistry indicates the lithology of the river's source lithology. The patterned inserts represent chemical compositions of silicates, carbonates, and evaporites as end member lithologies (reprinted with permission from Elsevier) Isotopes of reactive constituents have also been commonly used to infer sources and fluxes from distinct reaction paths and sources (Abbott et al., 2016;Bacchus & Barile, 2005;Bolton, Lasaga, & Rye, 1999;Campeau et al., 2017;Campeau et al., 2019;Divers, Elliott, & Bain, 2014;Hubbard et al., 2014;Sebestyen, Shanley, Boyer, Kendall, & Doctor, 2014;Wexler, Hiscock, & Dennis, 2011). For example, in forested catchments, δ 18 O of atmospheric nitrate often varies between +60 and +100‰, compared to the 10 to +15‰ range of δ 18 O in nitrate from coupled mineralization and nitrification during organic matter processing in the shallow subsurface (Kendall, Shanley, & McDonnell, 1999;Kendall, Elliott, & Wankel, 2007). Recently developed microbial denitrifier methods for δ 15 N (Sigman et al., 2001) and δ 18 O (Casciotti, Sigman, Hastings, Bohlke, & Hilkert, 2002) have made rapid analyses of large numbers of samples feasible. DOC composition has also been used to identify distinct flow paths: increased humic acid fractions of DOC during stormflow reflect contributions from surficial soils (Cronan & Aiken, 1985;Hood, Williams, & McKnight, 2005). The δ 13 C values of DOC (δ 13 C-DOC) (St-Jean, 2003) can be used to distinguish between fresh DOC from leached organic matter in surficial soils and DOC from microbial degradation along groundwater flow paths (Schiff, Aravena, Trumbore, & Dillon, 1990). However, the range of mechanisms that can influence δ 13 C means that the identification of carbonate sources for DIC needs to be treated carefully, due to the risk that isotopic signals from one process will be confounded by the influence of other processes (Campeau et al., 2017). 4.3 | Where to measure? 4.3.1 | Beyond the river channel: characterizing the subsurface structure Current water quantity and quality data typically focus on surface-water bodies (streams, rivers, and lakes). Streamflow is often measured at a frequency of minutes. Water chemistry data, with the exception of dissolved oxygen, pH, and nitrate from in-stream sensors, are often measured much less frequently, typically weekly to bimonthly. Our understanding of catchment structure and functioning has focused primarily on "horizontal connectivity," or the visible geomorphological characteristics such as drainage networks (Rinaldo et al., 1993;Rinaldo, Rigon, Banavar, Maritan, & Rodriguez-Iturbe, 2014;Wohl et al., 2019), ecosystem landscape units, and their interfaces at the earth surface (Jencso, McGlynn, Gooseff, Bencala, & Wondzell, 2010;Krause et al., 2017;McClain et al., 2003). Subsurface structure, or "vertical connectivity," has received much less attention, partly because the subsurface is much less accessible and less visible. However, recent years have seen growing recognition of its significance in regulating water fluxes and biogeochemical processes Grant & Dietrich, 2017;Rempe & Dietrich, 2014 F I G U R E 7 Spatial and temporal scales represented by different tracers. While surface hydrologists have routinely used stable water isotopes ( 2 H and 18 O) and nonreactive tracers (Cl, Br) for transit time quantification, groundwater age tracers can be used to date water older than months to years to constrain the long tail of transit time distributions (in the black box). The figure is reprinted with permission from Sprenger et al. (2019) There is still a scarcity of subsurface characterization data, including measures of structure such as soil depths and spatial distributions of soil-regolith-bedrock boundaries, as well as properties including porosity, permeability, and soil and ground water chemistry. These characteristics are usually inferred from boreholes that are few in number and widely spaced. Most soil data focus on the top tens of centimeters, barely scratching the surface of the CZ (Richter & Billings, 2015). Geophysical measurements that nondestructively capture subsurface physical and geochemical information have been highly informative but not yet widely applied Parsekian, Singha, Minsley, Holbrook, & Slater, 2015;St. Clair et al., 2015), with additional challenges of quantitative interpretation. Attempts have also been made to infer subsurface data from transfer functions, although the reliability of the resulting inferred "data" is open to debate. The CZ Observatories in the USA, with their unique focus on looking below the surface and back in time, have pushed subsurface characterization and measurements in the past decade. We now have a much deeper understanding of weathering fronts , microbial activities at depth (Brewer et al., 2019), and how they drive short-term catchment functioning and long-term catchment evolution Fan et al., 2019;Holbrook et al., 2019;Rempe & Dietrich, 2014;Salve, Rempe, & Dietrich, 2012;Sullivan, Li, Goddéris, & Brantley, 2020). The spatial coverage of such studies is very limited, however. And we still know very little about roots and how roots at depth drive physical and biogeochemical processes.
Measurements outside of the channel are essential for understanding water release, solute export, and weathering and biogeochemical processes at the catchment scale. High-frequency sensors in soils and groundwater wells have begun to record the temporal variation of soil water and gases as products of biogeochemical formations (Hasenmueller et al., 2015;Macpherson et al., 2008;Tsypin & Macpherson, 2012). Understanding of surface water nitrate dynamics has benefited from high-frequency sensors (NO 3 and O 2 ), piezometers, and/or subsurface drains that intensively monitor soil water and subsurface flows (Duncan, Band, Groffman, & Bernhardt, 2015;Mellander et al., 2012;Outram, Cooper, Suennenberg, Hiscock, & Lovett, 2016;Rozemeijer, van der Velde, van Geer, Bierkens, & Broers, 2010). Sensorbased studies have illuminated hyporheic processes, which groundwater and surface water and stimulate biogeochemical interactions (Kunz, Annable, Rao, Rode, & Borchardt, 2017). These studies have also revealed new insights into differences and similarities between natural and human-influenced streamflow pathways (Mellander et al., 2014;Rozemeijer et al., 2010), effects of soils and geology (Mellander et al., 2012), and seasonal variations in the connectivity of nitrate sources to streams (Duncan et al., 2015;Outram et al., 2016). We therefore advocate expanding deeper characterization of subsurface structure and depth distribution of properties such as hydraulic conductivity, soil carbon content, and weathering rock composition, as well as deeper observations of CZ functioning.

| Spatial and temporal coverage: extreme environments and end members
Long-term observations have enabled the discovery and quantification of emerging environmental problems. Wellknown examples include the Keeling curve documenting increasing atmospheric CO 2 (Keeling et al., 1976), the discovery of the "ozone hole" above Antarctica (Farman, Gardiner, & Shanklin, 1985), and the records of long-term impacts of acid rain (Driscoll et al., 2001;Likens & Butler, 2018). These long-term observations have driven new research and policy around the globe. But financial constraints of monitoring programs are real. In particular locations, optimal sampling frequency and duration largely depends on the flashiness of streamflow and the response of solute concentrations to streamflow variations. Moatar et al. (2020) suggested that sub-daily frequency is necessary for solutes that are sensitive to streamflow variation such as total suspended solids (TSS), whereas monthly sampling may be sufficient for solutes that are insensitive to streamflow variations. The key is to find the goldilocks, "optimal" measurement frequency and duration that will fully reveal the system's dynamics without redundant information.
In selecting new networks (or modifying old ones) for long-term monitoring, it is important to start with research questions and conceptual models (Lindenmayer & Likens, 2010), but also to bear in mind the idiosyncrasies of unique places . It is also essential to consider the gradient of environmental variables to ensure the inclusion of the widest possible variation in geology, climate, vegetation, and land use conditions. Statistical approaches can help to map the parameter space and guide the choices (O'Hare et al., 2020), but some gaps are already apparent. For example, although 55% of the world population now lives in urban areas (UnitedNations, 2018), urban water quality investigations typically focus on waste water treatment and stormwater (Jovanovic, Hale, Gironás, & Mejia, 2019), with the nonengineered part of urban water quantity and quality being much less studied (Kaushal & Belt, 2012) compared to agricultural or pristine watersheds. More data are available in relatively humid regions compared to semi-arid and arid conditions . Although carbonate landscapes cover about 21% of the Earth's land surface (Hartmann & Moosdorf, 2012) and provide 20-25% of potable water (Hartmann et al., 2014), they are much less studied than silicate-dominated landscapes (Gaillardet et al., 2019;Sullivan et al., 2019).
Additional emphasis also needs to be directed to extreme environments that are particularly sensitive to future changes. Instrumental and paleoclimatic data from the Arctic document its sensitivity to both natural and anthropogenic forcing (Overpeck et al., 1997;Plaza et al., 2019;Turetsky et al., 2020), but other regions that are sensitive to climate change continue to be monitored sparsely. High-elevation mountains around the world and tropical glaciers in the Andes are highly sensitive to climate change, indicating an urgent need for data recording the earth system's response to changing climate (Bales et al., 2006;Barnhart et al., 2016;Liu & Yan, 2017;Hubbard et al., 2018;. However, regions like these often lack resources and are challenging to access, with the consequence that very limited data are available (Mulligan, 2013;Saberi et al., 2019). Monitoring systems are urgently needed in these areas to document long-term alteration of water quantity and quality in the changing climate, possibly identifying early signals of thresholds and tipping points of climate change. These locations could potentially serve as sentinel watersheds, where sentinel is defined as "someone charged to guard or keep watch over an area and sound an alarm if danger comes near." (Magner & Brooks, 2008).

| Data crunching: process-based and data-driven models
If data are available from catchments that span gradients of driving forces and internal structure, cross-site comparisons can be used to identify the most influential factors. Landscapes with similar anthropogenic forcings but different geological history (climatic or tectonic) could be compared, for example, to isolate the effects of individual factors on catchment structure and functioning. Intercatchment comparisons can begin to bracket the range of possible behaviors and locate specific catchments within this range. Ultimately, this will help in formulating hypotheses that can be tested in other places. For example, under similar climate conditions, catchment responses to agriculture, and excessive nitrogen inputs are expected to differ for different geological and lithological conditions (Miller, Tesoriero, Hood, Terziotti, & Wolock, 2017;Tesoriero et al., 2015). Geological conditions, such as the extent of glaciation, can pre-set the physical structure and properties of the subsurface (e.g., soil depth and permeability) whereas lithology determines rock reactivity and depths of weathering fronts. Together they regulate flow partitioning between water flowing laterally to streams and water recharging to groundwater . This flow partitioning has long-term implications for, for example, how much reactive N can be transported to depth and removed via denitrification (Kolbe et al., 2019b;Tesoriero et al., 2015). Generally speaking, responses to excessive nitrogen inputs may be more pronounced for redox-sensitive rocks such as shales and pH-sensitive rocks such as carbonate than slower-dissolving, less sensitive sandstones (Martin, 2017).
It is important to recognize that the issue of data scarcity will probably persist for decades to come. In CZ science, we may never have the "problem" of data deluge. Without further breakthroughs in subsurface imaging and characterization, there will be limitations in how many holes can be dug, how many transects can be mapped, and how many samples can be taken. Thus modeling and data analysis tools are needed to exploit limited data, to determine where additional observations would be most informative, to connect missing data points, to piece together coherent conceptual frameworks of interacting processes, and to project scenarios of the future.

| Process-based modeling for mechanistic understanding and strategic data collection
Developing integrated theories calls for models that can simulate interacting hydrological and biogeochemical processes. Process-based models have been developed historically within disciplinary boundaries. Hydrological models solve for water storage and fluxes at the watershed scale and beyond (Fatichi et al., 2016), and RTMs focus on chemical changes arising from transport and multicomponent biogeochemical reactions, typically in relatively "closed" groundwater systems that have limited interactions precipitation and sunlight Steefel et al., 2015). Solute transport and reaction modules have been developed as add-ons to hydrological models (e.g., (Arnold & Soil, 1994;Santhi et al., 2001)), but they often do not explicitly represent the multi-component reaction thermodynamics and the kinetic theory that lie at the core of geochemistry and biogeochemistry. Only recently have RTMs been brought to the catchment scale Li, 2019;Yeh et al., 2006;.
Like all measurement and modeling tools, process-based catchment-scale RTMs have limitations. With additional reaction and transport processes, they include even more functions (such as reaction kinetic rate laws) and parameters (e.g., reaction rate constants and surface area) than hydrological models that have already been criticized for their complexity, equifinality, uncertainty, and data demands (Beven, 2001a;Beven, 2006;Kirchner, Hooper, Kendall, Neal, & Leavesley, 1996). These issues will persist even though RTMs will be constrained by additional chemical data. The major sources of uncertainty in complex RTMs lie in epistemic uncertainties, that is, the lack of specific knowledge in forcing data and details of reactivities (e.g., spatial distribution and abundance of reactive materials), on top of uncertainties related to hydrology (Beven, 2000;Beven & Freer, 2001). The models' conceptual foundations also represent a major source of uncertainty, highlighting the need to formulate several competing models that would then be confronted with data and be selected or weighted based on the results (Höge, Wöhling, & Nowak, 2018). Remedies for these epistemic uncertainties have been proposed in the literature, including using a rejectionist approach (Beven & Lane, 2019), hypothesis testing (Beven, 2016;Clark, Kavetski, & Fenicia, 2011;Kirchner et al., 1996;Pfister & Kirchner, 2017), and uncertainty analysis (Beven & Binley, 2014;Beven & Freer, 2001). However, the computational cost of solving a spatially explicit, nonlinear, multi-component RTM is high, challenging the application of ensemble-based uncertainty analysis and model weighting/selection methods (Song et al., 2015). Application of rigorous Bayesian techniques in complex, computationally demanding models remains a daunting issue.
In the face of these uncertainties and the need to balance cost and gain in an integrative model, we advocate a "simple but not simplistic," or fit-for-purpose approach (Beven & Lane, 2019). A naïve approach to model integration would be to try to couple everything that is known by hydrologists about hydrology with everything that is known by biogeochemists about biogeochemistry, leading to extremely complex models that can hardly be calibrated and that may suffer from conceptual incompatibilities between the coupled parts, in addition to problems of equifinality (Beven, 2001b;Beven & Freer, 2001). It would be equally naïve to restrict biogeochemistry only to first-order kinetics, or express hydrology as the exchange between fully mixed pools. Over-simplistic models may be suited for rigorous uncertainty analysis, but will not serve the purpose of modeling if key dynamics cannot be captured. The model should be the most "fit" for particular questions and purposes, not the "best" model that can do everything. In our view, the over-arching purpose of such a modeling effort is to provide researchers from different disciplines with a common conceptual framework that represents the system and competing hypotheses about the system's bottlenecks, that is, the rate-limiting processes and properties. This tentative system analysis then can be formulated into competing mathematical models. These integrated models must be good at representing the bottlenecks, but should be allowed to be coarser in their representation of processes and properties that are presumed to be of minor influence. The resulting prototypes of integrated models need to be challenged, first by checking whether they produce the expected system behavior and then by confronting them with observations. Some models will fail, which itself offers understanding of key processes. Iteration will be needed to refine the models, often at the conceptual level, until a satisfactory balance is found between minimizing complexity and maximizing fidelity to the data. Jointly developing such conceptual and mathematical models in an interdisciplinary team can pose a challenge for the researchers involved, as individuals' favorite processes may be deemed less significant for the overall system dynamics. But it can stimulate new ideas and serve as interdisciplinary learning experience.
Model limitations and uncertainties, however, should not be a deterrent to integration between hydrology and biogeochemistry. Even before data are collected, prudently designed numerical experiments can be used to test hypotheses, to pinpoint when and where measurements would be most useful, and to estimate how many measurements would be sufficient (Seibert & Beven, 2009). Studies using hydrological models (Vrugt, Bouten, Gupta, & Sorooshian, 2002) have indicated that, when streamflow data are carefully chosen, 10% of streamflow measurements are sufficient for parameter calibration, meaning that approximately 90% of the data are redundant for parameter identification. The number of data points is often not critical in discriminating between parameter sets; what matters far more is the information content and the efficiency with which information is extracted (Gupta & Sorooshian, 1985;Sorooshian & Gupta, 1983). Model uncertainty and data analysis studies are common in the hydrology literature but much rarer in biogeochemistry. With constraints on resources, such analyses are urgently needed for long-term monitoring and short-term synoptic sampling campaigns. Triangulating between data collection, numerical experiments, and hypothesis testing can help advance toward an integrated theory (Chamberlin, 1965;Clark et al., 2011;Kirchner, 2006). This is done routinely for climate projections but less frequently for earth system models (Solomon, Plattner, Knutti, & Friedlingstein, 2009).

| Learning from data-driven approaches
Physical and biogeochemical processes are entangled by complex interactions, making them challenging to decipher. When large data sets are available, machine learning approaches hold promise for identifying key controls and developing integrated theories . Machine learning approaches have existed for decades, but their use has seen a recent renaissance stimulated by technological advances in GPUs (graphical processing units) (Schmidhuber, 2015). They directly target observations and are less constrained by a priori model assumptions than process-based models, although they are also criticized for their gigantic data requirements, and for not conveying mechanistic insight in any obvious way. Machine learning approaches, especially deep learning approaches that automatically exploit data in the absence of domain knowledge, have been applied by the hydrology community for understanding and forecasting soil moisture and streamflow (Fang, Shen, Kifer, & Yang, 2017;Karpatne et al., 2017;Shen, 2018;Shen et al., 2018). This capability creates possibilities for discovering new patterns, and possibly new concepts beyond those conceived by human experts. With sufficient data, they can perform better in reproducing observations.
The water quality and biogeochemistry community, however, has trailed behind in exploiting these possibilities. Existing usage of machine learning has mostly focused on nondeep approaches using logistic regression and boosted classification trees and artificial neural networks (Tesoriero et al., 2015;Underwood, Rizzo, Schroth, & Dewoolkar, 2017;. One of the biggest challenges in using deep learning approaches in the biogeochemistry and water quality fields is still data scarcity. With the exceptions of some intensively monitored catchments and quantities that can be continuously measured by probes (e.g., pH, electrical conductivity, dissolved oxygen, and NO 3 ; Burns et al., 2019), most solutes are measured biweekly or even bimonthly, often missing events such as large storms. Temporally sparse data are often insufficient to train robust models. Before exploiting deep learning methods in biogeochemistry and water quality, we should estimate the volumes of data required for training robust models. It is possible that the required data volumes would be vastly more than we can ever hope of having. On the other hand, higher sampling frequency does not always bring more information-additional data may just connect the dots between measurements that one already has.
These challenges may be addressed by integrating data-driven and process-based models within the framework of "theory-guided data science" (Karpatne et al., 2017;Reichstein et al., 2019;Shen, 2018). With their roots in the underlying physics, process-based models can offer behavioral insights that are consistent with scientific knowledge. Machine learning techniques can learn to optimally describe the ground truth that can be observed and facilitate the parameterization of process-based models that are parameter intensive. For example, environmental variables (for example, precipitation and surface slope) have been mapped to streamflow measures (such as mean, minimum and maximum streamflow) in a few thousand catchments and applied globally to feed hydrological models (Beck et al., 2016). Outputs of process-based models can be used as inputs for training data-based models together with observation data. For example, climate model simulations that are available at coarse spatial and temporal resolutions have been used as inputs to train statistical models to predict local, finer-scale climate variables (Wilby et al., 1998;Wilks, 2015). The combination of the two types of models may help to fill temporal gaps in historical data records, discern anomalous data that can mislead training processes, and possibly expand spatial coverage from gauged to ungauged basins. Overall, the hybrid approach can potentially lead to models that are data-adaptive yet still obey physical laws and maintain a conceptualized, interpretable framework.

| MOVING FORWARD
Catchments are the fundamental units of Earth's surface. Their above-ground and below-ground structures regulate their hydrological and biogeochemical functioning. Although integrated concepts have emerged in the literature, unifying theories at the interface of hydrology and biogeochemistry do not yet exist. In particular, traditional biogeochemical reaction theories have had difficulties in explaining observations in natural systems without integrating the understanding of hydrological flow paths. Issues of societal relevance, including those of water quantity and quality, persist and also demand integrated theories. The time is ripe to accelerate cross-disciplinary collaboration, not only in data collection, but also in developing theories across disciplines. We advocate embracing the CZ approach, melding theories, measurements, and models across traditional disciplinary boundaries. In particular, we note that data scarcity is still a major challenge in this age of "big data." We call for long-term, continuous monitoring of catchments as they record Earth's hydro-biogeochemical response to rapidly changing environmental conditions, in particular human-induced modifications. Such monitoring should be sufficiently long for the identification of thresholds, tipping points, and temporal trends in this unprecedented Anthropocene, which cannot be extrapolated from Earth's geological past. Measurements of discharge and stable water isotopes will also need to be juxtaposed with measures of multi-faceted reactive solutes and reactive isotopes to decode water's invisible journey through the subsurface. Intensive data collection needs to go beyond river channels and look into the subsurface, mapping physical and biogeochemical distributions over depth to understand the distinct water paths that connect water sources to streams. Intensive measurements will need to be balanced with spatially extensive observations, such that data collection includes extreme environments that are sensitive to climate change and human perturbations, as well as end members across the spectra of climate, geology, vegetation cover, and human influences. Process-based models and data-driven models need be blended to maximize data efficiency and theory development.
More generally, developing integrated theories at the intersections of disciplines will require cultivating earth and environmental scientists who think across disciplinary boundaries with a broader hollistic perspective. Similar to ecotones in ecology, where the blending of distinct conditions brings about high biodiversity (Brownstein, Johns, Fletcher, Pritchard, & Erskine, 2015;Nadis, 2016), scientific ecotones can stimulate and pollinate innovative ideas across disciplines, therefore shedding new light on old questions and puzzles that individual disciplines cannot resolve. One such example is the flourishing field of ecohydrology since the call for understanding climate-soil-vegetation dynamics (Ingram, 1987;Rodriguez-Iturbe, 2000). Although a daunting quest, the field of hydrobiogeochemistry will bring together hydrologists, biogeochemists, ecologists, and other scientists to share tools and ideaologies, and to advance an integrated catchment science and, more broadly, Earth system theories.

RELATED WIREs ARTICLE
Monitoring the riverine pulse: Applying high-frequency nitrate data to advance integrative understanding of biogeochemical and hydrological processes