Advancing isotope‐enabled hydrological modelling for ungauged calibration of data‐scarce humid tropical catchments

Realistic projections of the future climate and how this translates to water availability is crucial for sustainable water resource management. However, data availability constrains the capacity to simulate streamflow and corresponding hydrological processes. Developing more robust hydrological models and methods that can circumvent the need for large amounts of hydro‐climatic data is crucial to support water‐related decisions, particularly in developing countries. In this study, we use natural isotope tracers in addition to hydro‐climate data within a newly developed version of the spatially‐distributed J2000iso as an isotope‐enabled rainfall‐runoff model simulating both water and stable isotope (δ2H) fluxes. We pilot the model for the humid tropical San Carlos catchment (2500 km2) in northeastern Costa Rica, which has limited time series, but spatially distributed data. The added benefit of simulating stable isotopes was assessed by comparing different amounts of observation data using three model calibration strategies (i) three streamflow gauges, (ii) three gauges with stream isotopes and (iii) isotopes only. The J2000iso achieved a streamflow Kling–Gupta efficiency (KGE) of 0.55–0.70 across all the models and gauges, but differences in hydrological process simulations emerged when including stable water isotopes in the rainfall‐runoff calibration. Hydrological process simulation varied between the standard J2000 rainfall‐runoff model with a high simulated surface runoff proportion of 37% as opposed to the isotope version with 84%–89% simulated baseflow or interflow. The model solutions that used only isotope data for calibration exhibited differences in simulated interflow, baseflow and model performance but captured bulk water balances with a reasonable match between the simulated and observed hydrographs. We conclude that J2000iso has shown the potential to support water balance modelling for ungauged catchments using stable isotope, satellite and global reanalysis data sets.


| INTRODUCTION
The planet is currently experiencing significant hydrological change, which is likely to increase in the future.This ongoing dynamic change challenges our ability to model and understand the hydrological cycle, as well as our capacity to prepare for extreme climate events.Hydrological models that help us understand how this change will impact the world's population in addition to water resources and their availability are essential to mitigate and manage the risks associated with hydrological extremes.Hydrological models are dependent on hydroclimatic observations, but this data availability varies globally between countries and for different climates.In total, 82% of the records on the Global Runoff Data Centre (GRDC) come from only 6% of the countries which have supplied streamflow data and most records come from cold climates.Many countries in central Africa, Asia and parts of South and Central America have less than 40 available gauging stations per country, with limited records after the year 2000 available on the GRDC.Even where data is available, the records are often fragmented (Alkama et al., 2013).For these reasons, developing hydrological models that can circumvent the need for large amounts of hydro-climatic data is an important next step to support sustainable water management decision-making related to hydrological stress and extremes.
Given the prevalence of ungauged catchments lacking sufficient data, the development of robust hydrological models in the absence of gauging data is an important goal (e.g., Fowler et al., 2020;Wagener & Montanari, 2011;Watson, Midgley, et al., 2022;Watson, Miller, et al., 2022).Typically, a robust model would aim to simulate hydrological processes and changes that extend beyond the scope of calibration conditions, especially when dealing with stronger climate variability or anthropogenic changes.In general, model robustness can be enhanced with a focus on hydrograph fitting using advanced statistical tools for optimization (Arsenault et al., 2014), parameter sensitivity (Pfannerstill et al., 2015) and uncertainty analysis (e. g., Beven, 2001;Vrugt et al., 2005).All of these steps are considered good modelling practice (Mai, 2023;Singh, 2018).The use of multiple gauges for streamflow calibration has shown to improve model robustness (Li et al., 2010), as well as the use of spatial model evaluation, in addition to the more commonly used time-series calibration (Demirel et al., 2018).Nevertheless, when dealing with ungauged catchments, an alternative data source additionally to gauged discharge to infer streamflow is needed.
Advancements in the availability of global isotope datasets have brought isotope-enabled models to the forefront of hydrological process simulation, the prediction of future hydrological changes and model calibration for data scarce regions.Isotope-enabled models use isotopic information to simulate water fluxes and storages (e.g., Stadnyk et al., 2013;Birkel & Soulsby, 2015;Stadnyk & Holmes, 2020;Stevenson et al., 2021;Holmes et al., 2022;Watson et al., 2023).
Various distinctions can be made regarding how the isotopes are incorporated into hydrological models.In conceptual rainfall-runoff modelling, there are two main types of isotope-enabled models.The first use the isotopes to validate or calibrate the simulated hydrograph separation into water source contributions of total streamflow, known as End-Member-Mixing integrated models (e.g., Watson et al., 2023).
The second use a mass-balance approach for isotope mixing directly connected to each modelled water flux (e.g., Pesántez et al., 2023).
Both approaches have aided our understanding of hydrological processes and improved model robustness (IAEA, 2022).The mass balance approach aims to further increase model robustness by calibrating and validating additional internal processes, such as interception, evapotranspiration, soil-water fluxes and exchanges with the aquifer even at larger catchment scales (e.g., Arciniega-Esparza et al., 2023).With this in mind, there is considerable but relatively unexplored potential to use low-cost streamflow isotope tracers as a combined signal of hydrological components and processes for ungauged calibration in data-scarce regions.
In this study, we have developed an isotope-mixing, mass-balance module for the widely-applied J2000 spatially-distributed conceptual rainfall-runoff model (Kralisch & Krause, 2006;Krause, 2001).The J2000 model has previously been used in cold climates (Germany, France), tropical climates (Malaysia, Nepal, India, Brazil) and temperate climates (Australia, Turkey, South Africa) and has been proven to work across a range of different climate types and hydrological processes (Bonneau et al., 2023;Donmez et al., 2020;Krause, 2002;Kumar et al., 2017;Künne et al., 2019;Nepal et al., 2021;Watson et al., 2018Watson et al., , 2021)).Watson et al. (2023) used J2000iso to calibrate/ validate simulated flow components with an end-member-mixing analysis post-processing step.Building on this work, the new isotopeenabled model simulates both water and isotope fluxes for each modelled hydrological process (from precipitation, interception, mixing in the soil, percolation to the aquifer and outflow as different streamflow components).The model was tested in a tropical meso-scale catchment ($2500 km 2 ) setting in Costa Rica using spatially distributed daily streamflow (3 gauges) and monthly stream isotope (40 rivers across the catchment) data (Birkel et al., 2020).The incorporation of streamflow isotopes and simulation of isotope fluxes aims to enhance the adaptability of J2000 to different data availability scenarios and spatial simulations of internal catchment hydrological processes, providing a potential means of producing robust models in ungauged catchments with the following specific objectives: iii.Assess spatial simulations of internal catchment processes in terms of water sources contributing to streamflow.

| ENVIRONMENTAL SETTING
The San Carlos catchment is located in northeastern Costa Rica, draining the central Volcanic Cordillera mountains into the San Juan River at the Nicaraguan border (Figure 1).The catchment area totals around 2500 km 2 with a maximum elevation close to 2300 m.The land use of the catchment is comprised of pasture and livestock at elevations of around 1500 m a.s.l, while the lowlands at 200 m a.s.l are dominated by mainly industrial agriculture (pineapple and sugarcane).The headwaters are exclusively protected areas with almost undisturbed rainforests.Two active volcanoes, Arenal and Platanar, which are composed of young basaltic and andesitic material, geothermically impact the water temperature but also form a host aquifer for groundwater within the catchment.The lowlands of the catchment are dominated by relatively recent fluvial deposits, which are remnants of eroded volcanic material.The climate is driven by the Caribbean Low-Level Jet, which brings in moist air from the Caribbean Sea, resulting in high annual precipitation, and humidity throughout the year.Estimates of mean annual precipitation (MAP) range from between 3000 to 5000 mm/yr for the catchment, with spatial variability impacted by orographic lifting (Figure 2).Seasonally, precipitation is mainly received between May and November, with slightly drier months between December to April closer to the continental divide (Figure 2).
For a detailed description of the study site, refer to Birkel et al. (2020) and Arciniega-Esparza et al. (2023).

| MATERIALS, MODEL DEVELOPMENT AND SETUP
An isotope flux and mass balance module was developed and added to the fully distributed process-based, rainfall-runoff model J2000iso (Figure 3).This model is based on the J2000 rainfall-runoff model (Krause, 2001), using the modular object-oriented system JAMS  F I G U R E 3 A conceptual overview of the J2000iso model showing the source of the precipitation isotope data, the simulation of canopy interception, fractionation of isotopes during evapotranspiration, the infiltration to the soil water storages (or not as surface runoff), the mixing of precipitation with Middle and Large Pore Storage (MPS and LPS) subsequent simulation of interflow, percolation to the groundwater storages and fast (RG1) and slow baseflow (RG2).The calibration of measured streamflow isotopes at selected points within the catchment and the potential corresponding flow component breakdown (surface runoff, interflow, fast groundwater and slow groundwater flow).(Kralisch & Krause, 2006) which is openly accessible (http://jams.unijena.de/)and fully documented (http://jams.uni-jena.de/ilmswiki/index.php/Main_Page).The JAMS/J2000 modelling system offers a graphical user interface (GUI) facilitating easy access to hydrologists, specialists, and water managers looking to improve model robustness.
To take advantage of potential future developments to existing components for the J2000 model, the isotope module with fractionation and mass balance-based mixing was built as a stand-alone, modular component and added to the existing model component library.Such a universal component simulates the mass balance of all processes where water is stored or moved depending on the hydrological response unit (HRU) and reach delineation of the hydrological model.
The model setup required stationary data including gauging station location and maps of hydrogeology, soil, land use as well as the digital elevation model (DEM) as a basis for the HRU and reach delineation (Flügel, 1995;Pfennig et al., 2006Pfennig et al., , 2009)).
The climate and isotope precipitation input data were regionalized using inverse distance weighting (Shepard, 1968).The streamflow measurements and the isotope data for streamflow and groundwater are temporal data, each associated with spatial position.However, these data are not used as inputs; rather they were used for model calibration and validation.Details regarding the hydrogeology, soil and land use maps are presented, together with the parameterization of model parameters that are spatially distributed.The selection of spatial and temporal data was aligned to previous modelling by Arciniega-Esparza et al. (2023) for inter-model comparisons.The DEM used had a 20-meter resolution.This selection was based on the average size of the catchment and the level of detailed required in the delineated reaches (reaches with measured isotope data).The average size of each delineated HRU was 0.12 km 2 to ensure reasonable model run time (<20 000 hrus, <10 mins run time) and sub-basin size 1.6 km 2 for reach definition.A detailed overview of the model HRU delineation procedure is available in Watson et al., 2020Watson et al., , 2021. .3.1 | Model setup

| Climate
The climate data required by the J2000 model includes daily: (1) maximum air temperature, (2) mean air temperature, (3) minimum air temperature, (4) total solar radiation, (5) mean relative humidity, (6) mean windspeed and (7) total precipitation.Climate variables 1-6 form the input for the Penman Monteith potential evapotranspiration equation (Allen et al., 1998).The climate data was collected from 1981 to 2019, periods that covered existing streamflow and isotope data.Precipitation data was sourced from Arciniega-Esparza et al. (2022), which is a bias-corrected version of the CHIRPSv2 dataset (Funk et al., 2015).The data required for estimating reference evapotranspiration was taken from the ERA-5 dataset (Copernicus Climate Change, 2017), excluding relative humidity, which was taken from the isotope-enabled regional climate model IsoRSM (Yoshimura et al., 2010).To incorporate the raster layers from ERA-5 (30 km resolution), IsoRSM (10 km resolution) and CHIRPSv2 (5 km resolution) to the J2000 model, a centroid was generated of ERA-5 (lowest resolution), resulting in 56-59 spatial data points covering the entire modelling period.

| Isotope sampling and data
Isotope data of precipitation, streamflow and groundwater (as δ 2 H in ‰) was used as model input, calibration and validation of the J2000iso.δ 2 H was used as the sole tracer because it has a larger value range but strongly correlated with δ 18 O (Zhou et al., 2020).Precipitation isotopes were solely used as forcings, while streamflow isotopes formed part of the calibration and validation process.Groundwater isotopes were used as an additional bulk validation as they did not have high temporal resolution (Sánchez-Murillo & Birkel, 2016).
Weekly precipitation isotopes from three stations were used within the catchment to bias correct the isotope precipitation surface derived from IsoRSM (Yoshimura et al., 2010) a relatively typical wet season and the onset of an observed hydrological drought (6 points were excluded due to small sub-basin area).This represented a total of 457 samples with 10-12 samples per point.As a result of a higher spatial density of samples, the modelling procedure focused on a spatial calibration/validation, rather than the more conventional time-series calibration (refer to section 3.4 for further detail).The stream sampling cover sub-catchment sizes from few to around 2000 km 2 and represent almost the full elevation, land cover, soil and geological gradient of the study catchment (Birkel et al., 2020).The streamflow isotopes reflect the elevation gradient and show more variability in the headwaters compared to a more averaged out signal in the lowlands.The evaporation signal was only moderate related to the high relative humidity year-round and the high cloud cover over the rainy season (Arciniega-Esparza et al., 2023).Detailed information about the sampling procedure, analysis method and precision of both precipitation, streamflow and groundwater samples are available in Arciniega-Esparza et al. (2023) and Sánchez-Murillo and Birkel (2016).

| Hydrogeology
A 1:250000 geological map was attained from Denyer et al. (2003) and forms the basis for the spatial hydrogeological units for the model.The catchment geology includes: (1) Quaternary alluvial, RG1max, RG2max, RG1_k and RG2_k were attained from other J2000 models worldwide (Künne, 2021;Watson et al., 2018Watson et al., , 2019;;Watson, Miller, et al., 2022).A calibration factor was used to scale RG1_k and RG2_k for each individual geological unit.Given the larger list of model parameters compared to the standard J2000 models, a sensitivity analysis was included to understand the impact of these parameters on the simulated streamflow and isotope composition.

| Soils
Spatial soil classifications according to texture as well as corresponding depth from SoilGrids250m (Hengl et al., 2017) forms the input soil information within the model.The parameterization process involves a formal procedure that takes the percentage of sand, silt and clay (SSC) for the different soil horizons: A horizon (0-5 cm), B horizon (5-15 cm depth), C horizon (15-30 cm depth), D horizon (30-60 cm depth), E horizon (60-100 cm depth) and F horizon (100-200 cm depth).This procedure determines the available water holding capacity for the middle pore storage (MPS) and air capacity for the large pore storage (LPS) using the Rosetta lite pedotransfer function (Schaap et al., 2001).MPS is defined as the available water holding capacity of the soil, while LPS as the soil's air capacity in which water cannot be held against gravity.In total, four different soil classes were parameterized for the catchment.A soil-water retention curve (theta vs. depth) was generated within HYDRUS using Rosetta lite, emptying a constant upper and lower head boundary condition for each soil type.The MPS was thereafter determined by subtracting the soil water holding capacity at 15 000 and 60 mbar from the soilwater retention curve.LPS was determined by subtracting the water holding capacity at 0 and 60 mbar using the soil-water retention curve.In the absence of detailed measured soil physical properties, the effective holding capacity of MPS and LPS was then calculated by multiplying the soil water holding capacities by the depth from Soil-  resistances form inputs for the potential evapotranspiration calculation.Additionally, LAI was used in the interception module of the model.Rooting depth and sealed grade values (% impervious area) were used within the lumped soil-water component for the calculation of real evapotranspiration and surface runoff.

| Developing a universal isotope mixer for J2000iso
The isotope mixer combines the isotope composition of water in the various stores with transient flow at timestep T À1 (previous day) with T 0 (present day) for each HRU (Figure 3).Likewise, the component mixes the isotope composition of water linking the transport of water volume and isotope composition throughout the modelling chain.This includes the contribution of water within a single HRU, but also the mixing from one HRU to another using the routing scheme from the HRU delineation (Figure 5).The isotope mixing mass balance was performed as: where includes the soil water storages (MPS and LPS) whereas the remaining components are solely mixed in one direction (Figure 5).Additionally, a mixing proportion was used to calibrate the total volume of water mixed applied to the mixing of precipitation with the depression storage (DPS) (water which is not able to infiltrate is stored on the land surface), MPS and LPS as isoMixProp (isotope mixing proportion).The outflow water isotope composition from each HRU as inRD1 (surface runoff), inRD2 (interflow), inRG1 (fast groundwater) and inRG2 (slow groundwater) was mixed with its corresponding reach and calibrated with the IsoMixHruReach parameter (isotope mixing proportion from HRU to the reach).

| Modelling isotope fractionation
The liquid-vapour equilibrium α mas and kinetic isotopic separation (Horita & Wesolowski, 1994) for δ 2 H was performed within the interception and soil-water components of J2000iso.
α mas ¼ e 1 1000 Â 1158:8Â T 3 10 9 À1620:1Â T 2 10 6 þ794:84Â T 10 3 À161:04þ2:9992Â 10 9 where α mas is the liquid-vapour equilibrium isotope fractionation, T is the air temperature (Kelvin).The equilibrium isotopic separation between the liquid and vapour as mas was calculated as (in ‰): F I G U R E 5 An overview of the isotope mixing routine within each HRU, the interconnected mixing for different HRUs, mixing with corresponding reaches and from the headwater reaches to the outlet.A directional parameter was used to mix isotopes in both a forward and reverse direction and was used solely for the soil-water storage.Two calibration factors IsoMix and IsoHRUReach were used to control the proportion of volume mixed within each HRU, from HRU to reach and upstream to downstream reach.
The kinetic fractionation factor psk was determined according to Merlivat (1978) as: where RH is the relative humidity.The atmospheric composition from the precipitation-equilibrium assumption according to Gibson et al.
(2008) was calculated as: where k is a seasonality factor (default 1) and conc P is the isotope composition of the net rainfall time series.The enrichment slope e slope and limiting the isotopic fractionation d star were determined according to Gibson et al. (2016) as: The isotope composition of the residual liquid conc s was determined as: where init:conc S is the starting composition of the soil-water and taken from the mixing between net rain, MPS and x is an exchange factor (default 0.1).Finally, the vapour isotope composition according to Craig and Gordon (1965) was used to calculate the isotope composition of the evaporated fraction conc E as: While kinetic fractionation is influenced by turbulence, and wind speed data were available within this application, the Craig and Gordon (1965) model was preferred to ensure that kinetic fractionation can still be considered for other J2000iso models where potential evapotranspiration is determined in the absence of wind speed data.

| Model calibration and validation
The model calibration utilized measured streamflow from the three

| Parameter sensitivity and uncertainty
A Regional Sensitivity Analysis using a Monte Carlo Analysis (MCA) (Hornberger & Spear, 1981) was performed on each model for the calibration period.To identify the sensitivity of each parameter, the impact of more streamflow gauging information and the benefit of streamflow isotopes on parameters and performance, parameters were scaled randomly between the intervals [0.9, 1.1], resulting in a 10% increase or decrease from the selected optimal parameter set.
The selection of parameters for the sensitivity analysis and the model calibration (section 3.4) was based on model experience of parameters of importance for hydrological simulation in tropical climates (Künne, 2021;Künne et al., 2019) as well as the overall size of the study catchment (Watson et al., 2021) with the exclusion of snow parameters and processes.The sensitivity analysis made use of NSE and LogNSE for the streamflow data, while KGE was used as the evaluation metric for the isotope data.Finally, the resulting parameter uncertainty was estimated using the best 100 MCA results of the Terron Colorado gauging station and plotted as uncertainty bounds around simulated streamflow.

| Simulated streamflow and isotopes with J2000iso
The models calibrated with gauging data tended to perform better in terms of simulated low flows (Dec-Apr) than for peak flows (May-Nov), which was shown by LogNSE values between 0.49-0.72 (Figure 6 and Table 1).Likewise, the overall correlation, bias and variability between simulated and observed streamflow indicated good T A B L E 1 Summary of the model parameters used for the JAMS/J2000iso model for the San Carlos catchment including the type of parameter, a description of the parameter, the standard and *modified for JAMS/J2000iso parameter ranges, the optimal parameter attained from the NSGA-II (Non-Dominated Sorting Genetic Algorithm-II): (Deb et al., 2002), lower and upper bounds used for the Monte-Carlo-Analysis (MCA) for the (a) multi-gauged calibration for Jabillos, Terron Colorado, Boca Tapada gauges using streamflow data, (b) multi-gauged calibration using gauging streamflow data and streamflow isotopes and (c) only streamflow isotopes.).The streamflow isotope seasonality was overestimated particularly at the downstream site (Figure 8a), whereas the match is better for headwater sites that generally show a more variable isotope response (Figure 8b).In general, both combined streamflow with isotopes and isotope only models were able to simulate streamflow isotopes during the rainy season from August to October for both headwater and downstream sampling points.Isotope peaks in November and the seasonal patterns from January to April for the headwater and downstream were underestimated by either of the models.The multi-gauged model with isotopes simulated more heavy isotope enrichment for downstream reaches compared to the model calibrated only with isotopes (Figure 8).
The analysis of model uncertainty and potential variability in simulated streamflow for Terron Colorado illustrated that the 100 best solutions resulted in an overall NSE >0.4, with the most variability simulated for low flows (Figure 9).Similarly, there was limited potential to attain higher NSE values as the simulated peak flow variability across the model ensembles was low (Figure 9).The most sensitive parameters included the outflow rate of slow groundwater (lower aquifer: RG2_K), which had a notable impact on LogNSE, and to a lesser extent NSE, across the gauges (Table 2).The scaling of air sensitivity overall.The incorporation of stream isotopes in the calibration process led to a reduction in the sensitivity of outflow rates of slow and fast groundwater (RG1_k and RG2_k), while highlighting the sensitivity of scaling of air capacity (AC_Adapation) (Table 2).In the calibration with only stream isotopes, most of the parameters had a similar sensitivity except AC_Adapation and the proportion of isotope mixing between DPS, MPS, LPS (IsoMixProp) and the outflow rates of slow groundwater (RG1_K).
The largest streamflow contribution was simulated in the south-eastern headwaters of the San Carlos catchment with 0.18%-0.23%per sub-basin (Figure 11).Around 50% more streamflow was simulated in the headwaters than lowlands.These sub-basin contributions were comparable across all the calibrated models.Unlike bulk simulations of streamflow, ET and storage, the hydrological flow components were different for each model simulation (Figure 11).For the model calibrated with just gauging data, the steep slopes of the southeastern headwater were simulated to have a surface runoff proportion of 37% (0.04%-0.12% of streamflow per sub-basin), while 25% was simulated for interflow (0.01%-0.09% of streamflow per sub-basin) and the remaining 37% for baseflow (0.04%-0.09% of streamflow per subbasin) (Figure 11 and Table 3).In contrast, the multi-gauged and isotope calibration suggested a higher proportion of baseflow 84% (0.13%-0.16% of streamflow per sub-basin) in the steep south-eastern headwaters while interflow was simulated as 14% (0.02%-0.09% of streamflow per sub-basin) and surface runoff as 1% (0%-0.01% of streamflow per sub-basin) of streamflow.The model calibrated with isotopes alone simulated a much larger interflow contribution of 89% (0.02%-0.05% of streamflow per sub-basin).The model calibrated with isotopes and gauges had an overall relatively spatially-uniform surface runoff contribution with highest spatial variability for interflow and baseflow (Figure 11).
F I G U R E 7 Results from the Non-Dominated Sorting Genetic Algorithm NSGA-II (Deb et al., 2002) showing the Nash-Sutcliffe Efficiency (NSE) (Nash & Sutcliffe, 1970), LogNSE and Kling-Gupta Efficiency (KGE: Gupta et al., 2009) for the (a) multi-gauged, (b) multi-gauged with stream isotopes and (c) stream isotope only calibration for Boca, Jabillos and Terron gauges.NSE and LogNSE for stream isotope only are validation as opposed to calibration criteria used for the multi-gauged and multi-gauged with stream isotopes models.

| Integration of hydrological tracers for improved simulation of hydrological processes
The newly developed J2000iso isotope-enabled hydrological model (Figures 3 and 5) showed satisfactory performance (NSE >0.4: IAEA, 2022) in simulating daily streamflow and stream isotopes in a larger humid tropical catchment in Costa Rica (Figures 6 and 9).The isotope-enabled model also captured internal processes such as interflow and baseflow contributions to streamflow consistent with previous empirical findings (Birkel et al., 2020) and spatially-distributed models (Arciniega-Esparza et al., 2023).The largest difference between the isotope-enabled models and the flow-only models was the surface runoff to interflow proportion, which is usually over-estimated with rainfall-runoff models calibrated against discharge.Flow only models tend to underestimate the high infiltration capacities of volcanic soils particularly under forest cover of headwaters that virtually do not generate any surface flow even under extreme rainfall conditions and steep slopes (Birkel et al., 2021).To simulate stream isotopes a stronger dampening through higher water volumes in stor-   showed better performance for peak runoff simulation over-emphasizing surface runoff contributions to the hydrograph, whereas isotope-enabled models simulated more percolation to storage and less surface runoff, producing a better stream isotope fit.Such trade-offs were reported in many other tracer-aided modelling studies (Holmes et al., 2022).Additionally, the precipitation input reflects a significant under-catch in most mountainous tropical regions contributing to model uncertainties (Bruijnzeel et al., 2011).In comparison to previous research, J2000iso simulated groundwater storage of around 3500 mm, 3.5 times higher than STARRtropics (Arciniega-Esparza et al., 2023), and equivalent to about 1 year's precipitation in the headwaters.Total simulated ET, estimated by J2000iso, was around 1200 mm/yr, much lower and more adequate compared with previous estimates (Arciniega-Esparza et al., 2023) (Figure 10).Previous work overestimated ET using a simpler temperature-based algorithm compared to the more realistic Penman-Monteith approach for the tropics.The difference with the spatially distributed STARRtropics is that J2000iso uses a HRU approach resulting in a different spatial aggregation of topography, land cover, soils and geology.Furthermore, the runoff generation and mixing in STARRtropics is directly coupled to topography using a topographical wetness approach with less mixing and quicker runoff generation at higher elevation with steeper slopes and more mixing, longer transit times and slower runoff response in areas where water is prone to accumulate.J2000iso simulated around 50% of total streamflow (32% of the total) being generated from the headwaters compared to the low lands, similar to previous isotopebased estimates for the San Carlos catchment (Birkel et al., 2020).
To pre-test rainfall-runoff model inputs, corresponding outputs and initial component design, users often run a test calibration with a single gauge and objective function (often with NSE).However, for this application, it was not possible to identify any parameter combinations with NSE >0 using just a single streamflow gauge.Additionally, most parameter combinations led to the routing of all streamflow as surface runoff, with minimal to no flows generated from other flow components such as baseflow.The latter might be mathematically possible but is an unrealistic scenario for the tropical San Carlos with forested headwaters and substantial potential for storage in volcanic soils and alluvial aquifers alone.When calibrating with three gauges, more realistic parameter sets were found even with fewer model runs (Li et al., 2010), but the model still suggested relatively high and unrealistic surface runoff (37%).
The model with isotopes, as well as flow gauges used in the calibration, provided the most realistic simulation of hydrological processes and the overall water balance according to previous work (Birkel et al., 2020;Correa et al., 2020), but there is limited data and only small-scale studies help validate the isotope-suggested dominant interflow and sub-surface flow proportion (Birkel et al., 2021).Furthermore constraining the precipitation amount and its isotope composition are likely to have the largest influence and improving these datasets is crucial for isotope-enabled modelling as discussed in Delavau et al., 2017.The elevation gradient for the San Carlos catchment was prominent in the isotope composition of precipitation, streamflow and groundwater (gradient decreases accordingly).These gradients reflected in the spatial isotope pattern as well as the isotope seasonality were described in the papers by Birkel et al. (2020) and Arciniega-esparza et al. ( 2023), which we refer to for more details.
However, we have emphasized the need to match the spatial variability and elevation gradient with a suitable spatially distributed T A B L E 3 Summary table of the total simulated streamflow, surface runoff, interflow and baseflow aggregated for sub-basins between 0-200, 200-400, 400-600 and > 600 m a.s.l.precipitation and precipitation isotope product as input to drive the hydrological model.The bias correction of precipitation isotopes was done externally (Arciniega-Esparza et al., 2023), but there is also potential to develop a machine-learning component within J2000iso to do such bias-corrections internally (Wang et al., 2023).The results could then be cross-validated with observed precipitation and stream isotopes.

| Isotopes for ungauged model calibration
The use of streamflow isotopes provides a means to calibrate hydrological models, especially when complemented with other relevant input data that can be sourced including satellite or reanalysis data (Stadnyk & Holmes, 2020).While we were only able to attain an isotope KGE of 0.3-0.7 across 12 sampling sites, the calibration of streamflow isotopes provided a valuable method to constrain the bulk water balance and hydrological process variability within the catchment (Figures 6 and 10).Comparing the model calibrations I, II and III from the NSGA-II, multiple gauges with isotopes and with only isotopes show that the solutions tended towards a similar overall efficiency, but with a more targeted exploration of the parameter space when using isotopes (Figure 7).The parameter combination outliers that the NSGA-II algorithm included were less evident after a specific number of runs of around 1000 iterations in the calibration using only stream isotopes.This result highlighted that incorporating isotopes in addition to streamflow within the model calibration resulted in a more efficient parameter combination search with less impact on the performance metrics NSE and KGE.This was also illustrated by the regional sensitivity analysis with a more homogeneously distributed sensitivity across all parameters (Table 2).
Peak flow simulations using only isotopes were less certain upstream and in the middle of the basin while low flows were not well constrained downstream likely related to a coarse monthly isotope sampling interval.The low baseflow contribution in the case of the isotope only simulation is related to the overwhelming mixing in the soil reservoir simulated as interflow by J2000iso paired with an overestimated seasonality and underestimation of baseflow.Nonetheless, the calibration with only isotopes showed a relatively good overall match between the simulated and observed hydrographs (Figure 6).We showed that even with adjusted global data products such as a bias-corrected CHIRPSv2 and in combination with stream isotopes, the model was able to provide bulk process simulation.
Hence, the modelling approach using limited hydrometric and isotope data suggested value in developing hydrological modelling for ungauged basins, whenever the overall hydrological processes and water balances are the desired outputs.With better input precipitation data (amounts and isotopes) overall efficiencies could likely be improved and with more accessible soil, plant and groundwater isotope data further process evaluation could be targeted (Sánchez-Murillo et al., 2023;Sánchez-Murillo & Birkel, 2016).
While we ran 5000 iterations using the NSGA-II and MCA, it is likely that this number was not sufficient to fully explore the parameter space (Stadnyk & Holmes, 2023).More runs could potentially improve the search for parameter sets yielding acceptable performance and help a more quantitative parameter uncertainty estimation towards a limits of acceptability approach (Beven, 2022).
Despite the potential of using stable water isotope data for model calibration even of ungauged catchments, there is a crucial need to carefully select sampling frequency (Stevenson et al., 2021) and location to be able to establish within reasonable limits what is a good model representation of a hydrological system (Beven, 2002).

| CONCLUSIONS AND OUTLOOK
The use of isotopes to improve hydrological modelling is a promising avenue but requires updating and modification of current standard water management approaches to include isotope sampling in rainfall, stream- ungauged and require hydrological models to help make water-related decisions, isotope-enabled models can be used to simulate general streamflow trends and robust water balances with limited available streamflow data.Even though there will always be the need for gauged rivers, the J2000iso pilot approach demonstrated within limitations the feasibility to incorporate isotopes into generic hydrological modelling at various spatial scales with the potential to provide much needed water balance and hydrodynamic information at ungauged sites.
i. Develop and test a new isotope-enabled conceptual hydrological model for potential use across various climate types, tailored to user needs, providing a valuable tool for hydrologists, specialists and water managers.ii.Calibrate the new model with different strategies that can inform modelling according to data availability with (1) three streamflow gauges, (2) stream isotopes in combination with three streamflow gauges and (3) stream isotopes alone.
Location and extend of Costa Rica within Central America, (b) the Rio San Carlos catchment including major cities, towns, streamflow gauging stations, streamflow isotopes and location of precipitation isotope samples used to bias correct IsoRSM (Arciniega-Esparza et al., 2023).

F
I G U R E 2 (a) Mean Annual Precipitation (MAP) for the San Carlos catchment using the bias corrected CHIRPS (Climate Hazards group Infrared Precipitation with Stations) dataset with scaling high altitude precipitation by 30% for elevations above 1500 m a.s.l.(Arciniega-Esparza et al., 2022) (b) Yearly precipitation for the periods 1986-2004 and (c) monthly precipitation.
, (Figure 1).The bias-corrected isotope surface captured general precipitation trends but generally underestimates daily extremes (Arciniega-Esparza et al., 2023) (Figure 4).The bias-correction improved the dry-wet seasonality in precipitation isotopes with more enriched isotopes over the dry season and depleted rainy season isotopes, eliminated a consistent isotope offset in IsoRSM and resulted in R 2 from 0.2 to 0.6 for a point-to-pixel comparison.The monthly stream isotope data from Birkel et al. (2020) covers the period 2018-06 to 2019-05, representing 40 sampling points across the catchment covering a range of climate variability, including

( 2 )
Quaternary proximal facies volcanics, (3) Quaternary distal facies volcanics, (4) Quaternary and Pliocene ignimbrites, (5) Quaternary and Pliocene intergrade volcanics, (6) Tertiary volcanics and (7) Tertiary intrusives.The J2000 requires information on two groundwater storages, namely an upper aquifer (RG1) which mainly comprises of weathered material (such as quaternary sediments) responsible for quick groundwater flow and a lower aquifer (RG2) which represents the basement geology and slow-moving groundwater.Parameters for these two storages include: (1) maximum storage capacity (RG1max and RG2max) and (2) storage coefficients (RG1_k and RG2_k).While only limited knowledge is available on how these different geological materials behave, a common simplified model representation involves dividing groundwater into two storage aquifers representing faster and slower flow characteristics.Initial values for Grids250m.Additionally, calibration factors, namely AC adaptation (air capacity-LPS) and FC adaptation (field capacity-MPS) were used to scale LPS and MPS based on the simulated streamflow discharge and isotope composition.The soil-water component supports the model's ability to simulate actual evapotranspiration based on the potential evapotranspiration and simulated soil moisture, surface runoff as RD1, interflow as RD2, percolation to the groundwater component and capillary rise from the groundwater component.
ConA i isotope composition of component A, VolA i is the volume of component A, ConB i is the isotope composition of component B and VolB i is the volume of component B and x the resulting isotope composition.The model components mixed in both directions available gauges and 40 streamflow isotopes for 5000 model iterations.For comparison purposes, three different calibration strategies were implemented: (i) multi-gauged (three) but streamflow only calibration (Model I), (ii) combined multi-gauged streamflow with stream isotopes calibration (Model II) and (iii) stream isotopes only (Model III).For the multi-gauged calibration, streamflow data from Boca Tapada and Jabillos were used between the periods 1988-05-01 to 1990-04-30 in addition to streamflow records from Terron for the periods 1988-05-01 to 1996-12-31.The isotope calibration included the multi-gauged calibration in addition to isotope data between 2018-06 to 2019-05.The model calibration initially used relatively wide standard J2000 parameter intervals (Table1), a scaling factor to optimize the hydraulic conductivity of the RG1 and RG2 and a scaling factor to calibrate the groundwater storage.The selected model parameters were calibrated using the Non-Dominated Sorting Genetic Algorithm NSGA-II(Deb et al., 2002) and the Nash-Sutcliffe Efficiency (NSE:Nash & Sutcliffe, 1970) in standard form (E2) and logarithmic form (LogE2) for the streamflow data.The Kling-Gupta Efficiency (KGE:Gupta et al., 2009) was applied to the isotope data calibration to capture the general trends and variability between the measured and simulated streamflow isotopes.A subsequent validation was performed for the periods 1997-01-01 to 2003-12-31 using the Terron streamflow data.
indicated by a KGE of between 0.61-0.70.The models with gauging data struggled to reproduce peak flows with optimal NSE values of between 0.25-0.45across all the gauges (Figure 6 and Figure 7).The model solutions tended to average out the streamflow dynamics downstream (Terron Colorado, Boca Tapada), while maintaining relatively high streamflow dynamics in the headwaters (Jabillos) (Figure 6).The model calibrated with only streamflow isotopes showed limitations to reproduce streamflow peaks upstream and in the middle of the catchment (Terron) with a NSE value of 0.05, while downstream (Boca) NSE was 0.40 (Figure 6 and Figure 7).Low flows were better simulated than NSE upstream (Jabillos) and in the middle of the catchment with logNSE values of between 0.29-0.45compared to 0.29 downstream.The overall correlation, bias and variability between simulated and observed streamflow was good across all gauges (as a validation) with KGE values of between 0.55-0.69.The models calibrated with streamflow isotopes achieved an overall KGE of between 0.30-0.70across 11 sampling sites and a KGE of > À0.41 at 38 sites (Figure 8 capacity (AC_Adapation), outflow rate of surface runoff (Soil-ConcRD1) and overall hydrograph dampening (flowroute TA) showed high sensitivity in terms of NSE for the three gauges.Expectedly, the overall hydrograph dampening became more sensitive downstream, while the linear reduction of potential evaporation had limited F I G U R E 6 Simulated, observed streamflow and bias correction CHIRPS precipitation for the J2000iso model of the San Carlos catchment for (a) Jabillos, (b) Terron Colorado and (c) Boca Tapada gauging stations using a multi-gauged streamflow calibration, multi-gauged calibration with gauging streamflow data and streamflow isotopes and a model only using streamflow isotopes for calibration.
age was required than what is typically used to simulate streamflow.This dampening, in comparison to the travel time of the discharge wave (flowRouteTA), was used to dampen the stream isotope values to reduce the overall dynamic and daily variability of stream and groundwater isotopes.Hence, models calibrated only with streamflow F I G U R E 8 The simulated and observed stream isotope composition for a selected headwater (reachID 504) and downstream (reachID 104) reach showing the overall The Kling-Gupta Efficiency (KGE: Gupta et al., 2009) as well as the KGE of all streamflow isotopes for the two different isotope models.F I G U R E 9 Terron Colorado with Monte Carlo Analysis results (5000 runs) used to show the best 100 model simulations and understand the potential simulated streamflow variability.T A B L E 2 Results of the Monte Carlo analysis (MCA) used to understand the relative importance of the different model parameters in terms of simulating peak flows and low flows using the Nash-Sutcliffe efficiency NSE and LogNSE for a multi-gauged model, mu

F
I G U R E 1 0 The water balance between precipitation (P), evapotranspiration (ET), streamflow (Q) and change in storage (ΔS).F I G U R E 1 1 The total streamflow for: (a) multi-gauged streamflow calibration, (b) multi-gauged calibration with gauging streamflow data and streamflow isotopes and (c) a model only using streamflow isotopes for calibration for simulated streamflow (a-c) and its flow components: surface runoff (ai, bi,ci), interflow (aii, bii,cii), baseflow (aiii, biii,ciii) (combine fast and slow groundwater flow) from each reach (sub-basin) and proportionated with the total streamflow.
flow and possibly in soils, plants and groundwater in addition to hydrometric monitoring.Water samples to measure stable isotopes do offer an additional data source in comparison to install and maintain costly streamflow gauges that are subject to flood damage and more low flow uncertainties, particularly at remote sites.In fact, many gauges across Africa at least have been rendered inoperable due to flooding, which is projected to increase with climate change.Despite the need to physically collect the water sample regardless of automatic water sampling devices, stable isotopes are robust, easy and inexpensive to measure.The IAEA has been supplying relatively low-cost laser water analysers to many countries, for example, in Africa, South and Southeast Asia and Latin America, supporting capacity to generate the isotope data needed to develop hydrological models, as shown here.Isotopes can capture the high and low flow signatures enabling constraining the overall water balance and internal catchment functioning in the form of baseflow contributions to total streamflow.Given that many parts of the world are still