Virtual Hydrological Laboratories: Developing the Next Generation of Conceptual Models to Support Decision Making Under Change

As hydrological systems are pushed outside the envelope of historical experience, the ability of current hydrological models to serve as a basis for credible prediction and decision making is increasingly challenged. Conceptual models are the most common type of surface water hydrological model used for decision support, owing to their reasonable performance in the absence of change, their ease of use, and their computational speed, which facilitates scenario, sensitivity and uncertainty analysis. Hence, conceptual models in effect represent the current “shopfront” of hydrological science as seen by practitioners. However, these models have notable limitations in their ability to resolve internal catchment processes and, consequently, to capture hydrological change. New thinking is needed to confront the challenges faced by the current generation of conceptual models in dealing with a changing environment. We argue that the next generation of conceptual models should combine the parsimony of conceptual models with our best available scientific understanding. We propose a strategy to develop such models using multiple hydrological lines of evidence. This strategy includes using appropriately selected physically resolved models as “Virtual Hydrological Laboratories” to test and refine the simpler models' ability to predict future hydrological changes. This approach moves beyond the sole focus on “predictive skill” measured using metrics of historical performance, facilitating the development of the next generation of conceptual models with hydrological fidelity (i.e., models that “get the right answers for the right reasons”). This quest is more than a scientific curiosity; it is expected by policy makers who need to know what to plan for.


Introduction
"…the sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work – that is, correctly describe phenomena from a reasonably wide area. Furthermore, it must satisfy certain aesthetic criteria – that is, in relation to how much it describes, it must be rather simple." (von Neumann, 1955)

Hydrological Science Is Limited by the Inability to Perform Controlled Experiments
Catchments represent the outcome of physical, biological and social processes that play out across multiple spatial and temporal scales (Gunderson & Holling, 2002). Whilst hydrological systems globally are subject to the same fundamental underlying physical (if not necessarily biological or social) processes, their diverse local expressions mean that each catchment is generally regarded as unique (Beven, 2000). Moreover, observational limitations make it impossible to fully characterize internal functioning because the system is poorly identifiable (e.g., Guillaume et al., 2019), a problem often referred to as equifinality in hydrology (Beven, 2006) or underdetermination in the philosophy of science (Rosenberg & McIntyre, 2019). Furthermore, catchment responses are both highly non-linear (with small changes potentially having large consequences) and constantly changing at a range of spatial and temporal scales (Fowler et al., 2022; Montanari et al., 2013), leading to complex causal chains with multiple interactions and feedbacks.
These statements form the backdrop for the hydrological science enterprise, differentiating it from fields that can rely heavily on controlled laboratory experiments (e.g., physical sciences), and from fields where randomized controlled trials are viewed as the highest line of evidence (e.g., medical sciences). We simply do not have access to multiple identical (or even functionally equivalent) realizations of a hydrological system (such as a catchment) in which we can alter one or more physical characteristics and/or boundary conditions in one set while leaving the other set as controls. Likewise, for practical and ethical reasons, we cannot randomly select a statistical sample of catchments to modify (e.g., by constructing dams, modifying land use, or some other intervention) while using an equivalent statistical sample as a control. Hydrological studies that do involve controlled experiments generally occur at the laboratory or small plot scale and focus on understanding component processes at scales that do not easily or directly translate to the scales relevant to water management.

Progress in Hydrological Science Can Be Considered a Qualified Success
Despite these challenges, the hydrological sciences have seen tremendous progress during the last century and a half, and our capacity to predict hydrological fluxes at a range of spatial and temporal scales is unprecedented. A key feature of the discipline is its heavy reliance on models, which are the main expression of hydrological theory and represent complex sets of hypotheses (always multiple) that can be tested against data from those elements of the hydrological system that lend themselves to observation. In keeping with common interpretations of the scientific method, we should expect such models (representing competing hypotheses) to undergo a type of scientific natural selection, gradually weeding out the weaker potential explanations and retaining the stronger ones by confronting them with more and different kinds of data.
However, taking the pragmatic perspective that the quality of a scientific endeavor is ultimately measured through its predictive skill under varied conditions, we might call this partnership between models and data a qualified success. While our capacity to predict dominant hydrological quantities generally continues to increase, our capacity to verify the proposed models remains limited by inadequate access to relevant data (e.g., Maier et al., 2023) and by the unavoidable inability to conduct controlled experiments. Put simply, our capacity to test and refine hydrological models at the catchment scale, for a wide range of catchments, is severely limited.

Simpler Conceptual Models Are More Commonly Used in Practical Contexts
If we accept the assertion that hydrological science is largely the discipline of building and testing models of hydrological processes, then what do hydrological models consist of? There exists a wide spectrum of hydrological models, with varying degrees of simplification in space, time, and process representation (Peel & McMahon, 2020). For the purposes of this discussion, we (somewhat crudely) categorize hydrological models into two end-member types: "conceptual models" (CMs) and "physically resolved models" (PRMs). The first type, CMs, refers to simplified, lumped or semi-distributed hydrological representations based on an aggregate perceptual-conceptual physical understanding of the dominant processes that give rise to the catchment-scale behaviors we observe under typical climatic conditions (Gupta et al., 2012). Classic examples of CMs include GR4J (Perrin et al., 2003), HBV (Bergström, 1995) and IHACRES (Croke & Jakeman, 2004). In contrast, the second type, PRMs, refers to spatially discretized hydrological representations, resolved in space, time and process, that are based on a detailed physics-based interpretation of subcatchment-scale storages, flows and physical transformations of water within the system. This physics-based understanding is typically obtained from detailed laboratory/plot-scale and experimental catchment studies. Examples of PRMs include HydroGeoSphere (Brunner & Simmons, 2012), InHM (VanderKwaak, 1999), and ParFlow (Kollet & Maxwell, 2006). We prefer the term "physically resolved" here, instead of "physically based" and/or "process-based," because the real difference between conceptual and physically resolved models is not that the former have no basis in physical reality (they clearly do, in that attributes of reality, such as the processes of catchment storage, routing and ET fluxes, are mirrored onto model attributes), but rather the level of detail with which (and the form in which) the physical attributes of space, time and hydrologically relevant processes are resolved within them.
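To make the CM end of this spectrum concrete, the sketch below shows a minimal lumped conceptual model of the general kind described above: a single soil-moisture "bucket" whose saturation excess spills into a linear routing reservoir. It is a hypothetical, illustrative structure of our own; it is not GR4J, HBV or IHACRES, and the parameter names `s_max` and `k` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class BucketParams:
    s_max: float  # capacity of the soil-moisture store (mm); hypothetical parameter
    k: float      # fraction of the routing store released per day (0 < k <= 1); hypothetical


def run_bucket_model(rain, pet, params, s0=0.0, r0=0.0):
    """Run a daily, lumped two-store conceptual model.

    The whole catchment is one soil-moisture bucket; water above its capacity
    spills (saturation excess) into a linear routing reservoir that feeds streamflow.
    Returns streamflow plus the internal fluxes and states, all in mm.
    """
    s, r = s0, r0
    out = {"q": [], "et": [], "soil": [], "route": []}
    for p_t, pet_t in zip(rain, pet):
        s += p_t
        excess = max(0.0, s - params.s_max)    # saturation excess leaves the soil store
        s -= excess
        et = min(s, pet_t * s / params.s_max)  # ET scaled by relative soil wetness
        s -= et
        r += excess
        q = params.k * r                       # linear-reservoir outflow
        r -= q
        for key, value in (("q", q), ("et", et), ("soil", s), ("route", r)):
            out[key].append(value)
    return out


rain = [10.0, 0.0, 5.0, 20.0, 0.0, 0.0]
pet = [2.0, 3.0, 2.0, 1.0, 4.0, 3.0]
result = run_bucket_model(rain, pet, BucketParams(s_max=15.0, k=0.3))
```

The entire catchment is reduced to two storages and two parameters. The model conserves mass, but whether its simulated ET and storage are realistic cannot be judged from outlet streamflow alone.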
In categorizing hydrological models into these two types, we recognize several drawbacks (as outlined by Partington et al., 2022, and others), including: (a) there have been extensive debates in the hydrological literature on the relative merits of categorizing models (e.g., Abbott et al., 2001; Beven, 1989; Fatichi et al., 2016; Grayson et al., 1992, and many others); (b) definitions are not "sharp" but form a continuum (e.g., Mount et al., 2016); (c) there are always exceptions to the model types summarized in the preceding paragraph; and (d) this categorization applies to models that predominantly focus on predicting surface-water quantities rather than groundwater quantities, noting of course that there can be significant groundwater-surface water interactions.
Of these two model types, CMs are generally the more common choice for practical management applications. Examples include eWater SOURCE, Australia's hydrological modeling platform (Welsh et al., 2013), the Sacramento model in the US (Burnash et al., 1973), and GR4J in France (Perrin et al., 2003). A recent review of hydrological models in Switzerland by Horton et al. (2022), based on 157 journal articles, found that simpler conceptual models (PREVAH and HBV-light) dominate usage, with over 45% of applications, and are the most widely used in climate change impact studies. Similarly, Peel and McMahon (2020), who reviewed 279 rainfall-runoff models, found that 74% of models were conceptual.
One can postulate likely reasons for the popularity of CMs over PRMs. In terms of performance, model intercomparison studies have shown that CMs provide similar or better streamflow predictions than PRMs (e.g., Reed et al., 2004; Refsgaard & Knudsen, 1996). Another obvious advantage of CMs over PRMs is their ease of use and low computational cost, which make it possible to explore many hundreds to thousands of scenarios, thereby supporting the evaluation of risk when making decisions to manage the potential impacts of climate variability and change (e.g., Partington et al., 2022). Faster computation also permits both sensitivity analysis (Razavi et al., 2021) and comprehensive uncertainty analysis (Renard et al., 2010), which typically require thousands or more parameter samples for convergence (the number depending on the complexity of the model parameter space). There is also a strong social component to hydrological model selection: Addor and Melsen (2019) identify "practicality, convenience, experience, and habit" as key factors, and Horton et al. (2022) established, via surveys of model developers and users in Switzerland, that institutional knowledge is a key factor in model choice.
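To illustrate why computational speed matters for sensitivity and uncertainty analysis, the following is a minimal Monte Carlo sketch. It assumes a hypothetical two-parameter linear-reservoir CM (runoff coefficient `c`, recession rate `k`) and uniform prior ranges that we chose purely for illustration; it runs the model for 5,000 sampled parameter sets and summarizes the spread of simulated peak flow. Comparable sampling with a PRM would require 5,000 full spatially distributed simulations.

```python
import random
import statistics


def linear_reservoir(rain, c, k):
    """Hypothetical two-parameter CM: a fraction c of rain enters a linear store drained at rate k."""
    store, flows = 0.0, []
    for p in rain:
        store += c * p
        q = k * store
        store -= q
        flows.append(q)
    return flows


def monte_carlo_peaks(rain, n_samples=5000, seed=1):
    """Sample (c, k) uniformly and return the simulated peak flow for each parameter set."""
    rng = random.Random(seed)
    peaks = []
    for _ in range(n_samples):
        c = rng.uniform(0.1, 0.6)   # assumed prior range for the runoff coefficient
        k = rng.uniform(0.05, 0.9)  # assumed prior range for the recession rate
        peaks.append(max(linear_reservoir(rain, c, k)))
    return peaks


rain = [0.0, 12.0, 30.0, 5.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0]
peaks = monte_carlo_peaks(rain)
cuts = statistics.quantiles(peaks, n=20)   # 19 cut points: 5%, 10%, ..., 95%
q05, q50, q95 = cuts[0], cuts[9], cuts[18]
print(f"{len(peaks)} runs; peak flow 5%/50%/95%: {q05:.1f}/{q50:.1f}/{q95:.1f} mm/day")
```

Because each run is only a short loop, the full ensemble is cheap; the same design would be impractical if each run took hours.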

Current Conceptual Models Are Unlikely to Be Able to Deal With Hydrological Change
Recent signs suggest that increasingly rapid hydrological change will have profound implications, not just for our modeling efforts, but for hydrological science in general. Natural and anthropogenic forces act to alter water balances, vegetation dynamics, soil porosities and longer-term geomorphologies, and these changes interact to drive further change along complex, interconnected and multi-scale causal chains (Section 2.1 describes the types of "hydrological change" in more detail). It has become increasingly difficult to accept that these changes can be adequately represented using highly conceptualized model abstractions whose behaviors are controlled by static parameters that cannot easily be related to observed geometrical and physical properties of the changing system. This is not mere speculation: many recent studies have highlighted the challenges that conventional CMs face when attempting to represent the catchment-scale changes that are taking place (e.g., Clarke, 2007; Fowler et al., 2020; Gibbs et al., 2018; Saft et al., 2016).
Despite their popularity, CMs are known to often produce unrealistic internal hydrologic fluxes (Bouaziz et al., 2021; L. Li et al., 2015; Peel & McMahon, 2020), because these models do not provide detailed "physically resolved" representations of internal catchment processes. In addition, their reliance on calibration to tune parameter values can produce parameter interactions that mask structural inadequacies. Observational limitations make it impossible to properly assess whether the "accuracy" of a CM's predictions is due to the correct partitioning of fluxes and proper assignment of state variables within the model (see Partington et al., 2013). Indeed, it is a major conceptual leap to view catchments as series and parallel arrangements of large "buckets" influenced by aggregated processes, and to routinely assume that individual physical processes such as groundwater-surface water interactions, capillary drive, preferential pathways, and spatial patterns of ponding or reinfiltration can be ignored at catchment scales and under operational conditions. Put simply, it is difficult to know whether CMs are "right for the right reasons." This concept, known as hydrological fidelity, describes the extent to which a model faithfully simulates dominant hydrological processes (Clark et al., 2015). Unsurprisingly, for CMs to represent hydrological change, their internal process representations will likely need greater hydrological fidelity and a more direct connection to physical catchment attributes (e.g., see Duethmann et al., 2020, and the case study therein).
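The difficulty of knowing whether a CM is right for the right reasons can be shown with a deliberately simple, hypothetical example of our own. In the model below, streamflow depends only on the product (1 - `f_et`) x `c`, so two quite different parameter sets reproduce identical outlet flow while partitioning rainfall very differently between ET and deep drainage. Calibration to streamflow alone cannot tell them apart, which is one way parameter interactions can mask structural problems.

```python
import math


def partition_model(rain, f_et, c, k):
    """Hypothetical CM: a fraction f_et of rain is lost to ET, a fraction c of the
    remaining water feeds a linear store drained at rate k (streamflow), and the
    rest percolates to deep groundwater that never reaches the outlet."""
    store = 0.0
    q, et, deep = [], [], []
    for p in rain:
        et.append(f_et * p)
        wet = (1.0 - f_et) * p
        deep.append((1.0 - c) * wet)
        store += c * wet
        q_t = k * store
        store -= q_t
        q.append(q_t)
    return q, et, deep


rain = [5.0, 20.0, 0.0, 0.0, 12.0, 3.0, 0.0, 0.0]
# Two parameter sets with the same product (1 - f_et) * c = 0.4.
q_a, et_a, deep_a = partition_model(rain, f_et=0.2, c=0.5, k=0.3)
q_b, et_b, deep_b = partition_model(rain, f_et=0.6, c=1.0, k=0.3)

same_flow = all(math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-12) for a, b in zip(q_a, q_b))
print("identical streamflow:", same_flow)
print(f"ET: {sum(et_a):.1f} vs {sum(et_b):.1f} mm; "
      f"deep drainage: {sum(deep_a):.1f} vs {sum(deep_b):.1f} mm")
```

Both parameter sets give the same hydrograph, yet one attributes 20% of rainfall to ET and the other 60%. Only additional observations of internal fluxes could discriminate between them.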

Will PRMs Be the Savior to Overcome the Challenge of Predicting Change?
In contrast to CMs, PRMs provide a physically resolved representation of a catchment. These models are typically derived from laboratory/plot-scale studies and experimental catchments; their mathematical representation typically employs partial differential equations derived from continuity of mass and momentum, in combination with initial and boundary conditions (Freeze & Harlan, 1969). These models are appealing because they aim to capture diverse catchment processes at scales that are far more detailed than those used in CMs. As noted by Fatichi et al. (2016), "when applied spatially, from hillslope to continental scales, such a model [PRM] can incorporate the space-time variability of the primary forcings, such as precipitation and radiation, and variations of land-surface properties (e.g., topography, soils, vegetation) at the sub-hillslope scale, while resolving the subsurface domain in horizontal and vertical directions in a way to describe catchment heterogeneity." As such, PRMs provide the best opportunity to capture internal catchment processes (Peel & McMahon, 2020). In turn, Fatichi et al. (2016) suggest that this capability supports predictions under hydrological change, such as changes in land use/land cover or a non-stationary climate (e.g., Pierini et al., 2014; van Roosmalen et al., 2009; Wang et al., 2008).
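For contrast with the bucket structure of a CM, the sketch below shows, in the simplest possible setting, what solving a partial differential equation derived from mass continuity means in practice. It assumes a one-dimensional confined aquifer in which Darcy's law and continuity give S ∂h/∂t = T ∂²h/∂x² (diffusivity D = T/S), discretized with an explicit forward-time, centred-space finite-difference scheme. The grid, units and parameter values are illustrative assumptions; real PRMs such as those cited above solve far more complex, coupled and non-linear equations in three dimensions, and this toy is not how any of them is implemented.

```python
def diffuse_head(n_nodes=21, length=1000.0, diffusivity=50.0, t_end=40000.0,
                 h_left=10.0, h_right=0.0):
    """Solve dh/dt = D d2h/dx2 (D = T/S) on a 1-D grid with the explicit FTCS scheme.

    Units are assumed to be metres and days; both ends are held at fixed heads
    and the interior starts at zero head.
    """
    dx = length / (n_nodes - 1)
    dt = 0.4 * dx * dx / diffusivity   # keeps D*dt/dx^2 = 0.4 <= 0.5 (explicit stability limit)
    r = diffusivity * dt / (dx * dx)
    h = [0.0] * n_nodes
    h[0], h[-1] = h_left, h_right
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        # Discrete mass balance at each interior node: net exchange with its two neighbours.
        interior = [h[i] + r * (h[i - 1] - 2.0 * h[i] + h[i + 1]) for i in range(1, n_nodes - 1)]
        h = [h[0]] + interior + [h[-1]]
    return h


heads = diffuse_head()
# After a long time the solution should approach the steady linear head profile.
steady = [10.0 * (1.0 - i / 20) for i in range(21)]
print(max(abs(a - b) for a, b in zip(heads, steady)))
```

Even this trivial case needs a spatial grid, a time step constrained by numerical stability, and boundary conditions; scaling that up to variably saturated, coupled surface-subsurface flow in three dimensions is what drives the data and computational demands of PRMs.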
The challenges with using PRMs in practical contexts are twofold: (a) observational constraints limit the ability to establish reliable model constitutive functions and may result in over-parameterization, which puts model accuracy at risk; and (b) their practical implementation carries a high cost.
In regard to the first challenge, hydrological systems exist in the so-called realm of "organized complexity" (Dooge, 1986; Weinberg, 2001), where it is difficult to (a) acquire detailed process-relevant data at the space-time resolutions required to study the fundamental scale-dependent nature of hydrologically relevant processes, (b) develop and verify accurate mathematical representations of those processes at those various scales, and (c) constrain the values of the extremely large number of parameters that must be specified when applying PRMs to common real-world catchments. This makes it difficult to test the fidelity of PRMs, except for a relatively small number of "data-rich" experimental catchments (e.g., Camporese et al., 2014, 2015; Thyer et al., 2004). Thus, calibration of PRMs is susceptible to over-parameterization problems. It is generally recognized that only a small number of parameters can be identified from typically available catchment data; see Jakeman and Hornberger (1993) and Shin et al. (2015) for further discussion.
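Over-parameterization can be illustrated with a hypothetical, deliberately linear toy: a distributed model with one runoff coefficient per grid cell, but with only the outlet flow observed. The rank of the sensitivity (Jacobian) matrix of the observations with respect to the parameters counts how many independent parameter combinations the data can constrain, at least locally. The model, the rainfall values and the pure-Python rank routine below are our own illustrative assumptions; for non-linear PRMs such a local analysis is only indicative.

```python
def matrix_rank(rows, tol=1e-10):
    """Numerical rank by Gaussian elimination with partial pivoting (pure Python)."""
    m = [list(row) for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this parameter adds no new information
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, n_rows):
            factor = m[i][col] / m[rank][col]
            for j in range(col, n_cols):
                m[i][j] -= factor * m[rank][j]
        rank += 1
    return rank


def sensitivity_matrix(rain_by_cell):
    """Jacobian of outlet flow with respect to per-cell runoff coefficients for a
    hypothetical distributed model q_t = (1/n) * sum_i c_i * rain_{i,t}.
    The model is linear in c_i, so dq_t/dc_i = rain_{i,t} / n exactly."""
    n_cells = len(rain_by_cell)
    n_steps = len(rain_by_cell[0])
    return [[rain_by_cell[i][t] / n_cells for i in range(n_cells)] for t in range(n_steps)]


# Four cells, five time steps of outlet observations (hypothetical data).
uniform_rain = [[0.0, 10.0, 4.0, 0.0, 7.0]] * 4   # every cell receives the same rain
varied_rain = [[0.0, 10.0, 4.0, 0.0, 7.0],
               [2.0, 8.0, 0.0, 1.0, 9.0],
               [5.0, 0.0, 3.0, 6.0, 0.0],
               [1.0, 1.0, 9.0, 2.0, 4.0]]

print("identifiable combinations, uniform rain:", matrix_rank(sensitivity_matrix(uniform_rain)))
print("identifiable combinations, varied rain: ", matrix_rank(sensitivity_matrix(varied_rain)))
```

With spatially uniform forcing, the four cell parameters collapse to a single identifiable combination (their mean); only forcing data with independent spatial patterns allows all four to be constrained. Real PRMs have many more parameters per cell than this.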
In regard to the second challenge, PRMs have a significant cost associated with model development, as field observations, data preparation and parameter calibration are very expensive (Ampadu et al., 2013), and field observations are often unavailable at the intended model scale. The computational costs of PRMs often preclude comprehensive scenario, sensitivity and/or uncertainty analysis. These computational challenges are gradually being overcome by massively parallel supercomputers (Kollet et al., 2010) (and to some extent by model emulation), though such resources are not readily available to the majority of hydrological modelers. In our view, these considerations suggest that PRMs, while attractive because of their potentially more appealing process representation, are on their own unlikely to provide a reliable and practical basis for representing environmental changes across a wide range of scenarios and locations.

The Need for a New Approach to Conceptual Models to Represent Change
The evidence presented in Section 1.3 suggests that CMs are the most common type of model used for decision support by hydrological model users (e.g., engineers, planners, designers and operators of water resource systems). In effect, CMs and their predictions represent the "shopfront" of hydrological science. Given the pressing and profound implications of hydrological change, it is critical that the hydrological community develop models with higher hydrological fidelity than current CMs. This does not mean that we should reject CMs in favor of PRMs, or vice versa. The challenges associated with PRMs (observational limitations, practical implementation and institutional convenience) are unlikely to be alleviated any time soon, so widespread adoption of PRMs in practical contexts is unlikely in the foreseeable future. Similarly, it seems wasteful to develop CMs without taking advantage of the tremendous investment and hydrological knowledge generated by PRMs. We need to find a "sweet spot" where we can take advantage of the strengths and overcome the weaknesses of both model types.
It is time to re-think how we assemble our hydrological lines of evidence when constructing CMs so that our evolving best available understanding can be used to provide the next generation of CMs with better hydrological fidelity, and therefore greater confidence in their predictions in the face of hydrological change.

Accelerating Development of the Next Generation of Conceptual Models
As an empirical science, hydrology can be characterized as the process of developing hypotheses (through models) and testing them against available evidence (in the form of observations/process understanding) (Section 1.2). A number of "hydrological lines of evidence" (HLEs) are used in the hydrological literature to inform process understanding and provide the evidence to change our hydrological models (i.e., update our hypotheses). These HLEs range from laboratory-scale experiments to large sample hydrology (described below in Section 2.3). This section evaluates HLEs in terms of their ability to accelerate the development of the next generation of conceptual models that are robust, in terms of both predictive ability and hydrological fidelity, when faced with hydrological change.
The outcomes of this evaluation of the HLEs are summarized in Figure 1. An overview of the evaluation process is outlined as follows. Section 2.1 starts by defining "hydrological change" and discussing the consequences of hydrological science's limited ability to control these changes (which limits our ability to conduct controlled experiments). Section 2.2 identifies five key questions that need to be asked of each HLE to identify models that provide robust predictions of hydrological change for a broad range of catchments and/or changing conditions. Section 2.3 defines and evaluates four existing HLEs against these key questions. Section 3 then provides a comprehensive definition and evaluation of an emerging fifth HLE, Virtual Hydrological Laboratories (VHLs), which are increasingly being used to provide virtual observations for experimentation (see Fatichi et al., 2016). This includes a discussion of the unique opportunities and challenges provided by the VHL approach.

Figure 1. Evaluation of hydrological lines of evidence (HLEs) against the key questions (Section 2.2). The four existing HLEs are listed as column headings (light blue background). The "cells" in the figure provide the outcomes of the evaluation (see Section 2.3 for further information). For question 1, regarding "hydrological change," we assess whether the HLEs can evaluate the model for robustness to hydrological change using either a controlled experimental or an observational approach. For questions 2-5, the HLEs are classified into three categories ("yes," "no," "limited") based on the authors' experience. The right-hand column (dark blue background) evaluates an emerging HLE, Virtual Hydrological Laboratories (see Section 3 for detailed discussion).

Hydrological Change Is Itself a Complex Issue
Hydrological changes occurring within a catchment can be complex. Building upon Thirel et al. (2015), we define hydrological change as a significant change in physical characteristics (e.g., land use) or boundary conditions (e.g., rainfall/temperature change). "Significant change" refers to systematic deviations of the system's behavior from the historical record. Though this definition is subjective, it is important to distinguish significant changes from trivial ones.
The inability of hydrological science to undertake controlled experiments (see Section 1.1) provides the motivation to classify the drivers of change into two groups, based on the level of control that we have as hydrological scientists and practitioners.
(a) "Climatic phenomena" refers to changes that we cannot control (at the scale of the catchment) because they arise from climate variability or anthropogenic climate change. Climatic phenomena can alter boundary conditions (e.g., precipitation/temperature), which can change flow dynamics by activating or deactivating hydrological processes as the system adapts to the changes. Physical characteristics can also change as ecosystems adapt in response to climate and/or are subject to disturbances such as bushfires and vegetation mortality, which have a substantial impact on a catchment's hydrology (Anderegg et al., 2015; Partington et al., 2022).
(b) "Human interventions" refers to changes over which we have some degree of control. These include changes to land cover (e.g., agriculture/urbanization), water management (e.g., dam construction or streambed mining) and other interventions (e.g., water withdrawals/pumping/diversions) that will have a significant impact on flow dynamics.

Key Questions to Evaluate Hydrological Lines of Evidence
Regardless of their origin, hydrological changes to catchment characteristics or boundary conditions lead to changes in hydrological processes and manifest as changes in the input-state-output dynamics of the system. The altered system dynamics will challenge our assumptions regarding catchment structure and function as embodied by CMs. This is a key challenge faced by catchment models: will they be robust to these changes?
In this study we build upon the definition of Mathevet et al. (2020) and use the term "robustness" to refer to the capability of a model to hold a certain level of performance for a broad range of different or changing conditions.
This definition provides the motivation to propose the following key questions to evaluate the HLEs, starting with a focus on evaluating hydrological change:
1. How can the HLE evaluate the model for robustness to hydrological change due to (a) climatic phenomena, or (b) human intervention?
This question has two possible answers, depending on our ability to control the real-world system of interest:
• Controlled experimental approach. The real-world system can be manipulated and the dynamical response of the system to this change can be observed. The benefit is that confounding factors can be controlled, so that the impact of changes in the experimental factors on the system can be determined definitively. The challenge is that it is not always practical, feasible or ethical to manipulate real-world systems.
• Observational approach. The system cannot be manipulated, and only its response to changes (e.g., climatic phenomena) can be observed. The benefit is that existing data from real-world systems can be used to study the impact of changes that happened in the past, or of phenomena/processes that are too vast or complex to control easily. The challenges are that uncontrolled confounding factors make it difficult to determine causation, and that one may have to wait a long time to obtain observations of rare, naturally occurring phenomena and/or to detect trends due to slow, gradual changes.
The next key questions (2-5) are motivated by the need for predictions to be robust to change for a range of realistic catchment types (size, properties, etc.) and processes (moving beyond streamflow-only evaluation):
2. Can the HLE evaluate the model for robustness on catchments of practically relevant size? Although catchment management decisions can occur at a range of sizes, practitioners are commonly interested in impacts on larger catchments, with sizes of the order of 10-10,000 km² or greater.
3. Can the HLE evaluate the model for robustness on a diverse range of catchment properties? Hydrological model users require robust predictions on catchments with a wide variety of properties (sizes, slopes, elevations, soil and vegetation types, etc.).
4. Can the HLE evaluate the model for robustness on a wide range of processes/predictions? This question refers to moving beyond evaluating only streamflow prediction at the outlet, and could include streamflow at internal catchment locations, different streamflow generation mechanisms (infiltration excess vs. saturation excess), ET processes, snow/ice-related physical phenomena, and groundwater-surface water exchanges.
5. Can the HLE evaluate the model for robustness using real data? To ensure realistic predictions, it is important to evaluate models against observations from real catchments. This consideration becomes important when using synthetic data generated in a Virtual Hydrological Laboratory for experimentation (see Section 3 for detailed discussion).
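The definition of robustness given above (holding a certain level of performance across different or changing conditions) can be made operational in many ways. A minimal sketch, assuming consecutive equal-length sub-periods and the Nash-Sutcliffe efficiency as the performance metric (both our own choices, not the specific procedure of Mathevet et al., 2020), scores a model on each sub-period and summarizes robustness by the worst score and the spread between best and worst. The observed and simulated series are hypothetical.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) / (variance of obs about its mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def subperiod_scores(obs, sim, n_periods):
    """Score the model separately on consecutive, equal-length sub-periods."""
    size = len(obs) // n_periods
    return [nse(obs[i * size:(i + 1) * size], sim[i * size:(i + 1) * size])
            for i in range(n_periods)]


def robustness_summary(scores):
    """Hypothetical summary: worst sub-period score and the spread between best and worst."""
    return {"worst": min(scores), "spread": max(scores) - min(scores)}


# Hypothetical record: the model is perfect in the first period but biased in the second.
obs = [1.0, 3.0, 2.0, 5.0, 4.0, 2.0, 1.0, 1.0, 2.0, 1.0]
sim = [1.0, 3.0, 2.0, 5.0, 4.0, 3.0, 2.0, 2.0, 3.0, 2.0]
scores = subperiod_scores(obs, sim, n_periods=2)
print("whole-record NSE:", round(nse(obs, sim), 2))
print("sub-period NSE:", [round(s, 2) for s in scores], robustness_summary(scores))
```

Here the whole-record score looks acceptable (about 0.72) while the second sub-period is strongly negative, which is why robustness is assessed across contrasting conditions rather than from a single aggregate score. In practice, sub-periods would usually be chosen to contrast conditions (e.g., wet and dry spells) rather than simply split in half.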

Evaluating Existing Hydrological Lines of Evidence
This section evaluates each of the existing hydrological lines of evidence against the five key questions outlined above, with outcomes summarized in Figure 1.
To answer question 1, regarding hydrological change, each HLE is categorized by whether the change can be evaluated using a controlled experimental approach or an observational approach. In Figure 1, the answer to this question is split into two components, as it depends on whether the change is due to (a) climatic phenomena or (b) human interventions (see Section 2.1). Key questions 2-5 are answered by classifying the HLEs into three categories ("yes," "no," "limited") based on the authors' experience, with justification provided below. We recognize that this classification is subjective, but the analysis has value because it starts a conversation on the pathway we should take to develop the next generation of CMs.
The principal lines of evidence that have been used in hydrology to inform process understanding and its incorporation into hydrological models are categorized and evaluated as follows:
• Lab/plot-scale experiments. These use detailed observations at small spatial scales (∼5 m²) at a plot or in a laboratory to enable controlled experiments (Glaser et al., 2019; Nanda et al., 2018). Evidence of this kind is used to develop small-scale process understanding for a diverse range of hydrological properties, but has limited applicability for modeling catchments of practically relevant size.
• Experimental catchments. Heavily instrumented experimental catchments provide high-resolution observations of multiple hydrological variables and catchment properties. Evidence of this kind facilitates understanding of key hydrological processes (e.g., McGlynn et al., 2002) and can test the ability of models to represent dominant processes (e.g., Fenicia et al., 2014). However, experimental catchments are somewhat rare, typically small in size, and usually provide data for only short time periods (a few years). While these catchments enable experimental studies of the impacts of human intervention (e.g., land-use change), evaluation of changes related to climate is necessarily limited to the observational approach.
• Paired catchments. These are typically catchments selected to experience the same hydroclimate, with similar properties such as slope, aspect, soils, area, climate, and vegetation. This experimental setup reduces the influence of confounding factors. Evidence of this kind can be useful when one catchment undergoes change (e.g., alteration of land use), so that the other catchment can serve as a control against which to assess the impacts of that change (Brown et al., 2005; McDonnell et al., 2018). Paired catchments can provide valuable evidence at practically relevant catchment sizes; however, it can be difficult to find suitably paired catchments that span the wide range of hydrological diversity of practical relevance, and such pairs are typically not heavily instrumented and thus cannot be used to test over a wide range of processes/predictions. While change due to human intervention can be controlled for, any evidence pertaining to changing climate will be observational.
• Large sample hydrology. This involves the compilation of historical data for a common set of hydrologically relevant variables and catchment attributes across a large number of catchments (hundreds to thousands) with widely varying catchment properties. Evidence of this kind can be used to assess model adequacy and performance for a wide range of catchments in a statistical sense, and to study hydroclimatic and hydrogeological variability. To achieve relative consistency of the input/output data used across catchments, the data sets used in large sample hydrology studies are typically limited to a few key variables (e.g., streamflow at catchment outlets, interpolated catchment rainfall and PET). For examples, see data sets such as CAMELS (Addor et al., 2017; Fowler et al., 2021) and studies that use this HLE (Coron et al., 2012; Fowler et al., 2016; Gupta et al., 2014). Hence, this HLE only supports testing on a narrow range of processes/predictions, using an observational approach that is limited to historical changes in climate and/or land use. […] et al., 2018), and can overcome some of these limitations.
Figure 1 shows that the four existing HLEs are unable to simultaneously address all five key questions. They do not provide sufficient information to support the development of a new generation of CMs that can provide robust support for decision making in the face of hydrological change across a wide range of situations of practical relevance.
However, there exists another, complementary approach that provides the opportunity to integrate multiple lines of evidence while taking advantage of their respective strengths and overcoming their respective weaknesses: the Virtual Hydrological Laboratory, as described in the next section.

Overview of a Virtual Hydrological Laboratory Approach
We envisage the Virtual Hydrological Laboratory (VHL; Figure 2) to be a conceptual framework consisting of a computational component that can generate a wide range of "virtual catchments" and an evaluation component that enables the use of controlled experiments to test conceptual models for a range of hydrological change scenarios. Each virtual catchment (VC) will be constructed using the best available PRMs that provide detailed physics-based understanding of sub-catchment-scale storages, flows and physical transformations of water within the system. Being virtual, these VCs can differ systematically (and widely) in their climate and/or catchment characteristics, and can be subjected to a range of change scenarios to determine how their hydrological behaviors respond to such changes. Being based on process understanding, the VHL should be constructed to be consistent with the four other HLEs discussed above. The VHL can then provide a framework within which the next generation of CMs can be developed, by comprehensively evaluating competing catchment-scale model structures (systems architectures and process parameterizations) against these virtual catchments. We emphasize that while VCs are constructed from PRMs, they are distinct from traditional applications where a PRM is implemented for a specific catchment.
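One way to make "differ systematically (and widely) in their climate and/or catchment characteristics" concrete is to generate the attribute sets for a family of virtual catchments with a space-filling sampling design. The sketch below uses Latin hypercube sampling, which is our own choice for illustration; the VHL concept does not prescribe a sampling design. The attribute names and ranges are hypothetical, except that the area range mirrors the 10-10,000 km² scale mentioned in key question 2 (sampled on a log scale).

```python
import random


def latin_hypercube(ranges, n, seed=0):
    """Latin hypercube sample: each attribute range is cut into n equal strata,
    one value is drawn at random within each stratum, and the strata are shuffled
    independently for every attribute before being combined into n samples."""
    rng = random.Random(seed)
    columns = {}
    for name, (lo, hi) in ranges.items():
        width = (hi - lo) / n
        values = [lo + (i + rng.random()) * width for i in range(n)]
        rng.shuffle(values)
        columns[name] = values
    return [{name: columns[name][j] for name in ranges} for j in range(n)]


# Hypothetical attribute ranges for a family of virtual catchments.
attribute_ranges = {
    "log10_area_km2": (1.0, 4.0),           # 10 to 10,000 km2
    "mean_slope_deg": (1.0, 30.0),
    "soil_depth_m": (0.5, 5.0),
    "mean_annual_rain_mm": (400.0, 2000.0),
}
virtual_catchments = latin_hypercube(attribute_ranges, n=8)
for vc in virtual_catchments[:3]:
    print({name: round(value, 2) for name, value in vc.items()})
```

Each attribute set would then parameterize one PRM-based virtual catchment; with only eight samples, every stratum of every attribute is still represented once.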
As discussed above, broad application of PRMs within a decision making context is limited by data availability and computation time. In a VHL, catchment characteristics can be assigned to cover a range of likely values, and VHL simulations used to create and test CMs that can then be more broadly applied. Testing the VHL could focus on behavioral modeling (Schaefli et al., 2011; Yang & Chui, 2021) to ensure that relationships between drivers of change and hydrologic responses are consistent with the four lines of evidence, rather than on catchment-specific performance metrics for single hydrologic variables (e.g., the Nash-Sutcliffe efficiency between observed and modeled flow), which rely heavily on complete and accurate data sets to parameterize the model and define boundary conditions. Note that in this VHL approach, the selection of suitable "trustworthy" PRMs for the particular change experiment of interest will be crucial; see Section 3.3 for further discussion.
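To illustrate the difference between the two kinds of test, the minimal sketch below, with entirely hypothetical numbers, compares a candidate CM against a virtual catchment using the Nash-Sutcliffe efficiency (NSE) and one possible behavioral signature: a median-based estimate of the precipitation elasticity of annual streamflow (the relative change in flow per relative change in rainfall). The choice of signature, the function names and the data are assumptions for illustration, not part of the VHL framework as proposed.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) / (variance of obs about its mean)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def precip_elasticity(p_annual, q_annual):
    """Median-based precipitation elasticity of annual streamflow (a behavioral signature)."""
    p = np.asarray(p_annual, dtype=float)
    q = np.asarray(q_annual, dtype=float)
    dp, dq = p - p.mean(), q - q.mean()
    keep = np.abs(dp) > 1e-9          # skip years at mean rainfall to avoid dividing by ~0
    return float(np.median(dq[keep] / dp[keep] * p.mean() / q.mean()))

# Hypothetical annual series (mm) for one virtual catchment
p = np.array([600.0, 700.0, 800.0, 900.0, 1000.0])
q_vhl = 0.5 * (p - 400.0)            # "virtual truth" from the PRM-based catchment
q_cm = 0.25 * (p - 800.0) + 200.0    # candidate CM: right mean flow, half the sensitivity

print(f"NSE = {nse(q_vhl, q_cm):.2f}")
print(f"elasticity: VHL = {precip_elasticity(p, q_vhl):.1f}, CM = {precip_elasticity(p, q_cm):.1f}")
```

In this constructed example the candidate CM achieves NSE = 0.75, yet its elasticity (1.0) is half that of the virtual catchment (2.0): the kind of behavioral mismatch that a single historical performance metric can hide.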
The VHL approach will facilitate and accelerate CM development because: (1) in a virtual catchment all hydrological components can be observed (albeit virtually), and thus CMs can be subjected to a more comprehensive level of scrutiny than current observational data sets allow; (2) the capability to systematically change catchment characteristics will provide the ability to conduct controlled experiments that isolate the key changes in hydrological processes for different catchment types, which is currently not possible with real-world experiments; and (3) the ability to systematically change climate and land cover/use characteristics will provide the opportunity to undertake hydrological change experiments and evaluate CMs on a wide range of future hydrological change scenarios that are outside the envelope of observations. This strategy provides real potential to proactively "future-proof" CMs so that they can support decision making in the face of changes that are yet to be observed.
The concept of a "virtual laboratory" or "digital twin," although not (yet) fully utilized in hydrology, is extensively applied in other scientific and engineering fields. For instance, virtual simulators are used in military, aviation, and medical training to prepare for rare events that seldom occur during short training periods (Lateef, 2010). Hydrology faces similar challenges, particularly in predicting rare events such as extreme floods under a changing climate. Climate modeling also employs virtual approaches, using complex, highly resolved models to guide the parameterization of less-resolved but computationally viable global climate models, which are too coarse to resolve cloud processes (O'Gorman & Dwyer, 2018). This parallels the hydrological challenge of capturing the aggregate effects of surface and sub-surface heterogeneity in catchment-scale CMs. In biology, "virtual simulators" enhance the range of data for machine learning (ML) model training (Deist et al., 2019). Similarly, in hydrology we often have limited observations for a catchment (typically areal rainfall/PET and outlet streamflow); hence a VHL could generate a wider range of synthetic data to enhance system knowledge for better process evaluation and prediction.
Water Resources Research 10.1029/2022WR034234 THYER ET AL.
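The VHL concept has been gradually emerging in the hydrological sciences. Fatichi et al. (2016) provided several examples of how virtual labs could be integrated with natural ones (e.g., experimental catchments) to advance theory and process understanding by coordinating model development with field observation activities. Weiler and McDonnell (2004) used a range of VHL-like experiments to develop perceptual models of hillslope processes. Tague and Moritz (2019) used an eco-hydrological model as a VHL to investigate how different assumptions about plant root characteristics are likely to impact hydrologic responses to forest thinning. L. Li et al. (2014) used a VHL approach to enhance the performance of recursive digital baseflow filters and evaluate a conceptual model's (AWBM) ability to simulate quick-flow and slow-flow responses. J. Li et al. (2014) used a VHL to assess the ability of event-based methods to estimate flood frequency in different Australian climates. A VHL has also been used to evaluate the ability of stochastic rainfall models to provide reliable streamflow predictions, and Stephens et al. (2020) used a VHL to show that historical periods with equivalent precipitation statistics cannot necessarily be used as proxies for future climate change when examining catchment runoff or model performance. Inspired by the past use of VHLs, we argue that VHLs need to become a key hydrological line of evidence to inform development of the next generation of CMs.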

Arguments for Development of the VHL Approach
To accelerate the development of the next generation of CMs, the use of VHL-type approaches needs to be expanded to become the backbone of the workflow in CM development and evaluation. Three key arguments why this transformation is required are outlined below.
Argument 1: The VHL approach enables construction of conceptual models as simplified models guided by scientific understanding of the key dominant processes under hydrological change.
To provide greater confidence that the next generation of CMs can provide robust predictions of hydrological change, these models will need to faithfully represent key processes-that is, they will need to improve their hydrological fidelity.
Initially, starting as far back as Linsley and Kohler (1958), catchment CM development relied substantially on hydrologists' personal assessments regarding model structure and parameterization. Hence, the CMs reflected the knowledge, experience and prejudices of their developers (Gharari et al., 2021; Gupta et al., 2012). Contemporary CM development has evolved considerably since those early days. Notably, conceptual models of the GR family are derived from extensive testing using large sample hydrology (e.g., Mathevet et al., 2020; Perrin et al., 2003), while models such as SuperFlex have incorporated process understanding from the experimental catchments HLE (Fenicia et al., 2014, 2016). However, while current CMs share many common principles (for example, the conceptualization of water flowing through series and parallel assemblages of storage tanks), their system architectures, process parameterization equations, and parameter specifications vary widely (Clark et al., 2008).
One of the key challenges with the HLEs of large sample hydrology and experimental catchments for CM development is that these approaches are typically "backward" looking. Historical data sets can only be used to evaluate the ability of CMs to represent past hydrological change (and the report card on those evaluations is generally not good; see Section 1.4). Large sample hydrology and experimental catchments will continue to play a role in hydrological model development, as there is still significant scope to improve CMs by learning from the past. However, as we face unprecedented hydrological change, how can we gain greater confidence in our models' predictions of future change? If we continue to rely solely on HLEs such as large sample hydrology and experimental catchments, we will have to wait until the hydrological change has occurred before we can evaluate and improve CMs. That is too late to support decisions in the interim. We cannot wait. We need new thinking to accelerate the development of CMs that can capture the dominant processes under hydrological change.
To capitalize on current scientific knowledge, one strategy is to treat CM development as "reduced-order" modeling, using detailed PRMs to guide simpler CMs. The CM should aim to retain the key input-state-output dynamics of the PRMs, particularly those that are relevant for the hydro-climatic region of interest and the application objective (e.g., capturing future changes in catchment yield or flood peaks). This approach recognizes that the best CM may differ between hydro-climatic regions and between hydrological purposes (Watson et al., 2013). A VHL-like approach can be followed, starting with a highly realistic model for the target environment and conducting virtual experiments to design a minimal-information-loss, reduced-order CM. This creates a continuum of progressively simpler model representations, minimizing information loss with respect to a model's intended purpose (Tishby et al., 2000). By doing so, it could be argued that the resulting CMs reflect the current best available relevant scientific understanding, at least for the given scale and context. This reasoning aligns with the "organizing principles" of the behavioral modeling approach proposed by Schaefli et al. (2011).
As such, the "reduced-order" approach aims to get "the right answers for the right reasons" and achieve hydrological fidelity under climate change.
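A toy version of this reduced-order idea is sketched below, assuming a hypothetical two-store model (fast and slow linear reservoirs in parallel) as a stand-in for a PRM and a one-parameter linear reservoir as the candidate CM. The information bottleneck of Tishby et al. (2000) is defined in terms of mutual information; purely for simplicity, the sketch substitutes the fraction of streamflow variance that the reduced model fails to reproduce as a proxy for information loss. All parameter values and function names are illustrative assumptions.

```python
import numpy as np

def linear_reservoir(inflow, k, s0=0.0):
    """One linear store: each step adds inflow, then releases the fraction k of storage."""
    s = s0
    out = np.empty(len(inflow))
    for t, p in enumerate(inflow):
        s += p
        out[t] = k * s
        s -= out[t]
    return out

def virtual_truth(rain):
    """Hypothetical stand-in for a PRM: fast and slow stores in parallel (60/40 split)."""
    return linear_reservoir(0.6 * rain, 0.5) + linear_reservoir(0.4 * rain, 0.02)

def calibrate_reduced_model(rain, q_true, k_grid):
    """Grid-search the single parameter k; return best k and fraction of variance not reproduced."""
    sse = [np.sum((linear_reservoir(rain, k) - q_true) ** 2) for k in k_grid]
    best = int(np.argmin(sse))
    return float(k_grid[best]), float(sse[best] / np.sum((q_true - q_true.mean()) ** 2))

rng = np.random.default_rng(42)
rain = rng.exponential(5.0, 730) * (rng.random(730) < 0.3)   # two years of synthetic daily rain (mm)
q_true = virtual_truth(rain)
k_grid = np.linspace(0.01, 0.99, 99)
k_best, info_loss = calibrate_reduced_model(rain, q_true, k_grid)
print(f"reduced-order k = {k_best:.2f}, unexplained variance = {info_loss:.2f}")
```

Repeating the calibration for progressively richer candidate structures, and for the quantities that matter to the application (e.g., low flows or flood peaks rather than overall variance), would trace out the continuum of simpler representations described above.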
The VHL could be designed to quickly compare different CMs, and help the modeler choose the appropriate CM for the particular application, similar to frameworks such as FUSE (Clark et al., 2008), SuperFlex (Kavetski & Fenicia, 2011), MaRRMoT (Knoben et al., 2019), Raven (Craig et al., 2020), and others. One approach that could help identify appropriate reduced-order models is surrogate or emulation modeling (e.g., Asher et al., 2015; Yang et al., 2018), in which simplified models that capture the behavior of specific quantities of interest are developed from PRMs. These surrogate approaches could be explored to help identify (or be used instead of) suitable CMs. The VHL could also provide improved visualization of model structure and underlying hydrologic concepts, which would contribute to more reliable and defensible use of CMs, particularly in decision-making contexts (e.g., Partington et al., 2013; Tague & Frew, 2021).
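The emulation idea can be sketched as follows, assuming a hypothetical and deliberately cheap function as a stand-in for an expensive PRM run: mean annual runoff is computed with a Schreiber-type water balance (Q = P exp(-PET/P)) for rainfall scaled by a change factor. A handful of design runs are used to fit a polynomial emulator with NumPy's `Polynomial.fit`, which can then be evaluated almost instantly for new scenarios. The water-balance form, the design points, and the polynomial degree are illustrative choices only.

```python
import numpy as np

PET = 1000.0  # hypothetical mean annual potential evapotranspiration (mm/yr)
P0 = 800.0    # hypothetical baseline mean annual precipitation (mm/yr)

def expensive_prm_runoff(rain_factor):
    """Hypothetical stand-in for a costly PRM: mean annual runoff (mm/yr) under scaled rainfall."""
    p = P0 * np.asarray(rain_factor, dtype=float)
    return p * np.exp(-PET / p)   # Schreiber-type water balance, used only as a smooth test function

# A handful of "expensive" runs at design points spanning the change scenarios
design = np.linspace(0.7, 1.3, 9)
runs = expensive_prm_runoff(design)

# Least-squares polynomial emulator fitted to the design runs (cheap to evaluate)
emulator = np.polynomial.Polynomial.fit(design, runs, deg=5)

# Check the emulator against the stand-in at scenarios it was not fitted to
held_out = np.array([0.75, 0.95, 1.05, 1.25])
truth = expensive_prm_runoff(held_out)
rel_err = np.abs(emulator(held_out) - truth) / truth
print(f"max relative error at held-out scenarios: {rel_err.max():.2e}")
```

In a real VHL the emulator inputs would be the catchment and climate factors of interest and the outputs the quantities of interest for the application, and the held-out check would use additional PRM runs rather than a closed-form function.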
Inevitably, when moving from a complex PRM to a simpler CM, there will be some information loss, and there is a risk that simplified CMs are unable to capture the entire range of streamflow regimes. To overcome this deficiency, probabilistic error models (e.g., Hunter et al., 2021; McInerney et al., 2017, 2020) could be used in combination with these simpler CMs to capture the streamflow uncertainty that is important for management applications such as streamflow forecasting.
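One simple form of such a probabilistic error model is sketched below: residuals between observed and simulated flow are assumed to be independent and Gaussian after a log transformation with a small offset, which makes the predictive spread grow with flow magnitude. This is a simplified illustration in the spirit of transformation-based residual error schemes, not a reproduction of the models of McInerney et al. (2017, 2020) or Hunter et al. (2021); the offset, coverage, function names and data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def fit_log_residual_sigma(q_obs, q_sim, offset=0.1):
    """Std. dev. of residuals in log(Q + offset) space (assumed Gaussian and independent)."""
    eta = np.log(np.asarray(q_obs) + offset) - np.log(np.asarray(q_sim) + offset)
    return float(np.std(eta, ddof=1))

def predictive_interval(q_sim, sigma, coverage=0.90, offset=0.1):
    """Back-transform Gaussian quantiles of log-space residuals to streamflow units."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    base = np.log(np.asarray(q_sim, dtype=float) + offset)
    lower = np.exp(base - z * sigma) - offset
    upper = np.exp(base + z * sigma) - offset
    return np.maximum(lower, 0.0), upper     # streamflow cannot be negative

# Hypothetical calibration period: observed vs. simplified-CM streamflow (mm/d)
q_obs_cal = np.array([1.0, 2.0, 4.0, 8.0])
q_sim_cal = np.array([1.1, 1.8, 4.4, 7.5])
sigma = fit_log_residual_sigma(q_obs_cal, q_sim_cal)

# 90% predictive intervals around new deterministic CM simulations
q_new = np.array([0.5, 5.0, 20.0])
lo, hi = predictive_interval(q_new, sigma)
print(np.round(lo, 2), np.round(hi, 2))
```

The log transformation is the design choice that matters here: it lets a single parameter (sigma) describe errors that are small in absolute terms at low flow and large at high flow.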
Argument 2: A VHL enables the use of controlled experiments to evaluate hydrological change.
Clark et al. (2011) considered the set of available CMs to be "Multiple Working Hypotheses" of catchment process representations. Since we must account for processes that are relevant to hydrological change of various kinds, our "hypotheses" of catchment processes clearly need to be improved. However, there is a potentially large variety of processes that may need to be included in these new CMs to enable investigation of the impacts of changes to boundary conditions and/or physical properties of the catchment of interest.
A key challenge in determining whether a process should be included in a CM is the existence of "confounding factors" that make it difficult to test hypotheses. As an example, a change in catchment vegetation due to bushfire and/or land clearing may lead to a change in catchment yield. However, yield also depends on climate variability (annual rainfall, temperature), which can act as a confounding factor. If we try to correlate changes in catchment yield with changes in vegetation pre- and post-bushfire, the different climates pre- and post-bushfire can confound the development of a suitable relationship. Such confounding effects of climate variability could be reduced by computing yields over longer periods (e.g., 20 years instead of 10 years), but as the vegetation grows back over these longer periods, the change in catchment yield becomes very difficult to determine.
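The confounding problem, and how a controlled virtual experiment removes it, can be shown with a deliberately simple hypothetical catchment whose runoff coefficient depends only on vegetation cover (0.2 under full cover, rising linearly to 0.5 when bare). The yield function, rainfall values and post-fire cover are all assumptions for illustration.

```python
import numpy as np

def annual_yield(precip, veg_cover):
    """Hypothetical toy catchment: runoff coefficient rises from 0.2 (full cover) to 0.5 (bare)."""
    runoff_coeff = 0.2 + 0.3 * (1.0 - veg_cover)
    return np.asarray(precip, dtype=float) * runoff_coeff

p_pre = np.array([950.0, 880.0, 910.0, 860.0, 900.0])    # wetter pre-fire years (mm), hypothetical
p_post = np.array([720.0, 690.0, 710.0, 680.0, 700.0])   # drier post-fire years (mm), hypothetical

# Observational comparison: vegetation AND climate differ between the two periods
naive_change = annual_yield(p_post, 0.5).mean() - annual_yield(p_pre, 1.0).mean()

# Controlled virtual experiment: identical climate, only vegetation cover changed
controlled_change = annual_yield(p_pre, 0.5).mean() - annual_yield(p_pre, 1.0).mean()

print(f"naive pre/post change:  {naive_change:.0f} mm/yr")
print(f"controlled veg. effect: {controlled_change:.0f} mm/yr")
```

The observational comparison suggests a post-fire yield increase of 65 mm/yr, whereas holding climate fixed reveals a vegetation effect of 135 mm/yr under the pre-fire climate; the drier post-fire years mask roughly half of the vegetation signal.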
A major advantage of the VHL, which is not easily provided by the existing HLEs, is the powerful ability to conduct controlled experiments to test for changes in either climatic phenomena (via virtual simulation of climate) or human interventions (via virtual simulation of water management/land-use change) at practically relevant catchment scales (Figure 1). In a VHL, we can (more or less) independently vary, and therefore control for, a variety of important factors, such as catchment size and shape, geology, topographic gradient (aspect and slope), channel network form, vegetation type, soil layering and structure, material properties, sub-surface boundary conditions, and climate drivers, among many more. Accordingly, we can represent a very diverse range of catchments (limited only by hydrological understanding and computational power). As the VHL can comprehensively embody our current state of knowledge regarding the catchment system (albeit virtually), it can be used as a benchmark against which any simpler CM hypothesis can be tested over a wide range of potential applications. Being able to conduct controlled experiments provides the VHL with the unique ability to evaluate the impact of hydrological change in a controlled way for both human intervention (e.g., land-use change) and climatic phenomena (e.g., climate variability/change). As Figure 1 shows, no other HLE provides this capability at practically relevant scales of interest. This translates into the opportunity to proactively build CMs that are robust under hydrological change, rather than passively waiting (potentially years or, more likely, decades) for the observational data sets to reveal the flaws in existing CMs.
Argument 3: VHLs provide the vehicle that enables the integration of multiple lines of evidence into the next generation of conceptual models.
The first two arguments outline the particular strengths of the VHL approach that overcome the limitations of existing HLEs. However, rather than replacing existing HLEs, we envision that the VHL approach represents a fifth HLE (Figure 1).
The VHL approach enables us to integrate multiple lines of evidence, complementing and integrating existing HLEs (Figure 3). The resulting framework uses data-rich observations from experimental catchments and/or paired catchments to improve process understanding, and subsequently improve PRMs (blue section, Figure 3). The VHL can be used to develop virtual catchments and undertake hydrological change experiments, enabling the development of candidate CMs as simpler reduced-order models and proactive evaluation of their robustness to hydrological change (green section, Figure 3). Large-sample hydrological studies can be used to test these candidate CMs and ensure high quality performance on a wide range of catchments (orange section, Figure 3), weeding out any candidate CMs that are unable to match real observations. Finally, calibration with local data remains part of the conceptual model application step (black section, Figure 3). While each step uses different lines of evidence, it will almost certainly benefit from iterative feedback from the other steps to identify and address limitations at each stage.
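The staged filtering in Figure 3 can be expressed as a simple screening function. The sketch below is hypothetical throughout: each candidate CM carries its relative errors in simulated change across VHL experiments (green section) and its NSE across real catchments in a large-sample data set (orange section); the class, thresholds and scores are arbitrary, and the final local calibration step (black section) and the iterative feedback between stages are not shown.

```python
from dataclasses import dataclass
from statistics import median
from typing import List

@dataclass
class CandidateCM:
    name: str
    vhl_change_errors: List[float]   # relative error in simulated change vs. the VHL, per experiment
    large_sample_nse: List[float]    # NSE in each real catchment of a large-sample data set

def screen(candidates, max_change_error=0.2, min_median_nse=0.6):
    """Keep CMs that pass the virtual change tests AND perform adequately on real catchments."""
    robust = [c for c in candidates if max(c.vhl_change_errors) <= max_change_error]  # green stage
    return [c for c in robust if median(c.large_sample_nse) >= min_median_nse]       # orange stage

candidates = [
    CandidateCM("cm_a", [0.05, 0.10], [0.70, 0.80, 0.65]),
    CandidateCM("cm_b", [0.40, 0.10], [0.85, 0.90, 0.80]),  # fits history, fails change tests
    CandidateCM("cm_c", [0.10, 0.15], [0.30, 0.50, 0.40]),  # robust to change, poor on real data
]
survivors = [c.name for c in screen(candidates)]
print(survivors)
```

Only `cm_a` survives: `cm_b` performs well on historical data but fails the virtual change tests, while `cm_c` is robust to change but cannot match real observations.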
A key benefit of the integrated multiple HLE framework (Figure 3) is that it does not rely solely on a single line of evidence to develop the next generation of CMs. We are not relying exclusively on PRMs to provide the basis for all future CM development. Nor are we building CMs using only large sample hydrology, which has provided a solid foundation for CM development but can only examine past change using an observational approach. This integrated approach based on multiple HLEs will enable the cross-pollination of ideas across different sub-disciplines within the hydrological sciences. This is why the iterative feedback loop is so important: it provides the opportunity for ideas such as reverse coupling, where existing well-performing CMs could serve as a check on the PRMs, thus yielding a coherent combination of PRMs and CMs. Based on these benefits, we argue that this integrated multiple HLE framework should become a standard part of the CM development workflow, rather than the exception.

Limitations of the VHL Approach: Selection of a Trustworthy PRM Is Crucial
It is important to acknowledge the limitations of using the VHL approach as a fifth HLE. These include: (1) The data used in the VHL approach to test the CMs for hydrological change are not "real" observations (see Figures 1 and 2). The ability to conduct hydrological change experiments using the VHL (see Argument 2) relies on the assumption that the PRMs not only incorporate relevant-scale hydrological processes (such as soil infiltration and evapotranspiration from vegetation), but also provide realistic representations of emergent properties at the catchment scale, such as flood peaks (e.g., Farmer et al., 2015) and/or Budyko curves.
(2) The VHL approach assumes a level of "trust" that the PRMs capture future hydrological change. This is a thorny issue because the ability to capture future change cannot be guaranteed for any environmental model. If we accept the premise that PRMs reflect our best current understanding of environmental physics, then it is logical to propose that such models represent our best current chance to simulate future changes. The review by Fatichi et al. (2016) cites several studies where this assumption was made and PRMs were used to estimate the impact of future climates (e.g., Huntington & Niswonger, 2012; Piras et al., 2014; Sulis et al., 2012) and land-use/cover change (e.g., Ebel & Mirus, 2014; Ogden & Stallard, 2013; Pierini et al., 2014; van Roosmalen et al., 2009). Rodriguez and Tomasella (2016) used a PRM to examine the impact of land-use change in Amazonian basins. More recently, Gelfan and Millionshchikova (2018) and Gelfan et al. (2020) proposed comprehensive evaluation/validation tests of PRMs for climate change impacts. Stephens et al. (2020) used a PRM in a series of virtual change experiments to evaluate the impact of changing climate and its interaction with vegetation dynamics on catchment runoff response.
For the VHL approach to be adopted, it is important to be aware of these assumptions/limitations and take steps to mitigate them. The aim is to avoid the situation where, as a reviewer eloquently put it, "we encircle ourselves by developing models with different models that we have already built." A key step is the requirement for careful scrutiny and selection of PRMs. This selection process needs to consider the strength of evidence for the use of a given PRM for the chosen hydrological change experiment in a given catchment type (hereafter in this paragraph referred to as a "scenario"). The aim of this selection process is to identify PRMs that are sufficiently "trustworthy" to be used in the VHL for a given scenario. For example, if the scenario of interest is to capture changes in vegetation dynamics, the ecohydrological model RHESSys could be considered a "trustworthy" PRM because its representation of processes related to vegetation dynamics has been developed, tested and refined in a large number of studies using multiple HLEs, namely experimental and/or paired catchments (Son et al., 2016; Tsamir et al., 2019; Zierl et al., 2007). Indeed, RHESSys has already been used in virtual change experiments (Stephens et al., 2020). For other scenarios, a "trustworthy" PRM may not currently exist, especially considering the wide range of potentially different hydrological changes (see Section 2.1), catchment types and potential processes of interest. In these scenarios, the lack of a trustworthy PRM will hopefully motivate further research to develop one. This research may involve comparisons with CMs and observed data in order to question and improve existing PRMs. An example of a scenario where a trustworthy PRM does not currently exist is the shifts in streamflow reported by Fowler et al. (2020), and references therein; no study has yet evaluated whether a PRM can capture these shifts.
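Limitation (1) above hinges on PRMs reproducing emergent catchment-scale properties. As one possible check for Budyko curves, the sketch below compares the long-term evaporative index simulated for virtual catchments (E/P = 1 - Q/P, assuming negligible long-term storage change) against the original Budyko curve, E/P = [φ tanh(1/φ)(1 - exp(-φ))]^(1/2), where φ = PET/P is the aridity index. The input values and function names are hypothetical, and what departure would be acceptable is left open; real catchments scatter around the curve, so such a test would assess the distribution of departures rather than require an exact match.

```python
import numpy as np

def budyko_curve(aridity):
    """Original Budyko curve: evaporative index E/P as a function of the aridity index PET/P."""
    phi = np.asarray(aridity, dtype=float)
    return np.sqrt(phi * np.tanh(1.0 / phi) * (1.0 - np.exp(-phi)))

def budyko_departure(p, pet, q):
    """Simulated evaporative index minus the Budyko curve, from long-term means (mm/yr)."""
    p, pet, q = (np.asarray(x, dtype=float) for x in (p, pet, q))
    simulated = 1.0 - q / p      # E = P - Q, assuming negligible long-term storage change
    return simulated - budyko_curve(pet / p)

# Hypothetical long-term means from three PRM-based virtual catchments (mm/yr)
p = np.array([1200.0, 800.0, 500.0])
pet = np.array([900.0, 1000.0, 1400.0])
q = np.array([450.0, 150.0, 15.0])
print(np.round(budyko_departure(p, pet, q), 3))
```

A positive departure means the virtual catchment evaporates more (and yields less runoff) than the curve suggests for its aridity; systematic departures across many virtual catchments would be a prompt to scrutinize the PRM before using it in a change experiment.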
If the VHL approach is to be adopted, there is a clear need to develop an objective process for determining if a PRM is trustworthy.This could include adapting the criteria used for multiple lines and levels of evidence in ecology (e.g., Norris et al., 2012), while recent developments of validation techniques by Gelfan and Millionshchikova (2018) and Gelfan et al. (2020) are also promising in terms of developing and establishing trustworthy PRMs for climate change.
One could argue these limitations preclude the VHL approach from helping guide CM development. However, this argument misses several important points.
(1) In order to predict the future behavior of environmental systems, we must unavoidably use our best available current understanding of these systems to make assumptions about the future. (2) The integration of multiple HLEs in an iterative framework (as outlined in Figure 3) will likely lead to explicitly confronting, questioning and ultimately improving the assumptions underlying both CMs and PRMs, and the impact of those assumptions on predicting future change. Indeed, we hope this framework will further encourage the ongoing improvement and refinement of PRMs themselves, to make them more "trustworthy." (3) The integration of multiple HLEs, notably large sample hydrology, also mitigates these limitations by avoiding exclusive reliance on the VHL. By constraining CM development to be consistent with the understanding encoded into VHLs and with emergent catchment behavior from the real catchment data used in large sample hydrology, we gain a potentially very powerful way to develop scientifically informed predictions of future change.
Finally, we note that the modification of CMs to simulate past change has already been successful in certain contexts. For example, Duethmann et al. (2020) modified the HBV model to account for vegetation dynamics in 156 Austrian catchments. These types of studies give more confidence that CMs can be modified to provide more reliable predictions of change. The VHL approach provides a mechanism to inform these modifications for future changes (as yet unseen) by reconciling the strengths of multiple HLEs.

Summary and Call to Action
The need for new thinking becomes ever more pressing as the predictive skill of current hydrological models is confronted by rapid, multi-faceted change and catchments are pushed outside of the envelope of historical experience. To continue to advance our scientific enterprise, we should revisit the role of conceptual models as our most important workhorse, ensuring their use not merely as tools for prediction, but as agents that represent our best available scientific understanding (applied in a given context).
Conceptual models are the model type most commonly used for supporting decision making in surface water hydrology. As such, they and their predictions essentially represent the "shopfront" of hydrological science. Our next generation of conceptual models will need to be firmly founded upon the multiple lines of evidence available in hydrology. This paper argues that the key to integrating these lines of evidence is the use of Virtual Hydrological Laboratories (VHLs). The VHL approach enables the integration of knowledge from experimental/paired catchments through physically resolved models and provides the ability to conduct controlled experiments using appropriately selected physically resolved models in a virtual lab-type environment. This VHL approach will accelerate the development of the next generation of conceptual models for providing robust predictions of hydrological change through a process of hypothesis falsification, rather than waiting decades or even centuries for the observations to do the falsification for us. This VHL approach also requires us, as hydrologists, to confront and enhance the assumptions regarding future change that are embodied in both conceptual and physically resolved models.
The VHL approach aims to spearhead the proactive development of the next generation of conceptual models that can better support decision making, and will shift the focus toward hydrological fidelity (i.e., "getting the right answers for the right reasons"). This proactive development of conceptual models that can reliably predict future outcomes before they occur is expected by policy makers who need to know what to plan for.

Figure 1. Evaluation of hydrological lines of evidence against five key questions to accelerate the development of the next generation of conceptual models. The questions are listed as row headings on the left-hand side of the figure (see description in Section 2.2). The four existing HLEs are listed as column headings (light blue background). The "cells" in the figure provide the outcomes of the evaluation (see Section 2.3 for further information). For question 1, regarding "hydrological change," we assess whether the HLEs can evaluate the model for robustness to hydrological change using either a controlled experimental or observational approach. For questions 2-5, the HLEs are classified into three categories ("yes," "no," "limited"), based on the authors' experience. The right-hand column (dark blue background) evaluates an emerging HLE, Virtual Hydrological Laboratories (see Section 3 for detailed discussion).

Figure 3. Framework that integrates multiple hydrological lines of evidence, including a VHL, to provide the best available scientific understanding and develop the next generation of conceptual models.
It is noted that remotely sensed data, [...] global, is increasingly used to derive new and even larger scale data sets for other hydrological variables for use in large sample hydrology studies (McCabe et al., 2017; Sheffield …).