On doing hydrology with dragons: Realizing the value of perceptual models and knowledge accumulation

Our ability to fully and reliably observe and simulate the terrestrial hydrologic cycle is limited, and in‐depth experimental studies cover only a tiny fraction of our landscape. On medieval maps, unexplored regions were shown as images of dragons—displaying a fear of the unknown. With time, cartographers dared to leave such areas blank, thus inviting explorations of what lay beyond the edge of current knowledge. In hydrology, we are still in a phase where maps of variables more likely contain hydrologic dragons than blank areas, which would acknowledge a lack of knowledge. In which regions is our ability to extrapolate well developed, and where is it poor? Where are available data sets informative, and where are they just poor approximations of likely system properties? How do we best identify and acknowledge these gaps to better understand and reduce the uncertainty in characterizing hydrologic systems? The accumulation of knowledge has been postulated as a fundamental mark of scientific advancement. In hydrology, we lack an effective strategy for knowledge accumulation as a community, and insufficiently focus on highlighting knowledge gaps where they exist. We propose two strategies to rectify these deficiencies. Firstly, the use of open and shared perceptual models to develop, debate, and test hypotheses. Secondly, improved knowledge accumulation in hydrology through a stronger focus on knowledge extraction and integration from available peer‐reviewed articles. The latter should include metadata to tag journal articles complemented by a common hydro‐meteorological database that would enable searching, organizing and analyzing previous studies in a hydrologically meaningful manner.

The Fra Mauro world map from 1459 shows the world in great detail with hundreds of illustrations and thousands of descriptive texts (Figure 1a), which is surprising given the poor state of knowledge about the world at the time. Looking more closely, however, one finds that this and other maps from the period included imagined representations in unexplored (unknown) areas. Indeed, it was customary practice at the time to show monsters rather than to leave spaces empty, suggesting that it is undesirable and dangerous to explore such places. Ancient Roman and Medieval mapmakers demarcated such unknown areas with the phrase HIC SUNT LEONES ("here are lions") or alternatively with HIC SUNT DRACONES ("here are dragons"), thus populating unknown areas with creatures that would instill fear in the reader (Agostinho et al., 2019). Fewer than 100 years later, the Salviati Planisphere map from 1525 (Figure 1b) not only shows the newly found eastern coasts of North and South America, but also reveals empty space in thus far unexplored areas. The 16th-century map reveals where knowledge gaps exist by daring to leave such areas blank, thus inviting explorations to discover what lies beyond the edge of current knowledge. Rather than fearing ignorance, this step of acknowledging the unknown became a scientific goal in itself, an important scientific belief in modernity.
In hydrology, we have seen our knowledge and analysis domain expand from the catchment to continental and even global scales, increasingly with the help of large-scale hydrologic simulation models. Global hydrologic models have emerged, no longer just as the endeavors of particularly brave scientists (e.g., Manabe & Holloway, 1975), but rather as tools for regular scientific analysis and increasingly even as potential tools for water resource management (Archfield et al., 2015; Bierkens et al., 2015; Straatsma et al., 2016). Hydrology is indeed moving toward realizing elements of the "models of everywhere" idea of Beven, that is, modeling as a learning process (Beven, 2001). At the same time, there is the hope for more realistic process representations through hyper-resolution models (Clark et al., 2017; Maxwell & Condon, 2016; Wood et al., 2011), though the debate regarding the best way to represent the physics of hydrologic processes in our models continues (Beven et al., 2015). Global and continental-scale hydrologic models increasingly reveal human influence on global fluxes of terrestrial sediments to the oceans (Syvitski et al., 2005), risks to global river biodiversity (Vörösmarty et al., 2010), global depletion of groundwater resources (Wada et al., 2010), the relative impacts of groundwater pumping on our rivers (De Graaf et al., 2019), global drivers of flood risk (Winsemius et al., 2016), impacts of human activity on the global water cycle (Bosmans et al., 2017), and potential implications of climate change for the global freshwater system (Döll et al., 2018).
Equally, we see the emergence of large-scale and even global data sets which add new dimensions to our ability to analyze global hydrology (Beck, van Dijk, et al., 2019; Ghiggi et al., 2019; Lindersson et al., 2020). New global data sets of physical system properties, such as subsurface hydrogeological properties, have been accumulated in recent years (Gleeson et al., 2011; Huscroft et al., 2018). Various global precipitation data sets can be used to force hydrologic simulation models (Beck, van Dijk, et al., 2019), while land surface fluxes have been regionalized from flux tower networks to the global scale (Jung et al., 2011). Satellite data provide continuous time estimates of freshwater storage (Rodell et al., 2018; Famiglietti et al., 2015) and the topographic characteristics of our land surface in a hydrologically meaningful manner (Linke et al., 2019; Nardi et al., 2019; Yan et al., 2019), while remotely sensed observations of vertical fluxes are used to create land surface water balances everywhere (Miralles et al., 2011). Increasingly, new global data sets of land cover change and other human interventions in the water cycle are becoming available, such as artificial storage through reservoirs (Mulligan et al., 2020). Widely used historical data such as river flows of the Global Runoff Data Centre (Grabs et al., 1996) are complemented with hydrologic response data compilations like karst spring hydrographs (Olarinoye et al., 2020). However, all of these data sets have their own issues and limitations, as we stress further below.
The potential power of such large-scale data sets is maybe best exemplified by the tremendous success of machine learning and other data-based approaches, which often outperform simulation models that are based on our mechanistic understanding of how nature works (Kratzert, Klotz, Shalev, et al., 2019; Reichstein et al., 2019; Shen et al., 2018). For example, Boers et al. (2019) find teleconnection patterns in global extreme rainfall by analyzing high-resolution satellite data with the help of complex networks. Stolbova et al. (2016) managed to empirically predict the onset of the monsoon two weeks further in advance than previous methods (including predictions from dynamic models). Addor et al. (2018) used random forests to demonstrate the predictability of hydrologic signatures across the United States and showed that signatures predictable from descriptors which vary smoothly in space, such as those related to climate, regionalize particularly well. Machine learning tools have also been used widely to turn in-situ observations into global data sets. Jung et al. (2011) used a machine learning strategy (model tree ensembles) to upscale Fluxnet observations of carbon dioxide, water, and energy fluxes to the global scale. The ability of deep learning models to predict streamflow in gauged and ungauged catchments has already been demonstrated (Kratzert, Klotz, Herrnegger, et al., 2019; Nearing et al., 2021).
So, what is the link between dragons and these aspects of hydrology? Relevant hydrologic quantities and properties as well as their uncertainties tend to be poorly determined by scarce historical observations away from areas where observations are concentrated (Beven et al., 2020). The Predictions in Ungauged Basins (PUB) initiative has given us many examples that demonstrate this problem in the context of streamflow predictions (Hrachowitz et al., 2013), and the study of hydrologic extremes has shown that much of the landscape we think we know is rather akin to a terra incognita, that is, unmapped regions (B. Merz et al., 2015). However, we also find that our models sometimes seemingly work well in some regions without local calibration (Van Werkhoven et al., 2009), so it is not simply a question of well-studied regions versus poorly studied ones. Large-scale or even global hydrology brings this problem even more strongly to the forefront. In our endeavor to move to larger scales, we unavoidably move hydrologic investigation away from highly studied headwater catchments, hillslopes, aquifers, or Fluxnet sites to regions of poorly explored and poorly characterized landscapes, relying on our ability to extrapolate with models instead (Fan et al., 2019). But this presents a major challenge: How can we deal with such knowledge gaps or epistemic uncertainties apart from hoping for potential new future measurement techniques? Large-scale outputs of hydrologic simulation models or compiled data sets will unavoidably include hydrologic dragons rather than meaningful information in some places. So, similar to the map makers who drew the Salviati Planisphere map from 1525, we need to start distinguishing these areas so that they can be highlighted as "blank," that is, in need of further exploration.
This is of course not a question of information versus no information (truly blank space), because the space of possible hydrologic behavior or system properties is not completely unconstrained anywhere in the world (Beven, 2001;Kirchner, 2006;Wagener & Montanari, 2011). It is rather a question of how much can we know, what is the basis for this knowledge, and how confident are we in our knowledge?
So, while the combination of global models (both data-based and mechanistic) and global data sets undoubtedly offers tremendous opportunities for scientific advancement and for new scales of management, it also contains hydrologic dragons, that is, knowledge gaps that are currently difficult to identify and address. Our observations of hydrologic fluxes and storages do not allow us to characterize the water balance (especially subsurface properties) at the above-mentioned hyper-resolution (Beven et al., 2015; Beven & Cloke, 2012). Different assumptions (poorly constrained by available information) still leave the potential for different conclusions when using the same data, for example, that catchment-scale water balance errors are due to precipitation error or due to subsurface losses (Liu et al., 2020). Global geological data show artifacts such as variability along administrative boundaries due to differences in processing underlying observations across administrative units (Gleeson et al., 2011). Pedotransfer functions based on soil texture classes, the basis for estimating soil hydraulic properties used in many hydrologic and other models, have been derived from very limited and biased empirical data, while ignoring structural soil characteristics (Or, 2019; Fatichi et al., 2020). So, it is not too surprising that Gutmann and Small (2007) find that soil texture explains only a small fraction (5% in their case) of the expected variability of soil hydraulic properties. Similarly, Rosero et al. (2010) found that behavioral soil and vegetation parameters, derived as parameter sets through conditioning of the NOAH land surface model to observations from flux towers along a climatic gradient with varying soil and vegetation properties, correlated with the climatic gradient, but not with soil or vegetation properties. These examples put a critical focus on the transfer algorithms used to translate the actual measurement into hydrologically meaningful information.
Most (if not all) of our data sets are based on measurements subsequently processed through models for interpretation, interpolation or extrapolation (Gupta & Nearing, 2014). They are therefore associated with their own (and often significant) uncertainties with potentially significant consequences for subsequent use as we will discuss further below (Kauffeldt et al., 2013;Yatheendradas et al., 2008). For example, J. Yang et al. (2013) provide a good discussion of the uncertainties stemming from remote sensing measurements, as well as from the subsequent algorithms needed to transform these measurements into the relevant variables for climate science. How often do we consider or even just acknowledge the uncertainties in this processing chain where possibly unknown corrections are made or poorly defined parameters are used in the algorithms? Are these data post-processing models sufficiently realistic and do we sufficiently acknowledge this lack of realism where and when it occurs?
So how do we identify hydrologic dragons, and, more importantly, how might we overcome them given that they originate from a lack of observations or even observational capability? In this brief commentary, we discuss two strategies to address this question.
• A focus on perceptual models to pool and test our knowledge.
• Improved knowledge accumulation in hydrology.
Below we discuss each of these strategies and what role they play.

2 | PERCEPTUAL MODELS TO POOL AND TEST OUR KNOWLEDGE AND EXPERIENCE
We propose that a currently underutilized strategy to collect and share information (defined here as "data seen in a particular context") as well as knowledge (defined here as "our understanding gained through experience") valuable for large-scale modeling and hydrology in general lies in openly shared and jointly evolved perceptual models. Perceptual models in hydrology are defined by the evolving understanding of a real-world system based on the interpretation of all available information, influenced by each hydrologist's unique experience and training (Beven, 2001; Gupta et al., 2008; Gupta et al., 2012; Seibert & McDonnell, 2002; Tetzlaff et al., 2008). Sometimes, perceptual models are seen as one step in a modeling chain where they form the basis of more formal system conceptualizations, for example, in hydrology (Beven, 2001) and hydrogeology (Brassington & Younger, 2010). Here we use the term perceptual model as the (typically visual) representation of the hydrologist's understanding of the system, including her subjective understanding, speculation and opinion, but without any specific consideration of subsequent simulation model building efforts (e.g., whether the subsequent simulation model is spatially lumped or distributed). We do not, of course, claim that any modeler would build and apply a large-scale hydrologic model without perceptual models as a baseline, or that an experimentalist does not have a perceptual model in mind when placing their instruments (Figure 2): the issue is rather one of publishing and sharing such models so that differences in the interpretation of available information about the hydrology of a place become visible, can be debated, and can be addressed.
Large-scale hydrologic simulation models depend on appropriate data sets to define their parameters (and potentially model structure). Workflows to integrate simulation models and data have become increasingly sophisticated and efficient (Turuncoglu et al., 2013; Leonard and Duffy, 2014; Leonard and Duffy, 2016; Blair et al., 2019). Leonard and Duffy (2013), for example, developed workflows through which hundreds of terabytes of US data sets organized at the United States Geological Survey Hydrological Unit Code Level 12 scale can be used to parameterize a version of their watershed model anywhere in the continental United States. Workflows like these facilitate the integration of models and data, leading to a strong focus on available data. A priori models (i.e., without subsequent calibration) built in this manner are the basis for much of our global hydrology (Y. Yang et al., 2019).
FIGURE 2 We suggest moving the perceptual model from its often implicit side-role to an explicit central role for both experimentalists and modelers, that is, for both deciding where and what to measure, as well as how to simplify reality in our simulation models
Relying heavily on the integration of data sets creates problems when and where currently available data sets are poor descriptors of the underlying hydrologic processes, that is, where they are mere hydrologic dragons. It also ignores knowledge (or experience) that has been gained but is not easily embedded in data sets, thus potentially ignoring knowledge which would significantly alter our predictions. Hartmann et al. (2017) compared groundwater recharge estimates of two large-scale models, PCR-GLOBWB and VarKarst-R, across the carbonate rock regions of Europe, Northern Africa, and the Middle East (Figure 3). The former is a global integrated hydrologic model based on global data sets, while the latter is a parsimonious model tailored to regions with strongly focused recharge processes. The authors found that recharge estimates of the simpler model were more consistent with available observations and local model results. One reason for this result is that the underlying perceptual model for VarKarst-R is based on the expected dominant system characteristics of carbonate rock regions, derived from experience gained in other highly studied locations. The perceptual model of PCR-GLOBWB (at the time) assumed that the world consisted only of two systems, mountains and alluvial plains; the remaining tailoring was done through adjustment of its parameters (Prof. Marc Bierkens, personal communication). The simpler model used in-depth knowledge gained through local studies to develop different perceptual models for a key hydrologic domain (carbonate rock regions), which were then further constrained using similarity principles expressed as behavioral rules. How can we formalize the integration of available knowledge through perceptual models? While many journal papers, for example, those describing modeling studies, will include a schematic depiction of a simulation model, few include the underlying perceptual model of the system that the modeler had in mind.
The schematic of the simulation model typically includes assumptions related to the implementation choices of the modeler, for example, they might select a spatially lumped or a grid-based model. However, this schematic might be quite far removed from the underlying system perception the modeler started with. Any perceptual model will likely be much more complex than the subsequent simulation model we execute on a computer; how we simplify the former to reach the latter is one of the most exciting aspects of hydrological modeling (Beven & Chappell, 2021). We as a community have not, so far, systematically collected and used the knowledge provided in perceptual models for hydrologic modeling. Open perceptual models would provide a forum to discuss and challenge our current thinking about the dominant hydrologic processes of different places, about hydrologic connectivity, about boundary conditions, and so on.
FIGURE 3 Two perceptual models of large-scale hydrologic models are shown at the top. On the left are four perceptual models of the carbonate rock regions across Europe/North Africa/Middle East by Hartmann et al. (2015). The different perceptual models are derived based on the expected differences between carbonate rock regions, including relative differences in the degree of karstification and the amount of storage present. The top right figure is the perceptual model underlying the global hydrological model PCR-GLOBWB (© Marc Bierkens). The bottom graphs show how the differences in perceptual models propagate into differences in recharge predictions of the simulation models (Hartmann et al., 2015).
Even if we cannot agree on system properties and behavior in absolute terms, they might offer us an opportunity to discuss the relative difference between places (e.g., evapotranspiration rates should be higher in this place than that one, this catchment should respond faster than the next, or subsurface storage should be larger in this system than another one), which might already significantly improve our understanding of dominant process controls and whose value is likely underrated in hydrological investigations (Rogger et al., 2012). Perceptual models also help us to make (unavoidable) subjective choices transparent by bringing them out into the open.
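Relative expectations of this kind can be written down as simple ordinal rules and checked against candidate simulations. The following is a minimal sketch of that idea, not a method taken from any of the cited studies: the site names, the recharge values, and the helpers `satisfies_expectations` and `filter_candidates` are all hypothetical and serve only to illustrate how a statement such as "storage should be larger here than there" could be made testable.

```python
from typing import Dict, List, Tuple

# An expectation (larger, smaller) states that the simulated quantity
# should be larger at site `larger` than at site `smaller`.
Expectation = Tuple[str, str]


def satisfies_expectations(simulated: Dict[str, float],
                           expectations: List[Expectation]) -> bool:
    """Return True if every ordinal expectation holds for one simulation."""
    return all(simulated[larger] > simulated[smaller]
               for larger, smaller in expectations)


def filter_candidates(candidates: List[Dict[str, float]],
                      expectations: List[Expectation]) -> List[Dict[str, float]]:
    """Keep only the candidate simulations consistent with all expectations."""
    return [c for c in candidates if satisfies_expectations(c, expectations)]


# Hypothetical mean annual recharge (mm/yr) at three sites, as produced by
# three candidate parameter sets. All values are invented for illustration.
candidates = [
    {"site_a": 310.0, "site_b": 120.0, "site_c": 200.0},
    {"site_a": 90.0, "site_b": 150.0, "site_c": 210.0},
    {"site_a": 260.0, "site_b": 240.0, "site_c": 230.0},
]

# Hypothetical perceptual-model expectation: site_a recharges more than
# both site_b and site_c.
expectations = [("site_a", "site_b"), ("site_a", "site_c")]

retained = filter_candidates(candidates, expectations)
print(f"{len(retained)} of {len(candidates)} candidates retained")
```

Rules like these do not require agreement on absolute values, which is why they may be usable in places with few or no local observations.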
These perceptual models also directly relate to the wider problem of transferring knowledge that has been gained in a specific catchment or location to other (even seemingly similar) places. McDonnell et al. (2007, p. 2) conclude that: "As a community, and as individuals, we have progressed along a philosophical path that 'if we characterize enough hillslopes and watersheds around the world through detailed experimentations, some new understanding is bound to emerge eventually.'" As the authors acknowledge, such a reductionist approach has not led to the transferable knowledge we aim to find in hydrology (Dooge, 1986). How can we complement this focus on understanding individual places with one of understanding regional scale variability in a structured manner? Put another way, how can we construct our hydrologic knowledge landscape so that it transcends the uniqueness of place and we increase our chance of understanding whether hydrologic similarity might exist and at what scales, preferably utilizing advancements in computational science to capture and share knowledge landscapes (Gil et al., 2019)? Investigating individual catchments in depth unavoidably confronts us with high levels of complexity (Tetzlaff et al., 2008) and unique features that distinguish one particular catchment from another (Beven, 2000). However, at some higher level, we continue to assume that principles of hydrologic similarity apply and are helpful for regionalization, classification and thus organization of hydrologically relevant spatial units (McDonnell & Woods, 2004; Wagener et al., 2007). Part of the problem is a lack of hydrologically meaningful descriptors of catchments (or other hydrologic units of relevant size).
Climatic and topographic catchment descriptors have been assessed widely (e.g., Seibert & McGlynn, 2007; Knoben et al., 2018), and they have been shown to be valuable predictors of some hydrological responses (e.g., Addor et al., 2018; Kuentz et al., 2017; Troy et al., 2008). Subsurface characteristics, on the other hand, are much harder to observe and characterize (Addor et al., 2018; Beven & Cloke, 2012; R. Merz et al., 2020), and therefore might have to be more strongly based on our expectations than just on directly observable properties. Few attempts to integrate (expected) system conceptualizations and data have been made thus far (Boorman et al., 1995; Enemark et al., 2019) and opportunities for improvement remain. How we can better characterize hydrologic units across scales in a meaningful way is unclear, though there is a danger that the subsequent stage of utilizing such data is more appealing due to the advancements in machine learning and the ease with which we can build models today.
Open and structured discussions of perceptual models might reveal divergent expectations of the dominant hydrologic processes in particular places, potentially even before we have taken in situ measurements. Revealing such divergences would help us to identify hydrologic dragons and provide the basis for developing hypotheses to be tested, so that candidate perceptual models can be rejected. It could also support the development and testing of increasingly powerful deep learning models, thus connecting the process-based modeling community with the data-based modeling community (Nearing et al., 2021). A starting point for a global set of perceptual models could be the previously proposed simple perceptual models of comparative hydrology (Falkenmark & Chapman, 1989), which would nonetheless require much more tailoring to each location using top-down thinking already applied in many modeling studies (Sivapalan et al., 2003; Young, 2003). How much can we reduce model prediction uncertainty if we constrain the expected hydrological behavior not just with available data, but also with our expectations across large scales (Hartmann et al., 2015; Sarrazin et al., 2018)? This idea of using qualitative information more rigorously of course builds on previous suggestions at the catchment scale discussed, for example, by Seibert and McDonnell (2002), Savenije (2010), or Kelleher et al. (2013). Some studies have shown that simpler measurements, sensibly distributed in space and time, might provide insight that is more transferable than that produced by much more in-depth measurements that can only be made in very few places (Jencso & McGlynn, 2012).

3 | IMPROVED KNOWLEDGE ACCUMULATION IN HYDROLOGY
The philosophy of science underlying most of hydrology is based on the process of scientific evolution proposed by Popper (1959) where hypotheses are falsified through evidence (data) and remain conditionally valid only as long as they are consistent with all available evidence. The approach by Popper, one of hypothesis testing, is the strategy often utilized (or at least attempted) in hydrology (though some argue that this has not been done very well; Pfister & Kirchner, 2017; Beven, 2018; Beven & Chappell, 2021). Another idea, less frequently discussed, is that knowledge accumulation itself constitutes scientific advancement in its own right, rather than just being a component of the hypothesis-falsification style of scientific process mentioned above. Does science inherently advance if we accumulate knowledge? Or only if this knowledge leads to the development of new ideas or theories? Bird (2007, 2008) suggests that: "Science (or some particular scientific field or theory) makes progress precisely when it shows the accumulation of scientific knowledge; an episode in science is progressive when at the end of the episode there is more knowledge than at the beginning." While Bird was not the first to suggest this concept for scientific progress, which can be traced back to Francis Bacon, he nonetheless reenergized the discussion (Mizrahi, 2013). Here, we do not want to answer the question whether knowledge accumulation is equal to scientific progress or not, but rather stress that effective knowledge accumulation is a fundamental element for scientific progress and for tackling hydrologic dragons.
We argued in Section 2 that perceptual models are one way toward reducing knowledge gaps (dragons) in large-scale hydrology. So how and where is the knowledge needed to build up these perceptual models currently captured in the field of hydrology? And especially, how well does it accumulate? Given that we regularly ask what questions remain in hydrology (Sivapalan, 2009; Blöschl et al., 2019), it seems equally relevant to ask what we already know and how confident we are that all available knowledge has been captured. An exhaustive case study of how the hydrologic community has accumulated knowledge was the synthesis effort within the PUB initiative (Blöschl et al., 2019; Hrachowitz et al., 2013). Over 100 authors produced a compendium of what had been learned about the PUB problem, often rerunning analyses to make them consistent and comparable. While this work is a good example of what can be achieved through such a community effort, it used more resources than those normally available. Different (more sustainable and routine) strategies to achieve such a synthesis are needed to address our hydrologic questions (Blöschl et al., 2019).
Knowledge about the hydrology of different places is shared mainly through peer-reviewed journal papers. While this meant reading a few hundred papers per year in the 1970s, it now requires checking in excess of 3000 papers in 2020 alone (more than 8 per day), even if we only focus on the main hydrology journals (Figure 4). Many, if not most, of these papers will describe what has been learned by studying the hydrology of a particular place, or a collection of places, thus providing insight into the hydrologic variability found in our highly heterogeneous world (Beven, 2000).
FIGURE 4 ... (119, 202, 227, 338, 385, 563, 647), Journal of Hydrology (-, 103, 166, 231, 359, 735, 1266), Advances in Water Resources (-, 23, 26, 36, 115, 195, 234), Hydrological Sciences Journal (-, -, 72, 52, 79, 150, 217), Hydrological Processes (-, -, 21, 83, 281, 358, 306), Hydrology and Earth System Sciences (-, -, -, 60, 94, 306, 315). Hydrological Sciences Journal was searched as "Hydrological Sciences Journal - Journal des Sciences Hydrologiques".
The existing meta-analyses show the great potential for learning by reviewing and synthesizing the existing literature (e.g., Evaristo & McDonnell, 2017; Price, 2011). Review papers play an important role as well, regardless of whether they are published in our main journals or in journals which specialize in reviews. In either case, one problem is that reviews regularly cannot consider a large fraction of the papers on a particular topic (given the sheer number), and thus more likely propose a new organization of available knowledge with a limited number of papers as examples, that is, they are more qualitative reviews (which of course does not mean that they are not useful!). In response to this issue, some journals (e.g., Environmental Research Letters) specifically advocate more quantitative, metadata-driven reviews.
These quantitative, metadata-driven reviews are a useful complement to more qualitative reviews and require an easier way to identify and organize the existing literature for synthesis. Given the growing complexity of reviewing any particular area of hydrology, it is maybe surprising that hydrology has not yet adopted more formalized strategies for literature reviews as is common in other fields, for example, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses methodology (Moher et al., 2009). Requiring authors to develop and explicitly communicate a strategy for their reviews would likely also be useful in hydrology. While a full systematic review might be too complex for a standard paper, the current lack of strategy often leads to poorly grounded studies and therefore unclear contributions.
We might also look at textbooks, field trips, or graduate supervisors for this knowledge. In contrast to journal papers, most hydrology textbooks provide the underlying hydrologic theory, but the tailoring to a specific place remains difficult to encapsulate in general guidelines (Wagener & Montanari, 2011). Field trips are a tangible and impactful way of sharing knowledge, and new ways for sharing our experience with understanding the hydrology of specific catchments or regions are emerging. For example, Google Earth Engine and other virtual earth educational tools provide exciting opportunities for "virtual field visits" in combination with those field trips that can be done locally. All students will be offered some level of accumulated knowledge through their supervisors' experience. However, each supervisor shares their subjective knowledge landscape and can only reach a small number of students, which creates a heterogeneous and dissatisfying baseline for our scientific community. It is important to have exceptional researchers who provide inspiration and creativity, but it would be beneficial to have a better general baseline for our accumulated knowledge.
We need a better way to find, extract, and accumulate the hydrological knowledge hidden in over 2500 papers published per year if we want to have a chance to tackle hydrologic dragons at the global scale. Hradec et al. (2019) call this the challenge of assessing information "trapped in the text". "The sheer volume of text means that, unassisted, we cannot hope to read all available sources, nor even to keep up to date with all advances in a particular field" (Hradec et al., 2019). For example: How many studies last year analyzed the water balance of catchments located in the subtropics? Or, have the Nash Sutcliffe Efficiency values of models applied to semi-arid catchments improved over the last decade? We can currently only answer these questions by manually looking through large numbers of papers in a tedious manner. In other fields, it has recently been proposed to approach this problem through machine learning, which has been shown to be able to extract scientific knowledge hidden in scientific papers (Hradec et al., 2019; Kumar et al., 2018; Tshitoyan et al., 2019). Tshitoyan et al. (2019) demonstrate what the automated mining of scientific literature can achieve by showing, for example, that structure-property relationships in materials can be derived from information gathered in this manner. Hradec et al. (2019) developed a software tool to perform a similar automated semantic analysis of a large number of documents to support European policy making. The recent need for fast progress in studying COVID-19 has further pushed the development and use of such methods (Khanday et al., 2020). While such approaches are not yet widely explored, hydrology, which offers a large number of historical studies on very diverse hydrologic systems (at catchment or other scales), is exceptionally well placed to test their potential.
Regardless of the advancements made with text mining algorithms, the hydrological community needs to advance how knowledge synthesis is supported. The PUB effort (Blöschl et al., 2013), for example, has shown how tremendously difficult it is to simply identify everybody who has simulated a particular catchment in the past, and to ask how well their hydrologic model performed. A starting point to organize our articles in a hydrologically relevant manner would be the inclusion of mandatory metadata for each hydrologic study in each journal article as Essential Hydrological Descriptors in addition to the standard keywords or subject tags. These could include the geolocation, time period, spatial and temporal resolution and extent, and the fluxes and stores studied at the hydrologic study area. Requiring additional information from the authors has to be balanced against the effort it takes to provide this information. However, such metadata tagging is already done in some data journals such as Scientific Data, which includes machine-readable metadata on location, time period, etc. for every article. We suggest that this needs to be expanded to all journal articles published in hydrology (unless they are purely theoretical) so that the identification and synthesis of studies, for example, for a particular location and time period, is greatly simplified. We would also need to add such machine-readable metadata tagging retrospectively to the many articles already published, so that we do not lose the information stored there and can continue to utilize what has been learned.
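As an illustration of what such machine-readable Essential Hydrological Descriptors might look like, the following Python sketch defines one possible record and serializes it to JSON. The field names, units, validation rules, and the placeholder DOI are assumptions made for this example only; they are not an agreed community standard, nor the schema used by any journal.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HydroDescriptors:
    """One hypothetical metadata record per hydrologic study in an article."""
    doi: str
    latitude: float            # decimal degrees, study-area centroid (assumed)
    longitude: float           # decimal degrees, study-area centroid (assumed)
    start_year: int
    end_year: int
    spatial_extent_km2: float
    temporal_resolution: str   # e.g. "daily"
    fluxes: list = field(default_factory=list)
    stores: list = field(default_factory=list)

    def validate(self):
        """Raise ValueError if basic consistency checks fail."""
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")
        if self.end_year < self.start_year:
            raise ValueError("time period ends before it starts")


def to_json(record):
    """Validate a record and return it as a JSON string."""
    record.validate()
    return json.dumps(asdict(record), sort_keys=True)


# Placeholder values, not a real study.
EXAMPLE = HydroDescriptors(
    doi="10.0000/example-doi",
    latitude=47.9,
    longitude=7.8,
    start_year=1990,
    end_year=2010,
    spatial_extent_km2=250.0,
    temporal_resolution="daily",
    fluxes=["streamflow", "groundwater recharge"],
    stores=["soil moisture"],
)

if __name__ == "__main__":
    print(to_json(EXAMPLE))
```

A record of this size would add only a few minutes of author effort per study, which is the balance between information gained and burden imposed discussed above.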
Simply organizing all published hydrologic studies on a particular topic, for example, flooding or groundwater recharge rates, by geolocation, would enable us to see where time and space clusters of studies exist and which studies we can compare for consistency of conclusions. More importantly, it would highlight actual blank spaces on the map of global hydrology, showing which catchments or regions have never been studied locally. Clearly the regions of Europe and North America will be densely populated with study locations, but how many places in the developing world have never been studied in the peer-reviewed scientific literature, or have actually been studied, but their results are published in papers that are unfortunately often rather poorly cited (maybe because they are not published in the main hydrology journals or published in a Red Book of the International Association of Hydrological Sciences)?
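The idea of revealing blank spaces by geolocation can be sketched in a few lines of Python: study locations are binned into latitude-longitude grid cells, and cells of interest without any study are reported. The coordinates, the 10-degree cell size, and the function names are illustrative assumptions; a real analysis would more likely use catchment boundaries than a regular grid.

```python
import math
from collections import Counter


def cell_of(lat, lon, size_deg=10.0):
    """Return the (row, column) index of the grid cell containing a point."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def study_density(studies, size_deg=10.0):
    """Count studies per grid cell from an iterable of (lat, lon) pairs."""
    return Counter(cell_of(lat, lon, size_deg) for lat, lon in studies)


def blank_cells(density, cells_of_interest):
    """Return the cells of interest that contain no study at all."""
    return [cell for cell in cells_of_interest if density[cell] == 0]


# Hypothetical study locations (latitude, longitude), not real metadata.
STUDIES = [(51.4, -2.6), (48.0, 7.8), (-1.3, 36.8)]

if __name__ == "__main__":
    density = study_density(STUDIES)
    print(dict(density))
    print(blank_cells(density, [(5, -1), (0, 0)]))
```

A Counter returns zero for cells it has never seen, so the blank cells fall out directly once the cells of interest (for example, all land cells) are listed.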
Such metadata should describe characteristics that are unlikely to change, such as the geolocation and time period covered by a study. They could form the basis for developing suitable further descriptors to analyze such studies during syntheses. Imagine a Web of Hydrology (rather than a Web of Science) where the papers (identified by the DOI and their metadata) are connected with hydrologically relevant information that would be calculated from one or more common global data sets (see the https://www.isipedia.org idea in this regard). Such descriptors could include, for example, climate descriptors to group the existing studies not just by location, but by the similarity of the climate they were performed in (e.g., all studies performed in cold arid regions). The community debate around what climate (or topographic or geologic or …) descriptors should form the basis for organizing our hydrology would be a very interesting and relevant study by itself (e.g., discussions by Winter, 2001; McDonnell & Woods, 2004; Buttle, 2006; Wagener et al., 2007). Including multiple data sets would enable at least a basic assessment of the uncertainty in how well we can characterize a place and a time period. This database would also slowly grow through efforts to extract and submit hydrologically relevant information from journal papers such as groundwater recharge estimates or the performance of a hydrologic model applied to a particular catchment. It would take a community-scale effort to make such a Web of Hydrology happen and become widely utilized (e.g., similar to those during PUB; Wood et al., 2005).
A key question is of course how we would motivate and incentivize the community to do so. Adding additional metadata to future papers would simply be a requirement by the journal. Adding these to historical articles might require a paid activity, maybe even done by non-hydrologists, for example via Mechanical Turk (https://www.mturk.com), which is a crowdsourcing marketplace where an outsourced virtual workforce can perform tasks. The focus lies on outsourcing tasks that humans can still perform better than computers, such as doing research (see example application by Bonnefon et al., 2016). Another interesting model for motivation might be to gamify this activity as well as the subsequent analysis. The idea is to use techniques borrowed from game development to motivate consistent participation and long-term engagement. A nice example of gamification is the Moral Machine (moralmachine.net). Users are asked to make decisions about moral dilemmas. For example, a self-driving car is approaching a pedestrian crossing and a brake failure occurs. The car can either stay in the lane and likely kill a child, or it can swerve and likely kill an adult. What should the self-driving car do? The Moral Machine online experiment collected 40 million decisions from people in 233 countries and territories (Awad et al., 2018). Results showed, for example, that there are strong preferences for sparing humans over animals, for sparing more lives over fewer, and for sparing young people over older people.

| CONCLUSIONS
The outputs of regional- to global-scale inquiries in hydrology unavoidably contain hydrologic dragons, that is, regions where the uncertainty in expected hydrologic behavior or relevant system properties is very high due to a lack of local knowledge. Our observations are often too sparse, our data sets are not equally valid everywhere due to the empirical post-processing models they are based on, and we cannot "see" key processes due to a lack of measurement capability. All of these points are simply statements of the present state of our science, and we are not the first to point them out. The wider problem is that few studies highlight such knowledge gaps where they exist, how they propagate into model outputs, and what their consequences are for our conclusions. We drew a comparison with cartography in the 15th/16th century where cartographers shifted from filling all parts of the map to being content with leaving large areas blank. Blank spaces represented significant knowledge gaps that invited exploration. Highlighting knowledge gaps became a key outcome, rather than something to hide. How many large-scale maps of hydrologic model predictions or hydrologic data products have been published with blank areas highlighting knowledge gaps, for example, regions where the model does not reach specific predictive benchmarks based on local data or regionalized information (Seibert et al., 2018; Wagener & Montanari, 2011)? Furthermore, while this would be a good start, how do we tackle these hydrologic dragons?

| First, open and shared collective perceptual models
Hydrology as a science is strongly dependent on experience. This experience is difficult to share and pass on fully. One strategy to improve this sharing, we believe, lies in the development of collective and open perceptual models that evolve as new insight becomes available. Such perceptual models would have to be developed with a granularity that is sufficient to derive testable hypotheses (Beven & Chappell, 2021), but not too fine, because this would distract from the dominant processes that should be captured. Simple perceptual models that capture our expectation of how a system will behave already exist, for example, within the comparative hydrology framework by Falkenmark and Chapman (1989). More complex and spatially distributed versions transferred to larger scales or across larger domains are required to reveal where our understanding converges or diverges when applied outside of experimental catchments. Visualizing such a changing knowledge landscape is an interesting challenge for computer science (Gil et al., 2019). Even weak constraints on hydrologic dynamics derived from such perceptual models might help to constrain acceptable model behavior, as has been shown at the catchment scale and beyond (Hartmann et al., 2015; Seibert & McDonnell, 2002; Wagener & Montanari, 2011). Finally, we should stress that shared, open, and evolving perceptual models would make exceptionally good tools for hydrology teaching across all levels of students.

| Second, improved knowledge accumulation
We argue that knowledge accumulation is poor in the field of hydrology and needs to become a much stronger focus. While the sharing of insights through collectively developed and shared perceptual models would be a first step, much of our knowledge is captured in journal articles, and not easily found or extracted. Here, semantic data mining algorithms might offer the chance to harvest the existing knowledge in an effective manner. In the future, we need to improve the efficiency of extracting and synthesizing knowledge from published work. Doing so would require metadata tagging of journal papers with Essential Hydrological Descriptors such as geolocation and time period studied. A separate open database, a Web of Hydrology, could become a community virtual laboratory by linking these essential metadata to evolving descriptors of climatic, topographic, or other properties.
Some 350 years after John Cabot had set sail to the West from Bristol harbor, Alexander von Humboldt published, for the time, an incredibly comprehensive portrait of nature in the first volume of his work Cosmos: A Sketch of a Physical Description of the Universe in 1845. Humboldt's Cosmos was largely the result of multiple expeditions in the Americas to explore some of the blank areas shown in the Salviati Planisphere (Figure 1b). His aim was "… to grasp Nature's essence under the cover of outer appearances" by studying the "perceptible world," an objective akin to Dooge's search for hydrologic laws (Dooge, 1986). Humboldt took an incredible 25 years to write his five-volume Cosmos while corresponding by letter with scientists across the globe on topics including botany, geology, geography, and volcanology. We have since moved on to communicate via (increasingly open) journal articles, and via exchanges at conferences and online meetings, but knowledge accumulation remains cumbersome and time-consuming. We urgently need to rethink how we share, debate, and ultimately accumulate hydrologic knowledge given the opportunities provided by web-based tools and machine learning. This would help us to tackle some of our dragons. It might even help us to realize that some systems are not hydrological monsters with unexpected behavior (a term coined by Kuczera et al., 2010), if we find that their behavior is less unexpected than we think once we gain a better overview.

ACKNOWLEDGMENTS
Thanks to Sina Leipold from the University of Freiburg for helpful comments on the philosophy of science. We further thank Keith Beven, Conrad Jackisch, Eric Wood, Marc Bierkens, Alberto Viglione, Jan Seibert, Hoshin Gupta, Bodo Bookhagen, and two anonymous reviewers for constructive criticism on earlier versions of the manuscript. Especially the debates with Keith Beven in the context of this commentary are highly appreciated. Thanks to Marc Bierkens for sharing the drawing of the early PCR-GLOBWB perceptual model. Partial support to Thorsten Wagener was provided by a Royal Society Wolfson Research Merit Award (WM170042) and by the Alexander von Humboldt Foundation in the framework of the Alexander von Humboldt Professorship endowed by the German Federal Ministry of Education and Research. Partial support for Tom Gleeson was provided by a Benjamin Meaker Distinguished Visiting Professorship at the University of Bristol. Andreas Hartmann was supported by the Emmy-Noether-Program of the German Research Foundation (HA 8113/1-1). Rafael Rosolem was partially supported by the International Atomic Energy Agency of the United Nations (IAEA/UN) coordinated research project (CRP D12014). Francesca Pianosi was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) through a "Living with Environmental Uncertainty" fellowship [EP/R007330/1]. Lina Stein was funded as part of the WISE CDT under a grant from the Engineering and Physical Sciences Research Council (EPSRC) (EP/L016214/1).

DATA AVAILABILITY STATEMENT
The statistics in Figure 4 are included in the caption of the figure in the manuscript. Further data sharing is not applicable to this article as no further new data were created or analyzed in this study.