Challenges in modeling and predicting floods and droughts: A review

Predictions of floods, droughts, and fast drought‐flood transitions are required at different time scales to develop management strategies targeted at minimizing negative societal and economic impacts. Forecasts at daily and seasonal scale are vital for early warning, estimation of event frequency for hydraulic design, and long‐term projections for developing adaptation strategies to future conditions. All three types of predictions—forecasts, frequency estimates, and projections—typically treat droughts and floods independently, even though both types of extremes can be studied using related approaches and have similar challenges. In this review, we (a) identify challenges common to drought and flood prediction and their joint assessment and (b) discuss tractable approaches to tackle these challenges. We group challenges related to flood and drought prediction into four interrelated categories: data, process understanding, modeling and prediction, and human–water interactions. Data‐related challenges include data availability and event definition. Process‐related challenges include the multivariate and spatial characteristics of extremes, non‐stationarities, and future changes in extremes. Modeling challenges arise in frequency analysis, stochastic, hydrological, earth system, and hydraulic modeling. Challenges with respect to human–water interactions lie in establishing links to impacts, representing human–water interactions, and science communication. We discuss potential ways of tackling these challenges including exploiting new data sources, studying droughts and floods in a joint framework, studying societal influences and compounding drivers, developing continuous stochastic models or non‐stationary models, and obtaining stakeholder feedback. Tackling one or several of these challenges will improve flood and drought predictions and help to minimize the negative impacts of extreme events.


| INTRODUCTION
Droughts and floods can have severe societal, ecological, and economic impacts. Key examples are the multi-year California drought of 2011-2016 (Diffenbaugh, Swain, Touma, & Lubchenco, 2015), which caused severe losses in crop yields and led to water restrictions in urban areas (Luo et al., 2017), and the Danube flood in 2013, which caused heavy damage along the Danube and many of its tributaries (Blöschl, Nester, Komma, Parajka, & Perdigão, 2013). The impacts of such hydrological extreme events can be particularly severe if they affect a large region or occur in close succession. Rapid drought-flood transitions, such as the transition from the 2011-2016 California drought (Diffenbaugh et al., 2015) to the winter 2017 California flood (White, Moore, Gottas, & Neiman, 2019) or the transition from long-lasting drought conditions in the Missouri basin from 2002-2006 to high flows in 2007 (Woodhouse & Wise, 2020), challenge water management and reservoir operation because of trade-offs between long-term water storage and short-term flood control. Management strategies that address floods or droughts independently may limit the ability to cope with the other type of extreme. Therefore, droughts and floods should be jointly considered to enable development of integrated and efficient management strategies (Di Baldassarre, Martinez, Kalantari, & Viglione, 2017; Kreibich et al., 2019).
The development of management strategies to minimize the negative impacts of droughts, floods, and rapid event transitions relies on our ability to predict floods and droughts at different time scales. Planning of emergency responses and early warnings relies on flood and drought forecasts at hourly to seasonal time scales (Mirza, 2010). Hydraulic design of reservoirs for water supply or retention basins for flood protection requires estimates of the frequency and magnitude of extreme events, for example, the probability of occurrence of a 100-year flood event. Development of adaptation strategies requires long-term multi-decadal projections of droughts and floods representing future climate, land use, and water management conditions. All three types of predictions (forecasts, frequency estimates, and projections) are typically derived in a framework where floods and droughts are treated separately, even though floods and droughts can happen in close succession, which may challenge water management. A primary reason for their separate treatment is that the two types of extremes evolve from different processes. Floods are fast phenomena triggered by excess precipitation, snowmelt, high initial soil moisture, or a combination of these potential drivers (Berghuijs, Woods, Hutton, & Sivapalan, 2016; Wasko & Nathan, 2019). In contrast, droughts evolve from slowly developing processes. They are often triggered by a precipitation deficit and modulated by positive temperature anomalies (Hanel et al., 2018; Woodhouse, Pederson, Morino, McAfee, & McCabe, 2016), which potentially lead to high evapotranspiration losses, or by prolonged periods of below-freezing temperatures that, combined with low-flow conditions at the start of the winter season, may cause a winter drought to develop.
Even though floods and droughts are different from a process perspective, we can often use similar methods and modeling approaches for deriving predictions for both types of extremes. For example, a common approach for drought and flood forecasting is to use a hydrological model to simulate continuous streamflow time series with input from a numerical weather prediction model (Alfieri et al., 2013;Cloke & Pappenberger, 2009;Emerton et al., 2016;Pagano et al., 2014;Wu et al., 2020;Yuan, Wood, Roundy, & Pan, 2013). To derive drought and flood estimates for a given return period, a common approach is to fit a generalized extreme value distribution to annual minima or maxima, respectively (Meylan, Favre, & Musy, 2012). To obtain future predictions and projections of hydrological extremes, possible approaches include driving a hydrological model with meteorological data from a global/regional circulation model (e.g., Addor et al., 2014;Hakala et al., 2019;Mendoza et al., 2015;Mizukami et al., 2016;Wilby, 2010) or employing hybrid statistical-dynamical techniques using projections as covariates within a statistical modeling framework (e.g., Madadgar et al., 2016;Slater & Villarini, 2018).
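The return-period estimate mentioned above can be illustrated with a minimal sketch. It assumes that the three GEV parameters (location `mu`, scale `sigma`, shape `xi`) have already been fitted to annual maxima by some other means, and it only evaluates the GEV quantile function. The function names and all parameter values are hypothetical. For low flows, one common convention (assumed here) is to fit the GEV to negated annual minima and to negate the resulting quantile.

```python
import math


def gev_return_level(mu: float, sigma: float, xi: float, return_period: float) -> float:
    """Quantile of a GEV distribution for non-exceedance probability 1 - 1/T.

    mu, sigma, xi are location, scale (> 0), and shape; xi = 0 is the Gumbel case.
    The parameters are assumed to come from a fit to annual maxima.
    """
    if sigma <= 0:
        raise ValueError("scale must be positive")
    if return_period <= 1:
        raise ValueError("return period must be larger than 1 year")
    p = 1.0 - 1.0 / return_period      # annual non-exceedance probability
    y = -math.log(p)                   # positive reduced variate
    if abs(xi) < 1e-12:                # Gumbel limit of the GEV
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)


def low_flow_return_level(mu_neg: float, sigma: float, xi: float, return_period: float) -> float:
    """T-year low flow when the GEV was fitted to NEGATED annual minima (assumed convention)."""
    return -gev_return_level(mu_neg, sigma, xi, return_period)


# Hypothetical parameters for annual maximum discharge in m^3/s
q100 = gev_return_level(mu=100.0, sigma=20.0, xi=0.1, return_period=100.0)
```

A positive shape parameter gives a heavy upper tail, so the 100-year estimate is very sensitive to `xi`, which is one reason why short records lead to large uncertainty for long return periods.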
In the simplest case, forecasts, predictions, and projections are derived for one type of extreme (droughts or floods) from observed or simulated time series, for a single catchment, and for natural and current flow conditions. However, the prediction challenge is more complex in many cases. We may be interested in transitions between droughts and floods, have to deal with ungauged catchments where no streamflow observations are available, be interested in regional flood or drought events affecting several catchments at once, and study catchments where streamflow is altered by reservoir operation or future climate change. This review addresses these challenges affecting drought and flood prediction. Its goal is twofold: (a) identify challenges common to drought and flood prediction and joint drought and flood hazard assessments; and (b) discuss how to jointly tackle these challenges for both types of extremes.
We group challenges related to drought and flood prediction into four interrelated categories: (a) data, (b) process understanding, (c) modeling and prediction, and (d) human-water interactions (Figure 1). For each category, we subsequently discuss challenges and tractable methods to tackle them.

| Challenges
Data-related challenges arise from both limited data availability and event definition. Limited data availability challenges drought and flood modeling because it prevents a description of the full variability of extremes. The choice of event definition is crucial because it can have substantial effects on the outcome of extreme value analyses.

| Data availability
Both statistical and hydrological models rely on streamflow observations for model calibration and evaluation. However, such records may not be available at all, may be available only for a short period of time, or may contain gaps. Furthermore, data may be uncertain due to measurement errors or inhomogeneities or they may not be publicly available. In addition, we may lack spatial information and information on factors influencing streamflow conditions. In the following paragraphs, we discuss these categories and their effects on flood and drought modeling.
1. Observed streamflow records are only available for specific locations along a stream where a gauge has been established and station density may be low, a challenge that is particularly pronounced on the African continent (Tramblay, Rouché, et al., 2020; Tramblay, Villarini, & Zhang, 2020). If a catchment is ungauged (i.e., does not have streamflow observations), modeled data or alternative information such as catchment and climate characteristics have to be used instead for drought and flood prediction. At the national scale, some countries make their streamflow observations publicly available (e.g., through national web interfaces or the international Global Runoff Data Centre [GRDC]), but in other countries streamflow records might not be updated regularly or are considered to be of strategic importance (e.g., for reservoir operations) and thus are not publicly available. In addition, the lack of trans-national datasets is a major issue for large-scale analyses of drought and flood events.
2. Short records: Even if there is a gauging station, long streamflow records are only available in locations where streamflow has been consistently measured and monitored at a stationary gauging location over a period of years or decades (Hannah et al., 2011).
As a consequence, records may be short and most countries have few time series that extend beyond 30, 50, and 100 years at daily resolution (for Europe see e.g., Mediero et al., 2015). However, extreme value analysis of floods and droughts requires long streamflow records to achieve an acceptable level of (un)certainty, particularly for long return periods (such as 20-, 50-, or 100-year events; Meylan et al., 2012). Long time series are particularly important if we are interested in studying very rare events such as multivariate and spatial extremes, for example, long and intense drought events or flood events affecting multiple catchments at once, as well as transitions from long-lasting droughts to severe floods. Even if long time series may be obtained, only part of the series may reflect current climate conditions due to non-stationarities (Milly et al., 2008).
3. Data gaps: Gaps may exist in observed streamflow records for a variety of reasons such as discontinued measurements due to funding issues or damaged/displaced stream gauges following flood events. Such data gaps can affect our ability to assess natural variability and detect trends (Slater & Villarini, 2017). This problem may be overcome by using gap-filling/imputation techniques (Hamzah, MohdHamzah, Mohd Razali, Jaafar, & AbdulJamil, 2020), which may, however, introduce spurious trends or spatial dependencies in the case of large data gaps (Zhang & Post, 2018).
4. Uncertainty and inhomogeneity: Streamflow data biases and errors can arise from site selection, instrumentation, sampling/measurement procedures, postprocessing, or inadequate rating curves (Kiang et al., 2018; Neppel et al., 2010; Wilby et al., 2017).
Rating curves are particularly uncertain in unstable river channels that adjust their form due to erosion or sediment deposits, ice build-up, vegetation growth, and blocked wood debris, or after major floods (Lang, Pobanz, Renard, Renouf, & Sauquet, 2010; Mansanarez, Renard, Coz, Lang, & Darienzo, 2019). These biases and errors have been shown to be particularly pronounced for floods (Steinbakk et al., 2016) and low flows (Sörengård & Di Baldassarre, 2017) because these extreme events occur rarely and are difficult to measure with high precision.
5. Data access: Publicly available streamflow datasets are still rare because of data licensing restrictions, strict access policies, or the time required to make these datasets readily usable at the global scale (Coxon et al., 2020). Data access is often also hampered by storage in non-centralized databases, which are maintained by regional rather than national authorities, and may only be accessible in the local language.
6. Lack of spatial information: Observations of discharge and other hydrological variables, such as surface water storage in rivers, lakes, reservoirs, and wetlands, are currently available for selected locations and poorly observed at the global scale (Biancamaria, Lettenmaier, & Pavelsky, 2016). The resulting unequal station densities can hamper spatial analyses. For example, analyses of the extent of floods and droughts are only feasible if there is a sufficient spatial density of stream gauges. Even if observations are available for a spatial set of catchments, these catchments may not have temporally overlapping records, for example due to station removal, which limits their usefulness for spatio-temporal analyses.
7. Lack of information on human influences: Hydrologic extremes can be significantly influenced by human flow regulations through hydropower production, water abstraction, or water diversions (He, Wada, Wanders, & Sheffield, 2017; Mahe et al., 2013; Tijdeman, Hannaford, & Stahl, 2018; van Oel, Martins, Costa, Wanders, & van Lanen, 2018; Verbunt, Groot Zwaaftink, & Gurtz, 2005). However, data on human impacts, such as land use and channel morphology changes, water abstractions, or reservoir regulations, lack sufficient temporal and spatial detail or are not available at all (Ikeshima & Bates, 2019; Wada et al., 2016; Yassin et al., 2019). The lack of metadata on human impacts within a catchment may be particularly problematic for hydrological attribution studies of flood and drought drivers.
8. Uncertainty of meteorological drivers and initial conditions: Uncertainty in meteorological drivers and initial conditions can lead to uncertainties in streamflow forecasts, particularly for longer lead times (Hao, Singh, & Xia, 2018). Quality gridded meteorological datasets, which are essential for certain hydrological models, rest on the quality and availability of underlying point-based data (Clark & Slater, 2006;Cornes, van der Schrier, van den Besselaar, & Jones, 2018;Newman et al., 2015).
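The caveat under item 3, that filling large gaps may introduce spurious trends or dependencies, suggests restricting imputation to short gaps. The sketch below shows one simple way to do that. It is not the method used in the cited studies: linear interpolation, the function name `fill_short_gaps`, and the default `max_gap` of 3 time steps are all assumptions made for illustration.

```python
from typing import List, Optional


def fill_short_gaps(flows: List[Optional[float]], max_gap: int = 3) -> List[Optional[float]]:
    """Linearly interpolate runs of missing values (None) of length <= max_gap.

    Longer runs, and runs touching the start or end of the series, stay None,
    so that large gaps are not silently replaced by invented values.
    """
    filled = list(flows)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i                                   # first index after the gap
        gap_len = end - start
        has_neighbours = start > 0 and end < n    # observed value on both sides
        if has_neighbours and gap_len <= max_gap:
            left, right = filled[start - 1], filled[end]
            for k in range(gap_len):
                frac = (k + 1) / (gap_len + 1)
                filled[start + k] = left + frac * (right - left)
    return filled


# Hypothetical daily flows with one short and one long gap
series = [4.0, None, 6.0, None, None, None, None, 5.0]
patched = fill_short_gaps(series, max_gap=3)
```

Keeping long gaps explicit leaves the decision about how to treat them (for example, excluding the affected years from a trend analysis) to the analyst.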

| Event definition
Flood and drought predictions hinge on event definition because extreme events can be defined using different variables and different event identification approaches. These two aspects are discussed in the next two paragraphs.
1. Variable choice: In the case of floods, streamflow and water level are the main variables of interest, while different types of drought rely on different variables: precipitation for meteorological drought, soil moisture for soil moisture drought, and streamflow, reservoir levels, or groundwater levels for hydrological drought (Wilhite, 2000). Clear communication of the drought type under study is crucial to enable the interpretation and comparability of findings.
2. Event identification: Floods can be defined as annual maxima events or partial duration series using the peak-over-threshold approach (Lang, Ouarda, & Bobée, 1999), whereas droughts can be defined as annual minimum events (e.g., the lowest flow each year) or defined as events below a given threshold characterized by drought duration and deficit volume (Heudorfer & Stahl, 2017; Stahl et al., 2020) (for an illustration see Figure 2). Based on the latter, either annual maximum series (i.e., largest drought each year) or all events (resembling a partial duration series) can be subjected to further analysis. Annual maximum and minimum approaches may contain non-extreme values from notable wet or dry years, while threshold-based approaches are sensitive to the choice of threshold level, it being either too high (too many non-extreme values) or too low (too few events and many years without events) (Bezak, Brilly, & Šraj, 2014; Hisdal, Tallaksen, & Frigessi, 2002).
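The threshold-level definition of drought events described above can be sketched as follows. This is a minimal illustration that assumes a regularly spaced flow series and a constant threshold. The function name `drought_events` and the dictionary keys are hypothetical, and the deficit volume is expressed in flow units multiplied by the time step.

```python
from typing import Dict, List


def drought_events(flows: List[float], threshold: float) -> List[Dict[str, float]]:
    """Identify spells with flow below a constant threshold.

    Each event is described by its start index, its duration in time steps,
    and its deficit volume (sum of threshold - flow over the spell).
    """
    events: List[Dict[str, float]] = []
    start = None
    deficit = 0.0
    for t, q in enumerate(flows):
        if q < threshold:
            if start is None:          # a new spell begins
                start, deficit = t, 0.0
            deficit += threshold - q
        elif start is not None:        # the spell ended at the previous step
            events.append({"start": start, "duration": t - start, "deficit": deficit})
            start = None
    if start is not None:              # series ends while still below threshold
        events.append({"start": start, "duration": len(flows) - start, "deficit": deficit})
    return events


# Hypothetical daily flows (m^3/s) and threshold
events = drought_events([5.0, 3.0, 2.0, 6.0, 1.0, 7.0], threshold=4.0)
```

Taking the largest deficit per year from such a list gives an annual maximum drought series, whereas keeping all events resembles a partial duration series.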
The use of a threshold-based approach requires ensuring independence between events, which can be achieved in a similar way for both floods and droughts. For floods, a time window of 3-10 days between threshold exceedances may be appropriate to assure independence depending on the catchment (Meylan et al., 2012), whereas for drought, splitting of major droughts into several small events may be avoided by pooling events that are separated by less than 2-3 days. Such pooling can be achieved using different methodologies including time series smoothing prior to event selection (Tallaksen, Madsen, & Clausen, 1997). Additional challenges regarding event definition arise when studying regional events, which requires a spatio-temporal event definition (Cattiaux & Ribes, 2018) often achieved by tracking event clusters through time and space (Andreadis, Clark, Wood, Hamlet, & Lettenmaier, 2005; Chang, Stein, Wang, Kotamarthi, & Moyer, 2016; Herrera-Estrada, Satoh, & Sheffield, 2017).

FIGURE 2 Illustration of (a) flood and (b) drought identification using a peak-over-threshold and trough-under-threshold, that is, threshold-level approach, respectively
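The two independence criteria described in the paragraph above (a minimum time window between flood peaks and pooling of drought spells separated by short gaps) can be sketched as two small routines. Both are simplified illustrations rather than the procedures of the cited references. The names `independent_peaks` and `pool_droughts` and the parameters `min_separation` and `max_gap` are hypothetical.

```python
from typing import List, Tuple


def independent_peaks(flows: List[float], threshold: float,
                      min_separation: int) -> List[Tuple[int, float]]:
    """Peak-over-threshold selection with a simple independence window.

    An exceedance closer than min_separation steps to the currently retained
    peak is merged with it, and only the larger of the two values is kept.
    """
    peaks: List[Tuple[int, float]] = []
    for t, q in enumerate(flows):
        if q <= threshold:
            continue
        if peaks and t - peaks[-1][0] < min_separation:
            if q > peaks[-1][1]:
                peaks[-1] = (t, q)     # replace by the larger dependent peak
        else:
            peaks.append((t, q))
    return peaks


def pool_droughts(spells: List[Tuple[int, int]], max_gap: int) -> List[Tuple[int, int]]:
    """Merge drought spells (start, end) separated by at most max_gap steps.

    The end index is exclusive, so the gap between two spells is
    next_start - previous_end.
    """
    pooled: List[Tuple[int, int]] = []
    for start, end in sorted(spells):
        if pooled and start - pooled[-1][1] <= max_gap:
            pooled[-1] = (pooled[-1][0], max(end, pooled[-1][1]))
        else:
            pooled.append((start, end))
    return pooled


# Hypothetical daily series and spells
flood_peaks = independent_peaks([1, 5, 6, 1, 1, 1, 1, 1, 7, 1], threshold=4, min_separation=3)
pooled_spells = pool_droughts([(0, 5), (7, 10), (20, 25)], max_gap=2)
```

Suitable values for the window and the pooling gap depend on the catchment response time, so both should be treated as tuning choices rather than fixed constants.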

| Tackling challenges
In this section, we identify three main avenues for improving issues related to data availability: (1) improving data sharing, (2) exploiting new data sources, and (3) increasing sample size. In addition, we discuss potential avenues for improving event identification.
1. Improving data sharing: A concentrated and concerted effort is needed to make streamflow time series, catchment characteristics data, and management information publicly available in easy-to-use, regularly updated, and accessible databases. Accessibility can be ensured by providing download instructions in a common language (e.g., English), by removing paywalls, and by providing tools facilitating automatic downloads such as R-packages. A major effort for sharing continuous streamflow data on a global scale has been made by the GRDC (Bundesanstalt für Gewässerkunde bfg, 2019) and on a national scale by, for example, the National River Flow Archive in the United Kingdom (UK Centre for Ecology and Hydrology, 2020) and the National Water Information System in the United States (USGS, 2019). In addition to these continuous streamflow databases, there exist databases of streamflow indices on a global scale (Do, Gudmundsson, Leonard, & Westra, 2018) and on a continental scale, for example, the African database of hydrometric indices (Tramblay, Rouché, et al., 2020). Catchment attributes and information on water storage and degree of regulation on a global scale are made available by the HydroATLAS database (Linke et al., 2019) or the Catchment Attributes and Meteorology for large-sample studies datasets, which are available for a selection of countries (Addor, Newman, Mizukami, & Clark, 2017; Alvarez-Garreton et al., 2018; Chagas et al., 2020; Coxon et al., 2020). Other databases focus on droughts and floods specifically, such as the Global Drought and Flood Catalogue (He, Pan, Wei, Wood, & Sheffield, 2020) or the Dartmouth Flood Observatory, which compiles information about large floods at the global scale from satellite data (Kettner & Brakenridge, 2020). Ideally, various databases would provide data in the same format and be accessible through harmonized workflows including Application Programming Interfaces and open-source packages (Slater, Thirel, et al., 2019).
2. Exploiting new data sources: We highlight the potential of new data sources, such as remotely sensed data measured by satellites or drones (unmanned aerial vehicles) and data collected by citizen scientists, for complementing in situ observations. Such additional data sources may be particularly valuable in regions that have sparse in situ measurement networks such as the African continent (Tramblay, Rouché, et al., 2020). Remotely sensed data can provide a better impression of spatial variability than in situ observations because of more consistent spatial coverage and data availability for trans-boundary watersheds; however, their temporal resolution is often lower than that of in situ data (days or several days versus hours). Three products with particular potential for drought and flood research are high-resolution digital elevation models (DEMs), the Gravity Recovery and Climate Experiment (Tapley, Bettadpur, Ries, Thompson, & Watkins, 2004), and the Surface Water and Ocean Topography (SWOT; Alsdorf, Rodriguez, & Lettenmaier, 2007) mission. High-resolution DEMs can improve hydrological modeling, for example, by a better representation of land-surface processes (Zhang & Montgomery, 1994) and flood inundation areas (Schumann & Bates, 2018). Measurements of earth's gravity field anomalies have recently been used to provide estimates of water surface and subsurface storage changes at regional to global scales (Tapley et al., 2004). Storage anomalies have been used to derive maximum water storage capacity and flood potential (Reager & Famiglietti, 2009), information on groundwater depletion (Famiglietti et al., 2011), or drought indices (Boergens, Güntner, Dobslaw, & Dahle, 2020; Gerdener, Engels, & Kusche, 2020).
The future SWOT mission will measure surface water elevation, slope, and a water mask, which will enable estimation of surface water storage and fluxes in rivers, lakes, reservoirs, and wetlands at the global scale at 100 m spatial resolution (Biancamaria et al., 2016). These data are potentially valuable for deriving flood inundation areas (Frasson, Schumann, Kettner, Brakenridge, & Krajewski, 2019), discharge in ungauged catchments, and information on human activity such as water withdrawals. However, the data will only be available at an uneven and longer than daily temporal resolution and therefore not necessarily allow for the detection of shorter events such as flash floods. Data at finer spatial and temporal resolution can be collected by drones, which have been previously used to map evapotranspiration (Wang et al., 2019), measure flow velocity (Eltner, Sardemann, & Grundmann, 2020), and estimate streamflow (Kang, Kim, Kim, & Kang, 2019;Tauro, Petroselli, & Arcangeletti, 2016). Citizen science and crowdsourcing can provide local water level measurements and snow depth observations for model calibration in remote regions (Etter, Strobl, Seibert, & van Meerveld, 2020;Hill, Wolken, Jones, Crumley, & Arendt, 2018) and for characterizing process heterogeneity or human impacts on the water cycle (Buytaert et al., 2014). In addition, they can provide local data for real-time flood and drought monitoring (Douvinet, Kouadio, Bonnet, & Gensel, 2017;See, 2019), particularly in urban areas (Wang, Mao, Wang, Rae, & Shaw, 2018). However, these data are often discontinuous and associated with large uncertainties (Buytaert et al., 2014). Various methods have been proposed to generate data through engagement of the public. 
Water level observations can be sourced for fixed staff gauges through the transmission of text messages (CrowdHydrology; Lowry & Fienen, 2013) or for virtual staff gauges through a smartphone app (CrowdWater; Seibert, Strobl, Etter, Hummer, & van Meerveld, 2019). Information on wet and dry conditions can be collected from participants through online platforms (CoCoRaHS; NOAA, 1998). Flood monitoring and inundation mapping are enabled thanks to social media, for example, Twitter posts, and crowdsourced data such as photos (Eilander, Trambauer, Wagemaker, & Van Loenen, 2016; Fohringer, Dransch, Kreibich, & Schröter, 2015; Wang et al., 2018).
3. Increasing sample size: Efforts need to be made to increase the size of flood and drought samples to gain a better understanding of natural drought and flood variability, drought-flood transitions, and long-term trends. One way of increasing the sample size is by reconstructing time series further back in time than the start period of the observed records, for example, using tree-rings (Meko et al., 2007; Ryberg, Vecchia, Akyüz, & Lin, 2016) or chronicles and other historical records (Blöschl, Kiss, Viglione, Barriendos, & Böhm, 2020). Another possibility to increase the sample size is the use of stochastic simulation approaches that allow hydrologists to generate very long streamflow time series or large sets of extreme events with similar statistical characteristics as the observed series (Ilich, 2014; Lall & Sharma, 1996; Salas, Ramirez, Burlando, & Pielke, 2003; Vogel & Stedinger, 1988). Long streamflow time series can also be generated in a more physically-based way by generating an ensemble of streamflow time series using a variety of climate/meteorological forcings as input to a hydrological model. Ensembles of climate simulations are used to enhance the sample size of forcings and obtain a better understanding of uncertainties.
Examples of such ensemble-based simulation approaches include Single-Model Initial-condition Large Ensembles (Poschlod, Willkofer, & Ludwig, 2020; van der Wiel, Wanders, Selten, & Bierkens, 2019) and the UNprecedented Simulated Extreme ENsemble approach, which increases the sample size by pooling ensemble members and lead times from seasonal prediction systems (Kelder et al., 2020; Thompson et al., 2017).
4. Event identification: Unifying drought and flood definitions is possible if focusing on below- and above-threshold events for droughts and floods, respectively. Whereas flood frequency analyses usually focus on maximum peak discharge or flood volume, drought analysis addresses both maximum values (e.g., drought duration and deficit volume) and minimum values (e.g., low flow), depending on the purpose of the study and the variable of interest. In either case, the choice of distribution should be guided by the type of extreme, that is, whether it is a maximum or minimum value. In the case of maximum values, the distribution chosen should preferably not be bounded above, whereas for minimum values a lower bound of zero should apply (Tallaksen, Madsen, & Hisdal, 2004). For studying floods and droughts in a regional context, we need to develop tools for identifying spatio(-temporal) events from time series of spatial fields, which is easier when using gridded than point data. Spatial events can be identified by looking at spatial extent (Kemter, Merz, Marwan, Vorogushyn, & Blöschl, 2020) or maximum cluster area (Tallaksen & Stahl, 2014), and spatio-temporal events can be defined using Lagrangian approaches aggregating contiguous areas under drought/flood into clusters (Herrera-Estrada et al., 2017) or approaches based on almost-connected component labeling (Chang et al., 2016).
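The idea of grouping contiguous grid cells under drought or flood into clusters can be illustrated with plain connected-component labeling on a single time step. This sketch uses strict 4-neighbour connectivity, which is simpler than the almost-connected variant or the Lagrangian tracking cited above. The function names and the Boolean input mask (True where a cell is below or above its threshold) are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List


def label_clusters(mask: List[List[bool]]) -> List[List[int]]:
    """Label 4-connected clusters of True cells; 0 marks cells outside any cluster."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue
            current += 1                       # start a new cluster
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:                       # breadth-first flood fill
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and mask[ni][nj] and not labels[ni][nj]):
                        labels[ni][nj] = current
                        queue.append((ni, nj))
    return labels


def largest_cluster_area(mask: List[List[bool]]) -> int:
    """Number of cells in the largest cluster (0 if no cell is affected)."""
    counts: Dict[int, int] = {}
    for row in label_clusters(mask):
        for lab in row:
            if lab:
                counts[lab] = counts.get(lab, 0) + 1
    return max(counts.values(), default=0)


# Hypothetical 3 x 3 field: True where the cell is under drought
field = [[True, True, False],
         [False, True, False],
         [False, False, True]]
area = largest_cluster_area(field)
```

Applying the labeling to each time step and linking overlapping clusters between consecutive steps would be one way to extend this to spatio-temporal events. On a regular latitude-longitude grid, cell counts would also need to be weighted by cell area.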

| PROCESS UNDERSTANDING
The analysis of hydrologic extremes is complicated by multiple driving mechanisms, their multivariate characteristics, for example, duration and peak flow, and their spatial dimension, that is, they are regional phenomena often affecting several catchments at once. Furthermore, droughts and floods may be characterized by non-stationarities arising through natural oscillations or long-term trends introduced by climate or land use change and water management.

| Multiple driving mechanisms
Floods and droughts can arise from different event triggering mechanisms, for example, heavy precipitation or snowmelt in the case of floods (Berghuijs et al., 2016; Bertola, Viglione, Hall, & Blöschl, 2020) or a lack of precipitation or high evapotranspiration losses in the case of droughts (Hanel et al., 2018; Woodhouse et al., 2016). For floods, a variety of classification schemes have been proposed, which rely on meteorological forcing and catchment state to distinguish between flood generation types (Merz & Blöschl, 2003; Sikorska, Viviroli, & Seibert, 2015; Stein, Pianosi, & Woods, 2019; Tarasova et al., 2019) such as flash floods, short rain floods, long rain floods, excess rainfall floods, rain/snowmelt floods, and snowmelt floods. For droughts, Van Loon and Van Lanen (2012) proposed a typology based on drought propagation processes, which distinguishes between six hydrological drought types: rainfall deficit droughts, rain-to-snow-season droughts, wet-to-dry-season droughts, cold snow season droughts, warm snow season droughts, and composite droughts. They later extended the scheme by two additional classes for cold climates: snowmelt and glacier melt droughts (Van Loon et al., 2015). Droughts can alternatively be classified by their development speed, for example, when separating flash droughts with a rapid intensification from slowly developing droughts (Otkin et al., 2018). Considering such diverse event types remains challenging in frequency and trend analysis, where seasonality plays an important role.

| Multivariate characteristics
Different flood and drought properties, such as magnitude, frequency, seasonality or spatial extent, may be of interest for different applications. In the case of floods, magnitude can be described by peak discharge, flood volume, and duration (Mediero, Jiménez-Alvarez, & Garrote, 2010) while drought magnitude can be expressed by duration, deficit volume, and minimum flow (Wilhite, 2000) (for an illustration see Figure 3). These variables are related and their joint consideration in frequency analysis requires characterizing and modeling variable dependencies, for example, between flood peak and volume (Figure 3c; Brunner, Seibert, & Favre, 2016) or between drought duration and deficit (Figure 3d; Brunner, Liechti, & Zappa, 2019). Variable dependencies also need to be considered when determining the frequency of compound events (Zscheischler et al., 2018), for example, the co-occurrence of floods with high tides or of droughts with heatwaves.
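Before modeling the dependence between event characteristics, it can help to quantify how strongly they are related. The sketch below computes Kendall's tau, a rank-based dependence measure, for paired samples such as flood peaks and volumes. This measure is chosen here for illustration and is not named in the text above. The function name and the sample values are hypothetical.

```python
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant pairs) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1        # concordant pair
            elif product < 0:
                score -= 1        # discordant pair
    return score / (n * (n - 1) / 2)


# Hypothetical flood peaks (m^3/s) and event volumes (10^6 m^3)
peaks = [10.0, 20.0, 30.0, 40.0]
volumes = [1.0, 3.0, 2.0, 4.0]
tau = kendall_tau(peaks, volumes)
```

A value near 1 indicates that large peaks tend to coincide with large volumes, in which case treating the two variables as independent would misrepresent the joint frequency of events.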

| Spatial dimension
Floods and even more so droughts are spatial phenomena, that is, they can affect a larger region, which can span more than one country (Kemter et al., 2020;Rossi, Benedini, Tsakiris, & Giakoumakis, 1992). Areal drought and flood indices that account for spatial extent, as well as regional hazard estimates, are needed to prepare for and adapt to large-scale events, for example, by establishing regional emergency plans or suitable water transfer schemes from water-abundant to water-scarce regions (Patterson, Lutz, & Doyle, 2013). The derivation of such estimates requires the inclusion of areal extent in the flood/drought index or as a feature of the analysis itself, for example, the severity-area-frequency (SAF) approach, which calculates the probability of a specific area to be affected by a drought of a given severity (Hosking & Wallis, 1997;Tallaksen et al., 2004), and/or modeling spatial dependencies of extreme events co-occurring in different catchments (Brunner, Gilleland, Wood, Swain, & Clark, 2020). Still, spatial dependencies are often neglected in risk analyses by assuming complete dependence between sites (i.e., the co-occurrence of flood events with the same return period at all sites), which has been shown to lead to a misestimation of flood risk (Lamb et al., 2010).
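The effect of the dependence assumption on regional risk estimates can be made concrete with a small calculation. The sketch compares the two limiting cases, complete dependence and complete independence between sites, for the annual probability that all sites, or at least one site, exceed their T-year flood. Real catchments lie somewhere between these bounds. The function names and example numbers are hypothetical.

```python
def prob_all_exceed(return_period: float, n_sites: int, complete_dependence: bool) -> float:
    """Annual probability that ALL sites exceed their T-year flood."""
    if return_period <= 1 or n_sites < 1:
        raise ValueError("need return_period > 1 and n_sites >= 1")
    p = 1.0 / return_period
    # Complete dependence: exceedances always coincide, so the joint probability is p.
    # Independence: the site probabilities multiply.
    return p if complete_dependence else p ** n_sites


def prob_any_exceeds(return_period: float, n_sites: int, complete_dependence: bool) -> float:
    """Annual probability that AT LEAST ONE site exceeds its T-year flood."""
    if return_period <= 1 or n_sites < 1:
        raise ValueError("need return_period > 1 and n_sites >= 1")
    p = 1.0 / return_period
    return p if complete_dependence else 1.0 - (1.0 - p) ** n_sites


# Hypothetical region with 5 sites and 100-year floods
all_dep = prob_all_exceed(100.0, 5, complete_dependence=True)
all_ind = prob_all_exceed(100.0, 5, complete_dependence=False)
any_ind = prob_any_exceeds(100.0, 5, complete_dependence=False)
```

In this example, assuming complete dependence puts the probability of a simultaneous 100-year flood at all five sites at 1 in 100 per year, whereas independence would make it vanishingly small. The gap between these bounds is why explicit modeling of spatial dependence matters for regional risk.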

| Non-stationarities
Non-stationarities in hydrological extremes, that is, natural oscillations or long-term trends, may be caused by climate variability, climate change, land use changes, or water management changes. Determining whether these changes are short-term variability or part of a long-term trend is difficult due to the limited time series that are typically available (see Section 2.1.1). In addition, attributing non-stationarities to a certain driver is challenging because of the confounding nature of different drivers (Ryberg, Lin, & Vecchia, 2014;Slater et al., 2020).
1. Climate: Climate variability and long-term change have been shown to affect both local and spatial characteristics of floods and droughts including the timing and frequency of floods (Mallakpour & Villarini, 2015), the duration of droughts (Ge, Apurv, & Cai, 2016), the spatial extent of floods (Kemter et al., 2020) and droughts (Rudd, Kay, & Bell, 2019), and the type of floods (Chegwidden, Rupp, & Nijssen, 2020; Sikorska-Senoner & Seibert, 2020) and droughts (Bumbaco & Mote, 2010). Many studies have assessed short-term (e.g., 30-year) trends in floods (e.g., Archfield, Hirsch, Viglione, & Blöschl, 2016; Bertola et al., 2020; Delgado, Apel, & Merz, 2010; Hodgkins et al., 2017; Mangini et al., 2018; Nka, Oudin, Karambiri, Paturel, & Ribstein, 2015; Slater & Villarini, 2016) and droughts across river basins, countries, and entire continents (e.g., Ge et al., 2016; Nikbakht, Tabari, & Talaee, 2013; Stahl, Tallaksen, Hannaford, & Van Lanen, 2012). However, detection of a short-term trend does not necessarily mean there is a long-term trend in a hydrological record, as changes in "flood-rich"/"drought-rich" and "flood-poor"/"drought-poor" periods often occur on time scales that are much longer than the length of the historical record (Liu & Zhang, 2017; Lun, Fischer, Viglione, & Blöschl, 2020; Mediero et al., 2015; Merz, Nguyen, & Vorogushyn, 2016). Nevertheless, even in the absence of long records, signals of change may be detected using areal models that pool information across catchments (Prosdocimi et al., 2019).
2. Land use and channel morphology: Land-use changes such as urbanization or changes in forest cover and changes in river channels may change runoff conditions and flood and drought generation processes. Urbanization, for example, may lead to faster surface runoff and rapid flooding, while the lack of natural storage in the basin may make the catchment more vulnerable to drought (Oudin et al., 2018).
Similarly, deforestation can change flood and drought intensities because of effects on evapotranspiration, precipitation, and runoff response (Bradshaw, Sodhi, Peh, & Brook, 2007;Staal et al., 2020). In addition, river channel changes through, for example, channel straightening or construction of flood protection structures, can change flow properties and the frequency of overbank flooding (Munoz et al., 2018;Sofia & Nikolopoulos, 2020). However, disentangling the influence of land cover changes on extreme events from those of climate change is challenging as illustrated by a growing number of studies that have attempted to do so (Blum, Ferraro, Archfield, & Ryberg, 2020;Bradshaw et al., 2007;Staal et al., 2020). 3. Water management: Floods and droughts are directly influenced by water management and vice versa. Hydropower production, water abstractions, water diversions, and irrigation can all aggravate or alleviate droughts and floods (He et al., 2017;Tijdeman et al., 2018;van Oel et al., 2018;Verbunt et al., 2005) and they can change the temporal succession of drought and flood events. Similarly, drought and flood occurrence can influence water management strategies. However, adaptation measures taken to accommodate one extreme may not necessarily be beneficial for the other (Ward et al., 2020). In addition, climate change may lead to changes in well-established operational management practices, which may potentially affect the balance between the two types of extremes. For example, increased precipitation may lead to larger dams being built, while higher temperatures may lead to higher evaporation losses, reduced storage, and potentially low flows downstream of the dam. 
However, these feedbacks between hydrological extremes and people are not fully understood (Razavi, Gober, Maier, Brouwer, & Wheater, 2020;Van Loon et al., 2016) under either current or future climate conditions, partly because the degree of regulation is often not well documented (Arheimer, Donnelly, & Lindström, 2017).
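As a hedged illustration of the trend assessments discussed under item 1, the sketch below computes the Mann-Kendall statistic, a rank-based test commonly used for monotonic trends in hydrological records. The choice of test is an assumption made here for illustration (the text does not name a specific method), and the function name `mann_kendall` is hypothetical. The variance formula assumes no tied values.

```python
import math


def mann_kendall(series):
    """Return (S, Z) of the Mann-Kendall trend test, assuming no tied values."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance without tie correction
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity-corrected test statistic
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z
```

A large absolute Z computed on a short window (e.g., 30 years) indicates a monotonic tendency within that window only. As noted above, it does not by itself establish a long-term trend when flood-rich and flood-poor periods alternate on longer time scales.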

| Future changes
In addition to changes in water management strategies, changes in streamflow seasonality may affect both floods and droughts and their interplay. For instance, less snow and earlier snowmelt may cause smaller snowmelt floods (Chegwidden et al., 2020), a longer snow-free season can increase the potential for summer drought to develop (Huning & AghaKouchak, 2020), and changes in both drought and flood seasonality may increase or decrease the likelihood of rapid drought-flood transitions. Future change assessments are often performed by combining global and regional climate models and hydrological models, or by combining statistical regression models with climate model outputs (hybrid "dynamical-statistical" approaches). Such assessments are hampered by different uncertainty sources introduced in the modeling chain through emission scenarios, climate model structures, initial conditions, downscaling methods, hydrological model structures, and parameter estimation (Clark, Wilby, et al., 2016). In addition, such assessments often focus on changes in the target variable without focusing on changes in the frequency of the underlying generating mechanism (e.g., rainfall or snowmelt), which may alter the frequency of occurrence of various flood and drought types (Sikorska-Senoner & Seibert, 2020).

| Tackling challenges
We suggest that the following actions are needed to improve our understanding of drought and flood occurrence and their interplay: (1) study droughts and floods in a joint framework, (2) study event co-occurrence at multiple sites, (3) investigate compounding drivers of extremes, (4) study compound events, and (5) analyze societal influences on extremes and vice versa. Here, we will tackle these in turn.
1. Study droughts and floods in a joint framework: To understand temporal transitions between droughts and floods, the two phenomena must be studied in a joint framework, for example, using continuous instead of event-based simulations. Joint assessments will enable analyses of transition times between extremes and identification of how post-drought water management decisions influence future flood management. 2. Study the co-occurrence of hydrologic extremes at multiple sites: To improve our understanding of floods and droughts as regional phenomena, we need to assess spatial cross-correlations, analyze spatial extents/patterns of past extreme events, and assess the ability of models to capture these patterns. Measures for analyzing regional extremes include spatial extent, for example, the number or percentage of grid cells under extreme conditions (Rudd et al., 2019;Sheffield & Wood, 2008;Tallaksen & Stahl, 2014); the synchrony scale proposed by Berghuijs, Allen, Harrigan, and Kirchner (2019), which quantifies the area over which multiple catchments are jointly under extreme conditions; and a connectedness measure that allows for mapping pairs of catchments co-experiencing extreme events. Even though some of these measures were initially proposed for floods, they can be adapted to the drought context. 3. Investigate compounding drivers of extremes: We can potentially advance our predictive skills for future extremes by better understanding the compounding drivers of extremes and their future changes. Previous studies have shown that increases in precipitation may not necessarily lead to increases in flood peaks (Wasko & Nathan, 2019) and we may therefore profit from learning more about the interplay between soil moisture, rainfall, evapotranspiration, and snowmelt (Berghuijs et al., 2016;Blöschl et al., 2015;Stein et al., 2019). 
Similarly, we may wish to focus on the role of storage in addition to precipitation and temperature when identifying suitable predictors of extremes and studying future streamflow drought development (Hao et al., 2018;Haslinger, Koffler, Schöner, & Laaha, 2014).
4. Study compound events: Floods and droughts are influenced by a range of related factors including temperature, evapotranspiration, precipitation, snowmelt, and soil moisture (Sharma, Wasko, & Lettenmaier, 2018). There is a need to better understand how these compounding climate variables and initial conditions combine to produce extreme events and impacts. In addition, floods and droughts may co-occur together with other types of extremes, such as storm surges (Ganguli & Merz, 2019;Moftakhari, Salvadori, AghaKouchak, Sanders, & Matthew, 2017) and heatwaves (Sutanto, Vitolo, Di Napoli, D'Andrea, & Van Lanen, 2020) or together with unrelated events such as COVID or locust plagues (Salih, Baraibar, Mwangi, & Artan, 2020), which can amplify their impacts. We should therefore improve our understanding of the interplay between streamflow extremes and other variables within a compound event framework (Zscheischler et al., 2018). 5. Analyze societal influences on extremes and vice versa: To increase our understanding of societal influences on extremes and of extremes on society, we need to understand feedbacks between society and hydrological extremes, for example, how flood protection measures change in response to flood occurrence or how water supply chains expand in response to drought (Di Baldassarre et al., 2019;Ward et al., 2020).
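The spatial extent measure mentioned under item 2, the percentage of grid cells under extreme conditions, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `spatial_extent`, the flat list representation of grid cells, and the use of cell-specific thresholds are all hypothetical choices, not a method prescribed by the cited studies.

```python
def spatial_extent(values, thresholds, kind="drought"):
    """Percentage of grid cells under extreme conditions at one time step.

    values     -- flow (or index) value per grid cell
    thresholds -- cell-specific threshold, e.g. a low- or high-flow quantile
    kind       -- "drought": value below threshold; "flood": value above
    """
    if len(values) != len(thresholds):
        raise ValueError("values and thresholds must have the same length")
    if kind == "drought":
        flags = [v < t for v, t in zip(values, thresholds)]
    elif kind == "flood":
        flags = [v > t for v, t in zip(values, thresholds)]
    else:
        raise ValueError("kind must be 'drought' or 'flood'")
    return 100.0 * sum(flags) / len(flags)
```

Applying the same function with low-flow and high-flow thresholds to one continuous simulation gives comparable extent series for both types of extremes, which is one simple way of treating them in a joint framework.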

| MODELING AND PREDICTION
For flood and drought prediction, we use a range of modeling tools including parametric distributions for frequency analysis and extrapolation, stochastic simulations for generating long time series of rare events, hydrological models for simulating streamflow in ungauged catchments or for future climate conditions and to interpolate time series in the case of gaps, or hydraulic models for simulating flood extents.

| Frequency analysis
Local and regional hazard estimates can be derived using observed or simulated extremes from stochastic and hydrological models for current and future climate conditions in combination with appropriate parametric models (Genest & Favre, 2007;Meylan et al., 2012). Such parametric models usually have interpretable parameters, related to mean, variance, and shape, which is vital if we want to consider change scenarios. However, it can be challenging to identify suitable parametric distributions and fit their parameters using observed records of floods and droughts because of the usually limited sample size due to short records. Regional (i.e., multiple sites) and multivariate analyses (i.e., multiple flood or drought characteristics such as duration and peak) require multivariate distributions or copula models (Genest & Favre, 2007). It is challenging to identify distributions with dependence structures flexible enough to reflect dependencies in the actual data, a problem which becomes especially pronounced once we move beyond dimension two (Favre, Quessy, & Toupin, 2018). An additional challenge in the multivariate case arises from the definition of return periods which is no longer as straightforward as in the univariate case because one needs to decide whether to work with joint (OR or AND) or conditional probabilities (Brunner et al., 2016;Gräler et al., 2013) depending on the problem at hand (Serinaldi & Kilsby, 2013). For example, if both flood magnitude and volume were equally relevant for a design problem, such as reservoir construction, one would work with a joint return period. In contrast, one would use conditional probabilities if drought deficit became a problem only for droughts of longer duration. Additional challenges arise if the extreme of interest shows monotonic trends. Such trends should ideally be reflected in the distribution chosen for analysis (López & Francés, 2013) by using time-varying parameters. 
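To make the distinction between joint OR and AND return periods concrete, the following sketch evaluates both for a pair of non-exceedance probabilities. It assumes a Gumbel-Hougaard copula purely for illustration (the text does not prescribe a copula family), and the function names and the parameter `mu` (mean inter-arrival time of events in years) are hypothetical.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel-Hougaard copula C(u, v); theta = 1 corresponds to independence."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))


def joint_return_periods(u, v, theta, mu=1.0):
    """Return (T_or, T_and) for non-exceedance probabilities u and v.

    mu is the mean inter-arrival time of events in years (1 for annual maxima).
    """
    c = gumbel_copula(u, v, theta)
    t_or = mu / (1.0 - c)           # at least one variable exceeds its threshold
    t_and = mu / (1.0 - u - v + c)  # both variables exceed their thresholds
    return t_or, t_and
```

For two variables that each have a univariate return period of 100 years (u = v = 0.99), the OR return period is never longer and the AND return period never shorter than 100 years, which is why the choice between them matters for design.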
Furthermore, the distribution of flood/drought events at any given location in space and time often results from a combination of different generating mechanisms (Villarini, 2016) that may interact (i.e., amplify/mitigate one another) over different timescales. Mixing different flood or drought types (i.e., with different underlying mechanisms) breaches the homogeneity assumption, but is still often done in frequency analysis.
Stochastic approaches that simulate extreme events (Figure 4) employ, for example, spatial extreme value models such as the conditional exceedance model by Heffernan and Tawn (2004) (Diederen, Liu, Gouldby, Diermanse, & Vorogushyn, 2019;Keef, Tawn, & Lamb, 2013), hierarchical Bayesian models (Yan & Moradkhani, 2015), the multivariate skew-t distribution (Ghizzoni, Roth, & Rudari, 2010), or copula-based approaches, representing the dependence structure between variables, including pair-copula constructions (Bevacqua, Maraun, Hobaek Haff, Widmann, & Vrac, 2017;Schulte & Schumann, 2015), Student-t copulas (Ghizzoni, Roth, & Rudari, 2012), dynamical conditional copulas (Serinaldi & Kilsby, 2017), or the Fisher copula (Brunner, Furrer, & Favre, 2019). These stochastic approaches are often event-based, that is, they simulate individual extreme events instead of continuous flow time series (Winter, Schneeberger, Förster, & Vorogushyn, 2020), which precludes analyzing transitions between drought and flood events. Existing continuous stochastic approaches such as autoregressive moving average type models (Stedinger & Taylor, 1982) or bootstrap approaches (Rajagopalan, Salas, & Lall, 2010) do not represent spatial dependencies well in their original formulation. Additional challenges arise if we are interested in regions without streamflow observations, which cannot be directly included in model fitting (Hrachowitz et al., 2013), or in simulating under climate or management conditions different from those employed for model fitting.
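The difference between event-based and continuous stochastic simulation can be illustrated with a very simple single-site example. The sketch below generates a continuous daily flow series from a first-order autoregressive process in log space and labels each time step by fixed thresholds. It is a toy stand-in for the autoregressive moving average type models mentioned above, not a reproduction of any cited model, and all names and parameter values (`phi`, `mu`, `sigma`, `low`, `high`) are hypothetical.

```python
import math
import random


def simulate_ar1_flow(n_days, phi=0.9, mu=1.0, sigma=0.8, seed=42):
    """Simulate a daily flow series as exp(AR(1)) so that flows stay positive."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # start from the stationary distribution
    flows = []
    for _ in range(n_days):
        flows.append(math.exp(mu + sigma * x))
        # innovation scaled so that x keeps unit variance
        x = phi * x + math.sqrt(1.0 - phi ** 2) * rng.gauss(0.0, 1.0)
    return flows


def classify(flows, low, high):
    """Label each time step as 'drought', 'flood', or 'normal' by fixed thresholds."""
    return ["drought" if q < low else "flood" if q > high else "normal" for q in flows]
```

Because the series is continuous, the time between a "drought" label and the next "flood" label can be read directly from the output, which an event-based simulation does not allow. Being a single-site model, the sketch shares the limitation noted above: it does not represent spatial dependencies.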

| Hydrological modeling
The use of hydrological and land-surface models enables generating streamflow time series for current and future climate conditions. Classical hydrological modeling approaches typically simulate continuous streamflow time series containing both drought and flood events. However, such models do not necessarily represent well flood timing and magnitude, drought onset and termination, transitions between the two types of extremes, spatial dependencies, and variable dependencies, due to trade-offs in model calibration, limited model structures, lack of direct human impact representation, or lack of data for model fitting.
1. Model calibration: Different calibration metrics usually focus either on low or high flows, and focusing on one type of extreme may result in performance decreases for the other type of extreme (Kollat, Reed, & Wagener, 2012;Pool, Vis, Knight, & Seibert, 2017) and for the representation of event transitions. In addition, the use of calibration metrics said to be ideal for certain applications (e.g., Nash-Sutcliffe-efficiency-based metrics for floods) does not guarantee a reliable reproduction of extremes (Lane et al., 2019;Mizukami et al., 2019). While model calibration for extremes is already challenging at the local scale, it gets even more demanding in a regional context. Spatial evaluation metrics are relatively rare (Dembélé, Hrachowitz, Savenije, & Mariéthoz, 2020;Koch, Demirel, & Stisen, 2018) and have hardly been applied in the context of extremes, which may lead to a misrepresentation of spatial flood and drought dependencies (Prudhomme et al., 2011). Furthermore, model calibration may lead to spatially discontinuous parameter sets, which may be problematic in large-domain modeling. 2. Model structures: The simulation of hydrological extremes depends on model structure and how extreme-triggering mechanisms are parameterized as shown by Melsen and Guse (2019) for droughts and by Kempen, Wiel, and Melsen (2020) for high- and low-flow events. Processes particularly challenging to represent include soil water repellency (i.e., hydrophobic soil behavior; Doerr et al., 2003), persistence, which depends on storage, for example, soil water holding capacity (Tallaksen & Stahl, 2014), or spatial patterns in snow cover or evaporation (Hrachowitz & Clark, 2017). 3. Data assimilation: Data assimilation is often used in hydrologic modeling and forecasting to reduce uncertainty in simulations of model states, for example, in soil moisture or snow-water equivalent, and hence improve simulations and forecasts of hydrological processes into the future. 
Therefore, data assimilation is very useful to improve simulations of drought and improve forecasts of floods. Data assimilation methods use observations to update model state variables (Weerts & El Serafy, 2006), where the size of the state update depends on the relative uncertainty in models and observations. Challenges in data assimilation lie in quantifying and accounting for uncertainties in model inputs, parameters, and model structure (Liu & Gupta, 2007), in improving computational efficiency (Moradkhani, Hsu, Gupta, & Sorooshian, 2005;Sun, Seidou, Nistor, & Liu, 2016), and in assimilating multiple types of observations such as streamflow, soil moisture, and snow-water-equivalent (Bergeron, Trudel, & Leconte, 2016). 4. Water management representation: The simulation of hydrological extremes can be improved by considering human influences (Veldkamp et al., 2018). However, it is challenging to set up rainfall-runoff models in watersheds with significant regulation because information on how such regulations are implemented is rare at a regional and local scale. If regulations are considered, simplistic methods are often used to represent, for example, reservoir operation (Yassin et al., 2019). Exceptions are management models particularly designed for a catchment or a set of catchments. In addition, drought and flood management is hardly addressed jointly (Di Baldassarre et al., 2019). 5. Future projections: Streamflow time series for future conditions can be simulated by using a hydrological model in combination with forcing time series from a variety of emission pathways and general circulation models (GCMs), which are often statistically or dynamically downscaled using statistical or regional climate models, respectively (Clark, Wilby, et al., 2016;IPCC, 2014) (for an illustration of a typical workflow see Figure 5). Future simulations of extremes are hampered by diverse sources of uncertainty. 
These include uncertainties in the climate forcing, internal climate variability, regional downscaling, and bias correction (Clark, Wilby, et al., 2016). Uncertainties in the climate forcing arise due to uncertainties related to emission scenarios (Moss et al., 2010) and because of uncertainties related to the use of general circulation models (GCMs), which often do not capture processes essential for streamflow generation at the local scale. Statistical or dynamical downscaling (i.e., through regional circulation models) approaches are therefore used to retrieve high-resolution climate variables from GCM output (Hakala et al., 2019). Additionally, hydrological models may also struggle to produce reliable model simulations under future conditions (Thirel, Andréassian, & Perrin, 2015). Such inability may be related to inadequate snow parameterization (Melsen, Vos, & Boelens, 2018), model structures with inadequate vegetation dynamics, inhomogeneous precipitation input (Duethmann, Blöschl, & Parajka, 2020), or non-transferable model parameters (Coron et al., 2012;Fowler, Peel, Western, Zhang, & Peterson, 2016;Thirel et al., 2015). 6. Dependencies representation: Spatial flood dependencies have been shown to be underestimated by hydrological models (Prudhomme et al., 2011), which means that simulated flood events are less spatially coherent than observed ones, that is, it is less likely to find widespread flood events in the simulations than in the observations. The misrepresentation of spatial dependencies is likely related to input uncertainty. On the one hand, the precipitation product used to drive the models (Te Linde, Aerts, Dolman, & Hurkmans, 2007) may already have a suboptimal representation of spatial dependencies because spatial smoothing or averaging during the gridding process reduces variability (Risser, Paciorek, Wehner, O'Brien, & Collins, 2019). 
On the other hand, land cover and subsurface properties also influence spatial dependency, which may not be well reproduced if such characteristics are not well represented in models. In the case of calibrated models, a misrepresentation of spatial dependencies may also be related to the lack of spatial calibration strategies and uncertainty of streamflow observations (McMillan, Freer, Pappenberger, Krueger, & Clark, 2010) used for model calibration. In addition to spatial dependencies, dependencies between different flood/drought characteristics are not necessarily well represented by models as illustrated by Brunner and Sikorska (2018) for flood peaks and volumes. 7. Ungauged catchments: In catchments that lack streamflow observations, flood and drought prediction relies either on statistical methods such as regional frequency analysis or on hydrological modeling with model parameters inferred using data from similar catchments for which observations are available (Hrachowitz et al., 2013). Regionalization approaches used to transfer model parameters may establish similarity using alternative data such as catchment and meteorological characteristics or remotely-sensed data if available.
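The statement under item 3 that the size of a data assimilation update depends on the relative uncertainty in models and observations can be illustrated with a scalar Kalman-type update. This is a minimal sketch assuming a single state (e.g., soil moisture or snow-water equivalent) that is observed directly with Gaussian errors. The function name `update_state` and its arguments are hypothetical, and operational schemes such as ensemble filters are considerably more involved.

```python
def update_state(x_model, var_model, y_obs, var_obs):
    """Scalar Kalman-type update of a model state with one direct observation."""
    gain = var_model / (var_model + var_obs)    # weight given to the observation
    x_new = x_model + gain * (y_obs - x_model)  # state moved toward the observation
    var_new = (1.0 - gain) * var_model          # uncertainty reduced by the update
    return x_new, var_new, gain
```

With equal model and observation variances the gain is 0.5 and the updated state lies halfway between simulation and observation. As the observation variance grows, the gain approaches zero and the model state is left almost unchanged.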

| Earth system modeling
Earth system models (ESMs) through their land surface model (LSM) component simulate interactions and feedbacks between the atmosphere, ocean, land, cryosphere and biosphere, to estimate the state of regional and global climate (Heavens, Ward, & Natalie, 2013). The land plays an important role in ESMs through its impact on the carbon cycle, and the energy and water budget. LSMs describe in detail the vertical exchange of heat, water (and carbon) at the earth surface-atmosphere interface. However, in contrast to most hydrological models, LSMs have a rudimentary representation of hydrological processes, particularly related to groundwater and lateral flow (Clark, Fan, et al., 2015). Thus, there is a need to advance hydrology in LSM development by improving the representation of subsurface properties and their control on hydrological response; ultimately enabling river discharge to be simulated at the spatio-temporal resolution needed for water management, such as hourly simulations for flash floods. Additional challenges include a realistic representation of land-atmosphere feedbacks, such as the positive feedback between dry and hot conditions (Dirmeyer et al., 2012;Saini, Wang, & Pal, 2016). As for large-scale hydrological models, physical processes, such as evapotranspiration and soil fluxes, are often simplified and in some cases determined through calibration against observed discharge.

| Hydrodynamic modeling
To simulate inundation areas in addition to flood flows, hydrological models are often coupled with a hydraulic model (Sampson et al., 2015;Wing et al., 2017;Zischg, Felder, Mosimann, Röthlisberger, & Weingartner, 2018). Such hydro-dynamic models should be able to accurately represent both the hydrology and geomorphology of river systems, both in the present and in the future. While the current generation of hydro-dynamic models is increasingly skillful at modeling flood flows, accurate estimation of floodplain inundation also requires precise representation of the river channels that convey those flows (e.g., Slater, Singer, & Kirchner, 2015). Considering changes in channel morphology is not only relevant for high but also for low flows because of their influence on bed infiltration and their influence on the rating curve. However, the current generation of models represents river systems as a stationary network with constant river channels.
[Figure 5 caption, partially recovered: ... running global circulation models (GCMs), (c) downscaling and bias correction using statistical models or regional circulation models, (d) hydrological model calibration and evaluation, and (e) simulation of streamflow time series.]
1. Spatial frequency models: Assessments of the frequency of regional extreme events require approaches incorporating the affected area as a variable. Existing approaches for performing such assessments in a stationary setting include severity-area-frequency curves as proposed for drought, which enable estimation of the probabilities of droughts with certain severities and spatial extents (Henriques & Santos, 1999;Hisdal & Tallaksen, 2003); severity-area-duration curves (Andreadis et al., 2005;Sheffield, Andreadis, Wood, & Lettenmaier, 2009); and stochastic models for spatial drought (max-stable models; Oesting & Stein, 2018) and flood events (conditional exceedance models and copula-based approaches; Keef et al., 2013). Such spatial frequency approaches, especially the severity-area-duration and -frequency approaches, have in the past mainly been applied in the drought context but can be easily extended to the flood context. All types of approaches need to be extended to non-stationary settings to enable frequency analyses of regional events in a changing world. 2. Mixture models: The use of inhomogeneous samples, for example, a sample of flood events of different genesis, violates the assumption of independent identically distributed (iid) events (Klemes, 2000). While this has been common knowledge for a long time, different event types are often pooled in hazard analyses, for example, by analyzing all peak-over-threshold events jointly. Hazard estimation would therefore profit from introducing models that respect the iid assumption such as the seasonal mixture model by Fischer (2018), which separates events occurring in different seasons and considers different event types to derive a mixing distribution. Alternatively, flood or event types can be assessed in separate frequency analyses as suggested by  and Brunner, Viviroli, et al. (2017) for design hydrograph construction. Similar approaches as for floods could be used in the case of droughts. 3. 
Continuous stochastic models: We need to develop continuous stochastic simulation approaches for joint drought and flood assessments because event-based approaches currently only focus on one type of extreme (Diederen et al., 2019;Le, Leonard, & Westra, 2019). Such continuous approaches can be indirect or direct. Indirect, continuous modeling approaches (corresponding to discrete-time models in the stochastic literature) simulate continuous streamflow series by combining a stochastic weather generator with a deterministic hydrological model potentially with perturbed model coefficients (Montanari & Koutsoyiannis, 2012;Vogel, 2017;Winter et al., 2019), while direct approaches simulate streamflow directly (e.g., Brunner, Bárdossy, & Furrer, 2019;Chen et al., 2019;Kelman, 1980;Stedinger & Taylor, 1982). In other words, the stochasticity in indirect approaches is introduced through the stochastically generated meteorological forcing while stochasticity in direct approaches is added to streamflow directly. Indirect approaches have also been called "stochastic watershed models" and have the advantage that they may account for anthropogenic influences in a straightforward way (Vogel, 2017). Direct continuous approaches should be further developed by representing non-stationarities (Kwon, Lall, & Khalil, 2007;Lee & Ouarda, 2012) and by improving representation of spatial and temporal long-range dependencies (Efstratiadis, Dialynas, Kozanis, & Koutsoyiannis, 2014) to enable studying drought-flood transitions in regional and change contexts. 4. Non-stationary models: A continued discussion on how to best deal with non-stationarities in frequency analyses is needed. One possibility of dealing with non-stationarities in frequency analyses is to work with non-stationary distributions whose parameters vary with time, climate variables/indices (López & Francés, 2013), or reservoir indices (Xiong et al., 2019). 
Another possibility is to fit separate stationary distributions for time periods with different streamflow or climate characteristics, for example, a past and current period that may best resemble the future, a period representing current and future climate conditions (Šraj, Viglione, Parajka, & Blöschl, 2016), a period prior to and after reservoir construction (Wang et al., 2017), or periods representing different phases of large-scale climate indices that are known to drive regional extremes. Using a stationary model on different time windows has the advantage of requiring fewer parameters and therefore may minimize uncertainty, although the use of short time windows may also have the effect of increasing uncertainty. One potential approach to avoid short time windows may be to work instead with moving (overlapping) longer windows. In contrast, working with a non-stationary model may circumvent the difficulties associated with defining two or more partly arbitrary time periods. 5. Novel calibration strategies for hydrological models: We see a need for developing flexible multiobjective calibration procedures, which consider spatial aspects of extremes and result in dependable simulations of both types of extremes under current and future conditions. Representation of spatial features could be improved by explicitly considering them in calibration by using metrics such as the SPAtial EFficiency metric (Dembélé et al., 2020;Koch et al., 2018) or by using multiscale parameter regionalization techniques enabling estimation of spatially consistent parameter sets (Samaniego, Kumar, & Attinger, 2010). Joint simulations of droughts and floods could be improved by minimizing errors in both types of extremes through the development of suitable multiobjective procedures. Simulation under future conditions requires reliable process representation, which needs to be verified for current climate conditions. 
Parameter sets realistically reflecting processes may be obtained using multivariate calibration frameworks by considering evaporation, soil moisture and water storage in addition to streamflow, which may, however, lead to trade-offs among variables (Dembélé et al., 2020;Seibert, 2000). Alternatively, split-sample tests, where the model is calibrated, for example, on a wet period and evaluated on a dry period, have been proposed to identify parameter sets still reliable under climate conditions different from the ones used in calibration (Fowler et al., 2018;Refsgaard et al., 2014;Seibert, 2003;Thirel et al., 2015). 6. Improve process representation in hydrological models: Modeling of extremes would profit from model structures with an improved process representation (Clark, Schaefli, et al., 2016), for example, of the subsurface (Clark, Fan, et al., 2015) or of snow processes (Magnusson et al., 2015). Such representation can potentially be achieved by using models with more parameters because they provide more flexibility for process representation (Knoben, Freer, Peel, Fowler, & Woods, 2020). Suitable or "best" model structures can be determined using process-based model evaluation (Clark, Fan, et al., 2015) or within a model-intercomparison framework, such as the modular assessment of rainfall-runoff models toolbox MARRMoT (Knoben, Freer, Fowler, Peel, & Woods, 2019), the Framework for Understanding Structural Errors (Clark et al., 2008), or the hydrological modeling frameworks SUPERFLEX (Fenicia, Kavetski, & Savenije, 2011), SUMMA, and Raven (Craig et al., 2020). Insights into model behavior and process representation can also be gained using flux maps by inspecting different flow components (e.g., baseflow, infiltration excess, and interflow; Khatami, Peel, Peterson, & Western, 2019). 7. 
Coupled non-stationary landscape-flood-inundation models: The non-stationarity of the physical landscape is currently less well understood than non-stationarity of the water cycle (e.g., Slater, Khouakhi, & Wilby, 2019), even though it may be equally important for those seeking to obtain accurate projections of future extremes. Because of this research discrepancy, flood inundation models currently focus primarily on hydrological non-stationarity, but neglect the influence of geomorphic non-stationarity introduced through seasonal changes in vegetation, flood events, or direct human modifications including riverbed aggradation and degradation (e.g., Gregory, 2006). The confounding nature of these influences hinders the predictability of river morphology changes. A growing number of studies are now seeking to elucidate the temporal controls on river channel morphodynamics (e.g., Zen & Perona, 2020). Yet, we are still far from developing an operational capability for generating accurate estimates and predictions/projections of channel conveyance. Integrating landscape models with flood inundation models in order to better represent the feedbacks between the two, and obtain improved projections of changing flood risk in vulnerable regions of the world with dynamic river systems, remains a major long-term goal. 8. Models representing human influences: To study the influence of water management decisions on hydrologic extremes, further development of models representing human influences on the water cycle is needed, especially at a regional scale. When predicting future hydrologic extremes (Arnell et al., 2019), socioeconomic scenarios (Wada et al., 2016), future technological advancements in the water sector (Graham et al., 2018), and regional water use scenarios (Yao, Tramberend, Kabat, Hutjes, & Werners, 2017) need to be considered in addition to climate scenarios, as done, for example, by Winsemius et al. (2016) in an assessment of future flood risk.
9. Model-comparison frameworks: Model choice could be facilitated by comparing the suitability of different types of models for flood and drought estimation including statistical, hydrological, and land-surface models within a model-comparison framework. For example, van der Wiel et al. (2019) compared flood return periods estimated using a parametric distribution within a classical frequency analysis framework to empirical estimates derived using a large ensemble and Winter et al. (2020) compared flood risk estimates derived using an event-based stochastic model with an indirect modeling approach (weather generator and hydrological model). 10. Exploring added value of deep learning: Future modeling efforts should explore the potential benefits of deep learning for streamflow simulation and data assimilation. Approaches such as long short-term memory models have been shown to be particularly valuable in a regional modeling context because they can incorporate information on catchment characteristics (Kratzert et al., 2019). If such models are to be used in a climate change context, model interpretability has to be improved, for example, by visualizing the outputs from the hidden layers of deep networks (Shen, 2018). In addition, neural networks have been shown to be useful for data assimilation because they can learn nonlinear relationships between simulated streamflow and a corresponding state variable (Boucher, Quilty, & Adamowski, 2020).
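The multiobjective calibration idea raised under item 5 can be sketched with a simple weighted objective that balances high-flow and low-flow performance. The example below combines the Nash-Sutcliffe efficiency (NSE), which is dominated by errors at high flows, with the NSE of log-transformed flows, which puts more weight on low flows. This particular combination, the weight `w`, and the offset `eps` are illustrative assumptions rather than a procedure taken from the cited studies.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 equals the mean of obs."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def log_nse(obs, sim, eps=0.01):
    """NSE on log-transformed flows; eps avoids taking the log of zero flow."""
    return nse([math.log(o + eps) for o in obs],
               [math.log(s + eps) for s in sim])


def combined_objective(obs, sim, w=0.5):
    """Weighted compromise between high-flow (NSE) and low-flow (log NSE) skill."""
    return w * nse(obs, sim) + (1.0 - w) * log_nse(obs, sim)
```

Varying `w` between 0 and 1 traces out the trade-off between the two types of extremes. A full multiobjective procedure would instead search for the set of non-dominated parameter sets and could add spatial metrics as further objectives.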

| HUMAN-WATER INTERACTIONS
Hydrologic extremes are of societal importance because of their impacts on society and economy. Flood and drought predictions can help alleviate these impacts because they facilitate the development of adaptation strategies and management plans.

| Challenges
Drought and flood predictions serve the public and decision makers. However, such predictions may not be available everywhere in the world. Developing countries in particular have received comparatively little attention in the scientific literature, and these regions may have limited data and technical capability to make predictions publicly available (Miyan, 2015; Nkwunonwo, Whitworth, & Baily, 2020). In addition, scientists face the challenge of providing meaningful assessments/products because establishing a direct link between hazard and impact predictions is not straightforward, feedbacks between hazards and society are rarely represented in models, and choosing a suitable communication strategy is demanding.
1. Economic and societal impacts: Impact predictions may be easier for the public to grasp than hazard predictions. However, establishing a link between hazard and impact is difficult (Bachmair, Svensson, Hannaford, Barker, & Stahl, 2016) because impact information is often missing, scarce (Silvestro et al., 2019), or poorly shared by, for example, insurance companies. Flood impact data are partly available on a global scale in non-flood-specific disaster databases and via national databases, while spatially consistent information on a continental scale is largely missing (Paprotny, Sebastian, Morales-Nápoles, & Jonkman, 2018), partly because there is no system that would allow for regular updates. Some empirical databases, such as flood insurance claims data, have recently been used to provide a more accurate evaluation of damage patterns (Wing, Pinter, Bates, & Kousky, 2020). Drought impact data are collected, for example, through the European Drought Impact Report Inventory (European Drought Centre, 2015) or the U.S. Drought Impact Reporter (National Drought Mitigation Center, 2011), which both suffer from underreporting.
2. Human-water interactions: Predicting extremes over long timescales requires understanding and considering their interactions and feedbacks with human systems (Sivapalan, Savenije, & Blöschl, 2012), for example, the implementation of protection measures after a severe flood event and the resulting reduction of flood risk. Co-evolutionary hydrologic modeling, as advocated by the socio-hydrology community, can help to understand the coupling across human and environmental systems (Elshafei, Sivapalan, Tonts, & Hipsey, 2014; Thompson et al., 2013). In addition to understanding potential feedbacks between extreme event occurrence and society, we may also want to consider such feedbacks in modeling efforts.
Socio-hydrological models have been developed that explicitly account for feedbacks between water and society at multiple scales (Srinivasan et al., 2017). Models proposed include stylized models, agent-based models, comprehensive system-of-system models (Sivapalan & Blöschl, 2015), pattern-oriented modeling, and coupled-component modeling (Blair & Buytaert, 2016). However, setting up useful models is challenging because of a lack of data on interactions, a lack of calibration/validation protocols, difficulties in parameter estimation, and a lack of transferability to regions other than the one where a model has been set up (Blair & Buytaert, 2016; Sivapalan & Blöschl, 2015; Srinivasan et al., 2017).
3. Availability and communication: Predictions need to be available and well communicated in order to serve society.
However, the availability of modeling capabilities and forecast products may be limited in developing countries, among other reasons because of limited data availability and limited financial resources to develop operational forecasting and prediction systems (Nkwunonwo et al., 2020). In this context, global/continental scale data and products, for example, the European Flood Awareness System (EFAS; Smith et al., 2016) or the Global Flood Awareness System (GLOFAS; Hirpa et al., 2018), are valuable. In addition, communication poses a challenge because members of the public, governments, and elected officials may have difficulties in understanding concepts such as return periods (e.g., 100-year floods/droughts) and event probabilities (a 1% chance of occurring in any year or a 26% chance of occurring in 30 years) (e.g., Bell & Tobin, 2007). Still, there has been relatively little theoretical and empirical research on flood/drought risk communication or on measuring people's risk perceptions and how they adapt their behavior in response to such risks (Kellens, Terpstra, & De Maeyer, 2013). Like hazard estimates, predictions are only beneficial to society if they are communicated in an efficient and transparent way with appropriate uncertainty information. Forecasts, especially flash flood and flash drought forecasts, need to be communicated in a timely manner and updated regularly (Braud, Vincendon, Anquetin, Ducrocq, & Creutin, 2018; Pendergrass et al., 2020), which requires well-established data processing and communication procedures. Ideally, predictions include some uncertainty information. At short time scales, uncertainty can be characterized by producing probabilistic forecasts through varying both initial conditions and meteorological forcing. For frequency estimates, sampling uncertainty can be expressed by performing bootstrap experiments (Hu, Nikolopoulos, Marra, & Anagnostou, 2020).
At longer (e.g., decadal) time scales, it is more difficult to produce probabilistic projections because the available climate simulations are an "ensemble of opportunity" offering a biased and incomplete sample of possible climate futures. In addition to describing uncertainty, choosing an appropriate method for uncertainty communication is important, although challenging because the format of uncertainty communication can affect how a message is perceived by the target audience (Ho & Budescu, 2019). The communication of future projections is particularly challenging because change assessments often hinge on the definition of a "reference period," to which future drought and flood characteristics are compared. The World Meteorological Organization suggests that reference periods are updated every 10 years, which is challenging in terms of communication because an "above average" year should not suddenly become "below average" because of a change in reference period (World Meteorological Organization, 2017).
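The return-period arithmetic and the bootstrap idea mentioned in this item can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a method taken from the cited studies: the function names, the synthetic annual maxima, the nearest-rank quantile, and the percentile bootstrap are all choices made here for demonstration. The first function assumes independent, stationary years, which is exactly the assumption that non-stationarity calls into question.

```python
import random


def prob_at_least_one(annual_prob, years):
    """Chance of at least one exceedance within `years` years.

    Assumes independent years with a constant annual probability.
    """
    return 1.0 - (1.0 - annual_prob) ** years


def empirical_quantile(values, q=0.9):
    """Nearest-rank empirical quantile (illustrative, no fitted distribution)."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]


def bootstrap_interval(sample, stat, n_boot=2000, level=0.90, seed=42):
    """Percentile bootstrap interval for `stat`, resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lo_idx = int((1.0 - level) / 2.0 * n_boot)
    hi_idx = int((1.0 + level) / 2.0 * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]


# A "100-year" event has a 1% chance in any year; over 30 years the chance
# of seeing at least one such event is about 26%.
p30 = prob_at_least_one(0.01, 30)
print(f"P(at least one 100-year event in 30 years) = {p30:.2f}")

# Synthetic annual maxima (hypothetical data, arbitrary units).
data_rng = random.Random(0)
annual_maxima = [data_rng.gammavariate(4.0, 50.0) for _ in range(60)]

point = empirical_quantile(annual_maxima)
lower, upper = bootstrap_interval(annual_maxima, empirical_quantile)
print(f"10-year level: {point:.0f} (90% bootstrap interval: {lower:.0f} to {upper:.0f})")
```

Reporting the interval alongside the point estimate makes the sampling uncertainty of a short record visible; in practice a fitted extreme value distribution would usually replace the empirical quantile used here.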

| Tackling challenges
We see the following avenues for improving the communication of current and future drought and flood predictions: (1) obtain stakeholder feedback, (2) provide predictions with information on uncertainty, (3) communicate changes in extremes through attribution studies, (4) clearly indicate the reference period, and (5) make current and future flood and drought estimates accessible to the public to help people "live with risk."
1. Obtain stakeholder feedback: We may improve communication strategies by learning about a user's perception of model output (Ramos, Mathevet, Thielen, & Pappenberger, 2010) and by engaging in a dialogue with key players in the decision making process (Addor et al., 2015; Arnal et al., 2020) and with public agencies and engineers who develop prediction and modeling tools for operational application.
2. Provide predictions with information on uncertainty: Forecasts, frequency estimates, and future projections should be provided together with an uncertainty estimate because of the uncertainties inherent in the driving variables (e.g., internal climate variability), the limitations of prediction capabilities (Hao et al., 2018), sampling uncertainties (Hu et al., 2020), and uncertainties in future climate projections (Clark, Wilby, et al., 2016). At short time scales, such uncertainty information is often provided in a probabilistic way by using ensembles of streamflow forecasts or extremes. For hydraulic design, frequency estimates are often communicated with uncertainty bounds related to sampling uncertainty. However, ensemble forecasts and predictions are not always well understood (Pagano et al., 2014) and decision-making based on probabilistic information is challenging (Arnal et al., 2020). There is a need to develop clear guidelines on how probabilistic forecasts and future projections are to be used in decision making in combination with subjective expertise (Arnal et al., 2020; Pagano et al., 2014).
As an alternative to expressing uncertainties in terms of probabilities, the storyline approach has recently been proposed (Clark, Wilby, et al., 2016; Shepherd et al., 2018) to communicate uncertainties of (future) projections at longer timescales. It builds on illustrating different plausible storylines, that is, a range of plausible events or future pathways. The variability of potential outcomes is highlighted by providing a set of plausible (future) outcomes rather than statistical uncertainty bounds.
3. Communicate changes in extremes through attribution studies: One way of communicating the (ongoing) impact of climate change is the use of event attribution studies, which ask to what degree a certain extreme event is caused by climate change or has become more frequent due to climate change (Swain, Singh, Touma, & Diffenbaugh, 2020). Such attribution assessments try to separate the effects of human influence on the climate system by comparison with a counterfactual climate without human influence (Naveau, Hannart, & Ribes, 2019). A good example of effective change communication is the World Weather Attribution initiative, which conducts real-time attribution analysis of extreme weather events as they happen around the world (Environmental Change Institute, 2020).
4. Indicate reference period: The WMO recommends communicating the use of climate normals clearly and precisely to avoid misinterpretation of reference periods and predicted climate change signals, and providing explanatory notes for all users of relevant products and services such as maps of future flood and drought risk (World Meteorological Organization, 2017). Such communication is also important when using reference periods to calculate indices such as standardized drought indices and thresholds for flood and drought analyses.
In the latter case, two different thresholds may be used for a past and future period or reference periods may be shifted in time, for example, by using a transient threshold (Wanders, Wada, & Van Lanen, 2015).
5. Provide access to predictions: Access to predictions can be improved by refocusing research efforts on regions which have traditionally received less attention, for example, the African continent, and by fostering knowledge transfer on modeling tools and prediction methods. In addition, predictions of current and future flood and drought risk will only help people develop targeted adaptation strategies and live with risk if they are communicated in public and accessible forms, for example, through easy-to-use web pages, mobile-phone apps, or platforms such as EFAS or GLOFAS (Hirpa et al., 2018; Smith et al., 2016), and at spatial scales relevant for individual households. Recent examples of hazard/risk communication include the First Street Foundation's national flood risk assessment and the European Drought Observatory. The former maps flood risk for individual properties for the whole United States through a "flood factor" and provides free access to property-specific information (First Street Foundation, 2020). The latter maps drought indicators derived from different data sources and provides drought reports of severe drought events (Copernicus, 2018).
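The effect of the reference period on threshold-based drought identification (item 4 above) can be sketched in a few lines of Python. This is a simplified illustration, not the procedure of Wanders et al. (2015): the synthetic low-flow series with a drying trend, the 30-year window, the 20th-percentile threshold, and all function names are assumptions made here.

```python
def low_flow_threshold(flows, q=0.2):
    """Nearest-rank q-quantile of `flows`, used as a drought threshold."""
    ordered = sorted(flows)
    return ordered[int(q * (len(ordered) - 1))]


def count_below_fixed(series, ref_start, ref_end, eval_start, q=0.2):
    """Count evaluation years below a threshold from one fixed reference period."""
    threshold = low_flow_threshold(series[ref_start:ref_end], q)
    return sum(1 for value in series[eval_start:] if value < threshold)


def count_below_transient(series, window, eval_start, q=0.2):
    """Count evaluation years below a threshold from the preceding `window` years."""
    count = 0
    for t in range(eval_start, len(series)):
        threshold = low_flow_threshold(series[t - window:t], q)
        if series[t] < threshold:
            count += 1
    return count


# Hypothetical annual low flows: linear drying trend plus a repeating
# five-year fluctuation (synthetic values, arbitrary units).
series = [100.0 - 0.5 * t + 5.0 * ((t % 5) - 2) for t in range(60)]

n_fixed = count_below_fixed(series, ref_start=0, ref_end=30, eval_start=30)
n_transient = count_below_transient(series, window=30, eval_start=30)

print(f"Drought years (fixed reference period):   {n_fixed} of 30")
print(f"Drought years (moving reference period): {n_transient} of 30")
```

With a drying trend, the fixed threshold classifies more of the later years as drought than the moving threshold does, which is why stating the reference period explicitly matters when such counts are communicated.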

| SUMMARY AND CONCLUSIONS
We identify challenges common to drought and flood prediction and their joint assessment and group them into four interrelated categories: data, process understanding, modeling and prediction, and human-water interactions. Furthermore, we discuss potential approaches for tackling these challenges. Limited data availability is one of the most important challenges because it diminishes our ability to improve process understanding and to develop reliable models. We see different ways of tackling the data availability challenge: improving data sharing, exploiting new data sources such as satellite data or crowd-sourced data, increasing sample size using streamflow reconstructions, stochastic simulation approaches, or large ensembles, and simulating high-resolution streamflow for future conditions. A further important challenge is the regional nature of hydrologic extremes, which should be addressed by considering spatial correlations when modeling extremes and by considering extents and regional occurrence probabilities when predicting extremes. Another outstanding challenge is human-extreme interactions, for which we often lack data and process understanding and which we often neglect in both statistical and hydrological models. Addressing this challenge requires exploitation of new data sources to gain insight into direct human impacts on extremes and explicit representation of human influences in statistical and hydrological models. Another important challenge relates to non-stationarities arising through the effects of changing climate, land use, channel morphology, or water management. These non-stationarities need to be better understood, as do the uncertainties associated with different statistical modeling techniques. Last but not least, we suggest that droughts and floods should be studied in a joint framework to learn about fast event transitions.
Uniting droughts and floods in one framework requires stochastic continuous models and an improved joint representation of both types of extremes in hydrological models. Such representation can be obtained by better representing processes important for both types of extremes in model structures and by developing calibration strategies specifically targeting both droughts and floods. Tackling these challenges will allow us to derive more reliable flood and drought predictions and ultimately help to minimize the negative impacts of extreme events.

ACKNOWLEDGMENT
This work was supported by the Swiss National Science Foundation via a PostDoc. Mobility grant (Number: P400P2_183844, granted to Manuela I. Brunner). Open access funding enabled and organized by Projekt DEAL.

CONFLICT OF INTEREST
The authors have declared no conflicts of interest for this article.