Prediction of Extraordinarily High Floods Emerging From Heterogeneous Flow Generation Processes

River floods are generated by various processes that, if disregarded, may induce errors in flood hazard assessment. This is particularly relevant where events extraordinarily larger than the typical floods have been observed, that is, for rivers with a flood divide in their flood‐frequency curves. We identify 11 such cases in a large set of German catchments and test a statistical approach that accounts for different runoff‐generation processes to predict the magnitude and frequency of extraordinarily high floods. We observe that in catchments with a flood divide, ordinary peaks are generated by different runoff‐generation processes and the distribution of at least one process is heavy‐tailed. By accounting for the different tail behaviors of multiple processes, we can reproduce flood‐frequency curves in these catchments. Our findings shed light on the origin of flood divides and set a method to improve the estimation of high flood quantiles in these high‐risk cases.

predict extraordinarily high floods occurring in river basins with variable flood extremes • The approach leverages knowledge of various hydrologic processes generating runoff events to estimate the magnitude of extreme floods • By accounting for multiple processes in flood magnitude-frequency analysis, we improve the estimation of extraordinarily high floods

Supporting Information:
Supporting Information may be found in the online version of this article.
Traditional flood magnitude-frequency analyses (e.g., Sivapalan & Samuel, 2009) are unsuitable for heterogeneous flood records (Hirschboeck, 1987a), especially when floods differ from one another to the point that flood divides are observed. Mixed extreme value distributions have been proposed for these cases, but the limited availability of large flood observations introduces considerable estimation uncertainty (Alila & Mtiraoui, 2002; Barth et al., 2019; Waylen & Woo, 1982). To tackle this issue, past studies (e.g., Fischer, 2018; Hirschboeck, 1987b) proposed a mixed distribution approach using peaks over a threshold (i.e., all observed peaks above a threshold).
These approaches improved hazard estimation compared to methods that assume homogeneity of the flood record. However, peaks over threshold may not be a representative sample of the processes that generate extraordinary events in catchments where the generation processes of the largest and most common floods are strikingly different (Tarasova, Basso, & Merz, 2020). In these cases, using ordinary peaks (i.e., all the independent streamflow peaks) may allow for sampling the full variety of generation processes while constraining prediction uncertainty. Recently, Miniussi et al. (2020) adopted such an approach to model floods occurring during different ENSO phases. However, studies that estimate flood frequencies and magnitudes by considering the specific runoff-generation processes that produced the ordinary peaks are lacking.
In this study, we aim to fill this gap by addressing the issue of reliably estimating flood frequencies and magnitudes in basins with a flood divide in the empirical flood magnitude-frequency curve. To do so, we rely on the analysis of ordinary peaks and explicitly consider the presence of multiple runoff-generation processes in a mixed non-asymptotic extreme value model. The advantage of this approach over traditional methods lies in the additional information gained by using a larger sample of ordinary peaks and by accounting, in the flood frequency analysis, for the underlying physical processes leading to runoff generation.

Study Area and Data
This study uses daily streamflow records of 169 mesoscale river basins in Germany with drainage areas between 30 and 23,000 km² (median: 581 km²; Figure S1 in Supporting Information S1). The analyzed data series range from 1951 to 2013, encompassing 36-62 (median: 59) hydrological years per basin (November to October). From this data set, we identified river basins exhibiting a sharp increase in the magnitude of rare floods (i.e., a flood divide in their empirical flood magnitude-frequency curve, sensu Miniussi et al., 2023; Basso et al., 2023) by visually examining empirical flood-frequency curves. To facilitate their identification in a large data set, we calculated the L-skewness of the annual maxima sample (Hosking, 1990), which is a general index of tail heaviness that is robust to the presence of outliers (Vogel & Fennessey, 1993). We noticed that, for Germany, all the cases with L-skewness exceeding 0.3 also exhibited a flood divide (Figure S2 in Supporting Information S1). Notably, the threshold of 0.3 was selected for German basins and may vary for other regions.
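The L-skewness screening described above can be sketched as follows. This is a minimal illustration, assuming the standard unbiased sample L-moment estimators based on probability-weighted moments (Hosking, 1990); the function name `l_skewness` and the example values are hypothetical and are not taken from the study.

```python
def l_skewness(annual_maxima):
    """Sample L-skewness (tau_3 = l3 / l2) from unbiased probability-weighted moments."""
    x = sorted(annual_maxima)
    n = len(x)
    if n < 3:
        raise ValueError("at least three values are needed")
    # Probability-weighted moments b0, b1, b2 (i is the zero-based rank).
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    # Second and third sample L-moments.
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    return l3 / l2


# Hypothetical annual maxima (m3/s): one event far larger than the rest.
maxima = [1.0, 2.0, 3.0, 4.0, 50.0]
# 0.3 is the screening threshold reported in the text for German basins.
flagged = l_skewness(maxima) > 0.3
```

A flagged basin would still need visual confirmation of the flood divide, since the threshold is only a screening aid.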
Visual inspection of the empirical flood-frequency curves remains the main tool to confirm the presence of flood divides in a general case. In fact, flood frequency analyses are typically performed case by case by practitioners, who recognize the presence of a flood divide in their data without relying on quantitative metrics, sometimes just because traditional approaches do not seem to fit the data in any way. The selection yields 11 river basins with a clear flood divide in the empirical flood magnitude-frequency curve (Figure S1 in Supporting Information S1). All the following analyses are performed individually for each basin.

Simplified Metastatistical Extreme Value Approach Applied to Heterogeneous Processes
We adopt the Simplified Metastatistical Extreme Value (SMEV) approach (Marani & Ignaccolo, 2015; Marra et al., 2019) to model the magnitude and frequency of extreme floods emerging from different runoff-generating processes (e.g., rain-on-wet or dry soils, snowmelt) in each basin. The SMEV approach enables us to account for the presence of events triggered by multiple runoff-generation mechanisms (also termed event types in the following) through a parsimonious parametrization and to derive the compound extreme value distribution emerging from these types (Marra et al., 2019). Indicating with $i = 1, \dots, S$ the event type in any of the basins, the mixed-SMEV cumulative distribution function $\zeta_{SMEV}$ can be written as
$$\zeta_{SMEV}(x) = \prod_{i=1}^{S} \left[ F_i(x; \theta_i) \right]^{n_i}$$
where $\theta_i$ are the parameters of the cumulative distribution $F_i$ of ordinary events of the $i$th type and $n_i$ is the average yearly number of ordinary events of type $i$. Notice that only two event types are considered in each basin.
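As an illustration of how a mixed-SMEV distribution can be evaluated and inverted, the sketch below multiplies the per-type ordinary distributions raised to their average yearly number of events and finds a return level by bisection. The function names, the exponential ordinary distributions and all numerical values are assumptions chosen for brevity, not the distributions or parameters used in this study.

```python
import math


def mixed_smev_cdf(x, type_cdfs, events_per_year):
    """Annual-maximum CDF: product over event types of F_i(x) ** n_i."""
    prob = 1.0
    for cdf, n_i in zip(type_cdfs, events_per_year):
        prob *= cdf(x) ** n_i
    return prob


def return_level(return_period, type_cdfs, events_per_year, lower, upper, tol=1e-9):
    """Invert the mixed CDF by bisection for a given return period (years)."""
    target = 1.0 - 1.0 / return_period
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if mixed_smev_cdf(mid, type_cdfs, events_per_year) < target:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)


def exp_cdf(scale):
    """Hypothetical ordinary-event distribution (exponential), for illustration only."""
    return lambda x: 1.0 - math.exp(-x / scale)


cdfs = [exp_cdf(10.0), exp_cdf(25.0)]  # two event types (S = 2)
n = [12.0, 3.0]  # assumed average yearly numbers of ordinary events per type
q100 = return_level(100.0, cdfs, n, lower=0.0, upper=1e4)
```

Bisection is used here only because it works for any monotone CDF; closed-form inversion is generally not available once several types are mixed.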
We compared estimates obtained through the mixed-SMEV approach against those of a single-SMEV approach and of the Generalized Extreme Value (GEV) distribution, the latter being widely used for flood magnitude-frequency analyses (Katz et al., 2002; Petrow et al., 2007). To this end, we estimated the GEV distribution parameters from the sample of annual maxima of each basin using the L-moments method (Hosking, 1990). For the single-SMEV approach, we used a log-normal distribution to describe all ordinary events in each basin. The choice of a log-normal distribution is supported by a previous study on the same data set of river basins in Germany (Mushtaq et al., 2022). We performed resampling with replacement (i.e., bootstrap) across years (1,000 realizations of all available years; Overeem et al., 2008) to assess the estimation uncertainty of all the methods. We finally quantified the model accuracy by means of the non-dimensional error, computed as $\varepsilon = (x_{est} - x_{obs}) / x_{obs}$ from the estimated ($x_{est}$) and observed maxima ($x_{obs}$) (Zorzetto et al., 2016). Notice that such an error metric favors the GEV distribution, which is explicitly parameterized to match the observed maxima, as opposed to the mixed-SMEV and single-SMEV approaches, whose parameter estimation is performed on a larger sample, of which just a subset belongs to the annual maxima.
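The bootstrap across years can be sketched as below: whole hydrological years are drawn with replacement, so that all ordinary events of a drawn year stay together, and the estimator is re-applied to each resample. The helper names, the toy estimator and the data are hypothetical; in an actual application the estimator would refit the chosen distribution and return a quantile.

```python
import random


def bootstrap_across_years(events_by_year, estimator, n_boot=1000, seed=42):
    """Resample whole years with replacement and re-apply the estimator."""
    rng = random.Random(seed)
    years = list(events_by_year)
    estimates = []
    for _ in range(n_boot):
        drawn = rng.choices(years, k=len(years))
        estimates.append(estimator([events_by_year[y] for y in drawn]))
    return estimates


def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple index-based percentile interval of the bootstrap estimates."""
    v = sorted(values)
    last = len(v) - 1
    return v[round(lower * last)], v[round(upper * last)]


def mean_annual_max(years):
    """Toy estimator: mean of the annual maxima of the resampled years."""
    return sum(max(y) for y in years) / len(years)


# Hypothetical ordinary peaks (m3/s) grouped by hydrological year.
peaks = {1951: [12.0, 30.0], 1952: [8.0, 15.0, 22.0], 1953: [40.0], 1954: [9.0, 18.0]}
boot = bootstrap_across_years(peaks, mean_annual_max)
low, high = percentile_interval(boot)
```

Resampling years rather than individual events preserves the within-year structure of the record, which is why the year is the resampling unit in this sketch.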

Process-Based Classification of Ordinary Events
We employed the method of Lang et al. (1999) to identify the independent ordinary peaks from the streamflow record required to apply the SMEV approach, as previously done by Miniussi et al. (2020) and Mushtaq et al. (2022). The number of independent ordinary peaks obtained for each catchment ranges from 370 to 933 (median value: 796).
We further classified the ordinary peaks into process-based types by using the classification of Tarasova, Basso, Wendi, et al. (2020), which labels the streamflow events corresponding to the identified ordinary peaks according to their runoff-generation processes. These are assessed based on the nature of the inducing events (i.e., rainfall vs. snowmelt) and the catchment wetness states (i.e., wet or dry). This process-based approach uses observed daily precipitation (Rauthe et al., 2013) as well as daily snow water equivalent and soil moisture simulated by the mesoscale Hydrological Model (Kumar et al., 2013; Samaniego et al., 2010). Tarasova, Basso, Wendi, et al. (2020) employed dimensionless indicators to differentiate between inducing events and catchment wetness states and observed small uncertainties associated with model structure and parameters (Figure 4 in Tarasova, Basso, Wendi, et al., 2020). However, it is worth noting that employing different indicators or classification frameworks could lead to substantial differences in event classification (Tarasova et al., 2019), consequently affecting the accuracy of flood estimates.
In order to efficiently incorporate distinct event types in flood magnitude-frequency analyses and to ensure sufficient sample sizes for each of them, we aggregate the event types of Tarasova, Basso, Wendi, et al. (2020) into two major groups (S = 2): processes related to dry antecedent conditions (rain-on-dry events, Type-1) and to wet antecedent conditions (rain-on-wet and snowmelt events, Type-2). The probability distributions of the magnitude of these event types are significantly different (p < 0.05), as evaluated through a pairwise two-sided Kolmogorov-Smirnov test (Massey Jr, 1951).
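A two-sample, two-sided Kolmogorov-Smirnov comparison of the two event types can be sketched in plain Python as follows. The p-value uses the asymptotic Kolmogorov series, which is an approximation and an assumption of this sketch; a library routine such as `scipy.stats.ks_2samp` would normally be preferable. The function name and the peak values are hypothetical.

```python
import math


def ks_two_sample(sample_a, sample_b):
    """Two-sided two-sample KS statistic and asymptotic p-value."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk through the pooled values and track the largest ECDF difference.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges poorly here and the p-value is practically 1.
        return d, 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return d, min(1.0, max(0.0, 2.0 * series))


# Hypothetical ordinary peak magnitudes (m3/s) of the two event types.
type1 = [5.0, 7.0, 9.0, 12.0, 30.0, 85.0, 140.0]
type2 = [6.0, 8.0, 8.5, 10.0, 11.0, 13.0, 14.0, 15.0]
statistic, p_value = ks_two_sample(type1, type2)
different = p_value < 0.05
```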

Selection of Ordinary Distributions
For each event type, we choose a suitable ordinary distribution. The presence of a clear flood divide in the empirical flood magnitude-frequency distribution suggests that one of the event types is characterized by a heavy-tailed distribution (Merz et al., 2022). We used Weibull plotting positions (Weibull, 1939) to derive the empirical cumulative distributions of ordinary events of each type and to distinguish between those that do and do not exhibit heavy-tailed behavior, here intended as a power-law tail (i.e., a linear behavior in double-logarithmic coordinates; Newman, 2005). This process can in principle be automated, for example, by using tests like the one applied by Marra et al. (2023). However, since the number of basins considered in this study is limited, we preferred to rely on human supervision. Visual inspection of the empirical cumulative distribution functions of ordinary events of the two types showed that, for each of the 11 study catchments, either Type-1 or Type-2 events clearly exhibit power-law behavior (see Figures 2a and 2e, respectively), as manifested by the linear form of the distribution in a log-log probability plot.
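The empirical exceedance probabilities and the check for linearity in double-logarithmic coordinates can be sketched as follows. The study relies on visual inspection; the least-squares slope and R² computed here over the largest values are only a rough numerical aid, and the function names and the 20% tail fraction are assumptions of this sketch.

```python
import math


def weibull_exceedance(sample):
    """Sorted values with exceedance probabilities from Weibull plotting positions."""
    x = sorted(sample)
    n = len(x)
    return x, [1.0 - (i + 1) / (n + 1) for i in range(n)]


def loglog_tail_fit(sample, tail_fraction=0.2):
    """Slope and R^2 of log(exceedance) versus log(x) over the largest values."""
    x, p = weibull_exceedance(sample)
    if len(x) < 3:
        raise ValueError("at least three positive values are needed")
    k = max(3, int(len(x) * tail_fraction))
    lx = [math.log(v) for v in x[-k:]]
    lp = [math.log(v) for v in p[-k:]]
    mx, mp = sum(lx) / k, sum(lp) / k
    sxx = sum((a - mx) ** 2 for a in lx)
    spp = sum((b - mp) ** 2 for b in lp)
    sxp = sum((a - mx) * (b - mp) for a, b in zip(lx, lp))
    return sxp / sxx, sxp * sxp / (sxx * spp)


# Hypothetical ordinary peaks (m3/s); values must be positive and not all equal.
peaks = [3.0, 4.0, 4.5, 5.0, 6.0, 7.5, 9.0, 12.0, 18.0, 30.0, 55.0, 120.0]
slope, r_squared = loglog_tail_fit(peaks, tail_fraction=0.5)
```

An R² close to 1 over the upper tail is consistent with a power law but does not prove it, which is why visual supervision remains part of the procedure.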
If the empirical distribution of ordinary events of the examined type has a heavy tail, we use a power-law distribution (Malamud & Turcotte, 2003, 2006). The cumulative distribution function of the power-law, expressed in exceedance form, is $P(X \geq x) = (x / x_{min})^{1-\alpha}$, where $x$ is the analyzed variable, $x_{min}$ is a left-censoring threshold (i.e., the threshold above which the power-law behavior is manifest) and $\alpha$ is the scaling parameter. We estimated the parameters $\alpha$ and $x_{min}$ through the method of Clauset et al. (2009). If the empirical cumulative distribution function of the ordinary events of the examined type is not heavy-tailed (i.e., it does not show a linear behavior in double-logarithmic coordinates), we model it using a two-parameter log-normal distribution (Bobee et al., 1993). The cumulative distribution function of the log-normal distribution is expressed as $F(x; \mu, \sigma) = \Phi((\log(x) - \mu) / \sigma)$, where $\sigma$ and $\mu$ are respectively its shape and scale parameters and $\Phi$ is the standard normal cumulative distribution function. We fit the log-normal distribution using the method of L-moments (Hosking, 1990) by left-censoring the lower portion of the ordinary events, both for the mixed-SMEV (i.e., considering Type-1 and Type-2 separately) and for the single-SMEV (i.e., considering all ordinary events without types). The left-censoring method enables us to characterize the tail of the distribution with few parameters, by ignoring the magnitudes of the censored part while retaining their probability (Marra et al., 2019). For both the power-law and log-normal distributions, we selected the left-censoring threshold by minimizing the root mean square error between predicted and observed magnitudes of ordinary events in the upper twentieth percentile (Ritter & Munoz-Carpena, 2013).
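A power-law fit in the spirit of Clauset et al. (2009) can be sketched as below: for each candidate threshold, the scaling parameter is estimated by maximum likelihood for continuous data, and the threshold with the smallest Kolmogorov-Smirnov distance between the fitted and empirical tail is retained. This is a simplified illustration under these assumptions, not the exact procedure or code used in the study; the function names, the minimum tail size and the synthetic data are hypothetical.

```python
import math
import random


def fit_alpha(tail, x_min):
    """Maximum-likelihood scaling parameter for values at or above x_min."""
    return 1.0 + len(tail) / sum(math.log(v / x_min) for v in tail)


def ks_distance(tail, x_min, alpha):
    """Largest gap between the empirical and power-law CDF of a sorted tail."""
    n = len(tail)
    d = 0.0
    for i, v in enumerate(tail):
        model = 1.0 - (v / x_min) ** (1.0 - alpha)
        d = max(d, abs((i + 1) / n - model), abs(i / n - model))
    return d


def fit_power_law(sample, min_tail=10):
    """Scan candidate thresholds and keep the one minimizing the KS distance."""
    x = sorted(sample)
    best = None
    for idx in range(len(x) - min_tail + 1):
        x_min, tail = x[idx], x[idx:]
        if tail[-1] == x_min:
            continue  # degenerate tail, the likelihood is undefined
        alpha = fit_alpha(tail, x_min)
        d = ks_distance(tail, x_min, alpha)
        if best is None or d < best[0]:
            best = (d, x_min, alpha)
    if best is None:
        raise ValueError("sample too small or degenerate")
    return best[1], best[2]


# Synthetic heavy-tailed data: survival x ** -1.5 above 1, i.e. alpha = 2.5.
rng = random.Random(1)
data = [rng.paretovariate(1.5) for _ in range(500)]
x_min_hat, alpha_hat = fit_power_law(data, min_tail=50)
```

The scan is quadratic in the sample size, which is acceptable for the few hundred ordinary peaks per catchment reported in the text.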

Results and Discussion
In this study, we investigate river basins that experienced extraordinarily high floods with much larger magnitudes than the bulk of recorded annual maxima. To better illustrate the properties of these rare floods we focus on two exemplary case studies, the Müglitz and Este River basins (Figure 1), which exhibit marked flood divides in their empirical flood magnitude-frequency curves. In fact, the largest annual maxima (roughly those with a return period exceeding 10 years) grow to considerably larger magnitudes than the smaller ones (e.g., the largest observed annual maxima are 3-8 times and 2-4 times larger than their mean values for the Müglitz and Este River basins, respectively).
In Figure 1, annual maxima are color-coded based on the type of processes that generated the events. We observe that floods whose runoff-generation processes differ from those mostly observed for common floods (i.e., those on the left-hand sides of panels a, b) may strongly affect the upper tail behavior, as also highlighted by Tarasova, Basso, and Merz (2020). For instance, extraordinarily high floods in the Müglitz River (Figure 1a) mainly belong to the rain-on-dry type (4 out of 6 maxima with a return period greater than 10 years are characterized as Type-1), while this type is hardly present in lower flood peaks. Conversely, extraordinarily high floods in the Este basin (Figure 1b) are caused by rain-on-wet conditions and snowmelt processes (5 maxima with a return period greater than 10 years are characterized as Type-2).
As described in Section 2.4, we examine the empirical cumulative distribution functions of the different types of ordinary events (gray dots in Figures 2a, 2b, 2d, and 2e). We notice that in all cases in which a flood divide is present one event type displays heavy-tailed behavior (Figures 2a and 2e). Hence, we fit the empirical ordinary distribution of these events with a power-law distribution (see red lines in Figures 2a and 2e), and use a log-normal distribution for the other type (blue lines in Figures 2b and 2d). While previous studies identified both heavy-tailed and light-tailed behaviors in flood records (Bernardara et al., 2008; Mushtaq et al., 2022), here we move a step further and show that different tail behaviors may be associated with distinct runoff-generation processes (Yu et al., 2022).
The mixed-SMEV flood magnitude-frequency curves obtained for the two exemplary case studies are displayed in Figures 2c and 2f (green solid lines). Gray dots in Figures 2c and 2f represent the empirical frequencies of the sample of annual maxima. Our analysis reveals that mixing different types of ordinary events with distinct tail behavior through the mixed-SMEV framework allows for capturing the upper tail of the distribution of annual maxima. This is opposed to standard methods that assume identically distributed events, which tend to underestimate the largest floods (results for the single-SMEV and GEV are shown for comparison in Figures 2c and 2f). The mixed-SMEV approach provides improved estimates of upper tail quantiles compared to single-SMEV and GEV, as the latter generally fail to capture the upper tail behavior (Figures 2c and 2f; Figures S3-S11 in Supporting Information S1). The shaded areas in Figures 2c and 2f depict the 95% confidence intervals calculated via bootstrap with replacement across years for the mixed-SMEV, single-SMEV and GEV distributions. The single-SMEV has lower uncertainty than both the mixed-SMEV and GEV distributions (Miniussi & Marra, 2021). Despite the use of a larger portion of the data, the uncertainties of the mixed-SMEV and GEV are instead similar. The underlying reason is twofold: by using a mixed-SMEV we estimate a larger number of parameters than with a single-SMEV approach, and often one of the event types is markedly less populated than the other, leading to an increase in uncertainty. These results demonstrate that relying solely on a pre-defined statistical distribution and disregarding the heterogeneity of the sample arising from different underlying physical processes may lead to an erroneous estimation of extreme floods. The proposed approach could thus improve the estimation of upper tail quantiles in basins where a flood divide is observed in the flood magnitude-frequency curves.
These results confirm that heavy tails (i.e., flood divides) can originate from a mixture of flood-generating processes (Merz et al., 2022). It is worth noting that rain-on-dry events dominate the upper tail in the Müglitz River basin (Figure 2a), while rain-on-wet and snowmelt events dominate the upper tail in the Este River basin (Figure 2e). Past studies indicate that various runoff-generation processes might be associated with heavier tails of the distribution of floods in different regions (Tarasova et al., 2023), as we observe in our set of case studies. The flood divide arises from the fact that the mixture contains a heavy-tailed process, whose upper tail tends to dominate the mixture distribution at large quantiles (Figures 2c and 2f). These results are coherent with previous studies, which show that the tail of a mixture distribution is influenced by the component with the most pronounced tail (e.g., Cavanaugh et al., 2015). Results for all other case studies are reported in Figures S3-S11 in Supporting Information S1. These results are broadly consistent with what was found for the exemplary case studies discussed in Figures 1 and 2, although a few cases show a less clear distinction among event types. As expected, this leads to comparable performances of the three methods in these catchments.
Figure 3 summarizes the findings obtained for all 11 cases which exhibit flood divides in the set of 169 German river basins. Observed versus estimated annual maxima resulting from the proposed mixed-SMEV, the single-SMEV and the GEV approaches are evaluated in a bootstrap fashion (see Methods) and respectively displayed in panels a, b and c. This overall comparison confirms the results discussed above for the two exemplary case studies and highlights the capability of the mixed-SMEV approach to provide reliable estimates of floods for a wide range of quantiles in all river basins with a flood divide (Figure 3a). The comparison with the single-SMEV (Figure 3b) and the GEV (Figure 3c) distribution reveals that mixed-SMEV estimates (Figure 3a) are characterized by a considerably smaller bias, especially for the upper tail quantiles.
To summarize the performance of the mixed-SMEV approach in estimating flood magnitude and frequency in catchments with a flood divide, we computed the non-dimensional errors (see Section 2.2) between observed annual maxima and estimates of the corresponding empirical quantiles (Figures 3d and 3e). Figure 3d displays a boxplot of non-dimensional errors for all observed maxima in the 11 case studies. Here, the mixed-SMEV approach tends to overestimate the bulk of floods, mainly due to small floods with a return period of less than 10 years, whereas the single-SMEV and GEV distributions show similar degrees of over- and underestimation. However, when we focus our analysis on quantiles with a return period greater than 10 years (Figure 3e), which are the most relevant for flood hazard assessment, the non-dimensional errors provided by the mixed-SMEV approach are lower than those of the single-SMEV and GEV distributions.
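The comparison of observed maxima with model quantiles at the same return period can be sketched as follows. The sketch assumes Weibull plotting positions for the empirical return periods and a non-dimensional error defined as the relative difference between estimated and observed maxima; `quantile_fn` stands for any fitted model returning the flood magnitude for a given return period, and the linear model and data used here are purely hypothetical.

```python
def empirical_return_periods(annual_maxima):
    """Pair each sorted maximum with its Weibull return period (n + 1) / (n + 1 - rank)."""
    x = sorted(annual_maxima)
    n = len(x)
    # idx is the zero-based rank, so n + 1 - (idx + 1) simplifies to n - idx.
    return [(value, (n + 1) / (n - idx)) for idx, value in enumerate(x)]


def nondimensional_errors(annual_maxima, quantile_fn, min_return_period=0.0):
    """Relative errors (x_est - x_obs) / x_obs for maxima above a return period."""
    errors = []
    for x_obs, period in empirical_return_periods(annual_maxima):
        if period > min_return_period:
            x_est = quantile_fn(period)
            errors.append((x_est - x_obs) / x_obs)
    return errors


# Hypothetical record of 19 annual maxima (m3/s) and a toy quantile function.
maxima = [float(v) for v in range(1, 20)]
toy_quantile = lambda period: 2.0 * period
all_errors = nondimensional_errors(maxima, toy_quantile)
rare_errors = nondimensional_errors(maxima, toy_quantile, min_return_period=10.0)
```

Positive values indicate overestimation and negative values underestimation, so the two subsets can be summarized separately, for example with boxplots.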
Separately fitting different probability distributions to samples of ordinary peaks with distinct tail behaviors strengthens the physical basis of flood magnitude-frequency analyses. In fact, the proposed method allows us to account for the different statistical properties of events triggered by various runoff-generation processes. It is, however, worth noting that separating the processes yields significant improvements only if event types are characterized by markedly different tail behaviors (e.g., Marra et al., 2019). If the different processes have similar tail behaviors, the mixed distribution will not be distinguishable from the one obtained by using all the data together, since uncertainty will be predominant due to the need to estimate a larger number of parameters on a smaller amount of data. The practice of classifying event types has evolved in recent years (Tarasova, Basso, Wendi, et al., 2020; Turkington et al., 2016; Vormoor et al., 2016) and is expected to grow (Merz et al., 2022; Tarasova et al., 2019), increasing the availability of the information required to apply the proposed approach.
A substantial difference among runoff-generation processes was suggested as a possible cause of flood divides by past studies. Rogger et al. (2012, 2013) showed, by means of extensive field surveys and modelling analyses, that flood divides may emerge from strong non-linearities in runoff-generation processes resulting from progressive saturation of the catchment (i.e., a shift from dry to wet conditions). Basso et al. (2016) provided a mechanistic explanation of this phenomenon and the resulting appearance of flood divides in flood magnitude-frequency curves by linking it to the catchment water balance. Catchments in wetter climates experience a sustained water supply, which results in uniform runoff-generation processes. Conversely, river basins in drier areas, where longer lag times between rainfall events allow the catchment to dry, undergo transient conditions leading to varied runoff-generation processes (Basso et al., 2023). Our results confirm that explicitly accounting for the existence of different runoff-generation processes enables us to adequately model the tail of flood distributions in catchments exhibiting flood divides.
The above discussion hints at the interplay between various hydro-meteorological drivers and the hydroclimatic settings leading to the occurrence of extraordinarily high floods in a region. In fact, river basins exhibiting flood divides (i.e., the exemplary Müglitz River and eight other case studies in our data set) are mostly located in the Central-Alpine region of Europe (Blöschl et al., 2017). In this region, the distributions of precipitation volumes for rain-on-wet and snowmelt floods have slightly heavier tails than for rain-on-dry events, leading to the possible occurrence of high floods of Type-2 (as exemplified by one river basin in our data set, see Figure S7 in Supporting Information S1). However, the tail of the distribution of precipitation intensity of rain-on-dry events in this area is remarkably heavier than for the other event types (Tarasova et al., 2023), mostly due to the occurrence of Vb-cyclones (Hofstätter et al., 2016). This feature likely underlies the occurrence of heavy-tailed distributions of rain-on-dry ordinary events and extraordinarily high floods of Type-1, which are mostly responsible (8 case studies) for the appearance of flood divides in flood magnitude-frequency curves. In contrast, the Este River and one additional catchment among the 11 showing a flood divide are located in the Atlantic region, where rain-on-wet flood events exhibit heavier tails than rain-on-dry events (Tarasova et al., 2023).
Variable distributions of key factors contributing to runoff generation, such as rainfall intensity and volume, soil moisture, and snow accumulation and release, are mirrored by differences in the tail behavior of ordinary distributions across regions, which can be leveraged by means of the proposed approach to improve the estimation of the hazard posed by extraordinarily high floods. However, climate change induces shifts in the mixture of flood generation processes, such as a decrease in snowmelt events and an increase in intense rainfall events (Hall et al., 2014; Huo et al., 2022). These changes inevitably influence flood probabilities. Our method explicitly incorporates the presence of diverse flow-generating processes, and can therefore be used to predict changes in flood frequency based on the projected changes in the frequency of the runoff-generation processes.
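How a projected shift in the frequency of event types propagates to flood probabilities can be illustrated with the sketch below, which recomputes the annual exceedance probability of a given flood magnitude after changing only the average yearly number of events of each type. All distribution parameters, event counts and the fraction of events above the censoring threshold are hypothetical values chosen for illustration, and the ordinary distributions are assumed to remain unchanged, which may not hold under climate change.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """CDF of a two-parameter log-normal distribution."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def censored_power_law_cdf(x, x_min, alpha, tail_fraction):
    """CDF for x >= x_min when only tail_fraction of the events exceeds x_min."""
    return 1.0 - tail_fraction * (x / x_min) ** (1.0 - alpha)


def annual_exceedance(x, n_light, n_heavy):
    """One minus the mixed CDF for a light-tailed and a heavy-tailed event type."""
    light = lognormal_cdf(x, mu=2.0, sigma=0.6)  # hypothetical parameters
    heavy = censored_power_law_cdf(x, x_min=20.0, alpha=3.0, tail_fraction=0.1)
    return 1.0 - (light ** n_light) * (heavy ** n_heavy)


x = 150.0  # flood magnitude of interest (m3/s), above the assumed x_min
present = annual_exceedance(x, n_light=10.0, n_heavy=2.0)
future = annual_exceedance(x, n_light=8.0, n_heavy=4.0)  # assumed shift in event mix
change = future / present
```

In this toy setting, doubling the yearly number of heavy-tailed events roughly doubles the annual exceedance probability of the chosen magnitude, because the heavy-tailed type controls the upper tail.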

Conclusions
We provide a framework to derive accurate estimates of flood magnitude and frequency for basins where extraordinarily high floods occur. The approach leverages knowledge of the heterogeneity of runoff-generation processes by means of a process-informed mixed non-asymptotic statistical method, the SMEV framework. We employ the mixed-SMEV to estimate flood magnitude and frequency for 11 river basins in Germany featuring a flood divide. In these cases, at least one runoff-generation process is characterized by a heavy-tailed empirical distribution. Thanks to the explicit consideration of various runoff-generation processes, the proposed approach enables us to accurately predict the magnitude of rare floods, outperforming the single-SMEV and GEV distributions. The approach relies on a classification of event types and is only worth using when runoff events generated by different hydrologic processes are characterized by distributions with distinct tail behaviors. These requirements may constrain the use of the approach in practice. Nonetheless, classifications of event types are becoming more common, and the method provides a process-based solution to estimate large flood quantiles in contexts for which current methods fail. As climate change may alter the frequency of different event types, our approach also offers a way to account for climate change impacts on flood hazards in a physically sound manner.

Acknowledgments
This work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)-Project number 421396820 "Propensity of rivers to extreme floods: climate-landscape controls and early detection (PREDICTED)" and Research Group FOR 2416 "Space-Time Dynamics of Extreme Floods (SPATE)." The financial support of the Helmholtz Centre for Environmental Research-UFZ is also gratefully acknowledged. The first author is also thankful to the Higher Education Commission of Pakistan (HEC) and the German Academic Exchange Service (DAAD) for providing financial support to this study (DAAD-91716446). FM was partially funded by the CARIPARO Foundation through the Excellence Grant 2021 to the "Resilience" project. Open Access funding enabled and organized by Projekt DEAL.

Figure 1. Empirical flood magnitude-frequency curves for two exemplary case studies exhibiting a flood divide: (a) the Müglitz River at Dohn (Gauge-ID: 550940, area = 196 km²) and (b) the Este River at Emmen (Gauge-ID: 6338260, area = 171 km²). Floods are classified into two major event types: rain-on-dry (red dots, Type-1) and a combination of rain-on-wet and snow processes (blue dots, Type-2).

Figure 2. Exceedance cumulative distributions of ordinary peaks (a, b, d, e) and flood magnitude-frequency curves resulting from a mixed-SMEV, single-SMEV and a standard Generalized Extreme Value (GEV) approach (c, f) for the Müglitz River at Dohn (a-c) and the Este River at Emmen (d-f). Panels (a, d) show the ordinary distributions for Type-1 events, and panels (b, e) show the ordinary distributions for Type-2 events. Blue (log-normal) and red (power-law) lines in panels (a, b, d, e) display the probability distributions describing ordinary events of Type-1 and Type-2. Green, orange and pink curves in panels c and f show the median values for the corresponding quantiles of 1,000 bootstrap resamples (drawn with replacement) for mixed-SMEV, single-SMEV and GEV estimates, respectively. Green, orange and pink shaded areas indicate the related confidence intervals (5th-95th percentiles).

Figure 3. Estimated versus observed normalized (i.e., divided by their median value) annual maxima for (a) mixed-SMEV (green dots), (b) single-SMEV (orange dots) and (c) Generalized Extreme Value (pink dots) for the 11 river basins in the data set which exhibit flood divides. Light and dark colors indicate results for 1,000 realizations of the bootstrap and their median values, respectively. Insets of panels (a-c) show the same results plotted on a double-logarithmic scale. Panels d and e show the non-dimensional error between observed and estimated maxima of the analyzed statistical distributions computed for the median of 1,000 bootstrap values (i.e., dark green, orange and pink dots in a-c, respectively) with the same return period: (d) all quantiles; (e) the quantiles corresponding to return periods >10 years.