Flood frequency hydrology: 3. A Bayesian analysis

Authors


Corresponding author: A. Viglione, Institut für Wasserbau und Ingenieurhydrologie, Technische Universität Wien, Karlsplatz 13/E222, 1040 Vienna, Austria. (viglione@hydro.tuwien.ac.at)

Abstract

[1] Merz and Blöschl (2008a, 2008b) proposed the concept of flood frequency hydrology, which highlights the importance of combining local flood data with additional types of information: temporal information on historic floods, spatial information on floods in neighboring catchments, and causal information on the flood processes. Although most previous studies combined flood data with only one extra type of information, all three types are used here in a Bayesian analysis. To illustrate ways of combining the additional information and to assess its value, flood frequency analyses before and after the extraordinary 2002 flood event are compared for the 622 km2 Kamp river in northern Austria. Although this outlier significantly affects the flood frequency estimates if only local flood data are used (60% difference for the 100 year flood), the effect is much reduced if all additional information is used (only 3% difference). The Bayesian analysis also shows that the estimated uncertainty is significantly reduced when more information is used (for the 100 year return period, the width of the 90% credible interval reduces from 140% to 31% of the corresponding flood peak estimate). Further analyses show that the sensitivity of the flood estimates to the assumptions made on one piece of information is small when all pieces of information are considered together. While expanding information beyond the systematic flood record is sometimes considered of little value in engineering hydrology because subjective assumptions are involved, the results of this study suggest that the extra information (temporal, spatial, and causal) may outweigh the uncertainty caused by these assumptions.

1. Introduction

[2] The concept of flood frequency hydrology [Merz and Blöschl, 2008a, 2008b] highlights the importance of using a maximum of hydrologic information from different sources and of combining it on the basis of hydrological reasoning. In their framework, Merz and Blöschl [2008a, 2008b] propose to compile flood peaks at the site of interest plus three additional types of information: temporal, spatial, and causal information.

[3] Temporal information expansion is directed toward collecting information on the flood behavior before (or after) the period of discharge observations (systematic data period). Spatial information expansion is based on using flood information from neighboring catchments to improve flood frequency estimates at the site of interest. Causal information expansion analyzes the generating mechanisms of floods in the catchment of interest. For each of these types of information expansion, methods have been proposed in the literature. Formal methods exist on combining historical flood data (from flood marks and archives) and possibly paleofloods with available flood records [e.g., Leese, 1973; Stedinger and Cohn, 1986; Cohn et al., 1997; O'Connell et al., 2002; England et al., 2003; Reis and Stedinger, 2005; Benito and Thorndycraft, 2005], which would be considered temporal information expansion. Methods of regional flood frequency analysis [e.g., Dalrymple, 1960; Cunnane, 1988; Tasker and Stedinger, 1989; Bobée and Rasmussen, 1995; Hosking and Wallis, 1997; Merz and Blöschl, 2005] would be considered spatial information expansion. Finally, the derived flood frequency approach [e.g., Eagleson, 1972; Kurothe et al., 1997; Fiorentino and Iacobellis, 2001; Sivapalan et al., 2005] or, more generally, rainfall-runoff modeling [e.g., Pilgrim and Cordery, 1993; Wagener et al., 2004] would be considered causal information expansion.

[4] As discussed in Merz and Blöschl [2008b], it is vital to account for the respective uncertainties of the various pieces of information when combining them. In local flood statistics, a range of estimates may result from a reasonable fit of several distributions to the observed data or by accounting for the uncertainty associated with the estimated parameters of those distributions. Historical flood data may only allow us to give a range of estimates owing to large uncertainties. Spatial information may lead to a range of estimates when using several regionalization schemes or parameters of the regionalization schemes all of which may be consistent with the regional information. Causal information may result in a range of estimates due to using different methods, different data, and uncertainty in the expert judgment. In Merz and Blöschl [2008b], the final estimate was obtained by expert judgment, considering the relative uncertainties of the component sources of information [see also Gutknecht et al., 2006].

[5] The present paper is a follow-up of Merz and Blöschl [2008a, 2008b] in which, instead of reasoning in terms of ranges of estimates, we account for the uncertainty inherent to the different sources of information using the Bayesian framework. Bayesian methods provide a computationally convenient way to fit frequency distributions for flood frequency analysis by using different sources of information, such as systematic flood records, historical floods, regional information, and other hydrologic information, along with the related uncertainties (e.g., measurement errors). They also provide an attractive and straightforward way to estimate the uncertainty in parameters and quantiles.

[6] In flood hydrology, Bayesian methods have often been used for individual pieces of information such as historic floods [e.g., Stedinger and Cohn, 1986; O'Connell et al., 2002; Parent and Bernier, 2003a; Reis and Stedinger, 2005; Neppel et al., 2010; Payrastre et al., 2011], regional information [e.g., Wood and Rodriguez-Iturbe, 1975; Kuczera, 1982, 1983; Madsen and Rosbjerg, 1997; Fill and Stedinger, 1998; Seidou et al., 2006; Ribatet et al., 2007; Micevski and Kuczera, 2009; Gaume et al., 2010], and less frequently for other information such as, for example, expert opinion [e.g., Kirnbauer et al., 1987; Parent and Bernier, 2003b]. In the literature, there are examples in which more than one piece of additional information was used in a Bayesian analysis. For example, Vicens et al. [1975] investigate information expansion from regional information (through regression models) or expert judgment (causal information from precipitation characteristics) but do not combine the two together. Martins and Stedinger [2001] use historical information jointly with the generalized maximum likelihood method, which can be thought of as regional (or expert) information expansion. The aim of this paper is to illustrate by example how all three pieces of information (Figure 1, top row) can be combined in a Bayesian analysis and to assess the sensitivity of the final flood estimate to the assumptions involved. Obviously, there may be applications where not all three types of information expansion (temporal, spatial, and causal) can be provided, e.g., no historic flood data and/or regional studies are available; we therefore test the effect of each piece of information on the flood estimates separately.

Figure 1.

Use of information additional to the maximum annual flood peak systematic record. (top row) The hydrological pieces of information and (bottom row) the way they are used in the Bayesian framework. The terms p(D | θ) and p(θ) are the likelihood function and the prior distribution of the parameters of the selected statistical model, respectively, as in Bayes' theorem (equation (1) in section 3). The sections of the paper where the types of information are discussed are indicated in brackets.

[7] The sensitivity of the flood estimate to the flood peak sample at hand is also assessed by comparing two cases in a study catchment where a very large flood has occurred. In the first case, we assume that only the information before the big flood is available to mimic the situations where no large floods have been observed but may occur. In the second case, we include the large flood.

[8] The information expansion used in this study is not only diverse in terms of the temporal, spatial, and causal character of the additional information but also in the qualitative character of the information (Figure 1, bottom row): (1) additional data, (2) full information, or (3) partial information on the prior distribution of the parameters of the selected statistical model. In the example presented hereafter, historical floods are used as additional data; regional information provides an estimate of the full distribution of the model parameters, while the estimate, with uncertainty, of one flood peak quantile, obtained through expert judgment from a rainfall-runoff modeler, constitutes partial information on the model parameters. This may differ in other applications, where, e.g., spatial information expansion could provide only partial information while causal information expansion could provide full information on the prior distribution of the parameters of the selected statistical model.

2. Kamp at Zwettl

[9] The Kamp river at Zwettl is located in northern Austria and has a catchment area of 622 km2. For the Kamp at Zwettl, annual flood peak data from 1951 to 2005 are available (Figure 2). The statistical analyses of the flood peaks are dominated by the extreme flood event in August 2002, which affected a large portion of Central Europe [see, e.g., Choryński et al., 2012]. Because of that flood, the Kamp catchment has been studied extensively in recent years [e.g., Gutknecht et al., 2002; Komma et al., 2007; Blöschl et al., 2008; Reszler et al., 2008; Viglione et al., 2010]. In August 2002, a Vb-cyclone [Mudelsee et al., 2004] carried warm moist air from the Adriatic region and caused persistent rainfall over the Kamp region. This resulted in an estimated peak flow of 460 m3/s, which is three times the second largest flood in the 55 year record. In a humid climate such as that of Austria, this is an extraordinary event. Owing to the extreme event in 2002, the systematic sample mean annual flood (MAF), its coefficient of variation (CV), and skewness (CS) are 63 m3/s, 0.98, and 5.21, respectively. A generalized extreme value (GEV) distribution, fitted by the method of L-moments, gives a 100 year flood runoff (Q100) of 285 m3/s. When extrapolating this flood frequency curve to large return periods, one would assign a return period of 340 years to the 2002 event.
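
As an illustration of this kind of estimate, the following minimal Python sketch fits a GEV by the method of L-moments using the standard approximations of Hosking and Wallis [1997] and converts a given discharge into the return period implied by the fitted curve. It is an illustration only, not the routine used for the numbers quoted above, and the function names are ours.

```python
import numpy as np
from scipy.special import gamma as G

def gev_lmom_fit(x):
    """Fit a GEV (Hosking convention: negative shape = heavy upper tail)
    by the method of L-moments [Hosking and Wallis, 1997]."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    t3 = l3 / l2                                          # L-skewness
    c = 2.0 / (3.0 + t3) - np.log(2.0) / np.log(3.0)
    xi = 7.8590 * c + 2.9554 * c**2                       # shape parameter
    sigma = l2 * xi / ((1.0 - 2.0**(-xi)) * G(1.0 + xi))  # scale parameter
    mu = l1 - sigma * (1.0 - G(1.0 + xi)) / xi            # location parameter
    return mu, sigma, xi

def gev_return_period(q, mu, sigma, xi):
    """Return period assigned to a discharge q by the fitted GEV."""
    F = np.exp(-(1.0 - xi * (q - mu) / sigma) ** (1.0 / xi))
    return 1.0 / (1.0 - F)

# Example usage (peaks = array of annual maxima in m^3/s):
# mu, sigma, xi = gev_lmom_fit(peaks)
# print(gev_return_period(460.0, mu, sigma, xi))
```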

Figure 2.

Time series of maximum annual peak discharges of the Kamp at Zwettl (622 km2). Also shown are the ranges for three historical floods and the threshold (300 m3/s) assumed not to have been exceeded in the historic period (1600–1950), other than by these three events.

[10] The 2002 flood departs significantly from the other events in the record, and indeed statistical tests, such as threshold analyses [e.g., Stedinger et al., 1993], identify the 2002 flood event as an outlier. Without the extreme event in 2002 (considering the record 1951–2001), the sample MAF, CV, and CS are 57 m3/s, 0.51, and 1.14, respectively. In particular, the skewness is much smaller than that from the sample including the 2002 flood. The Q100 from the sample without the 2002 flood is 159 m3/s (GEV distribution fitted with the method of L-moments). When extrapolating this flood frequency curve to large return periods, one would assign a return period greater than 100,000 years to the 2002 event. If one takes the samples (either with or without the 2002 flood) at face value, one assumes that they are representative of the population of extreme events. When including the 2002 flood, the statistically estimated return period of such an extreme event decreases dramatically, which implies that such extreme events occur regularly, while when excluding the event, the statistically estimated return period is very high, which implies that such events occur very rarely. The situation is similar to the one described more than 80 years ago by Hazen [1930], who investigated the effect of one large flood (June 1921 on the Arkansas River) in a short record.

[11] Of course, the extraordinary event of 2002 has to be included in the analysis because it reveals how extreme the floods can be in the Kamp catchment [Laio et al., 2010]. From a practical perspective, one is particularly interested in cases where such an extraordinary flood has not yet occurred, although it may occur in the future. We, therefore, compare two cases hereafter. In the first case, we assume that only the information until the end of 2001 is available, i.e., the situation of particular relevance for engineering design. In the second case, we include information until the end of 2005. The comparison is intended to assess how well a flood of the magnitude of the 2002 event could have been anticipated statistically prior to the occurrence of that event.

3. Bayesian Inference Using Systematic Data Only

[12] In flood frequency analysis, Bayesian inference is a method in which Bayes' theorem is used to combine the information provided by the locally observed flood data with additional information independent of those data. For a flood frequency distribution with parameters θ, Bayes' theorem states that

$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{\int_{\Theta} p(D \mid \theta)\, p(\theta)\, d\theta} \qquad (1)$$

where p(θ | D) is the posterior distribution of the parameters θ, after having observed the data D; p(D | θ) is the likelihood function, i.e., the probability density function (pdf) of the data conditional on the parameters; p(θ) is the prior distribution of the parameters, which can be formulated from additional information that does not take into account any information contained in the observed data D; and the integral in the denominator of equation (1), computed over the whole parameter space Θ, serves as a normalization constant to obtain a unit area under the posterior pdf p(θ | D). In Bayesian inference, the parameters θ are considered as random variables and the uncertainty associated with them can be explicitly modeled, thus allowing credible intervals (the Bayesian analogs of the confidence intervals of frequentist statistics) to be assigned to the estimated flood quantiles. These credible intervals reflect user perception rather than a frequentist assessment of the probability that the true value falls between them [see, e.g., Montanari et al., 2009].

[13] Since, in most cases, the integral in the denominator of equation (1) cannot be evaluated in closed form, simulation-based Monte Carlo techniques such as the Markov chain Monte Carlo (MCMC) approaches are used. MCMC methods (which include random walk Monte Carlo methods) are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution (in our case, the posterior probability model) as its equilibrium distribution [see, e.g., Robert and Casella, 2004; Gelman et al., 2004]. The states of the chain after a large number of steps are then used as a sample from the desired distribution (the quality of the sample improves as a function of the number of steps). Several MCMC algorithms have been used in flood hydrology [e.g., Kuczera, 1999; Reis and Stedinger, 2005; Ribatet et al., 2007]. We use the Metropolis-Hastings algorithm [Chib and Greenberg, 1995].

[14] When only systematic data are used, we write the likelihood function in equation (1) as

$$p(D_s \mid \theta) = \prod_{i=1}^{s} f_X(x_i \mid \theta) \qquad (2)$$

where Ds = {x1, ..., xs} is the sample of annual discharge maxima systematically recorded in s years (in our case, s = 55 years) and fX(x | θ) is the pdf of the variable X (representing the maximum annual peak discharges). We assume that the peaks xi are known exactly, but equation (2) could be generalized to account for uncertainties (random and independent) and/or systematic errors due, for example, to the construction of the rating curves [see, e.g., Kuczera, 1992, 1996; O'Connell et al., 2002; Reis and Stedinger, 2005; Neppel et al., 2010]. For the distribution function fX, we assume the GEV distribution with parameters θ = (μ, σ, ξ):

$$f_X(x \mid \theta) = \frac{1}{\sigma}\left[1 - \xi\,\frac{x-\mu}{\sigma}\right]^{1/\xi - 1} \exp\left\{-\left[1 - \xi\,\frac{x-\mu}{\sigma}\right]^{1/\xi}\right\} \qquad (3)$$

where μ denotes the location parameter, σ the scale parameter, and ξ the shape parameter [e.g., Grimaldi et al., 2011, p. 489]. An improper flat prior on the parameters is used in equation (1), corresponding to no information other than the data, i.e., p(θ) ∝ 1. Note that other noninformative priors could be used [e.g., Reis and Stedinger, 2005] and that, in practice, general priors like the geophysical prior proposed by Martins and Stedinger [2000] (GML method) are often employed. In this section, we do not use the GML method, which restricts the shape parameter of the GEV distribution to a statistically/physically reasonable range, because it can be considered as a regional or expert information expansion.
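
As an illustration of how equations (1)-(3) with the flat prior can be sampled, the following minimal Python sketch implements a random-walk Metropolis-Hastings sampler; it is not the implementation used in this study. The shape parameter c of scipy's genextreme follows the same sign convention as ξ in equation (3); the starting point, step sizes, and chain length are illustrative.

```python
import numpy as np
from scipy.stats import genextreme

def log_posterior(theta, x):
    """Log posterior of equations (1)-(3) with an improper flat prior, p(theta) = 1."""
    mu, sigma, xi = theta
    if sigma <= 0:
        return -np.inf
    ll = genextreme.logpdf(x, c=xi, loc=mu, scale=sigma).sum()
    return ll if np.isfinite(ll) else -np.inf

def metropolis(log_post, x, theta0, n_iter=50000, step=(5.0, 5.0, 0.05), seed=0):
    """Random-walk Metropolis-Hastings sampler with a Gaussian proposal."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta, x)
    samples = np.empty((n_iter, 3))
    for i in range(n_iter):
        proposal = theta + rng.normal(0.0, step)
        lp_prop = log_post(proposal, x)
        if np.log(rng.uniform()) < lp_prop - lp:      # Metropolis acceptance rule
            theta, lp = proposal, lp_prop
        samples[i] = theta
    return samples

# Example usage (x = array of annual maxima in m^3/s; discard a burn-in portion):
# samples = metropolis(log_posterior, x, theta0=(50.0, 30.0, -0.1))[10000:]
```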

[15] By applying the MCMC algorithm, one obtains the fit represented in Figure 3a, considering the data until 2001 (before the big 2002 event), and in Figure 3b, considering the data until 2005 (after the big 2002 event). The two graphs show the estimates for the flood frequency curves corresponding to the posterior mode (PM) (i.e., the GEV with parameters θ corresponding to the maximum of p(θ | D)) and the 90% credible bounds associated with them. Note that the MCMC algorithm is not needed to identify the posterior mode but to quantify the uncertainty. Because the posterior density is known exactly up to a scale normalization constant, one can find the maximum without knowing the normalization constant. The choice of the posterior mode has been made for consistency with the maximum likelihood method, but other choices could have been made. For example, one could have taken the parameters corresponding to the mean of p(θ | D), or, alternatively, one could use the Bayesian posterior predictive distribution, which is defined as the expected exceedance probability distribution of flood peak values with the expectation taken over all feasible values of the parameters (i.e., by integrating over their posterior distribution [see Kuczera, 1999, equation (9)]). The credible bounds have been constructed by reading off the 5% and 95% nonexceedance values from all quantiles corresponding to the MCMC generated parameters.
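
Continuing the sketch above (and assuming the `samples` array and the data `x` from it, with the burn-in already discarded), the PM curve and the 90% credible bounds can be read off the sampled parameter sets as follows; the posterior mode is approximated here by the best sampled triplet rather than by a separate optimization.

```python
# Return periods of interest and the corresponding GEV quantiles for each sampled triplet
T = np.array([10.0, 30.0, 100.0, 300.0, 1000.0])
q = np.array([genextreme.ppf(1 - 1/T, c=xi, loc=mu, scale=sigma)
              for mu, sigma, xi in samples])
q05, q95 = np.percentile(q, [5, 95], axis=0)               # 90% credible bounds
i_pm = np.argmax([log_posterior(th, x) for th in samples])  # best sampled triplet
q_pm = genextreme.ppf(1 - 1/T, c=samples[i_pm, 2],
                      loc=samples[i_pm, 0], scale=samples[i_pm, 1])
```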

Figure 3.

Bayesian fit of the GEV distribution to the data of the Kamp at Zwettl (a and c) before and (b and d) after the 2002 event. The following cases are shown: in Figures 3a and 3b, only the systematic data from 1951 are used; in Figures 3c and 3d, historic flood information is also included. The distribution corresponding to the posterior mode (PM) of the parameters is shown as a continuous line, while the 5% and 95% credible bounds are shown as dashed lines. The mean, CV, and CS corresponding to the PM are indicated, as well as the 100 year return period quantile and its 5% and 95% credible values.

[16] Figure 3 also lists the values of MAF, CV, and CS for the GEV curve (corresponding to the PM) and the PM value of Q100 along with the 90% credible interval. The PM estimate of the 100 year flood runoff (Q100) after 2002 is almost 100 m3/s higher than the PM estimate before 2002 (the difference of the two is 60% of the value before 2002). The difference of the Q1000 is more than 100%. As can be seen in Figures 3a and 3b, the shape of the fitted curve is very different in the two cases, much more skewed if the analysis is done after the big 2002 event (ξ = −0.096 before 2002 against ξ = −0.31 after 2002). More remarkably, the 90% credible bounds are very different, i.e., the inclusion of the 2002 event considerably increases the uncertainty accounted for by the method. The 90% credible bounds range for Q100 is about 100% of Q100 before 2002 and 140% of Q100 after 2002. For Q1000, the ranges are more than 200% and almost 300% of the PM estimates before and after 2002, respectively (and this does not change if a 20% random error is assumed for the 2002 event).

[17] Although this increase of uncertainty after 2002 would be expected from a statistical perspective, it is disturbing from a hydrological perspective, as the large event has indeed revealed a lot of information on the hydrology of extreme events in this catchment (and one would hope that additional information allows the estimates to be better constrained). Observing the 2002 event leads to an increase of knowledge on the floods that could happen, and therefore to a reduction of knowledge uncertainty (also called epistemic uncertainty in Montanari et al. [2009]). The widened credible bands after 2002 reflect the fact that a higher natural variability (also called structural uncertainty in Montanari et al. [2009]) has been correctly recognized by the statistical method. In the following sections, we compare the two estimates when additional information is used, as summarized in Figure 1.

4. Temporal Information Expansion

[18] To expand information into the past, historical flood information is used [Brázdil et al., 2006]. A survey of the local archives [Wiesbauer, 2004, 2007] reports that the three largest historical floods in the past 400 years occurred in 1655, 1803, and 1829 (Figure 2). The flood discharge of these events is highly uncertain but, for a historic analysis, the relative magnitudes as compared to the 2002 flood suffice. Information on inundation areas indicates that the water levels of the 1655 and 1829 events were in the range of the 2002 event, but these two events were caused by ice jams, so the discharges were likely smaller than those of the 2002 flood. The inundated area of the 1803 event in the downstream reach of the Kamp was much larger than in August 2002, but there were apparently backwater effects from the Danube, which were less pronounced in 2002, so that the associated flood discharges can be assumed to be smaller than for the 2002 event. These analyses, therefore, suggest that the 2002 event was probably the largest event since 1600. Based on the work of Wiesbauer [2004, 2007], the estimated peak discharges for the three historic events at Zwettl are those shown in Figure 2, where uncertainty bounds have been set at ±25% of the peak discharges by expert judgment on the basis of geometry, roughness, and potential changes in the river morphology. Moreover, for all the other years of the historic period (1600–1950), we assume that the threshold of 300 m3/s was never exceeded. The threshold has been set equal to the highest possible value of the smallest of the three historic events (see Figure 2).

[19] Suppose that Ds = {x1, ..., xs} is the sample of annual discharge maxima systematically recorded in s = 55 years (from 1951 to 2005), and Dh = {y1, ..., yk} are the k = 3 extraordinary flood discharges of the events of 1655, 1803, and 1829, which occurred during the historical period (h = 350 years). Further, let us suppose that X0 = 300 m3/s is the perception threshold, i.e., the threshold ensuring exhaustivity of the information above it. Finally, the magnitudes of the historical floods are known with uncertainty, for instance, within lower and upper bounds yLj and yUj (Figure 2). All this information constitutes D, the observed data. We write the joint probability of occurrence of recent and historical flood observations as in Stedinger and Cohn [1986]:

$$p(D \mid \theta) = p(D_s \mid \theta)\; p(D_h \mid \theta) \qquad (4)$$

where p(Ds | θ) is given by equation (2) and, indicating by FX(x | θ) the cumulative distribution function corresponding to the pdf fX(x | θ) of equation (3):

$$p(D_h \mid \theta) = \left[F_X(X_0 \mid \theta)\right]^{h-k}\; \prod_{j=1}^{k}\left[F_X(y_{Uj} \mid \theta) - F_X(y_{Lj} \mid \theta)\right] \qquad (5)$$

The likelihood function p(D | θ) combines three terms: (1) the pdf of the s systematic data; (2) the probability of observing no events above X0 in the remaining h − k years of the historic period; and (3) the probability of observing k historical events lying between the specified lower and upper bounds. The distribution describing the uncertainty on the historic flood peaks is assumed to be uniform (between yLj and yUj) but could be generalized to account for other error structures, such as systematic errors due to hydraulic models used to reconstruct the discharges [Neppel et al., 2010]. In addition, the threshold is assumed to be known without uncertainty for clarity, although it is affected by the same uncertainty as the flood peaks. In section 8, the sensitivity of the estimates to the position of the perception threshold is examined.
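
Under the assumptions just stated (exactly known systematic peaks, uniform uncertainty between the bounds for the historic peaks, and an exactly known threshold), equations (4) and (5) can be evaluated on the log scale as in the following sketch; the argument names are ours.

```python
import numpy as np
from scipy.stats import genextreme

def log_lik_hist(theta, x_sys, y_lo, y_up, X0, h):
    """Log of the likelihood of equations (4) and (5): s systematic maxima, k historic
    floods known only within bounds (y_lo, y_up), and a perception threshold X0 not
    exceeded in the remaining h - k years of the historic period."""
    mu, sigma, xi = theta
    if sigma <= 0:
        return -np.inf
    k = len(y_lo)
    ll = genextreme.logpdf(x_sys, c=xi, loc=mu, scale=sigma).sum()    # systematic term
    ll += (h - k) * genextreme.logcdf(X0, c=xi, loc=mu, scale=sigma)  # non-exceedance term
    p_bounds = (genextreme.cdf(y_up, c=xi, loc=mu, scale=sigma)
                - genextreme.cdf(y_lo, c=xi, loc=mu, scale=sigma))    # bounded historic peaks
    if not np.isfinite(ll) or np.any(p_bounds <= 0):
        return -np.inf
    return ll + np.log(p_bounds).sum()
```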

[20] Considering the case represented in Figure 2, applying the MCMC algorithm and assuming a GEV distribution and the same improper flat prior used in section 3 (in equation (1), p(θ) ∝ 1), one obtains the fit represented in Figure 3c, considering the data until 2001, and in Figure 3d, considering the data until 2005. The PM estimate of Q100 after 2002 is just 12% higher than the PM estimate before 2002 (the difference between the two estimates is only 26 m3/s). For Q1000, the difference is about 25%. Compared to the case shown in Figures 3a and 3b (use of the systematic data only), the difference between the flood quantile estimates in the two cases is much reduced, i.e., the weight of the 2002 event in the estimation exercise has been reduced by accounting for other extreme events. Still, for high return periods, the PM curves differ, as the estimated CS after 2002 is still much higher than before 2002 (ξ = −0.22 before 2002 against ξ = −0.28 after 2002). Accounting for the historic information also narrows the 90% credible bounds considerably (e.g., the range for Q100 is about 54% of Q100 before 2002 and 56% after 2002).

5. Spatial Information Expansion

[21] Quantitative estimates of flood frequencies based on neighboring catchments can be obtained by various formal regionalization schemes. In the work by Merz and Blöschl [2005], the predictive performance of various types of automatic regionalization methods was examined on the basis of a jack-knifing comparison for 575 Austrian catchments, indicating that a geostatistical method outperforms other methods such as regressions and the region of influence approach. The geostatistical regionalization method known as topkriging [Skøien et al., 2006] takes both catchment area and the river network structure into account and provides regional estimates of streamflow statistics at each point of the network as well as estimates of the uncertainty related to them (variances of estimation). In principle, one could use topkriging to regionalize the GEV parameters, estimated at the gauged stations, but we use the results of Merz et al. [2008], who regionalize the maximum annual flood moments in Austria. In the following, we indicate these moments as MAF, CV, and CS. For ease of comparison of catchments of different size, the MAF has been standardized by a power law of the catchment area A to give MAF*, with α = 100 km2 as a standard catchment area and an exponent β obtained from a regional analysis [Merz and Blöschl, 2005, 2008a, 2008b]. Topkriging provides estimates of the means and variances of MAF*, CV, and CS for the Kamp at Zwettl in cross-validation mode, i.e., without using the local data.

[22] If the trivariate distribution of the maximum annual flood moments p(MAF*, CV, CS) is given, then the prior distribution of the GEV parameters p(θ) to be used in the Bayesian procedure can be uniquely defined. Unfortunately, topkriging does not provide an estimate of the entire trivariate distribution. What is missing is the correlation between the moments (cotopkriging could in principle be used to provide this but would require fitting three-dimensional variograms, which is very difficult) and the type of joint distribution.

[23] To obtain an understanding of the shape of the marginals and the correlation between the moments, a regional analysis of the sample MAF*, CV, and CS was conducted here for all stations in Austria. In Figure 4, the natural logarithms of the estimated MAF*, CV, and CS are plotted for all Austrian sites with more than 40 years of data. The first column shows histograms of the flood moments and, for comparison, a normal distribution fitted to them. The second column in Figure 4 represents the same data in normal probability plots, and the p value of the Anderson-Darling normality test is also shown (see D'Agostino and Stephens [1986] and Laio [2004] for details on the test). For ln(MAF*) and ln(CV), the test does not reject normality at the 5% level (p = 0.94 and 0.93, respectively), while it does for ln(CS). Also, in principle, CS should be allowed to be negative. However, visual inspection of Figure 4 suggests that ln(CS) is close to normal (closer than the nontransformed CS), and only nine of the 575 catchments in Austria have a negative local CS (corresponding to catchments where anthropogenic effects are important, which is not the case for the Kamp catchment). For simplicity, we therefore assume that the marginal distribution of all three moments, in this particular case study, is lognormal and that their joint distribution p(MAF*, CV, CS) is trivariate lognormal. Note that alternative distributions could be used without changing the thrust of the analysis.

Figure 4.

Distribution of the estimated MAF* [m3/s/km2], CV, and CS of the maximum annual floods in Austria (stations with more than 40 years of data). The natural logarithm transformation is shown. (left column) The histogram is compared with a fitted normal density function. (middle column) The data are represented in normal probability plots and the result of the Anderson-Darling normality test is shown as a p value. (right column) The correlation between the natural logarithms of the three moments is calculated.

[24] The topkriging moment means and standard deviations at Zwettl in cross-validation mode and the sample correlations between the natural logarithms of the three moments estimated from the regional data (see the third column of Figure 4) are used to estimate p(MAF*, CV, CS) at Zwettl as described in Appendix A (equations (A1), (A2), (A3), (A4)). The result is plotted in Figure 5 for the period before the 2002 event (first column) and for the whole period (second column). The mean and CV do not differ very much between the two cases, while the estimated CS is slightly higher when considering the data until 2005 because the neighboring stations also experienced the 2002 event, although it was less extreme than at the Kamp. Figure 5 represents the spatial information that we want to include in the Bayesian analysis. The prior distribution of the GEV parameters is calculated as p(θ) = p(MAF*, CV, CS) |J|, where |J| is the Jacobian determinant of the transformation from the GEV parameters to the moments (see Appendix A, equations (A5), (A6), (A7)). Because information is provided on all three parameters, we consider this as full information on the prior distribution p(θ) (Figure 1).
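
One possible implementation of this spatial prior is sketched below: the GEV parameters are mapped to (mean, CV, CS), a trivariate lognormal density is evaluated at these moments, and the Jacobian of the transformation is approximated by finite differences. The inputs `m_log_mean` and `m_log_cov` stand for the log-space mean vector and covariance matrix assembled from the topkriging estimates and the regional correlations; these names, and the omission of the area standardization of MAF, are simplifying assumptions of this illustration.

```python
import numpy as np
from scipy.special import gamma as G
from scipy.stats import multivariate_normal

def gev_moments(theta):
    """Mean, CV, and CS of a GEV in the convention of equation (3)
    (valid for sigma > 0 and -1/3 < xi, xi != 0; NaN otherwise)."""
    mu, sigma, xi = theta
    if sigma <= 0 or xi <= -1.0/3.0 or abs(xi) < 1e-6:
        return np.full(3, np.nan)
    g1, g2, g3 = G(1 + xi), G(1 + 2*xi), G(1 + 3*xi)
    mean = mu + sigma * (1 - g1) / xi
    var = (sigma / xi)**2 * (g2 - g1**2)
    cs = np.sign(xi) * (-g3 + 3*g1*g2 - 2*g1**3) / (g2 - g1**2)**1.5
    return np.array([mean, np.sqrt(var) / mean, cs])

def log_prior_spatial(theta, m_log_mean, m_log_cov, eps=1e-5):
    """Lognormal prior on the moments mapped to the GEV parameters:
    p(theta) = p(moments(theta)) * |J|, with a finite-difference Jacobian."""
    theta = np.asarray(theta, dtype=float)
    m = gev_moments(theta)
    if not np.all(np.isfinite(m)) or np.any(m <= 0):
        return -np.inf
    J = np.empty((3, 3))
    for j in range(3):                      # numerical Jacobian d(moments)/d(theta)
        d = np.zeros(3)
        d[j] = eps * max(abs(theta[j]), 1.0)
        J[:, j] = (gev_moments(theta + d) - gev_moments(theta - d)) / (2 * d[j])
    if not np.all(np.isfinite(J)):
        return -np.inf
    log_det = np.log(abs(np.linalg.det(J)))
    # lognormal density of the moments = normal density of the log moments / prod(moments)
    lp = multivariate_normal.logpdf(np.log(m), mean=m_log_mean, cov=m_log_cov)
    return lp - np.log(m).sum() + log_det
```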

Figure 5.

Bivariate marginals of the annual flood peak moment distribution p(MAF*, CV, CS) at Zwettl, based on topkriging in cross-validation mode (left) before and (right) after the 2002 event, and the correlations between the moments from the regional analysis.

[25] The results of the Bayesian fit using the MCMC algorithm are shown in Figures 6a and 6b. The estimates of the CV with the regional information are slightly smaller than the estimates from the temporal information expansion (Figures 3c and 3d). The regionally estimated CS (i.e., ξ = −0.15 before 2002 against ξ = −0.22 after 2002) is also smaller than the CS from the temporal information expansion. The difference between the two PM estimates in Figures 6a and 6b, before and after 2002, is 38 m3/s, which is about 20%. For Q1000, the estimate after is about 35% higher than the estimate before the 2002 event (see also Figure 10 in section 9). For the 90% credible bounds, the range for Q100 is about 53% of Q100 before 2002 and 45% after 2002. For Q1000, the ranges are about 80% and 60% of the PM estimates before and after 2002, respectively. The width of the credible bounds for the highest return periods obtained from the regional information is narrower than that obtained with the temporal expansion.

Figure 6.

Bayesian fit of the GEV distribution to the data of the Kamp at Zwettl (a and c) before and (b and d) after the 2002 event. The following cases are shown: in Figures 6a and 6b, the regional information of topkriging is used along with the systematic data; in Figures 6c and 6d, the expert guess for Q500 is used along with the systematic data. (See caption of Figure 3.)

6. Causal Information Expansion

[26] The most obvious way of causal information expansion is to derive flood frequencies from rainfall information and modeling of rainfall-runoff processes. The main benefit of using rainfall information is that available rainfall records are often much longer than the flood records (e.g., 100 years as opposed to flood records of 50 years in Austria). Modeling the rainfall-runoff processes assists in assessing the processes that have occurred and could occur, including nonlinearities and threshold effects [Gutknecht et al., 2002; Komma et al., 2007; Blöschl et al., 2008; Rogger et al., 2012b]. In this paper, we use the expert opinion of one modeler, Jürgen Komma, who has much experience with the Kamp area, being one of the developers of the operational flood forecasting model in the region, a continuous-simulation model that he parameterized on the basis of hydrological data and field information [see Komma et al., 2007, for details]. We believe that, though subjective, his understanding of the physical mechanisms of rainfall generation and rainfall-runoff transformation can provide valuable prior information. After discussing with the modeler the floods that happened in the past, the mechanisms leading to them, the way he modeled the local hydrology, and his opinion on the uncertainties involved, we asked him to formulate quantitatively his beliefs on the extremal behavior of floods in the area. As would be expected, it was not possible for the expert to formulate these beliefs in terms of GEV parameters, but rather in terms of extreme quantiles associated with large return periods, "a scale on which the expert has familiarity" [Coles and Tawn, 1996, p. 467]. His guess for the 500 year flood peak was 480 m3/s ±20%. The guessed value (480) is the result of model simulations with artificial rainfall series, while the uncertainty is based on his experience with the model in the Kamp and other catchments in Austria. Based on previous studies [Blöschl et al., 2008; Deutsche Vereinigung für Wasserwirtschaft, Abwasser und Abfall (DWA), 2012], and together with the expert, we refined the guess on the 500 year flood peak to the assumption that

$$Q_{500} \sim N\!\left(\mu_{Q_{500}},\, \sigma_{Q_{500}}^{2}\right) \qquad (6)$$

where μ_Q500 = 480 m3/s, σ_Q500 = 80 m3/s, and N is the normal distribution. This is shown in Figure 7, which represents the causal information that we want to include in the Bayesian analysis.

Figure 7.

Expert guess for Q500 at Zwettl, modeled by a Gaussian distribution with mean μ_Q500 = 480 m3/s and standard deviation σ_Q500 = 80 m3/s. The box-plot shows the 5%, 25%, 50%, 75%, and 95% quantiles.

[27] In contrast to Kirnbauer et al. [1987] and Coles and Tawn [1996], we did not ask the expert for information on more moments/quantiles to specify the prior p(θ) entirely. Information on Q500 is, therefore, partial information on the GEV parameters θ (Figure 1). We use the information in equation (6) as part of the MCMC process. During the MCMC random walk, triplets of values θ = (μ, σ, ξ) are extracted at each step, corresponding to the following GEV quantile with a 500 year return period:

$$Q_{500} = \mu + \frac{\sigma}{\xi}\left\{1 - \left[-\ln\!\left(1 - \frac{1}{500}\right)\right]^{\xi}\right\} \qquad (7)$$

During the MCMC random walk, at each step, the likelihood term is multiplied by the density π(Q500) of equation (6), evaluated at the quantile of equation (7), to calculate a posterior distribution of the parameters that is consistent with the reasonable range for Q500. In other words, the function used by the MCMC analysis to compute the posterior distribution of the parameters is p(D | θ) π(Q500(θ)). The fact that p(θ) is substituted by π(Q500(θ)) could suggest that this is an arbitrary and nonrigorous choice, but it is not. In Appendix B (Partial Information on the GEV Parameters), we demonstrate that π(Q500(θ)) = p(μ | σ, ξ), i.e., the pdf of the location parameter conditional on the other two GEV parameters. The information is partial; we cannot specify the entire p(θ) from π(Q500) but only one of the three GEV parameters conditional on the other two. In Appendix B, we show that p(σ | μ, ξ) and p(ξ | μ, σ) can also be related to π(Q500), but in different ways (see equations (B1) and (B2)). There is therefore a subjective choice involved (i.e., for which parameter the information on Q500 is used), and one would expect this choice to affect the results. However, we found that, under the given conditions (e.g., Kamp data, GEV distribution, etc.), this subjective choice has no significant effect on the fitted flood frequency curves (not shown here). Because the information on one parameter conditional on the other two is used as part of the MCMC process, the joint likelihood function, the information provided as to reasonable values of Q500, and the prior distribution for the parameters all work together to describe the likelihood of the triplets as a whole. In particular, the information provided as to reasonable values of Q500 boosts the posterior probability of triplets that are consistent with the specified reasonable range for Q500 and decreases the posterior probability of other sets of parameters.
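
In code, this partial prior reduces to a single multiplicative factor in the MCMC: the Gaussian density of equation (6) evaluated at the quantile of equation (7) implied by each sampled triplet. A minimal sketch (the constants are those of equation (6); the function name is ours):

```python
import numpy as np
from scipy.stats import norm, genextreme

MU_Q500, SIG_Q500 = 480.0, 80.0          # expert guess of equation (6), m^3/s

def log_prior_causal(theta, T=500.0):
    """Partial (causal) prior: density of equation (6) evaluated at the T-year
    GEV quantile of equation (7) implied by theta."""
    mu, sigma, xi = theta
    if sigma <= 0:
        return -np.inf
    qT = genextreme.ppf(1 - 1/T, c=xi, loc=mu, scale=sigma)
    if not np.isfinite(qT):
        return -np.inf
    return norm.logpdf(qT, loc=MU_Q500, scale=SIG_Q500)
```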

[28] The results, obtained using π(Q500(θ)) as prior, are shown in Figures 6c and 6d. The estimated CS is much higher than the CS from the temporal and spatial information expansion, which can be explained by the high guess of Q500 obtained by the rainfall-runoff model (ξ = −0.31 before 2002 against ξ = −0.334 after 2002). In Figure 6d, the value of CS is marked as NA (not available) because the coefficient of skewness of a GEV distribution with shape parameter ξ ≤ −1/3 does not exist. The PM estimate of Q100 after 2002 is only 4% higher than the PM estimate before 2002 (about 10 m3/s), and Q1000 is about 7% higher. The 90% credible bounds range for Q100 is about 44% of Q100 before 2002 and 38% after 2002. For Q1000, the ranges are about 65% of both the PM estimates before and after 2002. The difference between the estimates before and after 2002 is very small for high return periods since the additional information on Q500 is related to the most extreme floods. Citing Coles and Tawn [1996, p. 476]: “if the prior is specified for the extreme tail and we focus attention on the posterior distribution of a quantile which remains above the maximum observed datum as the sample size increases to infinity, then the data necessarily fail to dominate the prior.”

7. Combination of All Information

[29] One of the most appealing aspects of the Bayesian method is the possibility of accounting for all the different pieces of information together. Among the many different ways that could be used to combine the prior information (spatial and causal), the most natural is multiplication, which is symmetric to the multiplication of the prior and the likelihood to derive the posterior. Equation (1) then becomes

$$p(\theta \mid D) \propto p(D_s \mid \theta)\; p(D_h \mid \theta)\; p_{\mathrm{spat}}(\theta)\; p_{\mathrm{caus}}(\theta) \qquad (8)$$

This corresponds to giving equal weight to the four pieces of information: (1) p(Ds | θ), which is the likelihood function accounting for the systematic data only (see section 3); (2) p(Dh | θ), which is the likelihood function accounting for the historic floods (see section 4); (3) p_spat(θ), which is the prior obtained from the spatial information (topkriging, see section 5); and (4) p_caus(θ), which is the prior obtained from the causal information (in this example, only partially defined by using π(Q500(θ)) from the expert judgment on Q500 in the MCMC procedure, see section 6).
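
Reusing the functions sketched in the previous sections, the logarithm of equation (8) is simply the sum of the corresponding log terms; note that `log_lik_hist` already contains both likelihood factors of equation (4).

```python
def log_posterior_all(theta, x_sys, y_lo, y_up, X0, h, m_log_mean, m_log_cov):
    """Log of equation (8): systematic and historic likelihoods (section 4) combined
    with the spatial (section 5) and causal (section 6) priors; a sketch that reuses
    log_lik_hist, log_prior_spatial, and log_prior_causal from the earlier blocks."""
    return (log_lik_hist(theta, x_sys, y_lo, y_up, X0, h)
            + log_prior_spatial(theta, m_log_mean, m_log_cov)
            + log_prior_causal(theta))
```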

[30] The results obtained by the MCMC algorithm using equation (8) are shown in Figures 8a and 8b. Using all information together in this way provides the closest agreement between the estimates before and after the 2002 event. The difference for the 100 and 1000 year quantiles is of the order of 3% of Q100 (about 8 m3/s) and 5% of Q1000. The CS values are very close, and the shape parameters of the GEV distributions are ξ = −0.23 before 2002 and ξ = −0.24 after 2002. The combination of all information provides the narrowest credible bounds as well: the 90% credible bounds range for Q100 is about 34% of Q100 before and 31% of Q100 after 2002; for Q1000, the ranges are about 45% and 40% of the PM estimates before and after 2002, respectively. Of course, inclusion of the extreme 2002 event still affects the flood frequency curve but, if one compares Figure 8 to Figures 3a and 3b, it is clear that the influence of the single extreme event is now very small. The width of the credible bounds is smaller than what was obtained when using any of the single pieces of information (temporal, spatial, or causal), especially for high return periods.

Figure 8.

Bayesian fit of the GEV distribution to the data of the Kamp at Zwettl (a) before and (b) after the 2002 event, when all available information is used. (See caption of Figure 3.)

[31] Many other ways of combining priors exist. Genest and Zidek [1986] discuss several pooling models, which are based on different assumptions. The type of combination one chooses is a choice (a model of combination) rather than the result of some probabilistic algebra (as is the multiplication of the prior and the likelihood to derive the posterior). The combination p(θ) ∝ p_spat(θ) p_caus(θ) is a special case of what Genest and Zidek [1986] call the “geometric (or logarithmic) opinion pool,” i.e., the product of the priors raised to weights wi, with the weights assumed here equal to one (other weights are considered in the sensitivity analysis of section 8). The rationale behind multiplying the priors is that each prior is individually considered representative of the entire population of the GEV parameters. One fundamentally different choice is to sum up the prior distributions, p(θ) ∝ p_spat(θ) + p_caus(θ), which corresponds to what Genest and Zidek [1986] call the “linear opinion pool” (i.e., a weighted sum of the priors). The rationale behind summing up the priors is that each prior individually represents a different part, but together they are representative of the entire population of the GEV parameters. As an illustration, if one is interested in estimating the distribution of weights of Italian people and has prior information based on two studies conducted one in Rome and the other in Turin, one would combine the pieces of information by multiplying these priors. If, instead, one has prior information based on two studies conducted in Rome, one on males and the other on females, one would combine the pieces of information by summing up these priors. Even though in our study we assume that the spatial and causal pieces of information are both individually representative of the flood frequency in the Kamp region, and therefore only the multiplication of priors should be considered, in section 8 we also show the results we would obtain by summing up (and possibly weighting) the priors. The summation of priors would definitely be an option in studies where different pieces of information can be associated with different aspects of the population of floods, for instance, with different flood types.
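
For illustration, both pooling models can be written as a single function operating on the log densities of the two priors; the weights below are placeholders, and normalization constants are omitted because they are irrelevant for the MCMC.

```python
import numpy as np

def log_pooled_prior(theta, m_log_mean, m_log_cov,
                     w_spat=1.0, w_caus=1.0, pool="geometric"):
    """Geometric (log-linear) versus linear opinion pool of the spatial and causal
    priors [Genest and Zidek, 1986], reusing the functions sketched earlier."""
    lp_s = log_prior_spatial(theta, m_log_mean, m_log_cov)
    lp_c = log_prior_causal(theta)
    if pool == "geometric":                 # product of the priors raised to the weights
        return w_spat * lp_s + w_caus * lp_c
    # linear pool: weighted sum of the densities, evaluated stably on the log scale
    return np.logaddexp(np.log(w_spat) + lp_s, np.log(w_caus) + lp_c)
```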

8. Sensitivity Analysis

[32] The flood estimates of the Bayesian method obviously depend on the parameters used and, to some extent, on the assumptions made. To assess the relative role of the different pieces of information accounted for in equation (8), and the importance of their reliability, Figure 9 shows the results of a sensitivity analysis on some of the parameters used and the assumptions made in the previous sections. For clarity, only the estimates of Q100 and Q1000 after the 2002 event are considered. The first whisker with the black square point in the left part of the two graphs represents the PM estimate and 90% credible bounds obtained using all information, as in Figure 8.

Figure 9.

Sensitivity of the flood estimates to the parameters used and the assumptions made. Posterior mode estimates (points) and 90% credible bounds (segments) of (a) Q100 and (b) Q1000 at the Kamp catchment using all pieces of information, after the big 2002 event. The far left bars (with the black square point) show the estimates as in Figure 8; s is the length of the systematic record, X0 is the perception threshold, h is the length of the historical period (equation (8)), σ_TK denotes the standard deviations of the topkriging estimates of the moments of the maximum annual peaks, μ_Q500 and σ_Q500 are the mean and standard deviation of Q500 given by the expert rainfall-runoff modeler, respectively, and p_spat(θ) and p_caus(θ) are the prior distributions of the GEV parameters considering the spatial and causal information expansion, respectively.

Figure 10.

Posterior mode estimates (points) and 90% credible bounds (segments) of (a) Q100 and (b) Q1000 at the Kamp before and after the big 2002 event. In the first case (system), just the systematic flood data are used. The following cases show the estimates using different pieces of information in addition to the systematic flood data.

8.1. Sample Length

[33] The first four whiskers, with a circular point and a gray background shade, show the effect on the estimates of Q100 and Q1000 of the length of the systematic data period, when all other pieces of information are also considered (as in section 7). The first of these whiskers (denoted by s = 107) is obtained with a longer systematic data sample reconstructed from water stage data [Gutknecht et al., 2002] for the period 1896 to 1947. These additional data were not used in our previous analyses (i.e., sections 4 and 7), so that only the information added by the historical floods was accounted for there. The reconstructed data indicate that a number of large floods occurred in the first half of the twentieth century, while floods tended to be lower in the second half. Considering these additional data as systematic recordings, the likelihood function is again expressed by equation (2), where the number of observations is now s = 107 for the periods 1896–1947 and 1951–2005. The PM estimate does not change much (it is slightly higher because of the large floods in the first half of the century), and the credible bounds are only slightly narrower. The second whisker in the gray region (s = 20) is obtained by considering only the last 20 years of data, the third (s = 5) by considering only the last 5 years, and the fourth (s = 0) by excluding the systematic data from the analysis (in this last case, for the 2002 event, we have just assumed that it was over the threshold of 300 m3/s used for the historical period). The PM values increase for a decreasing number of systematic years, but the increase is moderate. In the worst case of no systematic data (ungauged site, s = 0), the PM estimates of Q100 and Q1000 increase by about 20% and 10%, respectively. The width of the credible bounds also increases for a decreasing number of systematic years, especially for Q100. This last effect is more evident for low return period quantiles (not shown in the figure) since the systematic data provide much more information on the small floods than on the big ones.

8.2. Temporal Expansion

[34] The subsequent four whiskers in Figures 9a and 9b, with white background, result from changes in the assumptions made for the temporal expansion of information. The first of them is obtained if one assumes the threshold X0 of equation (5) to be 200 m3/s instead of 300 m3/s. Of course, the PM estimate is then lower because one assumes that, in the 350 years before the systematic period, all maximum annual floods were relatively small (except for the three historic events). The credible bounds are also narrower. This is because reducing the threshold constrains the range of values of the flood peaks in the past. This would be a valuable result, provided that the threshold was really exceeded only three times. Normally, the less information one has on the historic period, the higher the threshold X0, which is usually chosen slightly below the most extreme events reported for the presystematic period. The neighboring whisker is obtained by considering X0 = 400 m3/s and presents, as expected, a higher PM value (though not by as much as it was lowered in the X0 = 200 m3/s case) and slightly wider credible bounds. Increasing the threshold corresponds to providing less information on floods in the historic period.
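
Such a sensitivity run can be scripted by repeating the inference for each threshold, as in the sketch below, which reuses `metropolis`, `log_posterior_all`, and `genextreme` from the earlier code blocks; the data arrays and prior inputs are placeholders.

```python
# Sensitivity of Q100 to the perception threshold X0 (illustrative only)
for X0 in (200.0, 300.0, 400.0):
    log_post = lambda th, x, X0=X0: log_posterior_all(th, x, y_lo, y_up, X0, h,
                                                      m_log_mean, m_log_cov)
    samples = metropolis(log_post, x_sys, theta0=(50.0, 30.0, -0.1),
                         n_iter=20000)[5000:]               # discard burn-in
    q100 = [genextreme.ppf(0.99, c=xi, loc=mu, scale=sigma)
            for mu, sigma, xi in samples]
    print(X0, np.percentile(q100, [5, 50, 95]))
```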

[35] The following two whiskers show the case where the historic period is considered to be 500 years longer (h = 850 years) or 200 years shorter (h = 150 years). In the first case, this is like increasing the information content, and the result is similar to the one obtained by decreasing X0. The PM value is lower and the credible bounds narrower because we now assume that the threshold X0 = 300 m3/s was exceeded only three times in 850 years instead of 350 years. Analogously, for a shorter historic period, the PM estimate is higher and the credible bounds wider, similar to the case of a higher threshold. Similar sensitivity analyses were presented in Gaume et al. [2010], showing analogous results. We also analyzed the effect of increasing or reducing the uncertainty of estimation of the three historic peaks (not shown in Figure 9). By setting the historical peak discharge ranges to ±45% of the peak discharge estimates (more uncertainty, compared to the ±25% used before), Q100 is about 1% and 2% lower before and after 2002, respectively. By setting the historical peak discharge ranges to ±5% of the peak discharge estimates (less uncertainty), Q100 is about 1% higher both before and after 2002. The same considerations apply to the 90% credible bounds. Because the historic period is long compared with the systematic sample, the level of uncertainty of the historic flood peak estimates has only a limited impact on the statistical inference results [see also Stedinger and Cohn, 1986; Martins and Stedinger, 2001; Payrastre et al., 2011]. This might also be due to the assumption that these errors are independent, i.e., due to the imprecise knowledge of the water levels. Systematic errors, such as rating curve errors, might exert more leverage on the inference [e.g., Kuczera, 1996; Neppel et al., 2010].

8.3. Spatial Expansion

[36] The four whiskers in the central part with gray background of Figures 9a and 9b result from a sensitivity analysis on the assumptions made for the spatial expansion of information. The first two whiskers are obtained by doubling or halving the standard deviations of the topkriging estimates of the moments of the maximum annual peaks (2σ_TK and σ_TK/2, respectively). Surprisingly, the width of the credible bounds is not much affected by this, but the position of the estimates is. By doubling the standard deviations, the estimates of Q100 and Q1000 become smaller. The reason is related to the asymmetric shape of the prior distribution of the moments (Figure 5). For instance, increasing the standard deviation of a lognormally distributed moment while keeping its mean fixed decreases its log-space mean, and hence its median, because of equation (A3) (see Appendix A). For the same reason, halving the standard deviations increases the estimates of Q100 and Q1000.

[37] The third whisker (no corr.) is obtained by considering no correlation in the prior distribution of the moments of the maximum annual peaks given by topkriging. In the example provided here, the effect of neglecting the correlation among the moments is very small. Visible effects on the estimates of the flood frequency curve are obtained only if the degree of correlation among the moments is very high (high corr.), i.e., with all three pairwise correlations set close to one, in which case the PM estimates increase and the credible bounds become slightly narrower, especially for Q1000. Assuming high correlation of the moments (or parameters) corresponds to reducing the degrees of freedom of the fitted model. It is like providing prior information on the type of distribution with fewer than three free parameters.

8.4. Causal Expansion

[38] The four whiskers in the central-right part with white background of Figures 9a and 9b show the effect of increasing or decreasing the mean estimate of Q500 by ±20% in the causal expansion, as well as of doubling or halving its standard deviation. Departures of ±20% in the expert guess (a priori) cause departures in the PM estimates (a posteriori) of less than ±10% (for both Q100 and Q1000), because of the other data taken into account. Doubling the standard deviation increases the width of the credible intervals and produces a lower PM estimate of Q100. Halving the standard deviation has the opposite effect. The results are consistent with those of the sensitivity analysis on the spatial expansion, but the reasons are different. In this case, the prior distribution of Q500 is symmetric and produces an increase of the PM estimates of Q100 and Q1000 with respect to the other pieces of information. By reducing the confidence in Q500 (i.e., doubling the standard deviation), the increase of the PM estimates of Q100 and Q1000 is reduced, while the opposite happens if one has more confidence in the estimate of Q500 provided by the expert. Also, the effects are much more pronounced than when doubling and halving the standard deviations of the topkriging estimates of the moments of the maximum annual peaks, especially for Q1000. This is because the causal information expansion controls the highest return period quantiles in this example.

8.5. Prior Combination

[39] The last four whiskers, in the right part with gray background of Figures 9a and 9b, show the effect of combining the spatial and causal information in different ways. As discussed in section 7, we multiply the priors, p(θ) ∝ p_spat(θ) p_caus(θ), which is symmetric to the multiplication of the prior and the likelihood to derive the posterior and which is based on the assumption that each prior is individually representative of the entire population of the GEV parameters. To investigate the sensitivity to this assumption, we also try different combinations. For instance, the first whisker uses the combination [p_spat(θ) p_caus(θ)]^(1/2), which corresponds to smoothing the prior distribution used in section 7 (i.e., augmenting its variance). In fact, compared with the simple multiplication (first whisker with the black square point), the credible bounds are wider. The posterior mode is a little lower because less weight is given to the prior, which, because of the causal information, tends to increase the estimates for high quantiles. That is the reason for the higher position of the second whisker, which illustrates the case of using a higher weight for the causal than for the spatial information (i.e., a geometric pool with a larger exponent on p_caus(θ) than on p_spat(θ)).

[40] As discussed in section 7, one fundamentally different choice is to sum up the prior distributions, p(θ) ∝ p_spat(θ) + p_caus(θ). The rationale behind summing up the priors is that each prior individually represents a different part, but together they are representative of the entire population of the GEV parameters. Even though we argue that this is not a reasonable assumption in our case, the third whisker shows the effect of this choice on the results, for illustration purposes. The credible bounds are wide because they embrace those of the spatial expansion (which gives lower flood estimates) and of the causal expansion (which gives higher flood estimates) together. The fourth and last whisker illustrates the effect on the flood estimates of giving more weight to one of the pieces of information, in this case the causal information. Using a linear pool with a larger weight on p_caus(θ) does not change the results. The result would be the same if the larger weight were given to p_spat(θ) instead (not shown here).

9. Discussion

[41] In this paper, we illustrate by example how three pieces of information (temporal, spatial, and causal) can be combined with the systematic flood data in a Bayesian analysis, and we compare the flood frequency estimates before and after the extraordinary 2002 flood event in the Kamp catchment in Austria. Figure 10 shows the comparison of the PM estimates (i.e., the Bayesian analogues of the maximum likelihood estimates) and 90% credible intervals of Q100 and Q1000 when the analyses are made before or after 2002 and considering the different pieces of information. The figure allows the results of Figures 3, 6, and 8 to be compared directly.

[42] Looking at Figure 10, one notices that, when compared to the temporal and causal expansions, the spatial information expansion gives (1) the smallest PM estimates of Q100 and Q1000; (2) a larger difference between the estimates before and after 2002; and (3) narrower credible bounds, especially for Q1000. The reason for the low estimates obtained by the spatial information expansion has to do with the regional data in northern Austria. For most of the gauging stations of the Kamp region, the local discharges (including the 2002 flood) are higher than the regional estimates. Because of the smaller number of outliers outside the Kamp region, the quantiles obtained with the regional information are smaller than those obtained with the temporal or causal information. Another reason is that, by deriving the prior distribution of the GEV parameters from the moments, the GEV shape parameter cannot be lower than −1/3, below which CS does not exist. This could be avoided by regionalizing the GEV parameters directly using topkriging. The larger difference between the estimates before and after 2002 is due to the fact that we use the topkriging estimates before and after the 2002 event (i.e., we respectively ignore or account for the 2002 event in the other catchments of the Kamp area), while for the temporal and causal expansions, we use the same additional information in the two cases (before and after 2002).

[43] The causal information places the PM estimates of Q100 and Q1000 above the values obtained with all other types of information. The rainfall-runoff model used by the expert gives relatively large peak discharges because the catchment consists of porous soils, which fill up above a rainfall threshold of about 70 mm, significantly increasing the runoff coefficients [Gutknecht et al., 2002; Komma et al., 2007; Blöschl et al., 2008]. In addition, with event-based models, which are widely used in hydrologic practice, it is quite common that the design flood method gives significantly larger peak flows than flood statistics for the same catchment, which is related to the usual assumptions made in the design flood method [Viglione and Blöschl, 2009; Viglione et al., 2009; Rogger et al., 2012a]. The nature of the expert's judgment also plays a role. With a similar Bayesian analysis, but for extreme rainfall, Coles and Tawn [1996] obtain estimates of high return period quantiles that are higher than those obtained with maximum likelihood, since the expert opinion is essentially that events more extreme than those observed at the site can happen in the region with nonnegligible probability. In our example, the causal information expansion is the one that gives the smallest difference between the Q100 and Q1000 values estimated before and after 2002. This is because the expert's guess for Q500 is the same in the two cases (we interviewed him only once) and affects the estimation of high return period quantiles.

[44] Despite the differences in Figure 10, the estimates of Q100 and Q1000 from the three types of additional information all fall within a range that is narrower than that from flood frequency analysis with the systematic data alone. Under the assumptions made, adding information to the systematic data reduces the difference between the two estimates substantially (increasing the stability of the estimation). Temporal, spatial, and causal expansions, in the example considered in this paper, all provide a significant reduction of the credible bounds (increasing the precision). What has not been analyzed in this paper is the validity of the estimated flood frequency curve and of the uncertainty associated with it, i.e., how close the estimate is to the truth. In the real case example treated here, we cannot answer this question. Simulations of ‘synthetic realities’ could have been used instead, as for example those in Kuczera [1982] or Seidou et al. [2006], to assess the validity of the method and investigate the influence of prior specifications (or misspecifications) on the estimates, but we considered a real world case to be more insightful for the purposes of this paper, that is, for conveying the message that combining information from many sources is valuable not only in theory but also in practice.

[45] The considerations made so far are valid under the assumptions made, as stated repeatedly in the paper. It is, therefore, important to discuss these assumptions, at least the principal ones, and to refer to the literature dealing with them explicitly. For example, regarding the historic floods, we have assumed that the uncertainties were given by independent, nonsystematic errors. The evaluation of the sources of uncertainty is a delicate issue, as shown by Neppel et al. [2010], who analyzed recent and historical maxima, accounting for additive and multiplicative errors affecting discharge values due to random errors in water-level readings and systematic rating curve errors, respectively. Their finding is that, as in our case, independent discharge errors stemming from imprecise knowledge of the water levels do not appear to have much impact on the results, while the prior for systematic rating curve errors exerts a significant influence on the estimated quantiles.

[46] When applying the Bayes theorem with historical floods, we have implicitly assumed stationarity in time, i.e., we have assumed that the climate and catchment processes leading to floods in the historic period are the same as those leading to floods today. There are ways of incorporating information on changes in time in nonstationary models. For example, Perreault et al. [2000a, 2000b] consider change-points (in the mean and variance) in the Bayesian analysis of hydropower energy inflow time series; Renard et al. [2006] consider change-point and linear trend models for flood frequency analysis; El Adlouni et al. [2007] and Ouarda and El Adlouni [2011] generalize the generalized maximum likelihood method of Martins and Stedinger [2000] to the nonstationary case; and Lima and Lall [2010] account for nonstationarity of annual maximum floods and monthly streamflows in a regional context with a Bayesian method. Similar to stationarity in time, an assumption of homogeneity in space has been made here, i.e., when applying topkriging (and the same would hold for any other regional model), we assume that the processes leading to floods in the neighboring catchments are analogous to those leading to floods in the catchment of interest.

[47] In this paper, the GEV distribution, which has been shown to reproduce the sample frequency distributions of hydrological extremes around the world [see, e.g., Stedinger et al., 1993; Vogel and Wilson, 1996; Robson and Reed, 1999; Castellarin et al., 2001; Coles et al., 2003], is used as the statistical model for the flood peaks. However, if mixed processes and thresholds occur [Gutknecht et al., 2002; Komma et al., 2007; Blöschl et al., 2008; Rogger et al., 2012b], other models should be preferred. In the Kamp catchment, 37% of the maximum annual floods in the systematic period were long-rain floods, 11% short-rain floods, 4% flash floods, 41% rain-on-snow floods, and 7% snowmelt floods [Merz and Blöschl, 2003], while two of the three historic floods were caused by ice jams associated with heavy rain and snowmelt. A mixture of models could be used to account for the diversity of processes, but the data samples stratified by flood processes would be too short. Literature exists in which model mixtures and/or the uncertainty in model selection are accounted for in a Bayesian framework [e.g., Wood and Rodriguez-Iturbe, 1975; Perreault et al., 2000a, 2000b; Renard et al., 2006; Montanari, 2011].
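For readers who wish to reproduce a systematic-data-only baseline, a conventional at-site maximum likelihood fit of the GEV can be sketched as follows. This is not the Bayesian procedure of this paper; the peak values are invented, and scipy's shape parameter has the opposite sign of the shape parameter used in much of the hydrological literature.

```python
import numpy as np
from scipy.stats import genextreme

# Hypothetical annual maximum peaks in m3/s (invented values, not the Kamp record)
annual_maxima = np.array([45., 62., 38., 120., 75., 51., 89., 66., 43., 97.,
                          58., 71., 110., 49., 83., 64., 55., 132., 77., 60.])

# Maximum likelihood fit of the GEV; scipy's shape parameter c has the
# opposite sign of the shape parameter commonly used in hydrology.
c, loc, scale = genextreme.fit(annual_maxima)

# 100 year flood from the fitted distribution (nonexceedance probability 0.99)
q100 = genextreme.ppf(0.99, c, loc=loc, scale=scale)
```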

[48] Another issue is the independence of the pieces of information. Even if, in general, there are ways to combine nonindependent pieces of information into priors [Genest and Zidek, 1986], in a Bayesian analysis priors have to be specified independently of the local data. The topkriging estimates after 2002 at Zwettl, even if obtained in cross-validation mode, are not fully independent of the 2002 event in the neighboring catchments (see Stedinger [1983], Stedinger and Tasker [1985], Hosking and Wallis [1988], and Castellarin et al. [2005, 2008] for a discussion of the effects of intersite correlation in regional frequency analysis). Taking these correlations into account may slightly increase the width of the credible bounds. However, we expect this effect to be small, given the similarity between the estimations with regional information before and after 2002. In addition, for the causal information expansion, the independence assumption may not fully apply. When running the rainfall-runoff model, the expert does not ignore the observed floods at Zwettl, since the model was calibrated to local runoff data. However, the calibration of the continuous model considers the whole runoff time series, including minor events and dry periods. Therefore, the assumption of independence from the maximum annual peak data is reasonable.

[49] The consistency of information is another assumption made in the Bayesian approach [Gelman et al., 2004; Laio and Tamea, 2007]. Data (a likelihood function) that are in conflict with the prior information will result in a posterior that is a compromise between the two disparate sources of information but may be in disagreement with both. This is because the likelihood and the prior distribution are multiplied. Inconsistencies have to be recognized when checking the results of the analysis. In the sensitivity analysis, for example, we checked the effect of departures in the causal information (by varying math formula in Figure 9). In the Kamp example, all pieces of information are reasonably consistent, as can be inferred by comparing the ranges in Figure 10. In the case that different pieces of prior information are inconsistent because they are representative of different parts of the population of the variable of interest, pooling methods exist that allow them to be combined properly (e.g., by linear pooling as discussed in sections 7 and 8).

[50] In the example reported in this paper, the temporal and spatial information expansions can be considered "objective", as their use (in the likelihood and as a prior) is based on procedures whose hypotheses can be listed, openly discussed, and scrutinized. The causal information expansion, instead, is "subjective", as its specification is based on personal judgment that is far more difficult to formalize. Of course, objective causal information could be obtained from rainfall-runoff model predictions, ideally not mediated by expert judgment, by rigorously quantifying the associated uncertainties (in the inputs, model parameters, and structure). This is a challenge, and there is ongoing research in the hydrologic literature [see Montanari et al., 2009, and references therein]. Subjective information, in the form of personal experience, is nevertheless very valuable. However, to use personal experience in a Bayesian analysis, the experts have to be "trained" to make probabilistic statements. More qualified experts may be able to provide more accurate statements, in the sense that their prior mean is closer to the true value they are trying to predict than that of less qualified experts, but they may have a tendency to underestimate uncertainty. This is what Taleb [2007] calls "epistemic arrogance", i.e., the difference between what one actually knows and how much one thinks one knows. Also, if many experts are interviewed [e.g., Kirnbauer et al., 1987], it is not trivial to decide how to combine their opinions [Genest and Zidek, 1986].

10. Conclusions

[51] The main message of this paper is in line with the messages of the companion papers of Merz and Blöschl [2008a, 2008b] in the flood frequency hydrology series: combining information from many sources is very useful. While, in the previous papers, the usefulness of combining many pieces of information was demonstrated by heuristic reasoning, here it is demonstrated by Bayesian analysis. Even though a number of assumptions have been made in this paper, we believe that the ability of the Bayesian approach to use all pieces of information in conjunction is a major advantage over other methods. Formalizing the hydrological information in a Bayesian framework is not straightforward, but it is possible, as has been shown here through examples. Expanding information beyond the systematic flood record is sometimes considered of little value in engineering hydrology because subjective assumptions are involved. In the past, most national guidelines [e.g., Natural Environment Research Council (NERC), 1975; Deutscher Verband für Wasserwirtschaft und Kulturbau (DVWK), 1999] have therefore involved prescriptive procedures with little choice for the hydrologist who applies them, while few have encouraged the use of additional information [see, e.g., U.S. Interagency Advisory Committee on Water Data, 1982, p. 20, Bulletin 17B]. More recent versions of these guidelines [e.g., DWA, 2012] explicitly focus on temporal, spatial, and causal information to complement the systematic flood data (see also Stedinger and Griffis [2008] for a discussion of how Bulletin 17B will be revised).

[52] The results of this study suggest that the additional information can significantly improve the confidence in flood frequency estimates. With all the information combined, our estimated 100 year flood peak at Zwettl is 250 m3/s with a 90% credible range of ±15%. This estimate would have been practically the same in 2001, before the extraordinary 2002 flood event was observed. The return period of this event (460 m3/s) would be estimated as 1000 years before the event and 800 years after it, with 90% credible bounds of 500 to 4500 years and 450 to 2300 years, respectively. This contrasts with the estimates of >100,000 years and 340 years obtained with the systematic data alone. While a range of assumptions needs to be made, the final flood estimate is not much affected by assumptions varying within a reasonable range, because the information provided by combining independent sources (temporal, spatial, and causal) may outweigh the uncertainty introduced by these assumptions. It is suggested that complementing systematic flood data with temporal, spatial, and causal information should become the standard procedure for estimating large return period floods.

Appendix A: From the GEV Moments to the GEV Parameters

[53] In section 5, it is argued that, in Austria, the logarithms of the maximum annual peak moments math formula follow the three-variate normal distribution:

display math(A1)

with mean math formula and covariance matrix:

display math(A2)

Because Merz et al. [2008] provided math formula and math formula, and not their logarithms, we used the transformation [see Kottegoda and Rosso, 1997, pp. 225–227]:

display math(A3)

For the covariance terms, we used the sample correlations between the natural logarithms of the three moments, estimated from regional data and indicated in Figure 4. The trivariate lognormal distribution of the maximum annual peak moments math formula is obtained by transforming the normal distribution of equation (A1) into

display math(A4)

whose bivariate marginals are plotted in Figure 5.
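As a worked illustration of the transformation referred to in equation (A3), the standard lognormal relations map the mean and standard deviation of a moment to the mean and variance of its logarithm. The sketch below applies these relations; the numerical values are invented and are not the regional estimates of Merz et al. [2008].

```python
import numpy as np

def lognormal_log_moments(mean, stdev):
    """Mean and variance of ln(X) for a lognormal X with given mean and standard deviation.

    Standard lognormal relations: var_ln = ln(1 + (stdev/mean)**2),
    mean_ln = ln(mean) - var_ln / 2.
    """
    var_ln = np.log(1.0 + (stdev / mean) ** 2)
    mean_ln = np.log(mean) - 0.5 * var_ln
    return mean_ln, var_ln

# Invented example: a regional mean annual flood of 100 m3/s with a
# standard error of 30 m3/s
mu_ln_maf, var_ln_maf = lognormal_log_moments(100.0, 30.0)
```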

[54] The transformation of the three-variate distribution math formula into the prior distribution of the GEV parameters math formula is

display math(A5)

where math formula is the Jacobian of the transformation math formula. The transformation math formula is [Stedinger et al., 1993, p. 18.17]

display math(A6)

where the proportionality in the first equation is due to the standardization of the MAF values. The corresponding Jacobian is

display math(A7)

with math formula and math formula, where math formula, math formula is the gamma function, and math formula is the digamma function.
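A numerical sketch of this moment-to-parameter mapping, under an assumed Stedinger-type parameterization (location, scale, and shape κ, valid for κ > −1/3) and without the standardization of the MAF mentioned above, might look as follows. The function names, the root-finding interval, and the example moments are illustrative choices, not the implementation used in the paper.

```python
import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq

GUMBEL_SKEW = 1.1396  # limiting skewness of the GEV as the shape tends to zero

def gev_skewness(kappa):
    """Skewness of the GEV as a function of the shape kappa (requires kappa > -1/3)."""
    if abs(kappa) < 1e-7:
        return GUMBEL_SKEW
    g1, g2, g3 = gamma(1 + kappa), gamma(1 + 2 * kappa), gamma(1 + 3 * kappa)
    return np.sign(kappa) * (-g3 + 3 * g1 * g2 - 2 * g1**3) / (g2 - g1**2) ** 1.5

def moments_to_gev(maf, cv, cs):
    """Map (mean, coefficient of variation, skewness) to GEV (location, scale, shape).

    The shape is found numerically from the skewness relation; the scale and
    location then follow from the variance and the mean.
    """
    kappa = brentq(lambda k: gev_skewness(k) - cs, -0.33, 2.0)
    g1, g2 = gamma(1 + kappa), gamma(1 + 2 * kappa)
    stdev = cv * maf
    alpha = stdev * abs(kappa) / np.sqrt(g2 - g1**2)
    loc = maf - alpha * (1 - g1) / kappa
    return loc, alpha, kappa

# Illustrative (made-up) regional moments: mean 100 m3/s, CV 0.5, skewness 1.5
loc, alpha, kappa = moments_to_gev(100.0, 0.5, 1.5)
```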

Appendix B: Partial Information on the GEV Parameters

[55] In section 6, we argued that the information on Q500 corresponds to partial information on the GEV parameters math formula. The density math formula provides information on one parameter given the other two, e.g., on math formula, where

display math

Because the functional form relating Q500 to the GEV parameters (math formula in equation (7)) is assumed to be known with certainty, then

display math

where math formula is the Dirac delta function and math formula is the value of the scale parameter for which, given math formula and math formula, math formula. Because math formula, where f has a single root x0, then

display math

The two terms after the Dirac δ function depend on Q500 only; therefore, considering the properties of the Dirac δ, the integral is equal to these two terms evaluated at math formula, i.e., noting that math formula,

display math

Because math formula, math formula.

[56] Analogously,

display math(B1)

and

display math(B2)

where math formula.

[57] During the MCMC random walk, math formula, math formula, and math formula are given at each step, and either math formula, math formula, or math formula can be used by the MCMC analysis to compute the posterior distribution of the parameters. There is, therefore, a subjective choice involved, i.e., for which parameter the information on Q500 is used. However, by applying the procedure to the case study considered in this paper, we found that this choice has no significant effect on the results (not shown here).

[58] In the analyses in this paper, we have used math formula. Although the information is partial, one can see math formula as a whole prior (although an improper one) by noting that math formula, because flat priors are used for math formula and math formula.
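To make the change-of-variables step in this appendix concrete, the sketch below evaluates a partial prior on the GEV scale parameter, given the location and shape at the current MCMC step, from a density expressing an expert guess about Q500. The GEV quantile convention, the lognormal form of the expert density, its parameter values, and all function names are assumptions for illustration and do not reproduce the prior used in this study.

```python
import numpy as np
from scipy.stats import lognorm

def gev_quantile(T, loc, scale, shape):
    """GEV quantile for return period T (illustrative convention, shape != 0)."""
    y = -np.log(1.0 - 1.0 / T)
    return loc + scale / shape * (1.0 - y ** shape)

# Hypothetical expert statement on Q500 encoded as a lognormal density
# (invented numbers, used only to illustrate the mechanics)
expert_q500 = lognorm(s=0.2, scale=400.0)   # median 400 m3/s, ~20% log-standard deviation

def partial_prior_scale(scale, loc, shape, T=500):
    """Partial prior density of the GEV scale parameter, given location and shape.

    Change of variables from the expert density on Q_T:
    pi(scale | loc, shape) = pi_QT(Q_T(loc, scale, shape)) * |dQ_T/dscale|,
    where dQ_T/dscale = (1 - y**shape) / shape does not depend on the scale.
    """
    y = -np.log(1.0 - 1.0 / T)
    jacobian = abs((1.0 - y ** shape) / shape)
    return expert_q500.pdf(gev_quantile(T, loc, scale, shape)) * jacobian
```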

Acknowledgments

[59] Financial support for the project “Mountain floods—regional joint probability estimation of extreme events” funded by the Austrian Academy of Sciences, the Doctoral program DK-plus W1219-N22 funded by the Austrian Science Funds, and the “FloodChange” project funded by the European Research Council is acknowledged. We would like to thank John England, Dmitri Kavetski, Francesco Laio, Alberto Montanari, Jery Stedinger, and an anonymous reviewer, whose comments enabled us to significantly improve the paper.
