A Bayesian extreme value analysis of debris flows


  • Natalia Nolde (corresponding author)
    Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada
    Correspondence: N. Nolde, Department of Statistics, University of British Columbia, 3182 Earth Sciences Building, 2207 Main Mall, Vancouver, BC V6T 1Z4, Canada. (natalia@stat.ubc.ca)
  • Harry Joe
    Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada


[1] Debris flows carry a tremendous potential for physical destruction as well as a threat to human lives. Quantitative analysis of their frequency-magnitude relation is key to the development of mitigation measures to reduce hazard and risk. Yet, the data available for such analysis are typically very sparse, leading to point estimates for return levels that are too imprecise to be of practical value. The aim of the paper is to demonstrate how additional sources of information, in particular expert judgment based on physical causes and the size of the active area producing loose sediments, can be incorporated to produce more precise estimates with a smaller upper endpoint for interval estimates of extreme return levels. A Bayesian framework for extreme value analysis is used. We provide a rationale for the prior choice and discuss how its parameters can be elicited from the expert's knowledge. A case study of debris flows at Capricorn Creek in Western Canada is used to illustrate our methodology.

1. Introduction

[2] Debris flows often carry a tremendous hazard and risk potential. Accurate assessment of this risk is key to its proper management. One of the main ingredients of such risk assessment is a debris flow frequency-magnitude analysis based on records of past debris flow events. A frequency-magnitude analysis relates the volume of debris flows to the likelihood of their occurrence. However, the nature of the data and of the problem greatly challenges standard analytical techniques. Some of the standard methods are discussed in Jakob [2012], along with the issues surrounding their use. The present paper outlines a methodology that allows one to combine historical data with expert judgment and thus improve the precision of statistical estimation of the frequency-magnitude relation.

[3] Historical data, comprising volume estimates of debris flows, are usually not directly observed but rather need to be reconstructed using absolute dating methods such as dendrochronology, radiocarbon dating, and varve chronologies [Chiverell and Jakob, 2012]. In this process, large events can be identified, but even then their volumetric values tend to be prone to large errors due to erosion. Although such records of past events can span time periods of over a hundred years, they usually contain only a small number of event data points, owing both to the rareness of the phenomenon and to the fact that smaller events remain undetected.

[4] Once a record of debris flows for a given location is obtained, it is used to estimate return levels, that is, levels of the debris flow volume to be exceeded with a specified annual frequency or probability, for a range of return periods. For instance, in British Columbia, Canada, the landslide safety criterion with respect to life-threatening or catastrophic landslides is set at the 10,000 year return level in the landslide safety guidelines provided by the Ministry of Transportation and Infrastructure [2009]. The return period of 10,000 years falls well beyond the span of most available data records, and hence estimation of the associated return level has to rely on a model-based extrapolation. A standard statistical approach to tackle this kind of problem is to make use of the asymptotic results of extreme value theory (EVT). The basic references on statistics of extremes and its applications include Beirlant et al. [2004], Embrechts et al. [1997], Coles [2001], and Reiss and Thomas [2007]. Katz et al. [2002] is a review article with a focus on applications of extreme value methods in hydrology. The models motivated by EVT serve as a theoretically justified basis for extrapolation to extreme levels of the process but, as for any model, the results of such extrapolations have to be treated with caution.

[5] Inference for the chosen model can be performed using a variety of methods. Moments-based methods have been shown to possess better small sample properties [see, e.g., Madsen et al., 1997]; however, precision seems to be gained at the expense of underestimation of return levels when data come from a very heavy-tailed distribution (based on a simulation study described in the appendix "Performance of Estimation Methods When Sampling From a Heavy-Tailed GP Distribution"), a likely scenario in the case of debris flows we consider. This is a serious drawback from the risk management perspective. An alternative such as the maximum likelihood method is not usually recommended for small data samples. Due to a small sample size, confidence intervals for maximum likelihood return level estimates tend to be very wide, reflecting tremendous sampling variability, especially in view of the extreme value problem at hand. Such wide confidence intervals are impractical in real-life decision making. Maximum likelihood return level estimates also suffer from bias, albeit on the positive side. One approach to addressing these issues of maximum likelihood is to restrict the domain of possible values of the model shape parameter [see Coles and Dixon, 1999; Martins and Stedinger, 2001]. The shape parameter is key in determining the tail of the assumed distribution. However, such restrictions on the tail may not necessarily be justified in the present context.

[6] In our approach, in order to improve the overall input of the analysis as well as precision of the return level estimation, we propose to supplement the available small sample of historical debris flow events with additional information in the form of expert judgment. In particular, we have sought an expert opinion concerning the likely magnitudes of debris flows return levels. Kris Holm, a senior geoscientist at BGC Engineering in Vancouver, Canada, has kindly agreed to be our debris flow expert and to provide required information.

[7] The idea that incorporation of additional information into the analysis has the potential to make estimation more precise and reduce the bias has been exploited in various forms and contexts. The use of expert opinion similar to our approach has been suggested in Coles and Tawn [1996]. In flood frequency analyses, Jin and Stedinger [1989] combine the regional and historical information via maximum likelihood estimation, while O'Connell et al. [2002] employ a Bayesian methodology to include historical and paleohydrologic bound data. Examples of methods and case studies using regional information are Coles and Powell [1996], Casson and Coles [1999], and Ribatet et al. [2006]. Coles and Dixon [1999] and Martins and Stedinger [2001] incorporate additional information by imposing restrictions on the model shape parameter, as mentioned above.

[8] To combine the expert opinion with the available data, we make use of Bayesian techniques. For a recent review of Bayesian analysis of hydrologic extremes, refer to Renard et al. [2013]; an earlier review paper on Bayesian methods in extreme value modeling is Coles and Powell [1996]. The (likelihood) model we assume for the data is based on the point process representation of excesses over a high threshold. In the current application, as a threshold we use a volume estimate above which debris flows can be identified. Exceedance times or, equivalently, in our case, times of debris flow events are assumed to follow a Poisson process, with volume amounts and occurrence times being independent. This model is determined by three parameters. Without reference to the data, the sampling distribution of the parameter vector, known as the prior distribution, is specified using expert opinion. Applying Bayes' theorem, the prior distribution can be updated to incorporate the available record of data. The resulting posterior distribution can be interpreted as the sampling distribution of the parameter estimate that combines the given data and the prior information, derived with the expert's judgment. The posterior distribution should be less spread out than the sampling distribution of, say, the maximum likelihood estimate, as the latter uses less information. Hence, it can lead to shorter interval estimates of return levels. Details are provided in section 2. As a case study, presented in section 3, we consider the record of debris flows at Capricorn Creek on Mount Meager in British Columbia, Canada. A discussion of the results in comparison with other methods, and a sensitivity analysis with respect to the prior choice and data uncertainty, are given in section 4. Section 5 summarizes our findings and conclusions. The appendix "Performance of Estimation Methods When Sampling From a Heavy-Tailed GP Distribution" supplements the analysis in section 4.1.

2. Methodology

2.1. Likelihood Assumptions

[9] Let x_1, …, x_n denote the available historical data. These can be thought of as realizations of a random variable, which in our case represents the volume of debris flows when they occur. We assume that the stochastic behaviour of this random variable can be reasonably described by a probability model that is determined by a parameter θ, possibly vector valued. The choice of the model is crucial as it forms part of the basis for inference and prediction.

[10] Unlike for many hydrological sequences where measurements are available at regular time intervals, say daily, debris flows can typically be identified only when their volume exceeds a certain threshold. The model we adopt is a process in which debris flow volumes (marks) follow a generalized Pareto (GP) distribution, whereas the debris flow events occur according to a Poisson process with the annual rate parameter λ; magnitudes of debris flows and their occurrence times are assumed to be independent. The distribution function of the GP distribution with scale parameter σ and shape parameter ξ has the form

H_{σ,ξ}(y) = 1 − (1 + ξy/σ)^{−1/ξ},   y > 0 with 1 + ξy/σ > 0,    (1)

where y denotes the excess of the volume over the threshold, and H_{σ,ξ}(y) = 1 − exp(−y/σ) in the limit ξ → 0.

[11] The asymptotic result on which the model is based goes back to Balkema and de Haan [1974] and Pickands [1975]. It asserts that, for sequences of independent and identically distributed random variables, excesses over a high threshold can be well approximated by the GP distribution for a wide class of underlying distributions.
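The GP approximation above can be illustrated numerically. The following Python sketch (the paper's own computations use R) fits a GP distribution to threshold excesses of a synthetic heavy-tailed sample; all numbers here are illustrative assumptions, not the paper's data.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(42)

# Synthetic sample with a Pareto-type tail (tail index 1.5, so the
# true GP shape is xi = 1/1.5, roughly 0.67); threshold at the 95% quantile.
data = rng.pareto(a=1.5, size=5000) + 1.0
u = np.quantile(data, 0.95)
excesses = data[data > u] - u

# Fit a GP distribution to the excesses; floc=0 fixes the location at 0,
# since the excesses over the threshold are modeled directly.
xi, _, sigma = genpareto.fit(excesses, floc=0)
print(f"shape xi = {xi:.2f}, scale sigma = {sigma:.2f}")
```

With a few hundred excesses, the estimated shape typically lands near the true value, illustrating why the GP family is the natural model for threshold excesses.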

[12] Based on the above model assumptions, the probability of no debris flow event in any given year is e^{−λ}. The probability that k_y debris flow events occur in year y is e^{−λ} λ^{k_y}/k_y!, whereas the contribution of the ith observation (i = 1, …, n) to the likelihood function is h_{σ,ξ}(x_i − u), where h_{σ,ξ} denotes the density of the GP distribution with parameters σ and ξ, and u is the chosen threshold. Hence, due to assumed independence of debris flow events, the total likelihood, the model-based probability of the observed data, is the product of individual contributions conditional on the detection of debris flow events:

L(σ, ξ, λ; x_1, …, x_n) = (e^{−λ})^{n_y − m} ∏_{y: k_y ≥ 1} [e^{−λ} λ^{k_y}/k_y!] × ∏_{i=1}^{n} h_{σ,ξ}(x_i − u)
                        = e^{−λ n_y} λ^{n} (∏_{y: k_y ≥ 1} k_y!)^{−1} ∏_{i=1}^{n} h_{σ,ξ}(x_i − u),    (2)

where n_y is the number of years spanned by the data, m is the number of years (time periods) with at least one event, k_y is the number of events in year y, and n = ∑_y k_y is the total number of events. The Fisher information matrix for the likelihood in (2) has the form

I(σ, ξ, λ) = ( I_{GP}(σ, ξ)   0
               0              1/λ ),    (3)

where the submatrix I_{GP}(σ, ξ), formed by the first two rows and columns of I(σ, ξ, λ), is the Fisher information matrix, derived in Smith [1984], for the GP density, and 1/λ is the Fisher information for the Poisson rate per year of observation; the matrix is block diagonal owing to the assumed independence of event magnitudes and occurrence times.
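The log of the likelihood in (2) is what one maximizes in practice. A minimal Python sketch of it (the paper itself works in R) follows; the synthetic data and parameter values are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import genpareto

def neg_log_lik(params, x, u, n_years):
    """Negative log-likelihood of the model in (2), dropping the constant
    log(k_y!) terms: Poisson(lambda) annual event counts plus GP(sigma, xi)
    densities for the excesses x_i - u."""
    sigma, xi, lam = params
    if sigma <= 0 or lam <= 0:
        return np.inf
    excess = np.asarray(x, dtype=float) - u
    if np.any(excess < 0) or np.any(1 + xi * excess / sigma <= 0):
        return np.inf  # outside the GP support
    gp_part = genpareto.logpdf(excess, c=xi, scale=sigma).sum()
    pois_part = -lam * n_years + len(excess) * np.log(lam)
    return -(gp_part + pois_part)

# Illustrative data: 13 exceedances of u = 3 over 170 years, mimicking the
# scale of the case study (the values themselves are made up).
x = [4.0, 5.5, 3.5, 8.0, 12.0, 4.5, 6.0, 3.2, 20.0, 7.0, 3.8, 150.0, 150.0]
val = neg_log_lik([5.0, 0.9, 13 / 170], x, u=3.0, n_years=170)
print(val)
```

Note that the Poisson part is maximized at the empirical rate λ = n/n_y, which is why that value is used in the example.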

[13] Assume ξ ≠ 0 (otherwise all results below are taken as the limit ξ → 0). The N-year return level q_N corresponds to the level exceeded on average once in N years, that is, once in an average of Nλ observations; it can be computed by solving 1 − H_{σ,ξ}(q_N − u) = 1/(Nλ) for q_N, where u denotes the threshold. This gives

q_N = u + (σ/ξ) [(Nλ)^{ξ} − 1].    (4)
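Equation (4), together with its ξ → 0 limit q_N = u + σ log(Nλ), is straightforward to code. The following Python helper is a sketch; the parameter values in the example are illustrative assumptions, not elicited or fitted values from the paper.

```python
import numpy as np

def return_level(N, u, sigma, xi, lam):
    """N-year return level from (4): u + (sigma/xi) * ((N*lam)**xi - 1),
    with the limiting form u + sigma * log(N*lam) as xi -> 0."""
    if abs(xi) < 1e-10:
        return u + sigma * np.log(N * lam)
    return u + (sigma / xi) * ((N * lam) ** xi - 1.0)

# Illustrative: threshold u = 3 (x10^4 m^3), a heavy tail, one event per decade.
for N in (100, 1000, 10_000):
    print(N, return_level(N, u=3.0, sigma=5.0, xi=0.8, lam=0.1))
```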

[14] It is common to use an alternative parametrization of the point process model introduced above. The shape parameter ξ remains the same, but a location parameter μ and a new scale parameter σ̃ are introduced via

λ = [1 + ξ(u − μ)/σ̃]^{−1/ξ},   σ = σ̃ + ξ(u − μ).    (5)

[15] In this parametrization, (μ, σ̃, ξ) correspond to the parameters of the generalized extreme value (GEV) distribution. The N-year return level is then given by

q_N = μ + (σ̃/ξ) (N^{ξ} − 1).    (6)

[16] Note that −log(1 − 1/N) ≈ 1/N for large N; substituting [−log(1 − 1/N)]^{−1} for N in (6), one recovers the (1 − 1/N)-quantile of the GEV distribution.
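This relationship between (6) and the GEV quantile function can be verified numerically; the snippet below is a sketch with arbitrary illustrative parameter values.

```python
import numpy as np

def rl_pp(N, mu, sig, xi):
    # Return level from (6): mu + (sig/xi) * (N**xi - 1)
    return mu + (sig / xi) * (N ** xi - 1.0)

def gev_quantile(p, mu, sig, xi):
    # p-quantile of the GEV distribution: mu + (sig/xi) * ((-log p)**(-xi) - 1)
    return mu + (sig / xi) * ((-np.log(p)) ** (-xi) - 1.0)

mu, sig, xi, N = 0.0, 1.0, 0.3, 1000
exact = gev_quantile(1 - 1 / N, mu, sig, xi)
subst = rl_pp(1.0 / (-np.log(1 - 1 / N)), mu, sig, xi)  # exact identity
approx = rl_pp(N, mu, sig, xi)                          # uses -log(1 - 1/N) ~ 1/N
print(exact, subst, approx)
```

The substituted value matches the GEV quantile exactly, while plugging in N directly is already very close for large return periods.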

[17] It can be shown that in the (μ, σ̃, ξ)-parametrization, the likelihood function satisfies [see, e.g., Coles, 2001, section 7.5]

L(μ, σ̃, ξ; x_1, …, x_n) ∝ exp{ −n_y [1 + ξ(u − μ)/σ̃]^{−1/ξ} } ∏_{i=1}^{n} σ̃^{−1} [1 + ξ(x_i − μ)/σ̃]^{−1/ξ−1},    (7)

where n_y is the number of years spanned by the data and n is the number of debris flow events over this time period. The Fisher information matrix for (μ, σ̃, ξ) can be computed via

I_{(μ,σ̃,ξ)} = J_1^T I_{(σ,ξ,λ)} J_1,    (8)

where J_1 = ∂(σ, ξ, λ)/∂(μ, σ̃, ξ) is the Jacobian matrix of the transformation in (5). With the shorthand a = 1 + ξ(u − μ)/σ̃, so that λ = a^{−1/ξ}, its rows are

(∂σ/∂μ, ∂σ/∂σ̃, ∂σ/∂ξ) = (−ξ, 1, u − μ),
(∂ξ/∂μ, ∂ξ/∂σ̃, ∂ξ/∂ξ) = (0, 0, 1),
(∂λ/∂μ, ∂λ/∂σ̃, ∂λ/∂ξ) = ( λ/(σ̃a), λ(u − μ)/(σ̃²a), λ[ξ^{−2} log a − (u − μ)/(ξσ̃a)] ).

2.2. Prior Distribution

2.2.1. Prior Choice

[18] After introducing the likelihood, the next step is to make inference about the unknown parameter vector inline image. In a general Bayesian setup, uncertainty about the value of inline image is described by a probability distribution. This distribution is known as the prior distribution, and it is formulated without reference to the data on which the likelihood is based. The use of a probability model for the parameter value makes it natural to think of the parameter as being a random variable with the prior distribution representing the underlying stochastic physical process. However, if inline image is interpreted as a fixed constant, the prior distribution simply expresses the uncertainty about its value.

[19] The prior distribution can also be seen as the sampling distribution of the parameter estimate based on information other than that contained in the data. The sampling distribution of the maximum likelihood estimator (MLE) is approximately multivariate normal for a large sample size, with mean vector being the true parameter value and covariance matrix being the inverse Fisher information divided by the number of observations in the sample [see, e.g., Casella and Berger, 2002]. Hence, the multivariate normal distribution is an "asymptotic" conjugate prior for any Bayesian application (where regular asymptotics apply). From the viewpoint of the sampling distribution, we write θ for the estimator as a random vector (or the parameter as a random variable). Let π(θ) denote the prior density. Based on the above argument, we adopt a multivariate normal distribution as the prior for θ so that

π(θ) = (2π)^{−d/2} |Σ_p|^{−1/2} exp{ −(θ − θ_p)^T Σ_p^{−1} (θ − θ_p)/2 },

for a prior mean vector θ_p and a positive-definite covariance matrix Σ_p, where d is the dimension of the underlying parameter space.

[20] In the Poisson process model, θ corresponds to the vector of the three parameters (μ, log σ̃, ξ), where the scale parameter σ̃ is log-transformed to ensure its positivity. This model parametrization is chosen in order to facilitate numerical computations using the existing library evdbayes [Stephenson and Ribatet, 2012] for a Bayesian extreme value analysis in the open-source statistical software R [R Core Team, 2012].

2.2.2. Elicitation of Expert Judgment

[21] We elicit expert opinion to get the mean vector θ_p and covariance matrix Σ_p of the multivariate normal distribution used as prior. Figure 1 gives a schematic summary of the steps to determine θ_p and Σ_p.

Figure 1.

Flowchart of the prior specification for parameters (μ, log σ̃, ξ) of the point process model.

[22] For an expert, it would be more natural to formulate prior information on the underlying debris flow process in terms of return levels rather than model parameters directly. Consequently, similar to Coles and Tawn [1996], we have asked our expert to provide estimates of at least three return levels for return periods of his choice, along with the errors or confidence bounds he associates with each of his estimates. The error bounds are an equivalent of the subjective variances used as an estimate of uncertainty; see Winkler [1968]. The expert's return level estimates, denoted q̃_{N_j} for return periods N_j, j = 1, …, J (J ≥ 3), allow us to infer the corresponding model parameters via minimization of the sum of squared relative differences between the model-based return levels q_{N_j} in (4) and the q̃_{N_j}.

[23] The model constraints on the parameter space in this optimization problem include σ > 0 and λ > 0. However, the implied value of λ, the rate with which debris flow events occur, might fall outside the range that the expert finds appropriate. We have thus asked the expert to additionally specify an interval [λ_L, λ_U] of likely values for λ. Hence, using this interval as an additional constraint, the elicited model parameters are obtained by numerically solving the following minimization problem:

min_{σ,ξ,λ} ∑_{j=1}^{J} [(q_{N_j} − q̃_{N_j})/q̃_{N_j}]²   subject to σ > 0, λ ∈ [λ_L, λ_U],    (9)

where q_{N_j} = q_{N_j}(σ, ξ, λ) is given by (4) and q̃_{N_j}, j = 1, …, J, are the expert's return level estimates.
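A sketch of this constrained least squares fit in Python (the paper uses the R function optim with the method of Byrd et al. [1995]; scipy's L-BFGS-B implements the same algorithm). The target values below are the expert averages from Table 2 and the λ interval [1/30, 1/10] from section 3.2; the starting point and the ξ bounds, restricted to heavy tails in line with the expert's beliefs, are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

u = 3.0  # detection threshold (x10^4 m^3)
N = np.array([100.0, 500.0, 1000.0, 10_000.0])        # return periods (years)
q_exp = np.array([100.0, 1750.0, 5500.0, 30_000.0])   # expert averages, Table 2

def q_model(sigma, xi, lam):
    # N-year return levels from (4)
    return u + (sigma / xi) * ((N * lam) ** xi - 1.0)

def objective(params):
    sigma, xi, lam = params
    # Sum of squared relative differences, as in (9)
    return np.sum(((q_model(sigma, xi, lam) - q_exp) / q_exp) ** 2)

# Box constraints: sigma > 0 and lam in [1/30, 1/10]; the xi range (0.01, 2)
# is an assumption wide enough to accommodate a heavy tail.
res = minimize(objective, x0=[10.0, 0.5, 0.05], method="L-BFGS-B",
               bounds=[(1e-6, None), (0.01, 2.0), (1 / 30, 1 / 10)])
sigma_e, xi_e, lam_e = res.x
print(np.round(res.x, 3), round(res.fun, 4))
```

As reported in section 3.2, a fit of this kind tends to push λ to the boundary of the expert's interval and to produce a large positive ξ.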

[24] In general, expert judgment elicitation is an iterative process [O'Hagan et al., 2006; Garthwaite et al., 2005], which in our situation comprises the following steps:

[25] i. Elicit J ≥ 3 return level estimates, q̃_{N_1}, …, q̃_{N_J}, of the return level curve.

[26] ii. Find the model parameters (σ*, ξ*, λ*) implied by the return level estimates via solving the constrained optimization problem in (9).

[27] iii. Assess adequacy of (σ*, ξ*, λ*) on the basis of statistical coherence and physical feasibility. If adequate, stop; otherwise, ask the expert to make adjustments and then repeat the process. The adjustments can be guided by comparing the expert's assessments with the fitted values to identify large differences.

[28] Clearly, even for an expert, assessment of the magnitude of rare events is a difficult task. In order to assist and guide the elicitation process, it is helpful to make reference to the expert return level graph (log N_j, q̃_{N_j}), j = 1, …, J. If the expert believes that the tail of the underlying distribution is light (e.g., exponential) and hence the shape parameter ξ is zero, then the "spacings" or differences between return levels q̃_{N_{j+1}} and q̃_{N_j} should be (approximately) equal provided the corresponding log-return period spacings are equal, i.e., log N_{j+1} − log N_j is constant in j, for example, as in the case with N_1 = 100, N_2 = 1000, and N_3 = 10,000. If the upper distributional tail is perceived by the expert to be heavy (with ξ > 0), the spacings of return levels should increase progressively along the return level graph. For a short-tailed distribution (i.e., with ξ < 0), the spacings should decrease. In this case, there is an upper bound on return levels. As a smooth curve needs to be fitted to match the expert's return levels as closely as possible, the expert should account for the "smoothness" criterion in formulating his or her return level estimates. Adherence to the above mentioned points facilitates statistical coherence of expert judgment elicitation. It is also recommended [see, e.g., O'Hagan et al., 2006, chap. 8] to follow the overfitting procedure in Step (ii) with a feedback stage in which the fitted return levels or additional statistics are reported to the expert to confirm their consistency with his or her beliefs.

[29] Once the optimal values (σ*, ξ*, λ*) are elicited from the expert's assessment of return levels, the identities in (5) are used to compute the corresponding values for the parameters (μ, σ̃, ξ), which (with σ̃ log-transformed) then constitute the mean θ_p of the prior distribution. The Fisher information, required in the prior covariance specification, can be obtained from (8) via

I_{(μ,log σ̃,ξ)} = J_2^T I_{(μ,σ̃,ξ)} J_2,    (10)

where the Jacobian matrix J_2 is a diagonal matrix with (1, σ̃, 1) on the diagonal.

[30] Given prior information, one can ask what data could have come from the prior. The size of such a prior-equivalent sample can be used as an "effective" sample size by which the inverse Fisher information should be scaled to arrive at the desired prior covariance matrix. The idea of data representation of priors can be found in the work of Sander Greenland in the context of epidemiology [see, e.g., Greenland, 2006, 2007], but his examples are simple cases with conjugate priors.

[31] We elicit the effective sample size, denoted m, from the expert's error bounds accompanying the return level estimates. The effective sample size is based on the comparison of these error ranges with the asymptotic standard errors obtained from the inverse Fisher information matrix for the expert's return level estimates. The details of its computation are given in the appendix "On Effective Sample Size Computation". The prior covariance matrix is then set to Σ_p = m^{−1} I_{(μ,log σ̃,ξ)}^{−1}, with the Fisher information evaluated at the elicited parameter values.

[32] Note that the use of a “prior” that is a product of densities for each parameter or a sequence of quantiles (as was done in Coles and Tawn [1996]) is not appropriate for extreme value applications (based on either a GEV or GP distribution) because, for any simple interpretable parametrization, the inverse Fisher information does not approach a diagonal matrix. The use of inverse Fisher information to summarize the dependence in parameter estimates means that there is equivariance to the parametrization that is used for the GEV or GP distribution.

2.3. Posterior Distribution: Inference, Prediction, and Implementation

[33] The likelihood is the model-based probability of the historical data given the model parameter θ. The prior is the expression of the initial belief about θ. These two quantities can be combined to produce the updated distribution for θ, known as the posterior distribution, after taking into account historical data. In our particular case, the prior distribution is trivariate normal with mean vector θ_p and covariance matrix Σ_p, whose density is denoted π(θ). Let f(x | θ) denote the density of the data vector X. By Bayes' theorem, the posterior density, denoted π(θ | x), is given by

π(θ | x) = π(θ) f(x | θ) / f(x),    (11)

where

f(x) = ∫ π(θ) f(x | θ) dθ.    (12)

[34] A predictive distribution describes the probability of future events conditional on the available data. Although it is typical to report return level curves based on the fitted distribution, or the Bayesian posterior distribution, the predictive distribution is a more reasonable alternative for decision making as it incorporates both the uncertainty of future observations and the model uncertainty, and it is easily obtained within a Bayesian framework. The density of the predictive distribution, denoted f(x̃ | x) for a future observation x̃, is

f(x̃ | x) = ∫ f(x̃ | θ) π(θ | x) dθ.    (13)

[35] The posterior distribution in (11) gives a complete distribution for Bayesian inference. The mean of the posterior distribution is usually used as the Bayesian point estimate for the parameter vector, whereas quantiles of the posterior distribution can be used to summarize precision of the estimates. So, Bayesian inference boils down to the ability to sample from the posterior distribution in (11), which in turn requires computation of the multidimensional integral in (12). Unfortunately, for our choices of the likelihood model and the prior distribution, the explicit form of the posterior distribution is intractable and hence direct sampling from (11) is not possible here. A standard technique widely used in such cases is the Markov chain Monte Carlo (MCMC) simulation, which can be applied to give an approximation to a sample from a posterior distribution [see e.g., Gilks et al., 1996].
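The paper's computations use the evdbayes library in R; as a language-agnostic illustration of the MCMC idea, the following Python sketch implements a random-walk Metropolis sampler for a generic log-posterior known only up to the normalizing constant in (12). The toy target and tuning constants are assumptions for illustration.

```python
import numpy as np

def metropolis(log_post, theta0, step, n_iter, seed=0):
    """Random-walk Metropolis: propose theta' = theta + step * N(0, I) noise
    and accept with probability min(1, exp(log_post(theta') - log_post(theta)));
    only the unnormalized log-posterior is needed."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_iter, len(theta)))
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal(len(theta))
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject step
            theta, lp = prop, lp_prop
        chain[t] = theta
    return chain

# Toy check: sample a standard normal "posterior" and discard a burn-in.
chain = metropolis(lambda th: -0.5 * np.sum(th ** 2), [0.0], step=1.0, n_iter=20_000)
print(chain[2000:].mean(), chain[2000:].std())
```

After the burn-in, the sample mean and standard deviation should be close to 0 and 1, mirroring the burn-in and convergence checks applied to the real chains in section 3.3.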

3. Analysis and Results

3.1. The Data and Likelihood Inference

[36] The debris flow volume measurements at Capricorn Creek available for our analysis are presented in Table 1 and are also plotted in Figure 2; the data are taken from Jakob [2012]. These values were scaled by a factor of 10^4 for convenience of presentation. In total, there are 13 observations of events exceeding the threshold, spanning a period of 170 years from 1841 to 2010. Note that the two largest measurements of 3000 (×10^4) m³, in 1903 and 2010, greatly exceed the other values and hence suggest a heavy right-tailed distribution as a possible model.

Table 1. Summary of the Debris Flow Data at Capricorn Creek^a

^a The measurements are in cubic meters and are scaled by a factor of 10^4.

Figure 2.

Debris flow data at Capricorn Creek. The volume measurements (in m³) are scaled by a factor of 10^4.

[37] The Poisson process model requires a threshold above which debris flow volumes are described by a GP distribution with Poisson occurrence times. Given a very small sample size, the usual diagnostic plots (like the mean excess plot) are not meaningful in guiding the threshold choice. There is, however, a detection threshold below which debris flow events would not be recorded. For Capricorn Creek, this detection threshold is in the range 30,000–50,000 m³ according to the expert, and hence for our analysis we take the threshold as u = 3 (×10^4) m³.

[38] Figure 3 displays two diagnostic plots to assess the assumption of Poisson occurrence times of threshold exceedances, which in this case coincide with the times of debris flow events. The left plot shows a quantile-quantile plot of the empirical versus theoretical exponential quantiles of the interarrival times (computed as the difference between the years of debris flow events in Table 1) with the corresponding correlogram in the right plot to check independence [Smith and Shively, 1995]. Both plots present no evidence against the Poisson assumption. Furthermore, the return level plot in Figure 4 with approximate 95% confidence bounds can be used to assess adequacy of the GP distribution in modeling threshold excesses of debris flow volumes.
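The exponential quantile-quantile check on the interarrival times can be sketched as follows; the event years below are hypothetical stand-ins, since Table 1's values are not reproduced here.

```python
import numpy as np

def exp_qq_points(event_years):
    """Points for a Q-Q plot of interarrival times against exponential
    quantiles, as implied by the Poisson occurrence assumption."""
    gaps = np.sort(np.diff(np.sort(np.asarray(event_years, dtype=float))))
    n = len(gaps)
    p = np.arange(1, n + 1) / (n + 1)        # plotting positions
    theo = -np.mean(gaps) * np.log(1 - p)    # rate estimated by 1/mean(gap)
    return theo, gaps

# Hypothetical event years, for illustration only (not the Table 1 record).
years = [1850, 1863, 1881, 1898, 1910, 1931, 1944, 1958, 1972, 1990, 2003]
theo, emp = exp_qq_points(years)
print(np.round(theo, 1))
print(emp)
```

Points falling close to the diagonal when `emp` is plotted against `theo` support exponentially distributed interarrival times, and hence the Poisson assumption.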

Figure 3.

Diagnostic plots for the Poisson debris flow occurrence times: (left) quantile-quantile plot against theoretical exponential quantiles and (right) a correlogram for the interarrival times of the debris flow events.

Figure 4.

Comparison of return level curves based on the predictive distribution, posterior distribution, maximum likelihood estimation (MLE), and empirical estimates. C.I. stands for credible or confidence intervals, for the posterior or under MLE, respectively. The latter are approximate confidence intervals based on asymptotic normality. Measurement units for return levels are (×10^4) m³, and return periods are given in years. (top and bottom) Return level curves on two ranges of return periods for clearer display.

3.2. Prior Elicitation

[39] Table 2 summarizes return level estimates and the associated error bounds provided by the expert for four return periods. The expert assessment was based on geoscientific knowledge of the debris-flow processes at the location for our case study. This involved several considerations that constrain debris-flow volume estimates for the time frame considered. First, the fact that the debris flows originate from a Quaternary volcano with eruptions as late as approximately 2350 years ago [Clague et al., 1995] implies an abundance of erodible surface materials and unstable fractured and hydrothermally altered volcanic rocks [Read, 1990] overlying heavily jointed porphyritic rhyodacite basement rocks. Both weak surface materials and weak bedrock would result in a high frequency of debris flows and the possibility of very large events due to partial collapses of bedrock edifices onto deep fines-rich talus and hydrothermally altered rocks. Second, a strong Little Ice Age glacial advance had oversteepened the valley sides, which gives rise to a cycle of renewed debris-flow activity following glacial retreat [Holm et al., 2004]. Third, the expert accounted for debris-flow initiation mechanisms at the Capricorn Basin, as specified in the last column of Table 2.

Table 2. Information on Return Level Volume Estimates as Provided by the Expert^a

Return Period | Lower Bound | Upper Bound | Average | Trigger Mechanism
100           |      50     |     150     |    100  | Gully sidewall failures, failure of debuttressed volcanoclastics
500           |     500     |   3,000     |  1,750  | Failure of debuttressed volcanoclastics
1,000         |   1,000     |  10,000     |  5,500  | Edifice collapse
10,000        |  10,000     |  50,000     | 30,000  | Edifice collapse, volcanic eruption

^a Values are in cubic meters and are scaled by a factor of 10^4.

[40] A solution to the optimization problem in (9) was found using the method of Byrd et al. [1995], which allows for box constraints. The method is implemented in the R function optim. The expert believes that the (annual) rate parameter λ lies between 1/30 and 1/10, which we use as constraint boundaries λ_L and λ_U for λ in (9). We hence obtain the model parameters (σ*, ξ*, λ*) derived from the expert's judgment, where the first two are the parameters of the GP distribution for the volume excesses (in ×10^4 m³) of debris flow events and the last parameter is the Poisson rate for the debris flow occurrences. The large positive value of the shape parameter ξ indicates that the prior or expert belief is that of a very heavy-tailed distribution for debris flow volumes. The value of the rate parameter λ lies on the boundary of the range provided by the expert. A comparison of the fitted return levels and the original expert estimates is shown in Figure 5. The fitted values lie within the error bounds, marked by vertical dotted lines. So, in these respects the judgment elicitation is successful in producing model parameter estimates which are statistically coherent (a solution to the optimization problem in (9) could be found), physically realistic (the shape parameter corresponds to a heavy-tailed distribution), and consistent with the available information.

Figure 5.

Expert estimates of four N-year return levels (diamonds) and the corresponding fitted values (circles) based on the least squares minimization of relative differences in (9). The expert error bounds are indicated by the dotted vertical lines.

[41] As motivated in the previous section, it is reasonable to place a multivariate normal prior on the model parameter vector θ = (μ, log σ̃, ξ). From the identities in (5) with the expert-based values for σ, ξ, and λ computed above, the prior mean vector θ_p is obtained.

[42] To determine the effective sample size, to be used in setting the prior covariance matrix Σ_p, we first compute the Fisher information matrices for two subsets of three return levels specified by the expert. Taking the square root of the diagonal elements of their inverses gives the asymptotic standard errors associated with the four N-year return levels: 429, 5990, 16,800, and 417,000, respectively, for N equal to 100, 500, 1000, and 10,000. Naturally, the longer the return period, the larger the error.

[43] We next compare these model-based asymptotic errors with those provided by the expert. With the interpretation of the expert error bounds as corresponding to a ± one standard error interval (cf. the appendix "On Effective Sample Size Computation"), we divide the width of this interval by 2 to obtain a rough measure of the standard error for the expert assessments. Based on the lower and upper bounds in Table 2, the resulting errors are 50, 1250, 4500, and 20,000 for the 100, 500, 1000, and 10,000 year return periods, respectively. The squared ratios of the model-based to the expert errors for each of the four return levels are 74, 23, 14, and 435. These determine the appropriate sample size factors such that the prior 68% confidence intervals approximately match the error bounds provided by the expert. Conservatively, we take the smallest ratio and use the effective sample size of 14 in the prior specification.
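The arithmetic behind the effective sample size can be reproduced directly from the numbers quoted above and the bounds in Table 2; the Python sketch below assumes, as in the text, that the expert's bounds represent a ± one standard error interval.

```python
import numpy as np

# Asymptotic standard errors from the inverse Fisher information for the
# expert's four return levels (100, 500, 1000, and 10,000 years) ...
se_model = np.array([429.0, 5990.0, 16_800.0, 417_000.0])
# ... and expert errors: half the width of the bounds in Table 2.
lo = np.array([50.0, 500.0, 1_000.0, 10_000.0])
hi = np.array([150.0, 3_000.0, 10_000.0, 50_000.0])
se_expert = (hi - lo) / 2.0

# Squared ratios give the candidate effective sample sizes; the smallest
# (most conservative, i.e., most diffuse prior) is used.
ratios = (se_model / se_expert) ** 2
m_eff = int(round(ratios.min()))
print(np.round(ratios), m_eff)
```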

[44] Finally, we sequentially evaluate the Fisher information matrices in (3), (8), and (10) at the parameter estimates elicited from the expert's opinion. Inverting the last information matrix and scaling it by the effective sample size gives the prior covariance matrix Σ_p = (1/14) I_{(μ,log σ̃,ξ)}^{−1}(θ_p).

3.3. Posterior Analysis

[45] We sampled 10,000 values from the posterior distribution of the random vector inline image using the R package evdbayes. Based on the behavior of the individual chains for each of the parameters (see Figure 6), a burn-in period of 100 iterations is used for the subsequent posterior analysis. Convergence of the Markov chains was verified by varying the starting points and comparing the output of different simulation runs. A formal procedure, described in Gilks et al. [1996, section 8.4] and based on the comparison of between-sequence and within-sequence variability for multiple chains of parameter values, gave an estimated potential scale reduction of approximately 1.01 for each of the three parameters, signifying chain convergence.
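The convergence diagnostic mentioned above is the Gelman-Rubin potential scale reduction factor. A minimal sketch of its computation, assuming several chains for a single parameter stored as rows of an array (illustrative code, not the evdbayes implementation):

```python
import numpy as np

def potential_scale_reduction(chains):
    """Gelman-Rubin diagnostic for m parallel chains of length n (one parameter)."""
    chains = np.asarray(chains)               # shape (m, n)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-sequence variability
    W = chains.var(axis=1, ddof=1).mean()     # within-sequence variability
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)

# four well-mixed chains should give a value close to 1
rng = np.random.default_rng(0)
chains = rng.normal(size=(4, 1000))
print(potential_scale_reduction(chains))
```

Values close to 1 (such as the 1.01 reported above) indicate that the between-chain and within-chain variability agree, i.e., the chains have mixed.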

Figure 6.

Realizations of Markov chains for the Poisson process model parameters in the posterior analysis. Prior specification is based on the effective sample size of 14. The burn-in period of 100 iterations used in the posterior analysis is indicated by the vertical lines.

[46] The flexibility of the Bayesian analysis makes it possible to estimate marginal posterior densities for each model parameter as well as for functionals such as return levels. These are displayed in Figure 7 along with the underlying prior densities. As expected, the posterior densities are less spread out than the prior ones. They suggest more conservative estimates of the upper tail in terms of the shape parameter and, consequently, the return levels.

Figure 7.

Prior and posterior kernel density estimates for the parameters of the Poisson process model and the log-base 10 transform of the 1000-year return level.

[47] Summary statistics, including posterior means and 95% credible intervals, for the model parameters and return levels are shown in Table 3 under inline image. For ease of comparison, the return periods reported are the same as those provided by the expert. The use of the log-base 10 scale for return levels is motivated by precision considerations as well as by the high degree of skewness in the sampling distribution of return levels. The logarithmic transformation approximately symmetrizes the density, so that the mean roughly coincides with the mode. Comparison of the prior and posterior estimates confirms an upward adjustment after incorporation of the data.
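The symmetrizing effect of the logarithmic transformation is easy to illustrate on synthetic skewed data (a hypothetical lognormal sample standing in for return level draws, purely for illustration):

```python
import numpy as np
from scipy.stats import skew

# hypothetical heavily skewed "return level" sample (lognormal, illustration only)
rng = np.random.default_rng(0)
q = rng.lognormal(mean=8.0, sigma=1.5, size=100_000)

print(skew(q))            # strongly positive: highly skewed on the original scale
print(skew(np.log10(q)))  # near zero: approximately symmetric after log10
```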

Table 3. Prior and Posterior Means and 95% Credible Intervals for Shape Parameter ξ and Log-Base 10 Transformed N-Year Return Levels (qN) for Different Values of the Effective Sample Size Used in the Prior Specification
  inline image | inline image | inline image
  Prior | Posterior | Prior | Posterior | Prior | Posterior
ξ | 1.17 (0.05; 2.32) | 1.48 (0.84; 2.36) | 1.17 (−0.16; 2.53) | 1.51 (0.83; 2.56) | 1.17 (0.18; 2.18) | 1.48 (0.82; 2.32)
q100 | 2.14 (1.44; 2.90) | 2.27 (1.84; 2.81) | 2.17 (1.37; 3.04) | 2.32 (1.85; 2.92) | 2.13 (1.50; 2.80) | 2.24 (1.83; 2.73)
q500 | 3.06 (1.93; 4.43) | 3.34 (2.62; 4.36) | 3.08 (1.81; 4.71) | 3.41 (2.64; 4.62) | 3.04 (2.01; 4.26) | 3.32 (2.60; 4.30)
q1000 | 3.42 (2.06; 5.11) | 3.79 (2.91; 5.07) | 3.45 (1.92; 5.44) | 3.86 (2.93; 5.39) | 3.41 (2.15; 4.90) | 3.77 (2.88; 4.98)
q10000 | 4.61 (2.34; 7.38) | 5.27 (3.80; 7.42) | 4.65 (2.16; 7.92) | 5.38 (3.82; 7.95) | 4.59 (2.49; 7.04) | 5.26 (3.76; 7.30)

[48] Figure 4 also shows return level curves based on the Bayesian posterior and predictive analyses. The top and bottom panels focus on regions with shorter and longer return periods, respectively. The top panel compares the posterior return level curve with that under maximum likelihood (ML) estimation. The funnel of the posterior 95% confidence region is well contained in that obtained under ML estimation (using symmetric confidence intervals based on the normal approximation). Empirical estimates are added to check the consistency of the estimates with the available data. The bottom panel also displays the predictive return level curve, a more natural candidate for use in, say, the design of mitigation structures, as it incorporates future data uncertainty. The difference between posterior and predictive estimates becomes particularly noticeable at higher return periods. As both panels illustrate, the prior has the clear effect of more controlled growth in the posterior return level estimates, in contrast to the more explosive behavior of the ML curve. A comparison of the posterior estimates with other available estimation procedures is given in the next section.

4. Discussion

4.1. Comparison With Other Estimation Procedures

[49] Several competing procedures exist for estimating parameters of the GP distribution, which is used to model the marks in the Poisson process model. Hosking and Wallis [1987] introduced estimators based on the method of moments (MOM) and probability-weighted moments (PWM), where the general idea is to equate sample moments with their model-based counterparts. These methods implicitly restrict the values of the shape parameter ξ ( inline image under MOM and inline image under PWM), thus imposing a priori the condition of only moderately heavy tails. In view of the possibly heavier-tailed distributions suitable for some applications, Diebolt et al. [2007] suggested a generalized probability-weighted moments (GPWM) method.
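As an illustration of the implicit MOM restriction, the GP moment identities mean = σ/(1 − ξ) and variance = σ²/((1 − ξ)²(1 − 2ξ)) can be inverted to give the estimators; the sketch below (our own illustrative code, not the Hosking and Wallis implementation) shows that the resulting shape estimate can never exceed 1/2:

```python
import numpy as np

def gp_mom(x):
    """Method-of-moments estimates (sigma, xi) of the GP distribution, from
    mean = sigma/(1 - xi) and var = sigma^2/((1 - xi)^2 (1 - 2 xi)).
    Note xi_hat = (1 - mean^2/var)/2 < 1/2 by construction: the implicit restriction."""
    xbar, s2 = np.mean(x), np.var(x, ddof=1)
    xi = 0.5 * (1.0 - xbar ** 2 / s2)
    sigma = xbar * (1.0 - xi)
    return sigma, xi

# check on simulated GP data (sigma = 1, xi = 0.1) via the GP quantile function
rng = np.random.default_rng(0)
u = rng.uniform(size=200_000)
x = ((1.0 - u) ** (-0.1) - 1.0) / 0.1
sigma_hat, xi_hat = gp_mom(x)
```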

[50] While moments-based methods have the advantage of computational simplicity, they are not easily extendable to more complex modeling situations. Here, likelihood-based methods have a clear advantage [see e.g., Smith, 1989; Davison and Smith, 1990; Coles, 2001]. On the other hand, the performance of the maximum likelihood estimates may be poor when the sample size is small [Madsen et al., 1997; Martins and Stedinger, 2001]. To address small sample behavior of the maximum likelihood estimates, several variants of likelihood-based methods have been proposed.

[51] Coles and Dixon [1999] use a penalty function to restrict the shape parameter to values below one, consistent with the implicit restriction of the PWM estimator. They show that the resulting penalized maximum likelihood (PML) estimates outperform the PWM estimates (on the basis of mean squared error) for the parameters of the generalized extreme value distribution. The methodology is easily extendable to the GP distribution, and for the purpose of comparison we also report the PML estimates.

[52] In a similar spirit, Martins and Stedinger [2001] suggest generalized maximum likelihood estimators by including a prior distribution on the possible values of the shape parameter ξ. In particular, as the prior for ξ, they use a beta distribution with support on [–0.5, 0.5] and a certain choice of parameters, which they claim represents the worldwide experience for phenomena such as rainfall depths and flood flows. However, such information is not available for debris flows, and hence the use of their prior specification is not warranted in the present context.

[53] Table 4 shows estimates of the shape parameter and return levels under various estimation methods. The prior and posterior return level point estimates are based on the mean of log-base 10 transformed return levels (cf. Table 3) and then converted to the original scale. Profile likelihood is used to approximate confidence intervals for maximum likelihood estimates except in the case of the 1000 year return level, for which we report only the asymptotic standard error as the profile likelihood surface is too flat to produce a reasonable confidence interval. For all the other methods, 95% confidence intervals were obtained using a parametric bootstrap [see Efron and Tibshirani, 1993; Davison and Hinkley, 1997], based on 999 samples generated from the extreme value Poisson process model. (Note that in the case of moments-based estimates, asymptotic standard errors cannot be computed since the estimates of the shape parameter fall outside the applicable ranges.) The basic bootstrap confidence limits were computed after applying a logarithmic transformation to normalize the estimates (the only exception was the 100 year return level under PML, where a piecewise linear transformation was more appropriate).
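A minimal sketch of the basic parametric bootstrap interval with a logarithmic symmetrizing transformation, as used for the non-ML methods above (illustrative code with synthetic bootstrap replicates; the actual analysis resamples from the fitted Poisson process model):

```python
import numpy as np

def basic_bootstrap_ci(estimate, boot_estimates, alpha=0.05):
    """Basic (reflected) bootstrap interval computed on the log scale,
    then back-transformed to the original scale."""
    t0 = np.log(estimate)
    t = np.log(boot_estimates)
    lo_q, hi_q = np.quantile(t, [alpha / 2, 1 - alpha / 2])
    # basic bootstrap reflects the quantiles around the point estimate
    return np.exp(2 * t0 - hi_q), np.exp(2 * t0 - lo_q)

# synthetic bootstrap replicates of a hypothetical 100 year return level
rng = np.random.default_rng(1)
boot = rng.lognormal(mean=np.log(300.0), sigma=0.5, size=999)
lo, hi = basic_bootstrap_ci(300.0, boot)
```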

Table 4. Shape Parameter (ξ) and N-Year Return Level Estimates (qN) for N = 100 and N = 1000 Under Different Estimation Proceduresa
  | ξ | q100 | q1000
  a. The fully Bayesian (FB) approach with informative prior based on expert judgment (with inline image), method of moments (MOM), method of probability-weighted moments (PWM), method of generalized PWM (GPWM), penalized maximum likelihood (PML), and maximum likelihood (ML). Except for the FB method, the return level point estimates are obtained as a function of the parameter estimates. In the FB case, the posterior mean on the log-base 10 scale, back-transformed to the original scale, and 95% credible intervals are reported. Approximate 95% confidence intervals for ML estimates are obtained via profile likelihood (for q1000, only the standard error is reported, marked with a dagger); in the other cases, a basic parametric bootstrap is used with a suitable symmetrizing transformation. The starred estimates are adjusted to plausible values. Credible intervals and bootstrap-based confidence intervals are rounded to three significant digits.

Prior | 1.17 (0.05; 2.32) | 140 (28; 786) | 2,640 (114; 129,000)
FB | 1.48 (0.84; 2.36) | 185 (69; 649) | 6,160 (812; 118,000)
ML | 1.84 (0.83; 4.12) | 443 (131; 17,814) | 31,388 (85,797†)
MOM | 0.40 (0.40; 0.50*) | 932 (411; 3,100) | 3,460 (1,700; 22,900)
PWM | 0.86 (0.75; 1.00*) | 374 (174; 1,890) | 3,210 (1,560; 41,800)
GPWM | 1.54 (1.19; 2.10) | 609 (26; 4,610) | 21,800 (532; 1,440,000)
PML | 0.67 (0.43; 1.0*) | 166 (0*; 750) | 963 (514; 45,700)

[54] There is a large discrepancy among the estimation procedures considered, with the small sample size clearly one of the contributing factors. The shape parameter estimates reflect the premises underlying each estimation procedure: MOM, PWM, and PML naturally produce lower values. When the range of possible ξ values is not bounded by one, estimates such as those under GPWM, ML, and the fully Bayesian approach with an informative prior elicited from expert opinion indicate a heavier tail, with ξ possibly greater than one.

[55] The precision of return level estimates under the MOM, PWM, and ML methods is consistent with previous studies. In particular, Madsen et al. [1997] concluded that MOM outperforms both PWM and ML estimates in small samples for inline image (on the basis of precision and mean squared error). For the 100 year return level, the fully Bayesian approach has the narrowest 95% confidence interval, whereas for longer return periods the posterior credible intervals reflect greater tail uncertainty and become considerably wider relative to the more tightly constrained methods such as MOM, PWM, and PML. However, as originally anticipated, there is a dramatic improvement in precision in comparison to the maximum likelihood estimates.

[56] As pointed out by Coles and Dixon [1999], the implicit restriction to finite population moments in the moments-based estimation procedures can be interpreted as providing additional information in the inference. The better small-sample performance of the moments-based estimators relative to unrestricted likelihood-based estimators can then be attributed to this information. After incorporating similar information into the likelihood formulation, the likelihood-based methods show similar and even superior performance [see Coles and Dixon, 1999; Martins and Stedinger, 2001]. Note that in these comparative studies, as well as in Madsen et al. [1997], the shape parameter is restricted to the range inline image or inline image. In Appendix B, we investigate the performance of the various methods when the underlying distribution is GP but with higher values of the shape parameter. The results show that, under such assumptions, MOM, PWM, and PML underestimate the shape parameter and consequently the return levels for long return periods.

[57] The question is to what extent the restrictions on the shape parameter in the aforementioned methods are meaningful for the debris flow series we consider. Judging from the data and the nature of the phenomenon, the likelihood of very extreme events is not negligible, which indicates a rather heavy upper tail. In fact, one of the reviewers even suggested checking the plausibility of superheavy tails. (The analysis of log-flows did not, however, support a log-Pareto-like tail.) From these considerations, it seems premature to assume a priori only mildly heavy tails.

[58] The proposed approach is methodologically similar to the penalized ML estimation of Coles and Dixon [1999] and generalized ML estimation of Martins and Stedinger [2001] in that the methods account for the prior information on the shape parameter. However, our choice of a (normal) prior is less restrictive as it does not truncate the domain of possible values. While tighter constraints on the model parameters improve precision, they may lead to undesirable biases if not properly justified.

4.2. Sensitivity Analyses

4.2.1. Prior Specification

[59] As a sensitivity analysis of the prior specification, we investigate the impact of changes in the effective sample size on the posterior estimates of the shape parameter and return levels. The effective sample size is determined by the expert “confidence intervals” for a selection of return levels. It is chosen in such a way that the prior confidence intervals for return levels approximately match those specified by the expert. The wider the interval, the less certain the expert is about the “true” value of a given return level, which is reflected in lower values of the effective sample size. Alternatively, the effective sample size can be viewed as a tuning parameter that adjusts our confidence in the expert estimates and hence assigns more or less weight to the data. In Table 3, we summarize prior and posterior means with 95% credible intervals for the shape parameter ξ and log-base 10 transformed N-year return levels for inline image. As expected, lower values of the effective sample size result in wider confidence intervals. In addition, for this time series, giving more weight to the data leads to larger estimates of the shape parameter and consequently the return levels, since the expert assigned lower values to the return levels than the corresponding maximum likelihood estimates. While there are naturally variations due to changes in the effective sample size, the sensitivity of the point estimates is not large, especially for moderate return periods.

4.2.2. Data Uncertainty

[60] As described in section 1, debris flow volumes are typically not directly observed but need to be reconstructed, and consequently are prone to uncertainty. While we do not have information here on the exact structure of the measurement uncertainty, we perform a sensitivity analysis to assess how uncertainty in the sample points would affect maximum likelihood estimation.

[61] In this analysis, each sample point xi ( inline image), representing the “best” estimate of the debris flow volume, is replaced by an interval of the form

[xi(1 − ɛ), xi(1 + ɛ)],

where inline image denotes the relative error. For these interval-valued data, the likelihood is given by (cf. equation (7))

display math(14)

where inline image is the generalized extreme value distribution function with the notation inline image for the positive part.
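A sketch of how such an interval-censored likelihood can be maximized numerically, using scipy's GEV distribution as the distribution function (note that scipy parameterizes the shape as c = −ξ; the data and starting values below are illustrative, not the Capricorn Creek series):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

def interval_nll(params, x, eps):
    """Negative log-likelihood with each point contributing
    G(x_i(1 + eps)) - G(x_i(1 - eps)); scipy's GEV shape is c = -xi."""
    mu, log_sigma, xi = params
    dist = genextreme(c=-xi, loc=mu, scale=np.exp(log_sigma))
    p = dist.cdf(x * (1 + eps)) - dist.cdf(x * (1 - eps))
    if np.any(p <= 0):
        return np.inf
    return -np.sum(np.log(p))

x = np.array([3.0, 5.0, 8.0, 20.0, 150.0])   # illustrative volumes only
res = minimize(interval_nll, x0=[5.0, 1.0, 0.5], args=(x, 0.1),
               method="Nelder-Mead")
mu_hat, log_sigma_hat, xi_hat = res.x
```

The scale is optimized on the log scale to keep it positive; as ɛ shrinks, the interval likelihood approaches the usual density-based likelihood up to a constant.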

[62] Table 5 summarizes maximum likelihood estimation of the model parameters for a number of values of the relative error ɛ, up to 50% variation in the data points. A parametric bootstrap with 100,000 replicates for the model parameters, assuming a multivariate normal distribution determined by the maximum likelihood estimates, is used to construct 95% confidence intervals for the 100- and 1000 year return levels. The first row (ɛ = 0) corresponds to the standard ML procedure with the data in Table 1 and likelihood function (7). The results in the table show that the parameter most affected by increases in data uncertainty via increases in ɛ is the location η, whereas the shape parameter ξ remains fairly stable over the range of ɛ values considered. The return level estimates also show only minor sensitivity to larger relative errors in the data.

Table 5. Maximum Likelihood Estimates of the Poisson Process Model Parameters inline image for Interval-Valued Data of the Form inline image for Various Values of the Relative Error inline image, Where xi ( inline image) Are the Volume Estimates (in m3, Scaled by 104) as Given in Table 1a
ɛ | η | inline image | ξ | q100 | q1000
  a. Standard errors for the parameter estimates are given in parentheses. Estimates of the corresponding 100- and 1000 year return levels (q100 and q1000) with 95% confidence intervals are based on a parametric bootstrap, with the mean of the sampling distribution taken as the point estimate (slightly negative lower bounds were rounded to zero on physical grounds). Values are rounded to three significant digits.

0 | −7.53 (8.59) | −1.76 (2.24) | 1.84 (0.74) | 334 (0; 2,660) | 513,000 (0; 1,700,000)
0.02 | −7.15 (8.14) | −1.70 (2.16) | 1.83 (0.72) | 333 (0; 2,650) | 441,000 (0; 1,550,000)
0.05 | −7.14 (8.25) | −1.70 (2.18) | 1.83 (0.73) | 333 (0; 2,650) | 454,000 (0; 1,580,000)
0.10 | −7.07 (8.22) | −1.71 (2.18) | 1.83 (0.73) | 332 (0; 2,640) | 459,000 (0; 1,580,000)
0.20 | −6.81 (8.07) | −1.74 (2.20) | 1.83 (0.73) | 329 (0; 2,610) | 471,000 (0; 1,603,000)
0.30 | −6.34 (7.80) | −1.81 (2.23) | 1.84 (0.74) | 323 (0; 2,570) | 497,000 (0; 1,642,000)
0.40 | −5.65 (7.42) | −1.92 (2.28) | 1.86 (0.75) | 316 (0; 2,510) | 554,000 (0; 1,750,000)
0.50 | −4.64 (6.87) | −2.11 (2.37) | 1.89 (0.78) | 310 (0; 2,460) | 694,000 (0; 1,970,000)

5. Conclusions

[63] Development of mitigation structures for debris flows often relies on return level estimates associated with very long return periods, extending beyond the range of the available data. This is a typical problem in extreme value analysis. However, standard inference techniques like maximum likelihood fail for very small data sets, producing unacceptably wide confidence intervals that are uninformative for decision making, as well as positive biases in return level estimates. The use of implicitly or explicitly constrained estimation methods, such as the method of moments, the method of probability-weighted moments, or penalized maximum likelihood, leads to underestimation of return levels in the presence of heavy tails close to or beyond the assumed ranges.

[64] To address the above issues, we propose a methodology within a Bayesian framework that makes use of expert judgment as an additional source of information. The emphasis and main contribution of the paper are on the specification of an informative prior based on the expert's estimates of return levels for the underlying debris flow process. We have given a rationale for the prior choice as well as a way to elicit the prior model parameters, which include the prior covariance structure and the effective sample size. The choice of the effective sample size is a trade-off between the precision of return level estimates and how much weight one is willing to place on the expert judgment versus the historical data. When more data become available, there is less need to rely on expert estimates. On the other hand, when the historical sample size is small and the quality of the data is poor, more weight can be given to expert opinion, as the expert has knowledge of the physical process and constraints that cannot be directly incorporated in the analysis.

[65] The ultimate choice of inferential procedure should be determined not only by the available data resources (e.g., regional information, expert judgment) but also by the stochastic properties of the given time series, in particular the upper tail of the underlying distribution. The latter may be hard to assess and should be judged from both the data and physical knowledge.

[66] As in any application of extreme value methods, the output of such an analysis should be treated with caution. The analysis is confined to the usual paradigm of stationarity, as there are too few data points to incorporate possible nonstationarity effects. There is also a need to account for uncertainty in the data values themselves, which would require a better understanding of the nature of the estimation errors. At the moment, the proposed methodology should be viewed as a first step toward a more realistic risk assessment. With the information available to us, more informative estimates of return levels are obtained than those based on the historical data alone, without imposing tight constraints on the tail of the underlying distribution.


Acknowledgments

[67] We are grateful to three anonymous reviewers and the Associate Editor for detailed comments and constructive criticism, leading to a substantial improvement of the paper. We thank Matthias Jakob for bringing our attention to the problem discussed in the paper and for answering numerous questions pertaining to the phenomenon of debris flows, and Kris Holm for providing the expert judgment. Financial support from the Natural Sciences and Engineering Research Council of Canada is gratefully acknowledged.

Appendix A: On Effective Sample Size Computation

[68] In this appendix, we present details of the computation of the effective sample size used in specifying the prior covariance matrix.

[69] The asymptotic standard errors, denoted inline image ( inline image), for each of the k return levels assessed by the expert are obtained as the square roots of the diagonal entries of the inverse Fisher information matrix. First, consider a subset (q1,q2,q3) of Ni-year return levels (i = 1,2,3) from the full set inline image of return levels specified by the expert. Let inline image denote the Fisher information matrix for this subset of three return levels. inline image can be computed from (3) by applying the Jacobian matrix inline image for the transformation from inline image to (q1,q2,q3) via inline image. Each of the return levels (q1,q2,q3) satisfies identity (6) for the respective value of the return period Ni (i = 1, 2, 3). Implicitly differentiating (6), it follows that the ith column of inline image is the solution of the linear system inline image, where inline image is the ith standard basis vector in inline image, and A is a inline image matrix with elements

display math
display math
display math

for i = 1, 2, 3. This procedure is repeated for other subsets of three return levels until their union covers all k return levels specified by the expert.

[70] The standard errors for the expert return level estimates, denoted inline image ( inline image), are induced from the respective expert error bounds. According to Alpert and Raiffa [1982], people tend to underestimate their uncertainty. To mitigate the expert's overconfidence in his estimates, we interpret the expert error bounds as corresponding to ± one standard deviation, which is equivalent to a 68% confidence interval under the normal distribution. Thus, dividing the length of the expert confidence intervals by 2 gives a rough estimate of inline image for each inline image. inline image and inline image are related via the effective sample size for the ith return level: inline image, inline image. Hence, inline image. As a conservative value for the overall effective sample size inline image, we take the smallest inline image: inline image; the value is rounded to an integer for interpretability.

Appendix B: Performance of Estimation Methods When Sampling From a Heavy-Tailed GP Distribution

[71] In previous studies of the performance of various estimation procedures for the parameters of the GP distribution, the analysis was typically restricted to moderate values of the shape parameter such as inline image [Madsen et al., 1997; Martins and Stedinger, 2001] or inline image [Coles and Dixon, 1999]. The case study of debris flows at Capricorn Creek, with infrequent but extremely severe events (like those in 1903 and 2010), suggests the presence of a fairly heavy distributional upper tail, associated with a large positive value of the shape parameter. A value close to one would indicate an infinite-mean model, which cannot be ruled out a priori in the present context.

[72] In this section, we summarize a brief investigation into the performance of various estimation procedures when the underlying distribution is GP with a shape parameter value close to one. A simulation study was carried out with 5000 random samples of size n (for inline image) from a GP distribution with scale inline image and shape parameter inline image. The estimation procedures under consideration are MOM, PWM, GPWM, ML, and PML; see section 4.1.
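The flavor of this simulation can be reproduced at a reduced scale; the sketch below compares the bias of the MOM and ML shape estimators when sampling from a heavy-tailed GP (illustrative values ξ = 0.9, n = 50, 300 replicates, not the exact study design):

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(42)
xi_true, n, reps = 0.9, 50, 300

mom_xi, ml_xi = [], []
for _ in range(reps):
    x = genpareto.rvs(c=xi_true, scale=1.0, size=n, random_state=rng)
    # MOM shape estimate: bounded above by 1/2 by construction
    xbar, s2 = x.mean(), x.var(ddof=1)
    mom_xi.append(0.5 * (1.0 - xbar ** 2 / s2))
    # ML via scipy's built-in fit, location fixed at zero
    c_hat, _, _ = genpareto.fit(x, floc=0)
    ml_xi.append(c_hat)

print(np.mean(mom_xi) - xi_true)   # large negative bias: estimate cannot exceed 1/2
print(np.mean(ml_xi) - xi_true)    # bias much closer to zero on average
```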

[73] Figure B1 displays the sampling distributions of the shape parameter estimator under the five estimation procedures for inline image and two sample sizes. The restriction of the domain of shape parameter values under the MOM, PWM, and PML procedures is clearly visible, introducing a negative bias into the estimates. The bias is, however, accompanied by a reduction in variability.

Figure B1.

Kernel density estimates of the sampling distribution of the shape parameter estimator under various estimation methods (method of moments (MOM), method of probability-weighted moments (PWM), method of generalized PWM (GPWM), maximum likelihood (ML), and penalized maximum likelihood (PML)). The sampling distribution is based on 5000 samples of size n from the GP distribution with scale inline image and shape parameter inline image. The true value of ξ is indicated by vertical lines.

[74] Tables B1 and B2 give the bias and root-mean-squared error of the estimators of the model parameters (scale inline image and shape parameter ξ) and the 1000 year return level (q1000). Based on the variability of the output over several simulation runs, using the transformed quantities inline image and inline image gives results that are stable up to two decimal places (with the exception of the root-mean-squared errors under the ML method, which have lower precision).

Table B1. Bias of the Estimators of the GP Distribution Parameters ( inline image and ξ) and the Associated 1000-Year Return Level (q1000) Under Various Estimation Methods (Method of Moments (MOM), Method of Probability-Weighted Moments (PWM), Method of Generalized PWM (GPWM), Maximum Likelihood (ML), and Penalized Maximum Likelihood (PML))a
  a. The results are based on 5000 simulated samples of size n from a GP distribution with parameters inline image and inline image. The transformed quantities inline image and inline image are used for precision reporting purposes. Statistics for q1000 are given to show the magnitude of the bias on the original scale. Values are rounded to three significant digits.

  inline image
inline image | 20 | 0.32 | 0.
inline image | 20 | −0.83 | −0.53 | −0.10 | −0.86 | −0.20
  inline image
inline image | 20 | 0.64 | 0.
inline image | 20 | −1.39 | −1.10 | −0.23 | −1.52 | −0.24
Table B2. Root-Mean-Squared Error of the Estimators of the GP Distribution Parameters ( inline image and ξ) and the Associated 1000-Year Return Level (q1000) Under Various Estimation Methods (Method of Moments (MOM), Method of Probability-Weighted Moments (PWM), Method of Generalized PWM (GPWM), Maximum Likelihood (ML), and Penalized Maximum Likelihood (PML))a
  a. The results are based on 5000 simulated samples of size n from a GP distribution with parameters inline image and inline image. The transformed quantities inline image and inline image are used for precision reporting purposes.

  inline image
inline image | 20 | 0.41 | 0.
inline image | 20 | 0.94 | 0.73 | 0.92 | 0.95 | 0.93
  inline image
inline image | 20 | 0.79 | 0.
inline image | 20 | 1.52 | 1.21 | 1.08 | 1.57 | 1.24

[75] The results in Table B1 show a persistent negative bias in the return level estimates under MOM, PWM, and PML. The effect of shape parameter underestimation is counterbalanced by a positive bias in the scale estimates. On the other hand, GPWM and ML tend to perform better on the basis of the model parameters while grossly overestimating the return levels. This phenomenon was already noted by Coles and Dixon [1999] for the ML method. As Figure B1 indicates, the sampling distribution of the shape parameter estimator is widely spread. Since the return level is an exponentially increasing function of the shape parameter, overestimation of the shape parameter carries a much greater penalty (in terms of a positive bias), greatly inflating the mean return levels.

[76] Consistent with earlier studies, the ML method does not perform well in small or moderate samples. However, for very heavy-tailed data, one should beware of the negative bias in return level estimates when using traditional methods such as MOM and PWM, or methods with an explicit restriction on the domain of the shape parameter such as PML. This is especially important from a risk management perspective, where underestimation of risk can have devastating consequences.