A conditional approach for multivariate extreme values (with discussion)


Jonathan A. Tawn, Department of Mathematics and Statistics, Lancaster University, Lancaster, LA1 4YF, UK.
E-mail: j.tawn@lancaster.ac.uk


Summary.  Multivariate extreme value theory and methods concern the characterization, estimation and extrapolation of the joint tail of the distribution of a d-dimensional random variable. Existing approaches are based on limiting arguments in which all components of the variable become large at the same rate. This limit approach is inappropriate when the extreme values of all the variables are unlikely to occur together or when interest is in regions of the support of the joint distribution where only a subset of components is extreme. In practice this restricts existing methods to applications where d is typically 2 or 3. Under an assumption about the asymptotic form of the joint distribution of a d-dimensional random variable conditional on its having an extreme component, we develop an entirely new semiparametric approach which overcomes these existing restrictions and can be applied to problems of any dimension. We demonstrate the performance of our approach and its advantages over existing methods by using theoretical examples and simulation studies. The approach is used to analyse air pollution data and reveals complex extremal dependence behaviour that is consistent with scientific understanding of the process. We find that the dependence structure exhibits marked seasonality, with extremal dependence between some pollutants being significantly greater than the dependence at non-extreme levels.

1. Introduction and background

Multivariate extreme value theory and methods concern the characterization, estimation and extrapolation of the joint tails of multidimensional distributions. Accurate assessments of the probabilities of extreme events are sought in a diversity of applications from environmental impact assessment (Coles and Tawn, 1994; Joe, 1994; de Haan and de Ronde, 1998; Schlather and Tawn, 2003) to financial risk management (Embrechts et al., 1997; Longin, 2000; Stărică, 2000; Poon et al., 2004) and Internet traffic modelling (Maulik et al., 2002; Resnick and Rootzén, 2000). The application that is considered in this paper is environmental. We examine five-dimensional air quality monitoring data comprising a series of measurements of ground level ozone (O3), nitrogen dioxide (NO2), nitrogen oxide (NO), sulphur dioxide (SO2) and particulate matter (PM10), in Leeds city centre, UK, during the years 1994–1998 inclusively.

Regulation of air pollutants is undertaken because of their well-established deleterious effects on human health, vegetation and materials. Government objectives for concentrations of air pollutants are given in terms of single variables, rather than combinations of variables (Department of the Environment, Transport and the Regions, 2000). However, atmospheric chemists are increasingly aware of the importance of understanding the dependence between different air pollutants. Recent atmospheric chemistry research (Photochemical Oxidants Review Group, 1997; Colls, 2002; Housley and Richards, 2001) has highlighted issues concerning extremal dependence between air pollutants. In particular, the Photochemical Oxidants Review Group (1997) suggested that the dependence between O3 and some other atmospheric pollutants strengthens as the level of O3 increases. This is of concern since it is known that O3 has synergistic corrosive effects in combination with other sulphur- and nitrogen-based pollutants. The adverse health effects of particulate matter are also believed to be exacerbated by the excessive presence of other gaseous pollutants.

The gases are recorded in parts per billion, and the particulate matter in micrograms per cubic metre. The data are available from http://www.blackwellpublishing.com/rss. We compare data from winter (from November to February inclusively) and early summer (from April to July inclusively).

Fig. 1 shows the daily maxima of the hourly means of the O3 and NO2 variables for each of these seasons. The highest values of O3 are observed in the summer, as O3 is formed by a series of reactions that are driven by sunlight (Brimblecombe, 2001). The reactions involve hydrocarbons and NO2; large values of the latter occur with large O3 values, as shown in Fig. 1. This positive dependence between O3 and NO2 in summer is not observed during the winter when the sunlight is weaker. Dependence between the air pollution variables influences the combinations which can occur when any one of the pollutants is large. In Section 7 we estimate several functionals of the extreme values of the joint distribution of the air pollution variables. One such functional is the probability that these variables occur in an extreme set C ⊂ ℝ^d, an example of such a set being shown in the summer data plot of Fig. 1(a). The precise specification of this set is discussed in Section 7. Pairs of (O3, NO2) could occur in the set that is shown in Fig. 1 by being extreme in a single component, or by being simultaneously (but possibly less) extreme in both components.

Figure 1.

 Daily maxima of O3 and NO2 variables during (a) summer and (b) winter periods, 1994–1998 inclusively: the shaded set in (a) indicates an extreme set C which is split into two subsets C1 and C2

The air pollution problem is a typical example of multivariate extreme value problems, summarized as follows. Consider a continuous vector variable X = (X1,…,Xd) with unknown distribution function F(x). From a sample of n independent and identically distributed observations from F we wish to estimate functionals of the distribution of X when X is extreme in at least one component. The methods that are developed in this paper allow any such functional to be considered. However, to simplify the presentation we shall focus much of our discussion on estimating Pr(X ∈ C) where C is an extreme set such that for all x ∈ C at least one component of x is extreme. Typically no observations will have occurred in C. The structure of C motivates the following natural partition of C into d subsets C1,…,Cd. Here, Ci is that part of C for which Xi is the largest component of X, as measured by the quantiles of the marginal distributions. Specifically, for each i = 1,…,d, let FXi denote the marginal distribution of Xi; then

$$C_i = C \cap \{x \in \mathbb{R}^d : F_{X_i}(x_i) > F_{X_j}(x_j),\ j = 1,\dots,d,\ j \neq i\}.$$


We assume that subsets of C of the form C ∩ {x ∈ ℝ^d: FXi(xi) = FXj(xj) for some i ≠ j} can be ignored; these are null sets provided that on these subsets there are no singular components in the dependence structure of X. The partition of C into C1 and C2 for (O3, NO2) is shown in Fig. 1; the curved boundary between the sets is due to the inequality of the two marginal distributions.

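With the partition of C defined in this way, C is an extreme set if all xi-values in a non-empty Ci fall in the upper tail of FXi, i.e. if vXi = inf{xi: x ∈ Ci}, then FXi(vXi) is close to 1 for i = 1,…,d. So

$$\Pr(X \in C) = \sum_{i=1}^{d} \Pr(X \in C_i) = \sum_{i=1}^{d} \Pr(X \in C_i \mid X_i > v_{X_i})\, \Pr(X_i > v_{X_i}). \tag{1.1}$$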


Consider the estimation of Pr(X ∈ C) by using decomposition (1.1). We need to estimate Pr(Xi > vXi) and Pr(X ∈ Ci|Xi > vXi), the former requiring a marginal extreme value model and the latter additionally needing an extreme value model for the dependence structure. We focus on these two terms in turn.

Methods for marginal extremes are now relatively standard; see Davison and Smith (1990), Smith (1989) and Dekkers et al. (1989). Univariate extreme value theory provides an asymptotic justification for the generalized Pareto distribution to be an appropriate model for the distribution of excesses over a suitably chosen high threshold; see Pickands (1975). Thus, we model the marginal tail of Xi for i = 1,…,d by

$$\Pr(X_i > x + u_{X_i} \mid X_i > u_{X_i}) = (1 + \xi_i x / \beta_i)_+^{-1/\xi_i}, \qquad x > 0. \tag{1.2}$$


Here uXi is a high threshold for variable Xi, βi and ξi are scale and shape parameters respectively with βi > 0 and s+ = max(s,0) for any s ∈ ℝ. We require a model for the complete marginal distribution FXi of Xi for each i = 1,…,d, since to estimate Pr(X ∈ Ci|Xi > vXi) we need to describe all Xj-values that can occur with any large Xi. We adopt the semiparametric model F̂Xi for FXi of Coles and Tawn (1991, 1994), i.e.

$$\hat F_{X_i}(x) = \begin{cases} 1 - \{1 - \tilde F_{X_i}(u_{X_i})\}\,\{1 + \xi_i (x - u_{X_i})/\beta_i\}_+^{-1/\xi_i}, & x > u_{X_i}, \\ \tilde F_{X_i}(x), & x \le u_{X_i}, \end{cases} \tag{1.3}$$

where F̃Xi is the empirical distribution of the Xi-values. We denote the upper end point of the distribution by xFi, which is ∞ if ξi ≥ 0 and uXi − βi/ξi if ξi < 0. Model (1.3) provides the basis for estimating the Pr(Xi > vXi) term of decomposition (1.1).

Both the marginal and the dependence structures of X are needed to determine Pr(X ∈ Ci|Xi > vXi). We disentangle these two contributions and focus on the dependence modelling by working with margins that are assumed known for much of the following. We transform all the univariate marginal distributions to be of standard Gumbel form by using the probability integral transform, which for our marginal model (1.3) is

$$Y_i = -\log[-\log\{\hat F_{X_i}(X_i)\}] = t_i(X_i; \psi_i, \tilde F_{X_i}) = t_i(X_i), \qquad i = 1,\dots,d, \tag{1.4}$$

where ψi = (βi,ξi) are the marginal parameters. This transformation gives Pr(Yi ≤ y) = exp{−exp(−y)} for each i, so Pr(Yi > y) ∼ exp(−y) as y → ∞, and Yi has an exponential upper tail. To clarify which marginal variable we are using, we use X and Y throughout to denote the variable with its original marginal distributions and with Gumbel margins respectively.
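A minimal sketch of marginal model (1.3) and transformation (1.4) may help to fix ideas. It assumes that the threshold and the generalized Pareto parameters have already been chosen; the function names, and the use of n + 1 in the denominator of the empirical distribution (so that transformed sample values stay finite), are illustrative assumptions rather than part of the method as stated.

```python
import bisect
import math


def semiparametric_cdf(sample, u, beta, xi):
    """Return a function x -> F_hat(x): empirical below u, GPD tail above u."""
    xs = sorted(sample)
    n = len(xs)

    def empirical(x):
        # Proportion of observations <= x; n + 1 is an assumed convention.
        return bisect.bisect_right(xs, x) / (n + 1)

    p_u = empirical(u)

    def cdf(x):
        if x <= u:
            return empirical(x)
        if xi == 0.0:
            tail = math.exp(-(x - u) / beta)  # exponential limit of the GPD
        else:
            base = 1.0 + xi * (x - u) / beta
            if base <= 0.0:  # at or beyond the upper end point (xi < 0)
                return 1.0
            tail = base ** (-1.0 / xi)
        return 1.0 - (1.0 - p_u) * tail

    return cdf


def to_gumbel(p):
    """Map a probability 0 < p < 1 to the standard Gumbel scale."""
    return -math.log(-math.log(p))
```

Applying `to_gumbel(cdf(x))` to each observed value of one margin gives that margin on the Gumbel scale, which is the scale used in the remainder of the paper.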

We now focus on extremal dependence modelling of variables with Gumbel marginal distributions. Modelling dependence for extreme values is more complex than modelling univariate extreme values and, despite there already being various proposals, the methodologies are still evolving. When interest is in the upper extremes of each component of Y, the dependence structures fall into two categories: asymptotically dependent and asymptotically independent. Variable Y−i is termed asymptotically dependent on and asymptotically independent of variable Yi when the limit

$$\lim_{y \to \infty} \Pr(Y_{-i} > \mathbf{y} \mid Y_i > y)$$

is non-zero and zero respectively. Here Y−i denotes the vector Y excluding component Yi and y a vector of y-values. All the existing methods for multivariate extreme values (outlined in Section 2) are appropriate for estimating Pr(X ∈ C) under asymptotic dependence of the associated Y, or for asymptotically independent variables provided that all x ∈ C are large in all components.

Fig. 2 shows the winter air pollution data transformed, by using transformations (1.4), to have identical Gumbel marginal distributions. It is clear from Fig. 2 that the extremal dependence between the NO variable and each of the other variables varies from pair to pair, with asymptotic dependence a feasible assumption only for (NO, NO2) and (NO, PM10). Thus the range of sets for which existing methods can be used to estimate Pr(X ∈ C) is restricted.

Figure 2.

 Winter air pollution data transformed to have Gumbel margins by using transformations (1.4)

We present an approach to multivariate extreme values that constitutes a change of direction from previous extreme value methods. Our modelling strategy is based on an assumption about the asymptotic form of the conditional distribution of the variable given that it has an extreme component, i.e. the distribution of Y−i|Yi = yi as yi becomes large. This conditional approach provides a natural extension of the univariate conditional generalized Pareto distribution model (1.2) to the multivariate case as Pr(X ∈ Ci|Xi > vXi) can be expressed as

$$\Pr(X \in C_i \mid X_i > v_{X_i}) = \int_{v_{X_i}}^{x^{F_i}} \Pr(X \in C_i \mid X_i = x)\, \frac{dF_{X_i}(x)}{\Pr(X_i > v_{X_i})},$$

where the integrand is evaluated by using the distribution of Y−i|Yi = yi after marginal transformation. When vXi > uXi the derivative of FXi(x)/Pr(Xi > vXi) is the generalized Pareto density function with scale and shape parameters βi + ξi(vXi − uXi) and ξi respectively.
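The generalized Pareto scale quoted above follows from the threshold stability of model (1.2). As a short check, written with u, v, β and ξ standing for uXi, vXi, βi and ξi, and assuming v > u and x > 0:

```latex
\Pr(X_i > v + x \mid X_i > v)
  = \frac{\{1 + \xi (v + x - u)/\beta\}_+^{-1/\xi}}{\{1 + \xi (v - u)/\beta\}_+^{-1/\xi}}
  = \left\{ \frac{\beta + \xi (v - u) + \xi x}{\beta + \xi (v - u)} \right\}_+^{-1/\xi}
  = \left\{ 1 + \frac{\xi x}{\beta + \xi (v - u)} \right\}_+^{-1/\xi}.
```

So excesses of the higher threshold v are again generalized Pareto, with the same shape ξ and with scale β + ξ(v − u).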

Our conditional approach applies whether the variables are asymptotically dependent or asymptotically independent; it can be used to estimate Pr(X ∈ C) for any extreme set C, and it is applicable in any number of dimensions. The model that we use for the conditional distribution is motivated by an asymptotic distributional assumption and is supported by a range of theoretical examples. The model is semiparametric; parametric regression is used to estimate the location and scale parameters of the marginals of the joint conditional distribution and nonparametric methods are used to estimate the multivariate residual structure. Though our approach lacks a complete asymptotic characterization of the probabilistic structure, such as those which underpin existing extreme value methods, we show that strong mathematical and practical advantages are given by our approach in comparison with existing multivariate extreme value methods.

Existing methods are presented in Section 2. In Section 3 we state the new asymptotic assumption on which our conditional model is based, present some theoretical examples and draw links between the proposed and current methods. The examples motivate the modelling strategy that is introduced in Section 4. In Section 5 inference for the model is discussed. The methods are compared by using simulated data in Section 6. In Section 7 we illustrate the application of the techniques by analysing the extreme values of the air pollution data. Finally, in Section 8 we give the detailed working for the theoretical examples that are presented in Section 3.

2. Existing methods

We present a brief overview of the current methods for variables with Gumbel marginal distributions only. The extension to variables with arbitrary marginal distributions is obtained by incorporating marginal transformation (1.4).

Many multivariate extreme value analyses are based on models which assume implicitly that in some joint tail region each component of Y is either independent of or asymptotically dependent on the other components. Approaches which rely on these assumptions include the models for the multivariate extreme value distribution to describe componentwise maxima of Tawn (1988, 1990), Joe (1994), Capéraà et al. (1997) and Hall and Tajvidi (2000) and the multivariate threshold methods of Coles and Tawn (1991, 1994), Joe et al. (1992), de Haan and Resnick (1993), Sinha (1997), de Haan and de Ronde (1998), Draisma (2000) and Stărică (2000). Ledford and Tawn (1996, 1997, 1998) showed that these multivariate threshold methods are inappropriate for extrapolation of a variable Y with components that are dependent but asymptotically independent, when estimation is carried out by using a single selected threshold. Ledford and Tawn (1996, 1997) proposed a bivariate threshold model to overcome this limitation, which has been explored and developed by Bortot and Tawn (1998), Peng (1999), Coles et al. (1999), Bortot et al. (2000), Heffernan (2000), Draisma et al. (2003) and Ledford and Tawn (2003).

Behind all these existing approaches is the assumption of multivariate regular variation in Fréchet margins. For statistical purposes this asymptotic assumption is taken to hold exactly over a joint tail region. For Gumbel margins, these modelling assumptions combine to give a joint distributional model with the property

$$\Pr(Y \in t + A) = \exp(-t/\eta_Y)\, \Pr(Y \in A), \tag{2.1}$$

where t + A is a componentwise translation of every element of set A by a scalar t > 0, A is a set in which every element is large in all its components and ηY, termed the coefficient of tail dependence, satisfies 0 < ηY ≤ 1. When ηY = 1 the asymptotic theory behind property (2.1) extends to any set A in which every element is large in at least one of its components.

Ledford and Tawn (1996) identified four classes of extremal dependence. The first class is that of asymptotically dependent distributions, for which ηY = 1. The other three classes comprise distributions with asymptotically independent dependence structures exhibiting positive extremal dependence (d^{−1} < ηY < 1), near extremal independence (ηY = d^{−1}) and negative extremal dependence (0 < ηY < d^{−1}) for a d-dimensional variable. These three classes correspond respectively to joint extremes of Y occurring more often than, approximately as often as or less often than joint extremes if all components of the variable were independent.

Relationship (2.1) forms the basis for the estimation of probabilities of extreme multivariate events for all the existing methods. Specifically, for an extreme set D, which will typically contain no observations in a large sample, the approach is to choose a constant t > 0 and to identify a set A such that D = t + A and that A is an extreme set in the joint tail that contains sufficient observations for the empirical estimate of Pr(Y ∈ A) to be reliable. Thus the choice of t is equivalent to selecting a threshold. Estimates of Pr(Y ∈ D) follow from property (2.1). Estimates of the parameter ηY are obtained by exploiting the property that Pr{min(Y) > y} ∼ exp(−y/ηY) for y → ∞. Estimates of Pr(Y ∈ A), or equivalently Pr(Y + t ∈ D), are obtained empirically.
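The extrapolation step just described can be sketched in a few lines. This is an illustration under stated assumptions, not the estimator of any particular cited method: ηY is estimated here as the mean excess of min(Y) over a user-chosen threshold (the maximum likelihood estimate if those excesses are treated as exactly exponential with mean ηY), capped at 1, and the function names are hypothetical.

```python
import math


def estimate_eta(ys, threshold):
    """Estimate eta_Y from d-tuples on the Gumbel scale.

    Treats excesses of min(Y) over `threshold` as exponential with mean eta_Y.
    """
    excesses = [min(y) - threshold for y in ys if min(y) > threshold]
    if not excesses:
        raise ValueError("no observations have min(Y) above the threshold")
    return min(1.0, sum(excesses) / len(excesses))


def extrapolate_probability(ys, in_A, t, eta):
    """Estimate Pr(Y in t + A) as exp(-t / eta) times the empirical Pr(Y in A)."""
    p_A = sum(1 for y in ys if in_A(y)) / len(ys)
    return math.exp(-t / eta) * p_A
```

Here `in_A` is any predicate that identifies membership of the moderately extreme set A, so the same two functions cover different shapes of D = t + A.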

Extrapolation based on relationship (2.1) cannot provide estimates of probabilities for sets D that are not simultaneously extreme in each component. The reason for this is that, for such D, the empirical estimate of Pr(Y + t ∈ D) is likely to be 0 since the translated data Y + t are unlikely to fall in D. For asymptotically independent variables such sets are of most interest. This weakness of existing methods illustrates the need for a new approach, as it is due to the inadequacy of the asymptotic framework of the existing methods rather than a paucity of available models within this framework.

3. Theoretical motivation

In this section we present a range of theoretical results which motivate our choice of statistical model. In Section 3.1, we make an assumption about the asymptotic form of the conditional distribution and examine the consequences of this assumption. Then, in Section 3.2, we identify the conditions that must be satisfied by the normalizing functions underlying this assumption for the limiting representation to hold. In Section 3.3 we discuss some theoretical examples which suggest that the asymptotic assumption is appropriate for a wide range of distributions, and that the class of normalizing functions is narrow, whereas the range of limit distributions is broad. Finally, in Section 3.4, we draw links between the proposed and existing methods.

3.1. Assumption of a limit representation and its properties

Consider the asymptotic structure of the conditional distributions arising from a d-dimensional random variable Y = (Y1,…,Yd) with Gumbel marginal distributions. For each i = 1,…,d, we examine the conditional distribution Pr(Y−i ≤ y−i|Yi = yi), where here, and throughout, vector algebra is applied componentwise. To examine the limiting behaviour of these distributions as yi → ∞ we require the limiting distribution to be non-degenerate in all margins, so we must control the growth of y−i according to the dependence of Y−i on Yi.

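Specifically we assume that for a given i there are vector normalizing functions a|i(yi) and b|i(yi), both ℝ → ℝ^(d−1), which can be chosen such that, for all fixed z|i and for any sequence of yi-values such that yi → ∞,

$$\lim_{y_i \to \infty} \Pr\{Y_{-i} \le a_{|i}(y_i) + b_{|i}(y_i)\, z_{|i} \mid Y_i = y_i\} = G_{|i}(z_{|i}), \tag{3.1}$$

where all the margins of the limit distribution G|i are non-degenerate. An alternative expression of this assumption, which has an easier statistical interpretation, is that the standardized variables

$$Z_{|i} = \frac{Y_{-i} - a_{|i}(y_i)}{b_{|i}(y_i)} \tag{3.2}$$

have the property that

$$\lim_{y_i \to \infty} \Pr(Z_{|i} \le z_{|i} \mid Y_i = y_i) = G_{|i}(z_{|i}), \tag{3.3}$$

where the limit distribution G|i has non-degenerate marginal distributions.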

Under assumption (3.1), or equivalently assumption (3.3), we have that, conditionally on Yi > ui, as ui → ∞ the variables Yi − ui and Z|i are independent in the limit with limiting marginal distributions being exponential and G|i(z|i) respectively. To see that this result holds, let yi = ui + y with y > 0 fixed; then

$$\Pr(Z_{|i} \le z_{|i},\ Y_i - u_i > y \mid Y_i > u_i) = \frac{\Pr(Y_i > y_i)}{\Pr(Y_i > u_i)} \int_{y_i}^{\infty} \Pr(Z_{|i} \le z_{|i} \mid Y_i = s)\, \frac{f_{Y_i}(s)}{\Pr(Y_i > y_i)}\, ds \to \exp(-y)\, G_{|i}(z_{|i}) \quad \text{as } u_i \to \infty, \tag{3.4}$$

where fYi is the marginal density function of Yi. The final convergence in this derivation is implied by the exponential tail of the Gumbel variables and the property that the conditional limit (3.1) holds irrespectively of how yi → ∞.

We now consider the marginal and dependence characteristics of G|i(z|i). For each j ≠ i, we define Gj|i(zj|i) to be the limiting conditional distribution of

$$Z_{j|i} = \frac{Y_j - a_{j|i}(y_i)}{b_{j|i}(y_i)} \quad \text{given } Y_i = y_i \text{ as } y_i \to \infty,$$

where aj|i(yi) and bj|i(yi) are the component functions of a|i(yi) and b|i(yi) associated with variable Yj. Thus Gj|i is the marginal distribution of G|i associated with variable Yj. If

$$G_{|i}(z_{|i}) = \prod_{j \neq i} G_{j|i}(z_{j|i}),$$

then we say that the elements of Y−i are mutually asymptotically conditionally independent given Yi.

3.2. Choice of normalization

We now identify the normalizing functions a|i(yi) and b|i(yi) in terms of characteristics of the conditional distribution of Y−i|Yi, thus enabling these functions to be identified for theoretical examples. The normalizing functions and limit distribution are not unique in the sense that, if the normalizing functions a|i(yi) and b|i(yi) give a non-degenerate limit distribution G|i(z|i), using the normalizing functions

$$a^*_{|i}(y_i) = a_{|i}(y_i) + A\, b_{|i}(y_i), \qquad b^*_{|i}(y_i) = B\, b_{|i}(y_i), \tag{3.5}$$

for arbitrary vector constants A and B, with B > 0, gives the non-degenerate limit G|i(Bz|i + A). However, following standard arguments such as used in Leadbetter et al. (1983), page 7, this is the only way that two different limits with no mass at ∞ can arise, so the class of limit distributions is unique up to type, and the normalizing functions can be identified up to the constants A and B in expression (3.5).
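The form of the rescaled limit can be checked directly. Assuming that the alternative normalizing functions are a*|i(yi) = a|i(yi) + A b|i(yi) and b*|i(yi) = B b|i(yi), with all operations componentwise, assumption (3.1) gives:

```latex
\Pr\{Y_{-i} \le a^*_{|i}(y_i) + b^*_{|i}(y_i)\, z_{|i} \mid Y_i = y_i\}
  = \Pr\{Y_{-i} \le a_{|i}(y_i) + b_{|i}(y_i)\,(A + B z_{|i}) \mid Y_i = y_i\}
  \to G_{|i}(B z_{|i} + A) \quad \text{as } y_i \to \infty .
```

The limit is therefore of the same type as G|i, differing only by a componentwise location and scale change.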

For fixed i, the choice of the vector functions can be broken into d − 1 separate conditions based on the limiting behaviour of Yj|Yi = yi, for each j ≠ i, since assumption (3.1) specifies that each marginal distribution of G|i must be non-degenerate. Thus we are interested in the conditional distribution function of Yj|Yi = yi which is denoted by Fj|i(yj|yi). The associated conditional hazard function hj|i is defined as

$$h_{j|i}(y_j \mid y_i) = \frac{f_{j|i}(y_j \mid y_i)}{1 - F_{j|i}(y_j \mid y_i)},$$

where fj|i(yj|yi) is the conditional density function of Yj|Yi = yi.

Theorem 1.  Suppose that the vector random variable Y has an absolutely continuous joint density. If, for a given i, the vector functions a|i(yi) and b|i(yi) > 0 satisfy the limiting property (3.1), or equivalently property (3.3), then the components of these vector functions corresponding to variable Yj, for each j ≠ i, satisfy, up to type, properties (3.6) and (3.7):

$$\lim_{y_i \to \infty} F_{j|i}\{a_{j|i}(y_i) \mid y_i\} = p_{j|i}, \tag{3.6}$$

where pj|i is a constant in the range (0,1), and

$$b_{j|i}(y_i) = h_{j|i}\{a_{j|i}(y_i) \mid y_i\}^{-1}. \tag{3.7}$$


The proof of theorem 1 is given in Appendix A. Owing to the flexibility in the form of normalizing function given by expression (3.5), a simplification of the structure of the normalizing functions can be achieved, as illustrated by corollary 1.

Corollary 1.  If functions aj|i(yi) and bj|i(yi) > 0 satisfy the conditions of theorem 1, and there is a constant sj|i < ∞ such that

$$\lim_{y_i \to \infty} \frac{a_{j|i}(y_i)}{b_{j|i}(y_i)} = s_{j|i},$$


then limit relationship (3.1) holds with aj|i(yi) = 0. Furthermore, if bj|i(yi) = tj|i kj|i(yi) for tj|i > 0 any constant independent of yi, and kj|i(yi) any function of yi, then the limit relationship (3.1) holds with bj|i(yi) replaced by kj|i(yi).

3.3. Theoretical examples

We present the normalizing functions a|i(y) and b|i(y), given by theorem 1 and corollary 1, and some properties of the associated non-degenerate limiting conditional distribution G|i for a range of multivariate distributions with Gumbel marginal distributions. The examples are selected to provide a coverage of the four classes of extremal dependence that were identified in Section 2.

As pairwise dependence determines each of the components of the normalizing functions, we present the results categorized by the pairwise coefficient of tail dependence for (Yi,Yj), denoted by ηij, with ηij = 1/2 indicating near extremal independence for the pair. Table 1 shows two examples from each of the four classes. The special cases of perfect positive and negative dependence (cases i and viii respectively) are included here to identify upper and lower bounds on the behaviour of the normalizing functions, although strictly the methods of Section 3.2 do not apply to these two distributions as, for each, the associated conditional distribution is degenerate. At this stage, interest is only in the structure of the normalizing functions and the limiting distributions, so discussion of the precise specification of distributions ii–vii is postponed until Section 8, where additional examples are presented. Furthermore, as the limit distribution G|i is often complicated, here we identify only the marginal distribution Gj|i and state whether or not the margins of G|i are independent.

Table 1.   Examples of multivariate dependence structures classified by extremal dependence behaviour†

Extremal dependence   ηij          Normalization                     Limit distribution
structure                          aj|i(y)          bj|i(y)          Gj|i          G|i ACI‡
1, i                  1            y                1                Degenerate    NA
1, ii                 1            y                1                §             No
2, iii                (1+ρij)/2    inline image     y^{1/2}          Normal        No
2, iv                 2^{−α}       0                y^{1−α}          Weibull       Yes
3, v                  0.5          0                1                Gumbel        Yes
3, vi                 0.5          0                1                Gumbel        No
4, vii                (1+ρij)/2    inline image     y^{−1/2}         Normal        No
4, viii               0            −log(y)          1                Degenerate    NA

†1, asymptotic dependence; 2, asymptotic independence with positive association; 3, near independence; 4, negative dependence. The dependence structures are i, perfect positive dependence, ii, multivariate extreme value distribution, iii, multivariate normal (ρij > 0), iv, inverted multivariate extreme value distribution with symmetric logistic dependence structure and parameter 0 < α ≤ 1, v, independence, vi, multivariate Morgenstern, vii, multivariate normal (ρij < 0), and viii, perfect negative dependence.

‡ACI, asymptotic conditional independence, which is not applicable (NA) if the variable is degenerate.

§The limiting distribution is complicated and its exact form is given in Section 8.

The examples that are listed in Table 1, and those given in Section 8, all satisfy the asymptotic assumption (3.1), have a simple structure for the normalizing functions and give a range of limiting distributions G|i that are not contained in any simple distributional family. This finding about G|i is in contrast with the limiting representation for multivariate extreme value distributions (de Haan and Resnick, 1977; Resnick, 1987) but is a consequence of the lack of structure that is imposed on G|i by the limiting operation. The normalizing functions are all special cases of the parametric family

$$a_{|i}(y) = a_{|i}\, y + I_{\{a_{|i} = 0,\ b_{|i} < 0\}}\, \{c_{|i} - d_{|i} \log(y)\}, \qquad b_{|i}(y) = y^{b_{|i}}, \tag{3.8}$$

where, on the right-hand side, a|i, b|i, c|i and d|i are vector constants and I is an indicator function. The vectors of constants have components such that 0 ≤ aj|i ≤ 1, −∞ < bj|i < 1, −∞ < cj|i < ∞ and 0 ≤ dj|i ≤ 1 for all j ≠ i. Parametric family (3.8) has different structural formulations for a|i(y) for positively and negatively associated pairs, owing to the asymmetry of the Gumbel marginal distribution, for which the upper tail is heavier than the lower tail.
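For a single pair (Yi, Yj) the parametric family can be written as two small functions. This is a sketch under the assumption that the family has location aj|i y plus a term cj|i − dj|i log(y) that is active only when aj|i = 0 and bj|i < 0, and scale y raised to the power bj|i; the function names are illustrative.

```python
import math


def location(y, a, b, c=0.0, d=0.0):
    """Location function a_{j|i}(y) for one pair, evaluated at y > 0."""
    # The logarithmic term applies only to negatively associated pairs,
    # identified by a == 0 and b < 0.
    extra = (c - d * math.log(y)) if (a == 0.0 and b < 0.0) else 0.0
    return a * y + extra


def scale(y, b):
    """Scale function b_{j|i}(y) = y ** b for one pair, evaluated at y > 0."""
    return y ** b
```

With a = 1 and b = 0 these reduce to location y and scale 1, the normalization listed for the asymptotically dependent cases in Table 1.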

The construction of the limiting operations that give the normalizing functions and limit distribution does not ensure continuity in these functions or distributions as the parameters of the original distribution are changed. Two particular examples illustrate this point as the parameters of the underlying distributions approach values corresponding to independence. A special case of distribution ii is the bivariate extreme value distribution with logistic dependence structure, which is asymptotically dependent when the dependence parameter 0 < α < 1 (see Section 8 for details). When α = 1 the variables are independent. Consequently the normalization that is required is discontinuous in α at α = 1. However, as α ↑ 1 the limit distribution Gj|i puts all of its mass increasingly close to −∞, indicating that the location normalization is becoming too powerful. Similarly, the multivariate normal distribution iii gives Gj|i as normal with variance inline image, so as ρij↓0 the limit is degenerate as the scale normalization becomes too strong. Similar inconsistencies are found for ηij (see Heffernan (2000)) and for a range of asymptotically derived probability models.

We obtained the rate of convergence of each margin of the limiting conditional joint distribution, i.e. the order of convergence to 0 of

$$\Pr\{Y_j \le a_{j|i}(y_i) + b_{j|i}(y_i)\, z_{j|i} \mid Y_i = y_i\} - G_{j|i}(z_{j|i}) \tag{3.9}$$

as a function of n, where Pr(Yi > yi) = n^{−1} so that n determines how extreme the conditioning variable is in a manner that is invariant to the marginal distribution. Thus specified, the rate of convergence depends only on the underlying dependence structure. Expression (3.9) equals 0 for all zj|i for distributions i, v and viii in Table 1; the convergence rate is O(n^{−1}) for distributions ii and vi and O{1/log(n)} for distribution iv whereas for distributions iii and vii it is O[log{log(n)}/log(n)^{1/2}]. These rates are typical of those that are seen in other extreme value problems.

3.4. Links with existing methods

To clarify the connections with existing methods, we examine the limiting conditional distribution under the existing framework for multivariate extreme values. Let inline image, for fixed large values of each yi, i = 1,…,d, in expression (2.1). Differentiating expression (2.1) with respect to yi and dividing by fYi(yi+t) gives that for all t>0


where δ(yi,t) = 1 − exp[−exp(−yi){1 − exp(−t)}] → 0 as yi → ∞. Hence, for large yi, to first order, expression (3.10) is invariant to changes in t when ηY = 1, so the limit distribution of Y−i − Yi is non-degenerate for Yi = yi as yi → ∞. This result is identical to the structure that we find under asymptotic dependence between all the components (a|i = 1 and b|i = 0). Despite strong connections between the approaches, the statistical model that is developed in Section 4 leads to a new estimator of Pr(X ∈ C) when the variables are asymptotically dependent. When ηY < 1, expression (3.10) shows that the normalization Y−i − Yi leads to a degenerate limit given Yi = yi as yi → ∞, demonstrating the need for more sophisticated normalizations than those considered previously.

4. Model structure and properties

In Section 4.1 we present a semiparametric dependence model for describing extreme values in multivariate problems. This model is presented for variables with univariate marginal Gumbel distributions. Combined with our marginal model, described in Section 1, this dependence model gives a complete joint model for the extreme values of the random variable X. Issues concerning the self-consistency of the various conditional models are discussed in Section 4.2. Methods for extrapolation for the X-variable under the joint model are described in Section 4.3. Finally, in Section 4.4, we propose diagnostics to aid model selection.

4.1. Conditional dependence model

The model structure is motivated by the findings in Section 3. Using the same approach as in other univariate and multivariate extreme value methods, we take an asymptotic assumption which holds under weak conditions to hold exactly provided that the limiting variable is sufficiently extreme. Here we use the formulation of the limiting conditional distribution (3.1), and its implied limiting independence property (3.4), to capture the behaviour of variable Y−i occurring with large Yi. We assume that for each i = 1,…,d there is a high threshold uYi for which we model

$$\Pr\{Y_{-i} \le a_{|i}(y_i) + b_{|i}(y_i)\, z_{|i} \mid Y_i = y_i\} = \Pr(Z_{|i} \le z_{|i} \mid Y_i = y_i) = G_{|i}(z_{|i}), \qquad y_i > u_{Y_i}, \tag{4.1}$$

where Z|i is the standardized residual defined by expression (3.2), with distribution function G|i, and Z|i is independent of Yi for Yi > uYi. The extremal dependence behaviour is then characterized by location and scale functions a|i(yi) and b|i(yi) and the distribution function G|i.

First consider the specification of the individual conditional models, i.e. a|i(yi), b|i(yi) and G|i(z|i) for a given i. We adopt the parametric model (3.8) as it is a single parametric family of normalizing functions which is appropriate for the wide range of theoretical examples that are shown in Table 1 and Section 8. We denote the parameters of a|i(y) and b|i(y) by θ|i = (a|i,b|i,c|i,d|i) and adopt the convention that cj|i=dj|i=0 unless aj|i=0 and bj|i<0. We discuss the estimation of θ|i in Section 5, denoting the estimator of θ|i by inline image, and the associated estimators of the normalizing functions by inline image and inline image. As the limiting operation (3.1) imposes no specific structure on G|i, we adopt a nonparametric model for G|i. We estimate this distribution by using the empirical distribution of replicates of the random variable inline image, defined by


$$\hat Z_{|i} = \frac{Y_{-i} - \hat a_{|i}(y_i)}{\hat b_{|i}(y_i)} \quad \text{for } Y_i = y_i > u_{Y_i}.$$

The theoretical examples suggest that the components of Z|i are often asymptotically conditionally independent, so if supported by diagnostic tests it may be advisable to model the components of Ẑ|i as being independent, i.e. Ĝ|i(z|i) = ∏j≠i Ĝj|i(zj|i), where Ĝj|i is the empirical distribution function of the Ẑj|i.
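The residual calculation can be sketched for a single pair (Yi, Yj) as follows. This is a minimal illustration, assuming that the normalizing functions reduce to a(y) = a·y and b(y) = y^b, i.e. the case cj|i = dj|i = 0; that functional form, the function name and the argument names are assumptions made for the example and are not taken from the text.

```python
def standardized_residuals(y_i, y_j, a_hat, b_hat, u):
    """Residuals z = (y_j - a*y_i) / y_i**b for observations with y_i > u.

    Assumes a(y) = a*y and b(y) = y**b (illustrative form only) and a
    positive threshold u, so that y_i**b is defined for fractional b.
    """
    if u <= 0:
        raise ValueError("threshold must be positive")
    return [
        (yj - a_hat * yi) / yi ** b_hat
        for yi, yj in zip(y_i, y_j)
        if yi > u
    ]


# Hypothetical Gumbel-scale values; only the last two exceed the threshold.
z = standardized_residuals([1.0, 4.0, 9.0], [0.5, 3.0, 6.0],
                           a_hat=0.5, b_hat=0.5, u=2.0)
```

The empirical distribution of the returned values then plays the role of the nonparametric estimate of the residual distribution for that pair.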

In summary, for i=1,…,d our dependence model is a multivariate semiparametric regression model of the form


$$Y_{-i} = a_{|i}(y_i) + b_{|i}(y_i)\, Z_{|i} \quad \text{for } Y_i = y_i > u_{Y_i},$$

where a|i(yi) and b|i(yi) are given by the parametric model (3.8), and the distribution of the standardized residuals is modelled nonparametrically. The parameters of the overall model are θ = (θ|1,…,θ|d). Each regression model applies only above the threshold uYi for which the dependence structure is judged to be well described by model (4.1). There is no necessity for the dependence threshold uYi (on the Gumbel scale) and the marginal threshold uXi (on the original scale) to agree in the sense that uYi = ti(uXi), where transformation ti is given in equation (1.4).

We categorize the dependence structure that is implied by model (4.1) by using four classes which identify the behaviour of quantiles of the distribution of Yj|Yi = yi as yi → ∞. If the quantiles of the conditional distribution grow at the same rate as yi, i.e. aj|i = 1 and bj|i = 0, the variables (Yi,Yj) are asymptotically dependent; otherwise they are asymptotically independent. For asymptotically independent distributions, the conditional quantiles tend to ∞, a finite limit or −∞ as yi → ∞ if (Yi,Yj) exhibit positive extremal dependence, extremal near independence or negative extremal dependence respectively. Thus the variables exhibit positive extremal dependence when at least one of 0 < aj|i < 1 or bj|i > 0 holds, extremal near independence when aj|i = dj|i = 0 and bj|i ≤ 0, and negative extremal dependence when aj|i = 0, dj|i > 0 and bj|i < 0.
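The four classes can be written as a small lookup on the parameters of one ordered pair. The sketch below simply transcribes the conditions stated above; the function name and the returned labels are illustrative. Exact comparisons with 0 and 1 are used, so estimated parameters would in practice need a tolerance or a formal test.

```python
def classify_extremal_dependence(a, b, d=0.0):
    """Classify one ordered pair (j | i) from its parameters a, b and d."""
    if a == 1 and b == 0:
        return "asymptotic dependence"
    if 0 < a < 1 or b > 0:
        return "positive extremal dependence"
    if a == 0 and d == 0 and b <= 0:
        return "extremal near independence"
    if a == 0 and d > 0 and b < 0:
        return "negative extremal dependence"
    # Combinations not covered by the stated conditions.
    return "unclassified"


label = classify_extremal_dependence(a=0.0, b=-0.5, d=1.0)
```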

Though the examples of Section 3.3 illustrate that the limit operations on the parameters of the original distribution and the conditioning variable cannot be interchanged, we do not see that this poses any problems in practice for model (4.1). The theoretical examples motivate a subclass of the general limiting structure imposed by asymptotic assumption (3.1); the family (3.8) that we have identified varies smoothly over the four classes of dependence. Furthermore, for statistical applications the underlying distribution is fixed and so the issue of interchanging limits does not arise in practice.

Treating the d conditional models separately gives the most general version of our model with parameter θ an unconstrained vector of length 4d(d−1), though, for each ordered pair, cj|i and dj|i are only non-zero if there is no positive association. Dependence submodels may be of interest for identifying scientifically relevant structure in the joint distribution or for parsimony. For example, there are many multivariate distributions whose dependence structure is exchangeable in some way. The most common form of exchangeability is pairwise, i.e. Yi depends on Yj in the same way as Yj depends on Yi. We say that variables Yi and Yj exhibit weak pairwise extremal exchangeability if θi|j=θj|i and strong pairwise extremal exchangeability if in addition Gi|j=Gj|i. In Section 8 we show examples of distributions which exhibit each of these forms of exchangeability.

4.2.Self-consistency of separate conditional models

Now consider the self-consistency of the d individual models for the conditional distributions of Y−i|Yi for each i and large values of thresholds (uY1,…,uYd). Problems of this general type are discussed by Besag (1974) and Arnold et al. (1999). As all d conditional distributions are determined by the joint distribution of Y, there are some theoretical constraints on the possible combinations of values taken by the parameters θ and the distributions G|i for i = 1,…,d. However, as the individual models are applied to different subsets of the support of the joint distribution, the self-consistency is important only on the intersection of these subsets. Generally the intersections take the form {y : yi ≥ uYi ∀i ∈ J} where J is a subset of at least two elements of {1,…,d}. First consider the case where J={i,j}; then self-consistency requires that


where yj=aj|i(yi)+bj|i(yi)zj|i and yi=ai|j(yj)+bi|j(yj)zi|j for yi>uYi and yj>uYj. In general condition (4.2) is too complex to impose. However, unless at least one of aj|i=1 and ai|j = 1 holds, condition (4.2) becomes null since Pr{min(Yi,Yj) ≥ u | max(Yi,Yj) ≥ u} → 0 as u → ∞. When aj|i = 1 and bj|i = 0, as min(uYi,uYj)→∞, condition (4.2) imposes that ai|j=1 and bi|j=0 and, subject to the appropriate convergence of conditional density results, that


Now suppose that J={1,…,d} and that all the variables are asymptotically dependent. Self-consistency then requires that, for all i and j, ai|j=1 and bi|j=0 and that


where inline image denotes a (d−1)-vector with element associated with variable k (k≠j) being zk|i − zj|i for k≠i and −zj|i for k=i. Analogous conditions apply when only a subset of the variables is asymptotically dependent.

Though we have made progress in characterizing the self-consistency properties for the special case of asymptotic dependence we have no solution for ensuring self-consistency of the conditional distributions more generally. Our general approach is to estimate the d different conditional distributions separately and not to impose further structure in addition to model (4.1). The first defence of this approach is that the data arise from a valid joint distribution and so estimates that are based on the data should not depart greatly from self-consistency. Secondly, we recommend assessing the effect of using different conditionals to estimate probabilities of events in which more than one variable is extreme. Averaging estimates over the different conditionals reduces any problems of inconsistency, and in essence this is what our partitioning of C into C1,…,Cd ensures. Thirdly, in many applications when submodels are fitted, the θ-component of the model is automatically restricted to be self-consistent. Finally, we might expect that ensuring self-consistency should improve the general performance of the method. Contrary to this expectation, in Section 6 we illustrate the use of models which are not self-consistent and show that imposing self-consistency of the θ-parameters substantially reduces the performance.


4.3.Extrapolation

We generate random samples from the conditional distributions of X|Xi > vXi for each i, using the estimated conditional models. These samples are used to obtain Monte Carlo approximations of functionals of the joint tails of the distribution of X. Since we use the estimated model, the parameters are replaced by their estimates θ̂|i and Ĝ|i which are obtained by using methods described in Section 5. We employ the following sampling algorithm.

  • Step 1: simulate Yi from a Gumbel distribution conditional on its exceeding ti(vXi).
  • Step 2: sample Z|i from Ĝ|i independently of Yi.
  • Step 3: obtain Y−i = â|i(Yi) + b̂|i(Yi)Z|i.
  • Step 4: transform Y=(Y−i,Yi) to the original scale by using the inverse of transformation (1.4).
  • Step 5: the resulting transformed vector X constitutes a simulated value from the conditional distribution of X|Xi>vXi.
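Steps 1–3 can be sketched for one pair of variables on the Gumbel scale as follows. This is an illustration under stated assumptions rather than the implementation used for the results: the normalizing functions are taken to be a(y) = a·y and b(y) = y^b, the threshold is assumed positive, all names and numerical values are hypothetical, and steps 4 and 5 are omitted because transformation (1.4) is not reproduced here.

```python
import math
import random


def sample_gumbel_exceedance(threshold, rng):
    """Step 1: standard Gumbel draw conditional on exceeding `threshold`.

    If Y is standard Gumbel then E = exp(-Y) is standard exponential, and
    Y > t is the event E < exp(-t); E is drawn from that truncated
    exponential by inversion, using expm1/log1p for accuracy.
    """
    m = math.exp(-threshold)
    u = 1.0 - rng.random()                     # uniform on (0, 1]
    e = -math.log1p(u * math.expm1(-m))        # truncated exponential on (0, m]
    return -math.log(e)


def simulate_conditional(threshold, residuals, a_hat, b_hat, n, seed=0):
    """Simulate n pairs (y_i, y_j) given y_i > threshold (threshold > 0)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        y_i = sample_gumbel_exceedance(threshold, rng)    # step 1
        z = rng.choice(residuals)                         # step 2: empirical draw
        y_j = a_hat * y_i + y_i ** b_hat * z              # step 3
        pairs.append((y_i, y_j))
    return pairs


# Hypothetical fitted values and residuals, for illustration only.
pseudo = simulate_conditional(5.0, [-0.5, 0.0, 0.5], a_hat=0.8, b_hat=0.3,
                              n=1000, seed=1)
# Monte Carlo proportion of the pseudosample falling in the set {y_j > 4.5}.
proportion = sum(1 for _, y_j in pseudo if y_j > 4.5) / len(pseudo)
```

The final line illustrates how a conditional probability is approximated by the proportion of simulated points that fall in a set of interest.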

For example, we evaluate Pr(X ∈ Ci|Xi>vXi) by using a Monte Carlo approximation of integral (1.5): we repeat steps 1–5 and evaluate Pr(X ∈ Ci|Xi>vXi) as the long-run proportion of the generated sample that falls in Ci. When C is not contained entirely in the joint tail region on which the dependence component of the conditional model is defined, we first partition C into C* and C∖C* where


By definition of the uYi, the empirical estimator of Pr(X ∈ C*) will be reliable. In contrast Pr(X ∈ C∖C*) requires model-based estimation, for which we use the conditional model as follows. We partition C∖C* into sets C1,…,Cd as in Section 1. Using this construction, inline image for all i=1,…,d, and the above approximation can be used to evaluate Pr(X ∈ Ci|Xi>vXi).


4.4.Diagnostics

The examples in Section 3.3 indicate that the rate of convergence of the conditional distribution of Y−i|Yi = y, as y → ∞, to its limiting form can be slow. However, the limiting form of the conditional distribution is used only to motivate our model structure and we are not interested in the true limit values of θ|i and G|i. What is of practical importance is whether the conditional distribution of the normalized variable Z|i is stable over the range of Yi- (or equivalently Xi-) values that is used for estimation and extrapolation.

This requirement suggests that diagnostics for our model structure should be based on assessing the stability of the extrapolations that are achieved when fitting the model above a range of thresholds. For marginal estimation, we use diagnostics that are based on the mean residual life plot and the stability in the marginal shape parameter estimates; see Smith (1989) and Davison and Smith (1990). For dependence estimation, a fundamental modelling assumption is that Z|i is independent of Yi given Yi > uYi, for a high threshold uYi, for each i. By fitting the conditional model over a range of high thresholds, the stability of the estimates of θ|i and the resulting extrapolations can be assessed. Then, for a selected threshold, independence of Z|i and Yi is examined. Furthermore, a range of standard tests for independence can be applied to the observed Z|i to identify whether the variables can be treated as being asymptotically conditionally independent.
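The text does not name particular independence tests. As one possible check among many, Kendall's rank correlation between the residuals and the conditioning variable above the threshold can be computed; values near zero are compatible with independence. The sketch below is a plain O(n²) implementation of the tau-a coefficient, with illustrative names.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical exceedances of the conditioning variable and their residuals.
tau = kendall_tau([5.1, 5.4, 6.0, 6.3, 7.2], [0.2, -0.4, 0.1, -0.3, 0.5])
```

Repeating such a check over a range of thresholds gives a simple view of the stability that the model requires.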


5.Inference

Our model comprises the marginal distributional model (1.3) and the dependence model (4.1). Both of these models are semiparametric, consisting of components that are specified parametrically and components for which no parametric model is appropriate. Our strategy for inference is driven by three features: a lack of parametrically specified joint distributions for each conditional distribution, the absence of practical constraints to impose self-consistency between different conditional distributions and a need for simplicity. This leads us to use an algorithm for point estimation which makes simplifying assumptions and a semiparametric bootstrap algorithm for evaluating uncertainty which does not rely on these assumptions.

Inference for marginal and dependence structures is undertaken stepwise: first the marginal parameters ψ are estimated and then the dependence parameters θ are estimated assuming that the marginal parameters are known. Stepwise estimation is much simpler than joint estimation of all the parameters and findings in Shi et al. (1992) suggest that the loss of efficiency relative to joint estimation is likely to be small unless the values of ξi, i = 1,…,d, differ greatly.

Brief details of the marginal estimation step are given in Section 5.1. Following marginal estimation, the data are transformed to have Gumbel marginals by using transformations (1.4), with ψ replaced by their estimates ψ̂. In Section 5.2 we describe why we use Gaussian estimation for the normalizing function parameters θ|i for each separate conditional distribution under the assumption that there are no constraints between θ|i and θ|j for any i and j. The fitting of submodels requires the joint estimation of all the conditional model parameters θ. In Section 5.3 we discuss an approach for this joint estimation which has similarities to the pseudolikelihood of Besag (1975). In Section 5.4 we present techniques for evaluating the uncertainty in estimation for the overall model and the resulting extrapolations. Throughout, we assume that the data are realizations of independent and identically distributed random variables X1,…,Xn.

5.1.Marginal estimation

We estimate the d univariate marginal distributions jointly, ignoring the dependence between components. Specifically, we assume independence between components of the variable in constructing the log-likelihood function

$$\log L(\psi) = \sum_{i=1}^{d} \sum_{k=1}^{n_{u_{X_i}}} \log \hat f_{X_i}(x_{i|i,k}), \tag{5.1}$$


where f̂Xi is the density that is associated with distribution (1.3), nuXi is the number of observations with ith component exceeding the marginal threshold uXi and the jth component of the kth such observation is denoted by xj|i,k : j = 1,…,d; k = 1,…,nuXi. If there are no functional links between the parameters of the various components then maximizing log-likelihood (5.1) is equivalent to fitting the generalized Pareto distribution to the excesses over the marginal thresholds separately for each margin. When there are constraints between marginal parameters, jointly maximizing the log-likelihood function (5.1) enables inferential efficiency to be gained.
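For a single margin with no parameter constraints, the quantity being maximized is the generalized Pareto log-likelihood of the threshold excesses. A minimal sketch is given below; the scale parameter name `sigma` and the function name are choices made for the example, with `xi` denoting the shape parameter.

```python
import math


def gpd_log_likelihood(excesses, sigma, xi):
    """Log-likelihood of positive threshold excesses under GPD(sigma, xi)."""
    if sigma <= 0:
        return -math.inf
    total = 0.0
    for x in excesses:
        if abs(xi) < 1e-12:
            # Exponential limit of the generalized Pareto density as xi -> 0.
            total += -math.log(sigma) - x / sigma
        else:
            t = 1.0 + xi * x / sigma
            if t <= 0:
                return -math.inf       # excess outside the support
            total += -math.log(sigma) - (1.0 / xi + 1.0) * math.log(t)
    return total


# Hypothetical excesses over a marginal threshold.
value = gpd_log_likelihood([0.3, 1.2, 0.7], sigma=1.0, xi=0.1)
```

Maximizing this function over (sigma, xi) for each margin in turn corresponds to the separate fits described above; any numerical optimizer could be used for that step.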

5.2.Single conditional

For each i, we wish to estimate θ|i under minimal assumptions about G|i. If we assume that Z|i has two finite marginal moments, then θ|i determines the marginal means and variances of the conditional variable Y−i|Yi=yi when yi>uYi. Specifically, if the Z|i have marginal means and standard deviations denoted by vectors μ|i and σ|i respectively, then the random variables Y−i|Yi=y, for y>uYi, have vector mean and standard deviation respectively given by

$$\mu_{|i}(y) = a_{|i}(y) + \mu_{|i}\, b_{|i}(y), \qquad \sigma_{|i}(y) = \sigma_{|i}\, b_{|i}(y),$$


which are functions of y, θ|i and of the constants λ|i=(μ|i,σ|i). Thus (θ|i,λ|i) are the parameters of a multivariate regression model with non-constant variance and unspecified error distribution. We exploit the consistency of maximum likelihood estimates of θ|i achieved by using a parametric model for G|i which is liable to be misspecified. Specifically, we maximize the associated objective function over the parameter space to produce a consistent and valid point estimator for θ|i. For a general discussion of this approach see Hand and Crowder (1996), chapter 7. The parametric model for G|i is chosen for convenience and computational simplicity. We take the components of Z|i to be mutually independent and Gaussian and hence our inference for θ|i is based on Gaussian estimation (Hand and Crowder, 1996; Crowder, 2001). The independence simplification appears reasonable as θ|i determines only the marginal characteristics of the conditional distribution. We considered a range of parametric distributions for the marginals of Z|i and selected the Gaussian distribution for its simplicity, superior performance in a simulation study and links to generalized estimating equations that arise from this choice of model for G|i.

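Therefore, the objective function that we use for point estimation of θ|i and λ|i is

$$Q_{|i}(\theta_{|i},\lambda_{|i}) = -\sum_{j \ne i} \sum_{k=1}^{n_{u_{Y_i}}} \left[ \log\{\sigma_{j|i}(y_{i|i,k})\} + \frac{1}{2} \left\{ \frac{y_{j|i,k} - \mu_{j|i}(y_{i|i,k})}{\sigma_{j|i}(y_{i|i,k})} \right\}^{2} \right], \tag{5.2}$$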


where the notation follows the conventions that are adopted in Section 3 and for log-likelihood (5.1). We maximize Q|i jointly with respect to θ|i and λ|i to obtain our point estimate θ̂|i, with λ|i being nuisance parameters. To overcome the structural discontinuity in a|i(y), we fit the dependence model in two stages: first fixing cj|i=dj|i=0; then estimating cj|i and dj|i only if âj|i = 0 and b̂j|i < 0.
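For one ordered pair (j|i), the Gaussian objective can be evaluated as in the sketch below. It assumes the first-stage form cj|i = dj|i = 0 with a(y) = a·y and b(y) = y^b, so that the conditional mean is a·y + μ·y^b and the conditional standard deviation is σ·y^b; this functional form and all names are assumptions for illustration. The inputs are taken to be already restricted to observations with yi above a positive threshold.

```python
import math


def gaussian_objective(y_i, y_j, a, b, mu, sigma):
    """Gaussian log-likelihood (up to an additive constant) for pair (j | i).

    y_i : exceedances of the conditioning variable (all > 0)
    y_j : corresponding values of the other variable
    """
    if sigma <= 0:
        return -math.inf
    q = 0.0
    for yi, yj in zip(y_i, y_j):
        scale = yi ** b                 # b(y) = y**b, assumed form
        mean = a * yi + mu * scale      # conditional mean
        sd = sigma * scale              # conditional standard deviation
        q -= math.log(sd) + 0.5 * ((yj - mean) / sd) ** 2
    return q


# Hypothetical data; larger values of q indicate a better Gaussian fit.
q = gaussian_objective([2.0, 3.0], [1.2, 1.4], a=0.5, b=0.2, mu=0.0, sigma=1.0)
```

Summing such terms over j ≠ i and maximizing over (a, b, μ, σ) with a numerical optimizer would give the point estimates for the first stage; the second stage for cj|i and dj|i is not covered by this sketch.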

5.3.All conditionals

We now consider joint estimation of the conditional model parameters θ. For reasons that are similar to those discussed in Section 5.2, we falsely assume independence between different conditional distributions to give the objective function

$$Q(\theta,\lambda) = \sum_{i=1}^{d} Q_{|i}(\theta_{|i},\lambda_{|i}), \tag{5.3}$$


where Q|i(θ|i,λ|i) is as in expression (5.2) and λ=(λ|1,…,λ|d). For Gaussian error distributions it can be shown that objective function (5.3) is an approximation to the pseudolikelihood, which Besag (1975) introduced as an approximation to the joint likelihood function. The approximation of equation (5.3) to the pseudolikelihood follows from Bayes's theorem and the property that the marginal density of Yi and the conditional density of Y−i|Yi=yi when yi<uYi influence the shape of the pseudolikelihood negligibly. Further, if the variables are all mutually asymptotically independent then, for sufficiently large thresholds uYi, each datum will exceed at most one threshold so the independence assumption underlying the construction of objective function (5.3) will be satisfied.


5.4.Uncertainty

Uncertainty arises from the estimation of the semiparametric marginal models, the parametric normalization functions of the conditional dependence structure and the nonparametric models of the distributions of the standardized residuals. To account for all these sources of uncertainty, we use standard semiparametric bootstrap methods to evaluate standard errors of model parameter estimates and of other estimated parameters such as Pr(X ∈ C) (see Davison and Hinkley (1997)). Throughout we assume that the marginal and dependence thresholds are fixed and so the uncertainty that is linked to threshold selection is not accounted for by the bootstrap methods.

Our bootstrap procedure has three stages: data generation under the fitted model, estimation of model parameters and the derivation of an estimate of any derived parameters linked to extrapolation. These stages are repeated independently to generate independent bootstrap estimates. The novel aspect of our algorithm is the data generation. To ensure that the bootstrap samples that are obtained replicate both the marginal and the dependence features of the data, we use a two-step sampling algorithm for data generation. A nonparametric bootstrap is employed first, ensuring the preservation of the dependence structure; then a parametric step is carried out so that uncertainty in the estimation of the parametric models for the marginal tails can be assessed.

The precise procedure is as follows. The original data are first transformed to have Gumbel margins, using the marginal model (1.3) which is estimated by using these original data. A nonparametric bootstrap sample is then obtained by sampling with replacement from the transformed data. We then change the marginal values of this bootstrap sample, ensuring that the marginal distributions are all Gumbel and preserving the associations between the ranked points in each component. Specifically, for each i, i = 1,…,d, we replace the ordered sample of component Yi with an ordered sample of the same size from the standard Gumbel distribution. The resulting sample is then transformed back to the original margins by using the marginal model that was estimated from the original data. The data that are generated by using this approach have univariate marginal distributions with upper tails simulated from the fitted generalized Pareto model and dependence structure entirely consistent with the data as determined by the associations between the ranks of the components of the variables.
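The data generation stage can be sketched on the Gumbel scale as follows. The sketch covers the resampling and the rank-preserving replacement of the margins only; the transformations to and from the original margins are omitted because they depend on the fitted marginal model. Function names are illustrative, and ties created by resampling are broken by draw order, which is an arbitrary choice made for the example.

```python
import math
import random


def draw_gumbel(rng):
    """One standard Gumbel draw by inversion, avoiding log(0)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_rank_bootstrap(data, seed=0):
    """Return (resampled rows, rows with fresh Gumbel margins, same ranks).

    data : list of equal-length tuples, already on Gumbel margins.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Nonparametric step: resample whole rows to keep the dependence structure.
    resampled = [data[rng.randrange(n)] for _ in range(n)]
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        # Parametric step: an ordered standard Gumbel sample of the same size.
        fresh = sorted(draw_gumbel(rng) for _ in range(n))
        # Stable sort, so tied values keep the order in which they were drawn.
        order = sorted(range(n), key=lambda k: resampled[k][j])
        for rank, k in enumerate(order):
            out[k][j] = fresh[rank]
    return resampled, out


# Hypothetical bivariate data on the Gumbel scale.
rows = [(0.1 * k, float((7 * k) % 11)) for k in range(20)]
resampled, bootstrap_sample = gumbel_rank_bootstrap(rows, seed=3)
```

Refitting the model to many such samples, after back-transformation, gives the bootstrap distribution of the estimates.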

6.Simulation study

Throughout this section we use simulated data with known Gumbel margins to illustrate the application of the methods proposed. In Section 6.1 we present a detailed analysis of a single data set to highlight inference and extrapolation issues. Section 6.2 reports results of simulation studies comparing the performance of the existing and conditional methods for bivariate and multivariate replicated data sets. To allow a comparison with existing methods, we consider only positively dependent variables and hence work with the submodel a|i(y)=a|iy with 0 ≤ a|i ≤ 1. We focus on return level estimation. Specifically, when the multivariate set C is described by a single parameter v say, i.e. C=C(v), then the return level vp for an event with probability p is defined implicitly by


$$\Pr\{Y \in C(v_p)\} = p.$$

We assess the performance of an estimator v̂p of vp by using the relative error (v̂p − vp)/vp.
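The summaries reported in the tables of this section (median and 2.5 and 97.5 percentiles, multiplied by 100) can be computed from replicate estimates as in the following sketch. It takes the relative error to be (estimate − truth)/truth; the function names and the numerical values are illustrative, and the percentile convention (`method="inclusive"`) is an arbitrary choice, not one stated in the text.

```python
import statistics


def relative_error(estimate, truth):
    """Relative error of a return level estimate."""
    return (estimate - truth) / truth


def summarize_relative_errors(estimates, truth):
    """Median, 2.5 and 97.5 percentiles of 100 x relative error."""
    errors = [100.0 * relative_error(e, truth) for e in estimates]
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(errors, n=40, method="inclusive")
    return statistics.median(errors), cuts[0], cuts[-1]


# Hypothetical replicate estimates of a return level whose true value is 10.
median, lower, upper = summarize_relative_errors(
    [9.0, 9.5, 10.0, 10.5, 11.0], truth=10.0)
```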

6.1.Simulated case-study

We analyse the simulated data set of 5000 points shown in Fig. 3. The underlying distribution is the bivariate extreme value distribution with asymmetric logistic dependence structure; see Section 8.1 and Tawn (1988, 1990) for details. The parameters of this distribution are θ1,{1} = 1−θ1,{1,2} = 0.1, θ2,{2} = 1−θ2,{1,2} = 0.75 and α{1,2} = 0.2, so the limiting parameters for the conditional distributions are a2|1=a1|2=1 and b2|1=b1|2=0. The simulated data have a complicated structure as, for large Y1, variable Y2 behaves as though it were asymptotically dependent on Y1 but, for large Y2, Y1 arises from a mixture distribution with one component that is independent of Y2 and the other asymptotically dependent on Y2. As the normalization stabilizes the growth of the asymptotically dependent component only, the limiting distribution of Y1|Y2 has substantial mass at −∞, corresponding to the independent component of the mixture distribution. At finite levels the independent points are likely to contaminate the parameter estimates of any asymptotically motivated model. Although the limiting values for the normalization parameters are symmetric, the clear asymmetry in the data suggests that we should compare two models: one with weak pairwise exchangeability (a2|1=a1|2 and b2|1=b1|2) and the other relaxing this assumption to allow for any form of asymmetry. Both models are fitted by using objective function (5.3) and thresholds corresponding to the 0.9 marginal quantiles.

Figure 3.

 Observations from the bivariate extreme value distribution with asymmetric logistic dependence structure (•) and pseudosamples (∘) generated under the asymmetric model above a threshold (- - -), together with the sets Ci

Diagnostic procedures that are outlined in Section 4.4 aid model selection. Fig. 4 shows scatterplots of residuals Ẑ2|1 for large Y1 for each of the models proposed. Fig. 4(a) shows that the estimated distribution of Ẑ2|1 has a trend in mean value with Y1 for the weak pairwise exchangeable model, whereas this trend is much diminished in Fig. 4(b), which shows residuals from the fitted asymmetric conditional model. Equivalent plots for Ẑ1|2 (not shown) indicate approximate independence of these residuals and Y2 for both models.

Figure 4.

 Diagnostic plots for data from the bivariate extreme value distribution with asymmetric logistic dependence structure—scatterplots of residuals Z2|1 against the conditioning variable Y1, after transforming Y1 to uniform margins: (a) residuals from fitting the weakly pairwise exchangeable model; (b) residuals from fitting the asymmetric conditional model

Fig. 3 shows the pseudosamples that are obtained by using the fitted asymmetric conditional model with Fig. 3(a) and Fig. 3(b) showing the samples that are obtained conditioning on Y1 and Y2 respectively, and revealing the different forms of the conditional distributions. For set C(v) = (v,∞)², Fig. 3 shows C1 and C2. Empirical estimates of Pr(Y ∈ Ci|Yi>v) are obtained as the proportion of the respective pseudosamples falling in these sets; Pr(Y ∈ C) is then estimated by using decomposition (1.1). We investigated the effect of inconsistencies of the conditional models for Y2|Y1 and Y1|Y2 on the estimation of Pr(Y ∈ C) by comparing approaches using pseudosamples generated under the following models: Y2|Y1 only, Y1|Y2 only and the intermediate approach based on decomposition (1.1). Despite the very different forms of the two conditional distributions, the differences between the three estimates are small relative to the uncertainties in estimation.

6.2.Multivariate examples

We consider the following four distributions, all with standard Gumbel margins:

  • (a) a multivariate extreme value distribution with symmetric logistic dependence structure (8.4) and parameter α=0.5 (distribution A);
  • (b) a bivariate extreme value distribution with asymmetric logistic dependence structure (8.5) with parameters given in Section 6.1 (distribution B);
  • (c) an inverted multivariate extreme value distribution with symmetric logistic dependence structure (8.4) with parameter α, for which ηij for any pair of variables is 2^(−α) (distribution C);
  • (d) a bivariate normal distribution with correlation coefficient ρij, for which ηij=(1+ρij)/2 (distribution D).

Section 8 shows the theoretical derivation of extremal properties of these distributions. Distributions A and B are asymptotically dependent whereas distributions C and D are asymptotically independent. We select the parameters of distributions C and D so that ηij=0.75 for all bivariate pairs. For each distribution we simulated 200 replicate data sets each of size 5000. We applied a range of existing and conditional methods, selecting thresholds so that 10% of each data set was used for estimation by each method. We compare the performance of the methods for a range of forms of extreme event. Preliminary studies showed that the relative errors varied little with p and so we show results for p = 10⁻⁴, 10⁻⁶ and 10⁻⁸ only.
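The parameter values implied by ηij = 0.75 follow by inverting the two relations in the list above. The short calculation below reads the relation for distribution C as η = 2^(−α), which is an interpretation of the listed formula, and uses η = (1+ρ)/2 for distribution D; variable names are illustrative.

```python
import math

eta = 0.75

# Distribution C (inverted logistic): eta = 2**(-alpha), read as stated above.
alpha_c = -math.log2(eta)

# Distribution D (bivariate normal): eta = (1 + rho) / 2.
rho_d = 2.0 * eta - 1.0

print(round(alpha_c, 3), rho_d)
```

Under these readings α is approximately 0.415 for distribution C and ρ = 0.5 for distribution D.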

6.2.1.Simultaneously extreme bivariate events

Table 2 shows the median, 2.5 and 97.5 percentiles of the estimated sampling distribution of the relative errors of vp when C(v)=(v,∞)². First consider distributions A and B for which the existing method based on property (2.1) with ηY = 1 is asymptotically the correct form of model. The existing method with ηY = 1 has small relative errors centred on zero for distribution A, but for distribution B the method overestimates by a small but significant amount. The conditional method and the existing method with estimated ηY are unbiased but have variable relative errors for distribution A. For distribution B the conditional model with weak pairwise exchangeability significantly underestimates whereas the existing method with ηY estimated and the asymmetric conditional model are equally variable and unbiased. For the asymptotically independent distributions C and D, the estimators perform differently from one another but similarly over distributions. The existing method with ηY = 1 grossly overestimates. The other two methods are unbiased with the conditional approach generally having less variability. In Section 3.3 we noted a discontinuity in the normalizing parameters as independence is approached. We extended the above simulation study to assess the performance of the methods as these discontinuities are approached. For distributions A and C, as α ↑ 1, all methods perform similarly to the behaviour shown in Table 2 with a small bias observed for the ηY = 1 approach, and both of the other two methods being unbiased with similar variances. In summary, these results suggest that the general performance of the conditional method is good but that, when asymmetry is present, the diagnostic procedures of Section 6.1 are vital for model selection.

Table 2.   Median (and 2.5 and 97.5 percentiles) of the estimated sampling distribution of relative errors of vp for simultaneously extreme bivariate events†. All entries are multiplied by 100.

| Distribution | Method | p = 10⁻⁴ | p = 10⁻⁶ | p = 10⁻⁸ |
|---|---|---|---|---|
| A | Existing, ηY=1 | −0.1 (−1.0,0.8) | −0.1 (−0.7,0.5) | −0.0 (−0.5,0.4) |
| A | Existing, ηY estimated | −0.8 (−15.0,0.6) | −0.8 (−16.0,0.4) | −0.6 (−17.0,0.3) |
| A | Conditional (weak pairwise exchangeability) | −1.4 (−4.0,0.8) | −1.6 (−4.1,0.5) | −1.6 (−5.0,0.4) |
| B | Existing, ηY=1 | 4.6 (3.7,5.6) | 3.0 (2.4,3.6) | 2.1 (1.7,2.5) |
| B | Existing, ηY estimated | −1.0 (−14.0,5.3) | −2.9 (−17.0,3.4) | −3.9 (−19.0,2.4) |
| B | Conditional (weak pairwise exchangeability) | −15.0 (−21.0,−8.8) | −14.0 (−21.0,−7.4) | −12.0 (−19.0,−6.3) |
| B | Conditional (asymmetry) | −4.0 (−12.0,4.2) | −5.7 (−15.0,0.5) | −6.1 (−17.0,0.0) |
| C | Existing, ηY=1 | 23.0 (22.0,24.0) | 26.0 (26.0,28.0) | 29.0 (28.0,29.0) |
| C | Existing, ηY estimated | −0.6 (−16.0,14.0) | −0.1 (−18.0,17.0) | 0.2 (−18.0,18.0) |
| C | Conditional (weak pairwise exchangeability) | −0.6 (−8.6,5.3) | 0.6 (−13.0,8.2) | 0.8 (−18.0,9.8) |
| D | Existing, ηY=1 | 28.0 (27.0,29.0) | 31.0 (30.0,32.0) | 32.0 (32.0,33.0) |
| D | Existing, ηY estimated | −1.9 (−15.0,14.0) | −1.9 (−17.0,16.0) | −2.2 (−18.0,17.0) |
| D | Conditional (weak pairwise exchangeability) | −0.6 (−10.0,7.3) | −0.1 (−15.0,9.2) | −0.1 (−25.0,12.0) |

†The four distributions are listed in Section 6.2. The true return levels are, for distribution A, vp=8.6, 13.0, 17.9, for distribution B, vp=7.8, 12.4, 17.0, for distribution C, vp=6.5, 9.7, 13.2, and, for distribution D, vp=6.9, 10.2, 13.8, for p=10⁻⁴, 10⁻⁶, 10⁻⁸ respectively. Four methods of estimation are used: the existing method with ηY=1 and with ηY estimated, and the conditional method with weak pairwise exchangeability and asymmetry.

6.2.2.Non-simultaneously extreme bivariate events

Now consider estimating quantiles of the distribution of Y2|Y1>r for a given r, i.e. for a given q we estimate v satisfying Pr(Y2 < v|Y1 > r) = q. Equivalently, we wish to estimate v where C(v) = (r,∞) × (−∞,v), and Pr(Y1>r) = p/q where p and q are given. Table 3 shows summary characteristics of the sampling distribution of the relative error of the conditional method for combinations of p and q. For p = 10⁻⁴ and q = 0.2, 0.5, 0.8 the respective true values of r are 7.6, 8.5 and 9.0, and the corresponding values of v are 6.7, 8.8 and 10.5 for distribution A, 6.2, 7.8 and 9.8 for distribution B, 2.5, 4.4 and 6.7 for distribution C, and 2.1, 3.6 and 5.7 for distribution D. This illustrates that if q<1 and the variables are asymptotically independent the existing methods are inappropriate for estimating v as all elements of C(v) are not simultaneously extreme in each component. For each distribution, the estimators based on the conditional approach have a larger variance than in Table 2, with the variability increasing as q is decreased. Only for long-range extrapolations for distribution B is there a significant bias, but even this is small. The relative errors are much the smallest for distribution A and grow as we extrapolate for distributions C and D.
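The quoted values of r can be checked directly from the standard Gumbel margin: Pr(Y1 > r) = p/q gives r = −log{−log(1 − p/q)}. A short check, with an illustrative function name, is:

```python
import math


def gumbel_level_for_exceedance(prob):
    """Return r such that Pr(Y > r) = prob for a standard Gumbel variable Y."""
    return -math.log(-math.log1p(-prob))


p = 1e-4
levels = [round(gumbel_level_for_exceedance(p / q), 1) for q in (0.2, 0.5, 0.8)]
print(levels)  # [7.6, 8.5, 9.0]
```

The values of v, in contrast, depend on the dependence structure of each distribution and are not reproduced by this marginal calculation.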

Table 3.   Median (and 2.5 and 97.5 percentiles) of the estimated sampling distribution of relative errors of vp for non-simultaneously extreme bivariate events†. All entries are multiplied by 100.

| Distribution (method) | q | p = 10⁻⁴ | p = 10⁻⁶ | p = 10⁻⁸ |
|---|---|---|---|---|
| A (weak pairwise exchangeability) | 0.2 | −3.1 (−13,2.7) | −4.7 (−15,1.6) | −5.1 (−16,1.0) |
| | 0.5 | −2.0 (−9.4,1.2) | −2.5 (−11,0.8) | −2.6 (−12,0.5) |
| | 0.8 | −0.8 (−6.7,3.8) | −0.9 (−7.8,3.5) | −1.0 (−9.2,2.9) |
| B (asymmetry) | 0.2 | −15 (−36,0.7) | −17 (−43,−1.1) | −16 (−47,−2.0) |
| | 0.5 | −11 (−25,−0.9) | −12 (−29,−1.9) | −12 (−32,−2.2) |
| | 0.8 | −8.4 (−19,−0.3) | −9.1 (−21,−1.6) | −9.2 (−23,−1.6) |
| C (weak pairwise exchangeability) | 0.2 | 6.7 (−17,35) | 16 (−18,58) | 25 (−19,81) |
| | 0.5 | 3.1 (−15,23) | 7.9 (−17,36) | 13 (−19,51) |
| | 0.8 | 0.5 (−18,22) | 2.6 (−21,32) | 5.5 (−20,42) |
| D (weak pairwise exchangeability) | 0.2 | −4.4 (−37,22) | 8.5 (−34,45) | 17 (−32,60) |
| | 0.5 | −1.5 (−22,22) | 4.4 (−23,35) | 6.3 (−27,44) |
| | 0.8 | −1.3 (−25,27) | −0.7 (−27,33) | 0.2 (−29,41) |

†The four distributions are listed in Section 6.2. The conditional method is used with weak pairwise exchangeability or asymmetry.

6.2.3.Multivariate events

To illustrate the performance of the conditional method in higher dimensional problems, we consider the estimation of v when inline image for distributions A and C. For such sets, values of all the variables are equally influential and the set comprises regions of both simultaneous and non-simultaneous extreme values of the components. Distribution A exhibits asymptotic dependence without being asymptotically conditionally independent, whereas distribution C is asymptotically independent and asymptotically conditionally independent. Because of the symmetry of both dependence structures, in each case we fitted a model with aj|i=a and bj|i=b for all i and j. The limit values of (a,b) for distributions A and C are (1,0) and (0,0.585) respectively. We examined the observed components of the standardized residuals Z|i for each Yi>uYi to see whether asymptotic conditional independence was a reasonable assumption. Our findings agreed with the limiting properties, so we proceed to estimate vp assuming asymptotic conditional independence for distribution C only. For distribution A we find that the median (and 2.5 and 97.5 percentiles) (×100) of the estimated sampling distribution of relative errors of vp are 4.4 (1.7,7.6) and 1.8 (−4.5,6.8) for p = 10⁻⁴ and p = 10⁻⁶ respectively. The same quantities for distribution C are 0.1 (−6.6,4.9) and −0.1 (−10.0,7.4). The estimates are close to the true values with increasing variability in relative error for longer-range extrapolation.

7.Air quality monitoring application

We now analyse the extremes of the five-dimensional air pollutant variable that was presented in Section 1. The primary aim of this analysis is to study the underlying extremal dependence structure of the variables. By identifying this structure we can assess whether the relationships between the extreme values of these variables conform with scientific understanding of the production of and interaction between the pollutants and the climatic factor represented by season. We measure the extremal dependence by estimating the individual model parameters and by examining functionals of extremes of the joint distribution.

First we select the data to be analysed. The pollutants exhibit regular seasonal variation, which we account for by focusing separately on two periods: winter (from November to February inclusively) and early summer (from April to July inclusively) and treating the joint distribution of the pollutants as stationary in each period. This proposal is supported by empirical evidence and by knowledge of the seasonal behaviour of the variable (Photochemical Oxidants Review Group, 1997). The measurements follow a diurnal cycle and exhibit marked short-term dependence. By focusing on componentwise daily maxima of hourly means we remove this short-term non-stationarity and substantially reduce temporal dependence. The residual serial dependence is due, among other things, to short-term persistence of local atmospheric pressure systems. We do not attempt to take this temporal dependence into account in this analysis. The data set contains some large values on or around November 5th each year (fireworks night), which were removed for the subsequent analysis. An exploratory analysis also revealed six data points with excessive PM10-values (in excess of 200 μg m−3) during April 1997 and three winter points with unusually large values of some functionals of (NO2, SO2, PM10). We performed the modelling and inference stages of the following analysis including and excluding these large points, to assess the sensitivity to their presence. The estimated dependence structures were not affected by the removal of these outliers; however, marginal estimates were more physically self-consistent when the points were left out. We report the analysis that was undertaken with the outliers excluded; these outliers are omitted from the data plots in Figs 1 and 2.
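As a minimal sketch of the reduction to componentwise daily maxima of hourly means described above, assume that the hourly means are held in a NumPy array with one row per hour and one column per pollutant. The function name, the array layout and the use of NaN for missing hours are illustrative assumptions, not details taken from the analysis.

```python
import numpy as np


def componentwise_daily_maxima(hourly_means, hours_per_day=24):
    """Return an (n_days, n_pollutants) array of daily maxima.

    hourly_means: array of shape (n_hours, n_pollutants); NaN marks a
    missing hour. Trailing hours that do not fill a whole day are dropped.
    """
    hourly_means = np.asarray(hourly_means, dtype=float)
    n_hours, n_pollutants = hourly_means.shape
    n_days = n_hours // hours_per_day
    whole_days = hourly_means[: n_days * hours_per_day]
    by_day = whole_days.reshape(n_days, hours_per_day, n_pollutants)
    # The maximum is taken separately for each pollutant, so the maxima
    # within one day may come from different hours.
    return np.nanmax(by_day, axis=1)


# Two days of synthetic "hourly means" for two hypothetical pollutants.
demo = np.arange(96, dtype=float).reshape(48, 2)
daily = componentwise_daily_maxima(demo)
```

Because the maximum is componentwise, a daily vector need not correspond to any single hour of that day; this is the sense in which the diurnal cycle is removed from each margin.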

We fit the marginal model (1.3) for each component and for each season. Table 4 shows the resulting values of the threshold uXi, the threshold non-exceedance probability, the estimated generalized Pareto scale and shape parameters, and the estimated marginal 0.99 quantiles for each component and each season. These estimates highlight differences in the marginal distributions of separate components within each season and of the same component over seasons, with O3 and NO exhibiting the largest statistically significant variation over seasons. Though the estimated shape parameters differ over components, their stability for each component over season suggests that seasonality primarily affects the variance of the components rather than the shape of their distributions.

Table 4.   Summary of generalized Pareto models fitted to the marginal distributions of the air pollution data†

Season   Parameter                    O3             NO2            NO             SO2           PM10
Summer   Non-exceedance probability   0.
         Scale                        15.8 (3.1)     9.1 (1.0)      32.2 (3.5)     42.9 (7.0)    22.8 (2.5)
         Shape                        −0.29 (0.14)   0.01 (0.08)    0.02 (0.07)    0.08 (0.12)   0.02 (0.08)
         0.99 quantile                70 (2)         75 (3)         180 (10)       152 (16)      127 (8)
Winter   Non-exceedance probability   0.
         Scale                        6.2 (0.7)      9.3 (0.9)      117.4 (13.1)   19.7 (2.4)    37.5 (4.2)
         Shape                        −0.37 (0.06)   −0.03 (0.08)   −0.09 (0.08)   0.11 (0.09)   −0.20 (0.07)
         0.99 quantile                40 (1)         80 (3)         494 (30)       104 (10)      145 (6)

†Thresholds used for marginal modelling are denoted uXi, with associated non-exceedance probabilities; the remaining rows give the estimated generalized Pareto scale and shape parameters and the estimated 0.99 quantiles. Bootstrap-based standard errors are given in parentheses.
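To make the link between the fitted scale and shape parameters and the 0.99 quantiles concrete, the following sketch inverts a generalized Pareto tail. It assumes the usual threshold form, in which the distribution function above u is 1 − {1 − F(u)}{1 + ξ(x − u)/β}^{−1/ξ}; model (1.3) is not reproduced in this section, so this form is an assumption. The function names are hypothetical, and the threshold and non-exceedance probability in the example are invented for illustration; only the scale and shape values are taken from Table 4.

```python
import math


def gpd_tail_cdf(x, u, f_u, beta, xi):
    """Distribution function at x > u when excesses of u are generalized Pareto."""
    if x <= u:
        raise ValueError("this sketch only covers x above the threshold u")
    if abs(xi) < 1e-12:
        survivor = math.exp(-(x - u) / beta)       # exponential limit, xi = 0
    else:
        base = max(1.0 + xi * (x - u) / beta, 0.0)  # zero beyond the upper end point
        survivor = base ** (-1.0 / xi)
    return 1.0 - (1.0 - f_u) * survivor


def gpd_tail_quantile(p, u, f_u, beta, xi):
    """Quantile x_p for a probability p between f_u = F(u) and 1."""
    if not f_u < p < 1.0:
        raise ValueError("p must lie strictly between f_u and 1")
    ratio = (1.0 - p) / (1.0 - f_u)  # exceedance probability conditional on X > u
    if abs(xi) < 1e-12:
        return u - beta * math.log(ratio)
    return u + beta / xi * (ratio ** (-xi) - 1.0)


# Hypothetical threshold (u = 40, F(u) = 0.9) with a scale and shape from Table 4.
q99 = gpd_tail_quantile(0.99, u=40.0, f_u=0.9, beta=15.8, xi=-0.29)
```

With a negative shape parameter the fitted tail has a finite upper end point at u − β/ξ, so every extrapolated quantile stays below that value.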

Let us now consider the dependence model, applied to the data after transformation to Gumbel margins by using transformation (1.4). We model each season separately, initially considering the most general model consisting of a set of five conditional models with no constraints between the different conditional distributions, so θ consists of 80 unconstrained dependence parameters.
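The sketch below illustrates the two ingredients of this step. The transformation to Gumbel margins is taken to be Y = −log{−log F(X)}; transformation (1.4) uses the fitted marginal model, whereas the sketch substitutes a purely rank-based empirical distribution function, which is a simplification. The parameter count assumes four dependence parameters (a, b, c, d) for each ordered pair of variables, a reading that reproduces the figure of 80 for d = 5. Function names are hypothetical.

```python
import numpy as np


def to_gumbel_margins(x):
    """Rank-transform each column of x to (approximately) standard Gumbel margins."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Double argsort gives ranks 1..n within each column (ties broken arbitrarily).
    ranks = x.argsort(axis=0).argsort(axis=0) + 1
    u = ranks / (n + 1.0)            # empirical probabilities kept away from 0 and 1
    return -np.log(-np.log(u))


def n_dependence_parameters(d, per_pair=4):
    """d conditional models, each with d - 1 components and per_pair parameters."""
    return d * (d - 1) * per_pair
```

The rank-based version cannot extrapolate beyond the largest observation, which is one reason for using a parametric tail model in the transformation.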

The first modelling choice to be made is that of the dependence threshold to be used to fit the conditional model (4.1). For simplicity we restrict the search for the dependence thresholds to values of uYi = u for all i. Of the diagnostics that are discussed in Section 4.4, we found that those assessing the stability of the dependence parameter estimates and the independence tests were the most revealing in this application. A dependence threshold u such that Pr(Yi < u) = 0.7 was supported by the diagnostics, although there appeared to be limited sensitivity to this choice. The resulting estimates (âj|i, b̂j|i), and the sampling distributions of these pairs for all i ≠ j, are shown for the summer and winter seasons in Fig. 5. In particular, the pairwise sampling distributions are shown by the convex hull of 100 bootstrap realizations from the sampling distribution of each pair. Plots of this type were used to assess the stability of the estimates to the choice of threshold. Significant shifts in the region that is encompassed by the convex hull indicate sensitivity of parameter estimates to the choice of threshold. An appropriate threshold should have the property that raising the threshold further does not result in any significant shifts once the increased variability of estimates made by using higher thresholds is accounted for. The minimum such appropriate threshold is selected for efficiency purposes.

Figure 5.

 Comparison of dependence parameter estimates (âj|i, b̂j|i) for (a) (NO2, O3), (b) (NO, O3), (c) (SO2, O3), (d) (PM10, O3), (e) (NO, NO2), (f) (SO2, NO2), (g) (PM10, NO2), (h) (SO2, NO), (i) (PM10, NO) and (j) (PM10, SO2), using a dependence threshold equal to the 70% marginal quantile: for i and j in the same order as the variables in the descriptor for each part of the figure, bootstrap convex hulls were used for inline image (——, summer; – – –, winter) and for inline image (· · · · · · ·, summer; - - - - - -, winter) (associated point estimates: s, summer; w, winter)


Having decided on a dependence threshold, we consider possible simplifications to the estimated dependence structure. From Fig. 5 and plots of the data with Gumbel marginals (Fig. 2), it is clear that there are significant differences in levels of extremal dependence between different pairs of variables. Fig. 5 shows that, for each season, there are pairs of variables for which the bivariate sampling distributions of (âj|i, b̂j|i) and (âi|j, b̂i|j) differ significantly, as the convex hulls do not intersect. For example, Fig. 5 shows that in summer (PM10, O3) and (SO2, NO) and in winter (SO2, NO2) and (SO2, NO) do not exhibit weak pairwise exchangeability. This finding indicates that a global weakly pairwise exchangeable dependence structure is inappropriate for these data, a conclusion which is supported in the winter period by a complete lack of stability in the parameter estimates over all dependence threshold choices for the global weakly pairwise exchangeable model. Though for some pairs of pollutants there is no evidence to reject weak pairwise exchangeability, in the absence of more detailed knowledge about the process we do not attempt to identify subsets of pairs for which we may assume a simplified pairwise dependence model. Finally, we consider whether the Z|i are independent for any i, i.e. whether we can assume asymptotic conditional independence between the margins of the residual distribution G|i. Scatterplots of pairs of components of Z|i for each i confirm that this assumption is inappropriate. Testing for asymptotic conditional independence between pairs of variables revealed that, for the summer data, SO2 and O3 are asymptotically conditionally independent given any other variable, although these two variables are not unconditionally independent. The same conclusion can be drawn for winter NO2 and O3 levels.
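The criterion that two bootstrap convex hulls do not intersect can be checked numerically. The sketch below is one possible way of doing so, not the procedure used for Fig. 5: two convex hulls share a point exactly when some convex combination of the first cloud equals some convex combination of the second, which is a linear feasibility problem. The function name is hypothetical, and `scipy.optimize.linprog` is used only as a feasibility solver.

```python
import numpy as np
from scipy.optimize import linprog


def convex_hulls_intersect(p, q):
    """True if the convex hulls of point clouds p (n x k) and q (m x k) share a point."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    n, m, dim = len(p), len(q), p.shape[1]
    # Unknowns: convex weights lam (n values) for p and mu (m values) for q.
    a_eq = np.zeros((dim + 2, n + m))
    a_eq[:dim, :n] = p.T        # sum_i lam_i p_i ...
    a_eq[:dim, n:] = -q.T       # ... minus sum_j mu_j q_j equals 0
    a_eq[dim, :n] = 1.0         # lam sums to 1
    a_eq[dim + 1, n:] = 1.0     # mu sums to 1
    b_eq = np.zeros(dim + 2)
    b_eq[dim:] = 1.0
    res = linprog(np.zeros(n + m), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.status == 0      # 0: a feasible point was found; 2: infeasible


square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
separated = convex_hulls_intersect(square, square + 5.0)
overlapping = convex_hulls_intersect(square, square + 0.5)
```

In practice `p` and `q` would hold the 100 bootstrap pairs for the two conditioning orders; hulls that only just touch are subject to the solver's numerical tolerance.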

Fig. 5 also shows substantial differences between the dependence parameter estimates that are obtained for the summer and winter data sets. All pairs which have O3 as one component exhibit stronger dependence in the summer period than in the winter period, whereas for other pairs the dependence is either of similar strength or weaker in summer than in winter. The strongest dependence between any pair occurs in the winter between all the pairs of the triple (NO, NO2, PM10), with reasonable evidence that these variables are asymptotically dependent. No other pairs of variables exhibit asymptotic dependence in either season.

For non-positively associated variables, estimates of (cj|i,dj|i) (not shown) reveal the degree of dependence. In the summer, the only such conditional distribution is that of NO given extreme SO2, although with the conditioning reversed these variables are clearly positively dependent. In winter, SO2 and PM10 are both negatively dependent on high O3 values, whereas NO2 and NO appear to be independent of O3 when O3 is extreme. Conversely, in winter O3 is negatively associated with extreme NO2, NO and SO2. Negative dependence is also identified for all winter variables given that SO2 is extreme, with the exception of PM10 which appears to be independent of extreme SO2.

These findings are consistent with the current understanding of urban pollution patterns. In winter, air pollution episodes typically occur when cold, stable weather conditions trap pollutants, allowing levels to build. Since the majority of such pollution derives from vehicle emissions, winter episodes consist of simultaneously elevated levels of nitrogen and sulphur compounds and particulate matter. Conversely, since the production of excessive O3 needs strong sunlight, O3 levels generally remain at relatively low levels during the winter months regardless of the presence of other pollutants. In the absence of strong sunlight, O3 levels are negatively associated with high presences of nitrogen compounds as O3 reacts destructively with NO. The stronger dependence that is observed between O3 and the other variables during the summer supports the existing understanding of the photochemical processes that produce excessive O3 levels during summer smog. Temperature inversions and low winds that accompany high pressure systems trap vehicle emissions, which are then exposed to long hours of sunshine. Thus high levels of O3 accompany elevated levels of the other pollutants (Photochemical Oxidants Review Group, 1997; Colls, 2002; Housley and Richards, 2001).

To illustrate the implications of both the different levels of dependence between the pairs and the different marginal distributions, in Fig. 6 we show pseudosamples, on the measured scale, from the conditional distribution of the remaining variables given that NO exceeds a high threshold. On each pairwise plot the curve corresponds to equal marginal quantiles. The near asymptotic dependence of both NO2 and PM10 on NO is clearly seen by the grouping of simulated points around this curve. The NO2 points are more scattered than the PM10 points for large NO values as the corresponding estimates b̂j|i are positive and negative respectively. Similarly, O3 is seen to be negatively dependent on NO whereas SO2 is dependent on but asymptotically independent of NO. The effect of the negative b̂j|i for SO2|NO is the increasing concentration of this conditional distribution for larger NO values.

Figure 6.

 Simulated winter air pollution points conditional on the NO component exceeding vXi, the 0.99 marginal quantile of this variable: |, threshold vXi (points below and above this threshold are the original data and data simulated under the fitted model respectively); +, points that do not fall in the set C5(23); ∘, points that fall in the set C5(23); highlighted points, the 10 points with the largest values of Σi yi; ——, equal marginal quantiles

We now focus on estimating a range of functionals of the joint tails of X. Coles and Tawn (1994) discussed several benefits of the multivariate approach (the joint probability method) over the univariate approach (the structure variable method). We see the major advantage of the former as the self-consistency of the resulting estimates of any such functionals; this is particularly important here where we illustrate a range of functionals for which no single structure variable approach could have been used.

We first turn to the estimation of the conditional expectation of each component given that NO exceeds a particular level. These estimates reflect both the marginal and the dependence features of the air pollution variables. Fig. 6 shows pseudosamples from the conditional distribution of each variable given that NO exceeds its 0.99 marginal quantile. Table 5 shows estimated expectations for each variable conditional on the NO level exceeding various thresholds. When we condition on NO exceeding its 0.95 quantile, empirical estimates of this functional are sufficiently reliable to be compared with the model-based estimates and these are seen to be consistent. Conditional expectations of each variable increase as we move to higher quantiles of NO, with the exception of winter O3, the only variable to exhibit negative association with large NO values.

Table 5.   Empirical and model-based estimates of conditional expectations of the air pollution variables given values of NO in excess of a range of quantiles of that variable†

Xj     Season   E(Xj),        E{Xj|Xi>xi(0.95)}              E{Xj|Xi>xi(0.99)},
                empirical     Empirical      Model based     model based
O3     Winter   20.0 (0.5)    8.8 (1.4)      10.3 (1.1)      8.3 (1.2)
       Summer   32.0 (0.4)    35.9 (3.0)     34.4 (2.4)      39.6 (4.3)
NO2    Winter   44.2 (0.5)    67.2 (2.5)     65.1 (2.2)      75.4 (4.4)
       Summer   37.6 (0.5)    57.5 (2.6)     54.6 (2.4)      62.2 (4.3)
NO     Winter   135.5 (4.4)   454.0 (13.0)   431.5 (23.2)    569.9 (45.2)
       Summer   55.2 (1.5)    161.2 (7.2)    157.6 (8.2)     213.5 (17.5)
SO2    Winter   21.0 (0.9)    38.4 (3.7)     35.6 (4.0)      44.6 (6.7)
       Summer   17.4 (1.2)    36.6 (11.3)    36.9 (5.4)      48.5 (11.8)
PM10   Winter   48.4 (1.2)    105.8 (5.2)    105.0 (4.7)     132.3 (8.2)
       Summer   41.1 (1.0)    72.9 (5.2)     66.3 (4.5)      83.7 (7.9)

†Standard errors are given in parentheses. Variable Xi is NO throughout.
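Model-based conditional expectations of this kind can be approximated by simulation. The sketch below assumes that, on the Gumbel scale and for Yi = yi above the threshold, the remaining components are generated as a yi + yi^b Z, with the vector Z resampled from the fitted residuals; this form is inferred from the parameters aj|i, bj|i and residuals Z|i used in this paper, and the additional parameters (cj|i, dj|i) for negatively associated variables are ignored. All names are hypothetical, and the back-transformation to the measured scale is omitted.

```python
import numpy as np


def simulate_given_extreme(a, b, residuals, v, n_sim, rng):
    """Simulate (Y_i, Y_-i) on the Gumbel scale conditional on Y_i > v.

    a, b: arrays of length d - 1 holding the fitted location and scale powers.
    residuals: array (n_obs, d - 1) of fitted residual vectors Z.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    if v <= 0:
        raise ValueError("v must be positive so that y_i ** b is well defined")
    # Step 1: Y_i from the standard Gumbel distribution conditional on Y_i > v.
    f_v = np.exp(-np.exp(-v))
    y_i = -np.log(-np.log(rng.uniform(f_v, 1.0, size=n_sim)))
    # Step 2: resample whole residual vectors, independently of Y_i, so that
    # dependence between the components of Z is preserved.
    rows = rng.integers(0, residuals.shape[0], size=n_sim)
    z = residuals[rows]
    # Step 3: rebuild the remaining components.
    y_rest = a * y_i[:, None] + y_i[:, None] ** b * z
    return y_i, y_rest
```

Averaging each column of the simulated values, after transformation back to the original margins, would give a Monte Carlo estimate of the corresponding conditional expectation.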

We now concentrate on the estimation of return levels of linear combinations of variables on the Gumbel marginal scale. This choice of functional is made to emphasize the effect of dependence on extreme combinations. We focus on subvectors of Y of size m = 2,…,d, indexed by ℳ ⊆ {1,…,d} with associated multidimensional sets Cm(v) = {y ∈ ℝm : Σi∈ℳ yi > v} and we report estimated return levels vp as defined in equation (6.1). This choice of set allows an exploration of extremal dependence in parts of the space in which not all the variables are simultaneously extreme. To gain insight about combinations of the pollutants that fall in the set C5(v) for large v, in Fig. 6 we highlight the simulated points with NO exceeding its 0.99 quantile that fall in C5(23) and indicate which of these have the largest values of Σi yi. Simulated points in C5(23) tend not to have particularly large values of O3 but do occur with moderate SO2 values and extreme values of NO2 and PM10. The strong dependence between (NO, NO2, PM10) leads to the largest values in C5(23) occurring when any one of these variables is extreme.

Fig. 7 shows empirical and model-based return level estimates for Cm(v) for ℳ corresponding to (O3, NO2) and (NO2, SO2, PM10). Return levels calculated under independence and perfect dependence are also marked. For the pair (O3, NO2) the C2(11) set is shown after transformation to the original margins in Fig. 1. High levels of O3 and NO2 are associated with summer photochemical smog. Empirical return level estimates show that stronger dependence between these variables during the summer leads to elevated return levels. Model-based return levels agree closely with the empirical values and show that this seasonal difference is statistically significant as the confidence intervals for the return levels are separated. The estimated return levels for the winter (O3, NO2) lie significantly below the independence curve, highlighting the negative dependence between these variables during the winter. Elevated levels of all three components of (NO2, SO2, PM10) are associated with winter urban air pollution episodes and correspondingly we see larger return levels in winter, indicating stronger dependence between these three variables in this season, although this effect is not significant as the confidence intervals overlap. Both Fig. 7(a) and Fig. 7(b) show excellent agreement between the model-based return level estimates and the empirical estimates, illustrating the good fit of our dependence model.

Figure 7.

 Return level estimates for the set Cm(v), for ℳ corresponding to (a) (O3, NO2) and (b) (NO2, SO2, PM10): – – –, point estimates for summer; - - - - - -, point estimates for winter variables; ░, pointwise 95% confidence intervals (which overlap in (b)); ∘, empirical points for summer; ×, empirical points for winter; ——, return levels calculated under perfect dependence (upper) and exact independence (lower)
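The two reference curves in Fig. 7 can be reproduced approximately as follows. The sketch assumes that the return level vp solves Pr(Σi∈ℳ Yi > vp) = p on standard Gumbel margins; equation (6.1) is not reproduced in this section, so this reading is an assumption. Under perfect dependence the sum equals m times a single Gumbel variable, which gives a closed form, whereas under independence the quantile of the sum is approximated here by simulation. Function names are hypothetical.

```python
import numpy as np


def gumbel_quantile(q):
    """Quantile function of the standard Gumbel distribution."""
    return -np.log(-np.log(q))


def return_level_perfect_dependence(m, p):
    # All m components are equal, so the sum exceeds v when one exceeds v / m.
    return m * gumbel_quantile(1.0 - p)


def return_level_independence(m, p, n_sim, rng):
    # Monte Carlo quantile of a sum of m independent standard Gumbel variables.
    sums = rng.gumbel(size=(n_sim, m)).sum(axis=1)
    return float(np.quantile(sums, 1.0 - p))


rng = np.random.default_rng(3)
upper = return_level_perfect_dependence(2, 0.01)
lower = return_level_independence(2, 0.01, 200_000, rng)
```

Return levels estimated for positively dependent variables would be expected to fall between `lower` and `upper`, and estimates below `lower` point to negative dependence, as reported for the winter (O3, NO2) pair.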

8.Theoretical examples

We now derive the limiting conditional characteristics that were identified in Section 3 for a range of theoretical examples including those summarized in Table 1. Where possible, results are given for a d-dimensional random variable Y, and in a few special cases for bivariate Y only.

First we give the precise form of the multivariate extreme value distribution which plays a key role in the examples that are given in this section. A d-dimensional random variable Y with standard Gumbel margins has a multivariate extreme value distribution if its joint distribution function can be expressed as (Pickands, 1981)

$$\Pr(Y < y) = \exp[-V^{(d)}\{\exp(y_1), \ldots, \exp(y_d)\}], \tag{8.1}$$

where V(d), termed the exponent measure, is given by

$$V^{(d)}(y) = d \int_{S_{d-1}} \max_{1 \le i \le d} \left( \frac{w_i}{y_i} \right) \, \mathrm{d}H^{(d)}(w), \tag{8.2}$$

where H(d) is the distribution function of an arbitrary random variable on the (d − 1)-dimensional unit simplex

$$S_{d-1} = \Big\{ (w_1, \ldots, w_{d-1}) : w_j \ge 0, \ \sum_{j=1}^{d-1} w_j \le 1 \Big\},$$

satisfying the marginal moment constraint

$$\int_{S_{d-1}} w_i \, \mathrm{d}H^{(d)}(w) = \frac{1}{d}, \qquad i = 1, \ldots, d, \tag{8.3}$$

and wd = 1−(w1+…+wd−1). We refer to a multivariate extreme distribution as having mass on the boundary if H(d) places mass on the boundary of Sd−1. We denote by mj|i the mass on the boundary of Sd−1 for which wj = 0 and wi > 0 and let Mi = {j : mj|i > 0; j ≠ i} and M̄i = {j : mj|i = 0; j ≠ i}. The density of H(d) on the interior of Sd−1 is denoted by h(d) when it exists. Some parametric examples of V(d) are given now.

8.1.Multivariate exchangeable logistic distribution

Gumbel (1960) introduced the multivariate exchangeable logistic distribution with

$$V^{(d)}(y) = \Big( \sum_{i=1}^{d} y_i^{-1/\alpha} \Big)^{\alpha}$$

for any d ≥ 2 and 0 < α ≤ 1. Independence is given by α=1 and perfect positive dependence in the limit as α → 0. There is no mass on the boundary of Sd−1 for 0<α<1.
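The two limiting cases can be checked numerically. The sketch below assumes the usual logistic exponent measure V(y) = (Σi yi^{−1/α})^α and the representation Pr(Y < y) = exp[−V{exp(y1),…,exp(yd)}] on Gumbel margins; the function names are hypothetical.

```python
import numpy as np


def logistic_exponent_measure(y_frechet, alpha):
    """Exchangeable logistic V evaluated at positive arguments y_frechet."""
    y_frechet = np.asarray(y_frechet, dtype=float)
    return np.sum(y_frechet ** (-1.0 / alpha)) ** alpha


def logistic_cdf_gumbel(y, alpha):
    """Joint distribution function Pr(Y < y) with standard Gumbel margins."""
    y = np.asarray(y, dtype=float)
    return np.exp(-logistic_exponent_measure(np.exp(y), alpha))


y = np.array([0.3, 1.2, -0.5])
independent = np.prod(np.exp(-np.exp(-y)))   # product of Gumbel margins
comonotone = np.exp(-np.exp(-y.min()))       # Gumbel margin at the smallest component
at_alpha_one = logistic_cdf_gumbel(y, 1.0)       # agrees with `independent`
near_alpha_zero = logistic_cdf_gumbel(y, 0.01)   # close to `comonotone`
```

At α = 1 the joint distribution function factorizes into the margins, and as α decreases towards 0 it approaches the distribution function of a single Gumbel variable evaluated at the smallest component, which is the perfectly dependent case.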

8.2.Multivariate asymmetric logistic distribution

The multivariate asymmetric logistic dependence structure given by Tawn (1990) has

$$V^{(d)}(y) = \sum_{K \in S} \Big\{ \sum_{j \in K} \Big( \frac{\theta_{j,K}}{y_j} \Big)^{1/\alpha_K} \Big\}^{\alpha_K},$$

where K is an index variable over the power set S of {1,…,d}, 0 < αK ≤ 1 for all K ∈ S and 0 ≤ θj,K ≤ 1 for j=1,…,d. Further conditions on θj,K are that θj,K = 0 if j ∉ K or if Πk∈K θk,K = 0 and that ΣK∈S θj,K = 1 for all j. Similarly, for identifiability, αK = 1 when |K| = 1. If, for any K, there exists j ∈ K with θj,K > 0 then there is positive association between the elements of YK, where YK is the subvector of Y made up of the variables that are indexed by the elements of set K. For this example mj|i = ΣK∈S(j)∖S(i) θj,K where S(i) denotes the subclass of S, all of whose members contain i.

8.3.Bivariate discrete measure

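Ledford and Tawn (1998) defined the distribution for which H(2), satisfying constraint (8.3), places m atoms of mass λ1,…,λm at points w1,…,wm on the interior of S1:

$$H^{(2)}(w) = \sum_{k=1}^{m} \lambda_k \, I(w_k \le w),$$

where I is the indicator function. For such H(2),

$$V^{(2)}(y_1, y_2) = 2 \Big\{ \frac{1}{y_1} \sum_{k : w_k > w^*} \lambda_k w_k + \frac{1}{y_2} \sum_{k : w_k \le w^*} \lambda_k (1 - w_k) \Big\},$$

where w*=y1/(y1+y2). There is no mass on the boundary of S1.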

8.4.Multivariate extreme value distribution

For Pr(Y<y) given by equation (8.1), the conditional distribution function of Y−i|Yi=yi is


where inline image is the derivative of V(d)(y) with respect to yi. If H(d) places any mass on the subset of Sd−1 for which wiwj > 0 then ηij = 1; otherwise ηij = 1/2. If mj|i = 0 there is a unique normalization but if 0 < mj|i < 1 there are two normalizations that give non-degenerate Gj|i(zj|i). Let Zj|i = Yj − aj|i yi; then there are non-degenerate limits for aj|i = 0 if mj|i > 0 and aj|i = 1 irrespective of mj|i, with the combined limit being


for −∞<zj|i<∞ and where z(0,i)=(z1|i,…,zi−1|i,0,zi+1|i,…,zd|i). Thus Gj|i has mass 1−mj|i at zj|i = ∞ if aj|i=0 or mass mj|i at zj|i = −∞ if aj|i = 1. This may appear to contradict the uniqueness properties, up to type, of the normalization and limit law that was discussed in Section 3.2. However, only one of these non-degenerate limits has no mass at ∞, so we are interested in only one of these limits. When mj|i=0 for all j ≠ i, the limiting joint conditional distribution of Z|i is


We illustrate these limit properties with the three examples above. For the exchangeable logistic distribution, normalization Z|i = Y−i − yi gives


For the asymmetric logistic distribution, mj|i is given above and


For the bivariate discrete measure dependence structure, setting Z|i = Y−i − yi yields


where inline image when i=1 and inline image when i=2.

8.5.Inverted multivariate extreme value distribution

Ledford and Tawn (1997) examined the inverted bivariate extreme value distribution, which we extend here to the multivariate case. For V(d) defined in equation (8.2), the survivor function of this multivariate distribution is given by


For this distribution inline image. Assuming that the Yj grow with Yi, then, as yi→∞,


Further simplification is not possible without more information about the shape of H(d) around wi = 0. We first consider the bivariate case where all the mass of H(2) is in the interior of S1 and the measure density satisfies inline image as w1 → 0 and w2 → 1 for 0 < si and −1 < ti for i=1,2. The transformation inline image where bj|i=(ti+1)/(ti+2) gives the following limiting survivor function of variable Zj|i:


Thus the limiting distribution of Zj|i is Weibull.

We now consider the logistic examples. When V(d) is of exchangeable logistic form, ηij = 2^{−α}. Using normalization inline image, where bj|i = 1−α for all j ≠ i, gives the limiting survivor function of the variable Z|i:


hence the Z|i are asymptotically conditionally independent Weibull variables. Although not in the multivariate extreme class of distributions, the inverted multivariate Crowder distribution (Crowder, 1989) has the same ηij as the inverted multivariate extreme value distribution with exchangeable logistic dependence structure and the same values of a|i and b|i with inline image as in equation (8.6).

When V(d) is of asymmetric logistic form, inline image. Let inline image; then inline image gives non-degenerate Gj|i(zj|i) where α(ij) = max(αK : K ∈ S(i) ∩ S(j)). Let K(ij) be the set {K : K ∈ S(i) ∩ S(j) and αK = α(ij)}. Under this normalization, the joint survivor function for the Z|i is


where Aij = ΣK∈K(ij) θi,K (θj,K/θi,K)^{1/α(ij)}. Thus the variables in set M̄i are asymptotically conditionally independent whereas the variables in Mi are not. Variables in set Mi are asymptotically conditionally independent of those in set M̄i.

8.6.Multivariate normal distribution

Let V be a d-dimensional random variable, distributed as a standard multivariate normal random variable, with correlation matrix Σ. Let Y represent V after transformation to Gumbel marginal distributions, via marginal transformations:

$$Y_j = -\log[-\log\{\Phi(V_j)\}], \qquad j = 1, \ldots, d, \tag{8.7}$$

where Φ is the standard normal distribution function. The pairwise coefficient of tail dependence for this distribution is ηij=(1+ρij)/2. We use Mills' ratio to approximate transformation (8.7) for large positive (or negative) components v and y of the vectors v and y to give




The normalization that is used to give a non-degenerate limit for Z|i is


To determine the limiting distribution of Z|i we use the property that the event Z|i ≤ z|i | Yi = yi for large yi can be approximated by the event V−i ≤ v−i | Vi = vi for large vi, where vi = Φ−1[exp{−exp(−yi)}] and v−i has elements vj|i which, using expressions (8.8) and (8.9), are found to satisfy


The conditional distribution of V−i|Vi=vi is (d−1)-dimensional multivariate normal with mean vector ρi vi and covariance matrix Σ−i − ρi ρiT, where ρi is the ith column of Σ with ith element omitted and Σ−i is Σ with ith row and ith column omitted. Hence it follows that the Z|i are jointly (d−1)-dimensional multivariate normal with mean 0 and covariance matrix inline image, where S is the diagonal matrix with diagonal √2|ρi|sgn(ρi).
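A small simulation can make the transformation and normalization concrete. The sketch below assumes the bivariate case with ρ > 0 and takes the normalization to be Zj|i = (Yj − ρ²Yi)/Yi^{1/2}; the display giving the normalization is not reproduced in this section, so these constants are an assumption. The function names are hypothetical. `scipy.stats.norm.logcdf` is used because evaluating log Φ(v) directly loses accuracy in the upper tail.

```python
import numpy as np
from scipy.stats import norm


def normal_to_gumbel(v):
    """Y = -log[-log{Phi(V)}], computed through log Phi for tail accuracy."""
    return -np.log(-norm.logcdf(v))


def tail_dependence_coefficient(rho):
    """Pairwise coefficient of tail dependence, eta = (1 + rho) / 2."""
    return (1.0 + rho) / 2.0


def normalized_residuals(y_i, y_j, rho, threshold):
    """Assumed normalization (Y_j - rho^2 Y_i) / sqrt(Y_i) for Y_i > threshold > 0."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    keep = y_i > threshold
    return (y_j[keep] - rho ** 2 * y_i[keep]) / np.sqrt(y_i[keep])


rng = np.random.default_rng(1)
rho = 0.5
v_i = rng.standard_normal(100_000)
v_j = rho * v_i + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(100_000)
y_i, y_j = normal_to_gumbel(v_i), normal_to_gumbel(v_j)
z = normalized_residuals(y_i, y_j, rho, threshold=np.quantile(y_i, 0.99))
```

Convergence to the limiting distribution is slow in this example, so the empirical moments of `z` at any practical threshold should not be expected to match the limit closely.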

8.7.Multivariate Morgenstern distribution

The bivariate Morgenstern distribution is stated in Joe (1997), page 149. A multivariate extension of this distribution is given by


for −1 ≤ α ≤ 1. Independence is given by α=0. Negative and positive dependence are respectively given by α<0 and α>0. Perfect positive or negative dependence is not attainable under this model. For this distribution, inline image. Taking Z|i = Y−i gives


so Z|i is distributed as a (d−1)-dimensional Morgenstern random variable with the sign of parameter α reversed. For positively and negatively dependent Y, the Z|i are respectively negatively and positively dependent. For d ≥ 3 the marginal distributions of the Z|i are Gumbel and all margins of dimension less than d−1 are mutually independent. In contrast, for d=2, Gj|i is a mixture of Gumbel distributions.


We are grateful to the Engineering and Physical Sciences Research Council for funding, and Stuart Coles, Anthony Ledford, Gareth Roberts and the referees for very helpful contributions. The air pollution data set originates from the National Air Quality Archive http://www.airquality.co.uk. This and the measurement programmes are funded by the Department for the Environment, Food and Rural Affairs, the Scottish Executive, the Welsh Assembly Government and the Department of the Environment for Northern Ireland.

Appendix A: Proof of theorem 1

Non-degeneracy of each marginal distribution of the limiting conditional distribution (3.1) requires that


where Gj|i is the jth marginal distribution of G|i. Putting zj|i=0 in equation (A.1) gives the required condition for aj|i(yi). The limit relationship (A.1) holds for all zj|i, because Y has an absolutely continuous density, so the limit relationship continues to hold when differentiated with respect to zj|i. Dividing the resulting limit relationship by 1−Fj|i gives




Putting zj|i=0 in equation (A.2) we see that up to proportionality


which gives the required result up to type.


Discussion on the paper by Heffernan and Tawn

Richard L. Smith (University of North Carolina, Chapel Hill)

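The original formulation of multivariate extreme value theory was

$$\lim_{n \to \infty} \Pr\Big\{ \frac{\max_{1 \le i \le n} X_{ij} - b_{nj}}{a_{nj}} \le x_j, \ j = 1, \ldots, d \Big\} = G(x_1, \ldots, x_d), \tag{1}$$

where (Xi1,…,Xid), i=1,2,…, are independent identically distributed d-dimensional random vectors, an1,…,and and bn1,…,bnd are normalizing constants and G is a non-degenerate d-dimensional distribution function.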

I do not know who first proposed this definition, but it emerged in several papers in the 1950s, during what might be called the golden age of asymptotic distributions in probability. As such, it seems to have been motivated more by considerations of mathematical elegance than by messy practical problems such as controlling air pollution. In retrospect, it seems surprising that this original formulation survived so long.

In due course, it was recognized that equation (1) is equivalent to multivariate regular variation (Resnick, 1987). One definition is that, after transforming the marginal distributions to be asymptotically of Fréchet form, for measurable sets A we have


where b is a regularly varying function of some index α>0 (there is no loss of generality by assuming that α=1) and ν is a measure on the cone




for any scalar t>0 (Fig. 8).

Figure 8.

 Representation of a bivariate distribution with regularly varying tails

Despite the apparent mathematical abstraction of this approach, it proved of considerable practical value for statistics; for example, specific cases of ν led to numerous parametric families for multivariate extreme value distributions, and the regular variation interpretation also suggested a natural way to develop threshold methods for multivariate extremes (Coles and Tawn, 1991, 1994; Joe et al., 1992).

Nevertheless, by the mid-1990s it was clear that there were severe deficiencies in this approach. In many cases, the limiting measure ν is concentrated on the axes—the asymptotically independent case—and direct application of the asymptotic theory simply leads to a representation which is neither interesting nor of practical value.

Ledford and Tawn (1996, 1997) were the first to propose a radically new direction. For bivariate (X1,X2), and assuming that α=1 in the above representation, they postulated


and later


where ℒ are slowly varying functions and c1,c2 and η are new indices.
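To give a sense of what the index η measures, the sketch below assumes the first of these representations has the form Pr(X1 > x, X2 > x) ~ ℒ(x) x^{−1/η} on unit Fréchet margins, so that η is the tail index of min(X1, X2). It then applies the Hill estimator to the componentwise minimum, which is one common estimator and is not a method described in this contribution. The function name and the choice of k are illustrative assumptions.

```python
import numpy as np


def hill_eta(x1, x2, k):
    """Hill estimate of eta from the k largest values of min(x1, x2)."""
    t = np.sort(np.minimum(x1, x2))
    if not 0 < k < t.size:
        raise ValueError("k must lie between 1 and the sample size minus 1")
    threshold = t[-k - 1]                       # (k + 1)th largest value
    estimate = np.mean(np.log(t[-k:] / threshold))
    return min(1.0, float(estimate))            # eta cannot exceed 1


rng = np.random.default_rng(2)
n = 100_000
x1 = -1.0 / np.log(rng.uniform(size=n))         # unit Frechet margins
x2 = -1.0 / np.log(rng.uniform(size=n))
eta_independent = hill_eta(x1, x2, k=1000)      # theory: eta = 1/2
eta_dependent = hill_eta(x1, x1, k=1000)        # theory: eta = 1
```

Independent components give η = 1/2 and perfectly dependent components give η = 1; intermediate values correspond to asymptotically independent variables with residual positive association.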

Resnick (2002) and Maulik and Resnick (2003) have shown that these results follow from a property which they called hidden regular variation, which extends equation (2) by allowing for different rates of convergence on different subcones of ℰ.

The present paper takes the theory in an entirely different direction. By considering conditional limit theorems for d−1 components, as the dth tends to its upper end point, the authors obtain a more general theory encompassing previous versions of multivariate extreme value theory, but also covering cases in which some of the components do not become extreme at all. This is both an innovative theoretical development and potentially a very useful piece of practical statistical methodology; the authors are to be congratulated for their innovation.

Nevertheless, you would not expect me to make a discussion contribution to a Royal Statistical Society meeting without raising some queries about what the authors are doing. I would like to raise four points.

First, just how general are these representations? I am not thinking so much about theoretical conditions for the existence of the various limits as whether they actually capture the kind of behaviour that we might want to model. Consider the system


where Z1 and Z2 are independent with the Gumbel distribution Pr(Zj ≤ z) = exp{−exp(−z)}. Then Y1 and Y2 have the same marginal distribution. If we condition on Y2=y2→∞, there are two possibilities for Y1: either Y1=Y2 with probability inline image, or Y1 has a limiting distribution that is independent of Y2 and not extreme. So we have a conditional distribution that is a mixture of two distributions on completely different scales. The example of Section 6.1 (predicting Y1 from Y2) actually illustrates the same phenomenon, without the degeneracy of this example, but the best explanation that the authors can give is that the limiting conditional distribution, after renormalization, has an atom at −∞. I do not find this very satisfactory; are there not instances when we want to characterize the non-extreme component of the mixture, to a greater level of detail than this?

Second, the normalizing constants a|i(yi) and b|i(yi) are given simple parametric representations that seem to be more ‘proof by example’ than a general result. Could we not model these location and scale parameters as smooth nonparametric functions of yi, avoiding such ad hoc assumptions?

Third, there is the method of estimation itself. The authors use maximum likelihood for estimating the generalized Pareto margins, Gaussian estimation for the conditional means and standard deviations and pseudolikelihood estimation for combining the various conditional distributions into a multivariate family, a veritable witches’ soup of estimation methods, all nicely stirred up with the bootstrap as seemingly the only means of keeping control of all the estimation errors. Although I applaud the authors’ eclecticism, would it not be better to have a more coherent estimation strategy?

Finally, I could not let the paper pass without some comment on the example. To be frank, I am puzzled about the point of this analysis; what is the purpose of computing the exact functionals that they do? One difficulty is that we really do not know about health effects in the level of detail that we would like. I could contrast this situation with some of Jonathan Tawn's earlier work on variables like sea-level surges and wind speeds. For any given combination of these variables, there are mechanical models that would determine whether an oil platform is toppled. There is not the same level of expertise that would determine, for example, whether a specific combination of O3 and NO2 might kill a sick patient, where either pollutant on its own would not. However, there is another possible application of the kinds of calculation that the authors make. We may not have precise mechanical models for health effects, but we do for the physics and chemistry of the pollutants themselves. Statistical calculations of the kind performed by these authors, together with physical and chemical models, could be very valuable in determining whether specific combinations of precursor variables are likely to result in a violation of air pollution standards. Nevertheless, one would like to see the calculation carried through to its ultimate conclusion, to determine the ultimate merits of the approach.

As with any good paper read to the Society, this one contains plenty of thought-provoking material, both on the theoretical foundations of the method and on numerous aspects of its practical application. There is much work to be done, but plenty has already been done by Janet Heffernan and Jonathan Tawn. It gives me great pleasure to propose a vote of thanks.

Stuart Coles (Università di Padova)

This paper is a remarkable achievement, providing, as it does, a framework for the modelling of extremes in problems that are genuinely multidimensional. One might say that in more than one sense, quite literally, the paper takes multivariate extremes to places they have never been before.

Interestingly, the timing of the reading of the paper coincides with a series of ‘magic’ events across the UK—Derren Brown's ‘Russian roulette’; David Blaine's ‘Above the below’—and on first reading it is easy to think that there is some comparable sleight-of-hand going on here. Multivariate extreme value theory hinges on multivariate regular variation, which leads to limit representations that can be interpreted as models only for the joint upper tail. But, ‘abracadabra’, here we have models that are apparently valid in regions where just one or more variables is large. This is magic indeed, and very powerful magic also, as it enables a much more detailed scrutiny of the extremes of multivariate data than has previously been possible. In particular, for data analyses of any reasonable dimension, it is most likely that there will be no vector observations that are large in all components, rendering virtually useless all standard approaches to the multivariate analysis of extremes.

But, of course, all magic is illusion, and behind the apparent supernatural lies true science. The science here is the substitution of the multivariate regular variation framework with limit assumption (3.1). And it is this assumption that opens up the box of tricks. There is some risk here though. The strongest defence for the use of extreme value models is the robustness of their asymptotic formulation—here we are shown that a variety of models do satisfy limit (3.1), but this is far from the breadth of characterization that regular variation provides. It is likely to take a stronger theoretical case, and further convincing empirical application, before the method becomes widely adopted by a scientific community that is rightly wary of the applicability of extreme value models as a panacea for all problems extreme. That said, for those of us who have undertaken many analyses of multivariate extreme data, the model encapsulates empirical experience that, although it is important to distinguish between asymptotic independence and asymptotic dependence (controlled here by the parametric part of the dependence model), other aspects of dependence (absorbed here by the nonparametric components) require less precise specification.

Aside from questions about characterization, it would be easy to be picky about other aspects of the methodology proposed. Model components are not necessarily mutually consistent; likelihoods are at best approximate; and the whole modelling procedure is subject to several layers of approximation. In defence, the authors claim that their approach offers a simpler interpretation of multivariate extreme value modelling, reducing it to something that can be thought of as a form of regression model. Surrounded by such a complicated inference procedure though, it is difficult to imagine a wider scientific audience taking such a sympathetic view. The results are impressive though: the simulation study is convincing, and the air pollution example revealing, each suggesting that the new procedure is robust to apparent limitations in precise model specification and inference.

There remain, however, some important theoretical issues (which may indeed have subsequent knock-on effects when considering the suitability of the procedure for practical application). The conditional limit G is left unspecified—different parent families giving different limits—but are all limit distributions possible? And do some subsets of normalization choices impose restrictions on the class of possible limits G that can arise? Furthermore, as the authors explain, there is a discontinuity across models—both in normalization and limit—as different families approach independence. Are there practical implications of this observation for modelling data that are close to independent? One can also think of a discontinuity in dependence class in moving from asymptotic dependence to asymptotic independence. The original models of Ledford and Tawn (1996, 1997) were explicit about this. Similarly, Coles and Pauli (2002) made explicit the separation across asymptotic dependence type by giving prior weight to class type in a formal Bayesian analysis. In the present paper the classification of dependence type is determined by the normalization coefficients a and b. Is the semiparametric method that estimates these parameters appropriate when the entire family of asymptotically dependent distributions sits in such a small part of the (a, b) space?

In conclusion, the paper leaves open several questions but lays down a convincing framework for making multivariate extreme value models both workable and sensible for problems of moderate to high dimensionality. It therefore gives me great pleasure to second the vote of thanks to the authors.

The vote of thanks was passed by acclamation.

Christopher Ferro (University of Reading)

I shall mention two possible modifications to the authors’ estimation procedure and describe two possible point process models.

First, when estimating the normalizing functions a|i(·) and b|i(·), it may be beneficial to replace the Gaussian model for G|i with a kernel density estimator. This would avoid the nuisance parameters λ|i, account for dependence in Z|i and yield a smoothed estimate of G|i. The option of modifying the mass on each Z|i might even prove useful for ensuring self-consistency. Second, estimating cj|i and dj|i only if inline image and inline image could bias the choice of parametric form for the normalizing functions. Perhaps some informal comparison of the maximum likelihood under each of the two forms could be made. Alternatively, Bayesian estimation with prior mass on each form could account naturally for this structural uncertainty.
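The first suggestion can be illustrated with a minimal sketch, assuming that the fitted residuals Z|i for a single component are available as a plain list of numbers. The function names and the normal reference bandwidth are illustrative choices rather than anything specified in this contribution, and the sketch treats one margin only, so it does not address dependence between the components of Z|i.

```python
import math
import statistics


def normal_reference_bandwidth(z):
    """Normal reference rule 1.06 * sd * n^(-1/5); an assumed default."""
    return 1.06 * statistics.stdev(z) * len(z) ** (-1 / 5)


def gaussian_kde_pdf(x, z, h):
    """Gaussian kernel density estimate at x from residuals z, bandwidth h."""
    norm = 1.0 / (len(z) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - zi) / h) ** 2) for zi in z)


def gaussian_kde_cdf(x, z, h):
    """Smoothed distribution function: the average of Gaussian kernel cdfs."""
    total = sum(0.5 * (1.0 + math.erf((x - zi) / (h * math.sqrt(2.0))))
                for zi in z)
    return total / len(z)
```

A smoothed estimate of this kind could replace the empirical distribution of the residuals when simulating from the fitted model, although the choice of bandwidth would then need the same scrutiny as the choice of threshold.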

Point process models can provide elegant characterizations of the extremal behaviour of stochastic processes. For example, de Haan (1985) gave conditions for which the point process of suitably normalized points (Yk − an)/bn, 1 ≤ k ≤ n, has a non-homogeneous Poisson limit. With regard to the conditional model, it would be interesting to characterize possible limits for the process of points


In the presence of serial dependence, extremes can group together in clusters. For example, univariate exceedances of a high threshold occur according to a compound Poisson process in the limit under appropriate conditions (Hsing et al., 1988); multivariate extensions are described by Nandagopalan (1994). Marked point processes of the form


which record marks φn(Ykun) at points k/n on (0,1] for which Yk exceeds threshold un are considered by Leadbetter (1995) and Rootzén et al. (1998). With regard to the conditional model, we might expect that any limit of a point process


would be compound Poisson with a mark distribution that depends on G|i. Do the authors believe that such point process models will prove useful?

Adam Butler (Lancaster University)

My contribution will focus on the theoretical extension of the conditional approach of the authors to the situation in which X is a first-order Markov chain. This extension gives further insight into the role of the normalizing constants in quantifying extremal dependence and also suggests ways in which the statistical model proposed in the paper may be made more parsimonious if we are willing to assume that X is Markovian.

Let X be a first-order Markov chain with standard Gumbel marginal distributions. We assume that for each k ∈ {2,…,d} there are constants ak|(k−1) ∈ [0,1] and bk|(k−1) ∈ [0,1) which normalize (Xk|Xk−1) in the sense that


for all zk|(k−1) ∈ Uk|(k−1), where gk|(k−1) is the density of a non-degenerate random variable with support Uk|(k−1). We assume further that extremal dependence is positive, so that ak|(k−1)+bk|(k−1)>0; the case ak|(k−1)=bk|(k−1)=0 is actually trivial.

What is the extremal dependence structure within such a Markov chain? Suppose that for each i ∈ {2,…,d} we can find constants ai|1 ∈ [0,1] and bi|1 ∈ [0,1) and a function m(i−1,i)|1: U(i−1)|1 × Ui|1 → Ui|(i−1) such that


and such that


and inline image are continuously differentiable and bijective functions with non-zero Jacobian. Then by positive extremal dependence and the Markov property it follows that


so that a|1 and b|1 normalize (X−1|X1) and Z|1 is a Markov chain.

The quantity ei|1 may be seen as the ‘error’ that is made in choosing normalizing constants for (Xi|X1), and Mi|1 transforms (Z2|1,…,Zi|1) into independent random variables (Z2|1,…,Zi|(i−1)) via the Jacobi transformation formula.

We may use this theory to find normalizing constants for a Markov chain. For example, assume that ak|(k−1)=1 and bk|(k−1)=0 for all k ∈ {2,…,d} (so that Xk is asymptotically dependent on Xk−1); then for each i ∈ {3,…,d} we have


which may be set to 0 (and therefore is trivially of the correct order) by taking ai|1=1, bi|1=0 and m(i−1,i)|1 = zi|1 − z(i−1)|1.
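The random-walk structure in this case can be seen from a short telescoping argument. The sketch below assumes the normalization of the paper, in which the residual of Xj given X1 = x1 is (Xj − a x1)/x1^b, so that with a = 1 and b = 0 each residual is a simple difference; it is offered as an illustration rather than as part of the original argument.

```latex
\begin{align*}
Z_{k|(k-1)} &= X_k - X_{k-1}, \qquad k = 2,\dots,d,\\
Z_{i|1} &= X_i - X_1 = \sum_{k=2}^{i} \left( X_k - X_{k-1} \right)
         = \sum_{k=2}^{i} Z_{k|(k-1)},\\
Z_{i|1} - Z_{(i-1)|1} &= Z_{i|(i-1)}.
\end{align*}
```

Under this assumption the increments of Z|1 are exactly the pairwise residuals, which is consistent with the choice of m(i−1,i)|1 given above.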

Normalizing constants and conditional limit distributions are given in Table 6 for a range of scenarios. If the ak|(k−1)-values are all 0 then bi|1 has a simple product structure and Z|1 is a log-random-walk. Conversely if the ak|(k−1)-values are all positive, and the bk|(k−1)-values are constant or increasing in k, then ai|1 has a simple product structure and Z|1 is a random walk. If the bk|(k−1)-values are decreasing in k then e(i−1),i|1 cannot be made asymptotically sufficiently small for any choices of ai|1 and bi|1, so the theory that is presented here cannot be applied; this does not necessarily imply that normalizing constants do not exist, however, since the conditions for existence outlined here are sufficient but not necessary.

Table 6. Normalizing constants and conditional limit distributions for a Markov chain Y with known pairwise structure (Yk−1, Yk)

Pairwise structure of (Yk|Yk−1), k ∈ {2,…,i}                  Pairwise structure of (Yi|Y1), i ∈ {3,…,d}    Dependence structure of Z|1
ak|(k−1)    bk|(k−1)       For example (Yk−1, Yk)             ai|1            bi|1
0           Positive       Inverted multivariate extreme      0               ∏ bk|(k−1)                     Log-random-walk
Positive    0              Multivariate extreme               ∏ ak|(k−1)      0                              Random walk
Positive    Constant, b    Normal                             ∏ ak|(k−1)      b                              Random walk
Positive    Increasing     -                                  ∏ ak|(k−1)      bi|(i−1)                       Independent
Positive    Decreasing     -                                  e(i−1,i)|1(y1) cannot be made suitably small

Clive Anderson (University of Sheffield)

I add my congratulations to the authors for the important and practical fresh thinking that this paper represents.

I have two specific comments.

A major motivation for the paper is extrapolation. Any extrapolation relies on something beyond the data, usually an assumption about properties of an underlying model. In standard statistical extreme value theory the assumption has been that there is some underlying regularity in the tails of the relevant distributions connecting properties at the edge of the data with properties beyond the data. A precise description of this regularity is given by the theory of regular variation and its extensions. However, though the mathematical theory gives a precise description, it cannot guarantee that the regularity will hold in applications; there is no law of nature that says that distributions of physical or chemical variables must always have upper tails that vary in these precise ways. The use of traditional statistical extremes methods therefore requires an act of faith on the part of the practitioner: that in the application to hand there is indeed enough smoothness in the operation of natural laws to justify the regularity assumption. With existing methods the price that we must pay for extrapolation is thus made clear through regular variation theory. To allow users to recognize the implicit assumptions that the new methodology demands, it seems important similarly to characterize the assumptions about the distributions of variables that are needed here. What class of multivariate distributions are the normalizing functions of family (3.8) appropriate for? Is there a simple regularity or invariance property characterizing existence and assumed properties of the G|i?

My second point also relates to connections between the new methodology and knowledge in the field of application. Often in applications of statistical extremes much is known about processes and relationships between variables, even at extreme levels. The chemistry of air pollution is an example. This knowledge may be uncertain but is nevertheless much more than ignorance. It seems highly desirable that statistical extreme value theory should develop a methodology that can incorporate such scientific knowledge. The authors’ new approach is exciting in offering a possible basis for this development, either through the imposition on the new structure of patterns of dependence derived from the science, or through the incorporation of scientific knowledge into the new approach by Bayesian methods. Success in such development would be very valuable, scientifically as well as statistically.

S. K. Sahu (University of Southampton) and K. V. Mardia (University of Leeds)

This impressive paper presents methodology for studying dependence structure in multivariate extreme values. The authors must be commended because of their efforts in modelling the levels of different pollutants in a multivariate set-up. By doing so they can shed light on the complex interrelationships between the most common pollutants. Our comments mostly relate to the air pollution example which illustrates the methodology.

In modelling air pollution, we believe, it is also important to account for spatiotemporal variation, though this is not the objective in this paper. The paper is not clear whether the multivariate response was observed in one location or in several locations in the city of Leeds. There is usually a very strong regional effect in air pollution. Therefore, any analysis of data from several sites should take account of the spatial association.

The paper, however, does take care of temporal association by including a dichotomous (summer and winter) variable. In practical situations this may not be sufficient since it is well known that there is a long-term decreasing trend in air pollution. For example, the Environmental Protection Agency claim that between 1992 and 2001 the average PM10 concentrations decreased 14%; see Environmental Protection Agency (2002). Moreover, to contrast the aggregate effects in summer and winter months the authors removed the data around November 5th (fireworks night in the UK). This is not desirable in a predictive situation since the predictions for a future year around November 5th are likely to be underestimated. In analysing similar spatiotemporal data Sahu and Mardia (2005) showed that better predictions can be obtained when the high values of air pollution (PM2.5) (around July 4th, Independence Day in the USA) are included in the analysis.

Often, the relationship between the air pollutants is strongly affected by other factors, e.g. wind speed and cloud cover. Different levels of these factors induce different chemical reactions and relationships between the pollutants. That is why a study using the marginal distribution of the chosen air pollutants may not reveal the full association structure.

Of course the semiparametric models that are used to analyse the multivariate data cannot predict the air pollution levels in space and time. The models must be made spatiotemporal to make predictions in space and time. The literature in this area is growing; Sahu and Mardia (2005) modelled the PM2.5 data for the city of New York; Smith et al. (2003) presented models for PM2.5 data for three particular southern states in the USA. However, none of the models adopted in those references are multivariate. It will be interesting to extend the multivariate models of the present paper to allow for space–time predictions.

Debbie J. Dupuis (University of Western Ontario, London)

The authors are to be congratulated on having produced very original work. The methodological developments are driven by scientific questions and possible solutions are reached through a creative and unconventional approach.

Of the many issues in the paper, I would like to focus on the practical implementation of the new methodology. Firstly, although imposing self-consistency of the θ-parameters may substantially reduce the performance, can inconsistencies resulting from an unconstrained parameterization be used as model diagnostics? Or, are the other uncertainties in estimation overwhelming?

Secondly, the authors assess the approximate independence of Z|i and Yi visually through Fig. 4. Is this sufficient? What are the consequences of proceeding with some degree of dependence?

Also, the authors’ outliers, data points with unusually large marginal or functional values, did not have an effect. Could points outlying in the dependence structure, i.e. not in agreement with the dependence that is suggested by the bulk of the points, have an effect?

Finally, in their air quality monitoring application, the authors treat the joint distribution of the pollutants as stationary in each period. In the Great Lakes region of Canada and the USA, this is not the case. Consider the daily maximum ozone levels that were recorded at a surface monitoring station near Hamilton, Canada. High ozone levels are plotted as a function of day of the year in Fig. 9.

Figure 9. Daily maximum ozone concentrations in exceedance of 73 ppb as a function of day of the year: the data are for April 1st–September 30th for years 1985–1994

Fitting the generalized Pareto distribution with a quadratic trend in the log-scale parameter, β(t) = exp(b0 + b1t + b2t²), where t=(i−91)/183, i=91,…,183, is the scaled day-of-the-year covariate, leads to an increase of 10 in the maximized log-likelihood over the value that is obtained under the time homogeneous model. Estimates under both models are listed in Table 7. Given the larger negative shape estimate in this example, it is only for larger quantiles that we see the reduced standard errors resulting from the more accurate scale estimate under the time-dependent model.
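The time-dependent fit can be sketched as follows, assuming that the threshold excesses and their scaled times t are held in two lists of equal length. The function names are hypothetical, the optimizer is left to the reader, and the χ² comparison assumes the usual regularity conditions and independent exceedances, which this contribution does not state.

```python
import math


def log_scale(t, b0, b1, b2):
    """Quadratic trend in the log-scale: log beta(t) = b0 + b1*t + b2*t^2."""
    return b0 + b1 * t + b2 * t * t


def gpd_trend_nll(params, excesses, times):
    """Negative log-likelihood of generalized Pareto excesses with scale beta(t)."""
    xi, b0, b1, b2 = params
    nll = 0.0
    for y, t in zip(excesses, times):
        log_beta = log_scale(t, b0, b1, b2)
        beta = math.exp(log_beta)
        if abs(xi) < 1e-8:
            # Exponential limit of the generalized Pareto as xi tends to 0.
            nll += log_beta + y / beta
            continue
        arg = 1.0 + xi * y / beta
        if arg <= 0.0:
            # Observation outside the support for these parameter values.
            return math.inf
        nll += log_beta + (1.0 / xi + 1.0) * math.log(arg)
    return nll


def lr_pvalue_2df(loglik_gain):
    """P-value of the statistic 2*gain against chi-squared on 2 df.

    The chi-squared(2) survival function is exp(-x/2), so the p-value
    is exp(-gain).
    """
    return math.exp(-loglik_gain)
```

Setting b1 = b2 = 0 recovers the time homogeneous model. Under the stated assumptions, a gain of 10 in the maximized log-likelihood for the two extra parameters b1 and b2 corresponds to `lr_pvalue_2df(10)`, roughly 4.5e-5.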

Table 7. Estimated shape and quantiles based on the generalized Pareto distribution fitted to the marginal distribution of ozone data in Fig. 9†

Model             Shape estimate    Quantile estimates for the following values of p:
Stationary        −0.21 (0.07)      110 (2)    123 (4)    135 (7)    142 (11)
Time dependent    −0.32 (0.07)      116 (4)    127 (5)    136 (6)    140 (6)

†Threshold at 73 ppb, inline image, n73=171. Estimated standard errors are given in parentheses. Time-dependent values are for the estimated peak of the season.

Fig. 10 shows that crossing rates are also time dependent so it is not straightforward to estimate return levels. Results are presented for a threshold of 73 ppb; however, they are similar over a range of appropriate thresholds. We could also have proceeded with a time-dependent threshold, but that presents its own set of challenges. The non-stationarity plaguing this marginal analysis remains in any joint analysis. Is it possible to deal with such non-stationarity under the new approach? Can the authors comment on any resulting complications and the consequences of ignoring the time dependence?

Figure 10. Crossing rates of a threshold of 73 ppb of the data in Fig. 9

Holger Drees (University of Hamburg)

In recent years nonparametric estimators of the probability of multivariate extreme events (‘failure regions’) have been constructed. The classical extreme value approach can deal only with asymptotically dependent componentwise maxima or with failure regions such that none of the components of their complements is extreme (see de Haan and Sinha (1999)). An alternative methodology, first proposed by Ledford and Tawn (1996, 1997, 1998) (see also Draisma et al. (2004)), can be used directly only for failure regions where all components are jointly extreme. By a combination of these multivariate and standard univariate extreme value procedures, the failure probability can also be estimated in some other situations. For example, in contrast with what is claimed in Section 6.2.2, for given p and q we may first estimate r such that Pr(Y1>r)=p/q and then use Ledford and Tawn's approach to determine v such that Pr(Y1>r, Y2 ≥ v)=p(1−q)/q and hence Pr(Y1>r, Y2<v)=p.
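The arithmetic behind this construction is a one-line decomposition of the event {Y1 > r}. It is written out here only as a check, using the symbols p, q, r and v of the paragraph above and assuming p ≤ q so that p/q is a valid probability.

```latex
\begin{align*}
\Pr(Y_1 > r,\, Y_2 < v)
  &= \Pr(Y_1 > r) - \Pr(Y_1 > r,\, Y_2 \geq v) \\
  &= \frac{p}{q} - \frac{p(1-q)}{q} = p .
\end{align*}
```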

However, none of the methodologies that are so far available allows us to estimate the probability of more complex failure regions in the case of asymptotic independence. Therefore new approaches, like that proposed by Heffernan and Tawn, are welcome. In a simulation study and a practical application Heffernan and Tawn demonstrate that apparently their procedure yields reasonable estimates of the probability of quite general extreme events that cannot be analysed by methods suggested previously.

However, a deeper understanding of the theoretical properties of their method of estimation is needed before its usefulness can be assessed. A particularly serious drawback of their approach is the dependence of the underlying model described in Section 4.1 on the completely arbitrary choice of the distribution that is used in the standardization of the margins. More precisely, at least some of the distributions that are discussed in Section 8 do not fit into the general modelling framework any more if the Gumbel distribution is replaced with some other extreme value distribution.

For example, consider the inverted bivariate extreme value distribution with symmetric logistic dependence structure and Gumbel marginal distributions. If we standardize the margins to a Fréchet distribution F(x) = exp(−x^(−β)), x>0, for some β>0, i.e. if we consider the standardized random variables inline image then




It is not difficult to prove that normalizing functions a and b such that the right-hand side of equation (3) converges to a non-degenerate limiting distribution function do not exist. So if we decide to standardize the margins to a Fréchet distribution then the method proposed will lead to other (and most likely quite poor) estimates than in the setting with Gumbel margins. In view of the arbitrariness of the choice of the marginal distribution that is used in the standardization step, its influence on the outcome is likely to limit the applicability of the new approach.
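The change of margins discussed in this contribution can be made concrete with a small sketch. It assumes that the standardized variable is obtained by the monotone map X = exp(Y/β), which sends a standard Gumbel variable Y to a Fréchet variable with shape β; the exact form used in the contribution is not recoverable from the text, so this map and the function names are assumptions.

```python
import math


def gumbel_cdf(y):
    """Standard Gumbel distribution function exp{-exp(-y)}."""
    return math.exp(-math.exp(-y))


def frechet_cdf(x, beta):
    """Frechet distribution function exp(-x^(-beta)) for x > 0, else 0."""
    if x <= 0.0:
        return 0.0
    return math.exp(-x ** (-beta))


def gumbel_to_frechet(y, beta):
    """Assumed standardization: X = exp(Y / beta)."""
    return math.exp(y / beta)
```

Because the map is increasing, it leaves the copula unchanged; what changes is the scale on which the normalizing functions a and b must operate, which is the source of the concern raised here.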

The following contributions were received in writing after the meeting.

Anthony C. Atkinson (London School of Economics and Political Science)

I have two small points to make about this interesting paper.

The first is that, in Section 4.4, the authors use diagnostics that are based on models fitted above a series of thresholds. This division of the data into two parts is the complement of that in the forward search (Atkinson and Riani, 2000; Atkinson et al., 2004), where the focus is on the central part of the data. Residuals and test statistics are monitored as the fitted subset size increases, changes indicating the inclusion of outliers. How might the authors detect the effect of outliers when inference is solely based on observations above a threshold?

A related use of the forward search is to find unsuspected clusters of observations. For example, there might be a systematic difference between work days and those at the week-end. What effect would such a mixture of distributions have on the distribution of extremes?

Of course, if the data are to be treated as a multivariate sample without structure, only overall extreme observations matter: if such readings are never generated at week-ends, the presence of two groups would not affect the model. But, if, for example, a time series model is fitted, the extreme values at week-ends might be important to model building and parameter estimation. A method is then required of revealing the clusters and extremes.

J. Beirlant and Y. Goegebeur (Katholieke Universiteit Leuven)

We congratulate Janet Heffernan and Jonathan Tawn on a most interesting contribution to multivariate extreme value statistics. In contrast with existing multivariate extreme value methods, the method that is presented here is also designed to estimate probabilities of tail regions where not all components are extreme. The paper also provides a practical answer to truly multivariate cases in contrast with most earlier proposals where only the bivariate case is handled in detail.

The solution proposed is based on assumption (3.1) on the limiting conditional distribution of the vector excluding one component, given an extreme value of this component, with specific parametric families of normalizing functions. Although this basic assumption can be verified on various classical examples provided by the authors, we wonder to what extent this limiting behaviour holds for large classes of multivariate distributions such as the elliptical distributions.

In a bivariate case we could imagine another conditional approach, where the conditional value of extreme values of one variable is modelled given the other taking non-extreme values. We can then make use of the peaks-over-threshold approach extending the generalized Pareto distribution to a regression model by taking one or more of its parameters as a function of the covariates followed by a maximum likelihood fit of the extended model to the exceedances over a sufficiently high (covariate-dependent) threshold; see for instance Davison and Smith (1990). Following this approach, estimates of tail probabilities are obtained in a straightforward way, at non-extreme levels of the covariate. In higher dimensions this would of course extend to other types of extreme sets C than those which were considered in the paper.

Inference about the marginal and dependence structure is undertaken in a stepwise way. The authors propose to estimate the marginal components first, followed by a subsequent estimation of the dependence structure, thereby taking the marginal parameters as fixed. Inspired by our own research on estimating the extreme value index when information on several data groups is available we wonder whether imposing extra information such as the dependence structure followed by a joint estimation of marginal and dependence components provides improved extreme value index estimates compared with a separate analysis of each marginal component.

M.-O. Boldi and A. C. Davison (Swiss Federal Institute of Technology, Lausanne)

There is an intriguing duality between modern Bayesian methods and the construction of multivariate extreme value distributions. In the first, conditional densities are known, and Markov chain simulation is used to estimate marginal densities for quantities of interest. In the second, marginal densities are known, and the goal is to build plausible joint distributions with the given margins. Despite the very substantial progress that has been made by researchers including the current authors, this remains a knotty problem, and the lateral thinking that this paper represents is a welcome addition to the literature, which will repay careful study.

A major difficulty for practical use of multivariate extremal methods is to build joint distributions that are sufficiently flexible to represent a wide range of forms of tail behaviour, including asymptotic independence and near-independence as well as dependence. In other contexts mixture distributions are widely used for semiparametric modelling of densities, and with some twists they can also be used for modelling multivariate extremes. To outline how, suppose that X1,…,Xd have a joint distribution with standard Fréchet distributions, and that H(w) is the distribution of the pseudoangular variable W=(X1,…,Xd)/(X1+…+Xd). This variable plays a central role in joint extremal modelling, but construction of flexible distributions which satisfy the constraint E(W) = d^(−1)(1,…,1) is a headache. In recent work we have implemented a reversible jump Markov chain Monte Carlo algorithm (Green, 1995) which fits mixtures of Dirichlet distributions satisfying this constraint. The idea is to parameterize the Dirichlet components in terms of their means and shape parameters, and to constrain the reversible jump algorithm to fix the mixture mean at its desired value while allowing the number of components to vary. Some tweaking yields a fast algorithm for Bayesian estimation of H(w).
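The mean constraint on a Dirichlet mixture can be checked with a few lines of code. The sketch below is only an illustration of the constraint, not of the reversible jump algorithm itself: the function names are hypothetical, and each component is described by its full parameter vector α, whose mean is α divided by its sum and whose shape is that sum.

```python
def pseudo_angle(x):
    """Pseudoangular variable W = x / (x1 + ... + xd) for positive x."""
    total = sum(x)
    return [xi / total for xi in x]


def dirichlet_mean(alpha):
    """Mean vector of a Dirichlet(alpha) distribution."""
    shape = sum(alpha)
    return [a / shape for a in alpha]


def mixture_mean(weights, alphas):
    """Mean of a Dirichlet mixture with the given component weights."""
    d = len(alphas[0])
    means = [dirichlet_mean(alpha) for alpha in alphas]
    return [sum(w * m[j] for w, m in zip(weights, means)) for j in range(d)]


def satisfies_mean_constraint(weights, alphas, tol=1e-9):
    """True if the mixture mean equals (1/d, ..., 1/d) to within tol."""
    d = len(alphas[0])
    return all(abs(m - 1.0 / d) < tol for m in mixture_mean(weights, alphas))
```

For instance, an equal-weight mixture of Dirichlet(2, 6) and Dirichlet(6, 2) has component means (0.25, 0.75) and (0.75, 0.25) and therefore satisfies the constraint for d = 2, whereas either component on its own does not.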

As an example with d=2, we have fitted distribution B of the paper. Fig. 11(a) shows extreme values of w and a density estimate that was obtained with our approach; the mixture typically has three or four components. For an example with d=3, we took the air pollution data of the paper and estimated the joint tail for (NO, O3, NO2). The density typically uses 3–5 mixture components and reveals extremal independence of (O3, NO) and dependence of (NO2, NO). As similar performance is achieved in higher dimensions this seems a useful approach. It does not, however, solve the problem of estimating quantiles of simultaneous extreme events, because asymptotic independence is reached only at the border of the parameter domain. For example, in trying to reproduce the quantile estimation of Table 2, we obtained results that were comparable with those of the existing methods.

Figure 11.

 (a) Fitted density of pseudoangles from the tail of distribution B (——, posterior median density; - - - - - - -, 95% and 5% quantiles) and (b) bivariate density estimate for pseudoangles of (NO, O3, NO2)

C. Chatfield (University of Bath)

This paper looks at multivariate extreme value theory. In Section 1, the authors begin by making the key assumption that they have a sample of independent (my italics) observations on a vector random variable, and all the theory depends on this assumption. Yet, the main example in the paper uses environmental time series where the independence assumption will generally not apply. It is therefore disturbing that the phrase ‘time series’ does not appear anywhere in the paper and that there is little discussion of sequential aspects of modelling environmental data.

It is well known that most time series show strong autocorrelations, perhaps because of trend, cyclic variation or short-term dependence. It is also well known that failure to account for correlations within a series can seriously compromise attempts to model relationships between series. The most famous example showing what can go wrong is that described by Box and Newbold (1971). I suspect that the authors themselves are aware of the danger, and, later in the paper (Section 7), they admit that their hourly measurements ‘exhibit marked short-term dependence’. However, they base their analysis on daily maxima, which ‘substantially reduce temporal dependence’ and the paper does not attempt to take ‘temporal dependence into account in the analysis’. I would like to be told how large the ‘reduced’ dependence is, because I know that environmental daily extrema can show non-trivial autocorrelations.

Despite the potential problems that are involved in using autocorrelated data, I suspect that environmental time series will be the main area of application of the theory in this paper. It is therefore important that researchers should check the size of any autocorrelations before attempting to model relationships between environmental extreme values. Ideally, the theory should be extended to cope with autocorrelated data, though I realize that this may not be a realistic possibility. At a more practical level, my main concern is to ensure that other researchers are not misled into making the independence assumption for environmental time series, even for daily extrema, without very clear justification, and without realizing the dangers that it entails.
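
A check of the kind suggested here might look as follows. This is a minimal sketch, assuming that the daily maxima are held in a one-dimensional array; the ±1.96/√n band is the usual large-sample approximation for an independent series, and the simulated AR(1) series is purely illustrative and is not the Leeds data.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations of x at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([1.0 if k == 0 else np.dot(x[:-k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def white_noise_band(n, z=1.96):
    """Approximate 95% band for the autocorrelations of an independent series."""
    return z / np.sqrt(n)

# Illustrative series with short-term dependence (AR(1), coefficient 0.6).
rng = np.random.default_rng(1)
n = 5000
eps = rng.standard_normal(n)
series = np.empty(n)
series[0] = eps[0]
for t in range(1, n):
    series[t] = 0.6 * series[t - 1] + eps[t]

acf = sample_acf(series, max_lag=5)
flagged = np.abs(acf[1:]) > white_noise_band(n)   # lags with non-trivial autocorrelation
print(acf.round(3), flagged)
```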

Anthony Ledford (Man Investments, London)

The authors are to be commended for developing an innovative approach that provides a much needed new direction in the field of multivariate extreme value statistics and its application. This comment focuses on some of the differences between the framework that they adopt and the usual multivariate regular variation framework of existing methods, and also points to some possible connections. Following Section 3.2, we focus on the bivariate case only since the authors’ approach reduces to examining pairs of variables.

A frequent starting-point for existing approaches is to take X with unit Fréchet marginal distributions and to assume that multivariate regular variation holds for either the joint tail function 1−Pr(X⩽x) or the joint survivor function Pr(X>x). Letting Y=log(X), so that Y has Gumbel margins, this corresponds to seeking non-degenerate limits for either

{1−Pr(Y1⩽u+y1, Y2⩽u+y2)} / {1−Pr(Y1⩽u, Y2⩽u)}   (4a)

or

Pr(Y1>u+y1, Y2>u+y2) / Pr(Y1>u, Y2>u)   (4b)

as u→∞. There are three rather obvious points:

  • (a) these expressions involve ratios of probabilities;
  • (b) the same functionals appear in the numerator and denominator in each case;
  • (c) each marginal variable approaches ∞ at a common rate.

The authors’ approach seeks functions a(·) and b(·)>0 such that


has a non-degenerate limit as y2→∞. Expressed in this way the same functional appears in the numerator and denominator, but in contrast with the other points above the ratio involves density components rather than probabilities and the growth rates may be different in the marginal variables or set to ∞.

An alternative representation of equation (5) is given by


Clearly, alternative representations to this are also possible. We wonder whether some version of l'Hôpital's rule can be applied to enable the normalizing functions and limit of equation (5) as y2→∞ to be related to corresponding quantities for the ratio of probabilities


or some other quantity that is similar in structure to expressions (4a) and (4b), as doing so may make the relationship between the existing approaches and the authors’ approach more apparent.

L. Peng (Georgia Institute of Technology, Atlanta) and Y. Qi (University of Minnesota Duluth)

We congratulate the authors for their thoughtful models. It is well known that multivariate extreme value theory fails in estimating rare events when the marginals are asymptotically independent, and additional model assumptions are needed in this case. Our discussion is restricted to the following two aspects of extrapolation.

One of the key issues in this paper is to estimate P(Y−i ∈ Ci|Yi>uYi) via a Monte Carlo method, where Ci is a rare event, i.e. Ci contains no sample points of Y−i in general. Since Yi has a known Gumbel distribution, we could sample inline images from the Gumbel distribution such that some inline images are very large. Hence the question becomes whether, conditionally on large inline image, we can simulate inline images from the estimated conditional distribution of inline image such that some inline images fall into Ci. A simple way is to assume that the probability density function of the conditional variable Y−i|Yi=yi>uYi has a parametric model (depending on Yi). The authors propose a structure for the conditional distribution of Y−i|Yi=yi>uYi (see model (4.1)) such that the nonparametrically estimated conditional distribution has the following property: conditionally on large inline image, we can sample inline image from the estimated conditional distribution such that inline image.

Possibility of extrapolation

Suppose that Y−i and Yi are independent. Thus the conditional distribution of Y−i|Yi=yi>uYi is independent of Yi. Therefore, conditionally on large Yi, few of the observations Y−i fall into Ci since Ci is a rare event, and the sample that is drawn from the nonparametrically estimated conditional distribution cannot fall outside the original sample range. In other words, a structure (independent of Yi) is needed for extrapolating the distribution function of Y−i. However, the model in this paper gives no such structure independent of Yi; see model (4.1). Does the method in this paper work for the case of independence? There may be a similar question for the case of asymptotic independence.
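
The concern about the independence case can be illustrated numerically. The sketch below assumes a simplified normalization with location a·y and scale y^b (an assumption made here for illustration; the parametric forms of the paper are not reproduced) and resamples the empirical residuals. With a=0 and b=0, the simulated values of the other variable can never exceed the largest observed residual, however large the simulated conditioning variable is.

```python
import numpy as np

def simulate_conditional(y_i, y_other, u, a, b, size, rng):
    """Resample residuals to simulate (Y_i, Y_other) given Y_i > u.

    Assumed (illustrative) normalization: location a * y and scale y ** b,
    with u > 0 so that y ** b is well defined.
    """
    y_i = np.asarray(y_i, dtype=float)
    y_other = np.asarray(y_other, dtype=float)
    exceed = y_i > u
    z = (y_other[exceed] - a * y_i[exceed]) / y_i[exceed] ** b   # empirical residuals
    # Conditioning variable: Gumbel given exceedance of u, by inverse CDF.
    p_u = np.exp(-np.exp(-u))
    y_star = -np.log(-np.log(rng.uniform(p_u, 1.0, size=size)))
    z_star = rng.choice(z, size=size, replace=True)
    return y_star, a * y_star + y_star ** b * z_star, z

# Independent Gumbel pair; a = 0, b = 0 corresponds to exact independence here.
rng = np.random.default_rng(2)
y1 = rng.gumbel(size=2000)
y2 = rng.gumbel(size=2000)
u = np.quantile(y1, 0.9)
y1_star, y2_star, z = simulate_conditional(y1, y2, u, a=0.0, b=0.0, size=10000, rng=rng)
print(y2_star.max() <= z.max())   # simulated values stay within the observed residual range
```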

Ability of extrapolation

Suppose that our observations inline image have Gumbel marginals and assume that model (4.1) holds exactly, i.e. assume that a|i and b|i are known. Then the estimated distribution of inline image is the empirical distribution based on observations


where u is a chosen threshold. Let D denote the maximal value of A. Therefore D is determined by sampling. Now suppose that inline image is sampled from the sampling algorithm that is given in Section 4.3. We do not need step 5 since we assume that the marginals have Gumbel distributions. It is easy to see that inline image since b|i is non-negative. From here we conclude that, for a large value of threshold, inline image is bounded by D in some cases even when inline image. For example, it is true for the last four cases in Table 1. Therefore, the ability to extrapolate may be quite limited in some cases.

Johan Segers (Tilburg University)

The paper is an important contribution to the extreme value literature because it seems to be the first genuine attempt to construct models for and to do inference on the restriction of a distribution of a random vector on regions where some components are extreme and others are not. Complementing Section 2 of the paper, I argue here that, indeed, existing methods fall short in this respect. However, in my opinion, the main drawback of the model proposed is that, in contrast with existing methods, it lacks a compelling argument justifying it as the only reasonable one possible. Since the model is to be extrapolated from, such a justification would be highly desirable.

Let Y=(Y1,…,Yd) be a random vector with distribution function F. For simplicity, we take the margins of F to be unit Fréchet, i.e. Pr(Yi>y)=1−exp(−1/y) for y>0 and i=1,…,d. Assume that F is in the domain of attraction of a multivariate extreme value distribution G. Then, for (y1,…,yd) ∈ (0,∞]^d,


By max-stability, also


Combining these two equations and the mean value theorem yields


as n→∞. This suggests the approximation


the relative error of which is small provided that all yi are sufficiently large. This approximation, together with the spectral representation of the exponent measure of G, forms the basic modelling approach in cases where the margins are asymptotically dependent.
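
For reference, the standard form of this argument can be sketched as follows. This is a hedged outline under the stated assumption of unit Fréchet margins (so that the normalizing constants are n and 0) and with 0<G(y1,…,yd)<1; the notation is an assumption and is not claimed to match the displays of the original contribution term by term.

```latex
% Standard domain-of-attraction argument with unit Frechet margins.
\begin{align*}
F^{n}(ny_1,\dots,ny_d) &\to G(y_1,\dots,y_d), \qquad n\to\infty,
  && \text{(domain of attraction)}\\
G^{n}(ny_1,\dots,ny_d) &= G(y_1,\dots,y_d),
  && \text{(max-stability)}\\
n\{1-F(ny_1,\dots,ny_d)\} &\to -\log G(y_1,\dots,y_d),
  && \text{since } -\log x \sim 1-x \text{ as } x\to 1,\\
n\{1-G(ny_1,\dots,ny_d)\} &\to -\log G(y_1,\dots,y_d),\\
\frac{1-F(ny_1,\dots,ny_d)}{1-G(ny_1,\dots,ny_d)} &\to 1, \qquad n\to\infty,\\
1-F(y_1,\dots,y_d) &\approx 1-G(y_1,\dots,y_d)
  && \text{for all } y_i \text{ large.}
\end{align*}
```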

If the margins are asymptotically independent, then this approximation is too coarse to apply to Pr(Y ∈ C) for sets C contained in inline image with ui large for all i ∈ I; here I is a subset of {1,…,d} of at least two elements, and πI denotes the projection of ℝ^d into ℝ^{|I|} given by πI(x1,…,xd)=(xi)_{i ∈ I}. Instead, we may use the decomposition


For every I, the joint survivor function of the subvector (Yi)_{i ∈ I} requires a model in the spirit of Ledford and Tawn (1997) for the bivariate case or Maulik and Resnick (2003) for the higher dimensional case.

In both cases, we obtain a model for 1−F(y1,…,yd) for all (y1,…,yd) in a quadrant [u,∞)^d, say, with u sufficiently large. Now, if the task is to estimate Pr(Y ∈ C), then the model is reliable only if C is in the σ-field generated by the sets inline image with yi⩾u for all i=1,…,d. For example, the model is inadequate if C is of the form inline image for some large s.

The authors replied later, in writing, as follows.

We thank the discussants for their interesting and thought-provoking contributions.

Theoretical justification

Coles, Smith and Segers call for stronger justification for the assumption of the limit form that is claimed for the conditional distributions motivating our model. We feel that it is clear from the discussion that ours is not the only possible form of conditional model. Indeed, Butler presents Markov chain examples, with the asymptotic independence property, for which our characterization (3.1) is insufficient. Similarly, there are examples of asymptotically dependent variables with Gumbel marginals, for which the multivariate extreme value distribution is inappropriate; see Schlather (2001). We have shown that our characterization provides a flexible parsimonious model structure which incorporates all existing multivariate extreme value models, in addition to a broad class of asymptotically independent forms. We find this fact sufficiently compelling for statistical modelling.

A multivariate regular variation (MRV) property provides the only mathematical justification for existing models. However, it is unhelpful to have a nice theory which is too restrictive to apply in practice. This paper tries to encourage extreme value modellers out of the restrictions of the MRV assumptions.

Links with existing methods

Coles, Anderson and Ledford speculate on the existence of regularity or invariance properties underlying our formulation. We consider the independence of the standardized residuals and the conditioning variable to provide one such invariance. An alternative interpretation is the expression of our assumption as a direct relaxation of the MRV assumptions. If the d-dimensional random variable X has unit Fréchet marginal distributions then MRV can be expressed as the property that, for x>1 and B any Borel set, as t→∞ then


where Ω is a random variable having the same distribution as that of X/|X| given that |X|>t in the limit as t→∞. Taking |·| to be the maximum norm and letting Y= log (X) so that Y has Gumbel marginal variables, gives that as t→∞


where Ω* is a random variable having the same distribution as that of Y−i−Yi given that Yi=max(Y1,…,Yd)>t in the limit as t→∞ and B* is log(B). Here Ω* is a (d−1)-dimensional variable which is negative in each component and is obtained by a one-to-one mapping of Ω. If all the components of Y are asymptotically dependent then the limiting distribution of Ω* is non-degenerate; if all the components of Y are mutually asymptotically independent the marginal distributions of Ω* have all their mass at −∞. Our model assumption amounts to the existence of functions a|i and b|i such that as t→∞


with the marginal distributions of the corresponding limiting random variable Ω* being non-degenerate and placing no mass at ∞.

We disagree with Drees's claims that the hidden MRV model formulations that were proposed by Ledford and Tawn can be used to estimate failure probabilities of the type described in Section 6.2.2. Our reasons are given in the paper and are further supported by Segers's comment in the discussion.

We agree with Beirlant and Goegebeur that an investigation of whether our representation holds for the broad class of distributions within the elliptical family would be a helpful addition to the families of distributions that we consider in Section 8. As many elliptical distributions are asymptotically dependent (Hult and Lindskog, 2002; Eddy and Gale, 1981) they are automatically covered by our characterization.

Beirlant and Goegebeur also suggest potential links between our conditional approach and the fitting of the univariate generalized Pareto distribution with other variables as regressors. This is an interesting point. For asymptotically dependent variables the relationship is not straightforward, but for asymptotically independent variables there could be valuable associations. This point is discussed further in Dupuis and Tawn (2001). One possible way forward is proposed by Ferro, who points out connections with marked point processes. His outline structure suggests that the extension of the characterization to temporally dependent multivariate data should be possible.

Model formulation

Coles points out that, although the form of G is unspecified in our model, some structure must exist. We see our unconstrained nonparametric approach to the estimation of G as a strength, particularly as the precise form of G has a limited effect on extrapolations. For asymptotically dependent variables we have identified some distributional structure for G. On-going work has shown that this can be used to improve the inference of our conditional model. More generally we are still unclear how to identify such structure and how to exploit it for modelling or inference purposes. Ferro suggests that the inconsistency of conditionals could be overcome by weighting the z-values. We are investigating an alternative approach which resolves consistency by weighting the separate G-estimates to form a single model.

The assumptions behind our models for the functions aj|i(y) and bj|i(y) also stimulated discussion. Smith suggests smooth nonparametric model choices for these functions instead of our parametric forms. We do not see how this could be implemented when interest is in extrapolation. None-the-less, nonparametric estimates could provide a valuable additional diagnostic assessment of our modelling structure, within the range of the data. Butler's results show how greater parsimony can be achieved in our models for aj|i(y) and bj|i(y) under special classes of dependence. We feel that this work reveals the flexibility of the model structure and it provides the first illustration of graphical structures in the context of multivariate extreme values.

We disagree with Drees's assertion that our results depend on an arbitrary choice of marginals. The choice of Gumbel margins is required to give the location–scale family of normalizations. For other choices of marginal a location–scale family of normalizations is inappropriate and is not what we are claiming. For Drees's example, Yi, i=1,2, distributed as inverted logistic with Gumbel marginal variables,


Thus for Gumbel margins a location–scale normalization does work as we claim. For the suggested Fréchet(β) marginal variables inline image i.e. inline image for i=1,2 and β>0, with the same inverted logistic dependence structure, simple transformation of result (6) gives


Thus a non-linear normalization is required for the conditional behaviour of inline image based on inline image. The same issue arises when considering the MRV property, which requires Fréchet-type tails. The MRV property does not rule out Gumbel marginal variables from existing theory: it is simply that the representation is most naturally expressed for variables with Fréchet margins. In those cases, as in ours, initial marginal transformation resolves the issue in practice.


We sympathize with the objection of Coles and Smith to our mélange of estimation methods whose complicated nature may restrict the application of the methods proposed. We wanted to impose as few modelling assumptions as possible which led to the collection of separate semiparametric conditional models being fitted through ad hoc methods. Though we could have formulated a model for which standard inference techniques could be applied, we believe that such a model would have restricted our dependence modelling capability substantially.

Anderson points out that any form of extrapolation requires some form of smoothness which may not be guaranteed in a particular application. The assumption of smoothness here is given by characterization (3.1). In practice the variables that are studied need not extrapolate smoothly, so it is necessary to consider carefully which functions of the variables are most reliably extrapolated. We feel that this is more an issue for marginal than dependence modelling as the dependence models that we use essentially only rely on the joint behaviour of the ranks of observations, so any non-linear monotonic transformation of the marginal variables will result in the same dependence structure.
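
The rank invariance mentioned here can be checked directly. The sketch below is an illustration only: it uses a purely empirical transformation to Gumbel margins with plotting position rank/(n+1), which is an assumption made for brevity and not the marginal model of the paper, and it assumes continuous data without ties. The function name is hypothetical.

```python
import numpy as np

def to_gumbel_margins(x):
    """Transform each column of x to approximate Gumbel margins using ranks only."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    ranks = x.argsort(axis=0).argsort(axis=0) + 1   # ranks 1..n within each column
    p = ranks / (n + 1.0)                           # empirical probabilities in (0, 1)
    return -np.log(-np.log(p))                      # Gumbel quantile function

rng = np.random.default_rng(4)
x = rng.standard_normal((500, 2))
y = to_gumbel_margins(x)
y_transformed = to_gumbel_margins(np.exp(x))        # strictly increasing marginal transform
print(np.allclose(y, y_transformed))
```

Because only ranks enter, any strictly increasing transformation of a marginal variable leaves the transformed data, and hence any dependence fit based on them, unchanged.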

Peng and Qi are concerned about the use of the empirical estimate of distribution G|i since, for asymptotically independent variables, this approach results in zero estimates of Pr(Y ∈ Ci|Yi>u) for some sets Ci which are extreme in more than one variable. Although we agree that this is true, we note that the methods of Ledford and Tawn (1996, 1997) apply to events of that type. However, this raises the unresolved issue of how the representation of Ledford and Tawn (1997) and characterization (3.1) are related; they rely on different asymptotics but a characterization which embeds both results appears to be needed to address Peng and Qi's concern.

We agree with Dupuis about the importance of good quality diagnostics and are interested in her suggested additional diagnostic based on inline image. Dupuis and Atkinson also raise this issue of the influence of outliers and their detection. We feel that the strong connections between our model diagnostics and those which are used in regression should lead to improved identification of model inadequacies and of outliers relative to existing diagnostics for multivariate extremes. As with all extreme value methods, the ultimate diagnostic is to check for invariance of estimates to the choice of threshold.

Application: scientific knowledge

Various points in the discussion relate to the potential benefits from exploiting existing scientific knowledge in our analysis of the air pollution data. Anderson suggests the inclusion of knowledge about pollution relationships in our fitting process, possibly using prior information, whereas Smith suggests that physical and chemical models that are associated with our analysis could be useful for short-term prediction of events which violate air pollution standards. We intend to follow these suggestions in our on-going developments of extreme value models for air pollutants.

Both Chatfield, and Sahu and Mardia are concerned that ignoring temporal dependence and covariates means that we miss some structure of the dependence between variables. We agree with this assertion to some extent but take the view that the choice of model depends on the purpose of the analysis. One difference between our example and that presented by Sahu and Mardia is that in their example the model is conditionally specified in time. We are only interested in the joint distribution, irrespective of time, so ignoring time evolution should not induce bias. The short-term temporal dependence in this data set is limited but we agree that covariates such as sunlight, wind speed and cloud cover are relevant and may provide a more scientifically satisfactory description of dependence. This extension might also deal with the objection of Dupuis, who questions our assumption of stationarity within each season, as the covariates change smoothly over time.

Smith questions our focus on functionals of combinations of the variables as the links between air pollutants and human health are vague. Though we agree with this point, the study of functionals is useful in summarizing the multivariate structure of the data and fitted model, and the relationships between variables.

Extensions to incorporate mixtures of dependence forms

Several discussants consider the case in which the extremal dependence structure involves a mixture of two forms of dependence (such as the possible week-end–weekday structure in our data set that was pointed out by Atkinson). Smith is concerned that our approach only models the more strongly dependent component of the mixture and suggests that a more robust estimation of the parameters is required in this case. We agree that this is a weakness of the current approach but defend the proposed approach on the grounds of parsimony, and we point out that this feature is also true for existing methods. Indeed, our method gives improved performance over existing methods for the mixture example that is given in Section 6.1. We also demonstrate that our diagnostics clearly identify the need for a mixture model for these data; we believe that such identification would be less clear cut by using existing diagnostic methods.

We expect that it will be straightforward to extend our model to incorporate mixtures of dependence structure, but until a clearer inference scheme is developed the selection of the number of mixture components would be ad hoc. Boldi and Davison describe their use of a mixture of asymptotically dependent parametric models. This is an interesting development; however, we are not sure of its merits relative to a nonparametric MRV approach.

Coles and Ferro raise concerns that the class of asymptotically dependent variables and negatively associated variables are represented by single points in the parameter space of our conditional model. We note that, unless the extrapolation is long range, the differences between estimates that are obtained under an asymptotically dependent model and those from a strongly associated asymptotically independent model will be slight. However, we agree that this issue needs to be addressed if our knowledge of the importance of these classes is to be incorporated into the inference.

References in the discussion