## 1. Introduction and background

Multivariate extreme value theory and methods concern the characterization, estimation and extrapolation of the joint tails of multidimensional distributions. Accurate assessments of the probabilities of extreme events are sought in a diversity of applications from environmental impact assessment (Coles and Tawn, 1994; Joe, 1994; de Haan and de Ronde, 1998; Schlather and Tawn, 2003) to financial risk management (Embrechts *et al.*, 1997; Longin, 2000; Stărică, 2000; Poon *et al.*, 2004) and Internet traffic modelling (Maulik *et al.*, 2002; Resnick and Rootzén, 2000). The application that is considered in this paper is environmental. We examine five-dimensional air quality monitoring data comprising a series of measurements of ground level *ozone* (O_{3}), *nitrogen dioxide* (NO_{2}), *nitrogen oxide* (NO), *sulphur dioxide* (SO_{2}) and *particulate matter* (PM_{10}), in Leeds city centre, UK, during the years 1994–1998 inclusively.

Regulation of air pollutants is undertaken because of their well-established deleterious effects on human health, vegetation and materials. Government objectives for concentrations of air pollutants are given in terms of single variables, rather than combinations of variables (Department of the Environment, Transport and the Regions, 2000). However, atmospheric chemists are increasingly aware of the importance of understanding the dependence between different air pollutants. Recent atmospheric chemistry research (Photochemical Oxidants Review Group, 1997; Colls, 2002; Housley and Richards, 2001) has highlighted issues concerning extremal dependence between air pollutants. In particular, the Photochemical Oxidants Review Group (1997) suggested that the dependence between O_{3} and some other atmospheric pollutants strengthens as the level of O_{3} increases. This is of concern since it is known that O_{3} has synergistic corrosive effects in combination with other sulphur- and nitrogen-based pollutants. The adverse health effects of particulate matter are also believed to be exacerbated by the excessive presence of other gaseous pollutants.

The gases are recorded in parts per billion, and the particulate matter in micrograms per cubic metre. The data are available from http://www.blackwellpublishing.com/rss. We compare data from *winter* (from November to February inclusively) and early *summer* (from April to July inclusively).

Fig. 1 shows the daily maxima of the hourly means of the O_{3} and NO_{2} variables for each of these seasons. The highest values of O_{3} are observed in the summer, as O_{3} is formed by a series of reactions that are driven by sunlight (Brimblecombe, 2001). The reactions involve hydrocarbons and NO_{2}; large values of the latter occur with large O_{3} values, as shown in Fig. 1. This positive dependence between O_{3} and NO_{2} in summer is not observed during the winter when the sunlight is weaker. Dependence between the air pollution variables influences the combinations which can occur when any one of the pollutants is large. In Section 7 we estimate several functionals of the extreme values of the joint distribution of the air pollution variables. One such functional is the probability that these variables occur in an extreme set *C* ⊂ ℝ^{d}, an example of such a set being shown in the *summer* data plot of Fig. 1(a). The precise specification of this set is discussed in Section 7. Pairs of (O_{3}, NO_{2}) could occur in the set that is shown in Fig. 1 by being extreme in a single component, or by being simultaneously (but possibly less) extreme in both components.

The air pollution problem is a typical example of multivariate extreme value problems, summarized as follows. Consider a continuous vector variable **X** = (*X*_{1},…,*X*_{d}) with unknown distribution function *F*(**x**). From a sample of *n* independent and identically distributed observations from *F* we wish to estimate functionals of the distribution of **X** when **X** is extreme in at least one component. The methods that are developed in this paper allow any such functional to be considered. However, to simplify the presentation we shall focus much of our discussion on estimating Pr(**X** ∈ *C*) where *C* is an extreme set such that for all **x** ∈ *C* at least one component of **x** is extreme. Typically no observations will have occurred in *C*. The structure of *C* motivates the following natural partition of *C* into *d* subsets, $C = \bigcup_{i=1}^{d} C_i$. Here, *C*_{i} is that part of *C* for which *X*_{i} is the largest component of **X**, as measured by the quantiles of the marginal distributions. Specifically, for each *i* = 1,…,*d*, let *F*_{Xi} denote the marginal distribution of *X*_{i}; then

$$
C_i = C \cap \{\mathbf{x} \in \mathbb{R}^d : F_{X_i}(x_i) > F_{X_j}(x_j);\ j = 1,\dots,d;\ j \neq i\} \qquad \text{for } i = 1,\dots,d.
$$

We assume that subsets of *C* of the form *C*∩{**x** ∈ ℝ^{d}:*F*_{Xi}(*x*_{i})=*F*_{Xj}(*x*_{j}) for some *i*≠*j*} can be ignored; these are null sets provided that on these subsets there are no singular components in the dependence structure of **X**. The partition of *C* into *C*_{1} and *C*_{2} for (O_{3}, NO_{2}) is shown in Fig. 1; the curved boundary between the sets is due to the inequality of the two marginal distributions.

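With the partition of *C* defined in this way, *C* is an extreme set if all *x*_{i}-values in a non-empty *C*_{i} fall in the upper tail of *F*_{Xi}, i.e. if *v*_{Xi} = inf_{x ∈ Ci}(*x*_{i}), then *F*_{Xi}(*v*_{Xi}) is close to 1 for *i* = 1,…,*d*. So

$$
\Pr(\mathbf{X} \in C) = \sum_{i=1}^{d} \Pr(\mathbf{X} \in C_i) = \sum_{i=1}^{d} \Pr(\mathbf{X} \in C_i \mid X_i > v_{X_i})\, \Pr(X_i > v_{X_i}). \tag{1.1}
$$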

Consider the estimation of Pr(**X** ∈ *C*) by using decomposition (1.1). We need to estimate Pr(*X*_{i} > *v*_{Xi}) and Pr(**X** ∈ *C*_{i}|*X*_{i} > *v*_{Xi}), the former requiring a marginal extreme value model and the latter additionally needing an extreme value model for the dependence structure. We focus on these two terms in turn.

Methods for marginal extremes are now relatively standard; see Davison and Smith (1990), Smith (1989) and Dekkers *et al.* (1989). Univariate extreme value theory provides an asymptotic justification for the generalized Pareto distribution to be an appropriate model for the distribution of excesses over a suitably chosen high threshold; see Pickands (1975). Thus, we model the marginal tail of *X*_{i} for *i*=1,…,*d* by

$$
\Pr(X_i > x + u_{X_i} \mid X_i > u_{X_i}) = (1 + \xi_i x / \beta_i)_+^{-1/\xi_i} \qquad \text{for } x > 0. \tag{1.2}
$$

Here *u*_{Xi} is a high threshold for variable *X*_{i}, *β*_{i} and *ξ*_{i} are scale and shape parameters respectively with *β*_{i} > 0 and *s*_{+} = max(*s*,0) for any *s* ∈ ℝ. We require a model for the complete marginal distribution *F*_{Xi} of *X*_{i} for each *i* = 1,…,*d*, since to estimate Pr(**X** ∈ *C*_{i}|*X*_{i} > *v*_{Xi}) we need to describe all *X*_{j}-values that can occur with any large *X*_{i}. We adopt the semiparametric model $\hat{F}_{X_i}$ for *F*_{Xi} of Coles and Tawn (1991, 1994), i.e.

$$
\hat{F}_{X_i}(x) =
\begin{cases}
1 - \{1 - \tilde{F}_{X_i}(u_{X_i})\}\,\{1 + \xi_i (x - u_{X_i})/\beta_i\}_+^{-1/\xi_i} & \text{for } x > u_{X_i}, \\
\tilde{F}_{X_i}(x) & \text{for } x \le u_{X_i},
\end{cases}
\tag{1.3}
$$

where $\tilde{F}_{X_i}$ is the empirical distribution of the *X*_{i}-values. We denote the upper end point of the distribution by *x*^{Fi}, which is ∞ if *ξ*_{i} ≥ 0 and *u*_{Xi}−*β*_{i}/*ξ*_{i} if *ξ*_{i}<0. Model (1.3) provides the basis for estimating the Pr(*X*_{i}>*v*_{Xi}) term of decomposition (1.1).
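As an illustration of model (1.3), the following minimal sketch evaluates a semiparametric distribution function that uses the empirical distribution at or below the threshold and a generalized Pareto tail above it. The function name `semiparametric_cdf`, the convention that the empirical distribution is the proportion of observations not exceeding *x*, and the treatment of *ξ* = 0 as the exponential limit are assumptions made here for illustration; the parameters `beta` and `xi` are taken as given rather than estimated.

```python
import numpy as np

def semiparametric_cdf(x, sample, u, beta, xi):
    """Empirical CDF at or below threshold u, generalized Pareto tail above it.

    Hypothetical helper: beta > 0 and xi are assumed to be already estimated.
    """
    sample = np.sort(np.asarray(sample, dtype=float))
    x = np.asarray(x, dtype=float)
    n = sample.size
    # Empirical distribution function: proportion of observations <= x.
    ecdf = np.searchsorted(sample, x, side="right") / n
    # Empirical probability of not exceeding the threshold u.
    p_u = np.searchsorted(sample, u, side="right") / n
    # Generalized Pareto survivor function of the excess over u.
    excess = np.maximum(x - u, 0.0)
    if xi == 0.0:
        tail = np.exp(-excess / beta)  # exponential limit as xi -> 0
    else:
        base = np.maximum(1.0 + xi * excess / beta, 0.0)  # the (.)_+ operation
        tail = base ** (-1.0 / xi)
    return np.where(x > u, 1.0 - (1.0 - p_u) * tail, ecdf)
```

With a negative shape parameter the sketch returns 1 for any *x* beyond the upper end point *u* − *β*/*ξ*, in line with the definition of *x*^{Fi} above.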

Both the marginal and the dependence structures of **X** are needed to determine Pr(**X** ∈ *C*_{i}|*X*_{i} > *v*_{Xi}). We disentangle these two contributions and focus on the dependence modelling by working with margins that are assumed known for much of the following. We transform all the univariate marginal distributions to be of standard Gumbel form by using the probability integral transform, which for our marginal model (1.3) is

$$
Y_i = -\log[-\log\{\hat{F}_{X_i}(X_i)\}] = t_i(X_i; \psi_i, \tilde{F}_{X_i}) = t_i(X_i) \qquad \text{for } i = 1,\dots,d, \tag{1.4}
$$

where *ψ*_{i} = (*β*_{i},*ξ*_{i}) are the marginal parameters. This transformation gives Pr(*Y*_{i} ≤ *y*) = exp {− exp (−*y*)} for each *i*, so Pr(*Y*_{i} > *y*) ∼ exp (−*y*) as *y* → ∞, and *Y*_{i} has an exponential upper tail. To clarify which marginal variable we are using, we use **X** and **Y** throughout to denote the variable with its original marginal distributions and with Gumbel margins respectively.
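The transformation to Gumbel margins can be sketched as follows, assuming that it takes the usual probability integral form *y* = −log{−log(*p*)}, where *p* is the value of the marginal distribution function at the observation. The function names `to_gumbel` and `gumbel_cdf` are hypothetical, and the input is assumed to lie strictly between 0 and 1.

```python
import numpy as np

def to_gumbel(p):
    """Map marginal probabilities p in (0, 1) to the standard Gumbel scale."""
    p = np.asarray(p, dtype=float)
    return -np.log(-np.log(p))

def gumbel_cdf(y):
    """Standard Gumbel distribution function exp{-exp(-y)}."""
    y = np.asarray(y, dtype=float)
    return np.exp(-np.exp(-y))

# Survivor function at a large value, computed stably with expm1,
# compared against the exponential tail approximation exp(-y).
y_large = 10.0
survivor = -np.expm1(-np.exp(-y_large))
tail_ratio = survivor / np.exp(-y_large)
```

Here `tail_ratio` is close to 1, which illustrates the exponential upper tail of the Gumbel margins at a finite level.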

We now focus on extremal dependence modelling of variables with Gumbel marginal distributions. Modelling dependence for extreme values is more complex than modelling univariate extreme values and, although various proposals already exist, the methodologies are still evolving. When interest is in the upper extremes of each component of **Y**, the dependence structures fall into two categories: asymptotically dependent and asymptotically independent. Variable **Y**_{−i} is termed asymptotically dependent on and asymptotically independent of variable *Y*_{i} when the limit

$$
\lim_{y \to \infty} \{\Pr(\mathbf{Y}_{-i} > \mathbf{y} \mid Y_i > y)\}
$$

is non-zero and zero respectively. Here **Y**_{−i} denotes the vector **Y** excluding component *Y*_{i} and **y** a vector of *y*-values. All the existing methods for multivariate extreme values (outlined in Section 2) are appropriate for estimating Pr(**X** ∈ *C*) under asymptotic dependence of the associated **Y**, or for asymptotically independent variables provided that all **x** ∈ *C* are large in all components.
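For a pair of variables on a common Gumbel scale, the distinction can be explored informally through the empirical proportion of joint exceedances at increasing levels. The sketch below assumes that the limit in question is that of Pr(*Y*_{j} > *y* | *Y*_{i} > *y*) as *y* → ∞ in the bivariate case; the function name `conditional_exceedance` is hypothetical, and the quantity is a finite-level diagnostic rather than the estimation method that is developed in this paper.

```python
import numpy as np

def conditional_exceedance(y_cond, y_other, level):
    """Empirical estimate of Pr(Y_other > level | Y_cond > level).

    Returns NaN when no observation of y_cond exceeds the level.
    """
    y_cond = np.asarray(y_cond, dtype=float)
    y_other = np.asarray(y_other, dtype=float)
    exceed = y_cond > level  # observations where the conditioning variable is large
    if not exceed.any():
        return float("nan")
    # Proportion of those observations where the other variable is also large.
    return float(np.mean(y_other[exceed] > level))

# Small deterministic example: three values of y_cond exceed 1.5,
# and two of the paired y_other values also exceed 1.5.
estimate = conditional_exceedance([0, 1, 2, 3, 4], [4, 3, 2, 1, 5], 1.5)
```

A proportion that decays towards zero as the level rises is suggestive of asymptotic independence, whereas a proportion that stabilizes at a positive value is suggestive of asymptotic dependence, although at any finite level the evidence is only indicative.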

Fig. 2 shows the winter air pollution data transformed, by using transformations (1.4), to have identical Gumbel marginal distributions. It is clear from Fig. 2 that the extremal dependence between the NO variable and each of the other variables varies from pair to pair, with asymptotic dependence a feasible assumption only for (NO, NO_{2}) and (NO, PM_{10}). Thus the range of sets for which existing methods can be used to estimate Pr(**X** ∈ *C*) is restricted.

We present an approach to multivariate extreme values that constitutes a change of direction from previous extreme value methods. Our modelling strategy is based on an assumption about the asymptotic form of the conditional distribution of the variable given that it has an extreme component, i.e. the distribution of **Y**_{−i}|*Y*_{i} = *y*_{i} as *y*_{i} becomes large. This conditional approach provides a natural extension of the univariate conditional generalized Pareto distribution model (1.2) to the multivariate case as Pr(**X** ∈ *C*_{i}|*X*_{i}>*v*_{Xi}) can be expressed as

$$
\int_{v_{X_i}}^{x^{F_i}} \Pr(\mathbf{X} \in C_i \mid X_i = x)\, \frac{\mathrm{d}F_{X_i}(x)}{1 - F_{X_i}(v_{X_i})}, \tag{1.5}
$$

where the integrand is evaluated by using the distribution of **Y**_{−i}|*Y*_{i}=*y*_{i} after marginal transformation. When *v*_{Xi} > *u*_{Xi} the derivative of *F*_{Xi}(*x*)/{1 − *F*_{Xi}(*v*_{Xi})} is the generalized Pareto density function with scale and shape parameters *β*_{i}+*ξ*_{i}(*v*_{Xi}−*u*_{Xi}) and *ξ*_{i} respectively.
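The form of the scale parameter can be checked directly, assuming that the tail of *F*_{Xi} above *u*_{Xi} is proportional to the generalized Pareto survivor function as in model (1.3), with *ξ*_{i} ≠ 0. In the working below the subscript *i* is dropped, *u* and *v* stand for *u*_{Xi} and *v*_{Xi}, and *x* > *v* > *u* lies below the upper end point:

$$
\frac{1 - F(x)}{1 - F(v)}
= \left\{ \frac{1 + \xi (x - u)/\beta}{1 + \xi (v - u)/\beta} \right\}^{-1/\xi}
= \left\{ \frac{\beta + \xi (v - u) + \xi (x - v)}{\beta + \xi (v - u)} \right\}^{-1/\xi}
= \left\{ 1 + \frac{\xi (x - v)}{\beta + \xi (v - u)} \right\}^{-1/\xi} .
$$

Differentiating with respect to *x* then gives

$$
\frac{\mathrm{d}}{\mathrm{d}x} \left\{ \frac{F(x)}{1 - F(v)} \right\}
= \frac{1}{\tilde{\beta}} \left\{ 1 + \frac{\xi (x - v)}{\tilde{\beta}} \right\}^{-1/\xi - 1},
\qquad \tilde{\beta} = \beta + \xi (v - u),
$$

which is the generalized Pareto density for the excess *x* − *v* with scale $\tilde{\beta}$ and shape *ξ*. The symbol $\tilde{\beta}$ is introduced here only as shorthand.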

Our conditional approach applies whether the variables are asymptotically dependent or asymptotically independent; it can be used to estimate Pr(**X** ∈ *C*) for any extreme set *C*, and it is applicable in any number of dimensions. The model that we use for the conditional distribution is motivated by an asymptotic distributional assumption and is supported by a range of theoretical examples. The model is semiparametric; parametric regression is used to estimate the location and scale parameters of the marginals of the joint conditional distribution and nonparametric methods are used to estimate the multivariate residual structure. Though our approach lacks a complete asymptotic characterization of the probabilistic structure, such as those which underpin existing extreme value methods, we show that strong mathematical and practical advantages are given by our approach in comparison with existing multivariate extreme value methods.

Existing methods are presented in Section 2. In Section 3 we state the new asymptotic assumption on which our conditional model is based, present some theoretical examples and draw links between the proposed and current methods. The examples motivate the modelling strategy that is introduced in Section 4. In Section 5 inference for the model is discussed. The methods are compared by using simulated data in Section 6. In Section 7 we illustrate the application of the techniques by analysing the extreme values of the air pollution data. Finally, in Section 8 we give the detailed working for the theoretical examples that are presented in Section 3.