Model-based inference of conditional extreme value distributions with hydrological applications

Multivariate extreme value models are used to estimate joint risk in a number of applications, with a particular focus on environmental fields ranging from climatology and hydrology to oceanography and seismic hazards. The semi-parametric conditional extreme value model of Heffernan and Tawn (2004), involving a multivariate regression, provides the most suitable of current statistical models in terms of its flexibility to handle a range of extremal dependence classes. However, the standard inference for the joint distribution of the residuals of this model suffers from the curse of dimensionality: in a $d$-dimensional application it involves a $(d-1)$-dimensional non-parametric density estimator, which requires, for accuracy, a number of points, and commensurate effort, that is exponential in $d$. Furthermore, it does not allow any partially missing observations to be included, and a previous proposal to address this is extremely computationally intensive, making its use prohibitive if the proportion of missing data is non-trivial. We propose to replace the $(d-1)$-dimensional non-parametric density estimator with a model-based copula, with univariate marginal densities estimated using kernel methods. This approach provides statistically and computationally efficient estimates whatever the dimension $d$ or the degree of missing data. Evidence is presented to show that the benefits of this approach substantially outweigh potential mis-specification errors. The methods are illustrated through the analysis of UK river flow data at a network of 46 sites and an assessment of the rarity of the 2015 floods in North West England.

For insurance purposes, we are interested in understanding the joint probability of events such as those observed in winter 2015/2016 and the likely nature of events that are even more extreme.
Let $R_i$ represent the river flow at gauge $i$ at a given time, with corresponding location $s_i$. Consider $n$ independent and identically distributed realisations of the variable $R = (R_1, \dots, R_d)$, with this variable representing the joint behaviour of river flows at $d$ gauges recorded over a given time period. From observations of these variables, we are interested in estimating marginal and joint probabilities. For example, for assessing the rarity of the December 5th, 2015, event in North West England, let $v_i$ be the measured flood value in this event for the $i$th gauge in the region. Then, we need to know about marginal risk assessment at gauge $i$, through estimating the probabilities $P(R_i > v_i)$, $i = 1, \dots, d$, and for joint risk assessment, the probability $P(R \in A)$, where $A = \{r = (r_1, \dots, r_d) \in \mathbb{R}^d : r_i > v_i, i = 1, \dots, d\}$. More generally, we are interested in estimating the probability $P(R \in A)$, where the set $A \subset \mathbb{R}^d$ is extreme for at least one component, say $R_i$, of $R$, so that for all $r \in A$, $r_i > q_i$ with $q_i$ being a high quantile for variable $i$.
For modelling spatial multivariate extremes data, the most widely used approach uses max-stable processes (Asadi, Davison, & Engelke, 2015; Davison, Padoan, & Ribatet, 2012). However, max-stable processes imply a strong form of extremal dependence, termed asymptotic dependence, in which the largest values at every site can occur in the same flood event. The assumption of asymptotic dependence is probably reasonable for local-scale studies, such as in a mesoscale river basin. However, for larger scale studies, such as widespread studies across regions of the UK, this dependence assumption is highly restrictive, as the largest values at different sites are unlikely to occur in a single event.
Recent developments in statistical modelling of hydrological extremes allow us to place such widespread events into a probabilistic framework (Keef, Tawn, & Lamb, 2013; Lamb et al., 2010). Underpinning such methods is the theory of multivariate conditional extremes of Heffernan and Tawn (2004). This approach is able to handle the required mixture of both asymptotic dependence and asymptotic independence (a weaker form of extremal dependence than asymptotic dependence; see Section 2.2), both of which are identified in river flow data. Their conditional dependence model is formed through a semiparametric regression, with parametric components describing variation in the means and variances of the joint conditional distribution, and the joint distribution of the multivariate residuals being estimated empirically. The parametric components determine the core extremal dependence features, such as whether subsets of the variables are asymptotically dependent or asymptotically independent, and model across the range of possible dependence structures. For hydrological applications, the method needs to be able to handle high dimensions (typically 10–1,000 sites); give realistic simulations of multivariate extreme events; enable the estimation of the risk of events that are simultaneously rare at many or all sites; and allow covariates to be incorporated. Direct application of the Heffernan and Tawn (2004) method fails when dealing with any one of these issues, let alone addressing all of them in one analysis. The key problem with Heffernan and Tawn (2004) is that empirical multivariate residual modelling suffers from the curse of dimensionality, which, together with its reliance on previously observed residuals, means that extrapolations to rarer events correspond to relocated and rescaled versions of past events.
These events have poor coverage over the extremal regions of the sample space in high-dimensional studies and so lead to inefficient inference.
An additional complication that hydrological applications bring is that of missing data. Here, we assume the data to be missing at random. Data are likely to be missing when gauges are installed at different times or gauges become faulty. The Heffernan and Tawn (2004) approach, with its empirical residual distribution model, can only be applied for a $d$-dimensional problem when all components of the $d$-dimensional variable are observed. One approach would be to analyse only complete vector observations. This approach is highly restrictive: for example, when considering the whole of the UK river network, with ~1,000 gauges, which was considered as part of the National Flood Resilience Review (Tawn, Shooter, Towe, & Lamb, 2018), there are no concurrent observations at all locations, and hence this strategy leads to highly inefficient inference about extreme events. An alternative approach that has been proposed is to infill all the missing residuals with jointly generated multiple samples from the distribution of missing residuals given the observed residuals. This approach, which assumes a Gaussian copula for the joint distribution of missing and observed residuals only, and treats fully observed variables empirically, is hugely computationally intensive when the amount of missing data is nontrivial. Critically, it fails to address all the other problems with the Heffernan and Tawn (2004) method described above.
Instead, in this paper, the full residual distribution is modelled semiparametrically: one-dimensional kernel-smoothed distribution functions capture the marginal behaviours of the observed residuals, and a Gaussian copula is used for their dependence structure (Joe, 2014). Although this change in approach may at first seem rather small, it has major implications for the applicability of the method, in that it addresses all the problematic issues of Heffernan and Tawn (2004), as well as handling large volumes of missing data efficiently. The primary reasons for its success are that it removes the curse of dimensionality and that the choice of copula is flexible and parsimonious. Of course, there is a cost to this modelling approach, as there is no theoretical motivation to support the copula assumption. However, we present substantial evidence that the Gaussian copula is suitable for modelling the residual dependence structure, mainly because it plays a secondary role in capturing the extremal dependence relative to the Heffernan and Tawn (2004) regression parameters. It is important to have strong diagnostic tools to assess departures from this model and a clear understanding of the effects of misspecification. This paper is the first to look carefully at these aspects, and it finds substantial improvements, from the added flexibility and the more efficient use of the data, in the estimation of probabilities of rare events.
The Heffernan and Tawn (2004) model is explained briefly in Section 2, with the extensions that we propose and their connections with previously adopted Gaussianity assumptions given in Section 3. The methodology for testing the validity of our proposed approach, including dealing with missing data, is detailed in Section 3.2. The comparisons with existing approaches to handle missing values are presented in Section 3.3. A generic simulation algorithm for the proposed conditional extreme value model and techniques for estimating probabilities of extreme joint events are given in Section 4.
Then, examples of the proposed methodology are given in Sections 5 and 6 for simulated and observed data, respectively. The methodology is applied to study widespread flooding in North West England; the success of the different methods is compared through estimated probabilities of joint flood risk. The paper finishes with a discussion that considers ways in which the model can be made more parsimonious. Throughout this paper, all vector algebra is to be interpreted as being componentwise.

Marginal model
The model for the marginal distributions of $R$ has two components, separated using the predetermined threshold level $u_i$ for variable $R_i$ $(i = 1, \dots, d)$. For a univariate random variable $R_i$, asymptotic theory considers the distribution of excesses over a threshold $u_i$, scaled by some function $c(u_i) > 0$, that is, $P(c(u_i)(R_i - u_i) \ge r \mid R_i > u_i)$, with $r > 0$; if this converges to a nondegenerate limit as $u_i$ tends to the upper endpoint of the distribution of $R_i$, then the limit distribution can only be the generalised Pareto distribution (GPD; Pickands, 1975). If it is assumed that this limit model holds exactly for some large-enough threshold $u_i$, it follows that

$$P(R_i > u_i + r \mid R_i > u_i) = \left[1 + \xi_i r / \sigma_i\right]_+^{-1/\xi_i}, \quad r > 0,$$

with the scale parameter $\sigma_i > 0$, the shape parameter $\xi_i \in \mathbb{R}$, and the notation $[r]_+ = \max(r, 0)$ (Davison & Smith, 1990). Above the threshold, the GPD is adopted. For those points below the threshold $u_i$, there is no theoretical justification for any particular model choice, so instead, a kernel-smoothed empirical cumulative distribution function $\tilde F_i(r)$ of $R_i$ is used. Thus,

$$F_i(r) = \begin{cases} \tilde F_i(r) & r \le u_i, \\ 1 - \lambda_i \left[1 + \xi_i (r - u_i)/\sigma_i\right]_+^{-1/\xi_i} & r > u_i, \end{cases} \tag{2}$$

where $\lambda_i = 1 - \tilde F_i(u_i)$ is the probability of an exceedance above the threshold $u_i$. Estimating $(\sigma_i, \xi_i)$ for each gauge separately can lead to inefficient inference, as the spatial coherence and dependence of $R_i$ over gauges suggest that $(\sigma_i, \xi_i)$ and $(\sigma_j, \xi_j)$ should be more similar when gauges $i$ and $j$ are closer together. Methods such as covariate hierarchical/latent variable models that spatially smooth the GPD parameters have been developed by Cooley, Nychka, and Naveau (2007) and Cooley and Sain (2010). These models are ideal for generating marginal quantile maps, as they share information from neighbouring sites to reduce the uncertainty in the estimation of quantiles. As the focus of this paper is on dependence modelling, we restrict ourselves to separate marginal fits but recognise that this typically can be improved upon.
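The two-component marginal model of Equation (2) can be sketched numerically as follows. This is an illustrative sketch, not the paper's implementation: the GPD parameters are taken as given rather than fitted, and a Gaussian kernel with Silverman's rule-of-thumb bandwidth is assumed for the smoothed empirical part below the threshold.

```python
import numpy as np
from scipy.stats import norm

def semiparametric_cdf(r, data, u, sigma, xi):
    """Marginal model F_i: kernel-smoothed empirical CDF below the
    threshold u, GPD tail above it, with exceedance probability
    lambda = P(R > u) estimated empirically (sketch, Gaussian kernel)."""
    data = np.asarray(data, dtype=float)
    h = 1.06 * data.std() * len(data) ** (-0.2)   # Silverman's rule (assumed)
    lam = np.mean(data > u)                       # P(R > u)
    r = np.atleast_1d(np.asarray(r, dtype=float))
    # kernel-smoothed empirical CDF, evaluated at all points of r
    ks = norm.cdf((r[:, None] - data[None, :]) / h).mean(axis=1)
    # GPD splice above the threshold: 1 - lam * [1 + xi (r-u)/sigma]_+^{-1/xi}
    tail = 1.0 - lam * np.maximum(1.0 + xi * (r - u) / sigma, 0.0) ** (-1.0 / xi)
    return np.where(r <= u, ks, tail)
```

In practice $(\sigma_i, \xi_i)$ would be estimated by maximum likelihood on the exceedances of $u_i$; here they are inputs so the splice structure is visible.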
To help estimate the dependence structure of the random variable $R$, the data are transformed componentwise to a variable $Y = (Y_1, \dots, Y_d)$, with common Laplace margins, via the transform

$$Y_i = \begin{cases} \log\{2 F_i(R_i)\} & F_i(R_i) \le 1/2, \\ -\log\{2[1 - F_i(R_i)]\} & F_i(R_i) > 1/2, \end{cases} \tag{3}$$

for $i = 1, \dots, d$, where $F_i$ is given in Equation (2). The transformation to Laplace margins means that $P(Y_i > y + v \mid Y_i > v) = \exp(-y)$ for $y > 0$ and $v > 0$. Therefore, the marginal random variables of $Y$ have exponential upper and lower tails. This is a minor deviation from the Heffernan and Tawn (2004) approach, as they transform to Gumbel margins, but the use of Laplace margins unifies the handling of positive and negative dependence (Keef, Papastathopoulos, & Tawn, 2013).

Introduction to extremal dependence properties
Extremal dependence properties need to be studied for all combinations of the variables because, unlike for the multivariate Gaussian distribution, not all dependence is determined by the set of pairwise dependences. Therefore, consider $C \in 2^D$ with $|C| \ge 2$ and $D = \{1, \dots, d\}$; then, define a measure of extremal dependence for the variables $\{R_i; i \in C\}$ by

$$\chi_C = \lim_{u \to 1} P\left(F_i(R_i) > u \text{ for all } i \in C\right) / (1 - u),$$

where $F_i$ is the marginal distribution function of $R_i$. If $\chi_C > 0$ ($\chi_C = 0$), the variables in $C$ are jointly asymptotically dependent (asymptotically independent). Here, $\chi_C > 0$ means that extreme events can occur simultaneously over all sites in $C$, whereas if $\chi_C = 0$, such events are impossible for the set of sites $C$. Clearly, for $B \subset C$, it is possible that $\chi_C = 0$ and $\chi_B > 0$, but if $\chi_B = 0$, then $\chi_C = 0$. Thus, it is possible to have asymptotic dependence locally but asymptotic independence over all sites. If a copula model is used, the extremal dependence structure is predetermined by the choice of the copula before the model is fitted. For example, the class of bivariate extreme value distribution copulas has $\chi_{\{1,2\}} > 0$ (unless the variables are independent), and the class of multivariate Gaussian copulas, with parameters $\{\rho_{ij}; i \ne j \in D\}$, has $\chi_C = 0$ (unless $\rho_{ij} = 1$ for all $i, j \in C$) for all $C \in 2^D$ with $|C| \ge 2$. Other standard copula models typically can only handle one of the two classes of extremal dependence (Heffernan, 2000). As both of the extremal dependence classes are typically observed in extreme river flow data sets (see Tawn et al., 2018), a standard copula approach is almost never sufficiently flexible. Instead, as with univariate extremes, we appeal to asymptotic formulations to motivate a class of models specific to the tail region. These models allow any possible combination of feasible $\chi_C$ values for $C \in 2^D$.
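A finite-threshold empirical version of $\chi_C$ illustrates the definition: replace the limit $u \to 1$ by a fixed high level and the margins by ranks. This is a standard diagnostic sketch, not a method introduced by the paper:

```python
import numpy as np

def empirical_chi(data, u=0.95):
    """Empirical estimate of chi_C for all columns of `data` jointly:
    P(F_i(R_i) > u for all i in C) / (1 - u), with F_i replaced by
    the rank transform (sketch; a single finite u stands in for u -> 1)."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    unif = ranks / (n + 1.0)                 # approximate uniform margins
    joint = np.mean((unif > u).all(axis=1))  # joint exceedance frequency
    return joint / (1.0 - u)
```

For asymptotically dependent data the estimate stabilises at a positive value as $u \to 1$; for asymptotically independent data it decays towards zero.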

Extremal model for conditional dependence
After making the transformation given in Equation (3), the extremal behaviour of the joint tail of the random variable Y can now be determined. The approach models Y given that at least one of its elements is extreme, that is, given that max(Y) > v for large v, where v is a dependence threshold.
First assume that $Y_1 > v$; then, the joint distribution of the $(d-1)$ remaining variables $Y_{-1} = (Y_2, \dots, Y_d)$ is modelled conditional on $Y_1$ being above $v$. The approach is motivated by the following asymptotic formulation studied by Heffernan and Tawn (2004) and Heffernan and Resnick (2007). The underlying idea is to see how $Y_{-1}$ behaves as $Y_1$ gets large. In order to avoid degeneracy of the limiting conditional distribution of $Y_{-1}$ as $Y_1$ tends to its upper end point, it is sensible to look for a componentwise location-scale transformation of $Y_{-1}$ using functions of $Y_1$. As the dependence between $Y_1$ and each component of $Y_{-1}$ may be different, these location-scale transformations need the flexibility to differ between components. This leads to the assumption that there exist normalising functions $a(\cdot): \mathbb{R} \to \mathbb{R}^{d-1}$ and $b(\cdot): \mathbb{R} \to \mathbb{R}_+^{d-1}$ such that the following limit probability holds for $y > 0$:

$$\lim_{v \to \infty} P\left(Y_1 - v > y, \; \frac{Y_{-1} - a(Y_1)}{b(Y_1)} \le z \;\Big|\; Y_1 > v\right) = \exp(-y)\, G(z), \tag{4}$$

where the joint distribution function $G(z)$ is nondegenerate in each margin and has no mass for any margin at infinity. The first term in the limit given in Equation (4) arises from the fact that $Y_1$ follows a standard Laplace distribution. The second term characterises the behaviour of $Y_{-1} \mid Y_1 > v$ in terms of the limiting distribution function $G(z)$ along with the location $a(\cdot)$ and scale $b(\cdot)$ functions. It is assumed that the normalisations of the variables $Y_{-1}$ and $Y_1$ are independent in the limit. This last assumption parallels that in classical point process models for multivariate extremes and regularly varying distributions (Coles & Tawn, 1991; Resnick, 2013), with radial and angular representations being assumed independent in the limit as the radial variable tends to infinity. Heffernan and Tawn (2004) show that formulation (4) holds for all standard copula models. As a result of Equation (4), $G(z)$ is the limiting conditional distribution of

$$Z = \frac{Y_{-1} - a(Y_1)}{b(Y_1)} \quad \text{given } Y_1 > v, \tag{5}$$

where $Z \sim G$, and we call $Z$ the residual of the conditional extreme value model.
The result of the limits given in Equations (4) and (5) is that Z and Y 1 are independent given that Y 1 > v in the limit as v → ∞. Similar limits, with potentially different a(.), b(.), and G, hold for Y −j |Y j > v for any j = 2, … , d. Joining together these d different conditionals, we have a model for the joint tail behaviour of Y, when at least one component is large.
Under weak assumptions on the joint distribution of $Y$, Heffernan and Resnick (2007) show that, componentwise, $a(\cdot)$ and $b(\cdot)$ must be regularly varying functions satisfying certain constraints, which for Laplace margins corresponds to each of the components of $a$ (respectively $b$) being regularly varying of index 1 (respectively less than 1). Heffernan and Tawn (2004) and Papastathopoulos and Tawn (2016) found that, although different classes of extremal dependence have different forms for $a(\cdot)$ and $b(\cdot)$, they all can be well approximated in a simple parametric form, namely the dominant power term of the regularly varying functions, that is, excluding the slowly varying function. For Laplace margins, this form simplifies to

$$a(y) = \alpha y \quad \text{and} \quad b(y) = y^{\beta}, \tag{6}$$

where $\alpha = (\alpha_2, \dots, \alpha_d) \in [-1, 1]^{d-1}$ and $\beta = (\beta_2, \dots, \beta_d) \in (-\infty, 1)^{d-1}$, with all vector operations taken componentwise. If $\alpha_i = 1$ and $\beta_i = 0$ for all $i \in C \setminus \{1\}$ with $1 \in C$, it follows that $\chi_C > 0$ and the variables indexed by $C$ are asymptotically dependent. Similarly, if $\alpha_i < 1$ for any $i \in C \setminus \{1\}$, then $\chi_C = 0$ and the variables indexed by $C$ are asymptotically independent. Thus, $\alpha$ controls the collections of variables that are asymptotically dependent with variable $Y_1$. It is clear therefore that this model captures all the possible sets of asymptotically independent and dependent variables as set out in Section 2.2. This unification of the parametric forms for all dependence classes enables flexible, efficient statistical modelling, unlike standard parametric copula modelling. Heffernan and Tawn (2004) assume that limit (4) holds exactly above a sufficiently large dependence threshold $v$ and that the normalising functions are given by the parametric forms (6). This leads to the following model:

$$Y_{-1} = \alpha y + y^{\beta} Z \quad \text{for } Y_1 = y > v, \tag{7}$$

where $-1 \le \alpha \le 1$ and $-\infty < \beta < 1$ componentwise, and $Z \sim G$, with $G$ a marginally nondegenerate distribution function and $Z$ independent of $Y_1$. There is no general theoretically justified family of distributions $G$ for the multivariate residuals $Z$, so Heffernan and Tawn (2004) assumed that $Z$ has finite marginal means and variances $\mu$ and $\sigma^2$, respectively, where $\mu = (\mu_2, \dots, \mu_d)$ and $\sigma = (\sigma_2, \dots, \sigma_d)$.
As a result, the following expressions for the conditional expectation and variance of $Y_i \mid Y_1 = y$ hold for $y > v$ and $i = 2, \dots, d$:

$$E(Y_i \mid Y_1 = y) = \alpha_i y + y^{\beta_i} \mu_i, \qquad \mathrm{Var}(Y_i \mid Y_1 = y) = y^{2\beta_i} \sigma_i^2. \tag{8}$$

Heffernan and Tawn (2004) model the joint distribution of $Z$ nonparametrically using an empirical joint distribution, with the specific form of this model presented in Section 2.4. So far, we have presented the behaviour of $Y \mid Y_1 > v$ for large $v$, or equivalently $Y \mid Y_i > v$ for an arbitrary $i \in D$, but we really want the behaviour of $Y \mid \max(Y) > v$. This conditional behaviour can be derived from the set of distributions of $Y \mid Y_i > v$ for $i \in D$. As the conditioning variable changes to $Y_i$, the norming functions $a(\cdot)$ and $b(\cdot)$ and the limiting distribution $G$ all change with $i$. We can piece together results from a series of models of the form above. A limitation of this set of models is that self-consistency is not ensured unless specific constraints on these different normalisation and distribution functions are imposed. A lack of self-consistency may lead to inconsistencies when joint exceedance probabilities are estimated, with the results depending on the choice of the conditioning variable. Heffernan and Tawn (2004) review ways of avoiding this problem by partitioning the sample space, and Liu and Tawn (2014) discuss a number of approaches to reduce this problem. In this paper, we will, however, largely look at the individual conditional distributions, that is, $Y \mid Y_i > v$ for $i \in D$, and not overall joint tail inference.
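The regression form (7) and the moment expressions in (8) can be checked directly by simulation. A minimal script, with illustrative parameter values chosen by us (any residual distribution with the stated mean and variance works):

```python
import numpy as np

# Sketch of the Heffernan-Tawn regression form on Laplace margins:
#   Y_i | Y_1 = y  equals  alpha_i * y + y**beta_i * Z_i,   y > v.
rng = np.random.default_rng(1)
alpha, beta = 0.6, 0.3                 # illustrative dependence parameters
mu, sd = 0.2, 1.0                      # assumed residual mean and sd
y = 5.0                                # a fixed large value of Y_1
z = rng.normal(mu, sd, size=200_000)   # residuals Z_i
y_i = alpha * y + y ** beta * z
# Equation (8): E(Y_i | Y_1 = y)   = alpha*y + y**beta * mu
#               Var(Y_i | Y_1 = y) = y**(2*beta) * sd**2
```

The sample mean and variance of `y_i` match the two expressions in (8) up to Monte Carlo error.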

Inference
The dependence parameters $\alpha$ and $\beta$ of the Heffernan and Tawn (2004) model are estimated through pairwise maximum pseudolikelihood for the $n_v$ pairs with $Y_1 > v$. The pseudolikelihood $L(\alpha, \beta, \mu, \sigma)$ for inference for $(\alpha, \beta)$ is constructed under the temporary working assumption that $Z_i \sim N(\mu_i, \sigma_i^2)$ independently for $i = 2, \dots, d$, that is, independent Gaussian distributions. Hence,

$$L(\alpha, \beta, \mu, \sigma) = \prod_{i=2}^{d} \prod_{j=1}^{n_v} \frac{1}{\sigma_i\, y_{1j}^{\beta_i}}\, \phi\!\left(\frac{y_{ij} - \alpha_i y_{1j} - y_{1j}^{\beta_i} \mu_i}{\sigma_i\, y_{1j}^{\beta_i}}\right), \tag{9}$$

where $\phi$ is the standard normal density. The maximum pseudolikelihood estimates $\hat\alpha = (\hat\alpha_2, \dots, \hat\alpha_d)$ and $\hat\beta = (\hat\beta_2, \dots, \hat\beta_d)$ are found by jointly maximising Equation (9) with $\hat\mu$ and $\hat\sigma$. Now, we present the Heffernan and Tawn (2004) modelling and inference for the joint distribution of the residuals; this is where our inference approach, outlined in Section 3, differs. Firstly, the temporary working assumption of independent Gaussianity of the components of $Z$ used in the estimation of $\alpha$ and $\beta$ is discarded. With the fitted values of these parameters, there are $n_v$ observed exceedances of $v$ by $Y_1$, denoted by $y_{1j}$, $j = 1, \dots, n_v$. The associated vectors of residuals are $z^{(1)}, \dots, z^{(n_v)}$, with the component associated with $Y_i$ given by

$$z_{ij} = \frac{y_{ij} - \hat\alpha_i y_{1j}}{y_{1j}^{\hat\beta_i}}. \tag{10}$$

Heffernan and Tawn (2004) estimate the joint distribution function $G$ through the empirical joint distribution function of these residuals $z^{(1)}, \dots, z^{(n_v)}$. Extrapolation from the model comes from (7), with larger events arising when $Y_1$ is larger than the observed events. Due to the independence of $Y_1$ and $Z$, for $Y_1 > v$, all simulated events are of the form $Y_{-1} = \hat\alpha y + y^{\hat\beta} z^{(j)}$, for $y > v$ and $j = 1, \dots, n_v$. This leads to simulated events on Laplace margins being shifted and rescaled versions of past events. Thus, the extrapolation is restricted to $n_v$ sets of 1-dimensional extrapolations, which clearly do not span the required extrapolation space, particularly when $n_v$ is small relative to $d$.
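The pairwise pseudolikelihood fit can be sketched for a single pair $(Y_1, Y_i)$. This is an illustrative implementation under the stated working assumption $Z_i \sim N(\mu_i, \sigma_i^2)$; the optimiser, starting values, and parameter box are our choices, not the paper's:

```python
import numpy as np
from scipy.optimize import minimize

def fit_ht_pair(y1, yi):
    """Gaussian pseudolikelihood fit of (alpha, beta, mu, sigma) in
    Y_i | Y_1 = y1  =  alpha*y1 + y1**beta * Z,  Z ~ N(mu, sigma^2).
    Sketch: Nelder-Mead on the negative log pseudolikelihood (9)."""
    def nll(theta):
        a, b, m, s = theta
        if not (-1.0 <= a <= 1.0 and b < 1.0 and s > 0.0):
            return np.inf                      # enforce parameter ranges
        scale = s * y1 ** b
        resid = yi - a * y1 - (y1 ** b) * m
        return np.sum(np.log(scale) + 0.5 * (resid / scale) ** 2)
    res = minimize(nll, x0=[0.5, 0.1, 0.0, 1.0], method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-10})
    return res.x

# recover known parameters from data simulated under the model
rng = np.random.default_rng(2)
y1 = 3.0 + rng.exponential(size=4000)          # exceedances of v = 3
yi = 0.7 * y1 + y1 ** 0.2 * rng.normal(0.0, 1.0, size=4000)
a_hat, b_hat, m_hat, s_hat = fit_ht_pair(y1, yi)
```

Repeating this for each $i = 2, \dots, d$ gives the full set of pairwise estimates.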

Semiparametric inference for G
We model the joint residual distribution $G$ by a semiparametric joint distribution model with 1-dimensional kernel-smoothed marginal distribution functions and a Gaussian copula (Joe, 2014). Let $\hat G_i(z)$ be the kernel-smoothed distribution function for observations of $Z_i$; then,

$$\hat G_i(z) = \frac{1}{n_v} \sum_{j=1}^{n_v} \Phi\!\left(\frac{z - z_{ij}}{h_i}\right), \tag{11}$$

with $h_i > 0$ being the bandwidth (Silverman, 1986) and $z_{ij}$, given by expression (10), corresponding to the $i$th component of the $j$th residual vector when $Y_1 > v$. The kernel-smoothed distribution provides flexibility, as it allows smooth interpolation between observed data points as well as some limited extrapolation, and critically, it leads to a nondeterministic extrapolation of past events. Our model for the joint distribution function $G$ is then

$$G(z) = \Phi_{d-1}\left(\Phi^{-1}\{\hat G_2(z_2)\}, \dots, \Phi^{-1}\{\hat G_d(z_d)\}; \Sigma\right), \tag{12}$$

where $z = (z_2, \dots, z_d)$, and $\Phi$ and $\Phi_{d-1}(\cdot; \Sigma)$ are the cumulative distribution functions of a standard univariate Gaussian and a standard $(d-1)$-dimensional Gaussian with correlation matrix $\Sigma$, respectively. The use of the componentwise probability integral transformation gives $Z_i^N = \Phi^{-1}\{\hat G_i(Z_i)\}$ for $i = 2, \dots, d$. Our copula assumption (12) then corresponds to $Z^N$ following a $(d-1)$-dimensional standard Gaussian distribution with correlation matrix $\Sigma$, giving a relationship between the residuals that is fully determined by its bivariate marginals. Furthermore, the Gaussian copula is chosen because it is computationally feasible in high dimensions and is closed under marginalisation and conditioning. The Gaussian copula has an asymptotically independent extremal dependence structure (Ledford & Tawn, 1996); however, this property is not restrictive, as the joint tails of $Z$ are not vital for determining the joint tails of $Y_{-1} \mid Y_1$: that distribution is a mixture over $Y_1$, for $Y_1 > v$, so even independent $Z$ can lead to $Y_{-1} \mid Y_1 > v$ being asymptotically dependent. See Section 3.3 for details of how to estimate $\Sigma$.
Unlike the standard Heffernan and Tawn (2004) approach, the residuals are no longer restricted to the sample as the kernel smoothing allows both interpolation and limited extrapolation of the residuals, and the Gaussian copula enables new combinations of Z to occur.
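The semiparametric residual model can be sketched for fully observed residuals: transform each margin through its kernel-smoothed distribution function to Gaussian scores, then estimate the copula correlation matrix. This sketch assumes a Gaussian kernel and Silverman's bandwidth, and uses the complete-data estimator of $\Sigma$; the missing-data estimator is the subject of Section 3.3:

```python
import numpy as np
from scipy.stats import norm

def kernel_cdf(z, zobs, h):
    """Kernel-smoothed marginal distribution function G_i, Eq. (11) style."""
    return norm.cdf((np.asarray(z)[..., None] - zobs) / h).mean(axis=-1)

def normal_scores(resid):
    """Map each residual column to Gaussian margins via Phi^{-1}(G_i(.))
    and estimate the copula correlation matrix Sigma (complete data)."""
    n, d = resid.shape
    zn = np.empty_like(resid)
    for i in range(d):
        h = 1.06 * resid[:, i].std() * n ** (-0.2)   # Silverman's rule
        zn[:, i] = norm.ppf(kernel_cdf(resid[:, i], resid[:, i], h))
    return zn, np.corrcoef(zn, rowvar=False)
```

Because the smoothed CDF never reaches 0 or 1 at the observed points, the normal scores are always finite.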

Tests of the Gaussian copula assumption
A formal test of whether the copula is close to Gaussian is required to avoid the residual joint model being applied inappropriately. For assessing pairwise dependence, visual inspection of the residual distribution is sometimes sufficient; however, such comparisons fail to assess the importance of higher order dependence. In order to assess the full dependence structure, we adopt the methods of Bortot, Coles, and Tawn (2000) for assessing the Gaussian copula in joint tail regions.
Consider the set of independent and identically distributed observations of $Z^N$, which follows a $(d-1)$-dimensional multivariate Gaussian distribution with correlation matrix $\Sigma$. The square of the Mahalanobis distance of the $j$th observation is defined by $T_j = (Z_j^N)^\top \Sigma^{-1} Z_j^N$, which under the Gaussian copula follows a $\chi^2_{d-1}$ distribution. In reality, there are missing (at random) values in the observations of the residual variable $Z^N$, and the percentage of missing values is not consistent across locations. Therefore, the test statistic has to be adapted to account for the different record lengths of data. First, let $d_j$ be the number of observed components of the $j$th observation of $Z^N$, and let $T_j$ be the squared Mahalanobis distance computed from the observed components and the corresponding submatrix of $\Sigma$, so that $T_j \sim \chi^2_{d_j}$ under the null hypothesis. We define the adapted test statistic of Gaussianity to be

$$T^* = \frac{\sum_{j=1}^{n_v} (T_j - d_j)}{\left(2 \sum_{j=1}^{n_v} d_j\right)^{1/2}}, \tag{14}$$

where $n_v$ is the number of observations of $Z^N$. If a particularly large value of $T^*$ is observed, then there is a deviation away from the assumption of multivariate normality. The sampling distribution of $T^*$ under the null hypothesis for a given pattern of missing data is easily derived by Monte Carlo methods, but the statistic has been constructed to have $E(T^*) = 0$ and $\mathrm{Var}(T^*) = 1$ under the null hypothesis of the Gaussian copula whatever the missingness pattern, provided $\min(d_1, \dots, d_{n_v}) \ge 1$ and $\Sigma$ is known.

Handling missing values

Heffernan and Tawn (2004) only consider vectors of complete observations, so with any missing data, the method will be highly inefficient. The data-usage efficiency can be defined as $100 \sum_{j=1}^{n_v} 1(d_j = d-1)/n_v$, with $1$ being the indicator function and $d_j$ as defined in Section 3.2. A previously proposed strategy is to replace each missing variable by a sample of $w$ replicates generated from a $(d-1-d_j)$-dimensional Gaussian approximation to the conditional distribution of the missing elements of $Z_j^N$ given the observed elements, for all $j$ with $d_j < d-1$. This approach has major computational problems when more than a small number of missing values are present, as the subsequent inference must be repeated over the $w$ infilled data sets, where $w$ needs to be reasonably large to remove Monte Carlo noise, for example, $w \in (100, 1000)$. This approach is subsequently referred to as the infill approach.
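An adapted Mahalanobis statistic of this kind can be sketched as follows, with missing components encoded as NaN. The function names and NaN convention are ours; the construction follows the $\chi^2_{d_j}$ standardisation described in Section 3.2, using the fact that a $\chi^2_{d_j}$ variable has mean $d_j$ and variance $2d_j$:

```python
import numpy as np

def adapted_gaussianity_stat(zn, sigma):
    """Adapted Mahalanobis test statistic for Gaussianity of the residual
    copula with missing-at-random components (sketch).
    zn: (n, d) array on Gaussian margins, np.nan marking missing values;
    sigma: the (known or estimated) copula correlation matrix."""
    num, den = 0.0, 0.0
    for row in zn:
        obs = ~np.isnan(row)
        dj = int(obs.sum())
        if dj < 1:
            continue                              # fully missing: no information
        sub = sigma[np.ix_(obs, obs)]             # submatrix for observed parts
        tj = row[obs] @ np.linalg.solve(sub, row[obs])  # ~ chi^2_dj under H0
        num += tj - dj                            # centre by the chi^2 mean
        den += 2.0 * dj                           # accumulate the chi^2 variance
    return num / np.sqrt(den)
```

Under the null hypothesis the statistic is centred and scaled so that large values indicate departure from the Gaussian copula, whatever the missingness pattern.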
We propose using our Gaussian copula model to give a statistically and computationally efficient approach. Equation (12) is used to transform the $Z$ variables, on their original margins, to $Z^N$ on Gaussian margins. Concurrent pairs of observations of $Z^N$ are used to estimate the correlation parameters, provided that a datum exists for a given $Z_i^N$ and $Z_j^N$ pair. Writing $1_{i,k} = 1$ when the $i$th component of the $k$th observation of $Z^N$ is observed (and $0$ otherwise), and $\bar z_i^N$ for the mean of $Z_i^N$ over the concurrent observations, this gives the estimated correlation matrix $\hat\Sigma$ with $(i,j)$th entry

$$\hat\rho_{ij} = \frac{\sum_{k=1}^{n_v} 1_{i,k} 1_{j,k} (z_{ik}^N - \bar z_i^N)(z_{jk}^N - \bar z_j^N)}{\left\{\sum_{k=1}^{n_v} 1_{i,k} 1_{j,k} (z_{ik}^N - \bar z_i^N)^2 \sum_{k=1}^{n_v} 1_{i,k} 1_{j,k} (z_{jk}^N - \bar z_j^N)^2\right\}^{1/2}},$$

and similarly for $\bar z_j^N$. When there are no concurrent data for the pair $(i, j)$, that is, $\sum_{k=1}^{n_v} 1_{i,k} 1_{j,k} = 0$, a covariate model or prior information can be used to give an estimate. As the correlation matrix is estimated over nonoverlapping data sets, there is a possibility that the resulting estimated matrix $\hat\Sigma$ is not positive semidefinite. However, there are eigen-decomposition methods that solve this problem by giving the nearest positive-definite matrix $\tilde\Sigma$ to $\hat\Sigma$ that maintains unit diagonals (Franklin, 2012).
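Both steps, pairwise-concurrent correlation estimation and the eigen-decomposition repair, can be sketched as below. The repair shown (clip negative eigenvalues, then rescale to unit diagonal) is one simple eigen-decomposition method of the kind referred to, not necessarily the specific algorithm used in the paper:

```python
import numpy as np

def pairwise_corr(zn):
    """Correlation matrix from pairwise-concurrent observations
    (np.nan marks a missing value)."""
    d = zn.shape[1]
    s = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            ok = ~np.isnan(zn[:, i]) & ~np.isnan(zn[:, j])
            if ok.sum() > 2:   # need concurrent data for this pair
                s[i, j] = s[j, i] = np.corrcoef(zn[ok, i], zn[ok, j])[0, 1]
    return s

def nearest_pd_corr(s, eps=1e-8):
    """Eigen-decomposition repair: clip negative eigenvalues and rescale
    so the result is positive definite with unit diagonal."""
    w, v = np.linalg.eigh(s)
    s2 = (v * np.maximum(w, eps)) @ v.T   # project onto the PSD cone
    dinv = 1.0 / np.sqrt(np.diag(s2))
    return s2 * np.outer(dinv, dinv)      # restore unit diagonal
```

When a pair has no concurrent data, the corresponding entry would instead come from a covariate model or prior information, as described above.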

Connections with other models
Gaussian assumptions have been made in other work using the Heffernan and Tawn (2004) model, but these differ from what is proposed here. In the original Heffernan and Tawn (2004) paper, for the inference of the regression parameters $(\alpha, \beta)$, a pseudolikelihood is constructed with independent Gaussian residuals, but for subsequent inference on $Z$, this assumption is dropped. Thus, there is in fact no overlap with the approach in Heffernan and Tawn (2004). Motivated by early findings in this paper, in a spatial setting, Tawn et al. (2018) assume that $Z$ is a realisation of a Gaussian process at a set of sites, so there they make an assumption of marginal Gaussianity for $Z$ in addition to the Gaussian copula we assume. In that paper, there is no discussion of how to assess the Gaussian copula model or why it may be appropriate; that is what this paper does.

Simulation of extreme events
The procedure to simulate from our model for $R$, assuming that its first component is large, is an adaptation of the algorithms in Heffernan and Tawn (2004) and Jonathan, Ewans, and Randell (2013). Firstly, we define $q_{i,p}$ as the $p$th quantile of $R_i$; thus, $F_i(q_{i,p}) = p$. The aim is then to simulate $R \mid R_1 > q_{1,p}$. On Laplace margins, this corresponds to simulating $Y \mid Y_1 > v_p$, where $v_p$ is the $p$th quantile of the standard Laplace distribution. Here, we assume $p$ is sufficiently large so that $v_p > v$, where $v$ is the dependence threshold described in Section 2.3.
The steps of the simulation procedure are outlined as follows.
1. Simulate $Z^N$ from a standard $(d-1)$-dimensional Gaussian distribution with correlation matrix $\hat\Sigma$, as defined in (12).
2. Transform $Z^N$ marginally through the 1-dimensional kernel-smoothed distribution functions to produce a sample of $Z = (\hat G_2^{-1}[\Phi(Z_2^N)], \dots, \hat G_d^{-1}[\Phi(Z_d^N)])$.
3. Independently of $Z^N$, draw a value of the conditioning variable $Y_1$ from a standard exponential distribution above $v_p$, that is, $Y_1 = v_p + E$ with $E \sim \mathrm{Exp}(1)$.
4. Derive the simulated value of the conditioned variates $Y_{-1}$, which is a function of $Y_1$, $Z$, and the estimated dependence parameters $(\hat\alpha, \hat\beta)$, via $Y_{-1} = \hat\alpha Y_1 + Y_1^{\hat\beta} Z$. This gives a sample of $Y = (Y_1, Y_{-1})$ with $Y_1 > v_p$.
5. The inverse of the probability integral transform, as given in Equation (3), is used to transform $Y$ back to its original margins of $R = (R_1, \dots, R_d)$, with $R_1 > q_{1,p}$.
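Steps 1-4 above can be sketched in a single function. The inverse $\hat G_i^{-1}$ in Step 2 has no closed form, so this sketch inverts the kernel-smoothed CDF numerically on a grid; the grid resolution and bandwidth rule are our choices:

```python
import numpy as np
from scipy.stats import norm

def simulate_conditional(n_sim, v_p, alpha, beta, resid, sigma, rng):
    """Steps 1-4: simulate (Y_1, Y_{-1}) | Y_1 > v_p under the fitted
    conditional model (sketch). resid: (n, d-1) observed residuals;
    sigma: residual copula correlation matrix; alpha, beta: (d-1,) arrays."""
    n, dm1 = resid.shape
    zn = rng.multivariate_normal(np.zeros(dm1), sigma, size=n_sim)   # step 1
    z = np.empty_like(zn)
    for i in range(dm1):                                             # step 2
        h = 1.06 * resid[:, i].std() * n ** (-0.2)
        grid = np.linspace(resid[:, i].min() - 3 * h,
                           resid[:, i].max() + 3 * h, 512)
        cdf = norm.cdf((grid[:, None] - resid[None, :, i]) / h).mean(axis=1)
        z[:, i] = np.interp(norm.cdf(zn[:, i]), cdf, grid)   # G_i^{-1}(Phi(.))
    y1 = v_p + rng.exponential(size=n_sim)                           # step 3
    y_rest = alpha * y1[:, None] + y1[:, None] ** beta * z           # step 4
    return y1, y_rest
```

Step 5 would then back-transform the sample to the $R$ margins via the inverse of Equation (3) and Equation (2).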
In the simulation of spatially consistent extreme events, we want to ensure that events are simulated conditional on $R$ being extreme for at least one location. We adopt the approach of generating an extreme event conditional on the event $\{\max(F_1(R_1), \dots, F_d(R_d)) > p\}$ with $p$ near 1, or equivalently $\{\exists\, i = 1, \dots, d : R_i > q_{i,p}\}$. After transformation to Laplace margins, this corresponds to simulating $Y$ given $\max(Y_1, \dots, Y_d) > v_p$. To be able to simulate from this conditional distribution using the previous algorithm for simulating from $Y \mid Y_1 > v_p$, we need to determine the conditioning gauge for each event. The approach is to first simulate the identity $I_p$ of the gauge at which the event is largest, with

$$P(I_p = j) = P\left(Y_j = \max(Y) \mid \max(Y) > v_p\right), \quad j = 1, \dots, d,$$

where each of these conditional probabilities can be estimated from our models for $Y \mid Y_k > v$, for $k = 1, \dots, d$. Finally, if $I_p = j$, then we apply the above algorithm for $Y \mid Y_1$ with the index 1 replaced by $j$, and the simulated point is rejected if $\max(Y_{-j}) > Y_j$; that is, Steps 1-5 are repeated until, for the selected gauge $j$, we have $\max(Y_{-j}) < Y_j$.

Estimation of joint extreme events
In many applications, such as the design of flood defence schemes or assessing potential flood losses over an insurance portfolio, interest lies in accurately estimating the probability of rare events across a number of spatial locations or environmental hazards. The Monte Carlo methods described in Section 4.1 are the most effective way to estimate the probabilities of many extreme events. However, as noted in Section 1, there are major limitations with these methods for events that are rare relative to the marginal probability for the conditioning variable. Estimation of these probabilities requires a more careful analysis, which we can achieve for the first time here due to our semiparametric residual distribution model choice. We will illustrate the estimation for both these types of events. Firstly, consider an event $A$ that is extreme in the sense that at least $R_1$ is extreme. Then, there exists a value of $p$ near 1 such that $A \subset \{r \in \mathbb{R}^d : r_1 > q_{1,p}\}$, and hence $P(R \in A) = (1-p)\, P(R \in A \mid R_1 > q_{1,p})$. An estimate of this joint probability is given by

$$\hat P(R \in A) = \frac{1 - p}{N} \sum_{j=1}^{N} 1\left(\hat R^{(j)} \in A\right),$$

where $\hat R^{(1)}, \dots, \hat R^{(N)}$ are independent and identically distributed values simulated from $R \mid R_1 > q_{1,p}$, and $N$ is the number of simulations. However, if $\{R_i; i \in C\}$, with $1 \in C$, is asymptotically independent, then, as $\chi_C = 0$, the conditional probability being estimated by the Monte Carlo methods above is near zero if $A \subset \prod_{i \in C} (q_{i,p}, \infty)$. For such sets $A$, it is better to exploit the Gaussian copula structure and express the result through an integral for which standard numerical integration methods can be used. Specifically, for $A = \prod_{i \in D} (q_{i,p_i}, \infty)$, with $p_1$ near 1, the model gives

$$P(R \in A) = (1 - p_1) \int_0^\infty e^{-y}\, \bar\Phi_{d-1}\!\left(\Phi^{-1}\!\left[\tilde G\!\left(\frac{y_{-1} - \hat\alpha (v_{p_1} + y)}{(v_{p_1} + y)^{\hat\beta}}\right)\right]; \Sigma\right) dy, \tag{16}$$

where $\tilde G(z) = (\hat G_2(z_2), \dots, \hat G_d(z_d))$, $y_{-1} = (y_2, \dots, y_d)$ with $y_i$ being the $p_i$th quantile of a Laplace distribution, all operations are componentwise, and $\bar\Phi_{d-1}(\cdot; \Sigma)$ is the joint survivor function of the standard multivariate Gaussian variable with correlation matrix $\Sigma$.
This result allows us to simplify the calculation of rare event probabilities: the (d − 1)-dimensional inner integral is handled through direct evaluation of the multivariate Gaussian joint survivor function, leaving only a 1-dimensional integral to compute numerically.
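As a rough sketch of how this integral can be evaluated numerically: the names `alpha`, `beta`, and `marg_cdfs` below are illustrative (in practice the residual margins would be kernel-smoothed estimates), and the Laplace upper-tail density e^{-y}/2 for y > 0 is used for the conditioning variable.

```python
import numpy as np
from scipy import integrate, stats

def rare_event_prob(y_crit, alpha, beta, marg_cdfs, Sigma, y1):
    """P(Y_1 > y1, Y_i > y_crit[i] for the other sites) under the fitted
    conditional extremes model with a Gaussian copula for the residuals."""
    alpha, beta, y_crit = map(np.asarray, (alpha, beta, y_crit))
    mvn = stats.multivariate_normal(mean=np.zeros(len(alpha)), cov=Sigma)

    def integrand(y):
        z = (y_crit - alpha * y) / y ** beta            # residual thresholds
        u = stats.norm.ppf([c(zi) for c, zi in zip(marg_cdfs, z)])
        gauss_surv = mvn.cdf(-u)       # P(Z > z) by symmetry of the Gaussian
        return gauss_surv * 0.5 * np.exp(-y)            # Laplace density, y > 0

    val, _ = integrate.quad(integrand, y1, np.inf)
    return val
```

As a sanity check, with alpha = 0 and beta = 0 the second component is independent of the conditioning variable, so the probability factorises into the two marginal tail probabilities.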

SIMULATION STUDY
To assess the performance of our proposed Gaussian copula approach for modelling the joint distribution of the residuals in the conditional multivariate extremes model, we undertake a simulation study to compare it with the empirical approach of Heffernan and Tawn (2004) and with an approach using a multivariate kernel density estimate
\[ \hat{f}(z) = \frac{1}{n_v} \sum_{i=1}^{n_v} \phi_d(z; z_i, H), \]
where the ith kernel φ d (⋅; z i , H) is Gaussian with mean z i , H is a positive definite bandwidth matrix (Wand & Jones, 1994), and {z 1 , … , z n_v } are the observed residuals. The methods are compared via their estimation of the probability
\[ P(X_1 > v_p, \ldots, X_d > v_p), \tag{17} \]
where v p is the common p th marginal quantile, with p = 0.99, 0.998, and 0.999. Data are simulated from a symmetric multivariate extreme value logistic distribution (Tawn, 1990), with dependence parameter γ ∈ (0, 1], the lower and upper limits of γ corresponding to perfect dependence and independence, respectively. For the symmetric logistic distribution and a given dimension d, the true probability in Equation (17) is
\[ \sum_{m=0}^{d} \binom{d}{m} (-1)^m p^{m^{\gamma}}. \]
For all γ < 1, the variables are asymptotically dependent, that is, χ D > 0, and hence, the parameters of the Heffernan and Tawn (2004) model are α = 1 and β = 0. Furthermore, for this distribution, the true copula for Z is not Gaussian, so our model is misspecified. We consider d = 5, 10, and 20 with γ = 0.75 (results with γ = 0.5 are not reported but are similar) and a sample size of 5,000 with 25 replicated data sets and a 0.98 dependence threshold, corresponding to 100 observations being in the joint tail region. Correctly in each case, we find that there is strong evidence to reject the Gaussian copula assumption, at the 5% level, when using the test statistic (14) for each of our simulations. Despite this, we proceed with the Gaussian copula model to see whether this misspecification is important for inference. Table 1 shows results for d = 5, both when the regression parameters are set to their true values and when they are estimated.
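The inclusion-exclusion closed form for the true probability can be computed directly. The sketch below uses the fact that, for the symmetric logistic distribution with a common marginal scale, the m-variate joint CDF evaluated at the common p-quantile equals p raised to the power m^γ.

```python
from math import comb

def joint_exceedance_prob(d, gamma, p):
    """True P(X_1 > v_p, ..., X_d > v_p) for the d-dimensional symmetric
    logistic distribution, via inclusion-exclusion over subsets of size m."""
    return sum((-1) ** m * comb(d, m) * p ** (m ** gamma)
               for m in range(d + 1))
```

Two limiting checks: with γ = 1 (independence) the sum collapses to (1 − p)^d, and as γ → 0 (perfect dependence) it tends to 1 − p.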
For this relatively low-dimensional case, all three methods perform broadly similarly both in terms of their point estimates and bootstrap-based 95% confidence intervals, with all intervals containing the truth. Despite its clear misspecification, the Gaussian copula method gives estimates that are closest to the truth in all six cases. In addition, we see that the multivariate kernel approach performs worst (underestimating) in all cases.
Furthermore, note that good knowledge of the regression parameters (α, β) is more important than the choice of distributional model for Z. This feature is interesting given that much of multivariate extreme value inference has focussed on assuming asymptotic dependence (fixing the regression parameters) and effectively only estimating the distribution of Z in different ways. These results suggest that this focus of attention has been misplaced.
Higher dimensional studies, d = 10 and 20, are compared in Table 2, with the true regression dependence parameters treated as known to enable easier comparison of the different methods for handling the residuals. The multivariate kernel approach now clearly fails when d = 10 and becomes increasingly computationally expensive as d increases, so it is omitted from the d = 20 study. The empirical approach of Heffernan and Tawn (2004) and the Gaussian copula approach again perform broadly similarly, though where they differ the Gaussian method works best. Therefore, even in this case with clear misspecification, the proposed Gaussian copula is highly competitive relative to the existing method. It should be noted that, when there is either no misspecification or there are missing data, the Gaussian copula approach substantially outperforms the empirical approach of Heffernan and Tawn (2004); see Section 6.2.2 for an example of this.

Data
We apply the proposed semiparametric conditional extreme value model to daily mean measurements of river flow data from the National River Flow Archive (NRFA) to answer questions typically posed by flood risk managers. Gauges from the north west region of England were selected, and their locations are given in Figure 1; on average, each gauge has a record length of approximately 30 years. This region has one of the better spatial coverages of data in the UK, and the proportion of missing values in the data is relatively low. The region exhibits varying spatial characteristics; for example, due to changing soil types and elevation, the behaviour is likely to be very different in Cumbria compared with, say, Manchester (in the north and south of the region, respectively). The data set was selected as it has been used for previous spatial flood risk assessments (Lamb et al., 2010; Tawn et al., 2018; Towe, Tawn, Lamb, Sherlock, & Liu, 2016), and it is a region badly affected by the 2015 floods, discussed in Sections 1 and 6.4. For these data, we discuss how our proposed methodology can aid in producing better inferences for rare events at much reduced computational cost and with minimal risk of misspecification error. In Section 6.2, we illustrate all of the steps of the methodology with a basic case study of 10 sites and then undertake a full application to 46 gauges in Section 6.3. We see the 10-site study as important as it lets us look carefully at some of the features of the modelling/inference without getting lost in the volume of the data. In particular, we can look at what happens when large portions of the data are missing. To help investigate how our methods work in the basic case study, we estimate probabilities of extreme events for two data sets.
The original data set, denoted by F, has 1% missing values (0.5% missing conditional on the first site being large), with a missingness pattern that allows use of both the Heffernan and Tawn (2004) method and the infill approach of Keef et al. (2009). The second data set, denoted by M, has 28% of values set to missing in such a way that no complete observations are available (30% missing conditional on the first site being large). In both analyses, the conditioning site is the same and is identified by the triangle in Figure 1. For the full application considering 46 gauges, in Section 6.3, 2% of the data are missing.

Basic case study

Assessing the Gaussian copula
First, we use the original data set to assess our modelling assumptions for these data. Conditioning on R 1 being large, we focus on studying the behaviour of Z N , the residuals after marginal transformation to standard Gaussian margins. A check of the assumption of standard Gaussian margins is given in Figure 2: the empirical quantiles of a standard Normal are plotted against those of the residuals Z N , giving a pooled QQ plot over all margins and replicates of Z N . The different lines in Figure 2, one for each margin of Z N , show no significant deviation from the line of equality; therefore, the marginals satisfy the assumptions of the proposed Gaussian copula model. Pairwise bivariate kernel density estimates for Z N can be seen in Figure 3. From a visual inspection, the pairwise dependence seems close to Gaussianity, although for a couple of pairs, such as (Z N 3 , Z N 5 ), there does seem to be departure from the expected elliptical contours. Figure 3 does not help us assess any higher order dependence, so the test for Gaussianity given in Section 3.2 is performed to assess the assumption of a Gaussian copula more rigorously. The resulting p value of 0.29 is well above the 0.05 significance level; therefore, the assumption of a Gaussian copula seems reasonable.
A benefit of the Gaussian copula approach is that the new method is able to interpolate and extrapolate the observed residuals, giving simulated events that are not simply deterministic functions of observed events. A comparison of these features of the Heffernan and Tawn (2004) and Gaussian copula approaches is illustrated in Figure 4. Under these two approaches, Figure 4 shows data and simulations of (top) Y 2 |Y 1 > v p and (bottom) (Y 2 , Y 3 )|Y 1 > v p , both for p = 0.99. From the top row, our proposed approach is seen to give a continuous distribution for Y 2 |Y 1 , with slightly more variation in Y 2 |Y 1 > v p . This additional variation, which seems realistic given the extremal behaviour of the observed data set, is due to the use of kernel-smoothed marginal distribution functions for Z N . Similarly, from the bottom row, it can be seen that the simulated joint residuals can differ from observed values, due to the Gaussian copula assumption. Collectively, these new features lead to the simulation of a more realistic joint sample with our proposed approach than with the Heffernan and Tawn (2004) model.
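A minimal sketch of this residual simulation step is given below, assuming a fixed Gaussian-kernel bandwidth `bw` for the smoothed margins (the paper's actual bandwidth choice is not reproduced here): draw from the fitted Gaussian copula, then invert the kernel-smoothed marginal CDFs numerically.

```python
import numpy as np
from scipy import stats

def simulate_residuals(z_obs, Sigma, n, bw=0.3, seed=0):
    """z_obs: (n_obs, d-1) observed residuals; Sigma: residual correlation
    matrix.  Returns n residual vectors with a Gaussian copula and
    kernel-smoothed margins, so values interpolate/extrapolate the data."""
    rng = np.random.default_rng(seed)
    w = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=n)
    u = stats.norm.cdf(w)                      # Gaussian copula uniforms
    out = np.empty_like(u)
    grid = np.linspace(z_obs.min() - 3, z_obs.max() + 3, 2000)
    for i in range(z_obs.shape[1]):
        # Kernel-smoothed marginal CDF: mixture of Gaussian kernel CDFs.
        cdf = stats.norm.cdf((grid[:, None] - z_obs[:, i]) / bw).mean(axis=1)
        out[:, i] = np.interp(u[:, i], cdf, grid)  # numerical inverse CDF
    return out
```

Because the margins are smoothed, simulated residuals need not coincide with observed values, which is exactly the interpolation/extrapolation behaviour described above.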

Conditional probabilities for flood risk management
In many flood risk management cases, interest lies in determining the spatial extent of any given flood event. One common risk measure that flood managers are interested in is the probability that, given that a site, say, Site 1, exceeds its pth quantile, at least m other sites also exceed their respective pth quantiles, that is,
\[ \Lambda_{m,p} = P\!\left( \sum_{i=2}^{d} \mathbb{1}(R_i > q_{i,p}) \ge m \,\middle|\, R_1 > q_{1,p} \right). \tag{18} \]
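Given a Monte Carlo sample already drawn from R | R 1 > q 1,p (e.g., via the methods of Section 4.1), this conditional probability can be estimated by counting exceedances; a short sketch:

```python
import numpy as np

def lambda_mp(sims, q, m):
    """Estimate the risk measure above: the fraction of simulated events
    (each conditioned on site 1 exceeding q[0]) in which at least m of the
    other sites exceed their own quantiles q[1:]."""
    exceed = sims[:, 1:] > np.asarray(q)[1:]
    return float(np.mean(exceed.sum(axis=1) >= m))
```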

FIGURE 4
Top row: observed (black) and simulated (grey) joint behaviour of Site 1 and Site 2, given that an extreme event is observed at Site 1. Bottom row: observed (black) and simulated (grey) joint behaviour of Site 2 and Site 3, given that an extreme event is observed at Site 1. Left: the existing method. Right: our proposed method. In all of the figures, the data are shown after transformation to standard Laplace margins

TABLE 3
The estimates (with 95% confidence intervals in parentheses) for the conditional probability 100 Λ m,T , given in Equation (18), with m = 5, using the original (F) and 28% missing (M) data sets. Note. T is the probability that corresponds to a specific annual return period. The Heffernan and Tawn (2004) method cannot be applied to data set M.

For Λ m,p , given in Equation (18) with m = 5, Table 3 provides point estimates and associated 95% confidence intervals, obtained using the parametric bootstrap, for a range of return periods. These estimates are compared using the Heffernan and Tawn (2004) method and two missing value methods (the infill method of Keef et al., 2009, and our proposed Gaussian copula method). The two data sets, denoted F and M, are considered; see Section 6.1.

Probability Heffernan and Tawn (F) Infill (F) Gaussian copula (F) Infill (M) Gaussian copula (M)
For data set F, all three methods produce very similar estimates. This is not surprising for the Heffernan and Tawn (2004) and infill methods, as for 99% of the data these methods are identical. However, for the Gaussian copula, we are using the modelled residual copula for all the data that are extreme at the conditioning site, so finding that the estimate varies so little from that of Heffernan and Tawn (2004) is particularly pleasing. For the F data, the confidence intervals for the two missing data methods are the largest, due to a combination of the additional Monte Carlo uncertainty and the residual marginal distribution smoothing in the respective methods. Here, only 1% of the data were missing, so we would not expect to see any clear improvement from using these missing data methods, which, unlike the Heffernan and Tawn (2004) method, use all partially observed components.
For data set M, it is impossible to obtain estimates from the Heffernan and Tawn (2004) approach, as no complete concurrent observations are available. What is pleasing to see here is that the two missing data methods give broadly similar estimates to those from data set F. In particular, the Gaussian copula model gives estimates that are very close to those using the F data set for all events in Table 3, whereas for the infill method, the estimates are less self-consistent for the rarer of these events. The confidence intervals of the two methods are approximately the same, which is to be expected as both model the missing values using a Gaussian copula but handle the computation in different ways. Naturally, the confidence intervals for the M data are larger than the equivalent ones for the F data.
A critical feature is that the Gaussian copula approach is computationally much quicker, even in this basic case. Specifically, the time to obtain the point estimates using the Gaussian copula is 30% less than with the infill method (assuming 100 infill samples), and this efficiency gain improves dramatically as both the number of sites and the proportion of missing data increase.
The probabilities in Table 3 were estimated through simulation. However, if we were interested in all sites being above a given return level, this corresponds to m = 9 in Equation (18). This probability is incredibly computationally expensive to estimate through Monte Carlo simulation; however, the methods developed in Section 4.2 provide an estimate that avoids Monte Carlo noise, as it is obtained using the formulation (15) divided by 1 − p, with d = 10. Table 4 provides estimates of Λ 9,p for the same return periods as in Table 3, along with the corresponding numerical integration error.

Large-scale study
Here, the entirety of the north west region of England is considered; this equates to 46 sites in our study. The first modelling step is to fit the conditional extreme value model of Heffernan and Tawn (2004), conditioning on each of the 46 gauges in turn. For each of these 46 models, estimates of the dependence parameters α and β are obtained, along with the residuals Z of the model.
The residuals Z N of the model are tested to determine whether they can be characterised by using a Gaussian copula. For each conditioning gauge, in turn, the sampling distribution of the test statistic T * , as given in Section 3.2, is obtained through Monte Carlo simulation and a p value for a Gaussian copula is derived. Figure 5 shows a histogram of the p values with all of the 46 p values above the 5% significance level. Therefore, we can conclude that there is no evidence against modelling the residual distribution with a Gaussian copula. Given this conclusion, it seems reasonable to use the model-based Gaussian copula for the multivariate residual component of the conditional extreme value model of Heffernan and Tawn (2004).
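The Monte Carlo p-value computation can be sketched as follows; the `statistic` argument is a generic stand-in for the paper's test statistic T* (Section 3.2), which is not reproduced in this excerpt, and the procedure simply compares the observed statistic with its sampling distribution under the fitted Gaussian.

```python
import numpy as np

def gaussian_copula_pvalue(zn, statistic, n_sim=200, seed=0):
    """Monte Carlo p value for the Gaussian copula assumption: simulate
    data sets from a Gaussian with the estimated correlation matrix and
    compare the observed statistic with the simulated ones."""
    rng = np.random.default_rng(seed)
    corr = np.corrcoef(zn, rowvar=False)
    t_obs = statistic(zn)
    t_sim = np.array([
        statistic(rng.multivariate_normal(np.zeros(zn.shape[1]), corr,
                                          size=zn.shape[0]))
        for _ in range(n_sim)])
    return float(np.mean(t_sim >= t_obs))
```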
We can use these models to make extrapolations using the Monte Carlo methods given in Section 4.1. These simulations maintain the extremal dependence structure of the observed data set but also generate events that are larger and more varied than those already observed. Two such examples are shown in Figure 6, illustrating how the spatial structure of an event varies depending on where in the region the event is extreme. The two events have been selected to be extreme at two different sites in the region, in Cumbria and Manchester, in the north and south of the region, respectively. In Figure 6a, when the conditioning location is in Cumbria, there is a much wider spatial impact than in Figure 6b, for an event near Manchester. This reflects that, when we condition on Cumbria being extreme, relative to Manchester being extreme, the associated α parameters are larger over many more sites, so the spatial extremal dependence is stronger and extreme events in the north of the region are more widespread than those in the south.
To further study the varying spatial characteristics of extreme flood events, a conditioning site is selected to have an extreme event, and the distribution of the number of other gauges that are also extreme is estimated. This distribution is derived for the same two conditioning sites as in Figure 6. Here, the probability of exactly m other gauges being extreme is Λ m,p − Λ m+1,p , and this is estimated for three return periods. Estimates of Λ m,p − Λ m+1,p are compared in Figure 7 for the two conditioning gauges. There is a clear difference in these estimated probabilities. The estimates show that there is greater clustering of flood events when conditioning on the Cumbria site being large. However, some of this clustering could be explained by the higher density of gauges in this region. Furthermore, the estimates decay to zero, for m > 1, at different rates; thus, events become more localised as they become more extreme, due to asymptotic independence.

Determining the rarity of the storm Desmond event
The methodology is used to determine the rarity of the river flows observed during the December 5, 2015, storm Desmond event. This estimate is derived from the daily mean river flow data discussed in Section 6.1, with the results presented on a daily scale. The observed daily mean river flows are shown in Figure 8a, with the largest values observed near Lancaster and Carlisle. However, when we determine the associated estimated marginal return periods, with inference using the GPD tail model (2), the river flow observed near Lancaster is found to be the most extreme, as shown in Figure 8b. The marginal observational probability for the Lancaster gauge is estimated to be 3.6 × 10 −5 . Figure 8b shows that the event was particularly rare over all of Cumbria and northern Lancashire, but it was extreme at only one of the gauges near Manchester in the south of the study region.
In order to determine the probability P(R 1 > q 1,p 1 , … , R d > q d,p d ) of jointly observing river flows over the region that are worse than the 2015 event, we use both the empirical Heffernan and Tawn (2004) residual approach and our Gaussian copula approach, with the joint probability given by the integral (15). We illustrate the calculations by separately taking the conditioning gauge to be the Cumbrian gauge shown in Figure 6a and the Lancaster gauge identified in Figure 8b. Using the Cumbrian gauge, we estimate the joint probability to be < 1.60 × 10 −12 and 3.70 × 10 −9 using the respective methods, whereas the respective estimates become 9.50 × 10 −10 and 8.00 × 10 −9 using Lancaster. When conditioning on the Cumbrian gauge, we can only bound the joint probability under the empirical Heffernan and Tawn (2004) residual approach, as we obtain no events as extreme as that observed at Lancaster in 10 8 simulated events, all of which exceed the observed 2015 event at the Cumbrian gauge. In contrast, the Gaussian copula approach gives estimated probabilities that are stable with respect to the conditioning gauge and are computationally efficient, in contrast to the existing approach, for such an extreme and widespread event.

DISCUSSION
Through using semiparametric model-based inference, this paper has shown how the methodology of Heffernan and Tawn (2004) can be extended to produce more efficient inferences, particularly as the dimension of the multivariate problem increases. We proposed improvements in the inference for the residual distribution of the Heffernan and Tawn (2004) model, via kernel-smoothed marginal distributions and a Gaussian copula. These methods also improve computational and statistical efficiency in dealing with the problem of missing data that is commonly encountered in environmental data sets.
Our proposed Gaussian copula approach has a downside in that a different correlation matrix Σ is required for each conditioning site. Thus, for d sites, there are d(d − 1)(d − 2)/2 correlation parameters to estimate, that is, O(d 3 ) parameters. As a result, it seems sensible to determine whether there are any known relationships that can help to make the model parsimonious. An approach suggested by a referee was to adopt a semiparametric specification similar to that of de Carvalho and Davison (2014), whereby the different residual densities are interlinked via a tilting term, that is,
\[ g_i(z) \propto g_1(z) \exp(\lambda_i + \kappa_i z), \tag{19} \]
with g i (z) = dG i (z)∕dz, where G i is the limiting distribution in expression (4) when conditioning on variable Y i being large, and with (λ i , κ i ) being constants. If condition (19) holds, the number of parameters reduces to O(d 2 ). Unfortunately, this formulation does not appear to be appropriate for our residual data, either before or after standardisation to Gaussian marginals. An alternative O(d 2 ) approach would be to use a stationary Gaussian process to model Z N (Tawn et al., 2018), but that requires the process to be modelled in an appropriate space. In standard environmental studies, the Euclidean distance between sites is used to explain spatial dependence. However, as shown, for example, by Asadi et al. (2015), Euclidean distance is not always sufficient for capturing the dependence between river flow gauges. A more appropriate metric is the hydrological distance, defined as the distance between the centroids of the catchments associated with each site. This takes into account that two gauges that might be spatially far apart can in fact be similar in nature, as they lie within the same catchment. In order to determine whether this factor could be used to simplify the correlation matrix, four conditioning sites were selected with differing spatial locations and catchment areas.
Conditional on location k, the estimates of the correlation between Z i and Z j (for sites s i and s j ), given that Y k is large, denoted by ρ ij|k for i, j ≠ k, were plotted as a function of both the Euclidean ||(s i , s j )|| E and hydrological ||(s i , s j )|| H distances for each pair. This comparison of the correlation and distance metrics can be seen in Figure 9. As expected, as the distance between pairs of sites increases, the correlation tends to decrease. Interestingly, there is no substantial difference between the explanatory capabilities of Euclidean and hydrological distance. Anomalous behaviour can be seen in Figure 9a: for one of the sites, the residual correlation with all other sites is approximately equal to zero. This site is close to conditioning gauge 68003; therefore, the Heffernan and Tawn (2004) model has already explained essentially all of the extremal behaviour at this gauge, leaving its residuals uncorrelated with those at the other sites. This illustrates that ρ ij|k will depend on s k as well. Other known hydrological characteristics could also be used to explain the residual dependence structure, including variables such as catchment responsiveness and soil type. For example, a chalk catchment is slower to respond to heavy rainfall events than a catchment in North West England (Boorman, Hollis, & Lilly, 1995). Generalising these features is difficult, as we are trying to simplify the correlation of the unexplained behaviour of the extremes rather than of the observed process itself.
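One way to compare the explanatory power of the two distance metrics is to fit a simple correlation decay model, ρ(h) = exp(−h/φ), under each metric and compare the residual sums of squares. The sketch below is hypothetical: the exponential decay form and the starting value are illustrative assumptions, not the paper's model.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_decay(rho, dist):
    """Fit rho(h) = exp(-h / phi) to pairwise residual correlations and
    return the fitted range parameter and the residual sum of squares,
    allowing a like-for-like comparison of two distance metrics."""
    def decay(h, phi):
        return np.exp(-h / phi)
    (phi,), _ = curve_fit(decay, dist, rho, p0=[50.0])
    rss = float(np.sum((rho - decay(dist, phi)) ** 2))
    return phi, rss
```

Applying `fit_decay` with Euclidean and then hydrological distances for the same pairs gives two RSS values whose comparison quantifies the "no substantial difference" observation above.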
This paper has shown that the proposed Gaussian copula model for the joint residual distribution of the Heffernan and Tawn (2004) model is well suited to both asymptotically dependent and asymptotically independent distributions. A simulation study shows, in low- and high-dimensional examples, the benefits of the proposed approach over alternatives for both missing and nonmissing data problems, as well as under misspecification of the Gaussian copula. A case study of river flow data shows the benefits of the method for assessing the risk of an event similar to the storm Desmond event. An analogous analysis using existing methods would have been both incredibly computationally expensive and numerically sensitive to the choice of conditioning variable.