Sub‐asymptotic motivation for new conditional multivariate extreme models

Statistical models for extreme values are generally derived from non‐degenerate probabilistic limits that can be used to approximate the distribution of events that exceed a selected high threshold. If convergence to the limit distribution is slow, then the approximation may describe observed extremes poorly, and bias can only be reduced by choosing a very high threshold at the cost of unacceptably large variance in any subsequent tail inference. An alternative is to use sub‐asymptotic extremal models, which introduce more parameters but can provide better fits for lower thresholds. We consider this problem in the context of the Heffernan–Tawn conditional tail model for multivariate extremes, which has found wide use due to its flexible handling of dependence in high‐dimensional applications. Recent extensions of this model appear to improve joint tail inference. We seek a sub‐asymptotic justification for why these extensions work and show that they can improve convergence rates by an order of magnitude for certain copulas. We also propose a broader class of such extensions that may be of wider value for statistical inference in multivariate extremes.

then the maxima may still be strongly dependent. The class of distributions with this property is elucidated by one of the standard measures of extremal dependence, $\chi = \lim_{u \to 1} \Pr\{F_2(X_2) > u \mid F_1(X_1) > u\}$ (Coles et al., 1999; Joe, 1993), where $X_i \sim F_i$ for $i = 1, 2$. The values $\chi > 0$ and $\chi = 0$ are, respectively, termed asymptotic dependence and asymptotic independence; with a dependent (independent) bivariate extreme value distribution, the largest values of $X_1$ and $X_2$ can (cannot) occur together. A related quantity, $\bar\chi$, lies in the interval $[-1, 1]$ and is used to distinguish among different degrees of asymptotic independence (Coles et al., 1999). Owing to the breadth of the class of asymptotically independent distributions, there have been numerous studies of the sub-asymptotic properties of bivariate maxima. For example, Bofinger and Bofinger (1965) and Bofinger (1970) derived the correlation of componentwise maxima for the bivariate Gaussian and certain other copulas for $n \le 50$; more recent examples are Beranger et al. (2017) and Beranger et al. (2019).
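As a numerical illustration of the definition of $\chi$ (our own example, not taken from the paper), the sketch below evaluates the finite-level quantity $\chi(u) = \Pr\{F_2(X_2) > u \mid F_1(X_1) > u\} = \{1 - 2u + C(u, u)\}/(1 - u)$ for the bivariate logistic (Gumbel) copula with dependence parameter $\gamma$, assuming the standard parametrization whose diagonal is $C(u, u) = u^{2^\gamma}$. In that case $\chi(u) \to 2 - 2^\gamma$ as $u \to 1$. The function name `chi_logistic` is hypothetical.

```python
import math


def chi_logistic(eps: float, gamma: float) -> float:
    """chi(u) at u = 1 - eps for the bivariate logistic (Gumbel) copula.

    Uses the diagonal C(u, u) = u**(2**gamma). Writing u**k - 1 with
    expm1/log1p avoids cancellation when u is very close to 1.
    """
    k = 2.0 ** gamma
    # Pr(U > u, V > u) = 1 - 2u + C(u, u) = 2*eps + (u**k - 1)
    joint_survival = 2.0 * eps + math.expm1(k * math.log1p(-eps))
    return joint_survival / eps  # divide by Pr(U > u) = eps


gamma = 0.5
for eps in (1e-1, 1e-2, 1e-4, 1e-8):
    print(f"u = 1 - {eps:g}: chi(u) = {chi_logistic(eps, gamma):.6f}")
print(f"limit 2 - 2**gamma = {2 - 2 ** gamma:.6f}")
```

With $\gamma = 1$ (independence) the same function tends to zero, as it should for an asymptotically independent pair.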
There have been parallel developments for multivariate threshold methods. The underpinning limit theory is based on point processes and multivariate regular variation (Coles & Tawn, 1991; de Haan, 1985; Resnick, 2007). Some work focuses on how second-order features influence estimators (Cai et al., 2011), whereas other approaches reframe the problem by converting second-order features into the primary term in the limit theory. Ledford and Tawn (1997) take $X_1$ and $X_2$ to have unit Fréchet marginal distributions and consider $\lim_{t\to\infty} t^{1/\eta}\Pr(X_1 > xt, X_2 > yt)$ for fixed $x, y > 0$ and some constant $\eta$ with $0 < \eta \le 1$; this limit is finite and gives a first-order model that smoothly encompasses perfect dependence, asymptotic dependence, asymptotic independence and complete independence. This characterization was later extended to $\lim_{t\to\infty} t^{\lambda(\gamma)}\Pr(X_1 > x t^{\gamma}, X_2 > y t^{1-\gamma})$, where $\lambda(\gamma)$ is a positive function of $\gamma$ ($0 \le \gamma \le 1$) satisfying a range of conditions described in de Valk (2016). These results, and other related asymptotically motivated models (Huser & Wadsworth, 2019; Wadsworth et al., 2017), encompass both asymptotic dependence and asymptotic independence, but they only consider growth rates in the arguments of the joint survivor function that, on Fréchet margins, are linked through a power; on exponential margins, these growth rates are proportional. Furthermore, current results of this type are available only in low dimensions.
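To make the Ledford–Tawn limit concrete, the sketch below (our illustration, under stated assumptions) uses an inverted logistic model whose joint survivor function on standard exponential margins is assumed to be $\exp\{-(e_1^{1/\gamma} + e_2^{1/\gamma})^{\gamma}\}$. On unit Fréchet margins this gives $\eta = 2^{-\gamma}$, and a first-order expansion in $\log t$ suggests $t^{1/\eta}\Pr(X_1 > xt, X_2 > yt) \to (xy)^{-1/(2\eta)}$; the code checks this numerically. The function `scaled_joint_survival` is hypothetical.

```python
import math


def scaled_joint_survival(t: float, x: float, y: float, gamma: float) -> float:
    """t**(1/eta) * Pr(X1 > x t, X2 > y t) for an inverted logistic model on
    unit Frechet margins, with eta = 2**(-gamma). Computed on the log scale."""

    def exponential_level(s: float) -> float:
        # e such that exp(-e) = Pr(X > s) = 1 - exp(-1/s) for unit Frechet X
        return -math.log(-math.expm1(-1.0 / s))

    e1, e2 = exponential_level(x * t), exponential_level(y * t)
    log_joint = -((e1 ** (1.0 / gamma) + e2 ** (1.0 / gamma)) ** gamma)
    inv_eta = 2.0 ** gamma
    return math.exp(inv_eta * math.log(t) + log_joint)


gamma, x, y = 0.5, 2.0, 3.0
eta = 2.0 ** (-gamma)
limit = (x * y) ** (-1.0 / (2.0 * eta))
for t in (1e2, 1e4, 1e8, 1e12):
    print(f"t = {t:g}: {scaled_joint_survival(t, x, y, gamma):.5f} (limit {limit:.5f})")
```

For $x \ne y$ the discrepancy decays only like $1/\log t$ in this example, a simple instance of slow sub-asymptotic convergence.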
In this paper, we focus on the conditional extremal model of Heffernan and Tawn (2004), which places no preconditions on the relative growth of the large variables and has been widely used for substantive applications owing to its ability to handle a wide range of joint tail dependencies, its parsimony, its simple computational properties and its applicability to high dimensions (Tawn et al., 2018). To simplify the notation, we deal with the bivariate case, but extension to the general multivariate case, of both existing methods and our developments, is straightforward.
This model was originally presented for marginally Gumbel distributed random vectors, but Keef et al. (2013) showed that formulation on the Laplace scale is preferable when positive or negative dependence is possible, so we first transform $(X_1, X_2)$ to random variables $(X, Y)$ with Laplace margins via the probability integral transform $X = F_L^{-1}\{F_1(X_1)\}$, where $F_L$ denotes the standard Laplace distribution function, and similarly for $Y$, preserving the dependence structure through the copula, according to Sklar's (1959) representation theorem.
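A minimal sketch of the transformation to Laplace margins follows. Because $F_1$ is unknown in practice, it substitutes the empirical distribution function $\mathrm{rank}/(n+1)$, a common practical choice that is our assumption here rather than something prescribed in the text; the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def laplace_cdf(x: float) -> float:
    """Standard Laplace distribution function F_L."""
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)


def laplace_quantile(p: float) -> float:
    """Inverse F_L^{-1}(p) for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    return math.log(2.0 * p) if p < 0.5 else -math.log(2.0 * (1.0 - p))


def to_laplace_margin(sample: Sequence[float]) -> List[float]:
    """Map one margin to the Laplace scale, using rank/(n + 1) as an
    empirical stand-in for the unknown F_1 (ties are broken by position)."""
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return [laplace_quantile(r / (n + 1)) for r in ranks]


print(to_laplace_margin([3.0, 1.0, 2.0]))  # ranks 3, 1, 2 give p = 0.75, 0.25, 0.5
```

Applying the same map to each margin separately leaves the ranks, and hence the copula, unchanged.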
Under conditions specified by Heffernan and Tawn (2004), which include the joint distribution of $(X, Y)$ being in the standard domain of attraction of the bivariate extreme value distribution, the conditional extremal model presupposes the existence of normalizing functions $a(\cdot): \mathbb{R}_+ \to \mathbb{R}$ and $b(\cdot): \mathbb{R}_+ \to \mathbb{R}_+$ such that, for $x > 0$,

$$\lim_{u\to\infty} \Pr\{Z_u \le z,\ X - u > x \mid X > u\} = H(z)\,e^{-x}, \qquad Z_u = \{Y - a(X)\}/b(X), \qquad (2)$$

where $H(\cdot)$ is a non-degenerate distribution function with no mass at infinity. Under mild assumptions on the distribution of $(X, Y)$, results in Heffernan and Resnick (2007) imply that

$$a(x) = x\,L_a(x), \qquad b(x) = x^{\beta}L_b(x), \qquad (3)$$

with the functions $L_a(x)$ and $L_b(x)$ slowly varying: $L(xt)/L(x) \to 1$ for any fixed $t > 0$ as $x \to \infty$, where $L$ is either $L_a$ or $L_b$. Two aspects of the limit distribution should be noted. First, the Laplace margins imply that the exponential limit for $X - u$ is exact for any positive $u$. Second, (2) corresponds to $Z_u$ and $X$ becoming independent as $u \to \infty$, so the finite-$u$ distribution $H_u$ of $Z_u$ depends less and less on $X$ as the limit is approached; that is, $H_u \to H$.
In order to construct a statistical model, Heffernan and Tawn (2004) and Keef et al. (2013) assume that the limit on the right-hand side of (2) holds exactly above some finite $u$, that is, $H_u = H$, and they adopt parametric families for $a(\cdot)$ and $b(\cdot)$ that satisfy (3), yield a parsimonious model and encompass a wide range of asymptotic dependence and asymptotic independence structures. By considering the forms of $a(\cdot)$ and $b(\cdot)$ in a broad class of copulas, they propose taking canonical parametric forms for $a$ and $b$, that is,

$$a_0(x) = \alpha x, \qquad b_0(x) = x^{\beta}, \qquad \alpha \in [-1, 1],\ \beta < 1, \qquad (4)$$

which include all the normings they found and correspond to approximating the slowly varying functions by $L_a(x) \approx \alpha$ and $L_b(x) \approx 1$ in expression (3). The latter is equivalent to setting $L_b(x) = b$ for any constant $b > 0$, with the change in norming absorbed into the variance of $H(\cdot)$. If $\alpha = 1$ and $\beta = 0$, then $X$ and $Y$ are asymptotically dependent with $\chi = 1 - \int_0^\infty H(-z)\,e^{-z}\,dz$, and otherwise they are asymptotically independent with $\chi = 0$ and the value of $\bar\chi$ dependent on the upper tail form of $H$.
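The expression for $\chi$ in the asymptotically dependent case can be evaluated numerically for any candidate $H$. The sketch below uses the composite trapezoidal rule on a truncated range. Taking $H$ to be standard normal is purely an illustrative assumption; for that choice, integration by parts gives the closed form $\chi = 1/2 + e^{1/2}\{1 - \Phi(1)\} \approx 0.762$, which serves as a check. `chi_from_H` is a hypothetical helper.

```python
import math
from statistics import NormalDist
from typing import Callable


def chi_from_H(H: Callable[[float], float], upper: float = 40.0,
               steps: int = 40_000) -> float:
    """chi = 1 - int_0^inf H(-z) exp(-z) dz for the case alpha = 1, beta = 0,
    approximated by the composite trapezoidal rule on [0, upper]."""
    h = upper / steps
    total = 0.5 * (H(0.0) + H(-upper) * math.exp(-upper))
    for k in range(1, steps):
        z = k * h
        total += H(-z) * math.exp(-z)
    return 1.0 - h * total


std_normal = NormalDist()
chi = chi_from_H(std_normal.cdf)
closed_form = 0.5 + math.exp(0.5) * (1.0 - std_normal.cdf(1.0))
print(f"trapezoidal: {chi:.6f}, closed form: {closed_form:.6f}")
```

Truncating at `upper = 40` discards a term of order $e^{-40}$, which is negligible at double precision.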
Evidence is emerging that the canonical norming functions (4) are not optimal for all theoretical copulas or in statistical practice. Papastathopoulos and Tawn (2016) found examples of the inverted multivariate extreme value copula (i.e., the lower joint tail of the multivariate extreme value copula) for which more general forms of $L_a(x)$ and $L_b(x)$ in (3) are required. Tendijck et al. (2020) and Simpson et al. (2020) have also found improved fits using $a_0(x) = \alpha_0 + \alpha x$ for some constant $\alpha_0$.
In this paper, we study possible theoretical justifications for this improved performance by exploring the sub-asymptotic behaviour of the conditional multivariate extreme value limit (2) for some well-studied copulas. We quantify the relative benefits of different characterizations in (3) by determining their respective rates of convergence in (2). We also explore whether relaxing the limiting independence assumption for X and Z u can further improve the rates of convergence. Another motivation for our study is that when simulating data on which to assess the performance of methods to fit the conditional model (e.g., Lugrin, 2018), the estimates of a 0 (x) and b 0 (x) for x > u can misleadingly suggest a poor fit, as it is a(x) and b(x) for x > u that are being estimated; sub-asymptotic forms for a(x) and b(x) are helpful in providing a baseline for comparison.
The sub-asymptotic behaviours that we find suggest novel parsimonious sub-asymptotic parametric forms for a(x) and b(x), which can reduce the sensitivity of inferences to the choice of threshold u and enable a lower threshold to be used in practice. This is important, as small differences in parameter estimates and uncertainty at finite levels can lead to large differences when extrapolating to rarer events.
Section 2 introduces the framework used to study the sub-asymptotic behaviour of the conditional tail model and our rate of convergence metrics. In Section 3, we consider three copulas for which incorporating the sub-asymptotic structure can lead to improved convergence; the proofs are in the supporting information. In Section 4, we unify our findings and propose sub-asymptotic parametric models that extend the Heffernan-Tawn class of norming functions.

2 | CONVERGENCE FORMULATIONS
The right-hand side of expression (2) encapsulates the limiting conditional independence of $Z_u = \{Y - a(X)\}/b(X)$ and the excesses $X - u$ for large $X$. We first consider the marginal limiting behaviour of $Z$. Under further assumptions, relating to convergence and existence of joint densities, Heffernan and Resnick (2007), Resnick and Zeber (2014) and Wadsworth et al. (2017) show that

$$\lim_{x\to\infty}\Pr\{Y \le a(x) + b(x)z \mid X = x\} = H(z), \qquad (5)$$

where $a(\cdot)$, $b(\cdot)$ and $H(\cdot)$ are the same as in (2).
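In practice, the residuals $Z_u$ are formed from the exceedances of a threshold $u$, with $a(\cdot)$ and $b(\cdot)$ replaced by the canonical forms (4). The minimal sketch below simply returns the pairs $(X - u, Z_u)$ whose limiting independence (2) expresses; the name `ht_residuals` is hypothetical, and in a real analysis $\alpha$ and $\beta$ would be estimated rather than supplied as here.

```python
from typing import List, Sequence, Tuple


def ht_residuals(x: Sequence[float], y: Sequence[float], u: float,
                 alpha: float, beta: float) -> List[Tuple[float, float]]:
    """Pairs (X - u, Z) for observations with X > u, where
    Z = {Y - a0(X)} / b0(X) with canonical norming a0(x) = alpha*x, b0(x) = x**beta."""
    if u <= 0:
        raise ValueError("threshold u must be positive on the Laplace scale")
    return [(xi - u, (yi - alpha * xi) / xi ** beta)
            for xi, yi in zip(x, y) if xi > u]


pairs = ht_residuals([1.0, 3.0, 5.0], [0.5, 2.0, 4.0], u=2.0, alpha=0.5, beta=0.5)
print(pairs)
```

A dependence between the two coordinates of these pairs at moderate $u$ is one symptom of the sub-asymptotic effects studied below.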
The purpose of our sub-asymptotic analysis is to characterize the behaviour of the remainder terms, defined in the notation of (4) by $a(x) - a_0(x) \sim r_a(x)$ and $b(x) - b_0(x) \sim r_b(x)$, where $r_a(x)$ and $r_b(x)$ satisfy $r_a(x) = o\{a_0(x)\}$ and $r_b(x) = o\{b_0(x)\}$ as $x \to \infty$, and are to be interpreted as the leading-order terms only in the differences $a(x) - a_0(x)$ and $b(x) - b_0(x)$, respectively. Specifically, we consider the second-order normalization for $a(\cdot)$ and $b(\cdot)$, with

$$a_1(x) = a_0(x) + r_a(x), \qquad b_1(x) = b_0(x) + r_b(x). \qquad (6)$$

With these sub-asymptotic forms, we are able to refine the normalization of $Y$ in (5), yielding a sub-asymptotic conditional distribution $H_x(z)$, with $H_x(z) \to H(z)$ as $x \to \infty$. Heffernan and Tawn (2004) gave the rate of convergence of the conditional distribution for various copula models in terms of the remainder $r_0(x, z)$ in (8), which tends to zero as $x \to \infty$, with $(X, Y)$ on the Gumbel scale, finding that the rate at which $r_0(x, z) \to 0$ did not depend on $z$. We shall need similar results with Laplace margins.
We consider how much we can improve the convergence rate of $r_0(x, z)$, when using the sub-asymptotic norming, by studying the rate of convergence to zero of the remainder $r_1(x, z)$ in (9). We also want to quantify the sub-asymptotic remainder, using $r_1^{(s)}(x, z)$ in (10). We hope to show that $r_1^{(s)}(x, z) = o\{r_1(x, z)\}$ and $r_1(x, z) = o\{r_0(x, z)\}$ as $x \to \infty$ for all $z$, and that the rates of convergence to zero of $r_0$, $r_1$ and $r_1^{(s)}$ do not depend on $z$. Section 3 gives two examples where this improved convergence is achieved and one where it is impossible to find better normalizations than $a_0(x)$ and $b_0(x)$. We shall present these rates on a scale that is invariant to the marginal choice, by converting to a return period $n$, where $\Pr(X > x) = n^{-1}$.
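On Laplace margins, $\Pr(X > x) = e^{-x}/2$ for $x \ge 0$, so the level with return period $n$ is $x = \log(n/2)$ and $\log n = x + \log 2$; rates stated in $\log n$ therefore translate directly into rates in $x$. A small sketch of this conversion, with hypothetical function names:

```python
import math


def laplace_level(n: float) -> float:
    """Laplace-scale level x with Pr(X > x) = 1/n; valid for n >= 2 (x >= 0)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return math.log(n / 2.0)


def return_period(x: float) -> float:
    """Inverse map n = 1/Pr(X > x) = 2 exp(x), for x >= 0."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return 2.0 * math.exp(x)


for n in (1e2, 1e4, 1e6):
    x = laplace_level(n)
    print(f"n = {n:g}: x = {x:.3f}, log n = {math.log(n):.3f}")
```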
If we choose $x$ such that $\Pr(X > x) = 1/n$, then the rate of convergence to the limit distribution is $O(\log\log n/\sqrt{\log n})$ using the ultimate norming in (8), which is improved to $O(1/\sqrt{\log n})$ by the sub-asymptotic norming in (9), and the sub-asymptotic remainder (10) is $O\{(\log\log n)^2/(\log n)^{3/2}\}$.
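The slow convergence can be seen numerically for the Gaussian copula, the case to which Theorem 1 refers, because $\Pr(Y \le y \mid X = x)$ is available exactly on the normal scale. The sketch below assumes correlation $\rho = 0.5$, the standard Heffernan–Tawn normings for this copula, $a_0(x) = \rho^2 x$ and $b_0(x) = x^{1/2}$, and the limit $H = N\{0, 2\rho^2(1 - \rho^2)\}$. The optional location correction $(1 - \rho^2)\log(x)/2$ is our own heuristic, obtained from the Mills-ratio approximation to the normal tail; it is not necessarily the paper's $a_1$ in (11), and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_score(v: float) -> float:
    """Phi^{-1}(F_L(v)) for a Laplace value v, using the smaller tail
    probability so that large |v| does not lose precision."""
    if v >= 0:
        return -STD_NORMAL.inv_cdf(0.5 * math.exp(-v))
    return STD_NORMAL.inv_cdf(0.5 * math.exp(v))


def cond_cdf(y: float, x: float, rho: float) -> float:
    """Exact Pr(Y <= y | X = x) for a Gaussian copula with correlation rho
    on Laplace margins: W2 | W1 = w1 ~ N(rho * w1, 1 - rho**2)."""
    s = math.sqrt(1.0 - rho ** 2)
    return STD_NORMAL.cdf((normal_score(y) - rho * normal_score(x)) / s)


def remainder(x: float, z: float, rho: float, log_term: bool) -> float:
    """Pr{Y <= a(x) + sqrt(x) z | X = x} - H(z), with a(x) = rho**2 x, plus,
    if log_term, the heuristic correction (1 - rho**2) log(x) / 2 (an assumption)."""
    a = rho ** 2 * x
    if log_term:
        a += 0.5 * (1.0 - rho ** 2) * math.log(x)
    H = NormalDist(0.0, math.sqrt(2.0 * rho ** 2 * (1.0 - rho ** 2)))
    return cond_cdf(a + math.sqrt(x) * z, x, rho) - H.cdf(z)


rho = 0.5
for x in (20.0, 50.0, 200.0):
    r0 = remainder(x, 0.0, rho, log_term=False)
    r1 = remainder(x, 0.0, rho, log_term=True)
    print(f"x = {x:5.0f} (n = {2 * math.exp(x):.2e}): r0 = {r0:+.4f}, with log term = {r1:+.4f}")
```

Even at $x = 200$, a return period of order $10^{87}$, the remainder under the ultimate norming is still of the order of several percent in this sketch, whereas removing the logarithmic location term shrinks it markedly.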
If $b_0(x) = 1 + (\rho^2 x)^{1/2}$, then Theorem 1 still holds for all $\rho$, including $\rho = 0$, though the variances of $H$ and $H_x$ no longer have the $\rho^2$ term and the third-order terms of $a_1(x)$ and $b_1(x)$ diverge as $\rho \to 0$. Moreover, $H_x$ is not Gaussian for finite $x$ and has a truncated lower tail, with the truncation diminishing as $x \to \infty$.
When assessing the performance of different methods to fit the Heffernan and Tawn model to simulated data above a finite threshold $u$, it is tempting to use the limiting norming functions $a_0(x)$ and $b_0(x)$ in (4) as the true values of the location and scale functions in (2). In the case of simulated data from the Gaussian copula, Theorem 1 shows that this can be misleading, as the sub-asymptotic norming $a_1(\cdot)$, $b_1(\cdot)$ gives a better approximation to the location and scale functions above $u$. By replacing $x$ by the threshold $u$ in the logarithmic terms of (11) and taking $u$ large, we derive second-order approximations (12) and (13) for $\alpha = a_1(x)/x$ and $\beta = \log b_1(x)/\log x$. Figure 1 illustrates the convergence of these approximations when $\rho = 0.5$, for values of $u$ corresponding to Laplace quantiles from 0.975 to 0.99998, for both the simplified forms in (12) and (13) and those including the next-order term from (11). Convergence is very slow, so it makes sense to consider second-order approximations when assessing the adequacy of finite-sample estimates. To give an idea of the amount of data needed to reach such quantiles, we change the scale of the abscissa to the return period, $T(x) = [n_Y\{1 - F_L(x)\}]^{-1}$ years, with $F_L(\cdot)$ the Laplace distribution function, $x$ any quantile on the Laplace scale and $n_Y = 365.25$ the number of observations per year. Even with the equivalent of more than 100 years of daily data, the parameters differ strikingly from their asymptotic values.
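The return-period scale used for the abscissa is easy to reproduce. The sketch below converts the Laplace quantiles quoted above into years of daily data with $n_Y = 365.25$; the helper names are hypothetical.

```python
import math

N_Y = 365.25  # observations per year for daily data, as in the text


def laplace_quantile(p: float) -> float:
    """Standard Laplace quantile F_L^{-1}(p), 0 < p < 1."""
    return math.log(2.0 * p) if p < 0.5 else -math.log(2.0 * (1.0 - p))


def laplace_survival(x: float) -> float:
    """1 - F_L(x) for the standard Laplace distribution."""
    return 1.0 - 0.5 * math.exp(x) if x < 0 else 0.5 * math.exp(-x)


def return_period_years(x: float, n_y: float = N_Y) -> float:
    """Return period in years of the Laplace level x: 1 / [n_y {1 - F_L(x)}]."""
    return 1.0 / (n_y * laplace_survival(x))


for p in (0.975, 0.99, 0.999, 0.9999, 0.99998):
    x = laplace_quantile(p)
    print(f"p = {p}: x = {x:6.3f}, return period = {return_period_years(x):8.2f} years")
```

The upper end of the range, the 0.99998 quantile, corresponds to roughly 137 years of daily observations.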

| Logistic distribution
Let $(X, Y)$ have a bivariate logistic distribution with Laplace margins, with $V$ given at (14). In the following, we do not consider the case $\gamma = 1$, corresponding to complete independence. The degree of asymptotic

Theorem 3. Let $(X, Y)$ have a bivariate logistic distribution with dependence parameter $\gamma \in (0, 1)$ and Laplace margins. Then the ultimate normings (4) for $Y$ given that $X = x$, with $x$ large, are $a_0(x) = x$ and $b_0(x) = 1$; the sub-asymptotic normings (6) are identical to $a_0$ and $b_0$, and $r_0(x, z) = O\{r_1(x, z)\} = O\{r_1^{(s)}(x, z)\}$.
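For the logistic model, the conditional distribution function is available exactly as $\partial C/\partial u$, so the statement that $a_0(x) = x$ and $b_0(x) = 1$ cannot be improved upon can be checked numerically. The sketch below assumes the standard parametrization $V(x, y) = (x^{-1/\gamma} + y^{-1/\gamma})^{\gamma}$, which we take to match (14), and the limit $H(z) = (1 + e^{-z/\gamma})^{\gamma - 1}$ that follows from the same derivative as $x \to \infty$; the function names are hypothetical. In this example the remainder shrinks like $e^{-x}$, that is, in proportion to $1/n$ with $n = 2e^{x}$, far faster than the logarithmic rates found for the Gaussian copula.

```python
import math


def logistic_cond_cdf(y: float, x: float, gamma: float) -> float:
    """Exact Pr(Y <= y | X = x) for the bivariate logistic (Gumbel) copula
    C(u, v) = exp[-{(-log u)^(1/gamma) + (-log v)^(1/gamma)}^gamma]
    on standard Laplace margins, for x, y >= 0, via dC/du."""
    tail_x, tail_y = 0.5 * math.exp(-x), 0.5 * math.exp(-y)  # 1 - F_L
    lu, lv = -math.log1p(-tail_x), -math.log1p(-tail_y)      # -log u, -log v
    s = lu ** (1.0 / gamma) + lv ** (1.0 / gamma)
    copula = math.exp(-(s ** gamma))
    # dC/du = C(u, v) * s^(gamma - 1) * (-log u)^(1/gamma - 1) / u
    return copula * s ** (gamma - 1.0) * lu ** (1.0 / gamma - 1.0) / (1.0 - tail_x)


def logistic_limit_H(z: float, gamma: float) -> float:
    """Limit H(z) = (1 + exp(-z/gamma))^(gamma - 1) under a0(x) = x, b0(x) = 1."""
    return (1.0 + math.exp(-z / gamma)) ** (gamma - 1.0)


def r0(x: float, z: float, gamma: float) -> float:
    """Remainder Pr{Y <= x + z | X = x} - H(z) under the ultimate normings."""
    return logistic_cond_cdf(x + z, x, gamma) - logistic_limit_H(z, gamma)


gamma = 0.5
for x in (4.0, 8.0, 12.0):
    print(f"x = {x:4.1f}: r0(x, 0) = {r0(x, 0.0, gamma):+.3e}, 1/n = {0.5 * math.exp(-x):.3e}")
```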

4 | SUB-ASYMPTOTIC MODEL
Based on the examples studied in Section 3, we now suggest a class of sub-asymptotic extensions of the Heffernan and Tawn (2004) model that improves convergence rates relative to the limit model and contains the models of Tendijck et al. (2020) and Simpson et al. (2020) and all the terms that improve convergence rates for the three copulas studied above. This model should yield better statistical inferences than the canonical formulation (4). The proposed extension is parsimonious, with just two further parameters in its simplest form, where $\alpha$ and $\beta$ are from the first-order norming functions $a_0$ and $b_0$, $\gamma_a > -1$, $\gamma_b \ge 0$, and the functions $L_a$, $L_b$ are slowly varying at infinity.
For statistical modelling, we must specify $L_a$ and $L_b$, and, as in earlier work (e.g., Ledford & Tawn, 1996), we fix them to be constant above some threshold $u$, that is, $L_a(x) = \delta_a$ and $L_b(x) = \delta_b$ for $x > u$. Second-order effects are hard to estimate, so in practice it may suffice to set $\gamma_a = \gamma_b = 1$.

AUTHOR CONTRIBUTIONS
This work is part of the PhD thesis of TL, jointly supervised by ACD and JAT. The bulk of the theoretical development was done by TL, with help from ACD and, principally, JAT. All authors contributed to the writing of the paper.

FINANCIAL DISCLOSURE
This research was partially supported by the Swiss National Science Foundation.