Spatial deformation for nonstationary extremal dependence

Modeling the extremal dependence structure of spatial data is considerably easier if that structure is stationary. However, for data observed over large or complicated domains, nonstationarity will often prevail. Current methods for modeling nonstationarity in extremal dependence rely on models that are either computationally difficult to fit or require prior knowledge of covariates. Sampson and Guttorp (1992) proposed a simple technique for handling nonstationarity in spatial dependence by smoothly mapping the sampling locations of the process from the original geographical space to a latent space where stationarity can be reasonably assumed. We present an extension of this method to a spatial extremes framework by considering least squares minimization of pairwise theoretical and empirical extremal dependence measures. Along with some practical advice on applying these deformations, we provide a detailed simulation study in which we propose three spatial processes with varying degrees of nonstationarity in their extremal and central dependence structures. The methodology is applied to Australian summer temperature extremes and UK precipitation to illustrate its efficacy compared with a naive modeling approach.


Introduction
Statistical methodology for spatial extremes can increasingly handle data sampled at more observation locations.If these observations are taken over large domains with complex features, then there is a strong chance that the data will exhibit spatial non-stationarity in both the marginal distributions and dependence structure.Marginal non-stationarity can often be dealt with by site-wise modelling and transformation.However, there are currently few methods to deal with non-stationarity in extremal dependence structures, and a typical approach is to falsely assume stationarity when fitting spatial extremes models.This may be appropriate when modelling data sampled over small and/or homogeneous regions in space, but as we will illustrate through the examples in Section 3, this assumption is not realistic for many datasets with larger spatial domains.
Non-stationarity in the spatial dependence structure has been studied by Huser and Genton (2016) in the context of max-stable models, through incorporation of a non-stationary variogram.
However, this approach requires knowledge of relevant covariates, and asymptotically dependent maxstable models for spatial extremes have been shown to be too inflexible for many spatial datasets (Wadsworth and Tawn, 2012, Davison et al., 2013, Huser et al., 2017, Huser and Wadsworth, 2019).
Another approach is to assume local stationarity for model fitting, see Blanchet and Creutin (2017), Castro-Camilo and Huser (2019).This framework is well-suited to modelling processes with shortrange dependence but is unlikely to fully capture dependence at large distances.Cooley et al. (2007) and Blanchet and Davison (2011) account for non-stationarity by transforming their spatial domain of interest to some new 'climate space' in which observation locations with similar characteristics are grouped closer together.Again, this approach requires access to relevant covariates and a deeper understanding of the processes which are being modelled.
In this work we develop a computationally quick and simple method, which does not require prior knowledge of covariates and which can be applied before fitting any model suited to spatial extremes.Our method uses spatial deformation and is based on the work of Sampson and Guttorp (1992) and Smith (1996), which has not been fully adapted for use in a spatial extremes framework.
The deformation methodology may reveal physical features and/or covariates that can be incorporated into a spatial extremes model, removing the need for models with complex dependence structures.Wadsworth and Tawn (2019) applied the deformation method of Smith (1996) before fitting a conditional spatial extremes model to the same Australian summer temperatures data that we explore in Section 4.1.However, because this method is not tailored to extremal dependence, it was neccesary to assume that patterns in non-stationarity were similar for both the extremal and nonextremal dependence structures.Youngman (2020) and Chevalier et al. (2017) provide extensions of the Sampson and Guttorp (1992) methodology and fit models for spatial extremes using deformations: a Gaussian process using a censored pairwise likelihood and a max-stable model, respectively.
Although these models may be reasonable for some processes, use of either puts restrictions on the types of dependence that the process can exhibit.We look to develop a method that makes no strong assumptions on the extremal dependence structure.
The remainder of this section provides an overview of existing methodology for spatial deformation and modelling of spatial extremes.Our developments of the spatial deformation methodology are detailed in Section 2. We present a simulation study in Section 3, which is usually absent from the literature on spatial deformations.This study is used to convey that our adaptations to the deformation methodology are necessary when considering extremal dependence and that our method can be used for different processes with a wide range of extremal dependence structures.Finally, we apply our method to temperature and precipitation datasets in Section 4, and conclude with a discussion in Section 5.

Non-stationary spatial processes
The spatial deformation approach for handling non-stationarity in spatial processes was first proposed by Sampson and Guttorp (1992) and Guttorp and Sampson (1994), with further developments by Meiring et al. (1997), see Sampson (2010, Ch. 9.5).The underlying principle of their approach is that a smooth non-linear transformation can be used to map the sampling locations of a process from a geographical plane, or G-plane, to some latent space, which they name a D-plane, or dispersion plane.Within the D-plane, the dependence structure of the process is assumed to be both stationary and isotropic, and the usual statistical inferences can be made using stationary geostatistical models.
To obtain the D-plane, optimisation techniques are used to minimise some objective function which is associated with a stationary geostatistical model.Here Sampson and Guttorp (1992) use multidimensional scaling and a stationary spatial dispersion function, whereas further work proposed by Smith (1996) uses the likelihood for a stationary Gaussian process.Our approach is to change this objective function for one which is associated with a stationary spatial extremes model, such as the max-stable, or inverted max-stable, processes.
We begin by assuming we have realisations Z = {Z 1 , . . ., Z N } from a spatial field observed at sampling locations s 1 , . . ., s d , and so we have Z k = {Z k (s 1 ), . . ., Z k (s d )} for all k = 1, . . ., N .We require some smooth mapping function from the G-plane to the D-plane, given by f (s i ) = s * i for i = 1, . . ., d, where s i = (x i , y i ) and s * i = (x * i , y * i ) are the corresponding locations in the D-plane.Both Sampson and Guttorp (1992) and Smith (1996) propose the use of thin-plate splines to achieve this mapping.However, we note that under certain conditions on the correlation structure, analytical forms for f (•) do exist.Perrin and Meiring (1999) prove that this mapping is identifiable assuming differentiability of the stationary and isotropic correlation function used for fitting and Perrin and Senoussi (2000) derive analytical forms for f (•) under the same assumption, with extensions to anisotropic correlation structures.As these results are available only for correlation functions, and not for extremal dependence functions, we instead use the more flexible thin-plate spline approach.
A thin-plate spline is a mapping function f (•), passing through a finite number of data points Here we have denoted f * the 'true' function that we wish to estimate with the thin-plate spline, f , and f * i are observations.Green and Silverman (1994) give a solution to this problem in the form where and g i (x, y) = h 2 i log h i , with h i the Euclidean distance between (x, y) and (x i , y i ).This represents f as the sum of linear terms and n radial basis functions with centres at the observed data locations (x i , y i ) and the constraints are in place to ensure that the system of equations does not become overdetermined.
An interpolating spline satisfies f * i = f (x i , y i ) for all i = 1, . . ., n, whereas we desire a smoothing spline; this can be created by minimising for some smoothing parameter α > 0. Sampson and Guttorp (1992) give a method for estimating α in the context of multidimensional scaling, but here we take the approach of Smith (1996), who uses a restricted representation of (1) instead.A subset of m radial basis functions is used and so we let The choice of this subset is discussed in Section 3.
The function in (1) maps R 2 to R, so the spline is applied twice with different parameter estimates to produce both components.Smith (1996) gives a parametrisation as where b 1 > 0, b 2 > 0, ρ ∈ R and each of the sequences δ (1) , δ (2) satisfy the constraint in (2).The introduction of the parameters b 1 , b 2 and ρ is to ensure that the model is invariant under orthogonal rotations when m = 0. Overall, this yields a spline with 2m − 3 free parameters whenever m ≥ 3.
The resulting spline is then used to map the sampling locations s i to locations s * i in a latent space.
Parameters are estimated by minimising some objective function provided by a stationary model.As previously mentioned, Sampson and Guttorp (1992) use a stationary spatial dispersion model and multidimensional scaling, the details of which are not given here.Instead, we focus on the approach by Smith (1996), who uses a stationary Gaussian likelihood.It is assumed that , where µ and Ω are the mean vector and a stationary covariance matrix, respectively.As we are only interested in measuring the dependence structure, it is assumed that the means and variances at each location are known.Analysis is then simplified to only considering the minimisation of the negative log likelihood given by where Ω and Ω are the theoretical, and sample, correlation matrices and tr(•) and | • | are the trace and determinant operators, respectively.The entries of the theoretical correlation matrix are produced by using a stationary covariance function.Smith (1996) uses the Matérn covariance function, and so where θ 1 > 0, θ 2 > 0 and K θ 2 (•) is the modified Bessel function of the second kind of order θ 2 and h * ij = s * i − s * j is the Euclidean distance between locations s * i and s * j in the D-plane.It is noted that θ 1 can be set to 1 as the spatial scaling of the locations is controlled by the spline.

Spatial extremes
Before describing an extension of the spatial deformation methodology tailored to spatial extremes, we first provide a brief review of methods for modelling spatial extremes.

Max-stable and inverted max-stable processes
Max-stable processes were introduced by de Haan (1984) and developed further by Smith (1990) and Schlather (2002), who suggested models that were first fitted by pairwise composite likelihood in Padoan et al. (2010).They are usually described by a spectral construction.Suppose {r i ; i ≥ 1} are points of a Poisson process on (0, ∞) with unit intensity.Let S ⊆ R 2 be a spatial index set, and {W i (s); s ∈ S, i ≥ 1} be independent and identically distributed copies of a non-negative stochastic is a max-stable process with unit Fréchet margins.The d-dimensional joint distribution function for where the exponent is Careful specification of the stochastic process W (s) leads to a limited selection of parametric models for the max-stable process.A particularly flexible model is the Brown-Resnick model (Brown andResnick, 1977, Kabluchko et al., 2009).This involves specifying W (s) = exp{U (s) − γ * (s, 0)} for U (s) a centred Gaussian process with semivariogram γ * (•, •) and where U (0) = 0 almost surely.This leads to a 2-dimensional joint distribution with exponent function where a = [2γ * (s i , s j )] 1/2 and Φ(•) denotes the standard normal distribution function.Note that for a stationary and isotropic Brown-Resnick process, γ * (s i , s j ) is dependent on h ij = s i − s j only.For clarity, we write γ(h ij ) when Z is stationary and isotropic, and γ * (s i , s j ), otherwise.Representations for (10) in higher dimensions exist (see Huser and Davison (2013) or Wadsworth and Tawn (2014)), but due to their computational complexity, inference for max-stable processes is typically done pairwise, providing a reasonable balance between computation time and efficiency.
Max-stable processes are inherently asymptotically dependent, or perfectly independent.That is, Z(s i ) and Z(s j ) are asymptotically dependent, or perfectly independent, for all s i , s j ∈ S.Here we characterise asymptotic dependence using the upper tail index χ (Joe, 1997).Assuming Z(s i ) ∼ F i , Z(s j ) ∼ F j , we have where the process is asymptotically independent at locations s i and s j if χ(s i , s j ) = 0, and asymptotically dependent otherwise.Here we write χ(s i , s j ) as Z is not necessarily stationary; henceforth, we write χ(h ij ) for h ij = s i − s j when it is assumed that χ is a function of distance only.As this measure is theoretically non-zero at all spatial lags for any max-stable process exhibiting positive spatial association ie., χ(s i , s j ) > 0 for all s i , s j ∈ S, we require other modelling approaches to deal with processes that may exhibit asymptotic independence.
Wadsworth and Tawn (2012) introduced the inverted max-stable process as that obtained by applying a monotonically decreasing marginal transformation to a max-stable process.For example, with Z as defined in (7), taking Y (s) = 1/Z(s) gives an inverted max-stable process with exponential margins and joint survival function where V is as given in (9).Such a process is asymptotically independent with χ(s i , s j ) = 0 for all s i = s j , but can accommodate a variety of flexible extremal dependences structures exhibiting positive association.The dependence in asymptotically independent processes may be characterised by a pre-limiting version of (11).Specifically, under an assumption of hidden regular variation (Ledford andTawn, 1996, Resnick, 2002), with L(•) slowly varying at 0 and η(s i , s j ) ∈ (0, 1] the coefficient of tail dependence.For an inverted max-stable process, χ q (s i , s j ) = (1 − q) V (1,1)−1 .
We fit both max-stable and inverted max-stable models after applying our deformation method for non-stationary spatial extremes.Note that although max-stable processes are typically taken to represent the limiting behaviour of maxima, in practice they, along with inverted max-stable processes, can be used for all extreme values through specification of a censored likelihood; see Section 2.5.
Inference on these models can then be used to determine the efficacy of our deformation method.

Conditional extremes
An alternative approach to modelling spatial extremes is to condition on the behaviour of the process when it is extreme at a single site.Here we give a brief overview of modelling the extremal behaviour of the process at two sites using this approach.For a full characterisation, see Wadsworth and Tawn (2019) or Shooter et al. (2019).We suppress some of the notation used by Wadsworth and Tawn (2019) and Shooter et al. (2019) as we are only considering a discrete pairwise fit, that we will employ in Section 4 as a diagnostic measure.For further details of the discrete approach, see Heffernan and Tawn (2004).Winter et al. (2016) apply this same methodology to a dataset of Australian temperatures, which we revisit in Section 4.1.
We begin by assuming that {X(s) : s ∈ S ⊂ R 2 } is a stationary and isotropic process with exponential-tailed marginals and denote X(s i ) = X i .Conditioning on X i = x i > u being large and considering X j , i = j, Heffernan and Tawn (2004) assume that there exist normalising functions where G is non-degenerate.Re-writing Z = {X j − a(x i )}/b(x i ) as the standardised residual, and making the assumption that the limit holds above some high threshold u, we have where Inference on G is often simplified by making the working assumption that Z ∼ N (µ, σ 2 ) and using a specified parametric form for the normalising functions a(•), b(•).For positively dependent data, we simplify the normalising functions to a( . The bivariate form of the conditional model can thus be expressed The conditional model holds some useful advantages over joint modelling using max-stable, or inverted max-stable, processes.For one, it is able to handle both asymptotically dependent, or asymptotically independent, data.Parameter estimates for α and β can indicate the nature of the dependence between X j and X i .For example, asymptotic dependence between X j and X i is implied by estimates α = 1, β = 0. Within the class of asymptotically independent variables, α < 1, β > 0, with α = β = 0 giving near extremal independence. The spatial extensions of this model (Wadsworth andTawn, 2019, Shooter et al., 2019) specify α and β as functions of distance between sites, when the underlying process is stationary and isotropic.
As such, we can use these parameter estimates as diagnostics, to determine whether our deformation method has created a process that has a more stationary extremal dependence structure.We are motivated to use these estimates as our deformation method does not use a conditional extremes approach for fitting.

Spatial deformation for extremes
In this section, we discuss our adaptations of the deformation methodology for application in a spatial extremes framework.We begin in Section 2.1 by proposing a new objective function to that of (5).
Instead, we consider minimising the difference between theoretical and empirical χ measures, where the former are produced through specification of a stationary max-stable dependence structure for the process in the D-plane.This does not in fact mean that this method will not work for asymptotically independent data; on the contrary, in Sections 2.2 and 2.3 we show that the model choice for χ(•) is somewhat arbitrary and a single, simple parametric form works well for both classes of extremal dependence.Section 2.4 follows with some practical advice for choosing the anchor points used in estimating the thin-plane spline and we conclude with details of model fitting and selection using censored pairwise likelihoods in Section 2.5.To assess the efficacy of the deformations we produce, we fit full max-stable, and inverted max-stable, dependence models.

Objective function
To adapt the methodology of Sampson and Guttorp (1992) and Smith (1996) to better suit a spatial extremes framework, we change the objective function given in (5) to the Frobenius norm of the difference between theoretical and empirical pairwise dependence matrices X : That is, we estimate the parameters of the thin plate spline through computing where χ(h * ij ), defined in ( 11), is the upper tail index calculated between the process at locations s * i and s * j in the D-plane and χ(h * ij ) is its empirical estimate.Recall that we assume stationarity in the D-plane, and so write χ(h * ij ), rather than χ(s * i , s * j ).In practice, this measure cannot be estimated in the limit as q → 1.As such, we estimate χ(h * ij ) by fixing some high threshold q < 1 and calculating where Fk (•) is the empirical distribution of observations Z(s * k ) = Z(s k ).Under asymptotic dependence, we assume that χ q (h * ij ) ≡ χ(h * ij ) for large enough q.Under asymptotic independence, although χ q (h * ) → 0 as q → 1, we typically have χ q (h * ) > 0 for q < 1 and spatial structure in this measure that makes it informative about non-stationarity.
We now focus on a choice of function χ(h * ), which we only require to be monotonically decreasing from 1 to 0. This leaves several options, including specific parametric forms for χ(h * ) and χ q (h * ) from max-stable, and inverted max-stable, processes.We remark that while we have used χ to measure extremal dependence, other extremal dependence measures exist, and can also be used in this framework.For example, the coefficient of tail dependence, η(h * ij ), from (13) can also be used to characterise the strength of asymptotic independence in extremes.This can be estimated separately from χ(h * ij ), however, we found that due to the high variance of the estimator for η(h * ij ), it was often outperformed by using χ(h * ij ).

Asymptotic dependence versus asymptotic independence
As a parametric model for χ(h * ) we take the form implied by the stationary Brown-Resnick process, where θ(•) is the extremal coefficient function (Schlather and Tawn, 2003) and ) controls the dependence of the max-stable field and a typical choice for the semivariogram would be where λ > 0 is a scaling parameter and κ ∈ (0, 2] is a smoothing parameter.Note that setting κ = 2 yields the Smith process (Smith, 1990), a special case of the Brown-Resnick process.As previously mentioned when discussing the Smith (1996) methodology for spatial deformation, we can set the scaling parameter λ to 1, as the spatial scaling of locations is controlled by the deformation itself.Note that the motivation for using the Brown-Resnick process as a parametric model is that χ(h * ) → 0 as h * → ∞, unlike other popular parametric models.For a stationary inverted Brown-Resnick process, we have We denote the dependence measures in ( 16) and ( 18) as χ BR and χ IBR q , respectively.Note that although these two measures have different parametric forms, and are applicable to different dependence structures, they often approximate each other very closely when used within a deformation framework; this is illustrated in Figure 1.Here we create deformations for a simulated dataset as described in Section 3.1 using both χ BR and χ IBR q .The plots show that both methods give very similar deformations when considering the non-stationarity in the χ(h * ij ) estimates.This seems to be the case for both asymptotically dependent and asymptotically independent data.Hence, for the sake of simplicity we only use χ BR to create deformations in the case studies in Section 4, as it appears to be flexible enough to capture non-stationarity in both classes of extremal dependence.
Comparison of deformations created using both parametric forms χ BR and χ IBR q for χ(•) for both max-stable data (left) and inverted max-stable data (right).Plots show empirical χ(h * ij ) estimates against distance, where the black triangles correspond to those created using χ(•) given by ( 16) and green triangles for those created using (18).The blue and red lines give the fitted function from ( 16) and ( 18), respectively.Distances are normalised so that the maximum distance is consistent between deformations.

Choice of parametric model for χ(h * )
We have also found that the function χ(h * ) from a Brown-Resnick process is sufficiently flexible to create suitable deformations for a variety of different extremal dependence structures.This is for similar reasons to above; different functions χ(h * ) which decrease to zero as h * → ∞ can approximate each other well.To illustrate this, we also considered the Gaussian-Gaussian process (Wadsworth and Tawn, 2012), which encompasses different dependence structures to the Brown-Resnick process, but for which where ρ(h * ) is a stationary correlation function and h = (h * , h * ) T and φ(•) is the bivariate Gaussian density function with mean 0 and covariance matrix Σ = diag(σ 2 , σ 2 ).Note that using a Matérn correlation function given in (6) with parameters θ 1 > 0 and θ 2 > 0, this function has one extra parameter than χ BR (h * ), namely σ > 0.
We chose not to use this parametric form for χ(h * ), due to the high computational cost required to compute the double integral for each pair of locations.However, we have found that the deformation method described in Section 2 appears fairly robust to the choice of χ(h * ).As Supplementary Figure 9 in the Supplementary Material shows, the much simpler χ BR (h * ) can approximate the more complex χ GG (h * ) very closely for much of h * ∈ R + .

Practical aspects for creating deformations
We now comment on practical aspects of creating the deformations, including choosing a subset of radial basis functions for the thin-plate spline and reducing the chances of producing a non-bijective transformation.
We found that there is no simple robust method for picking the number m, or configuration, of the anchor points used in the deformation splines given in (3).As detailed in Sampson and Guttorp (1992), there is a trade-off in picking m.Larger values provide "better" deformations, in the sense that the objective function to be calculated is lower and the deformations seem to capture more of the non-stationarity in the process.However, this comes at the price of computational cost, the risk of over-fitting and the phenomenon in which the D-plane folds on to itself.This provides a nonbijective transformation, which is physically unrealistic.Iovleff and Perrin (2004) detail an approach to ensure that the deformation is always bijective through use of a simulated annealing algorithm, with later extensions by Youngman (2020).These approaches add further constraints into the modelling procedure, which we have chosen to avoid.Instead we use a more heuristic approach for avoiding non-bijectivity.
We begin by randomly sampling m 0 initial anchor points with index set given by I 0 = {i 1 , . . ., i m 0 }.
There is no single best way to choose I 0 ; however, we found that ensuring that the anchor points are spread out over the spatial domain helped to create better deformations.Performing a deformation with m 0 +1 = 0}.This ensures that the initial input into the optimisation program creates a deformation that is already bijective.We then continue in this fashion until we have created a deformation using m * anchor points.Bijectivity is checked by eye.
Using this approach reduces the chances of the D-plane folding as m increases and provides a deformation with m * anchor points.Here we set m * as approximately a quarter of the sampling locations as we have not found a clear way to optimize this aspect.Typically this approach can be used for a number of initial index sets.However, in the interest of reducing computational cost, the simulation studies in Section 3 are conducted using the same initial index set for each deformation method.We also ensure that the new index sampled at each iteration is consistent across different samples, processes and deformation methods.

Model fitting and selection
To determine whether the deformation has created a process that is more stationary in the extremal dependence structure, and to compare between deformation methods, we look to fitting max-stable and inverted max-stable models to the data using the sampling locations in both the G-plane and the Dplane.In Section 1, the computational complexities of the max-stable and inverted max-stable models were discussed.To accommodate for this, we take a pairwise composite likelihood approach and assume independence between pairs (Padoan et al., 2010).The joint distribution for a Brown-Resnick process is given in (8) and the joint survival function for an inverted Brown-Resnick process is given in (12).
Note that the former is on standard Fréchet margins, whereas the latter is on standard exponential.To compare between the asymptotically dependent and asymptotically independent structures provided by the two models, we calculate all likelihoods on exponential margins, by first using a site-wise empirical transformation.
Given realisations {z 1 , . . ., z N } from a spatial field, observed at sampling locations s 1 , . . ., s d , the censored composite likelihood is where where ( λ, κ) are the maximum likelihood estimates from ( 19), H(•) is the Hessian matrix and J(•) is the variance of the score function, i.e.
In practice, we estimate J(•) by using numerical methods to find ∆ i = ∇ log L CL ( λ, κ; z i ), and then estimating the variance of the score function by setting a block of length b < N and computing The block sizes are chosen such that each block of data is more reasonably assumed approximately independent.This is usually specific to the data and will be given alongside any results.

Simulation study
We conduct three simulation studies to illustrate the efficacy of the deformation framework for modelling extremal dependence of non-stationary spatial processes.These studies are designed to highlight the following: • When fitting a stationary model to the extremal dependence of non-stationary spatial data, using a deformation method will improve the fit when compared to using the original sampling locations in the G-plane; • The deformation methodology described in Section 2.1 is more effective than the original Smith (1996) method when modelling non-stationary extremal dependence, as the latter is tailored towards modelling dependence in the body of the data rather than the extremes; • It is often necessary to use a deformation method that is tailored explicitly to extremal dependence, rather than dependence throughout the body; especially for processes that exhibit different degrees of non-stationarity throughout their extremal and central dependence structures.
In order to illustrate these points, we consider five different processes.These processes are chosen as they each exhibit different behaviour in their respective extremal dependence structures.In Section 3.1, we consider two processes: a non-stationary Brown-Resnick process and a non-stationary inverted Brown-Resnick process.In Section 3.2, we consider two more processes which are both mixtures of stationary and non-stationary processes.We term these max-mixture process and one exhibits asymptotic dependence whilst the other exhibits asymptotic independence.A final process is considered in Section 3.3, which is an asymptotically independent Gaussian mixture process.
For each setting, we begin with a sample of 1000 realisations of a spatial process.For this sample, we create four separate deformations using the procedure set out in Section 2.4.The first two deformations are created using the approach detailed in Section 2.1; with χ BR from ( 16) and χ IBR q from (18) as the dependence measures used in the objective function in ( 14).The latter two are correlation-based deformation methods: one of these is the original Smith (1996) methodology, while the other method replaces χ(h * ij ) in ( 14) with pairwise correlation ρ(h * ij ) as the dependence measure, and replaces the theoretical χ(h * ) function with the stationary Matérn correlation function detailed in (6).Note that in both of the latter two methods, correlation is estimated on a Gaussian marginal scale, and for the former two methods, we set q = 0.9 in ( 15) and ( 18).
As detailed in Section 2.5, we evaluate the efficacy of each of the four deformations by fitting a model to the extremal dependence of the sample.We fit the same dependence model five times: once using the sampling locations in the original G-plane and then once for each of the respective D-plane sampling locations given from the four deformations.For each fitted model, we calculate the CLAIC given in (21).Ordering of the CLAIC allows us to determine which deformation method (if any) was the most effective in accounting for the non-stationarity in that sample.As the underlying process from which the sample is drawn is known, we fit a stationary extremal dependence model of an appropriate class.That is, for processes that are asymptotically dependent, we fit a stationary Brown-Resnick model, and for processes that are asymptotically independent, we fit a stationary inverted Brown-Resnick model.
This procedure is repeated for 50 different samples of a single process.In this simulation study, each deformation for each sample is created using the same anchor points.For each sample, we determine which deformation method was the most effective and the proportion of times this occurred over all samples is reported, with the results in Tables 1, 2 and 3.These results show that stationary dependence models for non-stationary spatial processes routinely provide a better fit if the deformation methodology is used as a preprocessing step.We also show that the original Smith (1996) deformation is outperformed by our extensions.given in ( 17).The use of the radial basis function ψ(s) within this semivariogram causes pairs that are closer to o to be more strongly dependent than those pairs that are further away.From ( 16) and ( 17), the Brown-Resnick process with this semivariogram has theoretical χ(s i , s j ) given by

Non
for locations s i , s j and κ ∈ (0, 2], λ > 0. For this study, we take the centre o to be the origin and use scale and shape parameters λ = 2 and κ = 0.8 in (24).To illustrate the process a high resolution realisation is given in Supplementary Figure 10.Simulations are produced using the method of Dieker and Mikosch (2015).20) as the 90% empirical quantile, which is also used for estimating χ(h * ij ) in ( 15).

Process (G-plane)
Table 1 contains some interesting results.Most notably, in all cases a deformation has aided in model fitting when compared to using the original simulation grid.For both the max-stable, and inverted max-stable, cases, improvements on the efficacy of the original Smith (1996) method are made by utilising the Frobenius norm in the objective function.However, it is not entirely clear whether use of an extremal dependence measure for creating deformations is necessary in this case.We often found that deforming the space using measures for dependence throughout the distribution created better deformations than those using extremal dependence measures.We believe that this is because the variance of the estimator for ρ(h * ij ) is much lower than that of χ(h * ij ), as we use all of the data to estimate correlation, and that there are strong similarities in patterns of spatial non-stationarity for the central-and extremal-dependence structures of this process.We next consider other processes with more complicated dependence structures.

Max-mixture process
We now consider the hybrid dependence model, detailed in full by Wadsworth and Tawn (2012).Let X(s) be a max-stable process and Y (s) an asymptotically independent spatial process, both with standard Fréchet margins.For ω ∈ [0, 1], H(s) = max{ωX(s), (1 − ω)Y (s)} is an asymptotically dependent spatial process with standard Fréchet margins.In particular, we take X(s) to be the nonstationary Brown-Resnick process detailed in Section 3.1 and Y (s) to be a marginally transformed stationary Gaussian process with the Matérn correlation structure detailed in ( 6).
It can be shown that the theoretical χ(h ij ) values for H(s) are the same as for X(s), but multiplied by ω.There is no closed form for the correlation for H(s) on the Gaussian scale.Computationally, it can be shown that it is a mixture of the correlation from both X(s) and Y (s).As such, we would expect the extremal dependence and central dependence of H(s) to be mixtures of those coming from X(s) and Y (s), with different amounts of mixing occurring for both.We set ω to be 0.3 and take By construction of H(s), taking its reciprocal creates an asymptotically independent process on standard exponential margins, as with the inverted max-stable process.As in Section 3.1, the simulation study is repeated separately for the asymptotically dependent and asymptotically independent mixtures.The results are given in Table 2.
In contrast to the results given in Table 1, Table 2 shows a clearer need for an extremal dependence-based approach when creating deformations for a process that exhibits more complicated dependence structures.Here this max-mixture process is designed to represent a process with a mixture of stationarity in both the extremal dependence and dependence throughout the distribution.We now consider a process that has non-stationary extremal dependence, but is nearly stationary in the body.

Gaussian mixture process
With previous simulations, we found it is sometimes sufficient to simply use measures of central dependence when deforming the spatial domain to create a process with a more stationary extremal dependence structure.This is because the central-and extremal-dependence structures of these processes are closely related and using either approach typically creates similar deformations.In applications, we may find that these structures are not so closely related.As such, we are motivated to consider a process that is designed to have completely different dependence in the body to the tails.
Let Y S (s), Y NS (s) be stationary and non-stationary Gaussian processes, respectively, each with standard Gaussian margins.We then consider the process where s 0 ∈ S is a fixed location, Φ(•) is the standard Gaussian cdf, and p ∈ [0, 1] is a probability.By specifying Y * (s) in this manner, we create a process with an extremal dependence structure determined mostly by the correlation structure of Y NS and with dependence through the body determined mostly by Y S .Simulation of this process is simple; we draw Y (s 0 ) ∼ N (0, 1) and then simulate the rest of the field conditioning on that value and whether Φ(Y (s 0 )) ≤ p or Φ(Y (s 0 )) > p.
For this particular study, we use replications of this Gaussian mixture sampled at 81 equally We take s 0 to be the origin and p = 0.9.Both Y S and Y NS are specified to have the Matérn correlation structure given in ( 6), with respective parameter sets 2 ) and θ (NS) = (θ , o).Note that θ (NS) contains an extra parameter as we use the difference of the radial basis functions given in ( 23) and detailed by (Fouedjio et al., 2015) as a measure of pairwise distance, rather than Euclidean distance.The parameters for this study are set to θ (S) = (2, 1) and θ (NS) = (2, 0.8, (0, 0)).Results are given in Table 3. Smith (1996) 0 Table 3: Proportion of lowest CLAIC estimates provided by fitting models to deformations of 50 realisations of the Gaussian mixture process, see (25).The CLAIC has been estimated with a block size of b = 1, corresponding to temporal independence.Composite likelihoods are estimated with the threshold in (20) as the 90% empirical quantile, which is also used for estimating χ(h * ij ) in (15).
Table 3 highlights a clear need for extremal dependence-based methods when creating deformations for processes that have different patterns of non-stationarity in their central-and extremal dependence structures.In contrast to the results given in the previous studies, here using χ(h * ij ) or χ q (h * ij ) is always favoured.

Case studies
We present two case studies using our deformation methodology.In both cases, we follow the procedure set out in Section 2.4.However, as we consider relatively large spatial domains we use Great Earth distance in place of Euclidean distance for h and h * .We consider 30 different initial index sets, taking the best deformation over all sets.Here we define the best deformation to be that which provides the lowest objective value in ( 14) whilst remaining a bijective mapping.When using extremal dependence measures, we focus on deformations based on χ BR only, following the justification in Section 2.2.We then fit max-stable and inverted max-stable models to the data in the G-plane and D-plane, comparing the model fits using CLAIC estimates.For both studies, all pairs of sampling locations are used in model fitting and the block size in ( 22) corresponds to a season.We propose two diagnostics for scrutinising the model fits and deformations.

Australian summer temperatures
Data consist of daily summer (DJF) maximum near-surface air temperatures taken from the HadGHCND global gridded dataset (Caesar et al., 2006) and interpolated to 72 grid point locations covering Australia, for the period 1957-2014.Previous analysis of this data has been conducted using the multivariate conditional extremes model, detailed in Section 1.2.2 (Winter et al., 2016) and its spatial extension (Wadsworth and Tawn, 2019).Figure 2 shows the original sampling locations and estimated pairwise χ(h ij ) against distances.We estimate χ(h ij ) by setting q = 0.98 in (15).The deformation was produced using m * = 18, i.e. a quarter of the original sampling locations.These are presented as the blue points on Figures 2 and 3, where the latter figure depicts the sampling locations in the D-plane.Figure 3 also presents χ(h * ij ) against distance in the deformed space.We observe that the deformation has created a process that appears to be much more stationary with regards to the χ(h * ij ) estimates in the new coordinates.

G-Plane
Figure 2: Australia summer temperatures.Left: the original 72 sampling locations.The blue points are the anchor points used for the thin-plate splines.Right: empirical χ(h ij ) measures against distance (km).Estimates χ(h ij ) are calculated above a threshold given by the 98% empirical quantile.Composite likelihoods are estimated with the threshold in (20) as the 98% empirical quantile.( * estimated using Smith process likelihood).CLAIC and negative composite log-likelihood estimates are given to four significant figures.

D-Plane
The fits of the max-stable and inverted max-stable models are summarised in Table 4.The CLAIC estimates suggest that a max-stable model is more appropriate for the data.This becomes even more apparent when we consider that fitting an inverted Brown-Resnick model yields an inverted Smith model as the best fit.These processes are typically quite smooth and often provide unrealistic representations of actual data.However, we note that when naively fitting models on the G-plane, the inverted Smith model provided the lowest CLAIC estimate.This is further evidence that nonstationarity in this data should be incorporated into the modelling procedure.
We use two diagnostics to scrutinise the deformation and the model fit.As our deformation method is tailored to χ(h * ij ), we seek to use other extremal dependence measures to verify that the resulting deformation is not subject to overfitting.To do this, the conditional extremes model described in Section 1.2.2 is fitted pairwise and the parameter estimates are used to calculate the conditional expectation of one variable when the other variable is at the modelling threshold u, taken as the 98% quantile of the marginal distribution.For each pair, (X(s i ), X(s j )), i = j, we have where (α, β, μ) are the maximum likelihood estimates for the model.For a stationary and isotropic process, we would expect this measure to be a smooth function of Euclidean distance.The conditional expectation is plotted against distance for both the process on the G-plane and the D-plane.
A second diagnostic is used to evaluate the best model fit in the D-plane.As we have used χ(h * ij ) to create the deformations, we compare the theoretical triple-wise χ, which we denote χ(s * i , s * j , s * k ) = χ(s i , s j , s k ), from the model fits against empirical estimates.The triple-wise χ is defined as for i = j = k.For a Brown-Resnick process, the theoretical value for this measure is where V 2 (•, •; l, m) is the pairwise exponent given in (10) and V 3 (•, •, •) is the triple-wise exponent measure, for which the parametric form is given in Huser and Davison (2013); recall that if the process is stationary, both of these are functions of Euclidean distance.A similar parametrisation can be given for χ q (s * i , s * j , s * k ) for an inverted Brown-Resnick process, which is Standard errors for empirical estimates of χ q (s * i , s * j , s * k ) are estimated using a stationary bootstrap (Politis and Romano, 1994).We begin by drawing a random block size B from a geometric distribution with mean K.The bootstrap sample for locations s i , s j , s k , i = j = k is built by drawing a random starting time τ and creating a block of observations {z * τ , . . ., z * τ +B−1 }, where z * t = {z t (s i ), z t (s j ), z t (s k )}, which we add to the bootstrap sample.This procedure is repeated and the bootstrap is built up iteratively until it has length n.We then estimate χ(s * i , s * j , s * k ) for that sample and repeat for a number of samples.When choosing locations to compare empirical and theoretical values of χ(s * i , s * j , s * k ), we take advantage of the gridded structure of the coordinates in the G-plane, and ensure that each set of points share roughly the same configuration and pairwise distances.This is used to evaluate the stationarity of the dependence structure on the original G-plane, as we would expect the empirical values of χ(s * i , s * j , s * k ) to be consistently similar across sets of locations with the same configuration.
Diagnostics for the deformations and best model fit are given in Figure 4.For the estimation of χ(s * i , s * j , s * k ), 30 sets of three adjacent locations along the north/south transect in the G-plane are randomly selected and a stationary bootstrap with mean block size K = 14 and 1000 samples is used to create 95% confidence intervals for the empirical estimates of χ(s * i , s * j , s * k ).Empirical estimates of χ(s * i , s * j , s * k ) are calculated above the 98% quantile.The right panel of Figure 4 displays estimates for the conditional expectation from the conditional extremes model, where distances are normalised so that the average distance is equal for both the values in the G-plane and the D-plane.The diagnostic based on χ(s * i , s * j , s * k ) from Figure 4 suggests that a max-stable model is a reasonable fit for the data in the deformed space, as the patterns of the theoretical χ(s * i , s * j , s * k ) values follow the empirical estimates.The large variability in the bootstrap estimates across sets of locations with similar configurations suggests that the process on the original plane is highly non-stationary.
Estimates from the conditional extremes model provide further evidence that the deformation has produced something more stationary with regards to the dependence structure, especially at smaller distances.The use of a measure for extremal dependence that is not used for fitting lends credibility to the χ(h * ij ) plot in Figure 3 and suggests that the deformation has worked well.

UK precipitation rate
Data consist of hourly precipitation rate (mm/day) observed at locations on two 10 × 10 grids; the first is centred in Snowdonia, Wales and the second is centred in the Scottish Highlands.Observations     , s * j , * ) (black dots) with q = 0.95 and 95% confidence intervals using the stationary bootstrap.Red dots are the respective theoretical values suggested by the model fits.Left: Snowdonia.Right: Highlands.Figure 7 shows that the inverted max-stable model gives a relatively good fit to the extremal dependence of both datasets with sampling locations mapped to their respective D-planes, but the fit appears better for the Scottish Highlands.The low variability in the χ q (s * i , s * j , s * k ) estimates suggests that the original process may not be highly non-stationary.The pairwise conditional expectation estimates in Figure 8 suggest that both deformations have produced a more stationary process, albeit more so in the case of the Snowdonia D-plane.The small change in the Highlands estimates may that to the χ(h ij ) values has occurred, especially when compared to the Snowdonia estimates.This may also explain the stronger agreement of the χ q (s * i , s * j , s * k ) measures in Figure 8.To investigate the possibility of overfitting, we recreated the diagnostic using deformations created with fewer anchor points, but this did not show any improvements.

Discussion
In this paper, we presented a simple yet effective approach to modelling non-stationary extremal dependence.This approach extends that of Sampson and Guttorp (1992) and Smith (1996) to be applicable for modelling extremal dependence, rather than dependence throughout the body.We do this by replacing the objective function in these methods with the Frobenius norm of the difference between empirical, and theoretical, pairwise dependency matrices, with the theoretical measures coming from a stationary dependence model.Although most of our focus is on χ(h * ij ) as the dependence measure, we have also shown that this is easily replaced by other measures, such as χ q (h * ij ) and correlation.Model selection is carried out using pairwise composite likelihoods and CLAIC estimation and we propose diagnostics for evaluating these model fits.
We presented two case studies; in each scenario, we showed that when modelling the extremal dependence of the data using stationary models, better fits are provided using our methodology.Here we have fit very simple models to the data.However, in practice these deformations may be used as a pre-processing step to reveal covariates or orography that can be incorporated into the modelling procedure.Two diagnostics were introduced and used to provide evidence that our approach has produced a process which is more stationary with regards to the extremal dependence.
As with many areas of extreme value analysis, there is a bias-variance trade-off present when estimating χ q .Using values of q closer to 1 puts greater focus on extremal dependence at the expense of increased variance of the estimator.In Sections 3 and 4, we choose q close to 1 whilst preserving some initial spatial structure observed in the χ estimates.However, if q is too high then it is possible that any structure is masked by the high variability of the estimators and the deformation methodology is likely to fail in such circumstances.We have not considered the effect of estimator variability on the deformation, but note this could form a future research direction.
6 Supplementary Material 6.1 Comparison of χ GG (h) and χ BR (h) In Section 2.2, we discuss using the theoretical χ(h) function from a Brown-Resnick model rather than a Gaussian-Gaussian model.This is because the former is less computationally expensive to compute and often approximates the latter very closely for h ∈ R + .To illustrate this, Figure 9 shows the best approximation of χ BR (h) to some fixed χ GG (h) with Matérn correlation function and parameter set i indexed by i ≥ 4 as those indexed by i = 1, 2, 3 are uniquely determined by the constraints given in (2).If the deformation for I 0 is bijective, we create a new set of indices I 1 = {I 0 , i m 0 +1 }, where i m 0 +1 is sampled from the remaining indices.A deformation is then created using I 1 , but with initial parameters in the optimisation program given by ψ1 = { ψ0 , δ with u a high threshold and F (•) and f (•) the bivariate joint distribution and density functions for the model.Note that although we set λ = 1 when producing the deformation, here we treat it as a free parameter.Although the likelihoods give a good indication of the performance of the deformation methods, we use the Composite Likelihood version of the Akaike Information Criterion (CLAIC) for model selection.As given inVarin et al. (2011), the CLAIC is -stationary Brown-Resnick and inverted Brown-Resnick process The first setting we consider consists of replications of a non-stationary Brown-Resnick, and inverted Brown-Resnick, process sampled at 64 equally spaced locations on [−1, 1] × [−1, 1].We use a nonstationary variogram in the exponent function in (16) to ensure that χ(h ij ) is not simply a function of distance.In the context of non-stationary Gaussian processes, Fouedjio et al. (2015) propose a semivariogram of the form γ * (s i , s j ) where γ * (s i , s j ) = γ( ψ(s j ) − ψ(s j ) ), (23) and ψ(s) = o + (s − o) s − o is a radial basis function with some centre point o and γ(•) is the stationary and isotropic semivariogram

Figure 3 :
Figure 3: Australia summer temperatures.Left: the 72 sampling locations in the D-plane.The blue points are the anchor points used for the thin-plate splines.The coordinates have been scaled to [0, 1] ×[0, 1], which equals the aspect ratio of the left plot in Figure 2. Right: empirical χ(h * ij ) measures against distance in the D-plane.The red line gives the fitted function χ(h * ) used in the deformation.

Figure 4 :
Figure 4: Australian summer temperatures diagnostics.Left: estimates of χ(s * i , s * j , s * k ) (black dots) and 95% confidence intervals using the stationary bootstrap.Red dots are the respective theoretical values suggested by the model fit.Right: conditional expectation from conditional extremes model.Red points denote estimates for the process on the D-plane; black points are those on the G-plane.
Figure shows sets original locations their estimates χ(h ) distances.both we χ(h ) setting = in Both are using * 25 these presented the points Figure Figure presents deformations estimates χ(h ij against in respective spaces.observeboth have a that to much stationary regards their χ(h ij estimates the coordinates.both deformations more around of elevation.

Figure 5 :Figure 6 :
Figure 5: Top row: Snowdonia.Bottom row: Scottish Highlands.Left: the original 100 sampling locations.The blue points are the anchor points used for the thin-plate splines.Right: empirical χ(h ij ) measures against distance (km) in the respective G-planes.Estimates χ(h ij ) are calculated above a threshold given by the 95% empirical quantile.

Figure
Figure UK model diagnostics. of q *, s * j , * ) (black dots) with q = 0.95 and 95% confidence intervals using the stationary bootstrap.Red dots are the respective theoretical values suggested by the model fits.Left: Snowdonia.Right: Highlands.

Figure 8 :
Figure 8: UK precipitation deformation diagnostics.Conditional expectation from conditional extremes model.Red points denote estimates for the process on the D-plane; black points are those on the G-plane.Left: Snowdonia.Right: Highlands.

6. 2 Figure 10 :
Figure10gives a high-resolution heatmap of one realisation of a non-stationary Brown-Resnick process, with pairwise χ(s i , s j ) given in (24).This process is used in the simulation studies in Sections 3.1 and 3.2.

Table 1 :
Proportion of lowest CLAIC estimates provided by fitting models to deformations for 50 realisations of non-stationary Brown-Resnick and inverted Brown-Resnick processes.The CLAIC has been estimated with a block size of b = 1, corresponding to temporal independence.Composite likelihoods are estimated with the threshold in (

Table 2 :
Proportion of lowest CLAIC estimates provided by fitting models to deformations of 50 realisations of asymptotically dependent and asymptotically independent max-mixture processes.The CLAIC has been estimated with a block size of b = 1, corresponding to temporal independence.Composite likelihoods are estimated with the threshold in (

Table 4 :
Model parameters and diagnostics for the Australian summer temperatures data.