An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach


Håvard Rue, Department of Mathematical Sciences, Norwegian University of Science and Technology, N-7491 Trondheim, Norway.


Summary.  Continuously indexed Gaussian fields (GFs) are the most important ingredient in spatial statistical modelling and geostatistics. The specification through the covariance function gives an intuitive interpretation of the field properties. On the computational side, GFs are hampered by the big n problem, since the cost of factorizing dense matrices is cubic in the dimension. Although computational power today is at an all-time high, this fact seems still to be a computational bottleneck in many applications. Along with GFs, there is the class of Gaussian Markov random fields (GMRFs), which are discretely indexed. The Markov property makes the precision matrix involved sparse, which enables the use of numerical algorithms for sparse matrices that, for fields in ℝ², use only the square root of the time required by general algorithms. The specification of a GMRF is through its full conditional distributions, but its marginal properties are not transparent in such a parameterization. We show that, using an approximate stochastic weak solution to (linear) stochastic partial differential equations (SPDEs), we can, for some GFs in the Matérn class, provide an explicit link, for any triangulation of ℝ^d, between GFs and GMRFs, formulated as a basis function representation. The consequence is that we can take the best from both worlds and do the modelling by using GFs but do the computations by using GMRFs. Perhaps more importantly, our approach generalizes to other covariance functions generated by SPDEs, including oscillating and non-stationary GFs, as well as GFs on manifolds. We illustrate our approach by analysing global temperature data with a non-stationary model defined on a sphere.

1. Introduction

Gaussian fields (GFs) have a dominant role in spatial statistics and especially in the traditional field of geostatistics (Cressie, 1993; Stein, 1999; Chilès and Delfiner, 1999; Diggle and Ribeiro, 2006) and form an important building block in modern hierarchical spatial models (Banerjee et al., 2004). GFs are one of a few appropriate multivariate models with an explicit and computable normalizing constant and have good analytic properties otherwise. In a domain D ⊆ ℝ^d with co-ordinates s ∈ D, x(s) is a continuously indexed GF if all finite collections {x(s_i)} are jointly Gaussian distributed. In most cases, the GF is specified by using a mean function μ(·) and a covariance function C(·,·), so the mean is μ = (μ(s_i)) and the covariance matrix is Σ = (C(s_i, s_j)). Often the covariance function is only a function of the relative position of two locations, in which case it is said to be stationary, and it is isotropic if the covariance function depends only on the Euclidean distance between the locations. Since a regular covariance matrix is positive definite, the covariance function must be a positive definite function. This restriction makes it difficult to ‘invent’ covariance functions stated as closed form expressions. Bochner's theorem can be used in this context, as it characterizes all continuous positive definite functions in ℝ^d.

Although GFs are convenient from both an analytical and a practical point of view, the computational issues have always been a bottleneck. This is due to the general cost of O(n³) to factorize dense n×n (covariance) matrices. Although the computational power today is at an all-time high, the tendency seems to be that the dimension n is always set, or we want to set it, a little higher than the value that gives a reasonable computation time. The increasing popularity of hierarchical Bayesian models has made this issue more important, as ‘repeated computations (as for simulation-based model fitting) can be very slow, perhaps infeasible’ (Banerjee et al. (2004), page 387), and the situation is informally referred to as ‘the big n problem’.

There are several approaches to try to overcome or avoid ‘the big n problem’. The spectral representation approach for the likelihood (Whittle, 1954) makes it possible to estimate the (power) spectrum (using discrete Fourier transform calculations) and to compute the log-likelihood from it (Guyon, 1982; Dahlhaus and Künsch, 1987; Fuentes, 2008), but this is only possible for directly observed stationary GFs on a (near) regular lattice. Vecchia (1988) and Stein et al. (2004) proposed to use an approximate likelihood constructed through a sequential representation and then to simplify the conditioning set, and similar ideas also apply when computing conditional expectations (kriging). An alternative approach is to do exact computations on a simplified Gaussian model of low rank (Banerjee et al., 2008; Cressie and Johannesson, 2008; Eidsvik et al., 2010). Furrer et al. (2006) applied covariance tapering to zero out parts of the covariance matrix to gain a computational speed-up. However, the sparsity pattern will depend on the range of the GFs, and the potential in a related approach, named ‘lattice methods’ by Banerjee et al. (2004), section A.5.3, is superior to the covariance tapering idea. In this approach the GF is replaced by a Gaussian Markov random field (GMRF); see Rue and Held (2005) for a detailed introduction and Rue et al. (2009), section 2.1, for a condensed review. A GMRF is a discretely indexed Gaussian field x, where the full conditionals π(x_i | x_{−i}), i = 1, …, n, depend only on a set of neighbours ∂i of each site i (where consistency requirements imply that if i ∈ ∂j then also j ∈ ∂i). The computational gain comes from the fact that the zero pattern of the precision matrix Q (the inverse covariance matrix) relates directly to the notion of neighbours: Q_ij ≠ 0 ⟺ i ∈ ∂j ∪ {j}; see, for example, Rue and Held (2005), section 2.2.
Algorithms for Markov chain Monte Carlo sampling will repeatedly update from these simple full conditionals, which explains to a large extent the popularity of GMRFs in recent years, starting already with the seminal papers by Besag (1974, 1975). However, GMRFs also allow for fast direct numerical algorithms (Rue, 2001), as numerical factorization of the matrix Q can be done by using sparse matrix algorithms (George and Liu, 1981; Duff et al., 1989; Davis, 2006) at a typical cost of O(n^(3/2)) for two-dimensional GMRFs; see Rue and Held (2005) for detailed algorithms. GMRFs have very good computational properties, which are of major importance in Bayesian inferential methods. This is further enhanced by the link to integrated nested Laplace approximations (Rue et al., 2009), which allow fast and accurate Bayesian inference for latent GF models.
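The correspondence between full conditionals and rows of the precision matrix can be verified numerically. The following minimal sketch (ours, not from the paper) checks that the full conditional of x_i read off row i of Q agrees with standard Gaussian conditioning computed via the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# A small symmetric positive definite precision matrix Q for x ~ N(0, Q^{-1})
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)
Sigma = np.linalg.inv(Q)

# Full conditional of x_i given x_{-i}, read off directly from row i of Q:
#   E(x_i | x_{-i}) = -(1/Q_ii) * sum_{j != i} Q_ij x_j,  Var(x_i | x_{-i}) = 1/Q_ii
i = 2
x = rng.standard_normal(n)
mean_from_Q = -(Q[i] @ x - Q[i, i] * x[i]) / Q[i, i]
var_from_Q = 1.0 / Q[i, i]

# The same quantities via the covariance matrix (standard Gaussian conditioning)
m = np.arange(n) != i
mean_from_S = Sigma[i, m] @ np.linalg.solve(Sigma[np.ix_(m, m)], x[m])
var_from_S = Sigma[i, i] - Sigma[i, m] @ np.linalg.solve(Sigma[np.ix_(m, m)], Sigma[m, i])

print(np.allclose(mean_from_Q, mean_from_S), np.allclose(var_from_Q, var_from_S))  # True True
```

Row i of Q thus encodes the full conditional of x_i directly, which is exactly the structure that the sparse matrix algorithms exploit.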

Although GMRFs have very good computational properties, there are reasons why current statistical models based on GMRFs are relatively simple, in particular when applied to area data from regions or counties. First, there has been no good way to parameterize the precision matrix of a GMRF to achieve predefined behaviour in terms of correlation between two sites and to control marginal variances. In matrix terms, the reason for this is that one must construct a positive definite precision matrix to obtain a positive definite covariance matrix as its inverse, so the conditions for proper covariance matrices are replaced by essentially equivalent conditions for sparse precision matrices. Therefore, often simplistic approaches are taken, like letting Qij be related to the reciprocal distance between sites i and j (Besag et al., 1991; Arjas and Gasbarra, 1996; Weir and Pettitt, 2000; Pettitt et al., 2002; Gschlößl and Czado, 2007); however, a more detailed analysis shows that such a rationale is suboptimal (Besag and Kooperberg, 1995; Rue and Tjelmeland, 2002) and can give surprising effects (Wall, 2004). Secondly, it is unclear how large the class of useful GMRF models really is by using only a simple neighbourhood. The complicating issue here is the global positive definiteness constraint, and it might not be evident how this influences the parameterization of the full conditionals.

Rue and Tjelmeland (2002) demonstrated empirically that GMRFs could closely approximate most of the commonly used covariance functions in geostatistics, and they proposed to use them as computational replacements for GFs, for tasks such as kriging (Hartman and Hössjer, 2008). However, there were several drawbacks with their approach: first, the fitting of GMRFs to GFs was restricted to a regular lattice (or torus), and the fit itself had to be precomputed for a discrete set of parameter values (like smoothness and range), using a time-consuming numerical optimization. Despite these ‘proof-of-concept’ results, several researchers have followed up this idea without any notable progress in the methodology (Hrafnkelsson and Cressie, 2003; Song et al., 2008; Cressie and Verzelen, 2008), but the approach itself has been shown to be useful even for spatiotemporal models (Allcroft and Glasbey, 2003).

The discussion so far suggests the following modelling and computational strategy for approaching the big n problem.

  • (a) Do the modelling by using a GF on a set of locations {si}, to construct a discretized GF with covariance matrix Σ.
  • (b) Find a GMRF with local neighbourhood and precision matrix Q that represents the GF in the best possible way, i.e. Q−1 is close to Σ in some norm. (We deliberately use the word ‘represents’ instead of approximates.)
  • (c) Do the computations by using the GMRF representation and numerical methods for sparse matrices.

Such an approach relies on several assumptions. First the GF must be of such a type that there is a GMRF with local neighbourhood that can represent it sufficiently accurately to maintain the interpretation of the parameters and the results. Secondly, we must be able to compute the GMRF representation from the GF, at any collections of locations, so fast that we still achieve a considerable speed-up compared with treating the GF directly.

The purpose of this paper is to demonstrate that these requirements can indeed be met for certain members of the GF family with the Matérn covariance function in ℝ^d, where the GMRF representation is available explicitly. Although these results are seemingly restrictive at first sight, they cover the most important and most used covariance model in spatial statistics; see Stein (1999), page 14, who concluded a detailed theoretical analysis with ‘Use the Matérn model’. The GMRF representation can be constructed explicitly by using a certain stochastic partial differential equation (SPDE) which has GFs with Matérn covariance function as the solution when driven by Gaussian white noise. The result is a basis function representation with piecewise linear basis functions, and Gaussian weights with Markov dependences determined by a general triangulation of the domain.

Rather surprisingly, extending this basic result seems to open new doors and opportunities, and to provide quite simple answers to rather difficult modelling problems. In particular, we shall show how this approach extends to Matérn fields on manifolds, non-stationary fields and fields with oscillating covariance functions. Further, we shall discuss the link to the deformation method of Sampson and Guttorp (1992) for non-stationary covariances for non-isotropic models, and how our approach naturally extends to non-separable space–time models. Our basic task, to do the modelling by using GFs and the computations by using the GMRF representation, still holds for these extensions as the GMRF representation is still available explicitly. An important observation is that the resulting modelling strategy does not involve having to construct explicit formulae for the covariance functions, which are instead only defined implicitly through the SPDE specifications.

The plan of the rest of this paper is as follows. In Section 2, we discuss the relationship between Matérn covariances and a specific SPDE, and we present the two main results for explicitly constructing the precision matrices for GMRFs based on this relationship. In Section 3, the results are extended to fields on triangulated manifolds, non-stationary and oscillating models, and non-separable space–time models. The extensions are illustrated with a non-stationary analysis of global temperature data in Section 4, and we conclude the main part of the paper with a discussion in Section 5. Thereafter follow four technical appendices, with explicit representation results (A), theory for random fields on manifolds (B), the Hilbert space representation details (C) and proofs of the technical details (D).

2. Preliminaries and main results

This section will introduce the Matérn covariance model and discuss its representation through an SPDE. We shall state explicit results for the GMRF representation of Matérn fields on a regular lattice and do an informal summary of the main results.

2.1. Matérn covariance model and its stochastic partial differential equation

Let ‖·‖ denote the Euclidean distance in ℝ^d. The Matérn covariance function between locations u, v ∈ ℝ^d is defined as

r(u, v) = σ²/{2^(ν−1) Γ(ν)} (κ‖v − u‖)^ν K_ν(κ‖v − u‖).   (1)
Here, K_ν is the modified Bessel function of the second kind of order ν>0, κ>0 is a scaling parameter and σ² is the marginal variance. The integer value of ν determines the mean-square differentiability of the underlying process, which matters for predictions that are made by using such a model. However, ν is usually fixed since it is poorly identified in typical applications. A more natural interpretation of the scaling parameter κ is through a range parameter ρ: the Euclidean distance at which x(u) and x(v) are almost independent. Lacking a simple relationship, we shall throughout this paper use the empirically derived definition ρ=√(8ν)/κ, corresponding to correlations near 0.1 at the distance ρ, for all ν.
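As a quick numerical check of this definition (our sketch, using SciPy's Bessel function, not code from the paper), the correlation at the empirical range ρ is indeed close to 0.1:

```python
import numpy as np
from scipy.special import gamma, kv

def matern_corr(l, kappa, nu):
    """Matern correlation 2^(1-nu)/Gamma(nu) * (kappa*l)^nu * K_nu(kappa*l), l > 0."""
    kl = kappa * np.asarray(l, dtype=float)
    return 2.0 ** (1.0 - nu) / gamma(nu) * kl ** nu * kv(nu, kl)

# the empirical range rho = sqrt(8*nu)/kappa gives correlation near 0.1
nu, kappa = 1.0, 1.0
rho = np.sqrt(8.0 * nu) / kappa
print(round(float(matern_corr(rho, kappa, nu)), 2))  # about 0.14 for nu = 1
```

For ν = 1 the value is about 0.14, which is what ‘correlations near 0.1’ refers to; the exact level varies slowly with ν.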

The Matérn covariance function appears naturally in various scientific fields (Guttorp and Gneiting, 2006), but the important relationship that we shall make use of is that a GF x(u) with the Matérn covariance is a solution to the linear fractional SPDE

(κ² − Δ)^(α/2) x(u) = W(u),   u ∈ ℝ^d,   α = ν + d/2,   κ > 0,   ν > 0,   (2)

where (κ² − Δ)^(α/2) is a pseudodifferential operator that we shall define later in equation (4) through its spectral properties (Whittle, 1954, 1963). The innovation process W is spatial Gaussian white noise with unit variance, Δ is the Laplacian

Δ = Σ_{i=1}^{d} ∂²/∂s_i²,

and the marginal variance is

σ² = Γ(ν)/{Γ(ν + d/2) (4π)^(d/2) κ^(2ν)}.
We shall name any solution to equation (2) a Matérn field in what follows. However, the limiting solutions to the SPDE (2) as κ→0 or ν→0 do not have Matérn covariance functions, but the SPDE still has solutions when κ=0 or ν=0 which are well-defined random measures. We shall return to this issue in Appendix C.3. Further, there is an implicit assumption of appropriate boundary conditions for the SPDE, as for α ≥ 2 the null space of the differential operator is non-trivial, containing, for example, the functions exp(κ eᵀu), for all ‖e‖ = 1. The Matérn fields are the only stationary solutions to the SPDE.
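The null space claim can be spot-checked numerically: for f(u) = exp(κ eᵀu) with ‖e‖ = 1 we have Δf = κ² f, so (κ² − Δ)f = 0. A small finite-difference sketch (ours):

```python
import numpy as np

kappa = 1.3
e = np.array([0.6, 0.8])                  # unit vector, ||e|| = 1
f = lambda u: np.exp(kappa * (u @ e))

u0 = np.array([0.2, -0.5])
h = 1e-4
# central-difference estimate of the Laplacian of f at u0
lap = sum((f(u0 + h * d) - 2.0 * f(u0) + f(u0 - h * d)) / h**2
          for d in np.eye(2))

print(abs(kappa**2 * f(u0) - lap) < 1e-4)  # True: (kappa^2 - Laplacian) f = 0
```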

The proof that was given by Whittle (1954, 1963) is to show that the wave number spectrum of a stationary solution is

R(k) = (2π)^(−d) (κ² + ‖k‖²)^(−α),   (3)

using the Fourier transform definition of the fractional Laplacian in ℝ^d,

{F (κ² − Δ)^(α/2) φ}(k) = (κ² + ‖k‖²)^(α/2) (F φ)(k),   (4)

where φ is a function on ℝ^d for which the right-hand side of the definition has a well-defined inverse Fourier transform.

2.2. Main results

This section contains our main results, stated in a loose and imprecise form. In the appendices, our statements are made precise and the proofs are given. In the discussion we shall restrict ourselves to dimension d=2 although our results are general.

2.2.1. Main result 1

For our first result, we shall use some informal arguments and a simple but powerful consequence of a partly analytic result of Besag (1981). We shall show in the appendices that these results are true. Let x be a GMRF on a regular (tending to infinite) two-dimensional lattice indexed by ij, where the Gaussian full conditionals are

E(x_ij | x_{−ij}) = (1/a)(x_{i−1,j} + x_{i+1,j} + x_{i,j−1} + x_{i,j+1}),   Var(x_ij | x_{−ij}) = 1/a   (5)

and |a|>4. To simplify the notation, we write this particular model as

−1
a    −1          (6)
which displays the elements of the precision matrix related to a single location (section 3.4.2 in Rue and Held (2005) uses a related graphical notation). Owing to symmetry, we display only the upper right quadrant, with ‘a’ as the central element. The approximate result (Besag (1981), equation (14)) is that

cov(x_ij, x_{i′j′}) ≈ (2π)^(−1) K_0(l √(a − 4)),

where l is the Euclidean distance between ij and i′j′. Evaluated for continuous distances, this is a generalized covariance function, which is obtained from equation (1) in the limit ν→0, with κ²=a−4 and σ²=1/4π, even though equation (1) requires ν>0. Informally, this means that the discrete model defined by expression (5) generates approximate solutions to the SPDE (2) on a unit distance regular grid, with ν=0.

Solving equation (2) for α=1 gives a generalized random field with spectrum

R_1(k) = (2π)^(−2) (κ² + ‖k‖²)^(−1),

meaning that (some discretized version of) the SPDE acts like a linear filter with squared transfer function equal to R_1. If we replace the noise term on the right-hand side of equation (2) by Gaussian noise with spectrum R_1, the resulting solution has spectrum R_2 = (2π)^(−2) (κ² + ‖k‖²)^(−2), and so on. The consequence is GMRF representations for the Matérn fields for ν=1 and ν=2, as convolutions of the coefficients in representation (6): for ν=1,

1
−2a        2
4 + a²     −2a        1

and for ν=2,

−1
3a            −3
−3(a² + 3)    6a           −3
a(a² + 12)    −3(a² + 3)   3a    −1

The marginal variance is 1/{4πν(a − 4)^ν}. Fig. 1 shows how accurate these approximations are for ν=1 and range 10 and 100, displaying the Matérn correlations and the linearly interpolated correlations for integer lags for the GMRF representation. For range 100 the results are indistinguishable. The root-mean-square error between correlations up to twice the range is 0.01 and 0.0003 for range 10 and 100 respectively. The error in the marginal variance is 4% for range 10 and negligible for range 100.
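On a torus, the precision matrices built from these stencils are circulant, so their inverses are available through the FFT; the sketch below (ours, not the paper's code) reproduces this kind of comparison for ν = 1, checking the marginal variance against 1/{4πν(a − 4)^ν}:

```python
import numpy as np

def gmrf_torus_cov(N, a, alpha):
    """Covariance at all lags for the GMRF whose precision is the (a, -1)
    stencil convolved with itself (alpha - 1) times, on an N x N torus.
    The precision is circulant, so its inverse is a 2-D inverse FFT."""
    w = 2.0 * np.pi * np.arange(N) / N
    lam = a - 2.0 * np.cos(w)[:, None] - 2.0 * np.cos(w)[None, :]
    return np.fft.ifft2(1.0 / lam ** alpha).real

kappa2 = 0.1                                       # a = 4 + kappa^2, range ~ 8.9
cov = gmrf_torus_cov(256, 4.0 + kappa2, alpha=2)   # alpha = 2 corresponds to nu = 1
corr = cov / cov[0, 0]
var_theory = 1.0 / (4.0 * np.pi * kappa2)          # 1/{4 pi nu (a-4)^nu} with nu = 1
```

The relative error in cov[0, 0] against var_theory is a few per cent at this range, in line with the 4% reported above for range 10.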

Figure 1.

 Matérn correlations (——) for range (a) 10 and (b) 100, and the correlations for the GMRF representation (∘)

Our first main result confirms the above heuristics.

 Result 1.  The coefficients in the GMRF representation of equation (2) on a regular unit distance two-dimensional infinite lattice for ν=1,2,…, are found by convolving model (6) with itself ν times.

Simple extensions of this result include anisotropy along the main axes, as presented in Appendix A. A rigorous formulation of the result is derived in the subsequent appendices, showing that the basic result is a special case of a more general link between SPDEs and GMRFs. The first such generalization, which is based on irregular grids, is the next main result.

2.3. Main result 2

Although main result 1 is useful in itself, it is not yet fully practical, since often we do not want to have a regular grid, to avoid interpolating the locations of observations to the nearest grid point, and to allow for finer resolution where details are required. We therefore extend the regular grid to irregular grids, by subdividing ℝ² into a set of non-intersecting triangles, where any two triangles meet in at most a common edge or corner. The three corners of a triangle are named vertices. In most cases we place initial vertices at the locations of the observations and add additional vertices to satisfy overall soft constraints on the triangles, such as the maximum allowed edge length and the minimum allowed angles. This is a standard problem in engineering for solving partial differential equations by using finite element methods (FEMs) (Ciarlet, 1978; Brenner and Scott, 2007; Quarteroni and Valli, 2008), where the quality of the solutions depends on the triangulation properties. Typically, the triangulation is chosen to maximize the minimum interior triangle angle, giving so-called Delaunay triangulations, which helps to ensure that the transitions between small and large triangles are smooth. The extra vertices are added heuristically to try to minimize the total number of triangles that are needed to fulfil the size and shape constraints. See for example Edelsbrunner (2001) and Hjelle and Dæhlen (2006) for algorithm details. Our implementation in the Rinla package is based on Hjelle and Dæhlen (2006).
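As a small illustration of the first step (ours, using SciPy rather than the mesher in Rinla, and without the angle and edge-length refinement), a Delaunay triangulation of observation locations plus filler vertices:

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)
obs = rng.uniform(size=(30, 2))                     # observation locations
gx, gy = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
filler = np.column_stack([gx.ravel(), gy.ravel()])  # extra vertices for coverage

pts = np.vstack([obs, filler])
tri = Delaunay(pts)            # maximizes the minimum interior angle
print(tri.simplices.shape[1])  # 3: each simplex is a triangle
```

A production mesher then inserts further vertices until all triangles satisfy the size and shape constraints.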

To illustrate the process of triangulation of inline image, we shall use an example from Henderson et al. (2002) which models spatial variation in leukaemia survival data in north-west England. Fig. 2(a) displays the locations of 1043 cases of acute myeloid leukaemia in adults who were diagnosed between 1982 and 1998 in north-west England. In the analysis, the spatial scale has been normalized so that the width of the study region is equal to 1. Fig. 2(b) displays the triangulation of the area of interest, using fine resolution around the data locations and rough resolution outside the area of interest. Further, we place vertices at all data locations. The number of vertices in this example is 1749 and the number of triangles is 3446.

Figure 2.

 (a) Locations of leukaemia survival observations, (b) triangulation using 3446 triangles and (c) a stationary correlation function (——) and the corresponding GMRF approximation (∘) for ν=1 and approximate range 0.26

To construct a GMRF representation of the Matérn field on the triangulated lattice, we start with a stochastic weak formulation of SPDE (2). Define the inner product

⟨f, g⟩ = ∫ f(u) g(u) du,   (7)

where the integral is over the region of interest. The stochastic weak solution of the SPDE is found by requiring that

{⟨φ_j, (κ² − Δ)^(α/2) x⟩, j = 1, …, m} =_d {⟨φ_j, W⟩, j = 1, …, m}   (8)

for every appropriate finite set of test functions {φ_j(u), j = 1, …, m}, where ‘=_d’ denotes equality in distribution.

The next step is to construct a finite element representation of the solution to the SPDE (Brenner and Scott, 2007) as

x(u) = Σ_{k=1}^{n} ψ_k(u) w_k   (9)
for some chosen basis functions {ψk} and Gaussian-distributed weights {wk}. Here, n is the number of vertices in the triangulation. We choose to use functions ψk that are piecewise linear in each triangle, defined such that ψk is 1 at vertex k and 0 at all other vertices. An interpretation of the representation (9) with this choice of basis functions is that the weights determine the values of the field at the vertices, and the values in the interior of the triangles are determined by linear interpolation. The full distribution of the continuously indexed solution is determined by the joint distribution of the weights.

The finite dimensional solution is obtained by finding the distribution for the representation weights in equation (9) that fulfils the stochastic weak SPDE formulation (8) for only a specific set of test functions, with m=n. The choice of test functions, in relation to the basis functions, governs the approximation properties of the resulting model representation. We choose φ_k = (κ² − Δ)^(1/2) ψ_k for α=1 and φ_k = ψ_k for α=2. These two approximations are denoted the least squares and the Galerkin solution respectively. For α ≥ 3, we let α=2 on the left-hand side of equation (2) and replace the right-hand side with a field generated by α−2, and let φ_k = ψ_k. In essence, this generates a recursive Galerkin formulation, terminating in either α=1 or α=2; see Appendix C for details.

Define the n×n matrices C, G and K_{κ²} with entries

C_ij = ⟨ψ_i, ψ_j⟩,   G_ij = ⟨∇ψ_i, ∇ψ_j⟩,   (K_{κ²})_ij = κ² C_ij + G_ij.

Using Neumann boundary conditions (a zero normal derivative at the boundary), we obtain our second main result, expressed here for ℝ¹ and ℝ².

 Result 2.  Let Q_{α,κ²} be the precision matrix for the Gaussian weights w as defined in equation (9) for α=1,2,…, as a function of κ². Then the finite dimensional representations of the solutions to equation (2) have precisions

Q_{1,κ²} = K_{κ²},
Q_{2,κ²} = K_{κ²} C^(−1) K_{κ²},
Q_{α,κ²} = K_{κ²} C^(−1) Q_{α−2,κ²} C^(−1) K_{κ²},   α = 3, 4, ….   (10)
Some remarks concerning this result are as follows.

  • (a) The matrices C and G are easy to compute as their elements are non-zero only for pairs of basis functions which share common triangles (a line segment in ℝ¹), and their values do not depend on κ². Explicit formulae are given in Appendix A.
  • (b) The matrix C^(−1) is dense, which makes the precision matrix dense as well. In Appendix C.5, we show that C can be replaced by the diagonal matrix C̃, where C̃_ii = ⟨ψ_i, 1⟩, which makes the precision matrices sparse, and hence we obtain GMRF models.
  • (c) A consequence of the previous remarks is that we have an explicit mapping from the parameters of the GF model to the elements of a GMRF precision matrix, with computational cost O(n) for any triangulation.
  • (d) For the special case where all the vertices are points on a regular lattice, using a regular triangulation reduces main result 2 to main result 1. Note that the neighbourhood of the corresponding GMRF in ℝ² is 3×3 for α=1, is 5×5 for α=2, and so on. Increased smoothness of the random field induces a larger neighbourhood in the GMRF representation.
  • (e) In terms of the smoothness parameter ν in the Matérn covariance function, these results correspond to ν=1/2, 3/2, 5/2, …, in ℝ¹ and ν=0, 1, 2, …, in ℝ².
  • (f) We are currently unable to provide results for other values of α; the main obstacle is the fractional derivative in the SPDE, which is defined by using the Fourier transform (4). A result of Rozanov (1982), chapter 3.1, for the continuously indexed random field, says that a random field has a Markov property if and only if the reciprocal of the spectrum is a polynomial. For our SPDE (2) this corresponds to α=1,2,3,…; see equation (3). This result indicates that a different approach may be needed to provide representation results when α is not an integer, such as approximating the spectrum itself. Given approximations for general 0 ≤ α ≤ 2, the recursive approach could then be used for general α>2.
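To make result 2 concrete, the following sketch (ours, not the paper's code) carries out the construction in ℝ¹, where the hat function integrals have simple closed forms: C̃_ii = (h_{i−1} + h_i)/2 and G is the usual tridiagonal stiffness matrix. With Neumann boundaries and a domain much longer than the range, the interior variance should be close to the stationary value σ² = 1/(4κ³), the d = 1, ν = 3/2 case of the marginal variance formula in Section 2.1:

```python
import numpy as np
import scipy.sparse as sp

def spde_precision_1d(s, kappa2, alpha=2):
    """Precision matrix of the basis weights for the SPDE on a 1-D mesh,
    with piecewise linear hat functions and the lumped mass matrix C~."""
    s = np.asarray(s, dtype=float)
    h = np.diff(s)                      # element lengths
    n = s.size
    c = np.zeros(n)                     # C~_ii = <psi_i, 1>
    c[:-1] += h / 2.0
    c[1:] += h / 2.0
    main = np.zeros(n)                  # stiffness diagonal <psi_i', psi_i'>
    main[:-1] += 1.0 / h
    main[1:] += 1.0 / h
    G = sp.diags([-1.0 / h, main, -1.0 / h], offsets=[-1, 0, 1])
    K = kappa2 * sp.diags(c) + G        # K = kappa^2 C~ + G
    Q = K                               # alpha = 1
    Cinv = sp.diags(1.0 / c)
    for _ in range(alpha - 1):          # builds Q_alpha via the recursion above
        Q = K @ Cinv @ Q
    return Q.tocsc()

s = np.linspace(0.0, 50.0, 201)         # mesh width 0.25, kappa = 1
Q = spde_precision_1d(s, kappa2=1.0, alpha=2)
var = np.linalg.inv(Q.toarray()).diagonal()
# interior variance near 1/(4 kappa^3) = 0.25; inflated near the boundary
```

The resulting Q is pentadiagonal for α = 2, in agreement with remark (d).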

Although the approach does give a GMRF representation of the Matérn field on the triangulated region, it is only an approximation to the stochastic weak solution, as we use only a subset of the possible test functions. However, for a given triangulation, it is the best possible approximation in the sense that is made explicit in Appendix C, where we also show weak convergence to the full SPDE solutions. Using standard results from the finite element literature (Brenner and Scott, 2007), it is also possible to derive rates of convergence results, like, for α=2,

E(‖x_n − x‖_H²)^(1/2) ≤ c h.
Here, x_n is the GMRF representation of the SPDE solution x, h is the diameter of the largest circle that can be inscribed in a triangle in the triangulation and c is some constant. The Hilbert space scalar product and norm are defined in definition 2 in Appendix B, which also includes the values and the gradients of the field. The result holds for general d ≥ 1, with h proportional to the edge lengths between the vertices, when the minimal mesh angles are bounded away from zero.

To see how well we can approximate the Matérn covariance, Fig. 2(c) displays the empirical correlation function (dots) and the theoretical function for ν=1 with approximate range 0.26, using the triangulation in Fig. 2(b). The match is quite good. Some dots show a discrepancy from the true correlations, but these discrepancies can be attributed to the rather rough triangulation outside the area of interest, which is included to reduce edge effects. In practice there is a trade-off between the accuracy of the GMRF representation and the number of vertices used. In Fig. 2(b) we chose to use a fine resolution in the study area and a reduced resolution outside. A minor drawback in using these GMRFs in place of given stationary covariance models is the boundary effects due to the boundary conditions of the SPDE. In main result 2 we used Neumann conditions that inflate the variance near the boundary (see Appendix A.4 for details), but other choices are also possible (see Rue and Held (2005), chapter 5).

2.4. Leukaemia example

We shall now return to the example from Henderson et al. (2002) at the beginning of Section 2.3 which models spatial variation in leukaemia survival data in north-west England. The specification, in (pseudo-)Wilkinson–Rogers notation (McCullagh and Nelder (1989), section 3.4), is

survival ∼ intercept + sex + age + wbc + tpi + spatial(s),
using a Weibull likelihood for the survival times, and where ‘wbc’ is the white blood cell count at diagnosis, ‘tpi’ is the Townsend deprivation index (which is a measure of economic deprivation for the related district) and ‘spatial’ is the spatial component depending on the spatial location for each measurement. The hyperparameters in this model are the marginal variance and range for the spatial component and the shape parameter in the Weibull distribution.

Kneib and Fahrmeir (2007) reanalysed the same data set by using a Cox proportional hazards model but, for computational reasons, used a low rank approximation for the spatial component. With our GMRF representation we easily work with a sparse 1749×1749 precision matrix for the spatial component. We ran the model in Rinla, using integrated nested Laplace approximations to do the full Bayesian analysis (Rue et al., 2009). Fig. 3 displays the posterior mean and standard deviation of the spatial component. A full Bayesian analysis took about 16 s on a quad-core laptop, and factorizing the 2797×2797 (total) precision matrix took about 0.016 s on average.

Figure 3.

 (a) Posterior mean and (b) standard deviation of the spatial effect on survival by using the GMRF representation

3. Extensions: beyond classical Matérn models

In this section we shall discuss five extensions to the SPDE, widening the usefulness of the GMRF construction results in various ways. The first extension is to consider solutions to the SPDE on a manifold, which allows us to define Matérn fields on domains such as a sphere. The second extension is to allow for space varying parameters in the SPDE which allows us to construct non-stationary locally isotropic GFs. The third extension is to study a complex version of equation (2) which makes it possible to construct oscillating fields. The fourth extension generalizes the non-stationary SPDE to a more general class of non-isotropic fields. Finally, the fifth extension shows how the SPDE generalizes to non-separable space–time models.

An important feature in our approach is that all these extensions still give explicit GMRF representations that are similar to expressions (9) and (10), even if all the extensions are combined. The rather amazing consequence is that we can construct the GMRF representations of non-stationary oscillating GFs on the sphere, still not requiring any computation beyond the geometric properties of the triangulation. In Section 4 we shall illustrate the use of these extensions with a non-stationary model for global temperatures.

3.1. Matérn fields on manifolds

We shall now move away from ℝ^d and consider Matérn fields on manifolds. GFs on manifolds are a well-studied subject with important application to excursion sets in brain mapping (Adler and Taylor, 2007; Bansal et al., 2007; Adler, 2010). Our main objective is to construct Matérn fields on the sphere, which is important for the analysis of global spatial and spatiotemporal models. To simplify the current discussion we shall therefore restrict the construction of Matérn fields to the unit radius sphere 𝕊² in three dimensions, leaving the general case for the appendices.

Just as for ℝ^d, models on a sphere can be constructed via a spectral approach (Jones, 1963). A more direct way of defining covariance models on a sphere is to interpret the two-dimensional space 𝕊² as a surface embedded in ℝ³. Any three-dimensional covariance function can then be used to define the model on the sphere, considering only the restriction of the function to the surface. This has the interpretational disadvantage of using chordal distances to determine the correlation between points. Using the great circle distances in the original covariance function would not work in general, since for differentiable fields this does not yield a valid positive definite covariance function (this follows from Gneiting (1998), theorem 2). Thus, the Matérn covariance function in ℝ³ cannot be used to define GFs on a unit sphere embedded in ℝ³ with distance naturally defined with respect to distances within the surface. However, we can still use its origin, the SPDE! For this purpose, we simply reinterpret the SPDE to be defined on 𝕊² instead of ℝ², and the solution is still what we mean by a Matérn field, but defined directly for the given manifold. The Gaussian white noise which drives the SPDE can easily be defined on 𝕊² as a (zero-mean) random GF W(·) with the property that the covariance between W(A) and W(B), for any subsets A and B of 𝕊², is proportional to the surface integral over A ∩ B. Any regular 2-manifold behaves locally like ℝ², which heuristically explains why the GMRF representation of the weak solution only needs to change the definition of the inner product (7) to a surface integral on 𝕊². The theory in Appendices B–D covers the general manifold setting.

To illustrate the continuous index definition and the Markov representation of Matérn fields on a sphere, Fig. 4 shows the locations of 7280 meteorological measurement stations on the globe, together with an irregular triangulation. The triangulation was constrained to have minimal angles of 21° and maximum edge lengths corresponding to 500 km, based on an average Earth radius of 6370 km. The triangulation includes all the stations more than 10 km apart, requiring a total of 15182 vertices and 30360 triangles. The resulting GF model for α=2 is illustrated in Fig. 5, for κ²=16, corresponding to an approximate correlation range of 0.7 on a unit radius globe. Numerically calculating the covariances between a point on the equator and all other points shows, in Fig. 5(a), that, despite the highly irregular triangulation, the deviations from the theoretical covariances determined by the SPDE (calculated via a spherical Fourier series) are practically non-detectable for distances larger than the local edge length (0.08 or less), and nearly undetectable even for shorter distances. A random realization from the model is shown in Fig. 5(b), resampled to a longitude–latitude grid with an area preserving cylindrical projection. The number of Markov neighbours of each node ranges from 10 to 34, with an average of 19. The resulting structure of the precision matrix is shown in Fig. 6(a), with the corresponding ordering of the nodes shown visually in Fig. 6(b) by mapping the node indices to grey scales. The ordering uses the Markov graph structure to divide the graph recursively into conditionally independent sets (Karypis and Kumar, 1999), which helps to make the Cholesky factor of the precision matrix sparse.
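The numerical covariances in Fig. 5(a) can be obtained without ever forming a dense inverse: the covariances between a node i and all other nodes of a GMRF are the i-th column of Q⁻¹, i.e. a single sparse solve Q c = e_i. The following SciPy sketch illustrates the idea with a toy tridiagonal precision standing in for the spherical one; the sizes and values are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Toy 1-D precision: a sparse SPD stand-in for the triangulated spherical Q.
n = 200
kappa2 = 16.0
Q = sp.diags([np.full(n - 1, -1.0),
              np.full(n, kappa2 + 2.0),
              np.full(n - 1, -1.0)], [-1, 0, 1], format="csc")

lu = splu(Q)                  # sparse factorization, reused for many columns
e = np.zeros(n)
e[n // 2] = 1.0
cov_col = lu.solve(e)         # covariances between node n/2 and all nodes
var = cov_col[n // 2]         # marginal variance at the chosen node
```

The factorization is computed once and reused, so evaluating covariances against many reference points costs one cheap triangular solve each.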

Figure 4.

 (a), (b) Data locations and (c), (d) triangulation for the global temperature data set analysed in Section 4, with a coastline map superimposed

Figure 5.

 (a) Covariances (∘, numerical result for the GMRF approximation; ——, theoretical covariance function) and (b) a random sample from the stationary SPDE model (2) on the unit sphere, with ν=1 and κ²=16

Figure 6.

 (a) Structure of the (reordered) 15182×15182 precision matrix and (b) a visual representation of the reordering: the indices of each triangulation node have been mapped to grey scales showing the governing principle of the reordering algorithm, recursively dividing the graph into conditionally independent sets

3.2. Non-stationary fields

From a traditional point of view, the most surprising extension within the SPDE framework is how we can model non-stationarity. Many applications require non-stationarity in the correlation function and there is a vast literature on this subject (Sampson and Guttorp, 1992; Higdon, 1998; Hughes-Oliver et al., 1998; Cressie and Huang, 1999; Higdon et al., 1999; Fuentes, 2001; Gneiting, 2002; Stein, 2005; Paciorek and Schervish, 2006; Jun and Stein, 2008; Yue and Speckman, 2010). The SPDE approach has the additional huge advantage that the resulting (non-stationary) GF is a GMRF, which allows for swift computations and can additionally be defined on a manifold.

In the SPDE defined in equation (2), the parameters κ² and the innovation variance are constant in space. In general, we can allow both parameters to depend on the co-ordinate u, and we write

{κ²(u) − Δ}^{α/2} {τ(u) x(u)} = W(u),   u ∈ Ω.   (12)
For simplicity, we choose to keep the variance of the innovation constant and instead scale the resulting process x(u) with a scaling parameter τ(u). Non-stationarity is achieved when one or both parameters are non-constant. Of particular interest is the case where they vary slowly with u, e.g. in a low dimensional representation like

log{κ²(u)} = Σ_i β_i^{(κ)} B_i^{(κ)}(u),   log{τ(u)} = Σ_i β_i^{(τ)} B_i^{(τ)}(u),

where the basis functions {B_i^{(·)}(u)} are smooth over the domain of interest. With slowly varying parameters κ²(u) and τ(u), the appealing local interpretation of equation (12) as a Matérn field remains unchanged, whereas the actual form of the non-stationary correlation function achieved is unknown. The actual process of 'combining all local Matérn fields into a consistent global field' is done automatically by the SPDE.
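The log-linear basis representation for slowly varying parameters can be sketched as follows; the Gaussian bump basis and the weights used here are illustrative assumptions, not the paper's actual basis choice.

```python
import numpy as np

# Hypothetical smooth basis: a few wide Gaussian bumps over [0, 1].
centers = np.linspace(0.0, 1.0, 5)
width = 0.3
beta_kappa = np.array([0.5, 0.2, -0.1, 0.3, 0.4])   # hypothetical weights

def kappa2(u):
    """Evaluate kappa^2(u) = exp(sum_i beta_i B_i(u)) at points u."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    B = np.exp(-0.5 * ((u[:, None] - centers[None, :]) / width) ** 2)
    return np.exp(B @ beta_kappa)   # log link guarantees kappa^2(u) > 0
```

The same construction applies to log τ(u); slow variation corresponds to wide, smooth basis functions relative to the correlation range.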

The GMRF representation of equation (12) is found by using the same approach as for the stationary case, with minor changes. For convenience, we assume that both κ² and τ can be considered as constant within the support of the basis functions {ψ_k}, and hence

⟨ψ_i, κ² ψ_j⟩_Ω ≈ κ²_{ij} ⟨ψ_i, ψ_j⟩_Ω,   (13)

for a naturally defined κ²_{ij} in the support of ψ_i and ψ_j. The consequence is a simple scaling of the matrices in expression (10) at no additional cost; see Appendix A.3. If we improve the integral approximation (13) from considering κ²(u) locally constant to locally planar, the computational preprocessing cost increases but is still O(1) for each element in the precision matrix Q.

3.3. Oscillating covariance functions

Another extension is to consider a complex version of the basic equation (2). For simplicity, we consider only the case α=2. With innovation processes W₁ and W₂ as two independent white noise fields, and an oscillation parameter θ, the complex version becomes

{κ² exp(iπθ) − Δ} {x₁(u) + i x₂(u)} = W₁(u) + i W₂(u).
The real and imaginary stationary solution components x₁ and x₂ are independent, with spectral densities

S(k) = (2π)^{−d} {κ⁴ + 2 cos(πθ) κ² ‖k‖² + ‖k‖⁴}^{−1}

on ℝ^d. The corresponding covariance functions for ℝ and ℝ² are given in Appendix A. For general manifolds, no closed form expression can be found. In Fig. 7, we illustrate the resonance effects obtained for compact domains by comparing oscillating covariances for ℝ² and the unit sphere 𝕊². The precision matrices for the resulting fields are obtained by a simple modification of the construction for the regular case; the precise expression is given in Appendix A. The details of the construction, which are given in Appendix C.4, also reveal the possibility of multivariate fields, similar to Gneiting et al. (2010).

Figure 7.

 Correlation functions from oscillating SPDE models, for θ=0, 0.1, …, 1, on (a) ℝ² and (b) 𝕊², with κ²=12 and ν=1

For θ=0, the regular Matérn covariance with ν=2−d/2 is recovered, and the oscillations increase with θ. The limiting case θ=1 generates intrinsic stationary random fields, on ℝ^d invariant to the addition of cosine functions of arbitrary direction with wave number κ.

3.4. Non-isotropic models and spatial deformations

The non-stationary model that was defined in Section 3.2 has locally isotropic correlations, despite having globally non-stationary correlations. This can be relaxed by widening the class of SPDEs considered, allowing a non-isotropic Laplacian, and also by including a directional derivative term. This also provides a link to the deformation method for non-stationary covariances that was introduced by Sampson and Guttorp (1992).

In the deformation method, the domain is deformed into a space where the field is stationary, resulting in a non-stationary covariance model in the original domain. Using the link to SPDE models, the resulting model can be interpreted as a non-stationary SPDE in the original domain.

For notational simplicity, assume that the deformation is a smooth bijection f between two d-manifolds, f: Ω → Ω̃, with ũ = f(u), u ∈ Ω, ũ ∈ Ω̃. Restricting to the case α=2, consider the stationary SPDE on the deformed space Ω̃,

(κ² − Δ̃) x̃(ũ) = W̃(ũ),
generating a stationary Matérn field. A change of variables onto the undeformed space Ω yields (Smith, 1934)


where F(u) is the Jacobian of the deformation function f. This non-stationary SPDE exactly reproduces the deformation method with Matérn covariances (Sampson and Guttorp, 1992). A sparse GMRF approximation can be constructed by using the same principles as for the simpler non-stationary model in Section 3.2.

An important remark is that the parameters of the resulting SPDE do not depend directly on the deformation function itself, but only on its Jacobian. A possible option for parameterizing the model without explicit construction of a deformation function is to control the major axis of the local deformation given by F(u) through a vector field, given either from covariate information or as a weighted sum of vector basis functions. Addition or subtraction of a directional derivative term further generalizes the model. Allowing all parameters, including the variance of the white noise, to vary across the domain results in a very general non-stationary model that includes both the deformation method and the model in Section 3.2. The model class can be interpreted as changes of metric in Riemannian manifolds, which is a natural generalization of deformation between domains embedded in Euclidean spaces. A full analysis is beyond the scope of this paper, but the technical appendices cover much of the necessary theory.

3.5. Non-separable space–time models

A separable space–time covariance function can be characterized as having a spectrum that can be written as a product or sum of spectra in only space or time. In contrast, a non-separable model can have interaction between the space and time dependence structures. Whereas it is difficult to construct non-separable non-stationary covariance functions explicitly, non-separable SPDE models can be obtained with relative ease, using locally specified parameters. Arguably, the simplest non-separable SPDE applicable to the GMRF method is the transport and diffusion equation

{∂/∂t + κ² + m · ∇ − ∇ · H∇} x(u, t) = E(u, t),
where m is a transport direction vector, H is a positive definite diffusion matrix (for general manifolds strictly a tensor) and E(u, t) is a stochastic space–time noise field. It is clear that even this stationary formulation yields non-separable fields, since the spatiotemporal power spectrum of the solution is

S(k, ω) ∝ S_E(k, ω) {(ω + mᵀk)² + (κ² + kᵀHk)²}^{−1},

which is strictly non-separable even with m=0 and H=I. The driving noise is an important part of the specification and may require an additional layer in the model. To ensure a desired regularity of the solutions, the noise process can be chosen to be white in time but with spatial dependence, such as a solution to (κ² − Δ)^{α/2} E(u, t) = W(u, t) for some α ≥ 1, where W(u, t) is space–time white noise. A GMRF representation can be obtained by first applying the ordinary spatial method, and then discretizing the resulting system of coupled temporal stochastic differential equations with, for example, an Euler method. Allowing all the parameters to vary with location in space (and possibly in time) generates a large class of non-separable non-stationary models. The stationary models that were evaluated by Heine (1955) can be obtained as special cases.

4. Example: global temperature reconstruction

4.1. Problem background

When analysing past observed weather and climate, the Global Historical Climatology Network data set (Peterson and Vose, 1997) is commonly used. On August 8th, 2010, the data set contained meteorological observations from 7280 stations spread across continents, where each of the 597373 rows of observations contains the monthly mean temperatures from a specific station and year. The data span the period 1702–2010 but, counting for each year only stations with no missing values, yearly averages can be calculated only as far back as 1835. The spatial coverage varies from fewer than 400 stations before 1880 up to 3700 in the 1970s. For each station, covariate information such as location, elevation and land use is available.

The Global Historical Climatology Network data are used to analyse regional and global temperatures in the GISTEMP temperature series (Hansen et al., 1999, 2001) and the HadCRUT3 global temperature series (Brohan et al., 2006), together with additional data such as ocean-based sea surface temperature measurements. These analyses process the data in different ways to reduce the influence of station-specific effects (a procedure known as homogenization), and the information about the temperature anomaly (the difference in weather from the local climate, the latter defined as the average weather over a 30-year reference period) is then aggregated to latitude–longitude grid boxes. The grid box anomalies are then combined by using area-based weights into an estimate of the average global anomaly for each year. The analysis is accompanied by a derivation of the resulting uncertainty of the estimates.

Though different in details, the gridding procedures are algorithmically based, i.e. there is no underlying statistical model for the weather and climate, only for the observations themselves. We shall here present a basis for a stochastic-model-based approach to the problem of estimating past regional and global temperatures, as an example of how the non-stationary SPDE models can be used in practice. The ultimate goal is to reconstruct the entire spatiotemporal yearly (or even monthly) average temperature field, with appropriate measures of uncertainty, taking the model parameter uncertainty into account.

Since most of the spatial variation is linked to the rotational nature of the globe in relation to the sun, we shall here restrict ourselves to a rotationally invariant covariance model, which reduces the computational burden. However, we shall allow for regional deviations from rotational symmetry in the expectations. The model separates weather from climate by assuming that the climate can be parameterized by non-stationary expectation and covariance parameters μ(u), κ(u) and τ(u), for u ∈ 𝕊², and assuming that the yearly weather follows the model defined by equation (12), given the climate. Using the triangulation from Fig. 4 with piecewise linear basis functions, the GMRF representation that is given in Appendix A.3 will be used, with x_t denoting the discretized field at time t. To avoid complications due to temporal dependence between monthly values, we aggregate the measurements into yearly means and model only the yearly average temperature at each location. A full analysis needs to take local station-dependent effects into account. Here, we include only the effect of elevation. To incorporate a completely integrated station homogenization procedure into the model would go far beyond the scope of this paper, and we therefore use the 'adjusted' Global Historical Climatology Network data set, which includes some outlier quality control and relative station calibrations.

4.2. Model summary

The climate and observation model is governed by a parameter vector θ = {θ_μ, θ_κ, θ_τ, θ_s, θ_ε}, and we denote the yearly temperature fields x = {x_t} and the yearly observations y = {y_t}, with t = 1970, …, 1989. Using basis function matrices B_μ (all 49 spherical harmonics up to and including order 6; see Wahba (1981)), B_κ and B_τ (B-splines of order 2 in sin(latitude), shown in Fig. 8), the expectation field is given by μ_{x|θ} = B_μ θ_μ, the local spatial dependence κ(u) is defined through log(κ²) = B_κ θ_κ and the local variance scaling τ(u) is defined through log(τ) = B_τ θ_τ. The prior distribution for the climate field is chosen as approximate solutions to an SPDE of the same form, with σ ≫ 0, which provides natural relative prior weights for the spherical harmonic basis functions.

Figure 8.

 (a) Three transformed B-spline basis functions of order 2, and approximate 95% credible intervals for (b) standard deviation and (c) approximate correlation range of yearly weather, as functions of latitude

The yearly temperature fields x_t are defined conditionally on the climate as

x_t | θ ∼ N(μ_{x|θ}, Q_{x|θ}^{−1}),

where Q_{x|θ} is the GMRF precision corresponding to model (12), with parameters determined by (θ_κ, θ_τ). Introducing observation matrices A_t that extract the nodes from x_t for each observation, the observed yearly weather is modelled as

y_t | x_t, θ ∼ N(A_t x_t + S_t θ_s, Q_{y|x,θ}^{−1}),

where S_t θ_s are station-specific effects and Q_{y|x,θ} = I exp(θ_ε) is the observation precision. Since we use the data only for illustrative purposes here, we shall ignore all station-specific effects except for elevation. We also ignore any remaining residual dependences between consecutive years, analysing only the marginal distribution properties of each year.

The Bayesian analysis draws all its conclusions from the properties of the posterior distributions of (θ|y) and (x|y), so all uncertainty about the weather x_t is included in the distribution for the model parameters θ, and conversely for θ and x_t. One of the most important steps is how to determine the conditional distribution for the weather given observations and model parameters,

x_t | y, θ ∼ N(μ_{x_t|y,θ}, Q_{x_t|y,θ}^{−1}),

where Q_{x_t|y,θ} = Q_{x|θ} + A_tᵀ Q_{y|x,θ} A_t is the conditional precision, and the expectation μ_{x_t|y,θ} is the kriging estimator of x_t. Owing to the compact support of the basis functions, which is determined by the triangulation, each observation depends on at most three neighbouring nodes in x_t, which makes the conditional precision have the same sparsity structure as the field precision Q_{x|θ}. The computational cost of the kriging estimates is linear in the number of observations, and approximately O(n^{3/2}) in the number n of basis functions. If basis functions with non-compact support had been used, such as a Fourier basis, the posterior precisions would have been fully dense matrices, with computational cost O(n³) in the number of basis functions, regardless of the sparsity of the prior precisions. This shows that when constructing computationally efficient models it is not enough to consider the theoretical properties of the prior model, but instead the whole sequence of computations needs to be taken into account.
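The conditional (kriging) computation described here can be sketched with SciPy sparse matrices. The sizes, observation weights and precision values below are illustrative assumptions; the point is that Q_post keeps the sparsity of the prior precision and the kriging mean costs one sparse solve.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

n, m = 100, 30
# Toy sparse prior precision Q_x (stand-in for the SPDE-derived one).
Q_x = sp.diags([np.full(n - 1, -1.0), np.full(n, 2.5),
                np.full(n - 1, -1.0)], [-1, 0, 1], format="csc")

# Sparse observation matrix A: each row touches two neighbouring basis nodes
# (barycentric-style weights; on a triangulation it would be at most three).
rng = np.random.default_rng(0)
left = rng.integers(0, n - 1, m)
rows = np.repeat(np.arange(m), 2)
cols = np.column_stack([left, left + 1]).ravel()
vals = np.tile([0.6, 0.4], m)
A = sp.csr_matrix((vals, (rows, cols)), shape=(m, n))

Q_y = 4.0 * sp.identity(m, format="csc")      # observation precision
y = rng.normal(size=m)

Q_post = (Q_x + A.T @ Q_y @ A).tocsc()        # conditional precision, sparse
mu_post = spsolve(Q_post, A.T @ (Q_y @ y))    # kriging mean (zero prior mean)
```

With a non-compact (e.g. Fourier) basis, A would be dense and Q_post fully dense, losing the O(n^{3/2}) sparse factorization cost.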

4.3. Results

We implemented the model by using R-INLA. Since (x|y,θ) is Gaussian, the results are approximate only with regard to the numerical integration over the covariance parameters (θ_κ, θ_τ, θ_ε). Owing to the large size of the data set, this initial analysis is based on data only from the period 1970–1989, requiring 336960 nodes in a joint model for the yearly temperature fields, measurements and linear covariate parameters, with 15182 nodes in each field, and the number of observations in each year ranging between approximately 1300 and 1900, for each year including all stations with no missing monthly values. The full Bayesian analysis took about 1 h to compute on a 12-core computer, with a peak memory use of about 50 Gbytes during the parallel numerical integration phase. This is a notable improvement over earlier work by Das (2000) where partial estimation of the parameters in a deformation-based covariance model of the type in Section 3.4 took more than a week on a supercomputer.

The 95% credible interval for the measurement standard deviation, including local unmodelled effects, was calculated as (0.628, 0.650) °C, with posterior expectation 0.634 °C. The spatial covariance parameters are more difficult to interpret individually, but we instead show the resulting spatially varying field standard deviations and correlation ranges in Fig. 8, including pointwise 95% credible intervals. Both curves show a clear dependence on latitude, with both larger variance and correlation range near the poles, compared with the equator. The standard deviations range between 1.2 and 2.6 °C, and the correlation ranges vary between 1175 and 2825 km. There is an asymmetric north–south pole effect for the variances, but a symmetric curve is compatible with the credible intervals.

Evaluating the estimated climate and weather for a period of only 20 years is difficult, since 'climate' is typically defined as averages over periods of 30 years. Also, the spherical harmonics that were used for the climate model are not of sufficiently high order to capture all regional effects. To alleviate these problems, we base the presentation on what can reasonably be called the empirical climate and weather anomalies for the period 1970–1989, in effect using the period average as reference. Thus, instead of evaluating the distributions of (μ|y) and (x_t − μ|y), we consider (μ̂|y) and (x_t − μ̂|y), where μ̂ is the empirical mean of the yearly fields over the period. In Figs 9(a) and 9(b), the posterior expectation of the empirical climate, E(μ̂|y), is shown (including the estimated effect of elevation), together with the posterior expectation of the temperature anomaly for 1980, E(x_{1980} − μ̂|y). The corresponding standard deviations are shown in Figs 9(c) and 9(d). As expected, the temperatures are low near the poles and high near the equator, and some of the relative warming effect of the thermohaline circulation on the Alaska and northern European climates can also be seen. There is a clear effect of regional topography, showing cold areas for high elevations such as in the Himalayas, Andes and Rocky Mountains, as indicated by an estimated cooling effect of 5.2 °C per kilometre of increased elevation. It is clear from Figs 9(c) and 9(d) that including ocean-based measurements is vital for analysis of regional ocean climate and weather, in particular for the south-east Pacific Ocean.

Figure 9.

 Posterior means for (a) the empirical 1970–1989 climate and (b) the empirical mean anomaly 1980 with (c) and (d) the corresponding posterior standard deviations respectively: the climate includes the estimated effect of elevation; an area preserving cylindrical projection is used

With this in mind, we might expect that the period of analysis and data coverage are too restricted to allow detection of global trends, especially since the simple model that we use a priori assumes a constant climate. However, the present analysis, including the effects of all parameter uncertainties, still yields a 95% Bayesian prediction interval (0.87, 2.18) °C per century (expectation 1.52 °C) for the global average temperature trend over the 20-year period analysed. The posterior standard deviation for each global average temperature anomaly was calculated to about 0.09 °C. Comparing the values with the corresponding estimates in the GISS series, which has an observed trend of 1.48 °C per century for this period, yields a standard deviation for the differences between the series of only 0.04 °C. Thus, the results here are similar to the GISS results, even without the use of ocean data.

The estimated trend has less than a 2% probability of occurring in a random sample from the temporally stationary model that was used in the analysis. From a purely statistical point of view, this could indicate either that there is a large amount of unmodelled temporal correlation in the yearly weather averages or that the expectation is non-stationary, i.e. that the climate was changing. Since it is impossible to distinguish between these two cases by using only statistical methods on the single realization of the actual climate and weather system that is available, a full analysis should incorporate knowledge from climate system physics to balance properly the change in climate and short-term dependence in the weather in the model.

5. Discussion

The main result in this work is that we can construct an explicit link between (some) GFs and GMRFs by using an approximate weak solution of the corresponding SPDE. Although this result is not generally applicable for all covariance functions, the subclass of models where this result is applicable is substantial, and we expect to find additional versions and extensions in the future; see for example Bolin and Lindgren (2011). The explicit link makes these GFs much more practically applicable, as we might model and interpret the model by using covariance functions while doing the computations by using the GMRF representation which allows for sparse matrix numerical linear algebra. In most cases, we can make use of the integrated nested Laplace approximation approach for doing (approximate) Bayesian inference (Rue et al., 2009), which requires the latent field to be a GMRF. It is our hope that the SPDE link might help in bridging the literature of (continuously indexed) GFs and geostatistics on one side, and GMRFs or conditional auto-regressions on the other.

Furthermore, the simplicity of the SPDE parameter specifications provides a new modelling approach that is not dependent on the theory for constructing positive definite covariance functions. The SPDE approach allows easy construction of non-stationary models, defined in a natural way that provides good local interpretation, via spatially varying parameters, and is computationally very efficient, as we still obtain GMRF representations. The extension to manifolds is also useful, with fields on the globe as the main example.

A third issue, which has not yet been discussed, is that the SPDE approach might help to interpret external covariates (e.g. wind speed) as an appropriate drift term or similar in the related SPDE, so that such covariates would enter the spatial dependence model correctly. This is again an argument for more physics-based spatial modelling but, as we have shown in this paper, such an approach can also provide a huge computational benefit.

On the negative side, the approach comes with an implementation and preprocessing cost for setting up the models, as it involves the SPDE, triangulations and GMRF representations, but we firmly believe that such costs are unavoidable when efficient computations are required.


This paper is dedicated to the memory of Julian E. Besag (1945–2010), whose work on Markov random fields from 1974 onwards inspired us to investigate the link to Gaussian random-field models that are commonly used in spatial statistics.

The authors thank the Research Section Committee and reviewers for their very helpful comments and suggestions. We are also grateful to Peter Guttorp for encouraging us to address the global temperature problem, Daniel Simpson for providing the convergence rate result (11) and the key reference in Appendix C.5 and for invaluable discussions and comments, and to Georg Lindgren for numerous comments on the manuscript.


Appendix A: Explicit results

This appendix includes some explicit expressions and results that are not included in the main text.

A.1. Regular lattices

Here we shall give some explicit precision expressions for grid-based models on ℝ and ℝ². Consider the SPDE

(κ² − ∇ · H∇)^{α/2} x(u) = W(u),
where H is a diagonal d-dimensional matrix with positive diagonal elements (compare with Section 3.4).

For any given ordered discretization u₁ < … < u_n on ℝ, let γ_i = u_i − u_{i−1}, δ_i = u_{i+1} − u_i and s_i = (γ_i + δ_i)/2. Since d = 1, H is a non-negative scalar, H ≥ 0, and for α = 1 the elements on row i, around the diagonal, of the precision are given by

s_i (−a_i,  c_i,  −b_i),

where a_i = H/(γ_i s_i), b_i = H/(δ_i s_i) and c_i = κ² + a_i + b_i; the α = 2 precision follows in the Markov (lumped mass) approximation as Q₂ = Q₁ diag(s_i)^{−1} Q₁. If the spacing is regular, s = δ = γ, a = a_i = b_i = H/δ² and c = c_i = κ² + 2a. The special case α=2 with κ=0 and irregular spacing is a generalization of Lindgren and Rue (2008).
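The α = 1 construction on an irregular grid can be sketched directly; the boundary rows below simply drop the missing neighbour term (a hedged reading of the free boundary case), and H and κ² are scalars.

```python
import numpy as np
import scipy.sparse as sp

def precision_1d(u, kappa2, H=1.0):
    """Assemble the alpha = 1 precision for an ordered 1-D grid u."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    Q = sp.lil_matrix((n, n))
    for i in range(n):
        gamma = u[i] - u[i - 1] if i > 0 else np.inf      # no left neighbour
        delta = u[i + 1] - u[i] if i < n - 1 else np.inf  # no right neighbour
        g = 0.0 if np.isinf(gamma) else gamma
        d = 0.0 if np.isinf(delta) else delta
        s = (g + d) / 2.0
        a = H / (gamma * s)     # vanishes at the left boundary
        b = H / (delta * s)     # vanishes at the right boundary
        c = kappa2 + a + b
        Q[i, i] = s * c
        if i > 0:
            Q[i, i - 1] = -s * a
        if i < n - 1:
            Q[i, i + 1] = -s * b
    return Q.tocsc()

u = np.array([0.0, 0.3, 0.5, 0.9, 1.0])
Q = precision_1d(u, kappa2=2.0)
```

Note that s_i a_i = H/γ_i equals s_{i−1} b_{i−1} = H/δ_{i−1}, so the assembled matrix is symmetric, and it is positive definite whenever κ² > 0.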

For ℝ², assume a given regular grid discretization, with horizontal (co-ordinate component 1) distances γ and vertical (co-ordinate component 2) distances δ. Let s = γδ, a = H₁₁/γ², b = H₂₂/δ² and c = κ² + 2a + 2b. For α = 1, the precision elements are then given by the five-point stencil

        −b
s ×  −a   c   −a
        −b

with c at the centre, −a for the horizontal neighbours and −b for the vertical neighbours; the α = 2 precision follows as Q₂ = Q₁ (sI)^{−1} Q₁. If the grid distances are proportional to the square roots of the corresponding diagonal elements of H (such as in the isotropic case γ = δ and H₁₁ = H₂₂), the expressions simplify to s = γδ, a = b = H₁₁/γ² = H₂₂/δ² and c = κ² + 4a.
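On a regular grid, the α = 1 five-point stencil can be assembled as a Kronecker sum; the boundary rows here use the simplest truncation of the stencil, which is an assumption rather than the paper's boundary treatment.

```python
import numpy as np
import scipy.sparse as sp

def precision_2d(n1, n2, kappa2, gamma, delta, H11=1.0, H22=1.0):
    """alpha = 1 precision on a regular n1 x n2 grid (index = j*n1 + i)."""
    s = gamma * delta
    # 1-D second-difference matrices (simple truncation at the boundary)
    L1 = sp.diags([-np.ones(n1 - 1), 2 * np.ones(n1), -np.ones(n1 - 1)],
                  [-1, 0, 1])
    L2 = sp.diags([-np.ones(n2 - 1), 2 * np.ones(n2), -np.ones(n2 - 1)],
                  [-1, 0, 1])
    I1, I2 = sp.identity(n1), sp.identity(n2)
    Q = (kappa2 * s * sp.identity(n1 * n2)
         + (H11 * delta / gamma) * sp.kron(I2, L1)    # horizontal couplings
         + (H22 * gamma / delta) * sp.kron(L2, I1))   # vertical couplings
    return Q.tocsc()

Q = precision_2d(5, 5, kappa2=2.0, gamma=0.5, delta=0.25)
```

At an interior node, the diagonal entry equals s κ² + 2H₁₁δ/γ + 2H₂₂γ/δ = s(κ² + 2a + 2b), matching the stencil above.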

A.2. Triangulated domains

In this section, we derive explicit expressions for the building blocks of the precision matrices, for general triangulated domains with piecewise linear basis functions. For implementation of the theory in Appendix C, we need to calculate the matrices with elements

C_{ij} = ⟨ψ_i, ψ_j⟩_Ω,   G_{ij} = ⟨∇ψ_i, ∇ψ_j⟩_Ω,   B_{ij} = ⟨ψ_i, ∂_n ψ_j⟩_{∂Ω}.   (19)
For 2-manifolds such as regions in ℝ² or on 𝕊², we require a triangulation with a set of vertices v₁, …, v_n, embedded in ℝ³. Each vertex v_k is assigned a continuous piecewise linear basis function ψ_k with support on the triangles attached to v_k. To obtain explicit expressions for equation (19), we need to introduce some notation for the geometry of an arbitrary triangle. For notational convenience, we number the corner vertices of a given triangle T = (v₀, v₁, v₂). The edge vectors opposite each corner are

e₀ = v₂ − v₁,   e₁ = v₀ − v₂,   e₂ = v₁ − v₀,

and the corner angles are θ₀, θ₁ and θ₂.

The triangle area |T| can be obtained from the formula |T| = ‖e₀ × e₁‖/2, i.e. half the length of the vector product in ℝ³. The contributions from the triangle to the C̃- and C-matrices are given by

C̃_{ii} = |T|/3,   C_{ij} = (|T|/12)(1 + 𝟙{i = j}),   i, j = 0, 1, 2.

The contribution to G_{0,1} from the triangle T is −cot(θ₂)/2, and the entire contribution from the triangle is

G^T = {1/(4|T|)} [e_i · e_j]_{i,j = 0,1,2}.
For the boundary integrals in expression (19), the contribution from the triangle is


where 𝟙(edge k in T lies on ∂Ω) indicates whether edge k of T lies on the boundary. Summing the contributions from all the triangles yields the complete C̃-, C-, G- and B-matrices.

For the anisotropic version, parameterized as in Appendix A.1 and Appendix C.4, the modified G-matrix elements are given by

G^T_H = {1/(4|T|)} [e_i · adj(H) e_j]_{i,j = 0,1,2},

where adj(H) is the adjugate matrix of H, defined for non-singular matrices as det(H) H^{−1}.

A.3. Non-stationary and oscillating models

For easy reference, we give specific precision matrix expressions for the case α=2 for arbitrary triangulated manifold domains Ω. The stationary and simple oscillating models for α=2 have precision matrices given by

Q = τ² {κ⁴ C + 2 cos(πθ) κ² G + G C^{−1} G},

where θ=0 corresponds to the regular Matérn case and 0<θ<1 to oscillating models. Using the approximation from expression (13), the non-stationary model (12) with α=2 has precision matrix given by

Q = τ (κ² C κ² + κ² G + G κ² + G C^{−1} G) τ,

where κ² and τ here denote diagonal matrices with κ²_{ii} = κ²(u_i) and τ_{ii} = τ(u_i). As shown in Appendix C.5, all the C-matrices should be replaced by the diagonal C̃ to obtain a Markov model.
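In the Markov (lumped mass) version, the non-stationary α = 2 precision factors as Q = τ K C̃⁻¹ K τ with K = κ²C̃ + G, which expands to the expression above with C replaced by C̃. A sketch of this assembly, illustrated with toy 1-D mass and stiffness matrices (the helper name and the example matrices are assumptions):

```python
import numpy as np
import scipy.sparse as sp

def precision_alpha2(C_lumped, G, kappa2_diag, tau_diag):
    """Q = tau K' C~^{-1} K tau with K = diag(kappa2) C~ + G."""
    Ci = sp.diags(1.0 / C_lumped.diagonal())      # C~^{-1}, still diagonal
    K = sp.diags(kappa2_diag) @ C_lumped + G
    tau = sp.diags(tau_diag)
    return (tau @ K.T @ Ci @ K @ tau).tocsc()

# toy 1-D lumped mass and stiffness on a regular grid with spacing h
n, h = 10, 0.1
C_lumped = sp.diags(np.full(n, h)).tocsc()
G = sp.diags([np.full(n - 1, -1.0 / h), np.full(n, 2.0 / h),
              np.full(n - 1, -1.0 / h)], [-1, 0, 1]).tocsc()

Q = precision_alpha2(C_lumped, G, np.full(n, 4.0), np.ones(n))
```

Because C̃⁻¹ is diagonal, Q inherits (a doubled version of) the sparsity of K, which is what makes the representation Markov.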

A.4. Neumann boundary effects

The effects on the covariance functions resulting from using Neumann boundary conditions can be explicitly expressed as a folding effect. When the full SPDE is

(κ² − Δ)^{α/2} x(u) = W(u),   u ∈ Ω,   ∂_n x(u) = 0,   u ∈ ∂Ω,   (23)
the following theorem provides a direct answer, in terms of the Matérn covariance function.

Theorem 1.  If x is a solution to the boundary value problem (23) for Ω=[0,L] and a positive integer α, then

Cov{x(u), x(v)} = Σ_{k ∈ ℤ} {r_M(u − v − 2kL) + r_M(u + v − 2kL)},

where r_M is the Matérn covariance as defined on the whole of ℝ.

Theorem 1, which extends naturally to arbitrary generalized rectangles in ℝ^d, is proved in Appendix D.1. In practice, when the effective range is small compared with L, only the three main terms need to be included for a very close approximation:

Cov{x(u), x(v)} ≈ r_M(u − v) + r_M(u + v) + r_M(2L − u − v).
Moreover, the resulting covariance is nearly indistinguishable from the stationary Matérn covariance at distances greater than twice the range away from the borders of the domain.
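The three-term folding approximation is straightforward to evaluate. The sketch below uses the ν = 3/2 Matérn member (α = 2, d = 1), whose closed form is r_M(h) = (1 + κ|h|) exp(−κ|h|) with unit variance; the folded sum r_M(u − v) + r_M(u + v) + r_M(2L − u − v) doubles the variance at the boundary while leaving the interior essentially unchanged when the range is small compared with L.

```python
import numpy as np

def matern32(h, kappa):
    """Matern covariance with nu = 3/2 (alpha = 2, d = 1), unit variance."""
    ah = kappa * np.abs(h)
    return (1.0 + ah) * np.exp(-ah)

def folded_cov(u, v, kappa, L):
    """Three-term folding approximation for Neumann boundaries on [0, L]."""
    return (matern32(u - v, kappa)
            + matern32(u + v, kappa)
            + matern32(2.0 * L - u - v, kappa))
```

For example, with κ = 10 on [0, 1], the variance at u = 0 is close to 2, while at the domain centre it is close to 1, the unconstrained Matérn value.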

A.5. Oscillating covariance functions

The covariances for the oscillating model can be calculated explicitly for ℝ and ℝ² from the spectrum. On ℝ, complex analysis gives


which has variance {4 cos(πθ/2) κ³}^{−1}. On ℝ², involved Bessel function integrals yield


which has variance {4π κ² sinc(πθ)}^{−1}.
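As a sanity check, the stated variance on ℝ can be recovered numerically by integrating the oscillating spectral density S(ω) = (2π)^{−1} {κ⁴ + 2 cos(πθ) κ² ω² + ω⁴}^{−1}; the parameter values below are arbitrary.

```python
import numpy as np

kappa, theta = 1.3, 0.4

def S(w):
    # spectral density of the oscillating model on R
    return 1.0 / (2.0 * np.pi) / (kappa**4
            + 2.0 * np.cos(np.pi * theta) * kappa**2 * w**2 + w**4)

# simple rectangle-rule quadrature over a wide, fine grid
w = np.linspace(-200.0, 200.0, 2_000_001)
var_numeric = np.sum(S(w)) * (w[1] - w[0])
var_formula = 1.0 / (4.0 * np.cos(np.pi * theta / 2.0) * kappa**3)
```

At θ = 0 the spectrum reduces to the α = 2 Matérn spectrum on ℝ and the formula reduces to the familiar variance (4κ³)^{−1}.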

Appendix B: Manifolds, random fields and operator identities

B.1. Manifold calculus

To state concisely the theory needed for constructing solutions to SPDEs on more general spaces than ℝ^d, we need to introduce some concepts from differential geometry and manifolds. A main point is that, loosely speaking, for statisticians who are familiar with measure theory and stochastic calculus on ℝ^d, many of the familiar rules of calculus for random processes and fields still apply, as long as all expressions are defined in a co-ordinate-free manner. Here, we give a brief overview of the concepts that are used in the subsequent appendices. For more details on manifolds, differential calculus and geometric measure theory see for example Auslander and MacKenzie (1977), Federer (1978) and Krantz and Parks (2008).

Loosely, we say that a space Ω is a d-manifold if it locally behaves like ℝ^d. We consider only manifolds with well-behaved boundaries, in the sense that the boundary ∂Ω of a manifold, if present, is required to be a piecewise smooth (d−1)-manifold. We also require the manifolds to be metric manifolds, so that distances between points and angles between vectors are well defined.

A bounded manifold has a finite maximal distance between points. If such a manifold is complete in the set sense, it is called compact. Finally, if the manifold is compact but has no boundary, it is closed. The most common metric manifolds are subsets of ℝ^d equipped with the Euclidean metric. The prime example of a closed manifold is the unit sphere 𝕊² embedded in ℝ³. In Fourier analysis for images, the flat torus commonly appears, when considering periodic continuations of a rectangular region. Topologically, this is equivalent to a torus, but with a different metric compared with a torus embedded in ℝ³. The d-dimensional hypercube [0,1]^d is a compact manifold with a closed boundary.

From the metric that is associated with the manifold it is possible to define differential operators. Let φ denote a function φ: Ω → ℝ. The gradient of φ at u is a vector ∇φ(u), defined indirectly via directional derivatives. In ℝ^d with the Euclidean metric, the gradient operator ∇ is formally given by the column vector (∂/∂u_1, …, ∂/∂u_d)ᵀ. The Laplacian Δ of φ at u (or the Laplace–Beltrami operator) can be defined as the sum of the second-order directional derivatives, with respect to a local orthonormal basis, and is denoted Δφ(u) = ∇·∇φ(u). In the Euclidean metric on ℝ^d, we can write Δφ(u) = Σ_{i=1}^d ∂²φ(u)/∂u_i². At the boundary of Ω, the vector n(u) denotes the unit length outward normal vector at the point u on the boundary ∂Ω. The normal derivative of a function φ is the directional derivative ∂_nφ(u) = n(u)·∇φ(u).

An alternative to defining integration on general manifolds through mapping subsets into ℝ^d is to replace Lebesgue integration with integrals defined through normalized Hausdorff measures (Federer, 1951, 1978), here denoted ℋ^d. This leads to a natural generalization of Lebesgue measure and integration that coincides with the regular theory on ℝ^d. We write the area of a d-dimensional Hausdorff measurable subset A ⊂ Ω as |A|_Ω, and the Hausdorff integral of a (measurable) function φ as ∫_Ω φ(u) dℋ^d(u). An inner product between scalar- or vector-valued functions φ and ψ is defined through

〈φ, ψ〉_Ω = ∫_Ω φ(u) · ψ(u) dℋ^d(u).
A function φ: Ω → ℝ^m, m ≥ 1, is said to be square integrable if and only if 〈φ, φ〉_Ω < ∞, which is denoted φ ∈ L²(Ω).

A fundamental relationship, which corresponds to integration by parts for functions on ℝ, is Green's first identity,

〈φ, −Δψ〉_Ω = 〈∇φ, ∇ψ〉_Ω − 〈φ, ∂_nψ〉_∂Ω.
Typical statements of the identity require φ ∈ C1(Ω) and ψ ∈ C2(Ω), but we shall relax these requirements considerably in lemma 1.

We also need to define Fourier transforms on general manifolds, where the usual cosine and sine functions do not exist.

 Definition 1  (generalized Fourier representation). The Fourier transform pair for functions inline image is given by


(Here, we briefly abuse our notation by including complex functions in the inner products.)

If Ω is a compact manifold, a countable subset {E_k, k=0,1,2,…} of orthogonal and normalized eigenfunctions of the negated Laplacian, −ΔE_k = λ_k E_k, can be chosen as basis, and the Fourier representation for a function φ ∈ L²(Ω) is given by

φ(u) = Σ_{k=0}^∞ φ̂_k E_k(u),  where φ̂_k = 〈E_k, φ〉_Ω.
Finally, we define a subspace of L2-functions, with inner product adapted to the differential operators that we shall study in the remainder of this paper.

 Definition 2.  The Hilbert space ℋ¹_κ(Ω), for a given κ ≥ 0, is the space of functions φ: Ω → ℝ with ∇φ ∈ L²(Ω), equipped with inner product

〈φ, ψ〉_{ℋ¹_κ(Ω)} = κ²〈φ, ψ〉_Ω + 〈∇φ, ∇ψ〉_Ω.

The inner product induces a norm, which is given by ‖φ‖_{ℋ¹_κ(Ω)} = 〈φ, φ〉_{ℋ¹_κ(Ω)}^{1/2}. The boundary case κ=0 is also well defined, since ‖·‖_{ℋ¹_0(Ω)} is a seminorm, and ℋ¹_0(Ω) is a space of equivalence classes of functions, which can be identified with functions satisfying 〈φ, 1〉_Ω = 0.

Note that, for κ > 0, the norms for different κ are equivalent, and that the Hilbert space ℋ¹_κ(Ω) is a quintessential Sobolev space.

B.2. Generalized Gaussian random fields

We now turn to the problem of characterizing random fields on Ω. We restrict ourselves to GFs that are at most as irregular as white noise. The distributions of such fields are determined by the properties of expectations and covariances of integrals of functions with respect to random measures: the so-called finite dimensional distributions.

In classical theory for GFs, the following definition can be used.

 Definition 3  (GF). A random function x: Ω → ℝ on a manifold Ω is a GF if {x(u_k), k=1,…,n} are jointly Gaussian random vectors for every finite set of points {u_k ∈ Ω, k=1,…,n}. If there is a constant b ≥ 0 such that E{x(u)²} ≤ b for all u ∈ Ω, the random field has bounded second moments.

The complicating issue in dealing with the fractional SPDEs that are considered in this paper is that, for some parameter values, the solutions themselves are discontinuous everywhere, although still more regular than white noise. Thus, since the solutions do not necessarily have well-defined pointwise meaning, the above definition is not applicable, and the driving white noise itself is also not a regular random field. Inspired by Adler and Taylor (2007), we solve this by using a generalized definition based on generalized functions.

 Definition 4  (generalized function). For a given function space F(Ω), an F(Ω)-generalized function x, with an associated generating additive measure x*, is an equivalence class of objects identified through the collection of integration properties defined by 〈φ, x〉_Ω = x*(φ), for all x*-measurable functions φ ∈ F(Ω).

When x* is absolutely continuous with respect to the Hausdorff measure on Ω, x is a set of regular functions, at most differing on sets with Hausdorff measure zero. The definition allows many of the regular integration rules to be used for generalized functions, without any need to introduce heavy theoretical notational machinery, and provides a straightforward way of generalizing definition 3 to the kind of entities that we need for the subsequent analysis.

 Definition 5  (generalized GF). A generalized GF x on Ω is a random L²(Ω) generalized function such that, for every finite set of test functions {φ_i ∈ L²(Ω), i=1,…,n}, the inner products 〈φ_i, x〉_Ω, i=1,…,n, are jointly Gaussian. If there is a constant b ≥ 0 such that E{〈φ, x〉_Ω²} ≤ b〈φ, φ〉_Ω for every φ ∈ L²(Ω), the generalized field x has L²(Ω)-bounded second moments, abbreviated as L²(Ω) bounded.

Of particular importance is the fact that white noise can be defined directly as a generalized GF.

 Definition 6  (Gaussian white noise). Gaussian white noise 𝒲 on a manifold Ω is an L²(Ω)-bounded generalized GF such that, for any set of test functions {φ_i ∈ L²(Ω), i=1,…,n}, the integrals 〈φ_i, 𝒲〉_Ω, i=1,…,n, are jointly Gaussian, with expectation and covariance measures given by

E(〈φ_i, 𝒲〉_Ω) = 0,  Cov(〈φ_i, 𝒲〉_Ω, 〈φ_j, 𝒲〉_Ω) = 〈φ_i, φ_j〉_Ω.
In particular, the covariance measure of 𝒲 over two subregions A, B ⊆ Ω is equal to the area measure of their intersection, |A∩B|_Ω, so the variance measure of 𝒲 over a region is equal to the area of the region.
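As an informal illustration (our own sketch, not from the paper), the covariance-measure property can be checked on a discretization of the unit square: give each grid cell an independent Gaussian weight with variance equal to the cell area, so that the noise integral over a region is the sum of the weights of the cells it contains; independence then makes the covariance of two region integrals equal to the area of their intersection.

```python
import numpy as np

# Discretized Gaussian white noise on the unit square: each of the n*n cells
# carries an independent N(0, cell_area) weight, so the noise integral W(A)
# over a region A is the sum of the weights of the cells inside A, and
# independence gives Cov{W(A), W(B)} = |A n B|, as in definition 6.
n = 100
cell_area = 1.0 / n**2
xs = (np.arange(n) + 0.5) / n          # cell centre coordinates
X, Y = np.meshgrid(xs, xs, indexing="ij")

# Indicator masks for two overlapping rectangles A and B.
in_A = (X < 0.6) & (Y < 0.8)           # A = [0, 0.6] x [0, 0.8]
in_B = (X >= 0.2) & (Y < 0.5)          # B = [0.2, 1] x [0, 0.5]

# Cov{W(A), W(B)} = sum of the cell variances over the cells common to A and B.
cov_AB = np.sum(in_A & in_B) * cell_area
print(cov_AB)   # area of [0.2, 0.6] x [0, 0.5] = 0.2
```

Because the rectangle edges align with cell boundaries, the discrete covariance here reproduces the intersection area exactly rather than only approximately.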

We note that the popular approach to defining white noise on ℝ^d via a Brownian sheet is not applicable for general manifolds, since the notion of globally orthogonal directions is not present. The closest equivalent would be to define a set-indexed Gaussian random function W(A), such that E{W(A)} = 0 and Cov{W(A), W(B)} = |A∩B|_Ω. This definition is equivalent to that above (Adler and Taylor, 2007), and the Brownian sheet is a special case that considers only rectangular regions along the axes of ℝ^d, with one corner fixed at the origin.

B.3. Operator identities

Identities for differentiation and integration on manifolds are usually stated as requiring functions in C¹, C² or even C^∞, which is much too restrictive to be applied to generalized functions and random fields. Here, we present the two fundamental identities that are needed for the subsequent SPDE analysis: Green's first identity and a scalar product characterization of the half-Laplacian.

B.3.1. Stochastic Green's first identity

We here state a generalization of Green's first identity, showing that the identity applies to generalized fields, as opposed to only differentiable functions.

Lemma 1.  If ∇f ∈ L²(Ω) and Δx is L²(Ω) bounded, then (with probability 1)

〈f, −Δx〉_Ω = 〈∇f, ∇x〉_Ω − 〈f, ∂_n x〉_∂Ω.

If ∇x is L²(Ω) bounded and Δf ∈ L²(Ω), then (with probability 1)

〈−Δf, x〉_Ω = 〈∇f, ∇x〉_Ω − 〈∂_n f, x〉_∂Ω.

For brevity, we include only a sketch of the proof.

 Proof.  The requirements imply that each integrand can be approximated arbitrarily closely, in the L²-sense, by C^q-functions f̃ and x̃, where q in each case is sufficiently large for the regular Green's identity to hold for f̃ and x̃. Using the triangle inequality, it follows that the expectation of the squared difference between the left- and right-hand sides of the identity can be bounded by an arbitrarily small positive constant. Hence, the difference is zero in quadratic mean, and the identity holds with probability 1.

B.3.2. Half-Laplacian

In defining and solving the SPDEs considered, the half-Laplacian operator needs to be characterized in a way that permits practical calculations on general manifolds. The fractional modified Laplacian operators (κ²−Δ)^{α/2}, κ, α ≥ 0, are commonly (Samko et al. (1992), page 483) defined through the Fourier transform, as defined above:

{ℱ(κ²−Δ)^{α/2}φ}(k) = (κ² + ‖k‖²)^{α/2} (ℱφ)(k)

on ℝ^d;

(κ²−Δ)^{α/2}φ = Σ_{k=0}^∞ (κ² + λ_k)^{α/2} φ̂_k E_k

on compact Ω, where λ_k, k=0,1,2,…, are the eigenvalues of −Δ. The formal definition is mostly of theoretical interest since, in practice, the generalized Fourier basis and eigenvalues for the Laplacian are unknown. In addition, even if the functions are known, working directly in the Fourier basis is computationally expensive for general observation models, since the basis functions do not have compact support, which leads to dense covariance and precision matrices. The following lemma provides an integration identity that allows practical calculations involving the half-Laplacian.

Lemma 2.  Let φ and ψ be functions in ℋ¹_κ(Ω). Then, the Fourier-based modified half-Laplacians satisfy

〈(κ²−Δ)^{1/2}φ, (κ²−Δ)^{1/2}ψ〉_Ω = κ²〈φ, ψ〉_Ω + 〈∇φ, ∇ψ〉_Ω = 〈φ, ψ〉_{ℋ¹_κ(Ω)}

whenever either

  • (a) Ω = ℝ^d,
  • (b) Ω is closed or
  • (c) Ω is compact and 〈φ, ∂_nψ〉_∂Ω = 〈∂_nφ, ψ〉_∂Ω = 0.

For a proof, see Appendix D.2. Lemma 2 shows that, for functions ψ fulfilling the requirements, we can use the Hilbert space inner product as a definition of the half-Laplacian. This also generalizes in a natural way to random fields x with L2(Ω)-bounded ∇x, as well as to suitably well-behaved unbounded manifolds.
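As a numerical illustration of lemma 2 (a sketch of our own; the circle, the test functions and the grid size are arbitrary choices), the identity can be checked on the unit circle, a closed manifold covered by case (b), where the half-Laplacian can be applied exactly in the discrete Fourier basis e^{ikθ} with −Δ eigenvalues k²:

```python
import numpy as np

# Check <(k^2-Lap)^{1/2} phi, (k^2-Lap)^{1/2} psi> = kappa^2 <phi,psi> + <phi',psi'>
# on the circle [0, 2*pi), using the FFT to apply the half-Laplacian spectrally.
N = 64
theta = 2 * np.pi * np.arange(N) / N
kappa = 1.5
k = np.fft.fftfreq(N, d=1.0 / N)       # integer wavenumbers 0, 1, ..., -1

phi = np.cos(3 * theta) + 0.5 * np.sin(theta)
psi = np.sin(2 * theta) + 2.0 * np.cos(3 * theta)
dphi = -3 * np.sin(3 * theta) + 0.5 * np.cos(theta)    # analytic derivatives
dpsi = 2 * np.cos(2 * theta) - 6.0 * np.sin(3 * theta)

def half_laplacian(f):
    """Apply (kappa^2 - Delta)^{1/2} in the Fourier basis."""
    return np.real(np.fft.ifft(np.sqrt(kappa**2 + k**2) * np.fft.fft(f)))

h = 2 * np.pi / N                      # quadrature weight on the circle
lhs = h * np.sum(half_laplacian(phi) * half_laplacian(psi))
rhs = kappa**2 * h * np.sum(phi * psi) + h * np.sum(dphi * dpsi)
print(abs(lhs - rhs))
```

For band-limited trigonometric polynomials the spectral operations and the uniform-grid quadrature are both exact, so the two sides agree to machine precision.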

It would be tempting to eliminate the qualifiers in part (c) of lemma 2 by subtracting the average of the two boundary integrals from the relationship, and so to extend lemma 2 to a complete equivalence relationship. However, this may be problematic, since the half-Laplacian is defined for a wider class of functions than the Laplacian, and it is unclear whether such a generalization necessarily yields the same half-Laplacian as the Fourier definition for functions not of the class Δφ ∈ L²(Ω). See Ilić et al. (2008) for a partial result.

Appendix C: Hilbert space approximation

We are now ready to formulate the main results of the paper in more technical detail. The idea is to approximate the full SPDE solutions with functions in finite Hilbert spaces, showing that the approximations converge to the true solutions as the finite Hilbert space approaches the full space. In Appendix C.1, we state the convergence and stochastic FEM definitions that are needed. The main result for Matérn covariance models is stated in Appendix C.2, followed by generalizations to intrinsic and oscillating fields in Appendix C.3 and Appendix C.4. Finally, the full finite element constructions are modified to Markov models in Appendix C.5.

C.1. Weak convergence and stochastic finite element methods

We start by stating formal definitions of convergence of Hilbert spaces and of random fields in such spaces (definitions 7 and 8) as well as the definition of the finite element constructions that will be used (definition 9).

 Definition 7  (dense subspace sequences). A finite subspace ℋ_n ⊂ ℋ is spanned by a finite set of basis functions Ψ_n = {ψ_1,…,ψ_n}. We say that a sequence of subspaces {ℋ_n} is dense in ℋ if, for every f ∈ ℋ, there is a sequence {f_n}, f_n ∈ ℋ_n, such that ‖f − f_n‖_ℋ → 0.

If the subspace sequence is nested, there is a monotonically convergent sequence {f_n}, but that is not a requirement here. For given f ∈ ℋ, we can choose the projection of f onto ℋ_n, i.e. the f_n that minimizes ‖f − f_n‖_ℋ. The error f − f_n is then orthogonal to ℋ_n, and the basis co-ordinates can be determined from the system of equations 〈ψ_k, f − f_n〉_ℋ = 0, k=1,…,n.
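The projection described above can be sketched numerically (our own illustration; the uniform mesh, the hat-function basis and the target function are arbitrary choices):

```python
import numpy as np

# L2(0,1) projection of f onto piecewise linear "hat" basis functions.
# Orthogonality of the error f - f_n to the subspace gives the linear system
# C w = b with C_jk = <psi_j, psi_k> and b_k = <f, psi_k>.
def l2_projection_error(f, n):
    h = 1.0 / (n - 1)
    nodes = np.linspace(0.0, 1.0, n)
    # Exact mass matrix for hat functions on a uniform 1D mesh.
    C = (np.diag(np.full(n, 2 * h / 3))
         + np.diag(np.full(n - 1, h / 6), 1)
         + np.diag(np.full(n - 1, h / 6), -1))
    C[0, 0] = C[-1, -1] = h / 3
    # Load vector and error norm by trapezoidal quadrature on a fine grid.
    x = np.linspace(0.0, 1.0, 40 * n)
    wq = np.full(x.size, x[1] - x[0]); wq[0] = wq[-1] = wq[0] / 2
    hats = np.maximum(0.0, 1.0 - np.abs(x[None, :] - nodes[:, None]) / h)
    b = hats @ (f(x) * wq)
    w = np.linalg.solve(C, b)
    return np.sqrt(np.sum((f(x) - hats.T @ w) ** 2 * wq))

f = lambda x: np.sin(2 * np.pi * x)
e_coarse, e_fine = l2_projection_error(f, 10), l2_projection_error(f, 40)
print(e_coarse, e_fine)   # the error decreases roughly as h^2
```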

 Definition 8  (weak convergence). A sequence of L²(Ω)-bounded generalized GFs {x_n} is said to converge weakly to an L²(Ω)-bounded generalized GF x if, for all f, g ∈ L²(Ω),

E(〈f, x_n〉_Ω) → E(〈f, x〉_Ω),  Cov(〈f, x_n〉_Ω, 〈g, x_n〉_Ω) → Cov(〈f, x〉_Ω, 〈g, x〉_Ω)
as n→∞. We denote such convergence by


 Definition 9  (finite element approximations). Let ℒ be a second-order elliptic differential operator, and let ℰ be a generalized GF on Ω. Let x_n = Σ_k ψ_k w_k denote approximate weak solutions to the SPDE ℒx = ℰ on Ω.

  • (a) The weak Galerkin solutions are given by Gaussian w={w_1,…,w_n} such that E(〈f, ℒx_n〉_Ω) = E(〈f, ℰ〉_Ω) and Cov(〈f, ℒx_n〉_Ω, 〈g, ℒx_n〉_Ω) = Cov(〈f, ℰ〉_Ω, 〈g, ℰ〉_Ω), for every pair of test functions f, g ∈ ℋ_n.
  • (b) The weak least squares solutions are given by Gaussian w={w_1,…,w_n} such that E(〈ℒf, ℒx_n〉_Ω) = E(〈ℒf, ℰ〉_Ω) and Cov(〈ℒf, ℒx_n〉_Ω, 〈ℒg, ℒx_n〉_Ω) = Cov(〈ℒf, ℰ〉_Ω, 〈ℒg, ℰ〉_Ω), for every pair of test functions f, g ∈ ℋ_n.

C.2. Basic Matérn-like cases

In the remainder of the appendices, we let 𝒲 denote Gaussian white noise on Ω. In the classic Matérn case, the SPDE (κ²−Δ)^{α/2} x = 𝒲 can, for integer α-values, be unravelled into an iterative formulation


For integers α=1,2,3,…, y is a solution to the original SPDE. To avoid solutions in the null space of (κ²−Δ), we shall require Neumann boundaries, i.e. the solutions must have zero normal derivatives at the boundary of Ω. In the Hilbert space approximation, this can be achieved by requiring that all basis functions have zero normal derivatives.

We now formulate the three main theorems of the paper, which show what the precision matrices should look like for given basis functions (theorem 2), that the finite Hilbert representations converge to the true distributions for α=1 and α=2 and dense Hilbert space sequences (theorem 3) and, finally, that the iterative constructions for α ≥ 3 also converge (theorem 4). A sequence {ℋ_n} of piecewise linear Hilbert spaces defined on non-degenerate triangulations of Ω is a dense sequence in ℋ¹_κ(Ω) if the maximal edge length decreases to zero. Thus, the theorems are applicable for piecewise linear basis functions, showing weak convergence of the field itself and of its derivatives up to order min(2, α).

Theorem 2  (finite element precisions). Define matrices C, G and K through

C_{i,j} = 〈ψ_i, ψ_j〉_Ω,  G_{i,j} = 〈∇ψ_i, ∇ψ_j〉_Ω,  K = κ²C + G,

and denote the distribution for w by N(0, Q⁻¹), where the precision matrix Q is the inverse of the covariance matrix, and let x_n = Σ_k ψ_k w_k be a weak ℋ_n approximation to (κ²−Δ)^{α/2} x(u) = ℰ(u), u ∈ Ω, with Neumann boundaries, and ∂_nψ_k = 0 on ∂Ω.

  • (a) When α=2 and ℰ = 𝒲 is Gaussian white noise, the weak Galerkin solution is obtained for Q = KᵀC⁻¹K.
  • (b) When α=1 and ℰ = 𝒲, the weak least squares solution is obtained for Q = K.
  • (c) When α=2 and ℰ_n = Σ_k ψ_k ε_k is an L²(Ω)-bounded GF in ℋ_n with mean 0 and weight precision Q_ε, the weak Galerkin solution is obtained for Q = KᵀC⁻¹Q_εC⁻¹K.
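A minimal numerical sketch of case (a) in d=1 (our own illustration, with our own choices of κ, domain and mesh): on a fine interval mesh with piecewise linear basis functions, the interior marginal variances implied by Q = KᵀC⁻¹K should approach the stationary Matérn variance, which for α=2, d=1 (ν=3/2) equals 1/(4κ³).

```python
import numpy as np

# Piecewise linear basis on [0, L] with Neumann boundaries, alpha = 2:
# C and G are the standard 1D mass and stiffness matrices, K = kappa^2 C + G,
# and Q = K C^{-1} K as in theorem 2(a).  Interior entries of diag(Q^{-1})
# approximate the stationary Matern (nu = 3/2) variance 1/(4 kappa^3).
kappa, L, n = 2.0, 10.0, 201
h = L / (n - 1)

C = (np.diag(np.full(n, 2 * h / 3))
     + np.diag(np.full(n - 1, h / 6), 1) + np.diag(np.full(n - 1, h / 6), -1))
C[0, 0] = C[-1, -1] = h / 3
G = (np.diag(np.full(n, 2 / h))
     + np.diag(np.full(n - 1, -1 / h), 1) + np.diag(np.full(n - 1, -1 / h), -1))
G[0, 0] = G[-1, -1] = 1 / h

K = kappa**2 * C + G
Q = K @ np.linalg.solve(C, K)          # Q = K C^{-1} K (dense here for clarity)
var_mid = np.diag(np.linalg.inv(Q))[n // 2]
print(var_mid, 1 / (4 * kappa**3))
```

The boundary nodes have inflated variances (the Neumann folding effect), but at the centre of the interval the two printed values agree to within discretization error.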

Theorem 3  (convergence). Let x be a weak solution to the SPDE (κ²−Δ)^{α/2} x(u) = 𝒲(u), u ∈ Ω, with Neumann boundaries on a manifold Ω, and let x_n be a weak ℋ_n approximation, where 𝒲 is Gaussian white noise. Then,


if the sequence {ℋ_n} is dense in ℋ¹_κ(Ω), and either

  • (a) α=2 and x_n is the Galerkin solution, or
  • (b) α=1 and x_n is the least squares solution.

Theorem 4  (iterative convergence). Let y be a weak solution to the linear SPDE ℒy = ℰ on a manifold Ω, for some L²(Ω)-bounded random field ℰ, and let x be a weak solution to the SPDE (κ²−Δ)x = y. Further, let y_n be a weak ℋ_n approximation to y such that


and let x_n be the weak Galerkin solution in ℋ_n to the SPDE (κ²−Δ)x_n = y_n on Ω. Then,


For proofs of the three theorems, see Appendix D.3.

C.3. Intrinsic cases

When κ=0, the Hilbert space from definition 2 is a space of equivalence classes of functions, corresponding to SPDE solutions where arbitrary functions in the null space of (−Δ)^{α/2} can be added. Such solution fields are known as intrinsic fields and have well-defined properties. With piecewise linear basis functions, the intrinsicness can be exactly reproduced for α=1 for all manifolds, and partially for α=2 on subsets of ℝ^d, by relaxing the boundary constraints to free boundaries. For larger α or more general manifolds, the intrinsicness will be only approximately represented. How to construct models with more fine-tuned control of the null space is a subject for further research.

To approximate intrinsic fields with α ≥ 2 and free boundaries, the matrix K in theorem 2 should be replaced by G − B (owing to Green's identity), where the elements of the (possibly asymmetric) boundary integral matrix B are given by B_{i,j} = 〈ψ_i, ∂_nψ_j〉_∂Ω. The formulations and proofs of theorem 3 and theorem 4 remain unchanged, but with the convergence defined only with respect to test functions f and g orthogonal to the null space of the linear SPDE operator.

The notion of non-null-space convergence allows us to formulate a simple proof of the result from Besag and Mondal (2005) which says that a first-order intrinsic conditional auto-regressive model on the infinite lattice ℤ² converges to the de Wij process, an intrinsic generalized Gaussian random field. As can be seen in Appendix A.1, for α=1 and κ=0, the Q-matrix (equal to G) for a triangulated regular grid matches the ordinary intrinsic first-order conditional auto-regressive model. The null space of the half-Laplacian consists of the constant functions. Choose non-trivial test functions f and g that integrate to 0 and apply theorem 3 and definition 8. This shows that the regular conditional auto-regressive model, seen as a Hilbert space representation with linear basis functions, converges to the de Wij process, which is the special SPDE case α=1, κ=0, on ℝ².
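The claimed match between the triangulated-grid G-matrix and the first-order intrinsic conditional auto-regressive precision can be verified directly (our own sketch; the grid size and the split-square triangulation are our choices). Assembling the piecewise linear stiffness matrix G on a unit-spaced grid whose squares are split along a diagonal reproduces the classical 5-point stencil: 4 at the node, −1 at each of the four nearest neighbours, and 0 at the diagonal neighbours.

```python
import numpy as np

# Assemble G_ij = <grad psi_i, grad psi_j> for linear triangles on a regular
# grid, each unit square split along the (i,j)-(i+1,j+1) diagonal.
m = 5                                   # (m+1) x (m+1) grid of nodes, spacing 1
nodes = [(i, j) for j in range(m + 1) for i in range(m + 1)]
idx = {p: k for k, p in enumerate(nodes)}
G = np.zeros((len(nodes), len(nodes)))

def local_stiffness(p0, p1, p2):
    """Element stiffness matrix for a linear (P1) triangle."""
    x = np.array([p0, p1, p2], dtype=float)
    area = 0.5 * abs(np.linalg.det(np.c_[x[1] - x[0], x[2] - x[0]]))
    # Each basis gradient is the rotated opposite edge, up to a common sign
    # that cancels in the outer product below.
    grads = np.array([x[2] - x[1], x[0] - x[2], x[1] - x[0]])[:, ::-1]
    grads[:, 1] *= -1
    return (grads @ grads.T) / (4 * area)

for i in range(m):
    for j in range(m):
        sq = [(i, j), (i + 1, j), (i + 1, j + 1), (i, j + 1)]
        for tri in ([sq[0], sq[1], sq[2]], [sq[0], sq[2], sq[3]]):
            S = local_stiffness(*tri)
            for a in range(3):
                for b in range(3):
                    G[idx[tri[a]], idx[tri[b]]] += S[a, b]

centre = idx[(2, 2)]
row = {p: G[centre, idx[p]] for p in nodes if abs(G[centre, idx[p]]) > 1e-12}
print(row)   # 4 at the centre, -1 at the four nearest neighbours
```

The couplings along the triangulation diagonals cancel exactly for this mesh, which is why the FEM precision coincides with the lattice conditional auto-regressive precision rather than merely approximating it.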

C.4. Oscillating and non-isotropic cases

To construct the Hilbert space approximation for the oscillating model that was introduced in Section 3.3, as well as non-isotropic versions, we introduce a coupled system of SPDEs for α=2,


which is equivalent to the complex SPDE


The model in Section 3.3 corresponds to h₁ = κ² cos(πθ), h₂ = κ² sin(πθ), H₁ = I and H₂ = 0.

To solve the coupled SPDE system (33), we take a set {ψ_k, k=1,…,n} of basis functions for ℋ¹_κ(Ω) and construct a basis for the solution space for (x₁, x₂)ᵀ as


The definitions of the G- and K-matrices are modified as follows:


Using the same construction as in the regular case, the precision for the solutions is given by


where Q = Q(h₁, H₁) + Q(h₂, H₂), and Q(·,·) is the precision that is generated for the regular iterated model with the given parameters. Surprisingly, regardless of the choice of parameters, the solution components are independent.

C.5. Markov approximation

By choosing piecewise linear basis functions, the practical calculation of the matrix elements in the construction of the precision is straightforward, and the local support makes the basic matrices sparse. Since the basis functions are not orthogonal, the C-matrix will be non-diagonal, and the FEM construction therefore does not directly yield Markov fields for α ≥ 2, since C⁻¹ is not sparse. However, following standard practice in FEMs, C can be approximated with a diagonal matrix as follows. Let C̃ be a diagonal matrix, with C̃_{ii} = Σ_j C_{ij} = 〈ψ_i, 1〉_Ω, and note that this preserves the interpretation of the matrix as an integration matrix. Substituting C⁻¹ with C̃⁻¹ yields a Markov approximation to the FEM solution.
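The effect of the mass-lumping step can be seen in a small 1D example (our own sketch, reusing the standard 1D mass and stiffness matrices; the parameter values are arbitrary): with the full C, the precision Q = KC⁻¹K is dense, whereas with the diagonal row-sum matrix C̃ it is pentadiagonal, i.e. a genuine GMRF precision.

```python
import numpy as np

# 1D hat-function matrices; K = kappa^2 C + G is tridiagonal.
n, h, kappa = 50, 0.1, 2.0
C = (np.diag(np.full(n, 2 * h / 3))
     + np.diag(np.full(n - 1, h / 6), 1) + np.diag(np.full(n - 1, h / 6), -1))
C[0, 0] = C[-1, -1] = h / 3
G = (np.diag(np.full(n, 2 / h))
     + np.diag(np.full(n - 1, -1 / h), 1) + np.diag(np.full(n - 1, -1 / h), -1))
G[0, 0] = G[-1, -1] = 1 / h
K = kappa**2 * C + G

C_lump = np.diag(C.sum(axis=1))               # C~_ii = row sums of C
Q_full = K @ np.linalg.solve(C, K)            # dense: C^{-1} has full support
Q_markov = K @ np.linalg.solve(C_lump, K)     # pentadiagonal: Markov structure

def bandwidth(Q, tol=1e-10):
    i, j = np.nonzero(np.abs(Q) > tol)
    return int(np.max(np.abs(i - j)))

print(bandwidth(Q_markov))   # 2: each node interacts with 2 neighbours per side
```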

The convergence rate for the Markov approximation is the same as for the full FEM model, which can be shown by adapting the details of the proofs of convergence. Let f and g be test functions in ℋ¹_κ(Ω), and let f_n and g_n be their projections onto ℋ_n, with basis weights w_f and w_g. Taking the difference between the covariances for the Markov solution (x̃_n) and the full FEM solution (x_n) for α=2 yields the error


Under a regularity condition on the triangulation, it follows from lemma 1 of Chen and Thomée (1985) that the covariance error is bounded by ch², where c is some constant and h is the diameter of the largest circle that can be inscribed in a triangle of the triangulation. This shows that the convergence rate from expression (11) is not affected by the Markov approximation. In practice, the C-matrix in K should also be replaced by C̃. This improves the approximation when either h or κ is large, with numerical comparisons showing a covariance error reduction of as much as a factor 3. See Bolin and Lindgren (2009) for a comparison of the resulting kriging errors for various methods, showing negligible differences between the exact FEM representation and the Markov approximation.

Appendix D: Proofs

D.1. Folded covariance: proof of theorem 1

Writing the covariance of the SPDE solutions on the interval [0, L] in terms of the spectral representation gives an infinite series,

r(u, v) = Σ_{k=0}^∞ λ_k cos(uπk/L) cos(vπk/L),

where λ₀ = (κ^{2α}L)⁻¹ and λ_k = 2L⁻¹{κ² + (πk/L)²}^{−α}, k=1,2,…, are the variances of the weights for the basis functions cos(uπk/L), k=0,1,2,….

We use the spectral representation of the Matérn covariance in the statement of theorem 1 and show that the resulting expression is equal to the spectral representation of the covariance for the solutions to the given SPDE. The Matérn covariance on ℝ (with variance given by the SPDE) can be written as


Thus, with inline image denoting the folded covariance in the statement of theorem 1,


Rewriting the cosines via Euler's formulae, we obtain


where we used the Dirac measure representation


Finally, combining the results yields


which is precisely the expression sought in equation (35).
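The folded-covariance identity of theorem 1 can also be checked numerically (our own sketch, for α=1, where the Matérn covariance is the exponential r(d) = e^{−κ|d|}/(2κ); the parameter values and truncation points are arbitrary choices):

```python
import numpy as np

# Compare the Neumann cosine-series covariance on [0, L], with weight
# variances lambda_0 = 1/(kappa^2 L), lambda_k = (2/L)/(kappa^2 + (pi k/L)^2),
# to the folded exponential covariance sum_m r(u-v+2mL) + r(u+v+2mL).
kappa, L, u, v = 2.0, 1.0, 0.3, 0.7

# Spectral series (truncated; the tail decays like 1/k^2).
k = np.arange(1, 20000)
lam = (2 / L) / (kappa**2 + (np.pi * k / L) ** 2)
series = 1 / (kappa**2 * L) + np.sum(
    lam * np.cos(u * np.pi * k / L) * np.cos(v * np.pi * k / L))

# Folded exponential (Matern nu = 1/2) covariance.
r = lambda d: np.exp(-kappa * np.abs(d)) / (2 * kappa)
m = np.arange(-50, 51)
folded = np.sum(r(u - v + 2 * m * L) + r(u + v + 2 * m * L))

print(series, folded)   # the two expressions agree
```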

D.2. Modified half-Laplacian equivalence: proof of lemma 2

For brevity, we present only the proof for compact manifolds, as the proof for ℝ^d follows the same principle but without the boundary complications. The main difference is that the Fourier representation is discrete for compact manifolds and continuous for ℝ^d.

Let λ_k ≥ 0, k=0,1,2,…, be the eigenvalue corresponding to eigenfunction E_k of −Δ (definition 1). Then, with φ̂_k = 〈E_k, φ〉_Ω and ψ̂_k = 〈E_k, ψ〉_Ω, the modified half-Laplacian from Appendix B.3.2 is defined through (κ²−Δ)^{1/2}φ = Σ_k (κ² + λ_k)^{1/2} φ̂_k E_k, and we obtain


and, since inline image, we can change the order of integration and summation,


since the eigenfunctions E_k and E_{k′} are orthonormal.

Now, starting from the Hilbert space inner product,


and, since inline image and E_k, E_{k′} ∈ L²(Ω), we can change the order of differentiation and summation,


and, since in addition ∇E_k, ∇E_{k′} ∈ L²(Ω), we can change the order of summation and integration,


Further, Green's identity for 〈∇E_k, ∇E_{k′}〉_Ω yields


Since ∇φ,∇ψ ∈ L2(Ω) we can change the order of summation, integration and differentiation for the boundary integrals,


By the boundary requirements in lemma 2, whenever Green's identity holds, the boundary integral vanishes, either because the boundary is empty (when the manifold is closed) or because the integrand is 0; collecting all the terms, we obtain


and the proof is complete.

D.3. Hilbert space convergence

D.3.1. Proof of theorem 2 (finite element precisions)

The proofs of theorem 2 are straightforward applications of the definitions. Let w_f and w_g be the Hilbert space co-ordinates of two test functions f_n, g_n ∈ ℋ_n, and let inline image.

For case (a), α=2 and inline image, so


owing to Green's identity, and


This covariance is equal to


for every pair of test functions f_n, g_n when Q = Cov(w)⁻¹ = KᵀC⁻¹K.

For case (b), α=1 and inline image. Using the same technique as in (a), but with lemma 2 instead of Green's identity, inline image and


so Q = KᵀK⁻¹K = K, noting that K is a symmetric matrix since both C and G are symmetric.

Finally, for case (c), α=2 and ℰ_n is a GF in ℋ_n with weight precision Q_ε. Using the same technique as for case (a),


and the finite basis representation of the noise inline image gives


Requiring equality for all pairs of test functions yields Q = KᵀC⁻¹Q_εC⁻¹K. Here, keeping the transposes allows the proof to apply also to the intrinsic free-boundary cases.

D.3.2. Proof of theorem 3 (convergence)

First, we show that expression (28) follows from expression (29). Let inline image, let f and g be functions in inline image and let inline image be the solution to the PDE


and correspondingly for g. Then inline image and inline image are in inline image and further fulfil the requirements of lemma 1 and lemma 2. Therefore,




where the last equality holds when α=2, since inline image is L2(Ω) bounded. The convergence of xn to x follows from expression (29). In the Galerkin case (a), we have


and similarly for the least squares case (b).

For expression (29), let f_n = Σ_k ψ_k w_{f,k} and g_n = Σ_k ψ_k w_{g,k} be the orthogonal projections of f and g onto ℋ_n. In case (a), we then have




as n→∞. Similarly in case (b), for any inline image fulfilling the requirements of lemma 2,




as n→∞.

D.3.3. Proof of theorem 4 (iterative convergence)

First, we show that expression (31) follows from expression (32). Let inline image and inline image be defined as in the proof of theorem 3. Then, since inline image,




and the convergence of xn to x follows from expression (32). For expression (32), as in the proof of theorem 3, inline image, and


as n→∞, owing to requirement (30).

Discussion on the paper by Lindgren, Rue and Lindström

John T. Kent (University of Leeds)

This paper uses finite element methods to give a principled construction of Matérn-type Gaussian models in a variety of spatial settings. A key property of such models is that they have a sparse inverse covariance or precision matrix. The paper gives a comprehensive treatment of these models and it seems destined to become a landmark paper in spatial analysis. In many ways the paper is a natural sequel to Besag's (1974) paper that was read to the Society, and it forms a fitting tribute to his memory.

In earlier work often a great distinction was made between conditional auto-regression (CAR) and simultaneous auto-regression (SAR) models. In terms of a zero-mean Gaussian process {x_ij} on the integer lattice in ℝ² and the notation A(x)_{ij} = (x_{i−1,j} + x_{i+1,j} + x_{i,j−1} + x_{i,j+1})/a for the shrunken first-order neighbourhood average (a>4), the one-parameter versions are


(CAR) and


(SAR). In the first, the conditional distribution of x_ij, given the values of the process at all the remaining sites, depends only on the nearest neighbours; in the second, a filtered version of the x-process equals (discrete) white noise. The second process is a discrete approximation to the stochastic partial differential equation (2) based on the differential operator D_{κ,α} = (κ²−Δ)^{α/2} with α=2.

The difference between the two processes can be seen most clearly in the spectral domain. Setting inline image and inline image for ω ∈ (−π,π)², the two spectral densities are


By taking suitable convolutions, both models can be extended to higher order neighbourhoods as described in the paper and, letting the even integer α ≥ 2 denote the order of neighbourhood, we have


where the dimension is d=2 here and M(ν) stands for the Matérn model of index ν. In higher dimensions, I wonder to what extent we need to restrict α to be sufficiently large that ν>0, or at least ν ≥ 0. One of the clever observations in the paper is to note that this SAR and CAR identification can be extended to odd integers α ≥ 1 where, for the approximate weak solution of the stochastic partial differential equation, the finite element ideas from Section 2.3 are used.

The role of the null space of a differential operator is not clear to me. Consider the stationary process on all of ℝ² generated by D_{κ,α} with κ>0 and α=2. As noted in the paper, this differential operator has a null space which includes certain exponential functions. However, the random field is well defined even when specified to have mean 0. Further, if D_{κ,α} is used to motivate a process on a finite domain using the finite element construction, it is not clear to me what happens to the null space.

The situation is more delicate for intrinsic processes (Kent and Mardia, 1994). Consider the self-similar process on ℝ^d generated by D_{0,α}, which is intrinsic provided that the real parameter α is sufficiently large that ν = α − d/2 > 0. The intrinsic order is p=[ν], where [·] denotes the integer part. An intrinsic process is defined only up to the null space of polynomials of degree p. When α is an even integer, the differential operator D_{0,α} has a null space given by the polynomials of degree r = α−1, and this result extends to all integers α ≥ 1. For integer α in dimensions d=1,2, it follows that p=r. However, if d ≥ 3, then p<r. Thus, in the intrinsic setting, the question about the role of the null space of the differential operator also arises.

Recall that a differential operator D can be used to define a smoothing spline with penalty term ∫(Dϕ)² for sufficiently smooth functions ϕ on ℝ^d. For example, D_{0,2} leads to cubic and thin plate splines in dimensions d=1,2. As the paper continually emphasizes, this construction is computationally intensive for large n when d ≥ 2. Hence it is natural to ask how successfully the finite element ideas of this paper can be used to yield a computationally efficient approximation.

The paper mentions briefly how deformations can be used to introduce non-stationarity in a real-valued Gaussian process. Gaussian models can also be used to construct deformations of Euclidean space. Bookstein (1989) suggested the use of thin plate splines, and Michael Miller and his colleagues (e.g. Christensen et al. (1996)) developed more sophisticated models using non-linear partial differential equations motivated by continuum mechanics. My former doctoral student, Godwin (2000), constructed deformations which were constrained by elastic penalties and discretized by using finite element methods. In this case a pair of interacting partial differential equations for the horizontal and vertical displacement is obtained, depending on two parameters called the Lamé coefficients; changing the ratio between them can have a dramatic effect on the fitted deformation.

In summary, I found this to be an extremely stimulating paper and it gives me great pleasure to propose the vote of thanks.

Peter J. Diggle (Lancaster University)

This paper is an important contribution to an important topic. Latent Gaussian fields are widely used as components of geostatistical models (Diggle et al., 1998) and of point process models (Møller et al., 1998). I believe that they should also be more widely used for the analysis of spatially discrete data. Consider, for example, data consisting of counts Yi associated with each of n subregions Ai that partition a region of interest, A. A standard class of models for data of this kind is that


where the Xi form a Markov random field in which


where inline image is the average of the values of Xj associated with the ni subregions Aj that are considered to be neighbours of Ai. The usual approach to defining the neighbours is through contiguities: Ai and Aj are neighbours if they share a common boundary. This is appealing in a regular geography; less so when the Ai vary substantially in size and shape. An alternative is to assume that


where now X(·) is a spatially continuous Gaussian field.

A pragmatic reason for preferring class (36) over class (37) has long been that the associated computations are much less burdensome. The remarkable computational efficiency that the methods in this paper achieve should instead allow a choice between classes (36) and (37) to be made on the basis of their merits as models.

Another important aspect of the paper is its delivery of appealing non-stationary constructions through the paper's equation (12). In this respect, it is a pity that the restriction to integer α excludes the exponential correlation function (ν=0.5) in the two-dimensional case. The popularity of the Matérn family of correlation functions stems from the fact that the integer part of ν corresponds to the mean-square differentiability of X(·). But ν is difficult to estimate, and a widely used strategy is to choose its value among a small number of qualitatively different candidates, say ν=0.5, 1.5, 2.5, corresponding to continuous, differentiable and twice-differentiable fields X(·) respectively.

The paper also gives a new slant on low rank approximations by representing the field X(·) as


Here again, computational efficiency is crucial as the number of terms in the summation needs to be large when the range of the spatial correlation is small relative to the dimensions of the region of interest.
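The representation referred to here is the paper's expansion x(s) = Σ_k ψ_k(s) w_k over basis functions ψ_k with Gaussian weights w_k. With the piecewise linear ("hat") basis used in the paper, evaluating the field amounts to linear interpolation of the weights; a minimal one-dimensional sketch (the node values are illustrative stand-ins, not a simulated GMRF):

```python
import numpy as np

nodes = np.linspace(0.0, 1.0, 11)          # mesh vertices s_k
weights = np.sin(2.0 * np.pi * nodes)      # stand-in values for the GMRF weights w_k

def field(s):
    """Evaluate x(s) = sum_k psi_k(s) w_k; for piecewise linear hat
    functions psi_k this is exactly linear interpolation of the weights."""
    return np.interp(s, nodes, weights)
```

At a mesh node the field equals that node's weight, and between nodes it is a convex combination of the two neighbouring weights; the number of nodes, and hence the length of the summation, must grow as the correlation range shrinks relative to the region.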

The paper uses integrated nested Laplace approximations as the basis for inference, building on Rue et al. (2009). For the applications that are considered in tonight's paper, the focus on marginal posteriors is a little too restrictive. If, as is often the case in applications, the focus of scientific interest is on predictive inference for one or more non-linear functionals of the unobserved realization of X(·), we need a reliable approximation to the joint predictive distribution of the whole field X(·), which is in practice approximated by its joint distribution on a fine grid spanning the region of interest.

The ability to fit models of this kind without resorting to Markov chain Monte Carlo methods is very welcome. My strong impression is that, for problems of this degree of complexity, the tuning and empirical assessment of convergence of Markov chain Monte Carlo algorithms remains something of a black art. However, is tuning still necessary for the methods that are proposed in the paper to deliver accurate inferences, and if so how delicate is this tuning?

The acute myeloid leukaemia (AML) data that are analysed in the paper consist of the residential locations and (censored) survival times of all recorded cases in an area of north-west England. The authors’ analysis follows earlier published analyses in treating these as geostatistical data, implicitly assuming that the locations have been sampled independently of the survival process. But this may not be so—the data are a realization of a marked point process with carrier space inline image, where inline image is the set of all locations of people resident in the study region who are at risk of contracting AML. This does not invalidate the authors’ analysis, which addresses the spatial variation in survival prognosis conditional on contraction of AML. But if the wider objective is to identify spatially varying factors that are involved in the aetiology of AML it potentially tells only half of the story, as there may be unrecognized spatially varying factors that affect both disease risk and survival prognosis; for a discussion of some of the methodological issues that are involved, see Diggle et al. (2010). For the AML data, although there is clear evidence of a marginal association between a simple circle counting estimate of the local density of cases and the hazard for survival (p<0.001 in a Cox proportional hazards analysis), this is accounted for by the authors’ adjustments for sex, white blood cell count and deprivation (p=0.349). However, a thorough joint analysis of risk and survival prognosis should also take into account the spatial variation in the local density of the population at risk.

It is with great pleasure that I second the vote of thanks for what will be, I am sure, a very influential paper.

The vote of thanks was passed by acclamation.

J. B. Illian (St Andrews University) and D. P. Simpson (Norwegian University of Science and Technology, Trondheim) Explicitly linking Gaussian fields and Gaussian Markov random fields—relevance for point process modelling

We congratulate the authors for this inspiring paper which we are sure will have a strong influence on spatial and spatiotemporal modelling for years to come. The stochastic partial differential equation approach provides an alternative representation for a large class of non-stationary Gaussian random-field models without needing explicitly to derive a covariance function. A piecewise linear Gaussian Markov random-field approximation is constructed that globally approximates the true random field up to a given resolution. This is a particularly interesting feature in the context of spatial point process modelling.

A log-Gaussian Cox process is a spatial point process that models the log-intensity field as a Gaussian random field defined continuously over the whole observation window. A common model fitting approach is to place a fine lattice over the observation window (Møller and Waagepetersen, 2007) and to count the number of points that are present in each grid box (Rue et al., 2009; Illian and Rue, 2010; Illian et al., 2011). This count value has a Poisson distribution and the latent spatial structure is modelled through a Gaussian Markov random field. Using this approach, the computational lattice has two roles: to estimate the latent Gaussian field, and to approximate the position of the points through a binning procedure. The first of these aims is entirely natural, whereas the second is an artefact of the approximation and leads to finer lattices than are really necessary. This is wasteful. As the stochastic partial differential equation approach constructs a continuously indexed random field, it is no longer necessary to bin the data over the lattice. It is then possible to construct a numerical approximation to the point process likelihood and to perform inference as usual. In particular, it is possible to perform extremely fast approximate inference for realistically complicated marked point process models by using the integrated nested Laplace approximations framework of Rue et al. (2009).
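The unbinned likelihood alluded to here is the standard log-Gaussian Cox process log-likelihood, log L = Σ_i log λ(s_i) − ∫_W λ(s) ds, with the integral replaced by a quadrature rule on the mesh rather than by binning. A minimal sketch (the quadrature weights and node values are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def lgcp_loglik(log_lam_points, log_lam_nodes, quad_weights):
    """Approximate log-likelihood of a log-Gaussian Cox process:
    the sum of the log-intensity at the observed points minus a
    quadrature approximation of the integrated intensity over the
    observation window."""
    return np.sum(log_lam_points) - np.sum(quad_weights * np.exp(log_lam_nodes))
```

For example, a constant log-intensity of 0 (so λ ≡ 1) on a unit-area window with quadrature weights summing to 1 and two observed points gives log L = 0 − 1 = −1, matching the exact Poisson-process value.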

This approach begets a whole host of modelling questions. Consider the point pattern that was discussed in Illian and Hendrichsen (2010) and in Illian et al. (2011) consisting of the locations of muskoxen herds in Zackenberg valley in eastern Greenland; see Fig. 10 for the location of the study area within Greenland and a map of the area. The observation window has a complicated boundary structure; it contains hard boundaries (the sea to the south), permeable boundaries (a river that the muskoxen may, reluctantly, cross) and artificial boundaries (the end of the observation window in the north). The stochastic partial differential equation approach allows us to incorporate this important information into our models. This is impossible with standard covariance-based models.

Figure 10.

 Map indicating the structure of the Zackenberg valley and its boundaries

We thank Mads Forchhammer from Greenland Ecosystem Monitoring, and Ditte Hendrichsen for introducing us to the Zackenberg muskoxen data as an example of different boundary characteristics within a study site.

Tilmann Gneiting and Michael Scheuerer (Universität Heidelberg)

We congratulate Lindgren, Rue and Lindström on an exceptionally rich and original paper that builds bridges between statistics, probability, approximation theory, numerical analysis and applied fields, and opens up a wealth of new perspectives.

We share the authors’ excitement about the ease with which the stochastic partial differential equation approach allows for the modelling of non-stationarity, both on Euclidean spaces and on manifolds. As the authors note, an appealing interpretation of the non-stationary model (12) is that of a locally stationary Matérn field, although the actual form of its correlation function is unknown. Would members of the family of locally stationary Matérn correlation functions developed by Stein (2005) and Anderes and Stein (2011) be candidates?

We second the authors’ call for ‘more physics-based spatial modelling’ (Section 5) with enthusiasm. In this context, Balgovind et al. (1983) derived a spatial statistical model for the errors in numerical weather prediction schemes from the physics of large-scale atmospheric flow. Their arguments lead to the stochastic dynamic differential equation (2.7) for the geopotential error field, which is of form (12) in this paper with α=2, where the κ(u) term varies smoothly with geographic latitude. This defines a spatial model on the sphere and so provides an applied instance where the manifold setting is essential, similarly to the global temperature example in the paper. On more general types of manifolds, approaches developed by approximation theorists may prove related and useful; see, for example, Narcowich (1995).

Perhaps the strongest limitation of the authors’ ingenious approach is the restriction to a small set of feasible values for the Matérn smoothness parameter (Section 2.3, remark (e)). In view of well-known sample path properties of Matérn fields (see Guttorp and Gneiting (2006), and references therein), this creates what could be called a ‘roughness gap’, particularly in inline image, where smoothness parameters ν=1,2,… allow for Gaussian fields with differentiable sample paths only. What value of ν was used in the global temperature example, where the spatial domain is the sphere, and what are the implications?

In a way, the restriction just mentioned is natural, as any probabilistically principled approximation of Gaussian fields by discretely indexed Gaussian Markov random fields can be expected to yield Markov models in the continuum limit, which is indeed what happens, leading to processes with reciprocally polynomial spectral densities (Section 2.3, remark (f)) and the (intrinsic, generalized) de Wijs field (Appendix C.3).

R. Furrer and E. Furrer (University of Zurich) and D. Nychka (National Center for Atmospheric Research, Boulder)

The authors are to be congratulated for this timely paper describing a very useful and practical methodological approach and computational procedure. There are many statisticians who are still working with Gaussian random fields simply because it is ‘virtually’ trivial to embed non-stationarity in the covariance function. With the stochastic partial differential equation approach, non-stationary modelling using Gaussian Markov random fields has become straightforward as well, and this will take much wind out of the so-called ‘big n’ problem research.

We would like to point out a link between the presented work and recent developments concerning the asymptotics of the kriging predictor. It is well known that spline smoothing and kriging prediction are related, in the sense that kriging prediction can be interpreted, similarly to smoothing splines, as a penalized minimization problem where the roughness penalty is determined by the covariance function of the underlying spatial process. In both settings the basic idea lies in representing the estimator or predictor as a weighted average of the observations by using a weight function which, in contrast with kernel methods, is not known in closed form. On the basis of Nychka (1995), we have developed an approach to approximating this weight function by using its reproducing kernel properties. For the kriging predictor in a stationary setting this so-called equivalent kernel is shown to have a simple form in terms of the Fourier transform of the covariance function of the process. The equivalent kernel can also be characterized as the Green function of a pseudodifferential operator, which is closely related to the operator in the paper.

In our research we aim to use the equivalent kernel to analyse the asymptotic behaviour of the kriging predictor in terms of bias and variance (Furrer and Nychka, 2007; Furrer, 2008). For this we need to show that it satisfies the so-called exponential envelope condition, which is accomplished for fractional smoothing parameters ν by using complex analysis (Furrer et al., 2011). It is our hope that the use of reproducing kernel methodology and of techniques from complex analysis, especially in the case of fractional values of α, could also be beneficial to the presented work.

On a second note, we were surprised to see that the range parameter of the fitted climate field was smallest around the equator and increased towards both poles. Although the observational data cannot be directly compared with general circulation model output, the latter usually exhibits the inverse behaviour. This is partially due to the larger proportion of ocean in the mid-latitudes compared with the high latitudes of the northern hemisphere, and to the sea ice components around the pole.

Paul Fearnhead (Lancaster University)

I congratulate the authors on their principled approach and elegant (yet easily implementable) solution to the important computational bottleneck of inverting covariance matrices that arises in spatial statistics. I have two comments, one relating to experience of constructing their Gaussian Markov random-field (GMRF) approximations in one dimension, and the other to the restrictions on the covariance models for which GMRF approximations can be calculated.

To see how easy it is to construct the GMRF approximation, and how accurate it is, I ran some experiments for approximating one-dimensional Gaussian processes. First, the approximation was relatively simple to implement and computationally cheap to construct. (I imagine that the main difficulty in higher dimensions will come from needing to construct the triangulation.) In terms of accuracy, results for inline image and three different values of ρ=√(8ν)/κ are shown in Fig. 11.

Figure 11.

 Accuracy of GMRF approximation for inline image (all results are for using 100 basis functions in the GMRF approximation): (a)–(c) variance of the true Gaussian process (inline image), and the GMRF approximation (––––) as a function of time; (d)–(f) correlation of the Gaussian process (inline image) and the GMRF approximation (––––) at different time points with its value at time 0.2; (a), (d) ρ=1; (b), (e) inline image (c), (f) inline image

The covariance structure of the GMRF approximation varied little with the number of basis functions used, and Fig. 11 shows the result for 100 basis functions. The key observation is that the GMRF approximation is excellent except within a distance ρ of the boundary of the interval considered; this is particularly noticeable for the marginal variance. A question to the authors is how the GMRF approximation should be constructed in practice to avoid this deterioration in accuracy at the boundary. One simple approach would be to construct the GMRF approximation over an interval that extends a distance ρ beyond the data.
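This boundary observation can be reproduced directly from the paper's finite element construction. The sketch below (a one-dimensional illustration under assumed parameter values, not the discussant's or the authors' code) builds the α = 2 precision Q = K C⁻¹ K with K = κ²C + G, lumped mass matrix C and stiffness matrix G, and compares the marginal variance at the boundary with the interior; the free (Neumann) boundary conditions roughly double the variance at the end points, which is consistent with the suggestion of extending the mesh a distance of about ρ beyond the data:

```python
import numpy as np

n, h, kappa = 200, 1.0 / 199.0, 10.0
# Lumped mass matrix C (diagonal) and stiffness matrix G on a uniform mesh
c = np.full(n, h)
c[0] = c[-1] = h / 2.0
G = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
G[0, 0] = G[-1, -1] = 1.0 / h          # free (Neumann) boundary rows
K = kappa**2 * np.diag(c) + G          # discretized (kappa^2 - Laplacian)
Q = K @ np.diag(1.0 / c) @ K           # alpha = 2 precision of the weights
var = np.diag(np.linalg.inv(Q))        # marginal variances
print(var[0] / var[n // 2])            # boundary inflation factor, roughly 2
```

The interior variances are essentially constant, so the inflation is confined to a layer of width about ρ at each end, exactly as seen in Fig. 11.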

Secondly, a comment on the generality of the method: Stein (1999) argued for the use of the Matérn covariance model on the basis of its flexibility. The argument is based on the ability to vary ν to allow for different degrees of smoothness in the underlying spatial model. So, is the restriction of the authors’ approach to integer values of ν+d/2 a practically important restriction on the flexibility of models that can be fitted to the data?

One way to extend the class of covariance models that can be used is to consider a spatial process X(u), defined as

X(u) = Y(u) + Z(u),

where Y(u) and Z(u) are independent Gaussian fields with different Matérn covariance functions (but each restricted to integer values of ν+d/2). It is possible to analyse such models by using Markov chain Monte Carlo methods (or even integrated nested Laplace approximations) in such a way that one needs only to invert the covariance matrices of Y(u) and Z(u), but not of X(u). Hence the GMRF approximations could be used to do the needed calculations efficiently.
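The additivity behind this construction can be checked directly: for independent components the covariance matrices simply add, so computations can condition on each component in turn. A small simulation sketch under illustrative parameter choices (scales and smoothnesses are assumptions, not from the discussion):

```python
import numpy as np

rng = np.random.default_rng(1)
s = np.linspace(0.0, 1.0, 40)
d = np.abs(s[:, None] - s[None, :])
C_Y = np.exp(-d / 0.1)                          # exponential (nu = 1/2) component
C_Z = (1.0 + d / 0.3) * np.exp(-d / 0.3)        # Matern nu = 3/2 component
jitter = 1e-10 * np.eye(len(s))
L_Y = np.linalg.cholesky(C_Y + jitter)
L_Z = np.linalg.cholesky(C_Z + jitter)
# X = Y + Z has covariance C_Y + C_Z; simulating (or updating) each
# component separately means only the factors of Y and Z are ever needed
m = 20000
X = L_Y @ rng.standard_normal((len(s), m)) + L_Z @ rng.standard_normal((len(s), m))
emp = X @ X.T / m                               # empirical covariance of X
```

In the GMRF version the dense Cholesky factors above would be replaced by the sparse precision factors of the two Markov approximations, which is the point of the proposal.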

Peter Challenor, Yiannis Andrianakis and Gemma Stephenson (National Oceanography Centre, Southampton)

We congratulate the authors on what we believe will be a significant paper. Our own application of Gaussian fields is in the statistical analysis of computer experiments (see for example Kennedy and O'Hagan (2001)). Large computer simulators are used in many areas of science. In essence we proceed as follows.

  • (a) We carry out a designed experiment, running the computer simulator as few times as possible to span its input space.
  • (b) What is known as an emulator is built by fitting a Gaussian field to the results of this experiment, linking the simulator inputs to its outputs.
  • (c) A second experiment is often used for diagnostic purposes to check that we have a good emulator.
  • (d) The emulator is used to make inferences about the simulator.

Although the methods used are very similar to the analysis of spatial data, we work in a much larger number of dimensions: one dimension for each uncertain parameter in the computer code being analysed. Challenor et al. (2010) examined a climate simulator with 16 uncertain input parameters. However, because runs of the computer simulator are often expensive we have a limited number of data points from a designed experiment: usually about 10 per dimension (Loeppky et al., 2009). This means that we have a different big n problem. Unlike the geostatistics problem we have a relatively small number of data points but a large dimension: a big d problem rather than a big n. The framework presented here is very attractive because it is easy to include, for example, non-stationarity through the stochastic partial differential equation formulation; it can handle data on manifolds; and the theory is appropriate for Rd. We are not sure, however, that the implementation in many dimensions will be as simple as for R1 or R2: the triangulation will be complex, and we often work sequentially, modifying the emulator as additional experiments are carried out. Since our experiments are designed, we might be able to anticipate the positioning of our future data points when we build the triangulation and make it part of the design process. One possibility we are considering is whether a more complex choice of basis function than piecewise linear would allow a simpler triangulation and, at the expense of greater computational complexity, a saving in set-up costs.

Jesper Møller (Aalborg University)

This important and impressive paper by Lindgren, Rue and Lindström provides a computationally feasible approach for large spatial data sets analysed by a hierarchical Bayesian model. It involves a latent Gaussian field, a parameter θ of low dimension, Gaussian Markov random-field approximations to stochastic partial differential equations for covariance functions, and the integrated nested Laplace approximations package for the computations instead of time-consuming Markov chain Monte Carlo methods. Other recent papers by the authors (Simpson et al., 2010; Bolin and Lindgren, 2011a) compare the approach in this paper with kernel convolution methods (process convolution approaches) and covariance tapering methods, and conclude that the Gaussian Markov random-field approximation to stochastic partial differential equations is superior.

The Matérn covariance function plays a key role, where for example in the planar case the authors assume that the shape parameter ν is a non-negative integer when considering the stochastic partial differential equation. Is there some link to the fact that this stationary planar covariance function is proportional to the mixture density of a zero-mean radially symmetric bivariate normal distribution N2(0,WI2), with the variance W following a (ν+1)-fold convolution of exponential distributions?
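The stated proportionality can be verified numerically for ν = 1: mixing the radially symmetric bivariate normal density over W ~ Gamma(ν+1, θ) (the (ν+1)-fold convolution of exponentials with mean θ) reproduces, up to a constant, the planar Matérn form (κr)^ν K_ν(κr) with κ = √(2/θ). A quadrature sketch, assuming SciPy's `scipy.special.kv` is available (illustrative check, not from the discussion):

```python
import math
import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind

nu, theta = 1, 2.0
kappa = math.sqrt(2.0 / theta)
w = np.linspace(1e-4, 200.0, 400_000)          # quadrature grid for W
dw = w[1] - w[0]
gamma_pdf = w**nu * np.exp(-w / theta) / (theta**(nu + 1) * math.factorial(nu))

def mixture_density(r):
    """Density of N2(0, W I2) at radius r, mixed over W ~ Gamma(nu+1, theta)."""
    return np.sum(np.exp(-r**2 / (2.0 * w)) / (2.0 * np.pi * w) * gamma_pdf) * dw

r = np.array([0.5, 1.0, 2.0])
matern = (kappa * r)**nu * kv(nu, kappa * r)   # unnormalized planar Matern
ratio = np.array([mixture_density(ri) for ri in r]) / matern
# ratio is constant across r, confirming proportionality for nu = 1
```

Equivalently, the mixture's characteristic function is (1 + θ|ω|²/2)^{-(ν+1)}, which is exactly the reciprocally polynomial Matérn spectral density with α = ν + 1 in two dimensions.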

Despite its popularity and flexibility for modelling different degrees of smoothness, is this three-parameter Matérn covariance function really sufficiently flexible for modelling large spatial data sets? Would a flexible non-parametric Bayesian approach be more appropriate for ‘huge’ spatial data sets, although this of course may be computationally slow? The dimension of θ may then be expected to be so high that integrated nested Laplace approximations (Rue et al., 2009) would not work; as the dimension of θ may even be varying, a reversible jump Markov chain Monte Carlo method (Green, 1995) may be needed when updating θ from its full conditional. When updating the Gaussian field from its full conditional (corresponding to a finite set of locations), a Metropolis–Hastings algorithm may apply (Roberts and Tweedie, 1996; Møller and Waagepetersen, 2004).

The authors do not discuss model checking. The integrated nested Laplace approximation provides quick estimates of the marginal posterior distributions of the Gaussian field and of θ. For model checking based on the joint posterior distribution, e.g. when comparing the data with simulations from the posterior predictive distribution, I presume that Markov chain Monte Carlo algorithms still are needed.

Finally, using a triangulation for a finite element representation of a Gaussian field is an appealing idea. For a spatial point pattern modelled by a log-Gaussian Cox process (Møller et al., 1998), I expect that a regular triangulation would be used, since both the point pattern and the ‘empty space’ provide important information.

Xiangping Hu and Daniel Simpson (Norwegian University of Science and Technology, Trondheim)

We congratulate the authors on their excellent contribution to practical spatial statistics. We are particularly excited about the local specification of these random fields, which is markedly different from the constructs that are used for traditional spatial models. This stochastic partial differential equation specification is particularly useful in that it avoids any considerations of positive definiteness: if a solution exists it is automatically a Gaussian random field with a valid covariance function. Following from the comment in Section 3.3 of the paper, we have been investigating the extension of these methods to multivariate Gaussian random fields.

The construction of valid cross-covariance functions for multivariate Gaussian random fields is a very difficult problem and thus far only very specialized methods exist (see Gelfand and Banerjee (2010), and sources cited therein). The stochastic partial differential equation specification, however, automatically constructs valid cross-covariance functions! Inspired by the multivariate Matérn fields constructed by Gneiting et al. (2010), we define our multivariate random field x(s)=(x1(s),x2(s))T as the solution to


where inline image and fi(s) are independent (but not necessarily identical) noise processes. If we choose our exponents as αij=0,2,4,… and take the noise processes fi(s) to be Markov, then following the procedure outlined in Section 2.3 we arrive at a bivariate Markov random field. A sample from a negatively correlated bivariate random field is given in Fig. 12.

Figure 12.

 Bivariate Matérn random field with negative correlation

We conclude this comment by noting that all the extensions mentioned in Section 3 of this excellent paper can be applied to this situation. In particular, we can construct non-stationary, spatiotemporal Gaussian random fields over fairly arbitrary manifolds. We plan to expand on this in future work.

David Bolin (Lund University)

The methods presented in the paper are indeed useful in a wide range of applications; however, if the Gaussianity assumption cannot be justified one cannot apply the methodology directly. Although the authors look only at Gaussian applications, the extension to certain non-Gaussian models is fairly straightforward.

A non-Gaussian model can be obtained by changing the Gaussian white noise to some other non-Gaussian process, i.e. define X as the solution to

(κ² − Δ)^{α/2} X(s) = Z(s), (40)

for some non-Gaussian process Z. What then differs in the method is the calculation of the elements on the right-hand side of the weak formulation of the stochastic partial differential equation, i.e. the integrals of the basis functions ϕ with respect to the noise,

∫_Ω ϕ(s) Z(ds). (41)

An interesting class of non-Gaussian fields is the class of generalized asymmetric Laplace fields (Åberg et al., 2009). If Z is of this type, integral (41) can be expressed as a Gaussian variable with mean γ∫_Ω ϕ(s) ds + μ∫_Ω ϕ(s) Γ(ds) and variance σ²∫_Ω ϕ²(s) Γ(ds), where Γ is a gamma process. Thus, the weak solution to equation (40) can be expressed as a GMRF conditionally on the integrals with respect to Γ. The solution to equation (40) can be viewed as a Laplace moving average model (Åberg et al., 2009; Åberg and Podgórski, 2010) with a symmetric Matérn kernel f. The covariance function for X is Matérn, and the marginal distribution for X(s) is given by the characteristic function


In Fig. 13, a simulation of the process on inline image is shown when μ=γ=σ=κ=1. In Fig. 13 we can also observe close agreement between the empirical covariance function based on samples from 1000 simulations and the true Matérn covariance, as well as close similarity between the empirical density and the true density calculated by using numerical Fourier inversion of equation (42). This indicates that the method works for this non-Gaussian case.
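The conditional Gaussianity that makes this tractable can be sketched directly: given the gamma-process increments over each basis support, the noise integral (41) is Gaussian with the stated mean and variance. A minimal simulation of such increments (the support measure is an illustrative assumption; μ=γ=σ=1 as in the figure):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma_, mu, sigma = 1.0, 1.0, 1.0
area = 0.1                                  # |supp(phi_i)|, the basis support measure
n = 200_000
# Gamma-process increments over each basis support: Gamma(shape = area, scale = 1)
G = rng.gamma(shape=area, scale=1.0, size=n)
# Conditionally on G, each noise integral is Gaussian with mean
# gamma*area + mu*G and variance sigma^2 * G, so X | G remains a GMRF
noise = gamma_ * area + mu * G + sigma * np.sqrt(G) * rng.standard_normal(n)
```

Unconditionally the noise integrals here have mean (γ+μ)·area and variance (μ²+σ²)·area, since E[G] = Var[G] = area for this gamma process, while their non-Gaussianity shows up in the higher moments.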

Figure 13.

 Simulation results using the SPDE formulation of a Laplace moving average model: (a) histogram of the samples from 1000 simulations together with the true density; (b) empirical covariance function for X(50) together with the true Matérn covariance; (c) simulation of the process on inline image

Besides providing a computationally efficient method for simulation, an advantage of the stochastic partial differential equation formulation is that it simplifies parameter estimation, which for these models usually must be done by using the method of moments (Wegener, 2010). The formulation instead facilitates estimation in a likelihood framework by using the EM algorithm.

Laplace processes are a special case of the Lévy processes of type G (see for example Wiktorsson (2002)), and the method extends to this larger class and possibly other non-Gaussian models as well. This is currently being investigated together with the parameter estimation problem for these models.

The following contributions were received in writing after the meeting.

Patrick E. Brown (Cancer Care Ontario, Toronto, and University of Toronto)

This is a very interesting paper which is certain to enable a wide range of spatial analyses which were previously intractable. Particular credit is due to the authors for making their excellent software available.

The interpretation of the Markov random-field approximation on the regular square grid could be slightly modified to allow for its use with irregularly spaced data. Rather than evaluating the surface X(s) only at a set of grid points sij, could we not use the xij to approximate the continuous X(s) by a surface which is piecewise constant within grid cells centred at sij? This would be a special case of the method used for irregular data, with a regular square grid in place of irregular triangles and constant basis functions within cells in place of the linear basis functions. An important difference between the piecewise constant grid approximation and that used in the paper is that observations need not lie on the vertices of the grid.

Fig. 14 shows the regular grid, piecewise constant approximation fit to the leukaemia data by using the Rinla software. Grid cells are 1/100th the size of the y-dimension of the study region, and the grid is extended by 20 cells beyond the study region in each direction to avoid edge effects. The posterior means and standard deviations are nearly identical to those shown in Fig. 3 of the paper. The disadvantage of this analysis is that it appears to be more computationally demanding (taking roughly 10 min on an eight-core computer), though it is likely to scale quite well, as increasing the number of observations would not increase the number of vertices on the lattice.

Figure 14.

 Results of a GMRF approximation analysis of the leukaemia data, using a regular square grid assuming a piecewise constant random effect within grid cells: (a) posterior mean; (b) posterior standard deviation

Can the authors offer advice on the choice of grid? How would we compare the quality of the approximation for a fine regular grid with piecewise constant bases with that for a much coarser triangular grid with linear basis functions? There is certainly a limit to the roughness of a surface approximated by a coarse grid, and presumably a limit to how smooth a Markov surface on an extremely fine grid can be.

Michela Cameletti (University of Bergamo) and Sara Martino (Norwegian University of Science and Technology, Trondheim)

We congratulate the authors for this excellent paper that defines a link between Gaussian fields with Matérn covariance function and Gaussian Markov random fields. From our point of view, the stochastic partial differential equation (SPDE) approach, combined with the integrated nested Laplace approximations algorithm proposed by Rue et al. (2009), introduces a new modelling strategy that is particularly useful for spatiotemporal geostatistical processes. The key point is that the spatiotemporal covariance function and the dense covariance matrix of a Gaussian field are substituted respectively by a neighbourhood structure and a sparse precision matrix, that together define a Gaussian Markov random field. In particular, the good computational properties of Gaussian Markov random fields and the computationally effective approximations of the integrated nested Laplace approximations algorithm make it possible to overcome the so-called ‘big n problem’. This issue refers to the infeasibility of linear algebra operations involving dense matrices and arises in many environmental fields where large spatiotemporal data sets are available.

The authors mention in Section 3.5 the possibility of extending the SPDE approach to non-separable spatiotemporal models. In this regard, we wonder whether it is possible to use the SPDE approach for approximating a spatiotemporal Gaussian field with a non-separable covariance function belonging to the general class defined by Gneiting (2002). As described in Cameletti et al. (2011) for air quality data, models characterized by these non-separable covariance functions are extremely computationally expensive because they involve matrices whose dimension is given by the number of data in space and time. Thus, parameter estimation and spatial prediction become infeasible by using Markov chain Monte Carlo methods. Moreover, such non-separable covariance functions are defined by a large number of parameters and the convergence when using Markov chain Monte Carlo methods can be an issue. Thus, if the SPDE approach could deal with the general class of non-separable covariance functions given in Gneiting (2002), it would be an important result for spatiotemporal geostatistical modelling.

Daniel Cooley and Jennifer A. Hoeting (Colorado State University, Fort Collins)

We congratulate the authors for establishing this important link between Gaussian fields (GFs) and Gaussian Markov random fields (GMRFs). To a large extent, GFs provide the foundation for geostatistics. Even when one restricts assumptions to only a mean and covariance function, a GF is not far removed from any geostatistical analysis since there is always a GF with these first- and second-order properties. For kriging the best linear unbiased predictor and the conditional expectation of a GF correspond and, if performing maximum likelihood estimation, the GF assumption is explicit.

Whereas geostatistics deals with point-located data, GMRFs have their origin in modelling areal or lattice data. In comparison with geostatistical methods, statistical practice for areal data has seemed ad hoc. Whether assessing auto-correlation (e.g. Geary's C or Moran's I) or constructing models (GMRFs or other auto-regressive models), areal data methods have relied on an adjacency matrix constructed from researchers’ a priori assumptions of spatial dependence. Modelling a GMRF often includes justifying a dependence structure described by a small number of parameters and constructing an adjacency matrix. The process of constructing a dependence structure from an adjacency matrix is typically heuristic at best and particularly difficult for irregular lattices. However, the computational advantages of GMRFs and other auto-regressive models have usually outweighed the disadvantages. The authors’ link between GFs and GMRFs allows for the construction of a meaningful dependence structure in a GMRF setting for both regular and irregular lattices, allows for the use of GMRFs on point-located data and enables the fast computation that has always been the advantage of GMRFs over GFs.

That the GF–GMRF link is made through a Matérn covariance function is important as the Matérn function is often defended as the most theoretically justifiable. In geostatistical practice, estimating the Matérn function's smoothness parameter is challenging. Often the smoothness parameter ν is fixed according to an a priori belief of the smoothness of process realizations. One can view the choice of template for a regular grid given in Section 2.2 as equivalent to setting the smoothness parameter at the outset, as is common practice.

A challenge in spatial statistics is that complex models require large sample sizes to estimate model parameters reliably (Irvine et al., 2007), but many modelling procedures are too computationally complex for large sample sizes. The link between GFs and GMRFs will allow more researchers to investigate important statistical issues like model selection, precision of parameter estimates (especially spatial covariance parameters), sampling designs and more.

Rosa M. Crujeiras and Andrés Prieto (University of Santiago de Compostela)

We thank the authors for such an interesting paper, which forms a bridge between Gaussian fields and Gaussian Markov random fields by means of stochastic partial differential equations (SPDEs). The recent developments in numerical methods for PDEs may play an important role, by suitable modifications in the stochastic context. Our comments are focused on the procedure for solving the SPDE (2) by means of finite element methods (FEMs), concerning the mesh construction, the FEM approximation and the boundary conditions imposed.

The construction of a triangulated lattice to approximate the Matérn field is required for constructing the FEM approximation. The authors point out that the vertices are placed on sample locations and a heuristic refinement is used to minimize the total number of triangles satisfying the mesh constraints. Such a generation is explicitly determined by the geometric requirements of sample locations and the geometry of the observation region. However, the mesh design could be driven additionally by the Matérn field itself, by means of an iterative procedure, which adapts the mesh to the behaviour of the SPDE solution minimizing an a posteriori error estimate. This is a well-known procedure in the engineering literature called h-adaptivity (see Ainsworth and Oden (2000) for further details).

Also in the FEM approximation, low order (piecewise linear) elements are used to approximate the Matérn field in the computational domain. These elements provide satisfactory approximations in the stationary case, but other choices are possible. For instance, discontinuous Galerkin methods (Hesthaven and Warburton, 2008), Petrov–Galerkin methods or finite volume methods (see LeVeque (2002)) are possible alternatives. In fact, finite volume methods may be more suitable in the non-separable space–time models, with a dominant transport term.

As the authors remarked, there is a boundary effect in the covariance approximation due to boundary conditions. In this approach, Neumann conditions are imposed on the boundary of the computational domain when solving the SPDE. Rue and Held (2005) showed that the effect of boundary conditions is negligible if the length of the computational domain is sufficiently large compared with the effective range of the covariance, which enables us to capture the variability of the process. Nevertheless, this embedding drawback, due to the approximate homogeneous Neumann boundary conditions (which are not exactly satisfied by the Matérn class), could be overcome by imposing other boundary conditions in the second-order elliptic problem: non-local Dirichlet-to-Neumann (DtN) operators (Givoli, 1999), absorbing boundary conditions (Keller and Givoli, 1989) or perfectly matched layers (see Bermúdez et al. (2010)).

Marco A. R. Ferreira (University of Missouri, Columbia)

I congratulate Dr Lindgren and his colleagues for their valuable contribution to the area of spatial statistics modelling and computation. In addition, I commend the authors for making their methodology publicly available in the Rinla package.

Lindgren and colleagues build on a stochastic partial differential equation based explicit link between Gaussian fields and Gaussian Markov random fields to develop fast computations for Gaussian fields with Matérn covariance functions. In addition, the stochastic partial differential equation may be thought of as a generator process that produces Matérn Gaussian fields. Within this stochastic partial differential equation framework, Matérn fields on manifolds and non-stationary fields arise naturally.

However, in this work the authors consider the smoothness parameter ν fixed and taking values only on the set of positive integers. This contrasts with one of the advantages of the use of the Matérn class of covariance functions: the smoothness parameter ν may be estimated from the data. Thus, the use of the Matérn class allows the degree of mean-square differentiability of the process to be data adaptive. In the current work, this particular advantage of the Matérn class is lost in favour of fast computation.

Finally, I am curious about what were the priors used by the authors for the hyperparameters. Even though the data sets considered are fairly large, some strange things may happen when one analyses spatial data. For example, improper uniform priors may lead to improper posterior distributions (Berger et al., 2001).

Geir-Arne Fuglstad (Norwegian University of Science and Technology, Trondheim)

I congratulate the authors for their work on bringing together modelling with stochastic partial differential equations (SPDEs) and computations with efficient (sparse) Gaussian Markov random fields. This is an elegant way of creating (space–time) consistent models, which take advantage of the computational benefits of sparse precision matrices. I wish to comment on the brief statement at the end of Section 3.5 about how this method can be extended to a non-separable space–time SPDE. I used an approach that is similar to the one described there, but with a different spatial discretization. The steps that were used are described briefly in what follows.

Consider the non-separable space–time SPDE


where the driving noise is standardized Gaussian white noise and A, T > 0 are constants. The spatial discretization is done by a finite volume method, and the Gaussian variable associated with each cell in the grid is allowed to be a Gaussian process in time. This reduces the space–time SPDE to a linear system of SDEs in time,


where U is the vector of Gaussian processes associated with the cells, C is a sparse matrix relating the derivative to the value at the cell itself and neighbouring cells, and D is the vector of Gaussian white noise processes associated with each of the cells.

This system is then approximated with the backward Euler method. The use of the backward Euler method gives a Markov property in time, as each time step is only conditionally dependent on the closest time steps, thus still giving the desired speed-up compared with a dense precision matrix.
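The discretization and time stepping just described can be sketched in a few lines (an illustrative Python sketch of the general idea, not the discussant's code; the grid size, time step, diffusion constant and noise scaling are all assumed values):

```python
import numpy as np

# Finite volume semi-discretization of a heat-type SPDE du/dt = c*d2u/dx2
# plus white noise, with zero-flux (Neumann) boundaries, followed by
# backward Euler time stepping.  All parameter values are illustrative.
n_cells, dx, dt, c = 50, 0.1, 0.01, 0.5

# Sparse coupling matrix C: each cell interacts only with its neighbours.
C = np.zeros((n_cells, n_cells))
for i in range(n_cells):
    C[i, i] = -2.0
    if i > 0:
        C[i, i - 1] = 1.0
    if i < n_cells - 1:
        C[i, i + 1] = 1.0
# Zero-flux boundaries: only one neighbour at each end.
C[0, 0] = C[-1, -1] = -1.0
C *= c / dx**2

# Backward Euler: (I - dt*C) u_{k+1} = u_k + sqrt(dt) z_k.  Each time step
# depends only on the previous one, giving the Markov property in time.
M = np.eye(n_cells) - dt * C
rng = np.random.default_rng(0)
u = np.zeros(n_cells)
for _ in range(200):
    u = np.linalg.solve(M, u + np.sqrt(dt) * rng.standard_normal(n_cells))
```

The system matrix M inherits the tridiagonal sparsity of C, which is what keeps the joint space–time precision matrix sparse.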

The results of this procedure proved quite good, and the finite volume method extends to transport terms in a natural way. One simple example of using this procedure is the case with Neumann boundary conditions at the spatial boundaries and a fixed starting state. Fig. 15 shows one simulation from such a situation with the starting condition

Figure 15. One realization of the SPDE (43), with A = 5, T = 10, Neumann boundary conditions ∂u/∂x = 0 on the spatial boundaries and the starting condition given in equation (44)

Hence the approach briefly indicated in Section 3.5 does indeed seem to be a viable approach for space–time SPDEs.

Andrew Gelman (Columbia University, New York)

When using Bayesian inference (or, more generally, structured probability modelling) to obtain precise inferences from data, we have a corresponding responsibility to check the fit and the range of applicability of our models. Using expected values and simulated replicated data, we can graphically explore ways in which the model does not fit the data and broadly characterize particular inferences as interpolation or extrapolation. I am wondering whether the authors have considered using their powerful computational machinery to understand and explore the fit of their models. I think that graphical exploration and model checking could greatly increase the utility of these models in applications.

Peter Guttorp (University of Washington, Seattle, and Norwegian Computing Center, Oslo) and Barnali Das (Westat, Rockville)

This is a very important paper in spatial statistics. The late Julian Besag used to argue that the Markov random-field approach was better than the geostatistical approach, both from the computational and the conceptual point of view. The difficulty was always the requirement to have observations on a regular grid. Le and Zidek (2006) solved this by creating a grid containing the observations, and treating most of the grid points as missing, but then only a small fraction of the data are non-missing. The approach by Rue and his colleagues adapts the tessellation to the data and, in our opinion, proves that Julian once again was right.

The current approaches to estimating global mean temperature suffer from several difficulties. They are not truly global in character, they do not take into account the non-stationarity of the global temperature field and the covariance structure is local, not global. One of us (Das, 2000) developed an approach to estimating non-stationary processes on the globe. The idea was based on the deformation approach (Sampson, 2010), by mapping the globe to a sphere, on which isotropic covariance functions are fully characterized (Obukhov, 1947). The implementation consisted of alternating transformations of latitude and longitude. Fig. 16 shows the resulting deformation, which expands the southern hemisphere (indicating smaller correlation length) and compresses the northern hemisphere.

Figure 16. Deformation of the globe (from Das (2000))

Owing to the computational complexity of this fitting procedure, only 724 stations in the Global Historical Climate Network version 2 data set with a record of at least 40 years were used (115 randomly selected stations were reserved for model evaluation). Fig. 17 shows the resulting correlation estimates around three different points on the globe, showing a clear indication of inhomogeneity. The computational complexity of this fitting, not to mention spatial estimation of the global temperature field, has been reduced by several orders of magnitude in the work by Rue and his colleagues.

Figure 17. Estimates of non-stationary spatial correlation around three points on the globe (Australia, West Africa and North America) (from Das (2000))

It appears that the temperature analysis by Rue and his colleagues has reasonable results; for example, the coefficient for regression on altitude is similar to what has been seen in the literature. We do wonder whether the estimates of the uncertainty of the various published global temperature series are similar to those computed by the new approach. Also, to what extent would inclusion of ocean temperature data change the estimates and their uncertainties?

Ben Haaland (Duke–National University of Singapore Graduate Medical School) and David J. Nott (National University of Singapore)

As the authors emphasize, perhaps the greatest value of this paper lies in the non-stationary and other extensions their results provide. One potential application area is Gaussian field (GF) emulation of computer models. Modelling output of computer models as GFs is useful for constructing emulators, which are computationally cheap mimics of computer models providing appropriate descriptions of uncertainty (Santner et al., 2003). The computer model involves inputs λ, and we have run the model at designed values λi, i=1,…,n, say. There has been recent interest in dynamic emulation where the computer model output is a time series. Conti and O'Hagan (2010) discussed dynamic emulation with GFs. One approach treats time as an additional input, but for long time series this treats a large data set as an observation of a GF and we are in the realm of the ‘big n problem’. Correlations are often highly non-stationary in the input space and time. Although appropriate covariates in the mean sometimes help, currently used models do not seem sufficiently flexible. The methods provided have great potential, but we have some questions. First, the authors focus on the two-dimensional case, and we wonder whether the computational benefits decrease as the dimensionality increases. Secondly, we wonder whether in higher dimensions the boundary effects are more problematic.

Whereas the explicit representation discussed by the authors concerns positive integer α, the representation has broad applicability. Draws from a GF with Matérn covariance are functions in a Sobolev space Hα(Ω) (Wendland, 2005). For integer α, these functions have α square integrable derivatives. The value of α is typically selected so that the function estimate has sensible properties and fractional values are not ordinarily chosen. These function spaces are nested decreasing, Hα(Ω) ⊃ Hα+ɛ(Ω) for ɛ>0. Hence, a function with fractional smoothness α>1 can be approximated by a function in Hα(Ω) and the convergence results hold. The infinitely smooth function space corresponding to the commonly used Gaussian covariance is contained in all Sobolev spaces. So, a Markov representation of a Matérn covariance with large smoothness could be used to approximate a Gaussian covariance. A balance must be found between better computational properties of less smoothness and better approximation properties of more smoothness. However, we wonder to what extent the convergence (theorems 3 and 4) and sparsity (Appendix C.5) results depend on using the typical finite element basis.
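The claim that a high-smoothness Matérn covariance approximates the Gaussian covariance can be checked numerically (a sketch assuming the standard √(2ν) length scaling under which the Matérn correlation converges to exp(−h²/2); the values of h and ν are illustrative):

```python
import numpy as np
from scipy.special import gamma, kv

# Matern correlation with the sqrt(2*nu) scaling, so that the length scale
# stays comparable as the smoothness nu grows.
def matern(h, nu):
    x = np.sqrt(2 * nu) * h
    return 2.0 ** (1 - nu) / gamma(nu) * x**nu * kv(nu, x)

h = 1.0
gauss = np.exp(-h**2 / 2)            # limiting Gaussian correlation
err10 = abs(matern(h, 10.0) - gauss)  # moderate smoothness
err50 = abs(matern(h, 50.0) - gauss)  # high smoothness: much closer
```

The error shrinks as ν grows, illustrating the trade-off mentioned above: larger ν gives a better approximation of the Gaussian covariance but a denser Markov representation.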

John Haslett, Chaitanya Joshi and Vincent Garreta (Trinity College Dublin)

We congratulate the authors on a most stimulating paper, and we see applications everywhere.

One application that concerns us at the moment is a multivariate space–time process in which the temporal changes can be much longer tailed than Gaussian. The simplest case is a univariate process at a single point in space. We have found that the normal inverse Gaussian distribution provides a flexible basis for temporal increments (see, for example, Barndorff-Nielsen and Halgreen (1977)). This model and its extensions are now widely used in finance (see, for example, Kumar et al. (2011)). Its multivariate extension (Wu et al., 2009) provides a basis for the spatiotemporal version. The multivariate normal inverse Gaussian distribution can be motivated as a scale mixture of Gaussian distributions.

Impressed by the link between Gaussian fields and the Gaussian Markov random fields presented in this paper, we wonder whether it may be possible to extend the existing methodology to include non-Gaussian fields. If so, we suspect that one approach may be via subordination, which is one way of changing the distributional properties of a stochastic process.

Let X(t) be a random process, and let T(t) be a monotone, non-decreasing Lévy process; then the process X{T(t)} is said to be subordinated to the process X(t) and T(t) is called the directing process.

Subordination is an elegant way of producing both subdiffusive and superdiffusive motions from regular diffusive motions such as random walks and Brownian motion; see for example Sokolov (2000) and Eliazar and Klafter (2004). In particular, temporal subordination has been widely used to model systems whose subjective ‘operational time’ is different from the objective ‘physical time’, such as those often found in statistical physics (see for example Eliazar and Klafter (2005)) and econometrics (see for example Clark (1973) and Carr et al. (2003)). Indeed, subordination has been shown to result in processes which are more leptokurtic (Clark, 1973).
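As a minimal illustration of temporal subordination (our own sketch, with an assumed gamma directing process and illustrative parameters), Brownian motion evaluated at a gamma subordinator yields increments that are a Gaussian scale mixture and are clearly leptokurtic:

```python
import numpy as np

rng = np.random.default_rng(1)
n, dt = 200_000, 1.0

# Directing process increments: gamma, so T(t) is monotone non-decreasing
# with E[dT] = dt.  Shape and scale are assumed for illustration.
dT = rng.gamma(shape=dt / 2.0, scale=2.0, size=n)

# Subordinated increments: given dT, W{T(t+dt)} - W{T(t)} ~ N(0, dT).
dX = np.sqrt(dT) * rng.standard_normal(n)

# Excess kurtosis of the subordinated increments is positive (leptokurtic);
# for this gamma mixture the theoretical value is 6, versus 0 for Gaussian.
z = (dX - dX.mean()) / dX.std()
excess_kurtosis = (z**4).mean() - 3.0
```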

We therefore wonder whether, by using a (spatially) subordinated Brownian motion W{U(u)}, it might be possible to modify the stochastic partial differential equation in equation (3)


where the directing process U(u) is any suitable monotone, non-decreasing Lévy process, so that a (non-Gaussian) weak solution to this equation exists.

We would like to know the authors’ views regarding this approach or any other approach leading to non-Gaussian fields.

Michael Höhle (Robert Koch Institute, Berlin)

The work presents innovative numerical solutions for geostatistical modelling. Despite the complexity of its mathematical detail, an open source implementation including an R interface is provided. This encourages potential users, including those without advanced mathematical knowledge, to apply the methods in practice. I congratulate the authors on their work regarding both the theoretical and the applied dimension.

As a comment, I miss explicit mention of how the methodology proposed can be used for inference in spatial point processes—specifically log-Gaussian Cox processes, where Gaussian fields play an important role. In Rue et al. (2009), a proposal is given on how to use a Gaussian Markov random-field approximation in this case. It would therefore be of interest to see a comparison between this approximation and the current proposal. Furthermore, the leukaemia application in the paper provides the important link of using latent Gaussian fields in spatial regression modelling: it bridges to the joint posterior of the latent field and the hyperparameters as given in Rue et al. (2009), page 321. With specific application of Gaussian fields in spatiotemporal point process modelling in mind, how well does the integrated nested Laplace approximations approach allow for a likelihood where π(y|x,θ) is not simply a product of the individual observations anymore? Of particular interest would be whether the approach proposed is usable for adding Gaussian field random effects to regression models of the conditional intensity function, e.g. as in the stochastic epidemic model of Höhle (2009).

L. Ippoliti and R. J. Martin (University G. d'Annunzio Chieti–Pescara) and R. J. Bhansali (University of Liverpool)

We congratulate the authors for developing a power Gaussian Markov random-field approximation to the correlation function of the Matérn class of Gaussian random fields, extending Besag's (1981) approximation using K0 to the completely symmetric conditional auto-regressive CAR(1) process, and Whittle's (1954) approximation using K1 to the completely symmetric simultaneous auto-regressive SAR(1) process.

We have two questions for the authors.

For large distances, both correlations (Matérn and approximating power CAR) are small, and the difference is small. From Abramowitz and Stegun (1965), equation (9.7.2),

K_ν(z) ∼ √{π/(2z)} exp(−z) {1 + (4ν² − 1)/(8z) + O(z⁻²)}  as z → ∞.
Do the authors have any results on behaviour of the power CAR correlations for large distances?

The other question is on the interpolation. If the number of points to be interpolated is small, the Gaussian Markov random-field specification has several advantages: compared with kriging, an inverse covariance-based predictor should be much quicker and more accurate, and avoid the common ill-conditioning problems of Σ. However, this assumes that the precision matrix of the Matérn function is well approximated by that of the power CAR with power ν+1. Consider the Matérn function with ν=2. Using a 512×512 torus lattice, Tables 1 and 2 show some low lag inverse correlations for ranges 30 and 100. The inverse correlations (the same values for both ranges to three decimal places) of the corresponding power CAR are given in Table 3.

Table 1. Matérn inverse correlations at low lags: range = 30

Table 2. Matérn inverse correlations at low lags: range = 100

Table 3. Inverse correlations of the completely symmetric power CAR at low lags
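On a torus lattice such inverse correlations can be computed directly from the spectrum, since a stationary precision matrix is block circulant (our own sketch of the computation, with an assumed κ, a smaller grid than the 512×512 lattice above, and the power-CAR spectrum from Section 2.2 of the paper):

```python
import numpy as np

# Power-CAR spectrum (4 + kappa^2 - 2cos w1 - 2cos w2)^alpha on an m x m
# torus, with alpha = nu + 1 = 3 for nu = 2.  Grid size and kappa are
# illustrative.
m, kappa, alpha = 64, 0.2, 3

w = 2 * np.pi * np.arange(m) / m
w1, w2 = np.meshgrid(w, w, indexing="ij")
spec = (4 + kappa**2 - 2 * np.cos(w1) - 2 * np.cos(w2)) ** alpha

# Block circulant structure: precision entries by lag are the inverse FFT
# of the spectrum; covariances are the inverse FFT of its reciprocal.
prec = np.real(np.fft.ifft2(spec))
cov = np.real(np.fft.ifft2(1.0 / spec))

# Inverse correlations: precision entries scaled by the lag-(0,0) entry.
inv_corr = prec / prec[0, 0]
```

Because the spectrum is a polynomial of degree α in the cosines, the precision entries vanish exactly outside the CAR neighbourhood |k1| + |k2| ≤ α, whereas the Matérn precision entries computed the same way (from 1/covariance spectrum) do not.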

Despite the good fit for the correlation function, the power CAR is not a good approximation to the inverse correlations within the CAR neighbourhood, and the Matérn values can also be appreciably different from 0 outside the neighbourhood. More importantly, the interpolation variance can be appreciably larger for the approximating power CAR (Table 4). Do the authors have any views on how much accuracy it is reasonable to sacrifice to obtain faster predictions?

Table 4. Interpolation variance for Matérn and completely symmetric power CAR for two different values of ν and ranges (columns: ν; range; interpolation variance, Matérn; interpolation variance, CAR)

Note that the Gaussian correlation structure, the limit as ν→∞ of the Matérn function, has an exact explicit form on the rectangular planar lattice for the inverse variance matrix, or for the finite CAR representation, which can be obtained from the results in Martin (2000).

Finally, some results comparing directly specified CAR models with integrated continuous processes over irregular areas are given in Martin and Dwyer (1994).

Venkata K. Jandhyala and Stergios B. Fotopoulos (Washington State University, Pullman)

We congratulate the authors for a timely and inspiring paper on a topic that is at the forefront of contemporary statistical modelling and computational statistics. The advantages of working with Gaussian Markov random fields have been well exploited for fitting hierarchical Bayesian models for spatial data. However, as pointed out by the authors, one often wishes to model data drawn from Gaussian fields (GFs). In the absence of the Markov property, the covariance matrices of GFs are dense and not easily amenable to computation. In this regard the authors have achieved a breakthrough result by demonstrating (with the necessary technical details) that a GF with a Matérn covariance function can be suitably replaced by an equivalent Gaussian Markov random field whose precision matrix is sparse. The depth of their result becomes evident from the general nature of the spatial domains to which it has been extended in the paper, e.g. irregular lattices and manifolds. Thus, the authors have done a remarkable job.

We believe that the authors’ results can be useful to address the change-point problem for spatiotemporal GFs, which is a problem that has been well formulated recently by Majumdar et al. (2005). Estimating the unknown change-point for retrospective data has been approached under both maximum likelihood estimation and Bayesian methods. Recent simulation studies performed by Fotopoulos et al. (2010) and Jandhyala et al. (2011) for estimating the change-point when changes occur in the mean vector alone or in the mean and/or covariance matrix of a multivariate Gaussian series have shown that the maximum likelihood estimate (MLE) performs marginally better than the Bayesian estimate, in most cases. The asymptotic distribution of the change-point MLE developed in both Fotopoulos et al. (2010) and Jandhyala et al. (2011) appears to be extendable to spatiotemporal Gaussian Markov random fields when changes occur in the elements of the precision matrix at some unknown time point. However, taking advantage of the main contribution of the present paper, it seems that the asymptotic distribution of change-point MLEs can be extended to GFs with Matérn covariance functions. Thus, the results derived by the authors not only are useful for Bayesian hierarchical model fitting but also seem to apply quite well to the asymptotic distribution of the MLE in a change-point setting. One of the limitations of the change-point MLE in the treatment of Fotopoulos et al. (2010) and Jandhyala et al. (2011) is that its asymptotics require the assumption of independence over time. Such an independence assumption would presumably also be required for spatiotemporal GFs, which may not be satisfactory, and this is where the Bayesian approach to the change-point structure can be advantageous.

Mikyoung Jun (Texas A&M University, College Station)

It is my honour to congratulate the authors on this excellent paper. Gaussian random-field models with a Matérn covariance function are popular, useful and commonly used in geostatistical applications, but the computation is problematic with large data sets. This paper provides a computationally efficient way for such a model through the Gaussian Markov random-field framework.

I have a few comments to add. My first comment is regarding the smoothness parameter ν. As the authors acknowledge in Section 2.3, the current link between a Gaussian random field with Matérn covariance function and the Gaussian Markov random-field model is only possible for integer values of α (and thus integer increments of ν), which, in my view, could be a little limited. Given that we cannot distinguish well ν-values larger than 3 or so, this gives only two or three possible values of ν for the application. I wonder how this might affect the model performance, especially parameter estimation and spatial prediction.

Regarding the above point, how is the estimation of the parameter ν done? Is it fixed, or do you fit possible values separately and compare the fit? In Section 4.3, the authors mention that spatial covariance parameters are difficult to interpret individually (I am not sure whether I completely agree with this) and do not provide detail on how each parameter is estimated and what the estimates look like. I am also wondering, if you misspecify ν-values, how this might affect the estimation of the rest of the parameters, and the spatial prediction.

I am particularly interested in the development for the processes on a sphere and non-stationary (and non-isotropic) models for the processes on a sphere. One of the limitations of the covariance models in Jun and Stein (2007, 2008) is that they are singular at the poles without some constraints. It appears that the approach suggested here may not suffer from this limitation. Is any constraint required for the model in this paper, to guarantee this? Do you need to have vertices at the poles, or is any constraint required on the triangulation near the poles?

My last comment is that the third issue in Section 5 seems to me very interesting. Incorporating external covariates such as wind speed or wind direction in the covariance structure should be useful for environmental applications. However, as the authors point out, it may be difficult to do it in a way that the resulting covariance model is guaranteed to be valid. The method proposed may naturally provide ways to develop such physics-based covariance models.

Håvard Wahl Kongsgård (Peace Research Institute, Oslo)

I congratulate Lindgren, Rue and Lindström for an excellent paper and for the computational implementation. In conflict studies, microeconomics and related fields, spatial inquiries have become increasingly popular. For these fields, large data sets with multiple variables are often favoured.

Most scholars nevertheless refrain from utilizing spatial regression, as the methods often have poor computational implementations and are computer intensive. Consequently, the assumption of spatial independence remains a major concern.

To great effect, I recently applied the methods implemented in Rinla on a series of large point-based conflict data sets from Vietnam, Afghanistan and Iraq (Kongsgård, 2011). When combined with spatial–temporal visualization, this type of analysis can be of value for tactical risk assessment, in a military or humanitarian capacity. Lindgren, Rue and Lindström's work is especially interesting as the new approaches make it possible to adjust for relative spatial distance.

Although the method introduced is an approximation with limitations, it is easy to use and can be applied in a streamlined fashion. Given its robust low-level computational implementation and support for large sample sizes, the new functionality will be a good supplement to the Rinla library. It is my opinion that these methods have great potential within these fields of application.

Giovanna Jona Lasinio (‘Sapienza’ University of Rome) and Alessio Pollice (‘Aldo Moro’ University of Bari)

First, we congratulate the authors for their extremely interesting work, which sheds new light on Gaussian Markov random-field (GMRF) modelling. A connection is established with Matérn Gaussian fields through a weak solution of a linear fractional stochastic partial differential equation (SPDE) driven by a Gaussian white noise process. One of the main theoretical results is stated in theorems 3 and 4, where the authors prove, for positive integer values of the parameter α, the weak convergence of the finite Hilbert representation of the solution to the continuous solution.

Recently Bolin and Lindgren (2011a) have characterized a class of non-Gaussian random fields as the solution to a nested SPDE driven by Gaussian white noise. Could the authors comment on the possible further extension of their results to the case of SPDEs driven by more general non-Gaussian laws, such as stable or Lévy processes, in view of obtaining tools suitable for rates or concentrations, which do not typically follow a normal distribution?

When d=2, the integer α restriction of theorems 3 and 4 implies that the parameter ν in the Matérn class specification is also an integer. Within this class, the exponential covariance function, one of the most popular in many applied fields, corresponds to the value ν=0.5. Is the exponential covariance model definitively excluded by the approach proposed?

In the discussion the authors recognize that ‘the approach comes with an implementation and preprocessing cost for setting up the models, as it involves the SPDE solution, triangulations and GMRF representation’. We would be interested in some more comments on the possible qualifications of this cost from an applied statistical point of view. A more detailed comparison of the GMRF solution with competing modelling approaches from a predictive perspective would also be of considerable interest. Here we mainly refer to low rank (Banerjee et al., 2008; Cressie and Johannesson, 2008; Eidsvik et al., 2010; Crainiceanu et al., 2008) and process convolution methods (Higdon, 1998; Higdon et al., 1999) when estimation is carried out with the same technique (either integrated nested Laplace approximation or Markov chain Monte Carlo methods).

Incidentally we think that the geographical interpretation of Fig. 6(b) would benefit from a more detailed description.

Chihoon Lee (Colorado State University, Fort Collins)

First of all, I congratulate the authors on their interesting and stimulating paper. Although this work is clearly an important contribution to the field of spatial statistics, its impact on the field of stochastic partial differential equations (SPDEs) cannot be overlooked as it ties in the probabilistic and mathematical treatment of SPDEs to spatial statistical modelling and geostatistics.

The authors’ Gaussian Markov random-field construction via the SPDE link offers a natural definition of a Matérn field on manifolds with an intuitive interpretation. By employing a stochastic finite element approximation to the SPDE, practitioners can easily obtain finer resolution around any area of interest on the manifold. It seems promising that the authors’ explicit mapping from the parameters of the Gaussian field model to the elements of a Gaussian Markov random-field precision matrix will yield new approaches to parameter estimation, for example, of a scaling parameter κ>0 of the Matérn field, which is of great importance in practice but generally difficult to analyse directly.

More precisely, one can construct an estimator κ̂ (e.g. by maximum likelihood or the method of moments) based on observations, using a finite element representation of the full solution to the SPDE as presented in expressions (9) and (10). The next step is to verify that κ̂ indeed converges (in an appropriate sense) to the true parameter κ as the finite Hilbert space, spanned by a finite set of basis functions {ψ1,…,ψn}, approaches the full space. Such parameter estimation procedures circumvent the complexity that stems directly from dealing with the estimation problem of the fractional SPDE (2). Furthermore, this approach will shed light on estimating underlying physical parameters, corresponding to various covariates which could be incorporated into more physics-based SPDEs. For example, in a weather model, meteorological covariates such as wind speed or temperature may be incorporated (e.g. as appropriate drift terms) into underlying SPDEs. This would move beyond the purely statistical treatment of the SPDE (2) by combining theoretical (physical) models and statistical (data-driven) models.

Wayne T. Lee and Cari G. Kaufman (University of California, Berkeley)

We congratulate the authors on an important theoretical achievement with exciting computational implications. It is true that we tend to set n ‘a little higher than the value that gives a reasonable computation time’. We expect that ‘reasonable’ will now be redefined according to what is possible by using the authors’ explicit link. For example, we simulated a Matérn-like field of size 750000 (approximately the number of spatial grid boxes for a climate model with 25-km resolution) on a laptop in 8 s. Although a little slow for a long Markov chain Monte Carlo run, it is no longer completely unreasonable!
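To reproduce the flavour of this experiment, the following sketch builds an SPDE-style sparse precision for α=2 on a small square grid and samples from it. The lumped mass matrix, grid-Laplacian stiffness matrix and parameter values are our illustrative assumptions, not the authors' exact matrices; a dense Cholesky factor stands in for the sparse CHOLMOD-type factorization that makes n≈750000 feasible in seconds.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import solve_triangular

def spde_precision(m, kappa=0.3, h=1.0):
    """Sparse GMRF precision for a Matern-like field (alpha = 2) on an
    m x m grid, mirroring the (kappa^2 C + G) structure of the SPDE
    approach with a lumped mass matrix C and stiffness matrix G.
    Illustrative sketch, not the authors' exact construction."""
    n = m * m
    I = sp.identity(m)
    D = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
    G = (sp.kron(I, D) + sp.kron(D, I)) / h**2   # 2D grid Laplacian
    C = sp.identity(n) * h**2                    # lumped mass matrix
    K = kappa**2 * C + G                         # discretized (kappa^2 - Delta)
    # alpha = 2: Q = K C^{-1} K remains sparse
    return (K @ sp.diags(1.0 / C.diagonal()) @ K).tocsc()

def sample_gmrf(Q, rng):
    """Draw x ~ N(0, Q^{-1}) via a Cholesky factor Q = L L^T.  Dense
    Cholesky here for brevity; a sparse Cholesky is what scales this up."""
    L = np.linalg.cholesky(Q.toarray())
    z = rng.standard_normal(Q.shape[0])
    # Solve L^T x = z, so that Cov(x) = (L L^T)^{-1} = Q^{-1}
    return solve_triangular(L.T, z, lower=False)

rng = np.random.default_rng(0)
Q = spde_precision(30)
x = sample_gmrf(Q, rng)
print(Q.shape, Q.nnz)   # the precision is sparse: nnz grows linearly in n
```

The number of non-zeros grows linearly in n, which is the source of the speed-up over dense covariance factorization.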

We would like to address the Gaussian Markov random field as an approximation in the context of likelihood-based estimation for geostatistical models. We hope that the authors can clarify the implications of the treatment of the boundary points in this context. Although many will undoubtedly use these models in their non-stationary versions, non-stationarity that is an artefact of the boundary is undesirable and may introduce bias in estimating the geostatistical parameters.

We simulated 100 replications of a mean 0 Gaussian field on a 20×20 unit grid, using a Matérn covariance function with ν=1 and σ2=2, and effective range 10 (κ≈0.283). The large effective range highlights the boundary issue, but we think that high auto-correlation is also not uncommon in geostatistical data. We fixed ν at its true value and considered three possible estimators for σ2 and κ. The first is the maximum likelihood estimator under the original model. The second maximizes an approximate likelihood that replaces the covariance matrix Σ(σ2,κ) by (σ2/σ̃2)Q(κ)−1, where Q(κ) is constructed by using the formulation from Section 2.2.1, with no special treatment of the boundary points, and σ̃2 is the marginal variance for the stochastic partial differential equation solution. The third treats the boundary points by using the embedding method suggested in Section 5.1.4 of Rue and Held (2005) under which we use the true model for a boundary set of thickness m=2 and the Gaussian Markov random-field approximation for the conditional distribution of the interior set. The drawback is the computational cost of calculating the boundary model.
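The essential computation behind the second estimator is a Gaussian log-likelihood evaluated through a precision matrix rather than a covariance matrix. The sketch below uses a hypothetical one-dimensional precision standing in for Q(κ), and a dense log-determinant for brevity; in practice the log-determinant comes cheaply from a sparse Cholesky factor.

```python
import numpy as np
import scipy.sparse as sp

def gmrf_loglik(y, Q):
    """Log-likelihood of y ~ N(0, Q^{-1}) evaluated from the precision Q."""
    n = len(y)
    sign, logdet = np.linalg.slogdet(Q.toarray())  # dense for brevity only
    return 0.5 * logdet - 0.5 * y @ (Q @ y) - 0.5 * n * np.log(2 * np.pi)

def precision_1d(n, kappa):
    """Toy precision of a discretized (kappa^2 - d^2/dx^2) operator,
    a hypothetical stand-in for Q(kappa)."""
    G = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    return (kappa**2 * sp.identity(n) + G).tocsc()

rng = np.random.default_rng(1)
n = 200
y = rng.standard_normal(n)
# Profile the approximate likelihood over kappa, as in the comparison above
kappas = [0.1, 0.3, 1.0]
lls = [gmrf_loglik(y, precision_1d(n, k)) for k in kappas]
print(max(lls))
```

Each likelihood evaluation touches only the sparse precision, so profiling over (σ2, κ) stays cheap even for large n.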

Fig. 18 shows estimates of σ2κ2ν, the consistently estimable parameter under the fixed domain asymptotics (Zhang, 2004). We observe a clear bias in the embedding estimates, whereas the naive approximation estimates have less bias but higher variability. We have not implemented the authors’ approach using the Neumann boundary conditions.

Figure 18.

 Estimates for σ2κ2ν with effective range of 10 on a 20×20 unit grid with ν=1 and σ2=2 (the x-axis shows the maximum likelihood estimate, whereas the y-axis shows estimators maximizing two approximations to the likelihood): ∇, naive implementation with no boundary correction; •, implementation with embedding; the true parameter value is also marked

Bo Li (Purdue University, West Lafayette) and Marc G. Genton (Texas A&M University, College Station)

In this very stimulating paper, the authors created a new path to deal with the computational challenge caused by large spatial data sets. Whereas most of the previous approaches mainly focused on screening out information that is relatively less important to gain computational efficiency (see Sun et al. (2011) for a recent review), this newly proposed method sought an explicit link between some Gaussian fields (GFs) and Gaussian Markov random fields (GMRFs) and thus enabled a direct application of the inherent computational advantage in GMRFs to GFs. The GFs with Matérn covariance structure play a central role in spatial data modelling. Although the GMRF representation is developed only for the GFs with certain values of smoothness, we expect a wide application of this new approach since the smoothness parameter in the Matérn function is nevertheless difficult to estimate precisely. We genuinely appreciate the novelty and practical value of this paper. However, recently emerged data sets are often indexed by locations in both space and time, and many have more than one variable observed. Analyses with those data sets are more challenging owing to the cubic growth of computations in terms of the sample size. North et al. (2011) derived Matérn-like space–time correlations from evolving GFs governed by a white-noise-driven damped diffusion equation arising from simple energy balance climate models on the plane and on the sphere. It appears that those results could be used directly to extend the link between GFs and GMRFs to spatiotemporal data. Further extensions to a multivariate context remain open.

The authors gave an example of modelling non-stationary global temperature GFs and then making inference on the temperature process via GMRFs in conjunction with the integrated nested Laplace approximation. This can be very useful for the palaeoclimate community because one popular approach for large-scale palaeoclimate reconstructions is through Bayesian hierarchical models (Li et al., 2010) where it is crucial to identify an appropriate model for the random process of climate variables. Such a model needs to be sufficiently flexible while still keeping the inference computationally feasible. The explicit link developed in this paper combined with the integrated nested Laplace approximation seems a promising direction. Since the proxy data used for the reconstruction carry various types of noise, a nugget effect may need to be considered. Would the approach be directly applicable if nuggets are included in the covariance model? An ambitious goal in palaeoclimate studies is simultaneously to reconstruct the entire space–time process of the temperature and other climate variables. Therefore, it again requires computational efficiency for spatiotemporal and multivariate data. We look forward to seeing further developments on this topic and in the mean time congratulate the authors for their outstanding work!

Georg Lindgren (Lund University)

I would like to add to the impressive list of applications of the Gaussian Markov random field–stochastic partial differential equation (SPDE) link, namely its use in ocean wave modelling. Traditionally, stochastic wave models have been based on linear Fourier analysis, possibly including low order interactions between the Fourier components. Such models are seen as approximations to the basic hydrodynamical (deterministic) partial differential equations for water waves. These equations have, in themselves, little room for stochastic forces.

One of the common spectra used in ocean modelling is the Pierson–Moskowitz spectrum


The SPDE approach, developed in the paper presented, offers a promising link between the hydrodynamic and Fourier view on random ocean waves. It has recently been shown by David Bolin and Finn Lindgren (see Bolin and Lindgren (2011b) for the general theory and Bolin (2009) and Lindgren et al. (2010) for the wave application) that the solutions to a nested SPDE,


have a spectral density


with g equal to Earth's acceleration. The vector B=(b1,b2)T determines the main direction θ0 of the directional spectrum. For large s, this is close to the Pierson–Moskowitz wave spectrum and, thus, the SPDE approach could turn out to be a flexible alternative to the Fourier approach.
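Since the displayed spectrum was lost in reproduction, it may help to recall the textbook form of the Pierson–Moskowitz spectrum. The sketch below evaluates it and locates its peak; the parameter values α=8.1×10−3, β=0.74 and ω0=g/U for wind speed U are the standard ones, stated here as assumptions.

```python
import numpy as np

def pierson_moskowitz(omega, U=19.5, g=9.81, alpha=8.1e-3, beta=0.74):
    """Standard one-sided Pierson-Moskowitz frequency spectrum:
    S(omega) = alpha g^2 omega^{-5} exp(-beta (omega0/omega)^4)."""
    omega0 = g / U
    return alpha * g**2 / omega**5 * np.exp(-beta * (omega0 / omega)**4)

omega = np.linspace(0.05, 3.0, 5000)
S = pierson_moskowitz(omega)
# Setting dS/domega = 0 gives the spectral peak at
# omega_p = (4 beta / 5)^{1/4} g / U
omega_p = omega[np.argmax(S)]
print(round(omega_p, 3))
```

The numerically located peak agrees with the closed-form expression, which is a quick sanity check on any implementation of the spectrum.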

K. V. Mardia (University of Leeds)

I found the paper very timely and stimulating. The problem of dealing with large spatial data has a long history and the authors have given a comprehensive way forward. Mardia (2007) has given a historical background to the maximum likelihood methods for spatial data and pointed out that there still seem to be two main communities: one of mining practitioners and the other of mainstream statisticians.

The Matérn covariance function has now been accepted by statisticians as a reasonable covariance function, and rightly, in general, the smoothing parameter ν is taken as a tuning parameter. But there are some scientific problems where the estimation of ν is the main interest, for example, when the geovariables are sampled intensively as for a computer-scanned image in petrology or in a high resolution image in remote sensing. Another example is of water head–piezometrics level u where the hydraulic gradient map is the main interest (Pardo-Igúzquiza et al., 2008); the deterministic equation which describes the movement of groundwater is a particular case of equation (2).

Pardo-Igúzquiza et al. (2008) have given the background to a computer program for the estimation of all the parameters of the Gaussian random field with Matérn covariance function (κ,ν,σ and drift; geometrical anisotropy). The code is available at and as far as we are aware the geostatistics community has found this software useful. But the program works for n only up to 700. To extend this work for large data sets, in my joint work with Pardo-Igúzquiza, we have used various composite likelihood strategies extending Vecchia (1988) and Pardo-Igúzquiza and Dowd (1997). In particular, we have found unilateral random selection as shown in Fig. 19 useful for lattice data, and the method is extended for irregular data by using random permutation (even a priori for lattice data) where m is the number of the sites used in conditioning. We obtain the value of m through cross-validation but it is normally 5–15 for various large sets (images) we have experimented with. Indeed, the program takes a few minutes for n as large as 250000, the total number of operations being proportional to n. The recent efficiency results given in Mardia et al. (2010) on composite likelihood are also encouraging and relevant. I welcome a comprehensive comparison of the composite likelihood approach and the finite element method that is used in this paper.

Figure 19.

 Strategy for composite likelihood for the criss-cross pixel by using a random selection of three points in a unilateral scheme, m=3
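The conditioning scheme described above can be sketched as a Vecchia-type composite likelihood. The code below is illustrative only: an exponential covariance stands in for the full Matérn model, and a deliberately simple site ordering replaces the random selection and permutation strategies mentioned in the text.

```python
import numpy as np

def pair_dists(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def expcov(D, sigma2, kappa):
    # Exponential covariance (Matern with nu = 1/2), a stand-in here
    return sigma2 * np.exp(-kappa * D)

def vecchia_loglik(y, coords, m=5, sigma2=1.0, kappa=0.5):
    """Vecchia-style composite log-likelihood: order the sites and let
    each observation condition only on its m nearest predecessors, so
    the cost is O(n m^3) rather than O(n^3).  Illustrative sketch."""
    order = np.argsort(coords[:, 0] + coords[:, 1])   # one simple ordering
    ll = 0.0
    for idx, i in enumerate(order):
        prev = order[:idx]
        if idx == 0:
            mu, v = 0.0, sigma2
        else:
            d = np.linalg.norm(coords[prev] - coords[i], axis=1)
            nb = prev[np.argsort(d)[:m]]
            Cnn = expcov(pair_dists(coords[nb]), sigma2, kappa)
            cn = expcov(np.linalg.norm(coords[nb] - coords[i], axis=1),
                        sigma2, kappa)
            w = np.linalg.solve(Cnn, cn)
            mu, v = w @ y[nb], sigma2 - cn @ w   # conditional mean, variance
        ll += -0.5 * (np.log(2 * np.pi * v) + (y[i] - mu)**2 / v)
    return ll

rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(100, 2))
y = rng.standard_normal(100)
print(vecchia_loglik(y, coords, m=5))
```

With m=n−1 the product of conditionals recovers the exact joint likelihood by the chain rule, which makes the quality of the approximation for small m easy to assess.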

Jorge Mateu (University Jaume I, Castellón)

The authors are to be congratulated on a valuable contribution and a thought-provoking paper. Spatial data are frequently modelled as realizations of random fields. Gaussian fields (GFs) have a dominant role in spatial statistics and form an important building block in modern hierarchical spatial models. A common approach is to model spatial dependences through a covariance function. However, from the computational side, GFs are hampered by the big n problem, as the authors outline. As a way to overcome this problem, rather than applying covariance tapering, we argue for the use of compactly supported covariance functions, which considerably reduce the computational burden of kriging and allow the use of computationally efficient sparse matrix techniques, thus becoming a core aspect of spatial prediction when dealing with massive data sets. In addition, by considering a class of convolution-based flexible models, we can generate classes of non-stationary, compactly supported spatial covariance functions (Mateu et al., 2011).
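As an illustration of the computational point, the sketch below builds a covariance matrix from a compactly supported Wendland function (one standard choice; the specific function, support range and data are our assumptions) and solves a simple kriging system with sparse matrix techniques.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def wendland(h, theta=2.0):
    """Wendland covariance (1 - h/theta)_+^4 (1 + 4 h/theta): positive
    definite in R^3 (hence in R^2), compactly supported on [0, theta]."""
    r = h / theta
    return np.where(r < 1.0, (1 - r)**4 * (1 + 4 * r), 0.0)

rng = np.random.default_rng(3)
n = 500
X = rng.uniform(0, 20, size=(n, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
C = sp.csr_matrix(wendland(D))   # mostly zeros: sparse by construction
y = rng.standard_normal(n)

# Simple kriging at a new site s0: weights solve the sparse system C w = c0
s0 = np.array([10.0, 10.0])
c0 = wendland(np.linalg.norm(X - s0, axis=1))
w = spsolve(C.tocsc(), c0)
pred = w @ y
print(C.nnz / n**2)              # fill fraction well below 1
```

Unlike tapering, the compact support is a property of the model itself, so the sparse linear algebra introduces no additional approximation.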

Rather than assuming that the GF is built through a Matérn covariance function, have the authors thought about considering other covariance radial basis functions with some positive definite kernel? Alternative stochastic partial differential equations to equation (2) could be brought into play. In their main result 2, the authors use a particular finite element method. An alternative could be using a Tikhonov regularization scheme with particular adapted functionals, like those shown in Montegranario and Espinosa (2007). This regularization method has the mathematical advantage of using properties of compact operators in Hilbert spaces (Vapnik (1998), chapter 7).

I would like to draw the authors’ attention to the use of basis functions to construct a finite element representation of the solution to the stochastic partial differential equation. First, they use n (the number of vertices) functions, and this is basically oversmoothing the solution. An objective criterion following a kind of cross-validation approach (as in functional geostatistics) should be considered. Second, the authors do not explicitly explain which basis functions would be appropriate. They use harmonic functions and B-splines: why not wavelets or other smoothing functions? Recall that some of these bases make orthogonality an issue in computation. Since functional geostatistics is a field providing huge amounts of data, I would like to ask the authors whether they have ever thought about the possibility of having a GF of functions as in Giraldo et al. (2011). Do the authors have any tangible results on this emerging issue?

Debashis Mondal (University of Chicago)

The first main result of this paper was familiar to me (and to my late colleague Julian Besag) because of our work on the de Wijs process. In particular, note sections 4.2 and 5 of Besag and Mondal (2005) that allude to such results from a frequency domain perspective and from a computational point of view. Consider for example a sequence of Gaussian Markov random fields on sublattices (1/m)Z2 with spacing 1/m, m=1,2,…, which have individual spectral densities of the form

f_m(\omega,\eta) = \frac{\sigma_m^2}{m^2}\,\{1 - 2\beta_m(\cos(\omega/m) + \cos(\eta/m))\}^{-\nu}, \qquad (\omega,\eta)\in(-m\pi, m\pi]^2.

Here ν=1 corresponds to the first-order conditional auto-regressions at increasingly dense sublattices, ν=2 suggests Whittle's simultaneous auto-regressions at increasingly dense sublattices and so on. Now assume that, as m→∞, β_m→1/4, 4m^2(1−4β_m)→κ^2 and m^{ν−1}σ_m→σ/2^ν. Then f_m(ω,η) converges to

f(\omega,\eta) = \frac{\sigma^2}{(\kappa^2+\omega^2+\eta^2)^{\nu}},
from which it follows that the corresponding continuum Gaussian random fields (generalized if ν=0,1 or if ν>1,κ=0) emerge as scaling limits of the Gaussian Markov random fields on regular lattices. One point I make here is that the above explicitly describes the rescaling of parameters needed, particularly when we want to choose a suitable sublattice (e.g. to embed irregular sampling locations into a grid or to approximate irregular regions by unions of grid cells when observations themselves are aggregates over such regions). This is important for both estimation and inference of continuum random fields from the approximate Gaussian Markov random field, but it did not receive much attention in the current paper. In addition, rescaling of parameters would also be required for the triangulation scheme discussed in the second main result of the paper, and it would be interesting to see the effect of numeric convergences as one considers denser triangulations.
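Under one standard parameterization of the lattice spectral density (an assumption on our part, since the displayed formulae were lost in reproduction), the rescaling of parameters can be checked numerically:

```python
import numpy as np

# Assumed lattice spectral form:
# f_m(w, e) = sigma_m^2 m^{-2} {1 - 2 beta_m (cos(w/m) + cos(e/m))}^{-nu}
def f_m(w, e, m, nu, kappa, sigma):
    beta_m = (1 - kappa**2 / (4 * m**2)) / 4   # so 4 m^2 (1 - 4 beta_m) = kappa^2
    sigma_m = sigma * m**(1 - nu) / 2**nu      # so m^{nu-1} sigma_m = sigma / 2^nu
    a = 1 - 2 * beta_m * (np.cos(w / m) + np.cos(e / m))
    return sigma_m**2 / m**2 * a**(-nu)

def f_limit(w, e, nu, kappa, sigma):
    # Limiting continuum (Matern-type) spectral density
    return sigma**2 / (kappa**2 + w**2 + e**2)**nu

w, e, nu, kappa, sigma = 1.3, -0.7, 2, 2.0, 1.5
vals = [f_m(w, e, m, nu, kappa, sigma) for m in (10, 100, 1000)]
print(vals[-1], f_limit(w, e, nu, kappa, sigma))
```

The lattice densities approach the continuum density at rate O(m−2), which makes the choice of sublattice spacing a quantifiable trade-off.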

Delaunay triangulation provides a way to place a dependence graph on irregularly sampled observations. Then the idea is to use a Gaussian Markov random field on this triangulation to approximate the continuum Gaussian random field. For this, the use of the graph Laplacian becomes important, and one can proceed with alternative calculations. Here I draw attention to the work of Coifman and Maggioni (2004) on diffusion wavelets that allow fast and approximate computations of functionals of inverse Laplacian and related diffusion-like operators on manifolds, graphs and other non-homogeneous media. A study of the strengths and weaknesses of this procedure will be a matter of future research.

Werner G. Müller and Helmut Waldl (Johannes-Kepler-Universität Linz)

The authors thankfully further strengthen the bridge connecting the somewhat disparate areas of geostatistics and spatial econometrics. We would like to draw attention towards a rather neglected aspect of establishing a link as above, namely the potential effect on the respective optimal sampling designs. We shall illustrate our points on the leukaemia survival example from Section 2.3, utilizing some of the authors’ calculations.

In geostatistics the optimal sampling design is often based on the kriging variance over the region of interest, frequently by minimizing its maximum. Accounting for the additional uncertainty due to estimating covariance parameters, Zhu and Stein (2006) and Zimmerman (2006) employed a modification, termed the empirical kriging (EK) criterion by the latter. In spatial econometrics it is common to test for spatial auto-correlation by specifying a spatial linkage matrix and to utilize an overall type of measure such as Moran's I. Therefore Gumprecht et al. (2009) suggested maximizing the power of Moran's I-test under a hypothesized spatial lattice process, calling it the Moran I-power (MIP) criterion.

From the example, one sensible design question that we could pose is which of the 24 districts we should sample if we are limited to a number k<24 for financial reasons, say k=3, which allows for (24 choose 3)=2024 different configurations. Randomly sampling 20 of these configurations as designs, we can then draw a scatter plot of the values of the above criteria to judge a potential linkage (Fig. 20(a)). EK reduces to scalar operations localized at ρ=0.26, the only free covariance parameter. For the MIP we use the corresponding precision matrix provided by the authors, with the linkage matrix assigning 1 to point pairs within the range ρ. Although the evident link between the criteria does not extend well into the lower right-hand corner where the optima lie, it looks as if we could achieve reasonably high design efficiencies by substituting the criteria for each other.

Figure 20.

 (a) MIP (horizontal) versus EK (vertical) criterion values and (b) MIP (horizontal) versus CD0.99 (vertical) criterion values

Both criteria, however, are computationally quite intensive and it thus makes sense to look for cheaper alternatives. Motivated by traditional connections between estimation- and prediction-based criteria (‘equivalence theory’), Müller and Stehlík (2010) have suggested replacing the EK criterion by a compound criterion for determinants of information matrices with a weighting factor α, call it compound D optimality, CD. A scatter plot of its values (assuming a constant trend) against the MIPs (Fig. 20(b)) shows high efficiencies on the MIP criterion for the computationally extremely cheap compound-D-optimal design.

Summarizing, we believe that we have demonstrated that the relationships between the two linked approaches can go far beyond mere estimation issues.

Alessandro Ottavi and Daniel Simpson (Norwegian University of Science and Technology, Trondheim)

The Markov property, which is so important in temporal and discrete models, has long been absent in continuous spatial models, and the authors are to be congratulated for reminding us that it exists and is useful. In particular, the connection between stochastic partial differential equations and Gaussian random fields gives a practical method for specifying the local properties of a random field. Within this context, we are interested in using local effects to model non-stationarity. In particular, we are interested in non-stationarity induced by local topography, especially in the context of Bayesian smoothing.

Constructing Bayesian smoothers over complicated regions (i.e. regions that are not convex) is a difficult problem and most successful efforts have involved, in some way, partial differential operators (Wood et al., 2008; Ramsay, 2002). We note in passing that the techniques described in the paper under consideration yield an efficient Bayesian extension of the FELSPLINE method of Ramsay (2002), which does not suffer the large sample size drawbacks of soap film smoothing (Wood et al., 2008). In fact, as these models are defined over an irregular tessellation, we can represent any hard physical boundary (almost) exactly and we can control either the (deterministic) value of the field on, or the flow through, the boundary by using standard finite element techniques.

Whereas a river, lake or mountain may provide a hard physical boundary, in other situations the constraints may simply impede or discourage movement. Such soft constraints are difficult to deal with properly and we propose to model them by locally deforming the physical space into the third dimension (see Sampson and Guttorp (1992)). This deformed space can then be directly tessellated and non-stationary SPDE models can be built on it. A caricature deformation is provided in Fig. 21, which shows the covariance centred at one point. The covariance can be clearly seen to wrap around the deformation in a sensible way.

Figure 21.

 Covariance function centred at the darkest point of the non-stationary random field induced by a deformation: the contours of the covariance function can be clearly seen to distort locally around the object, whereas the far away covariances remain almost unchanged

Omiros Papaspiliopoulos (Universitat Pompeu Fabra, Barcelona) and Emilio Porcu (University of Castilla la Mancha and University of Göttingen)

We congratulate the authors for this beautiful paper. We have some suggestions that may be considered by the authors for a more general view of the problem.

  • (a) Main result 1 does not hold when the parameter ν is not an integer. A more general setting would allow us to index the fractional Sobolev space associated with the Gaussian random field (GRF) with a Matérn covariance function and thus allow us to study the properties of such a GRF in terms of fractal dimension. Such information seems to be lost with the approach proposed by the authors. In this respect, the tapering approach proposed in Gneiting (2002) seems to be more promising.
  • (b) The Matérn model is very popular also because it is very handy; having a closed form for the related spectrum, it has been widely used, for instance, to apply Yadrenko's (1983) theory for the equivalence of GRF measures. But there are other covariance functions, such as the generalized Cauchy (Gneiting and Schlather, 2004) and the Dagum (Berg et al., 2008) functions, which allow the separation of the fractal dimension and the Hurst effect of the associated GRF. This is a significant advantage for statistical estimation, and such a property is not offered by the Matérn covariance, which has light tails. The generalized Cauchy model
    is positive definite on Rd for all d, for 0<α≤2 and 0<γ, whereas the Dagum model admits the expression
    for which sufficient conditions of positive definiteness on Rd for all d are βγ≤1 and β<1.
    Recently, Ruiz-Medina et al. (2011) have shown that, for a GRF with a generalized Cauchy or with a Dagum covariance function, we have a local pseudodifferential representation, in the mean-square sense, of the type (−Δ)^ρ X(s) = ε(s), where −Δ denotes the negative Laplacian operator, ε is Gaussian white noise and ρ is identically equal to (d+α)/4 and (d+γβ)/4 for the generalized Cauchy and the Dagum functions respectively. It would be interesting to consider whether some type of Gaussian Markov random-field approximation applies to these GRFs.
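For concreteness, the two families can be evaluated numerically. The parameterizations below (generalized Cauchy C(h)=(1+h^α)^{−γ/α} and Dagum C(h)=1−{h^β/(1+h^β)}^{γ/β}) are common forms stated as assumptions, since the displayed expressions were lost in reproduction.

```python
import numpy as np

def gen_cauchy(h, alpha=1.0, gamma=0.5):
    """Assumed generalized Cauchy form: C(h) = (1 + h^alpha)^(-gamma/alpha)."""
    return (1.0 + h**alpha)**(-gamma / alpha)

def dagum(h, beta=0.9, gamma=1.0):
    """Assumed Dagum form: C(h) = 1 - (h^beta / (1 + h^beta))^(gamma/beta)."""
    return 1.0 - (h**beta / (1.0 + h**beta))**(gamma / beta)

# Empirical positive-definiteness check for the generalized Cauchy on a
# random design: the covariance matrix should have no negative eigenvalues
rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 5, 50))
H = np.abs(x[:, None] - x[None, :])
eigs = np.linalg.eigvalsh(gen_cauchy(H))
print(eigs.min() > -1e-10, dagum(np.array([0.0]))[0])
```

Both functions equal 1 at the origin and decay polynomially, the heavy tails being exactly the feature that separates fractal dimension from the Hurst effect.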

Marc Saez (University of Girona and Consortium for Biomedical Research Network in Epidemiology and Public Health, Barcelona) and Jorge Mateu (University Jaume I, Castellón)

We first congratulate the authors for this excellent and innovative work, providing an authoritative review of the ‘big n problem’. We indeed believe that the paper will become a seminal paper in the context of computational spatial statistics. We focus on comments on some aspects that we believe merit further discussion; in particular on how the stochastic partial differential equation generalizes to non-separable space–time models. Whereas in other extensions (the case of manifolds or non-stationary fields) the authors adequately extend the basic approach, in the non-separable space–time case the explanations given on how to obtain a Gaussian Markov random-field representation are not self-contained. It is not clear to us what the resulting system of coupled temporal stochastic differential equations is, or how the system can be discretized, even though the authors suggest using an Euler method. Can the authors be more specific in this respect? Another question related to this extension is to know which regularity conditions are violated when the noise process is not white in time. There is no doubt that this assumption is too restrictive, and we would like to know whether it could be relaxed in some way.

In relation to the aspects that deserve more discussion, we first would like to refer to the edge effects and boundary conditions and, secondly, to the issues of spatial scale. It is known that, at least in small databases or spatial data with a large spatial scale, edge effects can have an effect on the spatial pattern near the edge. It would be necessary, therefore, to investigate how to incorporate a varying edge in the representation. When considering covariates, it is also important to investigate the optimal spatial scale, i.e. the resolution, to avoid overfitting. The resolution of the spatial effect should not be less than the spatial scale of the covariates because otherwise the spatial effect, and not the covariates, explains the data. Since there is a trade-off between the accuracy of the Gaussian Markov random-field representation and the number of vertices, further research is needed on the optimal number of triangles, i.e. the resolution, in conjunction, if possible, with the presence of varying edge effects.

Alexandra M. Schmidt (Universidade Federal do Rio de Janeiro)

The idea of approximating continuously indexed Gaussian fields through Gaussian Markov random fields is extremely appealing because of the sparseness of the precision structures of Gaussian Markov random fields. Even with current computational power, this is a real gain, especially in the case of large data sets. Moreover, the approach proposed provides a wide class of spatial models, allowing for the construction of Matérn fields on the sphere, and also non-stationary and oscillating covariance structures.

I shall concentrate my comments on the non-isotropic aspect of the model proposed. Schmidt and O'Hagan (2003) proposed a Bayesian approach to the deformation idea of Sampson and Guttorp (1992). The idea is to assign a Gaussian process prior to the deformation function f(·) that maps the original locations in R2 into the latent space (also in R2). The smoothness of the deformation is defined by the smoothness of the covariance function of f(·). Schmidt and O'Hagan (2003) used a Kronecker product between a covariance matrix describing the covariance between the axes that define the latent space and the correlation between the monitored locations in the original space. This correlation is modelled through a squared exponential correlation function, which is equivalent to assuming a Matérn correlation function for which ν→∞. This assumption leads to an infinitely mean-squared differentiable function and a smooth mapping. The idea behind this structure is to obtain a deformation that is not very different from the original configuration but is still sufficiently flexible to capture the non-stationarity in the data. In Section 3.4, the authors obtain a non-stationary stochastic partial differential equation (SPDE) that reproduces the deformation method assuming Matérn covariances. This is done by fixing α=2, which is equivalent to ν=1 when d=2. Therefore, the original Matérn correlation function of the mapping function f(·) assumes that f(·) is only once mean-square differentiable. This might lead to undesirable folding of the configuration of points in the latent space. The authors also claim that one advantage is that the parameters of the resulting SPDE do not depend directly on the deformation itself. Although this might be an advantage for inference, it is important to recover the posterior distribution of f(·) to understand the non-stationarity in the data. More recently, Schmidt et al. (2011) extended the mapping onto Rd for d>2, by making use of covariates to define the other dimensions of the latent space. Increasing the dimension of the latent space helps to avoid the foldings mentioned above. They also discuss a simpler version of the model which also makes use of covariates. It is appealing that the SPDE provides another possibility for exploring covariates in the spatial covariance structures.
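A minimal numerical illustration of the deformation construction follows, with a hypothetical smooth deformation f and the squared exponential covariance (the ν→∞ Matérn limit mentioned above); any injective deformation composed with a stationary covariance yields a valid non-stationary covariance.

```python
import numpy as np

def f(s):
    """A hypothetical smooth deformation of R^2 (illustration only)."""
    x, y = s[..., 0], s[..., 1]
    return np.stack([x + 0.3 * np.sin(y), y + 0.3 * np.sin(x)], axis=-1)

def deformed_cov(S, ell=1.0):
    """Non-stationary covariance C(s, t) = rho(||f(s) - f(t)||) with a
    squared exponential rho, i.e. stationarity holds in the deformed space."""
    F = f(S)
    D = np.linalg.norm(F[:, None] - F[None, :], axis=-1)
    return np.exp(-0.5 * (D / ell)**2)

rng = np.random.default_rng(6)
S = rng.uniform(0, 5, size=(40, 2))
C = deformed_cov(S)
# Validity check: symmetric with nonnegative eigenvalues, whatever the
# (injective) deformation, since the covariance is stationary in f-space
print(np.linalg.eigvalsh(C).min() > -1e-8)
```

The Jacobian of this particular f has determinant 1−0.09 cos(x) cos(y)>0, so the mapping does not fold, which is exactly the concern raised above for rougher deformation priors.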

Daniel Simpson (Norwegian University of Science and Technology, Trondheim)

I congratulate Lindgren, Rue and Lindström on their outstanding contribution to the spatial statistics literature. In the simplest case, when the field is stationary and isotropic, it is easy to write down the Green function of the partial differential operator and, therefore, to derive the convolution representation of the corresponding Matérn field. Where ν=α−d/2 is the smoothness parameter and κ is the range parameter, this representation is

x(s) \propto \int \|s-u\|^{\eta} K_{\eta}(\kappa\|s-u\|)\, dW(u),

where η=(ν−d/2)/2 may be negative. It is tempting to approximate this by a finite sum using the method introduced by Higdon (1998). This does not work! The kernel convolution method, besides being less efficient than the method in the paper under discussion, produces posterior fields with noticeable artefacts (Bolin and Lindgren, 2009; Simpson et al., 2010).
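The failure mode is easy to demonstrate in one dimension: with a discrete knot grid, the induced variance is no longer translation invariant. The sketch below uses a Gaussian kernel (Higdon's original choice, simpler than the Bessel kernel above) purely for illustration.

```python
import numpy as np

def kernel(h, ell):
    return np.exp(-0.5 * (h / ell)**2)

def conv_cov(s, t, knots, ell):
    """Covariance of the discrete kernel convolution field
    x(s) = sum_j K(s - u_j) z_j with iid standard normal z_j."""
    return np.sum(kernel(s - knots, ell) * kernel(t - knots, ell))

fine = np.linspace(-10, 10, 401)    # dense knots: near-stationary
coarse = np.linspace(-10, 10, 21)   # sparse knots: visible artefacts
c_fine = [conv_cov(0.0, 0.0, fine, 0.5), conv_cov(0.5, 0.5, fine, 0.5)]
c_coarse = [conv_cov(0.0, 0.0, coarse, 0.5), conv_cov(0.5, 0.5, coarse, 0.5)]
# With dense knots the variance is (almost) translation invariant;
# with sparse knots it oscillates with position
print(abs(c_fine[0] - c_fine[1]), abs(c_coarse[0] - c_coarse[1]))
```

The oscillating variance is precisely the kind of artefact that the finite element construction in the paper avoids.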

Of course, the paper considers much more than the simplest case, with great success. The finite element method introduced succeeds in these complicated cases because it directly approximates the required random function in a stable, consistent manner. When considering a bounded domain in Rd, the proof of the convergence result (equation (11) in the paper) boils down to an application of Céa's lemma, which says that the error in the Galerkin approximation to an elliptic partial differential equation is bounded above by the error of the best approximation to the solution over the approximation space chosen. The space of piecewise linear functions is extremely well studied and its approximation properties have formed the backbone of finite element theory since the middle of the last century.

I am extremely excited that the authors have extended the spatial statistics toolbox to include the finite element method, which is the workhorse of applied mathematics, physics and engineering. My excitement stems mostly from the fact that, given Gaussian data, the mean of the posterior random field can be approximated in linear time by using multigrid methods. For non-Gaussian data, the maximum a posteriori estimate can be computed efficiently by solving a non-linear partial differential equation (see Hegland (2007) and Griebel and Hegland (2010), who considered this in the context of density estimation). The obvious challenge for the computational statistics community is to compute uncertainty in linear time. If this can be done, it will be possible to solve truly huge problems.
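As a sketch of why linear-time posterior means are plausible, the code below solves the Gaussian posterior system for a toy one-dimensional GMRF prior with a conjugate gradient solver (standing in for multigrid; the prior precision and noise level are hypothetical).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

# For Gaussian observations y = x + noise with x ~ N(0, Q^{-1}), the
# posterior mean solves (Q + I/tau2) mu = y/tau2.  A multigrid or, as
# here, a conjugate gradient solver touches only the sparse precision,
# so each iteration costs O(n).
n = 2000
G = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
Q = (0.1**2 * sp.identity(n) + G).tocsc()    # hypothetical prior precision
tau2 = 0.5
rng = np.random.default_rng(5)
y = np.sin(np.arange(n) * 0.01) + rng.normal(0, np.sqrt(tau2), n)
A = Q + sp.identity(n) / tau2
mu, info = cg(A, y / tau2)
print(info)   # 0 means the solver converged
```

What remains open, as noted above, is obtaining posterior uncertainty (selected elements of A−1) at comparable cost.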

Alfred Stein (Twente University, Enschede)

This paper is a fine contribution to spatial statistics. We may have been aware that Gaussian Markov random fields and the Gaussian fields are connected, but the paper provides a solid explanation on the basis of mathematical theory. It raises several items for discussion.

  • (a) The choice of Gaussianity is particular and I doubt whether equally elegant results can be obtained for non-Gaussian random fields, in particular for spatial count data. Also, the explicit link exists for two, relatively wide classes of covariance functions: the Matérn class and periodic covariance functions. However, there are more permissible covariance functions, like spherical covariances. It is not immediately obvious how the current procedure can be expanded. Finally, multivariate spatial data are natural expansions of the univariate case presented; for example a bivariate Gaussian random field may have an explicit link with the bivariate Gaussian Markov random field in an almost trivial way, but it would be good to hear the authors say the final word on it.
  • (b) Applying the method apparently requires a triangulation of the area. I cannot believe that this is always as simple and straightforward as in the examples given here. Moreover, results may be sensitive to the particular choice. In image analysis it is natural to have a square grid of pixel values, whereas traditional geostatistics has data irregularly spread over an area and treats a visualized output map on a square grid as an essential result. It is not yet entirely clear what the effects of such choices are on the final results.
  • (c) This brings me then to my final comment, namely the issue of the quality of data. In the current setting, the quality of data is entirely governed by the choice of the (Gaussian) distribution. For many data that is not enough, as locations may be uncertain, or refer to an aggregated area, whereas in other studies the data themselves are poorly defined. Yet, scientists collect and analyse them, simply because they are the best available to tell a scientific story: they are fit for use. A good example refers to data that are obtained from interpreting a soil layer and that are poorly defined just because the soil classes are poorly defined. Scientists rely on fuzzy approaches here. I wonder how the theory presented could proceed in such circumstances.

As these are all promising aspects for future research, for now I can only compliment the authors on their achievements.

Paul Switzer (Stanford University)

I have a concern regarding spatial models based on a direct specification of local conditional distributions, although these enjoy substantial computational advantages compared with approaches that use the unconditional Gaussian field (GF) covariance function for local estimation. Nevertheless, the conditional specification is not completely satisfactory, for example because it says nothing about how to construct models in a consistent way for different grid spacings. Your approach to this problem is insightful—estimate parameters of the unconditional covariance by using available data and then find a local conditional model that is approximately consistent with this estimated GF.

If you are starting with a GF then a device used to sidestep the large n problem for local inference is to condition arbitrarily only on observations within a neighbourhood by relying on the ‘screen’ effect of the closest data. Data weights for filtering or interpolation are determined by applying the GF specification restricted to the estimation location and a small number of neighbouring observations, via kriging. Whereas kriged maps will typically resemble maps of posterior means, this may not be so for local estimation precision which is more model dependent.

GF parameter estimates should be about the same regardless of the grid mesh that is used to sample a realization of a GF, i.e. the parameter ν in the Matérn covariance does not depend on how the GF is sampled. This would seem to imply that the Gaussian Markov random-field representation would have the same order neighbourhood regardless of the grid spacing that is used to sample the GF.

The covariance of an isotropic GF for a two-dimensional field is the same as the one that we would obtain if the two-dimensional field were restricted to a one-dimensional linear transect. So the Matérn parameter ν would be the same whether estimated from one-dimensional data or two-dimensional data, although the implied neighbourhood order would be different for d=1 and d=2.

Finally, I am curious how local estimation is affected by the choice of triangulation that is imposed on the observation domain for irregularly spaced data. Your suggestion, to have smaller triangles where there are more data makes sense but can anything more be said? For d=1 how should we think about triangulation for unevenly dense observations?

Kamil Turkman (University of Lisbon)

I congratulate the authors for this excellent work which will have significant consequences in modelling spatial data.

The objective of the paper is to find ways to treat the data as point referenced and to use a Gaussian field (GF) as the model, and to do the computations as if the data are discretely indexed by using an appropriate Gaussian Markov random-field (GMRF) model which represents the GF in the best possible way. The authors stress the word representation rather than approximation, perhaps indicating that this GMRF, although it is the best representation, may not approximate GFs to a desired level.

To switch from one model to another in this manner during the different phases of the modelling, we must assume that the likelihoods under the GMRF and GF models do not differ or deviate much. Indeed, this point was clearly expressed in Rue and Held (2005), where the GMRF representation was chosen by minimizing a metric of discrepancy of the form

\[ \sum_{i,j} w_{ij} \, \{ r_1(ij) - r_0(ij) \}^2 , \]
where Σ1 and Σ0 denote respectively the covariance matrices of the GF and the GMRF over the observed data locations, and r1(ij) and r0(ij) are respectively the elements of Σ1 and Σ0. The weights wij are chosen to be inversely proportional to the distances between locations. In contrast, the normal comparison lemma and its refinements (Piterbarg, 1996) suggest that the metric of discrepancy should be based on

\[ F_1(x) - F_0(x) = \sum_{i<j} \{ r_1(ij) - r_0(ij) \} \int_0^1 \varphi_h(x_i, x_j) \, F_h(x \mid x_i, x_j) \, \mathrm{d}h , \]

where F1(x) and F0(x) are the distribution functions of the vectors X1 and X0 corresponding to the GF and GMRF processes at the data locations, Xh = X1√(1−h) + X0√h for h ∈ (0,1), φh(xi,xj) is the joint density function of Xhi and Xhj, and Fh(x|·) is the conditional distribution function of Xh given Xhi = xi and Xhj = xj.

From this equality, a sharp inequality can be found:
\[ | F_1(x) - F_0(x) | \leq \sum_{i<j} | r_1(ij) - r_0(ij) | \int_0^1 \varphi_h(x_i, x_j) \, \mathrm{d}h , \]

since the conditional distribution function Fh is bounded by 1.
In this paper, the optimal GMRF representation for a specific class of GFs and for a given irregular grid configuration is obtained in a different norm. However, it is not clear what sort of discrepancy this representation induces on the covariances, and hence on the likelihoods under the GF and GMRF models, and what its consequences are for inference. The worrying point is that, in accordance with the normal comparison lemma, the total error of approximation is additive in the L1 covariance errors.

Christopher K. Wikle (University of Missouri, Columbia) and Mevin B. Hooten (Utah State University, Logan)

First, we congratulate the authors on another important contribution in what is quickly becoming a renaissance in approximate Bayesian methods. Given that our interest in correlated random fields has been primarily from a continuous spatial perspective rather than a discrete one, this latest paper is quite intriguing. The general concept of thinking about common forms of dependence as a solution to a spectrally defined stochastic partial differential equation (SPDE) is clever and similar in spirit to other general ideas we have been fond of in the past.

As we have spent a considerable amount of time thinking about dynamic spatiotemporal models and the origin of spatial processes, we see this paper as a potentially very valuable contribution. Indeed, the notion of discretizing SPDEs to form the basis of Markovian statistical models from both the spectral (e.g. Wikle et al. (2001)) and physical space (e.g. Wikle (2003), Wikle and Hooten (2006, 2010) and Hooten and Wikle (2007)) perspectives has been a primary focus of our own research. A heuristic summary of the relationship between such approximations and their theoretical counterparts can be found in Cressie and Wikle (2011). A key component of these presentations is the full exploitation of the hierarchical framework that allows us to place dependent random processes on the parameters that control the dynamical interactions (i.e. the parameters in the SPDE), for which no analytical solution exists. We have focused on building models true to the aetiology of the underlying processes, which is, more often than not, in the (Markovian) dynamical evolution. Furthermore, it is important to remember that real world processes are non-linear. The development of analytical covariance functions for the governing SPDEs of such processes is typically intractable, yet the motivating Markovian models suggested by the discretization (either in physical or spectral space) are quite tractable and flexible (for example, see Hooten and Wikle (2007)). Wikle and Hooten (2010) discussed a general form for ‘quadratic interaction models’ and showed the connection to various classes of spatiotemporal PDEs. Wikle and Holan (2011) extended this to higher order interactions.

From a computational perspective, we have found that the algorithms of Rue and Martino (2009) are very computationally efficient when a solution can be found. Thus, for very large and well-behaved continuous physical systems where sufficient data exist and correlation structures are smooth, the approach outlined by Lindgren and his colleagues may be quite valuable for computational reasons alone.

The authors replied later, in writing, as follows.

We are delighted by the deluge of insightful comments, the details of which we can only begin to answer here. We have grouped our responses into a few common themes, mentioning commentators’ names only when referring to specific issues.

Triangulation

As pointed out by Cooley, Hoeting and Brown, it is not necessary to place the triangulation vertices at observation points. Indeed, the observation matrix in the global temperature example in Section 4.2 was introduced for this very reason. For a given triangulation, the matrix can be used to extract any observable linear combination of field values, allowing observations in arbitrary locations as well as regional averages. For point observations, each row of the matrix contains three non-zero elements, one for each corner of the triangle containing the point, and the sparsity structure of the posterior precision matrix is unaffected. There is also no requirement to use a regular grid for such models. The triangulation implementation in Rinla has a parameter for a minimum required distance between data-located vertices. In the temperature example this was set to 10 km, allowing the vertex placement to follow the data density, without generating excessively small triangles. An example where the triangulation was chosen completely independently of the data locations is given by Bolin and Lindgren (2011b). Crujeiras and Prieto add that the triangulation could be chosen adaptively on the basis of local approximation error estimates, and we agree that this can potentially be useful for non-stationary anisotropic models.
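As a concrete sketch of how a point observation enters the observation matrix, the three non-zero entries of a row are the barycentric weights of the observation point within its containing triangle. The following minimal illustration uses a hypothetical helper, not the authors' implementation:

```python
import numpy as np

def obs_matrix_row(p, tri):
    """Barycentric weights of a point p inside the triangle whose vertex
    coordinates are the rows of tri (a 3 x 2 array).  The three weights are
    the only non-zero entries of the corresponding observation-matrix row."""
    T = np.column_stack((tri[1] - tri[0], tri[2] - tri[0]))  # edge vectors
    lam12 = np.linalg.solve(T, p - tri[0])                   # weights of v1, v2
    return np.array([1.0 - lam12.sum(), lam12[0], lam12[1]])

# A point at a vertex gets weight 1 there; the centroid gets 1/3 on each vertex.
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
w = obs_matrix_row(np.array([1.0 / 3.0, 1.0 / 3.0]), tri)
```

Each row sums to 1 and involves only the three vertices of one triangle, which is why the sparsity structure of the posterior precision matrix is unaffected.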

Approximation properties

Ippoliti, Martin and Bhansali note that the stochastic partial differential equation (SPDE)–Gaussian Markov random-field (GMRF) precision coefficients do not match those obtained by inverting the covariance matrix of a field sampled on a regular grid. However, the aim is not to approximate the field only at the vertices of the triangulation, but also at the intermediate points, obtained via linear interpolation. If the same interpolation method is used with the sampled covariances, the overall field covariances are underestimated, whereas the SPDE–GMRF approach gives an overall closer approximation. The upper and lower envelopes of all the pairwise covariances for the two settings are shown in Fig. 22, together with the target covariance function. Bolin and Lindgren (2009) investigated how this effect influences the kriging results, comparing with tapering and kernel convolutions, as well as alternative choices of basis functions in the GMRF construction, including wavelets and B-splines.
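A one-dimensional sketch (hypothetical, for illustration only) shows why interpolating a field sampled at the nodes underestimates off-node variability: the variance of the linear interpolant at a midpoint is strictly below the target variance.

```python
import numpy as np

def midpoint_variance(r, h):
    """Variance of the linear interpolant {Z(0) + Z(h)} / 2 of a stationary
    process with covariance function r, evaluated midway between two nodes."""
    return (r(0.0) + r(0.0) + 2.0 * r(h)) / 4.0  # (Var + Var + 2 Cov) / 4

r = lambda d: np.exp(-abs(d))       # exponential covariance, unit variance
v = midpoint_variance(r, h=1.0)     # strictly below the target variance r(0) = 1
```

Here v = {1 + r(h)}/2 < 1 whenever r(h) < 1, so variances and covariances involving off-node points come out systematically too small.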

Figure 22.

 Target covariance, linear interpolation covariance envelopes and GMRF covariance envelopes: the envelopes show the minimum and maximum covariances for each pairwise point distance within the domain; the linear interpolation is accurate at grid nodes, but the GMRF approximation is closer to the target overall

Kernel methods

Kernel convolution methods, as mentioned by Furrer, Furrer and Nychka, are useful theoretical tools, but can in practice be cumbersome and computationally intensive (Bolin and Lindgren, 2009). The kernel generated by the SPDE operator for a Matérn field takes the shape of another Matérn covariance with a different shape parameter. The kernels are singular for α ≤ d and non-differentiable for α ≤ d + 2, so the commonly used discrete kernel sums result in kriging and parameter estimation artefacts and yield neither the correct pointwise distributions nor the correct distributions for regional averages, unless the range is large compared with the kernel placement distances. Also, since the kernels have non-compact support, they yield dense matrices for the posterior precisions. Using compactly supported kernels as suggested by Mateu is similar to moving average processes in time series analysis and is problematic unless the data are densely and evenly located on the domain of interest, whereas the GMRF models are counterparts to auto-regressive models, which are much more flexible tools for approximating general dependence structures.

Parameter estimation

Both when estimating parameters and when calculating kriging interpolations, the results are influenced by the choice of boundary conditions. The easiest way to avoid these effects is to extend the triangulation beyond the study region by an amount that is sufficiently large to cover the correlation range of the field, since this allows the boundary effects to drop off to virtually zero before reaching the data locations that influence the likelihood. As seen in Fig. 23(a), this eliminates the bias in the maximum likelihood estimates of the field variances σ². The bias for the rescaling parameter τ⁻² is reduced when the triangulation resolution is increased, as seen in Fig. 23(b). The parameter σ²κ^{2ν} that was used in the comparison by Lee and Kaufman is equivalent to τ⁻² in the SPDE models used in the paper. We believe that the highly variable results in their comparison are explained by noting that the precision matrix was chosen by simply deleting rows and columns, which is equivalent to conditioning, in this case leading to approximate Dirichlet boundary conditions. To compensate for the resulting small variances near the boundary of such a model, the variance parameter needs to be greatly overestimated. The combined comparisons show that Neumann boundaries are safer, and that extending the boundary further reduces the bias.

Figure 23.

 Estimated (a) σ², (b) κ and (c) τ⁻² = σ²κ² for an h = 1 lattice GMRF (□), an h = 1 lattice with irregular extension (∘) and a finer lattice with irregular extension, all with Neumann boundary conditions, plotted against the corresponding estimates based on a sampled covariance matrix: the estimates were computed from 10 samples of a ν = 1 stationary Matérn field, observed exactly on a 20×20 lattice

Boundary conditions and intrinsic models

In the paper, we used Neumann boundary conditions for simplicity and ease of characterization of the properties of the resulting models. For intrinsic models, these conditions are too restrictive and need to be relaxed to achieve the desired field properties. The key lies in how the boundary conditions affect the null space of the differential operator. The B-matrix that is used in Appendix C.3 relaxes the constraints on the null space, leading to intrinsic models. Normally, the rank deficiency of the precision matrix is used to determine the order of intrinsicness. However, since only some of the eigenfunctions of the Laplacian can be represented exactly in the piecewise linear basis, the rank deficiency does not tell the whole story. For a more complete picture, the continuous domain problem needs to be analysed more carefully.

A common alternative to Neumann conditions is to impose the differential operator in only one dimension along the boundary, which is easily accomplished for regular grids by replacing the two-dimensional grid Laplacian with the one-dimensional Laplacian along the boundaries. For a unit square domain, the resulting null space is spanned by the four functions 1, u1, u2 and u1u2, but the rank deficiency is only 3, since the last function is not piecewise linear. Although this gives approximately the standard polynomially intrinsic models on ℝ², the construction also hints at a more general method for more general SPDE models currently under investigation. The idea is to start with a fully intrinsic precision for the interior of the domain and to add appropriate penalties generated by SPDE models within the boundary. For one-dimensional models, this eliminates the boundary effects entirely, and the higher dimensional cases show promising results.
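The link between the null space of the operator and the order of intrinsicness can be checked numerically. This small sketch (a hypothetical one-dimensional example with unit spacing, not from the paper) confirms that a Neumann stiffness matrix annihilates constants, so the corresponding intrinsic precision has rank deficiency 1:

```python
import numpy as np

# 1-d Neumann stiffness matrix (unit spacing): the discrete Laplacian
# annihilates constant vectors, so a model built from it alone is
# first-order intrinsic, with rank deficiency 1.
n = 8
G = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
G[0, 0] = G[-1, -1] = 1.0            # Neumann boundary rows
resid = G @ np.ones(n)               # zero vector: constants span the null space
rank = np.linalg.matrix_rank(G)      # n - 1
```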

Model checking

We have not yet investigated the model checking issue mentioned by Gelman and Møller, since this is a general problem for spatial models, and not specific to the SPDE–GMRF approach. However, there appear to be opportunities for using the GMRF increments in similar ways to residual analysis for time series models, and also using the close link to the continuous space SPDE formulation itself when interpreting the results.

Priors

Choosing priors for the model parameters is a general issue for spatial models, but the handling of the boundary in the SPDE formulation may present further complications. When the correlation range is longer than the size of the domain, estimating κ becomes very difficult. In such situations, the intrinsic models (κ=0) can be used, reducing the set of parameters to an overall scaling factor. This also handles applications with only a single realization of the random field, where it is impossible to separate a long correlation range from a fixed spatial trend, and the posterior distribution for κ typically becomes degenerate, requiring a careful choice of prior. A heuristic approach when not using intrinsic models is to specify a prior for κ that assigns low probability to ranges longer than the diameter of the domain. In the temperature example, we used independent Gaussian priors of that type for the weights of the basis functions controlling log(τ) and log(κ²). We are currently extending the temperature example into a full analysis, where the prior for all the SPDE parameters is constructed jointly, giving more control over the behaviour.

Simultaneous auto-regressions

Kent astutely noted the connection to simultaneous auto-regressions that, for even integer values of α, provides another direct link between Markov models and SPDEs; the GMRF construction in the paper also takes this form. Using the notation from theorem 2 in Appendix C.2, for α = 2 we have Q = K C̃⁻¹ K, where C̃ is the diagonal matrix from Appendix C.5. In our early experimentation, we approached the GMRF construction problem by various attempts at modifying the graph Laplacian mentioned by Mondal. In hindsight, the current approach that builds more directly on the continuous domain Laplacian feels more natural to us when the goal is to build spatially consistent Markov models. The results do resemble the graph Laplacian but, as the SPDE models are extended to non-stationary models and fields on manifolds, the graph becomes less useful as such and is purely a computational device. This becomes particularly clear when extending the methods to fractional SPDEs.
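A minimal one-dimensional sketch of the simultaneous auto-regression form for α = 2, using the standard piecewise linear element matrices (the helper function and parameter values are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def fem_matrices_1d(n, h=1.0):
    """Mass matrix C and stiffness matrix G for piecewise linear hat
    functions on a regular 1-d grid of n nodes with spacing h."""
    C = np.zeros((n, n))
    G = np.zeros((n, n))
    for i in range(n - 1):  # assemble the standard element matrices
        C[i:i + 2, i:i + 2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
        G[i:i + 2, i:i + 2] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return C, G

kappa, n = 2.0, 30
C, G = fem_matrices_1d(n)
K = kappa**2 * C + G                        # discretized (kappa^2 - Laplacian)
C_tilde_inv = np.diag(1.0 / C.sum(axis=1))  # inverse of the lumped diagonal mass
Q = K @ C_tilde_inv @ K                     # alpha = 2 precision in SAR form
```

Since K is tridiagonal and the lumped mass matrix is diagonal, Q is pentadiagonal, i.e. a second-order neighbourhood, matching the Markov structure of the α = 2 model.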

Fractional operators

Although the results as presented in the paper are limited to integer values of α in the SPDE generating the Matérn field, the GMRF construction can be extended to a more general class of continuous domain Markov models, which contains close approximations of Matérn models with fractional α. The result from Rozanov (1977) that was mentioned in Section 2.3 means that a stationary random field is a Markov field if and only if the spectral density is the reciprocal of a polynomial. In the isotropic case, such spectra take the form
\[ S(k) = \Bigl\{ (2\pi)^d \sum_{i=0}^{p} b_i \, \|k\|^{2i} \Bigr\}^{-1} , \]
where p is the degree of the polynomial and bi are coefficients in a strictly positive polynomial, and the corresponding discretized precision matrix can be obtained as
\[ Q = \sum_{i=0}^{p} b_i \, G^{(i)} , \qquad G^{(0)} = C , \quad G^{(1)} = G , \quad G^{(i)} = G \, C^{-1} G^{(i-1)} , \]

where C and G are the finite element matrices from Appendix C.
We need to find coefficients bi so that the model that is defined by this spectrum is an appropriate approximation of a model defined by the Matérn spectrum S(k) = (2π)^{−d}(κ² + ‖k‖²)^{−α}. A sensible choice is to let p = ⌈α⌉, and we use a convenient weighting function ω(k) for the deviation between the spectra, such that

\[ \int \omega(k) \Bigl\{ \sum_{i=0}^{p} b_i \, \|k\|^{2i} - (\kappa^2 + \|k\|^2)^{\alpha} \Bigr\}^2 \, \mathrm{d}k \]

is minimized.
Taking derivatives with respect to all bi and evaluating the integrals, we obtain a linear system of equations that can be solved easily. For ⌈α⌉=1 and ⌈α⌉=2, the coefficients are given through



The limiting case λ→∞ is equivalent to Taylor approximation at k = 0 and gives (b0, b1) = κ^{2α−2}(κ², α) for ⌈α⌉ = 1 and (b0, b1, b2) = κ^{2α−4}{κ⁴, ακ², α(α−1)/2} for ⌈α⌉ = 2. These limiting approximations provide good agreement for integrals of the field over regions but, for better behaviour of the short distance covariances, λ needs to be chosen more carefully. For a given measure of deviation between the desired and approximate covariance functions, the optimal λ can be determined numerically, as a function of α. For fractional α between 0 and 2 on ℝ², the parsimonious choice λ = α − ⌈α⌉ approximately minimizes the maximal absolute difference between the covariance functions. As noted by Cooley and Hoeting, α is in practice often chosen only from the integers and half-integers, and we obtain (b0, b1) = κ⁻¹(3κ²/4, 3/8) for α = 1/2 and (b0, b1, b2) = κ⁻¹(15κ⁴/16, 15κ²/8, 15/128) for α = 3/2. Combined with the recursive construction for α > 2, this provides GMRF approximations for all positive integers and half-integers. This includes the exponential covariance in ℝ², which corresponds to α = 3/2. The resulting covariance is shown in Fig. 24, together with the covariance from the spectral Taylor approximation. Further investigations are needed to determine how well the measurement noise model can incorporate the resulting deviation in small scale variation that is introduced by the approximation. Also shown in Fig. 24 is the covariance for a model with α = 2, showing the same qualitative behaviour at zero, but different mid-range behaviour.
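The λ→∞ limit can be verified numerically. In the sketch below the weight exp(−λ‖k‖²) is an assumed stand-in for the unspecified ω(k), used only to illustrate that a strongly concentrated weight reproduces the Taylor coefficients for ⌈α⌉ = 2:

```python
import numpy as np

def taylor_coeffs(alpha, kappa):
    """Limiting (lambda -> infinity) coefficients for ceil(alpha) = 2:
    Taylor expansion of (kappa^2 + w)^alpha at w = ||k||^2 = 0."""
    return kappa**(2 * alpha - 4) * np.array(
        [kappa**4, alpha * kappa**2, alpha * (alpha - 1) / 2.0])

def fit_spectrum_coeffs(alpha, kappa, lam, p=2, wmax=0.2, n=4001):
    """Weighted least-squares fit of a degree-p polynomial in w = ||k||^2 to
    (kappa^2 + w)^alpha.  The weight exp(-lam * w) is an assumed stand-in for
    the unspecified omega; large lam concentrates the fit near k = 0."""
    w = np.linspace(0.0, wmax, n)
    weight = np.exp(-lam * w)
    V = np.vander(w, p + 1, increasing=True)           # columns 1, w, w^2
    target = (kappa**2 + w)**alpha
    b, *_ = np.linalg.lstsq(V * weight[:, None], target * weight, rcond=None)
    return b

alpha, kappa = 1.5, 1.0
b_taylor = taylor_coeffs(alpha, kappa)                 # (1, 1.5, 0.375)
b = fit_spectrum_coeffs(alpha, kappa, lam=200.0)       # approaches the Taylor limit
```

With a less concentrated weight, the fitted coefficients move away from the Taylor limit and towards better mid-range covariance behaviour, which is the role of the λ trade-off described above.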

Figure 24.

 Covariances based on the spectral approximation: desired exponential covariance (d = 2, ν = 1/2, range 5), Taylor approximation, parsimonious weighting and the theoretical covariance for α = 2

Long-range dependence

As discussed by Bhattacharya et al. (1983), apparent long-range dependence in data cannot be distinguished from a non-stationary mean or trend. An alternative to constructing covariance functions with such behaviour is therefore to use a two-stage model, where the local behaviour is treated separately from the long-range behaviour. In practical situations, spatially varying basis functions are often used to capture large-scale variations, leaving the rest for a spatial field component. This can easily be extended to allow the basis weights to differ between realizations of the field, in effect increasing the spectral density near zero. For identifiability reasons, intrinsic models can be preferable, but alternatives such as conditioning on zero integral for the field can also be used, and are implemented in Rinla. As suggested by Fearnhead, another even more general approach is to model the observed field as the sum of several latent fields with different ranges. Care must be taken to handle the near non-identifiability of such models, noting that each individual latent component may be of less interest than their sum.

In some cases, these approaches can be motivated by considering the physical interpretation of the observed system, where long-range dependence may appear owing to an unobserved latent physical process, e.g. deep water processes with long-term memory affecting surface water processes that interact more rapidly with external forces. The rational spectra that are generated by the nested SPDE approach of Bolin and Lindgren (2011b) allow more general models even with just a single latent process. Although not Markov as such, they are Markov on an augmented state space, leading to almost the same computational efficiency as the pure Markov models.

Deformation methods

When using the non-stationary SPDE reparameterization of the deformation method, all distances are interpreted within a fixed topology, and the issue of folding is transformed into requiring strictly positive definite diffusion tensors. By parameterizing with scalar and vector fields, the estimated parameter fields can be used to interpret and understand the non-stationarity. A simple example is shown by Ottavi and Simpson. Furthermore, this yields a larger practical class of models, since the parameters need no longer correspond to a simple deformation. When using a Matérn process prior in a traditional deformation model, Schmidt rightly points out that fixing α=2 for the deformation field would give undesirable foldings due to insufficient differentiability. Increasing α should alleviate the problem to the same extent as any other choice of more differentiable deformation field model would do. Similarly the parameters in the non-stationary SPDE models can be constructed via general Gaussian fields, but direct comparison with the more general deformation methods is difficult, since the models would need to be constructed on the embedding space, whereas the SPDE as used in this paper is defined on the manifold itself, regardless of any embedding.

General extensions

It is important to note that the GMRF models can be combined in hierarchical modelling frameworks to allow highly non-Gaussian observation processes. Log-Gaussian Cox processes are of particular note, as mentioned by Diggle, Illian, Simpson, Møller and Höhle. The likelihood can be rewritten in a form that allows the use of the integrated nested Laplace approximations method for inference, and, as Diggle notes, one can choose freely between gridded count data and using the actual point data themselves.

Functional data can be treated either directly in the general observation model, or by incorporating desired basis functions into the finite element basis itself, leading to block matrices in the precisions. In a setting with a local set of temporal basis functions φ_{k,i}(t) for each spatial triangulation vertex k, the resulting elements of the joint K-matrix take the form

\[ K_{(k,i),(l,j)} = \bigl\langle \psi_k \, \phi_{k,i} , \; (\mathcal{L} \psi_l) \, \phi_{l,j} \bigr\rangle \]
for a given spatial differential operator $\mathcal{L}$, and similarly for the other matrices in the precision construction.
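In the special case of a single temporal basis shared across all vertices (a simplifying assumption; the per-vertex bases above give a more general block-sparse structure), the joint matrix reduces to a Kronecker product of a spatial operator matrix and a temporal Gram matrix:

```python
import numpy as np

# Toy spatial operator matrix <psi_k, L psi_l> and temporal Gram matrix
# <phi_i, phi_j> (values illustrative only); with one shared temporal basis
# the joint matrix is their Kronecker product, one block per vertex pair.
Ks = np.array([[2.0, -1.0], [-1.0, 2.0]])
Mt = np.array([[1.0, 0.5], [0.5, 1.0]])
K_joint = np.kron(Ks, Mt)   # 4 x 4 block matrix over (vertex, time-basis) pairs
```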

For more general spatiotemporal SPDE models, we agree with Crujeiras and Prieto that a finite volume approach is preferable to finite elements, and Fuglstad presents an example of such a solution. The diagonal approximation to the C-matrix is precisely what a simple finite volume method would produce in the purely spatial case, lending further weight to the appropriateness of the approximation.