#### Discussion on the paper by Lindgren, Rue and Lindström

**John T. Kent** (*University of Leeds*)

This paper uses finite element methods to give a principled construction of Matérn-type Gaussian models in a variety of spatial settings. A key property of such models is that they have a sparse inverse covariance or precision matrix. The paper gives a comprehensive treatment of these models and it seems destined to become a landmark paper in spatial analysis. In many ways the paper is a natural sequel to Besag's (1974) paper that was read to the Society, and it forms a fitting tribute to his memory.

In earlier work a great distinction was often made between conditional auto-regression (CAR) and simultaneous auto-regression (SAR) models. In terms of a zero-mean Gaussian process {*x*_{ij}} on the integer lattice in *Z*^{2} and the notation *A*(*x*)_{ij}=(*x*_{i−1,j}+*x*_{i+1,j}+*x*_{i,j−1}+*x*_{i,j+1})/*a* for the shrunken first-order neighbourhood average (*a*>4), the one-parameter versions are

E(*x*_{ij} ∣ *x*_{kl}, (*k*,*l*) ≠ (*i*,*j*)) = *A*(*x*)_{ij} (CAR) and

*x*_{ij} − *A*(*x*)_{ij} = *ɛ*_{ij} (SAR), where {*ɛ*_{ij}} is discrete white noise. In the first, the conditional distribution of *x*_{ij} given the values of the process at all the remaining sites depends only on the nearest neighbours; in the second a filtered version of the *x*-process equals (discrete) white noise. The second process is a discrete approximation to the stochastic partial differential equation (2) based on the differential operator *D*_{κ,α}=(*κ*^{2}−Δ)^{α/2} with *α*=2.

The difference between the two processes can be seen most clearly in the spectral domain. Setting *g*(*ω*) = 1 − (2/*a*)(cos *ω*_{1} + cos *ω*_{2}) for *ω* ∈ (−*π*,*π*)^{2}, the two spectral densities are proportional to 1/*g*(*ω*) (CAR) and 1/*g*(*ω*)^{2} (SAR).
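The contrast between the two spectra is easy to see numerically. A minimal sketch, assuming the one-parameter forms in which the CAR spectral density is proportional to 1/*g*(*ω*) and the SAR density to 1/*g*(*ω*)^{2}, with *g*(*ω*) = 1 − (2/*a*)(cos *ω*_{1} + cos *ω*_{2}) (the value *a* = 5 is illustrative):

```python
import numpy as np

def g(omega1, omega2, a=5.0):
    # Symbol of the shrunken first-order neighbourhood filter, a > 4
    return 1.0 - (2.0 / a) * (np.cos(omega1) + np.cos(omega2))

w = np.linspace(-np.pi, np.pi, 201)
W1, W2 = np.meshgrid(w, w)
f_car = 1.0 / g(W1, W2)        # CAR spectral density (up to a constant)
f_sar = 1.0 / g(W1, W2) ** 2   # SAR spectral density: the CAR one squared

# Both spectra peak at the origin; the SAR spectrum decays faster away
# from it, reflecting the smoother SAR process.
assert np.isclose(f_sar.max(), f_car.max() ** 2)
assert f_car.min() > 0.0       # a > 4 keeps g, hence both spectra, positive
```

Squaring the CAR spectrum is exactly the doubling of the operator order that links the SAR model to the stochastic partial differential equation with *α* = 2.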

By taking suitable convolutions, both models can be extended to higher order neighbourhoods as described in the paper and, letting the even integer *α* ⩾ 2 denote the order of neighbourhood, we have CAR(*α*) = SAR(*α*/2) ≈ *M*(*ν*), *ν* = *α* − *d*/2,

where the dimension is *d*=2 here and *M*(*ν*) stands for the Matérn model of index *ν*. In higher dimensions, I wonder to what extent we need to restrict *α* to be sufficiently large that *ν*>0, or at least *ν* ⩾ 0. One of the clever observations in the paper is to note that this SAR and CAR identification can be extended to odd integers *α* ⩾ 1 where, for the approximate weak solution of the stochastic partial differential equation, the finite element ideas from Section 2.3 are used.

The role of the null space of a differential operator is not clear to me. Consider the stationary process on all of *R*^{d} generated by *D*_{κ,α} with *κ*>0 and *α*=2. As noted in the paper, this differential operator has a null space which includes certain exponential functions. However, the random field is well defined even when specified to have mean 0. Further, if *D*_{κ,α} is used to motivate a process on a finite domain using the finite element construction, it is not clear to me what happens to the null space.

The situation is more delicate for intrinsic processes (Kent and Mardia, 1994). Consider the self-similar process on *R*^{d} generated by *D*_{0,α}, which is intrinsic provided that the real parameter *α* is sufficiently large that *ν*=*α*−*d*/2>0. The intrinsic order is *p*=[*ν*], where [·] denotes the integer part. An intrinsic process is defined only up to the null space of polynomials of degree *p*. When *α* is an even integer, the differential operator *D*_{0,α} has a null space given by the polynomials of degree *r*=*α*−1, and this result extends to all integers *α* ⩾ 1. For integer *α* in dimensions *d*=1,2, it follows that *p*=*r*. However, if *d* ⩾ 3, then *p*<*r*. Thus, in the intrinsic setting, the question about the role of the null space of the differential operator also arises.
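As a small worked instance of this gap (my example, not in the original argument), take *d*=3 and *α*=2; then

```latex
\nu = \alpha - \tfrac{d}{2} = 2 - \tfrac{3}{2} = \tfrac{1}{2},
\qquad p = [\nu] = 0,
\qquad r = \alpha - 1 = 1,
```

so *p*<*r*: the intrinsic process is identified only up to additive constants, while the null space of *D*_{0,2} contains all polynomials of degree 1.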

Recall that a differential operator D can be used to define a smoothing spline with penalty term ∫(D*ϕ*)^{2} for sufficiently smooth functions *ϕ* on *R*^{d}. For example, *D*_{0,2} leads to cubic and thin plate splines in dimensions *d*=1,2. As the paper continually emphasizes, this construction is computationally intensive for large *n* when *d* ⩾ 2. Hence it is natural to ask how successfully the finite element ideas of this paper can be used to yield a computationally efficient approximation.

The paper mentions briefly how deformations can be used to introduce non-stationarity in a real-valued Gaussian process. Gaussian models can also be used to construct deformations of Euclidean space. Bookstein (1989) suggested the use of thin plate splines, and Michael Miller and his colleagues (e.g. Christensen *et al.* (1996)) developed more sophisticated models using non-linear partial differential equations motivated by continuum mechanics. My former doctoral student, Godwin (2000), constructed deformations which were constrained by elastic penalties and discretized by using finite element methods. In this case a pair of interacting partial differential equations for the horizontal and vertical displacement is obtained, depending on two parameters called the Lamé coefficients; changing the ratio between them can have a dramatic effect on the fitted deformation.

In summary, I found this to be an extremely stimulating paper and it gives me great pleasure to propose the vote of thanks.

**Peter J. Diggle** (*Lancaster University*)

This paper is an important contribution to an important topic. Latent Gaussian fields are widely used as components of geostatistical models (Diggle *et al.*, 1998) and of point process models (Møller *et al.*, 1998). I believe that they should also be more widely used for the analysis of spatially discrete data. Consider, for example, data consisting of counts *Y*_{i} associated with each of *n* subregions *A*_{i} that partition a region of interest, *A*. A standard class of models for data of this kind is that

- (36)

where the *X*_{i} form a Markov random field in which

*X*_{i} ∣ {*X*_{j}: *j* ≠ *i*} ∼ N(*X̄*_{i}, *τ*^{2}/*n*_{i}),

where *X̄*_{i} is the average of the values of *X*_{j} associated with the *n*_{i} subregions *A*_{j} that are considered to be *neighbours* of *A*_{i}. The usual approach to defining the neighbours is through contiguities: *A*_{i} and *A*_{j} are neighbours if they share a common boundary. This is appealing in a regular geography; less so when the *A*_{i} vary substantially in size and shape. An alternative is to assume that

- (37)

where now *X*(·) is a spatially continuous Gaussian field.
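Whichever neighbourhood definition is used, the Markov random-field specification translates directly into a sparse precision matrix. A minimal sketch with a hypothetical four-region contiguity graph (a 2 × 2 grid of subregions) and illustrative values of the spatial dependence and scale parameters *ρ* and *τ*^{2}, which are my own choices:

```python
import numpy as np

# Hypothetical contiguity for four subregions on a 2 x 2 grid:
# regions sharing an edge are neighbours.
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
n = W.sum(axis=1)            # number of neighbours n_i of each region
D = np.diag(n)

rho, tau2 = 0.9, 1.0         # illustrative values; |rho| < 1 for propriety
Q = (D - rho * W) / tau2     # sparse CAR precision matrix

# Q is symmetric positive definite for |rho| < 1, so it defines a valid
# Gaussian Markov random field; zeros in Q encode non-neighbour pairs.
assert np.allclose(Q, Q.T)
assert np.all(np.linalg.eigvalsh(Q) > 0)
```

Under this precision, E(*X*_{i} ∣ rest) = *ρ* times the neighbour average and the conditional variance is *τ*^{2}/*n*_{i}, matching the conditional specification above.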

A pragmatic reason for preferring class (36) over class (37) has long been that the associated computations are much less burdensome. The remarkable computational efficiency that the methods in this paper achieve should instead allow a choice between classes (36) and (37) to be made on the basis of their merits as models.

Another important aspect of the paper is its delivery of appealing non-stationary constructions through the paper's equation (12). In this respect, it is a pity that the restriction to integer *α* excludes the exponential correlation function (*ν*=0.5) in the two-dimensional case. The popularity of the Matérn family of correlation functions stems from the fact that the integer part of *ν* corresponds to the degree of mean-square differentiability of *X*(·). But *ν* is difficult to estimate, and a widely used strategy is to choose its value among a small number of qualitatively different candidates, say *ν*=0.5,1.5,2.5, corresponding to continuous, differentiable and twice-differentiable fields *X*(·) respectively.
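The three candidate smoothness values have simple closed forms, which makes the qualitative comparison concrete. A sketch using the standard half-integer Matérn correlations (the grid of lags and the scale *κ* = 1 are illustrative):

```python
import numpy as np

def matern(h, nu, kappa=1.0):
    """Matern correlation at the three half-integer smoothness values."""
    x = kappa * np.asarray(h, dtype=float)
    if nu == 0.5:
        return np.exp(-x)                             # exponential; continuous field
    if nu == 1.5:
        return (1.0 + x) * np.exp(-x)                 # once-differentiable field
    if nu == 2.5:
        return (1.0 + x + x ** 2 / 3.0) * np.exp(-x)  # twice-differentiable field
    raise ValueError("closed form coded only for nu in {0.5, 1.5, 2.5}")

h = np.linspace(0.0, 3.0, 61)
curves = {nu: matern(h, nu) for nu in (0.5, 1.5, 2.5)}

# nu = 0.5 is exactly the exponential correlation, and smoother fields
# keep higher correlation at any given lag.
assert np.allclose(curves[0.5], np.exp(-h))
assert np.all(curves[2.5] >= curves[1.5])
```

Plotting the three curves shows the qualitative differences near the origin that drive the choice among the candidates.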

The paper also gives a new slant on low rank approximations by representing the field *X*(·) as *X*(**u**) = Σ_{k} *ψ*_{k}(**u**) *w*_{k}, where the *ψ*_{k} are basis functions and the *w*_{k} are Gaussian weights.

Here again, computational efficiency is crucial as the number of terms in the summation needs to be large when the range of the spatial correlation is small relative to the dimensions of the region of interest.
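In one dimension this representation can be sketched with piecewise linear "hat" basis functions on a set of nodes; the iid weights below are purely illustrative (in the paper the weights have a Gaussian Markov random-field precision, not an identity one):

```python
import numpy as np

nodes = np.linspace(0.0, 1.0, 11)     # 'triangulation' vertices in 1-d
s = np.linspace(0.0, 1.0, 201)        # evaluation points

# psi_k is the piecewise linear hat centred at node k: obtained by
# linearly interpolating the k-th unit vector between the nodes.
Psi = np.column_stack([np.interp(s, nodes, np.eye(len(nodes))[k])
                       for k in range(len(nodes))])

rng = np.random.default_rng(1)
w = rng.standard_normal(len(nodes))   # illustrative iid Gaussian weights
x = Psi @ w                           # X(u) = sum_k psi_k(u) w_k

# The hats form a partition of unity, so X interpolates the weights
# exactly at the nodes.
assert np.allclose(Psi.sum(axis=1), 1.0)
assert np.allclose(x[::20], w)
```

The number of columns of `Psi` is the number of terms in the summation; Diggle's point is that this must grow as the correlation range shrinks relative to the domain.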

The paper uses integrated nested Laplace approximations as the basis for inference, building on Rue *et al.* (2009). For the applications that are considered in tonight's paper, the focus on marginal posteriors is a little too restrictive. If, as is often the case in applications, the focus of scientific interest is on predictive inference for one or more non-linear functionals of the unobserved realization of *X*(·), we need a reliable approximation to the joint predictive distribution of the whole field *X*(·), which is in practice approximated by its joint distribution on a fine grid spanning the region of interest.

The ability to fit models of this kind without resorting to Markov chain Monte Carlo methods is very welcome. My strong impression is that, for problems of this degree of complexity, the tuning and empirical assessment of convergence of Markov chain Monte Carlo algorithms remains something of a black art. However, is tuning still necessary for the methods that are proposed in the paper to deliver accurate inferences, and if so how delicate is this tuning?

The acute myeloid leukaemia (AML) data that are analysed in the paper consist of the residential locations and (censored) survival times of all recorded cases in an area of north-west England. The authors’ analysis follows earlier published analyses in treating these as geostatistical data, implicitly assuming that the locations have been sampled independently of the survival process. But this may not be so—the data are a realization of a marked point process whose carrier space is the set of all locations of people resident in the study region who are at risk of contracting AML. This does not invalidate the authors’ analysis, which addresses the spatial variation in survival prognosis conditional on contraction of AML. But if the wider objective is to identify spatially varying factors that are involved in the aetiology of AML it potentially tells only half of the story, as there may be unrecognized spatially varying factors that affect both disease risk and survival prognosis; for a discussion of some of the methodological issues that are involved, see Diggle *et al.* (2010). For the AML data, although there is clear evidence of a marginal association between a simple circle counting estimate of the local density of cases and the hazard for survival (*p*<0.001 in a Cox proportional hazards analysis), this is accounted for by the authors’ adjustments for sex, white blood cell count and deprivation (*p*=0.349). However, a thorough joint analysis of risk and survival prognosis should also take into account the spatial variation in the local density of the population at risk.

It is with great pleasure that I second the vote of thanks for what will be, I am sure, a very influential paper.

The vote of thanks was passed by acclamation.

**J. B. Illian** (*St Andrews University*) **and D. P. Simpson** (*Norwegian University of Science and Technology, Trondheim*) *Explicitly linking Gaussian fields and Gaussian Markov random fields—relevance for point process modelling*

We congratulate the authors for this inspiring paper which we are sure will have a strong influence on spatial and spatiotemporal modelling for years to come. The stochastic partial differential equation approach provides an alternative representation for a large class of non-stationary Gaussian random-field models without needing explicitly to derive a covariance function. A piecewise linear Gaussian Markov random-field approximation is constructed that globally approximates the true random field up to a given resolution. This is a particularly interesting feature in the context of spatial point process modelling.

A log-Gaussian Cox process is a spatial point process that models the log-intensity field as a Gaussian random field defined *continuously* over the whole observation window. A common model fitting approach is to place a fine lattice over the observation window (Møller and Waagepetersen, 2007) and to count the number of points that are present in each grid box (Rue *et al.*, 2009; Illian and Rue, 2010; Illian *et al.*, 2011). This count value has a Poisson distribution and the latent spatial structure is modelled through a Gaussian Markov random field. Using this approach, the computational lattice has two roles: to estimate the latent Gaussian field, and to approximate the position of the points through a binning procedure. The first of these aims is entirely natural, whereas the second is an artefact of the approximation and leads to finer lattices than are really necessary. *This is wasteful*. As the stochastic partial differential equation approach constructs a *continuously indexed* random field, it is no longer necessary to bin the data over the lattice. It is then possible to construct a numerical approximation to the point process likelihood and to perform inference as normal. In particular, it is possible to perform extremely fast approximate inference for realistically complicated marked point process models by using the integrated nested Laplace approximations framework of Rue *et al.* (2009).

This approach begets a whole host of modelling questions. Consider the point pattern that was discussed in Illian and Hendrichsen (2010) and in Illian *et al.* (2011) consisting of the locations of muskoxen herds in Zackenberg valley in eastern Greenland; see Fig. 10 for the location of the study area within Greenland and a map of the area. The observation window has a complicated boundary structure; it contains hard boundaries (the sea to the south), permeable boundaries (a river that the muskoxen may, reluctantly, cross) and artificial boundaries (the end of the observation window in the north). The stochastic partial differential equation approach allows us to incorporate this important information into our models. This is impossible with standard covariance-based models.

We thank Mads Forchhammer from Greenland Ecosystem Monitoring, and Ditte Hendrichsen for introducing us to the Zackenberg muskoxen data as an example of different boundary characteristics within a study site.

**Tilmann Gneiting and Michael Scheuerer** (*Universität Heidelberg*)

We congratulate Lindgren, Rue and Lindström on an exceptionally rich and original paper that builds bridges between statistics, probability, approximation theory, numerical analysis and applied fields, and opens up a wealth of new perspectives.

We share the authors’ excitement about the ease with which the stochastic partial differential equation approach allows for the modelling of non-stationarity, both on Euclidean spaces and on manifolds. As the authors note, an appealing interpretation of the non-stationary model (12) is that of a locally stationary Matérn field, although the actual form of its correlation function is unknown. Would members of the class of locally stationary Matérn correlation functions developed by Stein (2005) and Anderes and Stein (2011) be candidates?

We second the authors’ call for ‘more physics-based spatial modelling’ (Section 5) with enthusiasm. In this context, Balgovind *et al.* (1983) derived a spatial statistical model for the errors in numerical weather prediction schemes from the physics of large-scale atmospheric flow. Their arguments lead to the stochastic dynamic differential equation (2.7) for the geopotential error field, which is of form (12) in this paper with *α*=2, where the *κ*(**u**) term varies smoothly with geographic latitude. This defines a spatial model on the sphere and so provides an applied instance where the manifold setting is essential, similarly to the global temperature example in the paper. On more general types of manifolds, approaches developed by approximation theorists may prove relevant and useful; see, for example, Narcowich (1995).

Perhaps the strongest limitation of the authors’ ingenious approach is the restriction to a small set of feasible values for the Matérn smoothness parameter (Section 2.3, remark (e)). In view of well-known sample path properties of Matérn fields (see Guttorp and Gneiting (2006), and references therein), this creates what could be called a ‘roughness gap’, particularly in *R*^{2}, where smoothness parameters *ν*=1,2,… allow for Gaussian fields with differentiable sample paths only. What value of *ν* was used in the global temperature example, where the spatial domain is the sphere, and what are the implications?

In a way, the restriction just mentioned is natural, as any probabilistically principled approximation of Gaussian fields by discretely indexed Gaussian Markov random fields can be expected to yield Markov models in the continuum limit, which is indeed what happens, leading to processes with reciprocally polynomial spectral densities (Section 2.3, remark (f)) and the (intrinsic, generalized) de Wijs field (Appendix C.3).

**R. Furrer and E. Furrer** (*University of Zurich*) **and D. Nychka** (*National Center for Atmospheric Research, Boulder*)

The authors are to be congratulated for this timely paper describing a very useful and practical methodological approach and computational procedure. There are many statisticians who are still working with Gaussian random fields simply because it is ‘virtually’ trivial to embed non-stationarity in the covariance function. With the stochastic partial differential equation approach, non-stationary modelling using Gaussian Markov random fields has become straightforward as well and this will take much wind out of the so-called ‘big *n*’ problem research.

We would like to point out a link between the presented work and recent developments concerning the asymptotics of the kriging predictor. It is well known that spline smoothing and kriging prediction are related in the sense that kriging prediction can be interpreted, similarly to smoothing splines, as a penalized minimization problem where the roughness penalty is determined by the covariance function of the underlying spatial process. For both settings the basic idea lies in representing the estimator or predictor as a weighted average of the observations by using a weight function, which is—in contrast with kernel methods—not known in closed form. On the basis of Nychka (1995), we have developed an approach to approximating this weighting function by using its reproducing kernel properties. For the kriging predictor in a stationary setting this so-called equivalent kernel is shown to have a simple form in terms of the Fourier transform of the covariance function of the process. The equivalent kernel can also be characterized as the Green function of a pseudodifferential operator, which is closely related to the operator in the paper.

In our research we aim to use the equivalent kernel to analyse the asymptotic behaviour of the kriging predictor in terms of bias and variance (Furrer and Nychka, 2007; Furrer, 2008). For this we need to show that it satisfies the so-called exponential envelope condition, which is accomplished for fractional smoothing parameters *ν* by using complex analysis (Furrer *et al.*, 2011). It is our hope that the use of reproducing kernel methodology and of techniques from complex analysis, especially in the case of fractional values of *α*, could also be beneficial to the presented work.

On a second note, we were surprised to see that the range parameter of the fitted climate field was smallest around the equator and increased towards both poles. Although the observational data cannot be directly compared with general circulation model output, the latter usually exhibits the inverse behaviour. This is partially due to larger ocean proportions in the mid-latitudes, compared with the high latitudes in the northern hemisphere and due to sea ice components around the pole.

**Paul Fearnhead** (*Lancaster University*)

I congratulate the authors on their principled approach and elegant (yet easily implementable) solution to the important computational bottleneck of inverting covariance matrices that arises in spatial statistics. I have two comments, one relating to experience of constructing their Gaussian Markov random-field (GMRF) approximations in one dimension, and the other to the restrictions on the covariance models for which GMRF approximations can be calculated.

To see how easy it is to construct the GMRF approximation, and how accurate it is, I ran some experiments for approximating one-dimensional Gaussian processes. Firstly, constructing the approximation was relatively simple to implement, and computationally cheap. (I imagine that the main difficulty in higher dimensions will come from needing to construct the triangulation.) In terms of accuracy, results for three different values of *ρ*=√(8*ν*)/*κ* are shown in Fig. 11.

The covariance structure of the GMRF approximation varied little with the number of basis vectors used, and Fig. 11 shows the result by using 100 basis vectors. The key observation is that the GMRF approximation is excellent except for within a distance *ρ* of the boundary of the interval considered; this is particularly noticeable for the marginal variance. A question to the authors is how should the GMRF approximation be constructed in practice to avoid these issues of deteriorating accuracy at the boundary? One simple approach would be to construct the GMRF approximation over an interval that extends a distance *ρ* beyond the data.
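The boundary inflation described here is easy to reproduce. A minimal sketch, assuming a mass-lumped construction in which the *α*=2 precision on a regular 1-d grid with spacing *h* is *Q* = *h*(*κ*^{2}*I* − *D*_{2})^{2}, with *D*_{2} the second-difference matrix under a reflecting (Neumann-type) boundary; all numerical values are illustrative:

```python
import numpy as np

def precision_1d(n, h, kappa):
    # Second-difference matrix with reflecting (Neumann-type) boundaries
    D2 = (np.diag(-2.0 * np.ones(n))
          + np.diag(np.ones(n - 1), 1)
          + np.diag(np.ones(n - 1), -1))
    D2[0, 0] = D2[-1, -1] = -1.0
    D2 /= h ** 2
    L = kappa ** 2 * np.eye(n) - D2
    return h * L @ L          # mass-lumped precision for the alpha = 2 SPDE

n, h, kappa = 200, 0.05, 2.0
Q = precision_1d(n, h, kappa)
var = np.diag(np.linalg.inv(Q))

# Marginal variance is inflated near the ends of the interval and settles
# to a constant in the interior: the boundary effect in the discussion.
assert var[0] > 1.5 * var[n // 2]
assert abs(var[n // 2] - var[n // 2 + 5]) < 1e-3 * var[n // 2]
```

Extending the grid a distance *ρ* beyond the data, as suggested above, pushes this inflated zone outside the region where predictions are needed.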

Secondly, a comment on the generality of the method: Stein (1999) argued for the use of the Matérn covariance model on the basis of its flexibility. The argument is based on the ability to vary *ν* to allow for different degrees of smoothness in the underlying spatial model. So, is the restriction of the authors’ approach to integer values of *ν*+*d*/2 a practically important restriction on the flexibility of models that can be fitted to the data?

One way to extend the class of covariance models that can be used is to consider a spatial process *X*(**u**), defined as *X*(**u**) = *Y*(**u**) + *Z*(**u**),

where *Y*(**u**) and *Z*(**u**) are independent Gaussian fields with different Matérn covariance functions (but each restricted to integer values of *ν*+*d*/2). It is possible to analyse such models by using Markov chain Monte Carlo methods (or even integrated nested Laplace approximations), in such a way that you need only to invert the covariance matrices of *Y*(**u**) or *Z*(**u**), but not *X*(**u**). Hence the GMRF approximations could be used to do the needed calculations efficiently.
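A sketch of such a two-component field in one dimension, with illustrative smoothness and scale values of my own choosing; since *Y* and *Z* are independent, the covariance of *X* is simply the sum of the two Matérn covariances, although it is not itself Matérn:

```python
import numpy as np

def exp_cov(s, kappa):
    """Matern nu = 0.5 (exponential) covariance matrix on 1-d locations s."""
    h = np.abs(s[:, None] - s[None, :])
    return np.exp(-kappa * h)

def matern32_cov(s, kappa):
    """Matern nu = 1.5 covariance matrix on 1-d locations s."""
    x = kappa * np.abs(s[:, None] - s[None, :])
    return (1.0 + x) * np.exp(-x)

s = np.linspace(0.0, 1.0, 50)
Cy = matern32_cov(s, kappa=10.0)   # smoother component Y
Cz = exp_cov(s, kappa=30.0)        # rougher, shorter-range component Z

rng = np.random.default_rng(0)
jitter = 1e-10 * np.eye(len(s))    # numerical safeguard for the factorization
y = np.linalg.cholesky(Cy + jitter) @ rng.standard_normal(len(s))
z = np.linalg.cholesky(Cz + jitter) @ rng.standard_normal(len(s))
x = y + z   # X = Y + Z: covariance Cy + Cz

# Each unit-variance component contributes 1 to the variance of X.
assert np.allclose(np.diag(Cy + Cz), 2.0)
```

In an MCMC or INLA scheme one would work with `Cy` and `Cz` (or their GMRF precisions) separately, exactly as Fearnhead suggests, never forming the inverse of the covariance of `x` itself.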

**Peter Challenor, Yiannis Andrianakis and Gemma Stephenson** (*National Oceanography Centre, Southampton*)

We congratulate the authors on what we believe will be a significant paper. Our own application of Gaussian fields is in the statistical analysis of computer experiments (see for example Kennedy and O'Hagan (2001)). Large computer simulators are used in many areas of science. In essence the procedure is as follows.

- (a)
We carry out a designed experiment, running the computer simulator as few times as possible to span its input space.

- (b)
What is known as an emulator is built by fitting a Gaussian field to the results of this experiment, linking the simulator inputs to its outputs.

- (c)
A second experiment is often used for diagnostic purposes to check that we have a good emulator.

- (d)
The emulator is used to make inferences about the simulator.

For further details see www.mucm.ac.uk. Although the methods used are very similar to the analysis of spatial data we work in a much larger number of dimensions. We have one dimension for each uncertain parameter in the computer code being analysed. Challenor *et al.* (2010) examined a climate simulator with 16 uncertain input parameters. However, because runs of the computer simulator are often expensive we have a limited number of data points from a designed experiment: usually about 10 per dimension (Loeppky *et al.*, 2009). This means that we have a different big *n* problem. Unlike the geostatistics problem we have a relatively small number of data points but a large dimension—a big *d* problem rather than a big *n*. The framework presented here is very attractive because of the ease of including, for example, non-stationarity through the stochastic partial differential equation formulation; it can handle data on manifolds and the theory is appropriate for *R*^{d}. We are not sure, however, that the implementation in many dimensions will be as simple as for *R*^{1} or *R*^{2}: the triangulation will be complex, and we often work in a sequential way, modifying the emulator as additional experiments are carried out. Since our experiments are designed we might be able to anticipate the positioning of our future data points when we build the triangulation and make it part of the design process. One possibility we are considering is whether a more complex choice of basis function than piecewise linear would allow a simpler triangulation and, at the expense of computational complexity, a gain in the set-up costs.

**Jesper Møller** (*Aalborg University*)

This important and impressive paper by Lindgren, Rue and Lindström provides a computationally feasible approach for large spatial data sets analysed by a hierarchical Bayesian model. It involves a latent Gaussian field, a parameter *θ* of low dimension, Gaussian Markov random-field approximations to stochastic partial differential equations for covariance functions, and the integrated nested Laplace approximations package for the computations instead of time-consuming Markov chain Monte Carlo methods. Other recent papers by the authors (Simpson *et al.*, 2010; Bolin and Lindgren, 2011a) compare the approach in this paper with kernel convolution methods (process convolution approaches) and covariance tapering methods, and conclude that the Gaussian Markov random-field approximation to stochastic partial differential equations is superior.

The Matérn covariance function plays a key role, where for example in the planar case the authors assume that the shape parameter *ν* is a non-negative integer when considering the stochastic partial differential equation. Is there some link to the fact that this stationary planar covariance function is proportional to the mixture density of a zero-mean radially symmetric bivariate normal distribution *N*_{2}(0,*WI*_{2}) with the variance *W* following a (*ν*+1)-times convolution of an exponential distribution?

Despite its popularity and flexibility for modelling different degrees of smoothness, is this three-parameter Matérn covariance function really sufficiently flexible for modelling large spatial data sets? Would a flexible non-parametric Bayesian approach be more appropriate for ‘huge’ spatial data sets, although this of course may be computationally slow? The dimension of *θ* may then be expected to be so high that integrated nested Laplace approximations (Rue *et al.*, 2009) would not work; as the dimension of *θ* may even be varying, a reversible jump Markov chain Monte Carlo method (Green, 1995) may be needed when updating *θ* from its full conditional. When updating the Gaussian field from its full conditional (corresponding to a finite set of locations), a Metropolis–Hastings algorithm may apply (Roberts and Tweedie, 1996; Møller and Waagepetersen, 2004).

The authors do not discuss model checking. The integrated nested Laplace approximation provides quick estimates of the marginal posterior distributions of the Gaussian field and of *θ*. For model checking based on the joint posterior distribution, e.g. when comparing the data with simulations from the posterior predictive distribution, I presume that Markov chain Monte Carlo algorithms still are needed.

Finally, using a triangulation for a finite element representation of a Gaussian field is an appealing idea. For a spatial point pattern modelled by a log-Gaussian Cox process (Møller *et al.*, 1998), I expect that a regular triangulation would be used, since both the point pattern and the ‘empty space’ provide important information.

**Xiangping Hu and Daniel Simpson** (*Norwegian University of Science and Technology, Trondheim*)

We congratulate the authors on their excellent contribution to practical spatial statistics. We are particularly excited about the *local* specification of these random fields, which is markedly different from the constructs that are used for traditional spatial models. This stochastic partial differential equation specification is particularly useful in that it avoids any considerations of positive definiteness—if a solution exists it is automatically a Gaussian random field with a valid covariance function. Following from the comment in Section 3.3 of the paper, we have been investigating the extension of these methods to *multivariate* Gaussian random fields.

The construction of valid cross-covariance functions for multivariate Gaussian random fields is a very difficult problem and thus far only very specialized methods exist (see Gelfand and Banerjee (2010), and sources cited therein). The stochastic partial differential equation specification, however, *automatically* constructs valid cross-covariance functions! Inspired by the multivariate Matérn fields constructed by Gneiting *et al.* (2010), we define our multivariate random field **x**(*s*)=(*x*_{1}(*s*),*x*_{2}(*s*))^{T} as the solution to

- (38)

- (39)

where the *f*_{i}(*s*) are independent (but not necessarily identical) noise processes. If we choose our exponents as *α*_{ij}=0,2,4,… and take the noise processes *f*_{i}(*s*) to be Markov, then following the procedure outlined in Section 2.3 we arrive at a bivariate *Markov* random field. A sample from a negatively correlated bivariate random field is given in Fig. 12.

We conclude this comment by noting that all the extensions mentioned in Section 3 of this excellent paper can be applied to this situation. In particular, we can construct non-stationary, spatiotemporal Gaussian random fields over fairly arbitrary manifolds. We plan to expand on this in future work.

**David Bolin** (*Lund University*)

The methods presented in the paper are indeed useful in a wide range of applications; however, if the Gaussianity assumption cannot be justified one cannot apply the methodology directly. Although the authors look only at Gaussian applications, the extension to certain non-Gaussian models is fairly straightforward.

A non-Gaussian model can be obtained by changing the Gaussian white noise to some other non-Gaussian process, i.e. define *X* as the solution to

- (*κ*^{2}−Δ)^{*α*/2} *X*(**s**) = *Z*(**s**), (40)

for some non-Gaussian process *Z*. What then differs in the method is the calculation of the elements on the right-hand side of the weak formulation of the stochastic partial differential equation, i.e. the integrals of the basis functions *ϕ* with respect to the noise,

- ∫_{Ω} *ϕ*_{i}(**s**) *Z*(d**s**). (41)

An interesting class of non-Gaussian fields is the generalized asymmetric Laplace fields (Åberg *et al.*, 2009). If *Z* is of this type, integral (41) can be expressed as a Gaussian variable with mean *γ*∫_{Ω}*ϕ*(**s**) d**s**+*μ*∫_{Ω}*ϕ*(**s**) Γ(d**s**) and variance *σ*^{2}∫_{Ω}*ϕ*^{2}(**s**) Γ(d**s**), where Γ is a gamma process. Thus, the weak solution to equation (40) can be expressed as a GMRF conditionally on the integrals with respect to Γ. The solution to equation (40) can be viewed as a Laplace moving average model (Åberg *et al.*, 2009; Åberg and Podgórski, 2010) with a symmetric Matérn kernel *f*. The covariance function for *X* is Matérn and the marginal distribution for *X*(**s**) is given by the characteristic function

- (42)

In Fig. 13, a simulation of the process with *μ*=*γ*=*σ*=*κ*=1 is shown. We can also observe close agreement between the empirical covariance function based on samples from 1000 simulations and the true Matérn covariance, as well as close similarity between the empirical density and the true density calculated by using numerical Fourier inversion of equation (42). This indicates that the method works for this non-Gaussian case.
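The conditionally Gaussian structure makes this kind of simulation straightforward. A minimal sketch of the noise loads on a 1-d grid of piecewise constant cells, assuming a unit-rate gamma process so that each cell's increment is Gamma(*h*, 1); conditional on the increment *G*, each load is Gaussian with mean *γh* + *μG* and variance *σ*^{2}*G*, and the parameter values mirror the *μ*=*γ*=*σ*=1 example above:

```python
import numpy as np

rng = np.random.default_rng(42)
gam, mu, sigma = 1.0, 1.0, 1.0       # gamma, mu, sigma of the Laplace noise
h = 0.1                              # cell width of the 1-d grid
n_cells, n_rep = 100, 20000

# Gamma process increments over each cell: Gamma(shape = h, scale = 1)
G = rng.gamma(shape=h, scale=1.0, size=(n_rep, n_cells))
# Conditionally Gaussian loads: gam * |cell| + mu * G + sigma * sqrt(G) * N
B = gam * h + mu * G + sigma * np.sqrt(G) * rng.standard_normal((n_rep, n_cells))

# Moments of the type-G load: E[B] = (gam + mu) h, Var[B] = (mu^2 + sigma^2) h
assert abs(B.mean() - (gam + mu) * h) < 0.01
assert abs(B.var() - (mu ** 2 + sigma ** 2) * h) < 0.02
```

Solving the discretized equation (40) with these loads on the right-hand side, conditionally on `G`, is then an ordinary GMRF computation.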

Besides providing a computationally efficient method for simulation, an advantage of the stochastic partial differential equation formulation is that it simplifies parameter estimation, which usually must be done by using the method of moments for these models (Wegener, 2010). The stochastic partial differential equation formulation instead facilitates estimation in a likelihood framework using the EM algorithm.

Laplace processes are a special case of the Lévy processes of type G (see for example Wiktorsson (2002)), and the method extends to this larger class and possibly other non-Gaussian models as well. This is currently being investigated together with the parameter estimation problem for these models.

The following contributions were received in writing after the meeting.

**Patrick E. Brown** (*Cancer Care Ontario, Toronto, and University of Toronto*)

This is a very interesting paper which is certain to enable a wide range of spatial analyses which were previously intractable. Particular credit is due to the authors for making their excellent software available.

The interpretation of the Markov random-field approximation on the regular square grid could be slightly modified to allow for its use with irregularly spaced data. Rather than evaluating the surface *X*(*s*) only at a set of grid points *s*_{ij}, could we not use the *x*_{ij} to approximate the continuous *X*(*s*) by a surface which is piecewise constant within grid cells centred at *s*_{ij}? This would be a special case of the method used for irregular data, with a regular square grid in place of irregular triangles and constant basis functions within cells in place of the linear basis functions. An important difference between the piecewise constant grid approximation and that used in the paper is that observations need not lie on the vertices of the grid.

Fig. 14 shows the regular grid, piecewise constant approximation fit to the leukaemia data by using the Rinla software. Grid cells are 1/100th the size of the *y*-dimension of the study region, and the grid is extended by 20 cells beyond the study region in each direction to avoid edge effects. The posterior means and standard deviations are nearly identical to those shown in Fig. 3 of the paper. The disadvantage of this analysis is that it appears to be more computationally demanding (taking roughly 10 min on an eight-core computer), though it should scale well, since increasing the number of observations does not increase the number of vertices on the lattice.
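The regular-grid construction can be sketched by assembling the sparse *α* = 2 precision matrix *Q* = (*κ*²*C* + *G*)*C*⁻¹(*κ*²*C* + *G*) with a lumped mass matrix *C* and a 5-point Laplacian stiffness *G*; the grid size, *κ* and the crude boundary treatment below are illustrative assumptions, not the Rinla implementation:

```python
import numpy as np
import scipy.sparse as sp

def grid_precision(n, kappa, h=1.0):
    """Sparse precision of an alpha = 2 Matern-like GMRF on an n x n
    regular grid: Q = (kappa^2*C + G) C^{-1} (kappa^2*C + G), with
    lumped mass matrix C = h^2*I and 5-point Laplacian stiffness G.
    A sketch of the regular-grid special case only; boundary rows keep
    a simply truncated stencil."""
    ones = np.ones(n)
    d = sp.diags([-ones[:-1], 2 * ones, -ones[:-1]], [-1, 0, 1])
    eye = sp.identity(n)
    G = sp.kron(eye, d) + sp.kron(d, eye)          # 2-D 5-point Laplacian
    K = kappa**2 * h**2 * sp.identity(n * n) + G   # kappa^2*C + G
    return (K @ K / h**2).tocsr()                  # C^{-1} = I / h^2

Q = grid_precision(50, kappa=0.3)
print(Q.shape, Q.nnz)   # at most 13 nonzeros per row: a sparse 13-point stencil
```

The squared operator widens the 5-point stencil to 13 points, which is the Markov neighbourhood corresponding to the *α* = 2 template.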

Can the authors offer advice on the choice of grid? How would we compare the quality of the approximation for a fine regular grid with piecewise constant bases with that for a much coarser triangular grid with linear basis functions? There is certainly a limit to the roughness of a surface approximated by a coarse grid, and presumably a limit to how smooth a Markov surface on an extremely fine grid can be.

**Michela Cameletti** (*University of Bergamo*) **and Sara Martino** (*Norwegian University of Science and Technology, Trondheim*)

We congratulate the authors for this excellent paper that defines a link between Gaussian fields with Matérn covariance function and Gaussian Markov random fields. From our point of view, the stochastic partial differential equation (SPDE) approach, combined with the integrated nested Laplace approximations algorithm proposed by Rue *et al.* (2009), introduces a new modelling strategy that is particularly useful for spatiotemporal geostatistical processes. The key point is that the spatiotemporal covariance function and the dense covariance matrix of a Gaussian field are substituted respectively by a neighbourhood structure and a sparse precision matrix, that together define a Gaussian Markov random field. In particular, the good computational properties of Gaussian Markov random fields and the computationally effective approximations of the integrated nested Laplace approximations algorithm make it possible to overcome the so-called ‘big *n* problem’. This issue refers to the infeasibility of linear algebra operations involving dense matrices and arises in many environmental fields where large spatiotemporal data sets are available.

The authors mention in Section 3.5 the possibility of extending the SPDE approach to non-separable spatiotemporal models. In this regard, we wonder whether it is possible to use the SPDE approach for approximating a spatiotemporal Gaussian field with a non-separable covariance function belonging to the general class defined by Gneiting (2002). As described in Cameletti *et al.* (2011) for air quality data, models characterized by these non-separable covariance functions are extremely computationally expensive because they involve matrices whose dimension is given by the number of data in space and time. Thus, parameter estimation and spatial prediction become infeasible by using Markov chain Monte Carlo methods. Moreover, such non-separable covariance functions are defined by a large number of parameters and the convergence when using Markov chain Monte Carlo methods can be an issue. Thus, if the SPDE approach could deal with the general class of non-separable covariance functions given in Gneiting (2002), it would be an important result for spatiotemporal geostatistical modelling.

**Daniel Cooley and Jennifer A. Hoeting** (*Colorado State University, Fort Collins*)

We congratulate the authors for establishing this important link between Gaussian fields (GFs) and Gaussian Markov random fields (GMRFs). To a large extent, GFs provide the foundation for geostatistics. Even when one restricts assumptions to only a mean and covariance function, a GF is not far removed from any geostatistical analysis, since there is always a GF with these first- and second-order properties. For kriging, the best linear unbiased predictor coincides with the conditional expectation of a GF and, when performing maximum likelihood estimation, the GF assumption is explicit.

Rather than dealing with point-located geostatistical data, GMRFs have their origin in modelling areal or lattice data. In comparison with geostatistical methods, statistical practice for areal data has seemed *ad hoc*. Whether assessing auto-correlation (e.g. Geary's *C* or Moran's *I*) or constructing models (GMRFs or other auto-regressive models), areal data methods have relied on an adjacency matrix constructed from researchers’ *a priori* assumptions of spatial dependence. Modelling a GMRF often includes justifying a dependence structure described by a small number of parameters and constructing an adjacency matrix. The process of constructing a dependence structure from an adjacency matrix is typically heuristic at best and particularly difficult for irregular lattices. However, the computational advantages of GMRFs and other auto-regressive models have usually outweighed the disadvantages. The authors’ link between GFs and GMRFs allows for the construction of a meaningful dependence structure in a GMRF setting for both regular and irregular lattices, allows for the use of GMRFs on point-located data and enables the fast computation that has always been the advantage of GMRFs over GFs.

That the GF–GMRF link is made through a Matérn covariance function is important as the Matérn function is often defended as the most theoretically justifiable. In geostatistical practice, estimating the Matérn function's smoothness parameter is challenging. Often the smoothness parameter *ν* is fixed according to an *a priori* belief of the smoothness of process realizations. One can view the choice of template for a regular grid given in Section 2.2 as equivalent to setting the smoothness parameter at the outset, as is common practice.

A challenge in spatial statistics is that complex models require large sample sizes to estimate model parameters reliably (Irvine *et al.*, 2007), but many modelling procedures are too computationally complex for large sample sizes. The link between GFs and GMRFs will allow more researchers to investigate important statistical issues like model selection, precision of parameter estimates (especially spatial covariance parameters), sampling designs and more.

**Rosa M. Crujeiras and Andrés Prieto** (*University of Santiago de Compostela*)

We thank the authors for such an interesting paper, which forms a bridge between Gaussian fields and Gaussian Markov random fields by means of stochastic partial differential equations (SPDEs). The recent developments in numerical methods for PDEs may play an important role, by suitable modifications in the stochastic context. Our comments are focused on the procedure for solving the SPDE (2) by means of finite element methods (FEMs), concerning the mesh construction, the FEM approximation and the boundary conditions imposed.

The construction of a triangulated lattice is required for building the FEM approximation of the Matérn field. The authors point out that the vertices are placed on sample locations and a heuristic refinement is used to minimize the total number of triangles satisfying the mesh constraints. The mesh generation is thus determined by the sample locations and the geometry of the observation region. However, the mesh design could additionally be driven by the Matérn field itself, by means of an iterative procedure which adapts the mesh to the behaviour of the SPDE solution by minimizing an *a posteriori* error estimate. This is a well-known procedure in the engineering literature called *h*-adaptivity (see Ainsworth and Oden (2000) for further details).

Also in the FEM approximation, low order (piecewise linear) elements are used to approximate the Matérn field in the computational domain. These elements provide satisfactory approximations in the stationary case, but other choices are possible. For instance, discontinuous Galerkin methods (Hesthaven and Warburton, 2008), Petrov–Galerkin methods or finite volume methods (see LeVeque (2002)) are possible alternatives. In fact, finite volume methods may be more suitable in the non-separable space–time models, with a dominant transport term.

As the authors remarked, there is a boundary effect in the covariance approximation due to boundary conditions. In this approach, Neumann conditions are imposed on the boundary of the computational domain when solving the SPDE. Rue and Held (2005) showed that the effect of boundary conditions is negligible if the length of the computational domain is sufficiently large compared with the effective range of the covariance, which enables us to capture the variability of the process. Nevertheless, this embedding drawback due to the non-exact homogeneous Neumann boundary conditions (which are not satisfied by the Matérn class) could be overcome by setting other boundary conditions in the second-order elliptic problem: non-local Dirichlet-to-Neumann operators (Givoli, 1999), absorbing boundary conditions (Keller and Givoli, 1989) or perfectly matched layers (see Bermúdez *et al.* (2010)).

**Marco A. R. Ferreira** (*University of Missouri, Columbia*)

I congratulate Dr Lindgren and his colleagues for their valuable contribution to the area of spatial statistics modelling and computation. In addition, I commend the authors for making their methodology publicly available in the Rinla package.

Lindgren and colleagues build on a stochastic partial differential equation based explicit link between Gaussian fields and Gaussian Markov random fields to develop fast computations for Gaussian fields with Matérn covariance functions. In addition, the stochastic partial differential equation may be thought of as a generator process that produces Matérn Gaussian fields. Within this stochastic partial differential equation framework, Matérn fields on manifolds and non-stationary fields arise naturally.

However, in this work the authors consider the smoothness parameter *ν* fixed and taking values only on the set of positive integers. This contrasts with one of the advantages of the use of the Matérn class of covariance functions: the smoothness parameter *ν* may be estimated from the data. Thus, the use of the Matérn class allows the degree of mean-square differentiability of the process to be data adaptive. In the current work, this particular advantage of the Matérn class is lost in favour of fast computation.

Finally, I am curious about what were the priors used by the authors for the hyperparameters. Even though the data sets considered are fairly large, some strange things may happen when one analyses spatial data. For example, improper uniform priors may lead to improper posterior distributions (Berger *et al.*, 2001).

**Geir-Arne Fuglstad** (*Norwegian University of Science and Technology, Trondheim*)

I congratulate the authors for their work on bringing together modelling with stochastic partial differential equations (SPDEs) and computations with efficient (sparse) Gaussian Markov random fields. This is an elegant way of creating (space–time) consistent models, which take advantage of the computational benefits of sparse precision matrices. I wish to comment on the brief statement at the end of Section 3.5 about how this method can be extended to a non-separable space–time SPDE. I used an approach that is similar to the one described there, but with a different spatial discretization. The steps that were used are described briefly in what follows.

Consider the non-separable space–time SPDE

- (43)

where *W* is standardized Gaussian white noise and *A*,*T*>0 are constants. The spatial discretization is done by a finite volume method, and the Gaussian variable associated with each cell in the grid is allowed to be a Gaussian process in time. This reduces the space–time SPDE to a linear system of SDEs in time,

where *U* is the vector of Gaussian processes associated with the cells, *C* is a sparse matrix relating the derivative to the value at the cell itself and neighbouring cells and *D* is the vector of Gaussian white noise processes associated with each of the cells.

This system is then approximated with the backward Euler method. The use of the backward Euler method gives a Markov property in time, as each time step is only conditionally dependent on the closest time steps, thus still giving the desired speed-up compared with a dense precision matrix.
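A one-dimensional sketch of this scheme, assuming an illustrative heat-type SPDE, grid and noise scaling (not Fuglstad's actual discretization): each backward Euler step solves one sparse linear system, and the factorization can be reused across steps.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spl

rng = np.random.default_rng(1)

def backward_euler_spde(n=200, steps=100, A=1.0, dt=0.01, h=0.05, sigma=1.0):
    """Backward Euler for a finite-volume discretization of a heat-type
    SPDE dU/dt = A*U_xx + noise with Neumann boundaries. The grid, time
    step and noise scaling are illustrative assumptions."""
    main = -2.0 * np.ones(n)
    main[0] = main[-1] = -1.0                      # Neumann end cells
    L = sp.diags([np.ones(n - 1), main, np.ones(n - 1)], [-1, 0, 1]) * (A / h**2)
    M = (sp.identity(n) - dt * L).tocsc()
    solve = spl.factorized(M)                      # factorize once, reuse each step
    U = np.zeros(n)                                # fixed starting state
    for _ in range(steps):
        U = solve(U + sigma * np.sqrt(dt / h) * rng.standard_normal(n))
    return U

U = backward_euler_spde()
print(U.shape)
```

Because each step depends only on the previous one, the joint precision over time steps is block tridiagonal, which is the Markov-in-time sparsity that gives the speed-up.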

The results of this procedure proved quite good, and the finite volume method extends to transport terms in a natural way. One simple example of this procedure is the case with Neumann boundary conditions at the spatial boundaries and a fixed starting state. Fig. 15 shows one simulation from such a situation with the starting condition

- (44)

Hence the approach briefly indicated in Section 3.5 does indeed seem viable for space–time SPDEs.

**Andrew Gelman** (*Columbia University, New York*)

When using Bayesian inference (or, more generally, structured probability modelling) to obtain precise inferences from data, we have a corresponding responsibility to check the fit and the range of applicability of our models. Using expected values and simulated replicated data, we can graphically explore ways in which the model does not fit the data and broadly characterize particular inferences as interpolation or extrapolation. I am wondering whether the authors have considered using their powerful computational machinery to understand and explore the fit of their models. I think that graphical exploration and model checking could greatly increase the utility of these models in applications.

**Peter Guttorp** (*University of Washington, Seattle, and Norwegian Computing Center, Oslo*) **and** **Barnali Das** (*Westat, Rockville*)

This is a very important paper in spatial statistics. The late Julian Besag used to argue that the Markov random-field approach was better than the geostatistical approach, both from the computational and the conceptual point of view. The difficulty was always the requirement to have observations on a regular grid. Le and Zidek (2006) solved this by creating a grid containing the observations, and treating most of the grid points as missing, but then only a small fraction of the data are non-missing. The approach by Rue and his colleagues adapts the tessellation to the data and, in our opinion, proves that Julian once again was right.

The current approaches to estimating global mean temperature suffer from several difficulties. They are not truly global in character, they do not take into account the non-stationarity of the global temperature field and the covariance structure is local, not global. One of us (Das, 2000) developed an approach to estimating non-stationary processes on the globe. The idea was based on the deformation approach (Sampson, 2010), by mapping the globe to a sphere, on which isotropic covariance functions are fully characterized (Obukhov, 1947). The implementation consisted of alternating transformations of latitude and longitude. Fig. 16 shows the resulting deformation, which expands the southern hemisphere (indicating smaller correlation length) and compresses the northern hemisphere.

Owing to the computational complexity of this fitting procedure, only 724 stations in the Global Historical Climate Network version 2 data set with a record of at least 40 years were used (115 randomly selected stations were reserved for model evaluation). Fig. 17 shows the resulting correlation estimates around three different points on the globe, showing a clear indication of inhomogeneity. The computational complexity of this fitting, not to mention spatial estimation of the global temperature field, has been reduced by several orders of magnitude in the work by Rue and his colleagues.

It appears that the temperature analysis by Rue and his colleagues has reasonable results; for example, the coefficient for regression on altitude is similar to what has been seen in the literature. We do wonder whether the estimates of the uncertainty of the various published global temperature series are similar to those computed by the new approach. Also, to what extent would inclusion of ocean temperature data change the estimates and their uncertainties?

**Ben Haaland** (*Duke–National University of Singapore Graduate Medical School*) **and** **David J. Nott** (*National University of Singapore*)

As the authors emphasize, perhaps the greatest value of this paper lies in the non-stationary and other extensions their results provide. One potential application area is Gaussian field (GF) emulation of computer models. Modelling output of computer models as GFs is useful for constructing emulators, which are computationally cheap mimics of computer models providing appropriate descriptions of uncertainty (Santner *et al.*, 2003). The computer model involves inputs *λ*, and we have run the model at designed values *λ*_{i}, *i*=1,…,*n* say. There has been recent interest in dynamic emulation where the computer model output is a time series. Conti and O'Hagan (2010) discussed dynamic emulation with GFs. One approach treats time as an additional input, but for long time series this treats a large data set as an observation of a GF and we are in the realm of the ‘big *n* problem’. Correlations are often highly non-stationary in the input space and time. Although appropriate covariates in the mean sometimes help, currently used models do not seem sufficiently flexible. The methods provided have great potential, but we have some questions. First, the authors focus on the two-dimensional case and we wonder whether the computational benefits decrease as the dimensionality increases. Secondly, we wonder whether in higher dimensions the boundary effects are more problematic.

Whereas the explicit representation discussed by the authors concerns positive integer *α*, the representation has broad applicability. Draws from a GF with Matérn covariance are functions in a Sobolev space *H*^{α}(Ω) (Wendland, 2005). For integer *α*, these functions have *α* square integrable derivatives. The value of *α* is typically selected so that the function estimate has sensible properties and fractional values are not ordinarily chosen. These function spaces are nested decreasing, *H*^{α}(Ω) ⊃ *H*^{α+ɛ}(Ω) for *ɛ*>0. Hence, a function with fractional smoothness *α*>1 can be approximated by a function in *H*^{⌊α⌋}(Ω) and the convergence results hold. The infinitely smooth function space corresponding to the commonly used Gaussian covariance is contained in all Sobolev spaces. So, a Markov representation of a Matérn covariance with large smoothness could be used to approximate a Gaussian covariance. A balance must be found between better computational properties of less smoothness and better approximation properties of more smoothness. However, we wonder to what extent the convergence (theorems 3 and 4) and sparsity (Appendix C.5) results depend on using the typical finite element basis.
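The approximation of the Gaussian covariance by a high-smoothness Matérn correlation can be checked numerically; the sketch below uses the √(2*ν*) distance scaling under which the Matérn correlation converges to exp(−*h*²/2ℓ²) as *ν*→∞ (the choice *ν* = 20 is illustrative):

```python
import numpy as np
from scipy.special import kv, gamma as Gamma

def matern_corr(h, nu, ell=1.0):
    """Matern correlation with the sqrt(2*nu) scaling, under which it
    converges to the Gaussian correlation exp(-h^2/(2*ell^2)) as nu grows."""
    h = np.asarray(h, dtype=float)
    u = np.sqrt(2 * nu) * np.maximum(h, 1e-12) / ell
    return 2.0 ** (1 - nu) / Gamma(nu) * u ** nu * kv(nu, u)

h = np.linspace(0.01, 2.0, 50)
gauss = np.exp(-h**2 / 2)
err = np.max(np.abs(matern_corr(h, nu=20.0) - gauss))
print(err)   # small: a high-smoothness Matern already tracks the Gaussian closely
```

This quantifies the trade-off mentioned above: increasing *ν* improves the approximation to the Gaussian covariance while making the Markov representation denser.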

**John Haslett, Chaitanya Joshi and Vincent Garreta** (*Trinity College Dublin*)

We congratulate the authors on a most stimulating paper, and we see applications everywhere.

One application that concerns us at the moment is a multivariate space–time process in which the temporal changes can be much longer tailed than Gaussian. The simplest case is a univariate process at a single point in space. We have found that the normal inverse Gaussian distribution provides a flexible basis for temporal increments (see, for example, Barndorff-Nielsen and Halgreen (1977)). This model and its extensions are now widely used in finance (see, for example, Kumar *et al.* (2011)). Its multivariate extension (Wu *et al.*, 2009) provides a basis for the spatiotemporal version. The multivariate normal inverse Gaussian distribution can be motivated as a scale mixture of Gaussian distributions.

Impressed by the link between Gaussian fields and the Gaussian Markov random fields presented in this paper, we wonder whether it may be possible to extend the existing methodology to include non-Gaussian fields. If so, we suspect that one approach may be via *subordination*, which is one way of changing the distributional properties of a stochastic process.

Let **X**(*t*) be a random process, and let **T**(*t*) be a monotone, non-decreasing Lévy process; then the process **X**{**T**(*t*)} is said to be *subordinated* to the process **X**(*t*) and **T**(*t*) is called the *directing process*.

Subordination is an elegant way of producing both subdiffusive and superdiffusive motions from regular diffusive motions such as random walks and Brownian motion; see for example Sokolov (2000) and Eliazar and Klafter (2004). In particular, temporal subordination has been widely used to model systems whose subjective ‘operational time’ differs from the objective ‘physical time’, such as those often found in statistical physics (see for example Eliazar and Klafter (2005)) and econometrics (see for example Clark (1973) and Carr *et al.* (2003)). Moreover, subordination has been shown to result in processes which are more *leptokurtic* (Clark, 1973).
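A minimal sketch of temporal subordination with a gamma directing process: Brownian motion evaluated at gamma-distributed operational times yields variance-gamma (Laplace-type) increments, whose excess kurtosis illustrates the leptokurtosis mentioned above (the subordinator's unit rate is an illustrative choice).

```python
import numpy as np

rng = np.random.default_rng(3)

def subordinated_increments(n, dt=1.0, rate=1.0):
    """Increments of W{T(t)}, where W is Brownian motion and T is a gamma
    subordinator: Brownian motion run on gamma-distributed 'operational
    time'. The subordinator rate is an illustrative assumption."""
    tau = rng.gamma(rate * dt, 1.0 / rate, size=n)   # T increments (monotone Levy)
    return np.sqrt(tau) * rng.standard_normal(n)     # W{T} increments

x = subordinated_increments(500_000)
m2, m4 = (x**2).mean(), (x**4).mean()
print(m4 / m2**2 - 3)   # excess kurtosis approximately 3: a Laplace marginal
```

With a unit-rate gamma subordinator the increments are exactly Laplace distributed, so the sample excess kurtosis should sit near the theoretical value 3 rather than the Gaussian value 0.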

We therefore wonder whether, by using a (spatially) subordinated Brownian motion *W*{*U*(*u*)}, it might be possible to modify the stochastic partial differential equation in equation (3)

- (45)

where the directing process *U*(*u*) is any suitable monotone, non-decreasing Lévy process, so that a (non-Gaussian) *weak* solution to this equation exists.

We would like to know the authors’ views regarding this approach or any other approach leading to non-Gaussian fields.

**Michael Höhle** (*Robert Koch Institute, Berlin*)

The work presents innovative numerical solutions for geostatistical modelling. Despite the complexity of its mathematical detail, an open source implementation including an R interface is provided. This encourages potential users, including those without advanced mathematical knowledge, to apply the methods in practice. I congratulate the authors on their work regarding both the theoretical and the applied dimension.

As a comment, I miss explicit mention of how the methodology proposed can be used for inference in spatial point processes—specifically log-Gaussian Cox processes, where Gaussian fields play an important role. In Rue *et al.* (2009), a proposal is given on how to use a Gaussian Markov random-field approximation in this case. It would therefore be of interest to see a comparison between this approximation and the current proposal. Furthermore, the leukaemia application in the paper provides the important link of using latent Gaussian fields in spatial regression modelling: it bridges to the joint posterior of the latent field and the hyperparameters as given by Rue *et al.* (2009), page 321. With specific application of Gaussian fields in spatiotemporal point process modelling in mind, how well does the integrated nested Laplace approximations approach allow for a likelihood where *π*(**y**|**x**,*θ*) is no longer simply a product over the individual observations? Of particular interest would be whether the approach proposed is usable for adding Gaussian field random effects to regression models of the conditional intensity function, e.g. as in the stochastic epidemic model of Höhle (2009).

**L. Ippoliti and R. J. Martin** (*University G. d'Annunzio Chieti–Pescara*) **and R. J. Bhansali** (*University of Liverpool*)

We congratulate the authors for developing a power Gaussian Markov random-field approximation to the correlation function of the Matérn class of Gaussian random fields, extending Besag's (1981) approximation using *K*_{0} to the completely symmetric conditional auto-regressive CAR(1) process, and Whittle's (1954) approximation using *K*_{1} to the completely symmetric simultaneous auto-regressive SAR(1) process.

We have two questions for the authors.

For large distances, both correlations (Matérn and approximating power CAR) are small, and the difference between them is small. From Abramowitz and Stegun (1965), equation (9.7.2), *K*_{ν}(*z*) ∼ √{*π*/(2*z*)} e^{−*z*}{1+(4*ν*^{2}−1)/(8*z*)+…} as *z*→∞, so the Matérn correlation decays essentially exponentially.

Do the authors have any results on behaviour of the power CAR correlations for large distances?

The other question is on the interpolation. If the number of points to be interpolated is small the Gaussian Markov random-field specification has several advantages: compared with kriging, an inverse covariance-based predictor should be much quicker and more accurate, and avoid the common ill-conditioning problems of **Σ**. However, this assumes that the precision matrix of the Matérn function is well approximated by that of the power CAR with power *ν*+1. Consider the Matérn function with *ν*=2. Using a 512×512 torus lattice, Tables 1 and 2 show some low lag inverse correlations for ranges 30 and 100. The inverse correlations (the same values for both ranges to three decimal places) of the corresponding power CAR are given in Table 3.

Table 1. Matérn inverse correlations (range = 30)

| *Lag* | *0* | *1* | *2* | *3* | *4* |
| --- | --- | --- | --- | --- | --- |
| 0 | 1.000 | −0.510 | 0.187 | −0.063 | 0.021 |
| 1 | −0.510 | 0.167 | −0.020 | −0.005 | 0.005 |
| 2 | 0.187 | −0.020 | −0.011 | 0.006 | −0.002 |
| 3 | −0.063 | −0.005 | 0.006 | −0.001 | 0.000 |
| 4 | 0.021 | 0.005 | −0.002 | 0.000 | 0.000 |

Table 2. Matérn inverse correlations (range = 100)

| *Lag* | *0* | *1* | *2* | *3* | *4* |
| --- | --- | --- | --- | --- | --- |
| 0 | 1.000 | −0.512 | −0.220 | 0.204 | 0.208 |
| 1 | −0.512 | 0.179 | 0.062 | 0.072 | 0.045 |
| 2 | −0.220 | 0.062 | 0.189 | −0.047 | −0.165 |
| 3 | 0.204 | 0.072 | −0.047 | −0.157 | −0.142 |
| 4 | 0.208 | 0.045 | −0.165 | −0.142 | 0.054 |

Table 3. Inverse correlations of the completely symmetric power CAR

| *Lag* | *0* | *1* | *2* | *3* | *4* |
| --- | --- | --- | --- | --- | --- |
| 0 | 1.000 | −0.508 | 0.107 | −0.009 | — |
| 1 | −0.508 | 0.214 | −0.027 | — | — |
| 2 | 0.107 | −0.027 | 0.000 | — | — |
| 3 | −0.009 | — | — | — | — |
| 4 | 0.000 | — | — | — | — |

Despite the good fit for the correlation function, the power CAR is not a good approximation to the inverse correlations within the CAR neighbourhood, and the Matérn values can also be appreciably different from 0 outside the neighbourhood. More importantly, the interpolation variance can be appreciably larger for the approximating power CAR (Table 4). Do the authors have any views on how much accuracy it is reasonable to sacrifice to obtain faster predictions?

Table 4. Interpolation variance for Matérn and completely symmetric power CAR for two values of *ν* and two ranges

| *ν* | *Range* | *Interpolation variance, Matérn* | *Interpolation variance, CAR* |
| --- | --- | --- | --- |
| 1 | 30 | 0.0308 | 0.0498 |
| 1 | 100 | 0.0311 | 0.0500 |
| 2 | 30 | 0.0025 | 0.0089 |
| 2 | 70 | 0.0023 | 0.0089 |

Note that the Gaussian correlation structure, the limit of the Matérn function as *ν*→∞, has an exact explicit form on the rectangular planar lattice for the inverse variance matrix, or for the finite CAR representation, which can be obtained from the results in Martin (2000).

Finally, some results comparing directly specified CAR models with integrated continuous processes over irregular areas are given in Martin and Dwyer (1994).
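Since a stationary covariance on a torus lattice is block circulant, inverse correlations of the kind reported in Tables 1–3 can be computed with the two-dimensional FFT; the sketch below verifies the computation on a completely symmetric CAR(1), whose inverse correlations are known in closed form (the value *a* = 4.4 is an arbitrary valid choice):

```python
import numpy as np

def torus_inverse_correlations(r):
    """Inverse correlations of a stationary field on a torus: the
    covariance matrix is block circulant, so it is inverted by
    inverting its 2-D spectrum. A sketch of the torus computation."""
    f = np.fft.fft2(r).real          # spectrum of the circulant covariance
    q = np.fft.ifft2(1.0 / f).real   # inverse covariance (also circulant)
    return q / q[0, 0]               # normalize to inverse *correlations*

# sanity check on a completely symmetric CAR(1): its inverse correlations
# are exactly the negated, normalized CAR coefficients
n, a = 64, 4.4
q0 = np.zeros((n, n))
q0[0, 0] = a
for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
    q0[dx % n, dy % n] = -1.0
r = np.fft.ifft2(1.0 / np.fft.fft2(q0)).real   # CAR(1) covariance on the torus
q = torus_inverse_correlations(r / r[0, 0])
print(q[0, 1])   # -1/a up to rounding
```

The same spectral inversion applied to a Matérn covariance evaluated on the torus gives the entries shown in Tables 1 and 2.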

**Venkata K. Jandhyala and Stergios B. Fotopoulos** (*Washington State University, Pullman*)

We congratulate the authors for a timely and inspiring paper on a topic that is at the forefront of contemporary statistical modelling and computational statistics. The advantages of working with Gaussian Markov random fields have been well exploited for fitting hierarchical Bayesian models for spatial data. However, as pointed out by the authors, one often wishes to model data drawn from Gaussian fields (GFs). In the absence of the Markov property, the covariance matrices of GFs are dense and not easily amenable to computation. In this regard the authors have come up with a breakthrough result, demonstrating (with the necessary technical details) that any GF with a Matérn covariance function can be suitably replaced by an equivalent Gaussian Markov random field whose precision matrix is sparse. The depth of their result becomes evident from the general nature of the spatial domains to which it is extended in the paper, e.g. irregular lattices and manifolds. Thus, the authors have done a remarkable job.

We believe that the authors’ results can be useful to address the change-point problem for spatiotemporal GFs, a problem that has been well formulated recently by Majumdar *et al.* (2005). Estimating the unknown change-point for retrospective data has been approached under both maximum likelihood estimation and Bayesian methods. Recent simulation studies performed by Fotopoulos *et al.* (2010) and Jandhyala *et al.* (2011) for estimating the change-point, when changes occur in the mean vector alone or in the mean and/or covariance matrix of a multivariate Gaussian series, have shown that the maximum likelihood estimate (MLE) performs marginally better than the Bayesian estimate in most cases. The asymptotic distribution of the change-point MLE developed in Fotopoulos *et al.* (2010) and Jandhyala *et al.* (2011) appears to be extendable to spatiotemporal Gaussian Markov random fields when changes occur in the elements of the precision matrix at some unknown time point. Moreover, taking advantage of the main contribution of the present paper, it seems that the asymptotic distribution of change-point MLEs can be extended to GFs with Matérn covariance functions. Thus, the results derived by the authors not only are useful for Bayesian hierarchical model fitting but also seem to apply quite well to the asymptotic distribution of the MLE in a change-point setting. One limitation of the change-point MLE in the treatment of Fotopoulos *et al.* (2010) and Jandhyala *et al.* (2011) is that its asymptotics require the assumption of independence over time; such an assumption would also be needed for spatiotemporal GFs, which may not be satisfactory, and this is where the Bayesian approach to the change-point structure can be advantageous.

**Mikyoung Jun** (*Texas A&M University, College Station*)

It is my honour to congratulate the authors on this excellent paper. Gaussian random-field models with a Matérn covariance function are popular, useful and commonly used in geostatistical applications, but computation is problematic with large data sets. This paper provides a computationally efficient approach to such models through the Gaussian Markov random-field framework.

I have a few comments to add. My first comment is regarding the smoothness parameter *ν*. As the authors acknowledge in Section 2.3, the current link between a Gaussian random field with Matérn covariance function and the Gaussian Markov random-field model is only possible for integer values of *α* (and thus integer increments of *ν*), which, in my view, could be a little limited. Given that we cannot distinguish well *ν*-values larger than 3 or so, this gives only two or three possible values of *ν* for the application. I wonder how this might affect the model performance, especially parameter estimation and spatial prediction.

Regarding the above point, how is the estimation of the parameter *ν* done? Is it fixed, or do you fit possible values separately and compare the fit? In Section 4.3, the authors mention that spatial covariance parameters are difficult to interpret individually (I am not sure whether I completely agree with this) and do not provide detail on how each parameter is estimated and what the estimates look like. I am also wondering, if you misspecify *ν*-values, how this might affect the estimation of the rest of the parameters, and the spatial prediction.
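The difficulty of distinguishing nearby *ν*-values can be seen numerically. The following is our own small sketch (not the authors' code), matching the effective range √(8*ν*)/*κ* used in the paper across integer *ν*-values and comparing the resulting Matérn correlation curves:

```python
# Sketch: Matérn correlations for nu = 1, 2, 3 with the effective range
# sqrt(8 * nu) / kappa held fixed, illustrating how close the curves are
# and hence how hard integer nu values are to tell apart in practice.
import numpy as np
from scipy.special import kv, gamma

def matern(r, nu, kappa):
    """Matérn correlation; equals 1 at r = 0."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    out = np.ones_like(r)
    nz = r > 0
    x = kappa * r[nz]
    out[nz] = 2 ** (1 - nu) / gamma(nu) * x ** nu * kv(nu, x)
    return out

eff_range = 10.0                 # common effective range for all nu
r = np.linspace(0.01, 15, 200)
curves = {nu: matern(r, nu, np.sqrt(8 * nu) / eff_range) for nu in (1, 2, 3)}
# largest gap between the nu = 1 and nu = 3 correlation curves
gap = np.max(np.abs(curves[1] - curves[3]))
print(f"max |rho_1 - rho_3| = {gap:.3f}")
```

With the range matched, the curves differ only modestly over the whole distance axis, which is consistent with the comment that only two or three *ν*-values are practically distinguishable.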

I am particularly interested in the development for processes on a sphere and in non-stationary (and non-isotropic) models for such processes. One of the limitations of the covariance models in Jun and Stein (2007, 2008) is that they are singular at the poles unless constraints are imposed. It appears that the approach suggested here may not suffer from this limitation. Is any constraint required in the model of this paper to guarantee this? Do you need vertices at the poles, or is any constraint required on the triangulation near the poles?

My last comment is that the third issue in Section 5 seems to me very interesting. Incorporating external covariates such as wind speed or wind direction in the covariance structure should be useful for environmental applications. However, as the authors point out, it may be difficult to do it in a way that the resulting covariance model is guaranteed to be valid. The method proposed may naturally provide ways to develop such physics-based covariance models.

**Håvard Wahl Kongsgård** (*Peace Research Institute, Oslo*)

I congratulate Lindgren, Rue and Lindström for an excellent paper and for the computational implementation. In conflict studies, microeconomics and related fields, spatial inquiries have become increasingly popular. For these fields, large data sets with multiple variables are often favoured.

Most scholars nevertheless refrain from utilizing spatial regression, as the methods are often poorly implemented computationally and computer intensive. Consequently, spatial independence remains a major concern.

With great effect I recently applied the methods implemented in R-INLA to a series of large point-based conflict data sets from Vietnam, Afghanistan and Iraq (Kongsgård, 2011). When combined with spatial–temporal visualization, this type of analysis can be of value for tactical risk assessment, in a military or humanitarian capacity. Lindgren, Rue and Lindström's work is especially interesting as the new approaches make it possible to adjust for relative spatial distance.

Although the method introduced is an approximation with limitations, it is easy to use and can be applied in a streamlined fashion. Given its rugged low-level computational implementation and support for large sample sizes, the new functions will be a good supplement to the R-INLA library. In my opinion these methods have great potential in their fields of application.

**Giovanna Jona Lasinio** (*‘‘Sapienza’’ University of Rome*) **and Alessio Pollice** (*‘‘Aldo Moro’’ University of Bari*)

First, we congratulate the authors on their extremely interesting work, which sheds new light on Gaussian Markov random-field (GMRF) modelling. A connection is established with Matérn Gaussian fields through a weak solution of a linear fractional stochastic partial differential equation (SPDE) driven by a Gaussian white noise process. One of the main theoretical results is stated in theorems 3 and 4, where the authors prove, for positive integer values of the parameter *α*, the weak convergence of the finite Hilbert representation of the solution to the continuous solution.

Recently Bolin and Lindgren (2011a) characterized a class of non-Gaussian random fields as the solution to a nested SPDE driven by Gaussian white noise. Could the authors comment on the possible further extension of their results to SPDEs driven by more general non-Gaussian laws, such as stable or Lévy processes, in view of obtaining tools suitable for rates or concentrations, which typically do not follow a normal distribution?

When *d*=2, the integer *α* restriction of theorems 3 and 4 implies that the parameter *ν* in the Matérn class specification is also an integer. Within this class, the exponential covariance function, one of the most popular in many applied fields, corresponds to the value *ν*=0.5. Is the exponential covariance model definitively excluded by the approach proposed?

In the discussion the authors recognize that ‘the approach comes with an implementation and preprocessing cost for setting up the models, as it involves the SPDE solution, triangulations and GMRF representation’. We would be interested in some further comments on the possible quantification of this cost from an applied statistical point of view. A more detailed comparison of the GMRF solution with competing modelling approaches from a predictive perspective would also be of considerable interest. Here we mainly refer to low rank (Banerjee *et al.*, 2008; Cressie and Johannesson, 2008; Eidsvik *et al.*, 2010; Crainiceanu *et al.*, 2008) and process convolution methods (Higdon, 1998; Higdon *et al.*, 1999) when estimation is carried out with the same technique (either integrated nested Laplace approximation or Markov chain Monte Carlo methods).

Incidentally we think that the geographical interpretation of Fig. 6(b) would benefit from a more detailed description.

**Chihoon Lee** (*Colorado State University, Fort Collins*)

First of all, I congratulate the authors on their interesting and stimulating paper. Although this work is clearly an important contribution to the field of spatial statistics, its impact on the field of stochastic partial differential equations (SPDEs) cannot be overlooked as it ties in the probabilistic and mathematical treatment of SPDEs to spatial statistical modelling and geostatistics.

The authors’ Gaussian Markov random-field construction via the SPDE link offers a natural definition of a Matérn field on manifolds, with an intuitive interpretation. By employing a stochastic finite element approximation to the SPDE, practitioners can easily obtain finer resolution around any area of interest on the manifold. It seems promising that the authors’ explicit mapping from the parameters of the Gaussian field model to the elements of a Gaussian Markov random-field precision matrix will yield new approaches to parameter estimation, for example of the scaling parameter *κ*>0 of the Matérn field, which is of great importance in practice but generally difficult to analyse directly.

More precisely, one can construct an estimator (e.g. by maximum likelihood or the method of moments) based on observations, using a finite element representation of the full solution to the SPDE as presented in expressions (9) and (10). The next step is to verify that this estimator indeed converges (in an appropriate sense) to the true parameter *κ* as the finite Hilbert space, spanned by a finite set of basis functions {*ψ*_{1},…,*ψ*_{n}}, approaches the full space. Such parameter estimation procedures circumvent the complexity that stems directly from dealing with the estimation problem for the fractional SPDE (2). Furthermore, this approach will shed light on estimating underlying physical parameters, corresponding to various covariates which could be incorporated into more physics-based SPDEs. For example, in a weather model, meteorological covariates such as wind speed or temperature may be incorporated (e.g. as appropriate drift terms) into the underlying SPDEs. This would move beyond the purely statistical treatment of the SPDE (2) by combining theoretical (physical) models and statistical (data-driven) models.

**Wayne T. Lee and Cari G. Kaufman** (*University of California, Berkeley*)

We congratulate the authors on an important theoretical achievement with exciting computational implications. It is true that we tend to set *n* ‘a little higher than the value that gives a reasonable computation time’. We expect that ‘reasonable’ will now be redefined according to what is possible by using the authors’ explicit link. For example, we simulated a Matérn-like field of size 750000 (approximately the number of spatial grid boxes for a climate model with 25-km resolution) on a laptop in 8 s. Although a little slow for a long Markov chain Monte Carlo run, it is no longer completely *unreasonable!*
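The speed rests on the sparsity of the precision matrix. As a rough illustration (our own Python sketch, not the discussants' or the authors' code), one can assemble the *α*=2 precision matrix for the SPDE on a regular grid and draw a sample through a Cholesky factor; a dense factorisation is used below for brevity, whereas a sparse CHOLMOD-type factorisation is what makes sizes in the hundreds of thousands feasible:

```python
# Sketch of GMRF simulation: build the alpha = 2 precision on an n x n grid
# (unit spacing, no special boundary treatment) as Q = A @ A, where A is the
# (kappa^2 + Laplacian) five-point operator, then sample x ~ N(0, Q^{-1})
# by solving L^T x = z with Q = L L^T.
import numpy as np
import scipy.sparse as sp

def precision_alpha2(n, kappa):
    I = sp.identity(n, format="csc")
    D = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")
    lap = sp.kron(I, D) + sp.kron(D, I)           # 2-D negative Laplacian
    A = kappa ** 2 * sp.identity(n * n, format="csc") + lap
    return (A @ A).tocsc()                         # alpha = 2: square of alpha = 1

n, kappa = 50, 0.3
Q = precision_alpha2(n, kappa)
# dense Cholesky for brevity; sparse Cholesky scales to much larger grids
L = np.linalg.cholesky(Q.toarray())
z = np.random.default_rng(0).standard_normal(n * n)
x = np.linalg.solve(L.T, z)                        # one sample of the field
print(x.shape, Q.nnz)
```

The stencil squaring here is a simplified stand-in for the finite element construction of Section 2.2.1, but it shows the cost pattern: the factor, not the field size, dominates.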

We would like to address the Gaussian Markov random field as an approximation in the context of likelihood-based estimation for geostatistical models. We hope that the authors can clarify the implications of the treatment of the boundary points in this context. Although many will undoubtedly use these models in their non-stationary versions, non-stationarity that is an artefact of the boundary is undesirable and may introduce bias in estimating the geostatistical parameters.

We simulated 100 replications of a mean 0 Gaussian field on a 20×20 unit grid, using a Matérn covariance function with *ν*=1 and *σ*^{2}=2, and effective range 10 (*κ*≈0.283). The large effective range highlights the boundary issue, but we think that high auto-correlation is also not uncommon in geostatistical data. We fixed *ν* at its true value and considered three possible estimators for *σ*^{2} and *κ*. The first is the maximum likelihood estimator under the original model. The second maximizes an approximate likelihood that replaces the covariance matrix Σ(*σ*^{2},*κ*) by *σ*^{2}*Q*(*κ*)^{−1}/*τ*^{2}(*κ*), where *Q*(*κ*) is constructed by using the formulation from Section 2.2.1, with no special treatment of the boundary points, and *τ*^{2}(*κ*) is the marginal variance for the stochastic partial differential equation solution. The third treats the boundary points by using the embedding method suggested in Section 5.1.4 of Rue and Held (2005), under which we use the true model for a boundary set of thickness *m*=2 and the Gaussian Markov random-field approximation for the conditional distribution of the interior set. The drawback is the computational cost of calculating the boundary model.
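The computational appeal of such approximate likelihoods is that the Gaussian log-likelihood needs only a sparse factorisation for the log-determinant and a sparse quadratic form. A hedged sketch (our own illustration, using an *α*=1 five-point stencil rather than the exact Section 2.2.1 construction):

```python
# Evaluate log N(x; 0, Q^{-1}) via a sparse LU factorisation of the
# precision Q, avoiding any dense covariance matrix.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def gmrf_loglik(x, Q):
    """log-density up to the additive constant -(n/2) log(2*pi)."""
    lu = splu(Q.tocsc())
    # det Q > 0 for SPD Q, so summing log|U_ii| gives log det Q
    logdet = np.sum(np.log(np.abs(lu.U.diagonal())))
    return 0.5 * logdet - 0.5 * x @ (Q @ x)

# toy precision: alpha = 1 five-point stencil on a 20 x 20 grid
n, kappa = 20, 0.283
I = sp.identity(n, format="csc")
D = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")
Q = kappa ** 2 * sp.identity(n * n, format="csc") + sp.kron(I, D) + sp.kron(D, I)
x = np.random.default_rng(1).standard_normal(n * n)
print(gmrf_loglik(x, Q))
```

Maximizing this over (*σ*^{2}, *κ*) on a grid or by a generic optimizer is then cheap enough to repeat over many replications, which is what the comparison above requires.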

Fig. 18 shows estimates of *σ*^{2}*κ*^{2ν}, the consistently estimable parameter under the fixed domain asymptotics (Zhang, 2004). We observe a clear bias in the embedding estimates, whereas the naive approximation estimates have less bias but have high variability. We have not implemented the authors’ approach using the Neumann boundary conditions.

**Bo Li** (*Purdue University, West Lafayette*) **and Marc G. Genton** (*Texas A&M University, College Station*)

In this very stimulating paper the authors have created a new path for dealing with the computational challenge caused by large spatial data sets. Whereas most previous approaches focused mainly on screening out relatively less important information to gain computational efficiency (see Sun *et al.* (2011) for a recent review), the newly proposed method seeks an explicit link between certain Gaussian fields (GFs) and Gaussian Markov random fields (GMRFs), and thus enables a direct application of the inherent computational advantages of GMRFs to GFs. GFs with Matérn covariance structure play a central role in spatial data modelling. Although the GMRF representation is developed only for GFs with certain values of the smoothness, we expect wide application of this new approach, since the smoothness parameter of the Matérn function is in any case difficult to estimate precisely. We genuinely appreciate the novelty and practical value of this paper. However, recently emerged data sets are often indexed by locations in both space and time, and many have more than one variable observed. Analyses of such data sets are more challenging owing to the cubic growth of computation in the sample size. North *et al.* (2011) derived Matérn-like space–time correlations from evolving GFs governed by a white-noise-driven damped diffusion equation arising from simple energy balance climate models on the plane and on the sphere. It appears that those results could be used directly to extend the link between GFs and GMRFs to spatiotemporal data. Further extensions to a multivariate context remain open.

The authors gave an example of modelling non-stationary global temperature GFs and then making inference on the temperature process via GMRFs in conjunction with the integrated nested Laplace approximation. This can be very useful for the palaeoclimate community because one popular approach for large-scale palaeoclimate reconstructions is through Bayesian hierarchical models (Li *et al.*, 2010) where it is crucial to identify an appropriate model for the random process of climate variables. Such a model needs to be sufficiently flexible while still keeping the inference computationally feasible. The explicit link developed in this paper combined with the integrated nested Laplace approximation seems a promising direction. Since the proxy data used for the reconstruction carry various types of noise, a nugget effect may need to be considered. Would the approach be directly applicable if nuggets are included in the covariance model? An ambitious goal in palaeoclimate studies is simultaneously to reconstruct the entire space–time process of the temperature and other climate variables. Therefore, it again requires computational efficiency for spatiotemporal and multivariate data. We look forward to seeing further developments on this topic and in the mean time congratulate the authors for their outstanding work!

**Georg Lindgren** (*Lund University*)

I would like to add to the impressive list of applications of the Gaussian Markov random field–stochastic partial differential equation (SPDE) link, namely its use in ocean wave modelling. Traditionally, stochastic wave models have been based on linear Fourier analysis, possibly including low order interactions between the Fourier components. Such models are seen as approximations to the basic hydrodynamical (deterministic) partial differential equations for water waves. These equations have, in themselves, little room for stochastic forces.

One of the common spectra used in ocean modelling is the Pierson–Moskowitz spectrum

*S*(*ω*)=*αg*^{2}*ω*^{−5} exp{−*β*(*ω*_{0}/*ω*)^{4}},

with *α* and *β* dimensionless constants and *ω*_{0} determined by the wind speed.

The SPDE approach, developed in the paper presented, offers a promising link between the hydrodynamic and the Fourier views of random ocean waves. It has recently been shown by David Bolin and Finn Lindgren (see Bolin and Lindgren (2011b) for the general theory and Bolin (2009) and Lindgren *et al.* (2010) for the wave application) that the solution to a *nested* SPDE,

has a spectral density

with *g* equal to Earth's acceleration. The vector **B**=(*b*_{1},*b*_{2})^{T} determines the main direction *θ*_{0} of the directional spectrum. For large *s*, this is close to the Pierson–Moskowitz wave spectrum and, thus, the SPDE approach could turn out to be a flexible alternative to the Fourier approach.

**K. V. Mardia** (*University of Leeds*)

I found the paper very timely and stimulating. The problem of dealing with large spatial data sets has a long history, and the authors have given a comprehensive way forward. Mardia (2007) gave a historical background to maximum likelihood methods for spatial data and pointed out that there still seem to be two main communities: one of mining practitioners and the other of mainstream statisticians.

The Matérn covariance function has now been accepted by statisticians as a reasonable covariance function and, rightly, in general the smoothness parameter *ν* is taken as a tuning parameter. But there are some scientific problems where the estimation of *ν* is the main interest, for example when the geovariables are sampled intensively, as for a computer-scanned image in petrology or a high resolution image in remote sensing. Another example is of water head (piezometric level) **u**, where the hydraulic gradient map is the main interest (Pardo-Igúzquiza *et al.*, 2008); the deterministic equation which describes the movement of groundwater is a particular case of equation (2).

Pardo-Igúzquiza *et al.* (2008) gave the background to a computer program for the estimation of all the parameters of a Gaussian random field with Matérn covariance function (*κ*, *ν*, *σ* and drift; geometrical anisotropy). The code is available at http://www.iamg.org/CGEditor/index.htm and, as far as we are aware, the geostatistics community has found this software useful. But the program works only for *n* up to 700. To extend this work to large data sets, in my joint work with Pardo-Igúzquiza we have used various composite likelihood strategies extending Vecchia (1988) and Pardo-Igúzquiza and Dowd (1997). In particular, we have found unilateral random selection, as shown in Fig. 19, useful for lattice data, and the method is extended to irregular data by using random permutation (even *a priori* for lattice data), where *m* is the number of sites used in conditioning. We obtain the value of *m* through cross-validation, but it is normally 5–15 for the various large data sets (images) we have experimented with. Indeed, the program takes a few minutes for *n* as large as 250000, the total number of operations being proportional to *n*. The recent efficiency results on composite likelihood given in Mardia *et al.* (2010) are also encouraging and relevant. I would welcome a comprehensive comparison of the composite likelihood approach and the finite element method that is used in this paper.
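To illustrate the composite likelihood idea (our own sketch, not Mardia and Pardo-Igúzquiza's program), a Vecchia-type approximation orders the sites, conditions each observation on at most *m* earlier sites, and sums the univariate conditional log-densities; with full conditioning it recovers the exact likelihood:

```python
# Vecchia-style composite log-likelihood with a simple "previous m sites"
# conditioning scheme (real implementations use nearest neighbours).
import numpy as np

def vecchia_loglik(y, coords, cov_fn, m=10):
    ll = 0.0
    for i in range(len(y)):
        past = np.arange(max(0, i - m), i)
        S = cov_fn(coords[np.append(past, i)])      # (k+1) x (k+1) covariance
        if past.size:
            w = np.linalg.solve(S[:-1, :-1], S[:-1, -1])
            mu, var = w @ y[past], S[-1, -1] - S[:-1, -1] @ w
        else:
            mu, var = 0.0, S[-1, -1]
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll

def exp_cov(pts, sigma2=1.0, phi=0.5):
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

rng = np.random.default_rng(2)
coords = rng.uniform(0, 1, size=(200, 2))
y = np.linalg.cholesky(exp_cov(coords)) @ rng.standard_normal(200)
print(vecchia_loglik(y, coords, exp_cov, m=10))
```

Each term involves only an (*m*+1)×(*m*+1) solve, so the total cost is linear in *n* for fixed *m*, matching the timings quoted above.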

**Jorge Mateu** (*University Jaume I, Castellón*)

The authors are to be congratulated on a valuable contribution and a thought-provoking paper. Spatial data are frequently modelled as realizations of random fields. Gaussian fields (GFs) have a dominant role in spatial statistics and form an important building block in modern hierarchical spatial models. A common approach is to model spatial dependence through a covariance function. From the computational side, however, GFs are hampered by *the big n problem*, as the authors outline. To overcome this problem, rather than applying covariance tapering, we argue for the use of compactly supported covariance functions, which considerably reduce the computational burden of kriging and allow the use of computationally efficient sparse matrix techniques, thus becoming a core aspect of spatial prediction when dealing with massive data sets. In addition, by considering a class of convolution-based flexible models, we can generate classes of non-stationary, compactly supported spatial covariance functions (Mateu *et al.*, 2011).
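A minimal sketch of this compact-support route (our own illustration, assuming a Wendland *C*^{2} covariance and illustrative parameter values): beyond its range the covariance is exactly zero, so the covariance matrix is sparse and the kriging system can be solved with sparse solvers.

```python
# Simple kriging with a compactly supported Wendland covariance:
# the covariance matrix is mostly zeros, so it is stored and solved sparsely.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def wendland(d, rng=0.2):
    """Wendland C^2 covariance: (1 - r)_+^4 (4 r + 1), with r = d / rng."""
    r = d / rng
    return np.where(r < 1, (1 - r) ** 4 * (4 * r + 1), 0.0)

gen = np.random.default_rng(3)
pts = gen.uniform(0, 1, size=(500, 2))
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
C = sp.csc_matrix(wendland(d))                     # sparse covariance matrix
y = gen.standard_normal(500)
s0 = np.array([0.5, 0.5])
c0 = wendland(np.linalg.norm(pts - s0, axis=1))
# small jitter for numerical stability of the sparse solve
pred = c0 @ spsolve(C + 1e-8 * sp.identity(500, format="csc"), y)
print(C.nnz / 500 ** 2, pred)
```

The fill-in fraction here is driven entirely by the support range, which is the tuning knob trading accuracy against sparsity, much as the triangulation density does in the authors' approach.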

Rather than assuming that the GF is built through a Matérn covariance function, have the authors thought about considering other covariance radial basis functions with some positive definite kernel? Alternative stochastic partial differential equations to equation (2) could be brought into play. In their main result 2, the authors use a particular finite element method. An alternative could be a Tikhonov regularization scheme with suitably adapted functionals, like those shown in Montegranario and Espinosa (2007). This regularization method has the mathematical advantage of using properties of compact operators in Hilbert spaces (Vapnik (1998), chapter 7).

I would like to draw the authors’ attention to the use of basis functions to construct a finite element representation of the solution to the stochastic partial differential equation. First, they use *n* (the number of vertices) functions, and this is basically oversmoothing the solution. An objective criterion following a kind of cross-validation approach (as in functional geostatistics) should be considered. Second, the authors do not explicitly explain which basis functions would be appropriate. They use harmonic functions and *B*-splines: why not wavelets or other smoothing functions? Recall that some of these bases make orthogonality an issue in computation. Functional geostatistics is a field providing huge amounts of data, and I would like to ask the authors whether they have ever thought about the possibility of having a GF of functions, as in Giraldo *et al.* (2011). Do the authors have any tangible results on this emerging issue?

**Debashis Mondal** (*University of Chicago*)

The first main result of this paper was familiar to me (and to my late colleague Julian Besag) because of our work on the de Wijs process. In particular, note sections 4.2 and 5 of Besag and Mondal (2005) that allude to such results from a frequency domain perspective and from a computational point of view. Consider for example a sequence of Gaussian Markov random fields on sublattices with spacing 1/*m*,*m*=1,2,…, which have individual spectral densities of the form

Here *ν*=1 corresponds to the first-order conditional auto-regressions at increasingly dense sublattices, *ν*=2 suggests Whittle's simultaneous auto-regressions at increasingly dense sublattices and so on. Now assume that, as *m*→∞, 4*m*^{2}(1−4*β*_{m})→*κ*^{2} and *m*^{ν−1}*σ*_{m}→*σ*/2^{ν}. Then *f*_{m}(*ω*,*η*) converges to

from which it follows that the corresponding continuum Gaussian random fields (generalized if *ν*=0,1, or if *ν*>1 and *κ*=0) emerge as scaling limits of Gaussian Markov random fields on regular lattices. One point that I make here is that the above explicitly describes the rescaling of parameters that is needed, particularly when we want to choose a suitable sublattice (e.g. to embed irregular sampling locations into a grid, or to approximate irregular regions by unions of grid cells when the observations themselves are aggregates over such regions). This is important for both estimation and inference for continuum random fields from the approximating Gaussian Markov random field, but it did not receive much attention in the current paper. In addition, rescaling of parameters would also be required for the triangulation scheme discussed in the second main result of the paper, and it would be interesting to see the effect of numerical convergence as one considers denser triangulations.

Delaunay triangulation provides a way to place a dependence graph on irregularly sampled observations. Then the idea is to use a Gaussian Markov random field on this triangulation to approximate the continuum Gaussian random field. For this, the use of the graph Laplacian becomes important, and one can proceed with alternative calculations. Here I draw attention to the work of Coifman and Maggioni (2004) on diffusion wavelets that allow fast and approximate computations of functionals of inverse Laplacian and related diffusion-like operators on manifolds, graphs and other non-homogeneous media. A study of the strengths and weaknesses of this procedure will be a matter of future research.

**Werner G. Müller and Helmut Waldl** (*Johannes-Kepler-Universität Linz*)

The authors thankfully further strengthen the bridge connecting the somewhat disparate areas of geostatistics and spatial econometrics. We would like to draw attention towards a rather neglected aspect of establishing a link as above, namely the potential effect on the respective optimal sampling designs. We shall illustrate our points on the leukaemia survival example from Section 2.3, utilizing some of the authors’ calculations.

In geostatistics the optimal sampling design is often based on the kriging variance over the region of interest, frequently by minimizing its maximum. To account for the additional uncertainty due to estimating covariance parameters, Zhu and Stein (2006) and Zimmerman (2006) employed a modification termed the empirical kriging (EK) criterion by the latter. In spatial econometrics it is common to test for spatial auto-correlation by specifying a spatial linkage matrix and utilizing an overall measure such as Moran's *I*. Gumprecht *et al*. (2009) therefore suggested maximizing the power of Moran's *I*-test under a hypothesized spatial lattice process; call this the Moran *I*-power (MIP) criterion.

From the example, one sensible design question that we could pose is: which of the 24 districts should we sample if we are limited to a number *k*<24 for financial reasons, say *k*=3, which allows for 2024 different configurations? Randomly sampling 20 of these configurations as designs, we can then draw a scatter plot of the values of the above criteria to judge a potential linkage (Fig. 20(a)). The EK criterion reduces to scalar operations localized at *ρ*=0.26, the only free covariance parameter. For the MIP we use the corresponding precision matrix provided by the authors, the linkage matrix assigning 1 to point pairs within the range *ρ*. Although the evident link between the criteria does not extend well into the lower right-hand corner where the optima lie, it looks as if we could achieve reasonably high design efficiencies by substituting the criteria for each other.

Both criteria, however, are computationally quite intensive, and it thus makes sense to look for cheaper alternatives. Motivated by traditional connections between estimation-based and prediction-based criteria (‘equivalence theory’), Müller and Stehlík (2010) suggested replacing the EK criterion by a compound criterion for determinants of information matrices with a weighting factor *α*; call this compound *D*-optimality, CD. A scatter plot of its values (assuming a constant trend) against the MIPs (Fig. 20(b)) shows high efficiencies on the MIP criterion for the computationally extremely cheap compound-*D*-optimal design.

Summarizing, we believe that we have demonstrated that the relationships between the two linked approaches can go far beyond mere estimation issues.

**Alessandro Ottavi and Daniel Simpson** (*Norwegian University of Science and Technology, Trondheim*)

The Markov property, which is so important in temporal and discrete models, has long been absent in continuous spatial models, and the authors are to be congratulated for reminding us that it exists and is useful. In particular, the connection between stochastic partial differential equations and Gaussian random fields gives a practical method for specifying the *local* properties of a random field. Within this context, we are interested in using local effects to model non-stationarity. In particular, we are interested in non-stationarity induced by local topography, especially in the context of Bayesian smoothing.

Constructing Bayesian smoothers over complicated regions (i.e. regions that are not all of *R*^{2}) is a difficult problem, and most successful efforts have involved, in some way, partial differential operators (Wood *et al*., 2008; Ramsay, 2002). We note in passing that the techniques described in the paper under consideration yield an efficient Bayesian extension of the FELSPLINE method of Ramsay (2002), which does not suffer the large sample size drawbacks of soap film smoothing (Wood *et al*., 2008). In fact, as these models are defined over an *irregular* tessellation, we can represent any hard physical boundary (almost) exactly, and we can control either the (deterministic) value of the field on, or the flow through, the boundary by using standard finite element techniques.

Whereas a river, lake or mountain may provide a hard physical boundary, in other situations the constraints may simply impede or discourage movement. Such *soft* constraints are difficult to deal with properly and we propose to model them by locally deforming the physical space into the third dimension (see Sampson and Guttorp (1992)). This deformed space can then be directly tessellated and non-stationary SPDE models can be built on it. A caricature deformation is provided in Fig. 21, which shows the covariance centred at one point. The covariance can be clearly seen to wrap around the deformation in a sensible way.

**Omiros Papaspiliopoulos** (*Universitat Pompeu Fabra, Barcelona*) **and Emilio Porcu** (*University of Castilla la Mancha and University of Göttingen*)

We congratulate the authors for this beautiful paper. We have some suggestions that may be considered by the authors for a more general view of the problem.

- (a) Main result 1 does not hold when the parameter *ν* is not an integer. A more general setting would allow us to index the fractional Sobolev space associated with the Gaussian random field (GRF) with a Matérn covariance function and thus allow us to study the properties of such a GRF in terms of fractal dimension. Such information seems to be lost with the approach proposed by the authors. In this respect, the tapering approach proposed in Gneiting (2002) seems to be more promising.

- (b) The Matérn model is very popular also because it is very handy; having a closed form for the related spectrum, it has been widely used, for instance, to apply Yadrenko's (1983) theory for the equivalence of GRF measures. But there are other covariance functions, such as the generalized Cauchy (Gneiting and Schlather, 2004) and the Dagum (Berg *et al*., 2008) functions, which allow the separation of the fractal dimension and the Hurst effect of the associated GRF. This is a significant advantage for statistical estimation, and such a property is not offered by the Matérn covariance, which has light tails. The generalized Cauchy model

  *C*(*r*)=(1+*r*^{*α*})^{−*γ*/*α*} (46)

  is positive definite on *R*^{*d*} for all *d*, for 0<*α*≤2 and 0<*γ*, whereas the Dagum model admits the expression

  *C*(*r*)=1−{*r*^{*β*}/(1+*r*^{*β*})}^{*γ*/*β*} (47)

  for which sufficient conditions of positive definiteness on *R*^{*d*} for all *d* are *βγ*≤1 and *β*<1. Recently, Ruiz-Medina *et al*. (2011) have shown that, for a GRF with a generalized Cauchy or with a Dagum covariance function, we have a local pseudodifferential representation, in the mean-square sense, of the type

  (−Δ)^{*ρ*} *x*(*s*)=*ε*(*s*), (48)

  where −Δ denotes the negative Laplacian operator, *ε* is Gaussian white noise and *ρ* is identically equal to (*d*+*α*)/4 and (*d*+*γβ*)/4 for the generalized Cauchy and the Dagum functions respectively. It would be interesting to consider whether some type of Gaussian Markov random-field approximation applies to these GRFs.
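As a numerical sanity check on the generalized Cauchy and Dagum models (46) and (47) (our own sketch; the parameter values, chosen inside the stated validity ranges, are illustrative), one can evaluate each covariance on a random point set and inspect the eigenvalues of the resulting matrix:

```python
# Evaluate the generalized Cauchy and Dagum covariances on random points
# and check that the resulting covariance matrices look positive semidefinite.
import numpy as np

def gen_cauchy(h, alpha=1.5, gamma=2.0):
    """Generalized Cauchy: (1 + h^alpha)^(-gamma/alpha), 0 < alpha <= 2."""
    return (1 + h ** alpha) ** (-gamma / alpha)

def dagum(h, beta=0.7, gamma=0.6):
    """Dagum: 1 - (h^beta / (1 + h^beta))^(gamma/beta)."""
    return 1 - (h ** beta / (1 + h ** beta)) ** (gamma / beta)

pts = np.random.default_rng(4).uniform(0, 1, size=(40, 2))
h = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
for fn in (gen_cauchy, dagum):
    eig = np.linalg.eigvalsh(fn(h))
    print(fn.__name__, "min eigenvalue:", eig.min())
```

Both functions equal 1 at the origin and decay monotonically, with the Dagum's heavier tail controlled separately from its origin behaviour, which is the fractal-dimension/Hurst-effect separation the discussants describe.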

**Marc Saez** (*University of Girona and Consortium for Biomedical Research Network in Epidemiology and Public Health, Barcelona*) **and Jorge Mateu** (*University Jaume I, Castellón*)

We first congratulate the authors for this excellent and innovative work, providing an authoritative review of the ‘big *n* problem’. We indeed believe that the paper will become a seminal paper in the context of computational spatial statistics. We focus on comments on some aspects that we believe merit further discussion; in particular on how the stochastic partial differential equation generalizes to non-separable space–time models. Whereas in other extensions (the case of manifolds or non-stationary fields) the authors adequately extend the basic approach, in the non-separable space–time case the explanations given on how to obtain a Gaussian Markov random-field representation are not self-contained. It is not clear to us what the resulting system of coupled temporal stochastic differential equations is, or how the system can be discretized, even though the authors suggest using an Euler method. Can the authors be more specific in this respect? Another question related to this extension is to know which regularity conditions are violated when the noise process is not white in time. There is no doubt that this assumption is too restrictive, and we would like to know whether it could be relaxed in some way.

In relation to the aspects that deserve more discussion, we first would like to refer to the edge effects and boundary conditions and, secondly, to the issues of spatial scale. It is known that, at least in small databases or spatial data with a large spatial scale, edge effects can have an effect on the spatial pattern near the edge. It would be necessary, therefore, to investigate how to incorporate a varying edge in the representation. When considering covariates, it is also important to investigate the optimal spatial scale, i.e. the resolution, to avoid overfitting. The resolution of the spatial effect should not be less than the spatial scale of the covariates because otherwise the spatial effect, and not the covariates, explains the data. Since there is a trade-off between the accuracy of the Gaussian Markov random-field representation and the number of vertices, further research is needed on the optimal number of triangles, i.e. the resolution, in conjunction, if possible, with the presence of varying edge effects.

**Alexandra M. Schmidt** (*Universidade Federal do Rio de Janeiro*)

The idea of approximating continuously indexed Gaussian fields through Gaussian Markov random fields is extremely appealing because of the sparseness of the precision structures of Gaussian Markov random fields. Even with current computational power, this is a real gain, especially in the case of large data sets. Moreover, the approach proposed provides a wide class of spatial models, allowing for the construction of Matérn fields on the sphere, and also non-stationary and oscillating covariance structures.

I shall concentrate my comments on the non-isotropic models aspect of the model proposed. Schmidt and O'Hagan (2003) proposed a Bayesian approach to the deformation idea of Sampson and Guttorp (1992). The idea is to assign a Gaussian process prior to the deformation function *f*(·) that maps the original locations in *R*^{2} into the latent space (also in *R*^{2}). The smoothness of the deformation is defined by the smoothness of the covariance function of *f*(·). Schmidt and O'Hagan (2003) used a Kronecker product between a covariance matrix describing the covariance between the axes that define the latent space and the correlation between the monitored locations in the original space. This correlation is modelled through a squared exponential correlation function, which is equivalent to assuming a Matérn correlation function for which *ν*→∞. This assumption leads to an infinitely mean-squared differentiable function and a smooth mapping. The idea behind this structure is to obtain a deformation that is not very different from the original configuration but is still sufficiently flexible to capture the non-stationarity in the data. In Section 3.4, the authors obtain a non-stationary stochastic partial differential equation (SPDE) that reproduces the deformation method assuming Matérn covariances. This is done by fixing *α*=2, which is equivalent to *ν*=1 when *d*=2. Therefore, the original Matérn correlation function of the mapping function *f*(·) assumes that *f*(·) is only once mean-square differentiable. This might lead to undesirable folding of the configuration of points in the latent space. The authors also claim that one advantage is that the parameters of the resulting SPDE do not depend directly on the deformation itself. Although this might be an advantage for inference, it is important to recover the posterior distribution of *f*(·) to understand the non-stationarity in the data.
More recently, Schmidt *et al.* (2011) extended the mapping onto *R*^{d} for *d*>2, by making use of covariates to define the other dimensions of the latent space. Increasing the dimension of the latent space helps to avoid the foldings mentioned above. They also discuss a simpler version of the model which also makes use of covariates. It is appealing that the SPDE provides another possibility for exploring covariates in the spatial covariance structures.

**Daniel Simpson** (*Norwegian University of Science and Technology, Trondheim*)

I congratulate Lindgren, Rue and Lindström on their outstanding contribution to the spatial statistics literature. In the simplest case, when the field is stationary and isotropic, it is easy to write down the Green function of the partial differential operator and, therefore, to derive the convolution representation of the corresponding Matérn field. When *ν*=*α*−*d*/2 is the smoothness parameter and *κ* is the range parameter, this representation is

*x*(**s**) ∝ ∫ (*κ*‖**s**−**u**‖)^{*η*} *K*_{*η*}(*κ*‖**s**−**u**‖) d*W*(**u**),

where *η*=(*ν*−*d*/2)/2 may be negative. It is tempting to approximate this by a finite sum using the method introduced by Higdon (1998). *This does not work*! The kernel convolution method, besides being less efficient than the method in the paper under discussion, produces posterior fields with noticeable artefacts (Bolin and Lindgren, 2009; Simpson *et al*., 2010).

Of course, the paper considers much more than the simplest case, with great success. The finite element method introduced succeeds in these complicated cases because it *directly* approximates the required random function in a stable, consistent manner. When considering a bounded domain, the proof of the convergence result (equation (11) in the paper) boils down to an application of Céa's lemma, which says that the error in the Galerkin approximation to an elliptic partial differential equation is bounded above by the error of the best approximation to the solution over the chosen approximation space. The space of piecewise linear functions is extremely well studied and its approximation properties have formed the backbone of finite element theory since the middle of the last century.
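The approximation property that Céa's lemma appeals to is easy to demonstrate numerically: for a smooth function, the piecewise linear interpolation error decays as *O*(*h*^{2}), so halving the mesh width reduces the error by roughly a factor of 4. A minimal illustration (the test function and grid sizes are arbitrary choices, not from the paper):

```python
import numpy as np

def interp_error(f, a, b, n):
    """Max error of piecewise linear interpolation of f on n+1 equispaced nodes."""
    nodes = np.linspace(a, b, n + 1)
    fine = np.linspace(a, b, 2001)  # dense grid for measuring the error
    approx = np.interp(fine, nodes, f(nodes))
    return np.max(np.abs(f(fine) - approx))

e1 = interp_error(np.sin, 0.0, np.pi, 16)
e2 = interp_error(np.sin, 0.0, np.pi, 32)
ratio = e1 / e2  # close to 4 for an O(h^2) method
```

The same second-order behaviour is what bounds the Galerkin error for the piecewise linear basis used in the paper.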

I am extremely excited that the authors have extended the spatial statistics toolbox to include the finite element method, which is the workhorse of applied mathematics, physics and engineering. My excitement stems mostly from the fact that, given Gaussian data, the mean of the posterior random field can be approximated in *linear* time by using multigrid methods. For non-Gaussian data, the maximum *a posteriori* estimate can be computed efficiently by solving a non-linear partial differential equation (see Hegland (2007) and Griebel and Hegland (2010), who considered this in the context of density estimation). The obvious challenge for the computational statistics community is to compute uncertainty in linear time. If this can be done, it will be possible to solve truly huge problems.

**Alfred Stein** (*Twente University, Enschede*)

This paper is a fine contribution to spatial statistics. We may have been aware that Gaussian Markov random fields and Gaussian fields are connected, but the paper provides a solid explanation on the basis of mathematical theory. It raises several items for discussion.

- (a)
The choice of Gaussianity is particular and I doubt whether equally elegant results can be obtained for non-Gaussian random fields, in particular for spatial count data. Also, the explicit link exists for two, relatively wide classes of covariance functions: the Matérn class and periodic covariance functions. However, there are more permissible covariance functions, like spherical covariances. It is not immediately obvious how the current procedure can be expanded. Finally, multivariate spatial data are natural expansions of the univariate case presented; for example a bivariate Gaussian random field may have an explicit link with the bivariate Gaussian Markov random field in an almost trivial way, but it would be good to hear the authors say the final word on it.

- (b)
Applying the method apparently requires a triangulation of the area. I cannot believe that it is always as simple and straightforward as in the examples presented, and results may be sensitive to a particular choice. In image analysis, it is natural to have a square grid of pixel values, whereas in traditional geostatistics the data are irregularly spread over an area and a map on a square grid is an essential output. It is not yet entirely clear what the effects of such choices are on the final results.

- (c)
This brings me then to my final comment, namely the issue of the quality of data. In the current setting, the quality of data is entirely governed by a choice of the (Gaussian) distribution. For many data sets that is not enough, as locations may be uncertain or refer to an aggregated area, whereas in other studies the data themselves are poorly defined. Yet scientists collect and analyse them, simply because they are the best available to tell a scientific story—they are fit for use. A good example concerns data that are obtained from interpreting a soil layer and that are poorly defined simply because the soil classes are poorly defined. Scientists rely on fuzzy approaches here. I wonder how the theory presented could proceed in such circumstances.

As these are all promising aspects for future research, for now I can only compliment the authors on their achievements.

**Paul Switzer** (*Stanford University*)

I have a concern regarding spatial models based on a direct specification of local conditional distributions, although these enjoy substantial computational advantages compared with approaches that use the unconditional Gaussian field (GF) covariance function for local estimation. Nevertheless, the conditional specification is not completely satisfactory, for example because it says nothing about how to construct models in a consistent way for different grid spacings. Your approach to this problem is insightful—estimate parameters of the unconditional covariance by using available data and then find a local conditional model that is approximately consistent with this estimated GF.

If you are starting with a GF then a device used to sidestep the large *n* problem for local inference is to condition arbitrarily only on observations within a neighbourhood by relying on the ‘screen’ effect of the closest data. Data weights for filtering or interpolation are determined by applying the GF specification restricted to the estimation location and a small number of neighbouring observations, via kriging. Whereas kriged maps will typically resemble maps of posterior means, this may not be so for local estimation precision which is more model dependent.

GF parameter estimates should be about the same regardless of the grid mesh that is used to sample a realization of a GF, i.e. the parameter *ν* in the Matérn covariance does not depend on how the GF is sampled. This would seem to imply that the Gaussian Markov random-field representation would have the same order neighbourhood regardless of the grid spacing that is used to sample the GF.

The covariance of an isotropic GF for a two-dimensional field is the same as the one that we would obtain if the two-dimensional field were restricted to a one-dimensional linear transect. So the Matérn parameter *ν* would be the same whether estimated from one-dimensional data or two-dimensional data, although the implied neighbourhood order would be different for *d*=1 and *d*=2.
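This dimension invariance can be checked numerically: the Matérn correlation has the same functional form whatever the dimension, and its *ν*=1/2 member reduces to the exponential correlation exp (−*κr*). A minimal sketch (parameter values are hypothetical):

```python
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function K_nu

def matern_corr(r, nu, kappa):
    """Matérn correlation function; the same expression applies for any d."""
    x = kappa * np.asarray(r, dtype=float)
    return (2.0 ** (1.0 - nu) / gamma(nu)) * x ** nu * kv(nu, x)

r = np.linspace(0.01, 3.0, 50)
exact = np.exp(-2.0 * r)                        # exponential correlation, kappa = 2
approx = matern_corr(r, nu=0.5, kappa=2.0)      # Matérn with nu = 1/2
```

Restricting a two-dimensional field to a transect changes *d* but not this correlation function, which is Switzer's point.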

Finally, I am curious how local estimation is affected by the choice of triangulation that is imposed on the observation domain for irregularly spaced data. Your suggestion, to have smaller triangles where there are more data makes sense but can anything more be said? For *d*=1 how should we think about triangulation for unevenly dense observations?

**Kamil Turkman** (*University of Lisbon*)

I congratulate the authors for this excellent work which will have significant consequences in modelling spatial data.

The objective of the paper is to find ways to treat the data as point referenced and to use a Gaussian field (GF) as the model, and to do the computations as if the data are discretely indexed by using an appropriate Gaussian Markov random-field (GMRF) model which represents the GF in the best possible way. The authors stress the word representation rather than approximation, perhaps indicating that this GMRF, although it is the best representation, may not approximate GFs to a desired level.

To switch from one model to another in this manner during the different phases of the modelling, we must assume that the likelihoods under the GMRF and GF models do not differ or deviate much. Indeed, this point was clearly expressed in Rue and Held (2005), where the GMRF representation was chosen by minimizing a metric of discrepancy of the form

∑_{i<j} *w*_{ij}{*r*_{1}(*ij*)−*r*_{0}(*ij*)}^{2},

where Σ_{1} and Σ_{0} denote respectively the covariance matrices of the GF and GMRF over the observed data locations, *r*_{1}(*ij*) and *r*_{0}(*ij*) are respectively the elements of Σ_{1} and Σ_{0}, and the weights *w*_{ij} are chosen as being inversely proportional to distances. In contrast, the normal comparison lemma and its refinements (Piterbarg, 1996) suggest that the metric of discrepancy should be based on

∑_{i<j} |*r*_{1}(*ij*)−*r*_{0}(*ij*)|,

since

*F*_{1}(**x**)−*F*_{0}(**x**) = ∑_{i<j} {*r*_{1}(*ij*)−*r*_{0}(*ij*)} ∫_{0}^{1} *φ*_{h}(*x*_{i},*x*_{j}) *F*_{h}(**x**|*x*_{i},*x*_{j}) d*h*,

- (49)

where *F*_{1}(**x**) and *F*_{0}(**x**) are the distribution functions of the vectors *X*_{1} and *X*_{0} corresponding to the GF and GMRF processes at the data locations, *X*_{h}=*X*_{1}√(1−*h*)+*X*_{0}√*h* for *h* ∈ (0,1), *φ*_{h}(*x*_{i},*x*_{j}) is the joint density function of *X*_{hi} and *X*_{hj} and *F*_{h}(**x**|·) is the conditional distribution function of *X*_{h}.

From this equality, a sharp inequality can be found:

|*F*_{1}(**x**)−*F*_{0}(**x**)| ≤ ∑_{i<j} |*r*_{1}(*ij*)−*r*_{0}(*ij*)| ∫_{0}^{1} *φ*_{h}(*x*_{i},*x*_{j}) d*h*.

In this paper, the optimal GMRF representation for a specific class of GFs and for a given irregular grid configuration is obtained in a different norm. However, it is not clear what sort of discrepancy this representation induces on the covariances, and hence on the likelihoods under GF and GMRF models and its consequence on the inference. The worrying point is that, in accordance with the normal comparison lemma, the total error of approximation is additive in the L1 covariance errors.
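The contrast between the two criteria can be made concrete on a toy configuration. In the following sketch, the locations, the two covariance models standing in for the GF and GMRF, and the inverse-distance weights are all hypothetical:

```python
import numpy as np

# Hypothetical data locations on a line and two candidate covariance models
loc = np.array([0.0, 1.0, 2.0, 4.0])
dist = np.abs(loc[:, None] - loc[None, :])
sigma1 = np.exp(-dist)            # covariance of the "GF"
sigma0 = np.exp(-1.1 * dist)      # covariance of the "GMRF" surrogate

# Weighted L2 criterion of Rue and Held (2005): off-diagonal pairs,
# weights inversely proportional to distance
i, j = np.triu_indices(len(loc), k=1)
w = 1.0 / dist[i, j]
l2 = np.sum(w * (sigma1[i, j] - sigma0[i, j]) ** 2)

# L1 criterion suggested by the normal comparison lemma
l1 = np.sum(np.abs(sigma1[i, j] - sigma0[i, j]))
```

Turkman's point is that the second quantity, not the first, is what bounds the discrepancy between the two likelihoods.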

**Christopher K. Wikle** (*University of Missouri, Columbia*) **and Mevin B. Hooten** (*Utah State University, Logan*)

First, we congratulate the authors on another important contribution in what is quickly becoming a renaissance on approximate Bayesian methods. Given that our interest in correlated random fields has been primarily from a continuous spatial perspective rather than a discrete one, this latest paper is quite intriguing. The general concept of thinking about common forms of dependence as a solution to a spectrally defined stochastic partial differential equation (SPDE) is clever and similar in spirit to other general ideas we have been fond of in the past.

As we have spent a considerable amount of time thinking about dynamic spatiotemporal models and the origin of spatial processes, we see this paper as a potentially very valuable contribution. Indeed, the notion of discretizing SPDEs to form the basis of Markovian statistical models from both the spectral (e.g. Wikle *et al.* (2001)) and physical space (e.g. Wikle (2003), Wikle and Hooten (2006, 2010) and Hooten and Wikle (2007)) perspectives has been a primary focus of our own research. A heuristic summary of the relationship between such approximations and their theoretical counterparts can be found in Cressie and Wikle (2011). A key component of these presentations is the full exploitation of the hierarchical framework that allows us to place dependent random processes on the parameters that control the dynamical interactions (i.e. the parameters in the SPDE), for which no analytical solution exists. We have focused on building models true to the aetiology of the underlying processes—which is, more often than not, in the (Markovian) dynamical evolution. Furthermore, it is important to remember that real world processes are non-linear. The development of analytical covariance functions for the governing SPDEs of such processes is typically intractable, yet the motivating Markovian models suggested by the discretization (either in physical or spectral space) are quite tractable and flexible (for example, see Hooten and Wikle (2007)). Wikle and Hooten (2010) discussed a general form for ‘quadratic interaction models’ and showed the connection to various classes of spatiotemporal PDEs. Wikle and Holan (2011) extended this to higher order interactions.

From a computational perspective, we have found that the algorithms of Rue and Martino (2009) are very computationally efficient when a solution can be found. Thus, for very large and well-behaved continuous physical systems where sufficient data exist and correlation structures are smooth, the approach outlined by Lindgren and his colleagues may be quite valuable for computational reasons alone.

The **authors** replied later, in writing, as follows.

We are delighted by the deluge of insightful comments, the details of which we can only begin to answer here. We have grouped our responses into a few common themes, mentioning commentators’ names only when referring to specific issues.

##### Triangulations

As pointed out by Cooley, Hoeting and Brown, it is not necessary to place the triangulation vertices at observation points. Indeed, the *observation matrix* in the global temperature example in Section 4.2 was introduced for this very reason. For a given triangulation, the matrix can be used to extract any observable linear combination of field values, allowing observations in arbitrary locations as well as regional averages. For point observations, each row of the matrix contains three non-zero elements, one for each corner of the triangle containing the point, and the sparsity structure of the posterior precision matrix is unaffected. There is also no requirement to use a regular grid for such models. The triangulation implementation in Rinla has a parameter for a minimum required distance between data-located vertices. In the temperature example this was set to 10 km, allowing the vertex placement to follow the data density, without generating excessively small triangles. An example where the triangulation was chosen completely independently of the data locations is given by Bolin and Lindgren (2011b). Crujeiras and Prieto add that the triangulation could be chosen adaptively on the basis of local approximation error estimates, and we agree that this can potentially be useful for non-stationary anisotropic models.
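For a point observation, the three non-zero entries in the corresponding row of the observation matrix are the barycentric coordinates of the observation point within its triangle. A minimal sketch of that computation (the triangle and point are hypothetical, and this is not the Rinla implementation):

```python
import numpy as np

def barycentric_row(p, tri):
    """Barycentric coordinates of point p in triangle tri (3x2 array of corners).

    These are the three non-zero entries of the observation-matrix row."""
    T = np.column_stack([tri[1] - tri[0], tri[2] - tri[0]])
    w12 = np.linalg.solve(T, p - tri[0])
    return np.array([1.0 - w12.sum(), w12[0], w12[1]])

tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # hypothetical triangle
w = barycentric_row(np.array([0.25, 0.25]), tri)
# w sums to 1; the linear interpolant at p is w @ (field values at the corners)
```

Rows for regional averages are built analogously by integrating the basis functions over the region, which is why the sparsity of the posterior precision is unaffected.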

##### Approximation properties

Ippoliti, Martin and Bhansali note that the stochastic partial differential equation (SPDE)–Gaussian Markov random-field (GMRF) precision coefficients do not match those obtained by inverting the covariance matrix of a field sampled on a regular grid. However, the approximation is not aiming to approximate the field only at the vertices of the triangulation, but also at the intermediate points obtained via linear interpolation. If the same interpolation method is used with the sampled covariances, the overall field covariances are underestimated, whereas the SPDE–GMRF approach gives an overall closer approximation. The upper and lower envelopes of all the pairwise covariances for the two settings are shown in Fig. 22, together with the target covariance function. Bolin and Lindgren (2009) investigated how this effect influences the kriging results, comparing with tapering and kernel convolutions, as well as alternative choices of basis functions in the GMRF construction, including wavelets and *B*-splines.

##### Kernel methods

Kernel convolution methods, as mentioned by Furrer, Furrer and Nychka, are useful theoretical tools, but can in practice be cumbersome and computationally intensive (Bolin and Lindgren, 2009). The kernel generated by the SPDE operator for a Matérn field takes the shape of another Matérn covariance with a different shape parameter. The kernels are singular for *α*≤*d*, and non-differentiable for *α*≤*d*+2, so the commonly used discrete kernel sums result in kriging and parameter estimation artefacts and do not yield either the correct pointwise distribution or correct distributions for regional averages, unless the range is large compared with the kernel placement distances. Also, since the kernels have non-compact support, they yield dense matrices for the posterior precisions. Using compactly supported kernels as suggested by Mateu is similar to moving average processes in time series analysis and is problematic unless the data are densely and evenly located on the domain of interest, whereas the GMRF models are counterparts to auto-regressive models, which are much more flexible tools for approximating general dependence structures.

##### Parameter estimation

Both when estimating parameters and calculating kriging interpolations, the results are influenced by the choice of boundary conditions. The easiest method for avoiding these effects is to extend the triangulation beyond the study region by an amount that is sufficiently large to cover the correlation range of the field, since this allows the boundary effects to drop off to virtually zero before reaching the data locations that influence the likelihood. As seen in Fig. 23(a), this eliminates the bias in the maximum likelihood estimates of the field variances *σ*^{2}. The bias for the rescaling parameter *τ*^{−2} is reduced when the triangulation resolution is increased, as seen in Fig. 23(b). The parameter *σ*^{2}*κ*^{2ν} that was used in the comparison by Lee and Kaufman is equivalent to *τ*^{−2} in the SPDE models used in the paper. We believe that the highly variable results in their comparison are explained by noting that the precision matrix was chosen by simply deleting rows and columns, which is equivalent to conditioning, in this case leading to approximate Dirichlet boundary conditions. To compensate for the resulting small variances near the boundary of such a model, the variance parameter needs to be greatly overestimated. The combined comparisons show that Neumann boundaries are safer, and that extending the boundary further reduces the bias.

##### Boundaries

In the paper, we used Neumann boundary conditions for simplicity and ease of characterization of the properties of the resulting models. For intrinsic models, these conditions are too restrictive and need to be relaxed to achieve the desired field properties. The key lies in how the boundary conditions affect the null space of the differential operator. The *B*-matrix that is used in Appendix C.3 relaxes the constraints on the null space, leading to intrinsic models. Normally, the rank deficiency of the precision matrix is used to determine the order of intrinsicness. However, since only some of the eigenfunctions of the Laplacian can be represented exactly in the piecewise linear basis, the rank deficiency does not tell the whole story. For a more complete picture, the continuous domain problem needs to be analysed more carefully.

A common alternative to Neumann conditions is to let the process follow the corresponding one-dimensional model along the boundary, which is easily accomplished for regular grids by replacing the two-dimensional grid Laplacian with the one-dimensional Laplacian along the boundaries. For a unit square domain, the resulting null space is spanned by the four functions 1, *u*_{1}, *u*_{2} and *u*_{1}*u*_{2} but the rank deficiency is only 3, since the last function is not piecewise linear. Although this gives approximately the standard polynomially intrinsic models, the construction also hints at a more general method for more general SPDE models currently under investigation. The idea is to start with a fully intrinsic precision for the interior of the domain and to add appropriate penalties generated by SPDE models within the boundary. For one-dimensional models, this eliminates the boundary effects entirely, and the higher dimension cases show promising results.

##### Model checking

We have not yet investigated the model checking issue mentioned by Gelman and Møller, since this is a general problem for spatial models, and not specific to the SPDE–GMRF approach. However, there appear to be opportunities for using the GMRF *increments* in similar ways to residual analysis for time series models, and also using the close link to the continuous space SPDE formulation itself when interpreting the results.

##### Priors

Choosing priors for the model parameters is a general issue for spatial models, but the handling of the boundary in the SPDE formulation may present further complications. When the correlation range is longer than the size of the domain, estimating *κ* becomes very difficult. In such situations, the intrinsic models (*κ*=0) can be used, reducing the set of parameters to an overall scaling factor. This also handles applications with only a single realization of the random field, where it is impossible to separate a long correlation range from a fixed spatial trend, and the posterior distribution for *κ* typically becomes degenerate, requiring a careful choice of prior. A heuristic approach when not using intrinsic models is to specify a prior for *κ* that puts low probability on ranges longer than the diameter of the domain. In the temperature example, we used independent Gaussian priors of that type for the weights for the basis functions controlling log (*τ*) and log (*κ*^{2}). We are currently extending the temperature example into a full analysis, where the prior for all the SPDE parameters is constructed jointly, giving more control over the behaviour.
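As a hedged sketch of this heuristic, one can use the empirical range *ρ* ≈ √(8*ν*)/*κ* from the paper to centre a Gaussian prior on log (*κ*) so that the prior median range equals the domain diameter; ranges much longer than the domain then receive low prior probability. The function name and parameter values below are illustrative, not the authors' implementation:

```python
import math

def kappa_prior_centre(domain_diameter, nu=1.0):
    """Centre (mean) of a Gaussian prior on log(kappa), placing the prior
    median range at the domain diameter via rho = sqrt(8 * nu) / kappa."""
    kappa_at_diameter = math.sqrt(8.0 * nu) / domain_diameter
    return math.log(kappa_at_diameter)

mu = kappa_prior_centre(domain_diameter=10.0, nu=1.0)
implied_range = math.sqrt(8.0 * 1.0) / math.exp(mu)   # recovers the diameter
```

The prior standard deviation on log (*κ*) then controls how sharply longer ranges are penalized.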

##### Simultaneous auto-regressions

Kent astutely noted the connection to simultaneous auto-regressions that, for even integer values of *α*, provides another direct link between Markov models and SPDEs; the GMRF construction in the paper also takes this form. Using the notation from theorem 2 in Appendix C.2, for *α*=2 we have **Q**=(*κ*^{2}**C̃**+**G**)**C̃**^{−1}(*κ*^{2}**C̃**+**G**), where **C̃** is the diagonal matrix from Appendix C.5. In our early experimentation, we approached the GMRF construction problem by various attempts at modifying the graph Laplacian mentioned by Mondal. In hindsight, the current approach that builds more directly on the continuous domain Laplacian feels more natural to us when the goal is to build spatially consistent Markov models. The results do resemble the graph Laplacian but, as the SPDE models are extended to non-stationary models and fields on manifolds, the graph becomes less useful as such and is purely a computational device. This becomes particularly clear when extending the methods to fractional SPDEs.
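The *α*=2 construction is easy to illustrate on a regular one-dimensional mesh, where the lumped mass matrix and the stiffness matrix of piecewise linear elements have simple closed forms. The mesh, the Neumann boundary rows and the parameter values below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def precision_alpha2(n, h, kappa):
    """GMRF precision for alpha = 2 on a regular 1-D mesh:
    Q = (kappa^2 C + G) C^{-1} (kappa^2 C + G), with C the lumped (diagonal)
    mass matrix and G the stiffness matrix of piecewise linear elements."""
    C = np.diag(np.full(n, h))                  # interior lumped masses
    C[0, 0] = C[-1, -1] = h / 2.0               # Neumann boundary vertices
    G = (np.diag(np.full(n, 2.0))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h     # 1-D stiffness matrix
    G[0, 0] = G[-1, -1] = 1.0 / h               # Neumann boundary rows
    K = kappa ** 2 * C + G
    return K @ np.linalg.inv(C) @ K

Q = precision_alpha2(n=20, h=0.1, kappa=1.0)
```

Since **K** is tridiagonal and **C**^{−1} is diagonal, **Q** is symmetric, positive definite and pentadiagonal, which is the sparse second-order neighbourhood structure the SAR form predicts.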

##### Fractional operators

Although the results as presented in the paper are limited to integer values for *α* in the SPDE generating the Matérn field, the GMRF construction can be extended into a more general class of continuous domain Markov models, which contains close approximators of Matérn models with fractional *α*. The result from Rozanov (1977) that was mentioned in Section 2.3 means that a stationary random field is a Markov field if and only if the spectral density is the reciprocal of a polynomial. In the isotropic case, such spectra take the form

*S̃*(**k**) = (2*π*)^{−d}{∑_{i=0}^{p} *b*_{i}‖**k**‖^{2i}}^{−1},

where *p* is the degree of the polynomial and the *b*_{i} are the coefficients of a strictly positive polynomial, and the corresponding discretized precision matrix can be obtained as

**Q** = ∑_{i=0}^{p} *b*_{i}**G**_{i},  with **G**_{0}=**C̃**, **G**_{1}=**G** and **G**_{i}=**GC̃**^{−1}**G**_{i−1}.

We need to find coefficients *b*_{i} so that the model that is defined by this reciprocal polynomial spectrum is an appropriate approximation of a model defined by the Matérn spectrum *S*(**k**)=(2*π*)^{−d}(*κ*^{2}+‖*k*‖^{2})^{−α}. A sensible choice is to let *p*=⌈*α*⌉, and we use a convenient weighting function *ω*(**k**) for the deviation between the inverse spectra, such that

∫ *ω*(**k**){∑_{i=0}^{p} *b*_{i}‖**k**‖^{2i} − (*κ*^{2}+‖**k**‖^{2})^{α}}^{2} d**k**

is minimized.

Taking derivatives with respect to all *b*_{i} and evaluating the integrals, we obtain a linear system of equations that can be solved easily; for ⌈*α*⌉=1 and ⌈*α*⌉=2, this yields the coefficients (*b*_{0},*b*_{1}) and (*b*_{0},*b*_{1},*b*_{2}) respectively.

The limiting case *λ*→∞ is equivalent to Taylor approximation at **k**=**0** and gives (*b*_{0},*b*_{1})=*κ*^{2α−2}(*κ*^{2},*α*) for ⌈*α*⌉=1 and (*b*_{0},*b*_{1},*b*_{2})=*κ*^{2α−4}{*κ*^{4},*ακ*^{2},*α*(*α*−1)/2} for ⌈*α*⌉=2. These limiting approximations provide good agreement for integrals of the field over regions but, for better behaviour of the short distance covariances, *λ* needs to be chosen more carefully. For a given measure of deviation between the desired and approximate covariance functions, the optimal *λ* can be determined numerically, as a function of *α*. For fractional *α* between 0 and 2, the parsimonious choice *λ*=*α*−⌈*α*⌉ approximately minimizes the maximal absolute difference between the covariance functions. As noted by Cooley and Hoeting, *α* is in practice often chosen only from the integers and half-integers, and we obtain (*b*_{0},*b*_{1})=*κ*^{−1}(3*κ*^{2}/4,3/8) for *α*=1/2 and (*b*_{0},*b*_{1},*b*_{2})=*κ*^{−1}(15*κ*^{4}/16,15*κ*^{2}/8,15/128) for *α*=3/2. Combined with the recursive construction for *α*>2, this provides GMRF approximations for all positive integers and half-integers. This includes the exponential covariance in ℝ^{2}, which corresponds to *α*=3/2. The resulting covariance is shown in Fig. 24, together with the covariance from the spectral Taylor approximation. Further investigations are needed to determine how well the measurement noise model can incorporate the resulting deviation in small scale variation that is introduced by the approximation. Also shown in Fig. 24 is the covariance for a model with *α*=2, showing the same qualitative behaviour at zero, but different mid-range behaviour.
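The limiting Taylor coefficients can be verified directly by expanding (*κ*^{2}+*w*)^{α} around *w*=0, where *w* stands for ‖**k**‖^{2}. A minimal check (parameter values are hypothetical):

```python
from math import factorial

def taylor_coeffs(kappa, alpha, p):
    """Taylor coefficients b_i of (kappa^2 + w)^alpha around w = 0:
    b_i = alpha (alpha-1) ... (alpha-i+1) / i! * kappa^(2*alpha - 2*i)."""
    coeffs = []
    for i in range(p + 1):
        falling = 1.0
        for m in range(i):
            falling *= (alpha - m)
        coeffs.append(falling / factorial(i) * kappa ** (2 * alpha - 2 * i))
    return coeffs

kappa, alpha = 1.5, 0.7
b0, b1 = taylor_coeffs(kappa, alpha, 1)
# matches the stated limit (b0, b1) = kappa^(2*alpha - 2) * (kappa^2, alpha)
```

The ⌈*α*⌉=2 case follows the same pattern, with *b*_{2}=*α*(*α*−1)/2 *κ*^{2α−4} as the quadratic term.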

##### Long-range dependence

As discussed by Bhattacharya *et al*. (1983), apparent long-range dependence in data cannot be distinguished from a non-stationary mean or trend. An alternative to constructing covariance functions with such behaviour is therefore to use a two-stage model, where the local behaviour is treated separately from the long-range behaviour. In practical situations, spatially varying basis functions are often used to capture large-scale variations, leaving the rest for a spatial field component. This can easily be extended to allow the basis weights to differ between realizations of the field, in effect increasing the spectral density near zero. For identifiability reasons, intrinsic models can be preferable, but alternatives such as conditioning on zero integral for the field can also be used, and are implemented in Rinla. As suggested by Fearnhead, another even more general approach is to model the observed field as the sum of several latent fields with different range. Care must be taken to handle the near non-identifiability of such models, noting that each individual latent component may be of less interest than their sum.

In some cases, these approaches can be motivated by considering the physical interpretation of the observed system, where long-range dependence may appear owing to an unobserved latent physical process, e.g. deep water processes with long-term memory affecting surface water processes that interact more rapidly with external forces. The rational spectra that are generated by the nested SPDE approach due to Bolin and Lindgren (2011b) allow more general models even with just a single latent process. Although not Markov as such, they are Markov on an augmented state space, leading to almost the same computational efficiency as the pure Markov models.

##### Deformations

When using the non-stationary SPDE reparameterization of the deformation method, all distances are interpreted within a fixed topology, and the issue of folding is transformed into requiring strictly positive definite diffusion tensors. By parameterizing with scalar and vector fields, the estimated parameter fields can be used to interpret and understand the non-stationarity. A simple example is shown by Ottavi and Simpson. Furthermore, this yields a larger practical class of models, since the parameters need no longer correspond to a simple deformation. When using a Matérn process prior in a traditional deformation model, Schmidt rightly points out that fixing *α*=2 for the deformation field would give undesirable foldings due to insufficient differentiability. Increasing *α* should alleviate the problem to the same extent as any other choice of more differentiable deformation field model would do. Similarly the parameters in the non-stationary SPDE models can be constructed via general Gaussian fields, but direct comparison with the more general deformation methods is difficult, since the models would need to be constructed on the embedding space, whereas the SPDE as used in this paper is defined on the manifold itself, regardless of any embedding.

##### General extensions

It is important to note that the GMRF models can be combined in hierarchical modelling frameworks to allow highly non-Gaussian observation processes. Log-Gaussian Cox processes are of particular note, as mentioned by Diggle, Illian, Simpson, Møller and Höhle. The likelihood can be rewritten in a form that allows the use of the integrated nested Laplace approximations method for inference, and, as Diggle notes, one can choose freely between gridded count data and using the actual point data themselves.

Functional data can be treated either directly in the general observation model, or by incorporating desired basis functions into the finite element basis itself, leading to block matrices in the precisions. In a setting with a local set of temporal basis functions for each spatial triangulation vertex *k*, the resulting elements of the joint **K**-matrix are inner products between the combined spatiotemporal basis functions and their images under a given spatial differential operator, and similarly for the other matrices in the precision construction.

For more general spatiotemporal SPDE models, we agree with Crujeiras and Prieto that a finite volume approach is preferable to finite elements, and Fuglstad presents an example of such a solution. The diagonal approximation to the **C**-matrix is precisely what a simple finite volume method would produce in the purely spatial case, lending further weight to the appropriateness of the approximation.