## Introduction

Ecological processes take place in space, and many ecological data sets are collected in space. As a result, there is a growing interest in spatial statistical methods and spatial statistical modelling (Beale *et al*. 2010). In general, the aim of a spatial analysis is either (a) to account for spatial autocorrelation or (b) to explicitly model spatial patterning. To be more specific, models of type (a) are models of some response variable in which spatial autocorrelation structures form part of the explanatory part of the model (Diggle & Ribeiro 2007). Examples of such spatial models exist both for data collected in continuous space and for data collected on a spatial lattice. In contrast, when models of type (b) are considered, the interest is in analysing the spatial patterns formed by individuals, as these can be used to characterize population dynamics and to determine the nature of the underlying processes leading to those dynamics (Law *et al*. 2009). In these models, the pattern itself, or rather its structure, is the response variable. These are typically treated within the context of spatial point process theory (Diggle 2003; Wiegand & Moloney 2004; Wiegand *et al*. 2007; Illian *et al*. 2012).

These two types of model reflect different aspects of ecological systems, so in many cases one would ideally want to consider both to obtain a better and more nuanced understanding of a system. For example, the data set discussed in this article provides information on both short-term survival (as reflected in the health status of the individuals of the species *Thymus carnosus*) and long-term survival (as reflected in the spatial pattern formed by these individuals). Rather than modelling these two non-independent aspects of the system in separate models, we illustrate how both can be treated within a single joint (or integrated) model. This combines the information contained in the data that would be used for the two separate models and reduces variability by assuming a shared spatial structure informed by more data (Brooks, King & Morgan 2004; King *et al*. 2009; Reynolds *et al*. 2009). An integrated analysis such as the joint model discussed here is often used to increase the precision of parameter estimates as information may be ‘borrowed’ across different data sets. Within statistical ecology, these joint models are becoming increasingly common (King *et al*. 2009).

Unfortunately, spatial models are computationally challenging, in particular in the context of realistically complex data sets, as incorporating spatial correlation structure dramatically increases the complexity of a model. A joint model adds complexity and provides an even greater computational challenge; existing software such as spatstat (Baddeley & Turner 2005), which is used for fitting simple point process models, cannot handle these models. Thus, in this article our aim is twofold: in addition to discussing the spatial statistical methodology that allows us to consider such a joint model, we also explain how this model can be fitted in a computationally feasible way. In particular, we introduce the ecological community to recent statistical developments based on integrated nested Laplace approximation (INLA; Rue, Martino & Chopin 2009) that substantially reduce the computational cost of fitting spatial models.

In this contribution, we introduce INLA and explain how it can be used in the analysis of a complex spatial model. We also highlight the potential for INLA and joint spatial modelling to be used in conjunction to analyse a wide range of spatial data. We provide the code for readers to work through the case study example themselves within the R package R-INLA. Finally, we make some suggestions for how this approach can be used to analyse further data sets, highlight some existing limitations in the statistical methods and argue that this is a field where there is substantial mutual benefit for ecologists and statisticians to work in close collaboration.

### Analysing a complex spatial data set

This article has been motivated by a data set that details the exact locations of *T. carnosus* plants in a dune system in South West Spain along with the health status of each of these plants as well as environmental covariates that may potentially impact on the conservation of the plants. The exact details for this data set are discussed in the 'Application' section. The data were collected with the aim of revealing which factors determine the short-term health status of plants and the longer term spatial distribution of individuals. The health status of a plant reflects the degree to which local environmental conditions facilitated survival following a recent drought. Spatial heterogeneity in long-term fitness, on the other hand, is reflected in plant density in space.

Technically, when modelling the locations of the *T. carnosus* plants, we are modelling a spatial point pattern. Spatial point pattern analysis using summary characteristics such as Ripley's *K*-function has become increasingly common in ecology (Wiegand & Moloney 2004; Wiegand *et al*. 2007; Perry *et al*. 2008; Schiffers *et al*. 2008; Law *et al*. 2009; Martínez *et al*. 2010; Wang *et al*. 2010; Zhang *et al*. 2010; Brown *et al*. 2011). Empirical spatial point patterns may also be described by theoretical statistical models, spatial point *processes*, through the estimation and interpretation of model parameters based on samples, i.e. spatial point *patterns*. However, these models have been used much less often than summary characteristics (Neeff *et al*. 2005; Cornulier & Bretagnolle 2006; Wiegand *et al*. 2007; Lin *et al*. 2011). This is because most ecological data sets are more complex than can be readily dealt with using classical statistical methods.
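To make the idea of a summary characteristic concrete, the following is a minimal sketch of a naive estimator of Ripley's *K*-function. It assumes a rectangular or otherwise simple study region of known area and applies no edge correction, so it is an illustration only and not the estimator used in the cited studies; the function name `ripley_k` and its arguments are hypothetical.

```python
import math


def ripley_k(points, r, area):
    """Naive Ripley's K estimate at distance r, without edge correction.

    K(r) is estimated as area / (n * (n - 1)) times the number of ordered
    pairs of distinct points that lie within distance r of each other.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical toy pattern: the four corners of a unit square.
    pattern = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for radius in (0.5, 1.0, 1.5):
        print(radius, ripley_k(pattern, radius, area=1.0))
```

Under complete spatial randomness the theoretical value is *K*(*r*) = π*r*², so estimates above this value at a given distance are usually read as clustering and estimates below it as regularity.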

As we are also considering the health status (a ‘mark’) along with the pattern, this yields what is referred to as a ‘marked point pattern’. As it is likely that the health status is not independent of the local spatial structure, a suitable statistical model is a marked point process model, where the marks are assumed to depend on the spatial pattern through a shared spatial effect (for more information on marked point processes, see the Appendix). In the past, models where the marks depend on the pattern have rarely been considered, mainly due to computational costs (Møller & Waagepetersen 2007). This has severely constrained the analysis of many rich, spatial ecological data sets (Illian *et al*. 2012; Illian, Sørbye & Rue 2012). Here, we jointly model the marks and the pattern to account for this dependence, using a specific type of spatial point process model, a *Cox process*. While the health status marks are categorical in this specific example, a very similar approach could be used to model continuously valued marks such as plant height or age (Illian, Sørbye & Rue 2012).

### A class of spatial point process models – Cox processes

Within the spatial point process toolbox, Cox processes represent a very flexible class of spatial point processes designed to model spatial point pattern data in the presence of observed and unobserved environmental variation (Møller, Syversveen & Waagepetersen 1998; Møller & Waagepetersen 2007). In Cox process models, spatial variation and autocorrelation are expressed through a random structure that is continuous in space. It is based on an underlying (or latent) random field *Λ*(·) that describes the intensity (=point density) of the point pattern, assuming independence among the points given this field. In other words, conditional on the random field, the point pattern may be described by the statistical model for complete spatial randomness, the Poisson process (Illian *et al*. 2008; Law *et al*. 2009). Due to the random field, Cox process models have a hierarchical structure making these processes particularly flexible as the field can be modelled in many ways. We exploit this here and focus on *log-Gaussian Cox* processes, as considered in Møller, Syversveen & Waagepetersen (1998) and Møller & Waagepetersen (2004, 2007). These belong to a specific subclass, where *Λ*(*s*) has the form

$$\Lambda(s) = \exp\{Z(s)\}.$$

Here, $\{Z(s)\}$ is a Gaussian random field, i.e. for any locations $s_1, \ldots, s_l$ the vector $(Z(s_1), \ldots, Z(s_l))$ follows a multivariate normal distribution. The exponential avoids negative values for $\Lambda(s)$.
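The two-stage structure of a log-Gaussian Cox process (a Gaussian field, exponentiated to give an intensity, followed by conditionally independent Poisson counts) can be illustrated with a small simulation. The sketch below makes several assumptions that are not taken from this article: the study region is discretised into unit grid cells, the field has an exponential covariance function with arbitrary parameters, and all function names are hypothetical. It uses only the Python standard library.

```python
import math
import random


def exp_cov(points, variance=1.0, scale=2.0):
    """Exponential covariance between cell centres (an assumed choice)."""
    return [[variance * math.exp(-math.hypot(xi - xj, yi - yj) / scale)
             for (xj, yj) in points] for (xi, yi) in points]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for the modest means used here."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_lgcp(side=6, mean_log_intensity=0.5, seed=1):
    """Simulate cell counts of a log-Gaussian Cox process on a side x side grid."""
    rng = random.Random(seed)
    centres = [(i + 0.5, j + 0.5) for i in range(side) for j in range(side)]
    low = cholesky(exp_cov(centres))
    white = [rng.gauss(0.0, 1.0) for _ in centres]
    # Z = mu + L w is multivariate normal with covariance L L^T.
    field = [mean_log_intensity + sum(low[i][k] * white[k] for k in range(i + 1))
             for i in range(len(centres))]
    # The exponential guarantees a positive intensity in every cell.
    intensity = [math.exp(z) for z in field]
    # Given the field, counts in unit-area cells are independent Poisson.
    counts = [poisson_draw(rng, lam) for lam in intensity]
    return field, intensity, counts


if __name__ == "__main__":
    _, _, counts = simulate_lgcp()
    for row in range(6):
        print(counts[row * 6:(row + 1) * 6])
```

Cells in which the latent field happens to be high tend to contain more points, so the simulated pattern appears clustered even though the points are independent once the field is known.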

The practical fitting of Cox point process models to point pattern data is difficult due to intractable likelihoods. Fitting even simple Cox processes has typically relied on Markov chain Monte Carlo (MCMC) methodology and has been extremely computationally expensive (Møller & Waagepetersen 2007) as well as largely inaccessible to non-specialists. Within the statistical literature, rather simplistic models have been fitted that typically only consider a spatial pattern without marks. However, Illian & Rue (2010) and Illian, Sørbye & Rue (2012) have developed an approach that facilitates the fitting of realistically complex Cox process models based on INLA (Rue, Martino & Chopin 2009; see the Appendix for technical details). They provide a toolbox that enables non-specialists to develop and fit complex models to data using coding routines within the familiar software package R based on the library R-INLA. In particular, based on this approach, we can model marked point pattern data without an assumption of independence of the pattern and the marks (Ho & Stoyan 2008; Myllymäki & Penttinen 2009). This is achieved through fitting a joint model to both the pattern and the marks in which the dependence is accounted for by a shared spatial effect that is contained both in the explanatory part of the random field *Λ*(·) and the model for the marks.
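As a schematic illustration only, a joint model with a shared spatial effect for a pattern with binary marks might be written as follows. The symbols $\beta_0$, $\alpha_0$, $\beta_s$, $f(\cdot)$ and $p(\cdot)$, and the choice of a logit link for the marks, are assumptions introduced here for illustration; the model actually fitted, including covariates and further error terms, is the one specified in the 'Application' section.

$$
\begin{aligned}
\text{pattern:}\quad & \log \Lambda(s) = \beta_0 + f(s),\\
\text{marks:}\quad & \operatorname{logit}\, p(s_i) = \alpha_0 + \beta_s\, f(s_i).
\end{aligned}
$$

In this sketch, $f(\cdot)$ is the shared spatial effect, $p(s_i)$ is the probability that the plant at location $s_i$ falls into a given health category, and $\beta_s$ scales the shared effect in the mark model. A value of $\beta_s$ close to zero would suggest that the marks vary independently of the spatial structure underlying the pattern.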

### INLA in a nutshell

Conveniently, a new computationally efficient method for fitting a wide range of complex models has been developed. This method, called INLA (Rue, Martino & Chopin 2009), opens the possibility to analyse increasingly complex ecological data such as those we consider here. In general, INLA may be used to fit a large class of statistical models, the very flexible class of latent Gaussian models (details in the Appendix), in a Bayesian context. An underlying stochastic structure (called a ‘latent’ field) is contained in these models to account for temporal or spatial autocorrelation; given the latent field, the observations are assumed to be independent. Cox processes are an example of this class of models.

INLA is computationally efficient because it uses an approximation approach based on nested Laplace approximations rather than simulation-based methods such as MCMC. It is designed to fit latent Gaussian models in which spatial autocorrelation in the latent field is reflected by a Gaussian Markov random field (GMRF) (Rue & Held 2005). This is a spatially discrete stochastic process in which spatial dependence is restricted to suitably specified spatial neighbours, again increasing efficiency. INLA is much faster than MCMC and at the same time flexible and very accurate (Rue, Martino & Chopin 2009). We provide technical details for INLA in the Appendix.
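The basic building block, a Laplace approximation, replaces a posterior density by a Gaussian centred at its mode with a variance given by the inverse curvature at that mode. The sketch below illustrates this for a deliberately simple, hypothetical one-parameter model: Poisson counts with a common log rate $x$ and a zero-mean Gaussian prior on $x$. It is not the INLA algorithm and not the R-INLA implementation, which nest approximations of this kind over a high-dimensional latent field and its hyperparameters; the function name and toy data are assumptions.

```python
import math


def laplace_poisson_log_rate(counts, prior_precision=1.0, tol=1e-10, max_iter=100):
    """Gaussian (Laplace) approximation to the posterior of a Poisson log rate.

    Model (assumed for illustration): y_i ~ Poisson(exp(x)), x ~ N(0, 1/prior_precision).
    Returns the posterior mode and the approximate posterior standard deviation.
    """
    n, total = len(counts), sum(counts)
    x = math.log((total + 0.5) / n)  # start near the log of the sample mean
    for _ in range(max_iter):
        # First and second derivatives of the log posterior at x.
        grad = total - n * math.exp(x) - prior_precision * x
        hess = -n * math.exp(x) - prior_precision
        step = grad / hess
        x -= step  # Newton update towards the mode
        if abs(step) < tol:
            break
    curvature = n * math.exp(x) + prior_precision  # negative Hessian at the mode
    return x, 1.0 / math.sqrt(curvature)


if __name__ == "__main__":
    toy_counts = [3, 5, 4, 6, 2]  # hypothetical data
    mode, sd = laplace_poisson_log_rate(toy_counts)
    print("posterior mode of log rate:", mode)
    print("approximate posterior sd:", sd)
```

No sampling is involved: the cost is a handful of Newton steps, which is the source of the speed advantage over simulation-based fitting in this toy setting.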

Here, by way of example, we use INLA to fit a joint model to a spatial pattern and the marks derived from a study system on a protected plant species, using different likelihoods for the pattern and the marks. INLA enables us to fit this model and, because it is fast, we can also employ model comparison methods based on the deviance information criterion (DIC) to identify the best model out of a set of candidate models within a reasonable time, typically a few minutes.
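For reference, the DIC in its standard form combines a measure of fit with a penalty for model complexity. Writing $D(\theta) = -2 \log p(y \mid \theta)$ for the deviance, $\bar{D}$ for its posterior mean and $\bar{\theta}$ for the posterior mean of the parameters (notation introduced here, not taken from this article):

$$
p_D = \bar{D} - D(\bar{\theta}), \qquad \mathrm{DIC} = \bar{D} + p_D .
$$

Here $p_D$ is interpreted as the effective number of parameters, and among a set of candidate models fitted to the same data the model with the smallest DIC is preferred.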