
Fitting complex ecological point process models with integrated nested Laplace approximation


Correspondence author. E-mail: janine@mcs.st-and.ac.uk


  1. We highlight an emerging statistical method, integrated nested Laplace approximation (INLA), which is ideally suited for fitting complex models to many of the rich spatial data sets that ecologists wish to analyse.
  2. INLA is an approximation method that nevertheless provides very accurate estimates. In this article, we describe the INLA methodology, highlighting where it offers opportunities for drawing inference from (spatial) ecological data that would previously have been too complex for model fitting to be practical.
  3. We use INLA to fit a complex joint model to the spatial pattern formed by a plant species, Thymus carnosus, as well as to the health status of each individual.
  4. The key ecological result revealed by our spatial analysis of these data relates to the distance-to-water covariate. We find that T. carnosus plants are generally healthier when they are further away from the water.
  5. We suggest that this may be the result of a combination of (1) plants having alternative rooting strategies depending on how close to water they grow and (2) the rooting strategy determining how well the plants were able to tolerate an unusually dry summer.
  6. We anticipate INLA becoming widely used within spatial ecological analysis over the next decade and suggest that both ecologists and statisticians will benefit greatly from working collaboratively to further develop and apply these emerging statistical methods.


Ecological processes take place in space, and many ecological data sets are collected in space. As a result, there is a growing interest in spatial statistical methods and spatial statistical modelling (Beale et al. 2010). In general, the aim of a spatial analysis is either to (a) account for spatial autocorrelation or to (b) explicitly model spatial patterning. To be more specific, models of type (a) are models of some response variable in which spatial autocorrelation structures form part of the explanatory part of the model (Diggle & Ribeiro 2007). Examples exist of such spatial models both for data collected in continuous space and for data collected on a spatial lattice. In contrast, when models of type (b) are considered, the interest is in analysing the spatial patterns formed by individuals, as these can be used to characterize population dynamics and to determine the nature of the underlying processes leading to those dynamics (Law et al. 2009). In these models, the pattern itself, or rather its structure, is the response variable. These are typically treated within the context of spatial point process theory (Diggle 2003; Wiegand & Moloney 2004; Wiegand et al. 2007; Illian et al. 2012).

These two types of model reflect different aspects of ecological systems, so in many cases one would ideally want to consider both to obtain a better and more nuanced understanding of a system. For example, the data set discussed in this article provides information on both short-term survival (as reflected in the health status of the individuals of the species Thymus carnosus) and long-term survival (as reflected in the spatial pattern formed by these individuals). Rather than modelling these two non-independent aspects of the system in separate models, we illustrate how both can be treated within a single joint (or integrated) model. This combines the information contained in the data that would be used for the two separate models and reduces variability by assuming a shared spatial structure informed by more data (Brooks, King & Morgan 2004; King et al. 2009; Reynolds et al. 2009). An integrated analysis such as the joint model discussed here is often used to increase the precision of parameter estimates as information may be ‘borrowed’ across different data sets. Within statistical ecology, these joint models are becoming increasingly common (King et al. 2009).

Unfortunately, spatial models are computationally challenging, in particular in the context of realistically complex data sets, as incorporating spatial correlation structure dramatically increases the complexity of a model. A joint model adds complexity and provides an even greater computational challenge; existing software such as spatstat (Baddeley & Turner 2005), which is used for fitting simple point process models, cannot handle these models. Thus, our aim in this article is twofold: in addition to discussing the spatial statistical methodology that allows us to consider such a joint model, we also explain how this model can be fitted in a computationally feasible way. In particular, we introduce the ecological community to recent statistical developments based on integrated nested Laplace approximation (INLA; Rue, Martino & Chopin 2009) that substantially reduce the computational cost of fitting spatial models.

In this contribution, we introduce INLA and explain how it can be used in the analysis of a complex spatial model. We also highlight the potential for INLA and joint spatial modelling to be used in conjunction to analyse a wide range of spatial data. We provide the code for readers to work through the case study example themselves within the R package R-INLA. Finally, we make some suggestions for how this approach can be used to analyse further data sets, highlight some existing limitations in the statistical methods and argue that this is a field where there is substantial mutual benefit for ecologists and statisticians to work in close collaboration.

Analysing a complex spatial data set

This article has been motivated by a data set that details the exact locations of T. carnosus plants in a dune system in South West Spain along with the health status of each of these plants as well as environmental covariates that may potentially impact on the conservation of the plants. The exact details for this data set are discussed in the 'Application' section. The data were collected with the aim of revealing which factors determine the short-term health status of plants and the longer term spatial distribution of individuals. The health status of a plant reflects the degree to which local environmental conditions facilitated survival following a recent drought. Spatial heterogeneity in long-term fitness, on the other hand, is reflected in plant density in space.

Technically, when modelling the locations of the T. carnosus plants, we are modelling a spatial point pattern. Spatial point pattern analysis using summary characteristics such as Ripley's K-function has become increasingly used in ecology (Wiegand & Moloney 2004; Wiegand et al. 2007; Perry et al. 2008; Schiffers et al. 2008; Law et al. 2009; Martínez et al. 2010; Wang et al. 2010; Zhang et al. 2010; Brown et al. 2011). Empirical spatial point patterns may also be described by theoretical statistical models, spatial point processes, through the estimation and interpretation of model parameters based on samples, i.e. spatial point patterns. However, these models have been used much less often than summary characteristics (Neeff et al. 2005; Cornulier & Bretagnolle 2006; Wiegand et al. 2007; Lin et al. 2011). This is due to the fact that most ecological data sets are more complex than can be readily dealt with using classical statistical methods.

As we are also considering the health status (a ‘mark’) along with the pattern, this yields what is referred to as a ‘marked point pattern’. As it is likely that the health status is not independent of the local spatial structure, a suitable statistical model is a marked point process model, where the marks are assumed to depend on the spatial pattern through a shared spatial effect (for more information on marked point processes, see the Appendix). In the past, models where the marks depend on the pattern have rarely been considered, mainly due to computational costs (Møller & Waagepetersen 2007). This has severely constrained the analysis of many rich, spatial ecological data sets (Illian et al. 2012; Illian, Sørbye & Rue 2012). Here, we jointly model the marks and the pattern to account for the dependence, using a specific type of spatial point process model, a Cox process. While the health status marks are categorical in this specific example, a very similar approach could be used to model continuously valued marks such as plant height or age (Illian, Sørbye & Rue 2012).

A class of spatial point process models – Cox processes

Within the spatial point process toolbox, Cox processes represent a very flexible class of spatial point processes designed to model spatial point pattern data in the presence of observed and unobserved environmental variation (Møller, Syversveen & Waagepetersen 1998; Møller & Waagepetersen 2007). In Cox process models, spatial variation and autocorrelation are expressed through a random structure that is continuous in space. It is based on an underlying (or latent) random field Λ(·) that describes the intensity (=point density) of the point pattern, assuming independence among the points given this field. In other words, conditional on the random field, the point pattern may be described by the statistical model for complete spatial randomness, the Poisson process (Illian et al. 2008; Law et al. 2009). Due to the random field, Cox process models have a hierarchical structure making these processes particularly flexible as the field can be modelled in many ways. We exploit this here and focus on log-Gaussian Cox processes, as considered in Møller, Syversveen & Waagepetersen (1998) and Møller & Waagepetersen (2004, 2007). These belong to a specific subclass, where Λ(s) has the form

$$\Lambda(s) = \exp\{Z(s)\}.$$

Here, {Z(s)} is a Gaussian random field, s ∈ ℝ², i.e. for any locations s1, …, sl the vector (Z(s1), …, Z(sl)) follows a multivariate normal distribution. The exponential ensures that Λ(s) cannot take negative values.
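To make the hierarchical construction concrete, the following base R sketch simulates a log-Gaussian Cox process on a coarse grid. It is an illustration only: the grid size, the exponential covariance function and its parameters are assumptions chosen for the example and are not taken from the analysis in this article.

```r
# Minimal sketch: simulate a log-Gaussian Cox process on a grid (base R only).
set.seed(1)
n.side    <- 20                      # grid cells per side (assumed)
side      <- 25                      # side length of the window in metres
cell.area <- (side / n.side)^2

# Midpoints of the grid cells
mid  <- (seq_len(n.side) - 0.5) * side / n.side
grid <- expand.grid(x = mid, y = mid)

# Gaussian random field Z with exponential covariance (assumed: variance 1, range 5 m)
D     <- as.matrix(dist(grid))
Sigma <- exp(-D / 5)
Z     <- drop(t(chol(Sigma)) %*% rnorm(nrow(grid)))

# Random intensity Lambda(s) = exp(Z(s)), which is positive by construction
Lambda <- exp(Z)

# Conditional on Lambda, the cell counts are independent Poisson variables
counts <- rpois(nrow(grid), lambda = Lambda * cell.area)
```

Cells in which Z is large receive more points on average, so clustering in the simulated pattern arises from the latent field rather than from direct interaction among points.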

The practical fitting of Cox point process models to point pattern data is difficult due to intractable likelihoods. Fitting even simple Cox processes has typically used MCMC methodology, and has been extremely computationally expensive (Møller & Waagepetersen 2007) as well as largely inaccessible to non-specialists. Within the statistical literature rather simplistic models have been fitted that typically only consider a spatial pattern without marks. However, Illian & Rue (2010) and Illian, Sørbye & Rue (2012) have developed an approach that facilitates the fitting of realistically complex Cox process models based on INLA (Rue, Martino & Chopin 2009; see the Appendix for technical details). They provide a toolbox that enables non-specialists to develop and fit complex models to data using coding routines within the familiar software package R based on the library R-INLA. In particular, based on this approach, we can model marked point pattern data without an assumption of independence of the pattern and the marks (Ho & Stoyan 2008; Myllymäki & Penttinen 2009). This is achieved through fitting a joint model to both the pattern and the marks in which the dependence is accounted for by a shared spatial effect that is contained both in the explanatory part of the random field Λ(·) and the model for the marks.

INLA in a nutshell

Conveniently, a new computationally efficient method for fitting a wide range of complex models has been developed. This method, called INLA (Rue, Martino & Chopin 2009), opens the possibility to analyse increasingly complex ecological data such as those we consider here. In general, INLA may be used to fit a large class of statistical models, the very flexible class of latent Gaussian models (details in the Appendix), in a Bayesian context. An underlying stochastic structure (called a ‘latent’ field) is contained in these models to account for temporal or spatial autocorrelation; given the latent field, the observations are assumed to be independent. Cox processes are an example of this class of models.

INLA is computationally efficient because it relies on Laplace approximations rather than on simulation (MCMC). It is designed to fit latent Gaussian models in which spatial autocorrelation in the latent field is reflected by a Gaussian Markov random field (GMRF) (Rue & Held 2005). This is a spatially discrete stochastic process in which spatial dependence is restricted to suitably specified spatial neighbours, which further increases efficiency. INLA is much faster than MCMC and at the same time flexible and very accurate (Rue, Martino & Chopin 2009). We provide technical details for INLA in the Appendix.
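The basic idea of a Laplace approximation can be illustrated with a toy example in base R. The sketch below is not the INLA algorithm, which nests several such approximations; it only shows how a posterior can be approximated by a Gaussian centred at its mode. The data vector and the prior are assumptions made for the example.

```r
# Toy Laplace approximation for y_i ~ Poisson(exp(eta)), eta ~ N(0, 10^2).
y <- c(3, 5, 4, 6, 2)                  # hypothetical counts

# Unnormalized log posterior of eta
log.post <- function(eta) {
  sum(dpois(y, lambda = exp(eta), log = TRUE)) +
    dnorm(eta, mean = 0, sd = 10, log = TRUE)
}

# Step 1: locate the posterior mode
opt  <- optimize(log.post, interval = c(-5, 5), maximum = TRUE)
mode <- opt$maximum

# Step 2: curvature at the mode (negative second derivative, derived by hand)
prec       <- length(y) * exp(mode) + 1 / 10^2
laplace.sd <- 1 / sqrt(prec)

# Reference: posterior mean by brute-force numerical integration
eta.grid <- seq(-5, 5, length.out = 20001)
dens     <- exp(vapply(eta.grid, log.post, numeric(1)) - opt$objective)
num.mean <- sum(eta.grid * dens) / sum(dens)
```

The Gaussian with mean `mode` and standard deviation `laplace.sd` is the Laplace approximation to the posterior of `eta`; comparing `mode` with `num.mean` gives a feel for its accuracy in this simple case.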

Here, by way of example, we use INLA to fit a joint model to a spatial pattern and the marks derived from a study system on a protected plant species, using different likelihoods for the pattern and the marks. INLA enables us to fit this model and, because it is fast, we can also employ model comparison methods based on the deviance information criterion (DIC) to identify the best model out of a set of models within a reasonable time, typically a few minutes.


Study system

As a case study, we model the spatial pattern of the endangered plant species T. carnosus. The data set we use for this is relatively complex because it is a marked point pattern consisting of six replicates: two replicates for each of three different levels of livestock pressure. Importantly, the data consist of the health status of individual plants (a mark) as well as their x and y co-ordinates. The main purpose of this study is to present the methodology, and this data set provides an excellent example of how the methodology can be used, since it is of a complexity that is not unusual for an ecological data set but that has normally not been considered in the statistical literature. While we do not want to focus too much on the particular details of the study system, we briefly provide some contextual information below and refer the interested reader to other articles for further details.

Study area and vegetation community

The study area is the coastal dune system of the El Rompido sand spit, which is located at the mouth of the River Piedras (Gulf of Cadiz, SW Spain) (37°12′N, 7°07′W). The spit stretches east for about 12 km, is between 300 and 700 m in width and currently covers an area of 534·7 ha, of which 57% are interior sand dunes (Gallego-Fernández, Muñoz Vallés & Dellafiore 2006). The El Rompido spit supports diverse vegetation communities (Gallego-Fernández, Muñoz Vallés & Dellafiore 2006) and this includes 16 protected and/or endangered species that have been recorded in the area (Muñoz Vallés, Gallego-Fernández & Dellafiore 2009). The spit is subject to low tourist pressure. Grazing by domestic livestock (sheep and goats) is prohibited within the protected area.

Study species

Our focal species, Thymus carnosus Boiss. (Labiatae), is an evergreen coastal shrub, up to 0·5 m high, endemic to the coastal dunes of the southwestern Iberian Peninsula. The species is in danger of extinction in Spain (Cabezudo et al. 2005) and populations are also seriously declining in Portugal. The main driver of decline is habitat destruction of coastal dune systems by urbanization and tourism (Cabezudo et al. 2005). The coast of Huelva is one of the southern extremes of its distribution (Parra et al. 2000) and El Rompido spit retains the largest population found in Spain (Alés, Sánchez Gullón & Peña 2003), this being a major factor behind much of the spit's inclusion within a natural protected area.

Retama monosperma (L.) Boiss. (Leguminosae) is a leafless leguminous shrub, growing to a height of 3·5 m, native to the sandy soils of the southwest coast of the Iberian Peninsula. Retama monosperma was planted in the middle of the El Rompido spit in the 1930s as a dune stabilizing species (Gallego-Fernández, Muñoz Vallés & Dellafiore 2006) and over the period 1956–2001, the basal cover of R. monosperma has increased from 15 to 116 ha (Muñoz Vallés, Gallego-Fernández & Dellafiore 2009). This invasion has resulted in a profound change in the dune landscape from open plant communities to shrubland of variable density and constitutes a considerable threat to dune landscapes because it suppresses natural vegetation of coastal dunes of high conservation value (Muñoz Vallés et al. 2011). Recent studies in the study area have shown that T. carnosus is threatened both by the invasion of R. monosperma on dunes and by livestock pressure. When R. monosperma establishes in an area occupied by T. carnosus, competition between the two species for light and water can result in the eventual exclusion of T. carnosus. While sheep and goats do not eat T. carnosus, individual plants located beneath the R. monosperma canopy are strongly affected by trampling (Zunzunegui et al. 2012) – the livestock are attracted to R. monosperma and hence trampling pressure is greatest close to the invasive shrub.

In 2008, in most western populations of the El Rompido spit a high mortality of T. carnosus plants was observed and a high proportion of survivors had a poor health status. The spatial pattern of mortality/decline in health was apparently not homogeneous, resulting in higher mortality in lower areas of the dunes. This observation motivated the collection of the data which we analyse in this study.

Data on the location and health status of plants were collected at three study sites, each with a different level of livestock pressure: (a) High herbivory plots (High1 and High2) were located in the western part of the spit, outwith the protected area. The vegetation is dominated by a shrub community composed mainly of R. monosperma and T. carnosus. (b) Low herbivory plots (Low1 and Low2) were outside the protected area, but in a location where livestock access is less frequent. The vegetation is dominated by R. monosperma, T. carnosus and Artemisia campestris. (c) Non-herbivory plots (Nat1 and Nat2) were located inside the protected area where they are never accessible to livestock. The vegetation is composed mainly of a shrub community of R. monosperma, T. carnosus, Helichrysum picardii, Artemisia campestris and Crucianella maritima.

Data description

The data set comprises observations of point patterns in six plots (each 25 m × 25 m in size), two plots for each of three different levels of livestock pressure (‘High’, ‘Low’, ‘Nat’), where the area marked ‘Nat’ is not accessible to livestock. The two plots with a high level of livestock pressure are adjacent. For each plot, the data consist of the location of the individual T. carnosus plants as well as their health status, a mark that provides additional information on the individuals in the spatial pattern. Data on the health status have been collected on a scale from 0 (dead) to 4 (very healthy), which, for the purposes of this analysis, have been aggregated into two categories: dead or in poor health (0–2) and alive and healthy (3–4). Moreover, for each plot, covariate data on the location and size of the R. monosperma plants and the distance to the water-table have been collected. Table 1 in the Appendix displays a summary of the data for each plot. Figures 1–5 in the Appendix show the point pattern formed by T. carnosus (a), R. monosperma cover (b), distance from the water level (c) and distances to the nearest neighbours (d) for each of the plots.

Joint model of T. carnosus pattern and health status

Using INLA, we are able to fit a joint model to the spatial pattern and the health status, i.e. the marks. The spatial pattern formed by the plants reveals those areas where environmental conditions have been suitable for plant establishment and survival over the longer term while the health status of the plants provides complementary information, as it is anticipated to reflect the impact of the most recent extreme drought. Fitting a joint model hence allows us to assess the impact of drought on both short-term and long-term processes simultaneously. In other words, we take an integrated approach that allows covariates to impact differently on the spatial pattern and on the health status. Using a joint spatial effect, we can then account for both spatial autocorrelation and dependence between the pattern and the marks that cannot be explained by the empirical covariates.

Model description

To model the point pattern, we use a log-Gaussian Cox process construction. As INLA fits models that are based on discrete Gauss Markov random fields, we have to approximate the spatially continuous random field Λ(s) = exp (Z(s)) using a grid. Hence, to fit the model with INLA, the observation window in each of the k = 1, …, 6 plots is discretized into N = nrow × ncol grid cells {sijk} with area |sijk|, i = 1, …, nrow, j = 1, …, ncol and nrow = ncol = 40. Grids with a finer resolution have been used to assess if the results are influenced by the fineness of the grid, but produced essentially the same results. Let {yijk} denote the observed number of points in the grid cells for plot k. Due to the Cox process construction, the number of points in grid cell {sijk} follows a Poisson distribution given math formula, the value of a latent field in the same grid cell (see Rue, Martino & Chopin 2009):

display math(eqn 1)

Each individual T. carnosus plant has been classified according to health status. Let mijk be the number of plants categorized as being healthy in grid cell sijk in plot k. Given the value of a second latent field math formula in the same grid cell, mijk follows a binomial distribution

display math(eqn 2)

where math formula is the probability of plants being healthy and yijk is the total number of T. carnosus plants in grid cell sijk.
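The discretization step described above can be sketched in base R as follows. The plant coordinates and health marks are simulated here purely for illustration; the variable names (`px`, `py`, `healthy`, `y.count`, `m.count`) are hypothetical and do not come from the Appendix code.

```r
# Sketch: turn a marked point pattern into per-cell counts on a 40 x 40 grid.
set.seed(42)
n.plants <- 150
px <- runif(n.plants, 0, 25)                       # hypothetical x co-ordinates (m)
py <- runif(n.plants, 0, 25)                       # hypothetical y co-ordinates (m)
healthy <- rbinom(n.plants, size = 1, prob = 0.6)  # 1 = healthy, 0 = poor health

n.row <- 40
n.col <- 40
breaks.x <- seq(0, 25, length.out = n.col + 1)
breaks.y <- seq(0, 25, length.out = n.row + 1)

# Grid cell containing each plant, numbered column by column
col.id <- cut(px, breaks.x, include.lowest = TRUE, labels = FALSE)
row.id <- cut(py, breaks.y, include.lowest = TRUE, labels = FALSE)
cell   <- factor((col.id - 1) * n.row + row.id, levels = seq_len(n.row * n.col))

# Number of plants per cell and number of healthy plants per cell
y.count <- as.vector(table(cell))
m.count <- as.vector(tapply(healthy, cell, sum))
m.count[is.na(m.count)] <- 0                       # empty cells contain no healthy plants

# Area of each grid cell, used to scale the Poisson mean
cell.area <- rep((25 / n.col) * (25 / n.row), n.row * n.col)
```

Cells without any plants have zero binomial trials and therefore carry no information about health status, although they remain informative for the pattern.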

The main interest is now in constructing the models for the two latent fields math formula and math formula. The full models for the latent field math formula for the spatial pattern and math formula for the marks that will be considered are specified by

display math(eqn 3)
display math(eqn 4)

respectively. Here, β01 and β02 are intercepts and RC(sijk) is a covariate describing the degree of R. monosperma cover in grid cell sijk. WD(sijk) represents the distance from the terrain to the water level (measured as the altitude plus the depth of the water-table). The values of this covariate are not available at all grid cells and have therefore been interpolated from the original measurements. As the distribution of these distances is skewed, the values have been log-transformed. LSPk is the degree of livestock pressure for plot k. This is a categorical covariate (or ‘factor’). To ensure identifiability, we use a sum-to-zero constraint, as is common in models that contain factor variables. The β-parameters for the linear effects of R. monosperma cover and distance to water are unknown coefficients.

f(zc(sijk)) and g(zc(sijk)) are functions of a constructed covariate reflecting local interaction in grid cell sijk. Here, we use a constructed covariate representing the distance from the midpoint of each cell to the nearest point in the pattern outside the cell (see the Appendix for more detail). This reflects the local intensity in each grid cell and may be used as a measure of local competition. As we do not know if the dependence on this constructed covariate is linear, we fit a smooth function to it.
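As an illustration of the constructed covariate, the base R sketch below computes, for every grid cell, the distance from the cell midpoint to the nearest plant located outside that cell. The point pattern is simulated and the grid is deliberately coarse; all names are hypothetical, and the Appendix remains the reference for the version used in the analysis.

```r
# Sketch: distance from each cell midpoint to the nearest point outside the cell.
set.seed(7)
px <- runif(60, 0, 25)               # hypothetical plant co-ordinates (m)
py <- runif(60, 0, 25)

n.side <- 10                         # coarse grid for illustration
w      <- 25 / n.side                # cell width
mid    <- (seq_len(n.side) - 0.5) * w
grid   <- expand.grid(col = seq_len(n.side), row = seq_len(n.side))
grid$mx <- mid[grid$col]
grid$my <- mid[grid$row]

# Grid cell containing each plant
p.col <- pmin(floor(px / w) + 1, n.side)
p.row <- pmin(floor(py / w) + 1, n.side)

constructed <- vapply(seq_len(nrow(grid)), function(i) {
  # Exclude plants that lie inside the cell itself
  outside <- !(p.col == grid$col[i] & p.row == grid$row[i])
  if (!any(outside)) return(NA_real_)
  min(sqrt((px[outside] - grid$mx[i])^2 + (py[outside] - grid$my[i])^2))
}, numeric(1))
```

Small values indicate cells with close neighbours, that is, locally dense parts of the pattern. Excluding the plants inside the cell prevents the covariate from simply reproducing the response in that cell.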

math formula and math formula are GMRFs (spatially structured effects) describing the spatial autocorrelation not explained by the covariates. Finally, u(sijk) and v(sijk) are spatially unstructured random effects, i.e. random error terms. We aim to jointly fit the model to the point pattern and the marks using Eqns (eqn 3) and (eqn 4), expressing dependence between the pattern and the marks in this way. In this case, the spatial effect for the marks is proportional to the spatial effect for the pattern, math formula. Methods for model comparison may be used to check whether the full model in (eqn 3) and (eqn 4), or a submodel provides the best fit according to a model comparison criterion, here the DIC.
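For orientation, the structure described in this section can be written out as follows. This is a sketch under assumed notation: the symbols η⁽¹⁾ and η⁽²⁾ for the two latent fields, f_s⁽¹⁾ and f_s⁽²⁾ for the spatially structured effects, the superscripts on the LSP terms and the logit link for the binomial probability are assumptions introduced here, whereas the remaining symbols follow the text.

```latex
\begin{align*}
y_{ijk} \mid \eta^{(1)}_{ijk} &\sim \mathrm{Poisson}\bigl(|s_{ijk}| \exp\{\eta^{(1)}_{ijk}\}\bigr), \\
m_{ijk} \mid \eta^{(2)}_{ijk} &\sim \mathrm{Binomial}\bigl(y_{ijk},\, p_{ijk}\bigr),
\qquad p_{ijk} = \frac{\exp\{\eta^{(2)}_{ijk}\}}{1 + \exp\{\eta^{(2)}_{ijk}\}}, \\
\eta^{(1)}_{ijk} &= \beta_{01} + \beta_{11}\,RC(s_{ijk}) + \beta_{21}\,WD(s_{ijk}) + \mathrm{LSP}^{(1)}_{k}
  + f\bigl(z_c(s_{ijk})\bigr) + f^{(1)}_s(s_{ijk}) + u(s_{ijk}), \\
\eta^{(2)}_{ijk} &= \beta_{02} + \beta_{12}\,RC(s_{ijk}) + \beta_{22}\,WD(s_{ijk}) + \mathrm{LSP}^{(2)}_{k}
  + g\bigl(z_c(s_{ijk})\bigr) + f^{(2)}_s(s_{ijk}) + v(s_{ijk}),
\end{align*}
```

with the dependence between pattern and marks expressed through $f^{(2)}_s = \beta_s f^{(1)}_s$ for an unknown scaling parameter $\beta_s$.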

Specifying the model in R-INLA

We briefly explain here how the full model is specified in a call using the library R-INLA; submodels are specified by leaving out the appropriate terms in the model specification. Detailed code for running the model discussed here – including the appropriate data transformation – can be found in the Appendix.

The joint model for both latent fields is specified in a single model specification. In general, the model can be specified within the call to the function inla which uses the approximation algorithm based on INLA. However, this can look very complicated. Hence, for the sake of the exposition, we explain this in two separate steps to make the code easier to read. We initially describe how the model for the latent field is specified as a model formula in R and then describe the call to the function inla afterwards.

As we are fitting a joint model to both the marks and the spatial pattern, we have two separate response variables. These have to be stored in a matrix (called outcome.matrix below) with two columns, one for each outcome variable. We also have to specify separate intercepts (beta.pat and beta.status) for each of the two components as well as separate explanatory variables for the degree of R. monosperma cover (retama.pat and retama.status) and for the distance to the water-table (topo.pat and topo.status). Any nonlinear effects are specified by f(.). This notation is used for the random effect accounting for the different levels of livestock pressure (lsp.pat and lsp.status), where the model is specified as iid. It is also used for the constructed covariate (constructed.pat and constructed.status; here the model is a one-dimensional CAR model of order 1, rw1) and the spatial effect (I.pat and I.status; here the model is a two-dimensional CAR model of order 2, rw2d). For each of the two response variables, the model for the spatial effect is chosen to be the same across all replicates, i.e. across the six plots, including the choice of the hyperparameters. This is achieved by specifying the relevant plot for each grid cell using the command replicate.
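The two-column response can be assembled in base R as sketched below. The layout shown, in which each row contributes to exactly one likelihood and the other column is NA, is the one commonly used with R-INLA for models with several likelihoods. The counts are hypothetical and the exact data preparation used for the analysis is given in the Appendix.

```r
# Sketch: stack two responses so that each row belongs to one likelihood only.
n <- 6                                # number of grid cells (tiny, for illustration)
y.count   <- c(0, 2, 1, 0, 3, 1)      # hypothetical plants per cell
m.count   <- c(0, 1, 1, 0, 2, 0)      # hypothetical healthy plants per cell
cell.area <- (25 / 40)^2              # area of one cell of a 40 x 40 grid

# Rows 1..n: Poisson part (column 1); rows n+1..2n: binomial part (column 2)
outcome.matrix <- cbind(c(y.count, rep(NA, n)),
                        c(rep(NA, n), m.count))

# Component-specific intercepts: 1 where the component is active, NA elsewhere
beta.pat    <- c(rep(1, n), rep(NA, n))
beta.status <- c(rep(NA, n), rep(1, n))

# Cell areas apply to the Poisson rows, numbers of trials to the binomial rows
Area    <- c(rep(cell.area, n), rep(NA, n))
Ntrials <- c(rep(NA, n), y.count)
```

The other component-specific covariates (for example retama.pat and retama.status) would be built in the same way, with NA in the rows belonging to the other component.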

    # no common intercept (-1); separate intercepts and linear effects per component
    formula = outcome.matrix ~ -1 + beta.pat + beta.status +
      retama.pat + retama.status + topo.pat + topo.status +
      # livestock pressure as an iid random effect
      f(lsp.pat, model="iid") + f(lsp.status, model="iid") +
      # smooth effect of the constructed covariate
      f(inla.group(constructed.pat), model="rw1", hyper=param.cc) +
      f(inla.group(constructed.status), model="rw1", hyper=param.cc) +
      # spatial effects, replicated across the six plots
      f(I.pat, model="rw2d", nrow=2*n.columns, ncol=n.columns,
        replicate=plot.pat, hyper=param.spatial) +
      f(I.status, model="rw2d", nrow=2*n.columns, ncol=n.columns,
        replicate=plot.status, hyper=param.spatial)

Once this has been specified we can call the function inla as follows:

    result = inla(formula, family = c("poisson", "binomial"),
                  data = outcome.matrix, Ntrials = Ntrials, E = Area,
                  control.compute = list(dic=T))

Here, we need to specify the two different distributions for the two response variables using family = c("poisson", "binomial"). For the Poisson distribution, we specify the size of the area of the cells, E = Area, while, for the binomial case, we specify the number of trials, i.e. the number of plants per cell. The term control.compute=list(dic=T) may be included so that the DIC is calculated as well (Spiegelhalter et al. 2002). The hyperparameters have to be chosen carefully; in particular, for the spatial effects it is important to choose parameters such that the spatial effect is smooth. This is critical for avoiding overfitting, because a spatial effect that is not smooth enough can potentially explain every single point in the pattern. In this case, the spatial effect would make any empirical covariates redundant and defeat both the purpose of the model and the use of the spatial effect: it would explain any spatial variation in the data by being an almost exact copy of the data, and would therefore be unable to distinguish between the effect of the covariates and any remaining spatial structure. The specific parameters chosen here may be found in the code in the Appendix.
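The role of the precision hyperparameter in controlling smoothness can be illustrated with a first-order random walk, the structure behind the rw1 model. In the base R sketch below, the precision values are arbitrary choices for illustration and are unrelated to the settings stored in param.cc or param.spatial in the Appendix.

```r
# Sketch: a first-order random walk has independent Gaussian increments
# with precision tau; a larger tau gives a smoother realization.
set.seed(3)
n <- 200
simulate.rw1 <- function(n, tau) cumsum(rnorm(n, mean = 0, sd = 1 / sqrt(tau)))

rough  <- simulate.rw1(n, tau = 1)     # low precision: wiggly effect
smooth <- simulate.rw1(n, tau = 100)   # high precision: smooth effect

# Mean squared increment as a simple measure of roughness
roughness <- function(z) mean(diff(z)^2)
c(rough = roughness(rough), smooth = roughness(smooth))
```

A prior that favours large precisions therefore pulls the fitted effect towards smooth shapes, which is what prevents the effect from tracking individual points.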

To find the best possible model for the given data set, we evaluate several submodels of the joint model described in (eqn 3) and (eqn 4), using DIC for model comparison and finding posterior estimates for relevant parameters. Initially, we fit a model without the constructed covariates and spatial effects to assess which of the empirical covariates are significant (see section 'Assessing the influence of empirical covariates'). In the section 'Adding constructed covariates and spatial effects', we move on to include the constructed covariates and a common spatial effect for the pattern and the marks. The main aim of including these terms is to account for additional small- and large-scale structure not explained by the empirical covariates. Through this, we are able to better understand the spatial structure in the data and relate it to the potential ecological processes that have caused it, such as dispersal mechanisms, or to suggest associations with unobserved covariates.


Assessing the influence of empirical covariates

Separate DIC-values for the pattern and the marks gained from running models with the intercepts, the unstructured fields and different subsets of the empirical covariates are given in Table 1. For the intensity of the pattern, we notice that all the empirical covariates are relevant to the model as the DIC increases if any of these terms are left out. However, there is no evidence that R. monosperma cover directly impacts on the health status of the plants.

Table 1. Separate DIC values for pattern and marks including intercepts, empirical covariates and error fields; RC refers to Retama monosperma cover, WD to the distance from the terrain to the water level and LSP to livestock pressure

Model                                  DIC (pattern)   DIC (marks)
No empirical covariates                13 055          3273
Empirical covariates RC, WD and LSP    12 167          2358
  Without RC                           12 475          2358
  Without WD                           12 183          2364
  Without LSP                          12 491          3205
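The comparison in Table 1 can be reproduced with a few lines of base R that compute the increase in DIC relative to the model containing all three empirical covariates. The DIC values are copied from the table; the object and column names are hypothetical.

```r
# DIC values from Table 1
dic <- data.frame(
  model   = c("none", "RC+WD+LSP", "without RC", "without WD", "without LSP"),
  pattern = c(13055, 12167, 12475, 12183, 12491),
  marks   = c(3273, 2358, 2358, 2364, 3205)
)

# Increase in DIC relative to the model with all empirical covariates
full <- dic[dic$model == "RC+WD+LSP", ]
dic$d.pattern <- dic$pattern - full$pattern
dic$d.marks   <- dic$marks - full$marks
dic
```

Dropping RC leaves the DIC for the marks unchanged, whereas dropping any covariate increases the DIC for the pattern.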

Significance of the empirical covariates may also be assessed by calculating posterior means, standard deviations and credible intervals for each term (see Table 2). These results support the conclusions already made. The negative posterior mean indicates that R. monosperma cover has a negative impact on the location of the T. carnosus plants; this is reasonable because only few T. carnosus plants grow underneath R. monosperma plants. However, the competitive effect of R. monosperma is not significant for the health status of the T. carnosus plants and is hence not considered in the final model. Hence, competition with R. monosperma impacts on the long-term establishment of the plants in the environment, but it does not impact on short-term survival.

Table 2. Posterior mean, standard deviation and 95% pointwise credible intervals for fixed effects; RC refers to Retama monosperma cover and WD to the distance from the terrain to the water level

Parameter                     Mean      SD       2·5% quant.   97·5% quant.
Intercept for pattern β01     −0·828    0·109    −1·047        −0·618
RC for pattern β11            −1·007    0·056    −1·117        −0·898
WD for pattern β21            0·096     0·023    0·051         0·143
Intercept for marks β02       0·402     0·306    −0·203        0·998
RC for marks β12              −0·266    0·175    −0·606        0·079
WD for marks β22              0·180     0·066    0·052         0·311

The distance to the water-table has a significant positive effect on both the location and the health status of the T. carnosus plants. This indicates that the density of T. carnosus plants is higher in areas where the water-table is low and that these plants are also healthier. The level of livestock pressure (LSP) is seen to impact on both the intensity of the pattern and on the health status of the plants, as the credible intervals for all levels of LSP exclude 0 (results not shown). However, to more fully account for random structure due to different study regions, and to provide a better understanding of spatial processes in the data, spatially structured effects should also be included in the model.
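The summaries in Table 2 can be examined with a short base R sketch that checks which 95% credible intervals exclude zero and converts the posterior means to multiplicative effects. The numbers are taken from Table 2. Interpreting exp(β) for the marks as an odds ratio assumes a logit link for the binomial model, which is an assumption made here.

```r
# Posterior summaries from Table 2
fixed <- data.frame(
  term  = c("beta01", "beta11", "beta21", "beta02", "beta12", "beta22"),
  mean  = c(-0.828, -1.007, 0.096, 0.402, -0.266, 0.180),
  lower = c(-1.047, -1.117, 0.051, -0.203, -0.606, 0.052),
  upper = c(-0.618, -0.898, 0.143, 0.998, 0.079, 0.311)
)

# A 95% credible interval that excludes zero indicates a clear effect
fixed$excludes.zero <- fixed$lower > 0 | fixed$upper < 0

# Multiplicative effect per unit increase in the covariate:
# on the intensity for the pattern, on the odds of being healthy for the marks
fixed$multiplier <- exp(fixed$mean)
fixed
```

For example, the multiplier for β11 is below 1, meaning that the expected intensity decreases as R. monosperma cover increases. As WD is log-transformed, its multipliers refer to a unit change in log distance.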

Adding constructed covariates and spatial effects

We now add constructed covariates and a joint spatially structured effect to account for local clustering and random large-scale variation impacting on short- and long-term survival, respectively, not explained by the empirical covariates. As mentioned, these effects might easily be overfitted to the actual pattern making the empirical covariates in the model redundant. Thus, the prior parameters for these effects need to be chosen carefully to avoid overfitting. We choose to estimate joint spatial effects for the pattern and the marks, for each of the given plots.

Table 3 summarizes the DIC values for various joint models for the pattern and the marks as the different terms are added to the model. The final model with the lowest DIC, using a common spatial effect, is the following:

display math(eqn 5)
display math(eqn 6)

in which the estimated value of βs is 1·343. The constructed covariate is significant for the pattern, but the model fit does not improve if it is included in the model for the marks. Figure 1 shows the estimated functional relationship between the constructed covariate and the spatial pattern. The plot reveals that the plants are locally clustered (up to around 2 m), as the curve shows that the intensity of the pattern is high if the constructed covariate, i.e. the distance to the nearest neighbour, is low. The same constructed covariate is non-significant for the health status resulting in a flat curve (result not shown). In other words, the model does not indicate that the health status is worse or better in areas where the pattern is locally clustered than in areas where the plants do not cluster locally.
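
To make the idea of a constructed covariate concrete, the following minimal sketch computes, for each cell of a regular grid, the distance from the cell centre to the nearest observed plant. It is not the implementation used for the analysis: the toy coordinates, the 1 m cell size and the function name `nearest_point_distance` are hypothetical, and the exact definition used in the article (for instance, how a plant inside the cell itself is treated) may differ.

```python
import numpy as np


def nearest_point_distance(centres, points):
    """Distance from each grid-cell centre to its nearest observed point."""
    # Pairwise coordinate differences, shape (n_centres, n_points, 2).
    diff = centres[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist.min(axis=1)


# Hypothetical toy data: three plants in a 4 m x 4 m plot with 1 m cells.
points = np.array([[0.5, 0.5], [0.7, 0.6], [3.5, 3.5]])
edges = np.arange(0.0, 5.0, 1.0)        # cell boundaries at 0, 1, ..., 4
mids = (edges[:-1] + edges[1:]) / 2     # cell centres at 0.5, 1.5, 2.5, 3.5
gx, gy = np.meshgrid(mids, mids, indexing="ij")
centres = np.column_stack([gx.ravel(), gy.ravel()])

covariate = nearest_point_distance(centres, points)
print(covariate.reshape(4, 4).round(2))
```

Small values of this covariate mark cells close to existing plants, so a decreasing fitted function such as the one in Figure 1 corresponds to local clustering.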

Table 3. Summary of DIC values for joint models of the pattern and marks, having increasing complexity

Random field model                      DIC
Only intercepts                         17189
Add unstructured terms                  16328
Add significant empirical covariates    14525
Add constructed covariates              13877
Add common spatial effect               13593

Figure 1. Effect of the constructed covariate on the log intensity of the point pattern formed by the Thymus carnosus plants as a smooth function with 95% credibility intervals (dashed lines).
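
The model comparison in Table 3 amounts to simple arithmetic on the reported DIC values. The short sketch below, with variable names chosen here for illustration, computes the reduction in DIC obtained at each step and identifies the model with the lowest DIC.

```python
# DIC values transcribed from Table 3, in the order the terms were added.
dic = [
    ("Only intercepts", 17189),
    ("Add unstructured terms", 16328),
    ("Add significant empirical covariates", 14525),
    ("Add constructed covariates", 13877),
    ("Add common spatial effect", 13593),
]

# Reduction in DIC achieved by each added term relative to the previous model.
drops = [
    (name, prev_value - value)
    for (_prev_name, prev_value), (name, value) in zip(dic, dic[1:])
]

# The preferred model is the one with the lowest DIC.
best_model = min(dic, key=lambda row: row[1])[0]

for name, drop in drops:
    print(f"{name}: DIC decreases by {drop}")
print("Lowest DIC:", best_model)
```

Every added term lowers the DIC, with the largest single reduction coming from the empirical covariates.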

Figure 2 shows the estimated common spatially structured effect, i.e. the residual spatial autocorrelation unexplained by the covariates, for each of the plots. These surfaces are clearly neither flat nor uniform in space, revealing residual spatial autocorrelation in the data that cannot be explained by the covariates alone. A careful inspection of these surfaces might help to identify additional covariates that affect the establishment of T. carnosus plants and might improve the model.

Figure 2. Estimated common spatial trend for the spatial pattern and marks (posterior mean) in each of the five plots.

For more specific results, we may consider the posterior distribution for the explanatory variables. The posterior means, standard deviations and 95% credible intervals for the intercepts, the degree of R. monosperma cover and the distance to the water-table in the final model are summarized in Table 4. We notice that the empirical covariates are still significant after the constructed covariate and the spatial effect have been added. The effect of livestock pressure on the intensity of the pattern and on the marks (posterior mean and 95% credible intervals) is illustrated in Fig. 3. Livestock pressure clearly has a strong effect on the health status of the plants. Not surprisingly, plants seem to be healthier at a low level of livestock pressure, while a high level of livestock pressure worsens the health status. The non-herbivory plots (‘Nat’) are inaccessible to livestock but have a high percentage of R. monosperma cover, and the number of T. carnosus plants here is lower than in the other plots.

Table 4. Posterior mean, standard deviation and 95% pointwise credible intervals for fixed effects of the final model; RC refers to Retama monosperma cover and WD to the distance from the terrain to the water level

Term                         Mean      SD       2·5% quant.   97·5% quant.
Intercept for pattern β01    −3·214    0·432    −4·085        −3·206
RC for pattern β11           −0·404    0·059    −0·521        −0·404
WD for pattern β21            0·066    0·028     0·013         0·065
Intercept for marks β02       0·332    0·366    −0·389         0·333
WD for marks β22              0·150    0·068     0·017         0·150

Figure 3. Effect of livestock pressure on the intensity of the spatial pattern (a) and health status (b) of the Thymus carnosus plants.
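
The coefficients for the pattern can be read on a multiplicative scale. The sketch below assumes that the pattern component uses a log link for the intensity (consistent with the "log intensity" in the caption of Figure 1), so that a one-unit increase in a covariate multiplies the intensity by exp(β) when all other terms are held fixed. Only the posterior means from Table 4 are used; the units of the covariates are those used in the fitted model and are not restated here.

```python
import math

# Posterior means for the pattern component, transcribed from Table 4.
pattern_means = {"RC": -0.404, "WD": 0.066}

# Assumed log link: a one-unit increase in a covariate multiplies the
# intensity by exp(beta), holding all other terms fixed.
multipliers = {name: math.exp(beta) for name, beta in pattern_means.items()}

for name, factor in multipliers.items():
    print(f"{name}: intensity multiplied by {factor:.3f} per unit increase")
```

A multiplier below 1 (RC) corresponds to a lower expected density of plants as the covariate increases, and a multiplier above 1 (WD) to a higher one.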


Discussion

In this contribution, we have highlighted the potential for using an emerging statistical methodology, INLA, within the context of spatial ecological data. We have explained how it promises to facilitate the analysis of more complex spatial data sets than has to date been possible and have demonstrated this potential using a typically complex data set of spatial plant distributions that, in this case, includes individual health status as well as spatial covariates. We anticipate that INLA will have two major impacts on the inferences we make from spatial ecological data. The first is that it promises to substantially improve the robustness of the sorts of inferences that we are already making; this is because it enables the real complexity that exists in many ecological data sets to be more fully incorporated. The second benefit is that it will make new inferences possible that could not have been considered previously. In particular, these are likely to relate to gaining insights into processes and patterns that operate simultaneously or at different levels of a system, such as the different temporal scales in the study data set. Similarly, several types of data that inform on the same or related processes may be analysed in one integrated model. This includes situations where data are available from a number of sources of differing quality and we can gain substantially from jointly exploiting all the information contained in these.

In this discussion, we will first provide some relatively brief ecological interpretation of the results gained for our case study before turning to the main focus of the article, which is the application of INLA in spatial ecological analysis in general. Here, we will describe the current state of the statistical field and explain what is and what is not currently possible using INLA and suggest some promising potential avenues where ecological analysis may progress rapidly using the currently available methods. Finally, we will highlight where further work between ecologists and statisticians will be required to develop the methodology such that it is able to deal with an even greater range of spatial ecological data sets.

Ecological discussion

Our analysis of the marked point pattern (i.e. the spatial distribution of plants according to health status) yields some clear results. It confirms that T. carnosus is found much less frequently in the proximity of R. monosperma (under R. monosperma canopy). Given our expectation that R. monosperma is a strong competitor, it is not surprising that we find substantially reduced densities of T. carnosus near R. monosperma. In addition, in sites with higher livestock disturbance, we believe the reduced T. carnosus density under the canopy is due to a trampling effect of the livestock, which are often located in the proximity of the R. monosperma. In terms of health status, we find no effect of R. monosperma presence on T. carnosus. This suggests that, in a particularly dry year, R. monosperma presence does not have a short-term impact on local T. carnosus plants. From this result, we might hypothesize that the longer term negative effect of R. monosperma on T. carnosus (that we do observe in the data) is due more to competition for light than to competition for water.

The most interesting result revealed by our spatial analysis relates to the distance-to-water covariate: while T. carnosus plants are typically at higher density close to water, they are generally healthier when they are further from the water. The Mediterranean-type climate is characterized not only by a strong seasonal variability of rainfall, with cool, wet winters and hot, dry summers, but with unpredictable alternating years of severe drought with others of high precipitation rates. So, following an unusually dry year, we observe higher mortality of individuals that are growing closer to the water-table, a result that, at first sight, seems counterintuitive and warrants some explanation. In common with all other species occupying the harsh environment represented by the Mediterranean dunes, T. carnosus has to be well-adapted to water stress which, especially during summer, can be substantial. Plants living in such water-limited ecosystems have evolved a range of rooting strategies that enable them to avoid serious water-deficit (Larcher 1995; Rodriguez-Iturbe et al. 2001; Collins & Bras 2007; Viola et al. 2008); these include both intensive exploitation strategies involving roots and transpiration systems that rapidly respond to intermittent and unpredictable rainfall events during the summer months and extensive exploitation strategies with roots that extend deeper and enable individuals to benefit from soil moisture at much greater depths. Many species are characterized as utilizing mainly one of these rooting strategies (Viola et al. 2008; Jenerette et al. 2012), such as dimorphic root systems (Dawson & Pate 1996). However, T. carnosus is quite plastic and can use both strategies to a greater or lesser extent depending upon local environmental conditions. In the absence of a water-table near the surface, the species typically develops a root system capable of taking water from precipitation or condensation on the surface of soil (a more intensive strategy). 
However, when groundwater is close (<1·5 m), the root system of T. carnosus is dimorphic, with some shallow roots but also deeper roots that can reach groundwater. We hypothesize that the plasticity in rooting strategy provides the likely explanation for our observation that the plants growing closer to the water-table are the ones to suffer the most from an unusually dry summer. We suggest that these individuals are likely to be much more reliant on the deeper water accessed by their extensive rooting system and have invested much less heavily in an intensive rooting system that would equip them to access the water available near the surface from light precipitation or condensation. So, when the water-table drops, they are likely to suffer a much greater water-deficit than those individuals with a more intensive rooting system that do not rely on the deeper water. This type of rooting strategy would correspond with the response found by Zunzunegui, Caldeira-Díaz-Barradas & Novo (2000) in another Mediterranean species, Halimium halimifolium. Even though the water-table was further away for plants at the top of the dune, Halimium halimifolium plants from this site exhibited better physiological and vegetative responses than Halimium halimifolium plants growing in the dune slack. It was suggested that the dune-slack individuals, acclimated to permanent water availability, could be more sensitive to drought events than those at the top of the dune, which never reached the water-table. Our result provides an interesting example of how plastic responses to spatially heterogeneous environmental conditions may make the response of individuals to environmental stress inherently hard to predict.

Methodological discussion

In this article, we discuss a marked spatial point process model and jointly fit this model to both the spatial pattern formed by individual plants and the associated marks. Using INLA enables us to fit this complex point process model at relatively little computational cost, while it would be computationally prohibitive to do this with standard MCMC methods (see Rue, Martino & Chopin 2009 for comparisons of running times). In addition, the full model and appropriate submodels may be considered to allow for model comparison. Certainly, INLA may be applied to fit many other complex point process models. This includes other marked point processes such as multivariate models, and models with marks following other distributions, such as normal for continuous marks, Poisson for count data, zero-inflated Poisson, etc. Similarly, INLA also facilitates the integrated analysis of other joint models such as models of a spatial pattern and spatial covariates that account for measurement error in the covariates (Illian, Sørbye & Rue 2012) or spatio-temporal point patterns. The latter constitute an emerging field within statistics (Diggle 2007) and this promises to open even more opportunities for analysis of ecological data.

In discussing the data example here, we aim to introduce an ecological audience to spatial modelling based on INLA by fitting a latent Gaussian model, in particular a marked Cox process model, to an ecological data set. Many spatial point process models, including Poisson models (Aarts, Fieberg & Matthiopoulos 2012) and Gibbs process models (Baddeley & Turner 2005), do not assume a latent random model but are based on a deterministic trend. Modelling the spatial trend in these models hence often assumes that an explicit and deterministic model of the trend as a function of location (and spatial covariates) is known (Baddeley & Turner 2005). The estimated values of the underlying spatial trend are considered fixed values, which are subject neither to stochastic variation nor to measurement error. As it is based on a latent random field, the approach discussed here differs from these approaches in assuming a hierarchical, doubly stochastic structure. This provides a flexible class of point process models which assume that spatial trends exist in the data that cannot be accounted for by the covariates. The spatial trend is hence not regarded as deterministic, but assumed to be a random field.
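
The doubly stochastic structure can be illustrated with a small simulation on a grid. This is a sketch under simplifying assumptions rather than the model fitted in the article: the latent Gaussian field is obtained by smoothing white noise with a moving average (a stand-in for a proper spatial prior), and the grid size, cell area and intercept `beta0` are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n = 20            # grid cells per side (hypothetical)
cell_area = 1.0   # area of each cell (hypothetical units)
beta0 = 0.5       # intercept on the log-intensity scale (hypothetical)

# Stage 1: draw a latent Gaussian field. White noise is smoothed with a
# 5 x 5 moving sum so that neighbouring cells are correlated.
noise = rng.normal(size=(n + 4, n + 4))
field = np.zeros((n, n))
for dx in range(5):
    for dy in range(5):
        field += noise[dx:dx + n, dy:dy + n]
field /= 5.0      # sum of 25 independent N(0, 1) terms, rescaled to variance 1

# Stage 2: conditional on the field, cell counts are independent Poisson
# variables whose means are given by the random intensity.
intensity = np.exp(beta0 + field)
counts = rng.poisson(intensity * cell_area)

print("total points:", counts.sum())
```

Repeating the simulation with a different seed changes the intensity surface itself as well as the counts, which is what distinguishes a Cox process from a Poisson process with a fixed, deterministic trend.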

In general, analysing the spatial pattern formed by individuals in space is not necessarily the interest of all ecological studies involving spatial data and hence point process models are certainly only one type of spatial model that is relevant here. As the class of latent Gaussian models is very general, many other spatial (and indeed non-spatial) data structures may be fitted with INLA. For instance, similar modelling techniques may also be applied to geostatistical data, i.e. a situation where the aim is to fit a spatially continuous model to measurements taken at a finite number of discrete locations (Diggle & Ribeiro 2007). This includes situations where preferential sampling is likely to have occurred (Diggle, Menezes & Su 2010). Similarly, models for data that have been collected on a – regular or irregular – spatial grid can also be fitted taking a strongly related approach to the model discussed here (Rue & Held 2005). In other words, while we discuss one specific example here, the INLA methodology is generally applicable to many other spatial models.

It is worth mentioning that many other complex data structures that are not necessarily spatial may be fitted with INLA – in a Bayesian setting. Examples include models with random effects, dynamic linear models, stochastic volatility models, generalized linear (mixed) models, generalized additive (mixed) models, spline smoothing, semiparametric regression, space-varying (semiparametric) regression models, disease mapping, spatio-temporal models, survival models etc. (see Rue, Martino & Chopin 2009). While INLA facilitates the fitting of increasingly complex models, there will inevitably be eventual limitations. In particular, an increase in the number of hyperparameters will eventually also slow down INLA.

The current approach uses a regular spatial grid and approximates both the latent field and the spatial pattern by this grid. As a result, a dense lattice has to be used to be as exact as possible. Recent statistical developments that approximate the random field by the solution to a stochastic partial differential equation (SPDE) defined on a triangulation avoid these issues. Here, the resolution of the spatial component can be locally controlled (Lindgren, Rue & Lindström 2011). The combination of this SPDE approach with INLA is currently under development. This will allow for more flexible models to be fitted since the spatial field and hence the latent process may be defined to account for phenomena relevant in realistic data sets such as varying boundary conditions or observation windows with holes (Simpson et al. 2011).
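
The grid approximation of a point pattern can be sketched as counting the points that fall in each cell. The example below is illustrative only: the four plant locations, the 2 m x 2 m window and the helper `grid_counts` are hypothetical, and it simply contrasts a coarse and a fine grid over the same window.

```python
import numpy as np


def grid_counts(points, x_edges, y_edges):
    """Count the points falling in each cell of a regular grid."""
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1], bins=[x_edges, y_edges]
    )
    return counts.astype(int)


# Hypothetical pattern: four plants in a 2 m x 2 m window.
points = np.array([[0.2, 0.3], [0.4, 0.1], [1.5, 0.5], [1.9, 1.9]])

# Coarse grid: 2 x 2 cells of side 1 m. Fine grid: 8 x 8 cells of side 0.25 m.
coarse = grid_counts(points, np.linspace(0, 2, 3), np.linspace(0, 2, 3))
fine = grid_counts(points, np.linspace(0, 2, 9), np.linspace(0, 2, 9))

print(coarse)
print("cells in fine grid:", fine.size)
```

On the coarse grid two plants share a cell and their relative positions are lost, whereas the fine grid separates all four at the cost of sixteen times as many cells, which is the trade-off that motivates the locally adaptive triangulation.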

In summary, INLA already provides considerable opportunities for the fitting of spatial ecological data that would previously have been impossible to fit using other approaches. Although most often ecologists will apply newly emerging statistical methods some time (often some considerable time) after they have been initially developed by the statisticians, the development and application of the methods can, in this case, benefit substantially from the close working together of spatial ecologists and statisticians. There are many ways in which INLA can be further developed such that it is able to be used for analysis of a greater range of spatial data and ecologists with an intimate knowledge of their data, and of the key questions they want to explore using their data, can help to prioritize the directions future statistical developments take. The ecologists benefit by having methods available to address questions they may otherwise be unable to answer while the statisticians benefit by having access to ecological data exhibiting interesting statistical properties that may often demand the development of new statistical approaches. We hope and anticipate that over the next few years we will witness a rapid development of these statistical methods driven, at least in part, by a recognition that they offer enormous potential to provide novel insights into ecological processes through the analysis of complex spatial data.