In this contribution, we have highlighted the potential for using an emerging statistical methodology, INLA, within the context of spatial ecological data. We have explained how it promises to facilitate the analysis of more complex spatial data sets than has to date been possible and have demonstrated this potential using a typically complex data set of spatial plant distributions that, in this case, includes individual health status as well as spatial covariates. We anticipate that INLA will have two major impacts on the inferences we make from spatial ecological data. The first is that it promises to substantially improve the robustness of the sorts of inferences that we are already making; this is because it enables the real complexity that exists in many ecological data sets to be more fully incorporated. The second benefit is that it will make new inferences possible that could not have been considered previously. In particular, these are likely to relate to gaining insights into processes and patterns that operate simultaneously or at different levels of a system such as the different temporal scales in the study data set. Similarly, several types of data that inform on the same or related processes may be analysed in one integrated model. This includes situations where data are available from a number of sources with a different quality and we can substantially gain from jointly exploiting all the information contained in these.
In this discussion, we will first provide some relatively brief ecological interpretation of the results gained for our case study before turning to the main focus of the article, which is the application of INLA in spatial ecological analysis in general. Here, we will describe the current state of the statistical field and explain what is and what is not currently possible using INLA and suggest some promising potential avenues where ecological analysis may progress rapidly using the currently available methods. Finally, we will highlight where further work between ecologists and statisticians will be required to develop the methodology such that it is able to deal with an even greater range of spatial ecological data sets.
Our analysis of the marked point pattern (i.e. the spatial distribution of plants according to health status) yields some clear results. It confirms that T. carnosus is found much less frequently in the proximity of R. monosperma (under R. monosperma canopy). Given our expectation that R. monosperma is a strong competitor, it is not surprising that we find substantially reduced densities of T. carnosus near R. monosperma. In addition, in sites with higher livestock disturbance, we believe the reduced T. carnosus density under the canopy is due to a trampling effect of the livestock, which are often located in the proximity of R. monosperma. In terms of health status, we find no effect of R. monosperma presence on T. carnosus. This suggests that, in a particularly dry year, R. monosperma presence does not have a short-term impact on local T. carnosus plants. From this result, we might hypothesize that the longer term negative effect of R. monosperma on T. carnosus (that we do observe in the data) is due more to competition for light than to competition for water.
The most interesting result revealed by our spatial analysis relates to the distance-to-water covariate: while T. carnosus plants are typically at higher density close to water, they are generally healthier when they are further from the water. The Mediterranean-type climate is characterized not only by a strong seasonal variability of rainfall, with cool, wet winters and hot, dry summers, but also by an unpredictable alternation of years of severe drought with years of high precipitation. So, following an unusually dry year, we observe higher mortality of individuals that are growing closer to the water-table, a result that, at first sight, seems counterintuitive and warrants some explanation. In common with all other species occupying the harsh environment represented by the Mediterranean dunes, T. carnosus has to be well-adapted to water stress which, especially during summer, can be substantial. Plants living in such water-limited ecosystems have evolved a range of rooting strategies that enable them to avoid serious water-deficit (Larcher 1995; Rodriguez-Iturbe et al. 2001; Collins & Bras 2007; Viola et al. 2008); these include both intensive exploitation strategies involving roots and transpiration systems that rapidly respond to intermittent and unpredictable rainfall events during the summer months and extensive exploitation strategies with roots that extend deeper and enable individuals to benefit from soil moisture at much greater depths. Many species are characterized as utilizing mainly one of these rooting strategies (Viola et al. 2008; Jenerette et al. 2012), such as dimorphic root systems (Dawson & Pate 1996). However, T. carnosus is quite plastic and can use both strategies to a greater or lesser extent depending upon local environmental conditions. In the absence of a water-table near the surface, the species typically develops a root system capable of taking water from precipitation or condensation on the surface of soil (a more intensive strategy).
However, when groundwater is close (<1·5 m), the root system of T. carnosus is dimorphic, with some shallow roots but also deeper roots that can reach groundwater. We hypothesize that the plasticity in rooting strategy provides the likely explanation for our observation that the plants growing closer to the water-table are the ones to suffer the most from an unusually dry summer. We suggest that these individuals are likely to be much more reliant on the deeper water accessed by their extensive rooting system and have invested much less heavily in an intensive rooting system that would equip them to access the water available near the surface from light precipitation or condensation. So, when the water-table drops, they are likely to suffer a much greater water-deficit than those individuals with a more intensive rooting system that do not rely on the deeper water. This type of rooting strategy would correspond with the response found by Zunzunegui, Caldeira-Díaz-Barradas & Novo (2000) in another Mediterranean species, Halimium halimifolium. Even though the water-table was further away for plants at the top of the dune, Halimium halimifolium plants from this site exhibited better physiological and vegetative responses than Halimium halimifolium plants growing in the dune slack. It was suggested that the dune-slack individuals, acclimated to permanent water availability, could show higher sensitivity to drought events than those at the top of the dune, which never reached the water-table. Our result provides an interesting example of how plastic responses to spatially heterogeneous environmental conditions may make the response of individuals to environmental stress inherently hard to predict.
In this article, we discuss a marked spatial point process model and jointly fit this model to both the spatial pattern formed by individual plants and the associated marks. Using INLA enables us to fit this complex point process model at relatively little computational cost, while it would be computationally prohibitive to do this with standard MCMC methods (see Rue, Martino & Chopin 2009 for comparisons of running times). In addition, the full model and appropriate submodels may be considered to allow for model comparison. Certainly, INLA may be applied to fit many other complex point process models. This includes other marked point processes such as multivariate models, and models with marks following other distributions, such as normal for continuous marks, Poisson for count data, zero-inflated Poisson, etc. Similarly, INLA also facilitates the integrated analysis of other joint models such as models of a spatial pattern and spatial covariates that account for measurement error in the covariates (Illian, Sørbye & Rue 2012) or spatio-temporal point patterns. The latter constitute an emerging field within statistics (Diggle 2007) and this promises to open even more opportunities for analysis of ecological data.
In discussing the data example here, we aim to introduce an ecological audience to spatial modelling based on INLA by fitting a latent Gaussian model, in particular a marked Cox process model, to an ecological data set. Many spatial point process models, including Poisson models (Aarts, Fieberg & Matthiopoulos 2012) and Gibbs process models (Baddeley & Turner 2005), do not assume a latent random model, but use models that are based on a deterministic trend. Modelling the spatial trend in these models hence often assumes that an explicit and deterministic model of the trend as a function of location (and spatial covariates) is known (Baddeley & Turner 2005). The estimated values of the underlying spatial trend are considered fixed values, which are subject neither to stochastic variation nor to measurement error. As it is based on a latent random field, the approach discussed here differs from these approaches in assuming a hierarchical, doubly stochastic structure. This provides a flexible class of point process models which assume that spatial trends exist in the data that cannot be accounted for by the covariates. The spatial trend is hence not regarded as deterministic, but assumed to be a random field.
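As an illustration of the doubly stochastic structure described above, one common way of writing a Cox process driven by a latent Gaussian field (a log-Gaussian Cox process) is sketched below. The notation is an assumption made for this sketch and is not taken from the model specification in this article: X denotes the point pattern, Λ(s) the random intensity at location s, z_j(s) the spatial covariates with coefficients β_j, f(s) the latent random field and θ its hyperparameters. The display assumes the amsmath package.

```latex
\begin{align*}
  X \mid \Lambda &\sim \text{Poisson process with intensity } \Lambda(s), \\
  \log \Lambda(s) &= \beta_0 + \sum_{j} \beta_j z_j(s) + f(s), \\
  f(\cdot) \mid \theta &\sim \text{zero-mean Gaussian random field with covariance governed by } \theta .
\end{align*}
```

In this notation, a model with a purely deterministic trend corresponds to dropping the term f(s), so that the intensity is a fixed function of location and covariates; retaining f(s) lets the field absorb spatial structure that the covariates do not explain.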
In general, analysing the spatial pattern formed by individuals in space is not necessarily the interest of all ecological studies involving spatial data and hence point process models are certainly only one type of spatial model that is relevant here. As the class of latent Gaussian models is very general, many other spatial (and indeed non-spatial) data structures may be fitted with INLA. For instance, similar modelling techniques may also be applied to geostatistical data, i.e. a situation where the aim is to fit a spatially continuous model to measurements taken at a finite number of discrete locations (Diggle & Ribeiro 2007). This includes situations where preferential sampling is likely to have occurred (Diggle, Menezes & Su 2010). Similarly, models for data that have been collected on a – regular or irregular – spatial grid can also be fitted taking a strongly related approach to the model discussed here (Rue & Held 2005). In other words, while we discuss one specific example here, the INLA methodology is generally applicable to many other spatial models.
It is worth mentioning that many other complex data structures that are not necessarily spatial may be fitted with INLA – in a Bayesian setting. Examples include models with random effects, dynamic linear models, stochastic volatility models, generalized linear (mixed) models, generalized additive (mixed) models, spline smoothing, semiparametric regression, space-varying (semiparametric) regression models, disease mapping, spatio-temporal models, survival models etc. (see Rue, Martino & Chopin 2009). While INLA facilitates the fitting of increasingly complex models, there will inevitably be eventual limitations. In particular, an increase in the number of hyperparameters will eventually also slow down INLA.
The current approach uses a regular spatial grid and approximates both the latent field and the spatial pattern by this grid. As a result, a dense lattice has to be used to be as exact as possible. Recent statistical developments that approximate the random field by the solution to a stochastic partial differential equation (SPDE) defined on a triangulation avoid these issues. Here, the resolution of the spatial component can be locally controlled (Lindgren, Rue & Lindström 2011). The combination of this SPDE approach with INLA is currently under development. This will allow for more flexible models to be fitted since the spatial field and hence the latent process may be defined to account for phenomena relevant in realistic data sets such as varying boundary conditions or observation windows with holes (Simpson et al. 2011).
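The grid approximation mentioned above can be illustrated with a minimal Python sketch. It is not the implementation used in this article: the function names `grid_counts` and `poisson_loglik`, the unit-square window and the simulated uniform pattern are all assumptions made for illustration. The sketch bins a point pattern into the cells of a regular grid and evaluates the Poisson log-likelihood of the cell counts for a given log-intensity per cell, which is the usual form a gridded point process likelihood takes.

```python
import numpy as np


def grid_counts(x, y, n_cells, window=(0.0, 1.0, 0.0, 1.0)):
    """Count the points falling in each cell of a regular n_cells x n_cells grid.

    `window` is (xmin, xmax, ymin, ymax); a rectangular window is assumed.
    Returns the matrix of counts and the area of a single cell.
    """
    xmin, xmax, ymin, ymax = window
    counts, _, _ = np.histogram2d(
        x, y, bins=n_cells, range=[[xmin, xmax], [ymin, ymax]]
    )
    cell_area = (xmax - xmin) * (ymax - ymin) / n_cells**2
    return counts, cell_area


def poisson_loglik(counts, cell_area, eta):
    """Poisson log-likelihood of the cell counts, up to an additive constant.

    `eta` holds the log-intensity in each cell, so the expected count in a
    cell is cell_area * exp(eta). The log(count!) term is omitted because it
    does not depend on eta.
    """
    mu = cell_area * np.exp(eta)
    return float(np.sum(counts * np.log(mu) - mu))


# Hypothetical pattern: 200 points placed uniformly on the unit square.
rng = np.random.default_rng(1)
x = rng.uniform(size=200)
y = rng.uniform(size=200)

# The same pattern represented on a coarse and on a dense lattice.
coarse, a_coarse = grid_counts(x, y, 5)
fine, a_fine = grid_counts(x, y, 50)

# Log-likelihood under a constant intensity of 200 points per unit area.
ll_fine = poisson_loglik(fine, a_fine, np.full(fine.shape, np.log(200.0)))
```

On the coarse lattice the location of each point is known only to within a large cell, whereas the dense lattice retains locations more precisely at the cost of a latent field with many more elements. This is the trade-off that motivates locally controlled resolution.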
In summary, INLA already provides considerable opportunities for fitting models to spatial ecological data that would previously have been impossible to fit using other approaches. Although most often ecologists will apply newly emerging statistical methods some time (often some considerable time) after they have been initially developed by the statisticians, the development and application of the methods can, in this case, benefit substantially from close collaboration between spatial ecologists and statisticians. There are many ways in which INLA can be further developed such that it can be used for analysis of a greater range of spatial data, and ecologists with an intimate knowledge of their data, and of the key questions they want to explore using their data, can help to prioritize the directions future statistical developments take. The ecologists benefit by having methods available to address questions they may otherwise be unable to answer, while the statisticians benefit by having access to ecological data exhibiting interesting statistical properties that may often demand the development of new statistical approaches. We hope and anticipate that over the next few years we will witness a rapid development of these statistical methods driven, at least in part, by a recognition that they offer enormous potential to provide novel insights into ecological processes through the analysis of complex spatial data.