Local environmental effects and spatial effects in macroecological studies using mapped abundance classes: the case of the rook Corvus frugilegus in Scotland


  • A. GIMONA,

    1. Macaulay Land Use Research Institute, Craigiebuckler, Aberdeen, Scotland, AB15 8QH; and Biomathematics and Statistics Scotland, Macaulay Land Use Research Institute, Craigiebuckler, Aberdeen, Scotland, AB15 8QH
  • M. J. BREWER

    1. Macaulay Land Use Research Institute, Craigiebuckler, Aberdeen, Scotland, AB15 8QH; and Biomathematics and Statistics Scotland, Macaulay Land Use Research Institute, Craigiebuckler, Aberdeen, Scotland, AB15 8QH

A. Gimona, Macaulay Land Use Research Institute, Craigiebuckler, Aberdeen, Scotland, AB15 8QH, UK. E-mail: a.gimona@macaulay.ac.uk


  1. The study of the spatial pattern of species abundance is complicated by statistical problems, such as spatial autocorrelation of the abundance data, which lead to the confusion of environmental effects and dispersal.
  2. Atlas-derived data for the rook in Scotland are used as a case study to propose an approach for assessing the likely contribution of dispersal and local environmental effects, based on a Bayesian Conditional Autoregressive (CAR) approach.
  3. The availability of moist grasslands is a key factor explaining the spatial pattern of abundance. This is influenced by a combination of climatic and soil-related factors. A direct link to soil properties is for the first time reported for the wide-scale distribution of a bird species. In addition, for this species, dispersal seems to contribute significantly to the spatial pattern and produces a smoother than expected decline in abundance at the north-western edge of its distribution range. Areas where dispersal is most likely to be important are highlighted.
  4. The approach described can help ecologists make more efficient use of atlas data for the investigation of the structure of species abundance, and can highlight potential sink areas at the landscape and regional scale.
  5. Bayesian spatial models can deal with data autocorrelation in atlas-type data, while clearly communicating uncertainty through the estimation of the full posterior probability distribution of all parameters.


Knowledge of the factors shaping the structure of abundance of a species within its range is very important for a general theory of species distribution (Gaston 2003) and such a theory is central to the aims of ecology, which is often defined as the study of abundance and distribution of organisms. As well as academic reasons, there are applied, conservation-related reasons to study the pattern of abundance. For example, because it is believed to be related to population viability, some authors have proposed using abundance as a criterion for the selection of nature reserves (Winston & Angermeier 1995; Araujo, Williams & Fuller 2002; Carroll et al. 2003). Studying the spatial pattern of abundance can also shed light on factors affecting distributional boundaries, and this has an important application to predicting climate-driven range shifts. It should also be noticed that abundance conveys more information than simple occurrence, as it permits investigation of whether boundaries are gradual or abrupt, which, in turn, can provide clues regarding factors involved in causing them.

The pattern of abundance is usually not a smooth one, as many species tend to be rare in most areas but have a few hotspots of very high density (e.g. Brown, Mehlman & Stevens 1995; Blackburn et al. 1999; Gaston et al. 2000). Modelling suggests that this could be due to a mixture of environmental effects and population processes (Brown et al. 1995; Guo et al. 2005). Although it is sensible to suppose that the two interact, it is difficult to separate these effects in empirical macroecological studies. In this paper we show how this problem can be addressed.

In order to investigate this topic in detail, one has to overcome methodological challenges. The data used in macroecological studies are often taken from distribution atlases and are often ordinal, i.e. ordered categorical (e.g. Gibbons, Reid & Chapman 1993; Asher et al. 2001). Also, the data are almost always spatially autocorrelated. As a first approximation, the abundance structure of a species could be seen as an optimum response surface, reflecting variation in environmental ‘niche’ variables. However, this is likely to be an oversimplification because of additional population processes such as dispersal, source and sink dynamics involving neighbouring sites, and suboptimal habitat choice (van Horne 1983; Pulliam 1988; Bock & Jones 2004). As a result, overlooking the effects of spatial population processes can distort the evidence in support of the influence of known local environmental variables, and this can lead to erroneous predictions and underestimation of uncertainty (Ver Hoef et al. 2001; Keitt et al. 2002). Therefore, there is a sound argument in favour of using spatial models in the presence of suspected biological interaction processes, such as dispersal and conspecific attraction.

Several authors have investigated autocorrelation in spatial distributions of populations; there are various examples of models accounting for spatial autocorrelation response (Augustin, Mugglestone & Buckland 1996, 1998; Huffer & Wu 1998) and also some examples of Conditional Autoregressive (CAR) models using continuous variables in the macroecological literature (Lichstein et al. 2002; Tognelli & Kelt 2004). However, there are very few examples of categorical CARs in ecology (Brewer et al. 2004; Gelfand et al. 2005) and we know of no examples of Bayesian macroecological analyses that are based on an ordinal response and account for spatially correlated errors. In this paper we show how a CAR approach can be applied to the study of patterns of abundance in geographical ecology.

We apply our approach to understanding associations between abundance and a combination of habitat, climatic and population factors affecting our study species in Scotland, at a 10 km resolution. We will demonstrate that a CAR model accounts for autocorrelation and is easy to interpret; this is because spatial effects can be mapped and, in a Bayesian framework, the uncertainty of the parameters is easily examined.

Materials and methods


We use rook Corvus frugilegus (Linnaeus) data derived from the Bird Distribution Atlas (Gibbons et al. 1993) and from Brenchley (1984, 1986). Rook abundance, expressed as number of nests per square km (Fig. 1), was recorded for 799 cells of 10 km × 10 km of the National Grid, and divided into four classes: 0 (no nests), 1 (1–5 nests per square km), 2 (5–10), and 3 (10+).

Figure 1.

Nest density class in each 10 km × 10 km cell of the National Grid.

Based on prior knowledge about relevant variables (Gimona 1998) we used a set of six initial candidate independent variables, which were: ‘hectares of grass for mowing’; ‘hectares of grass for grazing’; ‘percentage of soil clay’; ‘average April rainfall’; ‘average annual rainfall’; and ‘average July temperature’ (we report further details in Appendix S1, in the electronic supplementary material). These variables are related to the foraging habitat of nesting rooks and/or are potentially important for their invertebrate prey (see, e.g. Coombs 1978; Gimona 1998; Griffin & Thomas 2000). Land use-related variables were provided by the Edinburgh data library (see http://datalib.ed.ac.uk/EUDL/agriculture/griddata.html). Soil-related variables were derived from the Macaulay Institute soils database (see Gimona & Birnie 2002), and climatic variables were derived from the Macaulay Institute climatic database surfaces (Matthews, Allison & MacDonald 1993).

Bayesian models

Using the set of independent variables listed above, we fitted one nonspatial Bayesian probit ordinal model and two (spatial) Bayesian probit ordinal CAR models, the latter two using different-sized ‘neighbourhoods’ over which to model spatial autocorrelation. The performances of the three models were compared using the Deviance Information Criterion (DIC) (Spiegelhalter et al. 2002). DIC is defined as follows:

DIC = Dbar + pD    (eqn 1)

where Dbar is the posterior mean of the deviance, defined as −2 × log(likelihood), and pD is the ‘effective number of parameters’. The latter is calculated as the posterior mean of the deviance minus the deviance evaluated at the posterior means of the parameters.
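The calculation in eqn 1 can be sketched in a few lines of Python. This is only an illustration: the function name `dic_summary` and the numbers are hypothetical, and in practice the deviance values would come from the MCMC output.

```python
from statistics import mean


def dic_summary(deviance_samples, deviance_at_posterior_mean):
    """Return (Dbar, pD, DIC) from MCMC output.

    deviance_samples: deviance, -2 * log(likelihood), at each retained draw.
    deviance_at_posterior_mean: deviance evaluated at the posterior means
        of the parameters.
    """
    d_bar = mean(deviance_samples)            # posterior mean of the deviance
    p_d = d_bar - deviance_at_posterior_mean  # effective number of parameters
    return d_bar, p_d, d_bar + p_d            # DIC = Dbar + pD


# Made-up numbers for illustration only (not the rook results).
d_bar, p_d, dic = dic_summary([1100.0, 1104.0, 1096.0, 1100.0], 1090.0)
print(d_bar, p_d, dic)
```

Lower DIC values indicate better-supported models, so the same function would be applied to each candidate model and the results compared.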

Statistical details on our ordinal probit-based Bayesian CAR model can be found in Appendix S2. Briefly, in a CAR model, the spatial effect for any given cell is defined in terms of differences with spatial terms for neighbouring cells. These differences are given (conditional) normal distributions, and spatial variation is ‘smoothed out’, thus creating a better picture of global trends, which have been adjusted for small-scale random variation; this, in turn, accounts for spatial autocorrelation.
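As a hedged illustration only (the specification actually used is the one given in Appendix S2), a commonly used intrinsic CAR prior with equal weights for all neighbours can be written as follows. The symbols δi (the set of neighbours of cell i) and ni (the number of those neighbours) are notation introduced for this sketch; λ2 is the variance of the spatial random effects, and the second argument of the normal distribution below is a variance, not a precision.

\[
S_i \mid \{S_j : j \neq i\} \;\sim\; \mathrm{Normal}\!\left( \frac{1}{n_i} \sum_{j \in \delta_i} S_j ,\; \frac{\lambda^2}{n_i} \right)
\]

Under this form each spatial effect is pulled towards the average of its neighbours, and cells with more neighbours are smoothed more tightly.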

We define two types of neighbourhood: one in which only cells sharing the same boundaries are considered neighbours (4-neighbourhood); and one in which the neighbours of a cell are also those cells sharing corners with it (8-neighbourhood).
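The two neighbourhood definitions can be illustrated with a small Python sketch. The function `grid_neighbours` and the representation of cells as (column, row) pairs are assumptions made for this example; they are not taken from the analysis itself.

```python
def grid_neighbours(cells, kind=4):
    """Map each (col, row) cell to its neighbours among the supplied cells.

    kind=4: cells sharing an edge; kind=8: cells sharing an edge or a corner.
    """
    if kind == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif kind == 8:
        offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                   if (dx, dy) != (0, 0)]
    else:
        raise ValueError("kind must be 4 or 8")
    cell_set = set(cells)
    # Only cells present in the data count as neighbours (e.g. sea cells do not).
    return {
        (x, y): [(x + dx, y + dy) for dx, dy in offsets
                 if (x + dx, y + dy) in cell_set]
        for (x, y) in cells
    }


# A hypothetical 3 x 3 block of grid cells.
cells = [(x, y) for x in range(3) for y in range(3)]
n4 = grid_neighbours(cells, kind=4)
n8 = grid_neighbours(cells, kind=8)
print(len(n4[(1, 1)]), len(n8[(1, 1)]))  # centre cell
print(len(n4[(0, 0)]), len(n8[(0, 0)]))  # corner cell
```

Cells on the edge of the study area, such as coastal or island cells, have fewer neighbours than interior cells under either definition.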

To be able to estimate the parameters (i.e. to make each model identifiable), without loss of generality we fix two of the cut-off points, setting θ1 = 0 and θ2 = 1, and we assign a diffuse Gamma prior to the precision (1/variance) of the latent distribution of Yi as in Brewer et al. (2004). Thus, for the spatial models, we now have a different cell-specific mean µi:

µi = xi′β + Si    (eqn 2)

where xi is the vector of independent variables for cell i, β is the vector of regression coefficients and Si are the spatial random effects. We fit such a model twice, once for each neighbourhood size. We use the package WinBUGS (Lunn et al. 2000) to fit our models. The specifications of priors are summarized in Table 1.
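To make the role of the cut-off points concrete, the following Python sketch computes the probability of each abundance class from a latent normal variable, under the usual ordinal probit assumption that class k is observed when the latent value falls between consecutive cut-off points. The function names and the numerical values are hypothetical; `mu` stands for the cell-specific mean µi and `sigma` for the latent standard deviation.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def class_probabilities(mu, sigma, theta3, theta1=0.0, theta2=1.0):
    """Probabilities of classes 0-3 for a latent Normal(mu, sigma^2) variable.

    theta1 and theta2 are fixed at 0 and 1 for identifiability;
    theta3 is the only cut-off point that is estimated.
    """
    cuts = [theta1, theta2, theta3]
    cumulative = [normal_cdf((c - mu) / sigma) for c in cuts] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # mass between consecutive cut-off points
        previous = c
    return probs


# Illustrative values only.
probs = class_probabilities(mu=0.5, sigma=1.0, theta3=2.0)
print([round(p, 3) for p in probs])
```

Raising `mu` for a cell, whether through the covariates or through a positive spatial effect, shifts probability mass towards the higher abundance classes.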

Table 1.  Prior probability distributions
Parameter                                    Prior
Regression coefficients β                    Normal (0, 10^−4)
Cut-off point on latent scale θ3             Normal (1, 10^−4) I(0, ∞)
Latent scale precision (1/σ2)                Gamma (10^−2, 10^−2)
Spatial random effect precision (1/λ2)       Gamma (5^−1, 5^−4)

Any MCMC analysis requires the convergence of the parameter estimation to be checked; for all three of our models we ensured that the total deviance and individual coefficients passed the Heidelberger and Welch and the Geweke tests in CODA (Best, Cowles & Vines 1995). This indicates that valid inferences can be made based on the estimated posterior probability distributions.
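The idea behind the Geweke test can be illustrated with a deliberately simplified Python sketch: the mean of an early segment of a chain is compared with the mean of a late segment. Note the assumption: this version uses naive sample variances, whereas CODA estimates the variances from the spectral density of the chain to allow for autocorrelation, so the function below (`geweke_z_naive`, a hypothetical name) is not a substitute for the CODA diagnostic.

```python
from math import sqrt
from statistics import mean, variance


def geweke_z_naive(chain, first=0.1, last=0.5):
    """Simplified Geweke-style z-score for a single MCMC chain.

    Compares the first 10% of draws with the last 50% (the usual defaults).
    Assumption: naive variances are used, ignoring autocorrelation.
    """
    n = len(chain)
    early = chain[: int(round(first * n))]
    late = chain[n - int(round(last * n)):]
    standard_error = sqrt(variance(early) / len(early) + variance(late) / len(late))
    return (mean(early) - mean(late)) / standard_error


# Two artificial chains: one with a stable mean, one that drifts upwards.
stationary = [float(i % 2) for i in range(1000)]
drifting = [i / 1000 + (i % 2) for i in range(1000)]
print(geweke_z_naive(stationary), geweke_z_naive(drifting))
```

A z-score far from zero (for example, beyond about ±2) suggests that the early and late parts of the chain have different means, so the chain has probably not converged.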


DIC values were 1202·0 for the nonspatial model and 1144·1 and 1127·6 for the 4- and 8-neighbourhood models, respectively. Based on DIC, therefore, the best supported model is the spatial model using an 8-neighbourhood, followed by the 4-neighbourhood model, while the nonspatial model was least favoured. For each model, Table 2 provides the medians of coefficients and their credible intervals. These are derived from estimates of the full posterior distributions of each coefficient. It is clear from these results that the coefficients for some independent variables are a little different for the three models, i.e. the sizes of the estimated effects on the density of rook nests are different. This is not a problem per se, as we do not expect all models to have the same coefficients. On the other hand, as expected, the means and medians of all coefficients agree in the direction (sign) of the effect. It is also clear that, compared with the spatial models, the nonspatial model suggests that a slightly different set of variables is important. This can be seen by looking at which coefficients have credible intervals not including zero.

Table 2.  Bayesian models compared via the medians and 95% credible intervals of the posterior distributions for each coefficient. Coefficients with credible intervals not including 0 are in bold.
Model coefficients    Nonspatial (2·5%, median, 97·5%)    Spatial, 4-neighbourhood (2·5%, median, 97·5%)    Spatial, 8-neighbourhood (2·5%, median, 97·5%)
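The summaries reported in Table 2 can be obtained from posterior draws as in the following Python sketch, which returns the 2·5%, 50% and 97·5% points of a set of draws and flags whether the interval excludes zero. The function names and the artificial draws are assumptions for illustration; real draws would come from the fitted model.

```python
def percentile(sorted_values, q):
    """Percentile by linear interpolation between order statistics (q in [0, 1])."""
    position = q * (len(sorted_values) - 1)
    lower_index = int(position)
    upper_index = min(lower_index + 1, len(sorted_values) - 1)
    fraction = position - lower_index
    return (sorted_values[lower_index] * (1 - fraction)
            + sorted_values[upper_index] * fraction)


def summarise_posterior(draws, level=0.95):
    """Return (lower, median, upper, excludes_zero) for one coefficient."""
    ordered = sorted(draws)
    tail = (1 - level) / 2
    lower = percentile(ordered, tail)
    median = percentile(ordered, 0.5)
    upper = percentile(ordered, 1 - tail)
    excludes_zero = lower > 0 or upper < 0
    return lower, median, upper, excludes_zero


# Artificial draws: one coefficient clearly positive, one centred on zero.
positive_effect = [float(v) for v in range(0, 201)]
no_clear_effect = [float(v) for v in range(-100, 101)]
print(summarise_posterior(positive_effect))
print(summarise_posterior(no_clear_effect))
```

A coefficient whose interval excludes zero corresponds to an entry that would be shown in bold in the table.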

The results make biological sense. Clay content in soils (b.clay%) is related to soil moisture retention and this was expected to show an effect on nest density via the influence on soil invertebrate prey (e.g. Feare, Dunnet & Patterson 1974; Edwards 1983; Gimona 1998). While according to the Bayesian nonspatial model clay content has no effect, both spatial models detect a positive effect of clay content on nest density. The nonspatial model is unable to pick up the effect of clay content, as it does not account for spatial autocorrelation, which is also present in the distribution of clay.

Also, all three models suggest that there are more nests per square kilometre in areas with: more hectares of mowing grass available (b.mow.gr); a higher mean July temperature (b.jul.temp); and a lower annual rainfall (b.ann.rain). These climatic effects reflect a positive association with eastern locations.

Finally, the nonspatial model (which is the least favoured) seems to overestimate the importance of April rainfall, which, according to both spatial models, has very little influence on nest density.

The map of the spatial effect term is of considerable interest as an indicator of possible source-sink dynamics. The spatial effect accounts for the variation that could not be captured by environmental variables, and it has a clear spatial structure with high values occurring mostly on the east coast and low values on the west coast (Fig. 2). We will return to spatial effects in the discussion.

Figure 2.

Spatial effect Si. High/low values (> 2, ≤ −2) indicate that more/fewer rooks were recorded than expected from environmental variables.


Rook density is influenced by the availability of rather moist grasslands on fertile soils and by dispersal from neighbouring locations. This is the only study we know of in which the spatial pattern of abundance of a species has been associated with soil factors at the macroecological scale. Equally importantly, we have shown that our use of CAR models can provide preliminary evidence of where dispersal might be important in shaping the abundance pattern.

The spatial effect term gives an overview of where there are more or fewer nests than would be expected based on environmental variables alone. There is a general pattern in the spatial effect indicating that, based on environmental variables, there are more rooks than expected in areas adjacent to high density ones. For instance, in the north-east of Scotland there are several grid squares with many more nests than expected. These are adjacent to areas with very high rook density (the Ythan catchment) mostly explained by environmental factors. Similarly, in southern Scotland, squares having high spatial effect values are likely to receive individuals from neighbouring areas. We interpret these high values of the spatial effect as the result of dispersal from the adjacent, high density areas. This might be due to density-dependent emigration, which might lead to nonequilibrium between local population and resources and therefore to source-sink dynamics within a species' distribution range. Although further tests are needed to prove that source-sink dynamics are present in the case of the rook, results of a model of potential prey availability support the hypothesis that, in some areas, there is a spatial mismatch between resources and bird density (Gimona 1998).

Conversely, the Isle of Lewis probably suffers from an isolation effect, as it has only one breeding colony, despite the fact that other suitable nesting areas exist. This is indicated both by our analysis (spatial effect <−2) and by the observation on the ground that several apparently suitable areas are only used by sparse overwintering individuals (Stevenson, Outer Hebrides bird recorders association, pers. comm.). Also, the values in Fig. 2 are spatially structured, with the stronger evidence for environmental variables as sole explanation in squares with values between −0·5 and 0·5.

Our findings are consistent both with the fact that agricultural activity and soil characteristics influence important prey, such as earthworms, at the landscape scale (e.g. Decaëns, Bureau & Margerie 2003), and with the social nature of this species. The selected environmental variables reflect invertebrate food availability and are likely to play an important part during chick rearing and in the post-fledging period, when the young rooks are entirely dependent on invertebrates (e.g. Lockie 1956, 1959; Feare et al. 1974; Coombs 1978). Many abundance ‘hotspots’ can be justified by environmental variables. However, several others are likely to be also caused by dispersal from adjacent areas (see especially areas with value > 2 in Fig. 2). This cannot be detected by the nonspatial model and exemplifies why a spatial model is necessary. Environmental variables and nest density are spatially autocorrelated to different extents, and there seems to be some disagreement in the ecological literature regarding whether and when autocorrelation of observations is a problem for data analysis (e.g. Lennon 2000; Diniz-Filho, Bini & Hawkins 2003). It is a problem only when errors are autocorrelated and therefore not independent, which leads to spurious significance of some variables. This is corroborated by our results showing that the best supported spatial model has a better fit to the data and is also more parsimonious, because it regards fewer variables as important in comparison with the others.

Evidence that biological interactions influence rook nest density has already been presented by Griffin & Thomas (2000) at the landscape scale. These authors suggested that the number of nests per colony is influenced by both environmental factors and competition. Moreover, they proposed that unexplained variance might be associated with social interactions (see also Ainley, Nur & Woehler 1995; Danchin & Wagner 1997). We interpret the spatial effect mapped in Fig. 2 as capturing most of the geographical structure of such interactions, and therefore propose that this can be regarded as evidence that, while competitive effects shape the fine details of the abundance pattern at the landscape scale, dispersal and conspecific attraction (as well as environmental variables) influence the abundance pattern at the macroecological scale. Care, however, should be taken in such interpretation: the spatial effect indicates that the model with the environmental variables alone does not fit, but cannot tell us exactly and explicitly why. As always, when interpreting statistical results of fitting models to observational data, one has to be cautious about attributing causal significance to them. None the less, we contend that our interpretation is an ‘inference to the best explanation’, which accounts for biological as well as statistical arguments, is more parsimonious than adding a high number of variables with diminishing returns in terms of fit, and is testable. There is good reason to believe that dispersal is mostly responsible for the pattern observed. This is because (1) a previous thorough analysis (Gimona 1998) excluded seven other environmental variables, including hectares of cereals (see Appendix S1 for details), and (2) the structure of the spatial term lends itself to reasonable interpretation based on behaviour, given the highly social nature of the species.

It should be noted that some areas with no rooks recorded have a positive spatial random effect, apparently implying more rooks were recorded than expected, which seems puzzling at first sight. However, such grid squares are likely to have neighbouring squares that do have rooks. The spatial model is using information from the spatial autocorrelation present in the data to suggest a nonzero probability of finding rooks in an area where, in fact, none were found. The positive random effect relates to this probability, and not to the actual count of zero.

Source-sink dynamics have been proposed to explain the distribution of other bird species, such as oven bird and wood thrush (Lloyd et al. 2005); and corn bunting (Donald & Evans 1995). However, this explanation should be evaluated critically, and one could choose to regard it as a spatially targeted hypothesis that would need further experimental testing. For instance, the areas highlighted by the CAR models could be subjected to comparisons in nest success and site fidelity for breeding birds, and movements within and between areas. Also, it is very important to notice that variables recorded at a much finer scale might be important for a species but not appear in wide-scale studies (see, e.g. McCulloch & Norris 2001; Norris, Atkinson & Gill 2004) and that understanding the causes of macroecological patterns most likely involves investigations at multiple scales. Given the resolution of the study, we did not have any direct information, for example, on how exactly environmental variables might interact with any density-dependent social behaviour to produce the observed pattern. Nevertheless, this is a promising use of CAR models, which has general application, and is therefore potentially relevant to any study using atlas-type data aimed at identifying important environmental variables and therefore areas that appear important for a species’ population dynamics. Additionally, broad areas could be highlighted for landscape-scale studies within a species distribution range, with the intent of clarifying whether they are population sinks/sources. This approach can therefore offer new insights in the analysis of macroecological data.


The approach presented in this paper can be applied to any species and has several advantages. It permits: (1) investigation of the structure of species abundance (using, for example, atlas data) while accounting for autocorrelation; (2) highlighting of the geographical areas in which dispersal from other locations, or isolation, is likely to contribute more or less substantially to shaping that pattern; and (3) assessment of uncertainty in all parameters via their posterior probability distributions, from which ‘credible intervals’ at different probability thresholds can be derived.

Bayesian CAR models therefore offer geographical ecologists the opportunity to produce statistically sound results, greatly reducing the probability of overestimating the influence of climatic and other variables on distribution patterns. This has obvious advantages, for example, in bioclimatic models.

Potential applications of this approach for species of higher conservation value include prediction of range shift under different climate scenarios and better estimation of habitat suitability and source/sink areas for landscape conservation planning.


We thank the Scottish Executive Environmental and Rural Affairs Department for financial support. Thanks also to Simon Thirgood, David Elston, Jonathan Yearsley and two anonymous referees for helpful comments on the manuscript.