Abstract

In the face of accelerating biodiversity loss and limited data, species distribution models – which statistically capture and predict species’ occurrences based on environmental correlates – are increasingly used to inform conservation strategies. Additionally, distribution models and their fit provide insights on the broad-scale environmental niche of species. To investigate whether the performance of such models varies with species’ ecological characteristics, we examined distribution models for 1329 bird species in southern and eastern Africa. The models were constructed at two spatial resolutions with both logistic and autologistic regression. Satellite-derived environmental indices served as predictors, and model accuracy was assessed with three metrics: sensitivity, specificity and the area under the curve (AUC) of receiver operating characteristics plots. We then determined the relationship between each measure of accuracy and ten ecological species characteristics using generalised linear models.

Among the ecological traits tested, species’ range size, migratory status, affinity for wetlands and endemism proved most influential on the performance of distribution models. The number of habitat types frequented (habitat tolerance), trophic rank, body mass, preferred habitat structure and association with sub-resolution habitats also showed some effect. In contrast, conservation status made no significant impact. These findings did not differ from one spatial resolution to the next. Our analyses thus provide conservation scientists and resource managers with a rule of thumb that helps distinguish, on the basis of ecological traits, between species whose occurrence is reliably or less reliably predicted by distribution models. Reasonably accurate distribution models should, however, be attainable for most species, because the influence ecological traits bore on model performance was only limited. These results suggest that none of the ecological traits tested provides an obvious correlate for environmental niche breadth or intra-specific niche differentiation.

Humans are rapidly altering ecosystems the world over (Balmford et al. 2002). Research exploring the ecological consequences of this change, and efforts to protect individual species (Corsi et al. 1999) or biodiversity overall (Gioia and Pigott 2000), increasingly rely on species distribution models. These models use empirical data to describe and predict the occurrence of individual species, essentially by statistically quantifying their broad environmental (Grinnellian) niche (Guisan and Zimmermann 2000, Guisan and Thuiller 2005).

Such models are useful to: identify species’ ecological requirements (Diekotter et al. 2006); determine possible causes of species declines (Knapp et al. 2003); ascertain inter-specific competition (Leathwick and Austin 2001); uncover evolutionary processes in range dynamics (Peterson and Holt 2003); estimate species persistence in reserve networks (Burns et al. 2003); supplement inventory data (Engler et al. 2004); and forecast species invasions (Thuiller et al. 2005), or the effects of climate change (Thuiller 2003) and other habitat alterations (Manel et al. 2000).

Ideally, the reliability of distribution models should be carefully assessed before their predictions inform conservation planning (Fielding and Bell 1997). Adequate tests of model accuracy are, however, difficult where data are scarce (Gibson et al. 2004) or models explore future scenarios (Thuiller 2003, but see Araújo et al. 2005). Under such circumstances, a rule of thumb would be useful to help distinguish between models that should be treated with scepticism and those whose predictions can be applied with reasonable confidence.

Model reliability depends on a number of factors, including methodological aspects, such as the model algorithm used (Thuiller 2003, Segurado and Araújo 2004), and the nature of data available for model training (Kadmon et al. 2003, McPherson et al. 2004).

Also of potential importance are ecological characteristics of the species being modelled, since these can affect distribution models in two ways. First, they can influence the quality of data available for model development and testing (Boone and Krohn 1999). Second, certain ecological characteristics may make it more difficult to statistically capture the relationship between the species’ occurrence and environmental conditions (Brotons et al. 2004). Species differ considerably in the breadth of their ecological niche, for example, and broad ranges are less likely to be adequately captured by any one predictor (Grinnell 1917, Colwell and Futuyma 1971). Moreover, differences in species’ dispersal patterns and associated gene flow may lead some species to exhibit sub-specific variations in habitat preferences (local adaptations, Holt 2003). Even in the absence of genetically driven differences in habitat use, spatial variation in competitors, predators or other biotic factors could result in different populations of a single species expressing different “realised niches” (Hutchinson 1957, Osborne and Suarez-Seoane 2002, Holt 2003, Peterson and Holt 2003). Such sub-specific differences risk reducing the accuracy with which distribution models capture occurrence-environment relationships in species-level analyses.

Among the ecological characteristics postulated to influence the accuracy of distribution models are: association with poorly sampled or poorly mapped habitats, body size, conservation status, habitat tolerance and distinctiveness, nomadism and migratory behaviour, population trend, range size, rarity, response to conspecifics, and trophic level (Elith and Burgman 2002, Garrison and Lupo 2002, Stockwell and Peterson 2002, Hepinstall et al. 2002, Huntley et al. 2004). How each of these characteristics might impact distribution models is outlined in Table 1.

Table 1. Ecological traits that have been hypothesised to influence the accuracy of species distribution models, and ways in which they might exert their influence. For traits whose influence was examined in our study, the names and nature of representative measures used are indicated to the right.
Traits and their possible impact on distribution models | Measure

Association with poorly recorded environments | wetland affinity (categorical); habitat structure (categorical); sub-resolution habitat (categorical)
• models for species associated with habitats poorly captured in survey data (e.g. man-modified habitats in surveys focused on natural environments) may suffer from limited amounts of data or a distorted representation of habitat preferences (Pearce et al. 2001).
• where species’ habitat preferences are difficult to capture in predictor variables (e.g. localised resources missed by coarse environmental assessments), distribution models may misjudge occurrence-environment relationship or not find any statistical associations (Fielding and Haworth 1995).

Body size | body mass (quantitative)
• larger species may be more conspicuous, improving data availability.
• if larger species have larger home ranges, they may perceive the environment at coarser scales, facilitating distribution models based on coarse-grained predictors (Suarez-Seoane et al. 2002).
• where body size correlates with range size, spatial variability in habitat associations may affect the success of distribution models (see range size).

Competitive exclusion
• if a species is prevented by another, ecologically similar species from utilising otherwise suitable habitat, distribution models that do not include the competing species as predictor variable will likely overestimate the species’ distribution (Best and Stauffer 1986).

Conservation status | conservation status (categorical)
• Species considered threatened because of limited range size or population size impose the same constraints on data availability as any rare species (see rarity).
• Species considered threatened due to their range or population declining may be underutilising suitable environments, leading to model misspecification (see population trend).

Dispersal mode
• Dispersal mechanisms and dispersal associated behaviours, including site fidelity, may result in the absence of individuals from suitable areas or their presence at suboptimal sites, and can thus lead models to misjudge habitat preferences (Fielding and Bell 1997, Pulliam 2000, Knick and Rotenberry 2000).

Endemism | endemism (categorical)
• Although endemism has not previously been identified as influential on model accuracy, we postulated that species endemic to the study region yield better models than non-endemics, because models receive less comprehensive data on the latter's habitat preferences.

Habitat tolerance and distinctiveness | habitat tolerance (quantitative)
• Species utilising a wide variety of habitats, or habitats that predominate in the study region, may impede model algorithms from distinguishing between suitable habitats and the overall environment (Brotons et al. 2004).

Nomadism and migration | migration behaviour (categorical)
• if the movement of nomadic species responds to resources too localised (e.g. fruiting trees) or short-lived (e.g. ephemeral water bodies) to be captured in predictor variables, distribution models may misjudge species-habitat associations (Pearce et al. 2001).
• Seasonal changes in the occurrence and habitat requirements of migratory species may lead models to overestimate the species’ ecological niche unless occurrence data and predictor variables have the appropriate temporal resolution (Neave et al. 1996).
• Migrants may perceive the landscape at coarser scales, facilitating distribution models based on coarse-grained predictors (Mitchell et al. 2001).

Population trend
• at high or fluctuating population sizes, density dependent habitat selection can lead species to occupy suboptimal habitats, leading distribution models to overestimate the species’ ecological niche (Hepinstall et al. 2002).
• in rapidly declining species, predominantly suboptimal habitat may be occupied, leading distribution models to misread occurrence-environment relationships (Hepinstall et al. 2002).
• if species colonising new areas (introduced species) or recovering from past declines are not yet utilising all suitable environments, distribution models may underestimate ecological niches (Hepinstall et al. 2002).

Range size | global range (quantitative)
• range size influences either the amount of data available for model development or the balance between observations of presence and absence, which – unless controlled for – influences model accuracy through statistical artefacts (McPherson et al. 2004).
• if widespread species exhibit local adaptations in habitat preferences, spatial variability in occurrence-environment relationships may lead models to overestimate the species’ ecological niche (Stockwell and Peterson 2002).

Rarity
• rarity reduces the amount and quality of data available for model development if rare species occupy few sites (range size rarity) or are overlooked during surveys (low density) (Garrison and Lupo 2002, Kadmon et al. 2003).

Response to conspecifics
• if territorial species or those living in large groups (colonies, herds) are more conspicuous thanks to behaviours associated with territory defence (e.g. vocalisations) or high local abundance, more data may be available for model development (Garrison and Lupo 2002).
• where the presence, absence or distance to conspecifics significantly influences occurrence patterns, weakened species-environment relationships may lead to poorer model specification (Fielding and Haworth 1995).

Trophic level | trophic rank (quantitative)
• species of higher trophic level may yield poorer models if they respond predominantly to biotic interactions, which are more difficult to incorporate in distribution models than the abiotic resources thought to constrain the distributions of organisms at lower trophic levels (Huntley et al. 2004).
• as noted by a reviewer, certain biotic interactions such as competition or specialisation on single prey species may, however, amplify abiotic constraints (e.g. via narrowing realised niche space, Austin 2002), negating above point.

Empirical assessments of the influence species’ traits exert on model accuracy have to date yielded mixed results, with some authors finding no effect (Elith and Burgman 2002, Huntley et al. 2004) and others reporting significant impacts by one or another ecological characteristic (Stockwell and Peterson 2002, Kadmon et al. 2003, Segurado and Araújo 2004). Most studies, however, contemplated only a few (≤3) ecological traits (Pearce et al. 2001, Garrison and Lupo 2002, Karl et al. 2002, Huntley et al. 2004), considered distribution models for only a limited number (<50) of species (Mitchell et al. 2001, Elith and Burgman 2002, Hepinstall et al. 2002, Segurado and Araújo 2004, Brotons et al. 2004), or used measures of model accuracy vulnerable to statistical artefacts (Boone and Krohn 1999, Stockwell and Peterson 2002, Kadmon et al. 2003; see discussion for further details).

To overcome these shortcomings, our study examined the relationship between model accuracy and ten ecological characteristics in distribution models for >1300 bird species in southern and eastern Africa. Models were built with ordinary logistic as well as autologistic regression at two spatial resolutions, because the influence of ecological characteristics on model accuracy may vary with the modelling algorithm used (Hepinstall et al. 2002, Segurado and Araújo 2004), and potentially with spatial scale (Mitchell et al. 2001). The primary aim was to identify types of species for which distribution models yield poor results, so that such species can be handled with extra care in future assessments for conservation planning. In addition, our results offer preliminary insights on the interrelationship between a species’ core ecological attributes and its broad-scale environmental niche.

Materials and methods

Overview

We used logistic and autologistic models to relate the presence and absence of 1329 bird species in two avifaunal zones of southern and eastern Africa to satellite-derived environmental indices related to climate and vegetation. The accuracy of these distribution models was measured on data withheld from model training using three metrics: sensitivity, specificity, and the area under the curve (AUC) of receiver operating characteristics (ROC) plots. We then used generalised linear models and Kruskal-Wallis H-tests to examine the relationships between model accuracy and ten ecological traits.

Species distribution data

Information on species’ distributions was derived from a database integrating bird atlas data from 14 nations in southern and eastern Africa (Fig. 1). This database draws on both published atlases (Lewis and Pomeroy 1989, Parker 1994, 1999, Harrison et al. 1997, Dean 2000, Carswell et al. 2005) and works in progress (contributions by N. and L. Baker, B. Dowsett, V. Parker and T. Pedersen), recording species occurrences at varying spatial and temporal resolutions.

Figure 1. The study region, indicating areas from which data on species occurrence were available (grey shading), country boundaries (grey lines) and the two avifaunal zones identified by de Klerk et al. (2002) to which analyses were limited (black lines).

Data were summarised across seasons and each bird sighting geo-referenced to a grid of half-degree squares (HDS, 0.5° longitude by 0.5° latitude), the coarsest resolution among original data. For a subset of countries (Angola, the Democratic Republic of Congo, Lesotho, Malawi, Mozambique, Namibia, South Africa, Swaziland, Uganda, and Zimbabwe), distribution records were also geo-referenced to a grid of quarter-degree squares (QDS, 0.25° longitude by 0.25° latitude), allowing for analyses at finer resolution.

Taxonomic discrepancies between sources were harmonised following Sibley and Monroe (1990, 1993), with some modifications. The resulting database at half-degree resolution contained 471 720 occurrence records for 1697 species, where each record represents a unique combination of species and HDS. Data at quarter-degree resolution captured the distributions of 1475 species with 628 917 records.

Not all species, however, were included in analysis: some occurred in too few (<10) HDS or QDS to allow for the construction of distribution models; others lacked information on ecological traits. Half-degree models, therefore, were constructed for a total of 1315 species occupying 2515 HDS, quarter-degree models for 1092 species occupying 5166 QDS. 1078 species were modelled at both resolutions.

Species’ ecological traits

For each species analysed, we compiled information on: body mass, conservation status, diet preferences, endemism, global range size, migratory behaviour, and typical habitat. As a shorthand, we refer to all these characteristics as “ecological traits”, but recognise that some are intrinsic to the species (e.g. body mass), while others arise out of interaction with the physical environment and other organisms, including humans (e.g. range size and conservation status).

Information on body mass was gleaned primarily from Dunning (1993) and Brown et al. (1982–2004). Where available, the mean or maximum recorded female body mass was noted in grams. Species lacking body mass statistics were assigned the mean weight of congeneric species. Body mass was then log-transformed to improve normality.

Conservation status was taken from the IUCN Red List (Anon. 2004) and re-coded as a binary variable. Species of Vulnerable, Endangered or Critically Endangered status were considered threatened. All others, including Data Deficient and unevaluated species (which are scarce among birds), were treated as not threatened.

The diet preferences of most species were obtained from data compiled by Şekercioğlu et al. (2004) and supplemented with information in Brown et al. (1982–2004) and Ginn et al. (1989). Food items were classified into nine categories, assigned a trophic stratum (Table 2) and ranked in the order of each species’ preference based on quantitative diet analyses or verbal qualifiers included in published diet descriptions. The trophic rank of each species was then calculated as the mean trophic stratum of all food items, with items of primary preference double-counted to attribute them greater importance.

Table 2.  Dietary categories used in determining the trophic rank of bird species.
Category | Trophic stratum | Examples
Fruit | 1 | fruit, drupes, berries
Nectar | 1 | nectar
Seed | 1 | seeds, maize, nuts, spores
Plant material | 1 | leaves, buds, bulbs, roots, tubers, grass, algae, vegetation
Invertebrates | 2 | insects, arthropods, krill, shrimp, polychaetes, gastropods, molluscs
Land vertebrates | 3 | reptiles, snakes, amphibians, salamanders, mammals, birds
Fish | 3 | fish
Omnivory | 3 | omnivore, generalist, opportunist
Scavenge | 4 | carcasses, garbage, offal, fishing boat discards, scavenger
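
The trophic rank calculation can be illustrated with a minimal sketch. The strata follow Table 2; the function name, the split into "primary" and "other" items, and the example species are assumptions made for illustration only and do not reproduce the original workflow.

```python
# Trophic strata taken from Table 2 (dietary category -> stratum).
TROPHIC_STRATUM = {
    "fruit": 1, "nectar": 1, "seed": 1, "plant material": 1,
    "invertebrates": 2,
    "land vertebrates": 3, "fish": 3, "omnivory": 3,
    "scavenge": 4,
}


def trophic_rank(primary_items, other_items):
    """Mean trophic stratum of all food items.

    Items of primary preference are counted twice, as described in the
    text. How lower preference ranks are weighted is an assumption here:
    they are all counted once.
    """
    strata = [TROPHIC_STRATUM[item] for item in primary_items] * 2
    strata += [TROPHIC_STRATUM[item] for item in other_items]
    if not strata:
        raise ValueError("at least one food item is required")
    return sum(strata) / len(strata)


# Hypothetical species feeding mainly on invertebrates, also on fruit and seed:
# strata counted are 2, 2, 1, 1, giving a mean of 1.5.
rank = trophic_rank(["invertebrates"], ["fruit", "seed"])
```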

Species endemic to the 14 countries covered by our database were identified based on range descriptions in Avibase (Lepage 2005).

Each species’ global range size (extent of occurrence in km², log-transformed) was computed in ArcGIS 8.0 (ESRI) based on digitised, equal area projections of published range maps.

The migratory behaviour of species was determined using Dowsett and Forbes-Watson (1993). Seasonal visitors (breeding and non-breeding) and species exhibiting regional movements within the study area were classified as migratory. Others, including locally nomadic species, were treated as sedentary.

Habitat information was gleaned primarily from Sibley and Monroe (1990, 1993) and used to develop four separate predictors. The first, habitat tolerance, measured ecological breadth as the number of habitat categories (out of 18) reportedly utilised by the species. The remaining three variables were categorical and identified species associated with habitats poorly captured by the satellite-derived environmental indices used in model development.

The variable wetland affinity flagged species associated with freshwater bodies and wetlands, because wetlands are difficult to distinguish in coarse-grained satellite imagery, particularly if they contain emergent vegetation (Girard and Girard 2003).

The variable sub-resolution habitat identified species partial to linear habitats (e.g. riparian vegetation, coastal beaches) or localised features (cliffs, caves, crevices) poorly described by gridded environmental data of quarter or half-degree resolution.

Habitat structure distinguished species confined to structurally complex environments (forests, woodlands, thickets, bush), species occurring exclusively in open, structurally simpler habitats (grasslands, heath, farmland, desert), and those occupying both. In vertically structured environments, satellite-derived indices might miss important nuances in sub-canopy habitats (Joachim et al. 1998). In open areas, in contrast, indices may identify potentially trivial details such as soil type (Hay 2000), and so unnecessarily complicate species-environment relationships in distribution models.

Satellite-derived environmental predictors

A broad range of environmental predictors (61 in total) was considered to ensure that unwitting omission of variables important to particular ecological groups would not bias results. Among these predictors, individual species’ distribution models picked 14 on average based on forward stepwise variable selection. Models were thus saturated with predictors before we tested for ecologically driven differences in model accuracy.

Among the predictors considered, mean altitude was obtained from a digital elevation model provided by the United States Geological Survey's EROS Data Center (<http://edcdaac.usgs.gov/gtopo30/gtopo30.html>).

The remaining predictors constituted seasonal measures of satellite-derived environmental indices. Satellite images collected twice daily over an 18-yr period (1982–1999) by the United States National Oceanic and Atmospheric Administration's Advanced Very High Resolution Radiometer (AVHRR) satellites contributed: land surface temperature; air temperature; the vapour pressure deficit (a measure of the air's drying power); a middle infrared signal reflective of both temperature and vegetation structure; and the normalised difference vegetation index (NDVI), which estimates photosynthetic activity (Hay 2000, Goetz et al. 2000). Cold cloud duration, an index of rainfall (Hay 2000), was obtained from 10 yr (1989–1998) worth of European Meteosat imagery.

All imagery was composited into cloud-free, monthly images and re-sampled from its original spatial resolution of 1 km² to the quarter and half-degree grids used to geo-reference bird data. Each index was then subjected to Fourier analysis, a data reduction technique ideal for summarising seasonal variables (Rogers et al. 1996). Fourier analysis extracted, from each environmental index, the overall mean, minimum, maximum, and variance, plus the amplitude (strength) and phase (timing) of annual, biannual and triannual cycles.
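
As a rough illustration of this data reduction, the sketch below summarises a single 12-month series for one grid square. It is not the procedure of Rogers et al. (1996): the function name, the use of the population variance, the reading of annual, biannual and triannual as one, two and three cycles per year, and the expression of phase as the month of the cycle's peak are all assumptions.

```python
import cmath
import math
import statistics


def fourier_summary(monthly, cycles=(1, 2, 3)):
    """Summarise a monthly series covering exactly one year.

    Returns the mean, minimum, maximum and (population) variance, plus
    the amplitude and phase of harmonics with 1, 2 and 3 cycles per year.
    Phase is reported as the time of the harmonic's peak, in months
    counted from the first value of the series.
    """
    n = len(monthly)
    summary = {
        "mean": statistics.fmean(monthly),
        "min": min(monthly),
        "max": max(monthly),
        "variance": statistics.pvariance(monthly),
    }
    for k in cycles:
        # Discrete Fourier coefficient of the k-th harmonic, scaled so that
        # its modulus equals the amplitude of the matching cosine wave.
        coeff = (2 / n) * sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(monthly)
        )
        summary[f"amplitude_{k}"] = abs(coeff)
        phase_rad = (-cmath.phase(coeff)) % (2 * math.pi)
        summary[f"phase_{k}"] = phase_rad * n / (2 * math.pi * k)
    return summary


# Hypothetical NDVI-like series: mean 0.5, annual amplitude 0.2, peak in month 3.
series = [0.5 + 0.2 * math.cos(2 * math.pi * (t - 3) / 12) for t in range(12)]
summary = fourier_summary(series)
```

For this synthetic series the annual harmonic recovers an amplitude of 0.2 and a peak at month 3, while the two higher harmonics have amplitudes close to zero.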

Distribution modelling

Many techniques exist for building distribution models. Given the number of species examined, we focused on just two for practical reasons: logistic regression and its derivative, autologistic regression. Logistic regression is a commonly used technique that performs well in comparison with other modelling techniques (Elith et al. 2006).

At both quarter and half-degree resolution, distribution models were first built with logistic regression, which uses data on species presence and absence to establish what proportion of sites the species occupies at each value of the explanatory variables. A logit link ensures that a linear function of predictors yields response values between 0 and 1 for each site, representing the probability of species occurrence (Legendre and Legendre 1998). Logistic regression was implemented in S-Plus (Anon. 2001) with forward stepwise variable selection based on the Akaike information criterion (Sakamoto et al. 1986). Stepwise variable selection has its shortcomings (Guisan and Thuiller 2005), but as we did not need to draw inference based on the variables selected, we chose it for computational speed.
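
The role of the logit link can be shown in a few lines. This is only a sketch of how a fitted model turns predictor values into a probability; the coefficient values and names are hypothetical, and model fitting and stepwise selection are not shown.

```python
import math


def occurrence_probability(intercept, coefficients, predictors):
    """Probability of occurrence from a linear predictor via the inverse logit."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    # Numerically stable inverse logit: the result always lies between 0 and 1.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


# Hypothetical model with two predictors (e.g. mean NDVI and mean altitude,
# both standardised); the coefficients are invented for illustration.
p = occurrence_probability(-0.4, [1.2, -0.8], [0.5, 0.25])
```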

Like many widely used regression techniques, logistic regression assumes that individual data points are independent of each other. This assumption is violated in the presence of spatial autocorrelation, which is common in ecological data: data points in close proximity tend to be more alike than data points further apart (Legendre and Legendre 1998). In species’ distribution records, such patterns can originate in processes endogenous to the species, like conspecific attraction and dispersal limitations, or in functional ties between the species’ occurrence and environmental conditions that in themselves are spatially structured (Keitt et al. 2002). Some of these processes operate at scales too fine to be detectable at quarter or half-degree resolution (e.g. conspecific attraction), but others can operate at much coarser scales (Legendre and Legendre 1998), and visual inspection of our data certainly suggested that species’ occurrences were clustered in space.

In the presence of such spatial autocorrelation, logistic regression may misjudge the relative importance of predictors (Hoeting et al. 2000). The problem can be addressed by incorporating a spatial parameter into logistic models that reflects how strongly a species’ probability of occurrence at one site is affected by its presence or absence at neighbouring sites. The result is known as autologistic regression (Augustin et al. 1996).

In autologistic models, calculating the probability of a species’ presence at any one site has immediate knock-on effects on predictions for neighbouring sites. Model computation becomes challenging, therefore, when the study region includes unsurveyed sites. The solution is an iterative procedure known as the Gibbs sampler, which repeatedly updates predictions at each site based on the site-specific environmental conditions and the (ever-changing) predictions at neighbouring sites (Augustin et al. 1996, Hoeting et al. 2000).

Autologistic regression was implemented in S-Plus via a custom-written program (Appendix, Text S1). The Gibbs sampler cycled through 50 iterations, which sufficed for convergence in the majority of models. Grid squares (QDS or HDS) influenced each other’s predictions if they were in immediate contact, so that any focal square had up to eight neighbours. Neighbours outside the study area or beyond the coastline were ignored. Renewed variable selection under consideration of the autocovariate would have been desirable. Variable selection methods for models incorporating spatial autocorrelation are only now being developed (Hoeting et al. 2006), however, and we are as yet unaware of a technique that could feasibly be applied to models as computationally intensive as autologistic regressions with a Gibbs sampler. We therefore followed other authors (Augustin et al. 1996) in using the same environmental predictors for each species as selected during logistic regression.
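
The neighbourhood rule and the iterative updating can be sketched as follows. This is not the S-Plus program of Text S1. The sketch assumes that the autocovariate is the unweighted proportion of occupied neighbours, that surveyed squares keep their observed state, and that unsurveyed squares are redrawn as Bernoulli samples in each iteration; `env_eta` (the environmental part of the linear predictor) and `beta_auto` (the autocovariate coefficient) are hypothetical names.

```python
import math
import random


def inv_logit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def neighbours(cell, cells):
    """Up to eight touching squares; squares outside the study area are ignored."""
    r, c = cell
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and (r + dr, c + dc) in cells]


def autocovariate(cell, state):
    """Proportion of neighbouring squares currently occupied (assumed form)."""
    nb = neighbours(cell, state)
    return sum(state[n] for n in nb) / len(nb) if nb else 0.0


def gibbs_autologistic(env_eta, observed, beta_auto, iterations=50, seed=0):
    """Return occurrence probabilities after Gibbs updates of unsurveyed squares.

    env_eta:  {cell: environmental linear predictor}
    observed: {cell: 1, 0, or None for unsurveyed squares}
    """
    rng = random.Random(seed)
    unsurveyed = [cell for cell in env_eta if observed.get(cell) is None]
    # Start from observations where available, else from environment-only draws.
    state = {cell: observed.get(cell) for cell in env_eta}
    for cell in unsurveyed:
        state[cell] = int(rng.random() < inv_logit(env_eta[cell]))
    for _ in range(iterations):
        for cell in unsurveyed:
            p = inv_logit(env_eta[cell] + beta_auto * autocovariate(cell, state))
            state[cell] = int(rng.random() < p)
    return {cell: inv_logit(env_eta[cell] + beta_auto * autocovariate(cell, state))
            for cell in env_eta}


# Hypothetical 3 x 3 study area: the centre square is unsurveyed and
# all eight of its neighbours are recorded as occupied.
grid = [(r, c) for r in range(3) for c in range(3)]
env_eta = {cell: 0.0 for cell in grid}
observed = {cell: (None if cell == (1, 1) else 1) for cell in grid}
probs = gibbs_autologistic(env_eta, observed, beta_auto=2.0)
```

In this toy example the centre square has eight occupied neighbours, so its autocovariate is 1 and its predicted probability rises from 0.5 (environment only) to the inverse logit of 2.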

To retain discrete data for model testing, logistic and autologistic models were trained with only two-thirds of the data available per species. Both training and test data contained an equal number of presence and absence samples, because this optimises accuracy in logistic models and ensures comparability between species (McPherson et al. 2004, but see Real et al. 2006 for an alternative approach). Where necessary, the balance between presence and absence samples was achieved by sub-sampling the more numerous category (generally absence). Potential absence samples included any surveyed square not recorded to harbour the species.
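
One way to obtain balanced training and test sets is sketched below. The order of operations (sub-sample the larger class first, then split two-thirds to one-third), the function name and the seed are assumptions; the source does not specify these details.

```python
import random


def balanced_split(presences, absences, train_fraction=2 / 3, seed=0):
    """Sub-sample the more numerous class, then split into training and test data.

    Returns two lists of (site, label) pairs in which presences (label 1)
    and absences (label 0) are equally frequent.
    """
    rng = random.Random(seed)
    n = min(len(presences), len(absences))
    pres = rng.sample(presences, n)   # random order, without replacement
    absn = rng.sample(absences, n)
    n_train = round(n * train_fraction)
    train = [(s, 1) for s in pres[:n_train]] + [(s, 0) for s in absn[:n_train]]
    test = [(s, 1) for s in pres[n_train:]] + [(s, 0) for s in absn[n_train:]]
    return train, test


# Hypothetical species recorded in 30 squares and unrecorded in 300 surveyed squares.
train, test = balanced_split(list(range(30)), list(range(100, 400)))
```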

To avoid forcing models to make predictions for areas ecologically distinct from training data, analyses were limited to two avifaunal zones that encompassed the bulk of our distribution records. These avifaunal zones were identified by de Klerk et al. (2002) as the southwestern and southern Savanna subregions (Fig. 1).

Measures of model accuracy

Model accuracy was measured on data withheld from model training in a split-sample approach. Such test data are not fully independent from training data due to spatial autocorrelation (Araújo et al. 2005). Nonetheless, they yield more conservative estimates of model accuracy than training data themselves (Fielding and Bell 1997, McPherson et al. 2004) and can provide a reasonable surrogate for fully independent data (Araújo et al. 2005). More precise estimates of accuracy could be obtained by k-fold partitioning or bootstrapping (Verbyla and Litvaitis 1989), but this was unfeasible in our study given the large number of species modelled.

We used three metrics to measure accuracy: sensitivity, specificity, and AUC. All three range from zero (very poor model accuracy) to one (perfect fit between observations and predictions). Sensitivity quantifies the proportion of correctly predicted observations of species presence. Sensitivity is low, therefore, when omission errors (erroneous predictions of absence) are common. Specificity, conversely, identifies the proportion of correctly predicted observations of species absence. Thus specificity is low when commission errors (erroneous predictions of presence) are frequent (Fielding and Bell 1997). To calculate the two measures, probabilistic estimates of species occurrence must be classified into presence-absence predictions. A classification threshold of 0.5 was used to strike a compromise between omission and commission errors (McPherson et al. 2004).
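
The two threshold-dependent measures can be computed as in this minimal sketch. Treating a probability exactly equal to the threshold as a predicted presence is an assumption, and the data are invented.

```python
def sensitivity_specificity(observed, predicted_prob, threshold=0.5):
    """Sensitivity and specificity after classifying probabilities at a threshold."""
    tp = fn = tn = fp = 0
    for obs, prob in zip(observed, predicted_prob):
        predicted_present = prob >= threshold
        if obs == 1 and predicted_present:
            tp += 1          # presence correctly predicted
        elif obs == 1:
            fn += 1          # omission error
        elif predicted_present:
            fp += 1          # commission error
        else:
            tn += 1          # absence correctly predicted
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical test data: three presences and three absences.
obs = [1, 1, 1, 0, 0, 0]
prob = [0.9, 0.6, 0.4, 0.2, 0.7, 0.1]
sens, spec = sensitivity_specificity(obs, prob)
```

Here one presence is missed and one absence is wrongly predicted as a presence, so both measures equal 2/3.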

AUC measures the area under a receiver operating characteristics (ROC) curve, which plots sensitivity against (1 – specificity) over a number of classification thresholds. AUC is thus independent of classification thresholds and evaluates the ability of models to correctly predict a higher probability of occurrence where species are present than where they are absent. It was calculated non-parametrically using the Wilcoxon statistic. Values below 0.7 indicate poor model performance, as they suggest similar rates of correct and erroneous predictions; values between 0.7 and 0.9 indicate (moderately) useful models; values exceeding 0.9 signify excellent accuracy (Pearce and Ferrier 2000).
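
The non-parametric calculation rests on the equivalence between AUC and the Wilcoxon (Mann-Whitney) statistic: AUC is the proportion of presence-absence pairs in which the presence site receives the higher predicted probability. The sketch below uses the simple pairwise form and counts ties as one half, a common convention that the source does not spell out.

```python
def auc_wilcoxon(observed, predicted_prob):
    """AUC as the share of presence-absence pairs ranked correctly."""
    pres = [p for o, p in zip(observed, predicted_prob) if o == 1]
    absn = [p for o, p in zip(observed, predicted_prob) if o == 0]
    if not pres or not absn:
        raise ValueError("both presences and absences are required")
    wins = 0.0
    for p in pres:
        for a in absn:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pres) * len(absn))


# Hypothetical test data: 7 of the 9 presence-absence pairs are ranked correctly.
auc = auc_wilcoxon([1, 1, 1, 0, 0, 0], [0.9, 0.6, 0.4, 0.2, 0.7, 0.1])
```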

Linking accuracy to species’ ecology

The relationship between model accuracy and ecological traits was examined using simple (single-trait) and multiple (multi-trait) regressions as well as Kruskal-Wallis H-tests. All calculations were undertaken in S-Plus (Anon. 2001). Holm's method was used to adjust significance levels for multiple testing (Aickin and Gensler 1996). We also present unadjusted significance levels, however, given debate over the appropriateness of adjustments (Perneger 1998).
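
Holm's step-down adjustment is shown here in its adjusted p-value form, which is equivalent to adjusting the significance levels. The function name and the example p-values are illustrative.

```python
def holm_adjust(p_values):
    """Holm-adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical unadjusted p-values from three tests.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

With these three values the adjusted p-values are 0.03, 0.06 and 0.06, so only the first test remains significant at the 0.05 level.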

Regressions were implemented as generalised linear models (GLMs) with a binomial error distribution and logit link function, because the response variable (accuracy) was limited to values between 0 and 1. The proportion of variance in distribution model accuracy explained by these GLMs was quantified via the coefficient of determination (r², calculated as the square of the Pearson correlation coefficient between observed and predicted accuracy). The statistical significance and relative importance of individual ecological traits in explaining accuracy were assessed using log-likelihood ratio tests (Legendre and Legendre 1998). To identify a parsimonious model for each measure of accuracy, various combinations of ecological predictors and their interactions were examined, using both r² and the Akaike information criterion as guides.
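
The coefficient of determination as defined here, the squared Pearson correlation between observed and predicted accuracy, can be computed as in this short sketch. The function name and data are illustrative.

```python
def r_squared(observed, predicted):
    """Square of the Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)


# Toy example: the correlation is 0.5, so r squared is 0.25.
r2 = r_squared([1, 2, 3], [1, 3, 2])
```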

For Kruskal-Wallis H-tests, a non-parametric analysis of variance, continuous ecological traits were converted to binary variables (e.g. small vs big global range) using the median to split data into lower and upper values.
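The median split and the H statistic can be illustrated as follows. This is a minimal sketch with hypothetical range sizes and AUC values; it assigns values equal to the median to the upper group (an assumption) and omits the correction for ties that statistical packages usually apply.

```python
import statistics


def median_split(values):
    """Return (is_upper flags, cut-off); values >= median count as upper."""
    cut = statistics.median(values)
    return [v >= cut for v in values], cut


def average_ranks(values):
    """Ranks from 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


# Hypothetical global range sizes (km2) and model AUCs for six species.
range_km2 = [0.4e6, 1.2e6, 2.0e6, 3.5e6, 6.0e6, 20e6]
auc = [0.93, 0.91, 0.90, 0.85, 0.84, 0.80]

is_wide, cut = median_split(range_km2)
narrow = [a for a, wide in zip(auc, is_wide) if not wide]
wide = [a for a, wide in zip(auc, is_wide) if wide]
h = kruskal_wallis_h([narrow, wide])
```

For two groups, H is compared with a chi-square distribution with one degree of freedom, so values above 3.84 are significant at α=0.05.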

Results

The influence of spatial scale

Results obtained at quarter-degree resolution were qualitatively and quantitatively similar to those obtained at half-degree resolution. Consequently, only results at half-degree resolution are reported below, but figures and tables for results at quarter-degree resolution are provided in the Appendix.

Overall model accuracy

Both logistic and autologistic distribution models achieved high accuracy on average (Fig. 2). On a species-by-species basis, autologistic models slightly but significantly outperformed logistic models (paired-sample Wilcoxon's signed rank tests: Z<−13.64, p<0.001, n=1,315 for sensitivity, specificity and AUC).

Figure 2. The accuracy achieved by logistic (top) and autologistic (bottom) distribution models at half-degree resolution. Accuracy was measured as sensitivity (left), specificity (middle) and AUC (right). Density plots show the spread of accuracy values attained by models for 1315 species; vertical lines indicate mean accuracy (±one standard deviation).

The influence of ecological characteristics

Single-trait GLMs had weak explanatory power (maximum r2=0.15) but – based on unadjusted significance levels – suggested that six of the ten ecological predictors examined significantly influenced at least one measure of model accuracy: global range size, migratory behaviour, wetland affinity, endemism, habitat tolerance and body mass. Among these, global range size, migratory behaviour and wetland affinity exhibited the most consistent effects: regardless of model algorithm, all measures of accuracy declined as range size increased, if species were migratory, or if they frequented wetlands (Table 3).

Table 3.  Effects of individual ecological traits on model accuracy at half-degree resolution. Shown are coefficient estimates (±SE) for the intercept and trait, as well as the coefficient of determination (r2) of single-trait models. Whether a trait's influence was significant was determined via log-likelihood ratio tests. Significance is indicated with one asterisk (*) when p<0.05 or two asterisks (**) when p<0.01. Parameter estimates in bold remained significant (at global α=0.05) after Holm's adjustment for multiple tests.
Response | Predictor | Logistic: intercept | Logistic: parameter | Logistic: r2 | Autologistic: intercept | Autologistic: parameter | Autologistic: r2
Sensitivity | body mass (log) | 2.29±0.22 | −0.23±0.11* | 0.04 | 2.37±0.23 | −0.20±0.12 | 0.03
Sensitivity | conservation status | 1.83±0.25 | −0.04±0.25 | 0.00 | 2.14±0.30 | 0.14±0.30 | 0.00
Sensitivity | endemism | 2.05±0.11 | 0.30±0.11** | 0.06 | 2.29±0.12 | 0.29±0.12** | 0.06
Sensitivity | global range size (log) | 5.65±0.96 | −0.58±0.14** | 0.15 | 5.62±1.01 | −0.55±0.15** | 0.13
Sensitivity | habitat structure: open | 1.86±0.08 | −0.07±0.10 | 0.02 | 2.01±0.09 | −0.09±0.11 | 0.02
Sensitivity | habitat structure: complex | | 0.08±0.06 | | | 0.06±0.06 |
Sensitivity | habitat tolerance | 2.21±0.17 | −0.16±0.07* | 0.05 | 2.32±0.18 | −0.14±0.07* | 0.04
Sensitivity | migratory behaviour | 1.76±0.09 | −0.26±0.09** | 0.07 | 1.90±0.09 | −0.27±0.09** | 0.08
Sensitivity | sub-resolution habitat | 1.85±0.08 | −0.08±0.08 | 0.01 | 2.00±0.09 | −0.08±0.09 | 0.01
Sensitivity | trophic rank | 2.40±0.29 | −0.29±0.15 | 0.03 | 2.51±0.30 | −0.26±0.15 | 0.03
Sensitivity | wetland affinity | 1.69±0.09 | −0.31±0.09** | 0.09 | 1.85±0.10 | −0.28±0.10** | 0.08
Specificity | body mass (log) | 1.84±0.20 | −0.15±0 | 0.02 | 1.97±0.21 | −0.13±0.11 | 0.01
Specificity | conservation status | 1.60±0.23 | 0.03±0.23 | 0.00 | 1.68±0.23 | −0.06±0.23 | 0.00
Specificity | endemism | 1.68±0.09 | 0.19±0.09* | 0.04 | 1.84±0.10 | 0.19±0 | 0.04
Specificity | global range size (log) | 4.67±0.85 | −0.48±0.13** | 0.15 | 4.08±0.88 | −0.36±0.13** | 0.08
Specificity | habitat structure: open | 1.57±0.07 | −0.05±0.09 | 0.01 | 1.73±0.08 | −0.07±0.10 | 0.01
Specificity | habitat structure: complex | | 0.04±0.05 | | | 0.03±0.05 |
Specificity | habitat tolerance | 1.85±0.15 | −0.13±0.06* | 0.04 | 1.96±0.16 | −0.11±0.07 | 0.03
Specificity | migratory behaviour | 1.47±0.08 | −0.22±0.08** | 0.07 | 1.64±0.08 | −0.22±0.08** | 0.07
Specificity | sub-resolution habitat | 1.55±0.07 | −0.07±0.07 | 0.01 | 1.71±0.08 | −0.09±0.08 | 0.01
Specificity | trophic rank | 2.00±0.26 | −0.23±0.13 | 0.03 | 2.15±0.28 | −0.22±0.14 | 0.02
Specificity | wetland affinity | 1.44±0.09 | −0.21±0.09* | 0.06 | 1.62±0.09 | −0.19±0.09* | 0.04
AUC | body mass (log) | 2.45±0.24 | −0.09±0.12 | 0.03 | 2.66±0.27 | −0.15±0.14 | 0.02
AUC | conservation status | 2.03±0.26 | −0.09±0.26 | 0.00 | 2.35±0.30 | −0.04±0.30 | 0.00
AUC | endemism | 2.24±0.12 | 0.22±0.12* | 0.04 | 2.53±0.13 | 0.24±0.13 | 0.04
AUC | global range size (log) | 4.80±1.02 | −0.41±0.15** | 0.10 | 4.86±1.14 | −0.38±0.17* | 0.07
AUC | habitat structure: open | 2.11±0.09 | −0.08±0.11 | 0.01 | 2.38±0.10 | −0.09±0.13 | 0.02
AUC | habitat structure: complex | | 0.05±0.06 | | | 0.06±0.07 |
AUC | habitat tolerance | 2.41±0.19 | −0.14±0.07 | 0.04 | 2.65±0.21 | −0.12±0.08 | 0.03
AUC | migratory behaviour | 2.00±0.09 | −0.26±0.09** | 0.09 | 2.27±0.10 | −0.29±0.10** | 0.10
AUC | sub-resolution habitat | 2.09±0.09 | −0.10±0.09 | 0.02 | 2.36±0.10 | −0.13±0 | 0.02
AUC | trophic rank | 2.60±0.32 | −0.26±0.16 | 0.03 | 2.84±0.35 | −0.24±0.18 | 0.02
AUC | wetland affinity | 1.95±0.10 | −0.27±0.10** | 0.09 | 2.23±0.12 | −0.26±0.12* | 0.07

Holm's adjustments for multiple comparisons reduced the number of significant predictors in single-trait GLMs to four traits exerting their impacts primarily on sensitivity: global range size, migratory behaviour, wetland affinity and endemism (Table 3). In contrast, Kruskal-Wallis H-tests on the categorical variables suggested that all measures of accuracy were affected by ecological traits, and that all traits except conservation status had a significant influence (before and after Holm's adjustments; Fig. 3). Accuracy was lower in: wide-ranging species (global range size≥3 325 075 km2) than narrow-ranging species; migrants than non-migrants; species associated with wetlands than species not associated; non-endemics than endemics; species tolerating many (≥2) rather than few land cover types; species of higher (≥1.8) than lower trophic rank; larger (≥38 g) than smaller species; species associated with sub-resolution habitats than species not associated; and species restricted to open habitats than species frequenting vertically structured habitats or both (Fig. 4). These patterns held for all measures of accuracy and at both spatial scales, except that the effect of association with sub-resolution habitats lost significance at quarter-degree resolution (Appendix, Fig. S2, S3).

Figure 3. The relative importance of ecological traits in influencing the accuracy (sensitivity, specificity or AUC, as indicated) of species distribution models built at half-degree resolution. Relative importance was judged using: 1) each trait's likelihood ratio statistic in single-trait (left) and multi-trait (not shown) generalised linear models (GLMs), measuring the change in model deviance attributable to that trait; and 2) the test statistic of Kruskal-Wallis H-tests (right). Both statistics are chi-square distributed and were significant (at test-specific α=0.05) only if they surpassed a value of 3.84, demarcated by the dashed horizontal line. (For habitat structure, a three-level factor variable, the significance cut-off was 5.99, not shown.) Stars above bars indicate which traits' impacts retained significance (at global α=0.05) after Holm's adjustments for multiple comparisons.

Figure 4. Box plots illustrating the impact of ecological traits on model accuracy at half-degree resolution. Impacts were highly similar across model types (logistic vs autologistic) and accuracy metrics (AUC, sensitivity, and specificity). We therefore show patterns for only AUC in logistic models. Boxes delimit the inter-quartile range, with girdles at the median and notches to indicate the median's 95% confidence intervals. Whiskers show the spread of data up to 1.5 times the inter-quartile range.

Note that the relative magnitude of H-test statistics suggested that the impacts of global range, migratory behaviour, wetland affinity and endemism were considerably stronger than those of other traits. H-tests and single-trait GLMs thus agreed on the relative importance of predictors, but not on their significance levels (Fig. 3, Appendix, Fig. S2).

Multi-trait GLMs that simultaneously included all 10 ecological traits as predictors assigned significance only to global range size (when predicting sensitivity and specificity) or no variable at all (in the case of AUC and most cases after Holm's adjustments). Judging from likelihood ratios, however, the relative importance of ecological predictors was again similar, although body mass and trophic rank gained influence in comparative terms (not shown).

In our search for parsimonious models, global range size and migratory behaviour emerged as the most universally applicable predictors, with wetland affinity occasionally providing a useful third or substitute. Parsimonious models explained up to 20% of the variation in distribution model accuracy (Table 4) and suggested, for example, that migratory wetland species with large global ranges (35 000 000 km2) would typically yield logistic models with AUCs of 0.79, whereas models for narrow-ranging (1 500 000 km2) non-migrants with no wetland affiliation reached AUCs of 0.88.

Table 4.  Parsimonious generalised linear models (GLMs) describing the influence of ecological characteristics on each of three measures of accuracy in logistic and autologistic distribution models built at half-degree resolution. Shown are coefficient estimates (±standard error) for each GLM's intercept and applicable predictors, as well as the model's coefficient of determination (r2).
Predictor | Logistic: sensitivity | Logistic: specificity | Logistic: AUC | Autologistic: sensitivity | Autologistic: specificity | Autologistic: AUC
Intercept | 4.52±1.04 | 4.19±0.88 | 3.63±1.10 | 4.94±1.05 | 3.51±0.91 | 2.17±0.12
Global range size | −0.43±0.15 | −0.41±0.13 | −0.26±0.16 | −0.46±0.16 | −0.28±0.14 |
Migratory behaviour | −0.14±0.09 | −0.14±0.08 | −0.17±0.10 | −0.19±0.10 | −0.17±0.09 | −0.25±0.11
Wetland affinity | −0.17±0.10 | | −0.17±0.11 | | | −0.19±0.12
r2 | 0.20 | 0.17 | 0.18 | 0.16 | 0.12 | 0.14
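The two worked AUC values quoted in the text can be recovered from the logistic AUC column of Table 4 by back-transforming the linear predictor through the inverse logit. The sketch below assumes that global range size enters as its base-10 logarithm (in km2) and that migratory behaviour and wetland affinity are coded 0/1; neither coding is stated explicitly alongside the table, but under these assumptions the coefficients yield the reported AUCs of 0.79 and 0.88.

```python
import math

# Coefficients from the logistic-model AUC column of Table 4.
INTERCEPT = 3.63
B_RANGE = -0.26    # per unit of log10(range in km2); log base is an assumption
B_MIGRANT = -0.17  # assumed coding: 1 = migratory, 0 = non-migratory
B_WETLAND = -0.17  # assumed coding: 1 = wetland-affiliated, 0 = not


def predicted_auc(range_km2, migratory, wetland):
    """Expected AUC of a logistic distribution model for a given species."""
    eta = (INTERCEPT
           + B_RANGE * math.log10(range_km2)
           + B_MIGRANT * migratory
           + B_WETLAND * wetland)
    return 1.0 / (1.0 + math.exp(-eta))


# Migratory wetland species with a large global range.
wide_migrant = predicted_auc(35_000_000, migratory=1, wetland=1)

# Narrow-ranging non-migrant with no wetland affiliation.
narrow_resident = predicted_auc(1_500_000, migratory=0, wetland=0)
```

Because the parsimonious model explains less than a fifth of the variation in AUC, such predictions are best read as rough expectations for a class of species rather than as forecasts for any single species.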

Discussion

Species’ ecological traits have long been suspected to influence the accuracy of distribution models (Best and Stauffer 1986). Yet empirical assessments of this influence have been limited in number and often subject to methodological foibles.

Several studies, for example, relied on measures of accuracy such as the matching coefficient, sensitivity, specificity, or kappa (Garrison and Lupo 2002, Stockwell and Peterson 2002, Kadmon et al. 2003) without controlling for statistical artefacts arising from species’ prevalence (for a discussion of these artefacts see Fielding and Bell 1997, McPherson et al. 2004). Alternatively, accuracy was measured solely on data used in model parameterisation (Hepinstall et al. 2002, Segurado and Araújo 2004, Huntley et al. 2004), which tends to provide an overoptimistic impression of model reliability (Stockwell and Peterson 2002, McPherson et al. 2004), and may not allow for enough variability in accuracy to detect ecological differences. Furthermore, few authors have discussed in detail how species’ ecological traits might exert their control over model accuracy (Fielding and Bell 1997), and because most studies assess a limited number of traits, deliberations on this topic are scattered in the literature.

We have herein attempted a synopsis of the ecological traits postulated to affect model accuracy and the mechanisms by which they may wield their influence. Based on the literature, we identified 13 ecological traits as potentially influential (Table 1), and empirically examined the effects of eight of these. Findings for each trait are discussed individually below, following brief observations on how spatial resolution and the choice of model algorithm influenced results. Throughout, we try to differentiate between effects that reflect ecological phenomena and effects that have primarily methodological roots, nonetheless acknowledging that the two can become entangled.

Spatial resolution

The spatial scale at which species respond to their environment – and thus the best scale for modelling their distributions – may vary with species’ ecological characteristics (Hutchinson 1959, Mitchell et al. 2001). The interaction between ecological traits and model accuracy might, therefore, vary with the spatial resolution of analysis. Our results, however, indicated little difference between analyses at half-degree and quarter-degree resolution, possibly because both resolutions are coarse from an organism's perspective. Analyses at a wider range of scales may yet detect an effect of scale and shed light on how ecological traits influence species’ habitat perception.

Choice of model algorithm

A wealth of algorithms is available for the construction of species distribution models (Guisan and Zimmermann 2000). These algorithms diverge in terms of both the input data required and the computation of species-environment relationships. Consequently, they may also diverge in their sensitivity to species’ ecological traits (Hepinstall et al. 2002, Segurado and Araújo 2004).

Logistic and autologistic distribution models responded similarly to ecological characteristics in our analyses, although the accuracy of autologistic models was often slightly less affected. The two algorithms differ only in their approach to spatial autocorrelation. We conclude, therefore, that explicitly addressing spatial autocorrelation counteracts some of the statistical irregularities species’ ecology imposes on occurrence-environment relationships. This confirms that methodological aspects modulate how strongly ecological traits influence the accuracy of distribution models (Segurado and Araújo 2004).

Association with poorly recorded habitats

Species might favour habitats under-sampled during field surveys, or habitats ill-defined by the predictors used to construct distribution models. Either situation could influence model accuracy unfavourably, but via different mechanisms (Table 1).

Pearce et al. (2001) and Kadmon et al. (2003) noted that model accuracy suffers if data used in model parameterisation disproportionately under- or over-represent aspects of a species’ environment. We were unable to determine how representative our distribution data were, but employed three categorical variables to flag habitats poorly captured by the predictors used.

Among them, habitat structure exerted a minor effect on model accuracy, with poorer predictive power for species in open than vertically structured habitats. Such an effect could have ecological roots: birds of prey, for example, may use open spaces as hunting grounds regardless of land use type, leading to occurrence records in a large variety of environments and thus a lack of well-defined statistical associations. In our study, species in open habitats did have greater habitat tolerance than species restricted to vertically structured habitats (Wilcoxon's rank sum test: Z<4.94, p<0.001, n1=408, n2=524). Species frequenting both types of environment had even larger habitat tolerance (Wilcoxon's rank sum test: Z<2.04, p<0.05, n1=383, n2=408), however, and nonetheless yielded better models. Our findings therefore more likely reflect methodological issues, with satellite-derived indices in open areas picking up soil type or other forms of environmental heterogeneity irrelevant to birds.

Species’ affiliation with sub-resolution habitats also had only a minor impact on model accuracy that lost significance at finer resolution, illustrating its methodological roots.

Wetland affinity, in contrast, emerged as one of the most significant influences on model accuracy, with wetland-affiliated species yielding poorer models. This, too, is clearly a methodological issue that use of more appropriate predictor variables should resolve (e.g. high-resolution land cover maps that adequately distinguish wetlands).

Body size

Body size has been suggested as a trait of interest (Boone and Krohn 1999), although Stockwell and Peterson (2002) found no significant relationship with model accuracy. In our study, body mass had a small but significant impact. Contrary to expectations, it was larger rather than smaller species that yielded poorer models. This suggests that the distributions of larger organisms are not necessarily better captured by surveys or more suitable to analyses at coarse spatial scales (Suarez-Seoane et al. 2002). As body mass was only weakly related to global range size and habitat tolerance (rs≤0.27), the mechanism underlying its (albeit small) impact remains uncertain.

Conservation status

Although distribution models are often employed in conservation contexts (Corsi et al. 1999, Gibson et al. 2004), concern has been raised over their suitability for the rare and range-restricted species of most conservation interest (Vaughan and Ormerod 2003).

This concern is primarily fuelled by the notion that the scarcity and small ranges of threatened species limit the data available for model specification, which can indeed diminish model accuracy (Stockwell and Peterson 2002, McPherson et al. 2004). Rare and threatened species, however, are often better studied than common ones, so that models of their distribution may actually benefit from relatively comprehensive data (Gioia and Pigott 2000, Karl et al. 2002). Accordingly, neither Elith and Burgman (2002) nor our study detected a link between conservation status and model accuracy.

Endemism

Comprehensive sampling of the environmental conditions encompassed by a study region is considered a prerequisite for reliable models. Otherwise, species’ response curves – which describe the relationship between species occurrence and environmental gradients – may be truncated and their shape misjudged (Vaughan and Ormerod 2003).

Clearly, response curves will always be truncated globally speaking, unless species are endemic to the study region. It has previously been recognised that this reduces transferability of models from one region to another (Best and Stauffer 1986, Fielding and Haworth 1995). Our findings suggest that it also diminishes model accuracy within the original study area, because non-endemics yielded poorer models than endemics, at least in single-trait GLMs and Kruskal-Wallis H-tests.

In multi-trait analyses, endemism lost significance potentially because of interdependence with three other influential traits: endemic species tended to have smaller global ranges (Wilcoxon's signed rank test Z=−19.13, p<0.01, n=1,329), included fewer migrants (χ2=73.01, p<0.01, 1 degree of freedom), and displayed less affinity for wetland habitats (χ2=27.57, p<0.01, 1 degree of freedom) than non-endemics.

Such interdependence between ecological characteristics complicates judgement of their individual effects. Any effect of endemism must, however, be considered methodological if its underlying cause truly is the truncation of response curves. Of course, it is often difficult to obtain data from a species’ entire geographic range, illustrating that species’ ecological traits can impose methodological deficiencies that are not easily corrected.

Habitat tolerance and distinctiveness

Both the number and distinctiveness of habitats a species utilises are thought to influence the ability of model algorithms to discern patterns in a species’ occurrence (Kadmon et al. 2003, Brotons et al. 2004). Among studies that have assessed this influence empirically, only two reported no effect (Garrison and Lupo 2002, Stockwell and Peterson 2002). All others found that increasing habitat tolerance affected model accuracy unfavourably (Mitchell et al. 2001, Pearce et al. 2001, Hepinstall et al. 2002, Kadmon et al. 2003, Segurado and Araújo 2004, Brotons et al. 2004).

Despite our rather simplistic measure of habitat tolerance, we too found that broader tolerance reduced model accuracy, particularly in Kruskal-Wallis H-tests. Stronger significance might have been attributed to habitat tolerance in GLMs had we used a more sophisticated measure, e.g. one based on Environmental niche factor analysis or related techniques (Segurado and Araújo 2004, Thuiller et al. 2004). In multi-trait GLMs, habitat tolerance lost all significance, perhaps because of collinearity with global range size (Spearman rank correlation rs=0.40, p<0.01, n=1,329).

Surprisingly, habitat tolerance in our analysis augmented omission as well as commission errors. It has generally been assumed that broader habitat use causes model algorithms to overestimate species’ distributions, leading to increased commission (Hepinstall et al. 2002, Kadmon et al. 2003). With both types of error affected, it becomes clear that habitat tolerance influences the ability of model algorithms to distinguish species presence from absence. Perhaps the occurrence patterns of species able to utilise a variety of habitats are less dependent on abiotic factors than biotic interactions. If so, the poorer model performance for more versatile species has ecological roots.

Nomadism and migration

Migratory species are thought to exploit seasonal peaks in resources that resident species, whose numbers are constrained by year-round resource availability, cannot take full advantage of. Their occurrence patterns, therefore, are determined by seasonal conditions rather than conditions throughout the year, and may change both within and between years depending on the location of local resource peaks (Walther et al. 2004). Occurrence patterns of nomadic species may be even more variable in time and space, because by definition they follow highly unpredictable resources (Dean 1997).

Consequently, accurate models for nomadic and migratory species likely require occurrence records and environmental predictors of high temporal resolution. Data of sufficient quality are rare, however, leaving modellers with data that inadequately reflect occurrence-environment relationships (Walther et al. 2004).

Nonetheless, Garrison and Lupo (2002) and Stockwell and Peterson (2002) detected no difference in model accuracy between migratory and non-migratory birds. Mitchell et al. (2001) in fact found migrants easier to model than resident species. Pearce et al. (2001), however, reported poorer models for motile than sedentary taxa.

In our study, non-migrants yielded significantly better models than migrants. In part, this potentially reflected a tendency among migrants to frequent poorly captured wetland habitats (χ2=83.20, p<0.01, 1 degree of freedom). Migration must have imposed additional constraints on distribution models, however, since parsimonious models of model accuracy sometimes included both migratory behaviour and wetland affinity as predictors. This constraint was likely methodological, because our data pooled observations of species’ occurrence and environmental conditions over several years. Any intra- or inter-annual variation in the distribution of migrants would therefore have been obscured, and species-environment relationships weakened.

Range size

Previous assessments of the relationship between range size and model accuracy have provided contradictory results, perhaps because artefacts arising from range size's influence over sample size and sampling prevalence were not accounted for (McPherson et al. 2004). Garrison and Lupo (2002), for example, reported that species with larger ranges yielded better models. Others, in contrast, found wide-ranging species more difficult to model (Stockwell and Peterson 2002, Segurado and Araújo 2004).

Our analyses, which controlled for artefacts of sampling prevalence, strongly suggested that model accuracy declined with increasing range size. Since range size influenced the size of training and test datasets (rs=0.61, p<0.01, n=1,315 at half-degree resolution), the possibility of an artefact of sample size remained. Greater training sample size should, however, augment rather than diminish model accuracy (McPherson et al. 2004). Moreover, the relationship between model accuracy and range size retained significance when training sample size was included in regressions as a predictor.

Consequently, we conclude that the negative effect global range size exerted on model accuracy was not a statistical artefact but implies the existence of an underlying ecological phenomenon. Possibly, the occurrence-environment relationships of wide-ranging species are weakened by spatial variability in habitat associations. Such variability might reflect genetically driven divergence in habitat preferences (local adaptations, Stockwell and Peterson 2002, Peterson and Holt 2003) or external constraints, with organisms settling for what is best locally (Osborne and Suarez-Seoane 2002). Neither phenomenon need be limited to wide-ranging species. Local adaptations can occur over short distances if dispersal is limited (Åbjörnsson et al. 2004) or gene-flow overwhelmed by local selection pressures (Michalak et al. 2001). Moreover, all organisms must make the best of conditions within the bounds of their individual mobility. Because larger ranges generally encompass greater ecological heterogeneity, however, these phenomena could lead to noisier occurrence-environment relationships in wide- than in narrow-ranging species.

Given this potential, the negative effect of range size on model accuracy was astonishingly small. Lack of environmental heterogeneity is an unlikely explanation, because our study region encompassed temperate to tropical climes with, for example, considerable variation in annual temperature range. Instead, our findings are indicative of considerable niche conservatism even in widespread species. Such niche conservatism may be common (Guralnick 2006) and has important implications for many large-scale ecological patterns (Wiens and Graham 2005).

Trophic level

Biotic factors have been postulated to dominate over abiotic factors in shaping the distributions of species higher up the food chain (Huntley et al. 2004). Although information on forage, competitors and predators can be incorporated into distribution models (Corsi et al. 1999, Knapp et al. 2003), most models use abiotic factors or habitat categorisations as predictors. Poorer accuracy may consequently be expected from models for species of higher trophic rank. Contrary to Huntley et al. (2004), our analyses found evidence to that effect. The impact on accuracy was relatively small, however, suggesting that species’ distributions respond to the physical environment regardless of trophic level.

Conclusion

With distribution models increasingly informing conservation strategies, it is important to know how trustworthy their predictions are. Ideally, this should be tested on a case by case basis. Where model validation is impeded by insufficient data, however, a rough guide regarding factors affecting model reliability may be useful.

The reliability of models depends on both properties of the data used to parameterise them and species’ ecological characteristics. Because species’ ecology affects how easily data on their occurrence are collected, methodological and ecological impacts on model accuracy may become entangled. Methodological impediments can sometimes be overcome, so we attempted to tease the two influences on model accuracy apart. We conclude that certain ecological traits, including habitat tolerance and range size, exert real effects on the accuracy of distribution models that cannot be explained by methodological artefacts. Other traits, in contrast, influence accuracy via methodological aspects of the modelling process, and better data could theoretically eliminate their effects.

Obtaining better data is often difficult, of course, and distribution models have gained popularity among conservation ecologists precisely because they provide answers where raw data alone are insufficient. Diminished accuracy of distribution models for certain types of species may therefore be unavoidable. As a rule of thumb, we suggest that models are associated with greater uncertainty if they describe the distributions of species that: 1) depend on poorly mapped habitats; 2) are migrants, nomadic or otherwise display temporal or spatial variation in their habitat associations; 3) occur beyond the region from which data were drawn; 4) are tolerant of a large variety of habitats; and/or 5) have very large ranges. In addition, large size and higher trophic level may further augment uncertainty.

That said, the influence of ecological characteristics on model accuracy, although significant, was small in our analyses, suggesting that useful models can be obtained for the vast majority of species. This is good news in the context of conservation science. For evolutionary ecologists, it unfortunately means that strong, easily quantifiable correlates of environmental niche breadth and intra-specific niche differentiation remain elusive. Both of these phenomena are functions of phenotypic plasticity and gene flow. Potentially fruitful avenues for future research therefore include: a) integrating distribution models with molecular analyses (Scribner et al. 2001); and b) scrutinising the relationship between species’ niche characteristics and traits most directly linked to gene flow, such as morphological, behavioural or metabolic adaptations to dispersal (Thuiller et al. 2004). In both cases, phylogenetically controlled comparisons may help to elucidate subtle differences otherwise masked by the enormous diversity of life, its genetic make-up and dispersal modes.

Download the appendix as file E4823 from <www.oikos.ekol.lu.se/appendix>.

Acknowledgements

Satellite data were processed by colleagues at TALA (Trypanosomiasis And Land-use in Africa), Dept of Zoology, Univ. of Oxford. We thank everyone involved and in particular David J. Rogers, who helped supervise the research presented here. Similarly, we thank all those who contributed data on bird distributions: Vincent Parker, Robert Dowsett, Liz and Neil Baker, Derek Pomeroy, Herbert Tushabe, Tommy Pedersen, Adrian Lewis, Douglas Taylor, Richard Dean, the Univ. of Cape Town's Avian Demography Unit, Namibia's Directorate of Environmental Affairs, Birdlife Zimbabwe, Birdlife Botswana, and the Zambian Ornithological Society. Information on species’ diet preferences was kindly made available by Çagan Sekercioglu. We are also grateful to James H. Brown and Ransom A. Myers for providing access to computing facilities. Fred W. Huffer, Jennifer A. Hoeting, Alexander Teterukovski, Laurel Duquette, Keith Knight and Graeme Cumming offered advice on autologistic regression. Finally, we thank Antoine Guisan, Robert Freckleton, and Wilfried Thuiller for their thorough and thought-provoking feedback.

References

  • Åbjörnsson, K. et al. 2004. Responses of prey from habitats with different predator regimes: local adaptation and heritability. Ecology 85: 1859–1866.
  • Aickin, M. and Gensler, H. 1996. Adjusting for multiple testing when reporting research results: the Bonferroni vs Holm methods. Am. J. Public Health 86: 726–728.
  • Anon. 2001. S-Plus 6 for Windows guide to statistics. Vol. 2. Insightful Corporation.
  • Anon. 2004. 2004 IUCN Red List of Threatened Species. IUCN, Gland, Switzerland, accessed online 11 April 2006 at <http://www.iucnredlist.org>.
  • Araújo, M. B. et al. 2005. Validation of species-climate impact models under climate change. Global Change Biol. 11: 1504–1513.
  • Augustin, N. H. et al. 1996. An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33: 339–347.
  • Austin, M. P. 2002. Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Modell. 157: 101–118.
  • Balmford, A. et al. 2002. Economic reasons for conserving wild nature. Science 297: 950–953.
  • Best, L. B. and Stauffer, D. F. 1986. Factors confounding evaluation of bird-habitat relationships. – In: Verner, J. et al. (eds), Wildlife 2000: modeling habitat relationships of terrestrial vertebrates. The Univ. of Wisconsin Press, pp. 209–216.
  • Boone, R. B. and Krohn, W. B. 1999. Modeling the occurrence of bird species: are the errors predictable? Ecol. Appl. 9: 835–848.
  • Brotons, L. et al. 2004. Presence-absence versus presence-only modelling methods for predicting bird habitat suitability. Ecography 27: 437–448.
  • Brown, L. H. et al. (eds) 1982–2004. The birds of Africa. Academic Press.
  • Burns, C. E. et al. 2003. Global climate change and mammalian species diversity in US national parks. Proc. Nat. Acad. Sci. USA 100: 11474–11477.
  • Carswell, M. et al. 2005. Bird atlas of Uganda. British Ornithologists’ Club.
  • Colwell, R. K. and Futuyma, D. J. 1971. On the measurement of niche breadth and overlap. Ecology 52: 567–576.
  • Corsi, F. et al. 1999. A large-scale model of wolf distribution in Italy for conservation planning. Conserv. Biol. 13: 150–159.
  • Dean, W. R. J. 1997. The distribution and biology of nomadic birds in the Karoo, South Africa. J. Biogeogr. 24: 769–779.
  • Dean, W. R. J. 2000. The birds of Angola: an annotated checklist. British Ornithologists’ Union.
  • De Klerk, H. M. et al. 2002. Biogeographical patterns of endemic terrestrial Afrotropical birds. Div. Distrib. 8: 147–162.
  • Diekotter, T. et al. 2006. Effects of landscape elements on the distribution of the rare bumblebee species Bombus muscorum in an agricultural landscape. Biodiv. Conserv. 15: 57–68.
  • Dowsett, R. J. and Forbes-Watson, A. D. 1993. Checklist of birds of the Afrotropical and Malagasy regions. Tauraco Press.
  • Dunning, J. B. (ed.) 1993. CRC handbook of avian body masses. CRC Press.
  • Elith, J. and Burgman, M. 2002. Predictions and their validation: rare plants in the central highlands, Victoria, Australia. – In: Scott, J. M. et al. (eds), Predicting species occurrences – issues of accuracy and scale. Island Press, pp. 303–313.
  • Elith, J. et al. 2006. Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29: 129–151.
  • Engler, R. et al. 2004. An improved approach for predicting the distribution of rare and endangered species from occurrence and pseudo-absence data. J. Appl. Ecol. 41: 263–274.
  • Fielding, A. H. and Haworth, P. F. 1995. Testing the generality of bird-habitat models. Conserv. Biol. 9: 1466–1481.
  • Fielding, A. H. and Bell, J. F. 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24: 38–49.
  • Garrison, B. A. and Lupo, T. 2002. Accuracy of bird range maps based on habitat maps and habitat relationship models. – In: Scott, J. M. et al. (eds), Predicting species occurrences – issues of accuracy and scale. Island Press, pp. 367–375.
  • Gibson, L. A. et al. 2004. Spatial prediction of rufous bristlebird habitat in a coastal heathland: a GIS-based approach. J. Appl. Ecol. 41: 213–223.
  • Ginn, P. J. et al. (eds) 1989. The complete book of southern African birds. Struik Winchester.
  • Gioia, P. and Pigott, J. P. 2000. Biodiversity assessment: a case study in predicting richness from the potential distributions of plant species in the forests of south-western Australia. J. Biogeogr. 27: 1065–1078.
  • Girard, M.-C. and Girard, C. 2003. Processing of remote sensing data. A. A. Balkema Publ.
  • Goetz, S. J. et al. 2000. Advances in satellite remote sensing of environmental variables for epidemiological applications. – In: Hay, S. I. et al. (eds), Remote sensing and geographical information systems in epidemiology. Academic Press, pp. 289–307.
  • Grinnell, J. 1917. Field tests of theories concerning distributional control. Am. Nat. 51: 115–128.
  • Guisan, A. and Zimmermann, N. E. 2000. Predictive habitat distribution models in ecology. Ecol. Modell. 135: 147–186.
  • Guisan, A. and Thuiller, W. 2005. Predicting species distribution: offering more than simple habitat models. Ecol. Lett. 8: 993–1009.
  • Guralnick, R. 2006. The legacy of past climate and landscape change on species’ current experienced climate and elevation ranges across latitude: a multispecies study utilizing mammals in western North America. Global Ecol. Biogeogr. 15: 505–518.
  • Harrison, J. A. et al. (eds) 1997. The atlas of southern African birds. Birdlife South Africa.
  • Hay, S. I. 2000. An overview of remote sensing and geodesy for epidemiology and public health applications. – In: Hay, S. I. et al. (eds), Remote sensing and geographical information systems in epidemiology. Academic Press, pp. 1–35.
  • Hepinstall, J. A. et al. 2002. Effects of niche width on the performance and agreement of avian habitat models. – In: Scott, J. M. et al. (eds), Predicting species occurrences – issues of accuracy and scale. Island Press, pp. 593–606.
  • Hoeting, J. A. et al. 2000. An improved model for spatially correlated binary responses. J. Agricult. Biol. Environ. Stat. 5: 102–114.
  • Hoeting, J. A. et al. 2006. Model selection for geostatistical models. Ecol. Appl. 16: 87–98.
  • Holt, R. D. 2003. On the evolutionary ecology of species’ ranges. Evol. Ecol. Res. 5: 159–178.
  • Huntley, B. et al. 2004. The performance of models relating species geographical distributions to climate is independent of trophic level. Ecol. Lett. 7: 417–426.
  • Hutchinson, G. E. 1957. Concluding remarks. Cold Spring Harb. Symp. 22: 415–427.
  • Hutchinson, G. E. 1959. Homage to Santa Rosalia: or why are there so many kinds of animals? Am. Nat. 93: 145–159.
  • Joachim, J. et al. 1998. Évaluation par télédétection des biotopes à gélinotte de bois (Bonasa bonasia) dans le Parc national des Cévennes. Gibier Faune Sauvage 15: 31–45.
  • Kadmon, R. et al. 2003. A systematic analysis of factors affecting the performance of climatic envelope models. Ecol. Appl. 13: 853–867.
  • Karl, J. W. et al. 2002. Species commonness and the accuracy of habitat-relationship models. – In: Scott, J. M. et al. (eds), Predicting species occurrences – issues of accuracy and scale. Island Press, pp. 573–580.
  • Keitt, T. H. et al. 2002. Accounting for spatial pattern when modeling organism-environment interactions. Ecography 25: 616–625.
  • Knapp, R. A. et al. 2003. Developing probabilistic models to predict amphibian site occupancy in a patchy landscape. Ecol. Appl. 13: 1069–1082.
  • Knick, S. T. and Rotenberry, J. T. 2000. Ghosts of habitats past: contribution of landscape change to current habitats used by shrubland birds. Ecology 81: 220–227.
  • Leathwick, J. R. and Austin, M. P. 2001. Competitive interactions between tree species in New Zealand's old-growth indigenous forests. Ecology 82: 2560–2573.
  • Legendre, P. and Legendre, L. 1998. Numerical ecology, 2nd English ed. Elsevier.
  • Lepage, D. 2005. Avibase – the world bird database. Bird Studies Canada and Birdlife International, accessed online 2 March 2005 at <http://www.bsc-eoc.org/avibase/avibase.jsp>.
  • Lewis, A. and Pomeroy, D. E. 1989. A bird atlas of Kenya. Balkema.
  • Manel, S. et al. 2000. Testing large-scale hypotheses using surveys: the effects of land use on the habitats, invertebrates and birds of Himalayan rivers. J. Appl. Ecol. 37: 756–770.
  • McPherson, J. M. et al. 2004. The effects of species’ range sizes on the accuracy of distribution models: ecological phenomenon or statistical artefact? J. Appl. Ecol. 41: 811–823.
  • Michalak, P. et al. 2001. Genetic evidence for adaptation-driven incipient speciation of Drosophila melanogaster along a microclimatic contrast in “Evolution Canyon”, Israel. Proc. Nat. Acad. Sci. USA 98: 13195–13200.
  • Mitchell, M. S. et al. 2001. Using landscape-level data to predict the distribution of birds on a managed forest: effects of scale. Ecol. Appl. 11: 1692–1708.
  • Neave, H. M. et al. 1996. Biological inventory for conservation evaluation: 3. Relationships between birds, vegetation and environmental attributes in southern Australia. For. Ecol. Manage. 85: 197–218.
  • Osborne, P. E. and Suarez-Seoane, S. 2002. Should data be partitioned spatially before building large-scale distribution models? Ecol. Modell. 157: 249–259.
  • Parker, V. 1994. Swaziland bird atlas, 1985–1991. Websters.
  • Parker, V. 1999. The atlas of birds of Sul do Save, southern Mozambique. Avian Demography Unit and the Endangered Wildlife Trust.
  • Pearce, J. and Ferrier, S. 2000. Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Modell. 133: 225–245.
  • Pearce, J. et al. 2001. An evaluation of the predictive performance of distributional models for flora and fauna in north-east New South Wales. J. Environ. Manage. 62: 171–184.
  • Perneger, T. V. 1998. What's wrong with Bonferroni adjustments. Brit. Med. J. 316: 1236–1238.
  • Peterson, A. T. and Holt, R. D. 2003. Niche differentiation in Mexican birds: using point occurrences to detect ecological innovation. Ecol. Lett. 6: 774–782.
  • Pulliam, H. R. 2000. On the relationship between niche and distribution. Ecol. Lett. 3: 349–361.
  • Real, R. et al. 2006. Obtaining environmental favourability functions from logistic regression. Environ. Ecol. Stat. 13: 237–245.
  • Rogers, D. J. et al. 1996. Predicting the distribution of tsetse flies in West Africa using temporal Fourier processed meteorological satellite data. Ann. Trop. Med. Parasitol. 90: 225–241.
  • Sakamoto, Y. et al. 1986. Akaike information criterion statistics. D. Reidel Publ. Company.
  • Scribner, K. T. et al. 2001. Environmental correlates of toad abundance and population genetic diversity. Biol. Conserv. 98: 201–210.
  • Segurado, P. and Araújo, M. B. 2004. An evaluation of methods for modelling species distributions. J. Biogeogr. 31: 1555–1568.
  • Şekercioğlu, Ç. H. et al. 2004. Ecosystem consequences of bird declines. Proc. Nat. Acad. Sci. USA 101: 18042–18047.
  • Sibley, C. G. and Monroe, B. L. 1990. Distribution and taxonomy of the birds of the world. Yale Univ. Press.
  • Sibley, C. G. and Monroe, B. L. 1993. A supplement to the distribution and taxonomy of birds of the world. Yale Univ. Press.
  • Stockwell, D. R. B. and Peterson, A. T. 2002. Effects of sample size on accuracy of species distribution models. Ecol. Modell. 148: 1–13.
  • Suarez-Seoane, S. et al. 2002. Large-scale habitat selection by agricultural steppe birds in Spain: identifying species-habitat responses using generalized additive models. J. Appl. Ecol. 39: 755–771.
  • Thuiller, W. 2003. BIOMOD – optimizing predictions of species distributions and projecting potential future shifts under global change. Global Change Biol. 9: 1353–1362.
  • Thuiller, W. et al. 2004. Relating plant traits and species distributions along bioclimatic gradients for 88 Leucadendron taxa. Ecology 85: 1688–1699.
  • Thuiller, W. et al. 2005. Niche-based modelling as a tool for predicting the risk of alien plant invasions at a global scale. Global Change Biol. 11: 2234–2250.
  • Vaughan, I. P. and Ormerod, S. J. 2003. Improving the quality of distribution models for conservation by addressing shortcomings in the field collection of training data. Conserv. Biol. 17: 1601–1611.
  • Verbyla, D. L. and Litvaitis, J. A. 1989. Resampling methods for evaluating classification accuracy of wildlife habitat models. Environ. Manage. 13: 783–787.
  • Walther, B. A. et al. 2004. Known and predicted African winter distributions and habitat use of the endangered Basra reed warbler (Acrocephalus griseldis) and the near-threatened cinereous bunting (Emberiza cineracea). J. Ornithol. 145: 287–299.
  • Wiens, J. J. and Graham, C. H. 2005. Niche conservatism: integrating evolution, ecology, and conservation biology. Annu. Rev. Ecol. Evol. Syst. 36: 519–539.