Predicting species distributions based on incomplete survey data: the trade-off between precision and scale


  • Veronika Braunisch,

  • Rudi Suchant

V. Braunisch, Forest Research Inst. of Baden-Wuerttemberg, Dept of Landscape Ecology, Wildlife Ecology Div., Wonnhaldestr. 4, DE-79100 Freiburg, Germany, and Univ. of Bern, Inst. of Ecology and Evolution, Conservation Biology, Baltzerstr. 6, CH-3012 Bern, Switzerland. – R. Suchant, Forest Research Inst. of Baden-Wuerttemberg, Dept of Landscape Ecology, Wildlife Ecology Div., Wonnhaldestr. 4, DE-79100 Freiburg, Germany.


Systematic species surveys over large areas are rarely affordable, forcing conservation planners to make the best use of incomplete data. Spatially explicit species distribution models (SDMs) may be useful to detect and compensate for incomplete information. SDMs can be based either on standardized, systematic sampling in a restricted subarea or, as a cost-effective alternative, on data haphazardly collated by “volunteer-based monitoring schemes” (VMS), which are area-wide but inherently biased and of heterogeneous spatial precision. Using data on capercaillie Tetrao urogallus, we evaluated the capacity of SDMs generated from incomplete survey data to localise unknown areas inhabited by the species and to predict relative local observation density. Addressing the trade-off between data precision, sample size and spatial extent of the sampling area, we compared three sampling strategies: VMS-data collected throughout the whole study area (7000 km²) using either 1) exact locations or 2) locations aggregated to grid cells the size of an average individual home range, and 3) systematic transect counts conducted within a small subarea (23.8 km²). For each strategy, we compared two sample sizes and two modelling methods (ENFA and Maxent), which were evaluated using cross-validation and independent data. Models based on VMS-data (strategies 1 and 2) performed equally well in predicting relative observation density and in localising “unknown” occurrences. They always outperformed strategy 3 models, irrespective of sample size and modelling method, partly because the VMS-data provided more comprehensive information for setting the discrimination threshold for predicting presence or absence. Accounting for potential errors due to extrapolation (e.g. projections outside the environmental domain or potentially biasing variables) reduced, but did not fully compensate for, the observed discrepancies.
Because they cover a broader range of species-habitat relations, the area-wide data achieved better model quality with less a priori knowledge. Furthermore, for a highly mobile species such as capercaillie, a sampling resolution corresponding to an individual's home range can yield predictions as good as those based on exact locations. Consequently, when a trade-off between sampling effort and the spatial extent of the sampling area is necessary, less precise data collected unsystematically over a large, representative region are preferable to systematically sampled data from a restricted region.
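
As an illustration of strategy 2, the aggregation of exact locations to grid cells could be sketched roughly as follows. This is a minimal sketch, not the authors' procedure: the function name `aggregate_to_grid`, the coordinates and the 2000 m cell size are all hypothetical, and the abstract does not state the home-range size that was actually used.

```python
from collections import Counter
from math import floor


def aggregate_to_grid(points, cell_size_m):
    """Count observations per square grid cell.

    points      -- iterable of (x, y) coordinates in a projected system (metres)
    cell_size_m -- edge length of a grid cell in metres (assumed value, see below)

    Returns a Counter mapping (column, row) cell indices to observation counts.
    """
    if cell_size_m <= 0:
        raise ValueError("cell_size_m must be positive")
    counts = Counter()
    for x, y in points:
        # floor() keeps the indexing consistent for negative coordinates too
        cell = (floor(x / cell_size_m), floor(y / cell_size_m))
        counts[cell] += 1
    return counts


# Hypothetical observation coordinates and a hypothetical cell size;
# neither is taken from the study.
observations = [(120.0, 80.0), (450.0, 300.0), (2100.0, 150.0), (2400.0, 1900.0)]
cells = aggregate_to_grid(observations, cell_size_m=2000.0)

# Cells containing at least one observation serve as presence cells.
presence_cells = sorted(cells)
```

The per-cell counts give a coarse measure of relative observation density, while the set of occupied cells replaces the exact locations as presence data.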