
Effects of sample size on the performance of species distribution models

Authors

  • M. S. Wisz,

    Corresponding author
    1. Department of Arctic Environment, National Environmental Research Institute, University of Aarhus, Frederiksborgvej 399, Roskilde, Denmark,
      *Correspondence: M. S. Wisz, Department of Arctic Environment, National Environmental Research Institute, University of Aarhus, Frederiksborgvej 399, Roskilde, Denmark. E-mail: msw@dmu.dk
  • R. J. Hijmans,

    1. International Rice Research Institute, Los Baños, Laguna, Philippines,
  • J. Li,

    1. Department of Marine and Coastal Environment, Geoscience, Canberra, ACT, Australia,
  • A. T. Peterson,

    1. University of Kansas Natural History Museum and Biodiversity Research Center, Lawrence, KS, USA,
  • C. H. Graham,

    1. Department of Ecology and Evolution, Stony Brook University, NY 11794, USA,
  • A. Guisan,

    1. Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
  • NCEAS Predicting Species Distributions Working Group

    • NCEAS Predicting Species Distributions Working Group: J. Elith, School of Botany, University of Melbourne, Parkville, Victoria 3010, Australia; R. P.; M. Dudík, Princeton University, Princeton, NJ, USA; S. Ferrier, Department of Environment and Climate Change, Armidale, NSW, Australia; F. Huettmann, University of Alaska Fairbanks, AK, USA; J. R. Leathwick, NIWA, Hamilton, New Zealand; A. Lehmann, Swiss Centre for Faunal Cartography (CSCF), Neuchâtel, Switzerland; L. Lohmann, Universidade de São Paulo, Brazil; B. A. Loiselle, University of Missouri, St. Louis, USA; G. Manion, Department of Environment and Climate Change, Armidale, NSW, Australia; C. Moritz, The University of California, Berkeley, USA; M. Nakamura, Centro de Investigación en Matemáticas (CIMAT), Mexico; Y. Nakazawa, University of Kansas, Lawrence, KS, USA; J. McC. Overton, Landcare Research, Hamilton, New Zealand; S. J. Phillips, AT&T Labs-Research, Florham Park, NJ, USA; K. S. Richardson, McGill University, QC, Canada; R. Scachetti-Pereira, Centro de Referência em Informação Ambiental, Brazil; R. E. Schapire, Princeton University, Princeton, NJ, USA; J. Soberón, University of Kansas, Lawrence, KS, USA; S. E. Williams, James Cook University, Queensland, Australia; N. E. Zimmermann, Swiss Federal Research Institute WSL, Birmensdorf, Switzerland.



ABSTRACT

A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence–absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with the area under the receiver operating characteristic curve (AUC). As sample size decreased, model accuracy decreased and variability increased, both across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the smallest sample size. No algorithm predicted consistently well with small sample sizes (n < 30); this should encourage highly conservative use of predictions based on small samples and restrict their use to exploratory modelling.
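AUC, the evaluation statistic used above, equals the probability that a randomly chosen presence site receives a higher predicted suitability than a randomly chosen absence site, with ties counting one half. The following is a minimal sketch of that rank-based (Mann–Whitney) calculation. It assumes that each model's predictions and the observed 0/1 labels for the independent evaluation sites are already available. The function name `auc` and the toy values are hypothetical illustrations, not the study's code or data.

```python
from itertools import product


def auc(scores, labels):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    scores: predicted suitability at each evaluation site (hypothetical input)
    labels: 1 for an observed presence, 0 for an observed absence
    """
    presences = [s for s, y in zip(scores, labels) if y == 1]
    absences = [s for s, y in zip(scores, labels) if y == 0]
    if not presences or not absences:
        raise ValueError("need at least one presence and one absence")

    # Compare every presence with every absence; a correct ranking scores 1,
    # a tie scores 0.5, and a reversed ranking scores 0.
    wins = 0.0
    for p, a in product(presences, absences):
        if p > a:
            wins += 1.0
        elif p == a:
            wins += 0.5
    return wins / (len(presences) * len(absences))


# Hypothetical predictions at six evaluation sites (three presences, three absences).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))  # 8 of 9 pairs ranked correctly -> about 0.889
```

An AUC of 0.5 means the model discriminates presences from absences no better than chance, and 1.0 means perfect discrimination. The pairwise loop is quadratic in the number of sites. That is fine for an illustration, but a sort-based rank computation scales better for large evaluation sets.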

