Predicting species distributions from museum and herbarium records using multiresponse models fitted with multivariate adaptive regression splines

Authors

  • Jane Elith,

    Corresponding author
    1. School of Botany, The University of Melbourne, Parkville, Victoria, Australia 3010,
      *Correspondence: Jane Elith, School of Botany, The University of Melbourne, Parkville, Victoria 3010, Australia. E-mail: j.elith@unimelb.edu.au
  • John Leathwick

    1. National Institute of Water and Atmospheric Research, PO Box 11115, Hamilton, New Zealand

  • 1

    Previously we have called these ‘community’ models but now prefer the term ‘multiresponse’, as used in the statistical literature (Hastie et al., 1994).

ABSTRACT

Most species distribution records exist as presence-only data (e.g. from museums and herbaria), and there is an established need for predictions of species distributions; scientists and conservation managers therefore seek robust methods for using these data. Such methods must, in particular, accommodate the difficulties caused by the lack of reliable information about sites where species are absent. Here we test two approaches for overcoming these difficulties, analysing a range of data sets using the technique of multivariate adaptive regression splines (MARS). MARS is closely related to regression techniques such as generalized additive models (GAMs) that are commonly and successfully used in modelling species distributions, but has particular advantages in its analytical speed and the ease of transfer of analysis results to other computational environments such as a Geographic Information System. MARS also has the advantage that it can model multiple responses, meaning that it can combine information from a set of species to determine the dominant environmental drivers of variation in species composition. We use data from 226 species from six regions of the world, and demonstrate the use of MARS for distribution modelling using presence-only data. We test whether (1) the type of data used to represent absence or background and (2) the signal from multiple species affect predictive performance, by evaluating predictions at completely independent sites where genuine presence–absence data were recorded. Models developed with absences inferred from the total set of presence-only sites for a biological group, and using simultaneous analysis of multiple species to inform the choice of predictor variables, performed better than models in which species were analysed singly, or in which pseudo-absences were drawn randomly from the study area. The methods are fast, relatively simple to understand, and useful for situations where data are limited. A tutorial is included.
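
The two ways of representing absence that the abstract compares can be illustrated with a minimal Python sketch. It is not the authors' implementation or their tutorial code: the function names, the dictionary layout of the records, and the choice to exclude a species' own presence sites from the random draw are all assumptions made here for illustration.

```python
import random


def group_inferred_absences(records, species):
    """Infer absences for one species from the presence-only sites of its group.

    `records` (hypothetical layout) maps each species name to the set of
    site ids where that species was recorded. Every site at which any
    species in the group was recorded, but the target species was not,
    is treated as an inferred absence.
    """
    presences = set(records[species])
    all_group_sites = set().union(*records.values())
    return presences, all_group_sites - presences


def random_pseudo_absences(study_area_sites, presences, n, seed=0):
    """Draw pseudo-absences at random from the study area.

    Assumption for this sketch: sites where the species is known to be
    present are excluded before sampling.
    """
    rng = random.Random(seed)
    candidates = sorted(set(study_area_sites) - set(presences))
    return set(rng.sample(candidates, min(n, len(candidates))))


# Toy records for a three-species group (illustrative values only).
records = {"sp_a": {1, 2}, "sp_b": {2, 3}, "sp_c": {4}}
pres, inferred_abs = group_inferred_absences(records, "sp_a")
random_abs = random_pseudo_absences(range(10), pres, n=3)
```

Either absence set would then be combined with the presences to form the response for a regression model such as MARS. The group-inferred set is restricted to sites that were demonstrably visited by collectors, which is the property that distinguishes it from a random draw over the whole study area.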
