- 1. Species distribution models could bring manifold benefits across ecology, but require careful testing to prove their reliability and guide users. Shortcomings in testing are often evident: tests frequently fail to reflect recent methodological developments and changes in the way models are applied. We considered some of the fundamental issues.
- 2. Generalizability is a basic requirement for predictive models, describing their capacity to produce accurate predictions with new data, i.e. in real applications beyond model training. Tests of generalizability should be as rigorous as possible: ideally using a large number of independent test sites (≥ 200–300) that represent anticipated applications. Bootstrapping identifies the role of overfitting of the training data in limiting a model's generalizability.
- 3. Predictions from most distribution models are continuous variables. Their accuracy may be described by discrimination and calibration components. Discriminatory ability describes how well a model separates occupied from unoccupied sites. It is independent of species prevalence and is readily comparable between models. Rank correlation coefficients, such as the concordance index, are effective measures.
- 4. Calibration describes the numerical accuracy of predictions (e.g. whether 40% of sites with predicted probabilities of 0·40 are occupied) but is frequently overlooked in model testing. Poor calibration could mislead any conservation efforts utilizing models to estimate the ‘value’ of different sites for a given species. Effective assessments can be made using smoothed calibration plots.
- 5. The effects of species prevalence on nominal presence–absence predictions are well known. The currently preferred accuracy measure, Cohen's κ, has weaknesses. We argue that mutual information measures, based in information theory, may be more appropriate.
- 6. Synthesis and applications. Model evaluation must be informative and should ideally: (i) define generalizability in detail; (ii) separate the discrimination and calibration components of accuracy and test both; (iii) adopt assessment techniques that permit more valid intermodel comparisons; (iv) avoid nominal presence–absence evaluation where possible and consider information-theoretic measures; and (v) utilize the full range of techniques to help diagnose the causes of prediction problems. Few modellers in applied ecology and conservation biology satisfy these needs, making it difficult for others to evaluate models and identify potential misuses. The problems are real, and if uncorrected will damage conservation efforts through the inaccurate assessment of distribution and habitat preferences of important organisms.
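
The measures named in the summary above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy predictions and the confusion-matrix counts are all hypothetical, and dividing mutual information by the entropy of the observations is only one of several possible normalizations, assumed here for illustration. The sketch computes the concordance index for continuous predictions, and Cohen's κ and a normalized mutual information for a 2 × 2 presence–absence table.

```python
from math import log2


def concordance_index(predicted, observed):
    """Share of (occupied, unoccupied) site pairs in which the occupied
    site has the higher prediction; tied pairs count as one half."""
    occupied = [p for p, y in zip(predicted, observed) if y == 1]
    unoccupied = [p for p, y in zip(predicted, observed) if y == 0]
    if not occupied or not unoccupied:
        raise ValueError("need at least one occupied and one unoccupied site")
    score = 0.0
    for p in occupied:
        for q in unoccupied:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5
    return score / (len(occupied) * len(unoccupied))


def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    agree = (tp + tn) / n
    chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    if chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (agree - chance) / (1 - chance)


def normalized_mutual_information(tp, fp, fn, tn):
    """Mutual information between predicted and observed classes, divided
    by the entropy of the observed classes (an assumed normalization)."""
    n = tp + fp + fn + tn
    # Each tuple: (cell count, predicted-class total, observed-class total).
    cells = [
        (tp, tp + fp, tp + fn),
        (fp, tp + fp, fp + tn),
        (fn, fn + tn, tp + fn),
        (tn, fn + tn, fp + tn),
    ]
    mi = sum((c / n) * log2(c * n / (r * k)) for c, r, k in cells if c > 0)
    prevalence = (tp + fn) / n
    entropy = -sum(q * log2(q) for q in (prevalence, 1 - prevalence) if q > 0)
    if entropy == 0:
        raise ValueError("all test sites belong to a single observed class")
    return mi / entropy


if __name__ == "__main__":
    # Hypothetical predictions and observations for five test sites.
    predicted = [0.9, 0.8, 0.4, 0.3, 0.2]
    observed = [1, 1, 0, 1, 0]
    print(concordance_index(predicted, observed))
    # Squaring preserves the ranking, so discrimination is unchanged
    # even though the numerical values (calibration) differ.
    print(concordance_index([p ** 2 for p in predicted], observed))
    # Same sensitivity and specificity (0.8) at prevalence 0.5 and 0.1.
    print(cohens_kappa(40, 10, 10, 40))
    print(cohens_kappa(8, 18, 2, 72))
    print(normalized_mutual_information(40, 10, 10, 40))
```

In this toy example the concordance index is identical for the raw and squared predictions, which is why discrimination and calibration need separate tests. With sensitivity and specificity both held at 0.8, κ falls from about 0.60 at a prevalence of 0.5 to about 0.35 at a prevalence of 0.1, illustrating the prevalence dependence noted in point 5.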