Increasing concern over the implications of climate change for biodiversity has led to the use of species–climate envelope models to project species extinction risk under climate-change scenarios. However, recent studies have demonstrated significant variability in model predictions, and there remains a pressing need to validate models and reduce uncertainties. Model validation is problematic because predictions are made for events that have not yet occurred. Resubstitution and data partitioning of present-day data sets are, therefore, commonly used to test the predictive performance of models. However, these approaches suffer from spatial and temporal autocorrelation between the calibration and validation sets. Using observed distribution shifts among 116 British breeding-bird species over the past ∼20 years, we provide a first independent validation of four envelope modelling techniques under climate change. Results showed good to fair predictive performance on independent validation, although the rules used to assess model performance are difficult to interpret in a decision-planning context. We also showed that measures of performance on nonindependent data provided optimistic estimates of models' predictive ability on independent data. Artificial neural networks and generalized additive models generally provided more accurate predictions of species range shifts than generalized linear models or classification tree analysis. Data for independent model validation and replication of this study are rare, and we argue that perfect validation may not in fact be conceptually possible. We also note that the usefulness of models is contingent on both the questions being asked and the techniques used. Species–climate envelope models used to test hypotheses and predict future events may prove wrong, yet still be useful if their results are put into appropriate context.
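The contrast between performance measured on nonindependent (resubstitution) data and on independent data can be made concrete with two widely used presence/absence accuracy measures: the area under the ROC curve (AUC) and Cohen's kappa. The sketch below is an illustration under stated assumptions, not the procedure used in this study: the choice of AUC and kappa, the 0.5 probability threshold, the helper names (`auc`, `cohen_kappa`, `binarize`) and all of the data are hypothetical, invented only to show the mechanics.

```python
from typing import List, Sequence


def auc(observed: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    observed: 1 = presence, 0 = absence; scores: predicted probabilities.
    Counts the fraction of (presence, absence) pairs in which the presence
    receives the higher score, with ties counted as one half.
    """
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binarize(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Convert probabilities to presence/absence (threshold is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]


def cohen_kappa(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum(o == p for o, p in zip(observed, predicted)) / n
    p_obs = sum(observed) / n   # observed prevalence
    p_pred = sum(predicted) / n  # predicted prevalence
    # Agreement expected by chance given the two marginal prevalences.
    expected = p_obs * p_pred + (1 - p_obs) * (1 - p_pred)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (agree - expected) / (1 - expected)


# Hypothetical records: the same model scored on its calibration sites
# (resubstitution) and on later, independently observed sites.
calib_obs = [1, 1, 1, 0, 0, 0]
calib_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
indep_obs = [1, 1, 0, 1, 0, 0]
indep_scores = [0.9, 0.6, 0.7, 0.4, 0.3, 0.2]

if __name__ == "__main__":
    print(f"AUC   calibration {auc(calib_obs, calib_scores):.2f}, "
          f"independent {auc(indep_obs, indep_scores):.2f}")  # 1.00, 0.78
    print(f"kappa calibration {cohen_kappa(calib_obs, binarize(calib_scores)):.2f}, "
          f"independent {cohen_kappa(indep_obs, binarize(indep_scores)):.2f}")  # 1.00, 0.33
```

Scoring one model on both its calibration records and independently observed records is what exposes the optimism of nonindependent evaluation. Note that AUC is threshold-free, whereas kappa depends on the chosen cut-off, so the two measures need not rank models identically.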