#### Data

Spatial abundance data were available for 345 species of European breeding birds from the EBCC (European Bird Census Council) Atlas of breeding birds (Hagemeijer & Blair 1997). These data record a logarithmically scaled, categorical estimate of the abundance of each species across a 50 × 50 km Universal Transverse Mercator (UTM) grid, mostly representing the period from 1985 to 1988 (data for a few areas were drawn from slightly earlier/later censuses). Population size estimates are based on a 7-point scale, including 6 logarithmically scaled categories (1–9, 10–99, 100–999, 1000–9999, 10 000–99 999, ≥100 000 breeding pairs) and 0. These categorical abundance data were simplified to presence–absence data to enable a comparison of the performance of SDMs trained on the two types of data.
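The seven-point scale and its reduction to presence–absence can be sketched as follows. This is a minimal illustration only: the atlas already supplies categories rather than counts, and the integer codes 0–6 and the function names are assumptions introduced here, not the atlas's own encoding.

```python
from bisect import bisect_right

# Lower bounds (breeding pairs) of the six non-zero categories given in the text.
CATEGORY_LOWER_BOUNDS = [1, 10, 100, 1_000, 10_000, 100_000]


def abundance_category(breeding_pairs: int) -> int:
    """Return an ordinal category 0-6 (assumed coding) for a count of breeding pairs."""
    if breeding_pairs < 0:
        raise ValueError("breeding_pairs must be non-negative")
    # bisect_right counts how many lower bounds are <= the value.
    return bisect_right(CATEGORY_LOWER_BOUNDS, breeding_pairs)


def to_presence_absence(category: int) -> int:
    """Collapse an ordinal abundance category to presence (1) or absence (0)."""
    return 1 if category > 0 else 0
```

Under this coding, any non-zero category maps to presence, so the presence–absence data contain strictly less information than the abundance data from which they were derived.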

#### Environmental variables

Bioclimatic variables were derived from a global compilation (New, Hulme & Jones 1999) for the 30-year period 1961–1990. We used four bioclimatic variables: mean temperature of the warmest month (MTWM), mean temperature of the coldest month (MTCO), growing degree days above 5 °C (GDD5) and the annual ratio of actual to potential evapotranspiration (APET). These variables were calculated at the same resolution as the species data, using the formulation in Prentice *et al*. (1992). These bioclimatic variables were chosen because all have been shown to describe both the range extents (Thuiller, Araujo & Lavorel 2004; Huntley *et al*. 2007; Doswald *et al*. 2009) and abundance patterns (Green *et al*. 2008; Gregory *et al*. 2009) of European birds.

Land cover variables were derived from the Pan-European Land Cover (PELCOM) 1-km resolution database (Mucher *et al*. 2000). These data were aggregated to provide percentage coverage at the same resolution as the species data. In total, eight land cover classifications were used: forest, grassland, urban, arable, wetland, coastal, shrub land, marine and barren.

#### Statistical modelling

Random forest (RF) models were used to model species' distributions from both the abundance and the presence–absence data. This machine learning technique is a bootstrap-based classification and regression trees (CART) method (Cutler *et al*. 2007). Here, to account for a high degree of correlation between climatic covariates (with Pearson's *r* ranging between 0·61 and 0·9) and the potential for biased variable selection, we used the party package in R, which provides an RF implementation based on a conditional inference framework (Hothorn, Hornik & Zeileis 2006a,b; Strobl, Hothorn & Zeileis 2009; R Development Core Team 2012). As with other classification methods, RFs draw bootstrap samples and a subset of predictors to construct multiple classification trees (Prasad, Iverson & Liaw 2006). The classification trees find optimal binary splits in the selected covariates to partition the sample recursively into increasingly homogeneous areas with respect to the class variable (Cutler *et al*. 2007). Under the conditional inference framework, unbiased variable selection is achieved by using a linear statistic to test the relationship between each covariate and the response, and selecting the covariate with the minimum *P*-value. This linear statistic is also used to optimize the binary split into each homogeneous area (Hothorn, Hornik & Zeileis 2006a,b; Strobl, Hothorn & Zeileis 2009). In the case of ordinal response variables, a score vector reflecting the ‘distances’ between class levels is combined linearly with the linear statistic, altering both the selection and binary splitting of variables according to the scale of the ordinal response data (Hothorn, Hornik & Zeileis 2006b).

Random forests make few assumptions about the distribution of variables, are robust to over-fitting and are widely recognized to produce good predictive models (Breiman 2001; Liaw & Wiener 2002; Prasad, Iverson & Liaw 2006). These models typically outperform traditional regression-based approaches to species distribution modelling and are ideal for modelling categorical and ordinal data (Lawler *et al*. 2006; Magness, Huettmann & Morton 2008; Marmion *et al*. 2009). More established approaches to ordinal data modelling include proportional odds and continuation ratio ordinal regression models (Guisan & Harrell 2000). However, these models have limiting assumptions, such as parallelism between classes, and lack the flexibility to identify nonlinear, context-dependent relationships among predictor variables (De'ath & Fabricius 2000; Olden, Lawler & Poff 2008; Strobl, Malley & Tutz 2009).

To account for spatial autocorrelation, we included a measure of the surrounding abundance of conspecifics in the first-order neighbouring UTM grid cells (Segurado, Araujo & Kunin 2006) as a spatial autocovariate (SAC). This term accounts for the greater degree of similarity between more proximate samples, which arises through distance-related biological processes and spatially structured environmental processes (Dormann *et al*. 2007). For the abundance-based models, we calculated this indicator of surrounding abundance for each UTM grid cell using the following equation:

$$L = \log_{10}\!\left(\frac{1}{n}\sum_{i=1}^{n}\operatorname{mid}(A_i)\right) \qquad \text{(eqn 1)}$$

where *L* = surrounding local abundance, *n* = number of adjacent cells, *A* = categorical abundance, *i* = abundance category index. The log-scaled abundance categories in the adjacent cells are back-transformed to the mid-points of the relevant categories; these are averaged and retransformed to the log scale. For models based on presence–absence data, the spatial autocovariate used the same equation, except that the abundance categories (*A*_{i}) were converted to binary (presence–absence) data. Models were fitted using 10-fold cross-validation to reduce spatial autocorrelation between training and test data and to minimize overfitting. We used correlograms to compare autocorrelation in the model residuals with autocorrelation present in the raw data. Correlograms plot a measure of spatial autocorrelation, Moran's I (Moran 1950), between grid cells as a function of the distance between them (Fortin & Dale 2005; Dormann *et al*. 2007; Kissling & Carl 2008). A Moran's I of zero for the model residuals indicates an absence of spatial autocorrelation; a significant deviation from zero therefore suggests that the model is not adequately accounting for spatial autocorrelation (Dormann *et al*. 2007). All of our models showed substantial reductions in residual spatial autocorrelation compared with that present in the raw data (see Fig. S1). R code to implement species abundance and distribution modelling using the party package, along with code to calculate the spatial autocovariate term, is available in the Supporting Information.
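The verbal description of the autocovariate (back-transform, average, retransform) can be sketched in Python as below. This is not the authors' R code from the Supporting Information, and several details are assumptions made for illustration: categories are coded 0–6, the mid-point of category *k* is taken on the log scale as 10^(*k* − 0·5), absent cells contribute zero, first-order neighbours are the up-to-eight cells touching the focal cell, and 1 is added before taking logs so that a neighbourhood of absences gives *L* = 0.

```python
import math


def category_midpoint(category: int) -> float:
    """Back-transform a category to breeding pairs (assumed log-scale mid-point)."""
    if category == 0:
        return 0.0  # assumption: absence contributes nothing
    return 10.0 ** (category - 0.5)


def surrounding_abundance(grid: dict, cell: tuple, binary: bool = False) -> float:
    """Autocovariate for `cell` from its first-order neighbours.

    `grid` maps (row, col) to an abundance category; cells without data
    (e.g. sea) are simply absent from the dict, so n counts only
    neighbours that have data.
    """
    row, col = cell
    values = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue  # skip the focal cell itself
            neighbour = (row + d_row, col + d_col)
            if neighbour in grid:
                category = grid[neighbour]
                if binary:
                    category = min(category, 1)  # presence-absence variant
                values.append(category_midpoint(category))
    if not values:
        return 0.0  # assumption: isolated cells get a zero autocovariate
    mean_pairs = sum(values) / len(values)
    # Assumption: add 1 before the log so an all-absent neighbourhood gives 0.
    return math.log10(mean_pairs + 1.0)
```

For example, `surrounding_abundance({(0, 0): 0, (0, 1): 2, (1, 0): 0, (1, 1): 3}, (1, 1))` averages the back-transformed values of the three neighbours that have data and returns the result on the log scale.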

Predictions of the probability of a species occurring at each abundance class were based on the number of votes for each class from the 1000 classifiers that comprised each forest (Robnik-Sikonja 2004). Predicted probabilities were summed across the non-zero abundance classes to give a predicted probability of occurrence, whilst predicted ordinal abundance was taken as the class with the majority vote. Ordinal predictions from the distribution model based on abundance data were converted to presence–absence data to enable a direct comparison with recorded presence–absence data.
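The vote-counting step can be illustrated with a short Python sketch. It assumes each tree casts one vote for a class coded 0–6 (0 = absent); the function names are hypothetical, and ties are broken in favour of the lower class, which the text does not specify.

```python
from collections import Counter


def class_probabilities(votes: list, n_classes: int = 7) -> list:
    """Share of trees voting for each class 0..n_classes-1."""
    if not votes:
        raise ValueError("votes must not be empty")
    counts = Counter(votes)
    return [counts.get(k, 0) / len(votes) for k in range(n_classes)]


def probability_of_occurrence(probs: list) -> float:
    """Sum of the probabilities of all non-zero abundance classes."""
    return sum(probs[1:])


def predicted_class(probs: list) -> int:
    """Class with the most votes (ties go to the lower class in this sketch)."""
    return max(range(len(probs)), key=lambda k: probs[k])
```

With 1000 trees voting 300 times for class 0, 250 times for class 1 and 450 times for class 2, the predicted probability of occurrence is 0·7 and the predicted ordinal abundance is class 2.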

We assessed how well simulated presence–absences from the abundance models (after conversion to presence–absence data) and from the presence–absence models fitted the observed presence–absence data, using three methods that included measures of both model calibration and discrimination. We used two measures of discrimination, which indicate the ability of a model to discriminate between species presence and absence. First, the kappa statistic measures model accuracy whilst correcting for the accuracy expected to occur by chance (Cohen 1960); we calculated this for the simulated occurrences from the cross-validated data sets. Kappa is the most widely used measure of discrimination and performance for presence–absence models (Manel, Williams & Ormerod 2001; Pearson, Dawson & Liu 2004; Segurado & Araújo 2004; Allouche, Tsoar & Kadmon 2006) but is criticized for being inherently dependent on prevalence and on the often arbitrary choice of threshold value (Allouche, Tsoar & Kadmon 2006; Freeman & Moisen 2008). Our second measure of discrimination was therefore a threshold-independent measure of model performance, the area under the receiver operating characteristic (ROC) curve (AUC) (Manel, Williams & Ormerod 2001; Thuiller 2003; Brotons *et al*. 2004).
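Both discrimination measures have standard definitions, sketched here in plain Python. The function names are hypothetical and the code is a generic illustration rather than the authors' implementation; AUC is computed through its rank interpretation (the probability that a randomly chosen presence scores higher than a randomly chosen absence, counting ties as one half).

```python
def cohen_kappa(observed: list, predicted: list) -> float:
    """Cohen's kappa for binary (0/1) observed and predicted values."""
    pairs = list(zip(observed, predicted))
    n = len(pairs)
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    if chance == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is a convention of this sketch
    return (accuracy - chance) / (1.0 - chance)


def auc(observed: list, scores: list) -> float:
    """Area under the ROC curve from pairwise presence/absence comparisons."""
    presences = [s for o, s in zip(observed, scores) if o == 1]
    absences = [s for o, s in zip(observed, scores) if o == 0]
    if not presences or not absences:
        raise ValueError("need at least one presence and one absence")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in presences
        for q in absences
    )
    return wins / (len(presences) * len(absences))
```

Kappa requires thresholded (0/1) predictions, which is the source of its threshold dependence, whereas `auc` takes the predicted probabilities directly.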

As a measure of model calibration, we used calibration curves to assess agreement between the logits of the predicted probabilities and the observed proportions of occurrence in the test data (Zurell *et al*. 2009). The slope and intercept of a regression of observed on predicted values provide measures of model bias and spread (Pearce & Ferrier 2000). Model bias is the systematic over- or under-estimation of the probability of occurrence across the range of a species and results in an upwards or downwards shift of the regression line, causing the intercept to deviate from zero (Reineking & Schröder 2006). The slope of the regression line, fitted to the predicted and observed values on *x* and *y* logit axes, respectively, indicates the spread of the data. If predicted values lower than 0·5 overestimate the probability of occurrence whilst predicted values >0·5 underestimate the probability of occurrence, the slope of the regression line will be greater than one. Conversely, a slope of less than one indicates that predicted values lower than 0·5 underestimate the probability of occurrence, whilst predicted values >0·5 overestimate the probability of occurrence (Pearce & Ferrier 2000). A perfectly calibrated model will have an intercept of zero and a slope of one (Reineking & Schröder 2006; Zurell *et al*. 2009; Vorpahl *et al*. 2012).
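One common way to write such a calibration regression is given below. It assumes a logistic regression of observed occurrence on the logit of the predicted probability; the symbols $\pi_j$ (true probability of occurrence in cell $j$), $\hat{p}_j$ (predicted probability), $a$ (intercept) and $b$ (slope) are introduced here for illustration and do not appear in the original text.

$$\operatorname{logit}(\pi_j) = a + b\,\operatorname{logit}(\hat{p}_j), \qquad \operatorname{logit}(x) = \ln\!\left(\frac{x}{1-x}\right)$$

Perfect calibration corresponds to $a = 0$ and $b = 1$. As a worked case with $a = 0$ and $b = 1{\cdot}5$, a prediction of $\hat{p}_j = 0{\cdot}2$ gives $\operatorname{logit}(\pi_j) = 1{\cdot}5 \times \ln(0{\cdot}25) \approx -2{\cdot}08$, so $\pi_j \approx 0{\cdot}11$: the prediction below 0·5 overestimates the probability of occurrence, as expected for a slope greater than one.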

We used a paired *t*-test on logit-transformed data to assess differences between the predictive performances, according to kappa, of models trained on each data set. The effect of prevalence (the proportion of presences out of 2813 cells) on predictive accuracy was assessed using a generalized additive model (GAM), after controlling for species (to account for the paired nature of the data set). The model was fitted with a binomial error structure with a logit link and included species as a random effect, using the mgcv package in R (Wood 2011; R Development Core Team 2012).
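The paired comparison can be sketched as a *t* statistic on per-species differences in logit-transformed kappa. This is an illustration only: the function names are hypothetical, the sketch assumes every kappa value lies strictly between 0 and 1 (the text does not say how values outside that range were treated), and it returns the statistic and degrees of freedom without a *P*-value, which would require the *t* distribution.

```python
import math
from statistics import mean, stdev


def logit(x: float) -> float:
    """Logit transform; defined only for values strictly between 0 and 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("logit requires 0 < x < 1")
    return math.log(x / (1.0 - x))


def paired_t_statistic(first: list, second: list) -> tuple:
    """Paired t statistic and degrees of freedom on logit-transformed values.

    `first` and `second` hold one value per species, in the same order.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [logit(a) - logit(b) for a, b in zip(first, second)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0.0:
        raise ValueError("all paired differences are identical")
    t_value = mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t_value, len(diffs) - 1
```

A positive statistic from `paired_t_statistic(kappa_abundance, kappa_presence_absence)` would indicate higher kappa, on average, for the first set of models; both argument names are placeholders for per-species kappa values.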