The accuracy of plant assemblage prediction from species distribution models varies along environmental gradients

Authors


Correspondence: Julien Pottier, INRA, Grassland Ecosystem Research (UR874), 5 Chemin de Beaulieu F-63069, Clermont Ferrand Cedex 2, France.

E-mail: julien.pottier@clermont.inra.fr

Abstract

Aim

Climatic niche modelling of species and community distributions implicitly assumes strong and constant climatic determinism across geographical space. We tested this assumption by assessing how stacked-species distribution models (S-SDMs) perform for predicting plant species assemblages along elevation gradients.

Location

The western Swiss Alps.

Methods

Using robust presence–absence data, we first assessed the ability of topo-climatic S-SDMs to predict plant assemblages in a study area encompassing a 2800-m wide elevation gradient. We then assessed the relationships among several evaluation metrics and trait-based tests of community assembly rules.

Results

The standard errors of individual SDMs decreased significantly towards higher elevations. Overall, the S-SDMs overpredicted richness far more than they underpredicted it and could not reproduce the humpback curve along elevation. Overprediction was greater at low and mid-range elevations in absolute values but greater at high elevations when standardized by the actual richness. Looking at species composition, overall prediction success, kappa and specificity increased with increasing elevation, while the Jaccard index and sensitivity decreased. The best overall evaluation, as driven by specificity, occurred at high elevation where species assemblages were shown to be subject to significant environmental filtering of small plants. In contrast, the decreased overall accuracy in the lowlands was associated with functional patterns representing any type of assembly rule (environmental filtering, limiting similarity or null assembly).

Main conclusions

We provide a thorough evaluation of S-SDM emphasizing the need to carefully interpret standard evaluation metrics, which reflect different aspects of assemblage predictions. We further reported interesting patterns of change in S-SDM errors with changes in assembly rules along elevation. Yet, significant levels of assemblage prediction errors occurred throughout the gradient, calling for further improvement of SDMs, e.g. by adding key environmental filters that act at fine scales and developing approaches to account for variations in the influence of predictors along environmental gradients.

Introduction

Understanding the role of the environment in shaping species diversity patterns has long motivated ecological and biogeographical research on local to global scales. In recent years, this research has greatly benefited from the development of large species occurrence databases and from conceptual and technical advances in niche based species distribution models (SDMs; Guisan & Thuiller, 2005; Elith & Leathwick, 2009). SDMs are used to predict the potential distribution of species in space and time by relating observed occurrence or abundance patterns to a set of environmental variables (Guisan & Thuiller, 2005). When most species pertaining to a given geographical species pool are considered, such as the whole flora of a given area, different structural or compositional aspects of species assemblages can be predicted. For example, several studies have used stacked-SDMs (S-SDMs) to predict current and future distributions of species richness (e.g. Guisan & Theurillat, 2000; Feria & Peterson, 2002; Algar et al., 2009; Newbold et al., 2009; Pineda & Lobo, 2009) or turnover (Thuiller et al., 2005; Maiorano et al., 2011) under various scenarios of climate change.

Geographical distributions of species are constrained by strict eco-physiological requirements and various other factors, including dispersal processes, positive and negative biotic interactions as well as anthropogenic and geomorphic perturbations (Soberón, 2007). Thus, SDMs are often assumed to fit spatial realizations of the environmental niche of the studied species (Araújo & Guisan, 2006). However, there are still ambiguities over the components of the realized niches that are estimated in climate-based SDMs (Elith & Leathwick, 2009). In addition, there is a large amount of evidence indicating that the relative importance of species distribution/assembly drivers is not constant over space and time or along productivity gradients (Michalet et al., 2006) and that these drivers can mutually influence each other along these gradients (Agrawal et al., 2007). Finally, little is known about how SDMs are affected by such variations of species distribution/assembly drivers.

Despite an increasing use of climate-based S-SDMs and promising associated perspectives, few studies have evaluated the models’ predictive accuracy (Feria & Peterson, 2002; Algar et al., 2009; Newbold et al., 2009; Pineda & Lobo, 2009; Trotta-Moreu & Lobo, 2010; Dubuis et al., 2011). To our knowledge, no study has assessed whether the performance of S-SDMs for predicting communities is constant throughout space or along environmental gradients. Such an evaluation is important because S-SDMs represent crucial tools for assessing how future species assemblages will look under future climatic conditions (Ferrier & Guisan, 2006; Guisan & Rahbek, 2011).

In harsh or stressful climatic conditions, community assembly is commonly driven by environmental filtering, which permits only those species sharing the appropriate physiological, behavioural and/or ecological attributes required to survive in the local climate to coexist (Weiher et al., 1999). In this case, climate directly affects the physiology of the species and represents an important filter of community assembly. In addition, climate influences the nature and strength of community- and ecosystem-level processes, which also determine the assembly and distribution of species. For example, it has been reported that low summer temperatures are associated with an increase in the occurrence of facilitative effects among plants (Callaway et al., 2002). Alternatively, the importance of species interactions has been shown to be reduced in alpine ecosystems (Mitchell et al., 2009). Because precipitation and frost partially control geomorphological processes (including erosion), these factors are also responsible for increased disturbance regimes. These examples emphasize climate as a strong direct or indirect determinant of species distribution and assembly, especially in harsh conditions, where more accurate SDM and S-SDM predictions should thus be expected.

In milder climates, ecosystems are more productive, and biotic interactions, such as competition, may be more important than purely abiotic effects for shaping both species distributions and communities (Grime, 2001). In the context of the realized niche, a typical expected consequence of competitive exclusion is a restriction in the occupation of the fundamental niche space. Therefore, predictions based on species responses modelled from distribution data should also account for the role of biotic interactions. In addition, climatically mild habitats are also noticeably affected by intense and diverse human activities, which locally modify abiotic and biotic factors and/or enhance the stochasticity driven by the interplay between disturbance regimes and dispersal events. This impact leads to situations in which the species are likely not at equilibrium with the climate (Araújo & Pearson, 2005). Among locations sharing similar climatic constraints, the same species may encounter very different biotic and abiotic constraints, which may cause the species to exhibit different performances in terms of establishment, growth and fitness. In such conditions, climatic factors may make less of a contribution to the distribution and assembly of the species, and climatic niche models would accordingly fit weaker species responses and produce predictions that are less accurate. As a result, the S-SDM predictions of such communities are likely to be less accurate.

In summary, when stacking climate-based SDMs to predict species assemblages, and provided that the most relevant environmental predictors are available, one could expect the following (Fig. 1): (1) H1 – the accuracy of assemblage predictions increases when moving from productive and mild conditions towards climatically stressful habitats; (2) H2A – the best S-SDM performance in the harshest habitats is primarily associated with strong environmental filtering; and (3), as a corollary, H2B – the inaccurate assemblage predictions in milder habitats are associated with a larger spectrum of constraints driving the local assembly of species.

Figure 1.

Hypothetical explanation and expected variation in the performance of stacked-species distribution models (S-SDMs) along elevation. At high elevations, the climate is the main determinant of species assembly and species responses to topo-climatic environment are precisely fitted (i.e. tight confidence intervals). Stacked-species distributions are thus expected to provide accurate predictions of species assemblages. In contrast, at low elevations, species distribution is not always at equilibrium with the climate and is rather caused by a variety of assembly rules (possibly mediated by human activities) while climatic conditions remain the same. Fitted species responses to the climate are weak (i.e. large confidence intervals) and the accuracy of the S-SDMs is reduced.

The objective of this study was to conduct a thorough evaluation of S-SDMs. We tested hypotheses H1 and H2 using elevation as a gradient of climate stress and habitat productivity. We used a large vegetation dataset derived from a robust sampling at a fine resolution, covering the full elevation range of the western Swiss Alps. We reconstructed plant communities by stacking predictions from individual SDMs based on high-resolution topo-climatic predictors and evaluated the deviation between the predicted and observed species assemblages using an independent dataset. Finally, we assessed whether the performance of S-SDMs changed with the spatial variations in the mean and dispersion patterns of the plant functional traits, which are used to infer the constraints that drive the assembly of communities (assembly rules in their broader definition, following Keddy, 1992). Trait convergence was interpreted as a signature of environmental filtering, implying that only species that share common ecological abilities to face local environmental conditions can coexist. Conversely, trait divergence was interpreted as a signature of limiting similarity, implying that two competing species cannot coexist unless they differ sufficiently in their ecological requirements.

Materials and Methods

Study area, species and trait data

The study area covered approximately 700 km² of a mountainous region in the western Swiss Alps (Fig. 2a). This region is characterized by a large elevation gradient, ranging from 375 to 3210 m. Human activities are more extensive at the lower elevations (Fig. 2b).

Figure 2.

(a) Maps of plot locations, for the calibration and evaluation datasets, within the study area located in the western Swiss Alps. (b) Human land-use distribution in elevation bands for the same study area. Land-cover classes not related to human activities were removed. As observed at a more global scale (Nogués-Bravo et al., 2008), land-use is more intense and more diverse at the lowest elevations.

A total of 613 2 m × 2 m vegetation plots were exhaustively surveyed, and the presence–absence of each species was registered. Each plot was selected following a random-stratified sampling design (Hirzel & Guisan, 2002) limited to open and non-woody vegetation units; the stratification was based on elevation, slope and aspect. This presence–absence dataset was used for the SDM calibration. An additional set of 298 plots was identically surveyed to evaluate S-SDMs and to test the relationship between the functional trait patterns of communities and S-SDM errors. The abundance of each species was additionally recorded in all of the plots following an ordinal scale (0, absent; 1, ≤ 1%; 2, 1–5%; 3, 5–12.5%; 4, 12.5–25%; 5, 25–50%; 6, 50–75%; 7, > 75%).

Three plant traits related to the plant persistence phase (Westoby et al., 2002) were investigated: canopy height (CH), specific leaf area (SLA) and leaf dry matter content (LDMC). CH is associated with the plant's ability to compete for light, while SLA and LDMC are related to the plant's ability to capture, use and release resources from/to the plant's environment. These traits were measured within the same study area following Cornelissen et al. (2003) over a minimum of 10 individuals per species picked from the available 911 plots (613 calibration plots plus 298 evaluation plots).

Species distribution modelling

We modelled the spatial distribution of 211 species (the species with more than 20 occurrences across the 613 plots). This set of species was considered as the ‘landscape species pool’. We used five topo-climatic explanatory variables to calibrate the SDMs: growing degree days (with a 0°C threshold); moisture index over the growing season (average values for June to August in mm day−1); slope (in degrees); topographic position (an integrated and unitless measure of topographic exposure; Zimmermann et al., 2007); and the annual sum of radiation (in kJ m−2 year−1). The spatial resolution of the predictor maps was 25 m × 25 m so that the models could capture most of the small-scale variations of the climatic factors in the mountainous areas. These variables are well recognized as being of great eco-physiological importance for plant species in mountain regions and have already been successfully used (Randin et al., 2010). Single-species models were fitted following an ensemble forecasting approach in which the predictions of the individual modelling techniques were averaged, each weighted by its area under the curve (AUC) of a receiver operating characteristic (ROC) plot, as proposed by Araújo & New (2007). We considered the following eight modelling techniques: generalized linear model (GLM); generalized additive model (GAM); generalized boosted regression models (BRT); multivariate adaptive regression splines (MARS); random forest (RF); classification tree analysis (CTA); and surface range envelope (SRE). Species distribution modelling was performed using the BIOMOD platform (Thuiller et al., 2009) for R (R Development Core Team, 2011). The spatial autocorrelation of the residuals of the SDMs was assessed between the training and the evaluation datasets based on neighbourhood plots and Moran's I coefficient (see Appendix S1 in Supporting Information).
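The AUC-weighted averaging described above can be illustrated with a minimal Python sketch. It is not the BIOMOD implementation, which may rescale or filter models before averaging; the function name `auc_weighted_ensemble` and all numeric values are hypothetical and serve only to show the arithmetic of a weighted mean.

```python
def auc_weighted_ensemble(predictions, aucs):
    """Average per-technique probabilities of presence, weighting each by its AUC.

    predictions: probabilities of presence for one species at one site,
                 one value per modelling technique.
    aucs:        AUC of each technique, in the same order.
    """
    if not predictions or len(predictions) != len(aucs):
        raise ValueError("predictions and aucs must be non-empty and of equal length")
    total_weight = sum(aucs)
    return sum(p * w for p, w in zip(predictions, aucs)) / total_weight


# Hypothetical probabilities and AUCs for three techniques (e.g. GLM, GAM, RF).
site_probabilities = [0.62, 0.70, 0.55]
technique_aucs = [0.78, 0.82, 0.90]
ensemble_probability = auc_weighted_ensemble(site_probabilities, technique_aucs)
print(round(ensemble_probability, 3))
```

Techniques with a higher AUC pull the ensemble probability towards their own prediction, so a poorly performing technique contributes less without being discarded.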

We further estimated the confidence intervals of the SDM predictions at each elevation band. To do so, we fitted GAMs for each species and predicted the probabilities of occurrence for each calibration plot as well as their associated confidence interval (following the Bayesian confidence interval computed with the ‘mgcv’ package for R, according to Wood, 2006). Finally, we simply tracked their variation along the elevation gradient. The GAMs were fitted with the same set of topo-climatic variables as the ensemble models; however, the GAMs used a randomly selected subset of plots to obtain a homogeneous distribution of plots along the elevation gradient. This subsampling strategy was designed to avoid any potential bias in the confidence interval estimates caused by uneven numbers of observations among the elevation classes. Because there is a random component in this subsampling procedure, the sampling and model fitting were repeated 9999 times. The mean confidence intervals (over sampling repetitions) of the predicted probabilities for each species were retrieved for each plot, and the average for each elevation band was calculated. For each species, the results included only the elevation bands where the species was observed at least once in the calibration dataset.
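The subsampling step can be sketched as follows in Python. This is only one plausible reading of the procedure: the band width of 500 m, the rule of drawing as many plots per band as the sparsest band contains, and the function name `balanced_subsample` are assumptions made for illustration, not details given in the text.

```python
import random
from collections import defaultdict


def balanced_subsample(plots, band_width=500, seed=0):
    """Draw the same number of plots from every elevation band.

    plots:      list of (plot_id, elevation_in_m) tuples.
    band_width: width of an elevation band in metres (hypothetical value).
    Returns the ids of the selected plots.
    """
    rng = random.Random(seed)
    bands = defaultdict(list)
    for plot_id, elevation in plots:
        bands[int(elevation // band_width)].append(plot_id)
    # Assumption: every band is reduced to the size of the sparsest band.
    n_per_band = min(len(ids) for ids in bands.values())
    selected = []
    for band in sorted(bands):
        selected.extend(rng.sample(bands[band], n_per_band))
    return selected


# Hypothetical plots: many at low elevation, few at high elevation.
example_plots = [(i, 400 + 10 * i) for i in range(30)] + [(100 + i, 1600) for i in range(5)]
subset = balanced_subsample(example_plots)
print(len(subset))
```

Repeating the call with different seeds and refitting the models each time mirrors the repeated sampling described above, so that no single random subset drives the confidence interval estimates.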

Evaluating stacked-SDMs

Using maps of the topo-climatic variables that were considered, the SDMs were projected over the geographical space of the study area. Using these maps, we obtained the predicted probabilities of presence for each species and each pixel over the entire study area, including the 298 plots of our evaluation dataset.

Evaluating S-SDM performance for predicting assemblage composition and species richness requires species presence–absence predictions, while SDMs provide probabilities. Because the choice of a threshold for classifying the predicted probabilities has been shown to have an impact on the accuracy of S-SDM predictions (Pineda & Lobo, 2009), we used a novel approach that randomly generated binary predictions from a binomial distribution with the predicted probability of the species (Dubuis et al., 2011). We performed this procedure 9999 times and, at each trial, calculated six evaluation metrics (Table 1) for each plot: (1) species richness errors, i.e. the predicted minus the observed species richness, so that positive values indicate overprediction; (2) assemblage prediction success, i.e. the proportion of correct predictions; (3) assemblage kappa, i.e. the proportion of specific agreement; (4) assemblage specificity, i.e. the proportion of absences that were correctly predicted; (5) assemblage sensitivity, i.e. the proportion of presences that were correctly predicted; and (6) the Jaccard index, a widely used metric of community similarity.

Table 1. Metrics used to evaluate the prediction accuracy of stacked species distribution models (S-SDMs) along elevation in the western Swiss Alps
Evaluation metric of species assemblage predictions: Formula
Species richness errors: Predicted − observed species richness
Assemblage prediction success: $\frac{a + d}{N}$
Assemblage kappa: $\frac{\frac{a + d}{N} - \frac{(a + b)(a + c) + (c + d)(b + d)}{N^2}}{1 - \frac{(a + b)(a + c) + (c + d)(b + d)}{N^2}}$
Assemblage specificity: $\frac{d}{b + d}$
Assemblage sensitivity: $\frac{a}{a + c}$
Jaccard: $\frac{a}{a + b + c}$
Note: Most of these metrics rely on confusion matrices in which species (N = 211) from the landscape species pool are classified into: a, species that are both observed and predicted as being present; b, species that are observed as absent but predicted as present; c, species that are observed as present but predicted as absent; and d, species that are both observed and predicted as absent. Note that such confusion matrices and several of the associated metrics are the same as those used to evaluate predictions from single SDMs but applied to species assemblages (counting species instead of occurrences). To avoid confusion, we added the term ‘assemblage’ prior to each metric name.
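The evaluation procedure can be sketched in Python as follows: one Bernoulli draw per species turns predicted probabilities into a binary assemblage, the confusion-matrix classes a, b, c and d are counted, and the six metrics are averaged over repeated draws. This is a minimal illustration, not the authors' code; the function names are hypothetical, the metric formulas are the standard confusion-matrix definitions, and the default of 999 draws is chosen only to keep the example fast.

```python
import random


def sample_assemblage(probabilities, rng):
    """Draw one binary assemblage: each species is present with its predicted probability."""
    return [1 if rng.random() < p else 0 for p in probabilities]


def confusion_counts(observed, predicted):
    """Count species in the four classes a, b, c and d defined in Table 1."""
    pairs = list(zip(observed, predicted))
    a = sum(1 for o, p in pairs if o == 1 and p == 1)  # observed present, predicted present
    b = sum(1 for o, p in pairs if o == 0 and p == 1)  # observed absent, predicted present
    c = sum(1 for o, p in pairs if o == 1 and p == 0)  # observed present, predicted absent
    d = sum(1 for o, p in pairs if o == 0 and p == 0)  # observed absent, predicted absent
    return a, b, c, d


def assemblage_metrics(observed, predicted):
    """Compute the six assemblage metrics for one plot (NaN where undefined)."""
    a, b, c, d = confusion_counts(observed, predicted)
    n = a + b + c + d
    nan = float("nan")
    success = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2  # chance agreement
    return {
        "richness_error": (a + b) - (a + c),  # predicted minus observed richness
        "prediction_success": success,
        "kappa": (success - expected) / (1 - expected) if expected != 1 else nan,
        "specificity": d / (b + d) if (b + d) else nan,
        "sensitivity": a / (a + c) if (a + c) else nan,
        "jaccard": a / (a + b + c) if (a + b + c) else nan,
    }


def mean_metrics(observed, probabilities, n_draws=999, seed=0):
    """Average each metric over repeated random binary assemblages."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_draws):
        metrics = assemblage_metrics(observed, sample_assemblage(probabilities, rng))
        for name, value in metrics.items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / n_draws for name, total in totals.items()}


# Hypothetical plot with five species from the pool.
observed_plot = [1, 1, 0, 0, 0]
predicted_probabilities = [0.8, 0.3, 0.4, 0.1, 0.6]
print(mean_metrics(observed_plot, predicted_probabilities))
```

Sampling from the probabilities avoids choosing a classification threshold, at the cost of adding random noise to each draw, which is why the metrics are averaged over many repetitions.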

To test our first hypothesis (H1: S-SDM performance increases with increasing elevation as a result of tighter confidence intervals of the modelled species’ responses to the climate), we used nonparametric generalized additive modelling to fit the relationship between these six metrics and elevation, also accounting for nonlinear trends.

Understanding prediction errors

We tested our second general hypothesis (H2: the variation in S-SDM performance is related to changes in the assembly rules constraining the communities) by testing the relationship between the S-SDM performance and functional patterns of plant communities towards high (H2A) and low (H2B) elevations.

Functional patterns were assessed in the 298 evaluation plots using the three traits measured at the species level (CH, SLA and LDMC). The traits could only be measured in the field for the 254 most frequent species (out of the 771 identified). Therefore, we computed community aggregated trait values and functional α-diversity for the plots where trait-assigned species accounted for more than 80% of the vegetation cover (Pakeman & Quested, 2007). This procedure restricted the number of plots that could be used for functional pattern analyses, although the procedure conserved 85% of the plots in the original evaluation dataset (256 plots). The community aggregated trait values (TraitCA) consisted of trait averaging of the species found in a given plot weighted by their estimated cover. This measure has been shown to provide useful insight into community dynamics and ecosystem functioning (Garnier et al., 2004). We calculated the α-Rao index following de Bello et al. (2010). Doing so, we calculated the sum of functional dissimilarities between all of the possible pairs of species, which were weighted by the product of their estimated cover. We use α-Rao here as an index of functional evenness. We first calculated the modified α-Rao index for a three-dimensional functional space, considering all of the measured traits together, and then calculated the same index for three one-dimensional functional spaces, considering each trait separately.
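The two functional descriptors can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it uses the plain Rao quadratic entropy summed over unordered species pairs with Euclidean distances between trait vectors, not the modified α-Rao of de Bello et al. (2010), and it takes relative cover values directly, whereas the study recorded cover on an ordinal scale. The function names and the example values are hypothetical.

```python
import math


def community_aggregated_trait(cover, trait_values):
    """Cover-weighted mean of one trait across the species of a plot."""
    total_cover = sum(cover)
    return sum(c * t for c, t in zip(cover, trait_values)) / total_cover


def rao_index(cover, trait_vectors):
    """Sum of pairwise trait dissimilarities weighted by the product of relative covers.

    trait_vectors holds one tuple per species; a one-element tuple gives the
    single-trait index, a three-element tuple the multi-trait index.
    Assumption: each unordered pair is counted once; some formulations count
    ordered pairs, which doubles the value.
    """
    total_cover = sum(cover)
    proportions = [c / total_cover for c in cover]
    rao = 0.0
    for i in range(len(proportions)):
        for j in range(i + 1, len(proportions)):
            dissimilarity = math.dist(trait_vectors[i], trait_vectors[j])
            rao += dissimilarity * proportions[i] * proportions[j]
    return rao


# Hypothetical plot with three species: relative cover and canopy height (mm).
plot_cover = [0.5, 0.3, 0.2]
canopy_height = [120.0, 300.0, 80.0]
print(community_aggregated_trait(plot_cover, canopy_height))
print(rao_index(plot_cover, [(h,) for h in canopy_height]))
```

In practice traits measured in different units would need to be standardized before being combined in a multi-dimensional distance; that step is omitted here.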

To infer assembly rules and discriminate between limiting similarity and environmental filtering, we tested for any deviation of the observed Rao index in the evaluation plots from a null distribution of trait values (divergence or convergence). For a given plot, the null distribution was generated by randomly re-assigning (9999 times) the trait values of the co-occurring species from the trait matrix composed of the entire pool of species (211 species). The actual values of each species for the three traits were kept as a set to conserve the fundamental trade-offs among traits. This simple randomization procedure allowed for testing the null hypothesis that trait values are randomly assembled, while keeping species richness patterns, species frequency patterns and abundance patterns within the plots constant. Then, we computed a standardized effect size (SES) for the Rao values, as proposed by Gotelli & McCabe (2002), as follows:

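$$\mathrm{SES}_{\mathrm{Rao}} = \frac{\mathrm{Rao}_{\mathrm{obs}} - \overline{\mathrm{Rao}}_{\mathrm{sim}}}{\sigma_{\mathrm{Rao}_{\mathrm{sim}}}}$$

where $\mathrm{Rao}_{\mathrm{obs}}$ represents the observed Rao value, $\overline{\mathrm{Rao}}_{\mathrm{sim}}$ is the average of the Rao values simulated using the null model, and $\sigma_{\mathrm{Rao}_{\mathrm{sim}}}$ is their standard deviation. Positive values indicate trait divergence, and negative values indicate trait convergence. This standardization of the Rao index allows for a better comparison of the functional evenness across different plant communities (Gotelli & McCabe, 2002).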
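A minimal Python sketch of this standardization and of the null model is given below. It rests on several assumptions: the null draw samples whole trait rows from the species pool without replacement (so that the traits of a species stay together), the sample standard deviation is used, and `statistic` stands for any function of cover and traits such as a Rao index. All names and values are hypothetical.

```python
import random
import statistics


def simulate_null(statistic, cover, pool_traits, n_sim=999, seed=0):
    """Recompute a plot statistic after randomly re-assigning trait rows from the pool.

    Covers (and hence richness and abundance patterns) are kept fixed; only the
    trait rows attached to the co-occurring species change between simulations.
    """
    rng = random.Random(seed)
    simulated = []
    for _ in range(n_sim):
        random_traits = rng.sample(pool_traits, len(cover))  # whole rows, no replacement
        simulated.append(statistic(cover, random_traits))
    return simulated


def standardized_effect_size(observed, simulated):
    """(observed - mean of simulated) / standard deviation of simulated."""
    sd = statistics.stdev(simulated)  # assumption: sample standard deviation
    if sd == 0:
        return float("nan")
    return (observed - statistics.fmean(simulated)) / sd


def weighted_pairwise_distance(cover, traits):
    """Stand-in statistic: cover-weighted sum of pairwise distances for one trait."""
    total = sum(cover)
    p = [c / total for c in cover]
    return sum(
        abs(traits[i] - traits[j]) * p[i] * p[j]
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )


# Hypothetical plot of three species and a pool of eight canopy heights (mm).
plot_cover = [0.5, 0.3, 0.2]
plot_traits = [100.0, 110.0, 105.0]
pool = [50.0, 80.0, 100.0, 105.0, 110.0, 300.0, 450.0, 600.0]
null_values = simulate_null(weighted_pairwise_distance, plot_cover, pool)
ses = standardized_effect_size(weighted_pairwise_distance(plot_cover, plot_traits), null_values)
print(ses)
```

A negative value, as expected for this example plot of similar-sized species, would be read as trait convergence; a positive value as trait divergence.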

Finally, we tested the relationship between TraitCA, TraitSES-Rao and metrics of S-SDM performance in the plots using GAMs to facilitate the detection of nonlinear trends.

Results

SDMs were successfully calibrated for the 211 dominant species in the study area and showed fair to good prediction accuracy (all of the AUCs were greater than 0.7; Araújo et al., 2005). For the large majority of the species, no spatial autocorrelation was observed in the residuals between the calibration and evaluation datasets, and the correlations remained very low even when significant (average Moran's I = 0.143 for the shortest distances separating the calibration and evaluation plots; see Appendix S1). We therefore considered the second dataset of 298 plots as valid for the S-SDM evaluation.

The modelled species responses using GAMs were not consistently well adjusted along the elevation gradient. In particular, smaller standard prediction errors were observed at high elevations than at middle or low elevations (Fig. 3).

Figure 3.

Change in the accuracy of fitted species response to the topo-climatic factors along elevation. This was assessed using generalised additive models (GAMs) and the confidence interval of predicted probabilities for the calibration plots. Each sample represents the mean confidence interval of a given species for a given elevation band.

Using S-SDMs to predict species assemblages in our evaluation dataset resulted in a significant relationship between the predicted and observed species richness (r = 0.45, P < 0.001; Fig. 4c). The slope estimate of this relationship was 0.23, the standard error was 0.03 and the intercept estimate was 26.52 with a standard error of 0.73. The mean species richness error in the overall dataset was 7.30 (SE = 0.58), the mean assemblage prediction success was 0.78 (SE = 0.003), the mean assemblage kappa was 0.72 (SE = 0.004), the mean assemblage specificity was 0.85 (SE = 0.001), the mean assemblage sensitivity was 0.23 (SE = 0.004) and the mean Jaccard index was 0.11 (SE = 0.003).

Figure 4.

The evaluation of stacked-species distribution models (S-SDMs) based on topo-climatic predictors. Calibration of topo-climatic niche models of the species was based on 613 plots, while evaluation of S-SDMs was based on 298 different evaluation plots (see Supporting Information for a test of independence between the calibration and evaluation datasets). Solid lines represent significant trends fitted with generalized additive models.

The predicted species richness pattern in the evaluation dataset (Fig. 4b) did not reproduce the observed hump-shaped curve with a peak of diversity at 1500–1700 m (Fig. 4a). Instead, the S-SDM predicted a progressive and slow (compared with the observed) decrease of species richness as the elevation of the plots increased (Fig. 4b). In addition, the different evaluation metrics reported here showed significant variations along the elevation gradient (Fig. 4d–i). More specifically, we observed a larger overprediction of species richness at low elevations than at mid-range and high elevations, and a larger overprediction was observed at high elevations than at mid-range elevations (Fig. 4, Appendix S2). Next, we observed very similar nonlinear trends of assemblage prediction success (Fig. 4e) and assemblage kappa (Fig. 4f), with no significant variation from low to mid-range elevations and a strong increase from mid-range to high elevations (Appendix S2). The specificity and sensitivity showed significant linear variations along elevation, positive for the specificity (linear regression R2 = 0.52, P < 0.001; Fig. 4g) and negative for the sensitivity (linear regression R2 = 0.41, P < 0.001; Fig. 4h). Like the sensitivity, the Jaccard index showed a decrease towards higher elevations (Fig. 4i, Appendix S2). The variation of most of the evaluation metrics was strongly correlated with the observed plant species richness, except the assemblage specificity and sensitivity (Fig. S3.1 in Appendix S3). Species richness errors standardized by the observed species richness were higher by far at high elevations compared with mid-range or low elevations. The assemblage prediction success, kappa and specificity, standardized by the observed species richness, confirmed the increase of S-SDM performance with elevation. The relative sensitivity also increased with elevation, whereas the Jaccard index showed a U-curve-like trend (Fig. S3.2 of Appendix S3). The absolute sensitivity (and to some extent the Jaccard index) and the specificity were strongly correlated with the predicted species richness, positively for the sensitivity and negatively for the specificity (Fig. S3.1 of Appendix S3): the more species the S-SDM predicts, the better it predicts true presences, and conversely the fewer species it predicts, the better it predicts true absences.

We only report here on the assemblage prediction success, assemblage specificity and assemblage sensitivity for the analyses that tracked the association between the assembly rules and variation in the S-SDM accuracy. The results with the other metrics provided complementary support to our conclusion or non-significant trends (Appendix S4). The assemblage prediction success was only significantly related to the community aggregated canopy height (CHCA; R2 = 0.49; Fig. 5b). We observed a significant decrease (R2 = 0.21; Fig. 5c) of assemblage specificity with the deviation in the Rao index of canopy height values compared with the null expectation (measured as the SES of Rao: CHSES-Rao), and the specificity decreased with the community aggregated canopy height (CHCA; R2 = 0.49; Fig. 5d). The assemblage sensitivity increased with CHSES-Rao (R2 = 0.13; Fig. 5e) and CHCA (R2 = 0.21; Fig. 5g). The observed patterns revealed that the best prediction success and specificity, the worst sensitivity and the highest elevations were almost exclusively associated with the significant convergence of CH. In contrast, the worst prediction success and specificity, the best sensitivity and the lowest elevations were associated equally with the convergence, divergence and null distribution of CH. Regarding SLA and LDMC, we observed significant, although weak, positive relationships of SLASES-Rao and LDMCSES-Rao with assemblage sensitivity and negative relationships with assemblage specificity (Appendix S4). LDMCCA showed a significant but hardly interpretable inverse-parabolic (decreasing then increasing) trend with the assemblage specificity and a parabolic (increasing then decreasing) trend with the assemblage sensitivity. SLACA showed a significantly negative relationship with the assemblage specificity and a positive relationship with the sensitivity (Appendix S4).

Figure 5.

Variation in the accuracy of stacked-species distribution models (S-SDMs) with plant functional patterns in the evaluation plots. The plant functional trait being considered here is canopy height (CH, measured in mm). Its pattern was estimated using community aggregated mean (CHCA, in mm) and evenness using the standardized effect size of the Rao index (CHSES-Rao). Null distribution of trait values was estimated using 9999 simulations. Positive values of CHSES-Rao indicate that the observed distribution of canopy height tends to be more dissimilar than by chance (trait divergence indicative of limiting similarity) while negative values indicate more similarity (trait convergence indicative of environmental filtering). Evaluation metrics are the means over 9999 community samples based on the predicted probabilities of the presence of the 211 species. Solid lines represent significant trends fitted with generalized additive models.

Discussion

Using S-SDMs, we documented detailed variations in the accuracy of assemblage predictions along a 2800-m wide elevation gradient, which is considered partially as a surrogate of a stress–productivity gradient. Such an evaluation had not previously been performed, despite its crucial importance for assessing the reliability of S-SDMs as a tool to derive scenarios of biodiversity responses to global change along large environmental gradients (Thuiller et al., 2005; Engler et al., 2011). We further showed that the S-SDM accuracy changed with community assembly rules, using analyses of functional trait patterns at the community level. The exact causes behind these observations require careful interpretation and are discussed in detail in the following sections.

Errors in plant species assemblage predictions vary in space

An important result is the decrease of the confidence intervals of the SDMs with increasing elevation. Individual SDMs are usually evaluated with metrics based on all of the predictions and locations (global accuracy as calculated with the AUC). However, our results strongly suggest that modelled plant species responses can show important variation in their accuracy when used for prediction across space. Previous findings on non-stationary species–environment relationships (Osborne et al., 2007) suggest that this variation occurs for many taxa and ecosystems at different scales. Therefore, proper evaluations of SDM predictions should provide estimates of the variation in their quality over space in addition to global metrics (Hanspach et al., 2011).

Our study further revealed varying abilities of stacked-SDMs (S-SDMs) to reproduce plant community patterns. We observed an overall tendency of the S-SDMs to overpredict species richness, which is consistent with previous results based on different scales, regions and taxa (Feria & Peterson, 2002; Pineda & Lobo, 2009; Trotta-Moreu & Lobo, 2010) and is in line with theoretical expectations (Guisan & Rahbek, 2011). However, the novel finding here is that such overprediction can vary in space along environmental gradients. Indeed, the S-SDM approach did not reproduce the humpback curve of species richness along elevation, which is a common pattern when a large range of elevations is considered (Nogués-Bravo et al., 2008). Instead, the S-SDMs predicted constant and high species richness from low to mid-range elevations and then a decrease towards high elevations. This prediction is consistent with the better predictions of species richness at high elevations reported by Guisan & Theurillat (2000) and by Dubuis et al. (2011). However, the predicted decrease was much slower than the observed one. For instance, the S-SDMs predicted a minimum of approximately 18 species at the highest elevations, whereas we recorded three to four species in the field. This mismatch resulted in an overprediction of species richness of up to 500%, which translated into two opposite tendencies in the compositional evaluation metrics. On the one hand, the indices accounting for the correct predictions of absences (overall prediction success, kappa and specificity) showed better S-SDM performance towards higher elevations, as the S-SDMs predicted a decrease in species richness towards high elevations (although too slowly) but not towards low elevations. These findings suggest that the S-SDMs probably captured important ecological filters that restrict the composition of local species pools from the regional pool (Keddy, 1992; Zobel, 1997) at high elevations but likely not at low elevations. On the other hand, the evaluation indices focusing on correct predictions of presence (sensitivity and the Jaccard index) reported worse prediction accuracy at high elevations than at mid-range and low elevations. This finding implies that strong limitations seem to affect S-SDMs towards high elevations, which need to be addressed.
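The opposite behaviour of absence-based and presence-based indices can be illustrated with a minimal sketch. It uses the standard confusion-matrix definitions of the metrics named above. The function name, the pool of 200 species and the species labels are hypothetical and are not taken from our data. The richness values (18 predicted, 3 observed) follow the example given in the text, and the overlap between the two lists is assumed.

```python
from typing import Dict, Set


def assemblage_metrics(observed: Set[str], predicted: Set[str],
                       pool: Set[str]) -> Dict[str, float]:
    """Compare observed and predicted species sets of one plot against the species pool."""
    if not pool or not (observed <= pool and predicted <= pool):
        raise ValueError("pool must be non-empty and contain all listed species")
    a = len(observed & predicted)         # species correctly predicted present
    b = len(predicted - observed)         # species predicted but not observed
    c = len(observed - predicted)         # species observed but not predicted
    d = len(pool - observed - predicted)  # species correctly predicted absent
    n = len(pool)
    nan = float("nan")
    success = (a + d) / n
    # Agreement expected by chance, as used in Cohen's kappa.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return {
        "sensitivity": a / (a + c) if a + c else nan,
        "specificity": d / (b + d) if b + d else nan,
        "prediction_success": success,
        "kappa": (success - expected) / (1 - expected) if expected < 1 else nan,
        "jaccard": a / (a + b + c) if a + b + c else nan,
        "richness_error": float(b - c),  # predicted minus observed richness
        "relative_richness_error": (b - c) / (a + c) if a + c else nan,
    }


# Hypothetical high-elevation plot: 3 species observed, 18 predicted, 2 in common.
pool = {f"sp{i}" for i in range(200)}
observed = {"sp0", "sp1", "sp2"}
predicted = {f"sp{i}" for i in range(1, 19)}
metrics = assemblage_metrics(observed, predicted, pool)
```

In this illustration the relative richness error is 5.0 (i.e. 500%) and the Jaccard index is low, while specificity remains above 0.9 because most species of the pool are correctly predicted absent.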

Overall, considering all of the evaluation criteria, our results did not unequivocally verify our H1 hypothesis that the S-SDM performance improved towards high elevations. However, our results did not refute the reasoning behind this hypothesis. Rather, the highlighted patterns depict a more complex picture, as we discuss below.

Insufficient species filtering towards high elevations

The best assemblage specificity and prediction success were almost exclusively associated with strong environmental filtering (i.e. trait convergence) of small plant species towards unproductive environments at high elevations. There, the harsh climate is characterized by extreme temperatures and short growth periods. These conditions impose strong constraints on the physiology of plants, allowing only species of small stature to locally persist and coexist (Körner, 2003). Because climate and topography also influence community- and ecosystem-level processes, climatically stressful conditions may reduce the potential for competitive interactions to occur and the associated patterns of limiting similarity (i.e. trait divergence) for growth traits (Mitchell et al., 2009). Our results showed that the strong climatic filtering of species could be reproduced using the S-SDMs, providing some support to the H2A hypothesis that the most accurate S-SDM predictions are predominantly associated with strong environmental filtering in climatically harsh environments. However, such support holds only when considering the assemblage specificity and prediction success and when focusing on canopy height. When considering the assemblage sensitivity and other traits (specific leaf area and leaf dry matter content), the link between the S-SDM performance and particular assembly rules becomes difficult to interpret or non-existent. Moreover, the results tend to indicate that topo-climatic factors are not sufficient to restrict local species assemblages to the very low values of species richness observed in situ. This finding suggests that important additional filters were not accounted for, even indirectly, in our set of predictors. For instance, a wide range of snow duration and snow depth can be found for similar climatic conditions because of snowdrift and its control by wind and topography (Litaor et al., 2008). Geomorphological processes and pedogenesis are very good candidate predictors (Randin et al., 2009) that would need to be added. Alternatively, the SDMs may have been flawed by data accuracy (Hanspach et al., 2011). For instance, the computation and resolution of the climatic data cannot account for wind chill or small-scale microclimatic refuges (Scherrer & Körner, 2011). The prediction error at high elevations may also result from the decoupling of low-stature plants from the atmospheric conditions (Körner, 2003) captured by standard 2 m air-temperature weather stations. Consequently, the modelled species responses to temperature may not reproduce what species actually experience. In addition, in the alpine zone, vegetation cover is often scarce and patchy in a matrix of bare soil and rocks, implying that the evaluation of SDMs may be affected by a sampling effect at high elevations. Indeed, SDMs predict the potential presence of all species that can tolerate local climatic conditions. Therefore, mismatches between observed and predicted assemblages may also result from the fact that vegetation records, conducted in relatively small sampling units (2 m × 2 m) at a given time, can easily miss species from the local pool that were not observed in the sampled unit (for some reason, such as local extinction by disturbances or a plot size too small to capture all species) but were occurring nearby in the same habitat type (and thus were probably captured in other plots with similar conditions). Such sampling effects (in space and time) may prove especially important in sparse and disturbed vegetation, as found at the highest elevations, because growth is very slow there and thus recolonization after extinction is likely to be significantly delayed. Finer-resolution environmental predictors, and especially those related to substrate, would also be needed to better match the fine vegetation sampling units used in this study. Finally, we cannot exclude the hypothesis that a small part of the mismatch may also be due to overlooked species, i.e. species that were simply missed in the plots, because this type of error is hardly avoidable in vegetation sampling, even by the best botanists (Vittoz et al., 2010). However, this factor should remain constant along the elevation gradient, or should rather be problematic at lower elevations, where denser vegetation may hide some seedlings or small-stature plants. Therefore, it cannot solely account for the strong overprediction at higher elevations.
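The distinction between trait convergence (environmental filtering) and trait divergence (limiting similarity) used above can be sketched as a simple randomization test. This is only an illustration under stated assumptions and not necessarily the test applied in this study. It takes the variance of a trait within a plot as the dispersion measure, draws null assemblages of equal richness at random from the whole pool, and reports a standardized effect size. The function name, the pool of 40 species and the canopy height values are hypothetical.

```python
import random
import statistics
from typing import Dict, List


def trait_ses(plot_species: List[str], pool_traits: Dict[str, float],
              n_random: int = 999, seed: int = 0) -> float:
    """Standardized effect size of within-plot trait variance against random draws.

    Negative values suggest trait convergence, positive values trait divergence.
    """
    if len(plot_species) < 2:
        raise ValueError("at least two species are needed to compute a variance")
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    observed = statistics.pvariance([pool_traits[s] for s in plot_species])
    pool_values = list(pool_traits.values())
    k = len(plot_species)
    # Null assemblages: k trait values drawn without replacement from the pool.
    null = [statistics.pvariance(rng.sample(pool_values, k))
            for _ in range(n_random)]
    sd = statistics.pstdev(null)
    if sd == 0:
        return 0.0
    return (observed - statistics.mean(null)) / sd


# Hypothetical canopy heights (m) for a pool of 40 species, from 0.02 to 1.97.
heights = {f"sp{i}": 0.02 + 0.05 * i for i in range(40)}
alpine_plot = ["sp0", "sp1", "sp2", "sp3"]       # only small-stature species
lowland_plot = ["sp0", "sp13", "sp26", "sp39"]   # evenly spread heights
ses_alpine = trait_ses(alpine_plot, heights)
ses_lowland = trait_ses(lowland_plot, heights)
```

With these made-up values the alpine plot yields a negative effect size (convergence towards small stature) and the lowland plot a positive one (divergence).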

Non-equilibrium with climate towards low elevations

The reported links between S-SDM performance and community assembly rules also provided some support to our hypothesis H2B that less accurate predictions of S-SDMs are associated with a larger spectrum of assembly rules in the productive habitats found at low elevations. Indeed, we reported a much larger variety of assembly rules as the specificity, prediction success and elevation decreased. Moreover, patterns of trait convergence/divergence along elevation indicated that biotic assembly rules (i.e. limiting similarity of canopy height values) operated mainly in lowlands and that null assembly was more frequent in lowlands than in highlands. In mild climatic conditions, some places may be fertilized by agriculture, enhancing competitive interactions (Keddy et al., 1997), leading either to limiting similarity among coexisting species or to apparent filtering of high-stature plants due to the exclusion of small species. In other locations, grazing may impose strong trait filtering (Díaz et al., 2007) or release niche processes (Mason et al., 2011) so that stochastic processes prevail. In addition, it is recognized that traditional practices sustain low levels of soil resources (Maurer et al., 2006), resulting in strong environmental filtering due to harsh abiotic conditions in the lowlands, although these habitats often show high diversity. Together, these drivers of community assembly can occur in different locations at the same time under the same mild climatic conditions and may thus override the influence of climate. In such situations, the species and community distributions are most likely not at equilibrium with the climate (at those elevations, forest is the natural climax in this region). S-SDMs based on topo-climatic factors alone exhibit greater uncertainty (Fig. 3) and necessarily lose their capacity to reliably filter the species from the landscape pool. In turn, species overprediction may have inflated the probability of predicting the correct species, resulting in greater assemblage sensitivity towards lower elevations.

Perspectives for biodiversity modelling

The main lesson emerging from these results is that the accuracy of species assemblage patterns reconstructed from topo-climatic SDMs can vary significantly in geographical space, especially where steep environmental gradients prevail, such as in mountainous landscapes. This finding strongly calls for a systematic evaluation of assemblage predictions even when individual SDMs show overall good predictive power. Moreover, we found that the increase or decrease of S-SDM prediction accuracy with elevation strongly depends on the evaluation criteria considered. While it is common practice to evaluate model predictions using a single general metric (e.g. kappa), our results clearly suggest that assemblage specificity and assemblage sensitivity (and their spatial variation) should also be assessed.
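One simple way to report the spatial variation of an evaluation metric, rather than a single global value, is to summarize per-plot values within elevation bands. The sketch below is a minimal illustration. The function name, the band width and the plot values are hypothetical and are not results from this study.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def metric_by_elevation_band(plots: Iterable[Tuple[float, float]],
                             band_width: float = 1000.0) -> Dict[int, float]:
    """Average a per-plot metric within elevation bands.

    `plots` holds (elevation in m, metric value) pairs; the keys of the
    result are the lower limits of the bands in metres.
    """
    bands = defaultdict(list)
    for elevation, value in plots:
        lower = int(elevation // band_width * band_width)
        bands[lower].append(value)
    return {lower: mean(values) for lower, values in sorted(bands.items())}


# Hypothetical per-plot assemblage specificity values along an elevation gradient.
plots = [(450, 0.80), (700, 0.84), (1600, 0.88),
         (1900, 0.90), (2600, 0.95), (2900, 0.97)]
specificity_profile = metric_by_elevation_band(plots)
```

The same summary can be repeated for sensitivity, kappa or the Jaccard index, so that opposite trends along the gradient become visible side by side.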

A second lesson learned from our findings is that S-SDMs based on topo-climatic factors may not represent species distribution and assembly drivers equally well along ecological gradients. For instance, in mild climatic conditions, such as those found in our study at lower elevations, species distributions and assemblies are in strong disequilibrium with the climate. Instead, other factors mediated by human activities may play a more prominent role. This disequilibrium is likely to be the case for other mountain ranges in the world (Nogués-Bravo et al., 2008). At higher elevations, species are likely to be under strong climatic determinism, but additional environmental filters and high-resolution data as well as more proximal climatic data may need to be considered.

As an important perspective, the generality of our findings should be tested by applying S-SDMs to other datasets along distinct altitudinal and latitudinal gradients. Further improvements of species assemblage predictions in a changing climate should account for additional local assembly drivers, such as competitive interactions, environmental filters other than climate, and stochastic events (Guisan & Rahbek, 2011). Finally, new modelling approaches should be developed to incorporate the variability of the strength of assembly drivers over geographical and environmental spaces.

Acknowledgements

This research was conducted in the framework of the ECOCHANGE project funded by the Sixth European Framework Programme (grant GOCE-CT-2007–036866) and the BIOASSEMBLE project, funded by the Swiss National Science Foundation (SNF grant no. 31003A-125145 to A.G.). Large computations were performed at the Vital-IT (http://www.vital-it.ch) Centre for high-performance computing at the Swiss Institute of Bioinformatics. We are grateful to all the people involved in the data collection. We would like to thank D. D. Ackerly, R. Field, D. Currie and three anonymous referees for very helpful comments.

The Spatial Ecology group at the University of Lausanne (http://www.unil.ch/ecospat; led by A.G.) specializes in niche-based spatial modelling of species, diversity and community distributions, using empirical data, statistical models and more dynamic approaches. A strong focus is placed on using models and their predictions to support conservation management, such as model-based sampling of endangered species, assessment of the response of biodiversity to climate and land-use changes, and evaluation of the invasive potential of neophyte species.

Author contributions: J.P. and A.G. designed the study. J.P., A.D., L.P., C.F.R., L.R., P.V. and A.G. collected the data. J.P. conducted the analyses with the help of A.D., and J.P. and A.G. wrote the manuscript with the help of A.D., L.P., L.M., C.F.R. and P.V.
