Improving the prediction of plant species distribution and community composition by adding edaphic to topo-climatic variables





Soil properties have been widely shown to influence plant growth and distribution. However, the degree to which edaphic variables can improve models based on topo-climatic variables is still unclear. In this study, we tested the roles of seven edaphic variables, namely (1) pH; (2) nitrogen content; (3) phosphorus content; (4) silt; (5) sand; (6) clay; and (7) carbon-to-nitrogen ratio, as predictors in species distribution models in an edaphically heterogeneous landscape. We also tested how the respective influence of these variables in the models is linked to different ecological and functional species characteristics.


The Western Alps, Switzerland.


With four different modelling techniques, we built models for 115 plant species using topo-climatic variables alone and then topo-climatic variables plus each of the seven edaphic variables, one at a time. We evaluated the contribution of each edaphic variable by assessing the change in predictive power of the model. In a second step, we evaluated the importance of the two edaphic variables that yielded the largest increase in predictive power in one final set of models for each species. Third, we explored the change in predictive power and the importance of variables across plant functional groups. Finally, we assessed the influence of the edaphic predictors on the prediction of community composition by stacking the models for all species and comparing the predicted communities with the observed community.


Among the set of edaphic variables studied, pH and nitrogen content showed the highest contributions to improvement of the predictive power of the models, as well as the predictions of community composition. When considering all topo-climatic and edaphic variables together, pH was the second most important variable after degree-days. The changes in model results caused by edaphic predictors were dependent on species characteristics. The predictions for the species that have a low specific leaf area, and acidophilic preferences, tolerating low soil pH and high humus content, showed the largest improvement by the addition of pH and nitrogen in the model.


pH was an important predictor variable for explaining species distribution and community composition of the mountain plants considered in our study. pH allowed more precise predictions for acidophilic species. This variable should not be neglected in the construction of species distribution models in areas with contrasting edaphic conditions.

Aeschimann & Heitz



soil acidity


area under the curve


clay content


carbon-to-nitrogen ratio


soil humus content


leaf dry matter content


nitrogen content


soil nutrient content


phosphorus content


sand content


species distribution model


silt content


specific leaf area




true skill statistic


vegetative height


The ecological preferences of plant species for particular soil properties are known to influence plant distributions. For instance, Alvarez et al. (2009) demonstrated that ecological preferences for different bedrock types and thus, soil acidity, determined the primary patterns of refugia and plant migrations during the last glacial era and the subsequent climate variations of the Holocene. Similarly, the manuring of agricultural grasslands within the past decades has induced profound changes in their floristic compositions (e.g. Zechmeister et al. 2003; Peter et al. 2008).

The success of a plant is largely conditioned by the soil chemical properties. These properties determine ion concentration and soil structure and are important for the soil capacity to hold water. Ions contained in the soil (e.g. N, Ca or P) are used as nutrients by plants (Aerts & Chapin 2000). Hence, variations in ionic concentrations in the soil directly impact plant growth and, consequently, the formation of plant communities. To a large extent, the availability of ions depends on soil acidity, making soil pH another important factor influencing plant distribution (Gobat et al. 2004). Low soil acidity (high pH) can prevent the release of important ions, causing nutrient deficiencies in the plant. High soil acidity can also cause deficiencies because elements such as P or N can form complexes with other ions and become unavailable to plants (Gobat et al. 2004). Conversely, high soil acidity (low pH) enhances the solubilization of several elements, especially metals such as aluminium or iron that can be toxic to maladapted plants. Finally, soil texture has a physical impact on plants by supporting or limiting root growth and influencing the amount of water and oxygen available to the plants (Bouma & Bryla 2000; Martre et al. 2002). The proportion of clay also directly influences the clay-organic complex and the associated soil fertility (Gobat et al. 2004).

Plants vary in their functional traits and thus in their preference towards edaphic conditions. Plants displaying tougher, long-living leaves, which better conserve nutrients in the tissues, are expected to grow better in acidic soils, which are frequently nutrient-poor. In contrast, if nutrients are not limiting, plant species that are able to rapidly assimilate nutrients and to convert them into biomass have a competitive advantage over less efficient species (Grime 1977; Pellissier et al. 2010a). Finally, generalist species, which are able to grow under a wide range of soil conditions, are less affected by edaphic factors.

Thus, soil is an important factor explaining plant distributions. However, the degree to which edaphic variables contribute to species distributions, beyond other environmental variables such as climate or topography, is still unclear. One way to address this question is to ask to what degree the predictive power of species distribution models (SDMs; Guisan & Zimmermann 2000) can be improved by the addition of edaphic variables as predictors. Although numerous studies have used SDMs during the last decade, few have focused on the influence that soil may have on plant distributions. Most of these studies have investigated trees and shrubs (Coudun & Gegout 2005, 2007; Coudun et al. 2006; Marage & Gegout 2009), which show small functional differences between species. However, the importance of soil-derived variables in SDMs will most likely vary widely in accordance with species traits.

We evaluated the importance of seven edaphic variables, namely (1) soil pH, (2) N content, (3) P content, (4) sand (Sa), (5) silt (Si), (6) clay (Cl) and (7) the C:N ratio (C:N), for explaining the distributions of 115 mountain grassland plant species with different ecological characteristics (i.e. the ecological indicator values for their preferred soil properties; Landolt et al. 2010) and functional traits, as well as the composition and species richness of the plant communities. More specifically, we explored the extent to which the edaphic variables used as predictors improved the predictive power of the SDMs based on topo-climatic variables, and determined the characteristics of the species with the most-improved model results. In a second step, we built topo-climatic models for each species containing the most influential edaphic variables to test the relative importance of the variables, and related the importance values of the variables to the different ecological and functional characteristics of the species. Finally, we predicted plant communities from the SDM results and assessed the importance of edaphic variables in this context. For these analyses, we used a comprehensive data set of vegetation plots, together with a set of environmental predictors at a fine resolution, and a data set of plant functional traits for all modelled species.


Study area

Our study area is located in the Western Swiss Alps (Canton de Vaud, Switzerland, 46°10′–46°30′ N; 6°50′–7°10′ E; Fig. 1). It covers ca. 700 km² and spans elevations from 375 to 3210 m a.s.l. The climate is temperate. Annual temperature and precipitation vary from 8 °C and 1200 mm, respectively, at 600 m to −5 °C and 2600 mm at 3000 m (Bouët 1985). The bedrock is primarily calcareous. The soils are variable, ranging from deep brown soils to shallow lithological soils, the latter exerting more influence on vegetation. Basic cations and carbonates promote an alkaline pedogenesis that can be followed by an acidic pedogenesis after carbonate leaching (Gobat et al. 2004). Spaltenstein (1984) demonstrated a locally important post-glacial loess accumulation in this region. The accumulation of loess accelerated the acidic transition of the pedogenesis. Human influence on the vegetation is strong. Deforested areas are often grazed or mown, and manuring is common from the lowlands to the sub-alpine belt. The grasslands in the alpine belt are used as pastures without manuring. For more information on this geographical area, see Randin et al. (2006).

Figure 1.

Map of the study area situated in the western Alps of Canton de Vaud, Switzerland. The 252 plots for which we have edaphic and vegetation data are indicated as white squares, the background shows the relief of the study area; the meteorological stations are displayed as stars.

Species data

During the summer of 2009, 252 vegetation plots of 4 m² were chosen using a random stratified sampling procedure (Hirzel & Guisan 2002) based on elevation, slope and aspect, and were extensively inventoried. The sampling was limited to open, non-woody areas. The vegetation plots cover an altitudinal range from 820 to 2680 m a.s.l. (Fig. 1). Each sampling point was separated from the others by a minimum distance of 200 m, as it has been shown that from this distance onwards, no autocorrelation is observed between plots in this study area (Pottier et al. 2013). We only included the species that occurred in more than 20 plots throughout the data set for the following modelling analyses (i.e. 115 species; App. S1).

Climate and topographic predictors

Three climate predictors (degree-days, moisture index of the growing season and global solar radiation) and two topographic predictors (slope and topographic position) were used at a resolution of 25 m. The climate predictors were computed from the monthly means of the average temperature (°C) and sum of precipitation (mm) data recorded for the period 1961–1990 by the Swiss network of meteorological stations. These data were interpolated across Switzerland based on a 25-m resolution digital elevation model (from the Swiss Federal Office of Topography), with local thin-plate spline functions for temperature and a regionalized linear regression model for precipitation (Zimmermann & Kienast 1999). Degree-days are defined as the sum of days of the growing season (June, July and August) multiplied by the temperature above 0 °C; see Zimmermann & Kienast (1999) for the formula. The moisture index is defined as the difference between precipitation and potential evapotranspiration and represents the potential amount of water available in the soil at a site. In this study, we used the sum of the mean daily values for the growing season. The potential global solar radiation was calculated over the year. The methods for computing these predictors are described in more detail in Zimmermann & Kienast (1999). The slope in degrees was derived from the digital elevation model with the ArcGIS 9.3 Spatial Analyst tool (ESRI 2008). The topographic position is an integration of topographic features at various scales and is computed with moving windows. Positive values of this variable indicate relative ridges and tops, whereas negative values correspond to valleys and sinks. This variable was calculated using the method of Zimmermann et al. (2007). These five variables are recognized as theoretically meaningful for explaining plant distributions (Körner 1999) and have already been used successfully in several modelling studies in the same study area (Engler et al. 2009; Randin et al. 2009a; Pellissier et al. 2010b).

Edaphic predictors

During the vegetation field survey, five soil samples were taken per vegetation plot: one from each corner of the plot and one from the middle. The samples were taken from the top 10 cm of the soil after removing the first organic soil horizon; the sampled layer corresponds primarily to the organo-mineral horizon (Gobat et al. 2004). The five samples were then mixed to equalize intra-plot variation, air-dried, sieved at 2 mm and ground into powder.

We measured the following soil properties: water pH, total N and P content, organic C and mineral texture in terms of three classes (clay, silt and sand). The pH was measured with a pH meter after diluting soil in water in a 1:2.5 proportion. The total C and N content were analysed using element analysis after combustion with a CHN analyser. We then estimated the amount of organic C by removing the mineral C (derived by calcimetric analyses with a Bernard calcimeter) according to the methods of Baize (2000), if the soil pH was above 6.5. Phosphorus content was determined by colorimetric analysis after mineralization at 550 °C (Baize 2000).
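The carbonate correction described above can be illustrated with a minimal sketch. It assumes that total C is taken as organic C whenever the pH is at or below 6.5, and that the C:N ratio is formed from organic C and total N; the text does not state either point explicitly, and the function names are hypothetical.

```python
def organic_carbon(total_c, mineral_c, ph, ph_threshold=6.5):
    """Estimate organic C (same units as the inputs).

    Mineral (carbonate) C is subtracted only when soil pH exceeds the
    threshold, following the rule stated in the text. ASSUMPTION: at or
    below the threshold, total C is treated as entirely organic.
    """
    if ph > ph_threshold:
        return total_c - mineral_c
    return total_c


def cn_ratio(organic_c, total_n):
    """C:N ratio. ASSUMPTION: computed from organic C and total N."""
    if total_n <= 0:
        raise ValueError("total N must be positive")
    return organic_c / total_n


# Illustrative values only (not data from the study).
c_org = organic_carbon(total_c=5.0, mineral_c=1.0, ph=7.2)
print(c_org, cn_ratio(c_org, total_n=0.4))
```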

Species distribution modelling

A summary of the following analyses can be found in the flowchart presented in Fig. 2. We used the BIOMOD library (Thuiller et al. 2009) in the R software (2.13.0, R Foundation for Statistical Computing, Vienna, Austria) to model the distribution of the 115 plant species using four different modelling techniques (two regression methods and two classification methods): generalized linear models (GLM; McCullagh & Nelder 1989), generalized additive models (GAM; Hastie & Tibshirani 1990), generalized boosted models (GBM; Friedman et al. 2000; Ridgeway 1999) and random forests (RF; Breiman 2001). We used four methods because the variability between techniques has been identified in many papers as an important source of uncertainty (Elith et al. 2006; Guisan et al. 2007; Buisson et al. 2010). Ensemble modelling (Araujo & New 2007) is a way to limit and quantify this uncertainty. The four modelling techniques used in this study were shown to provide satisfactory predictions of species distributions in a large comparative study (Elith et al. 2006). GLM and GAM were calibrated using a binomial distribution and a logistic link function, and polynomials were allowed up to second order (linear and quadratic terms) for each predictor. GBM was calibrated using 2000 trees and RF with default parameters from BIOMOD. We calibrated eight sets of models for each species: one set of models using only the five topo-climatic (TC) predictors, and one set of models for each of the seven edaphic predictors (pH, N, P, Sa, Si, Cl and C:N) added to the TC predictors. The distribution of all predictors is presented in App. S5, and the correlation between them in App. S3. The models were validated with a repeated (ten times) random split-sample procedure using 70% of the data points to fit the models and 30% for an independent validation. For each data partition, the predictive power was estimated with the area under the curve (AUC) of a receiver operating characteristic plot (Fielding & Bell 1997) and the true skill statistic (TSS; Allouche et al. 2006). The AUC values range from 0.5 for models with random predictions to 1.0 for models perfectly fitting the data. A model is rated as fair if its AUC is higher than 0.7 (Swets 1988). The TSS values vary between 0 for a random model and 1 for a model showing perfect agreement.
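For readers less familiar with these metrics, the following sketch shows one way the two evaluation statistics and the repeated 70/30 split could be computed. It is written in plain Python for illustration and is not the BIOMOD implementation; the function names are hypothetical, AUC is computed through its rank interpretation, and TSS is taken as sensitivity plus specificity minus one.

```python
import random


def auc(y_true, scores):
    """AUC as the probability that a random presence receives a higher
    score than a random absence (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tss(y_true, y_pred):
    """True skill statistic = sensitivity + specificity - 1,
    for binary (0/1) observations and predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one presence and one absence")
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


def split_indices(n, train_fraction=0.7, repeats=10, seed=0):
    """Yield (train, test) index lists for a repeated random
    split-sample procedure. The seed is an arbitrary choice."""
    rng = random.Random(seed)
    n_train = round(n * train_fraction)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield sorted(idx[:n_train]), sorted(idx[n_train:])
```

In use, a model would be fitted on each `train` subset and both statistics computed on the matching `test` subset, then summarized over the ten repetitions.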

Figure 2.

Flowchart summarizing the sequence of the analyses. For each species, four modelling techniques were used to build sets of eight models (a), first with topo-climatic predictors only (TC models), then with topo-climatic and each of the seven edaphic variables (TC + S models). Models were evaluated with AUC (b), and differences in AUC were computed between TC models and each TC + S model (c). Differences in AUC (mean across the four modelling techniques) were analysed to assess their respective influence in the models across the species (as a function of their ecological characteristics and functional traits) (d). A final model set was built for each species with topo-climatic predictors and the edaphic predictors that most improved the first models (e). A variable importance analysis was performed on these more parsimonious models (f). The importance of the edaphic variables retained in the final models was related to species ecological characteristics and species traits to characterize species for which the retained edaphic variables were important in the models. Finally, we predicted vegetation communities by stacking SDM results and evaluated the resulting predictions with the Sørensen index and by comparing the predicted species richness with the observed richness (not shown in the flowchart).

Predictive power analysis

We evaluated the global contribution of each edaphic variable to the models of our species by focusing on the mean change in the predictive power (AUC or TSS) across all four modelling techniques caused by the addition of the respective edaphic variable to the models. We used the mean of the change across the four techniques, instead of relying on one modelling technique alone, to draw conclusions about the variables' contributions to the models. Using the mean change is a way to consider only changes that are consistent across modelling techniques. To verify whether the different modelling techniques showed a comparable variation due to the addition of edaphic predictors, we measured the correlations of the predictive power change due to the addition of each edaphic predictor for each pair of modelling techniques. We then calculated the percentage of species with improved or worsened models. To discard species for which the change in the AUC was too small, we only considered the predictive power of the model to be improved or worsened if the AUC difference exceeded 0.02 (ad hoc threshold). Previous studies had simply quantified any improvement or degradation of the AUC (Randin et al. 2009b,c), so our approach is comparatively more conservative in the sense that our estimates of improvement are, at worst, underestimated.
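The averaging and thresholding rule described above can be sketched as follows. This is an illustrative reading of the procedure, with hypothetical function names and made-up AUC values; only the 0.02 threshold and the use of four techniques come from the text.

```python
from statistics import mean


def mean_auc_change(auc_tc, auc_tc_soil):
    """Mean AUC difference (TC + S model minus TC model) across
    modelling techniques. Both arguments are dicts keyed by technique."""
    return mean(auc_tc_soil[t] - auc_tc[t] for t in auc_tc)


def classify_change(delta, threshold=0.02):
    """Label a species' model as improved, worsened or unchanged,
    using the ad hoc threshold on the mean AUC change."""
    if delta > threshold:
        return "improved"
    if delta < -threshold:
        return "worsened"
    return "unchanged"


# Illustrative values for one species (not data from the study).
tc = {"GLM": 0.70, "GAM": 0.72, "GBM": 0.74, "RF": 0.76}
tc_ph = {"GLM": 0.76, "GAM": 0.75, "GBM": 0.80, "RF": 0.81}
delta = mean_auc_change(tc, tc_ph)
print(round(delta, 3), classify_change(delta))
```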

To gain a better understanding of the ecological implications of the results, we explored the changes in predictive power for the species after grouping them according to three categorical ecological characteristics. We used three ecological indicator values (Landolt et al. 2010) to classify our species according to their predicted tolerance to three different soil characteristics: soil pH, nutrient content and humus value. The ecological values for soil pH ranged from 1, for species with tolerance to low pH, to 5, for species growing in base-saturated soil. The nutrient content values ranged from 1, for species that grow on poor soils, to 5, for species favouring nutrient-rich soil. The humus content values ranged from 1, for species growing on humus-poor soil, to 5, for species preferring humus-rich soil. We also considered the predictive power changes as a function of three continuous functional traits using linear regression. The three functional traits for which we had collected field measurements on most of the modelled species are leaf dry matter content (LDMC), specific leaf area (SLA) and plant vegetative height (VH). LDMC, SLA and VH were averaged from measurements of ten individuals of each species, all sampled within the study area (see Pottier et al. (2013) for more information). All ecological characteristics and trait values are presented in App. S1.
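As an illustration of the regression step, the sketch below fits a simple least-squares line of AUC change against a single trait and returns the F statistic with its residual degrees of freedom (n − 2 for one predictor). It is a generic textbook formulation with a hypothetical function name, not the authors' script, and the example values are invented.

```python
def linear_regression_f(x, y):
    """Simple linear regression of y on x.

    Returns (slope, intercept, F, residual_df), where F tests the
    slope against zero and residual_df = n - 2.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length >= 3")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual and regression sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2
                 for xi, yi in zip(x, y))
    ss_reg = slope * sxy
    df = n - 2
    f = float("inf") if ss_res == 0 else ss_reg / (ss_res / df)
    return slope, intercept, f, df


# Invented trait values (x) and AUC changes (y), for illustration only.
print(linear_regression_f([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```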

Variable importance analysis

To investigate the relative importance of predictor variables in the models, we built a final set of models for each plant species. The models contained the topo-climatic predictors and those edaphic variables that improved the AUC for most species in the first part of the analysis. For each species, the importance of each variable in the models was assessed in BIOMOD by randomizing each variable individually and then projecting the model with the randomized variable while keeping the other variables unchanged [Correction made after online publication October 29th, 2012: the sentence has been rephrased, ‘… and then recalibrating …’ has been changed to ‘… and then projecting …’]. The results of the model containing the randomized variable were then correlated with those of the original models. Finally, the importance of the variable was calculated as one minus the correlation; higher values indicate predictors that are more important for the model (Thuiller et al. 2009). This analysis was repeated five times for each modelling technique, and the resulting variable importance values were averaged. As for the changes in predictive power, the variable importance values (averaged among the four modelling techniques) were displayed for all species as a function of the ecological characteristics and functional traits mentioned above.
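The randomization procedure can be sketched generically as below. This is not the BIOMOD code: `predict` stands for any already-fitted model (a hypothetical callable that maps predictor rows to predictions), Pearson correlation is assumed, and the toy model at the end is purely illustrative.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        raise ValueError("correlation undefined for constant input")
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / (saa * sbb) ** 0.5


def variable_importance(predict, rows, var_index, repeats=5, seed=0):
    """Importance = 1 - correlation between the original predictions and
    predictions made after shuffling one predictor column. The model is
    NOT refitted; values are averaged over `repeats` shuffles."""
    rng = random.Random(seed)
    reference = predict(rows)
    scores = []
    for _ in range(repeats):
        column = [r[var_index] for r in rows]
        rng.shuffle(column)
        permuted = [list(r) for r in rows]
        for r, v in zip(permuted, column):
            r[var_index] = v
        scores.append(1.0 - pearson(reference, predict(permuted)))
    return sum(scores) / len(scores)


# Toy fitted model that depends on predictor 0 only (illustration).
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
toy_model = lambda rs: [2.0 * r[0] + 1.0 for r in rs]
print(variable_importance(toy_model, rows, 0),
      variable_importance(toy_model, rows, 1))
```

In the toy example, shuffling the unused predictor leaves the predictions unchanged, so its importance is zero, whereas shuffling the predictor the model relies on yields a clearly positive value.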

Community composition prediction

For each model, the predicted probability at each sampling plot was transformed into binary presences or absences using the prediction threshold that maximized sensitivity and specificity. By stacking the binary predictions for each species, we could predict a community composition for each plot. This stacking of model outputs was performed for each of the nine sets of predictors (topo-climatic only, topo-climatic plus each of the above-cited edaphic variables, topo-climatic plus the edaphic variables chosen for the variable importance analysis) and each of the four modelling techniques. We thus obtained 36 different predicted communities for each plot. Those predicted communities were evaluated by comparison with the observed data set by computing the Sørensen index (Sørensen 1948), which estimates the similarity between the predicted and the observed communities, as follows:

$$ \text{Sørensen index} = \frac{2a}{2a + b + c} $$

where a is the number of species that are observed as well as predicted as present, b is the number of species observed as present but predicted as absent, and c is the number of species observed as absent but predicted as present. A Sørensen index of 0 means that there is no agreement between the predicted and observed community, while a value of 1 indicates a perfect match between the two communities. The correlation between observed and predicted species richness, as well as the mean absolute error of species richness, were also computed. Species richness is a simple and widely used index of biodiversity that can be modelled by stacking individual species distribution models. However, this method has been demonstrated to overestimate species richness (Dubuis et al. 2011). We aimed to investigate whether the addition of edaphic variables to the models could reduce this bias. The three evaluation metrics were averaged across the four modelling techniques for each set of predictors. The results obtained were compared between models with and without the edaphic predictor to assess the relevance of the respective edaphic variable for predicting plant communities.
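The thresholding, stacking and evaluation steps can be sketched as follows. The threshold rule is implemented here as maximizing the sum of sensitivity and specificity, which is one common reading of the criterion and an assumption on our part; the function names and the example species codes are hypothetical.

```python
def best_threshold(y_true, probs):
    """Threshold maximizing sensitivity + specificity (ASSUMED reading
    of the criterion). Candidate thresholds are the observed
    probabilities; a plot is predicted present when prob >= threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one presence and one absence")
    best_t, best_score = None, float("-inf")
    for t in sorted(set(probs)):
        tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < t)
        score = tp / n_pos + tn / n_neg
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def sorensen(observed, predicted):
    """Sørensen similarity 2a / (2a + b + c) between two species sets."""
    a = len(observed & predicted)   # observed and predicted present
    b = len(observed - predicted)   # observed present, predicted absent
    c = len(predicted - observed)   # observed absent, predicted present
    if 2 * a + b + c == 0:
        raise ValueError("index undefined when both sets are empty")
    return 2 * a / (2 * a + b + c)


def richness_mae(observed_sets, predicted_sets):
    """Mean absolute error of species richness across plots."""
    errors = [abs(len(o) - len(p))
              for o, p in zip(observed_sets, predicted_sets)]
    return sum(errors) / len(errors)


# One plot, with made-up species codes.
print(sorensen({"A", "B", "C"}, {"B", "C", "D"}))
```

A stacked community for a plot is then simply the set of species whose binary prediction is 1 at that plot.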


Model results

Models containing topo-climatic predictors had AUC values ranging from 0.52 to 0.93 (mean = 0.76; App. S1) and TSS values from 0.16 to 0.84 (mean = 0.51; App. S2). These values represented model performance levels ranging from poor to very good. Because the AUC and TSS values are consistent (correlation from 0.93 to 0.95; App. S6), we will focus the following Results and Discussion sections on the AUC results. The AUC values were comparable among the four modelling techniques in the eight model sets, with no technique giving consistently higher or lower evaluation values (App. S7). The correlations computed between the four modelling techniques, for the changes due to the addition of edaphic predictors, were all significant and ranged from 0.39 to 0.89 (App. S4). Therefore, we consider the average change in AUC among the four modelling techniques in the following presentation of results.

Soil pH was the edaphic variable that best improved the models; it positively influenced the models for 47% of the species, with an average AUC increase of 0.05 for the improved models and 0.02, on average, for all models (Table 1). The species for which models were most improved by the addition of pH were Vaccinium myrtillus (average AUC increase of 0.18 ± 0.02; App. S1), Nardus stricta (average AUC increase of 0.12 ± 0.01), Alchemilla vulgaris, Linum catharticum and Hieracium lactucella (average AUC increase of 0.1 ± 0.03 for the three species). Total N content contributed the second largest AUC increase (0.04, on average, for the improved species and 0.005 for all species) and improved the models for 25% of the species. Total N was important in the models for Prunella grandiflora (average AUC increase of 0.1 ± 0.02), Veronica serpyllifolia (average AUC increase of 0.09 ± 0.01), Deschampsia cespitosa (average AUC increase of 0.09 ± 0.04) and Knautia dipsacifolia (average AUC increase of 0.08 ± 0.01). The C:N ratio and clay content positively changed the model for more than half of the species, but only slightly (0.003 and 0.002 on average, respectively). The P, silt and sand content changed the AUC negatively for at least half of the models.

Table 1. A summary of the changes induced in the AUC upon including edaphic variables in the topo-climatic models
Mean AUC change | % of species with AUC increase of more than 0.02 | Mean AUC increase | % of species with AUC decrease of more than −0.02 | Mean AUC decrease | % of species with AUC change between −0.02 and +0.02
  1. The table lists the mean change for all models, the percentage of species for which the AUC increase is higher than 0.02 and the mean increase for these species, the percentage of species for which the AUC decrease is higher than 0.02 and the mean decrease for these species, and the percentage of species for which the AUC is not strongly influenced by the addition of edaphic variables in the models.


Species characteristics and predictive power

The predictive power of the models for the small number of species growing preferentially on acidic soil, as determined by their ecological indicator values (Landolt et al. 2010), was improved by adding soil pH to the model (acidity values 1 and 2; Fig. 3a). The models of species with preferences for high humus content and low soil nutrient values displayed a trend towards model improvement, but these results must be interpreted with caution because only a small number of species present these characteristics (Fig. 3b, c). The AUC changes caused by the other edaphic predictors did not show any trends that could be related to ecological indicator values (App. S8).

Figure 3.

The change in the AUC upon the addition of edaphic predictors in the models, compared to topo-climatic models, for each species (mean of the four modelling techniques). The plants are classified according to their preferred soil conditions regarding (a) soil acidity, (b) soil humus content and (c) soil nutrient content, as in Landolt et al. (2010). The width of the box-plots is proportional to the number of species showing the indicator value of interest (there was only one species in the categories 1 of acidity and nutrient and the category 5 of humus). The middle line is the median, and the outer limits of the boxes are the 1st and 3rd quartiles.

The analysis of functional traits revealed a significant negative relationship between the average AUC change for each species caused by the addition of N content as a predictor in the models and its SLA (F = 8.32, df = 103, P-value = 0.005; Fig. 4e). The addition of pH to the models also negatively influenced species with high SLA (F = 4.18, df = 103, P-value = 0.043; Fig. 4b) but positively influenced species with high LDMC (F = 4.73, df = 103, P-value = 0.032; Fig. 4c). No other significant relationships could be demonstrated (App. S9).

Figure 4.

AUC changes as a function of plant functional traits. The first row shows the change in AUC induced by adding pH as a predictor in the species distribution models (mean of the four modelling techniques), as a function of the species (a) VH, (b) SLA and (c) LDMC. The second row shows the change in AUC induced by adding N as a function of the species (d) VH, (e) SLA and (f) LDMC. The dashed line is the linear regression.

Species characteristics and the importance of variables

To build the final model, we used pH and N content to supplement the topo-climatic variables, because pH and N content were the most influential edaphic variables in our data set. Comparing all variables based on their importance to the models revealed degree-days and soil pH as the two most important predictors among all species, followed by slope, moisture index and solar radiation. On average among all species and all modelling techniques, N was the second least important variable (Table 2).

Table 2. Summary of variable importance, as calculated with BIOMOD, for each modelling technique and their means
  1. The larger the value, the more important the variable is in the model. For this analysis only, the two edaphic variables that most improved the AUC in previous analysis (pH and N) were kept in addition to topo-climatic predictors.

Topographic position | 0.047 | 0.042 | 0.066 | 0.056 | 0.053

Sorting the variable importance results for each species according to the ecological indicator values showed that soil pH was more important in models of species that tolerate high soil acidity (acidity values 1 and 2; Fig. 5a) and high humus content (humus values 4 and 5; Fig. 5b). However, no such pattern could be found for the importance of soil N content (Fig. 5d, e, f).

Figure 5.

Importance of edaphic variables as a function of species ecological preferences. The importance (mean of the four modelling techniques) of pH (first row) and N (second row) used as predictors in the models for each species, for the species classified according to their preference for soil acidity (a and d), humus (b and e) and nutrient content (c and f) according to ecological indicator values (Landolt et al. 2010). The width of the box-plots is proportional to the number of species showing the indicator value of interest (there was only one species in the categories 1 of acidity and nutrient and the category 5 of humus). The middle line is the median, and the outer limits of the boxes are the 1st and 3rd quartiles.

Considering the variable importance as a function of the three continuous functional traits, we showed that pH importance in the models was related to species VH; pH was significantly more important for species with smaller stature (F = 4.14, df = 103, P-value = 0.044; Fig. 6a). N importance showed no significant relationship with VH (Fig. 6d) and no significant relationship was detected for the importance of either edaphic variable as a function of SLA or LDMC (Fig. 6b, c, e, f).

Figure 6.

Importance of edaphic variables as a function of species functional traits. The first row shows the importance of pH as a predictor in species distribution models (the mean of the four modelling techniques) as a function of the species (a) VH, (b) SLA and (c) LDMC. The second row shows the importance of N as a function of the species (d) VH, (e) SLA and (f) LDMC. The dashed line is the linear regression.

Community composition predictions

Among the seven edaphic variables used in the species models, pH was the predictor that was most effective in improving the community predictions. It reduced the mean absolute error of the species richness predictions and increased both the correlation between the observed and predicted richness and the Sørensen index. N produced the second largest improvement in the prediction of community composition. However, the best results for community prediction were obtained by stacking models containing pH and N as well as topo-climatic predictors. All results are presented in Table 3.

Table 3. Results of the community prediction evaluation
Correlation observed-predicted SR | Mean absolute error SR | Mean Sørensen
  1. Models for each species were stacked in order to reconstruct a predicted community at each plot. These predictions were evaluated with correlation between observed and predicted species richness (SR), mean absolute error of SR and the Sørensen index. Results presented here for each predictor combination are averages for modelling techniques and plots.
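The three evaluation measures named in the table note can be sketched as follows. This is an illustrative example under stated assumptions, not the study's implementation: the plot identifiers, species labels and function names are hypothetical, predictions are taken as already converted to presence/absence, and the Sørensen index is computed as 2a / (2a + b + c), where a is the number of species both observed and predicted, b the number only predicted and c the number only observed.

```python
from math import sqrt

def sorensen(observed, predicted):
    """Sørensen similarity 2a / (2a + b + c) between two sets of species."""
    a = len(observed & predicted)   # species correctly predicted as present
    b = len(predicted - observed)   # species predicted but not observed
    c = len(observed - predicted)   # species observed but not predicted
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 1.0

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def evaluate_communities(obs, pred):
    """obs, pred: dicts mapping plot id -> set of species present."""
    plots = sorted(obs)
    sr_obs = [len(obs[p]) for p in plots]    # observed species richness (SR)
    sr_pred = [len(pred[p]) for p in plots]  # SR from the stacked models
    return {
        "cor_sr": pearson(sr_obs, sr_pred),
        "mae_sr": sum(abs(o - q) for o, q in zip(sr_obs, sr_pred)) / len(plots),
        "mean_sorensen": sum(sorensen(obs[p], pred[p]) for p in plots) / len(plots),
    }

# Hypothetical toy data: three plots, five species labelled A to E.
obs = {"p1": {"A", "B"}, "p2": {"A", "B", "C", "D"}, "p3": {"C"}}
pred = {"p1": {"A", "B", "C"}, "p2": {"A", "B", "C", "D", "E"}, "p3": {"C", "D", "E"}}

result = evaluate_communities(obs, pred)
print(result)
```

In this toy case every plot is predicted to hold more species than were observed, which mirrors the overestimation of richness that stacked models tend to produce.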



Our study showed that among the seven edaphic variables studied, pH was the most important predictor of plant species distribution and community composition in our mountain landscape, if considered in conjunction with topo-climatic predictors. We also showed that the importance of pH varied across species according to their ecological preferences and morphologies.

How important are soil predictors?

Soil acidity was the most important variable among the soil predictors and was second overall, after degree-days. The importance of degree-days for predicting species distributions was expected, given the very large elevation gradient in the study area. The importance of soil acidity was also expected because it affects plants through nutrient availability and the release of toxic elements. Nevertheless, soil acidity has to date rarely been examined in modelling studies. Peppler-Lisbach & Schroder (2004) previously showed the importance of soil acidity for predicting the distribution of species belonging to communities dominated by Nardus stricta. Coudun et al. (2006) and Coudun & Gegout (2007) similarly observed the importance of this factor for Acer campestre and Vaccinium myrtillus.

The N and P content values were poorer predictors of plant species distribution in our models. This outcome may be due to the fact that these variables, as measured in this study, may not correspond directly to the amounts of N and P that are actually available to the plants. For example, high N content can be associated with a large amount of organic matter and would consequently not be directly available to the plants. Data on available N and P would be more interesting for the present type of study, but these measures are also more time consuming, costly and difficult to obtain due to their high variability throughout the year (Cain et al. 1999). Coudun & Gegout (2007) and Pinto & Gegout (2005) used the C:N ratio as a proxy for available N and showed that it is important for predicting the distribution of Vaccinium myrtillus as well as the dominant tree species in the Vosges Mountains. We were not able to demonstrate the same effect with our data on non-woody mountain plants.

The texture data (sand, silt and clay content) were not good predictors of our species' distributions, contrary to what would be expected based on several previous studies (Barrett 2006; Peper et al. 2010; Kamrani et al. 2011). These predictors may have lacked influence in the models because our study area, although reasonably large (700 km²) and encompassing a wide environmental gradient (400 to 3200 m elevation), does not span the entire soil texture gradient, and therefore lacks plots displaying extreme values of silt, sand or clay content (App. S6).

The results of the community composition prediction analysis confirm the importance of pH for obtaining reliable community composition estimates. Nevertheless, community predictions obtained by stacking single-species distribution models still overestimate the number of species (Dubuis et al. 2011), even if the addition of pH or pH plus N allows this overestimation to be reduced (from ca. ten to eight species). The similarity between observed and predicted community composition (measured with the Sørensen index) is also slightly improved by adding pH and pH plus N.

One limitation of our study is that we used variables from interpolated maps (the climatic predictors) and variables directly measured in the plot (the soil predictors) in the same models. This approach may have influenced our results in the sense that interpolated variables contain uncertainty (Foster et al. 2012) and may consequently be expected to perform less well as predictors in the models, while directly measured variables are more accurate and should be better predictors. This consideration notwithstanding, the best predictor among all those considered remained a climatic one.

Is the importance of edaphic variables for species distribution related to species ecology?

As a predictor of species distribution in the models, soil acidity was not equally important across all species. Instead, its importance was related to species characteristics. We related the differences in predictive power between the models with and without edaphic data to the ecological indicator values for soil pH, and found the highest increase for species that only tolerate acidic soil. We could have expected both acidophilic and basophilic species to respond to the inclusion of soil pH in the models. However, its importance for basophilic species did not differ from its importance for neutrophilic species. This finding can be explained by the observation that the parent soil in our study area is calcareous. Thus, acidophilic species can only grow where the soil is deep enough to isolate the roots from the effect of the calcareous bedrock, because an excess of Ca would be toxic (Chytrý et al. 2003; Gobat et al. 2004). Accounting for soil acidity allows a finer discrimination of these particular cases. Including soil pH as a predictor in the model therefore acts as a filter that allows better prediction of a species' absence in areas that have a suitable climate but in which the species cannot grow due to the basic nature of the soil. We can expect this phenomenon to be inverted in areas with dominant acidic soils, where species less tolerant to acidity will be unable to grow except in areas that contain calcareous inclusions.

It is possible that similar effects could have been observed with the other ecological gradients if an extended data set had been used. For instance, a slight trend across species with ecological values for humus and nutrients was seen in the models improved by pH (Fig. 3b, c), but too few species were available for the analysis to produce significant results. Species in extreme conditions (indicator values of 1 or 5) are rare in our data set, and often have too few occurrences for models of acceptable quality to be built. The majority of species in the analysis had values of 3 or 4, which represents a reduced gradient. Supplementary sampling of areas with more extreme (but also less frequent) site conditions (very high pH, humus or nutrient content) would allow more comprehensive testing.

Is the importance of edaphic variables for species distribution related to species functional traits?

We showed that the AUC increase caused by adding edaphic variables to the models (either pH or N) was greater for species with a low SLA. pH also significantly improved the models for species with high LDMC. Finally, the importance of pH in SDMs could be demonstrated for a portion of the small-stature species. Although the relationships between changes in AUC and plant functional traits are weak and only marginally significant, they can nonetheless be interpreted. These functional attributes (low SLA and VH, high LDMC) correspond to species that grow slowly and sequester nutrients. In our study area, such species are characteristic of nutrient-poor, acidic soils (Rusch et al. 2009; Pontes et al. 2010). These soils display little microbial activity and, consequently, slow litter decomposition and low nutrient availability. Species that require large amounts of nutrients cannot grow under such conditions, whereas those that possess resource-sequestering strategies (i.e. long leaf life span, high dry matter content in the leaves) have an advantage (Aerts & Chapin 2000). Vaccinium myrtillus and Nardus stricta are typical examples of such species. Their LDMCs are among the highest, as are their AUC improvements when pH is included in their models.
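The AUC increase discussed here is the difference in AUC between a model with an edaphic predictor and the corresponding topo-climatic model for the same species. A minimal sketch of that comparison follows, using the rank-based definition of AUC (the probability that a presence receives a higher predicted score than an absence, with ties counted as one half). The function name `auc` and all values are hypothetical, and the sketch does not reproduce the evaluation design of the study, which is not described in this section.

```python
def auc(presence, scores):
    """Rank-based AUC: share of presence/absence pairs ranked correctly.

    presence: list of 1 (species observed) or 0 (species absent).
    scores:   predicted probabilities of occurrence, in the same order.
    """
    pos = [s for p, s in zip(presence, scores) if p == 1]
    neg = [s for p, s in zip(presence, scores) if p == 0]
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in pos
        for sn in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical toy data for one species over six plots.
presence = [1, 1, 1, 0, 0, 0]
scores_topo = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]     # topo-climatic predictors only
scores_topo_ph = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]  # topo-climatic predictors plus pH

auc_topo = auc(presence, scores_topo)
auc_topo_ph = auc(presence, scores_topo_ph)
delta_auc = auc_topo_ph - auc_topo
print(f"AUC topo={auc_topo:.3f}, topo+pH={auc_topo_ph:.3f}, change={delta_auc:.3f}")
```

Computing this change per species gives the quantity that can then be related to SLA, LDMC or VH.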

Conclusion and perspectives

Overall, our study of a large pool of mountain species showed that even if only a minority of edaphic factors significantly improved the models for the species considered, soil acidity should not be overlooked in modelling species and community distributions. SDMs used in conservation planning or for assessing the distribution of rare species could benefit from the inclusion of soil pH, which would allow a finer discrimination of favourable sites than SDMs based on climate variables alone, at least, as our results showed, for acidophilic species. The first step towards such goals is to build spatially explicit layers for soil characteristics. This could be achieved first by sampling soil at the same time as the vegetation, and then by applying a modelling procedure as proposed in McBratney et al. (2003) or Shi et al. (2009). However, although such modelling gave good results in the studies cited, it still requires testing and adaptation for areas with a more complex topography. Using better topo-climatic data based on more recent and finer-scale measurements would also certainly help to improve the SDM results.

In the context of perspectives for future research, our results also have implications for global change projections. SDMs have been extensively used for projecting future plant species distributions according to several climate or land-use change scenarios (Dirnböck et al. 2003; Thuiller et al. 2005; Engler et al. 2011; Vicente et al. 2011). However, it is known that soil properties are not stable over time and are subject to on-going changes as a result of N deposition that leads to, among other transformations, acidification and eutrophication (Bouwman et al. 2002; Horswill et al. 2008; Bobbink et al. 2010). This process affects the vegetation in the long term (Sala et al. 2000). In future global change modelling studies, it would be interesting to consider incorporating soil predictors in SDMs. Several soil acidification models have been developed (Reinds et al. 2008; Posch & Reinds 2009), and it would be worthwhile to add soil change scenarios to global change projections, even at coarse resolutions.


We are grateful to all who helped with the data collection, especially S. Godat, C. Purro, J.-N. Pradervand and V. Rion. We thank E. Verecchia, J.-M. Gobat and T. Adatte for advice on soil analysis, as well as B. Bomou and T. Monnier for their help in the lab. We also thank G. Litsios for helpful discussions and three anonymous referees for comments that greatly improved this manuscript. This study received support from the Swiss National Fund for Research (SNF grant Nr. 31003A-125145, BIOASSEMBLE project) and from the European Commission (ECOCHANGE Project).