Introduction
Hutchinson (1957) used a mathematical definition of the niche as a hyper-volume, called the fundamental niche, located in an n-dimensional ‘hyper-space’ enclosing conditions that allow a species to survive and reproduce. In an ideal environment, a species is expected to live in a geographical area that strictly corresponds to the projection of its fundamental niche, thus occurring wherever environmental conditions are suitable (Pulliam 2000). In reality, the fundamental niche is unlikely to be observed and we usually detect the realized niche (Guisan & Thuiller 2005). Interspecific competition may exclude individuals from some parts of their fundamental niche. Moreover, when dispersal across the landscape is limited by a hostile matrix (e.g. geographical barriers or human artefacts), parts of the fundamental niche may remain uninhabited whatever their local suitability (Pulliam 2000). Selection of the best areas within the fundamental niche may also limit the extent of the realized niche (Hutchinson 1978). A species might be absent from suitable habitats because of local extinction events or limited dispersal ability, or it might occur in a sink habitat where its population growth rate is less than 1, and thus where it would disappear without constant immigration from source habitats (Guisan & Thuiller 2005).
Niche modelling, also known as species or habitat potential distribution modelling (Guisan & Thuiller 2005), is used to inductively interpolate or extrapolate the fundamental niche outside the locations where a species is present (i.e. the realized niche), by relating species presence to environmental predictors (Franklin 1995). Niche modelling is important for a range of land management activities. Examples include predicting the distribution of rare and threatened species and plant communities (Engler, Guisan & Rechsteiner 2004), risk assessment of invasive species in new environments (Peterson 2003), and estimation of the likely intensity of biological responses to climate change (Guisan & Theurillat 2000; Thuiller 2004). Reintroduction and augmentation of small natural populations of threatened or rare plant species (Sutherland & Hill 1995; IUCN 1998) should be based on reliable distribution models (Sergio et al. 2007) for management and other applied ecology purposes (Seddon, Anderson & Schapire 2007).
Various methods have been developed for niche modelling. Some of these methods, including generalized linear models (GLM; McCullagh & Nelder 1989) and generalized additive models (GAM; Hastie & Tibshirani 1986), require presence/absence data in order to generate statistical functions or discriminant rules. However, there is growing interest in making use of presence-only data, consisting of occurrences but with no reliable data on where the species was truly absent. In fact, the large majority of available data consist of presence-only data sets, coming from atlases, museum and herbarium records, observational databases and in situ field surveys (Pearce & Boyce 2006). Therefore, a second group of methods, including for instance genetic algorithms (GARP; Stockwell & Peters 1999) and Bioclim (Busby 1991), is gaining more consideration. The recently proposed Maximum Entropy (Maxent) algorithm (Phillips et al. 2006) allows the use of presence-only data and categorical predictors. Several comparative studies also report that Maxent performs well relative to other algorithms. For example, Elith et al. (2006) demonstrated that Maxent performed very well when compared to more established methods such as Bioclim, GARP, GAM and GLM. Barry & Elith (2006) noted similarities among Maxent, GLM and GAM, specifically in their ability to fit the nonlinear response surfaces that are frequently observed in biological data. Hernandez et al. (2006) tested four modelling methods and showed that Maxent had the strongest performance among the tested methods, since it performed well, remained reasonably stable in prediction accuracy across all sample size categories and produced maximal accuracy levels for the smallest sample size categories. Finally, Sergio et al. (2007) showed that Maxent outperformed GARP when applied to presence-only herbarium collection data.
In effect, recent studies on potential distribution assessment are mainly focused on the comparison among methodological tools. Although this is a critical topic, many modelling features remain largely overlooked, and several authors (Araujo & Guisan 2006; Barry & Elith 2006; Guisan et al. 2006) have recently noted frequent algorithmic uncertainties and ambiguities in predictive distribution modelling, such as scarce attention to: (i) the choice of environmental predictors, (ii) the problem of autocorrelation, and (iii) the contribution of each predictor to the model accuracy.
Accordingly, in this study, besides using the Maxent algorithm, we conceptualize and test whether addressing methodological topics that are often overlooked by modellers may improve the prediction of plant distributions through: (i) the use of a constrained random split of sampled data, in order to minimize biases due to spatial autocorrelation; (ii) the use of a stepwise selection of predictor variables for the evaluation of the contribution of each predictor to model accuracy, in order to obtain a less overfitted reduced model containing only meaningful variables; (iii) the comparison among three different sets of environmental predictors (i.e. an initial set with seven predictors, a set employing only topoclimatic variables, and a pruned set resulting from the stepwise predictor selection applied to the initial set); (iv) the choice of the most suitable areas for species augmentation as a result of the degree of concordance among three competing models, each based on a different set of predictors; and (v) the use of divergence maps as a complement to conventional performance comparison assessments.
We applied the proposed approach to presence-only data of Arnica montana L. (Asteraceae), with the aim of informing a subsequent rigorous reinforcement of this threatened species within a Site of Community Importance (hereafter SCI) in the Alps. This approach should result in improvements in the modelling of species distributions.
Results
Eighty-five growing sites of A. montana were identified in field surveys (Fig. 1). Minimum, maximum and mean distances among locations were 14·27 m, 4360·07 m and 744·55 m, respectively. The convex hull covered 2566·18 ha; thus, our niche models are interpolative with regard to 43·05% of the study area and extrapolative for the remaining 56·95%.
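The distance and convex hull figures above can be reproduced from site coordinates with a few lines of code. The following is a minimal sketch, assuming projected coordinates in metres; the `sites` list and `study_area_ha` value are hypothetical placeholders, not the surveyed locations or the real extent of the SCI.

```python
from itertools import combinations
from math import dist


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of the cross product (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula: area in squared coordinate units."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical site coordinates in metres (NOT the surveyed locations).
sites = [(0, 0), (1000, 0), (1000, 1000), (0, 1000), (500, 500), (250, 750)]
study_area_ha = 250.0  # hypothetical extent of the study area

distances = [dist(a, b) for a, b in combinations(sites, 2)]
min_d, max_d = min(distances), max(distances)
mean_d = sum(distances) / len(distances)

hull_area_ha = polygon_area(convex_hull(sites)) / 10_000  # 1 ha = 10 000 m^2
interpolative_pct = 100 * hull_area_ha / study_area_ha
extrapolative_pct = 100 - interpolative_pct
```

Pixels inside the hull are treated as interpolation and those outside as extrapolation, which mirrors the 43·05% versus 56·95% partition reported above.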
The stepwise selection of predictors applied to the full model (Fig. 2) revealed that TWI, SOLAR, slope angle and slope aspect had the least predictive power when applied to the test data (25 locations). The overall AUC score of the full model on the test data was 0·864. The predictor variable with the highest AUC value when used in isolation (i.e. one-predictor niche model) was habitat (AUC = 0·848) followed by elevation (AUC = 0·8394) and geomorphology (AUC = 0·811). By contrast, SOLAR (AUC = 0·666), TWI (AUC = 0·670), slope aspect (AUC = 0·714) and slope angle (AUC = 0·766) had little predictive power when used in isolation. The environmental variable that decreased the AUC score the most when omitted (i.e. leave-one-out niche model) was elevation (AUC = 0·852), followed by habitat (AUC = 0·855) and geomorphology (AUC = 0·860). Omitting slope aspect (AUC = 0·877), TWI (AUC = 0·866) and SOLAR (AUC = 0·870) increased the AUC, while excluding slope angle (AUC = 0·864) had a negligible effect on AUC. Hence, the stepwise selection suggested a pruned model based on three predictors: elevation, habitat type and geomorphology.
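The leave-one-out part of this reasoning can be written as a simple decision rule. The sketch below uses the test AUC values reported above and keeps a predictor only if omitting it lowers the AUC of the full model. This rule and the function name `select_predictors` are illustrative assumptions and are not necessarily the exact procedure used in the study.

```python
# Test AUC of the full (seven-predictor) model, as reported in the text.
FULL_MODEL_AUC = 0.864

# Test AUC of each leave-one-out model, keyed by the omitted predictor.
auc_without = {
    "elevation": 0.852,
    "habitat": 0.855,
    "geomorphology": 0.860,
    "slope angle": 0.864,
    "TWI": 0.866,
    "SOLAR": 0.870,
    "slope aspect": 0.877,
}


def select_predictors(full_auc, loo_auc, tol=1e-9):
    """Keep predictors whose omission lowers AUC (assumed rule).

    Returns the retained predictors ordered by decreasing AUC loss.
    """
    loss = {name: full_auc - auc for name, auc in loo_auc.items()}
    kept = [name for name, drop in loss.items() if drop > tol]
    return sorted(kept, key=lambda name: loss[name], reverse=True)


pruned = select_predictors(FULL_MODEL_AUC, auc_without)
print(pruned)
```

Under this rule, slope angle is dropped because its omission leaves the AUC unchanged, and slope aspect, TWI and SOLAR are dropped because their omission raises it.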
The ROC curves of the three niche models (Fig. 3) showed that the topoclimatic model was outperformed by both the full and pruned models. The full model was the most accurate on the training data (AUC = 0·941), while the pruned model was the most accurate on the test data (AUC = 0·888). All three models performed significantly better than a random prediction (AUC = 0·5).
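For readers less familiar with the statistic, the AUC equals the probability that a randomly chosen presence location receives a higher suitability score than a randomly chosen comparison location, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It assumes that presences are compared with background points, as is usual for presence-only data, and the scores are hypothetical.

```python
def auc(presence_scores, background_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0      # presence ranked above background
            elif p == b:
                wins += 0.5      # ties count as half
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical suitability scores, for illustration only.
presence = [0.9, 0.8, 0.4]
background = [0.7, 0.3, 0.2, 0.1]

score = auc(presence, background)

# A model that cannot separate the two groups scores 0.5 (random prediction).
uninformative = auc([0.5, 0.5], [0.5, 0.5, 0.5])
```

The pairwise form is quadratic in the number of points, which is acceptable for a few dozen test locations; a rank-based computation is preferable for large background samples.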
The three niche models resulted in similar suitability rankings for the pixels constituting the study area. The full and pruned models were highly correlated (Spearman's rho = 0·918) and both had lower correlations with the topoclimatic model (correlation coefficients equal to 0·774 and 0·635, respectively).
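Spearman's rho compares the rank order of pixel suitabilities rather than their raw values, so it measures whether two models agree on which pixels are better, independently of scale. A minimal sketch follows, assuming each map has been flattened to a list of pixel scores in the same pixel order; the three short vectors are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pixel suitabilities for three models (same pixel order).
full = [0.10, 0.40, 0.35, 0.80, 0.95]
pruned = [0.05, 0.50, 0.30, 0.70, 0.99]
topoclimatic = [0.30, 0.20, 0.60, 0.50, 0.90]

rho_full_pruned = spearman_rho(full, pruned)
rho_full_topo = spearman_rho(full, topoclimatic)
```

In this toy example the full and pruned vectors rank the pixels identically even though their raw scores differ, which is the kind of agreement a rank correlation is designed to detect.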
The three resulting suitability maps have similar spatial patterns (Fig. 4). The full (Fig. 4a) and pruned (Fig. 4c) models clearly identify three valleys with high suitability scores within the study area (Val Viola, Val Dosdè and Val Verva). The topoclimatic model (Fig. 4b) suggests the same areas, but is spatially less restrictive and smoother. BEST75 (Fig. 4d) and BEST90 (Fig. 4e) cover 29·13 ha and 7·79 ha, respectively. BEST75 encloses two areas that are 2300 m apart: the first is adjacent to the Val Viola lake (zone 1) and the second is close to the northern limit of the study area (zone 2). Zone 1 is a 1450-m long strip with a maximum width of about 150 m, while zone 2 is a rectangular area of approximately 400 × 250 m. BEST90 restricts the suitable niche to zone 1, while BEST95 (Fig. 4f) identifies a minimal subset of zone 1 with an extent of only 1·15 ha.
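The nesting of the BEST75, BEST90 and BEST95 areas is what one would expect from a concordance rule with increasing thresholds. The sketch below illustrates one such rule. It assumes that a BEST map at threshold t contains the pixels where all three models score at least t on a 0 to 100 suitability scale; this definition, the 3 × 3 grids and the 10-m pixel size are all assumptions made for illustration and are not taken from the study's Methods.

```python
def best_mask(maps, threshold):
    """True where every model's suitability is >= threshold (assumed rule)."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [all(m[r][c] >= threshold for m in maps) for c in range(cols)]
        for r in range(rows)
    ]


def count_pixels(mask):
    """Number of pixels flagged as concordant."""
    return sum(cell for row in mask for cell in row)


def area_ha(mask, pixel_size_m):
    """Area of the flagged pixels in hectares (1 ha = 10 000 m^2)."""
    return count_pixels(mask) * pixel_size_m ** 2 / 10_000


# Hypothetical 3 x 3 suitability grids on a 0-100 scale.
full = [[96, 80, 40], [92, 78, 30], [60, 50, 20]]
topoclimatic = [[95, 76, 55], [90, 77, 45], [70, 40, 35]]
pruned = [[97, 85, 35], [93, 74, 25], [65, 55, 15]]
models = [full, topoclimatic, pruned]

PIXEL_SIZE_M = 10  # hypothetical grid resolution

best75 = best_mask(models, 75)
best90 = best_mask(models, 90)
best95 = best_mask(models, 95)
```

Because a pixel that passes a higher threshold in every model also passes any lower one, the masks are nested by construction, and a single low score from any model is enough to exclude a pixel.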
The spatially explicit comparison among the niche models (Fig. 5) identifies few areas where the three suitability maps disagree. DIVERGmin highlights a minor valley (i.e. Val Cantone) in the upper left portion of the SCI to which the full model assigns average suitability values, while the pruned model ascribes high scores. DIVERGmax depicts three portions of the study area (Val Cantone and two regions close to zone 1 and zone 2, respectively) where the disagreement among the niche models is entirely due to the difference between the topoclimatic model and the other two niche models.
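A pixel-wise divergence map of this kind can be sketched as follows. The definitions used here are assumptions: DIVERGmin is taken as the smallest, and DIVERGmax as the largest, absolute difference among the three pairwise model comparisons at each pixel. The exact definitions belong to the study's Methods and may differ; the two-pixel grids are hypothetical.

```python
from itertools import combinations


def divergence_maps(maps):
    """Per-pixel min and max absolute pairwise difference (assumed definitions)."""
    rows, cols = len(maps[0]), len(maps[0][0])
    d_min = [[0] * cols for _ in range(rows)]
    d_max = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            diffs = [abs(a[r][c] - b[r][c]) for a, b in combinations(maps, 2)]
            d_min[r][c] = min(diffs)
            d_max[r][c] = max(diffs)
    return d_min, d_max


# Hypothetical 1 x 2 suitability grids on a 0-100 scale.
full = [[50, 90]]
topoclimatic = [[52, 40]]
pruned = [[85, 88]]

diverg_min, diverg_max = divergence_maps([full, topoclimatic, pruned])
```

In the second pixel the large maximum divergence comes entirely from the topoclimatic grid, while the other two grids nearly agree, which is the pattern described for DIVERGmax above.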