Species distribution models (SDMs) have been widely used in ecology, biogeography, and conservation. Although ecological theory predicts that species occupancy is dynamic, the outputs of SDMs are generally converted into a single occurrence map, and model performance is evaluated in terms of success in predicting presences and absences. The aim of this study was to characterize the effects of a gradual response in species occupancy to environmental gradients on the performance of SDMs. First, we outline guidelines for the appropriate simulation of artificial species that allows controlling for gradualism and prevalence in the occupancy patterns over an environmental gradient. Second, we derive theoretical expected values for success measures based on presence-absence predictions (AUC, Kappa, sensitivity and specificity). Finally, we use artificial species to exemplify and test the effect of a gradual probabilistic occupancy response to environmental gradients on SDM performance. Our results show that when a species responds gradually to an environmental gradient, conventional measures of SDM predictive success based on presence-absence cannot be expected to attain the performance values currently accepted as good, even for a model that recovers the true probability of occurrence perfectly well. A gradual response imposes a theoretical expected value for these measures of performance that can be calculated from the species properties. However, irrespective of the statistical modeling strategy used and of how gradual the species response is, one can recover the true probability of occurrence as a function of environmental variables provided that species and sample prevalence are similar. Therefore, model performance based on presence-absence should be judged against the theoretical expected value rather than against absolute values currently in use such as AUC > 0.8.
Overall, we advocate for a wider use of the probability of occurrence and emphasize the need for further technical developments in this sense.
Species distribution models (SDMs) have been widely used to generate occurrence maps under different circumstances, including estimating distributions for rare species and for species found in under-sampled regions, predicting species potential range shifts under climate change, or estimating potential invasion ranges for exotic species (see reviews in Guisan and Thuiller 2005, Elith and Leathwick 2009, Franklin 2009). Although the output of these models is often a probability of occurrence (i.e. the probability that the species will be present in a site), for many applications of SDMs, if not most, the predicted probability is converted into a prediction of occurrence (presence or absence of the species in a given site) by using a threshold (Liu et al. 2005). The predictive ability of the models is then measured on the basis of the predicted presences and absences rather than on the predicted probability of occurrence (Fielding and Bell 1997).
Therefore, model performance has been largely studied on the basis of how well statistical models can predict the patterns of presence-absence. Among the factors that have been widely cited as influencing model performance, both species and sample prevalence, defined as the proportion of sites in which a species is present, have been shown to be important (Pearce and Ferrier 2000, Kadmon et al. 2003, Hernandez et al. 2006, Santika 2011), but other factors such as the properties of the specific indices used (Fielding and Bell 1997) or the ratio between extent and area of occupancy have also been shown to be important (Lobo et al. 2008). Species prevalence, also referred to as true prevalence, is the frequency of occurrence of the species overall in the landscape or biogeographic region of interest. It has been shown for example that common species (i.e. species with higher prevalence) are more difficult to model than rare species (Guisan et al. 2006, Hernandez et al. 2006), and it has also been argued that generalists would be more difficult to model than specialists (Seoane et al. 2005, Hernandez et al. 2006). Sample prevalence (i.e. the proportion of sites in which the species is present within the sample used for model fitting and testing) can be (and usually is) different from species prevalence because researchers often target collection sites in which they know the species of interest is likely to be found, or introduce sampling bias due to road accessibility or other logistic reasons (Phillips et al. 2009, Ward et al. 2009, Lobo et al. 2010, Lobo and Tognelli 2011). It has been generally shown that very small or very large sample prevalence can negatively affect model performance. However, the importance of such effects has been linked to the relationship between species and sample prevalence in such a way that the bias in model estimates could be corrected if we had information regarding the true species prevalence (Real et al. 2006, Albert and Thuiller 2008) or information regarding the bias in the environmental sampling (Phillips et al. 2009, Ward et al. 2009). A further complication comes from the fact that, in order to validate model outputs, the data are often divided into a training and a testing set, the training set being used during model calibration and the testing set being used to calculate performance measures (Fielding and Bell 1997). Depending on how these training and testing sets are designed, the training sample and the testing sample could also show differences in prevalence, which will affect SDM performance measures differently (Lobo et al. 2008). All these issues are of course difficult to explore with real-world data, since doing so would require extensive knowledge regarding species and sample properties, including their relationship to environmental gradients.
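The link between species and sample prevalence mentioned above can be illustrated with a small sketch. It assumes a simplified sampling scheme in which occupied and unoccupied sites are retained at different rates regardless of their environment; under that assumption Bayes' rule shifts the log-odds of occurrence by a constant equal to the difference between the log-odds of sample prevalence and of species prevalence. The function names and the numbers are hypothetical, and this is not presented as the correction used in any of the cited studies.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))


def prevalence_offset(sample_prevalence, species_prevalence):
    """Constant log-odds shift induced by the assumed sampling scheme."""
    return logit(sample_prevalence) - logit(species_prevalence)


def correct_for_prevalence(fitted_p, sample_prevalence, species_prevalence):
    """Map a probability fitted on the biased sample back to the landscape scale."""
    offset = prevalence_offset(sample_prevalence, species_prevalence)
    return inv_logit(logit(fitted_p) - offset)


# Hypothetical case: the species occupies 20% of the landscape,
# but the sample was balanced to 50% presences.
species_prevalence = 0.2
sample_prevalence = 0.5

# A site whose sample-scale fitted probability is 0.5 ...
fitted_p = 0.5
# ... corresponds to a lower probability of occurrence in the landscape.
corrected_p = correct_for_prevalence(fitted_p, sample_prevalence, species_prevalence)
print(round(corrected_p, 3))
```

When sample and species prevalence coincide the offset is zero and no correction is needed, which is the situation the text describes as optimal.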
Artificial species may provide key insights on how and when different statistical models differ, since they allow manipulating targeted properties of the species and the sampling scheme in a way that is not feasible in the real world. The true distribution of the artificial species and its relationship to the different environmental gradients are known (Meynard and Quinn 2007, Elith and Graham 2009, Zurell et al. 2010), allowing one to examine the effect of each individual factor. Furthermore, species and sample prevalence can be manipulated, and their relative contributions to model performance can therefore be investigated. The applicability of SDMs to different scenarios, as well as comparisons between different strategies to model and predict actual species distributions, has been widely debated (Randin et al. 2006, Elith and Leathwick 2009, Franklin 2009, Zurell et al. 2009). Empirical studies have given ambiguous results regarding which models are best to predict species distributions, or even whether or not there is any difference in model performance (Elith et al. 2006, Elith and Graham 2009). As Elith and Graham (2009) point out, differences in predictive ability may stem from model properties as well as from species properties, and the use of artificial species may contribute significantly to disentangling the two. The simulation process usually begins by developing a habitat suitability function that associates a set of environmental variables characterizing a virtual or a real landscape with the probability that a species occurs in each grid cell of that landscape. This habitat suitability function needs to be converted into a presence and absence map in order to generate a species distribution that can then be sampled by the virtual ecologist.
Despite the apparent simplicity of the simulation approach, simulation results need to be considered carefully with respect to the assumptions made at each step. For example, major ecological and biogeographic theories have recognized the importance of dynamic occupancy patterns with important colonization-extinction events happening through time and space (Brown and Lomolino 1999, Hanski 1999, Holyoak et al. 2005). These occupancy patterns are more rigorously characterized by a probability of occurrence than by a simple presence-absence map. In effect, a probability of occurrence can reflect the fact that some marginal habitats will be dominated by extinction dynamics whereas others will act as sources and be more stable over time and space (Hanski 1999, MacKenzie et al. 2006). Under these theoretical frameworks stochasticity is part of the ecological system. Yet this dynamic perspective on habitat occupancy has been largely ignored in species distribution modeling as well as in simulation studies testing SDMs. This stems from the fact that, though conceptually straightforward, the final simulation step of converting a probability of occurrence into an ‘observed’ species distribution carries with it some implicit assumptions that are usually left unexplained. More specifically, a common strategy used at this stage has been to select an arbitrary threshold in the suitability function under which the artificial species will always be absent and above which it will always be present. For example, Albert and Thuiller (2008) simulated artificial species on a virtual landscape to which a random error was added, and for which the species showed a linearly increasing suitability function. However, to generate species occurrences from that suitability function in the last simulation step, all grid cells that were above a threshold value were considered as presences, and all those below that threshold were considered as absences.
Similar approaches can be found in Hirzel et al. (2001), Real et al. (2006), Jimenez-Valverde and Lobo (2007), Jimenez-Valverde et al. (2009), Santika and Hutchinson (2009), Lobo and Tognelli (2011) among others. This means that even though a functional response was modeled using some non-threshold function between the environmental gradients and habitat suitability, the final step converts this function into a threshold response erasing any stochasticity or gradualism generated during the previous steps.
A few other simulation studies (Meynard and Quinn 2007, Elith and Graham 2009, Santika 2011) have taken a different approach to translate the suitability function into presences and absences. This strategy involves converting the suitability values into probabilities (i.e. rescaling the suitability values to range from 0 to 1) and taking a probabilistic approach to site occupancy. This procedure creates a pattern where a grid cell with a probability of occurrence of 0.4 will be occupied in 4 out of 10 simulations. Using this probabilistic approach has two major consequences. First, the number of occurrences increases as the probability of occurrence increases in a gradual fashion as opposed to a threshold response. And second, contrary to the threshold strategy, each new simulation will generate a different pattern of presences and absences introducing uncertainty when it comes to statistical tests of model outputs based on presence-absence predictions.
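The contrast between the threshold strategy and the probabilistic strategy described above can be sketched in a few lines. This is a minimal illustration only: the four cell probabilities, the 0.5 threshold, the seed and the function names are hypothetical and are not taken from any of the cited studies.

```python
import random


def threshold_occupancy(probabilities, threshold=0.5):
    """Threshold strategy: a cell is occupied whenever its value exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


def probabilistic_occupancy(probabilities, rng):
    """Probabilistic strategy: each cell is occupied with its own probability (Bernoulli draw)."""
    return [1 if rng.random() < p else 0 for p in probabilities]


rng = random.Random(42)  # fixed seed, only so that the sketch is reproducible
cell_probabilities = [0.1, 0.4, 0.6, 0.9]  # hypothetical grid cells

# The threshold map is identical in every simulation.
threshold_map = threshold_occupancy(cell_probabilities)

# The probabilistic map changes between simulations; over many runs each cell
# is occupied at a rate close to its probability of occurrence.
n_simulations = 10_000
occupied_counts = [0] * len(cell_probabilities)
for _ in range(n_simulations):
    simulated_map = probabilistic_occupancy(cell_probabilities, rng)
    for i, present in enumerate(simulated_map):
        occupied_counts[i] += present

occupancy_rates = [count / n_simulations for count in occupied_counts]
print("threshold map:  ", threshold_map)
print("occupancy rates:", [round(rate, 2) for rate in occupancy_rates])
```

The cell with probability 0.4 is always absent under the threshold strategy, whereas under the probabilistic strategy it is occupied in roughly 4 out of 10 simulations, which is the gradual behaviour the text refers to.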
Although the aforementioned studies introduced stochasticity and gradualism into the observed patterns of occupancy, they did not explore directly how this affects the performance of SDMs, nor did they provide guidelines on how to control species prevalence as well as the strength of the gradual occupancy response. For example, Elith and Graham (2009) generated just one virtual species which responded to three environmental gradients in different ways. Meynard and Quinn (2007) generated a wider array of potential functional curves and interactions between environmental factors but did not explore the effects of different slopes in the distribution model. Austin et al. (2006) provide guidelines for probabilistic simulation approaches in the context of plant species abundance distribution modeling. However, in many cases involving species distributions researchers have been interested in species occurrences rather than species relative abundance (Guisan and Thuiller 2005, Elith and Leathwick 2009, Franklin 2009), and no such guidelines have been published regarding presence-absence simulation studies. This opens the door for asking: 1) what is the effect of a gradual change in species occupancy rate as a function of habitat suitability on species distribution models?; and 2) could this explain why some species are easier to predict than others regardless of the statistical model used (Elith et al. 2006, Syphard and Franklin 2010)?
Here we directly test the effect of a gradual species occupancy response to environmental gradients on the performance of species distribution models, and we set clear guidelines for simulating artificial species while avoiding previous pitfalls. We derive a conceptual framework for such an analysis and provide guidelines on how to control species prevalence and the strength of the gradual occupancy pattern when simulating presences and absences. Note that we use the terms occurrence patterns and occupancy patterns interchangeably, although there is a subtle difference that has been reflected in the current literature: occurrence patterns are usually mentioned when referring to specific presence-absence patterns with a static connotation, whereas the more general term occupancy has a stochastic or dynamic component associated with it (MacKenzie et al. 2006).
Our results show that the species’ gradual response to a given environmental gradient has two major consequences. First, a gradual response will produce a decrease in our ability to predict species occurrences. This decrease is independent of the modeling strategy used and sets an upper limit to all statistics used to measure model success in terms of presence and absence predictions (Supplementary material Appendix 2, Fig. 3–5). Second, the more gradual the species response to environmental gradients is, the more sensitive the potential reproduction of the true probability of occurrence is to sample prevalence, with the optimal predictions occurring when sample and species prevalence are the same. Contrary to presence-absence statistics, even with very gradual responses we can potentially reproduce the true probability of occurrence of the species, though, as α increases, our ability to do so becomes increasingly dependent on our ability to match species and sample prevalence.
Until now, it has been assumed that a good model should be able to predict presences and absences correctly, and the vast majority of papers published using SDMs evaluate their models on these grounds (see reviews in Guisan and Thuiller 2005, Elith and Leathwick 2009, Franklin 2009). Common guidelines regarding good model performance include AUC values >0.8 or Kappa >0.8 (Fielding and Bell 1997, Franklin 2009). Our results clearly show that this can only be realized for species with threshold-like responses to the environment. On the contrary, species with gradual responses to environmental gradients will have a maximum expected value of Kappa, AUC, sensitivity and specificity well below 1 (Fig. 2–6). For example, for a species with prevalence near 50% (β= 0 in Fig. 2, 3) and α= 1, the best possible model will have an AUC somewhere around 0.7 (Fig. 3), a Kappa value around 0.35, and sensitivity and specificity values near 0.7 (Fig. 2); which would mean overall that this is a poor predictive model. In fact, this is the best that we can do in terms of predicting species occurrences. However, the same model that produces a poor prediction in terms of presences and absences may be recovering perfectly well the true probability of occurrence throughout the environment and therefore be a useful tool for conservation purposes. Notice for example that for species with α= 1 the difference between predicted and true probabilities of occurrence is almost null when species prevalence is equal to the sample prevalence (Fig. 6).
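The ceiling on presence-absence measures can be approximated numerically. The sketch below is written under stated assumptions, since the exact simulation settings are not reproduced in this section: the environmental gradient is taken to be standard normal, the true response is taken to be logistic with scale α (gradualism) and midpoint β, and predicted presences are obtained by cutting the true probability at a fixed 0.5 threshold rather than at the KMT or MDT thresholds. It therefore illustrates the reasoning rather than reproducing the figures.

```python
import math


def occurrence_probability(x, alpha, beta):
    """Assumed logistic response; written to avoid overflow for small alpha."""
    z = (x - beta) / alpha
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def expected_measures(alpha, beta, threshold=0.5, half_width=8.0, step=0.01):
    """Expected measures for a model that returns the true probability exactly."""
    n = int(round(2 * half_width / step)) + 1
    xs = [-half_width + i * step for i in range(n)]
    density = [math.exp(-0.5 * x * x) for x in xs]  # standard normal gradient
    total = sum(density)
    weights = [d / total for d in density]
    probs = [occurrence_probability(x, alpha, beta) for x in xs]

    presence_mass = [w * p for w, p in zip(weights, probs)]
    absence_mass = [w * (1.0 - p) for w, p in zip(weights, probs)]
    prevalence = sum(presence_mass)

    true_pos = sum(m for m, p in zip(presence_mass, probs) if p > threshold)
    true_neg = sum(m for m, p in zip(absence_mass, probs) if p <= threshold)
    sensitivity = true_pos / prevalence
    specificity = true_neg / (1.0 - prevalence)

    predicted_presence = true_pos + (1.0 - prevalence - true_neg)
    observed = true_pos + true_neg
    chance = (prevalence * predicted_presence
              + (1.0 - prevalence) * (1.0 - predicted_presence))
    kappa = (observed - chance) / (1.0 - chance)

    # AUC: chance that an occupied site ranks above an unoccupied one (ties count 1/2).
    # The probability increases with x, so sites can be ranked along the gradient.
    auc_mass, absence_below = 0.0, 0.0
    for pm, am in zip(presence_mass, absence_mass):
        auc_mass += pm * (absence_below + 0.5 * am)
        absence_below += am
    auc = auc_mass / (prevalence * (1.0 - prevalence))

    return {"prevalence": prevalence, "sensitivity": sensitivity,
            "specificity": specificity, "kappa": kappa, "auc": auc}


gradual = expected_measures(alpha=1.0, beta=0.0)
sharp = expected_measures(alpha=0.01, beta=0.0)
for name, result in (("gradual", gradual), ("sharp", sharp)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Under these assumptions the nearly threshold-like species (small α) approaches the maximum of every measure, while the gradual species stays well below it even though the "model" here is the true probability itself.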
It is important to note that these results regarding the presence-absence performance measures, and AUC in particular, are not an artifact of the area of occupancy, as has been suggested elsewhere (Lobo et al. 2008). In our simulations the total extent for all species is constant so that the area of occupancy is equivalent to species prevalence (i.e. the proportion of sites in which the species is present in the whole landscape). As we have already shown, for a given gradualism level α, species prevalence will have significant effects on presence-absence measures calculated using the KMT thresholds such as sensitivity, specificity and Kappa (Fig. 2) but not when using the MDT threshold (Fig. 2) or the AUC (Fig. 3), and it will have no effect on the estimation of the probability of occurrence (Fig. 6). In other words, it is not species prevalence per se that is affecting model predictive performance but rather the relationship between sample and species prevalence (i.e. is true prevalence well represented in the sample?), the nature of the relationship between the environmental gradients and species responses (i.e. threshold vs gradual response), and the type of threshold used to calculate the statistics (KMT being sensitive to species prevalence but not MDT). Some previous studies of performance measures relied on simulations where the virtual species were generated using a threshold strategy (Jimenez-Valverde and Lobo 2007, Albert and Thuiller 2008, Jimenez-Valverde et al. 2009, Santika and Hutchinson 2009, Lobo and Tognelli 2011). In these cases, in order to manipulate prevalence, the authors actually manipulated sample prevalence by randomly sub-sampling the species occurrences. Therefore, an undesirable side effect of the threshold simulation strategy is that it makes it difficult to independently manipulate species prevalence, sample prevalence and sample size at the same time.
A second undesirable effect of such a simulation strategy is that using a logit link to try to estimate model performance, as would be the case most often when using GLM or GAM, becomes inadequate. This is because the slope of the logistic curve for a threshold response is infinite at the inflexion point, and therefore most algorithms would fall into a convergence problem that needs to be considered carefully (Venables and Ripley 2002). In these cases a regression tree approach such as CART or the BRT method used here would become more appropriate, or else one would have to consider applying corrections such as the one proposed by Venables and Ripley (2002, pp. 197–199 and 445–446) in order to estimate parameters in such problematic situations, and systematically check for model convergence after each iteration.
This simulation study, like any other simulation study, is of course an oversimplification of reality. After all, we are dealing here with a species that responds to just one perfectly Gaussian environmental gradient. By simplifying our virtual species and landscape in this way we were able to show unequivocally the influence of a gradual change in occupancy rate over an environmental gradient on species distribution predictive performance. However, there is no reason to believe that making this model more complex would change the main conclusion of this study, i.e. that a gradual response reduces our ability to predict the actual occurrences but not the ability to estimate the probability of occurrence. Elith and Graham (2009) for example modeled a slightly more complex species that responded in three different ways to three different environmental gradients. They found a similar result without looking at the causes and effects of these findings: the statistics calculated with the true (simulated) probabilities of occurrence were significantly below 1 (Table 3 in Elith and Graham 2009). Meynard and Quinn (2007) simulated an even more complex set of species types, some with linear, Gaussian or threshold responses, and some with combinations of these responses to three different real environmental gradients. Even though the environmental variables used were not perfectly Gaussian (they were real values taken from the state of California), the same pattern can be found in their results: virtual species that responded to a threshold in at least one of the gradients were easier to predict than species with non-threshold responses.
Although the modeling algorithm GARP stands out as particularly poor with respect to the other statistical tools (Meynard and Quinn 2007, Elith and Graham 2009), the differences in predictive performance among the rest of the models are not remarkably large and may depend more on the species properties than on the statistical model properties (Meynard and Quinn 2007, Elith and Graham 2009). Empirical studies have suggested similar conclusions, showing that species characteristics linked to life history traits may greatly influence our success rates when predicting species occurrences (Elith et al. 2006, Syphard and Franklin 2010). The fact that some species respond more gradually than others to different environmental gradients may therefore be an important factor explaining why some species are just more predictable than others.
Our results raise two major questions related to species distribution modeling. First, if we cannot rely solely on the quality of the presence-absence predictions, how do we measure the quality of predictive statistical models? And second, if we can rely on the predicted probability of occurrence but not on the predicted presence-absence patterns, how do we adapt our current uses of species distribution models to tools that directly incorporate model outputs rather than the species presence-absence distributions? Although we do not currently have complete solutions to these issues, several potential technical developments could be suggested.
First, our results show that relying on presence-absence is more uncertain than relying on species probabilities of occurrence if the model is good. We therefore believe that new evaluation methodologies should be developed based on the probabilities of occurrence rather than on the presence-absence estimate. A possibility would be using training and testing datasets, just as is current practice today, but comparing the output probabilities of occurrence on both datasets. In this case, the realized probability of occurrence on the training and testing set should match. This test is not sufficient though, as more sophisticated tools should be developed to explore situations where the wrong environmental variables are incorporated into the modeling strategies and situations where the sample prevalence is biased in both testing and training sets (Phillips et al. 2009, Lobo et al. 2010, Lobo and Tognelli 2011). The issue of sample prevalence has been repeatedly raised in the past (Jimenez-Valverde and Lobo 2007, Albert and Thuiller 2008, Jimenez-Valverde et al. 2009). Our simulations suggest that forcing the sample prevalence to 50% as has been suggested before (Fielding and Bell 1997, Reese et al. 2005) is not a good strategy unless the species prevalence is itself near 50%. The best strategy is to have a sample prevalence that matches the species prevalence (Albert and Thuiller 2008), which can only be achieved with a good systematic sampling of the environmental space and extensive field surveys. An alternative would be to correct the estimated probabilities of occurrence based on the environmental sampling bias if it can be estimated from GIS or remote sensing data (Pakkala et al. 2002, Phillips et al. 2009, Ward et al. 2009).
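One way to picture an evaluation based on probabilities rather than on presence-absence is a binned comparison between predicted probabilities and the realized frequency of occupancy in a testing set. The sketch below is an illustration under assumptions, not a method proposed in this study: the testing sites are simulated from a hypothetical logistic response to a standard normal gradient, the "model" is assumed to return the true probability, and ten equal-width bins are an arbitrary choice.

```python
import math
import random


def calibration_table(predicted, observed, n_bins=10):
    """Per bin: mean predicted probability, realized occupancy frequency, site count."""
    bins = [[0.0, 0, 0] for _ in range(n_bins)]  # predicted sum, presences, sites
    for p, y in zip(predicted, observed):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index][0] += p
        bins[index][1] += y
        bins[index][2] += 1
    return [(total_p / n, presences / n, n)
            for total_p, presences, n in bins if n > 0]


rng = random.Random(7)  # fixed seed, only so that the sketch is reproducible
n_sites = 50_000  # hypothetical testing set

predicted, observed = [], []
for _ in range(n_sites):
    x = rng.gauss(0.0, 1.0)           # environmental value at the site
    p = 1.0 / (1.0 + math.exp(-x))    # assumed true probability of occurrence
    predicted.append(p)               # a model that recovers the truth exactly
    observed.append(1 if rng.random() < p else 0)

table = calibration_table(predicted, observed)
for mean_p, frequency, n in table:
    print(f"predicted {mean_p:.2f}  realized {frequency:.2f}  sites {n}")
```

For a model that recovers the true probability, predicted and realized values agree in every bin even though the same model would score modestly on presence-absence measures. As the text notes, such a check is not sufficient on its own, because a sample whose prevalence is biased in both training and testing sets could agree with itself and still misrepresent the landscape.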
Regarding the second issue, some of the applications of species distribution modeling are already adapted to consider directly a habitat suitability index or a probability of occurrence of a species rather than the species presence-absence map. For example, it has been current practice to use maps of species distributions in order to set conservation goals within a systematic conservation planning strategy (Moilanen et al. 2009). In most cases one could consider weighting the importance of each site according to the probability of occurrence rather than to an arbitrary presence-absence threshold, though further software development may be necessary to obtain biologically meaningful optimizations using this strategy (Wilson et al. 2005).
Obviously some biological data may behave in a more threshold-like fashion than others, so that the importance of a gradual response may be highly scale-dependent. For example, using fine-scale yearly occurrence data will probably result in a more stochastic and dynamic occupancy pattern than pooling data over several kilometers and decades of survey. Not surprisingly, SDMs have been shown to have higher performance (higher AUC, sensitivity, specificity) when used at coarse spatial scales rather than at fine spatial scales (Meyer 2007, Storch et al. 2008). Therefore our conclusions have to be evaluated in the light of the available data being considered for modeling: using presence-absence performance measures at fine scales may be more misleading than using them at coarser resolutions. However, there is always a grey area in between fine and coarse scales and it would be preferable to have SDM performance measures that are robust to such changes in temporal and spatial resolution. Moreover, if real world data were often threshold-like, a simple inspection (e.g. a boxplot of presences and absences with respect to different environmental gradients) would easily reveal such relationships so that no statistical modeling would be necessary. The expanding literature on increasingly sophisticated SDM techniques (Elith et al. 2006, Elith and Graham 2009) may therefore attest to the rarity of such simple species responses to environmental gradients.
In conclusion, our results show that gradual occupancy changes along environmental gradients decrease our ability to estimate species presence and absence patterns in a predictable way. However, given a good sampling strategy, we can still reproduce the general pattern in the change of occupancy probability as a function of environmental gradients. This can in part explain why some species are just harder to predict than others, and why comparisons between different statistical modeling techniques have been so inconclusive except for the very worst models (Elith et al. 2006, Elith and Graham 2009). We strongly advocate a revision of the current methods to evaluate model performance based on predictions of occurrence, and suggest that probabilities of occurrence should be used more often. However, theoretical and applied studies are needed to develop performance measures incorporating probabilities of occurrence that are adapted to real world modeling and management.