Abstract

Species distribution models (SDMs) have been widely used in ecology, biogeography, and conservation. Although ecological theory predicts that species occupancy is dynamic, the outputs of SDMs are generally converted into a single occurrence map, and model performance is evaluated in terms of success in predicting presences and absences. The aim of this study was to characterize the effects of a gradual response in species occupancy to environmental gradients on the performance of SDMs. First, we outline guidelines for the appropriate simulation of artificial species that allows controlling for gradualism and prevalence in the occupancy patterns over an environmental gradient. Second, we derive theoretical expected values for success measures based on presence-absence predictions (AUC, Kappa, sensitivity and specificity). Finally, we use artificial species to exemplify and test the effect of a gradual probabilistic occupancy response to environmental gradients on SDM performance. Our results show that when a species responds gradually to an environmental gradient, conventional measures of SDM predictive success based on presence-absence cannot be expected to attain the performance values currently accepted as good, even for a model that perfectly recovers the true probability of occurrence. A gradual response imposes a theoretical expected value for these measures of performance that can be calculated from the species properties. However, irrespective of the statistical modeling strategy used and of how gradual the species response is, one can recover the true probability of occurrence as a function of environmental variables provided that species and sample prevalence are similar. Therefore, model performance based on presence-absence should be judged against the theoretical expected value rather than against absolute values currently in use, such as AUC > 0.8. Overall, we advocate for a wider use of the probability of occurrence and emphasize the need for further technical developments in this sense.

Species distribution models (SDMs) have been widely used to generate occurrence maps under different circumstances, including estimating distributions for rare species and for species found in under-sampled regions, predicting species potential range shifts under climate change, or estimating potential invasion ranges for exotic species (see reviews in Guisan and Thuiller 2005, Elith and Leathwick 2009, Franklin 2009). Although the output of these models is often a probability of occurrence (i.e. the probability that the species will be present in a site), for many applications of SDMs, if not most, the predicted probability is converted into a prediction of occurrence (presence or absence of the species in a given site) by using a threshold (Liu et al. 2005). The predictive ability of the models is then measured on the basis of the predicted presences and absences rather than on the predicted probability of occurrence (Fielding and Bell 1997).

Therefore, model performance has been largely studied on the basis of how well the statistical models can predict the patterns of presence-absence. Among the factors that have been widely cited as influencing model performance, both species and sample prevalence, defined as the proportion of sites in which a species is present, have been shown to be important (Pearce and Ferrier 2000, Kadmon et al. 2003, Hernandez et al. 2006, Santika 2011), but other factors such as the properties of the specific indices used (Fielding and Bell 1997) or the ratio between extent and area of occupancy have also been shown to be important (Lobo et al. 2008). Species prevalence, also referred to as true prevalence, is the frequency of occurrence of the species overall in the landscape or biogeographic region of interest. It has been shown for example that common species (i.e. species with higher prevalence) are more difficult to model than rare species (Guisan et al. 2006, Hernandez et al. 2006), and it has also been argued that generalists would be more difficult to model than specialists (Seoane et al. 2005, Hernandez et al. 2006). Sample prevalence (i.e. the proportion of sites in which the species is present within the sample used for model fitting and testing) can be (and usually is) different from species prevalence because researchers often target collection sites in which they know the species of interest is likely to be found, or introduce sampling bias due to road accessibility or other logistical reasons (Phillips et al. 2009, Ward et al. 2009, Lobo et al. 2010, Lobo and Tognelli 2011). It has been generally shown that very small or very large sample prevalence can negatively affect model performance. However, the importance of such effects has been linked to the relationship between species and sample prevalence in a way that the bias in model estimates could be corrected if we had information regarding the true species prevalence (Real et al. 2006, Albert and Thuiller 2008) or information regarding the bias in the environmental sampling (Phillips et al. 2009, Ward et al. 2009). A further complication comes from the fact that, in order to validate model outputs, the data are often divided into a training and a testing set, the training set being used during model calibration and the testing set being used in order to calculate performance measures (Fielding and Bell 1997). Depending on how these training and testing sets are designed, the training sample and the testing sample could also show differences in prevalence, which will affect SDM performance measures differently (Lobo et al. 2008). All these issues are of course difficult to explore with real-world data, since doing so would require extensive knowledge regarding species and sample properties, including their relationship to environmental gradients.

Artificial species may provide key insights into how and when different statistical models differ, since they allow manipulating targeted properties of the species and the sampling scheme in a way that is not feasible in the real world. The true distribution of the artificial species and its relationship to the different environmental gradients are known (Meynard and Quinn 2007, Elith and Graham 2009, Zurell et al. 2010), allowing one to examine the effect of each individual factor. Furthermore, species and sample prevalence can be manipulated, and their relative contributions to model performance can therefore be investigated. The applicability of SDMs to different scenarios, as well as comparisons between different strategies to model and predict actual species distributions, has been widely debated (Randin et al. 2006, Elith and Leathwick 2009, Franklin 2009, Zurell et al. 2009). Empirical studies have given ambiguous results regarding which models are best to predict species distributions, or even whether or not there is any difference in model performance (Elith et al. 2006, Elith and Graham 2009). As Elith and Graham (2009) point out, differences in predictive ability may stem from model properties as well as from species properties, and the use of artificial species may contribute significantly to disentangling the two. The simulation process usually begins by developing a habitat suitability function that associates a set of environmental variables characterizing a virtual or a real landscape with the probability that a species occurs for each grid cell in that landscape. This habitat suitability function needs to be converted into a presence and absence map in order to generate a species distribution that can then be sampled by the virtual ecologist.

Despite the apparent simplicity of the simulation approach, simulation results need to be considered carefully with respect to the assumptions made at each step. For example, major ecological and biogeographic theories have recognized the importance of dynamic occupancy patterns with important colonization-extinction events happening through time and space (Brown and Lomolino 1999, Hanski 1999, Holyoak et al. 2005). These occupancy patterns are more rigorously characterized by a probability of occurrence than by a simple presence-absence map. In effect, a probability of occurrence can reflect the fact that some marginal habitats will be dominated by the extinction dynamics whereas some others will act as sources and be more stable over time and space (Hanski 1999, MacKenzie et al. 2006). Under these theoretical frameworks stochasticity is part of the ecological system. Yet this dynamic perspective in habitat occupancy has been largely ignored in species distribution modeling as well as in simulation studies testing SDMs. This stems from the fact that, though conceptually straightforward, the final simulation step of converting a probability of occurrence into an ‘observed’ species distribution carries with it some implicit assumptions that are usually left unexplained. More specifically, a common strategy used at this stage has been to select an arbitrary threshold in the suitability function under which the artificial species will always be absent and above which it will always be present. For example, Albert and Thuiller (2008) simulated artificial species on a virtual landscape to which a random error was added, and for which the species showed a linearly increasing suitability function. However, to generate species occurrences from that suitability function in the last simulation step, all grid cells that were above a threshold value were considered as presences, and all those below that threshold were considered as absences. Similar approaches can be found in Hirzel et al. (2001), Real et al. (2006), Jimenez-Valverde and Lobo (2007), Jimenez-Valverde et al. (2009), Santika and Hutchinson (2009) and Lobo and Tognelli (2011), among others. This means that even though a functional response was modeled using some non-threshold function between the environmental gradients and habitat suitability, the final step converts this function into a threshold response, erasing any stochasticity or gradualism generated during the previous steps.

A few other simulation studies (Meynard and Quinn 2007, Elith and Graham 2009, Santika 2011) have taken a different approach to translate the suitability function into presences and absences. This strategy involves converting the suitability values into probabilities (i.e. rescaling the suitability values to range from 0 to 1) and taking a probabilistic approach to site occupancy. This procedure creates a pattern where a grid cell with a probability of occurrence of 0.4 will be occupied, on average, in 4 out of 10 simulations. Using this probabilistic approach has two major consequences. First, the number of occurrences increases gradually as the probability of occurrence increases, as opposed to following a threshold response. Second, contrary to the threshold strategy, each new simulation will generate a different pattern of presences and absences, introducing uncertainty when it comes to statistical tests of model outputs based on presence-absence predictions.

Although the aforementioned studies introduced stochasticity and gradualism into the observed patterns of occupancy, they did not explore directly how this affects the performance of SDMs, nor did they provide guidelines on how to control species prevalence as well as the strength of the gradual occupancy response. For example, Elith and Graham (2009) generated just one virtual species which responded to three environmental gradients in different ways. Meynard and Quinn (2007) generated a wider array of potential functional curves and interactions between environmental factors but did not explore the effects of different slopes in the distribution model. Austin et al. (2006) provide guidelines for probabilistic simulation approaches in the context of plant species abundance distribution modeling. However, in many cases involving species distributions researchers have been interested in species occurrences rather than species relative abundance (Guisan and Thuiller 2005, Elith and Leathwick 2009, Franklin 2009), and no such guidelines have been published regarding presence-absence simulation studies. This opens the door for asking: 1) what is the effect of a gradual change in species occupancy rate as a function of habitat suitability on species distribution models?; and 2) could this explain why some species are easier to predict than others regardless of the statistical model used (Elith et al. 2006, Syphard and Franklin 2010)?

Here we directly test the effect of a gradual species occupancy response to environmental gradients on the performance of species distribution models and we set clear guidelines in order to simulate artificial species avoiding previous pitfalls. We derive a conceptual framework for such an analysis and provide guidelines on how to control species prevalence and the strength of the gradual occupancy pattern when simulating presences and absences. Notice that we use the terms occurrence patterns and occupancy patterns interchangeably, although there is a subtle difference that has been reflected in the current literature: occurrence patterns are usually mentioned when referring to specific presence-absence patterns with a static connotation, whereas the more general term occupancy has a stochastic or dynamic component associated with it (MacKenzie et al. 2006).

Material and methods

As we are interested in testing the effects of gradual responses and stochasticity on SDMs, our simulation strategy needs to comply with the following conditions: 1) the pattern of presence-absence should be generated using a probabilistic approach rather than a threshold value; 2) how gradually the species responds to a given environmental gradient must be simple to control and vary; and 3) species prevalence, a factor that has been repeatedly shown to have a large influence on model performance, must be controlled for (Pearce and Ferrier 2000, Kadmon et al. 2003, Hernandez et al. 2006, Santika 2011).

We start by explaining how we modeled artificial species so as to comply with these conditions. Then we expand on these ideas in more practical terms through simulations.

Conceptual foundations and theoretical prediction success

To control for the species’ gradual response to the environmental gradient we used a logit function that allows us to link the environmental gradient with a probability of occurrence (Fig. 1):

Figure 1. A schematic explanation of the logistic curve with the parameter names used here. 1/α is the slope of the logistic curve, and β is the inflexion point in the environmental gradient where the probability of occurrence is 0.5. The slope of the logistic curve represents how fast the species occupancy will change in response to changes in the environmental gradient near the inflexion point. β represents the position of the inflexion point with respect to the environmental gradient.

$$p_i = \frac{1}{1 + e^{-Y_i}}$$

where p_i is the probability of occurrence of the virtual species at location i, and Y_i is a function of the environmental gradient (or more generally a combination of environmental gradients) at location i (Supplementary material Appendix 1).

Here we express Y_i in a way that will allow us to easily relate the slope and inflexion point of the logistic curve with the scale of the environmental gradient as well as biological properties of the species:

$$Y_i = \frac{x_i - \beta}{\alpha}$$

where x_i is an environmental gradient or a suitability function (i.e. a function of environmental gradients on an arbitrary scale), β is the point of inflection of the logistic curve (i.e. the value of the environmental gradient at which the probability of occurrence is 50%), and 1/α is the slope of the logistic curve at the inflection point, and therefore controls the species’ gradual response to the environmental gradient (Fig. 1). Small values of α will simulate threshold-type responses, whereas large values of α will simulate linear responses (Supplementary material Appendix 1). Intermediate values of α will generate the classic logistic curve having an S shape. In order to control for species prevalence, the parameter space of α and β must be explored (see Supplementary material Appendix 1 for details). As a general rule of thumb we can say that, for a fixed value of α, increasing β will have the effect of decreasing species prevalence. Note that with this notation α and β have the same units as the environmental gradient (Supplementary material Appendix 1).

The logit function is useful for the simulation of artificial species because one can directly control how quickly and at what absolute value a species responds to an environmental gradient via the function Y_i (Fig. 1). It also provides a continuum of shapes between threshold-type and linear-type response curves. This flexibility allows treating these two extremes (threshold and linear responses) as particular cases within the more general logistic formulation.
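
The response curve described above can be illustrated with a short Python sketch. It assumes the standard logistic form p = 1 / (1 + exp(-(x - β) / α)) with α > 0, which is consistent with the definitions of α and β given here but is not copied from the original equations; the function name `occupancy_probability` is hypothetical.

```python
import math

def occupancy_probability(x, alpha, beta):
    """Probability of occurrence at gradient value x.

    Assumed form: p = 1 / (1 + exp(-(x - beta) / alpha)), with alpha > 0.
    """
    y = (x - beta) / alpha
    # Evaluate the logistic in a numerically stable way for large |y|.
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)

# Small alpha gives a near-step response around beta; large alpha gives a
# much flatter curve over the same range of the gradient.
for alpha in (0.01, 0.1, 0.5, 1.0):
    curve = [round(occupancy_probability(x, alpha, 0.0), 3)
             for x in (-1.0, -0.1, 0.0, 0.1, 1.0)]
    print(alpha, curve)
```

Under this assumed form the derivative of p with respect to x at x = β works out to 1/(4α), so the steepness of the curve scales with 1/α.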

To generate the realized distribution for the virtual species, we compared the simulated probability of occurrence with a random number drawn from a uniform distribution between 0 and 1. If a value less than p_i is drawn, then the site is assumed occupied; otherwise, unoccupied. The numbers drawn over multiple virtual habitat realizations will be less than p_i with frequency equal to p_i (e.g. on average 4 out of 10 sites with a probability of occurrence of 0.4 will be occupied).
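
The uniform-draw rule just described can be sketched in a few lines of Python. The function name `realize_occupancy` and the fixed seed are illustrative assumptions, not part of the original study.

```python
import random

def realize_occupancy(probabilities, rng):
    """Return 1 (occupied) where a uniform draw on [0, 1) falls below p_i, else 0."""
    return [1 if rng.random() < p else 0 for p in probabilities]

rng = random.Random(42)          # seeded only to make the sketch repeatable
probabilities = [0.4] * 10000    # many sites sharing the same probability of occurrence
occupied = realize_occupancy(probabilities, rng)
observed_frequency = sum(occupied) / len(occupied)
print(observed_frequency)        # close to 0.4, but different for every realization
```

Calling `realize_occupancy` again with the same probabilities gives a new occurrence pattern, which is the source of the stochasticity discussed in the text.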

Since we adopt a probabilistic approach to generate the simulated presences and absences, the ‘perfect model’ will (theoretically) reproduce perfectly well the true probability of occurrence, but will not necessarily reproduce the exact same distribution pattern; i.e. each new iteration will produce a different occurrence pattern despite the fact that the probability of occurrence remains unchanged. For example, if we apply a threshold value of 0.5 in order to predict species presences from the predicted probabilities of occurrence, and assuming that the model is perfect, a site that has a probability of occurrence of 0.4 will be occupied on average in 4 out of 10 iterations, but will always be predicted as empty. This means that the absence predictions will be correct in 6 out of 10 cases, and will be wrong in 4 out of 10 cases. We expand on this idea in the Supplementary material Appendix 2 to derive analytically the equations that correspond to the expected model performances based on presence-absence predictions (sensitivity, specificity, Kappa and AUC, see below) given the uncertainties introduced in the occupancy patterns. As we show in Supplementary material Appendix 2, these theoretical expected performances depend on α and β. We subsequently represent graphically the dependency between α and β and these theoretical predictive success rates.
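
The reasoning in the 0.4 example extends to a whole set of sites. The following Python sketch is one way to compute the expected sensitivity and specificity of a perfect model at a fixed threshold. It is an illustration under stated assumptions, not the derivation in Supplementary material Appendix 2: the function name `expected_rates` is hypothetical, and the rates are taken as ratios of expected counts, which only approximates the expected value of the ratios.

```python
def expected_rates(probabilities, threshold):
    """Expected sensitivity and specificity of a model that predicts the true
    probabilities exactly and labels a site present when p >= threshold."""
    exp_tp = sum(p for p in probabilities if p >= threshold)        # occupied, predicted present
    exp_fn = sum(p for p in probabilities if p < threshold)         # occupied, predicted absent
    exp_tn = sum(1.0 - p for p in probabilities if p < threshold)   # empty, predicted absent
    exp_fp = sum(1.0 - p for p in probabilities if p >= threshold)  # empty, predicted present
    return exp_tp / (exp_tp + exp_fn), exp_tn / (exp_tn + exp_fp)

# Four sites; the site with p = 0.4 is always predicted empty although it is
# occupied 40% of the time, which lowers the expected sensitivity.
site_probabilities = [0.1, 0.4, 0.6, 0.9]
sens, spec = expected_rates(site_probabilities, 0.5)
print(sens, spec)
```

Even though the model is exact in terms of probabilities, neither rate reaches 1 unless every probability is 0 or 1.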

Simulations

To exemplify the consequences of gradual responses to environmental gradients on SDMs, we used a normally distributed environmental gradient with mean of 0 and standard deviation of 1 as our virtual landscape where the species was simulated (10 000 grid cells). We used four values of α (0.01, 0.1, 0.5, 1), ranging from small (threshold-type response, steep slope) through intermediate (classic logistic) to large (linear-type response, shallow slope) (Fig. 1). The values of β were selected so that for each value of α we had three levels of species prevalence (20, 50 and 80%; Supplementary material Appendix 1). For each combination of values of α and β we generated 20 realizations of species presences and absences (i.e. 20 iterations of randomly assigning presences and absences corresponding to the probabilities of occurrence). For each such pattern, several subsamples of 1000 sites were taken so that sample prevalence varied from 5 to 95% in 5% increments (one subsample for each sample prevalence).
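
The design above can be sketched in Python as follows. This is a minimal illustration, assuming the logistic form p = 1 / (1 + exp(-(x - β) / α)); the bisection search for β is a stand-in for the parameter exploration of Supplementary material Appendix 1, and all function names are hypothetical.

```python
import math
import random

def occupancy_probability(x, alpha, beta):
    """Assumed logistic response: p = 1 / (1 + exp(-(x - beta) / alpha))."""
    y = (x - beta) / alpha
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)

def species_prevalence(gradient, alpha, beta):
    """Expected proportion of occupied cells (mean probability of occurrence)."""
    return sum(occupancy_probability(x, alpha, beta) for x in gradient) / len(gradient)

def solve_beta(gradient, alpha, target, lo=-10.0, hi=10.0, tol=1e-6):
    """Bisection on beta; prevalence decreases monotonically as beta increases."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if species_prevalence(gradient, alpha, mid) > target:
            lo = mid  # prevalence still too high: move beta up the gradient
        else:
            hi = mid
    return 0.5 * (lo + hi)

def subsample_indices(occupied, n_sites, sample_prevalence, rng):
    """Draw n_sites cell indices with a fixed proportion of presences."""
    presences = [i for i, o in enumerate(occupied) if o == 1]
    absences = [i for i, o in enumerate(occupied) if o == 0]
    n_pres = round(n_sites * sample_prevalence)
    return rng.sample(presences, n_pres) + rng.sample(absences, n_sites - n_pres)

rng = random.Random(1)
gradient = [rng.gauss(0.0, 1.0) for _ in range(10000)]  # virtual landscape
alpha = 1.0
betas = {target: solve_beta(gradient, alpha, target) for target in (0.2, 0.5, 0.8)}

# One realization for the 50% prevalence species, then a 5% prevalence subsample.
beta = betas[0.5]
occupied = [1 if rng.random() < occupancy_probability(x, alpha, beta) else 0
            for x in gradient]
sample = subsample_indices(occupied, 1000, 0.05, rng)
print(betas, len(sample))
```

Repeating the last three statements over the 20 realizations and the 19 sample prevalence levels would reproduce the structure of the simulation, though not necessarily its exact implementation.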

Each subsample was used to estimate model parameters and then predict species distributions in the whole landscape. We carried out this analysis using both generalized linear models (GLM) and boosted regression trees (BRT), two methods that apply fundamentally different modeling strategies (Elith et al. 2008), and both methods showed very similar results. Note, however, that using a GLM with a logit link to estimate the parameters of a threshold-like response requires some corrections, since the slope of the logistic curve at the inflexion point approaches infinity, creating convergence issues in most algorithms (Venables and Ripley 2002). We therefore present only BRT results below. The BRT models were fitted using the function gbm in R ver. 2.12.1 (R Development Core Team) with one linear term (our virtual environmental gradient) as a predictor. A minimum of 1000 trees was fitted at first, and a new set of 500 trees was added if the predicted probabilities were not stable. The predictions were considered stable if the mean absolute difference in fitted probabilities between one output and the following one (after the addition of 500 trees) was less than 0.005.
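
The stopping rule for the number of trees can be written as a generic loop. The Python sketch below is not the R code used in the study: `predict_with_trees` is a hypothetical callable standing in for "fit or extend the boosted model to n trees and return the fitted probabilities", and the `max_trees` cap is an added safety assumption.

```python
def fit_until_stable(predict_with_trees, start=1000, step=500, tol=0.005, max_trees=20000):
    """Add trees in blocks of `step` until the mean absolute change in fitted
    probabilities between two consecutive outputs drops below `tol`."""
    n_trees = start
    previous = predict_with_trees(n_trees)
    while n_trees + step <= max_trees:
        n_trees += step
        current = predict_with_trees(n_trees)
        mean_abs_diff = sum(abs(a - b) for a, b in zip(current, previous)) / len(current)
        if mean_abs_diff < tol:
            return n_trees, current
        previous = current
    return n_trees, previous  # cap reached without meeting the tolerance

# Toy stand-in for a boosted model whose fitted values settle as trees are added.
def toy_predictions(n_trees):
    return [0.5 + 40.0 / n_trees, 0.2 - 40.0 / n_trees]

n_trees_used, fitted = fit_until_stable(toy_predictions)
print(n_trees_used, fitted)
```

With a real boosted model the callable would reuse the already fitted trees rather than refit from scratch; that detail depends on the library and is left out here.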

We used two different methods to determine the threshold value to predict presence-absence: the Kappa maximization threshold (KMT), which maximizes the Kappa statistic (see below), and the sensitivity-specificity difference minimization threshold (MDT), which minimizes the difference between the predictive success rates of presences and absences (Liu et al. 2005, Jimenez-Valverde and Lobo 2007). We also used the threshold that maximizes the sum of sensitivity and specificity (MST); it yielded results very similar to MDT and is therefore omitted below.
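
The two threshold rules can be illustrated with a small Python sketch. It assumes a simple search over a grid of candidate thresholds and that both presences and absences occur in the data; the function names and the toy data are hypothetical.

```python
def confusion(observed, predicted, threshold):
    """Counts (tp, fp, tn, fn) when p >= threshold is labelled as a presence."""
    tp = fp = tn = fn = 0
    for obs, p in zip(observed, predicted):
        if p >= threshold:
            if obs == 1:
                tp += 1
            else:
                fp += 1
        elif obs == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fp, tn, fn):
    return tp / (tp + fn)

def specificity(tp, fp, tn, fn):
    return tn / (tn + fp)

def kappa(tp, fp, tn, fn):
    """Cohen's Kappa: agreement beyond what is expected by chance."""
    n = tp + fp + tn + fn
    observed_agreement = (tp + tn) / n
    chance_agreement = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (observed_agreement - chance_agreement) / (1.0 - chance_agreement)

def kappa_maximization_threshold(observed, predicted, candidates):
    """KMT: candidate threshold with the highest Kappa (first one on ties)."""
    return max(candidates, key=lambda t: kappa(*confusion(observed, predicted, t)))

def difference_minimization_threshold(observed, predicted, candidates):
    """MDT: candidate threshold with the smallest |sensitivity - specificity|."""
    def gap(t):
        counts = confusion(observed, predicted, t)
        return abs(sensitivity(*counts) - specificity(*counts))
    return min(candidates, key=gap)

observed = [0, 0, 0, 1, 0, 1, 1, 1]
predicted = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
candidates = [i / 100 for i in range(1, 100)]
kmt = kappa_maximization_threshold(observed, predicted, candidates)
mdt = difference_minimization_threshold(observed, predicted, candidates)
print(kmt, mdt)
```

In this toy data set the two rules pick different thresholds, which is why statistics computed under KMT and MDT need not behave alike.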

Due to the large number of species types and simulations generated with this methodology, we present here only selected results that will allow us to understand the effects of α on species distribution modeling while controlling for species and sample prevalence.

Measures of model performance

We calculated sensitivity, specificity, AUC and Kappa in order to measure the predictive ability of each model in terms of presence-absence relative to theoretical estimations of these statistics (Supplementary material Appendix 2). All these indices measure different aspects of model performance based on predictions of presences and absences (Fielding and Bell 1997). Sensitivity measures the success rate at predicting species presences, specificity measures the success rate at predicting species absences, and both Kappa and AUC integrate both aspects (Supplementary material Appendix 2). While AUC is threshold independent, Kappa, sensitivity and specificity require a predetermined threshold for defining presences and absences from the predicted probabilities of occurrence. We used KMT and MDT (see section above) to determine such threshold values.
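
Because AUC does not depend on a threshold, it can be computed directly from the ranking of the predicted probabilities. The Python sketch below uses the pairwise (Mann-Whitney) definition, i.e. the proportion of presence-absence pairs in which the presence receives the higher prediction, with ties counted as one half. The function name `auc` and the toy data are illustrative assumptions.

```python
def auc(observed, predicted):
    """Area under the ROC curve via pairwise comparison of presences and absences."""
    presence_scores = [p for obs, p in zip(observed, predicted) if obs == 1]
    absence_scores = [p for obs, p in zip(observed, predicted) if obs == 0]
    wins = 0.0
    for p_pres in presence_scores:
        for p_abs in absence_scores:
            if p_pres > p_abs:
                wins += 1.0      # presence ranked above absence
            elif p_pres == p_abs:
                wins += 0.5      # ties count as half
    return wins / (len(presence_scores) * len(absence_scores))

observed = [0, 0, 0, 1, 0, 1, 1, 1]
predicted = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
print(auc(observed, predicted))
```

The double loop is quadratic in the number of sites; for large landscapes a rank-based formula gives the same value more efficiently.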

Since we are dealing here with virtual species, we can also compare the simulated (real) probability of occurrence with the predicted probability of occurrence. To do this, we calculated the root-mean-squared difference (RMS) between both probabilities. A good model will show RMS values near 0, meaning that the predictions of species probabilities of occurrence are nearly perfect throughout the probability range.
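
The RMS difference can be written compactly. A minimal Python sketch, with the hypothetical name `rms_difference`, assuming both probability vectors are ordered by site:

```python
import math

def rms_difference(true_probabilities, predicted_probabilities):
    """Root-mean-squared difference between true and predicted probabilities."""
    squared = [(t - p) ** 2 for t, p in zip(true_probabilities, predicted_probabilities)]
    return math.sqrt(sum(squared) / len(squared))

true_probabilities = [0.1, 0.4, 0.6, 0.9]
print(rms_difference(true_probabilities, true_probabilities))    # perfect recovery
print(rms_difference(true_probabilities, [0.2, 0.5, 0.7, 1.0]))  # uniform upward bias
```

Unlike the presence-absence statistics, this measure is only available when the true probabilities are known, as is the case for virtual species.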

Results

Theoretical prediction success

Not all measures of predictive performance behave in the same way. Kappa is very sensitive to the value of α, i.e. to the species’ gradual response to the environmental gradient (Fig. 2a, d): small values of α (threshold-like response) will generate high values of Kappa, meaning that both presences and absences are predicted almost perfectly. As α increases, Kappa quickly decreases.

Figure 2. Theoretical predictive performance as a function of α and β. The upper row represents the theoretical values when a KMT threshold is used; the lower row represents theoretical values when an MDT threshold is used. The first column represents Cohen's Kappa statistic, which summarizes both presence and absence success rates, while the second column represents sensitivity (presence success rate) and the third column represents specificity (absence success rate). Black dots represent parameter values used in the simulations. For a given level of α, the dots represent, from left to right, a species with prevalence 80, 50 and 20%.

Looking at sensitivity and specificity allows us to tease apart the effects on prediction success of presences (Fig. 2b, e) and absences (Fig. 2c, f) separately. Sensitivity and specificity generally mirror each other: whenever we might expect an increase in sensitivity we may also expect a decrease in specificity, and vice-versa. When KMT is used, sensitivity and specificity are a function of both α and β. For small values of β, increasing α decreases sensitivity by small steps (Fig. 2b), but decreases specificity very quickly (Fig. 2c). The opposite is true for high values of β. In other words, for a species with high prevalence, the prediction of presences remains fairly robust to changes in α whereas the prediction of absences is very sensitive to such changes. On the contrary, for species of low prevalence, the prediction of absences remains robust to changes in α, but the prediction of presences is fairly sensitive to such changes. When MDT is used, changes in sensitivity and specificity depend mainly on α: both sensitivity (Fig. 2e) and specificity (Fig. 2f) decrease gradually as the species response becomes more gradual and less threshold-like.

Finally, the area under the receiver operating characteristic curve (AUC), which is independent of any threshold value, behaves in a similar way to sensitivity and specificity with the MDT threshold, decreasing gradually as α increases (Fig. 3).

Figure 3. Theoretical values of the area under the receiver operating characteristic curve (AUC) as a function of α and β. Black dots represent parameter values used in the simulations. For a given level of α, the dots represent, from left to right, species with prevalence 80, 50 and 20% respectively.

Overall these results show that for species with values of α different from 0 the expected maximum value of the different performance statistics will be <1 (Fig. 2, 3). It is only for threshold-type of species (very small values of α) that we may expect the perfect model to show an AUC, Kappa, sensitivity or specificity near 1 (Fig. 2, 3).

Simulations

Simulations confirm these theoretical results and add information regarding the effect of sample prevalence. For a very small α, corresponding to a threshold-type response, Kappa, sensitivity, specificity and AUC are all very high (Fig. 4a, c, e and 5a). On the contrary, for α= 1, corresponding to a gradual occupancy response, the same statistics show much lower values (Fig. 4b, d, f and 5b). In both cases, simulated results agree with theoretical predictions when sample prevalence equals species prevalence. Performance measures incorporating both presence and absence success rates (Kappa in Fig. 4 and AUC in Fig. 5a, b) remain on average below this expected theoretical value for all values of sample prevalence, confirming that these theoretical predictions represent expected maximum performance values <1 with respect to which model performance should be judged. On the other hand, sensitivity or specificity can be above the theoretical value, but not both simultaneously for a single value of sample prevalence. For high values of sample prevalence, sensitivity values are greater than the theoretical prediction but specificity is lower, and the inverse holds for low values of sample prevalence.

Figure 4. Presence-absence predictive performance for simulations of a species with prevalence of 0.5 as a function of sample prevalence. Panels on the left hand side correspond to a value of α= 0.01 (threshold-like response), while those on the right are for α= 1.0. From top to bottom, panels show the Kappa statistic (a, b), sensitivity (c, d) and specificity (e, f). The species distribution was modeled as a linear function of the environment using boosted-regression trees (BRT). Thresholds were selected to minimize the difference between sensitivity and specificity (MDT). Horizontal lines indicate theoretical expected value of each statistic (Supplementary material Appendix 2), and vertical lines indicate the species prevalence.

Figure 5. AUC statistic (a, b) and root-mean-squared difference (RMS) between true and predicted probabilities of occurrence (c, d) as a function of sample prevalence for a species with prevalence of 20%. Panels on the left hand side correspond to a value of α= 0.01, while those on the right are for α= 1. The species distribution was modeled as a linear function of environment using boosted-regression trees (BRT). Vertical lines indicate the species prevalence, and horizontal lines in (a, b) indicate the theoretical value of the AUC statistic.

We then compared the success in the predictions of the actual probabilities of occurrence. For threshold-type species (small value of α) the simulated probability of occurrence is reproduced very well, except when the sample prevalence is very far from the species prevalence (Fig. 5c). However, when α is larger, the probability of occurrence is only reproduced when sample prevalence is similar to the actual species prevalence (Fig. 5d and Fig. 6). If sample prevalence is biased by >10% in either direction, the difference between the predicted and the real probability of occurrence becomes considerable (Fig. 6) due to the model's tendency to bias predicted probabilities of occurrence towards presences or absences, depending upon which represents a larger fraction of the sample (Supplementary material Appendix 3). Finally, comparing the first and second rows of Fig. 5, we notice that AUC seems insensitive to sample prevalence (Fig. 5a, b), whereas the species probability of occurrence is clearly poorly reproduced for sample prevalences that are far from the species prevalence (Fig. 5c, d). Moreover, the differences between predicted and real probabilities of occurrence when sample and species prevalence agree are of the same order of magnitude for the three levels of species prevalence (Fig. 6). Increasing the value of α makes the RMS increase faster as one moves away from the sample prevalence that corresponds to the species prevalence (Fig. 5c vs Fig. 5d). In other words, the larger α is, the more important it is to have a sample prevalence that corresponds to the species prevalence in order to produce a good estimate of the probability of occurrence.

Figure 6. Root-mean-squared (RMS) difference between real and predicted probabilities of occurrence as a function of sample prevalence for species with an α= 1 and 3 prevalence levels: (a) 20%, (b) 50%, and (c) 80%.

Discussion

Our results show that the species’ gradual response to a given environmental gradient has two major consequences. First, a gradual response will produce a decrease in our ability to predict species occurrences. This decrease is independent of the modeling strategy used and sets an upper limit to all statistics used to measure model success in terms of presence and absence predictions (Supplementary material Appendix 2, Fig. 3–5). Second, the more gradual the species response to environmental gradients is, the more sensitive the potential reproduction of the true probability of occurrence is to sample prevalence, with the optimal predictions occurring when sample and species prevalence are the same. Contrary to presence-absence statistics, even with very gradual responses we can potentially reproduce the true probability of occurrence of the species, though, as α increases, our ability to do so becomes increasingly dependent on our ability to match species and sample prevalence.

Until now, it has been assumed that a good model should be able to predict presences and absences correctly, and the vast majority of papers published using SDMs evaluate their models on these grounds (see reviews in Guisan and Thuiller 2005, Elith and Leathwick 2009, Franklin 2009). Common guidelines regarding good model performance include AUC values >0.8 or Kappa >0.8 (Fielding and Bell 1997, Franklin 2009). Our results clearly show that this can only be realized for species with threshold-like responses to the environment. In contrast, species with gradual responses to environmental gradients will have a maximum expected value of Kappa, AUC, sensitivity and specificity well below 1 (Fig. 2–6). For example, for a species with prevalence near 50% (β = 0 in Fig. 2, 3) and α = 1, the best possible model will have an AUC somewhere around 0.7 (Fig. 3), a Kappa value around 0.35, and sensitivity and specificity values near 0.7 (Fig. 2). By current guidelines this would be judged a poor predictive model, yet it is the best that we can do in terms of predicting species occurrences. However, the same model that produces a poor prediction in terms of presences and absences may be recovering perfectly well the true probability of occurrence throughout the environment and therefore be a useful tool for conservation purposes. Notice for example that for species with α = 1 the difference between predicted and true probabilities of occurrence is almost null when species prevalence is equal to the sample prevalence (Fig. 6).
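
The ceiling on presence-absence measures can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a logistic occupancy response p(x) = 1 / (1 + exp(-(x - β)/α)) over a standard normal gradient (a parametrization chosen here for illustration), scores every site with the true probability itself (i.e. a perfect model), and uses a fixed 0.5 threshold rather than the KMT or MDT thresholds. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def occurrence_prob(x, alpha, beta):
    """Assumed logistic response; larger alpha means a more gradual response."""
    return 1.0 / (1.0 + np.exp(-(x - beta) / alpha))

def auc_from_ranks(score, y):
    """AUC as the Mann-Whitney statistic (continuous scores, so ties are ignored)."""
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n1 = float(y.sum())
    n0 = float(len(y)) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def threshold_stats(score, y, threshold=0.5):
    """Sensitivity, specificity and Cohen's Kappa at a fixed probability threshold."""
    pred = (score >= threshold).astype(float)
    sens = pred[y == 1].mean()
    spec = 1.0 - pred[y == 0].mean()
    p_obs = np.mean(pred == y)                      # observed agreement
    p_exp = pred.mean() * y.mean() + (1 - pred.mean()) * (1 - y.mean())  # chance agreement
    return sens, spec, (p_obs - p_exp) / (1 - p_exp)

x = rng.standard_normal(200_000)   # environmental gradient
results = {}
for alpha in (0.05, 1.0):          # near-threshold vs gradual response
    p_true = occurrence_prob(x, alpha, beta=0.0)
    y = (rng.random(x.size) < p_true).astype(float)  # realized presences/absences
    auc = auc_from_ranks(p_true, y)
    sens, spec, kappa = threshold_stats(p_true, y)
    results[alpha] = (auc, sens, spec, kappa)
    print(f"alpha={alpha}: AUC={auc:.2f} sens={sens:.2f} spec={spec:.2f} Kappa={kappa:.2f}")
```

Because the scores are the true probabilities, any shortfall from 1 in these statistics reflects the stochastic nature of occupancy under the assumed response, not a modeling error.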

It is important to note that these results regarding the presence-absence performance measures, and AUC in particular, are not an artifact of the area of occupancy, as has been suggested elsewhere (Lobo et al. 2008). In our simulations the total extent for all species is constant so that the area of occupancy is equivalent to species prevalence (i.e. the proportion of sites in which the species is present in the whole landscape). As we have already shown, for a given gradualism level α, species prevalence will have significant effects on presence-absence measures calculated using the KMT threshold, such as sensitivity, specificity and Kappa (Fig. 2), but not when using the MDT threshold (Fig. 2) or the AUC (Fig. 3), and it will have no effect on the estimation of the probability of occurrence (Fig. 6). In other words, it is not species prevalence per se that is affecting model predictive performance but rather the relationship between sample and species prevalence (i.e. is true prevalence well represented in the sample?), the nature of the relationship between the environmental gradients and species responses (i.e. threshold vs gradual response), and the type of threshold used to calculate the statistics (KMT being sensitive to species prevalence but not MDT). Some previous studies of performance measures relied on simulations where the virtual species were generated using a threshold strategy (Jimenez-Valverde and Lobo 2007, Albert and Thuiller 2008, Jimenez-Valverde et al. 2009, Santika and Hutchinson 2009, Lobo and Tognelli 2011). In these cases, in order to manipulate prevalence, the authors actually manipulated sample prevalence by randomly sub-sampling the species occurrences. Therefore, an undesirable side effect of the threshold simulation strategy is that it makes it difficult to independently manipulate species prevalence, sample prevalence and sample size at the same time.
A second undesirable effect of such a simulation strategy is that using a logit link to try to estimate model performance, as would most often be the case when using GLM or GAM, becomes inadequate. This is because the slope of the logistic curve for a threshold response is infinite at the inflexion point, and therefore most algorithms would run into convergence problems that need to be considered carefully (Venables and Ripley 2002). In these cases a regression tree approach such as CART or the BRT method used here would be more appropriate, or else one would have to consider applying corrections such as the one proposed by Venables and Ripley (2002, pp. 197–199 and 445–446) in order to estimate parameters in such problematic situations, and systematically check for model convergence after each iteration.
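
The convergence problem can be seen in a toy example. This is a minimal sketch under stated assumptions: it uses plain gradient ascent on the logistic log-likelihood (not the fitting routine of any particular GLM package), a standard normal gradient, a threshold species that is present whenever x > 0, and a gradual species with an assumed logistic response with α = 1 and β = 0. The function `slope_path` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

def slope_path(x, y, checkpoints, lr=1.0):
    """Gradient ascent on the mean logistic log-likelihood.

    Returns the fitted slope recorded at each checkpoint iteration.
    """
    X = np.column_stack([np.ones_like(x), x])
    w = np.zeros(2)
    slopes = {}
    for it in range(1, max(checkpoints) + 1):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
        if it in checkpoints:
            slopes[it] = w[1]
    return slopes

x = rng.standard_normal(1000)
y_threshold = (x > 0).astype(float)                                      # perfectly separated
y_gradual = (rng.random(x.size) < 1.0 / (1.0 + np.exp(-x))).astype(float)  # alpha = 1, beta = 0

checkpoints = (200, 2000, 20000)
slopes_threshold = slope_path(x, y_threshold, checkpoints)
slopes_gradual = slope_path(x, y_gradual, checkpoints)

for it in checkpoints:
    print(f"iteration {it}: threshold slope = {slopes_threshold[it]:.2f}, "
          f"gradual slope = {slopes_gradual[it]:.2f}")
```

For the threshold species the presences and absences are perfectly separated, so the likelihood keeps improving as the slope grows and the estimate never settles. For the gradual species the slope stabilizes after a few hundred iterations.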

This simulation study, like any other simulation study, is of course an oversimplification of reality. After all, we are dealing here with a species that responds to just one perfectly Gaussian environmental gradient. By simplifying our virtual species and landscape in this way we were able to show unequivocally the influence of a gradual change in occupancy rate over an environmental gradient on species distribution predictive performance. However, there is no reason to believe that making this model more complex would change the main conclusion of this study, i.e. that a gradual response reduces our ability to predict the actual occurrences but not the ability to estimate the probability of occurrence. Elith and Graham (2009), for example, modeled a slightly more complex species that responded in three different ways to three different environmental gradients. They found a similar result without looking at the causes and effects of these findings: the statistics calculated with the true (simulated) probabilities of occurrence were significantly below 1 (Table 3 in Elith and Graham 2009). Meynard and Quinn (2007) simulated an even more complex set of species types, some with linear, Gaussian or threshold responses, and some with combinations of these responses to three different real environmental gradients. Even though the environmental variables used were not perfectly Gaussian (they were real values taken from the state of California), the same pattern can be found in their results: virtual species that responded to a threshold in at least one of the gradients were easier to predict than species with non-threshold responses.
Although the modeling algorithm GARP stands out as particularly poor with respect to the other statistical tools (Meynard and Quinn 2007, Elith and Graham 2009), the differences in predictive performance among the rest of the models are not remarkably large and may depend more on the species properties than on the statistical model properties (Meynard and Quinn 2007, Elith and Graham 2009). Empirical studies have suggested similar conclusions, showing that species characteristics linked to life history traits may greatly influence our success rates when predicting species occurrences (Elith et al. 2006, Syphard and Franklin 2010). The fact that some species respond more gradually than others to different environmental gradients may therefore be an important factor explaining why some species are just more predictable than others.

Our results raise two major questions related to species distribution modeling. First, if we cannot rely solely on the quality of the presence-absence predictions, how do we measure the quality of predictive statistical models? And second, if we can rely on the predicted probability of occurrence but not on the predicted presence-absence patterns, how do we adapt our current uses of species distribution models to tools that directly incorporate model outputs rather than the species presence-absence distributions? Although we do not currently have complete solutions to these issues, several potential technical developments could be suggested.

First, our results show that relying on presence-absence is more uncertain than relying on species probabilities of occurrence if the model is good. We therefore believe that new evaluation methodologies should be developed based on the probabilities of occurrence rather than on the presence-absence estimate. One possibility would be to use training and testing datasets, as is current practice, but to compare the output probabilities of occurrence on both datasets. In this case, the realized probability of occurrence on the training and testing sets should match. This test is not sufficient on its own, though: more sophisticated tools are needed to explore situations where the wrong environmental variables are incorporated into the modeling strategies and situations where the sample prevalence is biased in both testing and training sets (Phillips et al. 2009, Lobo et al. 2010, Lobo and Tognelli 2011). The issue of sample prevalence has been repeatedly raised in the past (Jimenez-Valverde and Lobo 2007, Albert and Thuiller 2008, Jimenez-Valverde et al. 2009). Our simulations suggest that forcing the sample prevalence to 50%, as has been suggested before (Fielding and Bell 1997, Reese et al. 2005), is not a good strategy unless the species prevalence is itself near 50%. The best strategy is to have a sample prevalence that matches the species prevalence (Albert and Thuiller 2008), which can only be achieved with a good systematic sampling of the environmental space and extensive field surveys. An alternative would be to correct the estimated probabilities of occurrence based on the environmental sampling bias if it can be estimated from GIS or remote sensing data (Pakkala et al. 2002, Phillips et al. 2009, Ward et al. 2009).
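
One possible reading of a probability-based evaluation is a binned comparison of predicted probabilities with realized occupancy frequencies on an independent test set. The sketch below is only an illustration of that idea, not a method proposed in this paper. The test data come from an assumed logistic response with α = 1 and β = 0 over a standard normal gradient, and the two "models" (`p_good`, `p_shifted`) are hypothetical: one returns the true probabilities and the other is biased towards presences.

```python
import numpy as np

rng = np.random.default_rng(3)

def reliability_table(p_pred, y_obs, n_bins=5):
    """Mean predicted probability and realized occupancy frequency per quantile bin."""
    edges = np.quantile(p_pred, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, p_pred, side="right") - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        in_bin = bins == b
        rows.append((p_pred[in_bin].mean(), y_obs[in_bin].mean()))
    return np.array(rows)

def max_gap(table):
    """Largest absolute difference between predicted and realized values over bins."""
    return float(np.max(np.abs(table[:, 0] - table[:, 1])))

# Independent test set from a gradual species (assumed logistic response).
x_test = rng.standard_normal(20_000)
p_true = 1.0 / (1.0 + np.exp(-x_test))
y_test = (rng.random(x_test.size) < p_true).astype(float)

p_good = p_true                                      # hypothetical well-calibrated model
p_shifted = 1.0 / (1.0 + np.exp(-(x_test + 1.0)))    # hypothetical model biased towards presences

gap_good = max_gap(reliability_table(p_good, y_test))
gap_shifted = max_gap(reliability_table(p_shifted, y_test))
print(f"max gap, calibrated model: {gap_good:.3f}")
print(f"max gap, biased model:     {gap_shifted:.3f}")
```

A comparison of this kind rewards a model for recovering the probability of occurrence even when its presence-absence statistics are modest, but as noted above it would not detect a prevalence bias shared by the training and testing sets.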

Regarding the second issue, some of the applications of species distribution modeling are already adapted to consider directly a habitat suitability index or a probability of occurrence of a species rather than the species presence-absence map. For example, it has been current practice to use maps of species distributions in order to set conservation goals within a systematic conservation planning strategy (Moilanen et al. 2009). In most cases one could consider weighting the importance of each site according to the probability of occurrence rather than to an arbitrary presence-absence threshold, though further software development may be necessary to obtain biologically meaningful optimizations using this strategy (Wilson et al. 2005).

Obviously some biological data may behave in a more threshold-like fashion than others, so that the importance of a gradual response may be highly scale-dependent. For example, using fine-scale yearly occurrence data will probably result in a more stochastic and dynamic occupancy pattern than pooling data over several kilometers and decades of survey. Not surprisingly, SDMs have been shown to have higher performance (higher AUC, sensitivity, specificity) when used at coarse spatial scales rather than at fine spatial scales (Meyer 2007, Storch et al. 2008). Therefore our conclusions have to be evaluated in the light of the available data being considered for modeling: using presence-absence performance measures at fine scales may be more misleading than using them at coarser resolutions. However, there is always a grey area in between fine and coarse scales and it would be preferable to have SDM performance measures that are robust to such changes in temporal and spatial resolution. Moreover, if real world data were often threshold-like, a simple inspection (e.g. a boxplot of presences and absences with respect to different environmental gradients) would easily reveal such relationships so that no statistical modeling would be necessary. The expanding literature on increasingly sophisticated SDM techniques (Elith et al. 2006, Elith and Graham 2009) may therefore attest to the rarity of such simple species responses to environmental gradients.

In conclusion, our results show that gradual occupancy changes along environmental gradients decrease our ability to estimate species presence and absence patterns in a predictable way. However, given a good sampling strategy, we can still reproduce the general pattern in the change of occupancy probability as a function of environmental gradients. This can in part explain why some species are just harder to predict than others, and why comparisons between different statistical modeling techniques have been so inconclusive except for the very worst models (Elith et al. 2006, Elith and Graham 2009). We strongly advocate a revision of the current methods to evaluate model performance based on predictions of occurrence, and suggest that probabilities of occurrence should be used more often. However, theoretical and applied studies are needed to develop performance measures incorporating probabilities of occurrence that are adapted to real world modeling and management.

Acknowledgements

We greatly appreciated productive discussions with Wilfried Thuiller and Cécile Albert regarding many issues treated in this paper. This paper was funded by the INRA AAP-SPE project no. 470338 to CNM.

References

Supplementary material (Appendix E7157 at <www.oikosoffice.lu.se/appendix>). Appendix 1–3.