The AUC, i.e. the area under the receiver operating characteristic (ROC) plot, is a measure of the discriminatory capacity of classification models. After being developed for radar signal detection, ROC graphs were adopted in medical research (Pepe, 2000), and in the last decade the summary AUC index has been accepted as the standard measure for assessing the accuracy of species distribution models (SDMs; Fielding & Bell, 1997; Lobo et al., 2008). Taking into account sensitivity (Se), the proportion of instances of presence correctly predicted as presence, and specificity (Sp), the proportion of instances of absence correctly predicted as absence, the ROC curve plots Se versus (1 − Sp) (the commission error, i.e. the proportion of instances of absence wrongly predicted as presence) across all possible thresholds between 0 and 1. A model will be considered to discriminate better than chance if the curve lies above the diagonal of no discrimination, i.e. if the AUC is higher than 0.5 (Fig. 1; see Krzanowski & Hand, 2009, for complete details of the ROC methodology). This statistical summary of the ROC graph has been widely adopted by SDM researchers as a standard measure of overall discriminatory capacity because it is independent of the threshold. In other words, it avoids the potential arbitrariness associated with the selection of the threshold needed to build 2 × 2 contingency matrices for calculating Se, Sp and the commission and omission error rates, the omission error being the proportion of instances of presence wrongly predicted as absence (Lobo et al., 2008).
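The construction of the ROC curve described above can be illustrated with a minimal sketch. The function names (`roc_curve_points`, `auc`) and the six example sites are hypothetical and chosen only for illustration; the sketch assumes that a site is predicted as presence when its model score is greater than or equal to the threshold, and that the area is obtained by the trapezoidal rule.

```python
def roc_curve_points(scores, labels):
    """Return (1 - Sp, Se) pairs, one per threshold, from strictest to most lenient.

    scores: model outputs between 0 and 1; labels: 1 = presence, 0 = absence.
    Assumption: a site is predicted as presence when score >= threshold.
    """
    n_pres = sum(1 for y in labels if y == 1)
    n_abs = len(labels) - n_pres
    if n_pres == 0 or n_abs == 0:
        raise ValueError("need at least one presence and one absence")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted present
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        se = tp / n_pres          # sensitivity
        commission = fp / n_abs   # 1 - specificity
        points.append((commission, se))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented example: three presences and three absences.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
example_labels = [1, 1, 0, 1, 0, 0]
example_auc = auc(roc_curve_points(example_scores, example_labels))
print(round(example_auc, 3))  # 0.889
```

In this example the value equals 8/9: of the nine possible presence–absence pairs, eight have the presence scored above the absence. No single threshold had to be chosen to obtain it, which is the property that made the AUC attractive to SDM researchers.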
Most of the time, absence data (i.e. environmental data describing locations where absence of the species is almost certainly known) are not available and presence data (i.e. environmental data about a sample of locations that are known instances of presence) are the only information that modellers have about the species. In recent years, SDM researchers have devoted much of their effort to developing techniques to deal with this situation. For example, envelope and distance-based methods only require presence data (e.g. Busby, 1991; Farber & Kadmon, 2003). Other methods such as Maxent (Phillips et al., 2006) or GARP (Stockwell & Peters, 1999) use presence and background data (environmental data about a random sample of locations with no information about the absence or presence of the species) to train the algorithms. In the same way, background data have also been used with classical presence–absence techniques such as generalized linear models (e.g. Elith et al., 2006). To use the AUC without instances of absence, the ROC plot has to be modified so that, instead of being plotted against (1 − Sp), Se is plotted against the proportion of the background locations predicted as presences (or the proportionate area predicted as presence) for all possible thresholds (Phillips et al., 2006; Peterson et al., 2008). Under this new scenario, the models are still ranked according to their AUC, i.e. the higher the better (Phillips et al., 2006).
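The modified ROC plot can be sketched in the same spirit. This is an illustrative assumption about how the calculation might be carried out, not the implementation used by Maxent or any other package; the function name `presence_background_auc` and the example scores are hypothetical. The only change from the presence–absence case is the x-axis, which becomes the proportion of background locations predicted as presence.

```python
def presence_background_auc(presence_scores, background_scores):
    """AUC of the modified ROC plot built from presence and background data.

    x-axis: proportion of background locations predicted as presence.
    y-axis: sensitivity (Se) computed on the presence locations.
    Assumption: a location is predicted as presence when score >= threshold.
    """
    if not presence_scores or not background_scores:
        raise ValueError("need at least one presence and one background location")
    thresholds = sorted(set(presence_scores) | set(background_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score
    for t in thresholds:
        se = sum(1 for s in presence_scores if s >= t) / len(presence_scores)
        bg_fraction = sum(1 for s in background_scores if s >= t) / len(background_scores)
        points.append((bg_fraction, se))
    # Trapezoidal rule over consecutive points.
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented example: the background sample may itself contain suitable sites
# (here the location scored 0.85), since nothing is known about presence there.
example_presence = [0.9, 0.8, 0.6]
example_background = [0.85, 0.7, 0.5, 0.4, 0.3, 0.2]
example_pb_auc = presence_background_auc(example_presence, example_background)
print(round(example_pb_auc, 3))  # 0.833
```

Because background locations are not confirmed absences, a high-scoring background location lowers the value in exactly the way a commission error would, even if the species could in fact occur there.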
In this paper, by the general term SDMs I refer to both potential and realized distribution models sensu Jiménez-Valverde et al. (2008) and the corresponding ecological niche models (ENMs) and distribution models sensu Soberón (2010). Whereas potential distribution models (or ENMs) estimate the regions where the species could survive and reproduce due to the existence of suitable environmental conditions, realized distribution models estimate the regions where the species actually lives. Numerous applications of SDMs require estimations of the potential distribution, for instance the prediction of species invasions, the assessment of the impact of climate change on species distributions, the discovery of new species or the understanding of species evolutionary and biogeographic history (see Peterson, 2006). Estimates of realized distribution are of interest in many conservation planning studies (Peterson, 2006). Thus, the distinction is far from trivial and different strategies, in terms of data and modelling techniques, are required for approaching one concept or the other (Jiménez-Valverde et al., 2008; Soberón & Nakamura, 2009). In the same way, the strategies used to evaluate the models need to be different because the weight of commission errors is definitely lower in the case of the potential distribution than in the case of the realized distribution (Lobo et al., 2008; Peterson et al., 2008).
Recently, the AUC has been severely criticized in the SDM field (Lobo et al., 2008; Peterson et al., 2008). The need to evaluate the models and the current huge reliance of SDM researchers on the AUC statistic make it urgent that this matter is addressed. In this paper, I analyse the question of the supposed advantage of the AUC over threshold-dependent measures. I also address the implications of using the AUC depending on whether the goal of the research is to estimate the potential or the realized distribution, which is closely related to the situation in which no instances of absence are available and background data are used to evaluate the models.