Key details about the ROC plot
The ROC space is defined by the Se and (1 − Sp) axes; the upper-left corner of this space, i.e. the point (0, 1), satisfies the equation
Se = Sp = 1. (1)
The ROC curve for a model with perfect discriminatory capacity would go from point (0,0) to point (0,1), and from there to point (1,1) (Fig. 1). In theory, the AUC does not depend on a particular portion of the curve (Zweig & Campbell, 1993). However, in practice, the conjunction of three features of the ROC graph leads me to suggest that the AUC value strongly depends on certain points of the curve: (1) high discriminatory capacity corresponds to curves that are closer to point (0,1); (2) the ROC curve is, by definition, anchored to points (0,0) and (1,1); (3) the curve monotonically increases and is by nature convex. Thus, the ‘point of inflexion’ of the smoothed ROC graph (j) may be a determinant point in the curve. This point corresponds to the point at which the slope of the tangent to the ROC curve equals 1 (Fig. 1), or to the point at which the Youden index is maximized (Hilden, 1991), i.e.
J = max(Se + Sp − 1). (2)
From equation 2 it is straightforward that this point also maximizes (Se+Sp), i.e. it minimizes the error or misclassification rate (Kaivanto, 2008). It is also the point that maximizes the vertical distance between the ROC curve and the diagonal of no discrimination (Fig. 1; Perkins & Schisterman, 2005).
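As a rough illustration of the point j described above, the following Python sketch scans the candidate thresholds of a small data set and keeps the one that maximizes Se + Sp − 1. The function names, the scores and the rule that a site is predicted as present when its score is at or above the threshold are assumptions made for the example, not part of the original analysis.

```python
def roc_points(scores, labels):
    """Return (threshold, Se, Sp) for every distinct score used as threshold.

    Assumption: a site is predicted present when score >= threshold.
    labels: 1 = presence, 0 = absence.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def youden_threshold(scores, labels):
    """Return the (threshold, Se, Sp) triple that maximizes Se + Sp - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)


# Hypothetical suitability scores: five presences followed by five absences.
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.5, 0.4, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

t_j, se_j, sp_j = youden_threshold(scores, labels)
print(t_j, se_j, sp_j)
```

Because the example has as many presences as absences, the threshold returned is also the one with the highest proportion of correct classifications.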
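Some authors have suggested identifying the upper-left point of the ROC curve by applying the Pythagorean theorem (e.g. Freeman & Moisen, 2008), i.e. by choosing the point of the curve at the minimum distance from point (0,1), satisfying
min √[(1 − Se)² + (1 − Sp)²]. (3)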
However, note that although the length of the segment from point (0,1) to the point of the ROC curve that satisfies equation 3 is geometrically intuitive, it has no probabilistic basis, and therefore no practical interpretation (J. Hilden, pers. comm.). In fact, the point identified in this way is not the one with the minimum overall error rate and does not have to coincide with j (Perkins & Schisterman, 2006). This misconception originates in the usual statement in the ROC literature that the nearer the ROC curve is to the upper-left corner of the ROC space, the higher the discriminatory capacity of the model. Proximity to the point (0,1) is meant here as an imprecise notion; it was never intended to be naively interpreted as Euclidean distance (J. Hilden, pers. comm.). Zweig & Campbell (1993, p. 565) wrote ‘Qualitatively, the closer the plot is to the upper left corner, the higher the overall accuracy of the test’. The adjective ‘qualitative(ly)’, which appears twice more in the same context in that paper (see p. 566), calls attention to the vagueness of the statement.
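The difference between the two criteria can be checked numerically. The sketch below uses three invented operating points, given as (threshold, Se, Sp), and selects one by the Youden index and one by the Euclidean distance to point (0,1). The values are hypothetical and were chosen only so that the two criteria disagree.

```python
import math

# Hypothetical operating points on a ROC curve: (threshold, Se, Sp).
points = [
    (0.7, 0.60, 0.95),
    (0.5, 0.75, 0.78),
    (0.3, 0.95, 0.40),
]


def youden(point):
    """Youden index Se + Sp - 1 of an operating point."""
    _, se, sp = point
    return se + sp - 1


def distance_to_corner(point):
    """Euclidean distance from the operating point to the corner (0, 1)."""
    _, se, sp = point
    return math.hypot(1 - se, 1 - sp)


best_youden = max(points, key=youden)
closest = min(points, key=distance_to_corner)

print("Youden point:", best_youden)
print("Closest to (0,1):", closest)
```

Here the Youden criterion keeps the threshold 0.7 (Se + Sp = 1.55), whereas the distance criterion keeps the threshold 0.5 (Se + Sp = 1.53), so the point nearest the corner is not the one that maximizes Se + Sp.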
Another important point is the one at which the ROC curve crosses the perpendicular to the line of no discrimination (h; Fig. 1), satisfying
Se = Sp. (4)
Its centred location in the ROC curve makes this point intuitively important in the determination of the AUC. Since the points (0,0) and (1,1) are fixed, and because the ROC curve is usually convex, variations in the location of h in the ROC space may have the most dramatic changes in the AUC.
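With empirical data the condition Se = Sp is rarely met exactly, so one simple option is to take the threshold at which the absolute difference |Se − Sp| is smallest. The sketch below does this for a small invented data set; the helper names, the scores and the rule that a site is predicted as present when its score is at or above the threshold are assumptions of the example.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Se and Sp when sites with score >= threshold are predicted present."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def balanced_threshold(scores, labels):
    """Threshold whose Se and Sp are closest to each other (approximates h)."""
    def imbalance(t):
        se, sp = sensitivity_specificity(scores, labels, t)
        return abs(se - sp)
    return min(sorted(set(scores)), key=imbalance)


# Hypothetical suitability scores: five presences followed by five absences.
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.5, 0.4, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

t_h = balanced_threshold(scores, labels)
se_h, sp_h = sensitivity_specificity(scores, labels, t_h)
print(t_h, se_h, sp_h)
```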
Let T be the threshold for a certain point in the curve, so that Tj and Th are the thresholds corresponding to points j and h. Note that if the costs of omission and commission errors are the same, these two thresholds are appealing because: (1) given the trade-off between Se and Sp (an increase in Se implies a decrease in Sp, and vice versa; Shapiro, 1999), Th balances the rate of correctly predicted presences and correctly predicted absences, and (2) Tj maximizes the rate of correct classifications. Additionally, if the prevalence is 0.5, Th balances the costs of making commission and omission errors and Tj minimizes the overall misclassification cost. These are desirable properties in a well-performing classifier (Fielding & Bell, 1997; Kaivanto, 2008).
Here, I used a virtual species to study the relationship between the AUC value and the Se and Sp obtained using Tj and Th, to test the hypothesis that the AUC is closely related to certain points in the ROC curve. Furthermore, for comparison, and because it is widely used in SDM (Freeman & Moisen, 2008; but see the Discussion), I also explored the relationship with the threshold that maximizes the kappa statistic (κ; Cohen, 1960). To corroborate the consistency of the main result in a real-world situation, I used the results of the SDMs of 48 arthropod species on Terceira Island published in Jiménez-Valverde et al. (2009b).
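For readers unfamiliar with the kappa-maximizing threshold, the sketch below computes Cohen's κ from the confusion matrix at each candidate threshold and keeps the threshold with the highest value. The data, the function names and the rule that a site is predicted as present when its score is at or above the threshold are assumptions of the example, which is not the procedure used to obtain the results reported here.

```python
def kappa_at(scores, labels, threshold):
    """Cohen's kappa when sites with score >= threshold are predicted present."""
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = n - tp - fp - fn
    observed = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    if expected == 1:
        return 0.0  # degenerate table; kappa is undefined, reported as 0 here
    return (observed - expected) / (1 - expected)


def kappa_threshold(scores, labels):
    """Return (threshold, kappa) for the threshold that maximizes kappa."""
    best = max(sorted(set(scores)), key=lambda t: kappa_at(scores, labels, t))
    return best, kappa_at(scores, labels, best)


# Hypothetical data with prevalence 0.3: three presences, seven absences.
scores = [0.9, 0.7, 0.4, 0.8, 0.5, 0.3, 0.25, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

t_k, k_max = kappa_threshold(scores, labels)
print(t_k, round(k_max, 3))
```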
ROC analysis with background data
A common situation in SDMs is a complete lack of absence data (Lobo et al., 2010). As a result, in recent years, despite its weaknesses (see Warton & Shepherd, 2010), an approach taken from the resource selection literature (Manly et al., 2002) has gained popularity with SDM researchers. Background data are randomly selected from the area of study, and the idea behind this approach is to find a function which discriminates between the instances of presence and background locations. When no instances of absence are available, and (1 − Sp) is replaced by the number of background data predicted as present in the ROC plot, Wiley et al. (2003) and Phillips et al. (2006) argued that the maximum AUC value (AUCmax) depends on the actual (unknown) area of distribution of the species. As a result, AUCmax no longer has to be equal to 1, and is inversely related to the area of distribution. In other words, the bigger the area, the lower AUCmax will be. In practice, since the area of distribution of the species is unknown (this is why the models are needed!), AUCmax is also unknown. This simply recognizes that among the background data there are presences, and that the main interest is in identifying those background data that could be presences based on the predictors. This is a clear acknowledgement that background data are in no way a ‘gold standard’ (a group of cases, independent from the training set, in which the state of the dependent variable, i.e. presence or absence of the species, is known without error). Notice that, in the same way that the AUCmax no longer has to be 1, the AUC corresponding to a no-better-than-chance prediction no longer has to be 0.5. Every point along the line of no discrimination in the ROC plot (Fig. 1) satisfies
Se = 1 − Sp. (5)
The explanation for equation 5 is that if a model predicts no better than chance, one expects the same proportion of correctly and wrongly predicted presences and absences (Fawcett, 2006). However, when one explicitly acknowledges that an unknown percentage of background data are in fact instances of presence, this reasoning no longer makes any sense. For this same reason, the comparison of models between species is flawed as there is no point in considering species with the highest AUC values to be better predicted than those species with the lowest ones; this will entirely depend on the unknown area of distribution of each species. That the boundaries of the AUC are no longer fixed at 0.5 and 1, and that higher AUC values are no longer equivalent to better predictions, implies that AUC theory is clearly violated, hampering the evaluation and comparison of models.
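The dependence of AUCmax on the area of distribution can be illustrated with a small, idealized calculation. The sketch below assumes a study area divided into equal cells, a perfect binary model that scores every occupied cell 1 and every other cell 0, presence records taken from all occupied cells, and background data consisting of every cell in the area (a perfectly representative background sample). These assumptions, and the function names, are introduced only for the example.

```python
def auc(presence_scores, background_scores):
    """AUC as the proportion of presence/background pairs ranked correctly.

    Ties between a presence and a background score count as one half.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def auc_of_perfect_model(n_cells, n_occupied):
    """AUC against background for a model that predicts the range perfectly.

    Assumption: the background is every cell of the study area, so it
    contains the occupied cells as well as the unoccupied ones.
    """
    presence_scores = [1] * n_occupied
    background_scores = [1] * n_occupied + [0] * (n_cells - n_occupied)
    return auc(presence_scores, background_scores)


# Hypothetical species occupying 20% and 60% of a 100-cell study area.
auc_small_range = auc_of_perfect_model(100, 20)
auc_large_range = auc_of_perfect_model(100, 60)
print(auc_small_range, auc_large_range)
```

Under these assumptions the perfect model reaches an AUC of 0.9 for the species occupying 20% of the area and only 0.7 for the species occupying 60% of it, so the species with the larger range obtains the lower value even though both are predicted without error.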
Evaluating potential or realized distribution models
Different situations require different objectives; the distinction between potential and realized distribution is essential. Instances of presence and instances of absence, both coming from a standardized and unbiased sampling, are necessary if the goal is to estimate the realized distribution (Jiménez-Valverde et al., 2008; Soberón & Nakamura, 2009; Ward et al., 2009). Most importantly, unbiased instances of absence are a must if one wants to evaluate how accurately a model estimates a species' realized distribution (Jiménez-Valverde et al., 2008). However, in the case of the potential distribution, the evaluation of the models is not so straightforward as there is no ‘gold standard’ available for validation. The situation is analogous to the use of background data to evaluate the models; since the potential distribution of the species is unknown, there is no reason to penalize the models for the number of absence or background data – or the extent of the area – predicted as presence. Here, I will use a virtual species to show the implications of using the AUC to evaluate estimations of potential distributions.