Do habitat suitability models reliably predict the recovery areas of threatened species?

Authors

  • Carmen Cianfrani,

    Corresponding author
    1. Department of Science and Technology for the Environment, University of Molise, I-86090, Pesche, Italy
    2. Department of Ecology and Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland
      *Correspondence author. E-mail: carmencianfrani@hotmail.it
    • Both authors contributed equally.

  • Gwenaëlle Le Lay,

    1. Department of Ecology and Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland
    2. Landscape Modeling Research Group, Federal Research Institute WSL, Zürcherstrasse 111, CH-8903 Birmensdorf, Switzerland
    • Both authors contributed equally.

  • Alexandre H. Hirzel,

    1. Department of Ecology and Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland
    2. Informatics’ Center, University of Lausanne, CH-1015 Lausanne, Switzerland
  • Anna Loy

    1. Department of Science and Technology for the Environment, University of Molise, I-86090, Pesche, Italy


Summary

1. Identifying areas suitable for recolonization by threatened species is essential to support efficient conservation policies. Habitat suitability models (HSMs) predict species’ potential distributions, but the quality of their predictions should be carefully assessed when the species-environment equilibrium assumption is violated.

2. We studied the Eurasian otter Lutra lutra, whose numbers are recovering in southern Italy. To produce widely applicable results, we chose standard HSM procedures and assessed the models’ capacity to predict the suitability of a recolonization area. We used two fieldwork datasets: presence-only data, used in an Ecological Niche Factor Analysis (ENFA), and presence-absence data, used in a Generalized Linear Model (GLM). In addition to cross-validation, we evaluated the models independently with data from a recolonization event, which provided presences on a previously unoccupied river.

3. Three of the models successfully predicted the suitability of the recolonization area, but the GLM built with data collected before the recolonization disagreed with these predictions: it missed the suitability of the recolonized river and poorly described the otter’s niche. Our results highlight three points relevant to modelling practice: (1) absences may prevent models from correctly identifying areas suitable for a species’ spread; (2) the selection of variables may introduce randomness into the predictions; and (3) the area under the ROC curve (AUC), a commonly used validation index, was not well suited to evaluating model quality, whereas the continuous Boyce index (CBI), based on presence data only, better reflected the models’ fit to the recolonization observations.

4. For species with unstable spatial distributions, presence-only models may work better than presence-absence methods in making reliable predictions of suitable areas for expansion. An iterative modelling process, using new occurrences from each step of the species spread, may also help in progressively reducing errors.

5. Synthesis and applications. Conservation plans depend on reliable models of the species’ suitable habitats. In non-equilibrium situations, as is the case for threatened or invasive species, models may be negatively affected by the inclusion of absence data when predicting areas of potential expansion. In such situations, presence-only methods provide a better basis for productive conservation management.

Introduction

Sound wildlife management policies depend critically on our ability to predict the spatial distribution of species, both in their current situation and in the future. Threatened species and communities, invading alien species, species whose habitat will be altered by climate change, and recovering populations of rare species are all cases where predicting future species distributions is paramount. In the last case, identifying the location of habitats suitable for potential colonization is crucial to producing efficient, long-term conservation strategies. In this context, habitat suitability models (HSMs), which produce maps of the distribution of suitable habitats, are fundamental tools because they provide the geographic perspective that conservation strategies require (Barbosa 2003; Rondinini et al. 2005; Pearce & Boyce 2006).

HSMs have been applied to a wide range of species and issues for several years. Many methods are available, but some are used more commonly than others (Guisan & Zimmermann 2000; Scott et al. 2002; Hirzel & Le Lay 2008), in particular those using species’ presence and absence data. Some methods were developed earlier than others, and their relative simplicity probably contributed to their success. Several analyses have compared HSM performance (Brotons et al. 2004; Tsoar et al. 2007; Elith & Graham 2009) and have recommended that the input data and the application goals be considered carefully when choosing an HSM method (e.g. Elith & Graham 2009). HSMs rely on the species-environment equilibrium assumption (Guisan & Zimmermann 2000), which supposes that a species occupies all the suitable habitats available to it. Whether this assumption holds is essential for modelling the species’ ecological niche accurately: when it does, absence data indicate areas unsuitable for the species and can therefore usefully restrict habitat suitability (HS) predictions (Elith & Graham 2009), making presence-absence methods the most suitable in many situations. However, when populations are threatened by human perturbations (hunting, pollution, etc.) or subject to dispersal limitations (e.g. ecological barriers), the species does not occupy all suitable areas. The significance of observed absences should then be carefully addressed (Pulliam 2000; Gibson et al. 2007; Hirzel & Le Lay 2008; Lobo et al. 2008), because using them in HSMs could produce unreliable predictions, as demonstrated with a virtual species dataset (Hirzel et al. 2001).

Although the species-environment equilibrium assumption of HSMs has already been questioned (Guisan & Thuiller 2005; Hirzel & Le Lay 2008), its consequences, surprisingly, have never been thoroughly tested. Practical constraints may explain this. For instance, data reflecting the expanding distribution of a species are rare: surveys have to be conducted over areas larger than the currently known distribution, over several years, and intensively enough to ensure that presences and absences are identified correctly. Modelling methods add further difficulties: input data, model algorithms, types of results and validation methods all vary, which complicates comparisons. Nevertheless, as HSMs provide useful information, managers have to rely on the available methods and data to build them. Identifying the domain of application of the models is important at several stages of conservation planning, yet researchers and managers still lack extensive tests (Elith & Graham 2009), especially with data from real cases (Brotons et al. 2004). The availability of pre- and post-colonization data for a threatened, recovering species, the Eurasian otter Lutra lutra in Italy, gave us the opportunity to assess how efficiently modelling methods identify areas suitable for recolonization.

The Eurasian otter is a semi-aquatic carnivore that was once widespread in Europe. Its distribution has contracted sharply in the last few decades, but more recently the species has recovered at a local scale in several European countries (Kranz 2000). In Italy, the species was listed as critically endangered in the Red List of Italian vertebrates (Bulgarini et al. 1998). Before 1950, the species was distributed over the whole country (Cagnolaro et al. 1975), but by 2002 it was confined to a small part of southern Italy (Spagnesi et al. 2002). The causes of its decline include hunting, food shortages (mainly of fish) and the destruction of riparian vegetation (Mason 1989). At present, the Italian otter population consists of two apparently isolated sub-populations (Fig. 1) that are slowly recovering (Prigioni et al. 2007). The ongoing expansion, the small size of the Italian otter population and the presence of two disjoint sub-populations all make effective conservation strategies of paramount importance. Key priorities are (i) protecting areas suitable for otter recolonization and (ii) improving our understanding of the otter’s habitat requirements.

Figure 1.

 Sampling design and study area. Points refer to the fieldwork observations; years of the surveys are indicated in brackets. Black lines and names refer to the rivers. The inset (top right) shows the ranges of the two Italian sub-populations (grey areas).

Our goal was to test the ability of HSMs to identify the locations and characteristics of areas potentially suitable for the recovery of the Eurasian otter in Italy. More generally, we expected the results to help define guidelines for the appropriate use of HSMs, especially in non-equilibrium situations such as those of spreading species. We considered two species distribution datasets, the first collected before and the second after a recolonization event. We assumed that the first situation was not at equilibrium and that the second had reached a sub-equilibrium state. We used two common HSM methods, one based on presence-only data and the other on presence and absence data. We computed and cross-validated models based on the pre-colonization dataset and then validated them externally with the post-colonization dataset. The modelling methods were deliberately chosen from among commonly used methodologies, as we wanted our results to be useful to other researchers and conservation practitioners.

Materials and methods

Study area

The study area covers the northern sub-population of the Italian otter range, mostly located in the Molise region (Fig. 1). This area comprises seven river catchments: Sangro, Biferno, Trigno, Fortore, Saccione, Sinarca, and the upper part of the Volturno River. We described the study area using variables related to the ecological requirements of the otter and to potential disturbances (Table 1). Variables were prepared as maps in a geographic information system (ArcGIS 9.3; ESRI, Redlands, USA).

Table 1.   Environmental predictors used in habitat suitability models
Category | Predictor name | Description
Resting site availability | FOREST | Frequency of deciduous forest in a 2·5 km radius
 | SCLEROPH | Frequency of sclerophyllous vegetation in a 2·5 km radius
 | HERB-CROP | Frequency of dry herbaceous cropland in a 2·5 km radius
 | ARBOR-CROP | Frequency of arboreal cropland in a 2·5 km radius
 | AGR-HETER | Frequency of heterogeneous agricultural areas in a 2·5 km radius
Disturbance | CITIES | Distance from cities
 | MINES | Distance from surface excavations
 | INDUSTRIAL | Distance from productive areas
Water availability | FLOW-ACC | Flow accumulation
Food | CONVEXITY | Convexity (hunting efficiency)
 | SLOPE | Slope (hunting capacity)
 | FISH-BIOD | Fish biodiversity

Otters are attracted by the availability of food, breeding dens and low levels of human disturbance (Kruuk 2006). Their diet consists mostly of fish (Ruiz-Olmo 2001). Preliminary analyses (unpublished data) indicated that otter presence is probably linked to high fish biodiversity. To obtain a continuous map of fish biodiversity (F) for the study area, we fitted a generalized linear model (GLM) expressing F, measured at 59 electrofishing sites (Regione Molise 2005), as a function of elevation E and slope S: F ∼ E + S (all coefficients were highly significant; adjusted deviance 45·8). Slope and convexity are important descriptors influencing the hunting capacity of individuals (Kruuk 2006). Both variables were derived from a digital elevation model of 20 m resolution. Convexity was computed in a moving circular window of 1 km diameter; positive (negative) values indicate a convex (concave) terrain shape. Vegetation cover is related to the availability of potential resting sites: during the day, otters rely on riparian vegetation for resting, reproduction and cub rearing (Beja 1996). We considered the distances to cities, excavated surfaces and productive sites as the main metrics of disturbance (Prenda 1996). We used a land-use map at a scale of 1:5000, derived by interpreting the most recent high-resolution digital orthophotograph and rasterized at 20 m resolution (Loy et al. in press-b). Land-use classes were then transformed into frequency maps computed in a moving circular window of 5 km diameter. This diameter was chosen in relation to the otter’s home range size, which has been estimated at between 5 and 50 linear km (Kruuk 2006).
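
The land-use frequency maps described above can be sketched as a circular moving window over a classified raster. The Python below is a minimal illustration under stated assumptions, not the ArcGIS procedure used in the study. `circular_offsets` and `class_frequency` are hypothetical helpers. A cell counts as inside the window when its centre lies within the radius, and window cells falling outside the raster are ignored. With the settings in the text, a call would use `radius_m=2500` and `cell_size_m=20`, a radius of 125 cells.

```python
def circular_offsets(radius_cells):
    """Row/column offsets of the cells whose centres lie within the radius."""
    r2 = radius_cells * radius_cells
    return [
        (dr, dc)
        for dr in range(-radius_cells, radius_cells + 1)
        for dc in range(-radius_cells, radius_cells + 1)
        if dr * dr + dc * dc <= r2
    ]


def class_frequency(landuse, target_class, radius_m, cell_size_m):
    """Frequency (0-1) of target_class in a circular window around every cell (hypothetical helper)."""
    nrows, ncols = len(landuse), len(landuse[0])
    offsets = circular_offsets(int(radius_m // cell_size_m))
    freq = [[0.0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            inside = hits = 0
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:  # ignore cells off the raster
                    inside += 1
                    hits += int(landuse[rr][cc] == target_class)
            freq[r][c] = hits / inside
    return freq
```

The explicit loops are slow on large rasters. A summed-area table or a GIS focal-statistics tool would be the practical choice, but this version makes the definition easy to check.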

Water availability is an important parameter (Beja 1992). We calculated flow accumulation as the number of upslope cells that flow into each cell. All predictors were standardized so that variables measured in different units would have comparable influence in the models.
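
Flow accumulation, as defined above, can be illustrated with the D8 scheme, in which each cell drains to its steepest downhill neighbour. The text does not say which routing algorithm was used, so D8 is an assumption here. `d8_receivers`, `flow_accumulation` and `standardize` are hypothetical names. The last function shows the z-score standardization applied to every predictor, assuming the population standard deviation.

```python
import math
import statistics

# Offsets of the eight neighbours used by the D8 scheme
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_receivers(dem):
    """Map each cell to its steepest-descent neighbour, or None for pits and outlets."""
    nrows, ncols = len(dem), len(dem[0])
    receivers = {}
    for r in range(nrows):
        for c in range(ncols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    # Drop per unit distance (diagonal neighbours are sqrt(2) cells away)
                    slope = (dem[r][c] - dem[rr][cc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best, best_slope = (rr, cc), slope
            receivers[(r, c)] = best
    return receivers


def flow_accumulation(dem):
    """Number of upslope cells draining into each cell (the cell itself excluded)."""
    receivers = d8_receivers(dem)
    acc = {cell: 0 for cell in receivers}
    # Highest cells first, so each donor's total is final before it is passed downstream
    for cell in sorted(receivers, key=lambda rc: dem[rc[0]][rc[1]], reverse=True):
        target = receivers[cell]
        if target is not None:
            acc[target] += acc[cell] + 1
    return acc


def standardize(values):
    """z-scores: subtract the mean and divide by the population standard deviation."""
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mean) / sd for v in values]
```

Processing cells from highest to lowest works because a D8 receiver is always strictly lower than its donor, so no cell is passed on before all of its upslope contributions have arrived.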

The study area was restricted to a 150 m buffer around rivers, as otters are rarely found more than 150 m from streams (Philcox et al. 1999; Kruuk 2006). We considered only main river courses and first-order tributaries, as the presence of permanent water is an important factor for the otter (Prenda 2001).
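
Restricting the study area to a 150 m river buffer can be sketched on a raster by keeping every cell whose centre lies within 150 m of a river cell. This brute-force Python version only makes the rule explicit. The function name and the grid representation (a 0/1 river raster) are assumptions, and a GIS buffer or distance transform would be used on real data.

```python
import math


def river_buffer_mask(river, cell_size_m, buffer_m=150.0):
    """Boolean raster: True where a cell centre lies within buffer_m of a river cell (hypothetical helper)."""
    river_cells = [(r, c) for r, row in enumerate(river) for c, v in enumerate(row) if v]
    mask = []
    for r in range(len(river)):
        mask_row = []
        for c in range(len(river[0])):
            # Euclidean distance, in metres, to the nearest river cell centre
            nearest = min(
                (math.hypot(r - rr, c - cc) * cell_size_m for rr, cc in river_cells),
                default=math.inf,
            )
            mask_row.append(nearest <= buffer_m)
        mask.append(mask_row)
    return mask
```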

Species data

In each 10 × 10 km UTM grid cell, we searched for otter droppings (spraints) at four random sites (Loy et al. 2009). A site consisted of a 600 m stretch of river, surveyed on both banks. The first survey was carried out from 2001 to 2004 (174 sites) and revealed that otter occurrence was restricted to the Biferno and upper Volturno river basins, with only sporadic records in the Fortore. No sign of otter presence was found on the Trigno, Saccione, Sangro or Sinarca Rivers (Fig. 1). The second survey, carried out on the Sangro River (24 sites) from 2006 to 2007, revealed that this river had recently been recolonized (Fig. 1).

For our analyses, we distinguished three datasets (Fig. 2). The first dataset (‘01-04’) includes data from the 2001–2004 surveys, excluding the absence data from the Sangro River: as the 2006–2007 survey showed, these absences did not indicate unsuitable habitat. This pre-colonization dataset reflects non-equilibrium conditions. The second dataset (‘01-07’), identified as post-recolonization, includes all survey data and reflects sub-equilibrium conditions. The third dataset (‘rec’) contains only data collected from the newly colonized Sangro River in 2006–2007 and was used to evaluate how well the HSM predictions performed on the recolonized river (Fig. 2).

Figure 2.

 Modelling framework. Two species datasets from fieldwork, one from before (dataset 01-04) and one from after (dataset 01-07) a recolonization event, are used to calibrate and cross-validate ENFAs and GLMs with two indices, the AUC and the continuous Boyce index (CBI). Data from the recolonization event (dataset rec) are used to evaluate the pre-recolonization models independently. Models obtained before and after the recolonization event are compared with Spearman’s rank correlation coefficient (rho) and Cohen’s agreement coefficient (Kappa).

All absences recorded within 5 km of any presence, measured along the river network (Janssens 2006), were rejected (see Appendix S1 in Supporting Information), as they may reflect non-detection of the species rather than true absence (MacKenzie et al. 2003).
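
The 5 km filter measures distance along the river network rather than in a straight line. One way to sketch it is a multi-source Dijkstra search from all presence sites over a graph of river reaches. The graph representation, the function names and the treatment of an absence lying exactly 5 km away (kept here) are assumptions for illustration, not details taken from Appendix S1.

```python
import heapq


def network_distances(graph, sources):
    """Multi-source Dijkstra: shortest along-river distance to the nearest source.

    graph: {node: [(neighbour, length_km), ...]} listing each reach in both directions.
    """
    dist = {node: float("inf") for node in graph}
    heap = [(0.0, s) for s in sources]
    for s in sources:
        dist[s] = 0.0
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry: a shorter path was already found
        for neighbour, length in graph[node]:
            candidate = d + length
            if candidate < dist[neighbour]:
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


def filter_absences(graph, presences, absences, min_km=5.0):
    """Drop absences lying less than min_km along the river from any presence."""
    dist = network_distances(graph, presences)
    return [a for a in absences if dist[a] >= min_km]
```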

Habitat suitability modelling

We selected two standard methods of habitat suitability modelling: the Ecological Niche Factor Analysis (ENFA; Hirzel et al. 2002), which uses presence-only data, and the Generalized Linear Model (GLM; McCullagh & Nelder 1989), which uses presence-absence data. Each approach was applied twice: first before the recolonization event, on dataset ‘01-04’ (ENFA1 and GLM1), and second after it, on dataset ‘01-07’ (ENFA2 and GLM2). All four models were cross-validated, and the two pre-colonization models were also validated independently (Fig. 2). The models were finally projected onto geographic space to produce habitat suitability (HS) maps. Using the predicted-to-expected evaluation-point frequency curves (P/E curves; see Appendix S2 in Supporting Information) obtained during the validation process, HS predictions were reclassified into three classes: unsuitable, suitable and optimal. Thresholds were based on the mean (μ) and standard deviation (σ) of the P/E curves: HS values with μ below 1 were classed as unsuitable, those with μ between 1 and σ as suitable, and those with μ above σ as optimal. All modelling procedures were carried out in Biomapper 4.0 (Hirzel et al. 2008).
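
Once a P/E curve is available, the three-class reclassification reduces to comparing the mean P/E ratio at each HS value with two thresholds. In this hypothetical Python sketch, the lower threshold is 1, as described above. The upper threshold is passed in as `upper` so that it can be set from the standard deviation of the P/E curve. Looking up the nearest point of the curve is a simplification and need not match Biomapper's implementation.

```python
def reclassify(hs, pe_curve, upper):
    """Assign 'unsuitable', 'suitable' or 'optimal' to a habitat-suitability value.

    pe_curve: list of (hs_value, mean_pe) pairs from the validation step.
    upper: hypothetical upper threshold on the mean P/E ratio (sigma-based in the text).
    """
    # Mean P/E ratio at the curve point closest to this HS value
    _, mu = min(pe_curve, key=lambda point: abs(point[0] - hs))
    if mu < 1.0:
        return "unsuitable"
    if mu <= upper:
        return "suitable"
    return "optimal"


curve = [(10, 0.2), (30, 0.7), (50, 1.2), (70, 1.9), (90, 3.5)]
classes = [reclassify(hs, curve, upper=2.0) for hs in (5, 48, 72, 95)]
```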

Ecological niche factor analysis

We applied ENFA to the presence data following the standard Biomapper procedures. ENFA quantifies the species’ ecological niche by comparing the environmental characteristics of the sites it occupies (the ‘species distribution’) with those of the whole study area (the ‘global distribution’) (Hirzel et al. 2002). To compute habitat suitability, we chose the geometric-mean distance algorithm, as recommended by Hirzel & Arlettaz (2003). The environmental predictors are summarized into a few uncorrelated ecological niche factors. The first factor accounts for all of the species’ marginality, i.e. the difference between the species’ mean and the global mean. The other factors account for specialization, i.e. the narrowness of the niche relative to the global variance; the inverse of specialization indicates the species’ tolerance. For each environmental predictor, scores for marginality, specialization and tolerance (the weighted sum of the specialization coefficients) can be calculated, and the importance of a predictor is given by the weighted sum of these scores.
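
The marginality part of ENFA can be computed directly from the environmental values of the whole study area and of the presence cells. The sketch follows the definition in Hirzel et al. (2002). Each predictor's marginality is the difference between the species mean and the global mean, expressed in units of the global standard deviation. The global marginality divides the norm of these values by 1·96, so that it usually falls between 0 and 1. Specialization and tolerance require an eigen-analysis of the covariance matrices and are left out. Function and argument names are hypothetical.

```python
import math
import statistics


def enfa_marginality(global_values, presence_values):
    """Per-predictor marginality and global marginality M (after Hirzel et al. 2002).

    global_values / presence_values: {predictor: list of values} for all study-area
    cells and for the presence cells, respectively.
    """
    per_predictor = {}
    for name, values in global_values.items():
        mean_g = statistics.fmean(values)
        sd_g = statistics.pstdev(values)
        mean_s = statistics.fmean(presence_values[name])
        # Difference between species and global means, in global SD units
        per_predictor[name] = (mean_s - mean_g) / sd_g
    # Norm of the marginality vector, divided by 1.96 so that M usually lies in [0, 1]
    m_global = math.sqrt(sum(m * m for m in per_predictor.values())) / 1.96
    return per_predictor, m_global
```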

Generalized linear model

We used a GLM with a binomial variance and a logistic link function to relate species presence-absence to the environmental variables. GLMs were fitted in R (R Development Core Team 2006) in conjunction with Biomapper. We used an automatic forward stepwise model-selection procedure based on the Akaike information criterion (AIC; Akaike 1973) to select the most parsimonious model. Although this selection procedure has been criticized (e.g. Johnson 1981; Burnham & Anderson 2002; Whittingham et al. 2006), it remains standard practice in most studies (e.g. Elith & Graham 2009; Roura-Pascual et al. 2009). Up to second-order polynomials (linear and quadratic terms) were allowed for each predictor. The fitted GLM is a formula in which the selected variables appear with their coefficients. As the environmental predictors were standardized, we used these coefficients as indicators of the predictors’ importance.
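
The GLM step can be sketched in plain Python as a binomial GLM with a logit link, fitted by Newton-Raphson (iteratively reweighted least squares). The fit is wrapped in a forward stepwise search that adds, one at a time, the term that lowers AIC = 2k − 2 ln L the most, and stops when no term helps. This is an illustration, not the R procedure used in the study. The helper names are hypothetical. Linear and quadratic terms are offered as separate candidate terms, which is only one of several possible conventions. The synthetic data exist only to exercise the code.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination with partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, iters=25):
    """Binomial GLM with logit link via Newton-Raphson; returns (beta, log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        # Information matrix X'WX, with a tiny ridge for numerical safety
        info = [[1e-8 if j == k else 0.0 for k in range(p)] for j in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    loglik = 0.0
    for xi, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        loglik += math.log(mu) if yi else math.log(1.0 - mu)
    return beta, loglik


def forward_aic(data, y, candidates):
    """Forward stepwise term selection by AIC (the intercept is always kept)."""
    n = len(y)

    def aic(terms):
        X = [[1.0] + [data[t][i] for t in terms] for i in range(n)]
        _, loglik = fit_logistic(X, y)
        return 2 * (len(terms) + 1) - 2 * loglik

    selected, best = [], aic([])
    while True:
        remaining = [t for t in candidates if t not in selected]
        if not remaining:
            return selected, best
        scores = {t: aic(selected + [t]) for t in remaining}
        term = min(scores, key=scores.get)
        if scores[term] >= best:  # no candidate improves AIC: stop
            return selected, best
        selected.append(term)
        best = scores[term]


# Synthetic standardized predictors: presence depends on x1 only
rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(300)]
x2 = [rng.gauss(0, 1) for _ in range(300)]
y = [1 if rng.random() < 1 / (1 + math.exp(-2.0 * v)) else 0 for v in x1]
data = {"x1": x1, "x1^2": [v * v for v in x1], "x2": x2, "x2^2": [v * v for v in x2]}
terms, best_aic = forward_aic(data, y, list(data))
```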

Evaluation of model predictions

Assessing the predictive ability of a model is a crucial step towards its application in conservation ecology (Fielding & Bell 1997; Manel et al. 2001).

We performed two types of evaluation: (i) an internal evaluation, by cross-validation of all four models, and (ii) an external evaluation of ENFA1 and GLM1 with the ‘rec’ dataset (Fig. 2).

For internal evaluation, we performed a k-fold cross-validation (Fielding & Bell 1997; Manel et al. 2001). This consists of partitioning the species dataset into k subsets, building a model on k − 1 subsets and validating it with the remaining subset. The procedure is repeated k times, providing a mean and variance for the validation measure. We used k = 4, following Huberty’s rule (Huberty 1994) (Fig. 3).
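
The k-fold scheme, and the k implied by Huberty's rule, can be sketched as follows. The rule is taken here in its usual form: a validation share of 1/(1 + √(p − 1)) for p predictors. With the 12 predictors of Table 1, this gives about 4·3 folds, consistent with the k = 4 used. The helper names, the fixed-seed shuffling and the `fit`/`score` callables are hypothetical placeholders for whatever model and validation index are being cross-validated.

```python
import math
import random
import statistics


def huberty_k(n_predictors):
    """Number of folds implied by Huberty's rule (validation share 1 / (1 + sqrt(p - 1)))."""
    validation_share = 1.0 / (1.0 + math.sqrt(n_predictors - 1))
    return max(2, round(1.0 / validation_share))


def k_fold_indices(n, k, seed=0):
    """Shuffle record indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(n, k, fit, score, seed=0):
    """Train on k-1 folds, score on the held-out fold; return mean and variance of scores."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(score(fit(train), test))
    return statistics.fmean(scores), statistics.pvariance(scores)
```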

Figure 3.

 Mean values and standard deviation of the AUC and the Boyce index (CBI) are calculated for the six models. The names on the X axis refer first to the model type (GLM or ENFA), second to the dataset of calibration (1 for survey 01-04 and 2 for survey 01-07) and third to the dataset of validation (01-04, 01-07 or rec, i.e. dataset from the recolonized area).

For both internal and external validation, we used two measures to compare predictions with field observations. (i) The area under the curve (AUC) of the threshold-independent receiver operating characteristic (ROC) plot (Fielding & Bell 1997). AUC uses presence-absence data and ranges from 0 for an inverse model through 0·5 for a random model to 1 for a perfect model. (ii) The continuous Boyce index (CBI; Boyce et al. 2002; Hirzel et al. 2006), a recently developed index that requires only presence data and is particularly useful for assessing the quality of HSM predictions of species presence. It is based on P/E curves and calculates the Spearman rank correlation between the suitability index and the predicted-to-expected ratio of evaluation-point frequencies over a moving window (cf. Appendix S2 in Supporting Information). The CBI varies from −1 for an inverse model through 0 for a random model to 1 for a perfect model (Boyce et al. 2002; Hirzel et al. 2006).
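
Both indices can be written in a few lines. The sketch below computes AUC through its rank (Mann-Whitney) interpretation. It computes the continuous Boyce index as the Spearman correlation between window mid-points and P/E ratios. The HS scale of 0 to 100, the window width of 20 and the step of 5 are assumptions chosen for illustration rather than the settings used in Biomapper. All function names are hypothetical.

```python
import math
import statistics


def auc(presence_scores, absence_scores):
    """AUC as the probability that a presence outscores an absence (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            wins += 1.0 if p > a else (0.5 if p == a else 0.0)
    return wins / (len(presence_scores) * len(absence_scores))


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def continuous_boyce(presence_hs, background_hs, window=20.0, step=5.0, hs_max=100.0):
    """Spearman correlation between HS and the P/E ratio over a moving window."""
    mids, ratios = [], []
    lo = 0.0
    while lo + window <= hs_max + 1e-9:
        hi = lo + window
        predicted = sum(lo <= h <= hi for h in presence_hs) / len(presence_hs)
        expected = sum(lo <= h <= hi for h in background_hs) / len(background_hs)
        if expected > 0:  # skip windows containing no study-area cells
            mids.append(lo + window / 2)
            ratios.append(predicted / expected)
        lo += step
    return spearman(mids, ratios)
```

A model whose validation presences become steadily more frequent than expected as HS increases scores close to 1. A model with the opposite trend scores close to −1.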

To compare the spatial predictions of the different models, we calculated correlations between their HS maps: Spearman coefficients on continuous HS values and Cohen’s agreement coefficients (Kappa; Cohen 1960) on the reclassified maps (three HS classes). These two measures evaluate model similarity in complementary ways. The first is independent of the reclassification thresholds but may be influenced by the different meanings of the HS values produced by each method. The second is based on reclassified HS values, which improves comparability between predictions, but depends on the reclassification thresholds. This dependence is minimized here, as the same information (i.e. P/E curves) was used to reclassify all models. Both measures range from −1 (negative correlation) to +1 (positive correlation).
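
Cohen's kappa for two reclassified maps compares the observed agreement with the agreement expected from the class frequencies alone. A minimal sketch follows, assuming each map is flattened to an equally long list of class labels. The function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(map_a, map_b):
    """Cohen's kappa between two equally long sequences of class labels."""
    n = len(map_a)
    # Proportion of cells on which the two maps agree
    observed = sum(a == b for a, b in zip(map_a, map_b)) / n
    # Agreement expected by chance from each map's class frequencies
    count_a, count_b = Counter(map_a), Counter(map_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)
```

For example, two four-cell maps labelled `["unsuitable", "suitable", "optimal", "unsuitable"]` and `["unsuitable", "suitable", "suitable", "optimal"]` agree on half of the cells. Chance agreement from their class frequencies is 5/16, which gives a kappa of 3/11.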

Complementary assessments

Our main hypothesis was that the predictions of GLM1 and ENFA1 would differ, mainly because GLM1 had information on species absences whereas ENFA1 did not. Nevertheless, we also addressed the effects of two complementary factors on the modelling procedures. The first concerns the type of absence data. GLM1 used absences recorded in the field; when such data are available they should be preferred, as demonstrated in previous studies (Engler et al. 2004; Lobo et al. 2008). However, because field data may include false absences (Hirzel & Le Lay 2008), we also fitted GLMs for the pre-colonization stage using pseudo-absence data. We generated ten sets of randomly selected pseudo-absences, each as numerous as the absences in the ‘01-04’ dataset and located on all rivers except the Sangro. Using our standard modelling procedure, we fitted ten GLMs with these pseudo-absences, cross-validated them and validated them independently with the ‘rec’ dataset. The ten resulting values provide ranges for AUC and CBI.
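
The pseudo-absence design can be sketched as repeated sampling without replacement from river cells outside the Sangro. The cell representation, the function name and the fixed random seed are assumptions for illustration; the study's own sampling code is not described.

```python
import random


def pseudo_absence_sets(river_cells, excluded_river, n_absences, n_sets=10, seed=0):
    """Draw n_sets independent samples of n_absences pseudo-absence cells.

    river_cells: list of (cell_id, river_name) pairs for the study-area river cells.
    Cells on excluded_river are never drawn; each sample is without replacement.
    """
    pool = [cell for cell, river in river_cells if river != excluded_river]
    rng = random.Random(seed)
    return [rng.sample(pool, n_absences) for _ in range(n_sets)]
```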

The second factor we assessed was the influence of the selection of environmental variables on the final HS predictions. With the stepwise procedure, the final GLM retains only a subset of the variables, whereas ENFA can keep all the environmental variables in the model. To increase the models’ comparability, we therefore exchanged the sets of variables: we produced GLMs with the six best variables of the ENFAs, as ranked by their global scores (Table 2), allowing linear and quadratic terms for each predictor. Conversely, we produced ENFAs with the variables selected in the GLMs (see Appendix S3 in Supporting Information).

Table 2.   Ranking of environmental predictors by the four habitat-suitability models
Biological interpretation | Predictor | ENFA1 marginality¹ | ENFA1 sum² | ENFA2 marginality¹ | ENFA2 sum² | GLM1 coefficient³ | GLM2 coefficient³
Food | FISH-BIOD | + | (1) 20·60 | + | (2) 11·30 | | 
 | SLOPE | | (3) 15·39 | | (5) 7·42 | (4) 0·91 | (4) 0·87
 | CONVEXITY | | (5) 13·45 | | (1) 11·71 | | (3) 0·89
Resting sites | HERB-CROP | | (2) 17·76 | | (4) 7·56 | (6) 0·59 | (7) 0·08
 | FOREST | + | (4) 14·41 | + | (3) 9·84 | (1) 1·46 | (1) 1·32
 | ARBOR-CROP | + | (9) 6·11 | 0 | (8) 5·27 | | (2) 1·24
 | SCLEROPH | 0 | (10) 5·96 | 0 | (6) 7·23 | | 
 | AGR-HETER | + | (11) 5·81 | + | (10) 4·39 | | (6) 0·39
Disturbance | MINES | | (6) 8·50 | | (7) 6·29 | (2) 1·20 | 
 | CITIES | | (7) 6·81 | | (11) 4·26 | (3) 1·15 | (5) 0·61
 | INDUSTRIAL | 0 | (12) 3·93 | | (9) 4·47 | (5) 0·90 | 
Water | FLOW-ACC | + | (8) 6·14 | + | (12) 3·58 | | 

¹ + and − mean that otters are found, on average, in areas with higher (respectively lower) values than the study-area mean. Values around 0 mean that the otters’ habitats do not differ from average conditions in the study area.
² Sum of the scores over all ENFA factors.
³ Coefficients in the GLM formula. Predictors were standardized.
Numbers in brackets indicate the predictor’s rank of importance.

Results

Ecological niche characteristics

The four models do not rank the importance of the predictors similarly (Table 2). The two ENFA models agree that the most important environmental predictors are related to food (FISH-BIOD, SLOPE, CONVEXITY) and resting-site availability (HERB-CROP, FOREST). By contrast, the two GLMs differ. While GLM1 ranks resting-site availability (FOREST) and disturbance (MINES, INDUSTRIAL) first, GLM2 finds resting-site availability (FOREST, ARBOR-CROP) and food (CONVEXITY, SLOPE) more important (Table 2). In short, ENFA2 and GLM2 agree, ENFA1 agrees with both ENFA2 and GLM2, and GLM1 differs from the other models. When GLM1 was refitted with pseudo-absence data (10 GLMps), the selection and order of the predictors differed: resting-site variables (HERB-CROP, FOREST) were still selected most often, but food (FISH-BIOD) now ranked just ahead of the disturbance variables (MINES, INDUSTRIAL).

A comparison of the niche statistics of the two ENFA models showed that, while the global marginality coefficients were similar (0·59 for ENFA1 and 0·60 for ENFA2), the tolerance coefficients (i.e. the inverse of specialization) differed (0·50 for ENFA1 vs. 0·63 for ENFA2).

Comparison of spatial predictions

A comparison of the HS maps showed that ENFA1, ENFA2 and GLM2 were highly correlated (Spearman ρ ≅ 0·7, Kappa ≅ 0·55), while GLM1 was less correlated with the others (0·37 ≤ ρ ≤ 0·59, 0·18 ≤ Kappa ≤ 0·35) (Table 3). The HS map predicted by GLM1 is indeed quite different from those predicted by ENFA1, ENFA2 and GLM2, particularly in the Sangro River recolonization area (Fig. 4).

Table 3.   Spearman rank correlation and kappa index, in brackets, calculated between pairs of habitat-suitability maps
 | ENFA1 | ENFA2 | GLM1 | GLM2
ENFA1 | 1·00 (1·00) | | | 
ENFA2 | 0·73 (0·47) | 1·00 (1·00) | | 
GLM1 | 0·45 (0·23) | 0·37 (0·18) | 1·00 (1·00) | 
GLM2 | 0·68 (0·63) | 0·70 (0·44) | 0·59 (0·35) | 1·00 (1·00)

The Spearman correlations are calculated on the continuous HS values of the maps and the Kappa index on the reclassified maps (three HS classes).

Figure 4.

 Habitat suitability maps (ENFA1, GLM1, ENFA2 and GLM2) reclassified by binning the predicted/expected (P/E) curves in three classes of suitability: unsuitable, suitable and optimal. The arrows on the recolonized area indicate the recolonization data (‘rec’).

Model validation

Cross-validation and external validation gave conflicting results for the relative predictive power of the ENFA and GLM models (Fig. 3). Cross-validation showed a non-significant tendency for the GLMs to perform better than the ENFA models, with the ‘01-04’ and ‘01-07’ datasets as well as with pseudo-absence data. External validation of the models built with the ‘01-04’ dataset or with pseudo-absence data gave contradictory results. AUC values were slightly higher for the GLMs than for the ENFA (0·81 and 0·799 vs. 0·722). In contrast, the CBI values indicated that the GLMs were unable to predict the species’ presence in the recolonized area (−0·582 and −0·512), while the ENFA performed well (0·767).

When the variable sets were exchanged between ENFA and GLM (Appendix S3 in Supporting Information), results were generally worse for both cross-validation and independent validation. The GLM1 using the ENFA1 variables (GLM1varENFA1) returned a cross-validation AUC similar to that of GLM1 (0·78). It gave better values, however, for the cross-validation CBI (0·75 vs. 0·651) and the independent-validation AUC (0·889 vs. 0·806). On the independent-validation CBI, GLM1varENFA1 still predicted the habitat suitability of the recolonized area poorly, although slightly better than GLM1 (−0·492 vs. −0·582).

Discussion

Habitat suitability models (HSMs) can be used to assess species’ distributions and to provide guidelines for their management. However, the quality of their predictions is generally evaluated in simple ways, using commonly applied methods without comprehensive assessment. Studies that compare HSMs rarely provide guidelines on the conditions under which each method is applicable, even though some applications could lead to wrong conclusions, as in the case of species that are not at equilibrium with their environment. By testing the ability of HSMs to predict suitable locations for a species’ recolonization, our results highlight several important points.

Success in predicting the recolonization area

The predictions of the presence-only approach applied to the non-equilibrium situation (ENFA1) agreed with both models applied to the sub-equilibrium situation (ENFA2, GLM2), whereas they disagreed with those of the presence-absence approach applied to the non-equilibrium situation (GLM1) (Table 3). The difference is particularly evident in the recolonization area (Sangro River), where ENFA1 correctly predicts suitability whereas GLM1 does not (Fig. 4). When a species is not at equilibrium, absence data used in an HSM can therefore lead to poor predictions of suitable future recolonization areas. Four results support this conclusion. (i) When fed with pseudo-absence data, the GLMps also give poor predictions for the colonization area (negative CBI values, Fig. 3). (ii) When fed with the ENFA1 variables, the GLM still predicts poorly (CBI_rec = −0·492; Appendix S3). (iii) When fed with post-colonization data, i.e. in the sub-equilibrium situation, the GLM makes predictions very similar to those of ENFA, i.e. good predictions. (iv) In the recolonized area, ENFA1 correctly identifies the suitability of the recolonization areas, while GLM1 does not (Fig. 4). ENFA has sometimes been suspected of over-predicting habitat suitability (Zaniewski et al. 2002; Brotons et al. 2004), but here many of the supposedly ‘over-predicted’ areas were later recolonized by the otter (Fig. 4).

Description of the species’ ecological niche

Regarding ecological niche trends (Table 2), GLM1 is the model most influenced by disturbance variables. By contrast, ENFA1 gives more importance to food and resting-site variables, consistent with the models of the sub-equilibrium condition (GLM2 and ENFA2) (Table 2). The otter’s distribution may therefore depend essentially on resource factors, with strong environmental disturbance reducing its extent, as in the non-equilibrium situation (‘01-04’ dataset). However, including the disturbance variables in the description of the ecological niche appears to lead to a misidentification of the potential areas for the species’ spread, as shown by the independent validation of ENFA1varGLM1 and ENFA2varGLM2 (Appendix S3 in Supporting Information). The ENFA niche statistics also highlight the otter’s wide habitat tolerance and its capacity to colonize new areas: overall tolerance was higher in the sub-equilibrium than in the non-equilibrium situation.

Our results suggest that ENFA has better generalization power (or transferability; Peterson 2006) than GLM, i.e. a better ability to make correct predictions in an area different from the one in which the model was calibrated (Vaughan 2005). ENFA was able to predict the suitability of the Sangro River without any data from this river, a capacity that is crucial when predicting the distribution of a spreading species. This property of ENFA may stem from three factors. (i) It relies on presence data only, so unreliable absence data have no influence. (ii) It compares the environmental characteristics of the sites occupied by the species with those of the whole study area, rather than with selected sites only (i.e. the locations of presence or absence data, as in the GLM). (iii) It can take many predictors into account without requiring a selection process such as the stepwise/AIC procedure. However, a good model can only be produced when it is fed with the right predictors and accurate species data. Indeed, when the presence data cover almost all of the environmental situations the species can use, i.e. the whole ecological niche of the species, the model can predict the species’ distribution outside the calibration area. For GLM1, some absences in the species dataset of the non-equilibrium situation (dataset ‘01-04’) may correspond to environmental conditions similar to those of the Sangro River, i.e. the colonized area, which degrades its predictions there. ENFA1, not influenced by these unreliable absences, predicts a broader ecological niche that better matches the species’ real ecological niche. Similarly, ENFA2 and GLM2, which take into account all the available data (here the ‘01-07’ dataset), provide better predictions. The problem is that it is difficult to know a priori which absences are unreliable.
The species distribution is often a snapshot of a dynamic system. It is therefore probable that the species will recolonize further suitable areas, e.g. on the Trigno River (Fig. 4), and that the supposed sub-equilibrium situation still contains some unreliable absences (Fig. 5).

Figure 5.

Theoretical representation of the otter’s ecological niche and of the global environment of a study area along one environmental gradient. Two cases are represented: a non-equilibrium distribution (dark grey) with pre-recolonization presence and absence data (‘01-04’), and a sub-equilibrium distribution (light grey) with post-recolonization presence and absence data (‘01-07’). Species absences are reliable for the models only when they fall outside the species’ niche.
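The point made in Figure 5 can be shown with a toy example along one gradient. The sketch below uses a simple binned estimate, the share of surveyed sites in each bin that are presences, as a stand-in for what a presence-absence model learns from the data; it is not the GLM fitted in this study. The niche limits (2 to 8), the bin edges and all site values are hypothetical. Absences that fall inside the niche, such as a suitable but not yet colonized river, pull the estimate for their bin down to zero before recolonization.

```python
import numpy as np


def binned_prevalence(presence_x, absence_x, bins):
    """Share of surveyed sites that are presences in each bin of a gradient.

    Bins without any surveyed site return NaN (no information).
    """
    p, _ = np.histogram(presence_x, bins=bins)
    a, _ = np.histogram(absence_x, bins=bins)
    surveyed = p + a
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(surveyed > 0, p / surveyed, np.nan)


def count_unreliable_absences(absence_x, niche_low, niche_high):
    """Absences falling inside the (in practice unknown) niche limits."""
    x = np.asarray(absence_x, dtype=float)
    return int(np.sum((x >= niche_low) & (x <= niche_high)))


# Hypothetical environmental gradient 0-10 with a true niche from 2 to 8.
NICHE = (2.0, 8.0)
bins = [0, 2, 4, 6, 8, 10]

# Before recolonization: sites at 2.5-3.5 are suitable but still unoccupied.
pres_0104 = [4.2, 4.8, 5.1, 5.6]
abs_0104 = [0.5, 1.0, 2.5, 3.0, 3.5, 8.5, 9.0, 9.5]

# After recolonization: two of those sites are occupied, one is not yet.
pres_0107 = [2.5, 3.0, 4.2, 4.8, 5.1, 5.6]
abs_0107 = [0.5, 1.0, 3.5, 8.5, 9.0, 9.5]

for label, pres, ab in [("01-04", pres_0104, abs_0104),
                        ("01-07", pres_0107, abs_0107)]:
    print(label, binned_prevalence(pres, ab, bins),
          "unreliable absences:", count_unreliable_absences(ab, *NICHE))
```

In this toy case the 2-4 bin moves from 0 to 2/3 once recolonization turns two absences into presences, yet one unreliable absence remains, as in the sub-equilibrium panel of Figure 5.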

Methodological contributions

We observed that, in the non-equilibrium situation, presence-only data led the models to describe the species’ niche correctly, and thus to predict potentially suitable areas for the species’ spread. In contrast, models using either fieldwork absences or pseudo-absences produced poor predictions for the recolonization area. In non-equilibrium situations, presence-only models are therefore preferable. However, as our results show, the extent of the presence-absence models’ failure depends on the stage of the species’ spread under consideration, i.e. how much of the ecological niche is already occupied. It may also depend on species characteristics, such as adaptive capacity, and on the spatial and temporal variation of the environment. To test this issue further, models should be run in different environmental conditions, for other species, and with other modelling methods (e.g. minimizing the weight of absences, models including autocorrelation, or pseudo-absences based on an ENFA as in Engler et al. 2004), in order to assess thoroughly the influence of absences in HSM and to provide alternative solutions.
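One of the alternatives listed above, pseudo-absences based on an ENFA (Engler et al. 2004), can be sketched as follows: rather than drawing pseudo-absences anywhere in the study area, they are drawn only from cells that an ENFA habitat suitability map rates as unsuitable, which lowers the chance of placing absences inside the species’ niche. This is a minimal sketch in the spirit of that approach; the function name, the 0-1 suitability scale and the threshold value are assumptions for illustration, not the settings of Engler et al.

```python
import numpy as np


def enfa_pseudo_absences(suitability, n, threshold=0.2, seed=None):
    """Draw n pseudo-absence cells among those with suitability < threshold.

    suitability: 1-D array of ENFA habitat suitability per cell, assumed
                 rescaled to 0-1 (the threshold is a hypothetical choice).
    Returns the indices of the selected cells.
    """
    hs = np.asarray(suitability, dtype=float)
    candidates = np.flatnonzero(hs < threshold)
    if candidates.size < n:
        raise ValueError("not enough low-suitability cells for the requested sample")
    rng = np.random.default_rng(seed)
    # Sample without replacement so each cell is used at most once.
    return rng.choice(candidates, size=n, replace=False)


# Hypothetical suitability map of 1,000 cells.
map_rng = np.random.default_rng(0)
hs_map = map_rng.uniform(0.0, 1.0, size=1000)
idx = enfa_pseudo_absences(hs_map, n=50, threshold=0.2, seed=1)
print(len(idx), hs_map[idx].max())
```

The threshold trades off two risks: too high, and pseudo-absences creep back into suitable habitat; too low, and they all sit in extreme conditions that teach the model little.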

The selection of predictors by the stepwise procedure in the GLM may also introduce randomness. Indeed, the ten GLMs built with pseudo-absence data differed slightly in their selection and ordering of the predictors, and also showed high variability in their quality (see AUC and CBI for the recolonization area, Fig. 3). When input with the twelve environmental variables, GLM1ENFA1 returns poor predictions (CBI_rec, Appendix S3, Supporting Information), although slightly better than those of GLM1. This result suggests that, even when using the same ‘fallacious’ absence data, the GLM can perform a little better when it uses the ‘right’ predictors. Nevertheless, variable selection depends closely on the presence-absence information, which makes it difficult to improve the models. Although the stepwise/AIC procedure has been criticized in some papers, it is still the standard method in most studies: it was the preferred method in 22 of the 23 recent GLM-based papers we retrieved from the literature. More research is needed to provide modellers with better methods than stepwise variable selection.
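The instability described above can be reproduced in a small simulation. The sketch below, on entirely hypothetical data, keeps one set of presences fixed, draws several sets of pseudo-absences from the whole study area, and runs a forward stepwise selection by AIC (AIC = 2k − 2 ln L) on an unpenalized logistic GLM for each draw. It is a simplified stand-in: it uses forward selection only and a hand-written Newton-Raphson fit so that nothing beyond NumPy is needed, and the software and stepwise variant used in this study may differ. The strongly informative predictor should be retained in every replicate, while weaker or irrelevant ones may enter or not depending on the draw.

```python
import numpy as np


def fit_logit(X, y, n_iter=25):
    """Unpenalized logistic regression by Newton-Raphson; returns (beta, lnL)."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))
        w = p * (1.0 - p)
        hess = X1.T @ (X1 * w[:, None]) + 1e-9 * np.eye(X1.shape[1])
        beta = beta + np.linalg.solve(hess, X1.T @ (y - p))
    p = np.clip(1.0 / (1.0 + np.exp(-(X1 @ beta))), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik


def forward_stepwise_aic(X, y, names):
    """Repeatedly add the predictor that lowers AIC most; stop when none does."""
    selected = []
    _, ll = fit_logit(X[:, np.array(selected, dtype=int)], y)
    best_aic = 2 * 1 - 2 * ll                    # intercept-only model, k = 1
    while len(selected) < X.shape[1]:
        trials = []
        for j in range(X.shape[1]):
            if j not in selected:
                cols = np.array(selected + [j], dtype=int)
                _, ll = fit_logit(X[:, cols], y)
                trials.append((2 * (len(selected) + 2) - 2 * ll, j))
        aic, j = min(trials)
        if aic >= best_aic:
            break
        best_aic = aic
        selected.append(j)
    return [names[j] for j in selected]


def run_replicates(n_reps=10, seed=42):
    """Same presences, different random pseudo-absence draws (hypothetical data)."""
    rng = np.random.default_rng(seed)
    names = ["x0", "x1", "x2", "x3"]
    env = rng.normal(size=(2000, 4))                       # background cells
    weight = np.exp(1.5 * env[:, 0] + 0.3 * env[:, 1])     # x0 strong, x1 weak
    pres = rng.choice(2000, size=100, replace=False, p=weight / weight.sum())
    y = np.r_[np.ones(100), np.zeros(100)]
    selections = []
    for _ in range(n_reps):
        pseudo = rng.choice(2000, size=100, replace=False)  # whole study area
        X = np.vstack([env[pres], env[pseudo]])
        selections.append(forward_stepwise_aic(X, y, names))
    return selections


for sel in run_replicates():
    print(sel)
```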

Finally, in all situations but one, the AUC and CBI evaluation indices agree about the quality of the models. For GLM1, the evaluations disagree: the AUC (0·81) indicates a good model, while the CBI (−0·58) indicates a poor one. As the maps also revealed that GLM1 was a poor model for the recolonization area, we conclude that the AUC was misleading. More specifically, the AUC did not provide an appropriate assessment for our goal: computed from presence–absence data, it measures the overall quality of the predictions, i.e. of both presences and absences, whereas we wanted to focus on the models’ ability to predict area suitability, i.e. presences. Although the AUC is commonly used to estimate the accuracy of species distribution models (Pearce & Ferrier 2000; Manel et al. 2001; Gibson et al. 2007), several recent papers have criticized its use as a standard measure (Hirzel et al. 2006; Austin 2007; Lobo et al. 2008; Elith & Graham 2009). Another weakness of the AUC is that it assigns equal weights to omission and commission errors, whereas in many applications of distribution modelling these errors may not have the same importance (Peterson 2006).
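The two properties of the AUC discussed above can be made concrete. The AUC equals the probability that a randomly chosen presence receives a higher predicted score than a randomly chosen absence (the Mann-Whitney formulation), so every absence counts as much as every presence, and absences located in suitable but still unoccupied habitat reward a model that scores them low. The sketch below computes the AUC this way and, separately, the omission and commission rates at a single threshold; the scores and the 0.5 threshold are hypothetical.

```python
import numpy as np


def auc_mann_whitney(pres_scores, abs_scores):
    """AUC as the share of (presence, absence) pairs ranked correctly; ties count 1/2."""
    p = np.asarray(pres_scores, dtype=float)[:, None]
    a = np.asarray(abs_scores, dtype=float)[None, :]
    wins = (p > a).sum() + 0.5 * (p == a).sum()
    return wins / (p.size * a.size)


def omission_commission(pres_scores, abs_scores, threshold):
    """Omission: presences predicted unsuitable. Commission: absences predicted suitable."""
    p = np.asarray(pres_scores, dtype=float)
    a = np.asarray(abs_scores, dtype=float)
    return float(np.mean(p < threshold)), float(np.mean(a >= threshold))


# Hypothetical predicted suitabilities at evaluation sites.
pres = [0.9, 0.8, 0.7]
absn = [0.1, 0.2, 0.75]
print(auc_mann_whitney(pres, absn))            # 8 of 9 pairs ranked correctly
print(omission_commission(pres, absn, 0.5))
```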

In conclusion, when modelling the distribution of a spreading species, model accuracy should not be assessed by the AUC. The CBI is better suited to this assessment goal and, as previously shown (Hirzel et al. 2006), it is more efficient at highlighting errors in modelled patterns.

Implication for conservation

Fitting habitat suitability models to predict the recolonization areas of a recovering species is a challenging issue for applied ecology. For recovering or invasive species, which are not yet in equilibrium with their environment, modellers are often faced with limited datasets that may not capture the whole ecological niche of the species. In our case study, using ENFA1 to establish conservation management strategies would rightly have led to protection of the Sangro River, thus supporting the species’ recovery, whereas using GLM1 would not. The problems caused by the use of unreliable absences are real and, if ignored, will damage conservation efforts.

In a non-equilibrium situation, models should therefore concentrate on presence data. Predictions should also be evaluated cautiously, using several methods. To improve the quality of the predictions progressively, modelling should be treated as a dynamic process and planned as an iterative framework, i.e. refitting the model at each stage of the species’ spread.

Acknowledgements

We thank two anonymous referees and L. Boitani (University ‘La Sapienza’) for helpful suggestions, E. D’Alessandro and M.L. Carranza (University of Molise) for producing the Corine Land Cover map, R. Engler and O. Broennimann for advice and D. Pio for English revision. This study was made possible by a grant from the Italian Ministry of the Environment to C. Cianfrani, one from the MAVA Foundation to G. Le Lay, and one from the Swiss National Foundation to A. Hirzel (no. 3100A-112511).
