Alternative methods for predicting species distribution: an illustration with Himalayan river birds


*Correspondence: Dr Steve Ormerod (


1. Current emphasis on species conservation requires the development of specific distribution models. Several modelling methods are available, but their performance has seldom been compared. We therefore used discriminant analysis, logistic regression and artificial neural networks with environmental data to predict the presence or absence of six river birds along 180 Himalayan streams. We applied each method to calibration sites and independent test sites. With logistic regression, we compared performance in predicting presence–absence using map-derived predictors (river slope and altitude) as opposed to detailed data from a standardized river habitat survey (RHS).

2. Using the entire calibration data, overall success at predicting presence or absence was only slightly greater using artificial neural networks (89–100%) than either logistic regression (75–92%) or discriminant analysis (81–95%), and on this criterion all methods gave good performance.

3. When applied to independent test data, overall prediction success averaged 71–80%, with logistic regression marginally but significantly out-performing the other methods. Encouragingly for researchers with limited data, model performance in jack-knife tests faithfully represented performance in more rigorous validations where calibration (n = 119) and test sites (n = 61) were in separate geographical regions.

4. All three methods predicted true absences (83–92% success) better than true presences (31–44%). Results from logistic regression were the most variable across species, but positive prediction declined with increasing species rarity in each method.

5. Applications with logistic regression illustrated that significant habitat predictors varied between data sets within species. Hypotheses about causal effects by habitat structure on distribution were thus difficult to erect or test. Logistic regression also showed that detailed data from the river habitat survey substantially improved positive prediction by comparison with prediction using slope or altitude alone.

6. We conclude that discriminant analysis, logistic regression and artificial neural networks differ only marginally in performance when predicting species distributions. Model choice should therefore depend on the nature of the data, on the needs of any particular analysis, and on whether assumptions for each method are satisfied. All three methods share drawbacks due to systematic effects by species rarity on performance measures. They also share limitations due to the correlative nature of survey data often used for model development at the spatial scales required in macro-ecology and conservation biology. Tests with independent data, using a wider range of performance measures than those used traditionally, will be important in examining models and testing hypotheses for such applications.

Introduction


Much applied ecology revolves increasingly around the need to manage ecosystems for biodiversity (Heywood 1995; Pearson & Carroll 1998). Recent international conventions and national legislation have placed particular emphasis on individual species, which in turn require management plans, recovery programmes and methods of diagnosing reasons for their decline (Gaston & Blackburn 1995; Heywood 1995; Williams et al. 1997). For all these purposes, quantitative models of distribution are helpful in allowing assessment of major factors influencing distribution, or prediction of where given species should occur (Lawton & Woodroffe 1991). In rivers, the type of ecosystem involved in this study, species-based conservation is still developmental, but distribution models have been investigated at habitat (Peeters & Gardeniers 1998), reach (Mastrorillo et al. 1997), catchment (Richards, Johnson & Host 1996) and global scales (Buckton 1998; Guégan, Lek & Oberdoff 1998).

At the coarse scales typical of conservation biology or macro-ecology, species models are often derived empirically from survey data using correlative techniques, typically regression, ordination or classification (Jongman, ter Braak & van Tongeren 1995). In the case of species presence–absence data, discriminant analysis (DA) and logistic regression (LR) have been widely used (Osborne & Tigar 1992; Austin et al. 1994; Green, Osborne & Sears 1994; Austin & Meyers 1996; Buckton & Ormerod 1997; Buckton et al. 1998) but some authors indicate that these methods are limited in assuming a linear response to environmental predictors (Lek et al. 1996a). Artificial neural networks (ANN) provide an increasingly advocated alternative because they accommodate non-linear influences on species pattern. Although they are potentially important, the applications of ANN in ecology have so far been few (Mastrorillo et al. 1997; Walley & Fontama 1998; Spitz & Lek 1999).

With such a large and evolving range of approaches available for modelling, it is potentially difficult for practising ecologists or conservation biologists to choose appropriate methods. Clearly one of the greatest needs is for studies that compare model performance, but surprisingly few are available (Mastrorillo et al. 1997). Our aim in this paper was to provide such a comparison using Himalayan river birds as organisms relevant to the conservation of river ecosystems. They are conspicuous and easily censused, presenting good model organisms for distribution studies. Their ecology is known less well than their European counterparts, but they make a disproportionately large contribution to river-bird richness globally (Buckton 1998). Their habitat is at risk from environmental change (Ormerod et al. 1994; Jüttner, Rothfritz & Ormerod 1996; Suren & Ormerod 1998), hence improved understanding is important.

Our objectives in this paper were fourfold. First, we wished to compare the accuracy of DA, LR and ANN in predicting the distribution of river birds using environmental data. However, rather than evaluating performance solely on prediction success, the number of cases in which species presence or absence is correctly assessed (Mastrorillo et al. 1997; Buckton & Ormerod 1997), we have used more detailed comparisons of model performance as suggested by Fielding & Bell (1997). This involves examining the ability of a given model to predict separately both presence and absence, and represents an advance over more conventional model evaluation.

Secondly, we wished to test models based on DA, LR and ANN using independent data. Such tests in macro-ecology are normally made by separating calibration and test sites using jack-knifing techniques or data partitioning (Fielding & Bell 1997). However, these approaches risk spatial auto-correlation where test and calibration sites are adjacent. Therefore, in addition to jack-knifing, we tested each model in regions that were geographically isolated from those in which they were calibrated. Currently, emphasis is being placed on the need for methods in conservation biology and macro-ecology that permit the testing of hypotheses and models at appropriate spatial scales (Gaston & Blackburn 1999).

Thirdly, we wished to compare simple and complex environmental predictors of river bird distributions. Some authors have shown previously that simple map-derived variables, such as river slope and altitude, are sufficient to predict species occurrence regionally (Hill 1991; Brewin, Buckton & Ormerod 1998). However, recent methodological development allows the measurement of complex variations in river habitat character using a standard river habitat survey (RHS; Raven et al. 1997). We have assessed the value of RHS, relative to slope and altitude alone, in predicting bird distributions. This part of the work contributes to studies currently evaluating habitat survey techniques in biodiversity assessment (Boon & Raven 1998) or assessing the consequences of habitat degradation for important species (Sutherland 1998).

Finally, we wished to assess whether species prevalence (i.e. frequency of occurrence) might affect model performance. Those species of greatest interest in conservation biology are often rare or threatened, and hence an evaluation of the intrinsic effects of rarity on distribution models is important. Moreover, there is evidence that some of the performance measures used in modelling are sensitive to the effects of prevalence (Fielding & Bell 1997). Model applications across a range of species indicate whether such effects are large in reality.

Through these objectives, we expand our previous work that developed modelling procedures with one species of river bird (Manel, Dias & Ormerod, in press) by modelling the distribution of six species at 180 independent sites in the Himalayan mountains.

Materials and methods

Study area and field sampling

The work encompassed seven geographically distinct regions of the Himalaya across 1000 km between the Kumaon range of northern Uttar Pradesh and Kanchenjunga in eastern Nepal (Fig. 1). Data were collected in winter (October–November 1994–96) from 180 streams of second–fourth order (definition from Strahler 1957), all in independent catchments (n = 19–32 per region). The pattern of visits to each region was randomized as far as logistically possible to avoid spatio-temporal auto-correlation in the resulting data. Streams in each region were sampled opportunistically when encountered by field teams trekking over long distances (< 200 km) and over a large range of altitudes (Table 1).

Figure 1. The study regions in north-western India and Nepal. 1, Roop Kund (n = 19 sites); 2, Pindari (n = 24 sites); 3, Simikhot (n = 24 sites); 4, Dunai (n = 20 sites); 5, Manaslu (n = 29 sites); 6, Makalu (n = 32 sites); 7, Kanchenjunga (n = 32 sites). K = Kathmandu; E = Everest.

Table 1.  The range of physico-chemical attributes recorded during surveys of 180 streams in north-western India and Nepal in 1994–96
Altitude (m)              350     4695    2117
Slope (deg)               1       35      10.0
Channel width (m)         0.4     60      7.0
Water width (m)           0.15    32      3.0
Water depth (m)
Bankfull height (m)       0.02    30      1.5
NO3 (mg l–1)
PO4 (mg l–1)
Cl (mg l–1)
Na (mg l–1)
K (mg l–1)                0.04    17.9    1.0
Ca (mg l–1)               0.02    54.0    8.5
Mg (mg l–1)
Si (mg l–1)               0.5     11.5    3.3
Conductivity (μS cm–1)    9       413     62.4

At each stream, chemical samples were collected for full ionic analysis (Collins & Jenkins 1996), and habitat structure was recorded over a 200-m reach using the UK Environment Agency RHS. This method records over 120 variables describing the stream channel, flow character and riparian character in addition to measurements of altitude and slope, respectively, by altimeters and clinometers (Buckton & Ormerod 1997; Raven et al. 1997). Such a large array of variables is intended to capture the complex structure of rivers that arises from local geomorphology, natural variations in vegetation and river management. The results provide significant and ecologically meaningful correlates with the distribution of river birds (Buckton & Ormerod 1997; Ormerod et al. 1997).

The presence of birds was recorded in the early morning (07.00–11.00) or late afternoon (15.00–18.00) using 8× or 10× binoculars over the same 200-m reaches involved in habitat surveys. Seventeen species were recorded in total, of which the six most widespread were used for this exercise: grey wagtail Motacilla cinerea Tunst. (Motacillidae); brown dipper Cinclus pallasii Temm. (Cinclidae); little forktail Enicurus scouleri Vigors; plumbeous redstart Rhyacornis fuliginosus (Vigors); river chat Chaimarrornis leucocephalus (Vigors); and blue whistling-thrush Myiophonus caeruleus (Scop.) (all Turdidae). With the exception of brown dipper, all are partial altitudinal migrants, so that the surveys will have reflected their winter pattern. The survey reaches were short by comparison with traditional methods of river bird census (500 m–2 km; Buckton et al. 1998). However, pilot studies along 46 Himalayan streams in the Langtang-Trishuli river system showed that, on average, a 200-m reach detected two-thirds of the species and 75% of the individuals recorded in reaches of 400 m (Buckton 1998). Other recent evaluations show that single-survey visits like ours are effective in recording the presence of strongly riverine bird species, like those in our study, due to their near-continual use of the river corridor (Marchant, Langston & Gregory 1996).

All six target species are insectivores, and three (brown dipper, grey wagtail and little forktail) feed largely or exclusively on benthic invertebrates during winter. Thus, as potential indicators of prey quantity, the abundance of benthic macro-invertebrates was assessed contemporaneously with the bird surveys using separate, timed kick-samples in riffle and marginal habitats (i.e. 1-min sample in riffles, 1-min in margins; mesh size 400 μm; Buckton et al. 1998). Samples were preserved on site in 70% ethanol, and later identified to the level of order.

Data analysis

Our overall approach was to derive algorithms that modelled and predicted the distribution of each species from subsets of 32 possible environmental variables using DA, LR and ANN. Apart from altitude, slope and the abundance of each invertebrate order, the predictor variables were principal components derived from the complete habitat and chemical data sets using principal components analysis (PCA) on the correlation matrix. For RHS, we separated sets of variables describing flow character (FlowPC1–5), channel structure (ChanPC1–5) and riparian character (RiparPC1–5). Predictors were thus orthogonal within these sets, and collinearity was small between them.

Discriminant analysis

In general, DA is well known and often applied to ornithological data (Buckton & Ormerod 1997; Buckton et al. 1998). Here, the procedure involved creating linear combinations of variables with normal errors that best discriminated between site groups defined by the presence or absence of each species. DA was performed with S-PLUS 4 software release 3 (lda function), in which combinations of explanatory variables were selected to maximize the ratio of group mean discriminant scores to within-group variance (Venables & Ripley 1997). Assessing statistical significance in DA is not straightforward; collinearity among the explanatory variables and departure from multivariate normality can invalidate assumptions in the method (Hair et al. 1995). In our case, the derivation of explanatory variables from PCA in part avoided this problem because predictors were orthogonal and univariately normal. Nevertheless, multivariate normality is difficult to assess, particularly with large numbers of predictor variables, so that results from DA should always be treated with caution (Hair et al. 1995). Its use here demonstrates a typical application by ecologists.

Logistic regression

The presence and absence of each bird species were related to altitude, slope, transformed invertebrate abundance, and habitat and chemical principal components using a generalized linear model: multiple logistic regression with a logit link and binomial error distribution (McCullagh & Nelder 1989; Jongman et al. 1995). The logit transformation (equation 1) of the probability of presence (p) was modelled as a linear function of 32 possible explanatory variables (xi, i = 1, ..., 32; equation 2):

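$$\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right) \qquad \text{(eqn 1)}$$
$$\operatorname{logit}(p) = b_0 + \sum_{i=1}^{32} b_i x_i \qquad \text{(eqn 2)}$$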

in which b0 and bi are the regression constants. Models were fitted by maximum likelihood (McCullagh & Nelder 1989). We used backwards elimination to select the variables in the final model (Green, Osborne & Sears 1994; Austin & Meyers 1996; Manel, Dias & Ormerod, in press). The step function in the statistical package S-PLUS 4 provides a procedure for this purpose using Akaike's information criterion (AIC), a penalized version of the likelihood function in which the best model fit is given by the lowest value. To be retained at each step, a variable had to reduce the scaled deviance significantly. The change in scaled deviance as each variable is removed is distributed approximately as χ2 (McCullagh & Nelder 1989; Collett 1991). Although initially all explanatory variables were potential predictors, only those variables selected by the above criteria were used in the final solutions. The exact array of predictors varied between bird species, depending on their individual ecology.
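As a minimal illustration of equations 1 and 2, the following Python fragment shows how a fitted logistic model converts predictor values into a probability of presence, and how AIC penalizes the log-likelihood by the number of fitted parameters. It is a sketch only, not the S-PLUS procedure used in the study: the coefficients, the three example sites and all function names are hypothetical.

```python
import math

def logit(p):
    """Equation 1: log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Inverse of equation 1: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(b0, b, x):
    """Equation 2: linear predictor b0 + sum(b_i * x_i), returned as a probability."""
    z = b0 + sum(bi * xi for bi, xi in zip(b, x))
    return inverse_logit(z)

def log_likelihood(y, p):
    """Bernoulli log-likelihood of observed 0/1 values y given fitted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def aic(log_lik, n_params):
    """Akaike's information criterion: lower values indicate a better penalized fit."""
    return -2.0 * log_lik + 2.0 * n_params

# Hypothetical coefficients for two standardized predictors (e.g. altitude, slope).
b0, b = -1.0, [-0.8, 0.5]
sites = [[-1.2, 0.3], [0.4, -0.6], [1.5, 1.0]]   # made-up predictor values
observed = [1, 0, 0]                             # made-up presence/absence

fitted = [predict_probability(b0, b, x) for x in sites]
print([round(p, 3) for p in fitted])
print(round(aic(log_likelihood(observed, fitted), n_params=3), 2))
```

In backwards elimination, a candidate variable would be dropped when the reduced model gives a lower AIC than the fuller one.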

Artificial neural networks

ANN, derived from a simple model of the structure and function of the brain, are characterized by their ability to ‘learn’. This is achieved in the model training phase by comparison between actual outputs and the desired outputs. In this phase, a training algorithm modifies internal parameters (= weights) until the performance of the network, equivalent here to prediction success, is maximized. In our case, the presence or absence of each species was predicted using the most commonly used method, the back-propagation algorithm, in Matlab software release 4 (Rumelhart, Hinton & Williams 1986; Bishop 1995). This network comprises a feed-forward neural network of three layers, the architecture of which has been described by other authors (Baran et al. 1996; Lek et al. 1996a; Walley & Fontama 1998; Manel, Dias & Ormerod, in press).

The first layer, called the input layer, comprises input nodes related to the 32 environmental variables. The second layer, or hidden layer, is composed of a set of processing elements, called neurones, whose number is determined through a series of training iterations. In our case, the hidden layer involved five neurones, trained through 200 iterations to optimize performance without overtraining (Manel, Dias & Ormerod, in press). This overtraining problem arises when model performance against test data no longer improves as rapidly as in initial iterations; this is because the network attempts to model noise in the data, rather than real pattern (Rumelhart, Hinton & Williams 1986; Baran et al. 1996; Walley & Fontama 1998). The back-propagation algorithm attempts to reproduce the ‘learning’ process through iterative adjustment of the weights on the links between layers to better approximate true solutions. The input signal, in turn dependent on the weights, is processed within each neurone by the sigmoidal transfer function:

$$f(x) = \frac{1}{1 + e^{-x}}$$


The sigmoidal functions then produce an output between 0 and 1. This output signal is transmitted to the third layer, or output layer, which consists of one neurone responsible for prediction of species presence or absence (y) from the explanatory variables (for model structure and training see Manel, Dias & Ormerod, in press).
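The forward pass through a three-layer network of this kind can be sketched as follows. This Python fragment is illustrative only and is not the MATLAB implementation used in the study: the logistic form of the sigmoid is assumed, the weights are drawn at random rather than trained by back-propagation, and the names `sigmoid` and `forward` are hypothetical.

```python
import math
import random

def sigmoid(x):
    """Sigmoidal transfer function: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> hidden neurones -> single output neurone."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
              for weights, bias in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

random.seed(0)
n_inputs, n_hidden = 32, 5   # 32 environmental variables, five hidden neurones

# Untrained, randomly initialized weights (training would adjust these).
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b_out = 0.0

# Hypothetical standardized predictor values for one site.
site = [random.gauss(0.0, 1.0) for _ in range(n_inputs)]
y = forward(site, w_hidden, b_hidden, w_out, b_out)
print(0.0 < y < 1.0)
```

Because the output neurone also applies the sigmoid, the prediction y always lies between 0 and 1 and can be read as a score for species presence.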

Global modelling approach and model comparison

Prediction success. Our first assessments of model performance involved the entire data set (180 sites × 32 environmental variables) and were based solely on prediction success, the overall percentage of sites at which the presence or absence of species was correctly predicted (Table 2). The entire data matrix was used to perform DA, LR and ANN, with explanatory variables optimally selected as described above. In LR and ANN, the output for each case takes a value between 0 and 1, and presence is usually accepted at a threshold probability of 0·5. For DA, classification of each case was derived from Euclidean distances to the centroids of the ‘positive’ and ‘negative’ groups.
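The two classification rules described above might be sketched in Python as follows. The example values and the function names are hypothetical, and two assumptions are made for illustration: a probability exactly equal to the threshold is treated as presence, and distances are taken directly in the space of the supplied variables rather than in discriminant space.

```python
import math

def classify_by_threshold(p, threshold=0.5):
    """LR/ANN rule: predict presence (1) when the output reaches the threshold."""
    return 1 if p >= threshold else 0

def centroid(rows):
    """Mean vector of a group of cases."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def classify_by_centroid(x, presence_rows, absence_rows):
    """DA-style rule: assign the case to the nearer group centroid (Euclidean distance)."""
    d_pos = math.dist(x, centroid(presence_rows))
    d_neg = math.dist(x, centroid(absence_rows))
    return 1 if d_pos < d_neg else 0

# Made-up scores for sites with and without the species.
presence = [[1.0, 2.0], [2.0, 3.0]]
absence = [[-1.0, -2.0], [-2.0, -1.0]]

print(classify_by_threshold(0.7), classify_by_threshold(0.2))
print(classify_by_centroid([1.5, 2.0], presence, absence))
```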

Table 2.  Possible measures for assessing the importance of presence–absence models (Fielding & Bell 1997). The formulae are applied to assessments of correctly predicted positive occurrences (a), falsely predicted positive occurrences (b), falsely predicted negative occurrences (c) and correctly predicted negative cases (d). n is the overall number of cases
Performance measure and definition                                                        Formula
Overall prediction success: percentage of all cases correctly predicted (S)              (a + d)/n
Sensitivity: percentage of true positives correctly predicted (Sn)                       a/(a + c)
Specificity: percentage of true negatives correctly predicted (Sp)                       d/(b + d)
False positive rate: percentage of actual absences wrongly predicted as presences        b/(b + d)
False negative rate: percentage of actual presences wrongly predicted as absences        c/(a + c)
Positive predictive power: percentage of predicted presences that were real              a/(a + b)
Negative predictive power: percentage of predicted absences that were real               d/(c + d)
The odds ratio: ratio of correctly assigned cases to incorrectly assigned cases          (ad)/(cb)
Kappa: proportion of specific agreement                                                   [(a + d) – (((a + c)(a + b) + (b + d)(c + d))/n)] / [n – (((a + c)(a + b) + (b + d)(c + d))/n)]
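The formulae in Table 2 can be computed directly from the four cells of a confusion matrix. The Python sketch below does so as proportions rather than percentages; the function name and the example counts are hypothetical and are not taken from the study. It will raise a division error if any marginal total is zero, for example when a model predicts no presences at all.

```python
def performance_measures(a, b, c, d):
    """Measures from Table 2 for a 2 x 2 confusion matrix.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    n = a + b + c + d
    # Agreement expected by chance, used in kappa.
    chance = ((a + c) * (a + b) + (b + d) * (c + d)) / n
    return {
        "prediction_success": (a + d) / n,
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "false_positive_rate": b / (b + d),
        "false_negative_rate": c / (a + c),
        "positive_predictive_power": a / (a + b),
        "negative_predictive_power": d / (c + d),
        "odds_ratio": (a * d) / (c * b),
        "kappa": ((a + d) - chance) / (n - chance),
    }

# Hypothetical counts for a scarce species at 180 sites.
m = performance_measures(a=10, b=8, c=12, d=150)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts, overall prediction success is about 89% although sensitivity is only about 45%, which illustrates why overall success alone can mask poor prediction of presence.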

Deconstructing prediction success. To deconstruct overall prediction success into separate elements, we derived confusion matrices after Fielding & Bell (1997). True positive (a), false positive (b), false negative (c) and true negative (d) cases were first identified in each application. From these values, we initially calculated a range of performance measures including sensitivity, specificity, false positive rate, false negative rate, positive predictive power, negative predictive power, the odds ratio and kappa (Table 2; Fielding & Bell 1997). Using individual values for each species as replicate results, we compared the performance measures across methods using either one-way analysis of variance (ANOVA) or Kruskal–Wallis tests where variances were not homogeneous. In practice, several of these performance measures were intercorrelated, so that only three (overall prediction success, sensitivity and specificity) were chosen to illustrate most aspects of comparative model performance in subsequent analyses with the test data.

Prediction performance. We tested each modelling procedure on independent observations reserved by partitioning data into calibration sets and test sets.

First, we used the ‘leave-one-out’ method of jack-knifing to isolate a calibration set (179 sites × 32 explanatory variables) and an independent test set (1 site × 32) of sites, iterated through separate runs for each observation (i.e. n = 180). In each run, the model was first calibrated and then used to predict the presence or absence of each species in the test set. The final product was a model test obtained from 180 iterations.

Secondly, to permit the realistic model test outlined in the introduction, we used regions 1, 2, 3, 4 and 5 (119 sites; Fig. 1) to calibrate models that were then used to predict species occurrence in regions 6 and 7 (61 sites). This choice allowed a test of model performance in two regions that were the most geographically isolated from the calibration set.
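The two partitioning schemes can be outlined in Python as below. This is a sketch under stated assumptions: the toy `fit` and `predict` functions stand in for DA, LR or ANN, and the altitudes, occurrences and region numbers are invented for illustration.

```python
def leave_one_out(sites, fit, predict):
    """Jack-knife test: calibrate on all sites but one, predict the one left out."""
    predictions = []
    for i, (x, _) in enumerate(sites):
        calibration = sites[:i] + sites[i + 1:]
        model = fit(calibration)
        predictions.append(predict(model, x))
    return predictions

def regional_split(sites, regions, test_regions):
    """Regional test: calibrate in some regions, test in geographically separate ones."""
    calibration = [s for s, r in zip(sites, regions) if r not in test_regions]
    test = [s for s, r in zip(sites, regions) if r in test_regions]
    return calibration, test

# Toy stand-in model: predict presence when altitude does not exceed the
# mean altitude of occupied calibration sites (purely illustrative).
def fit(calibration):
    occupied = [x for x, y in calibration if y == 1]
    return sum(occupied) / len(occupied)

def predict(model, x):
    return 1 if x <= model else 0

# Hypothetical (altitude in m, presence) pairs with their region numbers.
sites = [(500, 1), (900, 1), (1500, 1), (2500, 0), (3200, 0), (4000, 0)]
regions = [1, 2, 3, 5, 6, 7]

loo = leave_one_out(sites, fit, predict)
calibration, test = regional_split(sites, regions, test_regions={6, 7})
print(loo, len(calibration), len(test))
```

In the jack-knife, every site is predicted by a model that never saw it, but neighbouring sites may still have contributed to calibration; the regional split removes that possibility for the test regions.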

Identifying explanatory variables

We not only evaluated the success of each model in predicting species occurrence, but also in assessing potentially important influences on species distribution. In correlative ecology, this aspect of modelling is important in generating testable hypotheses about the causes of distribution pattern. However, the identification of likely causal variables is currently more straightforward with conventional statistical methods than with ANN (Lek et al. 1996b), and may be one advantage of LR and DA (Manel, Dias & Ormerod, in press). We thus examined this feature with an application of LR (Peeters & Gardeniers 1998). We assessed whether significant explanatory variables were consistent between the regional and entire data sets. Such consistency would provide confidence in the derivation of hypotheses about causal influences on distribution. We also assessed whether simple map-derived variables alone (altitude and slope) were sufficient to explain or predict presence/absence in the absence of detailed RHS data.

Results


Prediction success

In the complete data set there were only minor variations in prediction success between models or species. ANN correctly predicted a significantly greater number of cases than LR (Mann–Whitney U-test, P < 0·016). However, overall prediction success in all cases was high, varying on average from 82% with LR to 94% with ANN (Table 3). For individual species–model combinations, prediction success varied from 75% for the brown dipper to 100% for the blue whistling-thrush. Prediction success was not obviously linked to species occurrence because values were equally high in species that were rare (grey wagtail and blue whistling-thrush) and common in the data set (plumbeous redstart and river chat). Further analysis with test data revealed how this use of prediction success alone would have failed to reveal important effects caused by species rarity, and important differences between approaches (see below).

Table 3.  Comparing three methods (discriminant analysis, DA; logistic regression, LR; artificial neural networks, ANN) for predicting the presence–absence of six species of Himalayan river birds. The values are percentage prediction successes estimated from 180 rivers. The mean success and coefficient of variation (CV) is given for each method
Species                        Percentage occurrence    DA    LR    ANN
Little forktail (LF)           24                       93    78    90
Brown dipper (BD)              30                       81    75    94
Plumbeous redstart (PR)        36                       84    83    89
River chat (RC)                33                       82    83    92
Blue whistling-thrush (BWT)    21                       81    79    100
Grey wagtail (GW)              12                       95    92    97
CV (%)
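As a worked check on the values in Table 3, the short Python sketch below computes the mean prediction success for each method and the Mann–Whitney U statistic for ANN against LR. The function name `mann_whitney_u` is an assumption, ties are counted as one half, and no P-value is derived here because that requires the exact null distribution or published tables.

```python
from statistics import mean

# Percentage prediction success from Table 3, in species order LF, BD, PR, RC, BWT, GW.
success = {
    "DA": [93, 81, 84, 82, 81, 95],
    "LR": [78, 75, 83, 83, 79, 92],
    "ANN": [90, 94, 89, 92, 100, 97],
}

def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y; ties count one half."""
    return sum(1.0 if xi > yi else 0.5 if xi == yi else 0.0
               for xi in x for yi in y)

means = {method: mean(values) for method, values in success.items()}
u_ann = mann_whitney_u(success["ANN"], success["LR"])
u_lr = mann_whitney_u(success["LR"], success["ANN"])

print({method: round(m, 1) for method, m in means.items()})
print(u_ann, u_lr)
```

The means for LR and ANN round to 82% and 94%, matching the averages quoted in the text, and ANN exceeds LR in almost all of the 36 pairwise comparisons.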

Model testing

Overall prediction success in the jack-knife application to the test data varied only moderately between models, with LR predicting marginally but significantly more cases than ANN (Mann–Whitney U-test, P < 0·012; Table 4). There were also significant variations (P < 0·05) between methods in specificity, false positive rates and odds ratios; there were near-significant (P < 0·1) variations in positive predictive power and kappa (Table 5). In general, LR performed most favourably, and ANN least; these performance measures indicated that LR better identified true absence and true presence, hence having the highest overall ratio of correct to incorrect assignment.

Table 4.  Comparing three methods for predicting the presence–absence of six species of Himalayan river birds at test sites using a jack-knife procedure (one test site, iterated 180 times). The values are sensitivity (Sn), specificity (Sp) and overall prediction successes (S) with the mean success and coefficient of variation (CV) given for each method. Other conventions as in Table 3
          DA                     LR                     ANN
          Sn     Sp     S        Sn     Sp     S        Sn     Sp     S
CV (%)    0.40   0.08   0.1      0.59   0.05   0.08     0.48   0.08   0.06

Table 5.  Comparing three methods (discriminant analysis, DA; logistic regression, LR; artificial neural networks, ANN) for predicting the presence–absence of six species of Himalayan river birds using a jack-knife procedure (one test site, iterated 180 times). The values for each performance measure were averaged across species (means with SD), and compared between methods by one-way ANOVA against F (i.e. 2/17 d.f.). Values for odds ratios are species medians, and statistical comparison is by Kruskal–Wallis test against H. (*P < 0·05; **P < 0·1)
Performance measure          DA             LR             ANN            F or H
Sensitivity                  41 (16)        41 (23)        31 (15)        NS
Specificity                  84 (6)         92 (5)         84 (7)         3.82*
False positive rate          16 (6)         8 (5)          17 (7)         3.82*
False negative rate          59 (16)        59 (24)        68 (15)        NS
Positive predictive power    47 (12)        60 (17)        40 (13)        2.9**
Negative predictive power    80 (4)         82 (6)         77 (4)         NS
Kappa                        0.27 (0.13)    0.36 (0.21)    0.15 (0.11)    2.69**
Odds ratio                                                                4.*

In all three methods, the negative predictive power (mean 77–82%) was considerably greater than positive predictive power (40–60%), indicated by marked differences between specificity and sensitivity (Tables 4 and 5). Values for negative predictive power and specificity were also less variable across species (Tables 4 and 5). Comparison between methods suggested that LR was worst affected by differences between species because the coefficient of variation in sensitivity of 59% was greater than for DA and ANN (Table 4). In general, species effects reflected occurrence: sensitivity tended to increase in the more common species, while specificity and overall prediction success tended to decrease (Fig. 2). Linear regression suggested that these relationships were statistically significant (P < 0·05) in six out of the nine cases illustrated.

Figure 2. The effect of species rarity (as percentage occurrence, x-axis) on overall prediction success (%), sensitivity (%) and specificity (%) for six Himalayan river birds using discriminant analysis (DA), logistic regression (LR) and artificial neural networks (ANN). This application describes testing following a ‘leave-one-out’ (= jack-knife) procedure. Relationships significant at P < 0·05 are indicated (d.f. = 5).

The regional application, the most realistic of all the model tests we performed, produced results generally consistent with the jack-knife procedure. The overall mean values for sensitivity and specificity (Table 6) were very similar to the jack-knife values (Table 4), with the exception that variation between species in performance became more strongly marked. This was most extreme in LR due to zero positive success for the two rarest species in the data, blue whistling-thrush and grey wagtail (Table 6). As in the jack-knife procedure, species occurrence affected sensitivity, specificity and overall predictive power, with four of the nine relationships again significant at P < 0·05 (Fig. 3).

Table 6.  Comparing three methods for predicting the presence–absence of six species of Himalayan river birds by the application of models calibrated in regions 1, 2, 3, 4 and 5 (119 sites; Fig. 1) to test sites in regions 6 and 7 (61 sites). Other conventions as in Tables 3 and 4
          DA                     LR                     ANN
          Sn     Sp     S        Sn     Sp     S        Sn     Sp     S
CV (%)    0.62   0.16   0.10     0.90   0.16   0.07     0.65   0.14   0.08

Figure 3. As for Fig. 2, but this application describes use with a test data set (n = 61 sites) separated regionally from the calibration data set (n = 119 sites).

Identifying explanatory variables

Altitude, and for one species slope, was always a significant predictor of species occurrence (Table 7). Additional habitat predictors from RHS added only in a small way to overall prediction success and negative prediction (Fig. 4). However, the additional predictive information from RHS variables sometimes substantially increased positive prediction, and hence sensitivity (Fig. 4, e.g. plumbeous redstart and river chat).

Table 7.  Results from stepwise logistic regressions to relate the presence and absence of Himalayan river birds to the altitude, slope, habitat structure (as principal components), chemistry and aquatic invertebrate abundance of 180 Himalayan streams. Only statistically significant cases are shown, as standardized logistic regression coefficients (t = estimate/standard error). Data sets either included all 180 streams (All) or just the regional calibration set (Reg = regions 1, 2, 3, 4 and 5). Species codes are from Table 3
Altitude–4.6–3.9  –5.2–4.7–4.0–2.3–3.6–3.61.5 
(Altitude)2          –1.9 
Slope3.1 –2.4–2.2  –2.5     
FlowPC1–2.3 –3.4–1.5        
FlowPC4      –3.1     
FlowPC53.1 3.0     
ChanPC4           2.5
RiparPC1          2.5 
RiparPC5 2.3  –2.8       
BankPC1 –2.6     2.9    
BankPC2     2.5 2.5    
BankPC4        1.8   
BankPC5      –2.0     
ChemPC1       3.3    
Ephemeroptera    2.8 2.7   3.0 
Coleoptera      –3.3     
Plecoptera          –2.1 
Figure 4. Overall prediction success (%), sensitivity (%) and specificity (%) achieved using logistic regression to predict the presence–absence of six species of river birds (see Table 3 for species codes). Open bars illustrate success with map-derived variables alone (altitude or slope, s) and solid bars illustrate success using detailed habitat data from RHS. Patterns were assessed either (a) from all sites (180 sites) or (b) from calibration regions 1, 2, 3, 4 and 5 (119 sites); in the latter case for grey wagtail (GW) only habitat features had significant effects.

There was some inconsistency between the entire data and the partitioned regional data in the array of significant predictors for some species (Table 7). For example, taking the two most widespread species, for the river chat only altitude was a consistent predictor in the regional and complete data sets; for the plumbeous redstart, there were three predictors in common (Table 7). This contrast, even in the common species, showed that correlative approaches might not reliably indicate significant influences on distribution.

Discussion

Discussion

The management of individual species currently figures prominently in nature conservation, with vertebrates such as birds among the groups that are frequently involved (Neave et al. 1996; Pearson & Carroll 1998). However, the expense and logistical problems of exhaustive surveys, particularly in remote locations such as the Himalaya, mean that modelling approaches are increasingly used to identify sites or regions with conservation value (Williams et al. 1997). Species distribution models can help managers predict the future outcome of adverse change or management action (May & Webb 1994; Davis et al. 1998; Sutherland 1998). They can contribute to bioindicator systems (Birks et al. 1990), or they can help to identify management problems where important species are absent from predicted locations (Lawton & Woodroffe 1991). So far, attempts to apply these approaches to rivers have been few (Boon & Raven 1998) but they will become more important as the conservation value of river systems is recognized increasingly. Therefore, it is encouraging that all these modelling procedures performed well in the overall prediction of distribution in specialist river birds. This result supports the findings from our previous study, involving only one species (Manel, Dias & Ormerod, in press): working either with the entire calibration data, or with test data, prediction success across this wider array of species always exceeded 62%. At the same time, the deconstruction of overall prediction success, and the model tests with independent data, have illustrated some shortfalls in species distribution modelling that will be common to many techniques and habitats.

Deconstructing prediction success

Conventionally, the effectiveness of models for predicting species distribution pattern has been judged from prediction success alone (Buckton & Ormerod 1997; Mastrorillo et al. 1997). Recently, however, Fielding & Bell (1997) outlined the value of separately assessing the different elements of prediction success, and our results support their recommendation; on average, errors in predicting species presence were roughly twice those of predicting absence, with sensitivity ranging from 0% to 85%. This is despite the apparently good performance indicated by overall prediction success. In addition to possible errors in model structure or poor selection of predictor variables, such patterns can arise from systematic effects by species occurrence (Fielding & Bell 1997). They can arise also from sampling error where true presence is not correctly detected (Fielding & Haworth 1995) or because apparently ideal locations are not occupied. At least two of these features were apparent in our study due to the measurable effects of the short sampling reach (see the Materials and methods), and due to the effects of species rarity (Figs 2 and 3). It is noteworthy that low positive prediction was apparent even though we selected the six most common species from our data; rarer species may well produce even greater difficulties. We will return to this theme in a further paper specifically investigating the effects of prevalence on performance indicators in invertebrate distribution models (S. Manel, P. A. Brewin & S. J. Ormerod, unpublished data).
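
As a minimal sketch of this deconstruction, the three measures can be computed from the four cells of a presence–absence confusion matrix. The definitions below are the usual ones and are assumed to match those used in this paper; the function name and the site counts are hypothetical and are not results from this study.

```python
def prediction_measures(tp, fp, fn, tn):
    """Return (overall success, sensitivity, specificity) as percentages.

    tp: observed presences predicted present
    fn: observed presences predicted absent
    fp: observed absences predicted present
    tn: observed absences predicted absent
    """
    nan = float("nan")
    n = tp + fp + fn + tn
    overall = 100.0 * (tp + tn) / n if n else nan
    # Sensitivity: share of true presences that were predicted correctly.
    sensitivity = 100.0 * tp / (tp + fn) if (tp + fn) else nan
    # Specificity: share of true absences that were predicted correctly.
    specificity = 100.0 * tn / (tn + fp) if (tn + fp) else nan
    return overall, sensitivity, specificity


# Hypothetical rare species: present at 10 of 100 sites.
overall, sens, spec = prediction_measures(tp=3, fp=5, fn=7, tn=85)
print(f"overall {overall:.0f}%, sensitivity {sens:.0f}%, specificity {spec:.0f}%")
# prints "overall 88%, sensitivity 30%, specificity 94%"
```

In this invented case overall success looks good only because absences dominate the data, while most presences are missed.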

Deconstructed prediction errors have real practical significance. First, a decrease in the positive prediction of rarer species implies that occurrence will be most difficult to predict in those organisms for which conservation management is often most critical (Hunter 1996). The identification of good locations for species reintroductions will also be limited. Species rarity is associated with other modelling problems, such as defining threshold probabilities of occurrence in logistic regression (Manel, Dias & Ormerod, in press). Thus, important environmental influences on the probability of occurrence may be estimated incorrectly. Secondly, problems of positive prediction and low false positive rates will create difficulty in diagnosing conservation problems. For example, the absence of a species from sites of suitable physical habitat can indicate other limits on distribution, such as poor water quality (Buckton & Ormerod 1997), dispersal (Wahlberg, Moilanen & Hanski 1996) or the adverse effects of exotic species (Lawton & Woodroffe 1991). However, species rarity reduced the rate at which all methods predicted such false positive occurrences. Paradoxically, more common species will thus provide better biological indicators despite wider habitat preferences; because it is easier to predict where they should be present, absences will be easier to detect. Thirdly, the deconstruction of overall prediction success clearly allows improved insight into comparative model performance, as in this study (Fig. 4; cf. Mastrorillo et al. 1997). As an additional performance indicator, kappa (Table 2) has considerable additional value because it is unaffected by species occurrence (Fielding & Bell 1997; S. Manel, P. A. Brewin & S. J. Ormerod, unpublished data). In our applications, logistic regression tended to outperform artificial neural networks on this criterion (Table 5).
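
Kappa can be obtained from the same four counts. The sketch below assumes the standard Cohen's kappa, in which observed agreement is corrected for the agreement expected by chance; the definition in Table 2 is not reproduced in this section, and the function name and counts are hypothetical.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2 x 2 presence-absence confusion matrix."""
    n = tp + fp + fn + tn
    observed_agreement = (tp + tn) / n
    # Chance agreement from the row and column totals of the matrix.
    chance_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if chance_agreement == 1.0:
        return float("nan")  # undefined when only one class occurs
    return (observed_agreement - chance_agreement) / (1.0 - chance_agreement)


# Hypothetical rare species (10 presences in 100 sites).
# A model that predicts absence everywhere scores 90% overall success
# but has no agreement beyond chance.
print(round(cohen_kappa(tp=0, fp=0, fn=10, tn=90), 3))
# A model that finds 3 of the 10 presences scores 88% overall success.
print(round(cohen_kappa(tp=3, fp=5, fn=7, tn=85), 3))
```

The example illustrates why overall prediction success alone can rank models misleadingly when a species is rare.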

Relative advantages of DA, ANN and LR

Except for moderate effects, for example the marginally better performance of logistic regression in the jack-knife application, there were no marked differences between these three modelling approaches. This result contrasts with recent work by other authors who considered artificial neural networks to be advantageous in modelling species presence and absence (Mastrorillo et al. 1997). We suggest that, providing the underlying data structure is appropriate to a given method, there is no reason to believe that major differences in performance should occur. Probably of greater importance is the choice of criteria used to assess performance (Fielding & Bell 1997), the nature of the data on which prediction is based (e.g. linearity, species prevalence, data quality, sampling error) and the assumptions that must be satisfied by any given operation (Table 8). As examples, discriminant analysis will require careful interpretation where univariate or multivariate normality cannot be assured among predictor variables (Hair et al. 1995), while logistic regression is sensitive to threshold effects due to species prevalence (Fielding & Bell 1997; Manel, Dias & Ormerod, in press). Artificial neural networks may well have major advantages where species–environment links cannot be transformed to linearity (Lek et al. 1996a; Guégan, Lek & Oberdorff 1998; Walley & Fontama 1998), but might be just as sensitive as any other algorithm to collinearity and violations of independence. These possibilities require further investigation. A further practical advantage with logistic regression and discriminant analysis is that, at present, the assessment of effects by individual variables on species distribution is more straightforward than with artificial neural networks (Lek et al. 1996b; Walley & Fontama 1998).

At the same time, all of the methods we tested suffer the same inevitable drawback of work carried out at such broad spatial scales, that the models rely on techniques from which cause and effect are difficult to assess. Problems of this type produce major difficulties; pattern at coarse spatial scales is crucial to macro-ecology, conservation biology or environmental management, but is also beyond validation by experimental manipulation (Blackburn & Gaston 1998; Gaston & Blackburn 1999).
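
The threshold effect noted above for logistic regression can be illustrated with a short sketch. Fitted probabilities of occurrence must be converted to presence or absence at some cut-off, and for a rare species few sites may exceed a cut-off of 0.5. The data below are hypothetical, and the use of species prevalence as an alternative cut-off is an assumption made purely for illustration, not the procedure used in this study.

```python
def classify(probabilities, threshold):
    """Convert fitted probabilities of occurrence to presence (1) or absence (0)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def sensitivity_specificity(observed, predicted):
    """Return (sensitivity, specificity) as percentages.

    Assumes at least one observed presence and one observed absence.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)


# Hypothetical data: 12 sites, species present at 3 (prevalence 0.25).
observed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
fitted = [0.62, 0.41, 0.30, 0.35, 0.22, 0.18, 0.15, 0.12, 0.10, 0.08, 0.05, 0.03]

prevalence = sum(observed) / len(observed)
for threshold in (0.5, prevalence):
    sens, spec = sensitivity_specificity(observed, classify(fitted, threshold))
    print(f"threshold {threshold:.2f}: sensitivity {sens:.0f}%, specificity {spec:.0f}%")
```

With these invented values, lowering the cut-off from 0.5 to the prevalence raises sensitivity at a small cost in specificity, which shows why the choice of threshold matters most for rarer species.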

Table 8.  A summary of the key constraints, assumptions and sensitivities of discriminant analysis, logistic regression and artificial neural networks. Numbers in parentheses refer to sources: 1, Hair et al. (1995); 2, Green & Vascotto (1978); 3, Fielding & Bell (1997); 4, Manel, Dias & Ormerod (in press); 5, Ripley (1996) and this study.

Discriminant analysis (1, 2)
Operates with categorical dependent variables (n ≥ 2 groups) whose position is linearly related to either untransformed or transformed independent variables
Sensitive to departures from univariate and multivariate normality in predictor variables
Departures from multivariate normality difficult to assess
Sensitive to multicollinearity among independent variables
Requires equal dispersion and covariance structures for the groups as defined by the dependent variable
Cases should have independent errors
Sensitive to outliers
Stepwise procedures not recommended in ecological applications

Logistic regression (3, 4)
Operates with categorical (n = 2) or continuous dependent variables that have a binomial distribution
Probabilities should be related linearly to untransformed or transformed predictor variables
Requires absence of multicollinearity
Cases should have independent errors
Sensitive to species prevalence, affecting thresholds for classification on predictor variables
Stepwise procedures possible

Artificial neural networks (5)
Dependent data (called desired ‘output variables’) can be related non-linearly to predictor variables
No specific assumptions concerning independent variables (called ‘input variables’)
Sensitive to overtraining: the tendency to model random noise as deterministic pattern
Assessment of effects by individual predictors currently difficult
No stepwise procedure currently available
Effects by sample independence and multicollinearity unclear

The value and importance of model testing

Under these circumstances, where models depend on correlative approaches, model tests with independent data are of considerable importance (Fielding & Haworth 1995). This practice is increasingly established in ecology (Fielding & Bell 1997; Mastrorillo et al. 1997; Manel, Dias & Ormerod, in press), although some authors have questioned the use of partitioned data for this purpose. Chatfield (1995), for example, suggested that the arbitrary division of available data is not the same as collecting new data, for reasons outlined in the Introduction. However, experimental tests of model predictions involving large ecosystems are rarely practicable. Thus, as a compromise, we provided test conditions by partitioning data by regions selected to maximize geographical distance between calibration sites and test sites. The consistency in results between this regional application (Tables 4 and 6) and the more artificial jack-knife procedure (leave-one-out) shows that the latter approach can provide a valuable insight into model performance. Researchers with limited test data might find this result of some value.
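
The two testing schemes compared here can be sketched as follows. To keep the example self-contained, a simple nearest-class-mean rule on a single predictor stands in for the fitted model; it is not discriminant analysis, logistic regression or a neural network as applied in this study, and the altitudes, labels and function names are hypothetical.

```python
def fit_class_means(sites):
    """Return the mean predictor value for each class from (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in sites:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(means, value):
    """Assign the label whose class mean lies nearest to the value."""
    return min(means, key=lambda label: abs(means[label] - value))


def holdout_success(calibration, test):
    """Fit on calibration sites, then score on separate test sites (%)."""
    means = fit_class_means(calibration)
    hits = sum(1 for value, label in test if predict(means, value) == label)
    return 100.0 * hits / len(test)


def jackknife_success(sites):
    """Leave-one-out: refit without each site in turn, then predict it (%)."""
    hits = 0
    for i, (value, label) in enumerate(sites):
        training = sites[:i] + sites[i + 1:]
        if predict(fit_class_means(training), value) == label:
            hits += 1
    return 100.0 * hits / len(sites)


# Hypothetical (altitude in m, presence) pairs for two sets of regions.
calibration = [(500, 1), (700, 1), (900, 1), (1600, 1),
               (1400, 0), (2000, 0), (2400, 0), (2800, 0)]
test = [(600, 1), (1000, 1), (1800, 1), (2200, 0), (2600, 0), (1300, 0)]

print(f"jack-knife on calibration sites: {jackknife_success(calibration):.0f}%")
print(f"regional hold-out: {holdout_success(calibration, test):.0f}%")
```

The jack-knife needs no separate test set, which is why it suits limited data, but each left-out site is still drawn from the same regions as the sites used for fitting.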

Equally, however, the regional application revealed just how misleading correlative approaches to data analysis can be; although there were consistently strong correlates with species distribution between the entire data and the regional data (Table 7), statistical significance in other cases varied between data sets. These problems arose even in widespread species. Such variations might reflect real differences in habitat between regions, or variation between locations in influences on distribution. As an example, the distribution of dippers Cinclus spp. in Europe reflects food abundance and stream chemistry in addition to habitat (Buckton et al. 1998), but in the Himalaya large effects by physical structure subsume all other influences (this study). Differences between regions in apparent influences on distribution can arise stochastically, as a previous resampling simulation with these data has shown (Manel, Dias & Ormerod, in press).

The value of RHS data in predicting species presence

The risk of chance correlations between habitat data and bird distributions reflects, in part, the wide array of variables produced during the RHS: over 120 variables are recorded, and spurious significant relationships with river bird distribution can arise (Buckton & Ormerod 1997). Even when reduced by principal components analysis, as in this study, this risk of spurious correlates with distribution remains. Brewin, Buckton & Ormerod (1998) therefore recommended judicious use of RHS in modelling species patterns. On the other hand, the wide array of variables recorded in RHS may well be necessary to capture quantitatively the complex variation present in river and riparian habitats (Newson et al. 1998). Although other workers (Hill 1991; Brewin, Buckton & Ormerod 1998) have suggested that simple measures of habitat (altitude, slope) can effectively predict species distributions at coarse scales, their assessments were made from overall prediction success. Our data show, in contrast, that predictions of species presence can be improved substantially by detailed habitat data (Fig. 4). Thus, while measures of location such as altitude might reveal where a bird species could occur at coarse scales, detailed reach-scale measures predict more effectively where it will occur. With current emphasis in macro-ecology on crude global predictors of species pattern, further investigations using the more detailed habitat predictors provided by methods like RHS may provide important additional insights (Sutherland 1998).

Acknowledgements

These data were collected under a programme funded by the Darwin Initiative for the Survival of Species co-ordinated by the UK Department of Environment, Transport and the Regions. We thank Dr Alan Jenkins of the Institute of Hydrology (UK), Phil Brewin and Hem Sagar Baral, without whom the work would not have been possible. The analysis was funded by the Royal Society European Science Exchange Programme. We thank Professor Claude Mouchés for providing the important opportunity for this collaboration between the Université de Pau et des pays de l’Adour and Cardiff University. We thank two referees and Dr Gill Kerby for comments on the manuscript.