Estimating species richness: calibrating a large avian monitoring programme

Authors


Marc Kéry, Swiss Ornithological Institute, CH-6204 Sempach, Switzerland (e-mail marc.kery@vogelwarte.ch).

Summary

  • 1 Species richness is the most widely used measure for the diversity of a biological community. Unfortunately, the number of species counted is usually a biased measure, as not all species present may be detected. Use of species counts as a proxy for true species richness requires the assumption of constant (over space and time) species detectability. This index assumption is hardly ever tested and, if violated, comparisons over time, space or other dimensions, for example different habitats, will be distorted. In monitoring programmes one therefore needs to know the proportion of species present that are detected and how this proportion is affected by external factors.
  • 2 We used capture–recapture techniques to calibrate the Swiss breeding bird survey, where species richness is recorded annually in c. 270 1-km² quadrats during two to three visits and interest is focused on annual trends and regional comparisons. Hitherto, analysis has been restricted to species counts, while species detectability and its determinants are not known. We used the interpolated jackknife estimator to compute mean species detectability for 268 quadrats in 2001–03 and tested determinants of detectability related to space, time, observer, survey effort and biology.
  • 3 Mean species detectability averaged 0·89 (SD 0·06, range 0·72–1·00), with no significant difference among years and significant, but small, regional differences. Observers differed, but surprisingly not in relation to their experience in a quadrat. Detectability was positively related to mean visit duration. Larger communities had a lower mean species detectability. A slight violation of population closure because of staggered arrival of migrants did not introduce any measurable bias into our results.
  • 4 Synthesis and applications. Species detectability in the Swiss programme was high and varied little in relation to recognized sources of heterogeneity. Nevertheless, increased standardization should be considered for mean visit duration. While these results are pleasing for the Swiss programme and show that using counts as indices of species diversity need not always induce serious bias, conditions in other programmes, and in the future in the Swiss programme, may be quite different. Both in monitoring programmes and in ecological studies, as a way of risk minimization, species richness ought to be rigorously estimated whenever possible to avoid detection of spurious effects because of changes in species detectability.

Introduction

Species richness is the most widely used measure for the diversity of a biological community (Purvis & Hector 2000; Gaston & Spicer 2004) and is also an important state variable in many biological monitoring programmes (Yoccoz, Nichols & Boulinier 2001; Pollock et al. 2002). Species richness owes its frequent use to several attractive features. Unlike other diversity measures, such as genetic diversity and landscape diversity, it is conceptually fairly clear and also easy to communicate to the public. For many taxonomic groups, species are relatively well-defined entities. Finally, and again in contrast to other measures of diversity, little equipment is often needed to record the species.

However, it is well known that frequently not all species present at a place and time will be recorded (Yoccoz, Nichols & Boulinier 2001; Williams, Nichols & Conroy 2002). Some species may be overlooked because they are rare, elusive, small, nocturnal or because the observer does not look at the right place or does not know them well. Conceptually, a species count, C, is related to the true number of species, N, by a constant, p (C = Np), which can be thought of as the average probability of detection (or detectability) of a species, given it is present. The use of species counts as an index of true species richness requires the ‘index assumption’: that the expectation of the detectability of all species, E(p), remains constant over space, time and any other dimensions over which comparisons are desired. We note that there has been some confusion in the literature about what ‘constancy’ means (Bart et al. 2004): it is the expectation of p and not p itself that needs to be constant. If the index assumption is not true, then ratios of species counts from different places or times will provide biased estimates of the relative richness of two places or points in time (MacKenzie & Kendall 2002). The index assumption should be tested whenever counts are used in ecological studies or in a monitoring programme (Conn, Bailey & Sauer 2004).
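To make the consequence of a violated index assumption concrete, the short Python sketch below compares expected counts, C = Np, at two sites. The richness values and detectabilities are hypothetical and are not MHB data; the function names are illustrative only.

```python
def expected_count(n_species: float, detectability: float) -> float:
    """Expected species count under C = N * p."""
    return n_species * detectability


def count_ratio(n1: float, p1: float, n2: float, p2: float) -> float:
    """Ratio of expected counts at site 1 versus site 2."""
    return expected_count(n1, p1) / expected_count(n2, p2)


# Hypothetical sites: site 1 truly holds 40 species, site 2 holds 32.
true_ratio = 40 / 32

# Index assumption holds: the same detectability at both sites.
ratio_equal_p = count_ratio(40, 0.85, 32, 0.85)

# Index assumption violated: the richer site has the lower detectability.
ratio_unequal_p = count_ratio(40, 0.80, 32, 0.90)

print(f"true {true_ratio:.3f}, equal p {ratio_equal_p:.3f}, unequal p {ratio_unequal_p:.3f}")
```

With equal detectability the count ratio recovers the true ratio of richness; in the second case it understates how much richer site 1 is, which is the kind of distortion the index assumption is meant to rule out.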

In 2001, the Swiss government launched a comprehensive biodiversity monitoring scheme, Biodiversity Monitoring (BDM; Hintermann, Weber & Zangger 2000; also see http://www.biodiversitymonitoring.ch), to comply with the requirements of the Convention on Biological Diversity (http://www.biodiv.org/convention/default.shtml). Probably in common with most other monitoring programmes, BDM aims only at measuring change. It does not aim to test hypotheses about the causes of those changes, nor have prescribed management actions been linked to specified rates of change.

One of the main indicators for the biological diversity in the country is the species richness of birds, butterflies, molluscs and vascular plants at a small spatial scale. Species richness is estimated based on repeat visits in 1-km² quadrats distributed as a systematic random sample across Switzerland. Interest focuses on time trends and comparison of regions. At present, little is known about the proportion of species present that are detected by this scheme or about which factors affect this proportion. This means that it is not known whether species counts provide a valid index of true species richness. Preliminary analyses suggested that about 85% of bird (Kéry & Schmid 2004; D. Weber, unpublished data) and about 70% of butterfly (M. Kéry & M. Plattner, unpublished data) species may be detected in any year and quadrat.

Because of the analogy between individuals in a population and species in a community (Burnham & Overton 1979), any capture–recapture model for estimation of population size can be used to estimate detectability-corrected true species richness and hence to test the index assumption in a monitoring programme. Capture–recapture models have been used increasingly to estimate species richness and also parameters of community dynamics (rates of colonization and extinction) for birds, fish and plants (Nichols et al. 1998a,b; Boulinier et al. 1998a,b, 2001; Cam et al. 2002a,b; Lekve et al. 2002; Selmi & Boulinier 2003; Kéry & Schmid 2004; Kéry 2004).

Using this approach, we analysed data from some 150 breeding bird species in 268 quadrats surveyed during 2001–03 to estimate the mean proportion of species detected and to identify determinants of mean species detectability in the national Swiss breeding bird survey Monitoring Häufige Brutvögel (MHB; Kéry & Schmid 2004; Schmid, Zbinden & Keller 2004), which provides the data for the main avian module of BDM. We first estimated species richness for every quadrat–year combination and then modelled the resulting detectability estimates as a function of covariates of interest. As a test of the index assumption, we were particularly interested in systematic temporal and spatial variation of detectability and in its dependence on observers. We also investigated the potential effects on detectability of factors that might be standardized in the design of the monitoring programme. Although some details of our study are specific to the Swiss programme, testing the index assumption as a way of calibrating a monitoring programme is a broadly applicable topic of interest to many taxa and studies.

Methods

study area

Switzerland is a small (41 285 km²), mountainous country in western Europe with altitudes ranging from 200 to 4600 m a.s.l. The average altitude of 268 survey quadrats in our study was 1192 m, with a range of 250–2750 m. Breeding birds are virtually lacking at altitudes higher than that. Forests in Switzerland are small and fragmented and cover on average 34% of the area of survey quadrats (range 0–99%). Most areas outside forests and below 600 m altitude are urban or used for small-scale, highly subsidized and intensive agriculture.

the swiss breeding bird survey mhb

The Swiss Ornithological Institute (Sempach, Switzerland) runs the national breeding bird survey MHB (Schmid, Zbinden & Keller 2004). During our study, the sampling frame consisted of 268 1-km² quadrats chosen according to a systematic random design. Because of relative inaccessibility, 33 alpine quadrats in the original sample were replaced by neighbouring quadrats with similar altitude, forest cover, aspect and slope. Since 1999, during every breeding season (15 April–15 July) three visits are conducted in each quadrat by highly qualified volunteers using a simplified territory mapping method (Bibby, Burgess & Hill 1992; Schmid, Zbinden & Keller 2004). Forty-seven high-altitude quadrats with forest cover < 10% are visited only twice. Their average altitude is 2150 m a.s.l. (range 1550–2750 m), while the other quadrats average 990 m a.s.l. (range 250–2150 m). The sample quadrats belong to six ecological regions: the Jura mountains (28 quadrats), plateau (68), northern slope of the Alps (71), west-central Alps (31), east-central Alps (33) and southern slope of the Alps (37).

Visits follow an irregular transect route that aims to cover as much as possible of a quadrat and that, once chosen in a quadrat, remains constant during subsequent years. Route length averages 5·1 km (range 1·2–9·4 km). The mean duration of a single visit is 229 min (range 60–427) and mean survey effort (time per unit transect length) 48 min km⁻¹ (range 15–167). Visits are conducted during the first daylight hours and aim to detect as many species and individuals of potential breeding species as possible. Every visual or acoustic contact with a potential breeding species is mapped, and behavioural data used to delimit territories are also noted for a total of about 150 species. The average date of the first to third visits is 8 May, 28 May and 8 June, respectively. Maps from all visits are later combined and putative territories determined based on the knowledge of species-specific territory size, clustering of observations and behaviour recorded.

treatment of migrant species

To avoid counting transient migrants in MHB, 33 late-arriving species are not surveyed until after a species-specific threshold date: 25 April (five species), 1 May (11 species), 5 May (one species), 10 May (five species), 15 May (seven species) and 20 May (four species). An exception is made for observations providing strong evidence of breeding, such as a pair carrying nest material. Visits conducted earlier than the threshold date of a species induce structural zeroes in the data for that species. Because such a species is then genuinely absent, the corresponding non-observation does not contain any information on detectability. If not accounted for in a capture–recapture analysis, this may introduce a negative bias for average species detectability and therefore a positive bias for estimated species richness. In our data set, there were many detections of ‘threshold species’ 1–3 days earlier than the respective threshold date but very few detections during visits conducted more than 3 days earlier than the threshold. Hence we assumed that for each threshold species, a zero during a visit more than 3 days earlier than the threshold date was a structural zero rather than a sampling zero because of imperfect detectability. We called these species ‘absent species’ for the first visit. We could not directly accommodate such structural zeroes in our analysis. Instead, we quantified their proportion and checked if they induced a measurable change in our estimates of detectability, i.e. we looked for a correlation between the proportion of absent species and the detectability estimate (see below).
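The 3-day rule above can be expressed as a small classification function. The sketch below is a minimal illustration, assuming a lookup of threshold dates per species; the species keys and dates are hypothetical placeholders, not the actual MHB list. It also computes the proportion of ‘absent species’ among all species detected in a season, the quantity later used as a covariate.

```python
from datetime import date, timedelta

# Hypothetical threshold dates (illustrative keys and dates, not the MHB list).
THRESHOLDS = {"species_a": date(2002, 5, 15), "species_b": date(2002, 5, 1)}
# Zeros more than this many days before a threshold are treated as structural.
TOLERANCE = timedelta(days=3)


def is_structural_zero(species: str, visit_date: date, detected: bool) -> bool:
    """True if a non-detection is treated as genuine absence (a structural zero)."""
    threshold = THRESHOLDS.get(species)
    if detected or threshold is None:
        # Detections (including early evidence of breeding) and species
        # without a threshold date never yield structural zeros.
        return False
    return visit_date < threshold - TOLERANCE


def proportion_absent(first_visit: date, season_species: list[str],
                      first_visit_detections: set[str]) -> float:
    """Share of species detected in the season that were 'absent' on the first visit."""
    absent = [s for s in season_species
              if is_structural_zero(s, first_visit, s in first_visit_detections)]
    return len(absent) / len(season_species)


season = ["species_a", "species_b", "species_c", "species_d"]
share = proportion_absent(date(2002, 5, 8), season, {"species_b", "species_c"})
print(share)
```

Here only species_a counts as absent: it was missed on 8 May, more than 3 days before its hypothetical 15 May threshold, so one of four species (0·25) is absent for the first visit.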

statistical analysis

We conducted our analysis in two steps: first we estimated detectability and secondly we regressed these estimates on covariates of interest. Models have now been developed that combine the two steps but they are not yet available in a user-friendly way (Dorazio & Royle 2005).

Capture–recapture estimation of species richness and mean species detectability

For every quadrat and year, the survey data can be summarized in a capture-history matrix, with rows denoting the species detected and columns denoting visits. Depending on the number of visits, there are two or three columns. Element $a_{rc}$ of the matrix equals 1 when species r is detected during visit c, and 0 otherwise. This matrix forms the basis of our capture–recapture modelling of species richness.
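As a minimal sketch of how such a matrix can be assembled, the Python below assumes that each visit is recorded as a set of species names (a hypothetical input format). Only species detected at least once become rows, matching the description above.

```python
def capture_history(visits: list[set[str]]) -> tuple[list[str], list[list[int]]]:
    """Species-by-visit 0/1 matrix for one quadrat-year.

    visits holds one set of detected species per visit. Only species detected
    at least once become rows; matrix[r][c] is 1 if species r was seen on visit c.
    """
    species = sorted(set().union(*visits))
    matrix = [[1 if s in visit else 0 for visit in visits] for s in species]
    return species, matrix


visits = [{"Blackbird", "Robin"}, {"Robin", "Wren"}, {"Robin"}]
species, matrix = capture_history(visits)
for name, row in zip(species, matrix):
    print(name, row)
```

For this hypothetical three-visit quadrat, the Robin has history (1, 1, 1), while the Blackbird and Wren were each seen on a single visit.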

Because species differ so much with respect to size, colour, behaviour, habitat and density, heterogeneity among species in detectability can be expected a priori and has indeed been found in previous studies (Boulinier et al. 1998a; Diefenbach, Brauning & Mattice 2003; Kéry, Royle & Schmid 2005). The procedure most widely used for species richness estimation therefore assumes that detectability differs among species and estimates species richness by a jackknife estimator (Burnham & Overton 1979). This estimator has performed consistently well in comparisons (Otis et al. 1978; Palmer 1990, 1991; Brose, Martinez & Williams 2003; Baker 2004) and has become a standard for estimation of species richness and species detectability (Nichols et al. 1998a,b; Boulinier et al. 1998a,b, 2001; Cam et al. 2002a,b; Lekve et al. 2002; Selmi & Boulinier 2003; Kéry & Schmid 2004). It is implemented in several freely available programs: capture (Otis et al. 1978), comdyn (Hines et al. 1999), EstimateS (Colwell 1997) and mark (White & Burnham 1999). We used the interpolated jackknife estimator for species richness for each quadrat–year combination to estimate detectability P as P̂ = C/N̂, i.e. mean species detectability for all visits in a season combined was the ratio of the observed (C) and estimated number of species (N̂) (Boulinier et al. 1998a).
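The jackknife is only cited above, so a minimal sketch may help. The Python below computes the first- and second-order jackknife estimators for heterogeneous detectability (model Mh) from the detection frequencies f_i, the number of species seen on exactly i of t visits, and then P̂ = C/N̂. It deliberately stops short of the interpolated jackknife used in this study: that procedure chooses between successive jackknife orders by sequential tests and interpolates between them, which this sketch does not attempt. The function names and the example history are hypothetical.

```python
def detection_frequencies(matrix: list[list[int]]) -> list[int]:
    """f[i] = number of species detected on exactly i of the t visits."""
    t = len(matrix[0])
    f = [0] * (t + 1)
    for row in matrix:
        f[sum(row)] += 1
    return f


def jackknife_richness(matrix: list[list[int]]) -> tuple[float, float]:
    """First- and second-order jackknife estimates of species richness (model Mh)."""
    t = len(matrix[0])
    if t < 2:
        raise ValueError("the jackknife needs at least two visits")
    f = detection_frequencies(matrix)
    s_obs = sum(f[1:])  # species detected at least once, i.e. the count C
    n1 = s_obs + f[1] * (t - 1) / t
    n2 = s_obs + f[1] * (2 * t - 3) / t - f[2] * (t - 2) ** 2 / (t * (t - 1))
    return n1, n2


# Hypothetical three-visit history: 3 species seen once, 2 twice, 5 on every visit.
history = ([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
           + [[1, 1, 1]] * 5)
n1_hat, n2_hat = jackknife_richness(history)
count = sum(detection_frequencies(history)[1:])
p_hat = count / n1_hat  # mean species detectability, P-hat = C / N-hat
print(count, round(n1_hat, 2), round(n2_hat, 2), round(p_hat, 3))
```

With only two visits the second-order correction term vanishes, so both orders coincide; this is one reason why the choice of estimator order matters less in the twice-visited alpine quadrats than in quadrats with three visits.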

Determinants of species detectability

To test for factors affecting mean species detectability, we used GenStat (Thompson & Welham 2003) to model the resulting detectability estimates as a function of quadrat, year and covariate effects. We used a linear mixed model (Littell et al. 1996; Pinheiro & Bates 2000; Littell 2002) with random quadrat and observer effects to account for possible correlations of visits from different years within the same quadrat and of visits made by the same observer. Under this model, mean estimated species detectability $P_{ij}$ for year i and quadrat j, surveyed by observer k, may be written as:

$$P_{ij} = \mu + \beta_i + c \times x_{ij} + \tau_j + \upsilon_k + \varepsilon_{ij}$$

where $\mu$ is the grand mean, $\beta_i$ is the fixed effect of year i, c is the per-unit change in P as a result of a unit change in covariate x, $\tau_j$ is a random effect for quadrat j, $\upsilon_k$ is a random effect for observer k, and $\varepsilon_{ij}$ is the usual normally distributed error component, i.e. the sum of all effects not explained by the other model terms. Covariate x may vary among years, quadrats or observers. In practice we had not one but many covariates x in the model (see below). Our analysis assumes that the random effects τ and υ are independent zero-mean normal variables, i.e. $\tau_j \sim N(0, \sigma^2_\tau)$ and $\upsilon_k \sim N(0, \sigma^2_\upsilon)$. We tested these assumptions by visual inspection of histograms and Q–Q plots. Because the distribution of errors $\varepsilon_{ij}$ was not distinguishable from normal, we did not transform $P_{ij}$. We excluded from analysis one extraordinarily low species detectability estimate of 0·50, so our analysis is based on 787 quadrat–year combinations.
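To show what the random effects do in this model, the Python sketch below simulates detectability values from the equation above rather than fitting it (the authors fitted the model in GenStat). All parameter values are hypothetical; the standard deviations are chosen to be roughly the square roots of the Table 2 variance components (√3·21 ≈ 1·8, √2·92 ≈ 1·7, √29·48 ≈ 5·4), on the percentage scale used there.

```python
import random


def simulate_detectability(records, mu, beta, c, sd_quadrat, sd_observer, sd_resid, seed=1):
    """Draw P_ij = mu + beta_i + c * x_ij + tau_j + upsilon_k + eps_ij for each record.

    records: list of (year, quadrat, observer, x) tuples.
    beta: dict mapping year -> fixed year effect.
    Each quadrat and observer effect is drawn once and reused for every record
    sharing that level; this is what induces the within-quadrat and
    within-observer correlation that the random effects are meant to capture.
    """
    rng = random.Random(seed)
    tau: dict = {}
    upsilon: dict = {}
    simulated = []
    for year, quadrat, observer, x in records:
        if quadrat not in tau:
            tau[quadrat] = rng.gauss(0.0, sd_quadrat)
        if observer not in upsilon:
            upsilon[observer] = rng.gauss(0.0, sd_observer)
        eps = rng.gauss(0.0, sd_resid)
        simulated.append(mu + beta[year] + c * x + tau[quadrat] + upsilon[observer] + eps)
    return simulated


# Hypothetical values on the percentage scale; x is a species count.
records = [(2001, "q1", "obs1", 30), (2002, "q1", "obs1", 32), (2001, "q2", "obs2", 45)]
beta = {2001: 0.0, 2002: 0.5}
values = simulate_detectability(records, mu=93.0, beta=beta, c=-0.13,
                                sd_quadrat=1.8, sd_observer=1.7, sd_resid=5.4)
print([round(v, 1) for v in values])
```

Setting all three standard deviations to zero reduces each value to its fixed part, μ + β_i + c·x_ij, which is a quick check that the terms are assembled as in the equation.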

Some hypotheses tested in the model are specified in terms of fixed effects and others in terms of random effects. We tested fixed effects using Wald tests. As the random effects for quadrats (n = 268) and observers (n = 237) had a large number of levels, we tested them by treating the ratio of each variance component estimate to its standard error as a standard normal random variable. This test is one-sided because the alternative hypothesis is directed (Littell et al. 1996). We predicted a priori that detectability may be affected by factors summarized under the rubrics of space, time, observers, survey effort and biology (Table 1). Observers and quadrats were somewhat confounded in our analysis; therefore we also refitted the model in Table 2 with only one of them. Significance and direction of the fixed effects were very robust to the specification of the random part of the model. In the analysis in Table 2, species count C appears both as a covariate and as a component of the detectability estimator P̂ = C/N̂. This may induce a sampling covariance and lead to the spurious ‘detection’ of a significant relationship where in fact there is none. Therefore, we also ran the analysis without species count and found no difference in terms of significance and direction of the remaining estimated effects.
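Both tests described above reduce to standard normal tail areas. The Python sketch below, using only `math.erfc`, shows the P value for a Wald χ² statistic with 1 d.f. and the one-sided z test for a variance component. The function names are hypothetical, and the z-test inputs in the example are invented for illustration.

```python
import math


def wald_chi2_1df_pvalue(chi2: float) -> float:
    """Upper-tail P for a Wald chi-square statistic with 1 d.f.

    If X ~ chi2(1) then X = Z**2 for standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


def variance_component_z_test(estimate: float, se: float) -> tuple[float, float]:
    """One-sided test of H0: variance = 0 against H1: variance > 0, using z = estimate / SE."""
    z = estimate / se
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z)
    return z, p_one_sided


print(round(wald_chi2_1df_pvalue(7.65), 3))
print(variance_component_z_test(2.0, 1.0))  # hypothetical estimate and SE
```

Applied to the Wald χ² of 7·65 reported for mean visit duration in the Results, the first function gives P ≈ 0·006. Effects with more than 1 d.f., such as region (5 d.f.) or year (2 d.f.), need the general χ² distribution instead.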

Table 1.  Twelve hypotheses about determinants of the mean proportion (P) of bird species detected (or equivalently, mean species detectability) in the Swiss breeding bird survey (MHB). These ideas were generated before looking at the data.

Type of hypothesis (H) / factor | Reason
Spatial hypotheses
1 Ecological region | Because of the spatial structure of species composition, abundance, habitat or other spatially varying factors, p may be different in six ecological regions of Switzerland
2 Quadrat | As for H 1
3 Elevation | As for H 1
Temporal hypotheses
4 Year | Because of the temporal change of species composition, abundance of individual species or other time-dependent factors (e.g. weather), p may vary by year
Observer hypotheses
5 Observer | Observers may differ intrinsically in their abilities
6 Experience (including first-year observer effect) | More experienced observers may detect a larger proportion p of species present. In particular, a negative first-year observer effect may be expected
Survey effort hypotheses
7 Transect length | P is higher in quadrats with longer survey transects because the area is surveyed more thoroughly
8 Survey effort (and mean visit duration) | A greater proportion of species is detected when more time (either per unit transect length or per full transect) is spent on a transect
9 Number of visits | A greater proportion of species is detected when three instead of two visits are conducted
‘Biological’ hypotheses
10 Forest cover | In more open habitats, a greater proportion of species is detected. This may be confounded with H 11
11 Species richness | Poorer communities may be sampled more exhaustively. This may partly be an effect of the openness of terrain, as the poorest communities are high-altitude quadrats lacking forest cover (see H 10). Note as a caveat the sampling covariance between C and P̂
‘Nuisance’ hypotheses
12 Population closure | Mean detectability is lower with greater proportions of species absent during first visits

Table 2.  Determinants of mean species detectability in the Swiss breeding bird survey (MHB) analysed using a linear mixed model with random effects for quadrats and observers. We show estimates of fixed effects with a single degree of freedom and of variance components for random effects. Detectability is expressed here as a percentage to avoid excessive zeroes in the estimates. There were 268 quadrats and 237 observers for a total of 787 quadrat–year combinations

Source of variation | Effect | Test statistic | d.f. | P
Fixed effects | | Wald χ² | |
Proportion species absent | −3·098 | 0·66 | 1 | 0·418
Region | | 13·26 | 5 | 0·021
Altitude | −0·0008 | 2·07 | 1 | 0·150
Year | | 3·17 | 2 | 0·205
Experience | 0·0648 | 0·89 | 1 | 0·345
Route length | 0·6525 | 1·88 | 1 | 0·171
Effort | 0·0409 | 1·52 | 1 | 0·217
Number of visits | 0·1913 | 3·77 | 1 | 0·052
Forest cover | 0·0032 | 0·35 | 1 | 0·555
Species count | −0·1341 | 13·20 | 1 | < 0·001
Random effects | | z | |
Quadrat | 3·21 | 1·74 | 1 | 0·020
Observer | 2·92 | 1·66 | 1 | 0·024
Residual | 29·48 | | |

Closure of the species pool

Our use of the jackknife estimator of species richness assumes a closed species pool, i.e. that all species are present during the entire season. This assumption was violated in our study because some late-arriving migrant species are not recorded until after a species-specific threshold date. This concerned almost exclusively the first visits. To test and correct for any bias induced by species absent during the first visit, we fitted a covariate that quantified the degree of violation of population closure. For each quadrat–year combination, we counted the number of species that were ‘absent’ during the first visit (see absent species in Treatment of migrant species). The proportion of absent species among all species detected was used as a covariate in the mixed model to test if closure violation measurably affected our analysis.

Results

characteristics of the swiss breeding bird survey mhb during 2001–03

Among 268 quadrats, 255 were surveyed in all 3 years, 10 in 2 years and 3 in a single year only; 200 were surveyed by the same observer during all years, 56 by two observers, and 12 by three different observers. Because MHB had started in 1999, even in 2001 most volunteers had already surveyed the same quadrat for more than 1 year. Of 261 quadrats surveyed in 2001, 34, 30 and 197 had been surveyed by the same observer for 1, 2 and 3 years, respectively. Of 263 quadrats in 2002, 38, 20, 24 and 181 had been surveyed by the same observer for 1, 2, 3 and 4 years, respectively; and of 264 quadrats in 2003, 50, 24, 15, 20 and 155 had been surveyed by the same observer for 1, 2, 3, 4 and 5 years, respectively. Therefore, in any one year about 15% of quadrats were surveyed by first-year observers. The mean proportion of absent species during the first visit was only 4·2% (range 0–26·3%), i.e. first visits were conducted more than 3 days earlier than the threshold date for 4·2% of all species eventually detected.
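The ‘about 15%’ figure can be checked from the counts just given. The short Python computation below uses only the numbers in the paragraph above, treating quadrats surveyed by the same observer for 1 year as surveyed by a first-year observer.

```python
# Quadrats whose observer was in the first year of service, and quadrats
# surveyed, in each year (numbers taken from the paragraph above).
first_year = {2001: 34, 2002: 38, 2003: 50}
surveyed = {2001: 261, 2002: 263, 2003: 264}

shares = {year: first_year[year] / surveyed[year] for year in first_year}
pooled = sum(first_year.values()) / sum(surveyed.values())

for year, share in sorted(shares.items()):
    print(f"{year}: {100 * share:.1f}% first-year observers")
print(f"pooled 2001-03: {100 * pooled:.1f}%")
```

The yearly shares are roughly 13%, 14% and 19%, and the pooled share over the 788 quadrat–year surveys is about 15%.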

determinants of species detectability

In 268 1-km² quadrats surveyed between 2001 and 2003, on average 33·0 (range 4–61) species were detected. The interpolated jackknife estimate of species richness was 37·3 on average (range 5–72; Fig. 1). The mean proportion of bird species detected was 0·89 (SD 0·06; range 0·72–1·00; Fig. 2). Hence, with two to three visits during a breeding season, between 0% and 28% of all breeding species present in a 1-km² sample quadrat in Switzerland were estimated to be overlooked.

Figure 1.

Observed and estimated number of bird species (species richness) per sampling quadrat in the Swiss breeding bird survey (MHB). Data for 2001–03 are combined.

Figure 2.

Mean avian species detectability in the Swiss breeding bird survey (MHB) based on jackknife estimates of species richness. Sample size (n) is the number of quadrat–year combinations. One extreme case at 0·50 has been omitted.

A mixed model for the mean species detectability estimates indicated that the proportion of species detected was fairly stable across the recognized potential sources of variation fitted as fixed effects (Table 2). Most notably, the proportion of species absent during the first visit had no discernible effect on estimated mean species detectability. Hence, a slight violation of closure of the species pool because of late arrival of some migrant species did not induce a measurable bias in our results. There were significant, but fairly small, differences among six ecological regions of Switzerland (Fig. 3). Mean species detectability declined non-significantly with altitude. There were no differences among years nor, surprisingly, did we detect any significant effects of observer experience, survey route length or survey effort. A slightly larger proportion of species was detected in quadrats surveyed three times rather than twice. There was no relationship between forest cover of a quadrat and mean species detectability; in contrast, mean species detectability was greater in smaller bird communities, where size was expressed as the number of species actually detected (Fig. 4a). In addition, mean species detectability varied significantly among quadrats and observers beyond what could be explained by the above factors; the variance components for quadrats and observers were both significantly larger than zero (Table 2).

Figure 3.

Regional differences (SE) in the expected avian species detectability in the Swiss breeding bird survey (MHB). Estimates and SEs were obtained under the model in Table 2.

Figure 4.

Relationship between expected avian species detectability and (a) community size, expressed as the number of species detected (Species count), and (b) mean visit duration (min), in the Swiss breeding bird survey (MHB). Estimates and 1 SE bands were obtained under the model in Table 2.

We also ran two variations of the model in Table 2. In the first, the continuous explanatory variable ‘Experience’ was replaced by an indicator for first-year observers. First-year observers did not detect a smaller proportion of species present than more experienced observers (Wald χ²₁ = 0·34, P = 0·562). In the second, we dropped route length and mean survey effort (min km⁻¹) and instead added the product of these two variables, mean visit duration. Mean visit duration had a significantly positive effect on mean species detectability (Fig. 4b; Wald χ²₁ = 7·65, P = 0·006).
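Mean visit duration is derived per quadrat as route length (km) times survey effort (min km⁻¹). The Python sketch below, with two hypothetical quadrats, illustrates that the average of these per-quadrat products need not equal the product of the averages. This is consistent with the Methods figures: a mean route length of 5·1 km and a mean effort of 48 min km⁻¹ multiply to about 245 min, whereas the reported mean visit duration is 229 min.

```python
def mean_visit_duration(route_length_km: float, effort_min_per_km: float) -> float:
    """Mean visit duration (min) = route length (km) x survey effort (min per km)."""
    return route_length_km * effort_min_per_km


# Two hypothetical quadrats: a short, slowly walked route and a long, quickly walked one.
quadrats = [(2.0, 80.0), (8.0, 20.0)]

durations = [mean_visit_duration(length, effort) for length, effort in quadrats]
mean_of_products = sum(durations) / len(durations)
mean_length = sum(length for length, _ in quadrats) / len(quadrats)
mean_effort = sum(effort for _, effort in quadrats) / len(quadrats)
product_of_means = mean_visit_duration(mean_length, mean_effort)

print(durations, mean_of_products, product_of_means)
```

In this example both quadrats are surveyed for 160 min, yet the mean length and mean effort multiply to 250 min, because longer routes are walked faster.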

Discussion

calibration of a monitoring programme: p and its determinants

A biological monitoring programme can be viewed as a complex counting device for species or individuals, except that most monitoring programmes cannot directly measure the true number of species or individuals, N, but rather the product, Np, where p is detectability. A study not interested in absolute abundance may use species counts, Np, as an index for species richness, N, provided the index assumption is true, i.e. the expectation of detectability, E(p), is constant over the dimensions to be compared (Williams, Nichols & Conroy 2002; Conn, Bailey & Sauer 2004). In a monitoring programme, comparisons are most frequently made over time to detect trends and over space to contrast regions.

We calibrated the main avian module of the Swiss BDM in all 268 1-km² sampling quadrats surveyed during 2001–03 and found that, on average, 11% (range 0–28%) of the species estimated to be present in the quadrat were missed during all visits. Detectability in our study is thus rather high compared with the North American breeding bird survey (BBS; c. 76%; Boulinier et al. 1998a). It is also higher than our previous estimate for nine quadrat–year combinations of the MHB (85%; Kéry & Schmid 2004) and much higher than species detectability in the BDM for butterflies estimated in six quadrats with seven repeat surveys (c. 70%; M. Kéry & M. Plattner, unpublished data). Over the admittedly very short time period of 3 years, there were no discernible differences among years, which is also what Boulinier et al. (1998a) found in their comparison of BBS data from four USA states in 1970 and 1990. The latter result was counter-intuitive, as Sauer, Peterjohn & Link (1994), over a period of 26 years, found indications of increasing detectability over time. Like Boulinier et al. (1998a), we found slight, although significant, regional differences in mean species detectability. This indicates that species composition, species abundance, average observer skills, habitats or other determinants of detectability differed systematically among regions and hence calls for caution when comparing raw counts across regions. Similarly, the significant quadrat random effect indicates that, at the short time scale of our study, these or similar factors differ systematically at the 1-km² scale. In contrast, Boulinier et al. (1998a) did not find correlations between detectability for the same route over two decades.

Sauer, Peterjohn & Link (1994) studied among-observer variability and found that successive observers on the same sampling unit yielded significantly increasing species counts, suggesting that newly recruited volunteers tended to be more skilled than those they replaced. Kendall, Peterjohn & Sauer (1996) studied within-observer variability and found a negative effect of the first year of service on the number of individuals counted. Boulinier et al. (1998a) also found that skilled observers missed a smaller proportion of species than less skilled observers, where skill was based on such information as the number of species and individuals counted in earlier years. In our study, we found variation among 237 observers as shown by the significant observer variance component in the mixed model. However, this variation could not be ascribed to experience expressed as number of years of service on the same quadrat.

This finding seems surprising, but we can think of three explanations. First, it is likely that an increase in experience yields diminishing returns for detectability. So if observer quality in the MHB is relatively high on average, then no large differences in detectability can be expected. This explanation seems plausible, as many observers in the MHB have many years of experience as field ornithologists. For instance, many of them had already participated in the second Swiss Atlas project (Schmid et al. 1998), where essentially the same field method was used as in the MHB. Secondly, it may be that the number of years of service in a quadrat is not an adequate measure of observer skills. This explanation is also possible, as, to some degree, observer quality is likely to vary independently of knowledge of an area. It is likely that part of the variation in mean species detectability (range 0·72–1·00) ought to be explainable by a better measure of observer quality. Nevertheless, we think that the absence of a first-year observer effect indicates that most observers already had a fairly high level of experience at the start of their service for the MHB. Finally, it may be that a large number of the species present are fairly common and easy to find and hence are detected even by less experienced observers.

Species in larger communities tended to be overlooked more easily. This interesting result is mirrored by the earlier finding that singing individuals are also detected at a lower rate when density is high than when density is lower (Bart & Schoultz 1984). This makes a strong argument for using a formal detectability correction when estimating abundance or species richness, as not only are N and p confounded in a raw count, C, but they interact. This means, for example, that the value of sites with high species numbers may be relatively underestimated.

As a caveat we note again that C and P̂ in our study had a non-zero sampling covariance. Still, we feel that the strong evidence for a negative effect of C probably cannot be explained by this covariance alone but rather indicates a true negative relationship between the two. In the future, it would seem worthwhile to test this hypothesis with two independent data sets to avoid any ambiguity.

Our study focused on overall species richness because this is one of the main measures of biological diversity used in the Swiss BDM. Obviously the use of such a broad summary statistic may mask important dynamics in a community occurring at the level of individual species, such as abundance, or of subsets of species, such as number of indicator species (Hausner, Yoccoz & Ims 2003). Species of conservation concern are likely to be rare, hence their detectability will be lower and perhaps more dependent on external factors. For them it may be even more desirable to use an approach correcting for detectability when estimating their number. Alternatively, it is also possible to estimate other measures of biological diversity (e.g. the Shannon index) that combine species richness and species abundance while taking account of the fact that not all species are detected (Chao & Shen 2003).

The assumption of population closure

In contrast to traditional capture–recapture models for the size of a population of individuals, models for the estimation of species richness must be able to account for species-specific detectability. Species differ from one another much more than individuals within a population do, and if this heterogeneity is not accounted for, seriously biased estimates may result. Although an open population model accounting for heterogeneity by finite mixtures is being developed (Pledger, Pollock & Norris 2003; S. Pledger, personal communication), currently available models that account for heterogeneity all assume a closed population. The closure assumption is thus critical for species richness estimation.
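For illustration, the classical jackknife estimators for heterogeneous detectability (model M_h) use only the numbers of species detected on exactly one (f1) and exactly two (f2) of K visits: N̂_J1 = C + f1(K − 1)/K and N̂_J2 = C + f1(2K − 3)/K − f2(K − 2)²/[K(K − 1)]. The Python sketch below computes these first- and second-order estimates; the interpolated jackknife used in this study goes further by choosing between orders, which is not reproduced here. The detection histories are hypothetical, and the ratio C/N̂ is shown as the corresponding estimate of mean species detectability.

```python
from collections.abc import Sequence


def detection_frequencies(histories: Sequence[Sequence[int]]) -> dict[int, int]:
    """Return f_k: the number of species detected on exactly k visits (k >= 1)."""
    freq: dict[int, int] = {}
    for h in histories:
        k = sum(h)
        if k > 0:
            freq[k] = freq.get(k, 0) + 1
    return freq


def jackknife_richness(histories: Sequence[Sequence[int]]) -> tuple[int, float, float]:
    """Observed count C plus first- and second-order jackknife estimates of N
    (Burnham & Overton) for K >= 2 visits under heterogeneous detectability."""
    K = len(histories[0])                     # number of visits (occasions)
    if K < 2:
        raise ValueError("need at least two visits")
    f = detection_frequencies(histories)
    c = sum(f.values())                       # species detected at least once
    f1, f2 = f.get(1, 0), f.get(2, 0)         # species seen on exactly 1 or 2 visits
    n_j1 = c + (K - 1) / K * f1
    n_j2 = c + (2 * K - 3) / K * f1 - (K - 2) ** 2 / (K * (K - 1)) * f2
    return c, n_j1, n_j2


# Hypothetical detection histories (1 = detected) for one quadrat, three visits.
histories = [
    [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 0], [0, 1, 1],
    [1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0],
]
c, n_j1, n_j2 = jackknife_richness(histories)
print(c, round(n_j1, 2), round(n_j2, 2), round(c / n_j1, 2))  # last value: C / N-hat
```

Here ten species are recorded, three of them on a single visit, so the first-order estimate adds two undetected species and implies a mean species detectability of about 0·83.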

In avian communities, the closure assumption is typically violated by the seasonal movements of migrant species. In our study, the violation was fairly slight: on average, only 4·2% of the species eventually detected had not yet arrived at the time of the first visit, and virtually all species were present during the second and third visits. There was no significant relationship between the proportion of such ‘absent’ species and estimated detectability. Hence our study shows that a known (but minor) violation of the closure assumption need not be devastating. This is good news for avian monitoring, as the staggered arrival or departure of some species is a kind of Markovian temporary emigration, which introduces bias into closed population estimates (Kendall 1999).

We dealt with closure violation in a somewhat ad hoc fashion for want of an available closed capture–recapture model that can account for structural zeroes in the encounter history of a species. Alternatively, for each species one might condition on the first observation in a year and only model detections thereafter (J. Nichols, personal communication). A more elegant and efficient analysis would be to account directly for structural zeroes by building a combined likelihood for all species, quadrats and years (Dorazio & Royle 2005) and declaring some visits as providing only missing values for some species–quadrat combinations. However, until such models are made available to biologists, we believe that our approach can be useful, particularly for avian communities, where arrival and departure times of all species are usually well known and the probable degree of closure violation can thus be quantified.
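The alternative of conditioning on the first observation can be sketched in a few lines: for each detected species, the visits up to and including its first detection are discarded, so that a zero recorded before a migrant's arrival never counts as a missed detection. This Python sketch assumes a single per-visit detection probability shared by all species and visits, and no departures after the first detection; the function name `p_after_first_detection` and the example histories are hypothetical.

```python
def p_after_first_detection(histories: list[list[int]]) -> float:
    """Pooled per-visit detection probability estimated only from visits after
    each species' first detection, so that zeros recorded before arrival
    (possible structural zeroes) never enter the estimate."""
    detections = opportunities = 0
    for h in histories:
        if 1 not in h:
            continue                          # never detected: contributes nothing here
        first = h.index(1)
        later = h[first + 1:]                 # condition on the first observation
        detections += sum(later)
        opportunities += len(later)
    if opportunities == 0:
        raise ValueError("no visits after a first detection")
    return detections / opportunities


# Hypothetical histories over three visits; the first species may have
# arrived only after visit 1.
histories = [[0, 1, 1], [1, 0, 1], [0, 0, 1], [1, 1, 0]]
print(p_after_first_detection(histories))
```

For the example, the estimate is 3/5 = 0·6. Species first detected on the final visit contribute no information, which illustrates why a combined likelihood would use the data more efficiently.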

Design topics

As in experimental design (Mead 1988), three important issues for successful biological monitoring are standardization, replication and, of course, randomization. Standardization is the traditional way of dealing with heterogeneous detectability p and the resulting problems when comparing species counts. Hence in most monitoring programmes, visits are conducted under conditions that are as constant as possible. In a way, a detectability-corrected (e.g. capture–recapture) model-based estimate N̂ of species richness N represents the ultimate standardization, in that it ideally eliminates the effects of all factors that may introduce variance into p. However, it is still desirable to eliminate as many sources of variation in p as possible a priori, i.e. at the design stage, as a posteriori model-based corrections depend upon the chosen model being an adequate approximation of the system under study. Furthermore, greater variation in p will translate into greater imprecision in the estimated species richness N̂. Hence, although not sufficient to deal with detectability problems, standardization is also important in a capture–recapture estimation framework.

Among the factors that might potentially be controlled in the survey design, only visit duration had a significant, positive effect. This is consistent with our previous finding for the MHB that, at the level of the individual territory, detectability was higher in quadrats where more time was spent per unit transect length (Kéry, Royle & Schmid 2005). We suggest that some of this variation in visit duration should be eliminated in future.

Nevertheless, there are probably limits to standardization. First, there are tremendous differences among quadrats. It does not seem to make sense to survey a flat quadrat containing only open agricultural ‘steppe’ for the same amount of time as another quadrat that contains a steep, rocky mountain side with 700 m of difference in altitude between the lowest and highest points and with a mosaic of open and wooded areas. It appears likely that visit duration interacts with ruggedness of terrain to affect the effective sampling area associated with a transect, and thus detectability of the species. Secondly, most quadrats in Switzerland are surveyed by volunteers and a somewhat less rigid protocol can be imposed on them than on paid staff (Kéry, Royle & Schmid 2005). There is a trade-off between the number of volunteers available for a survey and the maximum possible rigidity of a field protocol. For instance, field workers in the butterfly survey of the Swiss biodiversity monitoring programme are paid and, consequently, a very rigid standardization can be imposed on their fieldwork.

Replication is the traditional way of dealing with p < 1, as it is intuitively clear that increasing the number of visits also increases the net detectability over that of a single visit. Still, sampling a community exhaustively is rarely possible and may be prohibitively expensive. So even with replicated visits, extrapolation is usually necessary to estimate true species richness.
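The gain from replication follows from a simple calculation: if a species is present throughout and detected independently on each visit with probability p, the probability that it is recorded at least once in K visits is 1 − (1 − p)^K. The Python sketch below assumes constant p and independent visits; the per-visit value of 0·5 and the helper names are hypothetical.

```python
def cumulative_detectability(p: float, visits: int) -> float:
    """P(species detected at least once) over independent visits, each with
    per-visit detection probability p, assuming the species is always present."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return 1.0 - (1.0 - p) ** visits


def visits_needed(p: float, target: float, max_visits: int = 100) -> int:
    """Smallest number of visits whose cumulative detectability reaches target."""
    for k in range(1, max_visits + 1):
        if cumulative_detectability(p, k) >= target:
            return k
    raise ValueError("target not reached within max_visits")


for k in (1, 2, 3):
    print(k, cumulative_detectability(0.5, k))
print(visits_needed(0.5, 0.9))
```

The same arithmetic shows why temporary absence matters: a migrant that arrives after the first of three visits has only two chances of detection, so with p = 0·5 its cumulative detectability drops from 0·875 to 0·75.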

In a capture–recapture framework, replication is the key to the proper estimation of species richness. Hence biological monitoring programmes should always be repeated in time, preferably within short periods, to enable simpler closed, rather than open, population models to be used. However, new methods are continually being developed and also incorporated into easily accessible computer programs such as MARK (White & Burnham 1999). It is foreseeable that many more sampling designs will provide data amenable to rigorous statistical analysis in the future. It is even possible to obtain an estimate of species richness from a single visit by using a version of the jackknife estimator when all individuals seen are counted for each species (Burnham & Overton 1979; Boulinier et al. 1998a). However, the resulting estimates are likely to be less precise, and better estimates are likely to be available under models that use data from repeated visits.

And finally, as in any empirical study, only a random sample permits valid inference about a statistical population. In the MHB, quadrats were selected according to a systematic random (grid) sample, with only a few quadrats in that primary random sample rejected and replaced by ‘similar’ quadrats to ensure surveyability in the field. While this may slightly bias results towards more accessible areas of the country, we do not believe that this effect is important.
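A systematic random (grid) sample of the kind described above can be sketched as a single random starting offset followed by every k-th cell in both grid directions. In this Python sketch, the grid dimensions, the spacing and the function name are hypothetical, and the replacement of unsurveyable quadrats by ‘similar’ ones is not modelled.

```python
import random


def systematic_grid_sample(n_cols: int, n_rows: int, spacing: int,
                           rng: random.Random) -> list[tuple[int, int]]:
    """Systematic random sample of grid cells: one random start offset in each
    direction, then every `spacing`-th cell along both axes."""
    x0 = rng.randrange(spacing)               # random start column in [0, spacing)
    y0 = rng.randrange(spacing)               # random start row in [0, spacing)
    return [(x, y)
            for x in range(x0, n_cols, spacing)
            for y in range(y0, n_rows, spacing)]


# Hypothetical 100 x 60 grid of 1-km cells, sampling every 10th cell.
sample = systematic_grid_sample(100, 60, 10, random.Random(1))
print(len(sample), sample[:3])
```

Because only the starting offset is random, every cell has the same inclusion probability while the sample still spreads evenly over the study area.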

Conclusion

For the Swiss breeding bird survey, we have tested the assumption that mean species detectability, p, is sufficiently constant over time, space and other potential dimensions of comparison that raw species counts, C, can be used as a proxy for true species richness, N. We found that for the temporal and spatial scale of our study and for the factors investigated, p was reasonably constant, so use of C as an index of relative species richness might be warranted for Switzerland during the 3 years considered.

The demonstration that use of species counts need not always induce serious bias may give some comfort to other biological monitoring schemes aimed at measuring species richness. However, conditions in other schemes may be very different, and there may be other factors, not studied here, that are related to detectability. Furthermore, our study only looked at a 3-year period, while a monitoring programme may be used to infer trends over decades (Sauer, Peterjohn & Link 1994). Hence, it is likely that in our programme there are hidden effects on detectability that could cause biases in some comparisons, or that counts compared over many more years will be a biased measure of relative richness.

We believe that the use of species counts as a proxy for true species richness is scientifically defensible only if the index assumption is tested every time it is made. Only if the adequacy of the index assumption has first been established can ratios of species counts be used for comparative analyses, where interest lies not in absolute but rather in relative numbers of species. This may yield greater precision of trend estimates (Skalski & Robson 1992). However, in order to test the index assumption, one typically already has to go the full way towards an explicit estimate of true species richness, so why should anyone want to use the index any more? After all, interest usually lies in species richness, N, and not in the product, Np.

Our study shows that using capture–recapture methods to estimate true species richness need not always be costly. The Swiss breeding bird survey was not designed with such analyses in mind, and yet the natural practice in avian monitoring of conducting repeat visits lends itself nicely to them. As a minimum, we suggest that species richness should be estimated in a subsample of each monitoring programme (double sampling). Estimates of detectability can then be applied to the remaining units to obtain proper estimates of true species richness (Pollock et al. 2002). Particularly in larger countries with fewer resources, this may be the only viable approach. However, the smaller the subsample, the larger the variances of the resulting estimators. If double sampling can be achieved relatively cheaply, it is desirable to collect the extra information for inference about detectability on all sampling units, as this will greatly improve precision. Use of capture–recapture or related methods is an important insurance against a serious malfunction of an expensive species-measuring device.

Acknowledgements

We wish to thank the dedicated and highly qualified volunteers who conduct the fieldwork of the Swiss breeding bird survey MHB. We thank J. Nichols, N. Yoccoz, L. Jenni, D. Weber and N. Zbinden for comments, A. Royle for discussing species richness estimation and sharing with us his newly developed models, and P. Nietlisbach for help with data preparation. M. Kéry thanks B. Schmid for continuing support. We also thank the Swiss biodiversity monitoring programme (BDM) of the Swiss Agency for the Environment, Forests and Landscape (SAEFL) for funding this study.
