Hierarchical Bayes estimation of species richness and occupancy in spatially replicated surveys

*Correspondence author. E-mail: marc.kery@vogelwarte.ch

Summary

  1. Species richness is the most widely used biodiversity metric, but cannot be observed directly as, typically, some species are overlooked. Imperfect detectability must therefore be accounted for to obtain unbiased species-richness estimates. When richness is assessed at multiple sites, two approaches can be used to estimate species richness: either estimating for each site separately, or pooling all samples. The first approach produces imprecise estimates, while the second loses site-specific information.
  2. In contrast, a hierarchical Bayes (HB) multispecies site-occupancy model benefits from the combination of information across sites without losing site-specific information and also yields occupancy estimates for each species. The heart of the model is an estimate of the incompletely observed presence–absence matrix, a centrepiece of biogeography and monitoring studies. We illustrate the model using Swiss breeding bird survey data, and compare its estimates with the widely used jackknife species-richness estimator and raw species counts.
  3. Two independent observers each conducted three surveys in 26 1-km² quadrats, and detected 27–56 (total 103) species. The average estimated proportion of species detected after three surveys was 0·87 under the HB model. Jackknife estimates were less precise (less repeatable between observers) than raw counts, but HB estimates were as repeatable as raw counts. The combination of information in the HB model thus resulted in species-richness estimates presumably at least as unbiased as previous approaches that correct for detectability, but without costs in precision relative to uncorrected, biased species counts.
  4. Total species richness in the entire region sampled was estimated at 113·1 (CI 106–123); species detectability was very heterogeneous, ranging from 0·08 to 0·99; and species occupancy was 0·06–0·96. Even after six surveys, absolute bias in observed occupancy was estimated at up to 0·40.
  5. Synthesis and applications. The HB model for species-richness estimation combines information across sites and yields more precise, and presumably less biased, estimates than previous approaches. It also yields estimates of several measures of community size and composition. Covariates for occupancy and detectability can be included. We believe it has considerable potential for monitoring programmes as well as in biogeography and community ecology.

Introduction

Species richness is the most widely used measure of biodiversity (Purvis & Hector 2000). It is applied frequently in monitoring programmes (Yoccoz et al. 2001) and is an important criterion for selecting nature reserves (Howard et al. 2000). Furthermore, explaining spatial patterns in species richness forms a major part of biogeography (Jetz & Rahbek 2002). Interestingly, species richness is usually not observed directly; rather, it has to be estimated to account for species that are present but undetected. This may be because many species are rare, inconspicuous or temporarily absent from a particular sampling area, or because the investigator does not know them. Surprisingly, most ecologists and managers of monitoring programmes have been slow to acknowledge this. Consequently, instead of true species richness, raw (uncorrected) numbers of species detected are reported by most monitoring programmes (Weber, Hintermann & Zangger 2004) and also in most biogeography and community ecology research; for notable exceptions see Doherty et al. (2003); Karanth et al. (2006).

Conceptually, the expectation of a raw count of species E(C) is related to true species richness (N) by E(C) = Np, where p can be thought of as the average species detection probability. Hence interpreting raw species counts as if they were true species richness will result in an underestimation whenever p < 1, which is probably always. Even when interest is focused on ‘relative’ species richness, the expectation of p (though not p itself) needs to remain constant over spatial, temporal or other (e.g. habitat) dimensions of comparison. In raw species counts, patterns in p are confounded with patterns in N, hence the use of C instead of N may lead to the ‘detection’ of spurious patterns or, alternatively, may mask real patterns in species richness.
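As a purely illustrative worked example of this confounding (the numbers are hypothetical and not taken from our data), consider two sites A and B with identical true richness N = 50 but average detection probabilities of 0·9 and 0·7:

```latex
\begin{align*}
E(C_A) &= N \, p_A = 50 \times 0.9 = 45, \\
E(C_B) &= N \, p_B = 50 \times 0.7 = 35 .
\end{align*}
```

Raw counts would then suggest a difference of about 10 species between the sites, although true richness is the same at both.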

Many statistical methods have been developed to estimate true species richness (Chao 2005), and capture–recapture methods have been used frequently in recent studies (Boulinier et al. 1998a, 1998b; Nichols et al. 1998a, 1998b; Boulinier et al. 2001; Cam et al. 2002; Lekve et al. 2002; Kéry & Schmid 2006). In most of these studies, species richness was assessed across multiple sites or years. In addition, the relationships among richness, species detectability and some covariates were usually investigated. A separate estimate was obtained for each sample, resulting in many individual estimates that were likely to be inefficient, and some that were even impossible to obtain because of small sample sizes. Alternatively, it would have been possible to pool all samples, but then sample-specific (e.g. covariate) information would have been discarded and site-specific species richness could no longer have been estimated. Further, to study relationships between, say, species detectability and habitat variables, a two-stage analysis needs to be applied: first, detectability estimates are obtained in one analysis, and these are treated as data in another (Boulinier et al. 1998a; Kéry & Schmid 2006). It is difficult in this ad hoc approach to account for the statistical uncertainty in parameter estimates (Link 1999). Hence an integrated approach would be desirable that combines information across multiple sites and allows covariates to be introduced directly.

Here we describe such a model, a hierarchical Bayesian (HB) multispecies site-occupancy model (Dorazio & Royle 2005; Dorazio et al. 2006; Royle et al. 2007). Our model is an extension to all species – detected or undetected – in a community of single-species site-occupancy models (MacKenzie et al. 2006; Royle & Kéry 2007). The heart of the model is an estimate of the incompletely observed species-by-site incidence (also called presence–absence) matrix, a centrepiece of biogeography and monitoring studies (McCoy & Heck 1987; Gotelli 2000). The proper estimation of this matrix, free of the distorting effects of imperfect detectability, enables us to obtain unbiased estimates of site-specific and total species richness, species-specific site occupancy, and similarity between sites and between species, and allows covariate relations to be introduced, while at the same time fully accounting for all parameter estimation uncertainty.

Our first aim is to introduce this model in a monitoring context, although it appears equally useful for many other branches of ecology. We revisit the task of calibrating a monitoring programme by considering the Swiss breeding bird survey Monitoring Häufige Brutvögel (MHB; Kéry & Schmid 2006) and asking what proportion of the species present are actually detected. Second, our study attempts to field-test the multispecies site-occupancy model of Dorazio & Royle (2005). Our data come from a set of two independent observers who each conducted three surveys in 26 1-km² quadrats, resulting in two single-observer data sets and one for both observers combined. In the absence of known truth, a useful criterion for a good estimation procedure is the degree to which independent samples give similar estimates (O’Hara 2005). We compare raw species counts and species-richness estimates from the jackknife and the new HB model between replicated samples of three surveys in 26 quadrats. We show that the combination of information in the HB model resulted in species-richness estimates presumably at least as unbiased as previous approaches that correct for detectability, but without costs in precision relative to uncorrected, biased species counts.

Materials and methods

study area and field protocol

Switzerland is a small (41 285-km²) western European country with elevation ranging from 200 to 4600 m asl and average forest cover of 27%. The background to our study is the Swiss national common breeding bird survey (‘Monitoring Häufige Brutvögel’; MHB) run since 1999 by the Swiss Ornithological Institute (Schmid et al. 2004), and the Swiss federal Biodiversity Monitoring (BDM; Weber et al. 2004; http://www.biodiversitymonitoring.ch). BDM is a very broad, multitaxon and multi-indicator monitoring scheme, part of which is focused on changes in species richness at the 1-km² scale. From 15 April to 15 July, 267 1-km² quadrats chosen according to a systematic random (grid) design are each surveyed three times by qualified volunteers using a simplified territory-mapping method along an irregular transect route of 5·1 km on average. For more information on the survey and its protocol see Schmid et al. (2004); Kéry et al. (2005); Kéry & Schmid (2006).

For our study, as part of the BDM quality control, 26 quadrats among the regular MHB sample were selected in the northern and western parts of Switzerland. Although they were selected haphazardly, we believe they are representative of the breeding bird community found at lower and medium elevations north-west of the Swiss Alps. Mean elevation was 680 m (range 250–1350) and mean forest cover 30% (range 0–82). In one year from 2001 to 2003, each quadrat was surveyed three more times by a second observer, independently of the three regular surveys. These additional surveys were conducted in the same breeding season and along the same quadrat-specific transect route. For each observer and quadrat, territory counts for each detected species were quantized to produce an X(i,j) matrix containing the number of times (= surveys) that species i was detected in quadrat j (Fig. 1). For each quadrat, three data sets of type X(i,j) were available: for the first observer, for the second observer, and for both observers combined.
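The quantization step can be sketched as follows. This is a minimal illustration in Python, not the code used for the analysis; the function name, the nested-dictionary layout and the example counts are hypothetical.

```python
from typing import Dict, List


def detection_frequencies(
    territory_counts: Dict[str, Dict[str, List[int]]]
) -> Dict[str, Dict[str, int]]:
    """Return X[species][quadrat]: the number of surveys with >= 1 territory."""
    x: Dict[str, Dict[str, int]] = {}
    for species, by_quadrat in territory_counts.items():
        x[species] = {
            # A survey counts as a detection when at least one territory was mapped.
            quadrat: sum(1 for count in counts if count > 0)
            for quadrat, counts in by_quadrat.items()
        }
    return x


# Hypothetical territory counts from three surveys in two quadrats.
example_counts = {
    "chaffinch": {"Q1": [4, 6, 5], "Q2": [2, 0, 3]},
    "sparrowhawk": {"Q1": [0, 0, 1], "Q2": [0, 0, 0]},
}
x_matrix = detection_frequencies(example_counts)
```

Each entry of the result lies between 0 and the number of surveys (three for a single observer, six for both observers combined).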

Figure 1.

Conceptual outline of the multispecies site-occupancy model showing the relationship between the observed data [the X(i, j) matrix] augmented by O – n all-zero detection histories and the partially observed latent presence–absence matrix [Z(i, j)]. The number of rows of Z represents total community size (species richness) N. Z consists of n species detected somewhere among the 26 quadrats and N – n species not detected anywhere. The model serves to impute the missing values (na) in the observed Z matrix to account for true presence (1) or absence (0) and to account for all J sites at which the N – n undetected species are estimated to occur.

two previous approaches to species-richness estimation

The most common approach to species-richness estimation is really no estimation at all: mere use of raw totals of detected species. This is also current practice in that part of the Swiss BDM programme (Weber et al. 2004) that is fed by MHB data. We compare such raw species counts with two kinds of species-richness estimators, the widely used jackknife estimator of Burnham & Overton (1979) applied separately to each quadrat and data set, and an HB implementation of the new multispecies site-occupancy model of Dorazio & Royle (2005) that integrates data across all quadrats (see below).

The heterogeneity model Mh underlying the jackknife estimator assumes that the species pool is closed during all sampling events, and that every species i has a detection probability pi constant across surveys (Otis et al. 1978; Burnham & Overton 1979). The estimator uses capture frequencies (fh), the numbers of species detected exactly h = 1,2, ... , K times (here K = 3), and has the following general form:

$$\hat{N}_{Jk} = S + \sum_{h=1}^{k} \alpha_{hk} f_h \qquad \text{(eqn 1)}$$

where S is the number of species detected and αhk are constants associated with jackknife estimators of order k. This estimator has been used frequently in recent work on species-richness estimation in community ecology (Boulinier et al. 1998; Nichols et al. 1998a, b; Hines et al. 1999; Cam et al. 2002; Doherty et al. 2003) and monitoring studies (Kéry & Schmid 2006). In this study, we used the interpolated jackknife estimator provided by the program CAPTURE (Otis et al. 1978).
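To make the form of the estimator concrete, the first- and second-order jackknife estimators of Burnham & Overton (1979) can be sketched in Python as below. This is an illustration only: it does not reproduce the order selection and interpolation performed by the program used in this study, and the function name and the example frequencies are hypothetical.

```python
from typing import Sequence


def jackknife_richness(freqs: Sequence[int], order: int = 1) -> float:
    """Jackknife richness estimate from capture frequencies.

    freqs[h - 1] is f_h, the number of species detected on exactly h
    of the K surveys, so K = len(freqs) and S = sum(freqs).
    """
    k_surveys = len(freqs)
    if k_surveys < 1:
        raise ValueError("at least one survey is required")
    s_detected = sum(freqs)
    f1 = freqs[0]
    if order == 1:
        # First order: S + ((K - 1) / K) * f1
        return s_detected + (k_surveys - 1) / k_surveys * f1
    if order == 2:
        if k_surveys < 2:
            raise ValueError("second order needs at least two surveys")
        f2 = freqs[1]
        # Second order: S + ((2K - 3) / K) * f1 - ((K - 2)^2 / (K (K - 1))) * f2
        return (
            s_detected
            + (2 * k_surveys - 3) / k_surveys * f1
            - (k_surveys - 2) ** 2 / (k_surveys * (k_surveys - 1)) * f2
        )
    raise ValueError("only orders 1 and 2 are implemented in this sketch")


# Hypothetical quadrat: 10, 8 and 22 species detected in 1, 2 and 3 of K = 3 surveys.
example_freqs = [10, 8, 22]
first_order = jackknife_richness(example_freqs, order=1)
second_order = jackknife_richness(example_freqs, order=2)
```

Both estimates depend only on the frequencies from a single quadrat, which is why estimates for quadrats with few species or few singletons can be unstable.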

the multispecies site-occupancy model

Models of species richness and community structure are naturally formulated in terms of species-specific models of occurrence probability or occupancy. Let z(i,j) be an indicator of whether species i is present at sample location j. Most summaries of community structure, including richness, can be formulated as a function of these indicator variables (Dorazio & Royle 2005) and the statistical objective becomes, in essence, to estimate or predict these latent indicator variables. Dorazio & Royle (2005) develop this multispecies occupancy model in terms of these latent indicator variables for each species, i = 1,2, ... , N in the community. One important consideration is that N is not observed. Dorazio & Royle (2005) deal with this by developing a Bayesian analysis of the so-called conditional likelihood, which is the likelihood for the observed sample of size n species. Dorazio et al. (2006) and Royle et al. (2007) developed a parameterization of the unconditional likelihood using a data-augmentation approach, the benefit of which is that it leads to a simple Bayesian implementation.

For the most basic model within this class, we suppose that logit(ψi) = αi and logit(pi) = βi. Here, ψi is the probability of occurrence (occupancy) for species i in the area from which spatial replicate samples were taken, and pi is the probability of detection for species i. Both αi and βi are normally distributed, species-specific random effects such that αi ~ N(µα, σ²α) and βi ~ N(µβ, σ²β). We suppose here that the random effects are independent of one another, but the model can be extended to accommodate correlation between occurrence and detection probabilities (Dorazio & Royle 2005; Kéry & Royle 2008). As the only effect assumed present for detection probability pi is that of species identity, our model is equivalent to model Mh in a site-occupancy context (Otis et al. 1978). Key assumptions of the model are population closure (no gains or losses of species during the entire sampling period), that species identity is the only effect on pi, and that heterogeneity among species in occupancy and detection on the logit scale is adequately described by a normal distribution.
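The data-generating process implied by this basic model can be sketched as a short simulation. The sketch below is an illustration under the stated assumptions (closure, independent random effects, detection depending on species only); the function names are hypothetical, and the normal distributions are parameterized by standard deviations rather than variances.

```python
import math
import random
from typing import List, Tuple


def inv_logit(value: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-value))


def simulate_community(
    n_species: int,
    n_sites: int,
    n_surveys: int,
    mu_alpha: float,
    sd_alpha: float,
    mu_beta: float,
    sd_beta: float,
    seed: int = 1,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Simulate the latent presence-absence matrix z and the detection counts x."""
    rng = random.Random(seed)
    z: List[List[int]] = []
    x: List[List[int]] = []
    for _ in range(n_species):
        psi = inv_logit(rng.gauss(mu_alpha, sd_alpha))  # occupancy of this species
        p = inv_logit(rng.gauss(mu_beta, sd_beta))      # per-survey detection
        z_row = [int(rng.random() < psi) for _ in range(n_sites)]
        # A species can only be detected where it is present.
        x_row = [
            sum(int(rng.random() < p) for _ in range(n_surveys)) if present else 0
            for present in z_row
        ]
        z.append(z_row)
        x.append(x_row)
    return z, x


z_true, x_obs = simulate_community(20, 5, 3, 0.0, 1.0, 0.0, 1.0, seed=42)
# Species never detected at any site produce all-zero rows in x_obs.
n_detected = sum(1 for row in x_obs if any(row))
```

Rows of x that are entirely zero correspond to species that are part of the community but never detected, which is exactly the part of the matrix that the model has to impute.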

The implementation we adopt here is based on the data-augmentation framework developed by Royle et al. (2007). We augment the data set by physically adding an arbitrarily large number of all-zero observations (O – n in Fig. 1) corresponding to hypothetical, unobserved species. Conceptually, we can think of this as a superpopulation of species from which the sampled community was drawn. In a sense, there are two levels of sampling: in the first, for each species in the superpopulation, occurrence anywhere among the sampled sites is determined (each member of the supercommunity may be an element of the population actually exposed to sampling); and in the second, those species in the exposed population are subjected to repeated sampling by the survey. This allows us to formulate the model as a zero-inflated version of the complete data model. That is, we can formulate the model for the data as if all N species were observed, but include a zero-inflation parameter, Ω, to allow for excess zeros (non-occurring species) in the augmented data set. We also note that the mathematical justification for data augmentation is that a flat uniform prior for N on {0,1,2, ... , O} is equivalent to treating [N|Ω] as binomial(O,Ω), with zero-inflation parameter Ω having a uniform prior (W.A. Link, personal communication).
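One way to write the zero-inflated, data-augmented model is given below. The indicator w_i (membership of augmented species i in the community exposed to sampling) is notation introduced here for illustration, following the general formulation of Royle et al. (2007); J denotes the number of sites and K the number of surveys per site.

```latex
\begin{align*}
w_i &\sim \mathrm{Bernoulli}(\Omega), & i &= 1, \ldots, O, \\
z(i,j) \mid w_i &\sim \mathrm{Bernoulli}(w_i \, \psi_i), & j &= 1, \ldots, J, \\
X(i,j) \mid z(i,j) &\sim \mathrm{Binomial}\bigl(K, \; z(i,j) \, p_i\bigr), \\
N &= \sum_{i=1}^{O} w_i .
\end{align*}
```

Under this formulation, N given Ω is binomial(O, Ω), which is the equivalence with a uniform prior on N noted above.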

With only five parameters, this model represents a remarkably parsimonious community description. We have Ω and mean and variance of the two normal distributions used to specify species heterogeneity in occurrence (µα and σ²α) and detection probability (µβ and σ²β). For further description of the model, and of the data-augmentation technique, see Dorazio & Royle (2005); Dorazio et al. (2006); Kéry & Royle (2008); Royle et al. (2007).

fitting the model

We used WinBUGS 1·4 (Spiegelhalter et al. 2003) to fit the model using conventional methods of Markov chain Monte Carlo (MCMC). MCMC is a simulation tool for drawing samples from the joint probability distribution of the parameters in a statistical model, which is typically complex and analytically intractable (Brooks 2003; Link et al. 2002). Starting at some arbitrary initial values, the simulation chain moves through a transition phase and usually converges onto some stationary distribution. Samples from the transition phase are discarded as a ‘burn-in’. The stationary distribution is representative of the joint posterior distribution of the parameters; summaries of sample draws, such as mean and standard deviation, can be used as point estimates and standard errors. Bayesian analysis also requires quantification of the uncertainty about the likely values of the model parameters (specification of prior distributions). In this analysis, we used conventional vague priors that reflect our indifference about the parameter values: Ω ~ uniform(0,1); µα, µβ ~ normal(0,1000); 1/σ²α, 1/σ²β ~ gamma(0.01,0.01). For an accessible introduction to Bayesian inference, MCMC and WinBUGS for biologists, see Link et al. (2002); Brooks (2003); Ellison (2004).

We augmented the observed species-detection data with 100 all-zero histories, which was sufficient as shown by the posterior for total community size, which had an upper bound well below O = 203 in our analysis (see Fig. 3b). For the combined data set, we also ran the analysis with 200 all-zero histories added and obtained virtually identical estimates, again showing that our results were robust with respect to the choice of the number of all-zero histories added. After some experimentation with Markov chains of different lengths, we ran three parallel chains of 10^6 iterations with different initial values, discarded the first half and thinned by 50, which resulted in 30 000 samples used for inference. For analysis of the combined data set, only two parallel chains were run due to lack of sufficient computer memory. The Gelman–Rubin statistic (Gelman & Rubin 1992) indicated acceptable convergence for all chains (values of the Gelman–Rubin statistic for all parameters were <1·002) and for all three data sets. This statistic compares the variance within and among multiple Markov chains in a fashion similar to ANOVA, and is equal to 1 at convergence.
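The bookkeeping of retained samples and the logic of the convergence check can be sketched as follows. This is a minimal Python illustration of the classic variance-based form of the Gelman–Rubin statistic; the version reported by the software may be computed differently, and the function names and toy chains are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence


def retained_draws(n_chains: int, iterations: int, burn_in: int, thin: int) -> int:
    """Number of samples kept after discarding the burn-in and thinning."""
    return n_chains * ((iterations - burn_in) // thin)


def gelman_rubin(chains: Sequence[Sequence[float]]) -> float:
    """Potential scale reduction factor for one parameter.

    Expects at least two chains of equal length whose within-chain
    variance is not zero.
    """
    n_chains = len(chains)
    n_draws = len(chains[0])
    if n_chains < 2 or n_draws < 2 or any(len(c) != n_draws for c in chains):
        raise ValueError("need >= 2 chains of equal length >= 2")
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])  # mean within-chain variance
    between = n_draws * variance(chain_means)             # between-chain variance
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return (pooled / within) ** 0.5


# Three chains of 10^6 iterations, first half discarded, thinned by 50.
n_kept = retained_draws(3, 10**6, 500_000, 50)

# Toy chains: identical chains versus chains stuck in different regions.
r_agree = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
r_disagree = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]])
```

For chains that sample the same distribution the statistic approaches 1 as the number of draws grows, whereas chains stuck in different regions give values well above 1.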

Figure 3.

Posterior distribution of species richness under the HB model applied to the combined data set. (a) Local species richness in quadrat 557118 (39 species observed). (b) Total species richness in northern and western Switzerland (103 species observed).

Results

comparison of species counts, jackknife and hierarchical bayes estimates

Two independent observers detected 25–53 species per quadrat (totals 96 and 99; grand total 103). Four patterns were discernible when comparing raw species counts, jackknife and HB estimates (Table 1). First, counts were always lower than either estimate, emphasizing that mean species detectability was indeed <1. Second, counts and estimates for the two single-observer data sets consisting of three surveys were lower than for the combined data set consisting of all six surveys. Third, HB estimates were more precise (had smaller standard errors) than jackknife estimates. This was particularly striking for the combined data sets, but also held for the single-observer data sets. Finally, jackknife point estimates for the combined data sets were unreasonably high in some quadrats, e.g. 527174, 569198, 611254 (Table 1). Repeatability between two observers was the same for raw species counts and HB estimates (R² ≥ 0·35 in both) but much lower for jackknife estimates (R² = 0·16; Fig. 2).

Table 1.  Quadrat traits, species counts and estimates of avian species richness for the Swiss breeding birds in 26 quadrats

Quadrat  Elevation  %Forest  C_1  C_2  C_Both  Jack_1  SE_J1  Jack_2  SE_J2  Jack_both  SE_JB  HB_1   SD_HB1  HB_2   SD_HB2  HB_both  SD_HB_B
521150   550        24       43   34   48      52      4·57   38      3·33   57         5·10   50·50  2·49    37·87  2·31    53·11    2·12
527174   650        60       44   37   50      53      4·45   40      3·04   85         12·71  50·72  2·36    40·91  1·96    54·52    2·02
545150   750        9        43   43   49      45      2·62   50      4·12   50         2·91   49·84  2·63    47·53  2·07    54·51    2·22
557118   1250       58       29   38   39      32      2·84   42      3·21   42         3·40   36·23  2·69    42·59  2·12    45·03    2·32
557134   350        41       44   44   51      46      2·63   48      3·23   55         3·65   50·18  2·50    48·14  1·96    55·68    2·04
563142   1350       64       39   36   43      43      3·21   40      3·04   63         8·54   44·56  2·43    41·07  2·20    48·48    2·22
569182   650        0        45   41   52      51      3·79   50      4·45   74         9·47   51·79  2·63    46·57  2·25    57·52    2·20
569198   550        25       31   41   49      35      3·21   53      5·20   89         14·32  38·70  2·73    45·73  2·12    54·50    2·19
575158   1150       39       39   38   45      48      4·69   43      3·40   61         7·83   43·88  2·60    42·39  2·05    50·05    2·14
575190   550        5        36   35   40      43      4·02   39      3·33   42         3·14   43·33  2·73    40·42  2·27    46·40    2·39
575206   450        1        31   33   38      36      3·33   37      3·21   45         4·82   38·45  2·74    38·27  2·26    43·91    2·31
587206   550        53       29   32   35      35      3·59   35      2·90   39         3·63   36·83  2·80    37·59  2·30    41·65    2·43
599206   550        16       38   35   40      45      4·04   36      2·39   41         2·86   44·53  2·71    40·70  2·32    46·67    2·43
599222   550        82       34   30   37      42      4·36   31      2·40   45         5·49   40·31  2·67    35·15  2·20    43·08    2·36
605246   750        63       35   43   45      40      3·50   48      3·39   49         3·63   41·14  2·65    46·93  1·99    50·40    2·21
611254   350        19       40   47   51      45      3·45   57      4·78   78         10·10  46·70  2·61    50·86  1·95    55·89    2·13
611270   250        0        25   25   27      28      1·51   28      1·85   31         1·81   33·67  2·91    31·63  2·49    33·80    2·46
623190   850        43       41   30   41      46      3·58   30      2·14   45         3·62   46·80  2·47    35·85  2·35    46·88    2·27
629214   750        3        32   36   39      35      2·90   41      3·36   40         2·86   39·85  2·82    41·12  2·21    45·44    2·40
629246   850        46       43   49   53      46      3·03   55      3·79   58         3·85   49·29  2·54    52·60  1·86    58·12    2·13
653262   450        34       53   51   56      63      4·78   60      4·47   58         3·13   57·77  2·26    54·14  1·79    60·62    2·05
671254   550        68       34   40   42      34      2·14   40      2·14   48         2·24   42·00  2·81    44·13  2·02    47·47    2·23
689214   850        9        50   42   52      60      4·88   45      3·04   60         5·83   55·54  2·39    46·11  2·01    57·25    2·15
689246   650        19       50   44   54      58      4·35   47      2·84   61         4·56   55·54  2·41    48·17  2·02    58·51    2·04
695222   950        14       40   47   52      45      3·58   59      5·27   64         6·47   46·60  2·59    50·75  1·89    56·92    2·10
695238   550        9        38   42   44      41      1·56   44      2·66   46         2·89   45·26  2·71    46·55  2·10    49·92    2·30

  1. Elevation (m asl); forest cover (%); C_1, C_2 and C_Both denote raw species counts for the first, second and combined data sets; likewise for jackknife (Jack) and hierarchical Bayes (HB) estimates. SE, standard errors; SD, posterior standard deviations.

Figure 2.

Repeatability between two observers of species counts (Count) and of estimates from the jackknife estimator (Jackknife) and from a hierarchical Bayes model (HB). Line indicates one-to-one correspondence of repeated estimates for the same quadrat.
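As an illustration of how repeatability between observers can be quantified, the sketch below computes the squared Pearson correlation between the raw counts of the two observers (columns C_1 and C_2 of Table 1). That R² corresponds to a squared Pearson correlation is an assumption made for this example, and the function name is hypothetical.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Raw species counts per quadrat for the first and second observer (Table 1).
counts_obs1 = [43, 44, 43, 29, 44, 39, 45, 31, 39, 36, 31, 29, 38,
               34, 35, 40, 25, 41, 32, 43, 53, 34, 50, 50, 40, 38]
counts_obs2 = [34, 37, 43, 38, 44, 36, 41, 41, 38, 35, 33, 32, 35,
               30, 43, 47, 25, 30, 36, 49, 51, 40, 42, 44, 47, 42]
r2_counts = r_squared(counts_obs1, counts_obs2)
```

The same function can be applied to the jackknife or HB columns of Table 1 to compare the three approaches on an equal footing.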

inference under the hierarchical bayes model

The estimated number of species under the HB model ranged from 32 to 58 when considering three surveys (Table 1); accordingly, the average proportion of species present and detected by a single observer was estimated at 0·87 (range 0·74–0·94). The HB model enables both local and global estimates of species richness. For instance, in quadrat 557118, two observers between them detected a total of 39 species. The point estimate of species richness in that quadrat was 45·0 (Fig. 3a), and the 95% credible interval ranged from 41 to 50. This means it is very unlikely that all species present were detected in this quadrat; at least two and up to 11 species were probably overlooked by both observers. Also note that the posterior has no mass at <39 species: the information on the actual number of species observed is fully taken into account in the Bayesian analysis.
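Such summaries are simple functions of the posterior draws. The sketch below shows one way to obtain a point estimate, a 95% percentile interval and the implied number of overlooked species; the draws are hypothetical placeholders, not output from our analysis, and the function name is illustrative.

```python
from statistics import mean, quantiles
from typing import Dict, Sequence


def summarise_richness(draws: Sequence[float], n_observed: int) -> Dict[str, float]:
    """Posterior mean, 95% percentile interval and implied missed species."""
    # quantiles(..., n=40) returns 39 cut points in steps of 2.5%;
    # the first and last are the 2.5% and 97.5% percentiles.
    cuts = quantiles(draws, n=40)
    lower, upper = cuts[0], cuts[-1]
    return {
        "mean": mean(draws),
        "lower": lower,
        "upper": upper,
        "missed_low": lower - n_observed,
        "missed_high": upper - n_observed,
    }


# Hypothetical posterior draws of local richness for a quadrat with 39 observed species.
hypothetical_draws = [n for n in range(40, 51) for _ in range(100)]
summary = summarise_richness(hypothetical_draws, n_observed=39)
```

Because every draw is at least as large as the observed count, the implied number of overlooked species can never be negative.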

In addition, the combination of the analysis across all quadrats also permits an estimate of the number of species present in the area for which the sampled quadrats are a representative sample. Total species richness in the entire region was estimated at 113·1 (95% CI 106–123; Fig. 3b). As 103 species were observed, there were probably about 10 species overlooked or not present at any of the 26 quadrats, and this number could be as low as 3 or as high as 20.

Under the HB model, the estimated species detection probability p for a single survey was 0·52 on average, but ranged from as low as 0·08 (European sparrowhawk, Accipiter nisus) to as high as 0·99 (chaffinch, Fringilla coelebs; blackbird, Turdus merula; Appendix S1 in Supplementary Material), emphasizing the strong heterogeneity among species in detection probability, some of which is due to heterogeneity in abundance among species (Royle & Nichols 2003). Similarly, some species were estimated to occur in essentially all quadrats, such as chaffinch and blackbird (estimated occupancy Ψ = 0·96, CI = 0·87–0·99), while others, such as great crested grebe (Podiceps cristatus) or whinchat (Saxicola rubetra), were estimated to occur in barely 6% of quadrats. As expected, observed occupancy rates were more strongly biased downwards for species with lower detection probability (Fig. 4). Most occupied quadrats were detected only for species with a per-survey detection probability of greater than about 0·5.
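The link between per-survey detection probability and bias in observed occupancy can be sketched as follows, assuming closure and a constant detection probability p across surveys, as in the model. The function names are hypothetical, and the occupancy value used in the example is illustrative rather than an estimate from our data.

```python
def prob_detected(p: float, n_surveys: int) -> float:
    """Probability that a present species is detected in at least one survey."""
    return 1.0 - (1.0 - p) ** n_surveys


def expected_occupancy_bias(psi: float, p: float, n_surveys: int) -> float:
    """Expected observed occupancy minus true occupancy psi."""
    # Observed occupancy is psi * P(detected at least once), so the bias
    # equals -psi * (1 - p) ** n_surveys and is never positive.
    return psi * prob_detected(p, n_surveys) - psi


# Lowest estimated detection probability (0.08) over six surveys.
detect_rare = prob_detected(0.08, 6)
# Average detection probability (0.52) over three surveys.
detect_average = prob_detected(0.52, 3)
# Illustrative occupancy of 0.5 combined with p = 0.08 and six surveys.
bias_example = expected_occupancy_bias(0.5, 0.08, 6)
```

With p = 0·08, a species present in a quadrat is detected at least once in six surveys with a probability of only about 0·39, so roughly 60% of its occupied quadrats would be expected to go unrecorded.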

Figure 4.

Relationship between estimated absolute bias of observed occupancy after six surveys and estimated detection probability for 103 detected species.

Discussion

Species richness is a very widely used quantity in ecology and its applications, and in most cases is assessed at multiple sites. Previous approaches have either ignored detectability and used raw species counts, leading to questionable inference about patterns of absolute or even relative species richness; or have applied an estimation procedure such as the jackknife estimator to each site separately (Boulinier et al. 1998a,b; Kéry & Schmid 2006). This represented an inefficient use of available data, because all sites at which a particular species is observed provide information on its detection probability, and this is not accounted for in a piecemeal analysis. In addition, where very few species are observed at a site, no estimate at all may be possible. Finally, the jackknife estimator does not necessarily perform very well when the number of temporal replicates is small (e.g. <5; Burnham & Overton 1979). In contrast, the multispecies site-occupancy model of Dorazio & Royle (2005), which we present in a monitoring context here, integrates the information from all sites surveyed.

We compared the three approaches and suggest that the integrated model provides estimates that are probably less biased than raw species counts, owing to the fact that they correct for imperfect species detectability. The mean proportion of species detected in three surveys was estimated at 0·87 in this study, comparable with the earlier estimate of 0·89 for a much larger sample of sites and years from the entire Swiss MHB (Kéry & Schmid 2006). Furthermore, the new estimates were more precise (had smaller SE) and consistent (had much greater repeatability in repeated samples) than the jackknife estimator of species richness. We believe that the new multispecies site-occupancy model has considerable potential for monitoring programmes, as well as in biogeography and community ecology. Here we discuss issues related to the definition and estimability of species richness; field tests of models and the selection of an adequate species-richness measure; and assets and assumptions of the new model.

Interestingly, Gelfand et al. (2005) developed a similar model that expresses species richness as the sum of occurrences of individual species. The main difference from our approach is that they do not include unseen species in their estimate of species richness, but only those detected in at least one site. Furthermore, they use spatially replicated samples within their 1·55 × 1·85-km sampling units to inform detectability. This assumes that a species occurs in either 0 or 100% of the total area of a sampling unit and therefore confounds small-scale variation in occupancy and variation in detection.

what species richness?

In a setting such as our Swiss breeding bird survey, local and global species richness may be distinguished. The former are the 26 species totals, while the latter can consist of all species occurring in the particular 26 quadrats studied, or the total size of the community of which these quadrats are thought to be a representative sample (Dorazio et al. 2006). Implicit in this view is community closure: that there is a constant number of species occurring at a point in space that can be estimated consistently. Similarly, for the analogous problem of estimating population size, Dorazio et al. (2006) argue that once a geographical frame has been determined, the frequently used qualifier ‘local’ is redundant to ‘population size’. However, there is evidence that species richness is a more elusive concept, and that over longer periods, the total number of species detected in inventories continues to rise (Longino, Coddington & Colwell 2002 for tropical ants; D. Weber, personal communication for butterflies in the Swiss BDM). Hence there often appears to be a large pool of rare species that may continue to show up under repeated sampling. Thus community closure is best seen as an approximation valid over some time interval. The concept of species richness clearly requires specification of both a geographical reference and a time frame (Nichols et al. 1998a, 1998b).

That this time frame may be short can be seen in a comparison of the observed or estimated richness from the single-observer data sets with that from the combined data set. In virtually all cases, the estimates were greater for six than for three surveys, suggesting that some species may move in and out of the sample quadrats, representing a form of temporary emigration. A single observer is therefore unlikely to sample exactly the same community (with respect to time) as both observers together.

Some have questioned whether species richness can be estimated at all (Link 2003; Mao & Colwell 2005; O’Hara 2005). We believe it is useful to distinguish two modes of species-richness estimation: absolute and relative. Some studies are interested in absolute richness. They are typically conducted at one or a few locations, often in hyperdiverse communities in the tropics, and aim to quantify the number of all species present (Novotny & Basset 2000 for tropical insects; Longino et al. 2002 for tropical ants). In contrast, most species-richness studies are conducted at many sites and are more interested in richness at one site compared with that at others, that is, in temporal or spatial species-richness patterns or patterns related to covariates such as elevation, latitude or habitat types. The monitoring background of our study is a perfect example of this. In addition, most comparative studies typically deal with fairly species-poor communities in less rich regions, where the pool of unsampled species will be much smaller than in tropical areas. In the relative mode of species-richness estimation, the aim is to eliminate to the extent possible nuisance effects of detectability that could induce spurious, or mask real, patterns in observed richness. In accordance with the suggestion that species-richness estimators can at best provide a lower bound for true richness (Mao & Colwell 2005), they are therefore perhaps best seen as improved measures of relative richness that improve on raw species counts by eliminating potential bias and noise due to patterns in detectability.

It is also important to note that the size of the detectable community is being estimated. If some habitats within sampling units are not sampled at all, then the community size estimate will exclude all species that occur only there. It is possible that an observer does not recognize all bird species; for instance, he or she may not hear the high-pitched calls of the two European Regulus species. Then the species-richness estimate will refer to the total size of the avian community minus two.

testing models and choice of species-richness estimators

In recent years there has been exuberant growth of models for estimation in animal and plant populations. As an example, even the landmark book by Williams et al. (2002) is already somewhat dated with respect to some recent developments in estimating species richness and site occupancy. Although many agree that quantities such as abundance or species richness need to be estimated, as they can only be imperfectly observed, there is also a need to test and compare models in practical applications (Conn et al. 2006). Our study aims to do so.

In theory, models could be tested in three ways. First, simulated data could be analysed, which has the advantage that truth is known. However, simulated data may not capture the full complexities of real-world data; conclusions from simulation studies therefore need to be complemented with some kind of field test. Furthermore, it is usually known from theory that an estimator produces valid estimates if its assumptions are adequately met. Hence tests of models are actually tests of how well their underlying assumptions are met in a particular case. A second possibility is to test estimators on communities of known size. This has been done for classical abundance estimators (Conn et al. 2006), but is much more difficult for species richness. A third and indirect possibility is to apply the same estimator to repeated samples from the same community. O’Hara (2005) suggested that, in the absence of known truth, a useful criterion is whether independent samples give similar estimates. This is what we tested by comparing the estimates for two independent observers in the same quadrat.

We note that we cannot expect a zero difference between repeated estimates for the same quadrat; hence, even for a perfect estimator, two observers conducting three surveys each would not normally yield identical species-richness estimates for a quadrat. This is because it is most likely that the detectable community size would differ between them. Several factors may add variance in repeated counts of species: true presence/absence of species; virtual absence due to animal behaviour (e.g. strictly nocturnal species may not be detected during daytime surveys); virtual absence due to observer behaviour (e.g. a species unknown to an observer will not be part of his or her detectable community); overlooking of a species that is detectable in principle and known to the surveyor (narrow-sense detectability); and spatial coverage bias, for example when different sampling routes are chosen. This means that our repeatability estimates are underestimates if they are meant to characterize a particular method.

In this study, we compared three types of species-richness estimators between replicated samples in 26 quadrats: raw counts (which can be viewed as providing lower bound estimates) and two kinds of estimator of true species richness, the jackknife and the new hierarchical Bayes model developed by Dorazio & Royle (2005). Most practitioners would probably concede that correction of the observed number of species (raw counts) to account for undetected species is important, but would claim that true rendering of pattern (for example in space or over time) is more important than a fully unbiased estimate of species richness. In these cases, the comparison of jackknife estimates with raw counts would suggest that it is better to use raw counts because of their much greater repeatability. The advantage of the new integrated model is that bias is reduced owing to the correction for detection probability, but this does not entail costs in terms of reduced repeatability, as our study shows.
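The contrast between raw counts and a jackknife correction can be illustrated with a minimal sketch. It assumes the first-order jackknife, S_obs + f1 (k − 1)/k, where f1 is the number of species detected on exactly one of k surveys; this is a common textbook form and not necessarily the variant used in our analysis. The two detection histories and all function names are hypothetical.

```python
# Hypothetical detection histories (species -> detections on 3 surveys)
# for two observers in the same quadrat. Not real MHB data.
observer_a = {
    "sp1": [1, 1, 1],
    "sp2": [0, 1, 0],
    "sp3": [1, 0, 1],
    "sp4": [0, 0, 1],
    "sp5": [0, 0, 0],
}
observer_b = {
    "sp1": [1, 1, 0],
    "sp2": [0, 0, 0],
    "sp3": [1, 1, 1],
    "sp4": [0, 0, 0],
    "sp5": [0, 1, 0],
}


def raw_count(history):
    """Number of species detected on at least one survey."""
    return sum(1 for h in history.values() if any(h))


def jackknife1(history):
    """First-order jackknife: S_obs + f1 * (k - 1) / k (assumed variant)."""
    k = len(next(iter(history.values())))  # number of surveys
    s_obs = raw_count(history)
    f1 = sum(1 for h in history.values() if sum(h) == 1)  # detected exactly once
    return s_obs + f1 * (k - 1) / k


# Between-observer discrepancy for each estimator in this toy quadrat
raw_diff = abs(raw_count(observer_a) - raw_count(observer_b))
jack_diff = abs(jackknife1(observer_a) - jackknife1(observer_b))
print(raw_diff, round(jack_diff, 3))
```

Because the correction term depends on the count of species seen exactly once, which is itself noisy, the jackknife estimates can differ more between observers than the raw counts do, as in this toy example.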

assets and assumptions of the new model

When the same kind of analysis is used for the same kind of data from multiple and comparable samples, it is intuitively clear that a combined analysis must represent the best use of the information available. For instance, in an earlier study (Kéry & Schmid 2006), we used the jackknife estimator for the entire Swiss MHB data (267 quadrats) over 3 years individually and then related the resulting detectability estimates to likely covariates, to identify patterns in detectability that might bias analysis of raw species counts. Our present study suggests that a combined analysis of the data using the new model would have yielded considerable gains in terms of the precision of the estimates of species richness or detectability, particularly for sites where few species were detected.

Additional benefits of the new model are species-specific estimates of occupancy and detection probability. Occupancy is an important metric on its own (MacKenzie et al. 2006) and may be interpreted as a proxy for abundance. Furthermore, based on the argument that greater occupancy will mean more local populations and more locally adapted genotypes, it may even be a proxy for genetic variation in naturally occurring animals or plants. In contrast to our earlier approach using the jackknife estimator (Kéry & Schmid 2006; Kéry & Plattner 2007), the new model also yields estimates of detection probability for each species (Appendix S1). This is important information for survey design planning and for the interpretation of existing inventories (Kéry 2002).
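To indicate how species-specific detection estimates might inform survey design, the following sketch computes the probability of detecting a species at least once at an occupied site, 1 − (1 − p)^K, and the number of surveys needed to reach a target value. It assumes a constant per-survey detection probability p and independent surveys, which is a simplification; the function names are hypothetical, and the values 0·08 and 0·99 are simply the extremes of the detectability range reported in the Summary.

```python
import math


def prob_detected_at_least_once(p, n_surveys):
    """Probability that a species present at a site is detected on at
    least one of n_surveys, assuming constant, independent detection p."""
    return 1.0 - (1.0 - p) ** n_surveys


def surveys_needed(p, target):
    """Smallest number of surveys for which the cumulative detection
    probability reaches the target (same assumptions as above)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Extremes of the reported detectability range, with three surveys
low = prob_detected_at_least_once(0.08, 3)
high = prob_detected_at_least_once(0.99, 3)
print(round(low, 3), round(high, 6))

# Surveys required for a 95% chance of detecting the least detectable species
print(surveys_needed(0.08, 0.95))
```

Under these assumptions, a species with p = 0·08 would be missed at most occupied sites after three surveys, whereas one with p = 0·99 is almost certain to be recorded.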

The presence–absence matrix, where rows are species and columns sites, has been called the fundamental unit of analysis in biogeography and community ecology (McCoy & Heck 1987; Gotelli 2000). We suspect that it has never been properly recorded in any biogeographical study: because absence cannot be observed without error, any observed presence–absence matrix must be flawed. Rather, it is a latent, partially observable structure and must be estimated. The z-matrix of our model is exactly that: an estimate of the presence–absence matrix that accounts for false negative (detection) errors. It has huge potential for the above and related fields. For instance, column totals yield site-specific richness estimates and row totals the number of occurrences of each species, or, divided by the total number of sites sampled, an estimate of occupancy. Furthermore, species can be compared in terms of their occurrence at sites, and sites can be compared in terms of their species occurrences. The z-matrix provides for readily defined measures of similarity between individual species or individual sites (Dorazio & Royle 2005; Dorazio et al. 2006).
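With species in rows and sites in columns, these summaries of the z-matrix can be sketched as follows. The matrix is a small hypothetical example standing in for a single posterior draw; in practice the summaries would be computed for every draw and then averaged. The Jaccard index is used here only as one illustrative similarity measure and is our assumption, not necessarily the measure used in the cited papers.

```python
# Hypothetical presence-absence matrix z (one posterior draw):
# rows are species, columns are sites; 1 = present, 0 = absent.
z = [
    [1, 0, 1, 1],  # species A
    [0, 0, 1, 0],  # species B
    [1, 1, 1, 1],  # species C
]


def site_richness(z):
    """Totals over species for each site: site-specific species richness."""
    return [sum(col) for col in zip(*z)]


def species_occupancy(z):
    """Totals over sites for each species, divided by the number of sites."""
    n_sites = len(z[0])
    return [sum(row) / n_sites for row in z]


def jaccard_sites(z, j, k):
    """Jaccard similarity of sites j and k: shared species / species at either."""
    shared = sum(1 for row in z if row[j] and row[k])
    either = sum(1 for row in z if row[j] or row[k])
    return shared / either if either else 0.0


print(site_richness(z))        # richness per site
print(species_occupancy(z))    # occupancy per species
print(jaccard_sites(z, 0, 1))  # similarity of the first two sites
```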

Of course, there are a few assumptions that must be met. First, quadrats must be a random or otherwise representative sample of the greater area about which inference is desired. Otherwise inference is restricted to the quadrats actually sampled. In our study, we believe that the quadrats sampled are representative for a large area north of the Swiss Alps. Second, the community must be closed to additions and deletions of species during the time of sampling: the closure assumption. Known violations of closure, for instance owing to the staggered arrival of migrants, could be accommodated by turning some zeroes in the detection histories of some species into missing values, or by fitting a seasonal effect of time into detection probability. We did not account for this, as our earlier analysis (Kéry & Schmid 2006) revealed that it hardly introduces any bias in our case. Third, any structure in detection probability must be modelled adequately. We discuss this in the previous section. Fourth, species heterogeneity in occupancy and detection probability, on the logit scale, must be represented adequately by a normal distribution, an assumption that could be tested. Other mixture distributions could be employed, although the ability to distinguish among them based on the actual data may be limited (Link 2003). Fifth, an important and probably underscrutinized assumption common to all methods of quantifying species richness in our study is that of no false-positive errors: species detections can arise only when a species is actually present. There is growing evidence that such misclassification errors may occur and, more worryingly, that violation of this assumption has strong effects on occupancy estimates in single-species models (Royle & Link 2006). Similar effects are likely for the multispecies site-occupancy model featured here, especially in surveys that aim to detect about 150 species, such as MHB. That violation of this assumption is also a problem for all other approaches should be no consolation, and every attempt must be made to use observers who are as well trained as possible. In conclusion, before application of our (or similar) models to single- or multiple-species occupancy data, these assumptions must be scrutinized thoroughly. Finally, it is hard to overstate the value of effort spent at the design stage of such studies (MacKenzie & Royle 2005; Bailey et al. 2007).
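The first remedy for known closure violations mentioned above, turning zeroes into missing values, can be sketched as follows. This is a minimal illustration rather than the procedure used in any actual analysis; the function name, the zero-based survey index and the example histories are all hypothetical.

```python
def mask_pre_arrival(history, first_possible_survey):
    """Replace zeroes recorded before a species could have been present
    with None (missing), leaving all other entries unchanged.

    history: list of 0/1 detections, one per survey, in temporal order.
    first_possible_survey: zero-based index of the first survey on which
        the species could have been present (an assumed, known quantity).
    """
    return [
        None if (i < first_possible_survey and y == 0) else y
        for i, y in enumerate(history)
    ]


# A late-arriving migrant that cannot be present on the first survey:
# the first zero carries no information about absence and becomes missing.
print(mask_pre_arrival([0, 0, 1], first_possible_survey=1))

# A resident species: the history is left as recorded.
print(mask_pre_arrival([0, 1, 0], first_possible_survey=0))
```

Masked entries would then be excluded from the likelihood for that species, so that non-detection before arrival is not mistaken for evidence of low detectability or absence.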

Acknowledgements

We thank the many dedicated and excellent volunteers who conduct the MHB field work each year. H. Schmid was very helpful, as always, in answering our questions about the MHB. L. Jenni, D. Weber, as well as N.G. Yoccoz and another referee, provided valuable comments on the manuscript. We also thank the Swiss Biodiversity Monitoring programme (BDM) of the Swiss Federal Office for the Environment (FOEN) for allowing us to use BDM data.

Ancillary