Accounting for non-independent detection when estimating abundance of organisms with a Bayesian approach

Authors


Correspondence author. E-mail: julien.martin@myfwc.com

Summary

1. Binomial mixture models use repeated count data to estimate abundance. They are becoming increasingly popular because they provide a simple and cost-effective way to account for imperfect detection. However, these models assume that individuals are detected independently of each other, an assumption that may often be violated in the field. For instance, manatees (Trichechus manatus latirostris) may surface in turbid water (i.e. become available for detection during aerial surveys) in a correlated manner (i.e. in groups). Correlated behaviour that makes individual detections non-independent may also occur in other systems (e.g. correlated patterns of singing in birds and amphibians).

2. We extend binomial mixture models to account for correlated behaviour and therefore to account for non-independent detection of individuals. We simulated correlated behaviour using beta-binomial random variables. Our approach can be used to simultaneously estimate abundance, detection probability and a correlation parameter.

3. Fitting binomial mixture models to data that followed a beta-binomial distribution resulted in an overestimation of abundance even for moderate levels of correlation. In contrast, the beta-binomial mixture model performed considerably better in our simulation scenarios. We also present a goodness-of-fit procedure to evaluate the fit of beta-binomial mixture models.

4. We illustrate our approach by fitting both binomial and beta-binomial mixture models to aerial survey data of manatees in Florida. We found that the binomial mixture model did not fit the data, whereas there was no evidence of lack of fit for the beta-binomial mixture model. This example helps illustrate the importance of using simulations and assessing goodness-of-fit when analysing ecological data with N-mixture models. Indeed, both the simulations and the goodness-of-fit procedure highlighted the limitations of the standard binomial mixture model for aerial manatee surveys.

5. Overestimation of abundance by binomial mixture models owing to non-independent detections is problematic for ecological studies, but also for conservation. For example, in the case of endangered species, it could lead to inappropriate management decisions, such as downlisting. These issues will be increasingly relevant as more ecologists apply flexible N-mixture models to ecological data.

Introduction

Recently developed binomial mixture models use repeated count data to estimate abundance (Royle 2004a,b). These models are becoming increasingly popular because they provide a simple and cost-effective way to account for imperfect detection (Kéry 2008). Accounting for detection probability is important to obtain reliable estimates of abundance and to make appropriate inferences (Williams, Nichols, & Conroy 2002). However, the standard binomial mixture model assumes that organisms are detected independently of each other (Royle & Dorazio 2008). This assumption may be violated when there is some correlation in the behaviour of animals, which can affect their probability of being detected by observers. For instance, several monitoring studies of birds and amphibians have used acoustic signals as a way to estimate abundance (Dawson & Efford 2009; McClintock et al. 2010), but it is reasonable to imagine that when one bird sings, its behaviour will influence the singing behaviour of its neighbours. In this case, these birds may not be detected independently of each other. Another example is the case of marine mammals that are monitored with aerial surveys (Edwards et al. 2007). In these surveys, the animals become available for detection when they come to the surface to breathe (Edwards et al. 2007). Our study was motivated by a monitoring program of manatees in Florida (Trichechus manatus latirostris), which are surveyed at warm water outfalls of power plants and in coastal areas (Edwards et al. 2007). In most instances, the water is too murky or too deep for observers to count manatees that are resting deep under the surface. It has been suggested that manatees may come to the surface in groups (Edwards et al. 2007).

Behaviour such as movement in groups (e.g. manatees surfacing in groups) may induce a correlation in detection probability of individuals. This may lead to biased estimates in a model that assumes independence of detections, such as the standard binomial mixture model. The beta-binomial distribution has been proposed as a method to account for correlated Bernoulli outcomes (Hisakado, Kitsukawa, & Mori 2006).

Here, we develop beta-binomial mixture models that account for correlated Bernoulli outcomes in the data (i.e. correlated detections of individuals) and examine the performance of the standard binomial and the beta-binomial mixture models under a range of scenarios, including correlated outcomes, heterogeneity in detection and zero inflation in abundance. We consider heterogeneity in detection probabilities and different levels of correlation (among Bernoulli outcomes) across sites. Zero inflation of abundance may arise because, in some ecological applications, many sampling units are unoccupied by the organism of interest (e.g. Wenger & Freeman 2008). We apply the binomial and beta-binomial mixture models to a case study with manatee aerial surveys and discuss the benefits and limitations of these approaches. Understanding how different sources of variation can affect estimates under N-mixture models (e.g. standard binomial and beta-binomial mixture models) is important given that these models are increasingly used for ecological research and for decision-making regarding threatened and endangered species (e.g. Fonnesbeck et al. unpublished; Johnson 2010).

Methods

Standard binomial and beta-binomial mixture models

We compared the performance of the standard binomial mixture model and beta-binomial mixture models (we refer to them collectively as N-mixture models) to infer abundance from simulated repeated count data (see columns of Table 1). The biological variable (or biological process) in our case was the number of individuals (N) per sampling unit (indexed i) and was assumed to follow a Poisson distribution (Royle & Dorazio 2008):

$$N_i \sim \mathrm{Poisson}(\lambda) \qquad \text{(eqn 1)}$$
Table 1. Estimates of λ, p, ρ and average relative bias of abundance ($\overline{\mathrm{RB}}$), with 95% confidence intervals (CI), for four estimation approaches (binomial mixture model, Bin mix; beta-binomial mixture model, BB mix; first count, FC; maximum counts, MC) applied to simulated data that followed four distributions (binomial, B; beta-binomial, BB; beta-binomial with heterogeneity, BBH; zero-inflated beta-binomial, ZBB). Results are based on 500 simulations, except for scenario [14], which was based on 9 simulations out of a total of 50 because of convergence issues. The Brooks–Gelman–Rubin convergence diagnostic ($\hat{R}$) was < 1·2 for all results presented in the table. Each scenario is indicated by a number in square brackets. Biases associated with first count and maximum counts are also presented.

Estimation approach | Quantity | B: λ = 4·6 | BB: λ = 4·6 | BBH: λ = 4·6 | ZBB: λ = 4·6; ω = 0·25; λω = 1·15
Bin mix | λ | [1] 4·68 (4·11, 5·37) | [5] 10·26 (7·87, 13·62) | [9] 6·27 (5·24, 7·72) | [13] 1·07 (0·77, 1·40)
Bin mix | p | 0·50 (0·44, 0·56) | 0·23 (0·17, 0·29) | 0·37 (0·30, 0·44) | 0·54 (0·46, 0·62)
Bin mix | ρ | – | – | – | –
Bin mix | $\overline{\mathrm{RB}}$ | 0·01 (−0·10, 0·16) | 1·23 (0·75, 1·89) | 0·36 (0·17, 0·65) | −0·08 (−0·18, 0·03)
BB mix | λ | [2] 4·30 (3·79, 4·90) | [6] 4·69 (4·17, 5·34) | [10] 3·98 (3·62, 4·38) | [14] 3·46 (1·78, 4·34)*
BB mix | p | 0·55 (0·50, 0·62) | 0·50 (0·42, 0·55) | 0·57 (0·53, 0·60) | 0·13 (0·1, 0·2)*
BB mix | ρ | 0·03 (0·01, 0·07) | 0·30 (0·22, 0·37) | 0·35 (0·28, 0·41) | 0·58 (0·5, 0·6)*
BB mix | $\overline{\mathrm{RB}}$ | −0·06 (−0·16, 0·06) | 0·02 (−0·08, 0·16) | −0·14 (−0·19, −0·08) | 2 (0·55, 2·78)*
FC | $\overline{\mathrm{RB}}$ | [3] −0·50 (−0·53, −0·47) | [7] −0·50 (−0·55, −0·45) | [11] −0·50 (−0·55, −0·46) | [15] −0·50 (−0·58, −0·41)
MC | $\overline{\mathrm{RB}}$ | [4] −0·32 (−0·34, −0·29) | [8] −0·22 (−0·25, −0·18) | [12] −0·27 (−0·30, −0·23) | [16] −0·22 (−0·28, −0·15)

* This analysis was based on only nine simulations because the model converged for only 18% of the simulated data sets (i.e. only 9 out of 50 simulations converged).

In eqn 1, λ is the mean number of individuals per sampling unit. The observation process was the number of individuals, $y_{it}$, detected at site i during survey t (i.e. for each repeated count). This process was assumed to be binomial, implying that individuals are detected independently of each other:

$$y_{it} \mid N_i \sim \mathrm{Binomial}(N_i, p_{it}) \qquad \text{(eqn 2)}$$

where $p_{it}$ is the detection probability at site i during survey t. In our case, $p_{it}$ was constant.
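The two-level structure of eqns 1 and 2 can be illustrated with a short simulation. The sketch below is written in Python with NumPy rather than in the R code used for this study; the function name simulate_binomial_nmix is hypothetical, and the defaults (200 sites, three surveys, λ = 4·6, p = 0·5) mirror the binomial scenario of the simulation study.

```python
import numpy as np

def simulate_binomial_nmix(n_sites=200, n_surveys=3, lam=4.6, p=0.5, seed=1):
    """Simulate repeated counts under the standard binomial mixture model.

    N_i ~ Poisson(lam)             (eqn 1: latent abundance at each site)
    y_it | N_i ~ Binomial(N_i, p)  (eqn 2: independent detection of individuals)
    """
    rng = np.random.default_rng(seed)
    N = rng.poisson(lam, size=n_sites)
    # Each survey is an independent binomial draw given the site's abundance;
    # N[:, None] broadcasts the site abundance across the surveys.
    y = rng.binomial(N[:, None], p, size=(n_sites, n_surveys))
    return N, y

N, y = simulate_binomial_nmix()
print(N.sum(), y.shape)  # true total abundance and the (sites, surveys) count matrix
```

Only y would be available to an analyst; N is kept so that estimates can later be compared with the true totals.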

Alternatively, we simulated the influence of correlation on detection by allowing $p_{it}$ to vary according to a beta distribution:

$$p_{it} \sim \mathrm{Beta}(\alpha, \beta) \qquad \text{(eqn 3)}$$

This results in a beta-binomial model for the observations, which can be written as:

$$y_{it} \mid N_i \sim \mathrm{BetaBinomial}(N_i, p, \rho) \qquad \text{(eqn 4)}$$

where ρ is a correlation parameter (Hisakado, Kitsukawa, & Mori 2006), such that

$$p = \frac{\alpha}{\alpha + \beta}, \qquad \rho = \frac{1}{\alpha + \beta + 1} \qquad \text{(eqn 5)}$$

The beta-binomial model can be used to model heterogeneity in detection probabilities between groups or correlation within groups. In biological applications, ρ can be viewed as a measure of the correlation in animal behaviour or of site characteristics, such as habitat, that could affect detection. However, these two sources of non-independent detection cannot be distinguished when analysing real data with beta-binomial mixture models. We note that $p_{it}$ (in eqns 3 and 4) comes from a single beta distribution, with no distinction between repeated samples at the same site and different sites. As explained in section Characterisation of estimator bias, we used a Bayesian approach to estimate the parameters of interest. The likelihoods for the binomial and beta-binomial mixture models are given in the WinBUGS specifications available online (Appendices S1 and S2).
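For readers who want to see the likelihood outside WinBUGS, the sketch below computes it for a single site by summing the latent abundance out of the model. It assumes the relationships p = α/(α + β) and ρ = 1/(α + β + 1), which reproduce the α = β = 1·167, p = 0·5, ρ = 0·3 values used in the simulation study; it truncates the infinite sum over N at an assumed upper bound n_max, and all function names are hypothetical. It is an illustration, not the WinBUGS code in Appendices S1 and S2. It also shows why correlation matters: given N, the beta-binomial variance is the binomial variance inflated by a factor 1 + (N − 1)ρ.

```python
import math

def beta_params(p, rho):
    """Convert mean detection p and correlation rho (0 < rho < 1) to Beta(alpha, beta),
    using p = alpha / (alpha + beta) and rho = 1 / (alpha + beta + 1)."""
    s = 1.0 / rho - 1.0                      # alpha + beta
    return p * s, (1.0 - p) * s

def log_choose(n, y):
    return math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)

def log_beta_fn(u, v):
    return math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v)

def log_binom_pmf(y, n, p):
    return log_choose(n, y) + y * math.log(p) + (n - y) * math.log1p(-p)

def log_betabinom_pmf(y, n, a, b):
    return log_choose(n, y) + log_beta_fn(y + a, n - y + b) - log_beta_fn(a, b)

def betabinom_var(n, p, rho):
    """Var(y | N = n): binomial variance inflated by 1 + (n - 1) * rho."""
    return n * p * (1 - p) * (1 + (n - 1) * rho)

def site_likelihood(counts, lam, p, rho=None, n_max=200):
    """Marginal likelihood of one site's repeated counts:
        L_i = sum_{N >= max(y)} Poisson(N | lam) * prod_t f(y_it | N),
    with f binomial (rho is None) or beta-binomial. Counts are conditionally
    independent given N because each survey gets its own p_it ~ Beta(alpha, beta).
    The infinite sum over N is truncated at n_max."""
    if rho is not None:
        a, b = beta_params(p, rho)
    total = 0.0
    for n in range(max(counts), n_max + 1):
        log_term = -lam + n * math.log(lam) - math.lgamma(n + 1)   # Poisson(n | lam)
        for y in counts:
            if rho is None:
                log_term += log_binom_pmf(y, n, p)
            else:
                log_term += log_betabinom_pmf(y, n, a, b)
        total += math.exp(log_term)
    return total

a, b = beta_params(0.5, 0.3)
print(round(a, 3), round(b, 3))     # 1.167 1.167, the values used in the simulation study
print(betabinom_var(10, 0.5, 0.3))  # about 9.25, versus 2.5 under independent detection
```

One plausible reading of scenario [5] in Table 1 is that, for a fixed mean count Np, the within-site variance Np(1 − p) grows as p shrinks, so a binomial mixture model can only mimic this extra variation through a smaller p and a larger N.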

Simulation study

For each of four data-generating models (the standard binomial, the beta-binomial, a beta-binomial with heterogeneity in detection and the zero-inflated beta-binomial), we simulated 500 data sets with λ = 4·6. Values of λ (the mean number of individuals per sampling unit) and p (detection probability) were based on our analyses of aerial surveys of manatees (see section Analysis of aerial survey data of manatees in Florida). For each data set, we assumed 200 sites with three replicate surveys. In the binomial model, we set p = 0·5. For the beta-binomial model, we set α = β = 1·167, which corresponds to a correlation ρ = 0·3 and a detection probability p = 0·5. To simulate data with heterogeneity in the detection process, the data from one half of the sites (100) followed a beta-binomial distribution with p = 0·3 and ρ = 0·3, and the data from the other half followed a beta-binomial distribution with p = 0·7 and ρ = 0·15.
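The BB and BBH scenarios can be sketched in the same way by drawing a fresh detection probability from a beta distribution for every site and survey (eqn 3). The sketch assumes NumPy, the conversion α + β = 1/ρ − 1 implied by ρ = 1/(α + β + 1), and a hypothetical function simulate_bb_nmix; it is not the authors' simulation code.

```python
import numpy as np

def simulate_bb_nmix(p, rho, n_sites=200, n_surveys=3, lam=4.6, seed=1):
    """Simulate repeated counts under a beta-binomial mixture model.

    N_i ~ Poisson(lam); p_it ~ Beta(alpha_i, beta_i); y_it ~ Binomial(N_i, p_it).
    p and rho may be scalars or length-n_sites arrays (site-specific values).
    """
    rng = np.random.default_rng(seed)
    p = np.broadcast_to(np.asarray(p, dtype=float), (n_sites,))
    rho = np.broadcast_to(np.asarray(rho, dtype=float), (n_sites,))
    s = 1.0 / rho - 1.0                       # alpha + beta for each site
    a, b = p * s, (1.0 - p) * s
    N = rng.poisson(lam, size=n_sites)
    # A new detection probability for every site-survey combination
    p_it = rng.beta(a[:, None], b[:, None], size=(n_sites, n_surveys))
    y = rng.binomial(N[:, None], p_it)
    return N, y

# BB scenario: alpha = beta = 1.167, i.e. p = 0.5 and rho = 0.3
N_bb, y_bb = simulate_bb_nmix(p=0.5, rho=0.3)

# BBH scenario: 100 sites with p = 0.3, rho = 0.3 and 100 sites with p = 0.7, rho = 0.15
p_h = np.repeat([0.3, 0.7], 100)
rho_h = np.repeat([0.3, 0.15], 100)
N_bbh, y_bbh = simulate_bb_nmix(p=p_h, rho=rho_h)
```

Passing site-specific arrays for p and ρ is one simple way to encode the two halves of the heterogeneity design; the mean detection probability across sites is still 0·5.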

To simulate zero-inflated beta-binomial data, we added a ‘suitability’ parameter, ω, to our data-generating model such that:

$$N_i \mid z_i \sim \mathrm{Poisson}(\lambda z_i) \qquad \text{(eqn 6)}$$

where $z_i \sim \mathrm{Bernoulli}(\omega)$ is a binary variable indicating the latent suitability (yes or no) of a site for a species and ω is the probability that a site is suitable in principle.

We set ω = 0·25, which seemed reasonable based on prior information available from a pilot study of manatees using aerial surveys (see section on analysis of manatee data).
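The zero-inflated data-generating model (eqn 6) adds a latent suitability indicator before the abundance step. Below is a minimal NumPy sketch with the hypothetical function name simulate_zi_abundance and the settings λ = 4·6 and ω = 0·25; the resulting N would then be passed through the beta-binomial observation process.

```python
import numpy as np

def simulate_zi_abundance(n_sites=200, lam=4.6, omega=0.25, seed=1):
    """Zero-inflated abundance: z_i ~ Bernoulli(omega), N_i | z_i ~ Poisson(lam * z_i)."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, omega, size=n_sites)  # latent suitability of each site
    N = rng.poisson(lam * z)                  # unsuitable sites (z_i = 0) always have N_i = 0
    return z, N

z, N = simulate_zi_abundance()
```

The expected abundance per site is λω = 4·6 × 0·25 = 1·15, the value reported for the ZBB column of Table 1.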

Each of the four simulated data types was analysed using binomial mixture models (Royle 2004a; Royle, Nichols, & Kéry 2005) and beta-binomial mixture models, and was also described with two count indices, the first count and the maximum count (see rows of Table 1).

Truncation

One approach to dealing with zero-inflated data is simply to remove all zeros from the data set and apply the truncated version of a standard distribution. This approach may be sensible when the probability of not seeing any individuals after multiple surveys is small, and when λ (eqn 1) is large enough that the probability of obtaining a zero from the Poisson distribution is small. To evaluate the potential bias due to truncation, we (i) simulated data under a binomial distribution (eqns 1, 2) and fitted a binomial mixture model, and (ii) simulated data under a beta-binomial distribution (eqns 1, 4) and fitted a beta-binomial mixture model. In both cases, all sites with only zero counts were removed from the data before fitting the models. When the data are truncated, however, it is more appropriate to assume that abundance $N_i$ follows a zero-truncated Poisson distribution rather than a Poisson distribution (eqn 1). We used this approach for the analysis of aerial survey data of manatees.
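The sketch below illustrates the truncation step and the zero-truncated Poisson distribution in Python (hypothetical helper names). It also shows why truncation removes more sites than the Poisson zeros alone: under the binomial mixture model, summing Poisson(N | λ)(1 − p)^(TN) over N gives the probability that all T counts at a site are zero, exp{−λ[1 − (1 − p)^T]}, which for λ = 4·6, p = 0·5 and T = 3 is roughly 0·018, compared with P(N = 0) = exp(−4·6) ≈ 0·010.

```python
import math
import numpy as np

def drop_all_zero_sites(y):
    """Remove sites (rows) whose repeated counts are all zero."""
    y = np.asarray(y)
    return y[(y > 0).any(axis=1)]

def log_ztpois_pmf(n, lam):
    """log P(N = n | N > 0) under a zero-truncated Poisson distribution."""
    if n < 1:
        return -math.inf
    # Poisson log-pmf renormalised by P(N > 0) = 1 - exp(-lam)
    return (-lam + n * math.log(lam) - math.lgamma(n + 1)
            - math.log1p(-math.exp(-lam)))

def prob_all_zero(lam, p, n_surveys):
    """P(all counts at a site are zero) under the binomial mixture model."""
    return math.exp(-lam * (1.0 - (1.0 - p) ** n_surveys))

print(math.exp(-4.6))               # P(N_i = 0) when lam = 4.6, about 0.010
print(prob_all_zero(4.6, 0.5, 3))   # P(three zero counts) when p = 0.5, about 0.018
```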

Characterisation of estimator bias

We estimated $N_j$, the total abundance over all sites in simulation j, from each of the j = 1, …, 500 replicated data sets, and computed its estimate $\hat{N}_j$ under each model as a posterior mean. Assuming that n sites are surveyed (n = 200 in all simulated analyses):

$$N_j = \sum_{i=1}^{n} N_{ij} \qquad \text{(eqn 7)}$$

where $N_{ij}$ is the abundance at site i for simulated data set j.

We estimated the relative bias, $\mathrm{RB}_j$, of each estimator for each simulated data set j as:

$$\mathrm{RB}_j = \frac{\hat{N}_j - N_j}{N_j} \qquad \text{(eqn 8)}$$

where $N_j$ is the 'true' abundance for simulated data set j and $\hat{N}_j$ is its estimate. For each analysis, we report the average relative bias, $\overline{\mathrm{RB}} = \frac{1}{500}\sum_{j=1}^{500} \mathrm{RB}_j$. If $\overline{\mathrm{RB}} > 0$, the estimator was positively biased (i.e. overestimated the true abundance), whereas $\overline{\mathrm{RB}} < 0$ indicated that it was negatively biased (i.e. underestimated the true abundance). We also report the 2·5 and 97·5 percentiles of the relative bias across the 500 replicated data sets. For simplicity, we refer to this interval as the 95% confidence interval (95% CI).
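Eqns 7 and 8 and the summaries above translate directly into code. In this sketch (NumPy, hypothetical function names), N_draws holds posterior draws of the site abundances for one data set, with shape (draws, sites), and the per-data-set estimates and true totals are collected across the replicated data sets.

```python
import numpy as np

def posterior_total(N_draws):
    """Posterior mean of total abundance (eqn 7) from MCMC draws of N_ij.

    N_draws: shape (n_draws, n_sites) for a single simulated data set j.
    """
    return np.asarray(N_draws).sum(axis=1).mean()

def relative_bias_summary(N_hat, N_true):
    """Average relative bias (eqn 8) and its 2.5 and 97.5 percentiles.

    N_hat, N_true: estimated and true totals, one entry per simulated data set.
    """
    N_hat = np.asarray(N_hat, dtype=float)
    N_true = np.asarray(N_true, dtype=float)
    rb = (N_hat - N_true) / N_true
    lo, hi = np.percentile(rb, [2.5, 97.5])
    return rb.mean(), lo, hi
```

For example, the average relative bias of 1·23 for the binomial mixture model in scenario [5] means that, on average, the estimated total was about 2·2 times the true total.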

We also computed $\overline{\mathrm{RB}}$ and 95% CI for the first count (FC) and maximum counts (MC). FC was obtained by summing $y_i$ over sites for the first visit of the repeated count survey, simulating a situation in which only a single, unreplicated survey was made at each site. MC was obtained by summing, over sites, the highest $y_i$ among all visits; this is a common 'naive' estimator of abundance from replicated counts that does not account for imperfect detection. Even though FC and MC are well known to yield negatively biased estimates, these types of surveys are still common in the ecological and conservation literature (Martin, Kitchens, & Hines 2007).
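A sketch of the two count indices (NumPy; the function name is hypothetical). Because E[y_i1 | N_i] = N_i p, the expected relative bias of FC is approximately p − 1, i.e. −0·5 when the mean detection probability is 0·5, which agrees with the FC row of Table 1.

```python
import numpy as np

def count_indices(y):
    """First-count (FC) and maximum-count (MC) indices of total abundance.

    y: repeated counts with shape (n_sites, n_surveys).
    """
    y = np.asarray(y)
    fc = int(y[:, 0].sum())         # first visit only, as in an unreplicated survey
    mc = int(y.max(axis=1).sum())   # highest count at each site, summed over sites
    return fc, mc

print(count_indices([[1, 3, 2], [0, 0, 1]]))  # (1, 4)
```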

All data sets were simulated in program R version 2·9 (R Development Core Team, 2010). We fitted all of our models using the Bayesian approach and Markov chain Monte Carlo (MCMC) simulation methods (e.g. Link et al. 2002). We ran three parallel chains, each with 14 000 iterations and discarded the first 4000 (the so-called ‘burn in’) iterations.

Initial values for each of the three chains were drawn randomly from the priors of each parameter. We assessed the chains' convergence to their stationary distributions using the Brooks–Gelman–Rubin diagnostic ($\hat{R}$; Gelman et al. 2004). We used conventional vague priors for all parameters: gamma distributions for λ (eqn 1), α and β (eqn 3), and 'flat' uniform distributions for all other parameters (see also online Appendix S1). All data sets (simulated and real) were analysed with WinBUGS version 1·4 (Lunn et al. 2000) in batch mode using the R package R2WinBUGS (Sturtz, Ligges, & Gelman 2005; see also Kéry 2008 for a similar approach).
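The Brooks–Gelman–Rubin diagnostic compares between-chain and within-chain variation. Below is a minimal NumPy sketch of the basic (non-split) form of R-hat. Software implementations, including the one reported by R2WinBUGS, may differ in details such as degrees-of-freedom corrections, so this is an illustration rather than a reproduction of that output. With three chains of 14 000 iterations and a burn-in of 4000, each chain contributes 10 000 draws, i.e. an array of shape (3, 10 000) per parameter.

```python
import numpy as np

def gelman_rubin(chains):
    """Basic (non-split) potential scale reduction factor, R-hat.

    chains: array of shape (m, n), i.e. m chains with n post-burn-in draws each.
    """
    x = np.asarray(chains, dtype=float)
    _, n = x.shape
    W = x.var(axis=1, ddof=1).mean()        # mean within-chain variance
    B = n * x.mean(axis=1).var(ddof=1)      # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled estimate of the posterior variance
    return float(np.sqrt(var_hat / W))

# Simulated draws standing in for three well-mixed chains of 10 000 retained iterations
rng = np.random.default_rng(0)
draws = rng.normal(loc=4.6, scale=0.3, size=(3, 10_000))
print(gelman_rubin(draws))  # close to 1; values >= 1.2 were treated as non-convergence
```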

Example: manatee data

We applied the (standard) binomial mixture model and the beta-binomial mixture model to aerial surveys of manatees in South Florida. These data came from a pilot study conducted by the Florida Fish and Wildlife Conservation Commission. Our purpose was not to provide a detailed analysis of this data set, but to illustrate the application of N-mixture models to real data. We expected manatees to come to the surface in a correlated manner (i.e. in groups), which would result in non-independent detection of individuals.

Study area

The waters of eight southwest Florida counties extending from Pinellas County in Tampa Bay to Monroe County, Everglades, were included in surveys conducted in the winter of 2009. Estuaries, rivers, creeks and coastline (to about 500 m offshore) were included as potential survey sites.

Survey method

Potential manatee habitats within the eight counties were included in the sampling frame. We used 358 sample units (plots) drawn randomly from the sampling frame; survey plots were approximately 1·3 km² in size. Each plot selected for this analysis was surveyed three times: observers flew three consecutive replicate passes over the plot, and the number of individuals was counted and recorded independently for each pass.

Statistical analyses

Because our manatee data set included a large number of sites without observations, we fitted the beta-binomial mixture model to a data set from which we removed all sites with only zero observations. This was reasonable in our case because the probability of a site being occupied by at least one manatee, given that none was detected after three surveys, was small (0·0046; see online Appendix S3 for how this probability was estimated). After truncation, 89 sites with at least one non-zero count remained. For comparison, we also applied the (standard) binomial mixture model to the same data set.

To account for the variation in size of the sampling units, we used the log of the size of the sampling unit as an additive offset. In this case, we assumed that the number of individuals at each sampled site followed a Poisson distribution with mean λAi, where Ai is the size of the sampling unit i:

$$N_i \sim \mathrm{Poisson}(\lambda A_i)$$

(see Appendix S4 for how to implement the offset in WinBUGS).
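The offset formulation can be checked with a small simulation: fixing the coefficient of log A_i at 1 on the log scale gives exactly the mean λA_i. The sketch uses NumPy and a hypothetical function name; the plot areas and the density λ below are illustrative values, not quantities from the manatee data.

```python
import numpy as np

def simulate_abundance_with_offset(area, lam, seed=1):
    """N_i ~ Poisson(lam * A_i). On the log scale, log E[N_i] = log(lam) + log(A_i),
    so log(A_i) enters as an offset whose coefficient is fixed at 1."""
    rng = np.random.default_rng(seed)
    area = np.asarray(area, dtype=float)
    mu = np.exp(np.log(lam) + np.log(area))   # identical to lam * area
    return rng.poisson(mu)

# Illustrative plot sizes (km^2) near the ~1.3 km^2 survey plots; lam is then a density per km^2
areas = np.random.default_rng(2).uniform(1.0, 1.6, size=89)
N = simulate_abundance_with_offset(areas, lam=3.5)
```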

We also fitted binomial and beta-binomial mixture models that explicitly accounted for the truncation in the data by assuming that abundance for each plot follows a zero-truncated Poisson distribution (Appendix S5). For simplicity, however, this latter analysis did not use the log of the size of the sampling unit as an offset.

Goodness-of-fit

To assess the fit of the binomial and beta-binomial mixture models, we used posterior predictive distributions (see Gelman, Meng, & Stern 1996; Gelman et al. 2004; Kéry & Schaub, in press). We conducted posterior predictive checks to evaluate whether the models considered could plausibly have generated data sets similar to our observed data. In this procedure, parameter values estimated from the observed data under the model being evaluated are used to generate simulated data sets. Chi-square or similar fit statistics are computed to quantify the lack of fit for both the observed data and the simulated data sets, and a Bayesian P-value is calculated as the proportion of MCMC iterations in which the fit statistic for the simulated, 'perfect' data set is greater than the fit statistic for the observed data. The simulated data sets can be viewed as 'perfect' because they conform exactly to the assumptions of the model used for inference about the observed data. A Bayesian P-value close to 0·5 indicates a model that appears to fit the data, whereas a P-value close to 0 or 1 suggests a model that does not adequately fit the data (the code for the goodness-of-fit procedure is provided in Appendix S5).
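The steps above can be sketched for the binomial mixture model as follows, assuming that posterior draws of N_i and p are already available as arrays (e.g. exported from WinBUGS). The chi-square discrepancy with a small constant eps added to the denominator is one common choice rather than necessarily the statistic used in Appendix S5, and the function names are hypothetical. For the beta-binomial model, the replicate step would first draw p_it from Beta(α, β) for each site and survey.

```python
import numpy as np

def chi2_discrepancy(y, expected, eps=0.5):
    """Chi-square-type fit statistic; eps avoids division by (near) zero expected counts."""
    return float(np.sum((y - expected) ** 2 / (expected + eps)))

def bayesian_p_value(y, N_draws, p_draws, seed=1):
    """Posterior predictive check for a binomial mixture model.

    y:       observed counts, shape (n_sites, n_surveys)
    N_draws: posterior draws of site abundance, shape (n_draws, n_sites)
    p_draws: posterior draws of detection probability, shape (n_draws,)
    Returns the proportion of draws in which the replicated ('perfect') data
    have a larger discrepancy than the observed data.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    N_draws = np.asarray(N_draws)
    exceed = 0
    for N, p in zip(N_draws, p_draws):
        expected = N[:, None] * p                           # E[y_it | N_i, p]
        y_rep = rng.binomial(N[:, None], p, size=y.shape)   # data that meet the model exactly
        if chi2_discrepancy(y_rep, expected) > chi2_discrepancy(y, expected):
            exceed += 1
    return exceed / len(p_draws)
```

A value near 0 means the observed counts are consistently more discrepant than data the model itself generates, which is the pattern reported below for the binomial mixture model.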

Results

N-Mixture models

As expected from previous work, binomial mixture models performed well when the data followed a binomial distribution (Table 1, scenario [1]; scenarios are indicated by square brackets in Table 1; see also Appendix S6 for a graphical representation of the relative bias). However, the model yielded positively biased estimates when the data followed a beta-binomial distribution (Table 1, scenario [5]).

We found that the beta-binomial mixture model performed well when the data followed a beta-binomial distribution (Table 1, scenario [6]). When the data followed a binomial distribution, the chains for the beta-binomial mixture model did not mix well. This can be explained by the fact that ρ in the beta-binomial mixture can be close to 0 but cannot be exactly 0, whereas the beta-binomial distribution approaches the binomial distribution as ρ approaches 0. Note that although the beta-binomial formulation as a mixture requires ρ > 0, the distribution itself can be extended to include zero and some limited negative values of ρ (Prentice 1986). Based on estimates only for MCMC runs that converged (i.e. where $\hat{R}$ < 1·2), the abundance estimates were only slightly negatively biased (Table 1, scenario [2]). When the data followed a beta-binomial distribution with heterogeneity, estimates from the beta-binomial mixture model were negatively biased (Table 1, scenario [10]). Analysing such data with a binomial mixture model resulted in a much smaller positive bias than analysing data that followed a beta-binomial distribution without heterogeneity (Table 1, scenario [9] vs. scenario [5]). When the data followed a zero-inflated beta-binomial distribution, it was generally not possible to obtain estimates from the beta-binomial mixture model implemented in WinBUGS (the chains did not converge or WinBUGS displayed an error message). For the few simulated data sets (9 out of 50) for which an estimate of abundance could be obtained, there was a large positive bias (Table 1, scenario [14]; see Appendix S6). This may reflect parameter redundancy: zeros arising from zero inflation may not be distinguishable from zeros resulting from high correlation or a low detection probability.
In contrast, the binomial mixture model converged when the data followed a zero-inflated beta-binomial and resulted in biased estimates, but the level of bias was relatively small (Table 1, scenario [13]). In addition, we note that for certain combinations of p and ρ, we were not able to get convergence with WinBUGS.

The bias due to truncation of the data (i.e. removing all sites with only zero counts before applying N-mixture models) was relatively small when λ = 4·6: after truncation, the relative bias was 0·05 (−0·08, 0·22) for scenario [1] and 0·05 (−0·08, 0·27) for scenario [2]. Not surprisingly, the bias of the count indices (FC and MC) was always negative (Table 1).

Analysis of aerial survey data of manatees in Florida

Based on the beta-binomial mixture model (with abundance assumed to follow a Poisson distribution), the posterior mean for λ was 4·53 (95% CI: 3·78, 5·69), p was 0·52 (95% CI: 0·41, 0·61) and ρ was 0·35 (95% CI: 0·25, 0·45). Based on the binomial mixture model (with abundance assumed to follow a Poisson distribution), the posterior mean of λ was 4·56 (95% CI: 3·97, 5·21) and p was 0·55 (95% CI: 0·49, 0·61). When abundance was instead assumed to follow a zero-truncated Poisson distribution, the posterior means under the beta-binomial mixture model were 4·15 for λ (95% CI: 3·52, 5·07), 0·56 for p (95% CI: 0·45, 0·64) and 0·33 for ρ (95% CI: 0·22, 0·45) (see Appendix S7 for the posterior distributions of λ, p and ρ), whereas under the binomial mixture model the posterior mean of λ was 4·6 (95% CI: 4·36, 5·33) and p was 0·54 (95% CI: 0·48, 0·60).

The goodness-of-fit procedure gave a Bayesian P-value of 0·0 when the binomial mixture model was fitted to the manatee data, whereas the P-value was 0·53 for the beta-binomial mixture model. These results suggest that the binomial mixture model did not adequately fit the data, whereas there was no evidence of lack of fit for the beta-binomial mixture model.

Discussion

Our simulation study suggests that standard binomial mixture models (Royle 2004a) may be very sensitive to violation of the assumption of independent detection of individuals, which in practice can be induced by correlated behaviour of organisms. We found that even moderate levels of correlation can lead to substantial underestimation of detection probability and therefore overestimation of abundance. That the binomial mixture model is biased when the data follow a distribution different from the one it assumes is hardly surprising; the more important issue is how large the bias is. In our case, the bias of the binomial mixture model was so large when the data followed a beta-binomial distribution that this model could not be considered particularly useful (see scenario [5] in Table 1: the relative bias was 1·23, which is clearly unacceptable).

This can have important implications, particularly when we are interested in estimating the abundance of rare or endangered species for management and conservation. Not surprisingly, failing to account for heterogeneity in the data, when it was in fact present, resulted in negatively biased estimates when the data were analysed with beta-binomial mixture models. This can be explained by the fact that organisms with higher detection probabilities are detected more often, which positively biases estimates of detection probability and therefore negatively biases estimates of abundance (Williams, Nichols, & Conroy 2002). Interestingly, estimates of abundance under the standard binomial mixture model from data simulated under a beta-binomial distribution with heterogeneity were much less biased than in the absence of heterogeneity. This reduced bias may be explained by two counteracting sources of bias partly cancelling each other: as explained above, analysing beta-binomial data (without heterogeneity) with a standard binomial mixture model yields positively biased estimates (this study), whereas heterogeneity in detection probabilities leads to a negative bias. Similarly, analysing zero-inflated beta-binomial data with a binomial mixture model resulted in a relatively small bias, whereas the same model produced a large positive bias in the absence of zero inflation. As expected, the binomial mixture model produced a negative bias when the data followed a zero-inflated binomial distribution, and this bias was larger in magnitude ($\overline{\mathrm{RB}}$ = −0·27, 95% CI: −0·34, −0·19) than when analysing zero-inflated beta-binomial data (see scenario [13] in Table 1). This again points to the compensation of two different sources of bias.
This does not diminish the importance of considering these sources of bias, but instead emphasizes that biases that partially cancel each other may give the misleading impression that an estimator is unbiased or only slightly biased. This can be a serious problem because one would not be able to assess the reliability of the estimator, which would limit its usefulness. For instance, if one cannot determine the level of heterogeneity in detection within and among plots in a real data set, it would be difficult to determine the potential bias of the estimator. In other words, we would not know whether the bias is small or large; our simulation work showed that if the heterogeneity within plots was large (and the heterogeneity among plots was negligible), the bias could be very large.

To illustrate the relevance of our analyses for real ecological data, we analysed aerial survey data on manatees with binomial mixture and beta-binomial mixture models. Because the probability of a site being occupied by at least one manatee, given that none was detected after three surveys, was close to 0 (0·0046, see Appendix S3), we applied the beta-binomial mixture model to a data set from which we removed all sites with only zero counts. Such truncation can be problematic when λ is small, but becomes less so as λ increases, because the probability of obtaining a zero from the Poisson distribution decreases as λ increases. Based on simulations of truncated data, we found that the bias was relatively small when λ was 4·6. To account for the truncation, we also fitted models that assumed abundance per plot to follow a zero-truncated Poisson distribution. In this case, the estimate of λ was 4·15 for the beta-binomial mixture model and 4·6 for the binomial mixture model. The estimate of the correlation parameter under the beta-binomial mixture model was positive and could be interpreted in two ways: it could indicate (i) some positive correlation, as expected by biologists (H. Edwards, pers. obs., & C. Deutsch, pers. comm.), or (ii) some heterogeneity between groups. However, estimates of abundance under the binomial mixture and beta-binomial mixture models were quite similar (although the standard binomial model led to a greater value of λ than the beta-binomial mixture model when abundance was assumed to follow a zero-truncated Poisson distribution). Our simulation work suggests that if the manatee data strictly followed a beta-binomial distribution with ρ = 0·3 and p = 0·5, the binomial mixture model would lead to estimates with a large positive bias. However, unmodelled heterogeneity in the data would substantially reduce that bias. Thus, unmodelled heterogeneity may explain the observed results.
We would like to emphasize that the purpose of analysing the aerial survey data of manatees was not to provide a rigorous estimate of manatee abundance in a particular region. Instead, our goal was to illustrate the importance of evaluating the potential bias arising from various sources when analysing real data.

We also encourage users of N-mixture models to check the goodness-of-fit of their models. In our case, we found evidence that the standard binomial mixture model was inappropriate for the manatee data, but that there was no evidence of lack of fit for the more-general beta-binomial mixture model. Of course, the goodness-of-fit procedure that we applied does not guarantee that the model actually fits the data, but it is nonetheless a good warning sign when it shows evidence of lack of fit.

Conclusion

We believe that correlated behaviour that can influence detection of organisms may be fairly common in nature. We have mentioned the case of manatees and possibly other marine mammals, which are typically monitored with aerial surveys (Edwards et al. 2007) and may come to the surface in a correlated manner. We also mentioned the case of birds and amphibians, which are often detected by their calls (Dawson & Efford 2009; McClintock et al. 2010), but the assumption of independent detection may of course be violated in other ways. For example, the behaviour of the observer may lead to non-independent detection of individuals if observers focus more attention on areas where they have already seen the organisms of interest. Therefore, we suggest that investigators carefully consider these possibilities when analysing their data. This message should be particularly relevant to researchers who have applied or plan to apply N-mixture models to endangered species. Indeed, presenting positively biased estimates is especially problematic for rare species, because it can lead to inappropriate management decisions (e.g. downlisting).

Our study provides ecologists with an approach to deal with data that may be affected by correlated behaviour of organisms and a way to quantify the extent of the bias when certain assumptions are violated (e.g. independent detection or absence of zero inflation). We also emphasize the value of running simulations that approximate the conditions of the study system before analysing an actual data set. This may give much insight into both the system analysed and the performance of the intended inference framework under the actual conditions (e.g. given the actual number of sites and replicated surveys). As illustrated by our own study, this may be a good way to identify potential problems with the models considered, the data or even the statistical software used.

Acknowledgements

We thank R. Dorazio, C.J. Fonnesbeck, G. McRae and two anonymous reviewers for their valuable comments and insights.
