# Design of occupancy studies with imperfect detection

Article first published online: 9 MAR 2010

DOI: 10.1111/j.2041-210X.2010.00017.x

© 2010 The Authors. Journal compilation © 2010 British Ecological Society


#### How to Cite

Guillera-Arroita, G., Ridout, M. S. and Morgan, B. J. T. (2010), Design of occupancy studies with imperfect detection. Methods in Ecology and Evolution, 1: 131–139. doi: 10.1111/j.2041-210X.2010.00017.x

#### Publication History

- Issue published online: 4 MAY 2010
- Article first published online: 9 MAR 2010
- Received 7 December 2009; accepted 2 February 2010. Handling Editor: Robert P. Freckleton


### Keywords:

- imperfect detection;
- occupancy;
- small sample size;
- study design

### Summary


**1.** Occupancy is an important concept in ecology. To obtain an unbiased estimator of occupancy it is necessary to address the issue of imperfect detection, which requires conducting replicate surveys at the sites being sampled. As the allocation of total effort can be done in different ways, occupancy studies should be designed carefully to ensure an efficient use of available resources.

**2.** In this paper we address the design of single-season single-species occupancy studies with a focus on: (1) issues relating to small sample sizes and (2) the potential relevance of including the precision of the detectability estimator as a criterion for design. We explore analytically the model with constant probabilities and examine how bias and precision are affected by the numbers of sites and replicates used.

**3.** We show how, for small sample sizes, the estimator properties depart from those predicted by large sample approximations, emphasize the need to use simulations when designing for small sample sizes and provide a new software tool that can assist in this process.

**4.** We offer advice on the amount of replication needed when the probability of detection is a quantity of interest and show that, in this case, it is more efficient to reduce the number of sites and increase the amount of replication per site compared with situations where only occupancy is of concern.

**5.** *Synthesis and applications*. It is essential to have clearly stated objectives before starting a study and to design the sampling accordingly. As the allocation of effort into replication and sites can be done in different ways, occupancy studies should be designed carefully to ensure an efficient use of available resources. To avoid waste, it is crucial to anticipate the quality of the estimates that can be expected from a particular study design. The discussion and guidance provided here are of special interest for those designing occupancy studies with small sample sizes, something not uncommon in the context of ecology and conservation.

### Introduction


Occupancy, defined as the proportion of sites occupied by a species, is a state variable commonly used in ecology for the modelling of habitat relationships, metapopulation studies and wildlife monitoring programmes. When species detection is imperfect, occupied sites may be classified as unoccupied based on survey data. If not accounted for, these false absences lead to underestimates of occupancy. The issue of imperfect detection in the context of occupancy studies has received much attention in recent years. MacKenzie *et al.* (2002) presented a modelling approach for addressing the simultaneous estimation of occupancy and detectability which has since been developed in a number of ways including extensions to cover multiple seasons (MacKenzie *et al.* 2003), multiple species (MacKenzie, Bailey, & Nichols 2004) and heterogeneity in detection probability (Royle 2006). To account for imperfect detection when modelling occupancy, replicate surveys have to be carried out at sampled sites. Replication is commonly achieved by conducting repeated surveys at different points in time or by surveying different sectors of each sampled site. Other methods include independent surveys carried out by different observers within a single visit or the simultaneous use of independent detection methods. The need for replication creates a trade-off between the number of sites to survey and the number of replicate surveys to carry out per site.

Several papers have addressed the issue of study design in the context of occupancy modelling. MacKenzie *et al.* (2002), Tyre *et al.* (2003) and Field, Tyre, & Possingham (2005a) provided some guidance on the number of replicate surveys needed based on simulations. MacKenzie & Royle (2005) presented the first detailed investigation on this subject, giving advice on general issues and providing specific recommendations for the most efficient allocation of survey effort under three sampling schemes and different cost function scenarios. They based their guidance on analytic results obtained by considering the large sample properties of the maximum-likelihood estimator for occupancy probability under a model with constant probabilities of occupancy and detectability. Bailey *et al.* (2007) later described a software tool developed for exploring design trade-offs for different occupancy models, either using analytic approximations or simulations. They presented an example and noted that the use of simulations is important when working with small sample sizes.

Small sample sizes are not uncommon in ecological studies. In particular they are frequently encountered in surveys linked to conservation projects, as these often have limited resources and tend to focus on rare species. Pilot studies, by their nature, also tend to deal with relatively small amounts of data. Under these circumstances the large sample approximations may be poor. In our experience, the effects of working with small sample sizes are not always addressed in practice and the use of simulations as a tool for assisting study design appears not to be widespread.

While for many studies the primary object of inference is the probability of occupancy, with the probability of detection being regarded merely as a nuisance parameter, there are circumstances when the latter is a quantity of interest in its own right. For instance, this is the case when the estimates obtained from a (pilot) study are to be used as input for the design of subsequent monitoring protocols (e.g. Field *et al.* 2005b; Pellet 2008) or when there is interest in evaluating the performance of detection methods (e.g. Mortelliti & Boitani 2008). Detectability may also be of interest when it reflects some important characteristic of the ecological system. For example, it could be associated with reproduction (Best & Petersen 1982). Detectability estimates provide information on the number of times that a site needs to be visited before stating with a given degree of certainty whether the species of interest is present or absent at that particular location. This information can be especially relevant in the context of environmental impact assessments. Under these scenarios there is a benefit in obtaining a precise estimate of detection probability.

In this paper we address the design of single-season single-species occupancy studies with a focus on: (1) issues relating to small sample sizes and (2) the potential relevance of including the precision of the detectability estimator as a criterion for design. We investigate analytically the quality of the maximum-likelihood estimators for the occupancy model with constant probabilities of occupancy and detection. We also show how bias and precision are affected by the number of sites and replicates employed and illustrate how the predictions made by large sample theory diverge from the actual distribution of the estimator when sample sizes are small. We discuss how studies are designed using recommendations based on asymptotic approximations and provide guidance to assist survey design when detection probability is a parameter of interest. Finally, we describe the design procedure with an emphasis on the need to use simulations as a tool for sampling design when the sample size is small and provide a numerical example to illustrate the steps. In this context we present a new software application (Single-season Occupancy study Design Assistant, soda) that can assist in the process by automating the search for a suitable design.

### Modelling occupancy under imperfect detection: estimator properties


The detailed formulation of occupancy models with imperfect detection is well covered in the literature (e.g. MacKenzie *et al.* 2006); so, here we limit the description to key aspects relevant to our analysis. Let *ψ* be the probability of occupancy, *p* the probability of detection, *S* the number of sites to be surveyed and *K* the number of replicate surveys per sampling site. We assume that both occupancy and detection probabilities are constant in time and space. Although in practice this simplification may not always be reasonable, it is necessary in order to provide general study design guidelines. We use the maximum-likelihood approach for model fitting as proposed by MacKenzie *et al.* (2002) and assume a standard survey design with *K* surveys carried out in all *S* sampling sites.

The likelihood function corresponding to the constant probability occupancy model for a standard design can be written in a compact form as follows:

- *L*(*ψ*, *p*) = *ψ*^{*S*_{D}} *p*^{*d*} (1 − *p*)^{*KS*_{D} − *d*} [1 − *ψp**]^{*S* − *S*_{D}}  (eqn 1)

where *S*_{D} is the number of sites where the species was detected at least once, *d* is the total number of detections in the detection history and *p** = 1 − (1 − *p*)^{K} is the probability of detecting the species in at least one of the *K* surveys carried out at an occupied site. Note that (*S*_{D}, *d*) is a sufficient statistic as it summarizes the detection history with no loss of information. MacKenzie *et al.* (2006, p. 95) point out that the analytical solution for the maximum-likelihood parameter estimates (MLEs) satisfies the equations:

- ψ̂ = *S*_{D}/(*S* × *p̂**)  and  *d*/(*KS*_{D}) = *p̂*/*p̂**,  where *p̂** = 1 − (1 − *p̂*)^{K}  (eqn 2)

That is, as *p̂** gets smaller, the estimate of occupancy (ψ̂ = *S*_{D}/(*S* × *p̂**)) increases compared with the naïve estimate obtained assuming that the species was not missed at any of the occupied sites (*S*_{D}/*S*). When evaluating the performance of this model via simulations, MacKenzie *et al.* (2002) noted that, when working with small probabilities of detection, they sometimes obtained estimates of occupancy that tended to 1. By studying the model analytically we identified the detection histories that result in boundary estimates (ψ̂ = 1). It can be shown that the MLE expressions given by eqn 2 are only valid as long as the observed detection history fulfils the following condition:

- (*S* − *S*_{D})/*S* > [(*SK* − *d*)/(*SK*)]^{K}  (eqn 3)

and that, otherwise, the MLEs are:

- ψ̂ = 1,  *p̂* = *d*/(*SK*)  (eqn 4)

Eqn 3 indicates that the occupancy estimate hits the boundary when the proportion of sites where the species was not detected (left term) is smaller than the proportion of zeros in the history raised to the power of *K* (right term). This suggests that boundary estimates may be an issue when working with small sample sizes and low probabilities, especially when the amount of replication is small.
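
The closed-form solution and its boundary condition are simple to implement. The following Python sketch (the function name `occupancy_mle` is ours, not from the paper's software) returns the MLEs from the sufficient statistic (*S*_{D}, *d*): it solves eqn 2 by bisection, which is valid because the ratio *p*/*p** is strictly increasing in *p*, and falls back to the boundary estimates of eqn 4 when the condition of eqn 3 fails.

```python
def occupancy_mle(S, K, S_D, d):
    """MLEs of (psi, p) from the sufficient statistic (S_D, d), eqns 2-4."""
    if S_D == 0:
        return 0.0, float("nan")      # no detections at all: p is inestimable
    if d == K * S_D:
        return S_D / S, 1.0           # detected on every survey of every detected site
    if (S - S_D) / S > (1 - d / (S * K)) ** K:   # eqn 3: interior solution exists
        # eqn 2: p_hat solves d/(K*S_D) = p/p*; p/p* is increasing in p,
        # so a simple bisection suffices
        t, lo, hi = d / (K * S_D), 1e-9, 1 - 1e-9
        for _ in range(60):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if mid / (1 - (1 - mid) ** K) < t else (lo, mid)
        p_hat = (lo + hi) / 2
        p_star = 1 - (1 - p_hat) ** K
        return S_D / (S * p_star), p_hat
    return 1.0, d / (S * K)           # eqn 4: boundary estimates
```

For example, with *S* = 4, *K* = 3, *S*_{D} = 3 and *d* = 4 the condition of eqn 3 fails, and the boundary estimates ψ̂ = 1 and *p̂* = 4/12 ≈ 0·33 are returned.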

A graphical representation of all the MLEs obtainable for a given design illustrates the issues resulting from small sample sizes and the effect that increasing the number of sites or replicates has on the quality of the estimates (Fig. 1). Given a finite number of sites (*S*) and replicates (*K*) there is a finite number of histories that can be theoretically observed (i.e. 2^{SK} possible combinations of zeros and ones). Under the model with constant probabilities of occupancy and detectability all those histories that share the same *S*_{D} and *d* produce the same estimates of occupancy and detection (eqn 2). This results in (*S* + 1)[1 + *S*(*K* − 1)/2] possible estimate points in the parameter space (dots in the figure). When sample sizes are very small, there are only a few distinct detection histories that can be observed and, correspondingly, few possible parameter estimate values (Fig. 1a). The parameter space is sparsely covered by the MLEs, which means that the estimator is not precise, an effect more pronounced as probabilities of occupancy and detection get smaller. In fact there are no solutions covering the area corresponding to the lowest probabilities, which causes the estimator to be substantially biased in this region. As more samples are added to the study, the MLE solutions cover more of the probability space. Additional replication results in a better coverage of the area corresponding to low probabilities of detection (Fig. 1b), while an increase in the number of sampling sites achieves a more even coverage in the area corresponding to high probabilities of detection (Fig. 1c). When the amount of replication is large, the MLEs coincide with the naïve estimates in most cases, as *p** is close to unity except for very low values of *p*.
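
The count of possible estimate points can be verified by enumerating the sufficient statistics directly; a small Python sketch (our naming):

```python
def estimate_points(S, K):
    """Enumerate the distinct sufficient statistics (S_D, d): each pair maps
    to one MLE point, so this counts the possible estimates for a design."""
    pairs = {(0, 0)}                        # species never detected
    for s_d in range(1, S + 1):
        # with s_d sites having detections, d ranges from s_d to K*s_d
        pairs.update((s_d, d) for d in range(s_d, K * s_d + 1))
    return len(pairs)

# matches the closed form quoted in the text: (S + 1)[1 + S(K - 1)/2]
assert estimate_points(3, 2) == (3 + 1) * (1 + 3 * (2 - 1) / 2)  # 10 points
```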

Likelihood theory provides tools for approximating the properties of the MLEs when the sample size is large. The theory indicates that the estimators are asymptotically unbiased and thus mean square error (MSE) and variance are the same. The asymptotic variance–covariance matrix can be derived by inverting the information matrix (i.e. the expectation of the second derivative of the negative log-likelihood with respect to the parameters, Severini 2000, p. 91). MacKenzie & Royle (2005) presented the formula for the asymptotic variance of the occupancy estimator:

- var(ψ̂) = (*ψ*/*S*) {(1 − *ψ*) + (1 − *p**)/[*p** − *Kp*(1 − *p*)^{K−1}]}  (eqn 5)

where *TS* = *SK* is the total effort assigned to the survey. They noted that as *p** approaches unity, the variance of ψ̂ reduces to the variance of a binomial proportion [i.e. *ψ*(1 − *ψ*)/*S*]. It can be shown that the remaining elements of the variance–covariance matrix are:

- var(*p̂*) = *p*(1 − *p*)*p**/{*TS* × *ψ* [*p** − *Kp*(1 − *p*)^{K−1}]}  (eqn 6)

- cov(ψ̂, *p̂*) = −*p*(1 − *p*)^{K}/{*S* [*p** − *Kp*(1 − *p*)^{K−1}]}  (eqn 7)

As *p** approaches unity, the covariance tends to zero and the variance of *p̂* approaches *p*(1 − *p*)/(*TS* × *ψ*). For a fixed total effort, as replication increases the variance of ψ̂ first decreases, as replication allows false absences to be distinguished from true absences, and then starts increasing as dictated by the binomial proportion, owing to the reduction in the number of sampling sites. The variance of *p̂* also starts by decreasing as more replication is added to the design but then remains at about a constant level, as this variance is dictated by the total amount of effort, no matter whether it is spent on additional sites or replicates.
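
These asymptotic expressions are straightforward to evaluate numerically. The following Python sketch (the function name `asymptotic_varcov` is ours) computes the three variance–covariance terms for the constant-probability model and can be used to check the limiting behaviour described above:

```python
def asymptotic_varcov(psi, p, S, K):
    """Asymptotic variances and covariance of (psi_hat, p_hat), eqns 5-7."""
    q = 1.0 - p
    p_star = 1.0 - q ** K                  # p*: prob. of >=1 detection at an occupied site
    D = p_star - K * p * q ** (K - 1)      # denominator term shared by eqns 5-7
    var_psi = (psi / S) * ((1 - psi) + (1 - p_star) / D)   # eqn 5
    var_p = p * q * p_star / (S * K * psi * D)             # eqn 6, with TS = S*K
    cov = -p * q ** K / (S * D)                            # eqn 7
    return var_psi, var_p, cov
```

As a check, for large *K* (so *p** ≈ 1) the function returns var(ψ̂) ≈ *ψ*(1 − *ψ*)/*S*, var(*p̂*) ≈ *p*(1 − *p*)/(*TS* × *ψ*) and a covariance near zero, as stated in the text.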

### Design of occupancy studies


Large sample approximations and simulations are tools that can assist in the design of occupancy studies. Here, we comment on these two approaches and provide an overall picture of the design process with an emphasis on small sample sizes. Note that, to design a study, we need to assume values for the parameters to be estimated.

#### Optimal design based on asymptotic approximations

The asymptotic variance approximations can be of use when designing occupancy studies as they allow us to explore analytically how estimator precision changes for different design parameters. MacKenzie & Royle (2005) derive study design recommendations based on the asymptotic approximation of the variance of the occupancy estimator (Table 1a). Recommendations can also be produced incorporating the variance of *p̂* as part of the design criterion, which is useful when detectability is itself a parameter of interest. There are different criteria that can be used for optimal design; for a discussion of their merits, see Atkinson & Donev (1992, p. 106). One common approach is to minimize the trace of the variance–covariance matrix, that is, the sum of the variances of the parameters (here the variances of ψ̂ and *p̂*). This is called A-optimality, and it gives equal weight to the two variances rather than minimizing the variance of each parameter separately. Alternatively, D-optimality minimizes the determinant of the variance–covariance matrix. For large samples, the maximum-likelihood estimators ψ̂ and *p̂* are approximately normally distributed, and the D-optimal design minimizes the area of the elliptical confidence region based on this distribution. Here, we derive the optimal number of replicate surveys to be carried out at each sampling site using the A-optimality (Table 1b) and D-optimality (Table 1c) criteria. Including the variance of *p̂* increases the optimal number of replicates, with the larger changes observed for low probabilities of occupancy and low probabilities of detection respectively. As happens when considering the variance of the occupancy estimator only, the optimal number of replicate surveys in these two cases is determined by the parameter values (*ψ* and *p*) irrespective of the total effort assigned to the survey (*TS*).
Note that the optimal number of replicates is the same regardless of whether the study is designed to minimize survey effort or estimator variance (measured through any of the three criteria above).
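
The optimal *K* for each criterion can be recomputed by minimizing the corresponding function of the asymptotic variance–covariance terms over *K* for a fixed total effort. This Python sketch uses our own naming (`optimal_replicates`); the value of *TS* inside it is arbitrary because, as noted above, the optimum does not depend on it:

```python
def optimal_replicates(psi, p, criterion="psi", K_max=60):
    """Number of replicates K minimizing the chosen design criterion for
    fixed total effort TS = S*K (all three criteria are invariant to TS)."""
    TS = 1000.0
    def score(K):
        S = TS / K
        q = 1 - p
        p_star = 1 - q ** K
        D = p_star - K * p * q ** (K - 1)
        v_psi = (psi / S) * ((1 - psi) + (1 - p_star) / D)   # eqn 5
        v_p = p * q * p_star / (TS * psi * D)                # eqn 6
        cov = -p * q ** K / (S * D)                          # eqn 7
        if criterion == "psi":                 # Table 1a: var of occupancy only
            return v_psi
        if criterion == "A":                   # Table 1b: trace (A-optimality)
            return v_psi + v_p
        return v_psi * v_p - cov * cov         # Table 1c: determinant (D-optimality)
    return min(range(2, K_max + 1), key=score)
```

For instance, `optimal_replicates(0.5, 0.5)` returns 3, in agreement with Table 1a for *ψ* = 0·5 and *p* = 0·5.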

**Table 1.** Optimal number of replicate surveys *K* as a function of *ψ* (columns) and *p* (rows), when the design criterion is (a) the asymptotic variance of ψ̂ (as in MacKenzie & Royle 2005), (b) A-optimality and (c) D-optimality

(a)

| *p* \ *ψ* | 0·1 | 0·2 | 0·3 | 0·4 | 0·5 | 0·6 | 0·7 | 0·8 | 0·9 |
|---|---|---|---|---|---|---|---|---|---|
| 0·1 | 14 | 15 | 16 | 17 | 18 | 20 | 23 | 26 | 34 |
| 0·2 | 7 | 7 | 8 | 8 | 9 | 10 | 11 | 13 | 16 |
| 0·3 | 5 | 5 | 5 | 5 | 6 | 6 | 7 | 8 | 10 |
| 0·4 | 3 | 4 | 4 | 4 | 4 | 5 | 5 | 6 | 7 |
| 0·5 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 5 |
| 0·6 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 4 |
| 0·7 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 |
| 0·8 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 0·9 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |

(b)

| *p* \ *ψ* | 0·1 | 0·2 | 0·3 | 0·4 | 0·5 | 0·6 | 0·7 | 0·8 | 0·9 |
|---|---|---|---|---|---|---|---|---|---|
| 0·1 | 19 | 16 | 17 | 17 | 19 | 20 | 23 | 27 | 34 |
| 0·2 | 13 | 10 | 9 | 9 | 9 | 10 | 11 | 13 | 16 |
| 0·3 | 10 | 7 | 7 | 6 | 6 | 7 | 7 | 8 | 10 |
| 0·4 | 8 | 6 | 5 | 5 | 5 | 5 | 5 | 6 | 7 |
| 0·5 | 7 | 5 | 4 | 4 | 4 | 4 | 4 | 5 | 6 |
| 0·6 | 6 | 4 | 4 | 3 | 3 | 3 | 3 | 4 | 4 |
| 0·7 | 5 | 4 | 3 | 3 | 3 | 3 | 3 | 3 | 4 |
| 0·8 | 4 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 3 |
| 0·9 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |

(c)

| *p* \ *ψ* | 0·1 | 0·2 | 0·3 | 0·4 | 0·5 | 0·6 | 0·7 | 0·8 | 0·9 |
|---|---|---|---|---|---|---|---|---|---|
| 0·1 | 19 | 19 | 20 | 21 | 23 | 24 | 27 | 30 | 36 |
| 0·2 | 9 | 10 | 10 | 11 | 11 | 12 | 13 | 14 | 17 |
| 0·3 | 6 | 6 | 7 | 7 | 7 | 8 | 8 | 9 | 11 |
| 0·4 | 5 | 5 | 5 | 5 | 5 | 6 | 6 | 7 | 8 |
| 0·5 | 4 | 4 | 4 | 4 | 4 | 4 | 5 | 5 | 6 |
| 0·6 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 5 |
| 0·7 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 4 |
| 0·8 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 |
| 0·9 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |

#### Design based on a simulation study

Likelihood theory tells us that asymptotic approximations are good when the sample size is large enough; however, it does not tell us how large it needs to be. In Fig. 2 we illustrate how the properties of the MLEs under the constant occupancy model depart from the asymptotic approximation for a combination of design parameter values that is realistic within the context of ecological studies (168 units of total effort). The difference between the approximated and actual estimator distributions is larger for low probabilities of occupancy and detection. Designing an occupancy study based on asymptotic properties of the estimators is therefore not appropriate if the intended sample size is small, especially when dealing with rare and elusive species. Under these circumstances, the actual quality of the estimators may be very different from that predicted by the asymptotic variance expressions and the design identified as optimal using large sample approximations may not be the best available, as illustrated in the example section. In these cases the most appropriate method for designing a study relies on the use of simulations.
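
As an illustration of what such a simulation involves, the sketch below (pure Python, with our own naming; not the genpres or soda implementations) draws detection histories under the constant-probability model, applies the closed-form MLEs of eqns 2–4 and reports the empirical RMSE of the occupancy estimator:

```python
import random

def occupancy_mle(S, K, S_D, d):
    # closed-form MLEs (eqns 2-4); bisection solves d/(K*S_D) = p/p*
    if S_D == 0:
        return 0.0, float("nan")
    if d == K * S_D:
        return S_D / S, 1.0
    if (S - S_D) / S > (1 - d / (S * K)) ** K:       # eqn 3: interior solution
        t, lo, hi = d / (K * S_D), 1e-9, 1 - 1e-9
        for _ in range(60):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if mid / (1 - (1 - mid) ** K) < t else (lo, mid)
        p_hat = (lo + hi) / 2
        return S_D / (S * (1 - (1 - p_hat) ** K)), p_hat
    return 1.0, d / (S * K)                           # eqn 4: boundary estimates

def rmse_psi(psi, p, S, K, reps=2000, seed=1):
    """Empirical RMSE of the occupancy estimator for a candidate design."""
    random.seed(seed)
    sq_err = 0.0
    for _ in range(reps):
        # detections per site: Binomial(K, p) if occupied, 0 otherwise
        dets = [sum(random.random() < p for _ in range(K))
                if random.random() < psi else 0 for _ in range(S)]
        psi_hat, _ = occupancy_mle(S, K, sum(x > 0 for x in dets), sum(dets))
        sq_err += (psi_hat - psi) ** 2
    return (sq_err / reps) ** 0.5
```

For *K* = 5, *S* = 70, *ψ* = 0·2 and *p* = 0·3 this returns an RMSE of around 0·07, noticeably above the asymptotic value of 0·057, in line with the discrepancy reported in the example below.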

#### Sampling design procedure for occupancy surveys: the big picture

The design of an occupancy survey (Fig. 3) should start with a clear statement of the project requirements in terms of the quality of the estimators (e.g. maximum allowed variance) and the total survey effort available. With this in mind, the design can be made to either (A) maximize the quality of the estimators or (B) minimize the effort employed. We also need to assume initial values for the parameters to be estimated. These can be based on the results of a pilot study, on studies carried out for the same or similar species in comparable circumstances, or on expert opinion. The first issue to address is whether the sample size can be considered large enough to base the choice of design parameters on asymptotic approximations. If the total effort available is large and the probabilities of occupancy and detectability are expected to be relatively high, the design can safely be based on these approximations. Nevertheless, we recommend verifying that the approximations are valid before proceeding to collect data. This involves running a simulation with the chosen design parameters (*K* and *S*) and the assumed parameter values (*ψ* and *p*). If the sample size is not large enough for the asymptotic approximation to be good, the design needs to be based on a simulation study, in which the quality of the estimators is evaluated for different combinations of design parameters. There is software that allows the model to be simulated for a given set of *K*, *S*, *ψ* and *p* to evaluate estimator bias and variance (genpres; Bailey *et al.* 2007). Program soda offers the possibility of running an automated search for a suitable design, exploring different combinations of *K* and *S* given the assumptions and requirements specified by the user. The tool allows the user to select whether priority is given to maximizing estimator quality or minimizing total effort, and allows detectability to be incorporated as part of the design criterion. Program soda can be freely downloaded at http://www.kent.ac.uk/ims/personal/msr/soda.html; an R function for evaluating the performance of a given design is available at the same site.

Once a candidate design is identified, either through asymptotic approximations or simulations, we need to verify whether it fulfils the requirements of the project. If it does, the study can proceed to data collection. Otherwise, if no suitable design was found, the objectives and constraints of the project need to be reconsidered: can more resources be allocated to this study? Could less precise estimates still be informative for the purpose of the study? If the answer to these questions is negative the study should not continue as it would be a waste of resources that could be used elsewhere (Legg & Nagy 2006). If the project objectives or constraints are redefined, a new design should be sought given the new requirements.

#### Example: designing an occupancy study when sample size is small

As an illustration of the design process let us assume that (1) our target is for the occupancy estimator to be approximately unbiased with a maximum SE of 0·075 (i.e. maximum RMSE 0·075), (2) the maximum effort that can be employed in the study is *TS*_{max} = 350 and (3) the probabilities of occupancy and detectability are thought to be *ψ*_{i} ≈ 0·2 and *p*_{i} ≈ 0·3. If we decide to start our study design from the recommendations derived from the asymptotic properties of the estimators, the first thing to do is to find the optimal number of replicates to be used, in this case *K* = 5 (Table 1a). Let us first assume that our priority is to minimize the variance (option A in Fig. 3). In this case we will make use of the total available effort and the number of sites to be surveyed (*S*) will be derived as *S* = *TS*/*K* = 350/5 = 70. We should now evaluate the variance of the occupancy estimator under this design (*K* = 5 and *S* = 70) to verify whether it is within our target. From the expression of the asymptotic variance of the occupancy estimator (eqn 5) we get:

- var(ψ̂) ≈ (0·2/70) × [(1 − 0·2) + (1 − 0·832)/(0·832 − 5 × 0·3 × 0·7^{4})] ≈ 0·0033,

which gives an SE of 0·057. According to the asymptotic approximation the estimator is unbiased; so, the RMSE is also 0·057. This RMSE is within the target that our project set (0·057 < 0·075); so, the design seems good. In order to verify that the approximations made for the design were appropriate we would now run a simulation for the chosen design parameters (*K* = 5 and *S* = 70) and assumptions (*ψ*_{i} = 0·2 and *p*_{i} = 0·3). A simulation with 50 000 iterations reveals that the actual RMSE of the occupancy estimator (0·070) is higher than predicted by the approximation (0·057), although still within the project target, so the design could be kept. However, given that the approximation was not very accurate it may be worth exploring other combinations of parameters as there is no guarantee of the optimality of the chosen design. For instance, a design with *K* = 6 and *S* = 58 would be a better choice (Table 2). Note also that increasing the replication (*K* = 7) would provide a more suitable design if detection probability was to be considered as part of the design criterion instead of occupancy only.
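
The same SE can be obtained with a few lines of code (a sketch of the eqn 5 calculation for the candidate design):

```python
# Asymptotic SE of the occupancy estimator (eqn 5) for the candidate
# design K = 5, S = 70, with assumed psi = 0.2 and p = 0.3.
psi, p, K, S = 0.2, 0.3, 5, 70
q = 1 - p
p_star = 1 - q ** K                    # prob. of >=1 detection at an occupied site
D = p_star - K * p * q ** (K - 1)
var_psi = (psi / S) * ((1 - psi) + (1 - p_star) / D)
se = var_psi ** 0.5
print(round(se, 3))                    # prints 0.057, within the target of 0.075
```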

**Table 2.** Performance of the estimators of occupancy (ψ̂) and detectability (*p̂*) for different designs, assuming *ψ* = 0·2 and *p* = 0·3 (paired cell entries given as ψ̂/*p̂*). Three levels of total effort (*TS* ≈ 250, 300 and 350) and six levels of replication (*K* = 4–9) are considered. aRMSE is the RMSE predicted by the asymptotic approximation, RMSE is obtained by simulation and RMSE* is the simulated RMSE excluding boundary estimates. For *TS* ≈ 350, the sum of the mean-squared errors (A-optimality criterion) and the determinant of the MSE matrix (D-optimality criterion) are also shown

| | *K* = 4 | *K* = 5 | *K* = 6 | *K* = 7 | *K* = 8 | *K* = 9 |
|---|---|---|---|---|---|---|
| ***TS* ≈ 250** | | | | | | |
| *S* | 62 | 50 | 42 | 36 | 31 | 28 |
| aRMSE (×10²) | 6·9/9·6 | 6·8/8·6 | 6·9/7·9 | 7·1/7·5 | 7·5/7·3 | 7·8/7·1 |
| RMSE (×10²) | 12·6/10·1 | 10·6/9·3 | 9·6/8·7 | 9·3/8·4 | 9·6/8·2 | 9·6/8·0 |
| RMSE* (×10²) | 9·3/9·7 | 8·2/9·0 | 7·7/8·4 | 7·5/8·1 | 7·7/8·0 | 7·9/7·7 |
| Boundary estimates | 1·1% | 0·7% | 0·5% | 0·5% | 0·5% | 0·5% |
| ***TS* ≈ 300** | | | | | | |
| *S* | 75 | 60 | 50 | 43 | 37 | 33 |
| aRMSE (×10²) | 6·3/8·7 | 6·2/7·9 | 6·3/7·3 | 6·6/6·9 | 6·9/6·7 | 7·2/6·5 |
| RMSE (×10²) | 10·1/9·2 | 8·4/8·4 | 7·8/7·8 | 7·5/7·5 | 7·9/7·4 | 8·1/7·2 |
| RMSE* (×10²) | 8·2/8·9 | 7·2/8·2 | 6·9/7·7 | 6·7/7·4 | 7·0/7·3 | 7·2/7·1 |
| Boundary estimates | 0·5% | 0·3% | 0·2% | 0·2% | 0·2% | 0·3% |
| ***TS* ≈ 350** | | | | | | |
| *S* | 87 | 70 | 58 | 50 | 43 | 39 |
| aRMSE (×10²) | 5·8/8·1 | 5·7/7·3 | 5·9/6·8 | 6·1/6·4 | 6·4/6·2 | 6·6/6·0 |
| RMSE (×10²) | 8·3/8·5 | 7·0/7·6 | 6·6/7·2 | 6·7/6·9 | 6·9/6·7 | 7·1/6·6 |
| RMSE* (×10²) | 7·4/8·4 | 6·5/7·6 | 6·3/7·2 | 6·3/6·9 | 6·5/6·6 | 6·6/6·5 |
| Boundary estimates | 0·2% | 0·1% | 0·1% | 0·1% | 0·1% | 0·1% |
| A-optimality criterion (×10³) | 14·1 | 10·7 | 9·5 | 9·2 | 9·3 | 9·3 |
| D-optimality criterion (×10^{−5}) | 3·28 | 2·21 | 1·95 | 1·96 | 2·04 | 2·05 |

We now repeat the process assuming that the priority is to minimize the total effort *TS* (option B in Fig. 3). In this case the number of sites to be surveyed (*S*) is derived from the expression of the asymptotic variance of the occupancy estimator (eqn 5):

- *S* = *ψ* {(1 − *ψ*) + (1 − *p**)/[*p** − *Kp*(1 − *p*)^{K−1}]}/(0·075)² ≈ 0·231/0·0056 ≈ 41,

so *TS* = *SK* = 41 × 5 = 205. The total effort required for this design (205) is within the target that our project set (350); so, the design seems good. However, simulations show that the occupancy estimator has some bias and large variance; its RMSE (0·1391) is almost twice the maximum RMSE allowed by the project (0·075), which renders this design unsuitable. The asymptotic approximation is poor for the sample size in this study; so, it is best to choose the design via simulations. By exploring different combinations of *K* and *S* we can identify the design that fulfils the variance target with the minimum effort. In this case, *K* = 7 and *S* = 43 would be a good choice. Note that the number of replicates (7) differs from the optimal *K* suggested by the asymptotic approximations (5) and the total effort required is substantially larger (301 vs. 205).
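
This closed-form step can also be scripted (a sketch; we round *S* to the nearest integer, matching the 205 units of effort quoted above, although a conservative design would round up):

```python
# Option B: number of sites S meeting the SE target via eqn 5,
# for K = 5, psi = 0.2, p = 0.3 and target SE = 0.075.
psi, p, K, target_se = 0.2, 0.3, 5, 0.075
q = 1 - p
p_star = 1 - q ** K
D = p_star - K * p * q ** (K - 1)
bracket = (1 - psi) + (1 - p_star) / D
S = round(psi * bracket / target_se ** 2)   # eqn 5 solved for S
print(S, S * K)                             # prints 41 205
```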

### Discussion


When faced with the task of planning a study it is essential to address explicitly three basic questions: (1) why is the study needed, (2) what is a suitable state variable and (3) how to do the sampling? (Yoccoz, Nichols, & Boulinier 2001). Here, we have concentrated on aspects related to the ‘how’ question in the context of occupancy studies, in particular on issues derived from the trade-off resulting from the allocation of survey effort between number of sites and number of replicates. However, we emphasize the need to first deal properly with the ‘why’ and ‘what’ questions, as well as to consider other elements related to the ‘how’ such as the selection of sites, the timing of surveys (MacKenzie & Royle 2005) or decisions on the type of replication to be used.

Addressing the ‘why’ question requires a clear statement of the objectives of the study from which design requirements can be derived, including the maximum survey effort available and the level of precision needed for results to be meaningful (Field *et al.* 2007). Defining this is not just a statistical decision and should incorporate considerations of the species biology and the system in general. For instance, management decisions should explicitly evaluate the costs associated with false positives and false negatives when detecting trends, costs that are not necessarily equal (Field *et al.* 2005a). Although studies often focus on the estimate of occupancy, here we argue that there are situations when the probability of detection is also of interest. In these cases it is natural for the precision of *p* to be included as part of the design criterion. We show that, under these scenarios, the best design will tend to require more replication than in cases where only the precision of the occupancy estimator is considered, especially when working with rare species.

Ecological studies often involve small sample sizes, particularly studies related to conservation. Here we show that the asymptotic approximations to the distributions of the maximum-likelihood estimators are unreliable for sample sizes that, although small, are realistic in ecology: the estimators are biased and less precise than these large-sample approximations indicate. This is especially relevant when working with rare and elusive species, for which the probabilities of occupancy and detection are low. We highlight the importance of taking these issues into account when designing occupancy studies and argue that simulations should be used in the design process. It is essential to determine the actual properties of the estimators under a chosen design, to make sure that they fulfil the design targets before spending, and perhaps wasting, time and effort in the field. With a clear description of the overall design procedure, supported by a numerical example and a new software application, we aim to promote the good practice of addressing small-sample considerations when designing occupancy studies.

This guidance does not, however, replace a careful evaluation of each project's characteristics. Beyond the requirements addressed here, other issues may need to be incorporated into the design process, such as the minimum number of sites the programme aims to survey, the cost of each survey or other logistical considerations. The large-sample recommendations discussed are based on the model with constant probabilities. We do not give specific recommendations for studies involving covariates (e.g. occupancy in two habitats), but the same general approach is applicable and simulation remains the best tool to guide study design. We have concentrated on maximum-likelihood inference; an alternative Bayesian approach avoids asymptotic assumptions, but an optimal design must still be selected and sensitivity to the prior considered.
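The simulation-based check advocated above can be sketched in a few lines. The following Python sketch is illustrative only (the function names, sample sizes and parameter values are our own, not taken from the paper): it simulates detection histories under the single-season model with constant occupancy ψ and detection probability p, refits the model by maximum likelihood, and reports Monte Carlo estimates of the bias and spread of the estimators for a candidate design with S sites and K replicate surveys.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def neg_log_lik(theta, detections, K):
    """Negative log-likelihood of the constant-probability occupancy model.
    detections[i] = number of the K surveys with a detection at site i."""
    psi, p = theta
    detected = detections > 0
    # Sites with at least one detection are certainly occupied.
    ll = np.sum(np.log(psi) + detections[detected] * np.log(p)
                + (K - detections[detected]) * np.log(1 - p))
    # Sites with no detections: occupied but missed, or genuinely unoccupied.
    n_empty = np.sum(~detected)
    ll += n_empty * np.log(psi * (1 - p) ** K + (1 - psi))
    return -ll

def simulate_design(S, K, psi, p, n_sims=1000):
    """Monte Carlo bias and SD of the MLEs of (psi, p) for a given design."""
    estimates = []
    for _ in range(n_sims):
        z = rng.random(S) < psi                # latent occupancy states
        d = rng.binomial(K, p, size=S) * z     # detections at occupied sites
        if d.sum() == 0:
            continue                           # no detections: not estimable
        res = minimize(neg_log_lik, x0=[0.5, 0.5], args=(d, K),
                       bounds=[(1e-6, 1 - 1e-6)] * 2, method="L-BFGS-B")
        estimates.append(res.x)
    est = np.array(estimates)
    return est.mean(axis=0) - [psi, p], est.std(axis=0)

bias, sd = simulate_design(S=50, K=4, psi=0.3, p=0.4)
```

Comparing the Monte Carlo standard deviations with the asymptotic standard errors for the same design reveals how far the large-sample approximations are from the truth at small sample sizes.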

Designing a study requires initial values of the parameters to be estimated, and the actual performance of the chosen design depends on how accurate these initial values are. Given that these parameters are the very object of the study, there may be considerable uncertainty about their true values. Before settling on a final design, we recommend exploring its sensitivity to changes in these initial values. Bayesian experimental design (Chaloner & Verdinelli 1995) provides a systematic framework for incorporating prior knowledge about the parameters into the design process. Sequential methods divide studies into stages, with later stages designed using the results of earlier ones to update the initial estimates (Abdelbasit & Plackett 1983). The potential of these techniques in the context of occupancy study design is the subject of future work.
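A simple first step in such a sensitivity analysis is to evaluate the design criterion over a grid of plausible initial values. The sketch below (our own illustration, not code from the paper) does this using the large-sample variance approximation for the occupancy estimator under the constant-probability model given by MacKenzie & Royle (2005), var(ψ̂) = (ψ/S)[(1 − ψ) + (1 − p*)/(p* − Kp(1 − p)^{K−1})], where p* = 1 − (1 − p)^K is the probability of at least one detection at an occupied site:

```python
import numpy as np

def asymptotic_se_psi(S, K, psi, p):
    """Large-sample SE of the occupancy MLE under the constant model
    (variance approximation of MacKenzie & Royle 2005)."""
    p_star = 1 - (1 - p) ** K    # prob. of >=1 detection at an occupied site
    var = (psi / S) * ((1 - psi)
                       + (1 - p_star) / (p_star - K * p * (1 - p) ** (K - 1)))
    return np.sqrt(var)

# Sensitivity of a fixed design (S = 50 sites, K = 4 replicates)
# to the assumed initial values of psi and p:
for psi0 in (0.2, 0.3, 0.4):
    for p0 in (0.3, 0.4, 0.5):
        print(psi0, p0, round(asymptotic_se_psi(50, 4, psi0, p0), 3))
```

If the resulting standard errors vary strongly across the grid, the design is fragile to mis-specified initial values; recall, however, that at small sample sizes these asymptotic values understate the true variability, so the final check should still be done by simulation.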

### Acknowledgements


This research has been supported by an EPSRC/NCSE grant. The authors thank Darryl MacKenzie and one anonymous reviewer for valuable comments that improved the quality of this manuscript.

### References


- Abdelbasit, K.M. & Plackett, R.L. (1983) Experimental design for binary data. Journal of the American Statistical Association, 78, 90–98.
- Atkinson, A.C. & Donev, A.N. (1992) Optimum Experimental Designs. Clarendon Press, Oxford.
- Bailey, L.L., Hines, J.E., Nichols, J.D. & MacKenzie, D.I. (2007) Sampling design trade-offs in occupancy studies with imperfect detection: examples and software. Ecological Applications, 17, 281–290.
- Best, L.B. & Petersen, K.L. (1982) Effects of stage of the breeding cycle on sage sparrow detectability. The Auk, 99, 788.
- Chaloner, K. & Verdinelli, I. (1995) Bayesian experimental design: a review. Statistical Science, 10, 273–304.
- Field, S.A., Tyre, A.J. & Possingham, H.P. (2005a) Optimizing allocation of monitoring effort under economic and observational constraints. Journal of Wildlife Management, 69, 473–482.
- Field, S.A., Tyre, A.J., Thorn, M., O'Connor, P.J. & Possingham, H.P. (2005b) Improving the efficiency of wildlife monitoring by estimating detectability: a case study of foxes (*Vulpes vulpes*) on the Eyre Peninsula, South Australia. Wildlife Research, 32, 253–258.
- Field, S.A., O'Connor, P.J., Tyre, A.J. & Possingham, H.P. (2007) Making monitoring meaningful. Austral Ecology, 32, 485–491.
- Legg, C.J. & Nagy, L. (2006) Why most conservation monitoring is, but need not be, a waste of time. Journal of Environmental Management, 78, 194–199.
- MacKenzie, D.I. & Royle, J.A. (2005) Designing occupancy studies: general advice and allocating survey effort. Journal of Applied Ecology, 42, 1105–1114.
- MacKenzie, D.I., Nichols, J.D., Lachman, G.B., Droege, S., Royle, J.A. & Langtimm, C.A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology, 83, 2248–2255.
- MacKenzie, D.I., Nichols, J.D., Hines, J.E., Knutson, M.G. & Franklin, A.B. (2003) Estimating site occupancy, colonization, and local extinction when a species is detected imperfectly. Ecology, 84, 2200–2207.
- MacKenzie, D.I., Bailey, L.L. & Nichols, J.D. (2004) Investigating species co-occurrence patterns when species are detected imperfectly. Journal of Animal Ecology, 73, 546–555.
- MacKenzie, D.I., Nichols, J.D., Royle, J.A., Pollock, K.H., Bailey, L.L. & Hines, J.E. (2006) Occupancy Estimation and Modeling: Inferring Patterns and Dynamics of Species Occurrence. Academic Press, New York, USA.
- Mortelliti, A. & Boitani, L. (2008) Inferring red squirrel (*Sciurus vulgaris*) absence with hair tubes surveys: a sampling protocol. European Journal of Wildlife Research, 54, 353–356.
- Pellet, J. (2008) Seasonal variation in detectability of butterflies surveyed with Pollard walks. Journal of Insect Conservation, 12, 155–162.
- Royle, J.A. (2006) Site occupancy models with heterogeneous detection probabilities. Biometrics, 62, 97–102.
- Severini, T.A. (2000) Likelihood Methods in Statistics. Oxford University Press, Oxford.
- Tyre, A.J., Tenhumberg, B., Field, S.A., Niejalke, D., Parris, K. & Possingham, H.P. (2003) Improving precision and reducing bias in biological surveys: estimating false-negative error rates. Ecological Applications, 13, 1790–1801.
- Yoccoz, N.G., Nichols, J.D. & Boulinier, T. (2001) Monitoring of biological diversity in space and time. Trends in Ecology & Evolution, 16, 446–453.