## Introduction

Occupancy, defined as the proportion of sites occupied by a species, is a state variable of interest in various areas of ecology (MacKenzie *et al.* 2006). In most cases, species detection is imperfect, which can lead to the incorrect classification of occupied sites as empty. If imperfect detection is not accounted for, bias is induced in the occupancy estimator. To tackle this problem, MacKenzie *et al.* (2002) and Tyre *et al.* (2003) proposed the joint modelling of occupancy and detection probabilities based on data resulting from a sampling protocol in which discrete replicate surveys are carried out at each sampling site, a modelling framework that has become widely used by ecologists.
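The detection process underlying this sampling protocol can be made concrete with a short simulation: each site is occupied with probability ψ, and each of K replicate surveys at an occupied site yields a detection independently with probability p, so a site with no detections may be genuinely empty or simply missed. A minimal sketch in Python (function name and parameter values are illustrative; the paper's own code, in Appendix S3, is in R):

```python
import random

def simulate_detection_histories(n_sites, n_surveys, psi, p, seed=1):
    """Simulate single-season occupancy data: each site is occupied with
    probability psi; each replicate survey of an occupied site yields a
    detection independently with probability p (illustrative sketch)."""
    rng = random.Random(seed)
    histories = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        surveys = [1 if (occupied and rng.random() < p) else 0
                   for _ in range(n_surveys)]
        histories.append(surveys)
    return histories

histories = simulate_detection_histories(n_sites=100, n_surveys=4, psi=0.6, p=0.4)
# Naively treating no-detection sites as empty underestimates occupancy,
# since some occupied sites go undetected on all surveys:
naive_estimate = sum(any(h) for h in histories) / len(histories)
```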

To ensure that occupancy studies provide meaningful results and that, therefore, valuable monitoring resources are not wasted, it is critical to pay attention to survey design (Yoccoz, Nichols & Boulinier 2001; Legg & Nagy 2006). Unfortunately, sufficient care is not always devoted to this important stage, and providing simple tools that facilitate the process can help promote the attention it deserves. For instance, in the context of single-season occupancy studies, tables have been developed to assist in allocating survey effort between the number of sites and the number of replicate visits (MacKenzie & Royle 2005; Guillera-Arroita, Ridout & Morgan 2010). Sample size can then be determined by setting a target variance for the occupancy estimator, either choosing the minimum number of sites that achieves this target or as many as the available effort allows (and checking whether the variance target is met).
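The target-variance calculation just described can be sketched in a few lines using the asymptotic variance of the occupancy estimator under the constant-detectability single-season model, Var(ψ̂) = (ψ/S)[(1 − ψ) + (1 − p*)/(p* − Kp(1 − p)^(K−1))] with p* = 1 − (1 − p)^K (MacKenzie & Royle 2005). The function name and parameter values below are illustrative assumptions:

```python
import math

def sites_for_target_se(psi, p, K, target_se):
    """Number of sites S needed so that the asymptotic SE of the occupancy
    estimator meets target_se, given K replicate surveys per site with
    per-survey detection probability p (constant-p single-season model)."""
    p_star = 1 - (1 - p) ** K                    # prob. of >=1 detection at an occupied site
    denom = p_star - K * p * (1 - p) ** (K - 1)  # detectability correction term (needs K >= 2)
    var_factor = psi * ((1 - psi) + (1 - p_star) / denom)  # equals S * Var(psi_hat)
    return math.ceil(var_factor / target_se ** 2)

S = sites_for_target_se(psi=0.4, p=0.5, K=3, target_se=0.05)
```

When detection is perfect (p = 1) the correction term vanishes and the calculation reduces to the familiar binomial variance ψ(1 − ψ)/S.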

Rather than making inference about species occupancy at a given point in time, area or habitat type, there is often interest in assessing whether there are differences in occupancy between two samples. Occupancy has been proposed as a useful state variable for various large-scale monitoring programmes (MacKenzie *et al.* 2006, pp 41–44), with occupancy declines being of particular interest when dealing with species of conservation concern and increases when tracking the spread of invasive species. In IUCN Red List assessments, estimated occupancy declines are used as part of criteria A and B, to reflect declines of population size and geographical range (IUCN 2001). In active adaptive management (McCarthy & Possingham 2007), the difference in species occupancy before and after the experimental intervention can be a state variable of interest. The focus might also be on assessing occupancy differences between two geographical areas or habitat types. In studies aimed at detecting occupancy differences, the criterion for sample size selection can be expressed in terms of the desired power, that is, the probability that the study will detect a significant difference, given that the true difference is of a given size (Cohen 1988).

The concept of power analysis is closely related to that of null hypothesis significance testing (NHST). As an inferential procedure, NHST has long received criticism (for a comprehensive review see Nickerson 2000), among other reasons because it lends itself to confusion regarding statistical vs. scientific significance. We do not expand on this issue here as there is a wealth of papers addressing it, including many in the context of ecological studies (e.g. Yoccoz 1991; Cherry 1998; Johnson 1999; Di Stefano 2004). A common understanding is that it is better to focus on providing an estimate of the magnitude of the effect in the form of confidence intervals, as this conveys more information than the outcome of a significance test. However, despite the NHST controversy, prospective power analysis is widely recognized as a useful tool for study design (Cohen 1990; Thomas & Juanes 1996; Steidl, Hayes & Schauber 1997; Johnson 1999; Di Stefano 2001; Legg & Nagy 2006). In fact, a particularly beneficial aspect of power analysis is that it requires an explicit consideration of what constitutes a biologically significant result, allowing us to determine whether a given design gives our study a good chance of producing statistically significant results when the actual effect size (occupancy difference in our case) is biologically significant.

While simulations provide a tool for power analysis, they can be time-consuming. Closed formulae can sometimes be derived to determine more easily the sample size required to achieve a given power. The development and performance evaluation of such formulae for a test comparing two independent binomial proportions have received a lot of attention in the literature (e.g. Cochran & Cox 1957, p. 27; Fleiss 1973, p. 30; Casagrande, Pike & Smith 1978; Walters 1979; Fleiss, Tytun & Ury 1980; Ury & Fleiss 1980; Dobson & Gebski 1986; Gordon & Watson 1996; Vorburger & Munoz 2006). These formulae are routinely used in different areas, such as the design of clinical trials (Donner 1984). However, as they assume that the outcome of the experiment, whether success or failure, is always observed without error, they are not applicable to occupancy studies, except for the unusual case in which species detection is perfect or enough replicate surveys are carried out to ensure that detection is practically certain.
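For reference, the classical calculation these papers study takes only a few lines. The sketch below implements the standard normal-approximation sample-size formula for comparing two independent binomial proportions (in the style of Fleiss 1973, without continuity correction) and, as noted, presumes each outcome is observed without error; function name and default values are illustrative:

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Classical per-group sample size for a two-sided test comparing two
    independent binomial proportions (normal approximation, no continuity
    correction); assumes each outcome is observed without error."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2                      # pooled proportion under H0
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

n = n_per_group(0.3, 0.5)  # sites per sample if detection were perfect
```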

To our knowledge, sample size formulae for models that account for imperfect detection when comparing binomial proportions have not been proposed or evaluated to date. In this study, we address this problem. We discuss how to design studies to detect a difference in occupancy between two samples with a given power when species detection is imperfect, and we present tools to accomplish this. We provide an approximate expression to calculate power and derive a closed formula that conveniently allows the number of sites that need to be sampled to be determined with just a few simple calculations, while accounting for species detectability. Using this expression, we examine how the power of a study changes depending on the allocation of survey effort between number of sites and number of replicate visits and thus revisit the issue of optimal replication, which had previously been addressed from the point of view of minimizing the variance of the occupancy estimator in single-season studies (MacKenzie & Royle 2005; Bailey *et al.* 2007; Guillera-Arroita, Ridout & Morgan 2010). Because the derived sample size formula involves asymptotic (i.e. large-sample) approximations, its performance needs to be assessed to understand its applicability. To this end, we run Monte Carlo simulations and check how the resulting sample sizes compare with those indicated by the formula, while also evaluating the performance of various significance tests. In the context of studies that assess occupancy changes in time, we demonstrate that the results and discussion in the paper are applicable regardless of whether independence or Markovian dependence is assumed in the occupancy status of sites between seasons (MacKenzie *et al.* 2006, pp. 186–212), and illustrate their utility when designing studies to detect a trend in multiple-season studies. Finally, we provide R code for conducting power analysis, both based on the formula and via simulations (Appendix S3).
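To illustrate the normal-approximation route (a generic Wald-type calculation, not necessarily the paper's exact expression), the power of a two-sided test of equal occupancy in two independently surveyed samples can be approximated by combining the asymptotic variances of the two occupancy estimators (here taken from the constant-detectability single-season model of MacKenzie & Royle 2005). Function names and parameter values are illustrative assumptions:

```python
import math
from statistics import NormalDist

def asymptotic_var(psi, p, K, S):
    """Asymptotic Var(psi_hat) for the constant-p single-season occupancy
    model with S sites and K replicate surveys (MacKenzie & Royle 2005)."""
    p_star = 1 - (1 - p) ** K
    denom = p_star - K * p * (1 - p) ** (K - 1)
    return (psi / S) * ((1 - psi) + (1 - p_star) / denom)

def approx_power(psi1, psi2, p, K, S, alpha=0.05):
    """Normal-approximation power of a two-sided Wald-type test of
    psi1 == psi2 when S sites are each surveyed K times in both samples
    (illustrative sketch; not the paper's exact expression)."""
    nd = NormalDist()
    se = math.sqrt(asymptotic_var(psi1, p, K, S) + asymptotic_var(psi2, p, K, S))
    z_a = nd.inv_cdf(1 - alpha / 2)
    effect = abs(psi1 - psi2) / se
    # Probability the test statistic falls in either rejection region:
    return nd.cdf(effect - z_a) + nd.cdf(-effect - z_a)

power = approx_power(psi1=0.3, psi2=0.5, p=0.5, K=3, S=150)
```

As a sanity check, when the two occupancy probabilities are equal the expression reduces to the significance level α, and power increases with the number of sites surveyed.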