Throughout the following section we make the simplifying assumption that occupancy (ψ) and detection probabilities (*p*) are constant across both space and time. While these assumptions may not always be reasonable in practice, it is usually necessary to make some simplifications of reality when designing a study. It should also be noted that, when designing studies, initial values need to be assumed for the population parameters of interest (here ψ and *p*).

We consider three general sampling schemes that have been used or proposed in the literature: (i) a standard design where *s* sites are each surveyed *K* times; (ii) a double sampling design where *s*_{K} sites are surveyed *K* times and *s*_{1} sites surveyed once (note it is ‘double’ sampling in the sense that a second round of sampling is performed to select the sites at which the repeated surveys will be conducted); and (iii) a removal design where *s* sites are surveyed up to a maximum of *K* times, but surveying halts at a site once the species is detected. For each design we also consider the impact of a cost function, where the cost of conducting subsequent surveys may be different from that of an initial survey (which may arise, for example, if multiple surveys are conducted during the same visit or if there is a set-up cost for establishing a new site).

We assume a general situation where the study is to be designed with an objective based on the variance of the occupancy estimator (var($\hat{\psi}$)). Specifically, the study is to be designed either to (i) achieve a desired level of precision for minimal total survey effort; or (ii) minimize the variance for a given total number of surveys. The intent is therefore to determine what values of *s* and *K* will most efficiently achieve the study's objective, given the assumed values of ψ and *p*.

#### standard design

For a standard design (where all sites are surveyed *K* times), using the MacKenzie *et al*. (2002) maximum likelihood approach to estimating ψ, it can be shown that the asymptotic variance of $\hat{\psi}$ (derived from the Fisher information matrix; Williams, Nichols & Conroy 2002) is:

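$$\operatorname{var}(\hat{\psi}) = \frac{\psi}{s}\left[(1-\psi) + \frac{1-p^{*}}{p^{*} - Kp(1-p)^{K-1}}\right] \qquad \text{(eqn 1)}$$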

where *p** = 1 − (1 − *p*)^{K} is the probability of detecting the species at least once during *K* surveys of an occupied site. Note that as *p** approaches 1·0, var($\hat{\psi}$) → ψ(1 − ψ)/*s*, the variance for a simple binomial proportion. Further, the total number of surveys (*TS*) in a standard design will be:

$$TS = s \times K \qquad \text{(eqn 2)}$$

If the study is to be designed such that $\hat{\psi}$ achieves a desired level of precision, then to find the optimum combination of *s* and *K* the basic procedure is to rearrange equation 1 to make *s* the subject and substitute into equation 2, to give:

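$$TS = \frac{K\psi}{\operatorname{var}(\hat{\psi})}\left[(1-\psi) + \frac{1-p^{*}}{p^{*} - Kp(1-p)^{K-1}}\right] \qquad \text{(eqn 3)}$$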

As values for ψ and *p* will be assumed, *K* is the only unknown in equation 3. Therefore the minimum number of surveys required to obtain a given level of precision can be found by differentiating equation 3 with respect to *K*, setting to zero and solving for *K*. This may be done analytically or numerically. Once an optimum value for *K* has been found, this can be substituted into the rearranged equation 1 to give the optimum number of sites to survey.

Alternatively, if the study is to be designed in terms of minimizing the variance for a fixed total number of surveys, then equation 2 should be rearranged to make *s* the subject, and substituted in equation 1, giving:

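$$\operatorname{var}(\hat{\psi}) = \frac{K\psi}{TS}\left[(1-\psi) + \frac{1-p^{*}}{p^{*} - Kp(1-p)^{K-1}}\right] \qquad \text{(eqn 4)}$$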

Equation 4 would then be differentiated with respect to *K*, set to zero and solved for *K* to find the minimum. The optimum value of *K* could then be substituted into the rearranged equation 2 to give the optimum number of sites to survey.

However, consider the similar forms of equations 3 and 4. In both cases they could be expressed as:

$$f(K) = C\,K\left[(1-\psi) + \frac{1-p^{*}}{p^{*} - Kp(1-p)^{K-1}}\right] \qquad \text{(eqn 5)}$$

where *C* is a constant with respect to *K*. Therefore, the value of *K* that minimizes *f*(*K*) will not depend upon *C*. This means that, regardless of whether the study is designed to minimize total survey effort to achieve a specified value of var($\hat{\psi}$) or to minimize var($\hat{\psi}$) for a fixed level of total survey effort, the optimum value of *K* will be the same. How the study is designed only determines the optimum number of sites to survey. This result is useful as it means standard tables can be used to give the optimum number of surveys that should be conducted at each site for given values of ψ and *p*.
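The numerical route to the optimum can be illustrated with a minimal sketch. It assumes that the bracketed term of equation 1 is (1 − ψ) + (1 − *p**)/(*p** − *Kp*(1 − *p*)^{K−1}), and it searches integer values of *K* directly rather than differentiating. The function names and the search limit `k_max` are illustrative choices, not part of the original analysis.

```python
def bracket_standard(psi: float, p: float, k: int) -> float:
    """Bracketed term of equation 1, i.e. s * var(psi_hat) / psi (assumed form)."""
    p_star = 1.0 - (1.0 - p) ** k  # probability of at least one detection in k surveys
    return (1.0 - psi) + (1.0 - p_star) / (p_star - k * p * (1.0 - p) ** (k - 1))


def optimal_k_standard(psi: float, p: float, c: float = 1.0, k_max: int = 200) -> int:
    """Integer K minimising f(K) = C * K * bracket (equation 5).

    The search starts at K = 2 because the denominator of the bracket is
    zero when K = 1 (psi and p are not separately identifiable).
    """
    return min(range(2, k_max + 1), key=lambda k: c * k * bracket_standard(psi, p, k))


# The constant C rescales f(K) but does not move its minimum.
k_precision_target = optimal_k_standard(0.7, 0.4, c=0.7 / 0.04 ** 2)  # C = psi / var
k_fixed_effort = optimal_k_standard(0.7, 0.4, c=0.7 / 1000.0)         # C = psi / TS
```

Under these assumptions both calls return the same *K*, which is the point of the argument above: the design objective changes only the number of sites, not the number of surveys per site.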

The above results assume that the cost of conducting surveys is immaterial, or that the cost is equal across all sites and all surveys. However, in reality, costs will often be one of the major limiting factors when designing a study. When the cost associated with conducting a survey may vary either between sites or between survey occasions, then the above approach can be generalized such that an optimal design can be found given a specific cost function. Here we only consider cost functions of the form:

$$\text{Cost} = c_{0} + s\,[c_{1} + c_{2}(K-1)]$$

where *c*_{0} is a fixed overhead cost, *c*_{1} is the cost of conducting the first survey of a site, and *c*_{2} is the cost of conducting subsequent surveys, although other cost functions could be considered (Field, Tyre & Possingham 2005; MacKenzie *et al*. 2005).

Once the cost function has been defined, then it is possible to design a study either in terms of (i) minimizing cost while obtaining a desired level of precision, or (ii) minimizing the variance given a fixed total budget. For situations where the cost of conducting surveys does not vary among sites (as is the case here), then a similar result to above holds where the optimal number of surveys to conduct per site is independent of whether a study is designed in terms of minimizing total cost or minimizing the variance of the occupancy estimate. This means tables can again be constructed of the optimal number of surveys to conduct at each site for given relative costs of an initial to a subsequent survey.

In Table 1 we present the optimal number of surveys per site (*K*) where the cost of an initial survey is equal to, five times greater and 10 times greater than the cost of a subsequent survey for selected values of ψ and *p*. The first thing to note with Table 1 is that the optimal value for *K* is never 1. That is, whenever the probability of detecting a species is < 1, the most efficient use of resources is never to survey all sites only once (in fact, occupancy and detection probabilities are not identifiable without auxiliary information if such a design was used). Further, Table 1 suggests that the optimal number of surveys required for each site decreases as detection probability increases. However, an interesting aspect of Table 1 is that the optimal value for *K* increases as the probability of occupancy also increases. This implies that an optimal strategy for rare species is to conduct fewer surveys at more sites, while for a common species the optimal strategy is to conduct more surveys at fewer sites. A further interesting point related to Table 1 is that, given the optimal values for *K* it is possible to calculate the optimal probability of detecting the species at least once at an occupied site (*p**; i.e. the probability of confirming the target species is present at a site). While not presented here, generally the optimal surveying strategy requires a reasonable degree of confirmation that the target species occupies a site (0·85 < *p** < 0·95). The optimal value for *K* generally changes little for the type of cost function considered here, although note that when subsequent surveys can be conducted relatively cheaply, an optimal strategy is to increase the number of repeat surveys.

Table 1. Optimum number of surveys to conduct at each site for a standard design where all sites are surveyed an equal number of times, and the cost of conducting the first survey of a site is *x* times greater than the cost of a subsequent survey

| *p* | *x* | ψ = 0·1 | 0·2 | 0·3 | 0·4 | 0·5 | 0·6 | 0·7 | 0·8 | 0·9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0·1 | 1 | 14 | 15 | 16 | 17 | 18 | 20 | 23 | 26 | 34 |
| | 5 | 16 | 17 | 18 | 19 | 20 | 22 | 24 | 28 | 35 |
| | 10 | 18 | 19 | 20 | 21 | 22 | 24 | 26 | 30 | 37 |
| 0·2 | 1 | 7 | 7 | 8 | 8 | 9 | 10 | 11 | 13 | 16 |
| | 5 | 9 | 9 | 9 | 10 | 11 | 11 | 12 | 14 | 17 |
| | 10 | 10 | 10 | 11 | 11 | 12 | 13 | 14 | 15 | 18 |
| 0·3 | 1 | 5 | 5 | 5 | 5 | 6 | 6 | 7 | 8 | 10 |
| | 5 | 6 | 6 | 6 | 7 | 7 | 8 | 8 | 9 | 11 |
| | 10 | 7 | 7 | 7 | 8 | 8 | 9 | 9 | 10 | 12 |
| 0·4 | 1 | 3 | 4 | 4 | 4 | 4 | 5 | 5 | 6 | 7 |
| | 5 | 4 | 5 | 5 | 5 | 5 | 6 | 6 | 7 | 8 |
| | 10 | 5 | 5 | 6 | 6 | 6 | 6 | 7 | 8 | 9 |
| 0·5 | 1 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 5 |
| | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 5 | 5 | 6 |
| | 10 | 4 | 4 | 4 | 5 | 5 | 5 | 5 | 6 | 7 |
| 0·6 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 4 |
| | 5 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 5 |
| | 10 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 5 | 5 |
| 0·7 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 |
| | 5 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 4 |
| | 10 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 4 |
| 0·8 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| | 5 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 |
| | 10 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 |
| 0·9 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| | 5 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| | 10 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
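Individual entries of Table 1 can be checked with a short sketch along the following lines. It assumes the same bracketed variance term as equation 1 and weights it by the cost per site, *c*_{1} + *c*_{2}(*K* − 1), since for a fixed budget the number of affordable sites is inversely proportional to that cost. The helper names (`optimal_k_with_cost`, `sites_for_budget`) and the budget figure in the usage lines are hypothetical.

```python
def bracket_standard(psi: float, p: float, k: int) -> float:
    """Bracketed term of equation 1 (assumed form): s * var(psi_hat) / psi."""
    p_star = 1.0 - (1.0 - p) ** k
    return (1.0 - psi) + (1.0 - p_star) / (p_star - k * p * (1.0 - p) ** (k - 1))


def optimal_k_with_cost(psi: float, p: float, c1: float, c2: float, k_max: int = 200) -> int:
    """Integer K >= 2 minimising (cost per site) * (variance bracket)."""
    def cost_weighted(k: int) -> float:
        return (c1 + c2 * (k - 1)) * bracket_standard(psi, p, k)
    return min(range(2, k_max + 1), key=cost_weighted)


def sites_for_budget(budget: float, c0: float, c1: float, c2: float, k: int) -> int:
    """Whole number of sites affordable once the fixed overhead c0 is paid."""
    return int((budget - c0) // (c1 + c2 * (k - 1)))


# Illustrative use: psi = 0.7, p = 0.4, first survey five times the cost of a repeat.
k_opt = optimal_k_with_cost(0.7, 0.4, c1=5.0, c2=1.0)
n_sites = sites_for_budget(budget=2000.0, c0=200.0, c1=5.0, c2=1.0, k=k_opt)
```

The fixed overhead *c*_{0} does not affect the optimal *K* in this sketch; it only reduces the budget available for sites.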

#### double sampling

A double sampling design (where repeat surveys are conducted at a subset of sites only) is completely compatible with the modelling approach of MacKenzie *et al*. (2002). Initially, double sampling appears attractive as it seems reasonable that at some point the collection of additional information about detectability (by repeated surveys) may be inefficient, and there is greater benefit (in terms of precision) in increasing the total number of sites that are surveyed. Such a design has been proposed by MacKenzie *et al*. (2002, 2003), MacKenzie, Bailey & Nichols (2004) and MacKenzie (2005b). The above approaches can be generalized to determine the optimal allocation of sampling effort between sites and repeated surveys, when a double sampling scheme is being used.

When a double sampling scheme is used with *s*_{K} sites surveyed *K* times and *s*_{1} sites surveyed once, assuming detection probability is constant, it can be shown that the asymptotic variance of $\hat{\psi}$ is:

- ( eqn 6 )

While appearing unwieldy, note that the general form of equation 6 is similar to equation 1: a simple binomial proportion variance with additional penalty terms as a result of the imperfect detection of the species. As for var($\hat{\psi}$) from a standard design, note that var($\hat{\psi}$) → ψ(1 − ψ)/(*s*_{K} + *s*_{1}) as *p** → 1 and furthermore that, if *s*_{1} = 0, then equation 6 reduces to equation 1 (with *s*_{K} = *s*), as would be expected.

When the costs of initial and subsequent surveys are equal (or not an issue), it was found that generally there is little advantage in using an optimal double sampling design compared with an optimal standard design with the same number of total surveys. Table 2 presents the optimal fraction of total survey effort that should be used to survey *s*_{1} sites only once. However, even when it is suggested that a reasonable fraction of the total survey effort should be used in this way, the percentage improvement in the standard error compared with an optimal standard survey is small unless ψ is small and *p* is large. Hence, speculation by MacKenzie *et al*. (2002, 2003), MacKenzie, Bailey & Nichols (2004) and MacKenzie (2005b) that a double sampling scheme may generally be more efficient is unsubstantiated.

Table 2. Optimal fraction of total survey effort (expressed as a percentage) that should be used to survey *s*_{1} sites only once using a double sampling design, where cost of the first and subsequent surveys are equal

| *p* | ψ = 0·1 | 0·2 | 0·3 | 0·4 | 0·5 | 0·6 | 0·7 | 0·8 | 0·9 |
|---|---|---|---|---|---|---|---|---|---|
| 0·1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0·2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0·3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0·4 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0·5 | 6 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0·6 | 0 | 0 | 0 | 12 | 4 | 0 | 0 | 0 | 0 |
| 0·7 | 9 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0·8 | 33 | 30 | 26 | 21 | 14 | 5 | 0 | 0 | 0 |
| 0·9 | 56 | 54 | 51 | 48 | 44 | 39 | 31 | 17 | 0 |

Given these results, it would be expected that a double sampling design would only become more efficient than a standard design in situations where the cost of surveying a new site for the first time is lower than resurveying a site that had been surveyed previously (i.e. where *c*_{1} < *c*_{2}), which was confirmed using numerical approaches. As it is difficult to imagine a situation where this may occur in practice, a double sampling scheme may not be a good design in most circumstances.

#### removal sampling

The logic behind a removal sampling scheme (where surveying of a site halts once the species is detected or *K* surveys have been conducted) is that the main piece of information with respect to occupancy has been collected once the species has been confirmed at a site. We refer to this type of design as a removal sampling scheme as sites are removed from the pool of sites being surveyed once the species has been detected, and also because of the analogy with removal studies conducted on animal populations (where individual animals are physically removed from the population upon first capture; Otis *et al*. 1978; Williams, Nichols & Conroy 2002).

Using a removal sampling design, it can be shown that the asymptotic variance of $\hat{\psi}$ is:

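$$\operatorname{var}(\hat{\psi}) = \frac{\psi}{s}\left[(1-\psi) + \frac{p^{*}(1-p^{*})}{(p^{*})^{2} - K^{2}p^{2}(1-p)^{K-1}}\right] \qquad \text{(eqn 7)}$$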

Again, note the general form of the equation and the fact that var($\hat{\psi}$) → ψ(1 − ψ)/*s* as *p** → 1.

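As with a standard design, regardless of whether the study is to be designed in terms of achieving a specified level of precision for minimal effort or to minimize the variance for a fixed level of effort, equation 7 can be rearranged into a form similar to equation 5, with the number of surveys per site replaced by its expected value under a removal design, i.e.:

$$f(K) = C\left[(1-\psi)K + \frac{\psi p^{*}}{p}\right]\left[(1-\psi) + \frac{p^{*}(1-p^{*})}{(p^{*})^{2} - K^{2}p^{2}(1-p)^{K-1}}\right]$$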

Again, this implies that there is an optimal value for *K* (now the maximum number of repeat surveys) that is consistent for either design approach. Note that these optimal values (Table 3) are generally larger than the values for the standard design (Table 1). To compare the relative efficiency of an optimal removal design with the optimal standard design, Table 4 presents the ratio of the expected standard errors for $\hat{\psi}$ for these two designs with the same (expected) total number of surveys. Values < 1 indicate situations where the optimal standard design is more efficient in terms of obtaining a smaller standard error, which in Table 4 occurs only when the level of occupancy is ≤ 0·3. This suggests that, generally, an optimal removal design is more efficient than an optimal standard design; however, the implication is that one must be prepared to conduct a greater maximum number of surveys in order to realize fully the gain in efficiency. For example, if ψ = 0·8 and *p* = 0·3 the standard error of an optimal standard design with eight repeat surveys per site will be 42% greater than that of an optimal removal design, but sites would have to be surveyed up to a maximum of 12 times.

Table 3. Optimal maximum number of surveys to conduct at each site for a removal design where all sites are surveyed until the species is first detected, where cost of the first and subsequent surveys are equal

| *p* | ψ = 0·1 | 0·2 | 0·3 | 0·4 | 0·5 | 0·6 | 0·7 | 0·8 | 0·9 |
|---|---|---|---|---|---|---|---|---|---|
| 0·1 | 23 | 24 | 25 | 26 | 28 | 31 | 34 | 39 | 49 |
| 0·2 | 11 | 11 | 12 | 13 | 13 | 15 | 16 | 19 | 23 |
| 0·3 | 7 | 7 | 7 | 8 | 8 | 9 | 10 | 12 | 14 |
| 0·4 | 5 | 5 | 5 | 6 | 6 | 6 | 7 | 8 | 10 |
| 0·5 | 4 | 4 | 4 | 4 | 4 | 5 | 5 | 6 | 8 |
| 0·6 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 5 | 6 |
| 0·7 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 4 | 5 |
| 0·8 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 4 |
| 0·9 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 |

Table 4. Ratio of standard errors for optimal standard and removal designs, where cost of the first and subsequent surveys are equal. Values < 1 indicate situations where an optimal standard design has a smaller standard error

| *p* | ψ = 0·1 | 0·2 | 0·3 | 0·4 | 0·5 | 0·6 | 0·7 | 0·8 | 0·9 |
|---|---|---|---|---|---|---|---|---|---|
| 0·1 | 0·90 | 0·94 | 0·98 | 1·04 | 1·10 | 1·18 | 1·30 | 1·46 | 1·74 |
| 0·2 | 0·91 | 0·94 | 0·99 | 1·04 | 1·10 | 1·18 | 1·28 | 1·44 | 1·71 |
| 0·3 | 0·92 | 0·95 | 0·99 | 1·04 | 1·10 | 1·17 | 1·27 | 1·42 | 1·68 |
| 0·4 | 0·93 | 0·96 | 0·99 | 1·03 | 1·09 | 1·17 | 1·26 | 1·40 | 1·64 |
| 0·5 | 0·93 | 0·96 | 1·00 | 1·04 | 1·08 | 1·16 | 1·24 | 1·37 | 1·60 |
| 0·6 | 0·94 | 0·97 | 1·01 | 1·06 | 1·09 | 1·15 | 1·22 | 1·35 | 1·55 |
| 0·7 | 0·95 | 0·96 | 0·97 | 1·01 | 1·07 | 1·13 | 1·22 | 1·31 | 1·48 |
| 0·8 | 1·00 | 1·02 | 1·04 | 1·07 | 1·09 | 1·11 | 1·15 | 1·25 | 1·45 |
| 0·9 | 1·02 | 1·05 | 1·07 | 1·10 | 1·13 | 1·17 | 1·20 | 1·24 | 1·31 |
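Entries of Tables 3 and 4 can be examined with a sketch such as the one below. It rests on two assumptions that should be checked against the original equations: that the bracketed term of the removal-design variance is (1 − ψ) + *p**(1 − *p**)/[(*p**)² − *K*²*p*²(1 − *p*)^{K−1}], and that the expected number of surveys per site under a removal design is (1 − ψ)*K* + ψ*p**/*p* (unoccupied sites receive all *K* surveys; occupied sites stop at first detection). All function names are illustrative.

```python
import math


def bracket_standard(psi: float, p: float, k: int) -> float:
    """Variance bracket for a standard design (assumed form of equation 1)."""
    p_star = 1.0 - (1.0 - p) ** k
    return (1.0 - psi) + (1.0 - p_star) / (p_star - k * p * (1.0 - p) ** (k - 1))


def bracket_removal(psi: float, p: float, k: int) -> float:
    """Variance bracket for a removal design (assumed form of equation 7)."""
    p_star = 1.0 - (1.0 - p) ** k
    return (1.0 - psi) + p_star * (1.0 - p_star) / (
        p_star ** 2 - k ** 2 * p ** 2 * (1.0 - p) ** (k - 1)
    )


def effort_removal(psi: float, p: float, k: int) -> float:
    """Expected surveys per site when surveying stops at first detection."""
    p_star = 1.0 - (1.0 - p) ** k
    return (1.0 - psi) * k + psi * p_star / p


def f_standard(psi: float, p: float, k: int) -> float:
    return k * bracket_standard(psi, p, k)


def f_removal(psi: float, p: float, k: int) -> float:
    return effort_removal(psi, p, k) * bracket_removal(psi, p, k)


def optimal_k(f, psi: float, p: float, k_max: int = 200) -> int:
    """Integer K >= 2 minimising f; both brackets are undefined at K = 1."""
    return min(range(2, k_max + 1), key=lambda k: f(psi, p, k))


def se_ratio(psi: float, p: float) -> float:
    """SE(optimal standard) / SE(optimal removal) for equal expected total effort."""
    k_std = optimal_k(f_standard, psi, p)
    k_rem = optimal_k(f_removal, psi, p)
    return math.sqrt(f_standard(psi, p, k_std) / f_removal(psi, p, k_rem))
```

For a common total effort the variance is proportional to (effort per site) × (bracket), so the ratio of the two minimised products gives the variance ratio, and its square root the ratio of standard errors.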

The effect of incorporating differential costs for initial and subsequent surveys here is of a similar magnitude to that for the standard design. Unless detection probability is high (> 0·8), the optimal maximum number of surveys increases if the cost of an initial survey is substantially higher than the cost of a subsequent survey (e.g. *c*_{1} ≥ 5*c*_{2}) and decreases slightly if the subsequent survey cost is higher than the cost of an initial survey.

#### example: designing a study to achieve a specified precision for $\hat{\psi}$

Consider a situation where we wish to conduct a study where it is thought that ψ ≈ 0·7 and *p* ≈ 0·4, and it is assumed all surveys will have the same cost. The study is to be designed such that the estimated level of occupancy has a standard error of 0·04, using as few surveys as possible. If a standard design is used, then from Table 1 the optimal number of repeat surveys per site is 5, and the probability of detecting the species at least once is *p** = 1 − (1 − 0·4)^{5} = 0·92. To determine the number of sites to survey, the respective values can be inserted into equation 1, and solved for *s*, i.e.:

$$s = \frac{0.7}{0.04^{2}}\left[(1-0.7) + \frac{1-0.9222}{0.9222 - 5\times 0.4\times(1-0.4)^{4}}\right] \approx 182.6$$

(note there may be some small discrepancy as a result of rounding errors). Based on the above results, this suggests that surveying 183 sites, each five times (915 total surveys), should be the most efficient allocation of resources for a standard design, provided the assumed values for occupancy and detectability are reasonable.

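If a removal design was to be used, then from Table 3 the maximum number of surveys per site is 7. Given the assumed values, now *p** = 0·97, which gives the number of sites to survey as (from equation 7):

$$s = \frac{0.7}{0.04^{2}}\left[(1-0.7) + \frac{0.9720\,(1-0.9720)}{0.9720^{2} - 7^{2}\times 0.4^{2}\times(1-0.4)^{6}}\right] \approx 151.8$$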

Therefore surveying 152 sites until first detection of the species will give a design with an expected standard error for $\hat{\psi}$ of 0·04. As the decision of when to stop surveying a site relies upon an element of chance, the total number of surveys required for a removal design is actually a random variable but, for the above situation, the expected number of surveys required is 578.

In this instance, using a standard design may require 58% more surveys than a removal design to obtain a standard error of 0·04, although surveyors must be prepared to survey sites up to seven times rather than consistently surveying all sites five times. Note that if it was decided that a maximum of five surveys could be conducted per site, to obtain the desired standard error 226 sites would need to be used for a removal design, requiring an expected total of 703 surveys.
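The arithmetic of this example can be retraced with the following sketch. The variance brackets are the assumed forms of equations 1 and 7, and the expected-effort expression for the removal design, (1 − ψ)*K* + ψ*p**/*p* surveys per site, is a derivation supplied here rather than a formula quoted from the text. Site numbers are rounded up to the next whole site.

```python
import math


def bracket_standard(psi: float, p: float, k: int) -> float:
    """Variance bracket, standard design (assumed form of equation 1)."""
    p_star = 1.0 - (1.0 - p) ** k
    return (1.0 - psi) + (1.0 - p_star) / (p_star - k * p * (1.0 - p) ** (k - 1))


def bracket_removal(psi: float, p: float, k: int) -> float:
    """Variance bracket, removal design (assumed form of equation 7)."""
    p_star = 1.0 - (1.0 - p) ** k
    return (1.0 - psi) + p_star * (1.0 - p_star) / (
        p_star ** 2 - k ** 2 * p ** 2 * (1.0 - p) ** (k - 1)
    )


def sites_needed(psi: float, bracket: float, target_se: float) -> int:
    """Smallest whole number of sites giving var(psi_hat) <= target_se ** 2."""
    return math.ceil(psi * bracket / target_se ** 2)


def expected_surveys_per_site_removal(psi: float, p: float, k: int) -> float:
    """Unoccupied sites get k surveys; occupied sites stop at first detection."""
    p_star = 1.0 - (1.0 - p) ** k
    return (1.0 - psi) * k + psi * p_star / p


psi, p, target_se = 0.7, 0.4, 0.04

s_std = sites_needed(psi, bracket_standard(psi, p, 5), target_se)
s_rem7 = sites_needed(psi, bracket_removal(psi, p, 7), target_se)
s_rem5 = sites_needed(psi, bracket_removal(psi, p, 5), target_se)

total_std = s_std * 5
total_rem7 = s_rem7 * expected_surveys_per_site_removal(psi, p, 7)
total_rem5 = s_rem5 * expected_surveys_per_site_removal(psi, p, 5)
```

Under these assumptions the site counts agree with the 183, 152 and 226 sites quoted above, and the expected totals for the removal designs fall within a survey or so of the quoted figures, the small differences depending on where rounding is applied.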