#### field methods

Point transect trapping surveys comprise a main survey, in which numbers of animals captured are recorded, and separate experimental trials, for which a subset of animals have known initial locations with respect to the traps. For the main survey, the trapping locations should be determined by an appropriate randomized design as for a normal point transect survey (Strindberg, Buckland & Thomas 2004). For example, a systematic grid of points, randomly superimposed over the survey region, might be used.
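A systematic grid with a random start can be sketched as follows. This is a minimal illustration, assuming a rectangular bounding box as a stand-in for the survey region; the function name `systematic_grid` and the 6 km by 6 km box are hypothetical, and in practice points falling outside the true region boundary would be discarded.

```python
import random

def systematic_grid(x_min, x_max, y_min, y_max, spacing, rng=None):
    """Return grid points with the given spacing and a uniformly random start."""
    rng = rng or random.Random()
    # One random offset per axis, shared by all points, so the whole grid
    # is shifted at random rather than each point being moved independently.
    x0 = x_min + rng.uniform(0.0, spacing)
    y0 = y_min + rng.uniform(0.0, spacing)
    points = []
    x = x0
    while x <= x_max:
        y = y0
        while y <= y_max:
            points.append((x, y))
            y += spacing
        x += spacing
    return points

# Hypothetical 6 km x 6 km box with 1-km point separation.
points = systematic_grid(0.0, 6.0, 0.0, 6.0, spacing=1.0, rng=random.Random(1))
```

Because the offset is drawn independently of where animals are, trap locations are independent of animal locations at the snapshot moment, which is the property the design needs.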

Conceptually we take a ‘snapshot’ of where animals are located at a single point in time (corresponding to when traps are set) and choose trap locations that are independent of the animal locations at the snapshot moment. However, if trap separation is such that at most one trap is accessible to a given animal, the snapshot moment can be different for each trap. The time interval from the snapshot moment to the checking of traps should be the same in the experimental trials as for the main survey.

For the trials, animals do not need to be located independently of trap locations, but they should have a representative spread of distances over which capture is possible. If it is feasible to radio-tag perhaps 40 or more animals, then this provides a means of locating a subset of animals on which to conduct trials when traps are set. If animal behaviour is believed not to change after being trapped, the same animal might be used for more than one trial, with traps at different locations within its home range. (Differences in trappability between first and subsequent captures could be tested for.)

In the absence of radio-tagged animals, the same traps might be used both for catching animals for the trials and for the main survey. In this case, an initial sample of animals should be caught and marked, and their initial location is the location of the trap that caught them (assuming release at that location). The traps should then be relocated according to a randomized scheme, as otherwise animals are likely to be recaptured at zero distance from first capture, and a markedly non-representative set of distances obtained. At most one of the two sample occasions should use a systematic grid of points, as otherwise the trap separations (and hence distances for estimating the detection function) from the trials will be poorly distributed.

The danger of using the same trapping method for marking a sample of animals as for the main survey is that some animals may be inherently more catchable than others, or animals may become trap-shy after capture. In these circumstances, it would be preferable to have a totally different method of catching animals for the trials than for the main survey.

In the case of lures, it is first necessary to locate a subset of animals. If the animals stay at that location, one observer can wait while a second observer moves to a predetermined distance and sets the lure. The second observer records whether or not there is a response; the first observer verifies that it is the located animal that responds. If animals do not stay put, it may be necessary to have the two observers searching simultaneously some distance apart. When one locates an animal, the other sets the lure and, again, the observers determine whether or not there is a detectable response.

For either lures or traps, if covariates can be recorded that correlate with how detectable an animal is, then bias arising from heterogeneity in detectability will be reduced. Provided that animals at or near the point are certain to be detected or trapped, and that a flexible model is used for the detection function, the pooling robustness property ensures that estimation is asymptotically unbiased, even if such heterogeneity is not modelled (Burnham *et al*. 2004).

#### modelling of the data

In the following development, we consider the case that animals occur in clusters (e.g. flocks). This is very often likely to be the case when lures are used, but only rarely for trapping surveys. If animals do not occur in clusters, cluster sizes in the following development should all be set to unity. The terminology below assumes that we are conducting a survey using lures; the same formulae apply for point transect trapping surveys.

For the point transect trapping or lure survey, let:

*n*_{k} = number of clusters detected from point *k*, *k* = 1, … , *K*,

*s*_{ik}= size of cluster *i* detected from point *k*, *i* = 1, … , *n*_{k},

$n = \sum_{k=1}^{K} n_k$ = total number of clusters detected.

For the trial data from which the detection function is to be modelled, let:

*m*= number of trials (i.e. number of clusters tested)

*r*_{i} = initial distance of cluster *i* from lure, *i* = 1, … , *m*,

*z*_{ij} = value of covariate *j* for cluster *i*, *i* = 1, … , *m*, *j* = 1, … , *J*.

Note that *r*_{i} cannot be observed for the *n* clusters detected during the main survey. We assume that covariates *z* can all be recorded for these clusters. One of these *J* covariates is likely to be cluster size.

We can model the probability of detection by fitting a generalized linear model (McCullagh & Nelder 1989) or generalized additive model (Hastie & Tibshirani 1990) for binary data to the observations *y*_{i}, where *y*_{i} = 1 if cluster *i* responded in its trial and *y*_{i} = 0 otherwise, *i* = 1, … , *m*. Distances *r*_{i} should be included as a covariate, and other covariates *z*_{ij} might be tested for inclusion. If we use standard logistic regression, then:

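$$E(y_i) = p_i = \frac{\exp\left(\alpha + \beta_0 r_i + \sum_{j=1}^{J} \beta_j z_{ij}\right)}{1 + \exp\left(\alpha + \beta_0 r_i + \sum_{j=1}^{J} \beta_j z_{ij}\right)} \qquad (\text{eqn 1})$$

with the corresponding fitted values p̂_{i}.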
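The fitting step can be sketched in a few lines. The example below is a minimal illustration, assuming a logistic model with distance as the only covariate and fitting it by Newton-Raphson; the function `fit_logistic`, the simulated trial data and the "true" coefficients are all hypothetical, and a real analysis would normally use a standard GLM routine.

```python
import math
import random

def sigmoid(eta):
    # Clip the linear predictor to avoid overflow in exp().
    eta = max(-30.0, min(30.0, eta))
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(r, y, iterations=25):
    """Newton-Raphson fit of logit(p) = alpha + beta0 * r; returns (alpha, beta0)."""
    alpha, beta0 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g) and information matrix (h) of the Bernoulli likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for ri, yi in zip(r, y):
            p = sigmoid(alpha + beta0 * ri)
            wgt = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * ri
            h00 += wgt
            h01 += wgt * ri
            h11 += wgt * ri * ri
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system h * step = g.
        alpha += (h11 * g0 - h01 * g1) / det
        beta0 += (h00 * g1 - h01 * g0) / det
    return alpha, beta0

# Simulated trials: distances in km; coefficients are hypothetical, not estimates.
rng = random.Random(0)
true_alpha, true_beta0 = 1.0, -6.0
r = [rng.uniform(0.0, 1.0) for _ in range(400)]
y = [1 if rng.random() < sigmoid(true_alpha + true_beta0 * ri) else 0 for ri in r]

alpha_hat, beta0_hat = fit_logistic(r, y)

def response_probability(distance):
    """Fitted probability of response at a given distance."""
    return sigmoid(alpha_hat + beta0_hat * distance)
```

Additional covariates *z*_{ij} would add columns to the linear predictor in the same way; they are omitted here only to keep the linear algebra to a 2 × 2 system.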

We now use this fitted model to estimate the probability of detection of those clusters detected in the main survey. For each of these detections, we can readily substitute values *z*_{ij} into the fitted model, but we do not know *r*_{i}. Thus p̂ is a function of the unknown *r*_{i}: p̂≡p̂(*r*; *z*_{i1}, … , *z*_{iJ}) for 0 ≤ *r* ≤ *w*, where *w* is some large distance at which a reaction to the lure is believed to be very unlikely. We therefore estimate the probability of detection of cluster *i* unconditional on its distance from the point by integrating over the unknown *r*:

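$$\hat{P}(z_{i1}, \ldots, z_{iJ}) = \int_0^w \pi(r)\,\hat{p}(r; z_{i1}, \ldots, z_{iJ})\,\mathrm{d}r \qquad (\text{eqn 2})$$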

where π(*r*), 0 ≤ *r* ≤ *w*, is the probability density function of distances of clusters (whether detected or not) from the point. In conventional point transect sampling, clusters within one of the circles of radius *w* are assumed to be randomly positioned in the circle, so that π(*r*) = 2*r*/(*w*^{2}). This assumption is assured through random point placement (or, more usually, a systematic grid of points, randomly located). Edge effects caused by some circles extending beyond the survey region, where cluster density may differ, are addressed in one of two ways. The first is ‘plus sampling’, in which points beyond the survey region boundary, but within *w* of it, are sampled, and animals detected from such points are recorded only if they are inside the survey region (Strindberg, Buckland & Thomas 2004). This option is unsatisfactory in the current context because traps or lures set outside the survey region boundary may be in unsuitable habitat, and therefore fail to attract animals, or, if they are in suitable habitat, we will be unable to distinguish whether detected animals were originally within the survey region or not.

The second way of addressing the problem in conventional point transect sampling is to ignore the edge effect. In effect, this means that we model the product of detectability and availability, that is π(*r*) p̂(*r*; *z*_{i1}, … , *z*_{iJ}) in the above notation. The pooling robustness property (Burnham *et al*. 2004) ensures that this does not bias abundance estimates. Thus if no animals occur beyond the survey region boundary, the reduction in detections at points close to the boundary is compensated for by the reduction in the apparent probability of detection, caused by the reduced availability. This compensation does not occur in the current context, because we model detectability as a separate exercise. There are two possible solutions to this difficulty. First, if few of the sampled points lie within *w* of the survey region boundary, or if cluster density is similar either side of the survey region boundary, then bias will be small if we ignore the problem, and assume π(*r*) = 2*r*/(*w*^{2}).

Secondly, it can be noted that, given random point placement and assuming that animals do not occur beyond the survey region boundary, the availability function is:

$$\pi(r) = \frac{r\,q(r)}{\int_0^w r\,q(r)\,\mathrm{d}r}$$

where *q*_{k}(*r*) is the proportion of the circumference of a circle of radius *r* centred on point *k* that lies within the survey region, for 0 ≤ *r* ≤ *w*, and $q(r) = \frac{1}{K}\sum_{k=1}^{K} q_k(r)$, 0 ≤ *r* ≤ *w*. If this proportion is always 1, then π(*r*) = 2*r*/(*w*^{2}) as expected.
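The integration over the unknown distance can be sketched numerically. The example below is a minimal illustration, assuming an availability function proportional to *r* *q*(*r*), which reduces to 2*r*/*w*^{2} when *q*(*r*) = 1; the function name, the midpoint rule and the logistic coefficients are hypothetical choices made for the example.

```python
import math

def mean_detection_probability(p_hat, q, w, steps=2000):
    """Midpoint-rule estimate of P = int_0^w pi(r) p_hat(r) dr,
    with pi(r) = r q(r) / int_0^w r q(r) dr."""
    h = w / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        weight = r * q(r)              # unnormalized availability at distance r
        numerator += weight * p_hat(r) * h
        denominator += weight * h
    return numerator / denominator

# Hypothetical fitted detection function (distance in km); not values from the study.
def p_hat(r, alpha=1.0, beta0=-8.0):
    return 1.0 / (1.0 + math.exp(-(alpha + beta0 * r)))

# No edge effect: every circle lies wholly inside the survey region.
P_no_edge = mean_detection_probability(p_hat, lambda r: 1.0, w=1.0)

# Hypothetical edge effect: the proportion inside the region declines with distance.
P_edge = mean_detection_probability(p_hat, lambda r: 1.0 - 0.4 * r, w=1.0)
```

When *q*(*r*) declines with distance, relatively more of the available clusters are close to the point, so the mean detection probability is higher than it would be with no edge effect.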

Estimation of abundance now proceeds using a Horvitz–Thompson-like estimator (Borchers *et al*. 1998). The estimated number of animals in the covered region is:

$$\hat{N}_c = \sum_{i=1}^{n} \frac{s_i}{\hat{P}(z_{i1}, \ldots, z_{iJ})} \qquad (\text{eqn 3})$$

and estimated abundance in the entire survey region of size *A* is:

$$\hat{N} = \frac{A}{A_c}\,\hat{N}_c \qquad (\text{eqn 4})$$

where *A*_{c} is the size of the covered region. The covered area within distance *w* of point *k* is:

$$A_{ck} = 2\pi \int_0^w r\,q_k(r)\,\mathrm{d}r$$

so that $A_c = \sum_{k=1}^{K} A_{ck}$. If *q*(*r*) is always one, then *A*_{c} = *K*π*w*^{2}.
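The estimator can be sketched as follows, assuming that equation 3 has the usual Horvitz-Thompson form (each detected cluster contributes its size divided by its estimated detection probability) and that equation 4 scales the result by *A*/*A*_{c}. The function name and all numbers are hypothetical.

```python
import math

def abundance_estimate(cluster_sizes, detection_probs, area, covered_area):
    """Return (N_c, N): estimated abundance in the covered region and in the
    whole survey region, using a Horvitz-Thompson-like sum."""
    n_covered = sum(s / p for s, p in zip(cluster_sizes, detection_probs))
    n_total = area / covered_area * n_covered
    return n_covered, n_total

# Hypothetical data: three detected clusters and their estimated probabilities.
sizes = [2, 1, 3]
probs = [0.1, 0.1, 0.2]

# Hypothetical design: K = 10 points, w = 1 km, no edge effect, so A_c = K * pi * w^2.
K, w = 10, 1.0
A_c = K * math.pi * w ** 2
A = 50.0  # survey region size, km^2

n_covered, n_total = abundance_estimate(sizes, probs, A, A_c)

# Setting every cluster size to 1 gives the estimated number of clusters instead.
clusters_covered, _ = abundance_estimate([1] * len(sizes), probs, A, A_c)
```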

If animals do not occur in clusters, *s*_{i} is set to 1 in equation 3 for each detection. This also gives the estimated abundance of clusters for clustered populations.

Analytic variances for N̂_{c} and N̂ may be obtained by adapting the results of Borchers *et al*. (1998). Let **Y** = (*y*_{1}, ... , *y*_{m}) be the set of binary data from the experiment and Δ = (δ_{1}, ... , δ_{*N*_{c}}) the set of binary data from the main survey, where δ_{i} = 1 if cluster *i* was detected and δ_{i} = 0 otherwise, *i* = 1, ... , *N*_{c}. Then:

var(N̂_{c}) = *E*_{Δ}[var_{Y}(N̂_{c} | Δ, **Y**)] + var_{Δ}(*E*_{Y}[N̂_{c} | Δ, **Y**]).

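The expression *E*_{Δ}[var_{Y}(N̂_{c} | Δ, **Y**)] may be estimated by:

$$\left[\frac{\partial \hat{N}_c}{\partial \boldsymbol{\theta}}\right]^{\mathsf{T}} \hat{\mathbf{I}}(\hat{\boldsymbol{\theta}})^{-1} \left[\frac{\partial \hat{N}_c}{\partial \boldsymbol{\theta}}\right]$$

evaluated at the estimate θ̂, where **θ** = (α, β_{0}, β_{1}, ... , β_{J}) and **Î**(**θ**) is the estimated information matrix from the logistic regression.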

Further var_{Δ}(*E*_{Y}[N̂_{c} | Δ, **Y**]) can be estimated by:

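$$\sum_{i=1}^{n} \frac{s_i^2\,\left[1 - \hat{P}(z_{i1}, \ldots, z_{iJ})\right]}{\hat{P}(z_{i1}, \ldots, z_{iJ})^2} \qquad (\text{eqn 5})$$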

An approximate confidence interval for *N* may be found assuming log-normality (Buckland *et al*. 2001).
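A log-normal interval of this kind can be sketched as below, assuming the common construction in which the estimate is divided and multiplied by *C* = exp{*z* √ln(1 + cv²)}, where cv is the coefficient of variation of the estimate. The function name and the numbers are hypothetical.

```python
import math

def lognormal_ci(estimate, se, z=1.96):
    """Approximate confidence interval (estimate / C, estimate * C) assuming
    the estimator is log-normally distributed."""
    cv = se / estimate
    c = math.exp(z * math.sqrt(math.log(1.0 + cv * cv)))
    return estimate / c, estimate * c

# Hypothetical abundance estimate and standard error.
lower, upper = lognormal_ci(100.0, 40.0)
```

Unlike a symmetric interval of the form estimate ± 1·96 SE, this interval cannot include negative abundances and is wider above the estimate than below it.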

Perhaps a more robust way to estimate variance of *n* is to treat the *n*_{k} as independent observations on the expected number of detections per point. This suggests a bootstrap approach: a resample corresponding to the main survey is obtained by sampling with replacement the *K* points along with their data, and a resample corresponding to the experiment for fitting the detection function is obtained by resampling the *m* clusters from the experiment. The above methods are applied to these resampled data, and bootstrap estimates of *N*_{c} and *N* are obtained. This is repeated a large number of times, and the sample variance of the bootstrap estimates of a parameter provides the required variance estimate. This approach also allows uncertainty over which model to use for the detection function to be incorporated into the variance, by re-evaluating for each sample which model fits the data best (Buckland, Burnham & Augustin 1997). For example, Akaike's information criterion (AIC) can be evaluated for each model, and the model with the smallest AIC selected as the best approximating model for the original data, and similarly for each of the bootstrap resamples.
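The resampling scheme can be sketched as follows. This is a minimal illustration with a deliberately crude stand-in estimator (detection probability taken as the proportion of trials with a response, ignoring distance), so that the two resampling steps are easy to see; the function names and data are hypothetical, and re-selection of the detection function model within each resample is not shown.

```python
import random
import statistics

def bootstrap_se(points, trials, estimator, n_boot=999, seed=0):
    """Bootstrap standard error: resample survey points and trial clusters
    independently, with replacement, and re-apply the estimator each time."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled_points = [rng.choice(points) for _ in points]
        resampled_trials = [rng.choice(trials) for _ in trials]
        estimates.append(estimator(resampled_points, resampled_trials))
    return statistics.stdev(estimates)

def toy_estimator(points, trials):
    """Stand-in for the full analysis: total birds detected divided by the
    proportion of trials with a response (distance is ignored here)."""
    total_birds = sum(sum(cluster_sizes) for cluster_sizes in points)
    p = sum(trials) / len(trials)
    return total_birds / p

# Hypothetical data: each point holds the sizes of the clusters detected from it;
# each trial is 1 (response) or 0 (no response).
points = [[], [2], [], [1, 3], [], [], [1], [], [2, 2], []]
trials = [1] * 20 + [0] * 20

se = bootstrap_se(points, trials, toy_estimator)
```

Resampling whole points, together with all their detections, is what carries the between-point variation in encounter rate into the variance estimate.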

#### crossbill surveys

To illustrate the methods, we used data pooled across sexes and species of crossbill, and obtained estimates of abundance of crossbills at two sites for which pilots of the point transect lure survey were conducted: Abernethy Forest (57°15′ N, 3°40′ W, 34·1 km^{2}) and Glenmore Forest (57°10′ N, 3°40′ W, 18·5 km^{2}) in the central highlands of Scotland (Fig. 1). For the main survey, separate estimates will be obtained for males and females, as many females will be incubating at the time of the survey, and these are unlikely to respond to the lure. Further, it will be necessary to estimate abundance separately for the three species, Scottish crossbill *L. scotica*, common crossbill *Loxia curvirostra* Linn. and parrot crossbill *Loxia pytyopsittacus* Borkhausen. For reliable identification of the three species, calls will be recorded for subsequent computer analysis (Summers *et al*. 2002). For the analyses presented here, we ignore these issues.

In order to determine the probability of response to the lure, trials were conducted during 2002–05 at a number of sites throughout northern Scotland. Several detection function models were fitted to the data from 152 trials. AIC resulted in selecting the model in which probability of a response is a function of distance from the point alone (Table 1). The fitted model is illustrated in Fig. 2; its functional form is:

$$\hat{p}(r) = \frac{\exp(\hat{\alpha} + \hat{\beta}_0 r)}{1 + \exp(\hat{\alpha} + \hat{\beta}_0 r)}$$

Table 1. Logistic regression models fitted to the crossbill trial data (*n* = 152). *D* represents distance from the point, *Y* is days from 1 January, *S* is size of flock, *B* a behavioural factor with three levels, and *H* a habitat factor with two levels. Model *D* corresponds to a logistic regression of response on distance from the point, *D* + *S* indicates a logistic regression of response on distance and flock size, etc. At each step, the variable selected for elimination corresponded to the largest reduction in AIC. Time of day was not recorded for some records, so is not included in this table. Its coefficient did not differ significantly from zero.

| Model | AIC | ΔAIC |
|---|---|---|
| *D* + *Y* + *S* + *B* + *H* | 158·4 | 2·6 |
| *D* + *S* + *B* + *H* | 157·4 | 1·6 |
| *D* + *S* + *B* | 156·4 | 0·6 |
| *D* + *S* | 156·0 | 0·2 |
| *D* | 155·8 | 0·0 |
| Null | 202·1 | 46·3 |

The model fits the observations well, as judged from the close proximity of the plotted response means by interval to the fitted curve; the residual deviance of the model is 151·8 with 150 degrees of freedom, also indicating a good fit.
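The ΔAIC column of Table 1 and the residual deviance quoted above can be cross-checked with a few lines of arithmetic. The sketch assumes that AIC for these binary-data models equals the residual deviance plus twice the number of fitted parameters (two for model *D*: intercept and distance coefficient); the dictionary keys are simply labels for the table rows.

```python
# AIC values transcribed from Table 1.
aic = {
    "D + Y + S + B + H": 158.4,
    "D + S + B + H": 157.4,
    "D + S + B": 156.4,
    "D + S": 156.0,
    "D": 155.8,
    "Null": 202.1,
}

# The best approximating model is the one with the smallest AIC.
best = min(aic, key=aic.get)
delta_aic = {model: round(value - aic[best], 1) for model, value in aic.items()}

# Consistency check for model D: residual deviance + 2 * (number of parameters).
residual_deviance = 151.8
n_parameters = 2  # intercept and distance coefficient
aic_from_deviance = residual_deviance + 2 * n_parameters
```

The differences among the five models that include distance are all below 3 AIC units, whereas dropping distance (the null model) costs 46·3 units, so distance is the only covariate with clear support.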

A truncation distance of *w* = 1 km was selected, beyond which probability of a response was deemed to be zero; estimation was insensitive to this choice for values above *c.* 700 m. It was estimated that 5·8% of birds within the circular plots of radius 1 km responded to the lure; this equates to an effective radius of detection of around 240 m, and an effective area surveyed around each point of about 18 ha. Almost all birds that respond are estimated to have been within 500 m of the lure, so that the 1-km separation of points in Abernethy ensures that the assumption of independence between points is likely to be reasonable. For Glenmore, separation between points was around 700 m (Fig. 1), so that it is possible that a few birds were lured away from their initial location, and hence unavailable for detection from the next point, when initially they may have been within responding range. Given the pooling robustness property of distance sampling estimators (Burnham *et al*. 2004), the resulting bias is likely to be slight, as birds very close to one point will be around 700 m from the next nearest point, and so very unlikely to be lured away (Fig. 2).
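The effective radius and effective area follow directly from the proportion responding. The sketch below assumes the usual definition, in which the effective radius ρ satisfies πρ² = *P*π*w*², so that ρ = *w*√*P*; the variable names are illustrative.

```python
import math

w_m = 1000.0      # truncation distance, m
p_mean = 0.058    # estimated proportion of birds within w that respond

# Effective radius: the radius of a circle within which as many birds are
# missed as are detected beyond it.
effective_radius_m = w_m * math.sqrt(p_mean)

# Effective area surveyed per point, converted from m^2 to hectares.
effective_area_ha = math.pi * effective_radius_m ** 2 / 10_000.0
```

These evaluate to roughly 241 m and 18·2 ha, in line with the rounded figures quoted above.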

For the favoured model, P̂(*z*_{i1}, ... , *z*_{iJ}) = P̂, independent of the covariates other than distance. The function *q*(*r*), representing the average proportion of land within distance *r* of a point that is within woodland, was estimated as follows. For each point in each site, additional points were located at 125-m intervals out to 1000 m, to the north, south, east and west. The initial points were recorded as 1 (i.e. inside the region) and the additional points were recorded as 1 or 0, depending on whether they were inside the survey region or outside. A logistic regression was then used to estimate *q*(*r*), separately for Abernethy and Glenmore. Uncertainty in these estimates was allowed for by bootstrapping the initial points in each site, along with their associated additional points. Estimation error in *q*(*r*) could be avoided by digitizing the boundaries and points and using a geographical information system to evaluate *q*(*r*).

There were just 35 points in the pilot survey of Abernethy, for which 16 birds in 11 clusters were detected. For Glenmore, there were 34 points, and 54 birds in 31 clusters were detected. Application of equations 3 and 4 gave abundance estimates of 95 crossbills in Abernethy (corresponding density estimate 2·8 birds km^{−2}), and 182 crossbills in Glenmore (9·8 birds km^{−2}). Corresponding bootstrap standard errors based on 3999 resamples were 40 and 64, respectively, and 95% percentile confidence intervals were (37, 193) birds for Abernethy and (90, 339) birds for Glenmore. For comparison, the standard errors obtained from equation 5 were 37 birds for Abernethy and 57 birds for Glenmore, suggesting slight underestimation of variance by the analytic method. If the edge effect is ignored (*q*(*r*) = 1 for 0 ≤ *r* ≤ *w*), the abundance estimate is around 10% lower for both Abernethy and Glenmore. In this case, there is fairly substantial bias if the edge effect is ignored.
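The estimates that ignore the edge effect can be approximately reproduced from the figures quoted in the text. The sketch assumes a single mean detection probability of 0·058 for every cluster (the favoured model has no covariates other than distance), a truncation distance of 1 km, and a covered area of *K*π*w*² per site; because the inputs are rounded published values, the results are approximate, and the function name is hypothetical.

```python
import math

def abundance_no_edge(total_birds, p_mean, n_points, w_km, area_km2):
    """Abundance estimate with q(r) = 1: (A / A_c) * (birds detected / P),
    where A_c = K * pi * w^2."""
    covered_km2 = n_points * math.pi * w_km ** 2
    return area_km2 / covered_km2 * total_birds / p_mean

# Birds detected, number of points and site areas as reported in the text.
abernethy = abundance_no_edge(16, 0.058, 35, 1.0, 34.1)
glenmore = abundance_no_edge(54, 0.058, 34, 1.0, 18.5)

# Ratios to the published edge-corrected estimates of 95 and 182 birds.
ratio_abernethy = abernethy / 95.0
ratio_glenmore = glenmore / 182.0
```

The two values come out at roughly 86 and 161 birds, about 10% below the edge-corrected estimates, which matches the size of the edge effect reported above.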

The national survey will have several hundred survey points, so that the component of variance for encounter rate will be much smaller than in these pilot surveys. However, precision of the estimated detection function is dictated by the number of trials conducted, and this component of variance can only be reduced by conducting more trials. For our pilot surveys, just over 60% of the variance in the Abernethy abundance estimate was attributable to estimation of the detection function, and for Glenmore it was just over 40%.