The approach is based on repeatedly sampling the presence or absence of animal signs at locations of interest, with sampling occasions separated by a time interval of t days. The intervals between sampling occasions need not be of equal duration. The motivating example for this work was a data set of weekly collections (t = 7) of otter spraints at commercial fish farms. At the end of each visit, signs were recorded and then removed so that only new signs would be found during the next visit. Observed signs were classified as fresh (deposited during the previous night) or aged (deposited during the preceding t – 1 nights). For otters, aged spraints can be recognized by their dried surface and solid consistency, whereas fresh spraints from the last day are wet and soft (Macdonald & Mason 1987). The rate of drying can vary slightly, depending on weather conditions and the time of day of the survey. Experienced researchers are able to determine the age of spraints correctly, but a pilot study in which spraints are sampled daily is recommended. This classification scheme was appropriate for otter spraints but could easily be adjusted to accommodate different situations.
This sampling design yields two types of information for each location: the number of sampling occasions on which (a) fresh or (b) aged spraints were detected during the survey. Using the second type of information requires sampling to be frequent enough that aged signs are not obliterated between sampling occasions, but the interval should not be so long that aged signs are always present. This imposes limits on the maximum and minimum time span between sampling occasions, which depend on the species and the type of sign used. In the case of otters, spraints typically become obliterated after about 7 weeks for L. lutra (Jenkins & Harper 1980) and after 4–5 weeks for L. maculicollis (Rowe-Rowe 1992). In general, the distinction between fresh and aged signs provides information on the visitation rate at two time scales: a shorter one (the last day) and a longer one (the preceding t – 1 days). Estimators that use this additional information should therefore perform better than those based on a single time scale.
Derivation of estimators
In the following, we derive maximum likelihood estimators of visitation rate and visitation probability for our new method and for several simpler estimators, both new and existing. Table 1 provides a summary of the notation used. Note that visitation rate and visitation probability are two expressions of the same quantity. The visitation rate is the number of visits per unit time, whereas the visitation probability is the probability that an animal visits a site at least once per unit time.
Table 1. Description of variables and parameters used in the derivation of the visitation rate/probability estimators
| Symbol | Description |
|---|---|
| $f$ | Number of sampling occasions in which fresh signs/tracks were observed at a site |
| $a$ | Number of sampling occasions in which aged signs/tracks were observed at a site |
| $c$ | Number of sampling occasions in which signs/tracks were found without distinction between aged and fresh signs/tracks |
| $n$ | Number of sampling occasions |
| $t$ | Interval (days) between sampling occasions |
| $p$ | True (simulated) daily visitation probability |
| $\hat{p}$ | Estimator of daily visitation probability |
| $\hat{\lambda}$ | Estimator of visitation rate |
| $\hat{\lambda}_{fr}$, $\hat{p}_{fr}$ | Estimator of visitation rate/probability using fresh signs only |
| $\hat{\lambda}_{ag}$, $\hat{p}_{ag}$ | Estimator of visitation rate/probability using aged signs only |
| $\hat{\lambda}_{cm}$, $\hat{p}_{cm}$ | Estimator of visitation rate/probability using aged and fresh signs without distinction (used in the literature) |
|  | Estimator of visitation rate/probability using information on aged and fresh signs |
We assume that visits to a site follow a Poisson process and that the visitation rate (number of visits/day) is constant for the entire duration of a study (these assumptions are discussed below). Under these assumptions, the probability that a site will receive h visits in a time interval of length t is given by the Poisson distribution:

$$P(h) = \frac{(\lambda t)^{h}\, e^{-\lambda t}}{h!} \tag{1}$$
where λ is the per-day visitation rate we wish to estimate from the data. The probability of observing no visits (absence) after t days is:

$$P(0) = e^{-\lambda t} \tag{2}$$
whereas the probability of observing at least one visit (presence) during t days is simply the complement of the above:

$$P(h \geq 1) = 1 - e^{-\lambda t} \tag{3}$$
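As a quick numerical check of the Poisson probabilities above, the following Python sketch evaluates the probability of exactly h visits, of no visit and of at least one visit for a given rate λ (`lam`) and interval t. The function names are illustrative only and are not taken from the DoubleTrack workbook.

```python
import math


def prob_visits(h: int, lam: float, t: float) -> float:
    """Poisson probability of exactly h visits in t days at lam visits/day."""
    mu = lam * t
    return mu**h * math.exp(-mu) / math.factorial(h)


def prob_absence(lam: float, t: float) -> float:
    """Probability of no visit in t days (the h = 0 term)."""
    return math.exp(-lam * t)


def prob_presence(lam: float, t: float) -> float:
    """Probability of at least one visit in t days."""
    return 1.0 - prob_absence(lam, t)


if __name__ == "__main__":
    # Example: 0.2 visits/day with weekly sampling (t = 7)
    print(round(prob_presence(0.2, 7), 4))  # 0.7534
```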
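When only fresh signs are used, the time window is one day (t = 1), and the probability of observing f presences and n – f absences in n observations is given by the binomial distribution:

$$P(f \mid n, \lambda) = \binom{n}{f}\, p_{fr}^{\,f}\,(1 - p_{fr})^{\,n-f} \tag{4}$$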
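where $p_{fr} = 1 - e^{-\lambda}$. Because we are interested in the likelihood L as a function of λ, the binomial coefficient, which does not depend on λ, may be dropped, yielding:

$$L(\lambda \mid f, n) \propto p_{fr}^{\,f}\,(1 - p_{fr})^{\,n-f} = \left(1 - e^{-\lambda}\right)^{f} e^{-\lambda (n-f)} \tag{5}$$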
The negative log likelihood is then:

$$-\ln L(\lambda \mid f, n) = -f \ln\left(1 - e^{-\lambda}\right) + (n - f)\,\lambda \tag{6}$$
Taking the first derivative of the negative log likelihood with respect to λ, setting this equal to zero and solving for λ yields the maximum likelihood estimator for λ given only data on fresh signs:

$$\hat{\lambda}_{fr} = -\ln\left(1 - \frac{f}{n}\right) \tag{7}$$
Combining equations 2 and 7 (with t = 1), the maximum likelihood estimator for the daily visitation probability is:

$$\hat{p}_{fr} = 1 - e^{-\hat{\lambda}_{fr}} = \frac{f}{n} \tag{8}$$
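The closed-form estimator for fresh signs can be checked directly against the negative log likelihood. In this Python sketch (hypothetical function names), `fresh_estimates` returns the rate and the daily probability, and the example confirms that nearby rates give a larger negative log likelihood. The sketch handles the case f = n explicitly, where every occasion has fresh signs and the rate estimate is unbounded. That handling is an assumption of the sketch.

```python
import math


def fresh_nll(lam: float, f: int, n: int) -> float:
    """Negative log likelihood for fresh signs (1-day window), constants dropped."""
    # -expm1(-lam) equals 1 - exp(-lam) but stays accurate for small lam
    return -f * math.log(-math.expm1(-lam)) + (n - f) * lam


def fresh_estimates(f: int, n: int) -> tuple[float, float]:
    """Closed-form MLEs from fresh signs: (rate per day, daily probability)."""
    if n <= 0 or not 0 <= f <= n:
        raise ValueError("need n > 0 and 0 <= f <= n")
    if f == n:
        return math.inf, 1.0  # every occasion positive: rate unbounded
    lam = -math.log(1.0 - f / n)
    return lam, 1.0 - math.exp(-lam)  # the probability equals f / n


if __name__ == "__main__":
    lam_hat, p_hat = fresh_estimates(3, 10)
    print(round(lam_hat, 4), round(p_hat, 4))
    # The closed form should beat nearby values of the negative log likelihood
    for step in (-0.01, 0.01):
        assert fresh_nll(lam_hat, 3, 10) < fresh_nll(lam_hat + step, 3, 10)
```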
Next, we assume that a distinction can be made between fresh and aged signs. When only aged signs are used, the time interval is t – 1 days. Denoting a as the number of presences and n – a as the number of absences in n observations, the negative log likelihood can be derived in the same manner as above, giving:

$$-\ln L(\lambda \mid a, n) = -a \ln p_{ag} - (n - a) \ln\left(1 - p_{ag}\right) \tag{9}$$
where $p_{ag} = 1 - e^{-\lambda (t-1)}$. Taking the first derivative with respect to λ, setting it equal to zero and solving for λ yields the maximum likelihood estimator given aged data only:

$$\hat{\lambda}_{ag} = -\frac{1}{t-1}\ln\left(1 - \frac{a}{n}\right) \tag{10}$$
The estimator for the probability of visitation over t – 1 days is:

$$\hat{p}_{ag,\,t-1} = 1 - e^{-\hat{\lambda}_{ag}(t-1)} = \frac{a}{n} \tag{11}$$
This can be converted into a daily visitation probability by noting that 1 – a/n is the estimated probability that no visit occurred in t – 1 days. The (t – 1)th root of 1 – a/n is therefore the per-day probability that the location was not visited, and its complement is the daily visitation probability:

$$\hat{p}_{ag} = 1 - \left(1 - \frac{a}{n}\right)^{1/(t-1)} \tag{12}$$
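For aged signs the same calculation applies with a window of t – 1 days. The Python sketch below uses a hypothetical function, `aged_estimates`. It returns the rate, the (t – 1)-day probability and the daily probability obtained by taking the (t – 1)th root. The example counts are illustrative and do not come from the study.

```python
import math


def aged_estimates(a: int, n: int, t: int) -> tuple[float, float, float]:
    """MLEs from aged signs, whose window is the t - 1 days before the last night.

    Returns (rate per day, probability over t - 1 days, daily probability).
    """
    if t < 2:
        raise ValueError("aged signs require t >= 2")
    if n <= 0 or not 0 <= a <= n:
        raise ValueError("need n > 0 and 0 <= a <= n")
    if a == n:
        return math.inf, 1.0, 1.0  # aged signs every time: rate unbounded
    lam = -math.log(1.0 - a / n) / (t - 1)
    p_window = a / n
    # (t-1)th root of the no-visit probability = daily no-visit probability
    p_daily = 1.0 - (1.0 - a / n) ** (1.0 / (t - 1))
    return lam, p_window, p_daily


if __name__ == "__main__":
    lam, p_window, p_daily = aged_estimates(a=5, n=10, t=7)
    print(round(lam, 4), p_window, round(p_daily, 4))
```

The daily probability obtained from the root agrees with 1 – exp(–λ̂), so either route can be used.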
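When no distinction between fresh and aged tracks is possible, or when all data are combined into one presence/absence data set, we simply record the number of presences, c, and absences, n – c, in n observations, which yields:

$$-\ln L(\lambda \mid c, n) = -c \ln p_{cm} - (n - c) \ln\left(1 - p_{cm}\right) \tag{13}$$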
where $p_{cm} = 1 - e^{-\lambda t}$. Taking the first derivative with respect to λ, setting it equal to zero, and solving for λ yields the maximum likelihood estimator given combined data only:

$$\hat{\lambda}_{cm} = -\frac{1}{t}\ln\left(1 - \frac{c}{n}\right) \tag{14}$$
and the estimator for the t-day visitation probability is:

$$\hat{p}_{cm,\,t} = \frac{c}{n} \tag{15}$$
which can be converted into a daily visitation probability:

$$\hat{p}_{cm} = 1 - \left(1 - \frac{c}{n}\right)^{1/t} \tag{16}$$
This is the only estimator for visitation rates/probabilities used previously in the literature.
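The fresh-only, aged-only and combined estimators share one form and differ only in the window length (1, t – 1 or t days). The hypothetical Python helper below makes this explicit. The counts in the example are made up for illustration. They satisfy max(f, a) ≤ c ≤ f + a, as they must for a single site.

```python
import math


def single_window_estimates(k: int, n: int, window: float) -> tuple[float, float]:
    """MLEs when k of n occasions are positive and each covers `window` days.

    Returns (rate per day, daily visitation probability).
    """
    if window <= 0 or n <= 0 or not 0 <= k <= n:
        raise ValueError("need window > 0, n > 0 and 0 <= k <= n")
    if k == n:
        return math.inf, 1.0  # always positive: rate unbounded
    lam = -math.log(1.0 - k / n) / window
    return lam, 1.0 - (1.0 - k / n) ** (1.0 / window)


if __name__ == "__main__":
    # Hypothetical counts for one site with n weekly occasions (t = 7)
    n, t, f, a, c = 10, 7, 3, 6, 7
    for label, k, window in (("fresh", f, 1), ("aged", a, t - 1), ("combined", c, t)):
        lam, p = single_window_estimates(k, n, window)
        print(f"{label:>8}: rate = {lam:.4f}/day, daily p = {p:.4f}")
```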
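We now wish to incorporate the fresh and aged sign data in the same framework. Because we have assumed a Poisson process, the numbers of visits in non-overlapping time intervals are independent. In particular, whether signs are present in the aged window (the first t – 1 days of an interval) does not affect the probability of signs being deposited in the fresh window (the last day). We may therefore multiply the likelihood for the fresh data by that for the aged data to obtain the likelihood for the full data set:

$$L(\lambda \mid f, a, n) \propto p_{fr}^{\,f}\,(1 - p_{fr})^{\,n-f}\; p_{ag}^{\,a}\,(1 - p_{ag})^{\,n-a} \tag{17}$$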
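The negative log likelihood is then:

$$-\ln L(\lambda \mid f, a, n) = -f \ln\left(1 - e^{-\lambda}\right) + (n - f)\,\lambda - a \ln\left(1 - e^{-\lambda (t-1)}\right) + (n - a)\,\lambda\,(t - 1) \tag{18}$$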
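In contrast to the estimators above, this maximum likelihood estimator cannot in general be written in closed form. Analytical solutions exist only for t ≤ 5 days, and even these can be extremely cumbersome. For t = 2, however, the solution is particularly simple. Fresh and aged signs then each cover one day, so the data amount to 2n one-day trials with f + a presences:

$$\hat{\lambda} = -\ln\left(1 - \frac{f + a}{2n}\right) \tag{19}$$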
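The restriction to small t can be seen by rewriting the score equation of the joint negative log likelihood as a polynomial. The following sketch substitutes x = e^{−λ} and m = t – 1, notation introduced here for convenience. Setting the derivative of the joint negative log likelihood with respect to λ to zero gives

$$\frac{f\,x}{1-x} + \frac{a\,m\,x^{m}}{1-x^{m}} = (n-f) + (n-a)\,m .$$

Multiplying both sides by $1 - x^{m} = (1-x)(1 + x + \dots + x^{m-1})$ and collecting powers of x gives

$$f\sum_{j=1}^{m-1} x^{j} + n\,t\,x^{m} = (n-f) + (n-a)(t-1).$$

The left-hand side is a polynomial of degree t – 1 in x. A general solution in radicals therefore exists only up to degree four (Abel–Ruffini theorem), that is, for t ≤ 5. The left-hand side increases strictly on (0, 1). Provided f + a > 0 and not every occasion has both fresh and aged signs, the equation therefore has exactly one root in that interval, and the estimate is λ̂ = −ln x. For t = 2 the sum is empty and x = 1 – (f + a)/(2n).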
Other cases can be solved numerically.
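Numerically, one simple option is bisection. Each term of the joint negative log likelihood is convex in λ, so its derivative (the score) increases monotonically. Bracketing the sign change of the score and repeatedly halving the bracket therefore finds the unique minimum. This is a Python sketch that relies on those properties and uses hypothetical function names. It is not the macro code in DoubleTrack, and a library optimiser would serve equally well.

```python
import math


def joint_nll(lam: float, f: int, a: int, n: int, t: int) -> float:
    """Negative log likelihood for fresh (1-day) plus aged (t - 1 day) signs."""
    m = t - 1
    return (-f * math.log(-math.expm1(-lam)) + (n - f) * lam
            - a * math.log(-math.expm1(-lam * m)) + (n - a) * m * lam)


def joint_score(lam: float, f: int, a: int, n: int, t: int) -> float:
    """Derivative of joint_nll with respect to lam (increasing in lam)."""
    m = t - 1
    return (-f / math.expm1(lam) + (n - f)
            - a * m / math.expm1(lam * m) + (n - a) * m)


def joint_mle(f: int, a: int, n: int, t: int, tol: float = 1e-12) -> float:
    """Visitation rate maximising the joint likelihood, found by bisection."""
    if t < 2 or n <= 0 or not (0 <= f <= n and 0 <= a <= n):
        raise ValueError("need t >= 2, n > 0 and 0 <= f, a <= n")
    if f + a == 0:
        return 0.0       # no signs at all: likelihood peaks at lam = 0
    if f == n and a == n:
        return math.inf  # signs every time: likelihood keeps increasing
    lo, hi = 0.0, 1.0
    while joint_score(hi, f, a, n, t) < 0:  # expand until the root is bracketed
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if joint_score(mid, f, a, n, t) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    lam_hat = joint_mle(f=3, a=6, n=10, t=7)
    print(round(lam_hat, 4), round(1.0 - math.exp(-lam_hat), 4))
```

For t = 2 the result can be checked against the closed form −ln(1 − (f + a)/(2n)).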
In the above derivations, the time intervals between sampling occasions have been assumed to be constant. This assumption may be relaxed so that different time intervals between sampling occasions can be used. The likelihoods for groups of sampling occasions with different time intervals are then multiplied. Taking the negative log of this product gives:

$$-\ln L = -f \ln\left(1 - e^{-\lambda}\right) + (n - f)\,\lambda + \sum_{k}\left[-a_k \ln p_{ag,k} - (n_k - a_k)\ln\left(1 - p_{ag,k}\right)\right] \tag{20}$$
for K groups of sampling occasions with time intervals identical within, but different among, groups. Here, $a_k$ and $n_k$ denote the number of occasions with aged signs and the number of occasions in sampling group k, respectively. The probability $p_{ag,k} = 1 - e^{-\lambda (t_k - 1)}$ is obtained by substituting $t_k$, the interval length of sampling group k, for t in the expression for $p_{ag}$. The terms for fresh signs are unaffected, because by definition the time window for fresh signs is always 1 day. Therefore, f and n remain the total number of occasions with fresh signs and the total number of sampling occasions, respectively, across all time intervals. All of the estimators derived above are special cases of the same general framework. They rely on the same basic assumptions and differ only in the data they use.
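The same bisection idea extends to irregular intervals. In the Python sketch below (hypothetical names), each group is given as a tuple (t_k, a_k, n_k), and f and n are totals over all groups. Every interval is assumed to be at least 2 days long so that aged signs are defined.

```python
import math


def irregular_score(lam: float, f: int, n: int,
                    groups: list[tuple[int, int, int]]) -> float:
    """Derivative of the negative log likelihood for irregular intervals.

    groups holds (t_k, a_k, n_k): interval length, aged presences, occasions.
    f and n are totals over all groups (fresh signs always cover one day).
    """
    score = -f / math.expm1(lam) + (n - f)
    for t_k, a_k, n_k in groups:
        m = t_k - 1  # length of the aged window in group k
        score += -a_k * m / math.expm1(lam * m) + (n_k - a_k) * m
    return score


def irregular_mle(f: int, n: int, groups: list[tuple[int, int, int]],
                  tol: float = 1e-12) -> float:
    """Rate maximising the likelihood over groups with different intervals."""
    if any(t_k < 2 or not 0 <= a_k <= n_k for t_k, a_k, n_k in groups):
        raise ValueError("need t_k >= 2 and 0 <= a_k <= n_k in every group")
    if n != sum(n_k for _, _, n_k in groups) or not 0 <= f <= n:
        raise ValueError("n must equal the total occasions and 0 <= f <= n")
    if f + sum(a_k for _, a_k, _ in groups) == 0:
        return 0.0
    if f == n and all(a_k == n_k for _, a_k, n_k in groups):
        return math.inf
    lo, hi = 0.0, 1.0
    while irregular_score(hi, f, n, groups) < 0:  # bracket the root
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if irregular_score(mid, f, n, groups) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical survey: 6 weekly occasions and 4 occasions 3 days apart
    lam_hat = irregular_mle(f=3, n=10, groups=[(7, 4, 6), (3, 1, 4)])
    print(round(lam_hat, 4))
```

With a single group the function reduces to the constant-interval case, and splitting one group into two with the same interval leaves the estimate unchanged.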
Approximate confidence intervals can be constructed for all estimators based on the likelihood-ratio test statistic (Hilborn & Mangel 1997: 162). The difference between the –2 log-likelihood at a given parameter value and its minimum, which is attained at the maximum likelihood estimate, approximately follows a χ² distribution with n_r degrees of freedom, where n_r is the number of fitted parameters. For example, with 1 degree of freedom (a single fitted parameter, λ, as for all estimators here), the lower and upper 95% (99%) confidence limits are the parameter values at which the –2 log-likelihood exceeds its minimum by 3·84 (6·63).
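A likelihood-ratio interval can be found by locating the rate on each side of the estimate at which −2 log L exceeds its minimum by the χ² cutoff. The Python sketch below does this for the fresh-sign estimator only, because that estimate has a closed form. The same root-finding applies to any of the negative log likelihoods above. The sketch assumes 0 < f < n (otherwise one limit is 0 or unbounded), and its function names are hypothetical.

```python
import math


def fresh_nll(lam: float, f: int, n: int) -> float:
    """Negative log likelihood for fresh signs (constants dropped)."""
    return -f * math.log(-math.expm1(-lam)) + (n - f) * lam


def _bisect(g, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Root of g between lo and hi, assuming g changes sign on [lo, hi]."""
    lo_nonneg = g(lo) >= 0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if (g(mid) >= 0) == lo_nonneg:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fresh_lr_interval(f: int, n: int, crit: float = 3.84) -> tuple[float, float]:
    """Likelihood-ratio interval for the rate from fresh signs (0 < f < n).

    The limits are where -2 log L exceeds its minimum by crit
    (3.84 for 95% or 6.63 for 99%, with one fitted parameter).
    """
    if not 0 < f < n:
        raise ValueError("this sketch needs 0 < f < n")
    lam_hat = -math.log(1.0 - f / n)
    nll_min = fresh_nll(lam_hat, f, n)

    def excess(lam: float) -> float:
        # positive outside the interval, negative inside it
        return 2.0 * (fresh_nll(lam, f, n) - nll_min) - crit

    lo = lam_hat / 2.0
    while excess(lo) < 0:  # the NLL goes to infinity as lam -> 0 when f > 0
        lo /= 2.0
    hi = 2.0 * lam_hat
    while excess(hi) < 0:  # the NLL grows linearly in lam when f < n
        hi *= 2.0
    return _bisect(excess, lo, lam_hat), _bisect(excess, lam_hat, hi)


if __name__ == "__main__":
    print(fresh_lr_interval(3, 10))             # approximate 95% limits
    print(fresh_lr_interval(3, 10, crit=6.63))  # approximate 99% limits
```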
To make this method as accessible as possible, we provide an Excel workbook, ‘DoubleTrack’, which contains a worksheet-based user interface and embedded macros that implement the numerical procedures described above. The workbook and a detailed description of how to use it to estimate visitation rates, visitation probabilities and confidence intervals for both regular and irregular time intervals between sampling occasions are available for download in the online appendix (see Supplementary material, Appendices S1, S2) and on the authors’ website [http://www.ufz.de/index.php?en=1902].
We used simulations to compare the performance of the different estimators. A Poisson arrival process was simulated to generate data sets using weekly sampling occasions (constant time intervals, t = 7). We varied the number of sampling occasions (n) from 2 to 42 (21 levels) and the daily visitation probability (P) from 0·05 to 0·95 (19 levels). Although the Poisson process is defined in terms of a rate, we used visitation probabilities in the following examples to maintain consistency with the literature (Marques et al. 1987; Tuyttens et al. 2001; Webbon, Baker & Harris 2004; Prokesova, Barancekova & Homolka 2006). The visitation rate can be calculated directly from the visitation probability using equation 2, i.e. λ = –ln(1 – P) for a one-day interval. All estimators showed an almost exact fit when more than 42 sampling occasions were used, so we restrict our analyses to a maximum of 42 weeks. The trivial cases P = 0 and P = 1, which all estimators recover exactly, were also omitted.
For each of the 21 × 19 = 399 scenarios (combinations of parameter values), we simulated r = 100 repetitions and calculated the relative bias, relative precision and relative accuracy of each estimator (Hellmann & Fowler 1999). We used relative performance metrics so that results could be combined meaningfully across all scenarios. For example, an absolute difference of 0·05 is much more severe when the true visitation probability is P = 0·1 than when it is P = 0·9.
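The following Python sketch generates one simulated data set. It assumes that sampling occasions are independent and that the parameter levels are evenly spaced (2, 4, …, 42 occasions; P = 0·05, 0·10, …, 0·95). The even spacing is an assumption. Arrivals are drawn from a Poisson process using exponential waiting times. A visit during the last day before a sampling occasion counts as a fresh sign, and a visit during the preceding t – 1 days counts as an aged sign. The names are hypothetical.

```python
import math
import random


def simulate_site(p_daily: float, n: int, t: int,
                  rng: random.Random) -> tuple[int, int]:
    """Return (f, a) for n occasions t days apart under a Poisson visit process."""
    lam = -math.log(1.0 - p_daily)  # daily probability -> rate (visits/day)
    f = a = 0
    for _ in range(n):
        fresh = aged = False
        time = rng.expovariate(lam)  # first arrival after the previous occasion
        while time < t:
            if time >= t - 1:
                fresh = True  # visit during the last day before sampling
            else:
                aged = True   # visit during the preceding t - 1 days
            time += rng.expovariate(lam)
        f += int(fresh)
        a += int(aged)
    return f, a


# Scenario grid (even spacing assumed): 21 x 19 = 399 combinations
N_LEVELS = list(range(2, 43, 2))
P_LEVELS = [round(0.05 * k, 2) for k in range(1, 20)]

if __name__ == "__main__":
    rng = random.Random(1)
    print(len(N_LEVELS) * len(P_LEVELS))  # 399
    print(simulate_site(0.3, 42, 7, rng))
```

Under this process the probability of a fresh sign on an occasion is P and that of an aged sign is 1 – (1 – P)^(t – 1), matching the windows used in the estimators.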
We calculated relative bias for each scenario as:

$$\mathrm{RB} = \frac{1}{r}\sum_{i=1}^{r}\frac{\hat{P}_i - P}{P} \tag{21}$$
where P is the true visitation probability and $\hat{P}_i$ is the estimated visitation probability in the ith of r simulations. The relative bias can be interpreted as a proportional deviation from the true value, i.e. a relative bias of +0·1 means that, on average, the estimator is 10% higher than the true value. A perfect estimator has a relative bias of zero, and the distribution of relative biases over all 399 scenarios should be symmetrical around zero.
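Relative precision is measured by the relative variance:

$$\mathrm{RV} = \frac{1}{r}\sum_{i=1}^{r}\left(\frac{\hat{P}_i - \bar{\hat{P}}}{P}\right)^{2}, \qquad \bar{\hat{P}} = \frac{1}{r}\sum_{i=1}^{r}\hat{P}_i \tag{22}$$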
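Relative precision measures how widely the estimates scatter around their mean, scaled by the true parameter value. Note that precision is expressed in a somewhat counterintuitive way: a value of zero indicates the highest precision. We combined relative bias and relative precision into a single quantity, relative accuracy (Hellmann & Fowler 1999). It is expressed as a relative mean square error (MSE), which equals the squared relative bias plus the relative variance:

$$\mathrm{RA} = \frac{1}{r}\sum_{i=1}^{r}\left(\frac{\hat{P}_i - P}{P}\right)^{2} \tag{23}$$

where $\hat{P}_i$ is the estimate in the ith of r simulations.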
A perfect estimator would result in a relative accuracy value of zero. The closer the relative accuracy of an estimator is to zero across the whole parameter space, the better its overall performance.
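The three performance measures can be computed from the r estimates of one scenario as shown in the Python sketch below. It assumes that the relative variance uses the divisor r rather than r – 1. With that divisor, the relative MSE is exactly the squared relative bias plus the relative variance. The function name and the example numbers are arbitrary.

```python
def relative_metrics(estimates: list[float], true_p: float) -> tuple[float, float, float]:
    """Relative bias, relative variance and relative MSE of r estimates of true_p."""
    r = len(estimates)
    mean_est = sum(estimates) / r
    rel_bias = sum((e - true_p) / true_p for e in estimates) / r
    rel_var = sum(((e - mean_est) / true_p) ** 2 for e in estimates) / r
    rel_mse = sum(((e - true_p) / true_p) ** 2 for e in estimates) / r
    return rel_bias, rel_var, rel_mse


if __name__ == "__main__":
    rb, rv, ra = relative_metrics([0.2, 0.3, 0.4], true_p=0.25)
    print(round(rb, 4), round(rv, 4), round(ra, 4))  # 0.2 0.1067 0.1467
    assert abs(ra - (rb ** 2 + rv)) < 1e-12  # accuracy = bias^2 + variance
```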