Spatial capture–recapture with random thinning for unidentified encounters

Abstract Spatial capture–recapture (SCR) models have increasingly been used as a basis for combining capture–recapture data types with variable levels of individual identity information to estimate population density and other demographic parameters. Recent examples are the unmarked SCR (or spatial count model), where no individual identities are available, and spatial mark–resight (SMR), where individual identities are available for only a marked subset of the population. Currently lacking, though, is a model that allows unidentified samples to be combined with identified samples when there are no separate classes of “marked” and “unmarked” individuals and when the two sample types cannot be considered as arising from two independent observation models. This is a common scenario when using noninvasive sampling methods, for example, when analyzing data on identified and unidentified photographs or scats from the same sites. Here we describe a “random thinning” SCR model that utilizes encounters of both known and unknown identity samples using a natural mechanistic dependence between samples arising from a single observation model. Our model was fitted in a Bayesian framework using NIMBLE. We investigated the improvement in parameter estimates from including the unknown identity samples, which was notable (up to 79% more precise) in low-density populations with a low rate of identified encounters. We then applied the random thinning SCR model to a noninvasive genetic sampling study of brown bear (Ursus arctos) density in the Eastern Cantabrian Mountains (northern Spain). Our model can improve density estimation for noninvasive sampling studies of low-density populations with low rates of individual identification, by making use of available data that might otherwise be discarded.


| INTRODUCTION
The estimation of population size using capture-recapture models is a standard approach in wildlife research and provides a rigorous quantitative method for informing species conservation and management (Williams et al., 2002). Traditional sampling requires the physical capture and artificial marking of individuals over multiple surveys to create encounter histories. Capture-recapture models then use the encounter histories to estimate the probability of capture and, by extension, the proportion of the total population that was captured (Otis et al., 1978). A key requirement of this approach is the individual identification associated with each capture event.
More recently, noninvasive sampling methods have made use of naturally occurring marks, whether in the form of distinguishing physical features that can be photographed (e.g., spot patterns) or DNA samples that can be passively collected and used to identify individuals.
The increased application of noninvasive sampling can be attributed to many factors, including logistical conveniences of data collection, improvements in technology, and animal welfare concerns (Long et al., 2008). Noninvasive sampling has been a particularly useful approach for monitoring wide-ranging mammals (e.g., carnivores), which occur at low densities and are otherwise difficult to physically capture over the large extents necessary for useful inferences.
One challenge associated with noninvasive sampling in the context of capture-recapture models is that natural marks, including genotypes, are often imperfectly observed. Photographs, especially from camera traps, may not provide a sufficiently clear and complete view of natural marks to verify individual identification for each encounter (Burton et al., 2015). Further, most wildlife species do not have individually unique features to allow for photo-identification methods, limiting more widespread application of capture-recapture models to camera trapping. Noninvasive genetic techniques can be applied to any species that deposits genetic material (e.g., hair or scat), which can be collected to extract DNA and identify individual genotypes (Waits & Paetkau, 2005); however, genetic sampling has its own challenges for establishing the individual identity associated with each sample (Augustine et al., 2019, 2020). The quantity and quality of DNA from noninvasive samples is typically very low, and extraction procedures often require extensive replication to improve genotyping accuracy (Taberlet et al., 1999). Environmental degradation can render any sample unusable for determining identity, either through a lack of DNA amplification or errors caused by contamination or random chance (Waits & Paetkau, 2005). Samples whose individual identities cannot be established with high confidence are typically discarded, and these can comprise a large portion of all samples collected. This data reduction lowers the precision of population parameter estimates; thus, statistical methods that can utilize these discarded samples are desirable.
Substantial progress has been made using spatial capture-recapture (SCR) as the basis for jointly analyzing samples with known and unknown individual identities. Chandler and Royle (2013) first introduced an SCR model with fully latent individual identities that can be applied to a collection of samples with no individual identities at all, demonstrating that there is information about the individual identities of samples contained in their spatial location of capture that can be utilized to estimate population parameters. While the estimates from this model are frequently of very low precision, Chandler and Royle (2013) demonstrated that this method produced much better estimates when the set of unmarked samples is combined with a set of samples from marked individuals with known individual identities. This approach combining data from marked and unmarked individuals was later termed "spatial mark-resight" (SMR; Sollmann et al., 2013), which has been quite successful for estimating population parameters for partially marked populations where a subset of individuals is either manually or naturally marked.
While SMR provides a framework for combining known and unknown identity samples, it is limited to the case where individuals can be separated into "marked" and "unmarked" classes and where "unmarked" individuals can never provide an individual identity.
However, for noninvasive sampling and natural marks, it is often the case that although all individuals are identifiable, any individual can produce either individually identified or unidentified samples. Thus, all samples could theoretically be identifiable, but not all end up so.
For example, individual identities in genetic capture-recapture are lost due to features of the individual DNA sample and the process of DNA amplification, not due to features of the individual. Further, when camera trapping a species where individuals are nearly equally identifiable (e.g., using flank patterns), identities are lost at the sample level due to poor animal orientation or photograph quality.
For both remote cameras and genetic sampling, the level of resulting data loss can be substantial. For example, Kendall et al. (2016) identified 382 samples as grizzly bear (Ursus arctos), among which 127 (33.2%) were identified at the individual level. Hooker et al. (2015) identified 9%-13% of hair samples from American black bear (Ursus americanus); Murphy et al. (2018)
With the hypothesis that the use of all data could improve the precision of the estimates, we describe an SCR model that combines identified and unidentified samples from a single class of individuals (as opposed to the "marked" and "unmarked" classification in SMR) using modified methods from Chandler and Royle (2013). The model allows for inference on abundance and distribution while accounting for uncertainty about the partially observed encounter histories.
We demonstrate the performance of this model relative to using the identified encounter histories alone via simulation and apply it to a large-scale noninvasive genetic sampling effort of brown bear (Ursus arctos) across a 2,624 km² region of the Eastern Cantabrian Mountains, northern Spain, where only 60% of the DNA samples provided an individual identity. We illustrate how noninvasive sampling studies can maximize the information used to provide population inferences from capture-recapture designs despite difficulties in determining individual identity.

| Model formulation
Our model is a standard SCR model with an additional random thinning process that determines which samples' individual identities are observed. The standard SCR model assumes that the activity centers of individuals $i = 1, 2, \ldots, N$ are distributed over a region or state space ($S$) and that individuals are exposed to sampling by a trap or detector array within $S$. The distribution of activity centers $\mathbf{s}_i = (s_{i1}, s_{i2})$ is typically described by a homogeneous point process, such that $\mathbf{s}_i \sim \mathrm{Uniform}(S)$, which we adopt here. Inhomogeneous point processes can also be used to model variation in the distribution of individuals (Borchers & Efford, 2008; Royle et al., 2014). The activity centers are latent variables to be estimated by the model given the trap-specific encounters for the $n$ observed individuals at traps $j = 1, 2, \ldots, J$ with locations $\mathbf{x}_j = (x_{j1}, x_{j2})$. Assuming that encounter frequencies are Poisson-distributed and a decreasing function of the distance $d_{ij}$ between activity center $\mathbf{s}_i$ and trap location $\mathbf{x}_j$, the expected encounter rate can be specified as:

$$\lambda_{ij} = \lambda_0 \exp\left(-\frac{d_{ij}^2}{2\sigma^2}\right)$$

Here, $\lambda_0$ is the expected encounter rate when $d_{ij} = 0$, indicating direct overlap of an activity center with a trap, and $\sigma$ is the scale parameter of the half-normal detection function. The expected encounter rate can vary as a function of trap- and individual-specific attributes, or by trapping occasion for sampling efforts with multiple occasions (e.g., $\lambda_{ijk}$ for $K > 1$).
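As a minimal sketch of the half-normal encounter rate described above, the following Python function evaluates the expected rate from an activity center, a trap location, a baseline rate, and a scale parameter. The function name, argument names (`lam0`, `sigma`), and numerical values are illustrative assumptions, not taken from the study.

```python
import math


def half_normal_rate(center, trap, lam0, sigma):
    """Expected encounter rate under a half-normal detection function.

    center, trap: (x, y) coordinates of the activity center and the trap.
    lam0: baseline encounter rate at zero distance (hypothetical value).
    sigma: scale parameter of the half-normal function (hypothetical value).
    """
    d2 = (center[0] - trap[0]) ** 2 + (center[1] - trap[1]) ** 2
    return lam0 * math.exp(-d2 / (2.0 * sigma ** 2))


# Illustrative values only: the rate equals lam0 at the trap and decays
# with distance from the activity center.
rate_at_trap = half_normal_rate((0.0, 0.0), (0.0, 0.0), lam0=0.5, sigma=1.0)
rate_one_sigma = half_normal_rate((0.0, 0.0), (1.0, 0.0), lam0=0.5, sigma=1.0)
rate_two_sigma = half_normal_rate((0.0, 0.0), (2.0, 0.0), lam0=0.5, sigma=1.0)
```

At a distance of one `sigma` the rate falls to `lam0 * exp(-0.5)`, which is one way to read the scale parameter in terms of animal movement.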
We assume the encounter histories for the $N$ individuals arise following $y^{true}_{ijk} \sim \mathrm{Poisson}(\lambda_{ijk})$, though other count distributions could also be used. The true encounter frequencies $y^{true}_{ijk}$ for the $n$ individuals with at least one detection are what would be observed if all samples were individually identifiable. Under a Bayesian approach to capture-recapture with unknown $N$, data augmentation can be used to estimate the number of unobserved individuals (Royle et al., 2007). We augment the $n$ observed encounter histories with $M - n$ "all-zero" histories, choosing a value of $M$ such that $M \gg N$.
The likelihood for the zero-inflated true encounter frequencies $y^{true}_{ijk}$ then takes the usual data-augmentation form, with an inclusion indicator $z_i \sim \mathrm{Bernoulli}(\psi)$ for each of the $M$ individuals and $y^{true}_{ijk} \mid z_i \sim \mathrm{Poisson}(z_i \lambda_{ijk})$, so that $N = \sum_{i=1}^{M} z_i$.

The process of assigning individual identities to samples in capture-recapture can be conceptualized as a random thinning process, where samples lose their individual identities at random with probability $1 - \theta$, where $\theta$ is the identification rate. This process produces two types of data sets, one with individual identities and one without. Hereafter, this approach will be called the "random thinning SCR model" (see Figure 1 and DAG in Appendix S1). Thus, the new feature of our model is a submodel for individual identification conditional on the true encounter frequencies $y^{true}_{ijk}$. We define $y^{ID}_{ijk}$ to be the identified samples from individual $i$ at trap $j$ on occasion $k$. Then, we assume:

$$y^{ID}_{ijk} \sim \mathrm{Binomial}\left(y^{true}_{ijk}, \theta\right)$$

The individual identities of the unrecognizable encounter frequencies $y^{noID}_{ijk}$ are then latent, and $y^{noID}_{ijk} = y^{true}_{ijk} - y^{ID}_{ijk}$. For the unidentified samples, we only observe the trap by occasion counts summed across captured individuals, $\mathrm{nnid}_{jk} = \sum_{i=1}^{N} y^{noID}_{ijk}$. Thus, the same individual could be in both encounter histories (identified and not) at the same trap on the same sampling occasion. Note also that individuals with unidentified samples are not required to also be in the set of identified samples.

We fit this model in NIMBLE (NIMBLE Development Team, 2019) using a custom Metropolis-Hastings update for $y^{true}_{ijk}$ that obeys the constraint $y^{noID}_{ijk} = y^{true}_{ijk} - y^{ID}_{ijk}$. This Metropolis-Hastings sampler was used because the full conditional distribution for $y^{true}_{\cdot jk}$ used by Chandler and Royle (2013) is no longer valid when some individual identities may be known for the same individuals that also have latent identity samples (Appendix S2), and there is no default function implemented in NIMBLE to do this. We provide two versions of this sampler, with and without the $K$ dimension; the version without it is faster and is suitable when there are no occasion-specific covariates or behavioral responses to capture (Appendix S2).
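To illustrate how a Metropolis-Hastings update can respect the constraint that unidentified samples at a trap always sum to the observed count, the sketch below moves one unidentified sample at a time between individuals. It is an illustrative Python sketch under stated assumptions, not the NIMBLE sampler of Appendix S2: the proposal scheme, the function names, and all numerical values are hypothetical, it handles a single trap and occasion, and it assumes an identification rate strictly between 0 and 1 and a positive encounter rate for every individual that currently holds samples.

```python
import math
import random


def log_joint(y_id, u, lam, theta):
    """log[ Poisson(y_id + u | lam) * Binomial(y_id | y_id + u, theta) ]."""
    y_true = y_id + u
    if lam <= 0.0:
        return 0.0 if y_true == 0 else -math.inf
    log_pois = y_true * math.log(lam) - lam - math.lgamma(y_true + 1)
    log_binom = (math.lgamma(y_true + 1) - math.lgamma(y_id + 1)
                 - math.lgamma(u + 1)
                 + y_id * math.log(theta) + u * math.log(1.0 - theta))
    return log_pois + log_binom


def mh_reassign_one(u, y_id, lam, theta, rng):
    """Propose moving one unidentified sample between two individuals.

    u[i] is the number of unidentified samples currently attributed to
    individual i at this trap; sum(u) is the observed count and is preserved.
    """
    m = len(u)
    if m < 2 or sum(u) == 0:
        return u
    # Pick one unidentified sample uniformly, so donor i has weight u[i].
    donor = rng.choices(range(m), weights=u)[0]
    recipient = rng.choice([i for i in range(m) if i != donor])
    log_target = (
        log_joint(y_id[donor], u[donor] - 1, lam[donor], theta)
        + log_joint(y_id[recipient], u[recipient] + 1, lam[recipient], theta)
        - log_joint(y_id[donor], u[donor], lam[donor], theta)
        - log_joint(y_id[recipient], u[recipient], lam[recipient], theta)
    )
    # Hastings correction: in the reverse move the recipient acts as donor.
    log_hastings = math.log(u[recipient] + 1) - math.log(u[donor])
    log_ratio = log_target + log_hastings
    if rng.random() < math.exp(min(0.0, log_ratio)):
        u = list(u)
        u[donor] -= 1
        u[recipient] += 1
    return u


# Hypothetical example: 4 individuals, 5 unidentified samples at one trap.
rng = random.Random(42)
y_id = [2, 0, 1, 0]            # identified samples per individual
lam = [0.8, 0.1, 0.5, 0.02]    # expected encounter rates (illustrative)
u = [5, 0, 0, 0]               # initial allocation of unidentified samples
for _ in range(2000):
    u = mh_reassign_one(u, y_id, lam, 0.4, rng)
```

Because every accepted move subtracts one sample from the donor and adds one to the recipient, the trap total is conserved by construction, so no proposal ever violates the constraint.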

| Simulation
We simulated 12 scenarios with 100 data sets in each scenario, at densities close to previous SCR density estimates for several species. We explored each case to assess the accuracy and precision of density estimates from the random thinning SCR model compared to a standard SCR model. We used two different trapping arrays, with higher density scenarios simulated on the smaller array, which requires a lower population size (N) to achieve the desired density (D), thus reducing computation time for the simulation study. We explored low population densities (individuals/unit²) equivalent to population size
For the simulated data sets, we calculated the number of individuals captured, number of captures with identification ("ID"), captures without identification ("non-ID"), number of recaptures, and number of spatial recaptures (mean and 95% confidence interval, see Appendix S3). Both the random thinning and standard SCR models were fit using NIMBLE (NIMBLE Development Team, 2019) in R (R Core Team, 2020) (Appendix S2). We fitted the random thinning model using both identified and unidentified samples, and we fitted the standard SCR models to the subset of encounter histories of identified individuals. In each case, we ran 3 chains of 50,000 iterations for the standard SCR models and 500,000 iterations for the random thinning SCR models, discarding 5,000 and 50,000 iterations as burn-in and thinning by 5 and 100, respectively. We compared the posterior mean, median, and mode as point estimates. We calculated the root-mean-square error (RMSE) and the relative bias (RB) using the package SimDesign (Chalmers, 2020) in R and compared the improvements in RMSE from using random thinning SCR models. We also calculated the coverage rates of the 95% highest posterior density (HPD) intervals for population size.
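The performance metrics named above can be sketched as follows, assuming the conventional definitions: RMSE as the square root of the mean squared error, relative bias as the mean error divided by the true value, and coverage as the fraction of intervals containing the true value. This is an illustrative Python sketch, not the SimDesign implementation, and the estimates and intervals are made-up numbers.

```python
import math


def rmse(estimates, truth):
    """Root-mean-square error of point estimates around the true value."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def relative_bias(estimates, truth):
    """Mean error expressed as a fraction of the true value."""
    return (sum(estimates) / len(estimates) - truth) / truth


def coverage(intervals, truth):
    """Fraction of (lower, upper) intervals that contain the true value."""
    return sum(lo <= truth <= hi for lo, hi in intervals) / len(intervals)


# Hypothetical point estimates and intervals for a true population size of 20.
true_n = 20
point_estimates = [18, 22, 20, 25, 15]
hpd_intervals = [(10, 30), (21, 28), (12, 22)]

rmse_n = rmse(point_estimates, true_n)
rb_n = relative_bias(point_estimates, true_n)
cover_n = coverage(hpd_intervals, true_n)
```

In a simulation study these functions would be applied once per scenario, across the point estimates and intervals from the replicate data sets.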

FIGURE 1
Graphical depiction of the random thinning spatial capture-recapture model. Random thinning SCR is a hierarchical model with two processes: ecological (population size and the locations $\mathbf{s}_i$ of individuals) and observation. In this model (as in standard SCR), the detection rate of each individual depends on (i) the Euclidean distance between the individual's location and the traps (centroids of the polygonal grid in the study case); (ii) the baseline detection rate ($\lambda_0$), which here depends on sampling effort (length of transect in each polygon); and (iii) the scale parameter ($\sigma$) of the half-normal detection function, which describes animal movement. In the observation process, we obtain two types of data: encounters with identification ($y^{ID}$) and non-ID data ($y^{noID}$), or counts. The random thinning SCR model uses ID data (in red) as in standard SCR to make inferences about population size and individuals' distribution (including nonobserved individuals, in gray), but also uses the counts (in orange) under the constraint $y^{noID} = y^{true} - y^{ID}$, using a Metropolis-Hastings algorithm in a mechanistic approach to make a probabilistic reconstruction of the true encounter frequencies ($y^{true}$), thus assigning identities to non-ID samples.
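The two processes in Figure 1 can be sketched as a small data-generating simulation: activity centers are placed uniformly, true encounters are Poisson with a half-normal rate, each encounter keeps its identity with a fixed probability, and unidentified encounters are collapsed to trap-level counts. This is a minimal Python sketch for a single occasion; the 12 × 12 grid and N = 20 echo one of the simulated scenarios, while the buffer, baseline rate, scale, and identification rate are hypothetical values chosen for illustration.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw a Poisson variate by Knuth's method (adequate for small rates)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_random_thinning(n_pop=20, grid=12, buffer=3.0,
                             lam0=0.5, sigma=1.0, theta=0.3, seed=1):
    """Simulate true, identified, and unidentified encounters (one occasion)."""
    rng = random.Random(seed)
    traps = [(float(c), float(r)) for r in range(grid) for c in range(grid)]
    lo, hi = -buffer, (grid - 1) + buffer
    # Ecological process: activity centers uniform over the state space.
    centers = [(rng.uniform(lo, hi), rng.uniform(lo, hi)) for _ in range(n_pop)]
    y_true = [[0] * len(traps) for _ in range(n_pop)]
    y_id = [[0] * len(traps) for _ in range(n_pop)]
    n_noid = [0] * len(traps)  # unidentified counts, summed over individuals
    for i, (sx, sy) in enumerate(centers):
        for j, (tx, ty) in enumerate(traps):
            d2 = (sx - tx) ** 2 + (sy - ty) ** 2
            lam = lam0 * math.exp(-d2 / (2.0 * sigma ** 2))
            # Observation process: Poisson encounters, then random thinning.
            y_true[i][j] = poisson_draw(lam, rng)
            y_id[i][j] = sum(rng.random() < theta for _ in range(y_true[i][j]))
            n_noid[j] += y_true[i][j] - y_id[i][j]
    return y_true, y_id, n_noid


y_true, y_id, n_noid = simulate_random_thinning()
total_true = sum(map(sum, y_true))
total_id = sum(map(sum, y_id))
total_noid = sum(n_noid)
```

Only `y_id` and `n_noid` would be available to the analyst; `y_true` is returned here solely so that the relationship between the three quantities can be checked.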

| Application
The brown bear (Figure 2) is considered an "endangered species" under Spanish law (Real Decreto 139/2011, 2011) and is a "priority species" and a "species of community interest in need of strict estimates using spatial capture-recapture models. They fit standard SCR models, discretizing the space with a grid of hexagonal polygons and using the track length in each polygon as an effort covariate for the baseline detection rate $\lambda_0$. The full details and data of this study can be found in Lopez-Bao et al. (2020). We used the SNP data and fit the random thinning SCR model. We also fit a standard SCR model using the known ID encounters without incorporating the unknown ID detections.

| Simulations
In the simulations, the improvement in population size estimates from the random thinning SCR model was greatest in low-density scenarios and increased as the rate of individual identification decreased (Figure 4 and Appendix S4). In high-density scenarios, the improvement in relative bias is minimal (Figure 5). Sigma was less biased using the random thinning SCR model at low and medium density, but there was no improvement at high density (Figure 5 and Appendix S4). Coverage of 95% HPD intervals was close to the nominal values in all cases (Appendix S4).
In scenarios with low density (D = 0.1, N = 20) and $\theta = 0.1$, the simulated data sets resulted in no spatial recaptures in the SCR data with individual identities for 33/100 data sets. In these cases, the random thinning SCR model was unbiased and worked correctly in situations where using a standard SCR model would be inadvisable. In such cases, the full data sets (prethinning) included spatial recaptures, which were lost in the random thinning process but remained in our data as unidentified capture events. These latent identity observations were used in the random thinning SCR model to probabilistically reconstruct latent spatial recapture events, allowing parameters to be estimated with minimal bias (RMSE = 3.83 and RB = −0.01), whereas the spatial scale parameter was estimated with strong bias in the reduced data. Similarly, for simulated data sets with only one spatial recapture (36 cases) using standard SCR, RMSE = 11.61 and RB = 0.31.
According to our simulations, the random thinning SCR model performed well when density was low. For example, using a 12 × 12 grid of detectors and N = 20, even when the range of individuals detected was 4-10, 4-14 individuals were identified, 0-5 individuals were recaptures (0-4 spatial) and 87-143 detections could not be identified, and the population size error was quite low (RMSE = 3.5) (Appendices S3 and S4).
In summary, our simulations showed the random thinning SCR model yielded higher posterior precision than the standard SCR model at low density; at medium density the improvement was moderate, and at high density, there was no improvement over standard SCR models. In all cases, the improvement increased as the individual identification rate ($\theta$) decreased.

| Brown bear application
In applying the model to the brown bear data set, we used the posterior median as the point estimate (appropriate at low density and high identification rate; see Appendix S4). The estimate of bear density was slightly (0.8%) lower for the random thinning SCR model (1.019 ± 0.172 bears/100 km²) than for the standard SCR model (1.027 ± 0.195 bears/100 km²; Table 1). By incorporating the unknown ID encounters (see Figures 1 and 3) into the random thinning SCR model, the CV of the density posterior was reduced by 11.8%.
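As a rough arithmetic check on these figures, the sketch below recomputes the coefficient of variation (CV) from the rounded point estimates and standard deviations quoted above, assuming CV is the posterior standard deviation divided by the point estimate. Working from rounded values gives a CV reduction of about 11%, in line with the reported 11.8%; the small gap presumably reflects rounding or a slightly different CV definition, which is an assumption on our part.

```python
def coefficient_of_variation(sd, point_estimate):
    """CV taken here as standard deviation over the point estimate."""
    return sd / point_estimate


# Rounded density summaries quoted in the text (bears/100 km^2).
scr_estimate, scr_sd = 1.027, 0.195   # standard SCR model
rt_estimate, rt_sd = 1.019, 0.172     # random thinning SCR model

cv_scr = coefficient_of_variation(scr_sd, scr_estimate)
cv_rt = coefficient_of_variation(rt_sd, rt_estimate)

# Percent change in the density estimate and in its CV.
density_change_pct = 100.0 * (scr_estimate - rt_estimate) / scr_estimate
cv_reduction_pct = 100.0 * (cv_scr - cv_rt) / cv_scr
```

The same two-line calculation applies to any other parameter in Table 1 for which a point estimate and standard deviation are reported.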
The movement parameter ($\sigma$) was also smaller (13.6%) for the random thinning SCR model than for the standard SCR model, and the CV was reduced by 31.6%. The increased baseline encounter rate, $\lambda_0$, logically reflected the additional encounters used by the random thinning SCR model (Table 1).

| DISCUSSION
We demonstrated that the integration of unknown ID encounters with known ID encounters in an SCR modeling framework can improve the precision of density estimates while making use of available data that are often discarded in studies using noninvasive sampling techniques. A key aspect of the random thinning model that we proposed is that there is a natural mechanistic dependence between the unknown ID encounters and the known ID encounters in that they can arise from the same visitation of an individual to a location. This model structure allows for other count distributions (e.g., negative binomial) to be substituted for the Poisson encounter model without introducing dependence between sample types, and allows for variable thinning rates (discussed below).
Wildlife surveys that use remote camera trapping or passive/active collection of genetic material commonly produce encounter data that can be attributed to species at a much higher rate than to individuals. Approaches to population size estimation using capture-recapture, including the SCR model extensions (Royle et al., 2014), have traditionally required certainty in the assignment of identity to encounter data. This limitation can result in discarded information, which in other forms might be useful for modeling species distributions and/or habitat associations (e.g., Long et al., 2011). We illustrated that the inclusion of species detections (i.e., unknown ID encounters) provided useful gains in precision even under a basic model structure; further model complexity involving relationships between spatial environmental covariates and density or encounter probability (e.g., Royle et al., 2013; Sutherland et al., 2015) would be expected to benefit similarly from the additional information that random thinning SCR uses.
In the brown bear application, density was very low but the proportion of identified samples was high enough (~0.60) that a standard SCR model using only known ID encounters was able to provide a reasonable estimate of density, but even in these cases
The random thinning SCR model can be seen as an intermediate model between unmarked SCR ($\theta = 0$; Chandler & Royle, 2013) and standard SCR ($\theta = 1$). Unlike the standard SMR model, the random thinning SCR model described here does not separate individuals into "marked" and "unmarked" classes. Note that this is the same "random thinning" model previously used by Jiménez, Chandler, et al. (2019), where it was applied to only a subset of individuals that were marked in an SMR framework, whereas we apply it to all individuals. We consider that all individuals may produce identified and unidentified samples, and the process of individual identification occurs with a success rate $\theta$ such that observed data with individual identities are a thinned version of the true encounter histories.
The success rate can be a function of the random failure to map an encounter to an individual (e.g., poor quality photograph or DNA) or a deterministic decision based on study design considerations (e.g., Chandler & Clark, 2014).
Variation in the thinning rate could arise in a variety of ways.
Individual-, trap-, and occasion-specific covariates or random effects can be modeled on $\theta$. For example, individuals might vary in their identifiability in camera trap photographs, or DNA amplification might be a function of occasion or trap by occasion weather covariates. In fact, our random thinning model could potentially be used for studies where only a subset of naturally marked individuals are identifiable, which are currently analyzed using SMR. This approach would obviate the need to classify photographs as belonging to "marked" or "unmarked" individuals, which is often difficult to determine using natural marks in photographs, but at the expense of reduced information in the unidentified observations (mark status).  (2017) it is not a problem. Another difference of our random thinning SCR model is that by conditioning on the latent encounter history, the model can accommodate a behavioral response to capture.

TABLE 1 Posterior summary statistics for both a standard spatial capture-recapture (SCR) model and the random thinning SCR model for a bear population in a 2,624 km² (and 6 km buffer) region in the Eastern Cantabrian Mountains in Spain
Note: We used the posterior median for all parameters, and presented the standard deviation and 95% Bayesian credible interval (BCI). $\hat{D}$ is population density (individuals/100 km²); $\hat{\alpha}_0$ is the intercept for effort; $\hat{\alpha}_2$ is the slope for effort; $\hat{\alpha}_3$ is the quadratic parameter for effort; $\hat{\theta}$ is the identification rate; $\hat{\psi}$ is the parameter for data augmentation; and $\hat{\sigma}$ is the scale parameter for the half-normal distribution, related to movement of animals.
Computation is slow for the random thinning SCR model: even the faster NIMBLE version without temporal variation in $\lambda_0$ or $\theta$ is slower to converge than the standard SCR model (which needs 50,000 iterations with 3 chains, requiring 20-30 min on a 2.5 GHz computer). Random thinning SCR requires a high number of iterations, especially at low rates of identified events (typically 500,000-1,000,000 iterations with a thinning of 100 for $\theta = 0.1$, which requires 3-6 hr). Mixing in NIMBLE can be improved and runtime reduced by changing the default update chosen by NIMBLE for the activity centers (Appendix S2).
According to our simulations (Appendices S3 and S4), the random thinning SCR model outperformed the standard SCR model when density and $\theta$ were low. In high-density scenarios with high identification rates, there is almost no improvement in precision from using the random thinning SCR model, and it is advisable to discard the unidentified samples and fit standard SCR models to the observed encounter histories (though these samples may still be useful in a
individual identification under 0.5 are common (e.g., Aziz et al., 2017; Hooker et al., 2015; Kendall et al., 2016; Molina et al., 2017; Murphy et al., 2016, 2018; Ngoprasert et al., 2012; Sun et al., 2017), and the random thinning model may therefore be widely applicable.

ACKNOWLEDGMENTS
This research has received financial support from the Spanish for providing the original brown bear data set from the Eastern Cantabrian Mountains (Spain). We thank Sarah Young for English editing. We also thank Cat Sun for her constructive review of an earlier version of this manuscript, as well as two anonymous reviewers for constructive reviews that significantly improved the manuscript.
Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

CONFLICT OF INTEREST
None declared.