Response to ‘Improving abundance estimation by combining capture–recapture and presence–absence data: Example with a large carnivore’

1. A key area of research in ecological statistics involves combining data sources from multiple streams to improve population estimates. One such model attempts to integrate capture–recapture and presence–absence data to estimate the population size of Eurasian lynx in the Jura Mountains, eastern France. 2. This model has been observed to underestimate population sizes. We conducted an extensive simulation study to evaluate the model's performance. We describe our methods for generation of simultaneous capture–recapture and presence– absence data and demonstrate that the model is flawed. 3. Finally, we give an outline of our hypothesis on why the model underestimates population sizes.

accurate estimates for population parameters from sparse data (e.g.Morera-Pujol et al., 2023;Murphy et al., 2018).We focus on one approach for combining presence-absence and mark-recapture data suggested by Blanc et al. (2014) and show that the method suffers from a conceptual flaw which leads to erroneous estimates of the size of a population except in extreme circumstances.We offer this as a warning to both the statistical and wildlife management communities and emphasize that care must be taken when combining data collected by different means.
The first methods we are aware of to combine data on the same population collected from different sources were the integrated population models (IPMs) of Besbeas et al. (2002).The specific model combined census data with ring-recovery data using the Kalman filter to construct a joint likelihood.The authors noted that the same approach could be applied to combine census data with capture-recapture data, and Besbeas et al. (2005) extended the model using multivariate normal approximations to combine the likelihoods from separate data sources.In a recent overview of IPM, Zipkin and Saunders (2018) argued that local-scale studies are often not broad enough to cover an entire population and make accurate estimates of abundance.They conclude that IPMs make the best use of all the available data by borrowing information from other submodels to produce the most accurate population estimates.
More recently, variations of the IPM have been proposed to combine different types of data with different statistical methods.Renner et al. (2019) combined data from three sources to estimate the abundance of Eurasian lynx in the Jura Mountains, eastern France: sightings of individuals, interference with livestock and pictures of lynx taken by camera traps.The combined model improved population size estimation by incorporating an area-interaction into the likelihood (Renner et al., 2019) but required the assumption that occupancy is fixed (i.e.sites are always occupied or always unoccupied).The model of Renner et al. (2019) also incorporates spatial dependence to avoid the assumption that observations at different locations are independent.Strebel et al. (2022) estimated abundances for two species of bird in Switzerland, the common nightingale in the lowlands and rock bunting birds in the Southern Alps, by combining data from four sources without any marked individuals through a binomial N-mixture model: count data, presenceabsence data, presence-only data, and absence-only data.Strebel et al. (2022) found that their approach improved estimates when any single source of data is sparse, but cautioned against the use of some sources, particularly presence-only data, as bias might be introduced if some sites are preferentially sampled.Specific to combining capture-recapture with presence-absence data, Conroy et al. (2008) considered a two-stage experiment in which an occupancy study is conducted first and then followed by mark-recapture restricted to regions in which the species of interest was detected.Their specific application assumes that individuals do not move between the regions, so that the mark-recapture samples are independent.Jiménez et al. (2022) combined presence-absence data with capture-recapture data to estimate the abundance of stone marten.In this case, the data all came from camera traps.
However, only a portion of the population had sufficient markings to be individually identified meaning that some photographs can be connected to construct capture histories while others provide only presence-absence data.More generally, Chandler and Clark (2014) developed a model that explicitly combines mark-recapture and presence-absence data by modelling the movements of all individuals within a population.Latent variables are introduced to match individuals, whether marked or unmarked, with each detection in the presence-absence data and the model is fit in the Bayesian framework using Markov chain Monte Carlo to sample from the posterior distribution including these variables.The specific application is somewhat limited in that only one type of data is collected on each occasion (e.g.alternating between mark-recapture and occupancy studies).The authors also note that the method is 'computationally demanding, especially when the number of individuals in the population is large' (Chandler & Clark, 2014, pg. 1359).All of these authors note that the inclusion of presence-absence data produced more accurate and precise estimates of the population size than the capture-recapture data alone.However, it is clear that further development in this area is required to create widely applicable methods that overcome the computational complexities.
Here, we focus on the method developed by Blanc et al. (2014) to estimate the size of a population from combined presence-absence and capture-recapture data.The method attempts to connect the two sources of data by equating the probability that a site is occupied and the probability that the population size is greater than zero.The researchers applied their method to estimate the population size of Eurasian lynx in the Jura Mountains, eastern France.
Capture-recapture data were obtained using camera traps, and individuals were identifiable by natural marks.Observers also collected separate presence-absence data through extensive surveys of hair samples, tracks, scat, and/or livestock and wildlife killed.A total of nine unique individuals were identified during the study, and the final estimate of the population size was 10, after rounding, (95% CI: 9.0; 13.0).In comparison, their model of the capture-recapture data alone produced an estimate of 14.46 (95% CI: 9.0; 35.0) and previous spatially explicit capture-recapture analysis by a subset of the authors (Blanc et al., 2013) produced an estimate of 12.04 (95% CI: 9.0; 18) individuals.However, Blanc et al. (2014) did not provide results from either a mathematical analysis of the model or a simulation study to assess the performance of their combined data approach.
We believe that Blanc et al. (2014)'s method is flawed and systematically underestimates population sizes.In a previous study that included this method, Jahid et al. (2022) compared results from six methods to estimate the population size of grizzly bears in the Rocky Mountains, Alberta, using camera trap (presence-absence) data, hair trap (DNA capture-recapture) data or both.These six models were: 1. Closed capture-recapture model, which is the traditional approach of Otis et al. (1978).Blanc et al. (2014).

Combined model of
3. Spatially explicit capture-recapture (SECR) model from detection data of the hair traps of Efford (2004).5. Model SECR with sex as a covariate.
6. Model SECR-O with sex as a covariate.
In their analysis, Jahid et al. (2022) found that the method of Blanc et al. (2014) produced results that were very different from the other methods with a much lower estimate of abundance and a very small posterior standard deviation.The capture-recapture data contain records of 37 individually identified bears over 8 occasions.Of these, 19 were captured only once, and 13 were captured

| Simulation study
The method developed by Blanc et al. (2014) starts by assuming that the population over the entire study area N is a realization of a Poisson distribution with rate parameter .The probability that a new population would have size n is Next, Blanc et al. (2014) model presence-absence at each site with a Bernoulli random variable.They let Z i = 1 and Z i = 0 to indicate if site i is occupied or not occupied, respectively and denote the probability that a site i is occupied as To combine the two data types, Blanc et al. (2014) equate the probability that a site is occupied with the probability that there is at least one animal in the overall population: Assuming that N follows (1), Blanc et al. (2014) conclude that: Using (2), they define: Solving for , they obtain: This is the key equation of Blanc et al. (2014), which connects the capture-recapture and presence-absence components of the model together.The problem is that it operates across two different scales by linking site-specific presence-absence, i , with the overall population size, N. As discussed in Efford and Dawson (2012), the link between presence-absence of individual sites and the total abundance depends heavily on the movement of individuals and the relative sizes of the sampling sites and the individuals' ranges.If individuals have large ranges and move quickly then they may leave signs in multiple sites in a single sampling period.In the limit as either the animal speed or the length of the period increases, only one individual would be needed to occupy all sites on a single occasion.At the opposite end of the spectrum, individuals will only occupy one site on each occasion if their home ranges are small relative to the size of the sampling sites or if they move slowly.
We have assessed the model of Blanc et al. (2014) by conducting an extensive simulation study.On each replicate, we simulated trap locations and a population of individuals, and derived both the capture-recapture and presence-absence data from these individuals.The first step to generating spatial capture-recapture and presence-absence data is defining home ranges.As in Efford et al. (2000), we assumed animals in a population tend to stay within a certain range, denoted as their home range.We assumed the study area to be a unit square for simplicity.Given a fixed population size (N), we used a Poisson process to generate home-range centres X i , Y i , where X i , Y i iid ∼ Uniform(0, 1) for individ-

| Generating capture-recapture data
Capture-recapture data were simulated to mimic camera traps.
Individuals interact with the camera traps and are captured based on probabilities.Given a fixed number of cameras (m 1 ), we generated their locations in the study area, ∼ Uniform(0, 1) for camera traps j = 1, 2, … , m 1 .We assumed that the probability an individual is detected by a camera is related to the distance between a home-range centre and a camera location as well as the amount of movement the species exhibits.
Specifically, we implemented a half-normal detection function so that the probability individual i is detected by trap j on each occasion is where p ∈ (0, 1) is the baseline capture probability for a camera trap at distance 0 from the home-range centre,  > 0 is the movement parameter for the species, and d ij is the distance between the home-range centre i and the camera location j.When is large, animals move throughout the entire study area and in the limit as (1) ( (5 → ∞ p ij → p for all i and j.On the other hand, when is close to zero, animals do not move significantly from their home-range centres and p ij ≈ 0, unless the location of camera j happens to coincide with the home-range centre.Once all the camera traps are placed, we simulate the interactions of individuals with the camera traps.We denote the number of capture-recapture sampling occasions by K 1 and generate a three-dimensional array in which cell i, j, k identifies whether individual i was captured by trap j at time k.This three-dimensional array contains the capture events

| Generating presence-absence data
Presence-absence data generation starts with dividing the study area into regions.We do this simply by dividing the square study area into a grid of equally sized square cells.Following the terminology of Blanc et al. (2014), each grid cell is henceforth referred to as a site.We let m 2 denote the number of sites and K 2 the number of times each site is searched for signs of presence.To model the presence-absence observations, we assume that the number of signs left by an individual within a site follows a Poisson distribution dependent on the proportion of time the individual spends within the site, ij , and the average number of signs it deposits per sampling period, .Let Z ijk be the number of signs that individual i deposits in cell j during period k and D ijk = 1 Z ijk > 0 .Assuming that animals move about their home ranges according to a circular bivariate normal distribution with variance 2 (movement parameter squared), the proportion of time that individual i spends within cell j is where x 0j , y 0j , x 1j , y 1j denote the bounds of the cell.The mean number of signs the individual deposits in cell j is then equal to ij so that and The final indicator of presence-absence in cell j during period k is then which follows a Bernoulli distribution with Note that the probability of detecting presence within a cell increases as either the number of signs deposited ( ) or the proportion of time individuals spend within the cell ( ij ) increase.Blanc et al. (2014) provided supplementary code to implement their model in JAGS (Just Another Gibbs Sampler) using the R package rjags as an interface (Denwood, 2016).We used R version 4.1.2(R Core Team, 2021) to implement our simulation.We compared the averaged population sizes with the true population sizes by computing the bias and mean square error of the point estimates (posterior mean), the bias of the posterior standard errors, and the coverage probability and width of the 95% credible intervals.We defined standard deviation bias to be the difference between Blanc et al.'s (2014) model output posterior standard deviations for the population size and our own calculated standard deviations of the population estimates based on all simulation replicates.We define the coverage probability to be the percentage of cases where the true population value was within the 95% credible interval reported by the model output.The generated capture-recapture and presence-absence data were used as input for the supplementary code from Blanc et al. (2014).Parameters for each simulation scenario were the true population size (N), movement parameter ( ), number of camera traps in the study (m 1 ), number of capture-recapture sampling occasions (K 1 ), number of presence-absence sampling occasions (K 2 ), and the number of sites in the study area (m 2 ).

| Testing the model
We present two case studies, A and B. The aim of case study A was to determine if the performance of the estimates from the model of Blanc et al. (2014) depends on the true population size, N. To study this, we varied N, , m 1 , and m 2 , while holding p, , K 1 , and K 2 as constants.We formed different scenarios from all possible combinations of the parameter values in Table 1.For case study B, our goal was to determine if the behaviour of the model changed from case study A by varying p and .Case study B only had two different values of N, while varying , p, and ; and holding K 1 , m 1 , m 2 , and K 2 as constants.We formed different scenarios from all possible combinations of the parameter values in Table 2.In both case studies, we generated 50 replicate data sets per scenario.Note that

TA B L E 1
Case study A parameter settings to assess the effect of N, , m 1 , and m 2 on the population size estimates.

| RE SULTS
The first key result of our study is that the estimate of the population size depended only on the number of individuals captured at least one time; not the true population size, how often individuals were captured, or the value of any of the parameters (see Figure 1).In fact, the estimate of the population size was almost exactly linearly related to the number of animals captured with correlation = 0.999 . This held even across different scenarios with very different numbers of individuals captured but the same true population size.The estimates of the population size did approach the truth as the capture probability increased to 0.75, but this occurred only because the probability that the entire population was captured at least once became very close to 1 (effectively, the population was censused).
Figure 2 shows that the same result held as increased from 0.15 to 0.55.Increasing animal movement ( ), increases the probability that all individuals are captured at least once, and eventually, the estimate coincides with the truth.Varying the mean number of signs deposited per period ( ) did not change the near one-to-one correspondence of the number of captured individuals and the population estimate from the model (Figure 3).Our conclusion is that the method of Blanc et al. (2014) always underestimates the true population size except for extreme situations in which all individuals are captured.
Our second key conclusion is that the estimate of the population size is completely independent of the parameters that control how signs are deposited.Figure 4 shows how the mean estimated population size varies with , , and p.The results show that the mean estimate is completely determined by and p and does not vary at all with .This implies that the estimate of N does not depend on the parameters of the presence-absence surveys once the parameters controlling the mark-recapture portion of the study are set, and, hence, that the estimate of abundance is unaffected by the presence-absence data.
Credible intervals and estimates of uncertainty based on the method of Blanc et al. (2014) also performed very poorly.The coverage of the 95% credible intervals increased systematically with when p was large enough for the entire population to be captured (p > 0.5) ranging from a value of only 0.1 when was small to almost 1.0 when was large (see Figure 5).Similarly, posterior standard deviations can vastly underestimate the true uncertainty in the estimate of the population size (see Figures A1 and A2 in Web Appendix 1 in the online Supporting Information).The posterior standard deviation underestimated the standard deviation of the estimates of population size from the simulated data sets in all scenarios, and the magnitude of the mean relative bias was as high in magnitude as 0.95.The bias was generally reduced as p and both increased, but the effect was small and the mean relative bias remained below −0.5 in all cases.

| DISCUSS ION
The results of our simulation study show that the estimates of abundance produced by Blanc et al.'s (2014)    we would expect that ≈ 10.Equation (6) would then imply that i = 1 − e −10 = 1 − .00005.This is so close to 1 that we would expect every site to be occupied.Since i is a monotone increasing function of , the situation only becomes worse as N increases.
Failing to observe signs in any site is so unlikely in this case that even one site with no presence-absence signs found in one period forces i and to be as small as possible.Since is bounded below by the number of unique individuals captured, we essentially end up with = n unless signs are detected at every site on every occasion.
Although Blanc et al. (2014) do discuss the use of SECR models, it is in reference to a violation of the assumption about the total abundance, N, being a Poisson random variable.In particular, they suggest that their model may underestimate the variance of N if abundance does not follow a 'homogeneous Poisson random variable' (Blanc et al., 2014(Blanc et al., , p. 1738)).This statement seems to confuse two concepts: that of a Poisson random variable and that of a Poisson process.Homogeneity refers to whether the intensity of a Poisson process is constant over space or time.In fact, the overall abundance generated by a non-homogeneous Poisson F I G U R E 3 Population estimate versus number of observed individuals (number captured).Parameters were selected from case study B and varied by the mean number of signs deposited per survey period ( ).A slope 1 line is plotted.process is still a Poisson random variable, so modelling the distribution of individuals across space as suggested would not affect the distribution of N.This means that incorporating SECR to model the distribution of animals will not solve any problems relating to the Poisson assumption.Moreover, as our simulation shows, the approach does not work even when the distribution of animals is homogeneous.Unmentioned by Blanc et al. (2014) is the issue of spatial scales, with the occupancy likelihood defined over site yet the capture-recapture model collapsed over all sites.It seems they did not recognize the spatial-scales issue even though they report an estimated abundance of 9.96 for their combined model and 14.46 [95% CI: 9.0; 35.0] for the CR model alone.We note that the lower bound of the interval estimate is 9.0, the number of observed individuals.Thus, clearly, Blanc et al. (2014) were encountering the issue of their integrated model returning the number of observed individuals, but they did not explore the underlying cause as we have done here.Blanc et al. (2014) are correct that the problem of combining a SECR model likelihood with an occupancy model likelihood is still open (Chandler & Clark, 2014).Our simulation study demonstrates that parameter estimates can be erroneous if proper care is not taken to ensure that spatial scales are handled appropriately.In cases such as the Eurasian lynx, spatial information should be incorporated and a SECR-type model should be employed.More care is needed when implementing new models; we need to ask whether it makes sense to combine data types especially when they concern different spatial scales.If models such as Blanc et al.'s (2014) are used without such care, there could be repercussions on conservation decisions or future study designs.
Our results show that Blanc et al.'s (2014) approach systematically underestimates population sizes which could misdirect resources and neglect areas that need support, potentially jeopardizing the survival of endangered species.Additionally, our simulation methodology can serve as a tool for researchers developing and evaluating novel population estimation models that integrate spatial capture-recapture or presence-absence data.Researchers must carefully evaluate any novel population estimation model that integrates capture-recapture or presence-absence data before their application in actual conservation scenarios.Integrating multiple data sources to improve population estimates is a promising avenue for future research.However, a cautious approach that prioritizes methodological rigour and considers both model assumptions and the potential pitfalls of combining data sources is paramount for effective conservation efforts.
detection data from hair traps and detection from camera traps model (SECR-O) fromTourani et al. (2020).
twice.The estimate of the population size from the method of Blanc et al. (2014) was only 40.00 (SD = 1.56) while the capture-recaptureonly model estimated 73.36 bears (SD = 31.84),and the SECR model estimated 380.28 bears (SD = 87.92).Although, Jahid et al. (2022) questioned the validity of the estimate from Blanc et al. (2014)'s model, they did not identify the source of the difference or determine whether this was due to a flaw in the model or a violation of assumptions specific to the data set.We explain the reason for the discrepancy and show mathematically and through simulation that Blanc et al. (2014)'s method contains a systematic error that leads to erroneous estimates of abundance.
model do not accurately reflect the true population size.The estimate of abundance was always equal to, or just slightly greater than, the number of individuals captured during the study.This quantity was determined by both the true population size and the overall probability that an individual was captured during the study, which was determined by the movement parameter, , the baseline capture probability p and the number of capture occasions K 1 .The number of individuals captured during a study must always be less than the true population size, barring violations of the basic mark-recapture assumptions like tag-loss, and so the method ofBlanc et al. (2014) was severely negatively biased in most of the scenarios.Further to this, the uncertainty was also vastly underestimated.The posterior standard deviations showed severe negative bias in all scenarios, Case study B parameter settings to assess the effect of p and on the population size estimates.and the 95% credible intervals computed from Blanc et al. (2014)'s model failed to capture the true population size adequately.For the scenarios in which p < 0.5 and ≤ 0.55 the coverage was as low as 0% and increased only as high as 75%, never reaching the nominal coverage rate (Figure 5).The only scenarios in which Blanc et al.'s (2014) model did provide accurate estimates of the population size were those in which the movement parameter was large, = 0.55; meaning, effectively, that the study comprises only one site because individuals move through the entire area.However, the probability that individuals are captured is almost 1 in these scenarios, meaning that almost all individuals are captured and the population is effectively censused.Moreover, the model still did not provide valid estimates of the uncertainty in these scenarios providing posterior standard deviations that still underestimated the true standard deviation of the estimator and credible intervals that failed to meet the nominal coverage probability.Our other finding is that, in contrast to the aim of Blanc et al. (2014) which is to combine information from mark-recapture and presence-absence data, the presence-absence data does not contribute to the estimation of the population size.The number of presence detections does show a positive relationship with the estimate of population size, but this is driven simply by the relationship between the different types of data.Increasing the movement F I G U R E 1 Correlation of the estimated population size versus number of observed individuals (number captured) during the study period for selected parameter sets from case studies (A and B).(a) Had: N = 200, m 1 = 5, m 2 = 100.(b) Had: N = 500, m 2 = 5, m 2 = 64.Both (a) and (b) had: movement parameter ∈ (0.15, 0.25, 0.35, 0.45, 0.55), number of capture-recapture sampling occasions K 1 = 7, and number of presence-absence sampling occasions K 2 = 5.Each point on (a) and (b) represents a value of .
parameter, , increases both the probability that individuals are captured and the number of sites with signs of presence since individuals that move further will leave signs in more sites.Beyond this, the estimates of abundance show no relationship with the parameters that determine the number of signs an individual deposits for each presence-absence survey, .If the presence-absence data were contributing to the estimate, then we would expect the distribution of the estimator to change, and hopefully improve, as increases and more presence signs are found.Since this does not occur, we conclude that the relationship between the number of presenceabsence detections and the estimate of abundance is being driven purely by the relationship of both to the number of individuals captured and that the presence-absence data does not contribute at all.Furthermore, varying the number of sites, number of capturerecapture sampling occasions, number of presence-absence sampling occasions, and the number of camera traps as in case study A and case study B did not affect the outcome of the model.The population size estimates were always just above the number of animals captured.Why does the model produce these results?As mentioned in Section 2.1, the problem is that two different scales are being equated in Equation 6.The mean abundance for the population F I G U R E 2 Population estimate versus (top) and credible interval width versus (bottom).Parameters selected from case study A. True population size was set to 200 with capture probability set to p = 0.75, number of camera traps m 1 = 5, and number of sites m 2 = 100.The dotted red line in the top panel represents the true population.over the entire study area, , is related with the probability that a single site is occupied, .The equation does not take into account animal movements across sites or the size of the sites.Blanc et al.'s (2014) model does account for the fact that sampling of the presence-absence data happens over a period of time, not instantaneously, so that individuals may leave signs in multiple sites during a single period and will become occupied by every individual given enough time(Efford & Dawson, 2012).To understand why the estimator of N essentially equals the number of captured animals consider the following.Since there is only one realization of N, the overall abundance, we would expect ≈ 10.If, for example, N = 10 , a rather small population size, then

F
Mean population size estimate versus the average number of signs deposited per survey period ( ).The true population size, N = 500, is indicated by the solid horizontal line in each panel.Panels are separated by the values of and the colours indicate the value of p.The number of camera traps and the number of sites were fixed at m 1 = 5 and m 2 = 64.
Downloaded from https://besjournals.onlinelibrary.wiley.com/doi/10.1002/2688-8319.12368,Wiley Online Library on [12/08/2024].See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions)on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License to encounter traps and unlikely to be captured if they do.In these cases, we only analysed the data sets that produced some captures since it is not possible to estimate abundance from presence-absence data alone.Summaries of the unique number of individuals captured and the number of presence-absence signs detected in the different scenarios are provided in Tables A1-A4 in Web Appendix 2 in the online Supporting Information.