Improving the reliability of eDNA data interpretation

Global declines in biodiversity highlight the need to effectively monitor the density and distribution of threatened species. In recent years, molecular survey methods detecting DNA released by target‐species into their environment (eDNA) have been rapidly on the rise. Despite providing new, cost‐effective tools for conservation, eDNA‐based methods are prone to errors. Best field and laboratory practices can mitigate some, but the risks of errors cannot be eliminated and need to be accounted for. Here, we synthesize recent advances in data processing tools that increase the reliability of interpretations drawn from eDNA data. We review advances in occupancy models to consider spatial data‐structures and simultaneously assess rates of false positive and negative results. Further, we introduce process‐based models and the integration of metabarcoding data as complementing approaches to increase the reliability of target‐species assessments. These tools will be most effective when capitalizing on multi‐source data sets collating eDNA with classical survey and citizen‐science approaches, paving the way for more robust decision‐making processes in conservation planning.


| INTRODUCTION
Since the beginning of the last century, global species extinctions have occurred at unprecedented rates and currently over a million species are at risk (IPBES, 2019). Rapid transformations of natural ecosystems and habitat degradation highlight the urgent need for effective conservation strategies to mitigate further biodiversity losses. A prerequisite for the development of such strategies is the provision of comprehensive, reliable and frequently updated monitoring data, recording distribution changes of vulnerable, endangered and invasive species.
A promising approach that is currently gaining momentum and fulfils such monitoring objectives relies on the detection of genetic traces left by organisms, also referred to as environmental DNA (or eDNA). Substantial advantages of eDNA-based methods are higher cost and time effectiveness compared to many traditional survey methods, their noninvasive nature (Cristescu & Hebert, 2018), and high specificity and sensitivity (Wilcox et al., 2013). However, eDNA-based methods have only been applied for about a decade in conservation management, and method reliability and accuracy are still being refined (Cristescu & Hebert, 2018). Even the best field and laboratory practices cannot remove all sources of error. Therefore, in the second part of this article, we focus on the use of statistical tools such as occupancy and process-based models to mitigate errors in single-species targeted and multispecies metabarcoding approaches. Combined with rapid increases in data availability due to lower analytical costs, these tools are already showing promise in providing enough leverage to lift eDNA-based applications in terrestrial and aquatic environments to a new level.

| DIFFERENTIATION OF SOURCES OF ERROR AND THEIR MITIGATION
We will differentiate here between sources of errors emerging at the laboratory and the field processing levels and highlight their potential to trigger false negative or false positive results (Table 1). Such differentiation is equally valid for terrestrial and aquatic systems as well as single (targeted) and multispecies (metabarcoding) sequencing approaches. Multispecies approaches are, however, susceptible to additional bioinformatic challenges, which have been extensively covered elsewhere (e.g. Callahan et al., 2016; Zinger et al., 2019).
Most methodological eDNA studies that aim to improve field sampling methods focus on specific technical aspects of sampling protocols, for instance addressing flocculation vs. filtration methods to concentrate eDNA, filter pore sizes, sample preservation or water collation strategies (Li et al., 2018; Spens et al., 2017). These are important contributions improving method reliability, but it is crucial to acknowledge that incorrect results can frequently emerge from mechanisms that are relatively independent of sampling and laboratory protocols, and which can only partly be mitigated (Table 1).
TABLE 1 Potential sources of error of eDNA-based methods which emerge at the field or laboratory process levels and culminate either in false positive or false negative results. Descriptions of the error sources and their mitigation potential are displayed.
One important factor leading to false negatives is a highly stochastic distribution of eDNA in the field emerging from environmental and ecological drivers (Figure 1). For example, low eDNA concentrations of a target species and small-scale heterogeneity in its distribution can reduce the probability of collecting target eDNA in any given sample (Case 1 in Figure 1). A mitigation strategy to account for resulting false negatives is to increase the sampling effort and adjust the number of natural replicates (independent eDNA samples taken per site) and technical qPCR replicates (Mauvisseau et al., 2019). Further, environmental heterogeneity (Case 2 in Figure 1) can partly be accounted for by considering the small-scale habitat requirements of target species and increasing the number of sampling sites (Troth et al., 2021).
Further, the risk of false negatives is exacerbated by external environmental conditions (Table 1). For example, false negatives can be generated by weather events, such as rainfall or storms, effectively reducing eDNA turnover times (higher flow rates in aquatic environments, washing away of eDNA in terrestrial environments; Sales et al., 2020). Likewise, seasonal or species-specific physiological factors are known to affect species activity and eDNA shedding rates (Buxton et al., 2017;Wood et al., 2019). This will affect detection probabilities and may increase the risk of false negatives during parts of the target species' annual cycle (Troth et al., 2021). eDNA turnover is also strongly impacted by microbial activity and ultraviolet radiation (Buxton et al., 2017), both of which vary seasonally, and may result in decreased detection probabilities. However, low turnover rates can also trigger false positive results as slow eDNA degradation, or resuspension of historical eDNA can wrongly indicate the presence of populations that went extinct or emigrated from sampling sites (Goldberg et al., 2018).
Another factor which can generate false presence indications is downstream transportation of eDNA in rivers (Case 3 in Figure 1). Studies have demonstrated that eDNA can be detected at distances of greater than 10 km, and potentially up to 100 km, downstream from source populations (Pont et al., 2018).
The degree of influence of downstream transportation is determined by a range of factors including eDNA turnover times, shedding rates and the size of source populations (e.g., highlighted in Buxton et al., 2017). Finally, false positive results can also result from sampling-independent introduction of target eDNA into unoccupied habitats. Such "contamination" can either be caused by human activities (e.g., release of ballast water from ships) or naturally via, for example, faeces of the target species' predators (Case 4, Figure 1; Merkes et al., 2014).
FIGURE 1 (a) A hypothetical river system sampled for eDNA to explore the presence/absence of three different 'target' species, an endemic and endangered rhithral species (green), an invasive rhithral species (red) and a native potamal species (yellow). (b) Illustration of true distributions of the three species (coloured river sections) and results of the eDNA sampling (empty circle and full circle represent negative and positive results, respectively). Numbers highlight four mismatches between actual species distributions and eDNA-based surveys. Case #1: A false negative occurs either due to low eDNA concentrations (e.g. high flow velocities and low shedding rates) or inhibition in the headwaters. Case #2: The endangered target species (red) is present, but survey results indicate its absence in the river section due to environmental heterogeneity and poor selection of sampling sites. Case #3: A river section is wrongly assessed as occupied by an invasive species. Here, eDNA from an upstream population is transported downstream leading to ecologically incorrect conclusions. Case #4: Horizontal eDNA transfer resulting from predator faeces or non-sampling-related human activities culminates in a false-positive detection. All four error examples are largely independent of sampling and laboratory protocols but can be mitigated with data processing tools. Cases 1, 2 and 4 can emerge in freshwater, marine and terrestrial environments. Case 3 is restricted to areas with directional eDNA transport (e.g. via wind in terrestrial or currents in marine systems).
In the course of laboratory analyses, false positives mainly result from (i) technical contamination through inappropriate procedures and (ii) nontarget eDNA triggering detection (i.e., insufficient specificity; Goldberg et al., 2016; Wilcox et al., 2013). Failures to detect eDNA, on the other hand, can emerge from a diverse range of sources including inadequate storage of samples or extraction protocols (Goldberg et al., 2016), DNA degradation after extraction, inhibition by co-extracted compounds, low sensitivity (i.e., failure to detect low eDNA concentrations; Klymus et al., 2019) and insufficient specificity (e.g., not all genetic variants of a species trigger positive results; Mauvisseau et al., 2019).
Recommendations to reduce the impact of these laboratory-based sources of error focus on the implementation of appropriate laboratory procedures and improvements of method sensitivity (Ficetola et al., 2016; Goldberg et al., 2016). Important measures include the optimization of primer design (Klymus et al., 2019), the use of negative controls at multiple levels (e.g., extraction of blank samples, and post-extraction controls; Ficetola et al., 2016) and thorough in silico, in vitro and in vivo testing (Ficetola et al., 2016) based on established guidelines (e.g., Limit of Detection and Limit of Quantification; Klymus et al., 2019). Finally, inhibition of target eDNA is a frequently encountered challenge closely linked to the sampling environment (e.g., turbidity; Goldberg et al., 2016).
Inhibition can be detected by adding synthetic DNA as an internal control (Klymus et al., 2019) and can be mitigated using inhibitor removal kits or by diluting DNA templates (Goldberg et al., 2016).
However, both approaches also reduce concentrations of target eDNA and therefore inadvertently have the potential to increase the probability of false negative results.
This short synthesis of common sources of error highlights that methodological optimizations and sound ecological background knowledge are crucial for eDNA-based applications. However, a diverse set of errors can only partially be mitigated (Table 1) and will partly persist even when best practices are applied (Lahoz-Monfort et al., 2016). As even low error rates can severely affect the interpretation of results (Ruiz-Gutierrez et al., 2016), we next show how analytical tools can be used, complementary to the practices identified above, to increase the reliability of eDNA-based methods.

| DATA PROCESSING TOOLS TO ACCOUNT FOR SOURCES OF ERRORS
Two powerful tools to account for emerging errors in survey data are the application of hierarchical occupancy and process-based models. These frameworks take different approaches, but both may estimate uncertainties related to eDNA-based species detection. They attempt to account for the probability of false negative and/or false positive results, thus increasing the information content of survey data and facilitating better-informed decision-making processes in ecosystem management and conservation.

| Occupancy models
Occupancy modelling has developed primarily from statistical approaches to model species distributions accounting for false negatives (Guillera-Arroita, 2017). Occupancy models are based on a hierarchical structure, recognizing that the probability of detecting a species is contingent on the species being present (see Web Panel 1 for a brief and basic introduction). Models therefore evaluate occupancy probabilities (i.e., the probability that the target species is present at a given site) and detection probabilities (i.e., the probability of detection given that the species is present) as responses to environmental factors and/or co-occurrence of other species (Goldberg et al., 2018; Orzechowski et al., 2019). Specific eDNA occupancy models also evaluate a third probability, the probability of eDNA capture (Doi et al., 2019). Capture probabilities account for the chance of collecting target eDNA in a natural replicate, while detection probabilities denote the probability of detecting captured eDNA with one technical (PCR) replicate. This facilitates the consideration of complex ecological and environmental interactions, at least when sufficiently large data sets of presence-absence records are available to support model structures (Mackenzie & Royle, 2005). One fundamental data requirement is the availability of multiple observations per site (at least two) within a given time period of assumed constant occupancy (referred to as the "closure assumption"; Rota et al., 2009). eDNA-based assays incorporate the simultaneous collection of multiple natural replicates per site as a standard approach, and consequently occupancy models represent a very well-matched tool to increase the reliability and applicability of such data (Brost et al., 2018).
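The three nested probabilities described above (occupancy, capture and detection) can be illustrated with a small simulation. The following is a minimal sketch rather than an implementation of any published model: the function names, the parameter names `psi`, `capture` and `detect`, and the assumption of constant probabilities without covariates or false positives are all illustrative choices.

```python
import random

def simulate_edna_survey(n_sites, n_samples, n_pcr, psi, capture, detect, seed=1):
    """Simulate PCR hits under a three-level structure (no false positives).

    psi     : probability that a site is occupied
    capture : probability that a natural replicate contains target eDNA,
              given that the site is occupied
    detect  : probability that one PCR replicate is positive,
              given that the sample contains target eDNA
    """
    rng = random.Random(seed)
    survey = []
    for _ in range(n_sites):
        occupied = rng.random() < psi                          # level 1: site
        pcr_hits = []
        for _ in range(n_samples):
            captured = occupied and rng.random() < capture     # level 2: sample
            hits = sum(1 for _ in range(n_pcr)
                       if captured and rng.random() < detect)  # level 3: PCR
            pcr_hits.append(hits)
        survey.append((occupied, pcr_hits))
    return survey

def naive_occupancy(survey):
    """Share of sites with at least one positive PCR replicate."""
    return sum(any(h > 0 for h in hits) for _, hits in survey) / len(survey)

def prob_site_detected(n_samples, n_pcr, capture, detect):
    """P(at least one positive PCR | site occupied) under the same assumptions."""
    p_sample = capture * (1 - (1 - detect) ** n_pcr)
    return 1 - (1 - p_sample) ** n_samples

if __name__ == "__main__":
    survey = simulate_edna_survey(200, 3, 3, psi=0.4, capture=0.6, detect=0.5)
    true_rate = sum(occ for occ, _ in survey) / len(survey)
    print("true occupancy :", true_rate)
    print("naive occupancy:", naive_occupancy(survey))
    print("P(detect | occupied):", prob_site_detected(3, 3, 0.6, 0.5))
```

Because this sketch contains no false positives, the naive occupancy rate can never exceed the true rate; the gap between the two is what the hierarchical model is meant to recover.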
Occupancy models can either be based on frequentist or Bayesian statistical frameworks (Bailey et al., 2014;Ferguson et al., 2015). The main implementation difference between the two lies in the computational methods for parameter estimation: whilst frequentist approaches apply maximum likelihood estimation, Bayesian models are in most cases based on Markov chain Monte Carlo simulation (MCMC) procedures (Web Panel 1). Both frameworks can be implemented using various platforms (summarized in Table S1). A major advantage of Bayesian models is the possibility to include prior information (e.g., expert opinion on the likelihood of species presence) in the process of parameter estimation (Griffin et al., 2019). Further, Bayesian approaches are characterized by a higher inherent flexibility supporting models of greater real-world complexity (Guillera-Arroita, 2017), which has driven the recent surge in their application (Dorazio & Erickson, 2018;Orzechowski et al., 2019). However, they are computationally demanding and can, at times, differ in the data requirement needed to establish models of any given complexity.
Consequently, the choice of framework and platform used in an eDNA context should be adjusted in accordance with data characteristics as well as management and scientific objectives.
Up to now, most occupancy models used in an eDNA context have been applied to single-species data, with multispecies approaches starting to be developed more recently (Doi et al., 2019;McClenaghan et al., 2020). Such multispecies metabarcoding-based assessments have the major advantage of providing information on biodiversity and community composition. However, their drawbacks include higher rates of false positive (Ficetola et al., 2016;Zinger et al., 2019) and false negative results (Harper et al., 2018). Thanks to bioinformatic or methodological advancements, false positives emerging from, for example, tag-jumps, formation of chimeric fragments and reagent contaminants can be mitigated (Schnell et al., 2015;Zinger et al., 2019;Zizka et al., 2019). False negative rates, on the other hand, represent a major challenge (naïve occupancy rates can be half that of targeted approaches; Harper et al., 2018), highlighting the necessity to account for them with data-processing tools.

BOX 1 Capitalizing on metabarcoding community data to correct for false negatives
Conservation and ecosystem management requires reliable monitoring of overall biodiversity as well as species of particular interest. Metabarcoding approaches are powerful tools to capture species diversity but they are hampered by higher false negative rates compared to targeted approaches (Harper et al., 2018), resulting in methodological trade-offs.
Here we assume that a conservation manager needs to monitor biodiversity and the distribution of an endangered species and chooses a metabarcoding approach to assess both. A potential tool to correct for false negative measurements of the target species is the use of community data to detect them. In an eDNA context, such tools are largely underexplored. Therefore, we introduce here two options to account for species interactions in occupancy models using metabarcoding data.
Option 1-PCA approach: Co-occurring species can affect the distribution (through, for example, competition or predation) and the detection (induced through behavioural changes or changes in relative abundance) of the target species. One possibility to integrate such effects into occupancy models is to condense the information recorded in the community matrix (ASV table) using a principal component analysis (PCA). Single PCs can then be included alongside environmental variables as predictors of the target species' occupancy and detection probability. The number of included PCs would then depend on the size of the data set and the variance they explain. This approach is simple to implement and suitable for small data sets. However, the establishment of PCs is untargeted, and the condensed information does not necessarily relate directly to the target species.
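The condensation step of Option 1 can be sketched as follows. This is an illustrative example only: it extracts just the first principal component by power iteration using the Python standard library, the small `asv_table` is invented for demonstration, and any transformation of read counts before the PCA (for example to relative abundances) is left out. In practice a dedicated statistics package would normally be used.

```python
import math

def centre_columns(matrix):
    """Subtract the column (taxon) mean from every entry."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]

def first_pc(matrix, n_iter=500):
    """Return (loadings, site_scores) of the first principal component.

    Rows are sites, columns are taxa (ASVs). Power iteration on X'X is used,
    which converges to the direction of largest variance.
    """
    x = centre_columns(matrix)
    n_taxa = len(x[0])
    # Non-uniform start vector to avoid starting orthogonal to the first PC.
    w = [1.0 / (j + 1) for j in range(n_taxa)]
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, w)) for row in x]
        w_new = [sum(s * row[j] for s, row in zip(scores, x))
                 for j in range(n_taxa)]
        norm = math.sqrt(sum(v * v for v in w_new))
        if norm == 0:          # no variance left to describe
            break
        w = [v / norm for v in w_new]
    scores = [sum(a * b for a, b in zip(row, w)) for row in x]
    return w, scores

if __name__ == "__main__":
    # Hypothetical ASV table: 4 sites x 3 taxa (read counts).
    asv_table = [[10, 0, 5], [12, 1, 4], [0, 9, 5], [1, 11, 6]]
    loadings, pc1 = first_pc(asv_table)
    print("PC1 site scores:", pc1)
```

The resulting site scores (`pc1`) are the kind of quantity that could be supplied, alongside environmental variables, as a site-level covariate for occupancy or detection probability.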
Option 2-targeted beta diversity approach: A more focused but currently still more experimental approach is the inclusion of a site-specific prior into Bayesian occupancy models. A site-specific prior score reflects how likely the target species is to occur at a site with a given community composition and is established in a two-step procedure. First, the overall strength of the prior (flat vs. strong) across all sites needs to be determined. This can be accomplished by calculating a community similarity matrix, which is condensed into a nonmetric multidimensional scaling (NMDS) plot (see below). If positive sites are tightly clustered, overall community composition is strongly associated with the target species' occurrence and the prior score needs to differ substantially across sites. The clustering of positive sites can be determined by dividing the 95% range of all positive sites (range that probably contains 95% of all positive sites; red solid line) by the 95% range of all sites (yellow line). The inverse of this fraction can then be used to determine the prior range in the occupancy model.
In a second step, an individual prior score for each site needs to be established. This can be achieved using a density probability function (e.g., based on the log-distance to the centroid of all positive sites) to determine the likelihood of each negative site being a false negative. This likelihood can be standardized by the prior range and then be included in the occupancy model. The utility of this approach still awaits testing, but it provides the substantial advantage of facilitating the incorporation of complex targeted information in occupancy model frameworks without substantially increasing data requirements.
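The first step of Option 2 (the clustering of positive sites relative to all sites) could be computed along the following lines. This is one possible reading of the procedure and not a tested method: the ordination coordinates are assumed to be available already (e.g., from an NMDS), and the "95% range" is interpreted here, as an assumption, as the radius around a group's centroid that contains about 95% of its points. All function names are hypothetical.

```python
import math

def centroid(points):
    """Mean position of a set of ordination coordinates."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def range95(points):
    """Radius around the group centroid containing ~95% of the points
    (nearest-rank percentile of the distances to the centroid)."""
    c = centroid(points)
    dists = sorted(math.dist(p, c) for p in points)
    k = max(0, math.ceil(0.95 * len(dists)) - 1)
    return dists[k]

def clustering_ratio(coords, detected):
    """95% range of positive sites divided by 95% range of all sites.

    coords   : list of (axis1, axis2) ordination coordinates, one per site
    detected : list of booleans, True where the target species was detected
    """
    positives = [p for p, hit in zip(coords, detected) if hit]
    return range95(positives) / range95(coords)

if __name__ == "__main__":
    # Hypothetical ordination: three tightly clustered positive sites
    # and three scattered negative sites.
    coords = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (2.0, 2.0), (-2.0, 1.0), (1.0, -2.0)]
    detected = [True, True, True, False, False, False]
    ratio = clustering_ratio(coords, detected)
    print("clustering ratio:", ratio, "inverse:", 1 / ratio)
```

A ratio well below 1 indicates tightly clustered positive sites; its inverse is the quantity the text proposes for scaling the prior range.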

| Computational "add-ons" to occupancy models
A potentially powerful option to mitigate higher false negative rates is the consideration of wider community data as co-existing species often shape the realized niche of the target organisms (Box 1).
Recently developed multispecies occupancy models for eDNA data (Doi et al., 2019; McClenaghan et al., 2020) are an important advancement but do not account for species interactions in their model structure. The other end of the spectrum is represented by joint distribution models, which incorporate complex interactions among species or functional groups but which are often highly complex and data hungry (Pollock et al., 2014). A middle way that facilitates the consideration of species interactions, or at least the co-occurrence of their DNA at a sampling location without requiring hundreds of data points, is the use of community similarity indices, which we introduce in Box 1. Their performance could be further improved by including additional variables such as sequencing depth (McClenaghan et al., 2020) or absolute abundance measures (e.g., acoustic surveys of fish biomass) accounting for the dependency of false negative rates on total eDNA densities in occupancy models.
A major current challenge for occupancy models is that most do not account for false positives (Ferguson et al., 2015), although even low levels of false positives can substantially affect the reliability of model predictions (e.g., 2%-3% false positives may result in 50% overestimation of occurrence; Ruiz-Gutierrez et al., 2016). Further, the impact of non-accounted false positives is dependent on the number of natural replicates used in a survey (Ficetola et al., 2015).
Normally, one would expect that the accuracy of model predictions will be positively affected by a higher number of natural replicates.
However, once false positive rates increase, the gain provided by the higher number of natural replicates quickly disappears and is turned into a negative effect (Figure 2). These nontrivial relationships can be partly compensated for by setting detection thresholds (Ficetola et al., 2015) but, if overlooked, they can result in major flaws in sampling design and strategies.
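The interaction between false positives and replicate number can be illustrated with a simple calculation. The sketch below does not reproduce the simulations behind Figure 2; it only computes the expected share of sites flagged as occupied when a site counts as positive once a minimum number of replicates is positive. It assumes independent replicates, a constant per-replicate detection probability `p` at occupied sites and a constant false positive probability `fp` at unoccupied sites. All names and example values are illustrative.

```python
from math import comb

def prob_at_least(k, n, q):
    """P(at least k successes in n independent trials with success prob. q)."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(k, n + 1))

def expected_naive_occupancy(psi, p, fp, n_reps, threshold=1):
    """Expected share of sites flagged as occupied.

    psi       : true occupancy rate
    p         : per-replicate probability of a positive at occupied sites
    fp        : per-replicate probability of a false positive at empty sites
    threshold : number of positive replicates required to flag a site
    """
    occupied_flagged = prob_at_least(threshold, n_reps, p)
    empty_flagged = prob_at_least(threshold, n_reps, fp)
    return psi * occupied_flagged + (1 - psi) * empty_flagged

if __name__ == "__main__":
    for n_reps in (3, 6, 12):
        print(n_reps,
              expected_naive_occupancy(0.2, 0.5, 0.03, n_reps),
              expected_naive_occupancy(0.2, 0.5, 0.03, n_reps, threshold=2))
```

With a non-zero `fp`, adding replicates raises the chance that an empty site produces at least one positive, so the flagged share drifts above the true occupancy rate; a detection threshold of two positives dampens this effect at the cost of some sensitivity.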
A key difficulty in developing occupancy models correcting for false positives (emerging from either a method or a process type error, see Darling & Mahon, 2011) is that it is mathematically challenging to simultaneously account for false positives and false negatives using a single data set (Miller et al., 2011).
New approaches, which have been developed (Miller et al., 2011) and are in the process of being applied (Louvrier et al., 2019), often depend on the use of multiple data sources (Chambert et al., 2015). Specifically, one data set with high credibility (available for a subset of sampling sites) is initially needed to establish true positive and true negative results (Brost et al., 2018). True positives and negatives can then be compared with a second data set (e.g., and support better-informed decision-making processes. Indeed, citizen science has the potential to collect large volumes of data over vast areas (Larson et al., 2020). Despite potential bias when covering larger geographical areas, data quality has been shown to be highly improved with limited training and the use of validated and standardized protocols (Larson et al., 2020) as discussed above.
FIGURE 2 The performance of occupancy models varies in response to a number of factors including sampling design (e.g. number of natural replicates), reliability of data collection (false-positive and false-negative rates) and species' occurrence (occupancy rates). Here, we assess interactions between these factors in their impact on the reliability of model predictions. Each of the simulated 2000 points per panel reflects a landscape with 80 sampling sites (see the supplementary information for details). Panels present comparisons of the true rate of occupancy (dotted red line) with the modelled rate of occupancy (blue line, shaded area reflects 95% range of data) across a range of different detection probabilities. Increases in detection probabilities generally improve model accuracy (difference between true and modelled rate of occupancy) and precision (spread of model outcomes, blue shaded area). In contrast, the impact of higher numbers of natural replicates was strongly dependent on the rate of false positives. When false positives were absent, more natural replicates improved model performance (mostly precision; a-c). However, the opposite was true when false positives occurred (d-f), whereas the impact size of this effect increased with decreasing occupancy rates (g-i). The fact that occupancy modelling performs at first glance better (i.e. higher precision) but in fact is hampered by low accuracy highlights the importance of minimizing false-positive rates as far as possible and accounting for them in modelling approaches.

| Integration of spatial patterns and process-based tools
Another key component that is starting to be integrated into species distribution (Domisch et al., 2019; Pacifici et al., 2017) and eDNA occupancy models (Chen & Ficetola, 2019) is the spatial distribution of species occurrence in their habitat. A number of factors, including spatial autocorrelation of environmental conditions, population distribution patterns and/or eco-geographical factors, result in spatial coupling of species occurrence (Legendre, 1993). Spatial population structures can be accounted for in occupancy models by integrating autoregressive terms (Domisch et al., 2019; Pacifici et al., 2017),

BOX 2 Spatially explicit models in river networks
Spatial non-independence is a common phenomenon across terrestrial and aquatic habitats. The consideration of spatial dependencies is therefore an important step in the determination of species distributions. Despite the generality of spatial interdependencies, most methodological advances accounting for such effects have been generated in aquatic research. In freshwater ecosystems, upstream environments tend to influence more downstream-located habitats. The resulting directionality and overall nestedness leads to a strong spatial autocorrelation among river reaches (Legendre, 1993) that needs to be accounted for in any spatial model (Dormann et al. 2007). Dormann et al. (2007) provided three main arguments highlighting the necessity to integrate spatial autocorrelation in modelling approaches: (i) species dispersal is distance-related, (ii) the nonlinear relationships between environment and species cannot be modelled as linear, and (iii) the fact that a nonspatial statistical model would fail to account for environmental determinants, which are spatially structured, and whose spatial structuring cascades into the response.
One way to account for spatial autocorrelation in river networks is provided by simultaneous autoregressive (SAR) models. SAR models represent the directed version of conditional autoregressive (CAR) models and incorporate the possibility to apply an asymmetric covariance matrix.
More recently, Peterson et al. (2013) have introduced the spatial statistical stream network (SSN) model framework, tailored explicitly towards stream network applications (Ver Hoef et al., 2014; Ver Hoef & Peterson, 2010). This framework accommodates so-called "tail-up" and "tail-down" models, where the former accounts for autocorrelation between flow-connected locations only, while the latter allows spatial autocorrelation between both flow-connected and flow-unconnected locations (Peterson et al. 2013).
Models that account for spatial autocorrelation outperform nonspatial models (Domisch et al., 2019; Ver Hoef & Peterson, 2010), but the preprocessing regarding hydrological connectivity and spatial weights in the spatial models requires advanced GIS skills, posing a challenge to the wide application across disciplines in freshwater research.
Occupancy models (see Web Panel 1) account for the detection probability of species, which is crucial especially when modelling species that are difficult to detect (Comte & Grenouillet, 2013). Such models can be extended to spatially explicit occupancy models by incorporating spatial random effects via a CAR or SAR component in the model (Chen & Ficetola, 2019; Latimer et al., 2006), and can then be applied to river networks (Domisch et al., 2016; Domisch et al., 2019).

several positive measurements. Applications of autoregressive terms in eDNA occupancy and species distribution models have only recently been demonstrated to improve model performances (Chen & Ficetola, 2019), and represent a promising approach to enhance the reliability of eDNA surveys.
Autoregressive models have also been developed to account for directed effects (e.g., simultaneously autoregressive models; see Box 2). Directed autoregressive terms are certainly powerful tools but they have limitations when dealing with systematic errors (Box 2) such as the downstream transport of eDNA in rivers causing false positive results (Pont et al., 2018). A more bespoke tool to correct for systematic and directed increases in the risk of false positives is the application of process-based models.
Process-based models provide an alternative to statistical approaches.
To account for such false positives, eDNA export curves can be established to quantify the transport of upstream eDNA to downstream habitats. Such export curves can be deduced from hydrological models and mesocosm eDNA degradation experiments (e.g., Seymour et al., 2018; Song et al., 2017), or simply by measuring in situ the downstream transport of introduced eDNA (e.g., Pont et al., 2018). Once established, export curves from upstream sampling points (Figure 3b) can be compared to measured eDNA concentrations of downstream sites. This requires the establishment of upper (C_pred.upper) and lower (C_pred.lower) predicted concentrations. Alternatively, the output of process-based models can be included as prior information in hierarchical (e.g. occupancy) models that account for false positives, which allows capitalizing on synergies between the two approaches.
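The comparison of a measured downstream concentration with an export curve can be sketched as follows. Several elements here are assumptions made purely for illustration and are not taken from the text: the export curve is modelled as first-order exponential decay with distance, the upper and lower predictions are obtained from a slow and a fast decay rate, and the three-way classification is only one conceivable way of reading the comparison. The function and parameter names are hypothetical.

```python
import math

def predicted_concentration(c_upstream, distance_km, decay_per_km):
    """Export curve under an assumed exponential decay with distance."""
    return c_upstream * math.exp(-decay_per_km * distance_km)

def compare_with_export_curve(c_measured, c_upstream, distance_km,
                              decay_slow, decay_fast):
    """Place a measured downstream concentration relative to the band
    spanned by the upper and lower export predictions.

    decay_slow gives the upper prediction (C_pred.upper),
    decay_fast gives the lower prediction (C_pred.lower).
    """
    c_pred_upper = predicted_concentration(c_upstream, distance_km, decay_slow)
    c_pred_lower = predicted_concentration(c_upstream, distance_km, decay_fast)
    if c_measured > c_pred_upper:
        return "above"    # more eDNA than upstream export alone would predict
    if c_measured < c_pred_lower:
        return "below"    # less eDNA than upstream export alone would predict
    return "within"       # compatible with transported upstream eDNA

if __name__ == "__main__":
    # Hypothetical values: 100 copies/L upstream, site 10 km downstream.
    for measured in (50.0, 20.0, 1.0):
        print(measured,
              compare_with_export_curve(measured, 100.0, 10.0, 0.1, 0.3))
```

Under this reading, a downstream measurement that falls within the predicted band cannot be distinguished from transported upstream eDNA, whereas a value above the band would be harder to explain by transport alone.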

| OUTLOOK AND FUTURE CHALLENGES
Good field and laboratory practices are essential for minimizing many sources of error associated with the use of eDNA, but even best practices cannot exclude the occurrence of false positive and false negative results. Data processing tools such as occupancy or process-based models provide opportunities to mitigate the impact of many sources of error. Increases in the size of eDNA data sets due to improved cost-efficiency and further technical advances will substantially help to further improve the power of these data processing tools. However, eDNA-based monitoring should not be seen in isolation or as a replacement for traditional survey approaches.
The true potential of eDNA-based methods can only be capitalized on when they are combined with other sampling data and jointly integrated with data processing tools (Pacifici et al., 2017). Consequently, an important future challenge will be the coordination and scaling of different assessments, such as traditional sampling methods, eDNA-based methods and citizen science campaigns. Only together can these approaches raise the necessary public awareness and provide the reliable baseline data required to meet future challenges in conservation and ecosystem management.

ACKNOWLEDGEMENTS
The study was funded by the Global Challenges Research Fund, UK, to M.S. and the Leibniz Competition to S.D. (J45/2018).

DATA AVAILABILITY STATEMENT
Data are provided in supplementary information.

to describe the two sources of uncertainty during a survey. One is the sampling variation and the other is the imperfect detection.

In the simplest case, a site is repeatedly visited to detect the presence of a target species. By chance, the target species may not be present during the time of a specific visit. As a result, the number of detections x from a total number of visits n is a random variable and is usually modelled by the binomial distribution. In almost all occupancy surveys, the method of detection is imperfect. We may fail to detect the target species when it is present. The imperfect detection process is modelled by a Bernoulli distribution (a special case of the binomial distribution when n = 1). When we conduct a survey, we observe the number of times (x) we detect the target in n visits. Because of imperfect detection, we cannot tell whether an observed 0 arises because the target is absent or because detection failed. However, mathematically, we can describe the data-generating process as a result of the two separate random processes.
The number of presences (x) is modelled by the binomial distribution:
$$x \sim \mathrm{Binomial}(n, \theta),$$
where θ is the probability of detecting the target species. The probability θ is the product of the probability of being present (ψ) and the probability of detecting the target when it is present (p). The probability of being present ψ is the occupancy probability and the probability of detection is a conditional probability characterizing the survey method. Both probabilities are of interest. However, the data we have (x, n) has only information for θ = ψ × p. That is, ψ and p are numerically unidentifiable in the simplest case. Additional information is needed for separating ψ from p.
A typical occupancy study is designed to provide information to separately estimate ψ and p. For example, the ad hoc two-step approach developed by Geissler and Fuller (1987) is designed to estimate p first and then use the observed x to estimate θ. The process of estimating p requires repeated sampling of the same sites and counting the number of times the target species is detected and the total number of visits after the first detection (when the presence of the target species is confirmed). The detection probability is then approximated by the number of detections divided by the number of visits. Other survey designs are aimed at using covariates to provide the necessary information to better separate the two probabilities.
These designs do not always work well under the classical statistics framework, where the maximum likelihood estimator is usually used.
The underlying numerical identification problem is always lurking, in addition to the inherent positive correlation between the detection probability and the occupancy probability. That is, the more abundant the target species is, the easier it is to detect it.
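The identification problem can be made concrete with a few lines of code. The sketch below is illustrative only: it evaluates the binomial log-likelihood for the simplest case described above and shows that any two (ψ, p) pairs with the same product receive exactly the same support from the data. The second function is a simplified reading of the detection estimate obtained from visits after the first detection. The function names and example values are hypothetical.

```python
from math import comb, log

def binomial_loglik(x, n, psi, p):
    """Log-likelihood of x detections in n visits when theta = psi * p."""
    theta = psi * p
    return log(comb(n, x)) + x * log(theta) + (n - x) * log(1 - theta)

def detection_after_first(history):
    """Approximate p from a 0/1 visit history of one site:
    detections divided by visits, counting only visits after the first
    detection. Returns None if no such visits exist."""
    if 1 not in history:
        return None
    later = history[history.index(1) + 1:]
    if not later:
        return None
    return sum(later) / len(later)

if __name__ == "__main__":
    # Two different parameter pairs with the same product psi * p = 0.4.
    print(binomial_loglik(3, 10, psi=0.8, p=0.5))
    print(binomial_loglik(3, 10, psi=0.5, p=0.8))
    print(detection_after_first([0, 1, 0, 1, 1]))
```

Because the likelihood depends on ψ and p only through their product, the data (x, n) alone cannot separate them; the repeated-visit estimate of p is one example of the additional information needed.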

Bayesian analysis of occupancy models
The arrival of the MCMC method, especially its computer implementation in software packages such as WinBUGS, JAGS and now Stan, made computation under the Bayesian framework seemingly straightforward. For example, the  (Qian, 2012). Without additional information (e.g., a proper informative joint prior), these numerical issues will always be present.
Consequently, the key to a successful Bayesian occupancy model lies in the development of a proper joint prior distribution of the two probabilities.
When using eDNA for occupancy modelling, we face not only imperfect detection (false negatives), but also false positives. Let p_p and p_n be the probabilities of a false positive and a false negative, respectively (note that p = 1 − p_n). The observed number of presences x is still a binomial random variable, but the probability of observing a presence (positive eDNA) is now θ = ψ(1 − p_n) + (1 − ψ)p_p.
Without proper (informative) priors for p_p and p_n, occupancy modelling will always be numerically unstable.
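The extended expression for θ can be evaluated directly to see why informative priors are needed. The following sketch is illustrative; the function name and the example parameter values are hypothetical.

```python
def prob_positive(psi, p_n, p_p):
    """Probability of observing a positive eDNA result:
    theta = psi * (1 - p_n) + (1 - psi) * p_p
    psi : occupancy probability
    p_n : false negative probability
    p_p : false positive probability
    """
    return psi * (1 - p_n) + (1 - psi) * p_p

if __name__ == "__main__":
    # Two quite different scenarios that yield the same theta.
    print(prob_positive(psi=0.50, p_n=0.2, p_p=0.1))
    print(prob_positive(psi=0.45, p_n=0.0, p_p=0.0))
```

Both parameter sets give the same probability of a positive result, so the observed counts alone cannot distinguish a moderately occupied landscape surveyed with an error-prone method from a slightly less occupied landscape surveyed with a perfect one. With p_p = 0 the expression reduces to θ = ψ × p, the simplest case described earlier.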