Finite mixtures in capture-recapture surveys for modelling residency patterns in marine wildlife populations

In this work, the goal is to estimate the abundance of an animal population using data coming from capture-recapture surveys. We leverage prior knowledge about the population's structure to specify a parsimonious finite mixture model tailored to its behavioural pattern. Inference is carried out under the Bayesian framework, where we discuss suitable prior specifications that can alleviate the label-switching and non-identifiability issues affecting finite mixtures. We conduct simulation experiments to show the competitive advantage of our proposal over less specific alternatives. Finally, the proposed model is used to estimate the size of the common bottlenose dolphin population at the Tiber River estuary (Mediterranean Sea), using data collected via photo-identification from 2018 to 2020. Results provide novel insights into the population's size and structure, and shed light on some of the ecological processes governing the population dynamics.


Introduction
Capture-recapture (CR) methods are statistical techniques widely employed to estimate the size of an elusive population for which a complete enumeration is impossible. This task, initially applied in ecology to the study of fish and wildlife populations (Otis et al., 1978; Wu and Holan, 2017; Matechou and Argiento, 2022), is now common to many other application fields such as epidemiology (Chao et al., 2001; Böhning et al., 2020; Maruotti et al., 2023) and the social sciences (Böhning and van der Heijden, 2009; Brittain and Böhning, 2009; Böhning et al., 2018; Silverman, 2020; Di Cecco et al., 2020; Farcomeni, 2022). The term capture is inherited from the traditional way wild animals have been identified for decades, namely through capture, marking and release, but it is no longer necessarily intended in its physical sense. Researchers increasingly adopt non-invasive methods for monitoring wild populations to minimise costs and the impact on the population of interest. Among these, photo-identification (Royle et al., 2009; Pace et al., 2021) and DNA sampling (Bravington et al., 2016; Morin et al., 2016) are becoming increasingly popular, as they minimise behavioural responses that may bias the final estimates (see Alunni Fegatelli and Tardella, 2013, and references therein).
Original applications of such methods date back to the beginning of the 20th century and were based on standard homogeneity assumptions on the population structure and the identification process (Le Cren, 1965; Amstrup et al., 2010). The literature is now rich in alternatives that can address a large variety of deviations from such basic model assumptions and suit situations where, for example, individuals exhibit heterogeneous behaviours (Pledger, 2000), sampling occurs in continuous time (Altieri et al., 2022), stop-over sites are present (Matechou et al., 2013; Worthington et al., 2019; Wu et al., 2021), temporary emigration is allowed (Zhou et al., 2019), and so on. For an exhaustive review, see King and McCrea (2019) and references therein.
Our work concentrates on the abundance estimation of a common bottlenose dolphin population inhabiting a delimited area over multiple years. This population is known to be open, which calls for the open Jolly-Seber (JS) model, a standard CR framework for open populations (Amstrup et al., 2010). Furthermore, the established ecological literature affirms that it comprises individuals with different residency patterns, resulting in a population clustered into groups with different levels of site-fidelity. Hence, the homogeneity assumption of the standard CR framework cannot hold, and the behaviour of individuals belonging to different groups (i.e. entrance, capture and survival) must be described by different parameters. Accurate estimation of the clustering structure and parameters is of utmost interest to ecologists who want to describe the population's dynamics and inform conservation policies. A widespread practice in such a heterogeneous setting is to include individual covariates that can help explain the differences among the population's members. However, informative covariates are often unavailable and the heterogeneity is entirely latent, as in the case study under consideration. Finite Mixture Models (FMM) represent the natural solution to this impasse. Each individual is assigned to a mixture component with its own set of common and distinct parameter values. FMM approaches to CR models have been successfully employed in a likelihood-based framework in both closed-population (Pledger, 2000; Dorazio and Royle, 2003; Pledger, 2005) and open-population (Pledger et al., 2010; Guéry et al., 2017) settings. From the Bayesian perspective, attempts have been made to model heterogeneity in detection and behavioural effects in closed populations by Ghosh and Norris (2005). More recently, Turek et al.
(2021) proposed a non-parametric finite mixture model with an unknown number of components and different capture probabilities. We build on Pledger et al.'s (2003, 2010) FMM extension of the open JS model to account for latent heterogeneity. We embed this finite mixture approach in Royle and Dorazio's (2008, 2012) parameter-expanded data-augmentation (PX-DA) framework, which turns out to be particularly convenient for fitting Bayesian CR models via standard Markov Chain Monte Carlo (MCMC) algorithms. We discuss its implementation challenges and introduce suitable prior specifications that mitigate the label-switching and non-identifiability issues of FMM. The model is tested through an extended simulation study and applied to photo-identification CR survey data of the common bottlenose dolphin (Tursiops truncatus) population inhabiting the area of the Tiber River estuary in the Mediterranean Sea. In particular, we show how prior scientific knowledge of the population of interest can be leveraged to specify a parsimonious FMM tailored to its supposed structure, and how the validity of that structure can be checked. When this is the case, we show it can sensibly improve on the performance of more comprehensive specifications.
The remainder of the paper is organised as follows: Section 2 describes the motivation and intuitions behind our work, with a brief description of the data that will be considered later on; Section 3 illustrates a Bayesian hierarchical formulation of Pledger et al.'s (2003, 2010) Jolly-Seber (JS) class of mixture models within Royle and Dorazio's (2008, 2012) data-augmentation framework and introduces our modelling proposal as a parsimonious alternative; Section 4 reports a simulation experiment investigating different aspects of the model and its estimation; Section 5 provides the model choice and results on the data originally introduced as the motivating example.

Motivating example
This work is motivated by the need to obtain an abundance estimate of the common bottlenose dolphin population inhabiting the area of the Tiber River estuary in the Mediterranean Sea. Boat-based daily surveys were conducted between 2018 and 2020 under favourable weather conditions to collect photographic and acoustic data on the specimens encountered in the study area during the search (Papale et al., 2021; Pace et al., 2022c,a,b). The photo-identification technique was used to identify unique individuals over multiple sampling occasions and to build individual capture histories. For this analysis, we focus on the so-called well-marked individuals, since the probability of their misidentification can be assumed to be negligible. As a consequence, the final estimates relate to the subset of well-marked individuals only, representing a portion of the population visiting the study area. Figure 1a shows the cumulative number of identified individuals across the different sampling occasions, where the size of each point is proportional to the number of newly identified individuals. Its trend is known as the discovery rate, which is at its maximum during the first year (notice that 50% of new identifications occurred in 2018, with a maximum of 25 newly identified individuals registered in August) and slowly decreases over time. Further details about the study area, the data collection process and the analysis are available in Pace et al.
(2021). Most of the recent literature on common bottlenose dolphins converges towards the identification of three groups characterised by different levels of site-fidelity to a specific area, from the most to the least frequently present (Dinis et al., 2016; Hunt et al., 2017; Haughey et al., 2020; La Manna et al., 2022). This feature is of utmost interest to biologists seeking to disentangle the permanent or semi-permanent population from the transient one. Generally speaking, one group is composed of individuals that (almost) never leave the study area; these are usually referred to as resident individuals, observable on many occasions and for long periods of time, and expected to have the largest number of captures (see Figure 1b). Another group includes individuals who are not continuously present in the study area but regularly visit it; these are called part-time individuals, observable throughout a wide time window but usually encountered on occasions far apart in time. The last group comprises individuals that enter the study area only once in their lifetime for a short time window and whose captures are rare; these are transient individuals, observable only on occasions occurring on close dates. This inherent heterogeneity cannot be neglected without hindering the reliability of the final abundance estimates (Gimenez et al., 2018). To this end, Pace et al.
(2021) apply a hierarchical clustering and include the group labels in the POPAN-JS framework (Schwarz and Arnason, 1996) to model heterogeneity in the entrance, capture and survival probabilities. They find signs of the expected structure and estimate significant differences in most components of the JS model. While practical for understanding the underlying structure of the population of interest, this two-step approach has some relevant issues. First, there is neither quantification nor propagation of the uncertainty of the classification step onto the modelling step; this can bias the final estimates and yield over-confident conclusions. Second, the same information set is used twice in two different statistical procedures, where the latter is performed after conditioning on the first; this can lead to a confirmation bias in favour of the original hypothesis.
This paper proposes to unify the two steps into a joint statistical procedure. We embed FMM into the open JS framework in a Bayesian hierarchical setting, allowing the cluster labels and the population size to be estimated jointly while properly propagating the uncertainty at all levels (Clark and Gelfand, 2006).

State of the art and proposed model
Let D be the number of distinct individuals observed at least once during T distinct sampling occasions. Data are collected in a (D × T) matrix, say Y, whose generic element records whether individual i = 1, . . ., D has been observed (y_it = 1) or not (y_it = 0) at occasion t = 1, . . ., T. Hence, the rows of Y contain the capture histories of all the encountered individuals. The open-population JS model (Schwarz and Arnason, 1996) envisions individuals entering (i.e., via birth or immigration) and exiting (i.e., via death or emigration) the population during the sampling occasions. Emigration is assumed to be permanent for identifiability purposes, i.e. once an individual has left the population, it cannot return. Furthermore, the JS models assume that all captures are independent across individuals and over time. The latter is achieved by considering the existence of a super-population of unknown size, say N_super, accounting for all the individuals potentially available (encountered or not) in the study area between the first and the last sampling period. The super-population size is the main parameter of interest and determines the dimension of the model parameter space. Its typical estimation in the Bayesian framework involves jumping between spaces of different dimensions, hence requiring the implementation of Reversible Jump MCMC algorithms (Brooks et al., 2000).
Here, we adopt the data-augmentation approach proposed by Royle and Dorazio (2008) to keep the dimension of the parameter space fixed throughout the iterations, thus bypassing the need for the Reversible Jump MCMC (Arnold et al., 2010).

Parameter Expanded Data-Augmentation (PX-DA) formalisation
The Parameter Expanded Data-Augmentation (PX-DA) approach by Royle and Dorazio (2008) augments the observed data matrix Y with a large number of all-zero capture histories, yielding an augmented matrix Y_aug of fixed size (M × T), with M chosen large enough to exceed any plausible population size. The data generation process assumes that an individual can be recruited into the population at the beginning of each sampling period only if it has never been recruited on previous occasions. On the other hand, individuals who have already been recruited can leave the population between two subsequent sampling periods. All the D observed individuals are eventually recruited into the population between the first and the last capture occasion, as they have been captured at least once. This dynamic is controlled through two time-varying latent binary variables: the first, r_it, equals 1 if and only if individual i is recruitable at time t (0 otherwise); the second, z_it, equals 1 if and only if individual i belongs to the population at time t (0 otherwise).
All M individuals are recruitable at the first occasion (i.e. r_i1 = 1, i = 1, . . ., M), while they become permanently non-recruitable once they have entered the population. Let ρ_t, t = 1, . . ., T, be the recruitment probabilities, i.e. the probability that an available (not yet entered) individual in the augmented dataset is recruited into the population at time t. Let ϕ_t, t = 2, . . ., T, be the apparent survival probability, i.e. the probability that a recruited individual is in the population at the following sampling occasion. The following rules govern the latent label process:

z_i1 | r_i1 ∼ Bern(ρ_1 r_i1),   r_it = r_i,t−1 (1 − z_i,t−1),   z_it | z_i,t−1, r_it ∼ Bern(ϕ_t z_i,t−1 + ρ_t r_it),   t = 2, . . ., T.

Thus, the distribution of the generic element of the augmented data matrix can be expressed conditionally on z_it as:

y_it | z_it ∼ Bern(z_it p_t),   (1)

where p_t is the capture probability at occasion t. Since y_it = 0 almost surely when z_it = 0, (1) is a zero-inflated binomial model. Notice that, given M, the marginal likelihood of the capture histories can be expressed as:

L(θ; Y_aug) = Σ_{z ∈ Z} Π_{i=1}^{M} Π_{t=1}^{T} p(y_it | z_it, θ) p(z_it | z_i,t−1, r_it, θ),   (2)

where Z ⊂ {0, 1}^{M×T} contains all possible z_it configurations and θ = {p_t, ϕ_t, ρ_t}_{t=1}^{T} is the full set of model parameters. The marginalisation in the likelihood expression has no general closed-form solution, and numerically marginalising out the latent components can be computationally intensive. This problem is common in the Hidden Markov Models (HMMs) literature, in which efficient techniques based on the forward algorithm have been proposed to solve the marginalisation problem (Worthington et al., 2019). Alternatively, MCMC algorithms provide a viable way to work, iteration after iteration, with the conditional likelihood:

L(θ; Y_aug, Z) = Π_{i=1}^{M} Π_{t=1}^{T} Bern(y_it; z_it p_t),

and approximate the posterior distribution of all quantities of interest in a Bayesian setting that provides full uncertainty quantification. Furthermore, standard MCMC methods are now relatively easy to implement by practitioners thanks to the availability of software like JAGS (Plummer, 2003) or NIMBLE (de Valpine et al., 2017).
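The latent recruitment-survival-capture mechanism above is straightforward to simulate forward, which is useful for prior predictive checks and for generating synthetic datasets. The following minimal sketch (all parameter values are illustrative, not taken from the paper) draws the r, z and y arrays for an augmented dataset of M individuals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative settings: M augmented individuals, T occasions.
M, T = 500, 10
rho = np.full(T, 0.10)   # recruitment probabilities rho_t
phi = np.full(T, 0.90)   # survival probabilities phi_t (phi[0] unused)
p = np.full(T, 0.50)     # capture probabilities p_t

r = np.zeros((M, T), dtype=int)   # recruitable indicators r_it
z = np.zeros((M, T), dtype=int)   # population membership indicators z_it

r[:, 0] = 1                                  # everyone recruitable at t = 1
z[:, 0] = rng.binomial(1, rho[0] * r[:, 0])  # initial recruitment
for t in range(1, T):
    r[:, t] = r[:, t - 1] * (1 - z[:, t - 1])      # permanent non-recruitability
    pr = phi[t] * z[:, t - 1] + rho[t] * r[:, t]   # survive or be recruited
    z[:, t] = rng.binomial(1, pr)

y = rng.binomial(1, z * p)                   # zero-inflated captures, Eq. (1)
N_super = int((z.sum(axis=1) > 0).sum())     # realised super-population size
```

Note how the realised super-population size falls out of the simulated z's as the number of individuals ever present, anticipating the derived quantities discussed next.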
Marginally, the hierarchical model specification resulting from Equation (1) implies that N_super ∼ Binom(M, ψ), where ψ is the overall inclusion probability (throughout all occasions) of any individual in the super-population. Royle and Dorazio (2008) show that ψ is linked to the recruitment probabilities through the following equation:

ψ = 1 − Π_{t=1}^{T} (1 − ρ_t).   (3)

In particular, (3) shows that the prior on ψ, and hence on N_super, is entirely determined by the prior on the recruitment probabilities. Hence, choosing the prior distribution for ρ_1, . . ., ρ_T is crucial to determining the prior on N_super. Dorazio (2020) demonstrates that a suitably chosen joint prior on (ρ_1, . . ., ρ_T), given in (4), induces an objective prior on N_super. In terms of practical inference, the estimated population size at each time t and the overall super-population size can be derived through the latent variables z, namely:

N_t = Σ_{i=1}^{M} z_it,   N_super = Σ_{i=1}^{M} I(Σ_{t=1}^{T} z_it > 0).

Jolly-Seber (JS) finite mixture modelling for open populations
Most populations are composed of individuals with heterogeneous behaviours. In some cases, the heterogeneity can be assumed to be well-described by a finite number (say G) of different patterns. These depend on the individuals' latent traits, and accounting for them requires complex modelling tools, such as FMM. The underlying assumption of FMM is that each unit can belong to only one group g = 1, . . ., G, with unknown prior probabilities w_g (Σ_g w_g = 1).
Individuals belonging to different groups in open-population CR studies may have different capture, recruitment, or survival parameters. In the most general specification, the relative order among the parameters of different groups can change at each time t (e.g. group one could have the highest detection rate at the first sampling occasion but the lowest at the second one). This JS-type mixture model is known as the Interactive Heterogeneous Model (IHM), and its very rich specification depends on too many parameters for successful model fitting (see Pledger et al., 2010, for further discussion). Pledger et al. (2003, 2010) explore simpler specifications that could adequately represent the population structure and introduce a convenient notation to navigate all possible sub-models. Let t and h be the time and group heterogeneity effects, respectively. Different expressions correspond to different modelling structures: constant in time and homogeneous across groups (•); time-varying but homogeneous across groups (t); constant in time but heterogeneous across groups (h); time-varying and heterogeneous across groups with separable interaction (t + h); time-varying and heterogeneous across groups with non-separable interaction (t × h). For example, the IHM corresponds to {[ρ_{t×h}, ϕ_{t×h}, p_{t×h}]_G}, where the underlying population is supposed to be composed of G classes. If we want to specify a model whose heterogeneous group effect lies in the capture probabilities only, we write {ρ_t, ϕ_t, [p_{t×h}]_G}. The subscript G is moved to highlight that the mixture of G components relates only to detection. We will take advantage of this notation in the remainder of the paper.

Modelling class heterogeneity using finite mixtures within the PX-DA approach
We embed the PX-DA formalisation of the open JS model into FMM by adding one layer of hierarchy to the original hierarchical specification. Now, let c_i ∈ {1, . . ., G} be the latent membership label of each individual i = 1, . . ., M in Y_aug. The full hierarchical specification is as follows:

c_i | w ∼ Categorical(w_1, . . ., w_G),
y_it | z_it, c_i = g ∼ Bern(z_it p_gt),
z_it | z_i,t−1, r_it, c_i = g ∼ Bern(ϕ_gt z_i,t−1 + ρ_gt r_it),
w ∼ π_w(•),   p_gt ∼ π_p(•),   ϕ_gt ∼ π_ϕ(•),   ρ_gt ∼ π_ρ(•),   (5)

where π_.(•) refers to a generic prior distribution for the parameter. The hierarchical formulation of (5) further complicates the likelihood expression of Equation (2) to:

L(θ; Y_aug) = Σ_{z ∈ Z} Π_{i=1}^{M} Σ_{g=1}^{G} w_g Π_{t=1}^{T} p(y_it | z_it, c_i = g, θ) p(z_it | z_i,t−1, r_it, c_i = g, θ),

where θ now also includes the prior weights of the cluster components and the component-specific sets of parameters. Its evaluation needs further marginalisation with respect to the latent group labels and, once again, MCMC methods are a viable solution to achieve Bayesian estimation of all quantities of interest. This modelling framework includes many possible specifications, according to what varies with time and across groups. We further generalise the model specification by adapting the survival mechanism to describe capture occasions that are not equally spaced. Indeed, the assumption of constant survival across identical time scales does not transfer to the unequally spaced scenario. When this is the case, i.e. when l_t = (τ_t − τ_{t−1}), t = 2, . . ., T, are the time differences between subsequent occasions, the survival probabilities should be appropriately compounded. Once the time scale is set (e.g., days, weeks, months, years, etc.), we have ϕ_gt = ϕ_g^{l_t}, where ϕ_g represents the survival probability across a single time unit on the chosen scale. In addition, following Pledger et al. (2003), a convenient and parsimonious way to express the time-varying capture probabilities is through the logit link, i.e. logit(p_gt) = µ_g + τ_t, t = 1, . . ., T, where µ_g determines the overall average capture probability of each group and τ_t is an occasion-specific differential effect.
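To make the extra mixture layer concrete, the sketch below (with invented values for G, the weights, the intercepts, the per-unit survivals and the lags) builds group-and-occasion capture and survival probabilities exactly as parameterised above: a logit link with a sum-to-zero occasion effect, and per-time-unit survival compounded over unequal lags:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative settings: G = 3 groups, T = 5 occasions, M = 200 individuals.
G, T, M = 3, 5, 200
w = np.array([0.5, 0.3, 0.2])          # mixture weights, summing to 1
mu = np.array([1.0, 0.0, -1.0])        # group-level logit intercepts mu_g
tau = rng.normal(0.0, 0.5, T)
tau -= tau.mean()                      # enforce the sum-to-zero constraint
lags = np.array([2.0, 1.0, 3.0, 1.5])  # l_t: time between occasions t-1 and t
phi_g = np.array([0.99, 0.95, 0.40])   # per-time-unit survival by group

c = rng.choice(G, size=M, p=w)         # latent group labels c_i

# logit(p_gt) = mu_g + tau_t  and  phi_gt = phi_g ** l_t
p_gt = 1.0 / (1.0 + np.exp(-(mu[:, None] + tau[None, :])))   # shape (G, T)
phi_gt = phi_g[:, None] ** lags[None, :]                     # shape (G, T-1)
```

The broadcasting yields one capture probability per group and occasion, and one compounded survival per group and inter-occasion gap.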
The last step of the Bayesian model specification involves the choice of prior distributions for all parameters and their hyper-parameters. The natural prior for the mixture weights in finite mixture models is the Dirichlet distribution Dir_G(α_1, . . ., α_G). It corresponds to a uniform distribution over the G-dimensional simplex when α_g = 1, ∀g, which is the specification usually adopted as a weakly informative prior. For ρ_gt, t = 1, . . ., T, we follow Dorazio (2020) and use the prior in (4). General-purpose and weakly informative priors can be ascribed to µ_g. At the same time, we suggest considering a N(0, σ²) prior on each parameter τ_t, with σ² small, to induce only small time variations (on the logit scale) in the capture probabilities. We further impose that Σ_t τ_t = 0 to favour the otherwise weak identifiability of the µ_g's and the τ_t's (Pledger et al., 2003). Nevertheless, such choices do not protect the model from the label-switching problem affecting the class-specific parameters, typical of the FMM framework, which stems from the likelihood invariance under permutations of the components' labels.

Choosing priors on class-specific parameters
The choice of the prior distribution on component-specific parameters of FMMs can be seen as both a challenge and an opportunity. This class of models suffers from several sources of non-identifiability. In our setting, the most affected parameters are the recruitment and survival probabilities. However, the final inference on the population size is usually robust with respect to their proper identification (see Mena and Walker, 2015, and references therein). The recruitment probabilities ρ_gt are model devices that allow the entrance of new individuals into the population, accounting for the openness of the population (Royle and Dorazio, 2008). They can be treated as nuisance parameters without a solid biological meaning, and their identifiability is not of great concern. On the other hand, the capture and survival probabilities are of biological interest. They can be seen as indicators of how long and how often units of different types visit the sampling area.
A weakly informative or regularising choice of their prior distributions can favour the identification of these components. In turn, this can ease the identification of the membership labels, already affected by the well-known label-switching problem. Among the different solutions proposed in the literature, one relies on imposing an ordering constraint among the component-specific parameters, i.e. u_1 < • • • < u_G (Richardson and Green, 1997; Diebolt and Robert, 1994; Chung et al., 2004). This task is straightforward for parameters defined on the whole real line, where truncating or shifting Gaussian distributions allows full control of the position and scale of the priors. The latter is a reasonable choice for the parameters µ_g, g = 1, . . ., G. However, when u_g ∈ (0, 1), ∀g, as in the case of the survival probabilities, the conditional prior specification must account for their bounded domains. The standard solution would consider conditionally specified Uniform distributions. However, we exploit the alternative proposal of Alaimo Di Loro et al. (2022), which allows for more flexibility. It is based on the Beta distribution and its generalisations (truncated or restricted), and it is effective both in imposing the constraint and in controlling the shape and first moments of the induced marginal priors. In other words, one may set u_1 ∼ Beta(α_1, β_1) and, for g = 2, . . ., G,

u_g | u_{g−1} ∼ tBeta(α_g, β_g, u_{g−1}, 1)   or   u_g | u_{g−1} ∼ rBeta(α_g, β_g, u_{g−1}, 1),

where tBeta and rBeta represent the Truncated Beta and Restricted Beta, respectively, and it is possible to derive closed-form expressions for the marginal prior distribution of u_g, g ≥ 2, or at least its first and second moments. Further computational details are given in Section B of the Appendix, along with formal proofs of the original results.
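A quick way to see the truncated-Beta construction at work is to sample from it by inverse-CDF. The sketch below uses shapes matching the ϕ_1 ∼ Beta(1, 2), ϕ_2 | ϕ_1 ∼ tBeta(1, 1, ϕ_1, 1) pair adopted later in Section 4; the helper name `rtbeta` is ours, not from the cited reference. With these particular shapes, the second coordinate is marginally Beta(2, 1):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(2)

def rtbeta(a, b, lo, hi, size, rng):
    """Sample a Beta(a, b) truncated to (lo, hi) by inverse-CDF."""
    u = rng.uniform(beta.cdf(lo, a, b), beta.cdf(hi, a, b), size)
    return beta.ppf(u, a, b)

# u_1 ~ Beta(1, 2); u_2 | u_1 ~ tBeta(1, 1, u_1, 1): an ordered pair in (0, 1).
n = 10_000
u1 = rng.beta(1, 2, n)
u2 = rtbeta(1, 1, u1, 1.0, n, rng)
# Marginally, u2 ~ Beta(2, 1), so its sample mean should be close to 2/3.
```

The ordering u_1 < u_2 holds by construction, while the marginal of u_2 remains a standard Beta with a tractable mean, illustrating the control over the induced marginals mentioned above.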

The RPT model for the common bottlenose dolphin population
The population structure illustrated in Section 2 can be translated into variations in the parameters of the JS-type PX-DA framework. The resident (R) individuals, showing high site-fidelity, should be characterised by high survival and capture probabilities; the part-time resident (P) individuals, with average site-fidelity, should be characterised by high survival probabilities but may be undetectable on some occasions; the transient (T) individuals only briefly visit the study area and hence should have very low survival. We want to build a suitably flexible encompassing model such that it is possible to establish the amount of evidence in favour of this assumption. In other words, we shall not enforce this exact structure in our model, but we can make it recognisable if present.
For instance, we could allow the three groups to have group-specific survival and capture probabilities, as in the full specification of (5). However, such a flexible model would have a huge number of parameters and would therefore yield highly uncertain estimates because of weak identifiability. Alternatively, we can propose a more parsimonious specification tailored to the supposed behavioural differences among individuals from different groups. In this way, some components can be assumed to be common to different groups, reducing the model's complexity and favouring its identifiability.
First of all, we model the population recruitment dynamic assuming that each group is characterised by its own set of time-varying recruitment parameters, i.e. ρ_{R,t}, ρ_{P,t}, ρ_{T,t}. This setting naturally induces a clustering in the relationship between the recruitment process and the super-population size N_super, yielding group-specific inflation parameters that express the total population as the sum of three sub-populations. This modifies the analytical expression of the expected super-population size as follows:

E[N_super] = M Σ_{g ∈ {R,P,T}} w_g ψ_g,   with ψ_g = 1 − Π_{t=1}^{T} (1 − ρ_{g,t}),

where w_g and ψ_g (g = R, P, T) are, respectively, the g-th component-specific mixture weight and inflation parameter.
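Under this group-specific recruitment structure, the expected super-population size is a weighted sum of group-wise inclusion probabilities. A small numerical sketch (all weights and recruitment probabilities are invented for illustration):

```python
import numpy as np

# Illustrative values: M augmented rows, group weights, recruitment probs.
M = 500
w = {"R": 0.2, "P": 0.3, "T": 0.5}
rho = {"R": np.full(10, 0.05), "P": np.full(10, 0.08), "T": np.full(10, 0.02)}

# psi_g = 1 - prod_t (1 - rho_gt): probability of ever entering for group g.
psi = {g: 1.0 - np.prod(1.0 - r) for g, r in rho.items()}
E_N_super = M * sum(w[g] * psi[g] for g in w)
```

Note that groups with low but persistent recruitment (e.g. the transient group here) can still contribute substantially to E[N_super] when their weight is large.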
Second, we separate the three sub-populations into two distinct classes, characterised by different tendencies to stay in the study area. We can distinguish between short-term survivors, i.e. transient individuals who visit the area for narrow time windows, and long-term survivors, i.e. non-transient individuals (resident or part-time) who visit the area for wide time windows. To model such behavioural heterogeneity, we assume that one group (T, transient) has a smaller survival probability (ϕ_T) than the other two groups (ϕ_NT): ϕ_T < ϕ_NT. Notice that these two parameters represent the survival probabilities across a single time unit on the chosen scale. When the capture occasions are not equally spaced, the survival probabilities should be appropriately compounded along the different lengths (cf. Section 3.3). In practice, given the two base survival probabilities ϕ_T and ϕ_NT, the actual survival probabilities at time t are:

ϕ_{T,t} = ϕ_T^{l_t},   ϕ_{NT,t} = ϕ_NT^{l_t},

where l_t, t = 2, . . ., T, are the time lags between subsequent occasions. Finally, the proposed model retains the time-varying structure of the detectability, as introduced in Section 3.3, but discriminates between part-time (P) and non-part-time (NP) individuals by introducing a partial undetectability component in the former group. Loosely speaking, part-time individuals are allowed to be undetectable while alive (because temporarily not present in the study area) on occasions chosen at random with probability δ ∈ (0, 1). This corresponds to modelling the capture probability on occasion t as:

logit(p_{NP,t}) = µ + τ_t,   p_{P,t} = (1 − δ) p_{NP,t},

for the NP and P individuals, respectively; of course, p_{P,t} < p_{NP,t}, ∀t. Notice that the larger δ is, the more the part-time group is separated from the resident group. This parameter plays a similar role to the completely random emigration parameter of Kendall et al. (1997), and it is needed to control the temporary emigration pattern of part-time individuals discussed in Section 2.
Appendix C shows that such a parametrisation is indeed equivalent to temporary emigration under the simplifying assumption of emigration occurring at random. We name this model RPT, as it encompasses the three specific types of behaviour. However, we would like to point out that the model does not enforce this interpretation. For instance, both survival probabilities could be estimated to be high, or the undetectability parameter could be estimated as approximately equal to 1, and so on.
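Putting the two devices together, the RPT-style survival and capture probabilities can be assembled in a few lines. The sketch below uses invented values; the thinning p_P,t = (1 − δ) p_NP,t reflects the random-undetectability mechanism described above:

```python
import numpy as np

# Illustrative values for the RPT parameterisation.
T = 6
tau_t = np.array([0.2, -0.1, 0.3, -0.2, 0.1, -0.3])  # occasion effects, sum 0
mu, delta = 0.0, 0.7
lags = np.array([1.0, 2.0, 1.0, 3.0, 1.0])           # months between occasions
phi_T, phi_NT = 0.01, 0.997

p_NP = 1.0 / (1.0 + np.exp(-(mu + tau_t)))  # non-part-time capture probs
p_P = (1 - delta) * p_NP                    # part-time: undetectable w.p. delta
phi_T_t = phi_T ** lags                     # transient survival over each lag
phi_NT_t = phi_NT ** lags                   # non-transient survival over each lag
```

The constraint p_P,t < p_NP,t holds automatically for any δ ∈ (0, 1), and the lag exponentiation shrinks the transient survival towards zero much faster than the non-transient one.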

Simulation experiments
We conduct a simulation experiment to assess the performance of the RPT model when it is well-specified, i.e. when the data are generated according to the structure described in Section 3.4. We generate multiple sets of artificial data under alternative scenarios from the RPT with fixed parameters and then estimate a pool of models on them. Our main objective is twofold: i) to evaluate the ability to recover the true values of the parameters, with a particular focus on N_super; ii) to verify whether the RPT is chosen as the best among the alternatives based on some model selection criterion.
We consider four scenarios characterised by an increasing number of sampling occasions, i.e. T ∈ {10, 20, 30, 40}, to verify the model performance over different time horizons. We suppose that in the first scenario (i.e. T = 10), all the captures are recorded within a relatively short period (e.g. within a year). Longer time horizons are included in the other scenarios, where a larger time gap (a year gap) is assumed to occur every 10 occasions. Further details about the time lags are available in Section D of the Appendix. The month (and portion of month) is taken as the basic time unit to avoid the possible numerical instability related to the large values of the lags in terms of days. Note that this affects the interpretation of the survival probability parameter, as it must be interpreted as the probability of surviving one month. We adopt the following parameter values in all the scenarios. The survival probabilities are set to ϕ_T = 0.01 and ϕ_NT = 0.997. These two values may seem quite extreme at first glance. However, they guarantee that the short-term survivors (transient individuals) almost surely stay in the population for less than a year and that the long-term survivors (non-transient individuals) stay in the population for more than three months with a very high probability (> 0.99). Furthermore, notice that a monthly survival probability equal to 0.01 corresponds to a survival probability equal to 0.86 on a daily scale and equal to 0.34 on a weekly scale. On the other hand, a monthly survival probability equal to 0.997 corresponds to a probability of 0.87 on a four-year scale. Therefore, the monthly scale appears to be a good compromise, avoiding a value that is too low for ϕ_T and a value that is too high for ϕ_NT. The capture probabilities are obtained by setting µ = 0 and δ = 0.7, and generating τ_t ∼ N(0, 0.25) in each scenario. The recruitment parameter for transient individuals is fixed at ρ_{T,t} = 0.02 for all capture occasions (assuming a negligible temporal
variation for this group). At the same time, we consider a time-varying recruitment structure for resident and part-time individuals. We simulate independent encounter histories for K = 50 pseudo-populations. Along with the RPT model, we consider 10 different alternatives in the class of JS-type models described in Sections 3.2 and 3.3, all having time-varying recruitment parameters. In the simplest case, we consider a model with homogeneous capture and survival probabilities (M_1: {ρ_t, ϕ, p_t}). When the population is supposed to be structured in G groups, we suppose that each group has its own time-varying recruitment probability and that mixture components may vary by capture probability (M_2-M_4, with G = 2, 3, 4). Each simulated dataset was augmented by 500 all-zero capture histories to implement the PX-DA approach, so that D_k + 500 = M_k for each simulated set k. We consider the prior setting described in Section 3. Notably, we specify a N(0, 10) prior for the intercept µ. When the survival probability is the same for all individuals, a standard Uniform prior is placed on that parameter. When two survival probabilities are considered (generically, ϕ_1 and ϕ_2), we use ϕ_1 ∼ Beta(1, 2) and ϕ_2 | ϕ_1 ∼ tBeta(1, 1, ϕ_1, 1), with the latter marginally yielding ϕ_2 ∼ Beta(2, 1). Enforcing ϕ_1 < ϕ_2 in this way induces a slight repulsion between the two parameters. When more than two survival parameters are considered, constrained Uniform priors are chosen instead. These different prior choices do not substantially affect the model selection criterion associated with each model but, conversely, are useful to induce a better separation between pairs of survival parameters. Estimation is carried out using JAGS, in which we run 2 chains with 20,000 iterations each, discarding 5,000 as burn-in and thinning the remainder by 2 to save storage space (Brooks et al., 2004).
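The survival-scale conversions quoted above (0.01 per month ≈ 0.86 per day and 0.34 per week; 0.997 per month ≈ 0.87 over four years) follow from compounding ϕ over the corresponding number of time units, and are easy to double-check:

```python
# Survival compounds multiplicatively with the time unit: a per-month
# probability phi corresponds to phi ** (1/30) per day (taking a 30-day
# month), phi ** (7/30) per week, and phi ** 48 over four years (48 months).
phi_T_month = 0.01
phi_NT_month = 0.997

phi_T_day = phi_T_month ** (1 / 30)    # about 0.86
phi_T_week = phi_T_month ** (7 / 30)   # about 0.34
phi_NT_4yr = phi_NT_month ** 48        # about 0.87
```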
We chose posterior medians instead of posterior means as point estimates, mitigating the effect of anomalies that may arise in occasionally low-informative datasets. In the same spirit, we rely on the Mean Absolute Error (MAE) as an accuracy measure instead of the widely employed Root Mean Squared Error. Interval estimation is assessed through the percentage of times the 95% credible intervals contain the true value of the parameter (i.e. the coverage) and the average 95% credible interval width (CIW). The overall goodness-of-fit is measured via the Watanabe-Akaike Information Criterion (WAIC, Watanabe and Opper (2010)), following the good practice of Gelman et al. (2014) whenever finite mixtures are fitted. We also report the overlapping index (OV; Pastore (2018); Pastore and Calcagnì (2019)) between the posterior distributions of the two survival parameters, averaged over the K = 50 replicas. If OV = 0, the two distributions are completely separated, while if OV = 1, they perfectly overlap. This metric is particularly useful to understand whether the posterior distributions of the group-specific parameters are well-separated, justifying the related model parameterisation. Finally, we investigate the fuzzy classification ability of the RPT model using the multi-class AUC (mAUC) (Hand and Till, 2001). Let us remark that the classification performances can only be evaluated in the simulation setting and for the RPT model, as the true group labels are known and are consistent with the estimated ones. Notice that individuals who have never been observed do not have a capture history; thus, it is impossible to infer the group they belong to. Therefore, the estimated mixture weights (referred to the full Mk-sized pseudo-population) are not expected to align with the mixture weights employed in the data-generation process. Indeed, the latter were used to generate the cluster labels of the whole pseudo-population, of which some individuals (i.e.
the pseudo-individuals) never become part of the real population.
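A simple histogram-based approximation of the overlapping index OV between two posterior samples can be sketched as follows. This is our own illustration, not Pastore's implementation; the bin count and the Beta samples are arbitrary choices:

```python
import numpy as np

def overlap_index(x, y, bins=100):
    """Approximate OV as the integral of min(f_x, f_y) via shared-bin histograms."""
    lo = min(x.min(), y.min())
    hi = max(x.max(), y.max())
    edges = np.linspace(lo, hi, bins + 1)
    fx, _ = np.histogram(x, bins=edges, density=True)
    fy, _ = np.histogram(y, bins=edges, density=True)
    width = edges[1] - edges[0]
    return float(np.sum(np.minimum(fx, fy)) * width)

rng = np.random.default_rng(1)
# Two well-separated "posteriors" should give OV close to 0,
# while two draws from the same distribution give OV close to 1.
a = rng.beta(2, 200, size=10_000)   # posterior mass concentrated near 0
b = rng.beta(200, 2, size=10_000)   # posterior mass concentrated near 1
```

Kernel-density versions of the same idea are smoother; the histogram version above is just the most transparent way to convey what OV measures.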
Figure 2 shows the differences between the estimated (N̂super) and true (Nsuper) super-population size for each of the K = 50 replicas. The error is divided by the expected value of Nsuper in the corresponding scenario to allow for a meaningful comparison between scenarios whose expected super-population sizes have different magnitudes. We notice that the RPT model uniformly provides the best results overall. On the contrary, its competitors consistently underestimate or overestimate the super-population size. Notably, as T increases, the underestimation becomes more evident for those models that do not account for heterogeneity in survival probabilities (i.e. M1-M4) - cf. Pledger et al. (2003, 2010) - while the overestimation is substantial for those that consider an excessive number of parameters (i.e. M8-M10). Intuitively, a greater number of parameters controlling the capture probabilities tends to infer an excessive number of uncaught individuals from the zero-histories.
Table 1 reports useful summaries to assess the models' performance in estimating Nsuper. The MAE and CIW are scale-dependent measures and do not allow for comparisons across scenarios. Therefore, we consider relative versions of these measures by dividing both by the corresponding expected value of Nsuper. The RPT model returns the lowest relative MAE in all the scenarios involving more than 1 year of observation, while its competitors are associated with errors that mostly increase with T. The WAIC seems to fail in selecting the RPT model when the number of sampling occasions is rather small (i.e. T = 10), attaining the lowest values for models that do not account for survival heterogeneity (i.e. M1-M4 in 52% of the replicas). This is a reasonable outcome, since it is extremely complicated (if not impossible) to distinguish individuals with low and high survival when the occasions span only 1 year. Notably, in such a short-term scenario, the low survival of a transient individual is likely to be confounded with an exceptionally low capture probability. This would also explain why the model {ρt×h4, ϕ, pt+h4} with four component-specific capture probabilities is selected in almost 1 out of 3 replicas according to the WAIC. However, as T increases, the WAIC tends to favour the true model, yielding the lowest median score and selecting it (i.e. returning the lowest WAIC) in most replicas. The classification performance of the RPT model is quite good in all the scenarios, with the mAUC improving as T increases. Notably, the resulting median mAUC is always ≥ 0.82, and above 0.95 when a year change occurs. Furthermore, by assigning the labels to each encountered individual according to the Maximum a Posteriori (MAP) rule, the median accuracy (across replicas) lies between 75% and 95% in all scenarios (once again improving as T increases).
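The MAP rule used for the accuracy figures is simply an arg-max over the posterior membership probabilities. A minimal sketch, where the group labels and the toy probability matrix are purely illustrative:

```python
import numpy as np

# Posterior membership probabilities for 4 hypothetical individuals over the
# three groups (Resident, Part-time, Transient); each row sums to 1.
probs = np.array([
    [0.85, 0.10, 0.05],
    [0.20, 0.70, 0.10],
    [0.05, 0.15, 0.80],
    [0.40, 0.35, 0.25],   # a less clear-cut case
])
labels = np.array(["R", "P", "T"])
map_alloc = labels[np.argmax(probs, axis=1)]
print(map_alloc.tolist())  # ['R', 'P', 'T', 'R']
```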
Table 2 shows the RPT model performance in estimating some time-constant parameters. The estimates of the undetectability and survival parameters have a very small mean absolute error, although they fail to attain the nominal 95% credible interval coverage in most scenarios. This indicates good accuracy of the point estimates paired with over-confidence (visible in the low average CIWs), which reduces the empirical coverage. Nonetheless, the coverage settles at a fair and acceptable level. The result is not particularly surprising, since these component-specific parameters govern latent ecological processes and are potentially mutually confounded. Finally, we observe that the two survival probabilities (i.e. ϕT and ϕNT) are well separated in all scenarios, with an OV equal to 0.078 when T = 10 and equal to 0 for larger T.
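The accuracy summaries used in Tables 1 and 2 (MAE, coverage and average CIW) can be computed from replica-level output as follows; the numbers below are synthetic and purely illustrative:

```python
import numpy as np

true_value = 100.0
# Posterior medians and 95% credible bounds for K hypothetical replicas.
est = np.array([98.0, 103.0, 101.0, 95.0])
lower = np.array([90.0, 96.0, 93.0, 85.0])
upper = np.array([108.0, 112.0, 110.0, 104.0])

mae = np.mean(np.abs(est - true_value))                            # 2.75
coverage = np.mean((lower <= true_value) & (true_value <= upper))  # 1.0
ciw = np.mean(upper - lower)                                       # 17.5
```

The relative versions used in Table 1 would simply divide `mae` and `ciw` by the expected super-population size of the scenario.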

Real data analysis
We apply the RPT model to estimate the total population size of the common bottlenose dolphins inhabiting the Tiber River estuary, as introduced in Section 2. The data are the detection histories of D = 195 well-marked dolphins sighted in the area between June 2018 and November 2020, for a total of T = 87 occasions. After some preliminary runs with different values of M, we set M − D = 500 rows of pseudo-individuals (i.e. with null capture histories), thus yielding M = 695. Other choices of M led to similar results, with larger values of M only increasing the computational burden in terms of runtime and storage.
We run 2 parallel chains, each with 20,000 iterations, a burn-in of 5,000 iterations and no thinning. We compare the performance of the RPT model with the wide range of alternative models illustrated in Section 4, using the same prior setting specified in the simulation study for all the considered models. Table 3 reports the results on abundance estimation along with the WAIC associated with each competing model. We notice that the RPT model does yield the lowest WAIC score. Interestingly, the second-best choice according to the WAIC is model {[ρt×h, ϕt×h, pt×h]3}, which indeed resembles the structure of the RPT model but does not allow mixture components to share common parameters. This lack of parsimony results in an overestimation of the individuals in the super-population, similar to what has been observed in the simulation study (cf. Section 4). Thus, these results suggest the presence of unobserved heterogeneity in the population, which seems to be better described by the more parsimonious structure of the RPT model than by a generic three-group specification.
The two annual survival probabilities are well separated, with OV = 0; the posterior estimates are φ̂T = 1.06 × 10−8 (CI0.95 = [0, 2.8 × 10−6]) for the group of transient individuals and φ̂NT = 0.71 (CI0.95 = [0.62, 0.80]) for the resident and part-time individuals. Notice that the estimated survival parameter φ̂T has little interpretability on the annual scale. However, it corresponds to a probability of 0.73 on the weekly scale and 0.26 on the monthly scale. The average capture probability of the resident and transient individuals is p̂NP = logit−1(μ̂) ≈ 0.19; the corresponding temporal variations, captured by τt, result in the time-dependent posterior distributions of pNP,t reported in Figure 8 in the Appendix.
The parameter δ regulating the undetectability of the part-time individuals is estimated to be δ̂ ≈ 0.74. This means that individuals in that group are present in the area for approximately 26% of their lifetime. Although the estimates of the recruitment parameters have little interpretation in the framework of Royle and Dorazio (2008) considered here, it is worth noticing that, on average, the recruitment probabilities are higher during the first year for the resident individuals, while approximately constant for part-time and transient individuals (cf. Figure 9 in the Appendix). This is a model artefact caused by all the individuals already present in the population before the start of the survey (i.e. mostly residents), who are virtually recruited on the first occasion. If we consider the second year only (the most reliable in terms of recruitment probability estimation and individual classification), the average recruitment probabilities are approximately 7 × 10−4, 1 × 10−3 and 6 × 10−3 for the three groups, respectively. This aligns with our expectations, as recruiting more stable individuals is a slower process than recruiting less stable ones.
The final abundance estimate of the super-population over the whole observation window is N̂super = 311 (CI0.95 = [266, 373]), with yearly variations N̂y (y = 2018, 2019, 2020) showing a peak in 2019 and a decrease in the last year of observation (see Figure 3a). It is, however, interesting to look at Figure 3b, where we report the yearly abundance estimates by group to better understand the behaviour of the aggregated yearly pattern in Figure 3a. Indeed, it seems that the transients' abundance (N̂T = 174, CI0.95 = [123, 236]) is the main factor affecting the overall yearly counts, especially in 2019 and 2020. They show a substantial decrease in 2020, while the abundance of residents (N̂R = 57) remains comparatively stable.

By assigning the 195 well-marked individuals observed between June 2018 and November 2020 to a single group according to the Maximum a Posteriori (MAP) allocation, we have that 51 are assigned to the group of residents, 54 to the group of part-times and 90 to the group of transients. Notice that the MAP is a straightforward and well-established method to attain classification in finite mixture modelling. However, it is known to be sub-optimal in some contexts (Stephens, 2000; McLachlan et al., 2019), and other appealing methods have been proposed in the recent literature (e.g. Wade and Ghahramani (2018)). The problem of summarising membership probabilities into a crisp classification is a challenging and long-debated issue on which there is no general agreement. Nevertheless, results from the simulation study (see Section 4) showed that the MAP is sufficiently reliable in the proposed framework. On the other hand, we are aware that any procedure deriving a crisp classification from membership probabilities over-simplifies the complexity of the inferred results, as it does not convey how strongly each individual is assigned to a specific group. The latter is one of the main advantages of the soft clustering returned by finite mixture models, and we wish to exploit its
fuzziness to quantify how decisively each individual is assigned to one group or another. For instance, 90% of the individuals classified into the Resident group have been assigned to it with a probability greater than 0.9. This probability is greater than 0.54 for the Part-times and 0.53 for the Transients.
In this regard, the classification results can be visualised in Figure 4 using a ternary diagram, which quantifies the probability that each individual belongs to each group. The figure highlights three well-distinguishable capture-history patterns that again comply with the typical RPT behaviour. Residents are available in the area for the whole study period and are spotted very frequently on subsequent sampling occasions; their capture histories are very informative. Hence, their identification is clear-cut (intense colour) in most cases. Part-time individuals are available in the area for most of the study period and are frequently spotted, but not as often as residents. Their classification is crisp if they were first encountered toward the beginning of the study period, while it is more uncertain when they were first encountered toward the end. Finally, the transients show short capture histories with few captures that never span two years; this reflects that they are spotted only a few times and do not visit (survive in) the area for long. Given the little information provided by their capture histories, their classification is slightly vaguer compared to the other groups, especially for individuals only observed on the last sampling occasions.
Finally, it is important to remark that the proportions of the three groups do not align with the estimated weights ŵ = (0.09, 0.39, 0.52). This is because the prior weights are not a good indicator of the true impact of each component in the super-population, but only of their impact on the augmented dataset (as mentioned in Section 4). However, we can infer the composition of the N − D uncaught individuals, namely those individuals that were recruited (i.e. not pseudo-individuals) but never captured. On average, the model allocates 2% of the uncaught individuals to the resident group, 15% to the part-time group and 83% to the transient group. This result complies with the short-term survival of the transient individuals and the more elusive nature of the part-time ones. On the other hand, the residents are always present in the area and, therefore, more susceptible to capture in a broad sense. The results suggest that most of the resident individuals (98%) have already been observed and documented, in line with the progressive decrease of the discovery rate highlighted in Section 2 (cf. Figure 1a).

Discussion
Estimating the abundance of marine wildlife species is a challenging but critical activity that can tell much about the underlying ecological processes. Thus, combining high-quality data with solid analytical approaches is essential to improve our knowledge of these dynamics and increase the potential for management actions (Vella et al., 2021; Lin et al., 2022). In this work, we pursued a Bayesian estimation of the size of a common bottlenose dolphin population, which is organised into three groups with different residency and site-fidelity patterns. Accounting for such unobserved heterogeneity is a very common problem in the environmental literature. However, few papers approach the problem from the Bayesian perspective and develop ad-hoc solutions based on prior scientific knowledge of the population of interest.
We proposed a parsimonious specification of a finite mixture model within the PX-DA setting for CR analysis, which we named RPT. This specification reflects the typical bottlenose dolphin residency pattern, with individuals showing high, partial or low site-fidelity (Dinis et al., 2016; Hunt et al., 2017; Haughey et al., 2020; La Manna et al., 2022). Its adoption, characterised by fewer free parameters, simplifies the identification of all the model components compared to more generic and flexible alternatives. Furthermore, while the application of finite mixture models to bring evidence about the true existence of different population groups has sometimes been discouraged (Pledger, 2000; Pledger et al., 2003), here we have shown how they can be exploited to identify classes of individuals sharing similar profiles whenever strong scientific evidence of the groups' existence is available.
We devoted part of the model description to the discussion of a suitable prior elicitation preventing the label-switching issue of FMM. Formal derivations have enriched the results on constrained Beta priors from Alaimo Di Loro et al. (2022). The performance of our proposal has been assessed through a simulation study considering scenarios with an increasing length of the observation window and, hence, of the capture histories (i.e. mimicking continuous monitoring of a population of interest over time). When data were generated from the RPT model, we evaluated the influence of the number of sampling occasions on the quality of the estimates. As expected, longer observation windows yielded better accuracy and coverage thanks to the larger sample size. Comparison with alternative model versions over many replicas showed that the model estimation exhibited satisfying and robust performances when the observation window was long enough (more than 1 year of monitoring).
We estimated the RPT model and several alternatives on the data that motivated our work. The results showed that, in terms of the WAIC score, our model outperforms several well-established competitors in the class of JS-type models. The results correspond to biologically meaningful findings and align with previous research. The estimated abundance is N̂super = 311, of which 51 individuals are estimated to be resident members of the population. This quantity is particularly important as it is a proxy of the breeding population in the area. The yearly trend shows an increase between 2018 and 2019 and a decrease between 2019 and 2020. While the estimated overall abundances are close to the ones obtained by Pace et al. (2021), the trend is slightly at odds with their results (a non-significant increasing trend between 2019 and 2020). This difference is likely due to the better quantification of the overall uncertainty in our analysis. This is particularly important in this context, as the major interest lies in estimating a reasonable range that accounts for both the worst- and best-case scenarios. Indeed, we estimate that the driving factor of the variations in the population size across the years is the abundance variation of the transient individuals, the proportion of which is dynamic rather than static. Changes in the presence of such dolphins in an area may reflect changes in driving features like habitat quality, prey distribution, or anthropogenic disturbance. The year 2020 was marked by the COVID-19 pandemic, which altered ecological conditions and human pressures on coastal marine waters (Carome et al., 2022). Although difficult to assess, the influence of these factors on the number of transient individuals cannot be ruled out, as several dolphins may have been induced to reduce their mobility toward the study area by improved conditions in their native habitat.
One basic assumption of CR experiments that we do not drop is that captures of different individuals are independent. However, it is well known that bottlenose dolphin populations can form structured societies with complex social networks (Pace et al., 2022a). In future studies, we would like to drop the independence assumption and include information on the population's social structure and the consequent statistical dependence in the capture histories. Including the effect of external or individual covariates would also be interesting. For instance, gender and age class can be recognised in high-quality pictures. This partial information may be incorporated in the Bayesian framework and could help improve the assessment, for example, of the membership of individuals in different groups or of their marking probability (Wu et al., 2021). The capture-recapture model could also be extended by adopting a stopover model (Pledger et al., 2009; Worthington et al., 2019), which allows capture and survival probabilities to depend both on time and on time since arrival in the population. In that case, the model could be formulated as a multi-state Hidden Markov model (Worthington et al., 2019), where the different states refer to the different groups of individuals in the population. Last but not least, it would be extremely interesting to conduct the survey on a larger spatial scale and include external information to account for spatial heterogeneity (Wu and Holan, 2017).
1. it has never been part of the population;
2. it is part of the population;
3. it was part of the population, but it is no longer.

We first notice that when an individual becomes part of the population, it cannot be recruited any more: following the notation introduced in Section 3.1, this implies that for t > 1, rit and zit cannot be simultaneously equal to 1.
In the JS modelling framework, individuals that leave the population cannot return to it. Hence, state 3 is an absorbing state. Let us momentarily ignore the population heterogeneity (clustering structure) for the sake of clarity. If we allow temporal heterogeneity, the transition probability matrix associated with the three states at times t = 2, . . ., T is:

Ωt =
| 1 − ρt    ρt     0      |
| 0         ϕt     1 − ϕt |
| 0         0      1      |

where rows and columns represent the states at time t and t + 1, respectively. At time t = 1, all individuals can be recruited into the population. As t increases, more and more individuals enter the population or, equivalently, are removed from state 1. All the observed individuals will eventually be recruited into the population before time T, but not all of them will leave it (they may survive into future, unobserved, periods). Remember that individuals are exposed to capture, with probability pt, only during their transitory stay in state 2. Hence, a portion of never-observed individuals may also have been recruited into the population at some point without ever being captured. They represent the unknown part we aim to estimate.
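The three-state dynamics described above can be simulated directly. The sketch below builds the row-stochastic transition matrix and draws one trajectory; ρt = 0.3 and ϕ = 0.9 are illustrative values, not estimates from the paper:

```python
import numpy as np

def js_transition(rho_t: float, phi: float) -> np.ndarray:
    """Row-stochastic transition matrix over the three JS states:
    0 = not yet recruited, 1 = in the population, 2 = departed (absorbing)."""
    return np.array([
        [1 - rho_t, rho_t, 0.0],   # not yet recruited: may be recruited
        [0.0, phi, 1 - phi],       # in population: survives or departs
        [0.0, 0.0, 1.0],           # departed: absorbing state
    ])

rng = np.random.default_rng(0)
state = 0                          # every individual starts "not yet recruited"
trajectory = [state]
for t in range(1, 10):
    P = js_transition(rho_t=0.3, phi=0.9)
    state = int(rng.choice(3, p=P[state]))
    trajectory.append(state)
# The matrix is upper-triangular, so states can only move forward:
# once "departed" is reached, it is never left.
```

Capture would then be layered on top: with probability p_t an individual is detected on occasion t only while its state is "in the population".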

B More details on the prior specification
The standard solution is that of concatenating conditionally specified Uniform distributions as follows:

u1 ∼ Uniform(0, 1),   ug | ug−1 ∼ Uniform(ug−1, 1),   g = 2, . . ., G,   (7)

where the ug's are, for instance, the survival probabilities (Turek et al., 2021). While effective in imposing the constraint, the priors in (7) do not allow for the inclusion of previous information that can ease the parameters' identification. Alaimo Di Loro et al. (2022) explore alternative conditional prior specifications that, while implementing the ordering constraint, allow one to control the shape and first moments of the induced marginal prior distributions, where Sg−1 is a simplex of order g − 1. Possible choices are the Beta and Truncated Beta distributions. The latter corresponds to the following set of prior distributions:

u1 ∼ Beta(α1, β1),   ug | ug−1 ∼ tBeta(αg, βg; ug−1, 1),   g = 2, . . ., G,   (8)

where tBeta(αg, βg; l, 1) denotes the Truncated Beta distribution on (l, 1). Note that for αg = βg = 1 we obtain (7). When G = 2 and α2 = 1, taking u1 ∼ Beta(k, k + 1), the prior specification in (8) induces a marginal prior density on u2 (Equation 9) which is a Beta(k + 1, k). The two distributions are mirrored with respect to the vertical line v = 0.5 (see Figure 5a of the Appendix as an example). The low-parametrised structure induces well-separated prior means or prior modes for the marginal distributions of u1 and u2, favouring the separation of the mixture components. Alternative settings inducing well-separated modes in Beta-type priors are reported in Section B.2 of the Appendix.
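For G = 2, the construction with k = 1 (the setting also used for the survival priors in the simulation study) can be checked by Monte Carlo: if u1 ∼ Beta(1, 2) and u2 | u1 ∼ tBeta(1, 1; u1, 1), i.e. Uniform(u1, 1), then marginally u2 ∼ Beta(2, 1). A sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

u1 = rng.beta(1, 2, size=n)       # u1 ~ Beta(1, 2)
u2 = rng.uniform(u1, 1.0)         # u2 | u1 ~ tBeta(1, 1; u1, 1) = Uniform(u1, 1)

# Marginally u2 ~ Beta(2, 1): mean 2/3, variance 1/18,
# and the ordering constraint u1 < u2 holds almost surely.
print(u2.mean())   # ~0.667
```

The same simulation strategy generalises to other (α, β) choices, replacing the Uniform step with inverse-cdf sampling from the truncated Beta.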

B.1 The Beta and Truncated Beta distribution
The density of a Beta random variable with shape parameters α and β is:

Beta(x | α, β) = Γ(α + β) / (Γ(α)Γ(β)) · x^(α−1) (1 − x)^(β−1),   x ∈ (0, 1),

where Γ(·) is the Euler Gamma function. Truncation requires normalising the same density over the truncated domain. We say that u2 | u1 is a Truncated Beta on (u1, 1) when:

tBeta(u2 | α2, β2; u1, 1) = Beta(u2 | α2, β2) / (1 − F_Beta(α2,β2)(u1)),   u2 ∈ (u1, 1),

with F_Beta(α2,β2)(·) being the cdf of a Beta(α2, β2). Now, suppose that u1 ∼ Beta(α1, β1) and u2 | u1 ∼ tBeta(α2, β2; u1, 1); then, if α2 = 1, the marginal prior distribution induced on u2 can be obtained in closed form by integrating u1 out. Observe that the constraint β1 > β2 is essential to avoid the divergence of the Beta function.

B.2 The Beta and Restricted Beta
An alternative conditional specification that allows for a properly informed marginal prior can be obtained using the Restricted Beta distribution. Also known as the 4-parameter Beta, it is a Beta random variable that has been shifted and scaled to reside on the domain (l, u):

rBeta(x | α, β; l, u) = (x − l)^(α−1) (u − x)^(β−1) / (B(α, β) (u − l)^(α+β−1)),   x ∈ (l, u),

where B(α, β) = Γ(α)Γ(β)/Γ(α + β).
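The shifted-and-scaled density can be implemented directly; a sketch assuming the standard 4-parameter Beta form (with SciPy one could equivalently use `scipy.stats.beta(a, b, loc=l, scale=u - l)`):

```python
import math

def rbeta_pdf(x: float, a: float, b: float, l: float, u: float) -> float:
    """Density of a Beta(a, b) shifted and scaled to the interval (l, u)."""
    if not (l < x < u):
        return 0.0
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)  # Beta function B(a, b)
    return (x - l) ** (a - 1) * (u - x) ** (b - 1) / (B * (u - l) ** (a + b - 1))

# Sanity check: the density should integrate to ~1 over (l, u) (midpoint rule).
l, u = 0.3, 1.0
grid = [l + (u - l) * (i + 0.5) / 10_000 for i in range(10_000)]
area = sum(rbeta_pdf(x, 2.0, 3.0, l, u) for x in grid) * (u - l) / 10_000
print(area)  # ~1.0
```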
We can use it to specify recursively a set of G conditional priors as:

u1 ∼ Beta(α1, β1),   ug | ug−1 ∼ rBeta(αg, βg; ug−1, 1),   g = 2, . . ., G,

where rBeta(αg, βg; ug−1, 1) denotes the Beta restricted to (ug−1, 1). The corresponding joint prior does not allow for an analytical marginalisation to get π(u2) in the general scenario. However, the rBeta expected value and variance are available in closed form, and hence we can use the law of total expectation to derive the marginal expected value and variance of all the components. For g = 2, . . ., G:

E[ug | ug−1] = ug−1 + (1 − ug−1) αg/(αg + βg),
Var[ug | ug−1] = (1 − ug−1)² αgβg / ((αg + βg)² (αg + βg + 1)).

Therefore, one can define a system of equations to find the combination of αg, βg that complies with prior knowledge on the moments of the parameters (see Figure 5b as an example). Marginalising with respect to u1, we get a standard Beta density π(u2) = Beta(u2 | α + 1, β). Expected value and variance can then be derived from basic properties of the Beta distribution.
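The law-of-total-expectation argument can be verified by Monte Carlo. The sketch below uses the standard restricted-Beta conditional mean E[u2 | u1] = u1 + (1 − u1) α2/(α2 + β2); the shape values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
a1, b1 = 2.0, 3.0     # u1 ~ Beta(a1, b1)
a2, b2 = 3.0, 2.0     # u2 | u1 ~ rBeta(a2, b2; u1, 1)

u1 = rng.beta(a1, b1, size=n)
# A restricted-Beta draw on (u1, 1) is a Beta draw shifted/scaled into that interval.
u2 = u1 + (1.0 - u1) * rng.beta(a2, b2, size=n)

# Law of total expectation: E[u2] = a2/(a2+b2) + b2/(a2+b2) * E[u1].
analytic = a2 / (a2 + b2) + b2 / (a2 + b2) * a1 / (a1 + b1)
print(u2.mean(), analytic)   # both ~0.76
```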

C Modelling the temporary emigration
Here, we show that the introduction of the undetectability parameter δ on the part-time individuals is equivalent to allowing for random temporary emigration.We focus on the higher hierarchy level of the part-time model specification, as all other components are not affected.For the sake of clarity, we drop the g-subscript and let the reference to the part-time group be implied.
The proposed model specification for the part-time group is as follows:

yit | zit ∼ Bern(zit (1 − δ) pt),   (10)

which differs from the other groups only through the introduction of the parameter δ ∈ (0, 1) in the detection process.
Let us recall that zit is a latent variable indicating whether individual i is "alive" at time t. This is generally confounded with permanent emigration, but it cannot account for temporary emigration, as exited individuals can never re-enter the study area and become susceptible to capture again. Therefore, this first latent variable is only able to model the time at which individual i starts visiting the area (is born) and the time at which it stops visiting it for good (dies). The explicit modelling of temporary emigration within this time window requires the introduction of an additional latent variable vit denoting whether individual i is present given that zit = 1. The specification of Equation (10) arises if we assume that temporary emigration occurs randomly and with equal probability δ while individual i is alive. This corresponds to the following hierarchical specification:

vit | zit ∼ Bern(zit (1 − δ)),   yit | vit ∼ Bern(vit pt),   (11)

where yit ⊥ zit if vit is known. We can easily marginalise vit out of Equation (11) by noting that each yit | zit is a Bernoulli random variable with probability of success zit (1 − δ) pt. This equivalence and the corresponding interpretation are what motivate the use of a multiplicative parametrisation of the capture probability on the invlogit scale in place of a more straightforward group-specific intercept within the logit specification.
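The marginalisation of vit can be checked by simulation: drawing availability and then capture hierarchically must match a single Bernoulli draw with success probability (1 − δ)p. A sketch, where the values of δ and p echo the real-data estimates but serve only as illustrative inputs:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
delta, p = 0.74, 0.19        # undetectability and capture probability (illustrative)

# Hierarchical version: availability first, then capture given availability.
v = rng.binomial(1, 1.0 - delta, size=n)   # v ~ Bern(1 - delta), taking z = 1
y_hier = rng.binomial(1, v * p)            # y | v ~ Bern(v * p)

# Marginalised version: capture with success probability (1 - delta) * p.
y_marg = rng.binomial(1, (1.0 - delta) * p, size=n)

print(y_hier.mean(), y_marg.mean())  # both ~0.049
```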

D Details about time lags used in the simulation study
The number of days between two consecutive capture occasions within a single year (daily time lags, henceforth) has been simulated from a shifted geometric distribution with probability 0.05, which has an expected value of 20 and a standard deviation of 19.5. The resulting random sequence of time lags is (20, 1, 12, 15, 56, 9, 9, 12, 10) and, for scenarios that contemplate more than one year of study, the same sequence is repeated during each new year. The shift of year occurring every 10 occasions is achieved by using a larger constant time lag (i.e. 240 days) between the (10k)-th occasion and the (10k + 1)-th occasion, with k = 1, 2, 3. This results in scenario k being composed of k years of study, for k = 1, 2, 3, 4. For example, scenario 2 (T = 20) is composed of the following sequence of time lags, resulting in 2 years of capture occasions: (20, 1, 12, 15, 56, 9, 9, 12, 10, 240, 20, 1, 12, 15, 56, 9, 9, 12, 10).

E Convergence of relevant parameters estimated on real data

We checked the convergence of the parameter chains in the real data application through the general-purpose Gelman diagnostic. All potential scale reduction factors R̂ are below 1.01, which suggests a good mixing of all parameter chains.
For the sake of saving space, we show the traceplots and density plots of the most relevant time-static parameters only. We can observe that the two chains explore the same parameter space in all cases and produce a well-shaped posterior distribution, with no pathological behaviour.
F Estimated time-varying parameters of the RPT model on the real data application

Figure 1 :
Figure 1: (a) Cumulative number of individual identifications with size proportional to the number of newly identified individuals; (b) Total number of captures by individual.
sets the dimension of the parameter space equal to a fixed value M ≫ Nsuper ≥ D. In this way, the varying-dimension issue is solved by converting it into a more manageable missing-data problem in the multi-state process context (see Section A of the Appendix). The rows of the observed data matrix Y are augmented to M, hence defining Yaug = {Y, 0M−D}, where 0M−D is an (M − D) × T matrix of zeroes. M must be set such that Nsuper ∈ {D, D + 1, . . ., M} and, consequently, Nsuper − D among the M − D rows of zeroes correspond to individuals who belong to the super-population but have never been encountered. The remaining M − Nsuper rows correspond to pseudo-individuals that have never been part of the population during the observation window and hence do not belong to the super-population.
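Constructing Yaug amounts to stacking M − D all-zero rows under the observed matrix; a minimal sketch with made-up dimensions and capture histories:

```python
import numpy as np

T = 5                       # sampling occasions
D = 3                       # observed individuals
M = 8                       # fixed augmented size, chosen well above N_super

rng = np.random.default_rng(0)
Y = rng.binomial(1, 0.4, size=(D, T))                    # observed capture histories
Y_aug = np.vstack([Y, np.zeros((M - D, T), dtype=int)])  # append all-zero rows

assert Y_aug.shape == (M, T)
assert Y_aug[D:].sum() == 0   # augmented rows are null capture histories
```

In the PX-DA fit, latent recruitment indicators then decide which of the all-zero rows represent real-but-unseen individuals and which are pseudo-individuals.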

The mixture weights for the three groups are set to wR = 0.2, wP = 0.45 and wT = 0.35. We envision an augmented super-population of M* = 500, which yields an expected super-population size E[Nsuper] ∈ {170, 209, 243, 271} for T = 10, 20, 30, 40, respectively. Notice that Nsuper increases with T, as more individuals can visit the study area during a longer time horizon.

Figure 2 :
Figure 2: Relative estimation error of the super-population (Nsuper) abundance for an increasing number of sampling occasions (T), calculated for each of the K = 50 independent replicas, by the RPT model and the ten alternative mixture models of Pledger et al. (2010).

Figure 3 :
Figure 3: Point estimates and posterior 95% credible intervals of the yearly super-population size (a) and by group (b).

Figure 4 :
Figure 4: Individual cumulative frequencies of capture for all the encountered individuals, divided into the three groups defined by the RPT model. Posterior allocation was based on the MAP.

Figure 6 :
Figure 6: Traceplots of posterior samples for the main parameters of interest of model RPT.

Figure 7 :
Figure 7: Densities of posterior samples for the main parameters of interest of model RPT.

Figure 8 :
Figure 8: Posterior estimates and 95% credible intervals for capture probabilities of resident and transient individuals at each sampling occasion.

Table 1 :
Estimates of relative MAE (MAErel), coverage (Cov.) and relative average width of the 95% credible intervals (CIWrel) for Nsuper, median WAIC, and percentage of times each competing model achieved the best WAIC (%waic). All these summaries have been obtained when data are simulated from the RPT model.

Table 2 :
Estimates of MAE, coverage (Cov.) and average width of the 95% credible intervals (CIW) for some time-constant parameters of model RPT. All these summaries have been obtained when data are simulated from the RPT model.