A general modeling framework for open wildlife populations based on the Polya tree prior

Wildlife monitoring for open populations can be performed using a number of different survey methods. Each survey method gives rise to a type of data and, in the last five decades, a large number of associated statistical models have been developed for analyzing these data. Although these models have been parameterized and fitted using different approaches, they have all been designed to either model the pattern with which individuals enter and/or exit the population, or to estimate the population size by accounting for the corresponding observation process, or both. However, existing approaches rely on a predefined model structure and complexity, either by assuming that parameters linked to the entry and exit pattern (EEP) are specific to sampling occasions, or by employing parametric curves to describe the EEP. Instead, we propose a novel Bayesian nonparametric framework for modeling EEPs based on the Polya tree (PT) prior for densities. Our Bayesian nonparametric approach avoids overfitting when inferring EEPs, while simultaneously allowing more flexibility than is possible using parametric curves. Finally, we introduce the replicate PT prior for defining classes of models for these data allowing us to impose constraints on the EEPs, when required. We demonstrate our new approach using capture–recapture, count, and ring‐recovery data for two different case studies.


INTRODUCTION
In recent years, there has been increased interest in monitoring wildlife populations due to the ongoing effects of climate change and habitat destruction. However, such monitoring is challenging because we cannot observe all animals present in the wild. Therefore, statistical models need to be employed to infer population sizes (Royle, 2004), migration (Pledger et al., 2009), phenology (Dennis et al., 2017), survival (McCrea et al., 2013), or entry and exit patterns (EEPs) (Matechou et al., 2016) from ecological data at one or more sites.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2022 The Authors. Biometrics published by Wiley Periodicals LLC on behalf of International Biometric Society.
Biometrics. 2023;79:2171-2183. wileyonlinelibrary.com/journal/biom

Although these models are developed for data collected using different sampling protocols, they all focus on the estimation of the EEPs of populations, where entry can correspond to arrival/birth, exit to departure/death, and corresponding length of stay (LOS) to retention/survival at the site. These existing methods either assume that parameters related to EEPs are completely unconstrained, requiring two parameters to be estimated for each sampling occasion, one for entry and one for exit (Lyons et al., 2016; Pledger et al., 2009), or use parametric curves to constrain entry (Matechou et al., 2014) and exit (Jimenez-Muñoz et al., 2019; McCrea et al., 2013) patterns. However, in the former case, the model can be overparameterized and difficult to interpret, whereas in the latter case, the degree of smoothness is predefined and hence the model may not be flexible enough.
In recent years, Bayesian nonparametric (BNP) ecological models have been developed as an option for flexibly modeling EEPs without making parametric assumptions. An attractive feature of BNP models is that they can be defined such that the prior predictive distribution (or, equivalently, the prior mean) is given by a parametric distribution, referred to as the centering distribution. By centering a BNP model over a parametric distribution, we obtain a posterior distribution that is a compromise between a fully unconstrained model and the parametric model used to specify the centering distribution. In particular, models have been built using the Dirichlet Process mixture prior (Lo, 1984) (see, e.g., Diana et al., 2020;Ford et al., 2015;Matechou & Caron, 2017), and the Polya tree (PT) prior (Ferguson, 1974;Lavine, 1994) (see, e.g., Diana et al., 2018). The Dirichlet process mixture defines a prior for a distribution as an infinite mixture, whereas the PT prior is constructed by recursively assigning probability mass to sequences of nested partitions of the sample space, which follow a tree structure.
In ecological surveys, data are typically collected at discrete observation times, referred to as sampling occasions, and every pair of consecutive sampling occasions defines an interval, during which individuals may enter or exit the study area. No data are available between sampling occasions, which suggests that EEPs should be modeled on the grid defined by the sampling occasions. In this paper, we develop a general framework for modeling EEPs based on the PT prior using this grid, extending the approach of Diana et al. (2018). The use of the PT allows us to build a model directly on the distributions of EEPs, with minimal parametric assumptions on the shape of these distributions. Additionally, as we demonstrate in this paper, the PT prior can lead to positive correlation between consecutive entry (exit) intervals (Lavine, 1994). This results in a smoother EEP than unconstrained models with the degree of smoothness informed by the data instead of being predefined by parametric curves. Our proposed PT approach leads to more efficient inference methods compared to existing BNP approaches based on Dirichlet process mixtures, which introduce individual entry and exit times as continuous latent variables in Markov chain Monte Carlo (MCMC) samplers. Instead, with our PT approach, only the number of individuals in each cell of the grid needs to be inferred, which means that the number of latent variables grows with the number of sampling occasions and not with the number of individuals in the population.
Finally, we introduce the replicate PT (RPT) framework, which allows us to impose constraints on the parameters of the entry (exit) process as required, by replicating parts of the tree. These constraints are similar to those typically employed in ecological models, such as constraining the distribution of LOS to be the same for all entry times, leading to more parsimonious models with interesting ecological interpretation. For example, we use the RPT to assume age-dependence when modeling the LOS distribution in the context of ring-recovery (RR) data and to assume time-dependence when modeling the exit distribution in the context of capture-recapture (CR) data.
We demonstrate our novel framework using two different case studies. The first considers the estimation of EEPs of Eurasian Spoonbills (Platalea leucorodia) using a joint model of CR and count data collected on the same population. The second considers the estimation of age-specific survival rates using RR data of a population of mallards (Anas platyrhynchos), where only some of the birds are of known age, and demonstrates the advantage of the PT in modeling survival from sparse data, without resorting to the use of parametric curves.
The paper is structured as follows. In Section 2, we describe the main features of the PT prior and use a time-to-event example as an illustration of univariate partitions. In Section 3, we introduce the RPT and use it to define constrained versions of the time-to-event example. In Section 4, we define bivariate partitions. In Section 5, we summarize the Diana et al. (2018) model and define the joint model for CR and count data and the model for RR data. The models are demonstrated using two case studies in Section 6, and Section 7 concludes the paper and introduces some potential future directions.

THE POLYA TREE PRIOR
The PT prior is a nonparametric prior for a probability distribution $G$ with sample space $\Omega$. The PT has two parameter sets: the first is a sequence of nested partitions $\Pi$ of the sample space $\Omega$, whereas the second, $\mathcal{A}$, is a sequence of positive numbers associated with each set of each partition. The partition at the first level, $\pi_1$, is obtained by splitting the sample space into two sets, $\{B_0, B_1\}$. Subsequently, to build the partition at the second level, $\pi_2$, each set $B_{\epsilon_1}$ is split into two sets, $\{B_{\epsilon_1 0}, B_{\epsilon_1 1}\}$, next each set $B_{\epsilon_1\epsilon_2}$ is split into the sets $\{B_{\epsilon_1\epsilon_2 0}, B_{\epsilon_1\epsilon_2 1}\}$, and so on. The PT recursively assigns a prior to $G(B)$ for all $B$'s in the sequence of partitions according to the following procedure. By defining $\epsilon_1 \dots \epsilon_m$ as a generic sequence of 0s and 1s, with $m \in \mathbb{N}^+$, and $B_{\epsilon_1 \dots \epsilon_m}$ as the corresponding set of the partition, the random mass associated with $B_{\epsilon_1 \dots \epsilon_m}$ is given by

$$G(B_{\epsilon_1 \dots \epsilon_m}) = \prod_{j=1}^{m} \left(Y_{\epsilon_1 \dots \epsilon_{j-1} 0}\right)^{1-\epsilon_j} \left(1 - Y_{\epsilon_1 \dots \epsilon_{j-1} 0}\right)^{\epsilon_j}, \qquad (1)$$

where $Y_{\epsilon_1 \dots \epsilon_{j-1} 0} \sim \text{Beta}(\alpha_{\epsilon_1 \dots \epsilon_{j-1} 0}, \alpha_{\epsilon_1 \dots \epsilon_{j-1} 1})$ is the conditional probability of falling in set $B_{\epsilon_1 \dots \epsilon_{j-1} 0}$ given that the observation falls in the set $B_{\epsilon_1 \dots \epsilon_{j-1}}$, that is, the proportion of $G(B_{\epsilon_1 \dots \epsilon_{j-1}})$ allocated to $G(B_{\epsilon_1 \dots \epsilon_{j-1} 0})$. We note that in the case of nondyadic splits, that is, in the case where each set is split into more than two sets, the PT can be defined analogously by replacing the Beta distributions with Dirichlet distributions. In its standard form, the PT is defined for an infinite sequence of partitions. However, in this paper, we assume a PT prior for the distributions of EEPs, which cannot be inferred at a resolution finer than the times of observation, that is, the times when the sampling occasions take place, $t_1, \dots, t_T$. Hence, in what follows we only define finite partitions.
The PT can be centered on any distribution $G_0$ in the sense that, for every set $B$ of the partition, $E[G(B)] = G_0(B)$. This is achieved by setting the $\alpha$s in definition (1) such that $\alpha_{\epsilon_1 \dots \epsilon_m 0} / \alpha_{\epsilon_1 \dots \epsilon_m 1} = G_0(B_{\epsilon_1 \dots \epsilon_m 0}) / G_0(B_{\epsilon_1 \dots \epsilon_m 1})$, for all $\epsilon_1 \dots \epsilon_m$. We note that this defines the $\alpha$s only up to proportionality, as the proportionality constant is defined by the variance around the centering distribution. In the case of time-to-event data and a forward partition (Section 2.1), the $\alpha$s are defined by applying this condition to each split of the partition.

The following proposition, proven in the Supporting Information, is useful for understanding how the choice of partition affects the smoothness of a sample from a PT.

Proposition 1. Let $G$ be a sample from a PT using the forward partition, where $Y_l \sim \text{Beta}(a_l, b_l)$, and let $p_l = G([t_l, t_{l+1}))$. If $a_l + b_l$ is an increasing sequence of positive numbers, it follows that $\text{Cor}(p_l, p_{l+1}) < \text{Cor}(p_{l+1}, p_{l+2})$.
Proposition 1 shows that if the distributions of the random variables (RVs) $Y_0, Y_1, \dots$ have parameters whose sum is not decreasing, the correlation between the masses of adjacent intervals increases for intervals corresponding to later times or, equivalently, that the distribution of the $p_l$'s becomes smoother toward the right tail. This is the case for the forward partition of Section 2.1. Conversely, we can assume that the distribution of the $p_l$'s is smoother in the left tail, by reversing the order of the split so that $B_0 = [t_T, \infty)$ and $B_1 = (0, t_T)$ at the first level, $B_{10} = [t_{T-1}, t_T)$ and $B_{11} = (0, t_{T-1})$ at the second level, and so on. We term this partition scheme the backward partition.
For ease of reference, we list below the univariate partitions on which we are going to rely for the rest of the paper. In all cases, the partition is defined using the observation times $t_1, \dots, t_T$. We note that by convention, we define $t_0 = 0$ and $t_{T+1} = \infty$.
• Forward partition: this partition corresponds to the idea of Lavine (1994), with the first split defined using $t_1$ and the last using $t_T$.
• Backward partition: this partition is analogous to the forward partition but now the first split is defined using $t_T$ and the last using $t_1$.
• Uniform partition: this partition is built in a single step by splitting using all the observation times simultaneously. That is, $\Omega$ is split into $(B_0, \dots, B_T)$, where $B_l = (t_l, t_{l+1}]$, resulting in a tree that has a single level. Since this split is not dyadic, the masses are assigned according to a Dirichlet distribution with parameters $G_0(B_0), \dots, G_0(B_T)$. We note that, in contrast to the forward and backward partitions, the uniform partition imposes a negative correlation between the masses of any two intervals, which can be useful in situations where entry and/or exit patterns are not expected to be smooth.

FIGURE 1 Sequence of splits (left) and structure (right) of the partition for the time-to-event data example. The random variable assigning the ratio of the masses of two sets for each branch is represented on the left branch. The mass assigned to the set in each right branch is always one minus the mass assigned to the set in the corresponding left branch.
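To make the three univariate partitions concrete, the following minimal Python sketch draws interval masses from a finite PT centered on a discretized distribution. It is an illustration under stated assumptions, not the authors' implementation: the function names, the argument `conc` (the sum of the two Beta parameters at each level, playing the role of the parameter sum in Proposition 1), and the scalar `c` for the uniform partition are all hypothetical choices.

```python
import random

def forward_partition_masses(g0, conc, rng):
    """Draw interval masses from a finite PT on the forward partition.

    g0   : centering masses of the intervals, positive and summing to 1.
    conc : assumed sum of the two Beta parameters at each level (len(g0) - 1 values).
    """
    tail = [sum(g0[l:]) for l in range(len(g0))]  # centering mass of intervals l, l+1, ...
    masses, remaining = [], 1.0
    for l in range(len(g0) - 1):
        mean_y = g0[l] / tail[l]  # centering: E[Y_l] equals the conditional mass under g0
        y = rng.betavariate(conc[l] * mean_y, conc[l] * (1.0 - mean_y))
        masses.append(remaining * y)  # mass split off at this level
        remaining *= 1.0 - y          # mass passed on to the later intervals
    masses.append(remaining)
    return masses

def backward_partition_masses(g0, conc, rng):
    """Same construction, but splitting off the latest interval first."""
    return forward_partition_masses(g0[::-1], conc, rng)[::-1]

def uniform_partition_masses(g0, c, rng):
    """Single-level split: Dirichlet(c * g0) masses via normalized Gamma draws."""
    draws = [rng.gammavariate(c * g, 1.0) for g in g0]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(1)
print(forward_partition_masses([0.1, 0.2, 0.3, 0.4], [1.0, 4.0, 9.0], rng))
```

Passing an increasing `conc` sequence mirrors the condition of Proposition 1. Note that choosing `conc[l]` proportional to the remaining centering mass would reduce the forward construction to a Dirichlet distribution, which is why the sketch keeps the parameter sum as a separate input.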

Example: Time-to-event
In this section, we demonstrate how to apply the PT prior to the estimation of survival probabilities using time-to-event data. We assume that each individual is followed starting at a corresponding time in $\{t_1, \dots, t_T\}$, referred to as time of first observation (TFO), when they are of age 1, until their death, which is observed at a subsequent time.
A common modeling approach is to assume that individuals of age $a$ at time $t_k$ remain alive until time $t_{k+1}$ with probability $\phi_{k,a}$. To formulate this approach in a PT framework, we first arrange all the followed individuals in a set of vectors $D_i = (D_{i1}, \dots, D_{i,T-i+1})$, where $D_{il}$ indicates the number of individuals with TFO equal to $t_i$ and with death in the interval $(t_{i+l-1}, t_{i+l})$, that is, observed at time $t_{i+l}$. Next, we assume a distribution $G_i$ on the time of death of the individuals belonging to vector $D_i$. The sample space for $G_i$ is $\Omega_i = (t_i, \infty)$, as time of death is left truncated by the TFO for each individual. We assume independent PT priors for the $G_i$'s, with partition taken to be the forward partition, according to which the sample space $\Omega_i$ is split into $B^i_0 = (t_i, t_{i+1}]$ and $B^i_1 = (t_{i+1}, \infty)$, next $B^i_1$ is further split into $B^i_{10} = (t_{i+1}, t_{i+2}]$ and $B^i_{11} = (t_{i+2}, \infty)$, and so on. The last level of the partition corresponds to the sets $(t_i, t_{i+1}], \dots, (t_T, \infty)$, and hence, the PT prior for $G_i$ defines a prior for the probability that death occurs in each of these intervals. We note that the partitions have different depths, as for individuals with TFO equal to $t_i$, the partition is built only for $T - i$ levels.
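In this example, each $G_i$ is defined by $Y^i_0, \dots, Y^i_{T-i-1}$. Therefore, $Y^i_l$ corresponds to the probability that an individual with TFO $t_i$ is not present at time $t_{i+l+1}$ conditional on being present at time $t_{i+l}$, that is, $Y^i_l = 1 - \phi_{i+l,\,l+1}$. Finally, if $n_i$ is the number of individuals with TFO $t_i$, the model for vector $D_i$ is given by

$$D_i \mid G_i \sim \text{Multinomial}\big(n_i;\ G_i((t_i, t_{i+1}]), \dots, G_i((t_T, \infty))\big), \qquad G_i \sim \text{PT}(\Pi_i, \mathcal{A}_i).$$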
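The link between the conditional death probabilities and the probabilities of dying in each interval can be sketched in a few lines of Python. This is a minimal illustration, assuming a multinomial likelihood for the vector of death counts of one TFO cohort; the function names `death_probabilities` and `multinomial_loglik` are hypothetical and not taken from the paper.

```python
import math

def death_probabilities(y):
    """Map conditional death probabilities to interval death probabilities.

    y[l] is the probability of dying in the (l+1)-th interval after first
    observation, given survival up to its start (one minus the survival
    probability). The returned list has one extra entry: the probability of
    surviving beyond the last observation time.
    """
    probs, alive = [], 1.0
    for y_l in y:
        probs.append(alive * y_l)  # alive at the start of the interval, then dies in it
        alive *= 1.0 - y_l         # survives the interval
    probs.append(alive)
    return probs

def multinomial_loglik(counts, probs):
    """Log-likelihood of death counts under an assumed multinomial model (all probs > 0)."""
    out = math.lgamma(sum(counts) + 1)
    for d, p in zip(counts, probs):
        out += d * math.log(p) - math.lgamma(d + 1)
    return out

probs = death_probabilities([0.5, 0.5])  # two levels of the forward partition
print(probs)                             # [0.5, 0.25, 0.25]
print(multinomial_loglik([1, 1, 0], probs))
```

In a sampler, `y` would hold the Beta-distributed RVs of the tree for one cohort, so each cohort contributes one such multinomial term.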

REPLICATE PT
In the example discussed in Section 2.1, the parameters belong to independent PTs and hence the probability of survival varies freely by time and age. To build simpler models, we need to reduce the number of parameters by introducing constraints on the $Y$'s within our RPT framework. For example, the number of parameters can be reduced by assuming that the probability of survival is either age-dependent or time-dependent, which is standard practice in the ecology literature, leading to a more parsimonious model.
In the PT framework, assumptions of this kind can easily be incorporated into the model by sharing RVs across different PTs or, in other words, via "replicating" tree structures across different trees. This idea will also be useful later when defining models for CR and RR data, and hence, we formalize it with the following definition. For simplicity, we provide the definition for two trees but it is straightforward to extend it to multiple trees.
Let $\Sigma$ be a rule to generate a sequence of partitions from an initial seed set. For example, the rule $\Sigma$ can correspond to the forward partition. Let $G_1$ and $G_2$ be distributions having PT priors with partitions $\Pi_1$ and $\Pi_2$, respectively, with $Y^1$ and $Y^2$ the sets of RVs corresponding to $G_1$ and $G_2$, respectively, where $Y^k_{\epsilon 0} = G_k(B^k_{\epsilon 0}) / G_k(B^k_{\epsilon})$ as defined in Section 2. We say that $G_1$ and $G_2$ have an RPT structure across two sets $B^1_{\epsilon} \in \Pi_1$ and $B^2_{\epsilon'} \in \Pi_2$ if the following constraints hold:
• The partitions of the trees starting from $B^1_{\epsilon}$ and $B^2_{\epsilon'}$ are generated according to the same rule $\Sigma$, although the two partitions could be stopped after a different number of steps;
• For all $\delta_1, \dots, \delta_m$, $Y^1_{\epsilon \delta_1 \dots \delta_m} = Y^2_{\epsilon' \delta_1 \dots \delta_m}$.
The first condition states that the partitions of the trees starting from $B^1_{\epsilon}$ and $B^2_{\epsilon'}$ are the same (even if they might not have the same depth), whereas the second states that the RVs in the two trees are the same. We note that the definition can also be used with $B^1_{\epsilon} \in \Pi_1$ and $B^1_{\epsilon'} \in \Pi_1$, which allows sharing across the same tree.
We can use the RPT to impose different constraints on the probability of survival for the time-to-event example of Section 2.1, which we list below.
• Unconstrained case: The most general case is obtained by assuming that all the $Y^i_l$ are different. In this case, the probability of survival varies by age and by time and no replicate structure is assumed (Figure S4 of the Supporting Information).
• Age-dependent case: $\phi_{\cdot,a} \equiv \phi_a \Rightarrow Y^i_l \equiv Y_l\ \forall i$. This is equivalent to assuming that the distribution of the time from first observation to death is the same $\forall i$. In the RPT framework, this is equivalent to requiring that the $G_i$ have a replicate structure across the sets $\Omega_1, \dots, \Omega_T$ (Figure S2 of the Supporting Information).
• Time-dependent case: $\phi_{k,\cdot} \equiv \phi_k \Rightarrow Y^i_l \equiv Y_{i+l}$. This is equivalent to assuming that the distribution of $G_i(\cdot)$ is the same as the distribution of $G_{i-1}(\cdot \mid \cdot \in B^{i-1}_1)\ \forall i$. In the RPT framework, this is obtained assuming a replicate structure across the pairs of trees with seed sets: $\Omega_2$ and $B^1_1$, $\Omega_3$ and $B^2_1$, $\Omega_4$ and $B^3_1$, and so on (Figure S3 of the Supporting Information).
• Constant case: $\phi_{k,a} \equiv \phi \Rightarrow Y^i_l \equiv Y\ \forall i, l$. This is equivalent to assuming that the probability of survival is always constant. In this case, all the trees collapse to a single RV.
We note that employing constraints of the type induced by the RPT structure gives rise to models that assume independence between exit/LOS and TFO. This idea is extended further in Section 5. For ease of exposition, in all the case studies shown later, we only consider the most commonly employed constraint in each case. However, different RPT structures can be employed to assume different constraints as required.
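One way to read the four cases above is as a rule that maps every (TFO, level) pair to the label of the RV it uses, so that replicated parts of the trees share a label. The short Python sketch below illustrates this reading and counts the distinct RVs under each constraint. The function names and the string labels are hypothetical, and the indexing (TFO index $i = 1, \dots, T$, levels $l = 0, \dots, T-i-1$, age $l+1$ and time index $i+l$) follows the time-to-event example as an assumption.

```python
def shared_variable_key(i, l, constraint):
    """Label of the RV used at level l of the tree for TFO index i."""
    if constraint == "unconstrained":
        return (i, l)           # every tree has its own RVs
    if constraint == "age":
        return ("age", l + 1)   # shared by all individuals of the same age
    if constraint == "time":
        return ("time", i + l)  # shared by all individuals alive at the same time
    if constraint == "constant":
        return ("constant",)    # a single RV for the whole model
    raise ValueError(f"unknown constraint: {constraint}")

def count_free_variables(T, constraint):
    """Number of distinct RVs across all trees for T observation times."""
    keys = {
        shared_variable_key(i, l, constraint)
        for i in range(1, T + 1)
        for l in range(T - i)
    }
    return len(keys)

for name in ("unconstrained", "age", "time", "constant"):
    print(name, count_free_variables(4, name))
```

With four observation times this gives 6, 3, 3 and 1 distinct RVs, respectively, which illustrates the reduction in the number of parameters that the replicate structure provides.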

BIVARIATE PARTITIONS
To define joint models for entry and exit, or entry and LOS, we need to introduce partitions for distributions on ℝ 2 . These partitions can be built sequentially using the schemes described in Section 2. A useful bivariate partition can be constructed by first applying the forward partition in one dimension (entry) and next the backward partition in the other dimension (exit), as shown in Figure 2. We call this the entry and exit partition, and we give a formal definition in Section 3.1 of the Supporting Information. This partition is useful when we expect the entry pattern to be less smooth in the left tail compared to the right, with the opposite being true for the exit pattern. Examples of this EEP are stopover sites (Matechou et al., 2013) or breeding sites (Diana et al., 2020), where most individuals enter at the start of the study period and exit at the end, with entry and exit taking place in waves, and hence the entry (exit) pattern being more spiky at the start (end) and smoother at the end (start).
The entry and exit partition allows us to elicit prior information on the entry and exit distributions. However, in some cases, prior information is available on the LOS distribution instead. For eliciting a prior on the LOS distribution, we define a bivariate partition based on the forward partition, by first splitting the sample space with respect to the LOS and then with respect to the entry intervals. We describe how to construct this partition in the following paragraph.
First, we split the sample space into the set $B_0 = \{(x, z) : t_i < x < z < t_{i+1} \text{ for some } i\}$, consisting of the individuals exiting immediately after entry, and the set $\bar{B}_0 = \{(x, z) : t_i < x < t_{i+1} \le z \text{ for some } i\}$, consisting of the remainder of the individuals (Level 1 of Figure 3). Next, we split $\bar{B}_0$ into the set $B_1 = \{(x, z) : t_i < x < t_{i+1} < z < t_{i+2} \text{ for some } i\}$ of individuals staying for only one interval and the set $\bar{B}_1 = \{(x, z) : t_i < x < t_{i+1},\ z \ge t_{i+2} \text{ for some } i\}$ of individuals remaining for more than one interval (Level 2 of Figure 3). The process is repeated $T$ times for the remaining observation times. Next, each set $B_l$ is split according to the entry intervals using the uniform partition. This partition is termed the entry and LOS partition and an application is presented in Section 5.3.

ECOLOGICAL MODELS
In this section, we define ecological models for count, CR and RR data. These data are collected according to different survey methods, which are all prone to false-negative observation errors, and hence, not every individual that is present on a sampling occasion will be observed on that occasion. Therefore, each individual has a latent presence history, which is a vector with entries equal to 1 on sampling occasions when the individual was present and 0 otherwise. The observation process for each survey type is defined conditional on these latent presence histories.
FIGURE 2 Entry and exit partition. Partition of a PT prior for a distribution defined on $\{(x, z) : (x, z) \in \mathbb{R}^2, z > x\}$, built by first using the forward partition and next using the backward partition. The first $T$ steps are performed using the forward partition in the entry dimension, and the next $T$ steps using the backward partition in the exit dimension. The shaded area represents the region excluded from the sample space, because departure is left truncated by arrival. The ticks on the x and y-axes represent the sampling occasions.

FIGURE 3 Entry and LOS partition. Partition of a PT prior for a distribution defined on $\{(x, z) : (x, z) \in \mathbb{R}^2, z > x\}$ built using the entry and LOS partition. The first $T$ levels are built splitting according to the LOS, whereas the $(T+1)$th split is performed according to the entry dimension. The shaded area represents the region excluded from the sample space, because departure is left truncated by arrival. The ticks on the x and y-axes represent the sampling occasions.

Hence, in this section, we extend the ideas introduced in Section 2.1, where the time-to-event was observed for each individual, to modeling ecological data where the entry and exit times for each individual, as well as the size of the population, are unobservable. Therefore, within this framework, we infer the number of individuals in each cell of the grid defined in Section 4, and as a result also infer the population size.

Example: Count data model
In this section, we briefly summarize the model of Diana et al. (2018) developed for count data as an illustration of the entry and exit partition for modeling ecological data. Count data are collected by visiting the site and simply detecting a number of individuals, without attempting to uniquely identify them. The data can be summarized in a vector $(y_1, \dots, y_T)$, where $y_k$ denotes the number of individuals detected on sampling occasion $k$.
The latent presence histories can be summarized in a matrix $\mathbf{N} = \{N_{ij}\}_{i,j=0,\dots,T}$, where $N_{ij}$ is the number of individuals with entry interval $(t_i, t_{i+1})$ and exit interval $(t_j, t_{j+1})$. The number of individuals available for detection on sampling occasion $k$, $M_k$, is readily available from the matrix $\mathbf{N}$. Assuming that each individual can be detected with probability $p$ independently of the other individuals, the observation process can be expressed as $y_k \sim \text{Bin}(M_k, p)$, $k = 1, \dots, T$. A prior distribution is defined on the probability $\omega_{ij}$ that an individual belongs to cell $N_{ij}$ using a PT prior distribution with the entry and exit bivariate partition of Section 4. We define $X$ and $Z$ to be the entry and exit time and let $Y_l = \mathbb{P}(X \in (t_l, t_{l+1}] \mid X \in (t_l, \infty))$, for $l = 0, \dots, T-1$, and $W^i_l = \mathbb{P}(Z \in (t_{T-l}, t_{T-l+1}] \mid Z \in (t_i, t_{T-l+1}],\ X \in (t_i, t_{i+1}])$, for $l = 0, \dots, T-i-1$. The probabilities $\omega_{ij}$ can be obtained as a product of $Y$'s and $W$'s as in definition (1). For example, taking $T = 4$, $\omega_{23}$ is equal to $(1 - Y_0)(1 - Y_1)\,Y_2\,(1 - W^2_0)\,W^2_1$, where the first three factors represent the probability of entering in $(t_2, t_3]$ and the last two factors represent the probability of exiting in $(t_3, t_4]$ conditional on entering in $(t_2, t_3]$. Assuming a Poisson($\lambda$) prior distribution on the population size, $N$, the model can be summarized as

$$y_k \mid \mathbf{N} \sim \text{Bin}(M_k, p), \qquad \mathbf{N} \mid N, \omega \sim \text{Multinomial}(N, \omega), \qquad N \sim \text{Poisson}(\lambda), \qquad \omega \sim \text{PT}(\Pi, \mathcal{A}),$$

where the last expression summarizes the prior distribution assumed for $\omega$ through the $Y$'s and $W$'s.
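As a rough illustration of how the cell probabilities follow from the entry and exit RVs, the Python sketch below combines a forward split in the entry dimension with a backward split in the exit dimension, and computes the number of individuals present on a sampling occasion from the latent matrix. Everything here is an assumption made for the example: the function names, the convention that interval $i$ denotes $(t_i, t_{i+1}]$ for $i = 0, \dots, T$, and the rule that an individual is present on occasion $k$ when its entry interval index is below $k$ and its exit interval index is at least $k$.

```python
def stick_breaking(hazards):
    """Turn conditional probabilities into probabilities of successive sets."""
    probs, rest = [], 1.0
    for h in hazards:
        probs.append(rest * h)
        rest *= 1.0 - h
    probs.append(rest)
    return probs

def cell_probabilities(Y, W):
    """Cell probabilities omega[i][j] for entry interval i and exit interval j.

    Y    : T entry RVs (forward partition, earliest interval split off first).
    W[i] : T - i exit RVs for entry interval i (backward partition, latest
           exit interval split off first).
    """
    T = len(Y)
    entry = stick_breaking(Y)  # entry[i] = P(entry in interval i)
    omega = [[0.0] * (T + 1) for _ in range(T + 1)]
    for i in range(T + 1):
        # Exit intervals are visited in the order T, T-1, ..., i.
        for l, p in enumerate(stick_breaking(W[i])):
            omega[i][T - l] = entry[i] * p
    return omega

def number_present(N, k):
    """Individuals present on occasion k: entered before t_k, exit after t_k."""
    T = len(N) - 1
    return sum(N[i][j] for i in range(k) for j in range(k, T + 1))

Y = [0.1, 0.2, 0.3, 0.4]
W = [[0.5] * (4 - i) for i in range(5)]
omega = cell_probabilities(Y, W)
print(omega[2][3])  # (1-0.1) * (1-0.2) * 0.3 * (1-0.5) * 0.5
```

Cells with exit interval earlier than the entry interval keep probability zero, matching the truncated sample space, and the remaining cells sum to one.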

Capture-recapture
In this section, we demonstrate how to adapt the model for count data described in the previous section to CR data using an RPT. In the case of CR data, on each sampling occasion, a number of individuals are caught, and newly caught individuals are uniquely marked. If $n$ is the total number of captured individuals, the captures can be summarized in an $n \times T$ matrix, $H$, where $H_{ik}$ is 1 if the $i$th individual was captured on sampling occasion $k$ and 0 otherwise. Each row of the matrix $H$ is called a capture history.
To represent the latent presence histories in this case, we define the set of matrices $\{\mathbf{N}^f\} = \{N^f_{ij}\}_{i=1,\dots,T+1,\ j=1,\dots,T+1}$, where $N^f_{ij}$ denotes the number of individuals with first possible capture at time $i$, first capture at time $f$ and last possible capture at time $j$ (clearly, $N^f_{ij} = 0$ for $i > j$). For $f = 0$, we define $\mathbf{N}^0$ as the matrix containing the individuals never captured. The likelihood of the individuals with first capture on sampling occasion $f$ can be written explicitly given $\mathbf{N}^f$ (Section 4 of the Supporting Information).
We denote by $n_f$ the sum of the elements in each matrix $\mathbf{N}^f$, and we note that $n_f$ is known for $f \neq 0$, as it is equal to the number of individuals with first capture at time $f$, whereas $n_0$ is unknown, as it is equal to the number of individuals never captured. Finally, the population size, $N$, is obtained as $N = \sum_{f=0}^{T} n_f$. For each matrix $\mathbf{N}^f$, we define a distribution $G_f$ on the probabilities of an individual belonging to each cell of the matrix $\{N^f_{ij}\}$. We assume a Poisson($\lambda$) prior distribution on $N$ and a PT prior for each $G_f$, using the bivariate entry and exit partition of Section 4. We make two assumptions on the probabilities $G_f$, which are imposed using specific RPT structures. First, we assume that the entry and exit distribution is the same for individuals first captured on different sampling occasions. However, we note that each matrix is defined on a different sample space, as $N^f_{ij}$ is defined for $i \le f$, $j > f$, because individuals in the $f$th matrix have to enter before the $f$th sampling occasion and exit after. A representation of one matrix is shown in Figure S5 of the Supporting Information. Therefore, we assume that the hazards of the distributions are the same, that is, if $X_f$ and $Z_f$ are the entry and exit time of an individual first captured on sampling occasion $f$, the RVs $Y^{(f)}_l = \mathbb{P}(X_f \in (t_l, t_{l+1}] \mid X_f \in (t_l, \infty))$, $l = 0, \dots, f-1$, and $W^{(f)(i)}_l = \mathbb{P}(Z_f \in (t_{T-l}, t_{T-l+1}] \mid Z_f \in (t_i, t_{T-l+1}],\ X_f \in (t_i, t_{i+1}])$, $l = 0, \dots, T-f$, are the same for all $f$.
We also make the assumption that the exit distribution is time-dependent, in the sense that the exit interval of an individual does not depend on their entry interval, in a similar way to Section 3. This is achieved by imposing an RPT structure for each distribution $G_f$ across the sets $B_0 = \{(x, z) : x \le t_1\}$, $B_1 = \{(x, z) : t_1 < x \le t_2\}$, ..., $B_{T-1} = \{(x, z) : t_{T-1} < x \le t_T\}$ (see level $T$ of Figure 2 for a visual representation of the sets). With reference to the notation of $Y_l$ and $W^i_l$ introduced in Section 5.1, this is equivalent to assuming that $W^i_l \equiv W_l$. This assumption leads to a substantial reduction in the number of parameters but can be relaxed if required by not employing the RPT structure, leading to a different set of variables $W^i_l$, $l = 0, \dots, T-i$, for each time of entry $i = 0, \dots, T$.

We note that an alternative approach to writing the likelihood explicitly is to model the entry and exit intervals of each individual, as opposed to using the grid approach. This alternative approach is useful when the number of captured individuals is low compared to the number of sampling occasions, because it can lead to a reduced number of latent variables. This is the case for the case study considered in this paper, because there are only 40 marked individuals and eight sampling occasions. Section 5 of the Supporting Information presents the expression for the number of latent variables in each case.

Ring recovery
According to the RR protocol, individuals are captured and marked each year and, after being marked, they can be recovered soon after they die. The data are summarized in an upper-triangular $T \times T$ matrix, $R$, where cell $(i, j)$ denotes the number of individuals marked in year $i$ and recovered dead in year $j$, and in a vector $(m_1, \dots, m_T)$, where $m_i$ denotes the total number of individuals marked in year $i$. In the following, to be consistent with the notation used in the previous section, we will refer to the years as sampling occasions.
Similarly to the CR case, to jointly model EEPs, we work with a set of matrices $\{\mathbf{N}^k\}$, where in this case, $N^k_{ij}$ represents the latent number of individuals marked on sampling occasion $k$ that were present in the population for $i$ sampling occasions before being marked and remained for $j$ sampling occasions after being marked. We assume that individuals could have entered the population for up to $A$ sampling occasions before they were marked and could have stayed for up to $A$ sampling occasions after being marked, which is equivalent to assuming that the dimension of each matrix is $(A + 1) \times (A + 1)$. We have made this assumption of symmetry for simplicity. The choice of $A$ is not critical, as $A$ is only used to fix the dimension of the grid and serves as an upper bound for the lifespan of the individuals. Therefore, as long as $A$ is sufficiently larger than the lifespan of the individual, results are not sensitive to this choice.
We assume that an individual is recovered on the sampling occasion immediately following the interval in which they died with probability $r$ (and that the individual cannot be subsequently recovered). The number of individuals marked at time $k$ that can be recovered at a given later time can be obtained from the matrix $\mathbf{N}^k$ by summing over the number of sampling occasions spent in the population before marking. We assume a PT prior for the probabilities of an individual belonging to the cell $N^k_{ij}$, with partition taken to be the entry and LOS partition built in Section 4, and we define $G_k$ as the distribution of the probabilities of each cell for the matrix $\mathbf{N}^k$.
According to the entry and LOS partition, first, the sample space is split into $B_0$ and $\bar{B}_0$ (Level 1 of Figure 4), that is, the individuals staying for 0 sampling occasions and those staying for more than 0. Next, $\bar{B}_0$ is split into $B_1$ and $\bar{B}_1$ (Level 2 of Figure 4), that is, the individuals staying for 1 sampling occasion and those staying for more than 1, and so on, for $2A$ levels. The first $2A$ steps split the sample space according to the LOS. Similarly to Section 2.1, we define $Y^k_l$ as the probability that an individual marked on sampling occasion $k$ dies after $l + 1$ sampling occasions conditional on being alive for $l$ sampling occasions after entry. At the last level (level $2A$ of Figure 4), we use the uniform partition to split each diagonal according to the entry (or exit) intervals. To perform this step, we generate a set of RVs $(q^k_0, \dots, q^k_A) \sim \text{Dirichlet}(\beta_0, \dots, \beta_A)$ and we assign the masses of the cells of diagonal $l$ proportionally to $q^k_0, \dots, q^k_A$. The RVs $q^k_i$ can be interpreted as the relative probabilities of entering in each interval previous to the time of marking.
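The two-stage construction described above, first splitting by LOS and then spreading the mass of each diagonal over the entry intervals, can be sketched as follows. This is a hedged illustration rather than the authors' formula: the function name is hypothetical, the LOS of cell $(i, j)$ is assumed to be $i + j$, and the entry weights are assumed to be renormalized over the cells that actually exist on each diagonal of the grid.

```python
def entry_los_cell_probabilities(Y, q):
    """Cell probabilities omega[i][j] under an entry and LOS style partition.

    Y : 2A conditional probabilities of leaving after each LOS value, given
        that the individual stayed at least that long (LOS-first splits).
    q : A + 1 relative probabilities of having entered i occasions before
        marking (the final, uniform split).
    """
    A = len(q) - 1
    los, rest = [], 1.0
    for h in Y:
        los.append(rest * h)
        rest *= 1.0 - h
    los.append(rest)  # los[l] = P(LOS = l), l = 0, ..., 2A
    omega = [[0.0] * (A + 1) for _ in range(A + 1)]
    for l, p_l in enumerate(los):
        # Cells on diagonal l satisfy i + j = l with both indices inside the grid.
        valid = [i for i in range(A + 1) if 0 <= l - i <= A]
        norm = sum(q[i] for i in valid)
        for i in valid:
            omega[i][l - i] = p_l * q[i] / norm
    return omega

omega = entry_los_cell_probabilities([0.5, 0.5], [0.25, 0.75])
print(omega)  # [[0.5, 0.0625], [0.1875, 0.25]]
```

Sharing `Y` across marking occasions corresponds to an age-dependent survival assumption, while sharing `q` corresponds to a constant age composition of newly marked individuals.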
Similarly to the CR case in Section 5.2, an assumption needs to be made about how the RVs $q^k_i$, $k = 1, \dots, T$, vary across the time of marking $k$. In the case of RR data, we cannot model the process by which individuals are first caught but instead only the recruitment of marked individuals. Hence, we choose to make the assumption that regardless of the time of marking, individuals of unknown age are equally likely to have entered $i$, $i = 0, \dots, A$, intervals before being marked. This is equivalent to assuming that the proportion of marked individuals of each age group is constant for each sampling occasion, in a similar way to McCrea et al. (2013). This assumption is imposed by assuming that $q^k_i \equiv q_i$, $k = 1, \dots, T$. The second assumption we make is that the survival probability is age-dependent, that is, the probability of an individual surviving until the next sampling occasion depends only on the age of the individual and not on the sampling occasion the individual was marked. If we define $\Omega_k$ to be the sample space of the distribution modeling the individuals marked at time $k$, this assumption can be achieved by assuming for $G_1, \dots, G_T$ an RPT across $\Omega_1, \dots, \Omega_T$. This forces the probabilities $Y^k_l$ to be the same for each $k$, that is, $Y^k_l \equiv Y_l$. This is equivalent to assuming that $G_k \equiv G$ or, similarly to the time-to-event model, if $\phi_{k,a}$ is the probability of an individual of age $a$ that is present at time $k$ to remain until the following sampling occasion, that $\phi_{k,a} \equiv \phi_a$. Because of this assumption, the posterior distributions on the survival probabilities are the same for each time of marking.

FIGURE 4 Partition of the PT in the case of ring-recovery data for the matrix corresponding to the individuals marked at time $k$. The sample space is split into the set of individuals staying for less than one sampling occasion, $B_0$, and the set of individuals staying for more than one sampling occasion, $\bar{B}_0$. After this process is repeated for $2A$ sampling occasions, each set is split according to the entry intervals.
Assuming also a Beta prior for the recovery probability , the model can be summarized as We show an application of this model in Section 6.2.
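To make the age-dependence assumption concrete, the sketch below turns a sequence of conditional death probabilities (one per level of the tree) into the implied distribution of the age at death, and reuses the same sequence for every marking occasion, which is what sharing the RVs across trees amounts to. It is an illustration under stated assumptions only: it covers individuals marked at age 0, and the function name, the probabilities, and the three marking occasions are hypothetical.

```python
def age_at_death_distribution(death_probs):
    """Convert conditional death probabilities into masses over the age at death.

    death_probs[a] is the probability of dying at age a given survival up to age a.
    The final entry of the result is the mass of surviving beyond the last level.
    """
    masses = []
    alive = 1.0  # probability of still being alive when reaching the current level
    for q in death_probs:
        masses.append(alive * q)   # dies at this level
        alive *= 1.0 - q           # survives to the next level
    masses.append(alive)
    return masses


# Hypothetical conditional death probabilities, shared by all marking occasions.
shared_death_probs = [0.5, 0.5]
cohort_masses = {t: age_at_death_distribution(shared_death_probs) for t in (1, 2, 3)}
```

Every cohort in `cohort_masses` receives identical masses, reflecting that under this assumption survival depends on age alone and not on the occasion of marking.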

Capture-recapture and count data
We jointly apply the models of Sections 5.1 and 5.2 to a count and a CR dataset, respectively, collected on a population of Eurasian Spoonbills (Platalea leucorodia) in the southern Po delta, in North-East Italy. Birds are captured as chicks in previous years and are uniquely marked. The CR dataset is collected by resighting adult birds through photographs obtained using camera traps and by visiting their nests on eight separate sampling occasions. No attempt is made to mark new adult individuals and instead only previously marked individuals can be resighted. In addition, unmarked birds are detected on each sampling occasion. In this case, since there are fewer than 40 resighted individuals, we perform inference on the marked individuals resighted by modeling explicitly their individual entry and exit intervals, as mentioned in Section 5.2.
As opposed to the model described in Section 5.2, we have two different populations of individuals, the ones already marked and the unmarked, and hence, we have to modify the model that has been introduced before. We model the marked individuals resighted by introducing the variables ( 1 , 2 ), where 1 and 2 are the entry and exit interval of the th marked individual resighted, respectively, whereas the marked individuals never resighted and the unmarked individuals are summarized in two matrices { } and { }, respectively, where similarly to the count data model of Section 5.1, cell ( , ), corresponds to the number of individuals first available for capture at time and last available at time . We assume the same distribution on the entry and exit intervals of marked and unmarked birds.
The number of marked birds, , and the number of unmarked birds, , are assigned two Poisson prior distributions, ∼ Poisson( ) and ∼ Poisson( ), respectively. The model for and can therefore be summarized as . We assume that each marked individual can be resighted with probability by visiting its nest and through camera traps with . As the unmarked birds are also detected through camera traps, we assume that they can be detected with the same probability of resighting an already marked bird. We note that this is a realistic, but also necessary, assumption in this case, as otherwise the counts do not include enough information to separately estimate the population size of unmarked birds and the probability of resighting an unmarked bird, , if that is different to marked birds. In Diana et al. (2018), this identifiability issue was overcome by employing informative prior distributions.
We choose a uniform prior distribution for the resighting probabilities and a Gamma prior distribution with mean 50 and variance 4000 for the intensities and of the two population sizes. The entry and exit distributions are centered on Laplace distributions Lap( 1 , 1 ) and Lap( 2 , 2 ), respectively, where 1 ∼ N( 1 , 30 2 ), 2 ∼ N( 4 , 30 2 ), and ∼ Gamma( , ), such that E[ ] = 20 and Var[ ] = 80, where days are the unit of measure. The resulting prior distributions on the entry and exit distributions are shown in Figure 5(A). Inference is performed using a Markov chain Monte Carlo (MCMC) algorithm by sampling from the posterior distribution of the individual entry and exit intervals ( 1 , 2 ), the two matrices { } and { }, as well as the probabilities of the PT and the resighting probabilities. The MCMC scheme is described in Section 6.1 of the Supporting Information.
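The Gamma priors above are stated through their mean and variance. As a small worked example, the snippet below converts such a specification into shape and rate parameters by moment matching. The shape-rate parameterization and the function name are assumptions made for illustration; the text does not say which parameterization the authors use.

```python
import math


def gamma_shape_rate(mean, variance):
    """Moment-match a Gamma distribution: mean = shape / rate, variance = shape / rate**2."""
    rate = mean / variance
    shape = mean * rate
    return shape, rate


# Prior on the population-size intensities: mean 50, variance 4000.
intensity_shape, intensity_rate = gamma_shape_rate(50.0, 4000.0)

# Prior with mean 20 and variance 80 (in days) used for the Laplace centering distributions.
scale_shape, scale_rate = gamma_shape_rate(20.0, 80.0)

assert math.isclose(intensity_shape, 0.625) and math.isclose(intensity_rate, 0.0125)
assert math.isclose(scale_shape, 5.0) and math.isclose(scale_rate, 0.25)
```

A shape below 1 for the intensities (0.625 under this parameterization) gives a density that is decreasing and heavy-tailed, so the prior places substantial mass on both small and very large population sizes.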
The posterior means of the entry and exit cumulative distribution functions (CDFs) are shown in Figure 5(A). The CDF of the exit distribution suggests that individuals exit the area throughout the study period and, since it does not reach 1 at the end of the study period, that some birds remain at the site using the colony as a roost. Similarly, the CDF of the entry intervals is at around 0.4 at the start of the study, suggesting that a large proportion of birds are present at the site when sampling starts, with over 80% of birds estimated to have arrived by the second sampling occasion toward the end of April.
The posterior distributions of the two population sizes and resighting probabilities are shown in Figure 5(B). The camera trap resight probability, , is slightly lower than the resight probability through nest visits, , because cameras are pointed toward the nest, where it is unlikely to see floaters and prospectors. The difference between the population sizes of marked and unmarked birds is due to the fact that not all the chicks are marked each year, and out of those marked, only a small proportion return to breed as adults. In fact, local recruitment rate is thought to be around 0.12, whereas the proportion of immigrants on total number of recruits ranges from 0.49 to 0.83 (Tenan et al., 2017). Further details regarding the MCMC runs are given in the Supporting Information.

Ring recovery
We apply the model for RR defined in Section 5.3 to a dataset collected in Minnesota, USA. A total of 100,127 female mallards (Anas platyrhynchos) were banded throughout the course of the study, which lasted 51 years. Marking occurs from July to September, while recoveries occur during the hunting season immediately following marking, from September to January. Newly caught individuals can be recognized as juveniles if their age is less than 1 year at the time of capture and as adults otherwise. Therefore, individuals caught as juveniles are of known age at year of death. The entry and exit of the individuals correspond in this case to births and deaths, whereas the sampling occasions correspond to years. We summarize the data in two × matrices, and , for adults and juveniles, respectively. The total number of juveniles and adults marked in sampling occasion , are denoted by and , respectively. The first row of the matrix , { 0 } =0,…, , consists of the juveniles, and its sum is thus equal to , whereas the rows from 2 to + 1 consist of the individuals marked as adults, and their sum is equal to . We center the PT over a Weibull distribution ( , ), with additional Gamma priors on and . Assuming a Beta prior for the recovery probability, , the model can be summarized as The first row of matrix contributes to the recoveries for the juveniles and the rest of the matrix contributes to the recoveries of the adults. We perform inference by updating the random matrices , the probabilities and the recovery probability . Details of the MCMC sampler are presented in Section 6.2 of the Supporting Information.
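Centering the PT over a Weibull distribution means that, a priori, the split probabilities of the tree are expected to match the corresponding Weibull quantities. The sketch below computes the annual conditional survival probabilities implied by a Weibull distribution, which are the values the age-specific survival probabilities would be centered on. The shape-scale parameterization, the function name, and the parameter values are assumptions for illustration and are not taken from the paper; only the upper bound of 18 on age comes from the text.

```python
import math


def weibull_conditional_survival(shape, scale, max_age):
    """Probability of surviving to age a + 1 given alive at age a, for a = 0..max_age - 1.

    Uses the Weibull survival function S(t) = exp(-(t / scale) ** shape).
    """
    def survival(t):
        return math.exp(-((t / scale) ** shape))

    return [survival(a + 1) / survival(a) for a in range(max_age)]


# Hypothetical centering parameters; 18 is the upper bound on age used in the case study.
constant_hazard = weibull_conditional_survival(shape=1.0, scale=2.0, max_age=18)
increasing_hazard = weibull_conditional_survival(shape=1.5, scale=4.0, max_age=18)
```

With shape equal to 1 the Weibull reduces to an exponential distribution and the conditional survival probability is the same at every age, whereas a shape above 1 gives survival that declines with age.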
We assume 0 = 0 = 1 and 2 = 2 = 0.5. This prior reflects an uninformative prior distribution on the hazard, as can be seen in Figure S6 of the Supporting Information. Moreover, we assume a Beta(1,1) prior distribution for the recovery probability and we set the upper bound on the age of the individuals to 18, which is a conservative upper bound because no individual was recovered more than 14 years after being marked. We compare this approach with a nonparametric unconstrained approach, where uniform prior distributions are assigned on the masses in each split of the PT (which is equivalent to 1 ,…, = 1). The posterior mean of the hazards is presented in Figure 6. The overall trend agrees with the general pattern observed for bird populations, with survival being lower in very young and older ages. A similar pattern was observed by McCrea et al. (2013) when analyzing RR data for mallards (Anas platyrhynchos) and by Jimenez-Muñoz et al. (2019) for blackbirds (Turdus merula), but we note here that we have not employed a parametric curve to enforce this pattern, and instead it has been completely driven by the data. We also show the results obtained using the unconstrained approach described above. As can be seen, the estimates for older ages for the model using the Weibull centering distribution are smoother, have narrower credible intervals, and avoid the boundary estimates typically seen as a result of data sparseness at older ages, a result which further highlights the benefit of a BNP model centered on a parametric distribution. Finally, the posterior mean of the recovery probability is equal to 0.123 (95% PCI: 0.121-0.125), which is in line with findings of similar studies (McCrea et al., 2012).
Further details regarding the MCMC runs are given in the Supporting Information.

FIGURE 6 Case study on mallards. 95% posterior credible intervals of the probabilities of surviving to age + 1 conditional on being alive at age for the unconstrained PT model (gray) and PT with Weibull centering distribution (black).

CONCLUSION
We introduced a framework for defining models for ecological data on open populations based on the PT prior. The advantage of this framework is that a wide variety of assumptions can be made on the model structure by changing the tree structure, as a result of the flexibility of the PT. We have applied our framework to different types of commonly collected ecological data, such as CR, RR, and count data. We have also introduced the RPT, which allows us to place constraints on the model parameters, such as age-dependent survival probabilities or time-dependent exit probabilities. This assumption follows from the use of specific RPT structures that imply posterior independence of the entry and exit or LOS distributions, and can be relaxed by removing the RPT structure, which, however, results in a large number of unrestricted parameters to estimate. It is easy to extend the model to other protocols, such as removal data (Matechou et al., 2016; Otis et al., 1978). Removal data are collected by repeatedly visiting the site and removing all caught individuals. In this case, an approach similar to CR and RR can be employed, with the time of exit known for the removed individuals and an unknown number of individuals not removed by the end of the study.
The RPT, which assumes that RVs are the same across different trees, represents the strongest form of sharing as it assumes complete pooling of the RV across the different trees. This implies particular forms of association between entry and exit or entry and LOS. Alternatively, it is possible to assume intermediate forms of sharing, which are between complete pooling and no sharing at all, by considering a hierarchical prior among the different trees. This form of sharing can be achieved by using a Hierarchical PT (Christensen & Ma, 2020) and defining the model by centering the PT in each group over a common PT prior. An alternative definition is to use a logistic PT (Jara & Hanson, 2011), which makes use of the logistic normal in place of the Beta distributions. These models make weaker assumptions and so have the potential to learn more about the association between entry and exit or entry and LOS. The use of the logistic normal PT gives interesting hints for further extensions. For example, by taking advantage of the normal structure, it is possible to define regression models or time-series models, with the possibility of using the Polya-Gamma scheme (Polson et al., 2013) for a conjugate scheme for inference.
In our modeling framework, different ecological assumptions correspond to different trees or changes in the dependence structure of the RVs on the tree. The question of evaluating evidence in favor of different assumptions can be addressed using standard Bayesian model selection methods. For example, model selection can be performed by evaluating the evidence of each model by using Bayes factors. Alternatively, as a change in the ecological assumption entails a change in the structure of the trees, model selection can be performed by employing an additional prior on these different tree structures and computing the posterior of the structure of trees as part of the parameter space.
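As a rough illustration of the first option, the snippet below converts log marginal likelihoods (log evidences) of competing models into a log Bayes factor and posterior model probabilities. It is a generic sketch, not part of the proposed framework: the function names and the numerical values are hypothetical, and estimating the log evidences themselves is assumed to have been done separately.

```python
import math


def log_bayes_factor(log_evidence_a, log_evidence_b):
    """Log Bayes factor of model A against model B."""
    return log_evidence_a - log_evidence_b


def posterior_model_probs(log_evidences, prior_probs=None):
    """Posterior model probabilities from log evidences, computed stably on the log scale."""
    n = len(log_evidences)
    if prior_probs is None:
        prior_probs = [1.0 / n] * n  # equal prior weight on every model
    log_post = [le + math.log(p) for le, p in zip(log_evidences, prior_probs)]
    largest = max(log_post)
    weights = [math.exp(lp - largest) for lp in log_post]  # subtract the max to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log evidences for two tree structures (illustrative numbers only).
log_evidences = [-1000.0, -1000.0 - math.log(3.0)]
probs = posterior_model_probs(log_evidences)
lbf = log_bayes_factor(log_evidences[0], log_evidences[1])
```

Working on the log scale matters here because the evidences of models fitted to large datasets are far too small to represent directly as floating-point numbers.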
Our modeling framework also provides an alternative approach to inferring population size, compared to, for example, N-mixture models for count data (Royle, 2004) and data-augmentation approaches for CR and related data (Royle & Dorazio, 2012), which does not rely on specifying an upper bound for the population size and, as it does not infer individual entry and exit times, is also efficient for large populations.

ACKNOWLEDGMENTS
We thank the AE and the two referees for their helpful comments. Permission to work in the spoonbill colony was issued by "Ente di gestione per i Parchi e la Biodiversita - Delta del Po". The help of Davide Emiliani and Daniela Mengoni was invaluable during field work and data collection. The majority of mallard ringing was conducted by the Minnesota Department of Natural Resources.

DATA AVAILABILITY STATEMENT
The data that support the findings in this paper are available from the corresponding author upon reasonable request.