Consider a closed population of *N* animals in a capture–recapture experiment over *m* capture occasions, *j* = 1,2,…,*m*. Let *Y*_{ij} be a binary outcome equal to 1 if the *i*th animal is caught on the *j*th capture occasion and 0 otherwise. Let *Y*_{i} = (*Y*_{i1},*Y*_{i2},…,*Y*_{im})^{′} be the random vector containing the capture history of individual *i*. Let *T*_{i} = Σ_{j=1}^{m} *Y*_{ij} be the number of times the *i*th animal is caught over the course of the study, and let *t*_{i} be the occasion on which the *i*th individual is first captured. Heterogeneity in capture probabilities is often explained by an observed individual covariate *x*_{i}, such as age, sex, or weight. For simplicity, we take *x*_{i} to be a single covariate, but the model generalizes easily to a vector of covariates. Let the probability that the *i*th animal is captured on any trapping occasion *j* be

- $p_i(\beta) = \Pr(Y_{ij} = 1 \mid x_i) = h(X_i\beta)$ (1)

where *X*_{i} = (1, *x*_{i}) is the design matrix, *β* = (*β*_{0},*β*_{1})^{′} is the vector of parameters associated with the covariates, and *h*(*u*) = (1+exp(−*u*))^{−1} is the logistic function. This is an M_{h} model in which variation in capture probabilities among individuals is explained by the covariate *x*_{i}. The probability of not capturing the *i*th individual on the *j*th occasion is 1−*p*_{i}(*β*), and the variance of *Y*_{ij} is *p*_{i}(*β*)(1−*p*_{i}(*β*)) (Liang and Zeger 1986). Then *T*_{i}∼*Bin*(*m*,*p*_{i}(*β*)), and *π*_{i}(*β*) = 1−(1−*p*_{i}(*β*))^{m} is the probability that individual *i* is captured at least once, given the covariate *x*_{i}. Without loss of generality, let the distinct individuals captured on at least one occasion be indexed by *i* = 1,2,…,*n* and the uncaptured individuals by *i* = *n* + 1,…,*N*. Once an estimate $\hat{\beta}$ of *β* is obtained, the population size can be estimated with the Horvitz–Thompson estimator, as in Huggins (1989).
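To make the last step concrete, the following is a minimal sketch of the Horvitz–Thompson calculation, in which each captured animal contributes 1/*π*_{i} to the population estimate. The function names, the covariate values, and the value used for the estimate of *β* are hypothetical and chosen only for illustration.

```python
import math


def logistic(u):
    """Logistic function h(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def capture_prob(beta, x):
    """p_i(beta) = h(beta0 + beta1 * x_i) for a single covariate x_i."""
    return logistic(beta[0] + beta[1] * x)


def prob_caught_at_least_once(beta, x, m):
    """pi_i(beta) = 1 - (1 - p_i(beta))**m over m capture occasions."""
    return 1.0 - (1.0 - capture_prob(beta, x)) ** m


def horvitz_thompson(beta_hat, covariates, m):
    """Sum of 1 / pi_i(beta_hat) over the n captured animals."""
    return sum(1.0 / prob_caught_at_least_once(beta_hat, x, m)
               for x in covariates)


# Hypothetical inputs: three captured animals, m = 2 occasions, and an
# assumed estimate beta_hat = (0, 0), so that every p_i equals 0.5.
x_captured = [1.2, 0.7, 2.1]
n_hat = horvitz_thompson((0.0, 0.0), x_captured, m=2)
print(n_hat)
```

With these assumed inputs every *π*_{i} equals 0.75, so three captured animals give an estimate of about four animals in the population.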

#### Generalized estimating equations approach

Let *V*_{i} = *A*_{i}^{1/2}*R*_{i}(*α*)*A*_{i}^{1/2} be the covariance matrix of *Y*_{i}, where *A*_{i} = diag[Var(*Y*_{i1}),Var(*Y*_{i2}),…,Var(*Y*_{im})] is an *m*×*m* diagonal matrix and *R*_{i}(*α*) is the working correlation structure among *Y*_{i1},*Y*_{i2},…,*Y*_{im}, which describes the average dependency of individuals being captured from occasion to occasion. A GEE approach permits several types of working correlation structure *R*_{i}(*α*) (for details, see Diggle et al. 1994). For the description that follows, and for simplicity, we consider an independence working correlation structure, *R*_{i}(*α*) = I, where I is the identity matrix. The covariate *x*_{i} is never known for individuals that have not been captured. Therefore, as in Huggins (1989) and Zhang (2012), *Y*_{ij} is modeled conditionally on the *n* captured individuals (i.e., on *T*_{i} ≥ 1), for whom the individual covariates are observed. Because *Y*_{ij} = 1 implies *T*_{i} ≥ 1, the probability that the *i*th individual is captured on the *j*th occasion, given that it is observed at least once, is *p*_{ij} = *p*_{i}(*β*)/*π*_{i}(*β*). Let *μ*_{ij} = E(*Y*_{ij} | *T*_{i} ≥ 1) = *p*_{ij}, and let *D*_{i} be the matrix of derivatives ∂*μ*_{i}/∂*β*^{′}, where *μ*_{i} = (*μ*_{i1},*μ*_{i2},…,*μ*_{im})^{′}, hence *D*_{i} = *A*_{i}*X*_{i}. The variance *v*_{ij} of *Y*_{ij} given *T*_{i} ≥ 1 is *v*_{ij} = *p*_{ij}(1−*p*_{ij}). Taking *V*_{i} = diag(*v*_{ij}), an estimator of *β* can be obtained by solving the following generalized estimating equations:

- $\sum_{i=1}^{n} D_i^{\prime} V_i^{-1} (Y_i - \mu_i) = 0$ (2)
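The following sketch illustrates one way such estimating equations could be solved numerically under the independence working correlation. It is not the authors' implementation. It assumes the conditional mean *μ*_{ij} = *p*_{i}(*β*)/*π*_{i}(*β*), which follows from the definitions of *p*_{i} and *π*_{i}, and the Bernoulli variance *μ*_{ij}(1−*μ*_{ij}). It differentiates that conditional mean analytically, and it uses Newton iterations with a finite-difference Jacobian as a convenient solver. All function names and the data are hypothetical, and *m* ≥ 2 is assumed.

```python
import math


def h(u):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-u))


def cond_mean_and_slope(eta, m):
    """Return mu = p / pi and d(mu)/d(eta) for linear predictor eta."""
    p = h(eta)
    q_m = (1.0 - p) ** m            # probability of never being caught
    pi = 1.0 - q_m                  # probability of being caught at least once
    mu = p / pi
    dp = p * (1.0 - p)              # dp/d(eta)
    dpi = m * p * q_m               # d(pi)/d(eta) = m (1-p)^(m-1) p (1-p)
    dmu = (dp * pi - p * dpi) / pi ** 2
    return mu, dmu


def gee_score(beta, x, T, m):
    """Estimating function sum_i D_i' V_i^{-1} (Y_i - mu_i), independence case.

    Because mu_ij does not depend on j, the sum over occasions collapses
    to (T_i - m * mu_i) for each captured animal.
    """
    s0 = s1 = 0.0
    for xi, Ti in zip(x, T):
        mu, dmu = cond_mean_and_slope(beta[0] + beta[1] * xi, m)
        w = dmu / (mu * (1.0 - mu)) * (Ti - m * mu)
        s0 += w
        s1 += w * xi
    return s0, s1


def solve_gee(x, T, m, beta=(0.0, 0.0), tol=1e-8, max_iter=100, eps=1e-6):
    """Newton iterations with a forward-difference Jacobian (illustrative)."""
    b0, b1 = beta
    for _ in range(max_iter):
        s0, s1 = gee_score((b0, b1), x, T, m)
        if abs(s0) + abs(s1) < tol:
            break
        a0, a1 = gee_score((b0 + eps, b1), x, T, m)
        c0, c1 = gee_score((b0, b1 + eps), x, T, m)
        j00, j10 = (a0 - s0) / eps, (a1 - s1) / eps
        j01, j11 = (c0 - s0) / eps, (c1 - s1) / eps
        det = j00 * j11 - j01 * j10
        # Solve J d = -s for the 2 x 2 system.
        b0 += (-s0 * j11 + s1 * j01) / det
        b1 += (-s1 * j00 + s0 * j10) / det
    return b0, b1


# Hypothetical data: a binary covariate and capture counts T_i >= 1
# for n = 12 captured animals over m = 5 occasions.
x = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
T = [1, 2, 1, 3, 2, 1, 2, 3, 4, 2, 3, 1]
beta_hat = solve_gee(x, T, m=5)
print(beta_hat)
```

With a binary covariate, the solution matches the fitted conditional mean in each group to that group's observed mean capture rate, which gives a simple check on the solver.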

If the covariate *x*_{i} (*i* = 1,2,…,*n*) is available for the captured individuals, then the model becomes *p*_{i}(*β*) = *h*(*X*_{i}*β*). This model is not equivalent to any of those discussed in Otis et al. (1978); rather, it is a restricted version of their model M_{h} (Huggins 1991). If *p*_{i}(*β*) = *h*(*X*_{i}*β*), then, following Zhang (2012), the estimating equations (2) can be simplified to

- (3)

#### Methods based on a partial likelihood

The full likelihood of all model parameters is proportional to

- (5)

Because the total number of individuals, *N*, is unknown and the covariates are not known for individuals that are never captured, this likelihood cannot be evaluated directly. The conditional likelihood (Huggins 1989) is the first product component, and it can be formulated as a GLM (Huggins and Hwang 2011) for the positive Binomial distribution (Patil 1962). It may be rewritten as

- (6)
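As a sketch of how such a rewriting can work, each captured individual's conditional contribution can be split into a recapture term and a first-capture term. The form below is an assumed one, consistent with the definitions of *T*_{i}, *t*_{i}, *p*_{i}(*β*) and *π*_{i}(*β*) but not necessarily the authors' exact expression, and binomial coefficients are omitted.

```latex
\prod_{i=1}^{n} \frac{p_i(\beta)^{T_i}\,\{1-p_i(\beta)\}^{m-T_i}}{\pi_i(\beta)}
  = \prod_{i=1}^{n} p_i(\beta)^{T_i-1}\,\{1-p_i(\beta)\}^{(m-t_i)-(T_i-1)}
    \times \prod_{i=1}^{n} \frac{p_i(\beta)\,\{1-p_i(\beta)\}^{t_i-1}}{\pi_i(\beta)}
```

The exponents recombine as (*T*_{i}−1) + 1 = *T*_{i} and (*m*−*t*_{i}−*T*_{i}+1) + (*t*_{i}−1) = *m*−*T*_{i}. The first product on the right has the kernel of a *Bin*(*m*−*t*_{i},*p*_{i}(*β*)) distribution for *T*_{i}−1.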

When the full likelihood is partitioned into a product of conditional densities, a partial likelihood (Cox 1975) can be formed from a subset of the product terms that involves only the parameters of interest, thereby isolating the nuisance parameters. The partial likelihood, PL(*β*), is therefore the first product in equation (6), which is the likelihood of the number of recaptures after the first capture (Stoklosa et al. 2011). For a given *t*_{i}, (*T*_{i} − 1)|*t*_{i}∼*Bin*(*m*−*t*_{i},*p*_{i}(*β*)), which is used to estimate the parameters *β*.
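Because (*T*_{i} − 1)|*t*_{i} is binomial with a logistic success probability, maximizing PL(*β*) amounts to fitting a binomial logistic regression with *m*−*t*_{i} trials per animal. The sketch below does this with Newton–Raphson, which is an illustrative choice rather than the procedure used in the cited work. Function names and data are hypothetical.

```python
import math


def h(u):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-u))


def partial_loglik(beta, x, t_first, T, m):
    """log PL(beta) up to an additive constant (binomial coefficients dropped)."""
    total = 0.0
    for xi, ti, Ti in zip(x, t_first, T):
        p = h(beta[0] + beta[1] * xi)
        r = Ti - 1          # recaptures after the first capture
        k = m - ti          # occasions remaining after the first capture
        total += r * math.log(p) + (k - r) * math.log(1.0 - p)
    return total


def fit_partial_likelihood(x, t_first, T, m, n_iter=50, tol=1e-10):
    """Newton-Raphson for the binomial logistic model (T_i - 1) ~ Bin(m - t_i, p_i)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0                 # score vector
        a00 = a01 = a11 = 0.0         # information matrix entries
        for xi, ti, Ti in zip(x, t_first, T):
            p = h(b0 + b1 * xi)
            k = m - ti
            resid = (Ti - 1) - k * p
            w = k * p * (1.0 - p)
            g0 += resid
            g1 += resid * xi
            a00 += w
            a01 += w * xi
            a11 += w * xi * xi
        det = a00 * a11 - a01 * a01
        d0 = (a11 * g0 - a01 * g1) / det
        d1 = (a00 * g1 - a01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical data for six captured animals over m = 5 occasions:
# covariate x_i, first-capture occasion t_i, total captures T_i.
x = [0, 0, 0, 1, 1, 1]
t_first = [1, 2, 3, 1, 2, 4]
T = [3, 2, 1, 4, 3, 2]
beta_hat = fit_partial_likelihood(x, t_first, T, m=5)
print(beta_hat, partial_loglik(beta_hat, x, t_first, T, 5))
```

Animals first caught on the last occasion (*t*_{i} = *m*) have no remaining trials and contribute nothing to this fit.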

To use a simple GLMM with a random effect, we suppose that *p*_{i}(*β*) = *h*(*X*_{i}*β* + *σ*_{b}*z*_{i}), where *z*_{i} is a realization of a standard normal random variable, *Z* ∼ *N*(0,1), and *σ*_{b}>0. The use of random effects reflects the belief that there is heterogeneity that cannot be explained by the covariates. The partial likelihood can then be regarded as the joint distribution of the response and the random effects. To estimate *β* and *σ*_{b}, the marginal likelihood of the response is obtained by integrating out the random effects. The integration can be approximated by penalized quasi-likelihood (Breslow and Clayton 1993), which enables parameter estimation via an iterative procedure.
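To illustrate what integrating out the random effect means, the sketch below computes one animal's marginal probability of *r* recaptures in *k* remaining occasions by averaging the binomial probability over *z* ∼ *N*(0,1). It uses a plain trapezoid rule in place of penalized quasi-likelihood, purely for transparency, so it is not the estimation procedure described above. The function name, grid settings, and inputs are hypothetical.

```python
import math


def h(u):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-u))


def std_normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def marginal_recapture_prob(r, k, eta, sigma_b, z_max=8.0, steps=1600):
    """Trapezoid-rule approximation to
    integral of Bin(r; k, h(eta + sigma_b * z)) * phi(z) dz over [-z_max, z_max],
    where eta = X_i beta is the fixed-effect linear predictor.
    """
    dz = 2.0 * z_max / steps
    total = 0.0
    for s in range(steps + 1):
        z = -z_max + s * dz
        p = h(eta + sigma_b * z)
        f = math.comb(k, r) * p ** r * (1.0 - p) ** (k - r) * std_normal_pdf(z)
        total += f * (0.5 if s in (0, steps) else 1.0)   # half weight at the ends
    return total * dz


# Hypothetical values: 2 recaptures in 4 remaining occasions, eta = -0.5,
# with and without unexplained heterogeneity.
print(marginal_recapture_prob(2, 4, eta=-0.5, sigma_b=0.0))
print(marginal_recapture_prob(2, 4, eta=-0.5, sigma_b=1.0))
```

Setting `sigma_b` to zero recovers the ordinary binomial probability. A full fit would sum the logarithm of this quantity over the captured animals and maximize over *β* and *σ*_{b}.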