## Introduction

Variability between individuals is an important statistical and ecological issue for mark–recapture studies. Individual heterogeneity within species can include factors such as prey preference, foraging techniques (Bolnick *et al*. 2003) and inherent individual differences in behaviour (Hammond 1990). If measurable individual-level covariates (e.g. sex, age) adequately explain individual variability, standard covariate-based extensions of mark–recapture can be used. However, it is well known that abundance estimates are biased if there is unmodelled heterogeneity in capture probability between individuals (Seber 1982). This paper describes a statistical and computational approach to handling such unmodelled heterogeneity, tests it using simulations and explores it through application to real data.
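The bias caused by unmodelled heterogeneity can be illustrated with a small simulation. The sketch below is not the model developed in this paper: it assumes a closed population sampled on two occasions and uses Chapman's bias-corrected version of the Lincoln–Petersen estimator. The population size, capture probabilities, number of replicates and function names are all hypothetical choices made for illustration.

```python
import random

def chapman_estimate(capture_probs, rng):
    """Two-occasion abundance estimate (Chapman's form of Lincoln-Petersen)."""
    n1 = n2 = m2 = 0
    for p in capture_probs:
        first = rng.random() < p    # caught on occasion 1?
        second = rng.random() < p   # caught on occasion 2?
        n1 += first
        n2 += second
        m2 += first and second      # caught on both occasions
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

def mean_estimate(capture_probs, n_reps=200, seed=1):
    """Average the estimator over repeated simulated studies."""
    rng = random.Random(seed)
    total = sum(chapman_estimate(capture_probs, rng) for _ in range(n_reps))
    return total / n_reps

N = 1000  # true abundance (illustrative value)
# Both populations have the same mean capture probability (0.35).
homogeneous = [0.35] * N
heterogeneous = [0.1] * (N // 2) + [0.6] * (N // 2)

print("homogeneous:  ", round(mean_estimate(homogeneous)))
print("heterogeneous:", round(mean_estimate(heterogeneous)))
```

Under these assumptions, the estimator is close to the true abundance when capture probability is constant, but falls well below it when individuals differ, because easily caught animals are over-represented among recaptures.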

Heterogeneity and its representation in mark–recapture analysis through the incorporation of random effects have received much attention (Burnham & Overton 1978; Huggins & Yip 2001; Barry *et al*. 2003; Maunder *et al*. 2008; Royle & Dorazio 2008; Lebreton *et al*. 2009), yet few models have included individual-level random effects (Pledger, Pollock & Norris 2003; Royle 2008; Gimenez & Choquet 2010). In many mark–recapture studies, including ours, some capture histories are too short and the number of individuals is too large to allow latent heterogeneity to be modelled as individual-level fixed effects. We use the term latent heterogeneity to refer to individual heterogeneity that is not described by any covariate in the model. One approach to modelling individual heterogeneity is to use a discrete-valued prior supported on a pre-specified number of points (Pledger & Phillpot 2008). However, it may be more appropriate to assume a continuous distribution for individual-level random effects, for example a Normal prior whose variance reflects between-individual variability. In this case, the variance needs to be estimated, which entails integrating the likelihood over all possible values of the individual random effects. This integration cannot be done analytically, so an approximate technique is required. Markov chain Monte Carlo is one possibility, but it can be computationally demanding. Instead, our focus here is to show that automatic (albeit approximate) maximum-likelihood estimates of individual heterogeneity can readily be obtained with the software Automatic Differentiation Model Builder (admb) (Skaug & Fournier 2006; Fournier *et al*. 2012), which performs the necessary integration using the Laplace approximation.
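To make the integration step concrete, the following sketch applies the Laplace approximation to a single individual. It is not admb code and not the model of this paper: it assumes an individual detected on `k` of `T` occasions, a detection probability that is constant over time on the logit scale, and a Normal random effect `b` with mean zero and standard deviation `sigma`. All names and parameter values are hypothetical. A brute-force trapezoid rule is included only as a check on the approximation.

```python
import math

def log_joint(b, k, T, mu, sigma):
    """log[ P(history | b) * Normal(b; 0, sigma^2) ] for one individual."""
    eta = mu + b  # logit of the detection probability
    log_lik = k * eta - T * math.log1p(math.exp(eta))
    log_prior = -0.5 * math.log(2 * math.pi * sigma**2) - b**2 / (2 * sigma**2)
    return log_lik + log_prior

def laplace_marginal(k, T, mu, sigma, n_iter=50):
    """Approximate the integral of exp(log_joint) over b."""
    def second_derivative(b):
        p = 1.0 / (1.0 + math.exp(-(mu + b)))
        return -T * p * (1.0 - p) - 1.0 / sigma**2

    b = 0.0
    for _ in range(n_iter):  # Newton steps towards the mode of log_joint
        p = 1.0 / (1.0 + math.exp(-(mu + b)))
        gradient = k - T * p - b / sigma**2
        b -= gradient / second_derivative(b)
    # Gaussian integral matched to the curvature at the mode
    curvature = -second_derivative(b)
    return math.exp(log_joint(b, k, T, mu, sigma)) * math.sqrt(2 * math.pi / curvature)

def quadrature_marginal(k, T, mu, sigma, lo=-10.0, hi=10.0, n=20000):
    """Trapezoid-rule reference value for the same integral."""
    h = (hi - lo) / n
    f = lambda b: math.exp(log_joint(b, k, T, mu, sigma))
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n))
    return total * h

approx = laplace_marginal(k=3, T=10, mu=-0.5, sigma=1.0)
exact = quadrature_marginal(k=3, T=10, mu=-0.5, sigma=1.0)
print(approx, exact)
```

In a full analysis, the log of this marginal likelihood would be summed over individuals and maximised with respect to the fixed parameters (here `mu` and `sigma`); the sketch shows only the inner integration for one individual.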

Multi-state mark–recapture models, first developed by Arnason (1972, 1973), extend the traditional Cormack–Jolly–Seber models by allowing animals to be in different ‘states’. Individual animals are allowed to transition from one state to another through time, and different states are often associated with different detection probabilities. Multi-state mark–recapture models have been the focus of much research in recent years, and the publication by Lebreton *et al*. (2009) provides a thorough synthesis. Multi-state models are expressed in the form of a hidden Markov model. Hidden Markov models are a class of state-space models in which the underlying hidden states are discrete rather than continuous. A state-space model, in turn, can be defined as any model with an observation process and a state process, where the true dynamics of the system are not directly observed (Zucchini, Raubenheimer & MacDonald 2008; Conn & Cooch 2009).
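The hidden Markov structure can be sketched with the forward algorithm, which computes the likelihood of a capture history by alternating a state-process step and an observation-process step. The example below is a minimal illustration, not the model fitted in this paper: it assumes three hypothetical states (in the study area, alive elsewhere, dead), detection only within the study area, and parameter values chosen purely for illustration. The likelihood is conditional on the first capture.

```python
def history_likelihood(history, gamma, detect):
    """Forward-algorithm likelihood of a 0/1 capture history.

    history: list of 0/1 sightings; history[0] is the first capture.
    gamma:   state transition matrix (rows sum to 1).
    detect:  detection probability in each state.
    """
    if history[0] != 1:
        raise ValueError("history must start at the first capture (1)")
    n_states = len(gamma)
    # The animal was seen, so it starts in state 0 (the only detectable state).
    alpha = [1.0 if s == 0 else 0.0 for s in range(n_states)]
    for seen in history[1:]:
        # State process: propagate state probabilities one occasion forward.
        moved = [sum(alpha[i] * gamma[i][j] for i in range(n_states))
                 for j in range(n_states)]
        # Observation process: weight by the probability of what was observed.
        alpha = [moved[j] * (detect[j] if seen else 1.0 - detect[j])
                 for j in range(n_states)]
    return sum(alpha)

# Hypothetical parameter values, for illustration only.
SURVIVAL, STAY, RETURN = 0.9, 0.7, 0.4
GAMMA = [
    [SURVIVAL * STAY,   SURVIVAL * (1 - STAY),   1 - SURVIVAL],  # from: in area
    [SURVIVAL * RETURN, SURVIVAL * (1 - RETURN), 1 - SURVIVAL],  # from: elsewhere
    [0.0,               0.0,                     1.0],           # from: dead
]
DETECT = [0.5, 0.0, 0.0]  # sightings occur only inside the study area

print(history_likelihood([1, 0, 1, 0], GAMMA, DETECT))
```

A history with a gap, such as `[1, 0, 1, 0]`, receives contributions from two hidden paths: the animal stayed in the area and was missed, or it left and came back. Summing over such unobserved paths is what the forward recursion does.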

Our motivating example is a study of a subpopulation of North Atlantic humpback whales (North Atlantic HW) sighted in the Stellwagen Bank National Marine Sanctuary (SBNMS) off the northeast coast of the United States. Individual humpback whales have been intensively studied in this region since the late 1970s. The SBNMS (Fig. 1) is one of several feeding sites used by North Atlantic HW that summer in the Gulf of Maine. The SBNMS covers only a small part of the population's summer range, and although some individuals are seen regularly there during the summer, none are thought to remain permanently within its boundaries. This presents a challenge when studying the vital rates of whales using the area and the effectiveness of management initiatives.

Two primary sources of heterogeneity exist for the North Atlantic HW in this study: sighting probability and site fidelity. Heterogeneity in sighting probability is a well-known phenomenon for North Atlantic HW (Hammond 1986, 1990). It is also plausible that individual whales vary in their propensity to use the SBNMS as opposed to other parts of the feeding range. In the Methods section, we present a hidden Markov model that allows for individual heterogeneity in sightability and/or site fidelity.

When choosing among candidate models, there are always costs or trade-offs to consider: fitting a model that is too simple may result in increased bias or variance, whereas fitting a model that is too complex may result in high prediction error or computational difficulties. One aspect of complexity is how many model parameters are allowed to have latent heterogeneity, that is, individual-level random effects. In this paper, we use simulations of a simplified version of our SBNMS situation to investigate the costs and trade-offs associated with fitting too complex or too simple a model. Specifically, we test the effect of model mis-specification (i.e. mistakenly assuming no latent heterogeneity, or assuming latent heterogeneity when none is in fact present). We also examine the effect of assuming a Normal prior on individual variability when the true variability is discrete and bimodal.