## Introduction

In the last decade, ecologists have devoted growing effort to the investigation of individual heterogeneity in wild animal population vital rates (e.g., Cam et al. 2002, 2012; Steiner et al. 2010; Orzack et al. 2011). Theoretically, individual heterogeneity can be directly captured through individual covariates, but when such covariates are not available from field data or cannot easily be identified, latent individual differences can still be modeled as finite mixtures (Pledger et al. 2010) or as individual random effects (Royle 2008). Mark–recapture analyses including random effects have grown more popular in recent years thanks to methodological developments that allow straightforward model implementation in both Bayesian (Gimenez et al. 2007, 2009; King et al. 2009; Link and Barker 2010; Kéry and Schaub 2012) and likelihood frameworks (Gimenez and Choquet 2010). Although individual random effects can be incorporated in mark–recapture models solely to deal with violations of homogeneity and independence assumptions (Marzolin et al. 2011), they are often of direct biological interest (Cam et al. 2002; Cooch et al. 2002). In these models, the magnitude of underlying individual heterogeneity is estimated through the variance parameter of the distribution of individual effects.
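As a minimal illustration of what such a variance parameter represents, a generic individual random-effects model for a vital rate can be written as follows. The notation ($p_i$, $\mu$, $\varepsilon_i$, $\sigma^2$) is assumed here for exposition and is not taken from the cited works.

$$
\operatorname{logit}(p_i) = \mu + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
$$

Here $p_i$ is the vital rate (e.g., breeding probability) of individual $i$, $\mu$ is the population mean on the logit scale, and $\sigma^2$ is the variance parameter that quantifies underlying individual heterogeneity; $\sigma^2 = 0$ corresponds to a homogeneous population.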

Translating statistical inference about variance parameters into biologically meaningful conclusions is often difficult, mainly because it is challenging to attach practical meaning to the magnitude of a variance parameter, particularly on a transformed scale such as the logit. Traditional statistical tests with a null value of zero do not take into account whether the magnitude is practically meaningful even if it is statistically different from zero, or whether plausible values for the variance are in fact practically meaningful even if very small values cannot be ruled out given the data. This issue is particularly crucial to consider when evaluating evidence for whether underlying individual heterogeneity is important to include for explaining observed differences in individual lifetime performances (Cam et al. 2012). Indeed, previous work (Tuljapurkar et al. 2009) has shown that some degree of realized heterogeneity in individual performances, such as lifetime and reproductive success, is always expected, just by chance, even in the total absence of underlying individual heterogeneity in vital rates. This phenomenon has been referred to as dynamic heterogeneity and results from the stochastic nature of individual life trajectories, which arise from a sequence of binomial events (lives/dies; breeds/does not breed). Therefore, in addition to estimating the variance parameter representing the assumed heterogeneity, it is also important to address the question of whether underlying individual effects are actually needed to explain realized levels of individual variation. To address this issue, previous authors have used model-selection methods (e.g., Cam et al. 2012) based either on information criteria such as the deviance information criterion (DIC; Spiegelhalter et al. 2002) or on Bayes factors (Link and Barker 2006) to compare the support for models with and without individual random effects.
However, given that no wild population can be considered entirely homogeneous, one can easily argue that the question of individual heterogeneity comes down to assessing the magnitude of variance rather than simply choosing between two models (homogeneity vs. heterogeneity). Nevertheless, both the homogeneity and heterogeneity models can be useful to address a wide range of ecological questions, and we think that a more thorough investigation of the question of individual heterogeneity involves assessing (i) the implications of ignoring individual heterogeneity in terms of ecologically meaningful measures, and (ii) how the estimated magnitude of individual heterogeneity actually manifests in those ecologically meaningful measures. Posterior predictive checks provide a straightforward method for such an investigation and can be useful in the process of statistical inference by providing detailed information on how a model succeeds or fails at capturing some measures of biological interest (Gelman et al. 1996). We therefore propose posterior predictive model checks (Gelman et al. 2004) as an additional and important tool for further assessing the implications of excluding or including individual random effects in demographic models.
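The distinction between dynamic and underlying heterogeneity discussed above can be illustrated with a small simulation. The sketch below is purely illustrative: the population size, number of years, mean breeding probability, and logit-scale standard deviation are arbitrary assumed values, and the function names are hypothetical rather than taken from any cited analysis. It simulates the number of breeding events per individual with and without individual effects and compares the realized variance among individuals.

```python
import math
import random
import statistics


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_breeding_counts(n_individuals, n_years, mean_prob, sigma, rng):
    """Number of breeding events per individual over n_years.

    sigma is the standard deviation of individual effects on the logit
    scale (an assumed, illustrative parameterization); sigma = 0 means
    every individual shares the same breeding probability.
    """
    counts = []
    for _ in range(n_individuals):
        effect = rng.gauss(0.0, sigma)
        p_i = inv_logit(logit(mean_prob) + effect)
        # Each year is an independent Bernoulli event (breeds / does not breed).
        counts.append(sum(rng.random() < p_i for _ in range(n_years)))
    return counts


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Arbitrary illustrative settings: 500 individuals followed for 10 years.
homogeneous = simulate_breeding_counts(500, 10, 0.6, 0.0, rng)
heterogeneous = simulate_breeding_counts(500, 10, 0.6, 1.0, rng)

var_homogeneous = statistics.pvariance(homogeneous)
var_heterogeneous = statistics.pvariance(heterogeneous)

print("Realized variance, no individual effects:  ", var_homogeneous)
print("Realized variance, with individual effects:", var_heterogeneous)
```

Even with `sigma = 0`, the realized variance among individuals is greater than zero, which is the dynamic heterogeneity component; adding individual effects inflates the realized variance beyond that baseline.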

Posterior predictive checks are based on the comparison of the distribution of model-specific data replications with observed data. The approach is aimed at identifying and quantifying systematic discrepancies between a model and the observed data, while accounting for parameter uncertainty. The idea of model checking is not new in the population ecology literature (Lebreton et al. 1992; Williams et al. 2002), but classic goodness-of-fit tests developed for mark–recapture models are only useful to assess potential violations of basic independence assumptions and are limited to a few specific model classes (Cormack–Jolly–Seber and Jolly movement models; Williams et al. 2002). Posterior predictive checks allow an assessment of the performance of any model of interest in predicting any specific aspect of the data. Furthermore, in mark–recapture studies, model checking has traditionally not been used as an informative tool to help understand the implications of different models. Here, in the spirit of Gelman et al. (2004), we argue for incorporating model checking into the inferential process, as a way to identify the shortfalls and successes of specific models and thereby improve our understanding of biological processes of interest. In particular, by investigating the lack of fit of models that exclude individual heterogeneity, posterior predictive checks allow a more direct and relevant assessment of arguments about underlying vs. dynamic heterogeneity. Keys to effective model checks are the use of graphical displays and the careful choice of discrepancy measures that are relevant to the scientific question of interest. For models of individual heterogeneity, we suggest deriving data quantities that directly represent the level of realized variation in individual performance and using histograms to compare the posterior predictive distribution of these quantities of interest to the values calculated from the observed data.
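The general logic of a posterior predictive check can be sketched in a few lines. The example below is a deliberately simplified toy, not the analysis used in the case study: the "observed" data are simulated with assumed parameter values, the homogeneous model is a single shared breeding probability with an assumed Beta(1, 1) prior (so posterior draws are available in closed form), and the discrepancy measure is the among-individual variance in the number of breeding events. All names and settings are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility
N_IND, N_YEARS, N_REPS = 200, 10, 500  # arbitrary illustrative sizes


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def discrepancy(counts):
    """Realized among-individual variance in number of breeding events."""
    return statistics.pvariance(counts)


# Hypothetical "observed" data, generated WITH individual heterogeneity
# (normal individual effects on the logit scale, sd = 1). Not real data.
observed = []
for _ in range(N_IND):
    p_i = inv_logit(0.4 + rng.gauss(0.0, 1.0))
    observed.append(sum(rng.random() < p_i for _ in range(N_YEARS)))

# Homogeneous model: one shared probability p with a Beta(1, 1) prior,
# so the posterior is Beta(1 + successes, 1 + failures).
successes = sum(observed)
failures = N_IND * N_YEARS - successes

replicated = []
for _ in range(N_REPS):
    p = rng.betavariate(1 + successes, 1 + failures)  # one posterior draw
    # Replicate a full data set under the homogeneous model.
    y_rep = [sum(rng.random() < p for _ in range(N_YEARS)) for _ in range(N_IND)]
    replicated.append(discrepancy(y_rep))

t_obs = discrepancy(observed)
# Proportion of replicated discrepancies at least as large as the observed one.
ppp = sum(t >= t_obs for t in replicated) / len(replicated)

print("Observed discrepancy:          ", t_obs)
print("Mean replicated discrepancy:   ", statistics.mean(replicated))
print("Posterior predictive p-value:  ", ppp)
```

In practice, the values in `replicated` would be displayed as a histogram with `t_obs` marked on it; an observed value far in the tail indicates that the homogeneous model fails to reproduce the realized level of individual variation.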

In the following sections, we describe the principle of posterior predictive checks and illustrate their use with a case study that investigates the magnitude of individual reproductive heterogeneity in a population of Weddell seals (*Leptonychotes weddellii*). We note that full details of the data analyses and the biological interpretations of the results are available elsewhere (Chambert et al. 2013) and so are not discussed here. Our purpose in this paper is to further discuss this model-checking inferential approach to promote stronger biological inferences about individual heterogeneity in population vital rates. We particularly emphasize the choice of relevant data features and graphical displays used as informative model diagnostics. Finally, we discuss how we think this approach could benefit scientific progress in ecology.