Exploring causal pathways in demographic parameter variation: path analysis of mark–recapture data

Authors

  • Olivier Gimenez,

    Corresponding author
    1. CEFE, CNRS-UMR 5175, 1919 Route de Mende, 34293 Montpellier Cedex 5, France
      Correspondence author. E-mail: olivier.gimenez@cefe.cnrs.fr
  • Tycho Anker-Nilssen,

    1. Norwegian Institute for Nature Research (NINA), P.O. Box 5685 Sluppen, 7485 Trondheim, Norway
  • Vladimir Grosbois

    1. Laboratoire Biométrie et Biologie Evolutive, UMR 5558, Bat. 711 Université Claude Bernard Lyon 1, 43 Boulevard du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
    2. CIRAD, Département ES, UR AGIRs, TA C 22/E, Campus International de Baillarguet, 34398 Montpellier Cedex 5, France


Summary

1. Inference about demographic parameters of animal and plant natural populations is important to evaluate the consequences of global changes on populations. Investigating the factors driving their variation over space and time allows evaluating the relative importance of biotic and abiotic variables in shaping the dynamics of a population. Although numerous studies have identified the factors possibly affecting population dynamics, they have barely formally determined the routes by which these different factors are related to demographic parameters.

2. We focus on mark–recapture (MR) models that provide unbiased estimators of demographic parameters, while explicitly coping with imperfect detection inherent to wild populations. MR models allow estimating the effect of covariates on demographic parameters and testing their significance in a regression-like framework. However, these models can only detect correlations and do not inform on causal pathways (e.g. direct vs. indirect effects) in the relationships between demographic parameters and the factors possibly explaining their variability.

3. We develop an integrated model to perform path analysis (PA) of MR data, to examine causal relationships among several (including demographic) variables. This approach is implemented in a Bayesian framework using Markov chain Monte Carlo.

4. To motivate our developments, we analyse 17 years of mark–recapture data from Atlantic puffins (Fratercula arctica), to investigate the mechanisms through which environmental conditions have an impact on puffins’ adult survival. Using our PA-based MR modelling approach, we found that local climatic conditions had an indirect and lagged impact on puffin survival through their influence on local abundance of herring. Besides, we found no evidence for any lagged effect through an alternative unknown pathway (e.g. abundance of another resource).

5. Our method allows elucidating pathways through which environmental, trophic or density-dependent factors influence demographic parameters, while accounting for detectability <1. This is a critical step to understand the interactions of a species with its environment and to predict the impacts of global change on its viability.

Introduction

For the last 40 years, the estimation of animal and vegetal demographic parameters in natural populations has been a challenging and active research area in population biology (Williams, Nichols & Conroy 2002). In particular, investigations of the factors driving the variation over space and time in demographic parameters are rapidly spreading. Such investigations are of fundamental as well as of applied interests. For instance, they allow evaluating the relative importance of density dependence and environmental forcing in shaping the dynamics of a population (Aars & Ims 2002; Lima, Stenseth & Jaksic 2002), and they contribute to the assessment of the consequences of global changes on populations (Coulson et al. 2001; Jenouvrier et al. 2009).

Of particular importance, mark–recapture (MR) models provide a general and flexible framework for the estimation and modelling of demographic parameters (survival, dispersal and recruitment among others) in the face of imperfect detection that is inherent to populations in the wild (Gimenez et al. 2008). These methods rely on the longitudinal monitoring of individuals that are marked at a series of sampling occasions, and then encountered (i.e. recaptured or resighted) on subsequent occasions. Using specific MR statistical models, demographic rates can in turn be written as functions of relevant covariates (environmental covariates like climate conditions, trophic covariates like food abundance or intrinsic covariates like density; see Pollock 2002), allowing estimating their effect and testing their significance in a regression-like framework (Lebreton et al. 1992). These MR models have allowed important insight with regard to what factors influence demographic parameters and thereby drive the dynamics of populations (e.g. Kery, Madsen & Lebreton 2006; Miller et al. 2006; Grosbois et al. 2008).

However, the adoption of this framework comes with constraints that can limit the use of covariate modelling in MR analyses. Multiple regressions can only detect correlations: they neither provide information on the causal pathways (e.g. direct vs. indirect effects) in the relationships between demographic parameters and environmental, trophic or intrinsic factors, nor account for relationships among these explanatory covariates (i.e. multi-collinearity). As a consequence, although numerous studies have identified the factors that possibly affect population dynamics, they have never formally identified the routes by which these different factors are related to demographic parameters.

To illustrate this limitation, we were particularly motivated by investigating the impact of climate conditions on adult survival in Atlantic puffins (Fratercula arctica; puffins hereafter) of a population in Røst, northern Norway. In this population, a linear relationship has been detected between the sea surface temperature (SST) in the vicinity of the breeding colony during late winter and spring (the period when forage fish hatch and grow) of year t−1 and the survival of adult puffins during the 1-year period starting 1 year later (i.e. from summer t to summer t + 1) (Harris et al. 2005). It was hypothesized that this lagged relationship was indirect and resulted from the combined influence of SST on the abundance of 0-group (first-year) herring (Clupea harengus, the main prey of puffins in this colony) and, thereby, the abundance of 1-group (second-year) herring on the survival of adults in the subsequent 1-year period. These herring drift past the colony during their first summer (summer t), but the 1-group fish may be an important prey for adult birds when they visit the herring’s nursery areas further north in the Barents Sea, shortly after the breeding season (Anker-Nilssen & Aarvak 2009). So far, this hypothesis could not be statistically tested in a formal way. We aim to test it, considering as an alternative hypothesis that the lagged SST effect on survival resulted from an indirect pathway with an unknown (possibly trophic) intermediate factor between SST and adult survival.

In this paper, we propose a new framework that integrates into MR data modelling a technique referred to as path analysis (PA), which is traditionally used to examine causal relationships – including direct and indirect relationships – among several variables (Shipley 2000; Pugesek, Tomer & von Eye 2003). PA is a useful multivariate regression technique to formalize and confront different hypothetical scenarios linking different factors (climate, resource availability and demographic parameters here). Typically, a PA is carried out by specifying a set of pathways describing how variables may affect each other. If the model is not consistent with the data, the corresponding scenario is rejected, and an alternative hypothesis about the underlying mechanism has to be considered. The flexibility of PA to represent complex scenarios has led to an increasing number of applications in ecology and evolution (Shipley 2000; Pugesek, Tomer & von Eye 2003).

To our knowledge, the integration of MR data with PA has never been tried before, probably because standard PA requires data normality (Gajewski et al. 2006), whereas MR data are intrinsically discrete (Gimenez et al. 2007). To deal with this issue, a Bayesian approach using Markov chain Monte Carlo (MCMC) simulations is implemented for estimating parameters and drawing inference in PA of MR data (Lee 2007).

Materials and methods

Mark–recapture data

We used data on adult puffins in Røst in north Norway (67°26′N, 11°52′E). From 1990 to 2006, a total of 452 breeding adult birds were captured in mist nets erected immediately outside of their nests and individually marked. Birds captured for the first time were marked with a numbered metal ring and individually coded colour rings. In addition, visual searches for previously marked birds were made each year, predominantly in the area where the initial captures and marking had been undertaken, but also in the surrounding areas and other parts of the colony. See Harris et al. (2005) for further details.

Environmental data

We used January to May mean SST, which was derived from ship, buoy and bias-corrected satellite data at a resolution of 1° latitude by 1° longitude (available online at http://iridl.ldeo.columbia.edu/SOURCES/.IGOSS/.nmc/.Reyn_SmithOIv2/.monthly/.sst/) in a sea area of about 40 000 km² around the colony. The limits of the selected area were 66–68°N and 10–14°E.

We hypothesized that the lagged influence of SST on survival, if any, would reflect the influence of SST on puffin’s main prey species, herring, during (and possibly immediately after) the breeding season, the abundance of which increases with increasing SST (e.g. Sætre, Toresen, & Anker-Nilssen 2002). We used lagged abundance estimates of 0-group Norwegian spring-spawning herring presented by ICES (2006) to formally test this scenario using a MR model integrating a PA.

PA-based MR model

The starting point was the standard Cormack–Jolly–Seber MR model (CJS hereafter; see Lebreton et al. 1992 for a review) that considers time dependence for the probability φt that an individual survives to occasion t + 1, given that it is alive at time t, and for the probability pt that an individual is encountered at time t. Under appropriate assumptions (in particular, independence of individuals, e.g. Williams, Nichols, & Conroy 2002), the CJS model likelihood can be written as a product of multinomial distributions for which the cell probabilities are functions of both survival and detection probabilities (e.g. King et al. 2009 for further details).
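As a minimal sketch of how survival and detection probabilities enter the CJS likelihood, the Python code below evaluates the log-likelihood contribution of one encounter history, conditional on first capture. It uses the individual-history form of the likelihood rather than the multinomial (m-array) form mentioned above; the two are equivalent up to a constant. The function names `chi` and `cjs_log_likelihood` and the 0-based indexing are assumptions made for illustration and are not taken from the paper's appendices.

```python
import math

def chi(phi, p, last):
    """Probability of never being encountered again after occasion `last`.

    phi[t] is survival from occasion t to t + 1 (length T - 1);
    p[t] is the encounter probability at occasion t (length T, p[0] unused).
    Occasions are 0-based, an indexing choice made for this sketch.
    """
    n_occasions = len(p)
    value = 1.0  # nothing can be observed after the final occasion
    for t in range(n_occasions - 2, last - 1, -1):
        # die during interval t, or survive, stay undetected at t + 1,
        # and never be seen afterwards
        value = (1.0 - phi[t]) + phi[t] * (1.0 - p[t + 1]) * value
    return value

def cjs_log_likelihood(history, phi, p):
    """Log-likelihood of one 0/1 encounter history, conditional on first capture."""
    first = history.index(1)
    last = len(history) - 1 - history[::-1].index(1)
    loglik = 0.0
    for t in range(first, last):
        loglik += math.log(phi[t])  # known to be alive up to `last`
        seen = history[t + 1]
        loglik += math.log(p[t + 1] if seen else 1.0 - p[t + 1])
    loglik += math.log(chi(phi, p, last))  # fate after the last encounter
    return loglik

if __name__ == "__main__":
    phi = [0.8, 0.8, 0.8]
    p = [0.5, 0.5, 0.5, 0.5]
    print(math.exp(cjs_log_likelihood([1, 0, 1, 0], phi, p)))
```

Summing the log-likelihood over all marked individuals gives the full CJS log-likelihood for a given set of φt and pt values.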

Based on the CJS model, we built a PA-based MR model for the puffin case study, in which the hypothetical pathway (i.e. resource abundance), through which SST influences survival, is explicitly represented. We considered φt as the probability that an animal survives to summer (June–July) of year t + 1, given that it is alive in summer of year t, SSTt−1 as the value of SST in January of year t−1 to May of year t−1 and Rt−1 as the index of 0-group herring in summer of year t−1. The relationships between these variables were specified as follows. First, survival was expressed as a function of both food availability and SST:

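$$\operatorname{logit}(\phi_t) = \theta_1 + \theta_2 R_{t-1} + \theta_3 \mathrm{SST}_{t-1} + \varepsilon_t \qquad \text{(eqn 1)}$$

where logit(x) = log(x/(1 − x)). Then, resources were regressed on SST using the equation:

$$R_t = \theta_4 + \theta_5 \mathrm{SST}_t + \varepsilon^R_t \qquad \text{(eqn 2)}$$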

Residual terms $\varepsilon_t$ and $\varepsilon^R_t$ were assumed to be normally distributed with mean 0 and variances $\sigma^2$ and $\sigma_R^2$, respectively, while SST was assumed to be measured without error. The θ’s are regression parameters to be estimated. Note that, in contrast to standard multiple regression, PA allows Rt to be a response variable in regression eqn 2, as well as a predictor in eqn 1. Besides, and of particular interest here, by substituting eqn 2 (taken at time t−1) into eqn 1 and rearranging the terms, the effect of SST on survival through the unknown path is captured by parameter θ3, while the indirect effect of SST on survival through food availability is captured by the product θ2·θ5. The relationships between those variables are illustrated in a path diagram in Fig. 1.
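The substitution can be written out explicitly. The derivation below is a sketch that assumes eqn 1 has the linear form logit(φt) = θ1 + θ2·Rt−1 + θ3·SSTt−1 + εt and that eqn 2, taken at time t−1, reads Rt−1 = θ4 + θ5·SSTt−1 + εRt−1; these forms are inferred from the parameter definitions in the text.

```latex
\begin{align*}
\operatorname{logit}(\phi_t)
  &= \theta_1 + \theta_2 \bigl(\theta_4 + \theta_5\,\mathrm{SST}_{t-1} + \varepsilon^R_{t-1}\bigr)
     + \theta_3\,\mathrm{SST}_{t-1} + \varepsilon_t \\
  &= \underbrace{(\theta_1 + \theta_2\theta_4)}_{\text{intercept}}
     + \bigl(\underbrace{\theta_3}_{\text{unknown path}}
     + \underbrace{\theta_2\theta_5}_{\text{via herring}}\bigr)\,\mathrm{SST}_{t-1}
     + \underbrace{(\varepsilon_t + \theta_2\,\varepsilon^R_{t-1})}_{\text{residual}}
\end{align*}
```

Under these assumptions, the total effect of SST on the logit of survival is θ3 + θ2·θ5, which separates into the contribution of the unknown path (θ3) and that of the food-availability path (θ2·θ5).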

Figure 1.

 Path analysis diagram of the relationship between sea surface temperature and the adult survival rates of Atlantic puffins breeding in Røst, northern Norway. The effect of an indirect relationship through food availability (young herring) is captured by the product θ2·θ5, while a direct effect is modelled by θ3. For the sake of clarity, intercept parameters θ1 and θ4 are not shown.

Note that there are two levels of variation in this model: the individual level (452 ringed birds) allows the estimation of survival, while the temporal level (16 annual time intervals) allows the study of the impact of climate and resources on survival. The originality of our approach lies in a model explicitly integrating these two levels of variation in a single framework, hence accounting for estimation of uncertainty at each level of the hierarchy.

Bayesian fitting using MCMC methods

To estimate the parameters, we first built the MR data likelihood, and then we expressed the causal relationships between survival probabilities and the other variables using eqns 1 and 2. Because the likelihood was complex, we used Bayesian theory in conjunction with MCMC methods to carry out inference (see McCarthy 2007 for an introduction). The Bayesian analysis combines the likelihood and prior probability distributions for the parameters and uses Bayes’s theorem to obtain the posterior distribution, which is used for inference. The MCMC methods simulate values for the unknown quantities of interest following a Markov chain, whose stationary distribution is the required posterior distribution. After an initial burn-in period is discarded, inference is based on the remaining simulated values, by computing numerical summaries such as empirical medians and Bayesian credible intervals for quantities of interest. As a by-product of the MCMC simulations, we could also obtain numerical summaries for any function of the regression parameters, in particular, the indirect effect θ2·θ5 of SST on adult puffins’ survival, by applying the function to the sampled values from their posterior distributions.
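The last point, summarising a function of the parameters directly from MCMC output, can be sketched in a few lines of Python. The draws used in the demonstration are stand-in values generated from arbitrary normal distributions, not the posterior samples of the puffin analysis, and the function names `indirect_effect` and `summarise` are hypothetical.

```python
import random
import statistics

def indirect_effect(theta2_draws, theta5_draws):
    """Apply the function theta2 * theta5 to each pair of posterior draws."""
    return [a * b for a, b in zip(theta2_draws, theta5_draws)]

def summarise(draws):
    """Posterior median, 95% equal-tailed credible interval and Pr(value > 0)."""
    cuts = statistics.quantiles(draws, n=40)  # cut points every 2.5%
    return {
        "median": statistics.median(draws),
        "lower": cuts[0],    # 2.5% quantile
        "upper": cuts[-1],   # 97.5% quantile
        "prob_positive": sum(d > 0 for d in draws) / len(draws),
    }

if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in draws for illustration only (assumed means and spreads).
    theta2 = [rng.gauss(0.3, 0.1) for _ in range(5000)]
    theta5 = [rng.gauss(0.4, 0.2) for _ in range(5000)]
    print(summarise(indirect_effect(theta2, theta5)))
```

Because the product is computed draw by draw, its summary reflects the joint posterior uncertainty in both parameters, which is not the same as multiplying the two posterior medians.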

To fully specify our Bayesian model, we provided non-informative prior distributions for all parameters. We used uniform distributions on the interval [0, 1] as priors for detection probabilities, normal distributions with mean 0 and large variance 10^3 for regression parameters (the θ’s) and uniform distributions between 0 and 10 for the standard deviations of the temporal random effects (σ and σR). Based on preliminary runs, we generated four chains of length 50 000, discarding the first 25 000 as burn-in. Convergence was assessed using the Brooks–Gelman–Rubin statistic, which compares the within- to the between-chain variability of chains started at different and dispersed initial values (Gelman 1996). According to this criterion, the chains were found to converge. We conducted a prior sensitivity analysis to assess the influence of prior specifications on posterior inference. In addition to the priors used earlier, we considered inverse gamma distributions for the standard deviation of the random effects with parameters (0·001, 0·001) or (3, 2), and normal distributions with mean 0 and variances 1 or 10 for the regression parameters. The posterior results were not much affected, and we were led to the same conclusions. The fitting step was performed using WinBUGS (Spiegelhalter et al. 2003; Gimenez et al. 2009). The code used for fitting the model is available in Appendix 1.
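To make the convergence criterion concrete, the sketch below computes the basic potential scale reduction factor, which compares within- and between-chain variability for one parameter. It is a simplified version of the Brooks–Gelman–Rubin diagnostic (without the degrees-of-freedom correction) and is not the implementation used by WinBUGS; the function name `gelman_rubin` is an assumption.

```python
import statistics

def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter.

    `chains` is a list of equal-length lists of post-burn-in draws,
    one list per chain. Values close to 1 suggest convergence.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

if __name__ == "__main__":
    overlapping = [[0.0, 1.0, 2.0, 3.0], [0.5, 1.5, 2.5, 3.5]]
    separated = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]
    print(gelman_rubin(overlapping), gelman_rubin(separated))
```

Chains that explore the same region give a value near 1, whereas chains stuck in different regions inflate the between-chain term and push the statistic well above 1.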

The ability of our approach to estimate the model parameters was verified using simulations. We considered a scenario mimicking the puffin case study. Specifically, we used p = 0·7, inline image, σR = 1, θ1 = 1, θ2 = 0·3, θ3 = 0, θ4 = 1 and θ5 = 0·7. We simulated 100 capture–recapture data sets with 17 sampling occasions and 50 newly marked individuals released at each occasion. The code used for carrying out the simulations is provided in Appendix 2. We applied our PA-based MR model on each data set. The results are shown in Fig. 2. Our approach was successful in estimating the various parameters. In particular, the values of the regression parameters were well recovered by our model.
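A simulation of this kind might look like the Python sketch below, which generates one data set under the path model. It is not the code of Appendix 2. Several ingredients are assumptions made for illustration: SST values are drawn from a standard normal distribution, the residual standard deviation on survival is set to `sigma=0.5` because its value is not recoverable from the text, and the covariate values stored at index t are taken to be the (already lagged) values attached to survival interval t.

```python
import math
import random

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_path_cjs(n_occasions=17, n_release=50, p=0.7, sigma=0.5,
                      sigma_r=1.0, theta=(1.0, 0.3, 0.0, 1.0, 0.7), seed=1):
    """Simulate encounter histories under the path model (illustrative sketch).

    theta = (theta1, ..., theta5); sigma is an ASSUMED value, see lead-in.
    """
    rng = random.Random(seed)
    n_intervals = n_occasions - 1
    t1, t2, t3, t4, t5 = theta
    # Covariates attached to each survival interval (assumed standardised SST)
    sst = [rng.gauss(0.0, 1.0) for _ in range(n_intervals)]
    # Resource regressed on SST, with residual noise
    resource = [t4 + t5 * s + rng.gauss(0.0, sigma_r) for s in sst]
    # Survival on the logit scale as a function of resource and SST
    phi = [inv_logit(t1 + t2 * r + t3 * s + rng.gauss(0.0, sigma))
           for r, s in zip(resource, sst)]
    histories = []
    for release in range(n_occasions):
        for _ in range(n_release):
            history = [0] * n_occasions
            history[release] = 1  # marked and released
            for t in range(release, n_intervals):
                if rng.random() >= phi[t]:
                    break  # individual dies during interval t
                history[t + 1] = int(rng.random() < p)  # detected or missed
            histories.append(history)
    return sst, resource, phi, histories

if __name__ == "__main__":
    sst, resource, phi, histories = simulate_path_cjs()
    print(len(histories), histories[0])
```

Repeating the call with 100 different seeds and fitting the model to each simulated data set would give the kind of replicate study summarised in Fig. 2.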

Figure 2.

 Performance of the PA-based MR model. For each of the 100 simulated data sets, we displayed the median (circle) and the 95% credible interval (horizontal solid line) of the parameter. The actual value of the parameter is given by the vertical dashed red line. The estimated bias is provided in the legend of the X-axis. See text for notation.

Goodness-of-fit

We assessed the fit of the CJS model using program U-CARE (Choquet et al. 2009). The CJS model fitted the data poorly (inline image, P < 0·001). A closer inspection indicates that the lack of fit of the CJS model was largely due to component 2CT, which detects heterogeneity in recapture probability (inline image, P < 0·001). This indicates trap dependence on capture (Pradel 1993), and more precisely a ‘trap happiness’, meaning that capture probability at year t + 1 was higher for individuals captured at year t than for individuals not captured at year t. Heterogeneity in capture probabilities is known to induce bias in survival estimates (Pradel 1993). To cope with capture heterogeneity, we incorporated an effect of time elapsed since last recapture in the modelling of recapture probability. This effect distinguishes between the two events that a capture occurred (capture probability denoted p1) or not (capture probability denoted p2), the occasion before (Pradel 1993). The fit of this new model, which explicitly accounts for a trap-dependence effect, was satisfactory (inline image, P = 0·31).

Model selection

Starting from the general model in Fig. 1, we explored the model space by assessing the relevance of including all regression parameters θ’s or excluding some of them. We were specifically interested in testing the indirect effect of climatic conditions on survival captured by both θ2 and θ5 vs. an alternative indirect effect through some unknown intermediary through parameter θ3. To do so, we undertook a model selection procedure in the Bayesian framework. Following Kuo & Mallick (1998) and Royle (2008), we introduced three indicator variables, w1, w2 and w3, having Bernoulli (0·5) prior distributions and pre-multiplying the regression parameters θ2, θ3 and θ5 respectively. For example, if w2 = 1, then the indirect effect of SST through an unknown factor was present in the model, whereas if w2 = 0, it was not. We therefore considered eight models, corresponding to the 2^3 possible combinations. We computed the posterior model probability for a particular model from the MCMC histories, as the ratio of the number of iterations visiting this model to the total number of iterations.
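The counting step at the end of this procedure can be sketched as follows. Each MCMC iteration yields a triplet (w1, w2, w3), and the posterior probability of a model is the fraction of iterations in which its triplet occurs. The function name `posterior_model_probabilities` and the toy draws in the demonstration are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter

def posterior_model_probabilities(w_draws):
    """Estimate posterior model probabilities from indicator draws.

    `w_draws` is a sequence of (w1, w2, w3) tuples of 0/1 values, one per
    retained MCMC iteration. Models are labelled as strings such as '101'.
    """
    labels = ["".join(str(w) for w in draw) for draw in w_draws]
    counts = Counter(labels)
    total = len(labels)
    return {model: count / total for model, count in counts.items()}

if __name__ == "__main__":
    # Toy indicator draws, for illustration only
    draws = [(1, 0, 1), (1, 0, 1), (1, 0, 0), (0, 0, 0)]
    print(posterior_model_probabilities(draws))
```

Models that are never visited by the chains simply do not appear in the result, which corresponds to an estimated posterior probability of zero.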

Results

The model with regression parameters θ2 and θ5 was the most visited by the MCMC chains (Table 1), suggesting an indirect effect of SST on survival through food availability. This effect was more than five times as plausible as an alternative indirect effect through some unknown intermediary (ratio of posterior model probabilities = 0·331/0·060). The overall support for the inclusion of θ2, θ3 or θ5, i.e. the sum of the posterior probabilities for each of the four models including one of these parameters, was 0·699, 0·347 and 0·624, respectively.

Table 1.   Posterior model probabilities of the eight models considered in the puffin case study. In the model structure, a 1/0 indicates the presence/absence of the covariate with corresponding regression parameter θ2, θ3 and θ5, respectively (see eqns 1 and 2). For example, 101 denotes a model with an indirect effect of SST on survival, whereas 010 is a model with an indirect effect of SST through some unknown intermediary. Note that the intercepts θ1 and θ4 were always included in the model, and therefore not represented in this notation
Model structure    Posterior model probability
111                0·092
011                0·123
101                0·331
110                0·072
001                0·078
010                0·060
100                0·204
000                0·04

SST, sea surface temperature.
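The overall support for each parameter reported in the Results follows from Table 1 by summing the probabilities of the four models whose structure includes it. The short Python sketch below reproduces that arithmetic from the tabulated values; the names `table1` and `inclusion_probability` are introduced here for illustration.

```python
# Posterior model probabilities from Table 1 (positions: theta2, theta3, theta5)
table1 = {
    "111": 0.092, "011": 0.123, "101": 0.331, "110": 0.072,
    "001": 0.078, "010": 0.060, "100": 0.204, "000": 0.040,
}

def inclusion_probability(table, position):
    """Sum the probabilities of all models with a 1 at the given position."""
    return sum(prob for model, prob in table.items() if model[position] == "1")

if __name__ == "__main__":
    for name, position in (("theta2", 0), ("theta3", 1), ("theta5", 2)):
        print(name, round(inclusion_probability(table1, position), 3))
    # theta2 0.699, theta3 0.347, theta5 0.624
    print(round(table1["101"] / table1["010"], 2))  # 5.52
```

The ratio 0·331/0·060 ≈ 5·5 is the basis for the statement that the food-availability pathway was more than five times as plausible as the pathway through an unknown intermediary.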

Posterior medians along with 95% posterior credible intervals for all model parameters are given in Table 2. Regarding the detection process, the geometric means of encounter probabilities were higher if a capture had occurred the year before (p1 > p2), in agreement with Harris et al. (2005). Regarding the relationships between survival and environmental factors, there was a positive unlagged effect of SST upon 0-group herring abundance (θ5), as well as a positive lagged effect of 0-group herring abundance upon survival (θ2). Overall, the indirect effect of SST on survival through herring abundance (θ2·θ5) was concentrated on positive values (median was 0·10 with a 95% posterior credible interval of [0·00; 0·35]) and was much more likely than an indirect effect of SST on survival through an unknown factor (θ3), as Pr(θ2·θ5 > 0) was 1, while Pr(θ3 > 0) was only 0·66.

Table 2.   Parameter estimates of the path analysis model applied to the Atlantic puffin data (see Fig. 1): posterior medians are provided along with 95% posterior credible intervals. Geometric means of the year-specific estimates were computed for the detection probability, given that an encounter occurred (p1) or not (p2) the occasion before (trap-dependence effect)
Parameter    Median    95% Credible interval
θ1           2·33      2·05; 2·61
θ2           0·28      0·05; 0·58
θ3           0·06      −0·25; 0·37
θ4           0·01      −0·48; 0·56
θ5           0·40      0·04; 0·90
σR           1·01      0·71; 1·48
σ            0·41      0·17; 0·80
p1           0·87      0·86; 0·89
p2           0·82      0·77; 0·87

Discussion

We have proposed a new statistical approach integrating path analysis modelling in MR models. Pathways through which environmental, trophic or intrinsic factors influence survival can be described, and alternative hypotheses regarding these pathways can be disentangled using PA, the whole process being embedded in a MR model.

Applying this technique to the puffin analysis, we found that local climatic conditions had an indirect and lagged impact on puffin survival through their influence on local abundance of herring. On the other hand, we found no evidence for any lagged effect of SST on survival through an alternative unknown pathway (e.g. abundance of another resource). Overall, the PA-based MR model allowed testing formally a verbal prediction that was made previously by Harris et al. (2005) and shed light on the mechanisms through which environmental conditions had an impact on puffins’ adult survival.

Although our analysis was useful to gain insight in our case study, we have made several assumptions that need to be discussed. First, we have considered only linear relationships between variables, while other shapes may be more realistic. To avoid the need to specify an a priori parametric function, nonparametric modelling using splines can be used to gain more flexibility (Gimenez et al. 2006). Second, stratifying the data might be needed to cope with known sources of heterogeneity or to assess differences in causal scenarios, according to some qualitative variables (e.g. sex). The extension of PA-based MR models to cope with groups is straightforward.

Despite the potential of our approach, it comes with the same limitations as PA has in general (Shipley 2000; Pugesek, Tomer & von Eye 2003). Among others, we emphasize that PA-based MR modelling is a relevant option when manipulative experiments cannot be conducted, but does not provide evidence of causality. Rather, it allows testing hypotheses of causality within a system based on correlational evidence. More precisely, PA-based modelling of MR data may help in rejecting scenarios that are not supported by the data (here, an indirect effect of SST on puffin survival through some unknown intermediary), but testing biological predictions that are not rejected (here, an indirect effect of SST on puffin survival through herring abundance) requires appropriate experimental designs (Schwarz 2002). This remark is of particular relevance when assessing the impact of environmental conditions, for which regression-like approaches can only help in generating interesting hypotheses about the impact of climatic factors on demography (Grosbois et al. 2008). Another limitation lies in that PA deals only with variables that are directly observed and measured. We are currently working on the extension of our approach to structural equation modelling of MR data to incorporate latent variables (Cubaynes et al., in press).

Overall, we have extended standard MR models by allowing direct and indirect effects of covariates on demographic parameters (PA-based MR models). We hope that this new framework will help in increasing the number of applications of MR models in addressing questions in ecology, in a way similar to how PA models have extended the multiple regression framework.

Acknowledgements

The authors thank E. Kazakou, R. Pradel, B. Shipley, E. Cam, S. Cubaynes, L. Crespin and D. Vile for stimulating and helpful discussions, and the Institute of Marine Research in Bergen for permission to use the data series on herring abundance reported by ICES.
