Quantifying individual variation in behaviour: mixed-effect modelling approaches

Authors

  • Niels J. Dingemanse

    Corresponding author
    1. Evolutionary Ecology of Variation Group, Max Planck Institute for Ornithology, 82319 Seewiesen (Starnberg), Germany
    2. Department Biology II, Behavioural Ecology, Ludwig-Maximilians-University of Munich, 82152 Planegg-Martinsried, Germany
  • Ned A. Dochtermann

    1. Department of Biological Sciences, North Dakota State University, Fargo, ND, USA

Summary

  1. Growing interest in proximate and ultimate causes and consequences of between- and within-individual variation in labile components of the phenotype – such as behaviour or physiology – characterizes current research in evolutionary ecology.
  2. The study of individual variation requires tools for quantification and decomposition of phenotypic variation into between- and within-individual components. This is essential as variance components differ in their ecological and evolutionary implications.
  3. We provide an overview of how mixed-effect models can be used to partition variation in, and correlations among, phenotypic attributes into between- and within-individual variance components.
  4. Optimal sampling schemes to accurately estimate (with sufficient power) a wide range of repeatabilities and key (co)variance components, such as between- and within-individual correlations, are detailed.
  5. Mixed-effect models enable the usage of unambiguous terminology for patterns of biological variation that currently lack a formal statistical definition (e.g. ‘animal personality’ or ‘behavioural syndromes’), and facilitate cross-fertilisation between disciplines such as behavioural ecology, ecological physiology and quantitative genetics.

Introduction

The proximate causes and ultimate consequences of between-individual variation have long intrigued biologists. For example, a standardized measure of individual variation, individual repeatability ('Glossary'), has often been quantified because repeatable variation approximates the raw material for selection to act upon (Endler 1986). While biologists have long studied the evolutionary repercussions of heritable between-individual variation (e.g. Lynch & Walsh 1998), more recently, behavioural ecologists have started to study from an adaptive viewpoint (i) why individuals are repeatable vs. flexible, (ii) which conditions favour between-individual vs. within-individual variance ('Glossary') or between-individual vs. within-individual correlations ('Glossary'), and (iii) how evolutionary and ecological processes are affected by individuality (Bolnick et al. 2011; Fogarty, Cote & Sih 2011; Benton 2012; Sih et al. 2012; Wolf & Weissing 2012). Many of these questions about between- and within-individual differences differ from evolutionary biologists' focus (e.g. Walsh & Blows 2009) on how genetic (co)variance affects evolution. Recent adaptive modelling has, for example, given rise to a suite of hypotheses about the ecological conditions favouring repeatable – but not necessarily heritable (Dingemanse & Wolf 2010) – variation in behavioural traits, and between-individual – but not necessarily genetic (Dingemanse & Wolf 2010) – correlations between behaviours (Wolf, van Doorn & Weissing 2008; Houston 2010; Frankenhuis & Panchanathan 2011; Mathot et al. 2012).

Questions about within- vs. between-individual variation require that phenotypic (co)variation is partitioned into variance components ('Glossary') (Lynch & Walsh 1998) (Table 1). Tools for doing so are available in the statistical (Henderson 1982; Goldstein 1995; Snijders & Bosker 1999; McCulloch & Searle 2000; Pinheiro & Bates 2000; Gelman & Hill 2007; Zuur et al. 2009; Hadfield 2010; Hox 2010) and quantitative genetics literature (Lynch & Walsh 1998; Schaeffer 2004; Wilson et al. 2010). Unfortunately, the rationale for their use is – in our opinion – often presented in a manner that is inaccessible to ecologists in part because of the statistical jargon and focus on the estimation of genetic parameters. Considerable progress has nevertheless been made over the last few years due to attempts by ecologists to bridge this gap (e.g. Bolker et al. 2009; van de Pol & Wright 2009; Nakagawa & Schielzeth 2010). Lack of familiarity with variance partitioning nevertheless remains a problem, particularly in the field of behavioural ecology where current research critically hinges upon variance partitioning, and inappropriate conclusions might be drawn without explicit knowledge about variance components. We give three illustrative examples from the current behavioural ecology literature.

Table 1. Key (co)variance components at the between-individual (ind) and within-individual (e) level

No. | Variance component (a) | Statistical description | Mixed-effect model required
1 | $V_{ind_0}$ | Individual variance in phenotype for attribute y / individual variation in reaction norm intercept for attribute y | URIM (eqn 1b)
2 | $V_{e_0}$ | Within-individual (error) variance for attribute y | URIM (eqn 1b)
3 | $V_{ind_1}$ | Individual variance in phenotypic plasticity (reaction norm slope) for attribute y | URSM (eqn 5b)
4 | $Cov_{ind_0,ind_1}$ | Covariance between the intercept and slope of individual reaction norms for attribute y | URSM (eqn 5b)
5 | $Cov_{ind_{0y},ind_{0z}}$ | Individual covariance between (reaction norm intercepts of) attributes y and z | MRIM (eqn 7b)
6 | $Cov_{e_{0y},e_{0z}}$ | Covariance between changes in the expression of attributes y and z within individuals | MRIM (eqn 7b)
7 | $Cov_{ind_{0y},ind_{1z}}$ | Covariance between individual mean (reaction norm intercept) of one attribute (y) and individual plasticity (reaction norm slope) of another (z) | MRSM (eqn S6, Supporting information)
8 | $Cov_{ind_{1y},ind_{1z}}$ | Individual covariance between level of plasticity in attribute y and level of plasticity in attribute z / covariance between reaction norm slopes of two attributes (y, z) | MRSM (eqn S6, Supporting information)

URIM, Univariate Random Intercept Model; URSM, Univariate Random Slopes Model; MRIM, Multivariate Random Intercepts Model; MRSM, Multivariate Random Slopes Model.
(a) Variance components 7 and 8 are not detailed in the main text. For a discussion and MM-implementation of these components, see Text S10 (Supporting information).

First, behavioural consistency, for example in avian provisioning rates, has been proposed as a quality indicator trait under sexual selection (Schuett, Tregenza & Dall 2010). The hypothesis predicts that males show lower within-individual variance compared with females. Published empirical tests of the proposed hypothesis typically quantify whether males have lower repeatability compared with females, which constitutes an inappropriate test because the hypothesis concerns within-individual variance, not necessarily repeatability.

Second, social interactions have been proposed to induce selection for different behavioural types, for example a mix of individuals that are relatively bold vs. relatively shy (Bergmüller & Taborsky 2010). Published empirical tests of the proposed hypothesis instead sometimes quantify the raw phenotypic variance as a function of sociality, whereas the hypothesis is solely concerned with the amount of between-individual variation.

Third, behavioural ecologists have proposed that animals differ in suites of correlated behaviours (Réale et al. 2007). The hypothesis implies the presence of nonzero between-individual correlations across behavioural attributes, yet empirical research often reports phenotypic correlations based on single measurements of each behaviour (Dingemanse, Dochtermann & Nakagawa 2012).

These examples illustrate the inherent difficulty in translating ecological hypotheses that concern variation at different levels to empirical tests. We appreciate that unfamiliarity with terms such as within- and between-individual correlations contributes to confusion about appropriate study designs and statistical tests of hypotheses. In this paper, we therefore aim to provide an overview of key between- and within-individual (co)variance components to assist ecologists in testing ecological hypotheses about variation.

Using ecological jargon and examples, we detail how mixed-effect models (MMs) can be used to address questions about within- and between-individual variance components. We thereby hope to facilitate (i) usage of unambiguous statistical terminology for distinct patterns of biological variation that have typically been defined verbally (e.g. ‘animal personality’ or ‘behavioural syndromes’ ['Glossary']), (ii) usage of appropriate statistical paradigms for their estimation and (iii) design of studies aimed at estimating individual (co)variance components with sufficient power. We focus throughout on between- and within-individual variance components, which we do intentionally because many ecological questions do not require further partitioning (see above). Wilson et al. (2010) outline approaches for further partitioning into genetic (vs. nongenetic) components for pedigreed data sets. While we focus on behavioural traits as examples, our recommendations apply generally to labile phenotypic traits.

Overview

Rather than providing a small number of tutorial examples, as done in other ‘How to’ papers, we focus on demonstrative questions and associated MMs, which can be flexibly applied to specific interests of individual readers. Our paper therefore consists of four main parts. The first, entitled 'Mixed-effects models', introduces MMs and details the framework which we subsequently use to describe why MMs are an important alternative to conventional approaches. The second, entitled 'Univariate MMs', details how MMs can be applied to (i) estimate between- and within-individual variability and thus repeatability ('Glossary'), (ii) avoid pseudo-repeatability estimates ('Glossary') and (iii) estimate individual differences in phenotypic plasticity. The third, entitled 'Multivariate MMs', briefly introduces how MMs can be applied to decompose correlations between phenotypic attributes. Each part has distinct sections – detailing models aimed at addressing specific questions – which are constructed such that they can be consulted independently; parts two and three conclude with discussions of optimal sampling designs and sample sizes. The 'Univariate MMs' section details simple models followed by progressively more complex variations. This sequencing was not intended to encourage working towards the most significant description of the data set, because step-wise approaches are statistically problematic (e.g. Whittingham et al. 2006; Forstmeier & Schielzeth 2011; Simmons, Nelson & Simonsohn 2011), and inhibit general inferences (Dochtermann & Jenkins 2011). Instead, we recommend fitting models to test a priori hypotheses. The fourth main part is an extensive Supporting information section that includes additional text, tables and illustrations along with programming code and sample data for three commonly used software packages (ASReml, R and SAS).

We start with a listing of the sorts of questions that can be addressed with MMs and variance decomposition to avoid misunderstanding (e.g. Bennington & Thayne 1994; Engqvist 2005) of what information these methods can provide (Table 1 details where these questions are addressed):

  1. Do individuals differ in average phenotypic response? For example, over multiple reproductive events do some snakes on average produce larger clutch sizes than other conspecifics? [Variance component 1 ($V_{ind_0}$) in Table 1];
  2. How much variation in phenotypic response is there within individuals? For example, to what extent do individual butterflies vary their metabolism between days? [Variance component 2 ($V_{e_0}$) in Table 1];
  3. Do individuals differ in responsiveness (plasticity) to environmental variation? For example, do some individuals of a plant species alter seed set more dramatically in response to variation in precipitation than other individuals? [Variance component 3 ($V_{ind_1}$) in Table 1];
  4. Are responses of individuals correlated across phenotypic attributes? For example, do birds that have high average levels of reactive oxygen metabolites across multiple sampling events also show higher average levels of antioxidant capacity? [Variance component 5 ($Cov_{ind_{0y},ind_{0z}}$) in Table 1];
  5. Within the same individual, do changes in expression for one phenotypic attribute go together with changes in another? For example, are within-individual changes in testosterone correlated with within-individual changes in bill colouration? [Variance component 6 ($Cov_{e_{0y},e_{0z}}$) in Table 1].

In Text S2 (Supporting information), we provide further examples of questions that deal with more complex patterns of individual variation (variance components 4, 7 and 8 in Table 1).

Mixed-effects models

Mixed-effect models incorporate two types of parameters: fixed and random ones, and hence consist of two key parts (Eisenhart 1947; Bennington & Thayne 1994; Pinheiro & Bates 2000; Bolker et al. 2009):

  1. The effects that predictor variables – which can be continuous (covariates) or categorical (factors) – have on the mean of response variables. Such effects are called ‘fixed’ whenever specific effects are estimated at their observed levels (e.g. differences in means between four specific years of study).
  2. Effects on response variables generated by variation within and among levels of a predictor variable (factor). Effects of such predictor variables are called ‘random’ whenever variance is estimated among observed levels sampled from a population of levels (e.g. the variance among years in general inferred from a sample of 50 years).

Utility of MMs vs. Alternatives

Alternative statistical approaches to MMs are available; we compare their utility here. With regard to univariate analyses, repeatability ('Glossary') is often estimated using analyses of variance (e.g. Lessells & Boag 1987). MMs are preferable because, unlike classical approaches, they allow direct estimation of between- and within-individual variances (Nakagawa & Schielzeth 2010), providing insight into whether differences in repeatability between groups are attributable to differences specifically in between-individual variances, within-individual variances or both (Jenkins 2011). Analysis of variance approaches also assume balanced/complete sampling, a difficult condition to meet for many ecological studies and a condition not required for MM use. Further, only MMs allow for calculation of repeatability of traits with non-Gaussian error distributions (Nakagawa & Schielzeth 2010).

With regard to multivariate analyses, alternative approaches have often been applied, specifically to estimate the repeatable component of phenotypic correlations (i.e. between-individual correlations; 'Glossary'). Between-individual correlations are typically based on the correlation between an individual's average value of y ($\bar{y}_j$) and z ($\bar{z}_j$) or between estimates (i.e. best linear unbiased predictors or BLUPs) derived from univariate MMs. Unfortunately, such correlations between mean values provide estimates of between-individual correlations that are biased due to within-individual variation (Snijders & Bosker 1999; Dingemanse, Dochtermann & Nakagawa 2012) and do not appropriately acknowledge uncertainty around the estimates (which also applies to BLUPs; Hadfield et al. 2010). Multivariate MMs avoid these problems and can also prevent drawing additional improper inferences. Specifically, ecologists often assume that correlations based on single measures per attribute per individual represent between-individual correlations (Dingemanse, Dochtermann & Nakagawa 2012). However, raw phenotypic correlations generally poorly predict between-individual correlations, particularly when between- and within-individual correlations are very different (Fig. 1a; Text S11, Supporting information). MM-based estimates match true values much more closely (Fig. 1).
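The bias in correlations between individual means can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions, not the simulation behind Fig. 1: it assumes a balanced design with n paired measurements per individual and independent residuals, so that the (co)variance among individual means equals the between-individual (co)variance plus the within-individual (co)variance divided by n. The function name and all numerical values are hypothetical.

```python
import math


def expected_correlation_of_means(v_ind_y, v_ind_z, cov_ind,
                                  v_e_y, v_e_z, cov_e, n):
    """Expected correlation between individual means of y and z.

    Assumes a balanced design (n paired measurements per individual) and
    independent residuals across measurements. All arguments are
    hypothetical inputs, not estimates from any data set.
    """
    cov_means = cov_ind + cov_e / n
    var_mean_y = v_ind_y + v_e_y / n
    var_mean_z = v_ind_z + v_e_z / n
    return cov_means / math.sqrt(var_mean_y * var_mean_z)


# Both traits have repeatability 0.5 (V_ind = V_e = 1); the between-individual
# correlation is +0.5 and the within-individual correlation is -0.5.
true_r_ind = 0.5
r_single = expected_correlation_of_means(1.0, 1.0, 0.5, 1.0, 1.0, -0.5, n=1)
r_four = expected_correlation_of_means(1.0, 1.0, 0.5, 1.0, 1.0, -0.5, n=4)
print(true_r_ind, round(r_single, 3), round(r_four, 3))  # 0.5 0.0 0.3
```

With a single measurement per individual the opposing within-individual correlation cancels the between-individual signal entirely, and even with four measurements the correlation of means remains well below the between-individual correlation of 0·5.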

Figure 1.

Effects of sampling design on accuracy of estimates of between-individual (rind) and within-individual (re) correlations, and on the power to ‘significantly’ identify a between-individual correlation. (a) Accuracy, estimated as the root mean square error (RMSE) of between- (first row of panels) and within-individual (second row of panels) correlation estimates for varying numbers of individuals and samples per individual. RMSE was calculated based on an MM estimate of the correlation vs. the known correlation used in the generation of simulated data. In addition, we also include RMSE based on a correlation based on a single measure per individual (closed circles). Panels along rows correspond to different combinations of between- and within-individual effects. All estimates are based on repeatabilities of 0·5 for both traits. (b) Accumulation of power (1 − β) relative to different numbers of individuals and number of samples per individual for the ability to detect a between-individual correlation for two traits with repeatabilities of 0·5 and equal residual variances. Simulation methods are detailed in Texts S11–S12 (Supporting information).

Despite major advantages, MMs are complex tools and therefore easily misspecified or interpreted inappropriately (Bennington & Thayne 1994; Bolker et al. 2009; van de Pol & Wright 2009; Schielzeth & Forstmeier 2009; Zuur et al. 2009; Hadfield et al. 2010). We therefore provide considerable detail regarding their specification. However, as this study highlights the ecological questions that can be addressed by decomposing (co)variances, we do not discuss details like diagnostic tools, the method by which statistical models are computationally fit (e.g. maximum-likelihood vs. Bayesian methods), how inferences are drawn (e.g. P-values vs. information criteria), or specifics of dealing with non-normal error distributions (Bolker et al. 2009; Zuur et al. 2009; Nakagawa & Schielzeth 2010; see also Text S1, Supporting information); those issues have been extensively detailed in the statistical books and journal papers cited above or elsewhere in the animal ecology literature (e.g. Garamszegi et al. 2009). We also strongly recommend that readers properly familiarize themselves with basic model assumptions prior to applying these tools themselves (cf. Pinheiro & Bates 2000; Bolker et al. 2009; Zuur et al. 2009).

Univariate MMs

Introduction

We introduce here the notation for the simplest univariate MM, where a constant ($\beta_0$) and the differences between individuals are modelled by including what are known as random intercepts to decompose phenotypic variance ($V_P$) into between- and within-individual components (eqn 1a):

$$y_{ij} = \beta_0 + ind_{0j} + e_{0ij} \qquad \text{(eqn 1a)}$$

Here, a single phenotypic element ($y_{ij}$), such as a life history decision (lay date) by individual j exhibited at instance i, is the sum of $\beta_0$ (the grand mean value of average individual responses) and each individual's unique average response. This individual contribution is estimated as the difference from the population mean by including random intercepts to model differences in mean response between individuals ($ind_{0j}$). This random intercept is assumed to be normally distributed (N) with a mean of zero and a variance ($\Omega_{ind}$) termed the between-individual variance (estimated as $V_{ind_0}$: the variance across random intercepts of individuals; eqn 1b), and can in principle be fitted whenever there are individuals with multiple measurements (see section 'Sampling Designs and Sample Sizes Requirements' for more requirements). A residual error ($e_{0ij}$) is also assumed to be normally distributed, with zero mean and a variance ($\Omega_e$) representing the within-individual variance ($V_{e_0}$; eqn 1b):

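$$[ind_{0j}] \sim N(0, \Omega_{ind}): \quad \Omega_{ind} = [V_{ind_0}]$$
$$[e_{0ij}] \sim N(0, \Omega_e): \quad \Omega_e = [V_{e_0}] \qquad \text{(eqn 1b)}$$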

We note that while $V_{e_0}$ is the ‘residual error’ representing measurement error and general environmental variance, it has biological relevance as it includes average within-individual plasticity towards any stimulus that is statistically unaccounted for (Westneat et al. 2011). This notation, used throughout, differs from the typical statistical notation but is one that we find both unambiguous and intuitive: variances are abbreviated as V, covariances as Cov and random effects for individuals as ‘ind’.
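The random-intercept model can be explored by simulating data with known variances and checking that they are recovered. The following is a minimal sketch, assuming a balanced design and Gaussian effects. It uses a moment-based shortcut (variance among individual means minus the sampling variance of a mean) rather than fitting a mixed-effect model, so it stands in for, and does not replace, the MM estimation discussed in the text. Function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_random_intercepts(n_ind, n_obs, beta0, v_ind, v_e, seed=1):
    """Simulate y_ij = beta0 + ind_0j + e_0ij for a balanced design."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_ind):
        ind0 = rng.gauss(0.0, v_ind ** 0.5)  # random intercept of individual j
        data.append([beta0 + ind0 + rng.gauss(0.0, v_e ** 0.5)
                     for _ in range(n_obs)])
    return data


def decompose_variance(data):
    """Moment-based estimates of (between, within) variance, balanced data only."""
    n_obs = len(data[0])
    means = [statistics.fmean(obs) for obs in data]
    # Pooled variance of observations around their own individual's mean.
    ss_within = sum((y - m) ** 2 for obs, m in zip(data, means) for y in obs)
    v_e = ss_within / (len(data) * (n_obs - 1))
    # Variance among individual means contains V_ind plus V_e / n_obs.
    v_ind = statistics.variance(means) - v_e / n_obs
    return v_ind, v_e


data = simulate_random_intercepts(n_ind=500, n_obs=5, beta0=10.0,
                                  v_ind=2.0, v_e=1.0)
v_ind_hat, v_e_hat = decompose_variance(data)
print(round(v_ind_hat, 2), round(v_e_hat, 2))  # close to the simulated 2 and 1
```

For unbalanced data or non-Gaussian responses this shortcut no longer applies, which is one reason the text recommends MMs.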

Equation 1a can be expanded to include additional fixed effects (β terms), like environmental covariates, and the impact of doing so on variance components is a focus of various later sections. Notably, the inclusion of fixed effects requires considerable thought (e.g. whether within-subject centring ['Glossary'] and transformations should be applied), as detailed in Text S3 (Supporting information).

Simple Repeatability Analysis

Univariate MMs (eqn 1) can be used to decompose the ‘raw’ phenotypic variance in a single response variable (y) into between- and within-individual variances (Fig. 2a). Those components are informative in their own right (Jenkins 2011) – indicating the degree to which the expression of a trait differs between individuals vs. the degree to which a single observation differs from an individual's mean – and are also used to calculate repeatability (Falconer & Mackay 1996).

Figure 2.

We illustrate here how between- and within-individual variance components are separated by plotting seven measurements of aggressiveness (y-axis) for five individuals (numbered) whose behaviour was assayed over a range of densities (x-axis). (a) Grey lines represent the average phenotypic value of each individual; the variance among lines represents the between-individual variance ($V_{ind_0}$). The variance in within-individual deviations from individual means represents the within-individual variance ($V_{e_0}$). All lines are parallel, and individuals therefore do not vary in behavioural plasticity. This is not the case in panel (b) where individuals do differ in plasticity ($V_{ind_1}$) and where behavioural reaction norm intercepts and slopes are negatively correlated ($Cov_{ind_0,ind_1}$).

Repeatability is of key importance because it provides a standardized estimate of individuality that can be compared across studies and is part of quantitative genetics theory, where it sets an upper limit to heritability (but see, e.g. Dohm 2002 for important caveats). Repeatability represents the proportion of phenotypic variation ($V_P$) attributable to differences between individuals (eqn 2):

$$R = \frac{V_{ind_0}}{V_{ind_0} + V_{e_0}} \qquad \text{(eqn 2)}$$

where $V_P = V_{ind_0} + V_{e_0}$. Confidence intervals for repeatabilities can be calculated following Nakagawa & Schielzeth (2010; see Text S17, Supporting information for worked examples). Equation 2 assumes a Gaussian error distribution and that repeated measures were taken under the same conditions (Lynch & Walsh 1998). When the first assumption is not met, alternative estimators of repeatability are available (detailed by Nakagawa & Schielzeth 2010), though researchers should also consider whether additional fixed effects – for example ‘sex’ when modelling variation in morphology for a sexually dimorphic species – can account for non-normality (additional fixed effects do change the interpretation of repeatability, detailed below). When the second assumption is not met, repeatability will be misestimated.
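As a worked illustration, write repeatability as the between-individual variance divided by the sum of the between- and within-individual variances, and take hypothetical estimates of 2 and 1 for these two components (values chosen for arithmetic convenience, not taken from any data set):

```latex
R = \frac{V_{ind_0}}{V_{ind_0} + V_{e_0}} = \frac{2}{2 + 1} \approx 0.67
```

Doubling the within-individual variance to 2 while leaving the between-individual variance unchanged lowers R to 0·5, and halving the between-individual variance to 1 has the same effect. A difference in repeatability between groups therefore does not by itself reveal which variance component differs.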

Avoiding ‘Pseudo-Repeatability’

The study of individual variation is generally a messy undertaking: we often try to measure phenotypic attributes of a set of individuals under identical conditions but fail to do so in practice, which is the norm in field studies. When individuals differ from one another in the conditions under which they were assayed, repeatability estimates can become inflated (Austin & Shaffer 1992; Catry et al. 1999), which leads to ‘pseudo-repeatability’ ('Glossary'). This inflation occurs when predictor variables (i.e. fixed effects) that influence the phenotype within individuals vary between individuals because of a biased sampling scheme. Imagine, for example, that one is interested in the repeatability of parental provisioning rate, and therefore aims to monitor each of n nests for four consecutive days using video recordings. If some cameras failed after having recorded data for only 2 days, while others worked fine for the whole period, pseudo-repeatability would occur unless dealt with statistically: the former nests would be monitored only when the nestlings were relatively young (e.g. 8 and 9 days post-hatching), whereas all other nests were monitored over a longer period (e.g. 8–11 days post-hatching). Because provisioning rates typically increase with nestling age within nests, values of $V_{ind_0}$ (hence repeatability) would be overestimated if the between-nest variation in nestling age during sampling was ignored (Westneat et al. 2011): $V_{ind_0}$ is conflated with differences among individuals due to nestling age effects. Pseudo-repeatability can be avoided by including a between-individual fixed covariate capturing variation due to biased sampling, $\bar{x}_j$ (eqn 3):

$$y_{ij} = \beta_0 + \beta_{1B}\bar{x}_j + ind_{0j} + e_{0ij} \qquad \text{(eqn 3)}$$

where $\bar{x}_j$ is calculated by averaging the covariate ($x_{ij}$; i.e. nestling age for parent j during focal observation period i) over all observations of the same parent j (i.e. $\bar{x}_j$ = 8·5 for nests sampled only when the nestlings were young, and $\bar{x}_j$ = 9·5 for nests sampled over all 4 days), and $\beta_{1B}$ represents the regression coefficient of the dependence of y on x at the between-individual level (‘B’ for between). Pseudo-repeatability can sometimes also be avoided by including additional random effects, for example territory identity, to avoid overestimation of between-individual variance (Browne et al. 2007; van de Crommenacker et al. 2011; Text S4, Supporting information).
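Computing the between-individual covariate is a simple per-individual average. The sketch below uses the nestling ages from the camera-failure example; the nest labels and the dictionary layout are hypothetical. It also derives the within-individual deviations from each nest's mean, which is the quantity used in within-subject centring ('Glossary').

```python
from statistics import fmean

# Hypothetical nestling ages (days post-hatching) at each observation.
ages_by_nest = {
    "nest_A": [8, 9],          # camera failed after 2 days
    "nest_B": [8, 9, 10, 11],  # camera recorded all 4 days
}

# Between-individual covariate: mean age over all observations of a nest.
mean_age = {nest: fmean(ages) for nest, ages in ages_by_nest.items()}

# Within-individual deviations from each nest's own mean age.
centred_age = {nest: [age - mean_age[nest] for age in ages]
               for nest, ages in ages_by_nest.items()}

print(mean_age)               # {'nest_A': 8.5, 'nest_B': 9.5}
print(centred_age["nest_A"])  # [-0.5, 0.5]
```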

Repeatability estimated from models controlling for between-individual fixed effects (e.g. $\bar{x}_j$; eqn 3) represents the proportion of ‘phenotypic variance not accounted for by fixed effects’ ($V_P - V_{FIXED}$) explained by differences between individuals. This conditional repeatability ('Glossary') will often be the biologically relevant parameter; for example, in our provisioning example, the raw repeatability was inflated due to failure to observe all nests for the same period of time and the conditional estimate represented the biological repeatability.

Mixed-effect models with between-individual fixed effects ($\bar{x}_j$ or $x_j$) can also be used when issues related to biased sampling schemes do not apply. Inclusion of between-individual fixed effects (like ‘sex’) enables calculation of average within-class repeatability. However, such exercises assume that $V_{ind_0}$ and $V_{e_0}$ do not vary between the classes. In the section 'Comparing variance components across data sets' (Text S7, Supporting information), we discuss how multivariate MMs can be used to test for violations of this assumption (see Dingemanse et al. 2012 for a univariate approach for dealing with the same issue).

Explaining Variance Components

Simple models like eqn 1 provide information about phenotypic variance exhibited between and within individuals. As a next step, researchers are often interested in factors explaining these variance components. For example, experiences during development might have long-lasting effects on an individual's later phenotype. One might therefore ask what proportion of between-individual variation ($V_{ind_0}$) in, for example, aggressiveness was due to early-life between-individual differences, such as variation in maternal hormones across eggs. Similarly, what proportion of within-individual variation ($V_{e_0}$) is attributable to environmental factors? For such a question one could assay the phenotype of all individuals under the same range of environmental conditions (e.g. record aggressiveness over a gradient of conspecific density) and ask what portion of $V_{e_0}$ was due to average within-individual plasticity. To address both of these possibilities, both between-individual ($x_1$) and within-individual ($x_2$) fixed effects can be included (eqn 4):

$$y_{ij} = \beta_0 + \beta_{1B}x_{1j} + \beta_{2W}x_{2ij} + ind_{0j} + e_{0ij} \qquad \text{(eqn 4)}$$

where $y_{ij}$ represents the level of aggressiveness of individual j at instance i. $x_1$ could represent maternal hormone levels in the egg from which individual j was born and $x_2$ the conspecific density experienced by individual j at instance i. As $x_1$ is a covariate that differs between but not within individuals, it represents a between-individual effect (hence B). In contrast, $x_2$ is a covariate that varies within an individual and thus represents a within-individual effect (hence W). $\beta_{1B}$ and $\beta_{2W}$ are the coefficients relating, respectively, $x_1$ and $x_2$ to $y_{ij}$. Comparison of the values of $V_{ind_0}$ and $V_{e_0}$ between a model where these between- and within-individual fixed effects were included (eqn 4) vs. excluded (e.g. eqn 1a) provides quantitative information on the variance in each component explained by these fixed effects (for guidelines, see Snijders & Bosker 1999).
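One simple way to summarize such a comparison is the proportional reduction in each variance component between the two models. This is an assumed summary chosen for illustration; it does not reproduce the fuller guidelines of Snijders & Bosker (1999). The function name and the variance estimates below are hypothetical.

```python
def proportion_explained(v_without, v_with):
    """Proportional reduction in a variance component after adding fixed effects."""
    if v_without <= 0:
        raise ValueError("variance without fixed effects must be positive")
    return (v_without - v_with) / v_without


# Hypothetical estimates from models without and with the fixed effects.
v_ind_null, v_e_null = 2.0, 1.0   # between- and within-individual variance, no fixed effects
v_ind_full, v_e_full = 1.5, 0.8   # same components after adding x1 and x2

print(round(proportion_explained(v_ind_null, v_ind_full), 2))  # 0.25
print(round(proportion_explained(v_e_null, v_e_full), 2))      # 0.2
```

A negative value indicates that the estimate of a component increased after the fixed effects were added, which can happen and should be reported rather than truncated to zero.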

Equation 4 assumes that all individuals experienced the same set of conditions for any within-individual fixed effect x2 in eqn 4, such that the average value of such fixed effects would not vary among individuals. If this condition is not satisfied, the within-individual fixed effect would be conflated with between-individual variation, and within-subject centring ['Glossary'] methods may be needed to distinguish within- from between-individual effects (Snijders & Bosker 1999; van de Pol & Wright 2009) as detailed in Text S9 (Supporting information). Furthermore, effects of predictor variables with considerable measurement error, for example environmental states like predator density, will be estimated with bias and therefore require specific modelling approaches (e.g. Schafer 1987; Bartlett, De Stavola & Frost 2009). Care is also needed in avoiding spurious results due to failure to fit nonlinear effects of covariates, and the exact choice of the covariate (e.g. population vs. local density) generating specific results.

When one wishes to estimate how much between- or within-individual variation in one phenotypic attribute (e.g. a behavioural response) remains after controlling for variation in another (e.g. circulating hormone levels), multivariate MMs (introduced below), where both phenotypic attributes are treated as response variables (y and z), may instead be applied (see eqns S2a–d in Text S8, Supporting information, for how such conditional estimates are calculated). Multivariate models are particularly appropriate when it is not obvious which phenotypic attribute should be considered predictor vs. response.

Estimating Individual Variation in Plasticity

Phenotypes have thus far been considered a function of between- and within-individual fixed effects, implying that the range of phenotypes expressed by a single individual can be characterized by a regression line with the same slope for all individuals (eqns 1, 3, and 4; Fig. 2a). Individuals were allowed to differ in the intercept of this reaction norm through inclusion of a random intercept for each individual ($ind_{0j}$; eqn 1b), but all individuals shared the same reaction norm slope (e.g. $\beta_{2W}$ in eqn 4). In other words, individuals could vary in their average phenotype but not in their phenotypic plasticity (Fig. 2a). Here, we extend MMs to include individual variation in reaction norm slopes (Fig. 2b).

Consider our previous example wherein aggressiveness of individuals was a function of maternal hormone levels ($\beta_{1B}x_{1j}$; eqn 4) and density ($\beta_{2W}x_{2ij}$; eqn 4). In doing so, we assumed that all individuals responded in the same manner to density. However, this assumption may often not hold because plasticity varies among individuals (reviewed by Nussey, Wilson & Brommer 2007; Dingemanse et al. 2010; Mathot et al. 2012). In our previous example, this would be characterized by some individuals increasing (or decreasing) aggressive behaviour to a greater degree than others for the same change in conspecific density (Fig. 2b). We can statistically model this relationship by including a within-individual fixed effect covariate ($x_{ij}$; i.e. density) into our basic model (eqn 1), while also fitting random slopes ($ind_{1j}$) around the population-average slope $\beta_1$ of the dependence of $y_{ij}$ on $x_{ij}$, which is called ‘random regression’ (Henderson 1982; Meyer 1998; Schaeffer 2004) (eqn 5a):

$$y_{ij} = (\beta_0 + ind_{0j}) + (\beta_1 + ind_{1j})\,x_{ij} + e_{0ij} \qquad \text{(eqn 5a)}$$

where, as above, yij would represent the level of aggression displayed by individual j at instance i. Here, xij is the density experienced by individual j at instance i. β1 corresponds to β2W in eqn 4 and represents the average within-individual response to changes in density (i.e. the population-mean slope). A random intercept is fitted for each level of individual identity (+ind0j) as before. What is new here is that the individual's response to density can deviate from the population-mean slope (+ind1j). The random intercepts and slopes are modelled as being drawn from a bivariate normal distribution (MVN) with means of zero. The variances and covariances for this distribution are defined by the variance in intercepts among individuals ($V_{ind_0}$), the between-individual variance in slopes ($V_{ind_1}$), and the covariance between intercepts and slopes ($Cov_{ind_0,ind_1}$; eqn 5b). The residual error (e0ij) is modelled as normally distributed with a mean of zero and an estimated within-individual variance ($V_{e_0}$; eqn 5b):

$$\begin{bmatrix} ind_{0j} \\ ind_{1j} \end{bmatrix} \sim \mathrm{MVN}(\mathbf{0}, \Omega_{ind}): \quad \Omega_{ind} = \begin{bmatrix} V_{ind_0} & Cov_{ind_0,ind_1} \\ Cov_{ind_0,ind_1} & V_{ind_1} \end{bmatrix}, \qquad e_{0ij} \sim N(0, V_{e_0}) \qquad \text{(eqn 5b)}$$

Ωind is, notably, a symmetrical matrix: the elements below the diagonal are mirrored above the diagonal. Note further that the intercept-slope covariance can be expressed as a correlation ($r_{ind_0,ind_1}$), where $r_{ind_0,ind_1} = Cov_{ind_0,ind_1} / \sqrt{V_{ind_0} V_{ind_1}}$, and that $V_{ind_0}$, and the sign of $r_{ind_0,ind_1}$, are specific to the measurement and scaling of the covariate (Schaeffer 2004); those parameters therefore cannot typically be compared across studies.

Returning to our hypothetical measures of aggression, a negative intercept-slope correlation, as depicted in Fig. 2b, would suggest that individuals with low average aggression scores compared with others also increase their aggression at a greater than average rate in response to increases in conspecific density. Intercept-slope correlations often differ from zero (see, for example, Mathot et al. 2012).

Importantly, the application of random regression implies that the between-individual variance is no longer constant over the (density) gradient, and $V_{ind_0}$ now represents the between-individual variance only at one specific point along the environmental gradient (i.e. where all covariates have the value zero). Similarly, if one were to calculate repeatability using eqn 2, the estimated value would apply solely to that specific section of the data. Specifically, repeatability can vary dramatically over the gradient when the intercept-slope correlation ($r_{ind_0,ind_1}$) is tight (Text S5, Supporting information), requiring the evaluation of important assumptions (Text S6, Supporting information). For example, failure to acknowledge the presence of nonlinear effects of reaction norm slopes would automatically lead to inappropriate conclusions about the presence of individual variation in plasticity.
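The dependence of the between-individual variance, repeatability and the intercept-slope correlation on the covariate can be made concrete with a short sketch. It assumes the linear random-regression model described above; the function names and the numerical values are hypothetical and chosen only for illustration. Under that model, the between-individual variance at covariate value x is the variance of ind0j + ind1j·x, and re-expressing the covariate as x − shift moves the intercept to a different point on the gradient.

```python
import math


def between_individual_variance(x, v_ind0, v_ind1, cov_ind01):
    """Variance of (ind0j + ind1j * x), i.e. between-individual variance at x."""
    return v_ind0 + 2.0 * x * cov_ind01 + x ** 2 * v_ind1


def repeatability_at(x, v_ind0, v_ind1, cov_ind01, v_e0):
    """Repeatability evaluated at covariate value x (homogeneous residual variance assumed)."""
    v_between = between_individual_variance(x, v_ind0, v_ind1, cov_ind01)
    return v_between / (v_between + v_e0)


def intercept_slope_correlation(v_ind0, v_ind1, cov_ind01, shift=0.0):
    """Intercept-slope correlation after re-expressing the covariate as (x - shift)."""
    v_intercept = between_individual_variance(shift, v_ind0, v_ind1, cov_ind01)
    cov = cov_ind01 + shift * v_ind1  # Cov(ind0j + shift * ind1j, ind1j)
    return cov / math.sqrt(v_intercept * v_ind1)


# Hypothetical variance components (not taken from any data set).
V_IND0, V_IND1, COV_IND01, V_E0 = 1.0, 0.25, -0.4, 1.0

for x in (-2.0, 0.0, 2.0):
    r = repeatability_at(x, V_IND0, V_IND1, COV_IND01, V_E0)
    print(f"x = {x:+.1f}: repeatability = {r:.3f}")

print(intercept_slope_correlation(V_IND0, V_IND1, COV_IND01))             # covariate as given
print(intercept_slope_correlation(V_IND0, V_IND1, COV_IND01, shift=2.0))  # covariate re-centred
```

With these illustrative values, repeatability differs markedly between the ends of the gradient, and the intercept-slope correlation changes sign once the covariate is re-centred, which is why such parameters are hard to compare across studies that scale the covariate differently.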

Sampling Designs and Sample Sizes Requirements

What types of sampling designs and sample sizes are needed to estimate specific between- and within-individual (co)variance components? Minimum design requirements for the analysis of single phenotypic attributes are provided in scenarios 3 and 4 of Table 2. Sample sizes needed for the accurate estimation of these variance components are, in contrast, less obvious.

Table 2. Four distinct sampling schemes (scenarios) and their estimable (co)variance components (defined in Table 1). We print ‘Data’ for points in time (1–4) where phenotypic data of the phenotypic attribute y and/or z have been collected for the same individual (1–3), and ‘–’ when no data have been collected

Individual | Time | Scenario 1: y | Scenario 1: z | Scenario 2: y | Scenario 2: z | Scenario 3: y | Scenario 3: z | Scenario 4: y | Scenario 4: z
1 | 1 | Data | Data | Data | – | Data | Data | Data | –
1 | 2 | – | – | – | Data | Data | Data | Data | –
1 | 3 | – | – | – | – | – | – | – | Data
1 | 4 | – | – | – | – | – | – | – | Data
2 | 1 | Data | Data | Data | – | Data | Data | Data | –
2 | 2 | – | – | – | Data | Data | Data | Data | –
2 | 3 | – | – | – | – | – | – | – | Data
2 | 4 | – | – | – | – | – | – | – | Data
3 | 1 | Data | Data | Data | – | Data | Data | Data | –
3 | 2 | – | – | – | Data | Data | Data | Data | –
3 | 3 | – | – | – | – | – | – | – | Data
3 | 4 | – | – | – | – | – | – | – | Data

Component(s) (a) | Scenario 1: estimable? | Scenario 2: estimable? | Scenario 3: estimable? | Scenario 4: estimable?
Phenotypic variances of y and z ($V_{P_y}$, $V_{P_z}$) | Yes | Yes | Yes | Yes
Phenotypic covariance between y and z ($Cov_{P_{y,z}}$) | Yes | Yes | Yes | Yes
Between-individual variances of y and z ($V_{ind_{0y}}$, $V_{ind_{0z}}$) | No | No | Yes | Yes
Between-individual covariance between y and z ($Cov_{ind_{0y},ind_{0z}}$) | No | No | Yes | Yes
Within-individual variances ($V_{e_{0y}}$, $V_{e_{0z}}$) | No | No | Yes | Yes
Within-individual covariance between y and z ($Cov_{e_{0y},e_{0z}}$) | No | No | Yes | No

(a) Between-individual variances in level of plasticity of y and z ($V_{ind_{1y}}$, $V_{ind_{1z}}$) can in principle be estimated in scenarios 3 and 4, provided that repeated measures data were taken in different contexts.

Recommendations about optimal sample sizes vary substantially, depending on (i) what one aims to optimize (accuracy or power), (ii) the variance component of interest, and (iii) constraints imposed by the study system (Snijders & Bosker 1999; Maas & Hox 2004; Scherbaum & Ferreter 2009; Hox 2010; Martin et al. 2011; van de Pol 2012). Unfortunately, many published recommendations are drawn from situations relevant to the social sciences and have limited relevance for ecologists, as they typically focus on cases where a few subjects (for example schools) are sampled with a great number of repeats (for example students). In contrast, ecologists often work with larger numbers of subjects (individuals) but face constraints on the number of repeated samples that can be collected. Martin et al. (2011) and van de Pol (2012) recently discussed sample size requirements for the estimation of intercept-slope correlations ($r_{ind_0,ind_1}$) and provide software for sampling optimization for several variance components. We therefore focus here on simulations (detailed in Texts S13–S14, Supporting information) asking how sample size affects the accuracy of, and the power to detect, repeatabilities of different magnitudes.

Our simulations imply that when repeatability exceeds 0·5, it can be demonstrated with acceptable statistical power (~0·8) with few (e.g. 25) individuals sampled only twice (Fig. 3b). For lower repeatability values, ≥4 samples per individual are typically required whenever the total number of individuals is low (≤100; Fig. 3b). Moreover, it will generally not be possible to detect repeatabilities of 0·1 with only two repeats per individual (Fig. 3b), as noted previously (Martin et al. 2011). Furthermore, accuracy of the estimated value of repeatability increases with true repeatability (Fig. 3a), suggesting that sampling considerations are particularly relevant for traits with low values of repeatability. Finally, optimal sample sizes greatly depend on the level of inaccuracy deemed acceptable. When the total sample size (number of individuals × number of repeats) is a constraining factor, inaccuracy can be decreased by increasing the number of samples per individual at the cost of the number of sampled individuals (Fig. 3b), though only when repeatability is ≤0·3.

Figure 3.

Effects of sampling design on the accuracy of estimates of repeatability (for definition, see eqn 2), and power to ‘significantly’ identify nonzero values of repeatability. (a) Accuracy, estimated as the root mean square error (RMSE) of repeatabilities for varying numbers of individuals and samples per individual. RMSE was calculated based on an MM estimate of repeatability vs. the known repeatability (ranging from 0·1 to 0·9) used in the generation of simulated data. (b) Accumulation of power (1 − β) relative to different numbers of individuals and number of samples per individual for the ability to detect repeatability. Simulation methods are detailed in Texts S13–S14 (Supporting information).
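A simulation of this general kind can be sketched in a few lines. The code below is not the procedure of Texts S13–S14: it swaps in a simple one-way ANOVA (moment) estimator of repeatability for balanced data in place of a fitted mixed model, fixes the total phenotypic variance at 1, and uses hypothetical function names and settings throughout. It is meant only to show how the RMSE of a repeatability estimate can be explored for different numbers of individuals and repeats.

```python
import math
import random


def simulate_scores(n_ind, n_rep, repeatability, rng):
    """Simulate y_ij = ind_0j + e_0ij with total phenotypic variance fixed at 1."""
    sd_between = math.sqrt(repeatability)
    sd_within = math.sqrt(1.0 - repeatability)
    data = []
    for _ in range(n_ind):
        ind0 = rng.gauss(0.0, sd_between)
        data.append([ind0 + rng.gauss(0.0, sd_within) for _ in range(n_rep)])
    return data


def anova_repeatability(data):
    """Moment estimator of repeatability for a balanced design (may be negative by chance)."""
    n_ind, n_rep = len(data), len(data[0])
    means = [sum(row) / n_rep for row in data]
    grand_mean = sum(means) / n_ind
    ms_within = sum(
        (value - mean) ** 2 for row, mean in zip(data, means) for value in row
    ) / (n_ind * (n_rep - 1))
    ms_between = n_rep * sum((mean - grand_mean) ** 2 for mean in means) / (n_ind - 1)
    v_ind0 = (ms_between - ms_within) / n_rep  # between-individual variance estimate
    return v_ind0 / (v_ind0 + ms_within)


def rmse_of_repeatability(n_ind, n_rep, repeatability, n_sim=200, seed=1):
    """Root mean square error of the estimator over n_sim simulated data sets."""
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(n_sim):
        estimate = anova_repeatability(simulate_scores(n_ind, n_rep, repeatability, rng))
        squared_errors.append((estimate - repeatability) ** 2)
    return math.sqrt(sum(squared_errors) / n_sim)


if __name__ == "__main__":
    for n_ind, n_rep in ((25, 2), (50, 4), (200, 4)):
        print(n_ind, n_rep, round(rmse_of_repeatability(n_ind, n_rep, 0.5), 3))
```

Power could be explored in the same loop by recording how often a significance test rejects zero repeatability, although the choice of test is a further assumption that this sketch leaves open.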

Summary

Univariate MMs can address a wide range of questions regarding variation – and its sources – both between and within individuals (detailed in Table S1, Supporting information). Specifically, univariate MMs facilitate:

  1. The estimation of between- and within-individual variation and repeatability;
  2. The estimation of individual variation in plasticity;
  3. Assignment of variation to fixed effects, and separation of within- and between-individual fixed effects;
  4. Statistical control for nonrandom distributions of individuals over environments.

Multivariate MMs

Introduction

Here, we discuss how multivariate MMs can be used to decompose phenotypic correlations. We detail first how between- and within-individual effects contribute to raw phenotypic correlations, as well as the biological underpinnings of correlations at each level. We then, for simplicity, focus on bivariate MMs, which may be applied whenever repeated measures for individuals are available for two phenotypic attributes (Table 2), for example a behavioural (y) and a physiological (z) response. In Text S7 (Supporting information), we further detail how multivariate MMs may be used to ask whether specific variance components, or repeatabilities, of single phenotypic attributes differ between data sets (e.g. sexes, treatments, populations). In Text S16 (Supporting information), we also discuss how more complicated multivariate relationships can be evaluated.

Correlations between Labile Phenotypic Attributes

The association between two phenotypic characteristics – for example maximal metabolic rate (y) and basal metabolic rate (z) – is typically estimated by calculating a phenotypic correlation ($r_{P_{y,z}}$; eqn 6a):

$$r_{P_{y,z}} = \frac{Cov_{P_{y,z}}}{\sqrt{V_{P_y} V_{P_z}}} \qquad \text{(eqn 6a)}$$

where $Cov_{P_{y,z}}$ is the covariance between maximal (y) and basal metabolic rates (z), and $V_{P_y}$ and $V_{P_z}$ are the corresponding phenotypic variances.

As discussed earlier, $r_{P_{y,z}}$ does not, on its own, tell us much about the nature of the association between y and z because it is shaped by correlations at two distinct levels: between and within individuals. A between-individual correlation is present when individual mean values of y ($\bar{y}_j$) correlate with individual mean values of z ($\bar{z}_j$) (as in Fig. 4a, panel 3). A within-individual correlation exists when an individual's change in y between time periods t and t + 1 is correlated with its change in z over the same period (as in Fig. 4b, panel 4). Statistically, within-individual covariances are generated by covariances between deviations from individual mean values for y (i.e. $y_{ij} - \bar{y}_j$) and z (i.e. $z_{ij} - \bar{z}_j$) (Fig. 4b, panel 4).

Figure 4.

Illustrations of situations where a positive phenotypic correlation ($r_{P_{y,z}}$) originates primarily from a positive (a) between-individual correlation ($r_{ind_{0y},ind_{0z}}$) vs. (b) within-individual correlation ($r_{e_{0y},e_{0z}}$), where each of nine individuals (numbers) is assayed once for each of two phenotypic attributes (y and z) within each of two time periods (scenario 3 in Table 2). For both situations we plot (from left to right): (1) y vs. z at t = 1, i.e. y1j vs. z1j; (2) y vs. z at t = 2, i.e. y2j vs. z2j; (3) the average value of y vs. the average value of z, i.e. $\bar{y}_j$ vs. $\bar{z}_j$; (4) the deviation of each observation from the individual's mean of y vs. z, i.e. $y_{ij} - \bar{y}_j$ vs. $z_{ij} - \bar{z}_j$. Attributes y and z are tightly correlated at each point in time, either (a) due to a tight between-individual correlation because both traits have high repeatabilities or (b) due to a tight within-individual correlation because both traits have low repeatabilities.

For two labile and repeatable phenotypic attributes (y and z), the between- and within-individual correlations jointly contribute to the phenotypic correlation as (eqn 6b; Dingemanse, Dochtermann & Nakagawa 2012):

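$$r_{P_{y,z}} = r_{ind_{0y},ind_{0z}} \sqrt{\frac{V_{ind_{0y}}}{V_{ind_{0y}} + V_{e_{0y}}} \cdot \frac{V_{ind_{0z}}}{V_{ind_{0z}} + V_{e_{0z}}}} + r_{e_{0y},e_{0z}} \sqrt{\frac{V_{e_{0y}}}{V_{ind_{0y}} + V_{e_{0y}}} \cdot \frac{V_{e_{0z}}}{V_{ind_{0z}} + V_{e_{0z}}}} \qquad \text{(eqn 6b)}$$

where the geometric mean repeatability – the square-root of the product of the repeatabilities of the two phenotypic attributes

$$\sqrt{\frac{V_{ind_{0y}}}{V_{ind_{0y}} + V_{e_{0y}}} \cdot \frac{V_{ind_{0z}}}{V_{ind_{0z}} + V_{e_{0z}}}}$$

determines the contribution of the between-individual correlation ($r_{ind_{0y},ind_{0z}}$) to the overall phenotypic correlation.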

Practically, $r_{P_{y,z}}$ will approximate the between-individual correlation ($r_{ind_{0y},ind_{0z}}$) most closely for cases where y and z are both highly repeatable (as in Fig. 4a) compared with cases where they are not (as in Fig. 4b). Similarly, eqn 6b simplifies to $r_{P_{y,z}} = r_{e_{0y},e_{0z}}$ for the unlikely scenario where y and z both completely lack between-individual variance (i.e. $V_{ind_{0y}} = V_{ind_{0z}} = 0$). Equation 6b can also simplify to $r_{P_{y,z}} = r_{ind_{0y},ind_{0z}}$ when both phenotypic attributes completely lack within-individual variability. The latter condition could apply to suites of phenotypic attributes that become fixed in adulthood (e.g. skull dimensions, arm or leg lengths). For nonlabile traits like these, $r_{ind_{0y},ind_{0z}}$ can be estimated from a single measurement per attribute per individual (assuming zero measurement error). Otherwise, one cannot appropriately infer $r_{ind_{0y},ind_{0z}}$ or $r_{e_{0y},e_{0z}}$ without statistically decomposing $r_{P_{y,z}}$ (as discussed in the section 'Utility of MMs vs. Alternatives'). Hence, when research questions explicitly ask for the estimation of $r_{ind_{0y},ind_{0z}}$ (e.g. behavioural syndrome research; Dingemanse, Dochtermann & Nakagawa 2012), or of both $r_{ind_{0y},ind_{0z}}$ and $r_{e_{0y},e_{0z}}$ (e.g. life history research; Reznick, Nunney & Tessier 2000), specific sampling designs (where each individual is assayed more than once) and special decomposition tools (multivariate MMs) will be necessary research requirements (detailed below).

For labile phenotypic attributes, like behavioural, physiological or life history traits, that typically exhibit intermediate repeatabilities, $r_{P_{y,z}}$ will equal neither $r_{ind_{0y},ind_{0z}}$ nor $r_{e_{0y},e_{0z}}$. For example, repeatabilities of behavioural responses average around 0·37 (Bell, Hankison & Laskowski 2009), implying an average geometric mean repeatability below 0·37. Consequently, within-individual correlations would influence phenotypic correlations at least (1 − 0·37)/0·37 = 1·70 times more strongly than would between-individual correlations. Phenotypic correlations for such labile phenotypic attributes therefore largely reflect within-individual correlations, and significant phenotypic correlations should not blindly be taken as evidence for between-individual correlations. Our reading of the current literature leads us to the conclusion that this concern is insufficiently appreciated. In fact, we found few ecological examples outside of the quantitative genetics literature where raw phenotypic correlations were statistically decomposed into within- vs. between-individual components (Browne et al. 2007; van de Crommenacker et al. 2011; Mutzel et al. 2011; Wilson et al. 2011; Adriaenssens & Johnsson 2012; Dochtermann et al. 2012).
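The weighting described above can be checked numerically. The sketch below assumes the relationship of eqn 6b (Dingemanse, Dochtermann & Nakagawa 2012), in which the between-individual correlation is weighted by the geometric mean repeatability and the within-individual correlation by the geometric mean of the non-repeatable fractions. The function and argument names are hypothetical.

```python
import math


def phenotypic_correlation(r_ind, r_e, rep_y, rep_z):
    """Combine between- (r_ind) and within-individual (r_e) correlations into r_P."""
    weight_between = math.sqrt(rep_y * rep_z)
    weight_within = math.sqrt((1.0 - rep_y) * (1.0 - rep_z))
    return r_ind * weight_between + r_e * weight_within


# Both traits at the average behavioural repeatability of 0.37.
rep = 0.37
ratio = math.sqrt((1.0 - rep) * (1.0 - rep)) / math.sqrt(rep * rep)
print(round(ratio, 2))  # relative weight of the within-individual correlation

# A phenotypic correlation can arise without any between-individual correlation.
print(round(phenotypic_correlation(r_ind=0.0, r_e=0.5, rep_y=rep, rep_z=rep), 3))
```

In the second call the between-individual correlation is set to zero, yet the phenotypic correlation is clearly positive, which illustrates why a raw phenotypic correlation is weak evidence for a between-individual correlation in labile traits.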

Within- and between-Individual Correlations: How They Are Caused and Why They Often Differ

What are the biological underpinnings of between-individual and within-individual correlations? Proximally, we can distinguish three main contributors to phenotypic correlations: genetic mechanisms, environmental mechanisms and methodological artefacts (Fig. 5), although other contributors also exist (e.g. gene-environment interactions; Sgro & Hoffmann 2004). In the Text S15 (Supporting information), we provide a suite of examples of ecological questions that can be addressed by partitioning correlations in between- and within-individual effects. Here, we provide a brief summary before detailing the simplest implementations of MMs that may be used for such purposes.

Figure 5.

Hierarchical diagram illustrating how ‘raw’ phenotypic correlations ($r_{P_{y,z}}$) are underpinned by the joint influences of between- ($r_{ind_{0y},ind_{0z}}$) and within-individual correlations ($r_{e_{0y},e_{0z}}$), which are in turn shaped by genetic, permanent environment, environment and error correlations, which are themselves due to genetic and environmental variation and measurement error. A range of biological examples is given in Text S15 (Supporting information).

Genetic variation underpins phenotypic correlations whenever phenotypic attributes are linked through genetic correlations, for example, when the same genes affect the expression of multiple phenotypic attributes (pleiotropy) or when genes affecting the expression of one phenotypic attribute are correlated with the genes affecting the expression of another (linkage disequilibrium) (Lynch & Walsh 1998). Genes are attributes of individuals, and genetic variation thus contributes to variation at the between-individual level. Hence, genes that affect multiple aspects of the phenotype cause both between-individual (i.e. repeatable) variation in y and z ($V_{ind_{0y}}$, $V_{ind_{0z}}$) and between-individual correlations (Fig. 5).

Environmental variation can underpin correlations through a variety of mechanisms, both between and within individuals (Fig. 5). At the between-individual level, environmentally induced correlations are called permanent environment correlations; at the within-individual level, they are simply called environmental correlations (Fig. 5). Importantly, ‘permanent’ refers here to environmental variation causing between-individual differences over the time span within which the repeated measures were taken (Wilson et al. 2010); it does not imply that such environmental factors have effects that are permanent.

Finally, within-individual correlations in particular can also be underpinned by correlated measurement errors (Fig. 5). This could be due to differences in accuracies or precision of equipment or result from other methodological practices (e.g. due to effects of order in which phenotypic assays are conducted; Dochtermann 2010). Fortunately, such biases can be quantified and statistically controlled either with appropriate sampling designs (Dochtermann 2010) or the inclusion of additional random effects (Text S4, Supporting information).

Decomposing Phenotypic Covariances Using MMs

Just as univariate MMs were used to decompose phenotypic variances, multivariate MMs can be used to decompose phenotypic covariances into between- and within-individual covariance components whenever repeated measures of two (or more) phenotypic attributes are available for a set of individuals (Table 2). Multivariate MMs share the characteristics discussed for univariate MMs, except that the covariances between the response variables are explicitly considered. A bivariate equivalent of eqn 1 – where no fixed effects are included in the linear equation except for the constant (β0), and where the phenotypic (co)variance is decomposed between vs. within individuals – is eqn 7a:

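$$\begin{aligned} y_{ij} &= (\beta_{0y} + ind_{0yj}) + e_{0yij} \\ z_{ij} &= (\beta_{0z} + ind_{0zj}) + e_{0zij} \end{aligned} \qquad \text{(eqn 7a)}$$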

where y and z represent two phenotypic attributes. As in the first examples for using univariate MMs, instance i for individual j is modelled here by fitting a random intercept for each level of individual (ind0j). Typically, β0y and β0z are modelled as being distinct (e.g. Matsuyama & Ohashi 1997).

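At first glance, eqn 7a appears to be simply two univariate MMs. Importantly, this is not the case because of how the between- and within-individual effects are estimated with multivariate MMs. As was the case with univariate MMs, the random intercepts (ind0j) and the within-individual contributions (e0ij) to y and z are modelled as having means of zero. However, in this bivariate case, neither the random intercepts nor the residual errors are independent. Instead, the random intercepts are assumed to follow a multivariate normal distribution with a variance-covariance structure (Ωind) specifying the between-individual variances ($V_{ind_{0y}}$ and $V_{ind_{0z}}$) and the between-individual covariance between the two attributes ($Cov_{ind_{0y},ind_{0z}}$; eqn 7b). The residual errors (e0ij) are likewise assumed to be drawn from a multivariate normal distribution, with means of zero, within-individual variances ($V_{e_{0y}}$ and $V_{e_{0z}}$), and a within-individual covariance ($Cov_{e_{0y},e_{0z}}$; eqn 7b):

$$\begin{bmatrix} ind_{0yj} \\ ind_{0zj} \end{bmatrix} \sim \mathrm{MVN}(\mathbf{0}, \Omega_{ind}): \quad \Omega_{ind} = \begin{bmatrix} V_{ind_{0y}} & Cov_{ind_{0y},ind_{0z}} \\ Cov_{ind_{0y},ind_{0z}} & V_{ind_{0z}} \end{bmatrix}$$

$$\begin{bmatrix} e_{0yij} \\ e_{0zij} \end{bmatrix} \sim \mathrm{MVN}(\mathbf{0}, \Omega_{e}): \quad \Omega_{e} = \begin{bmatrix} V_{e_{0y}} & Cov_{e_{0y},e_{0z}} \\ Cov_{e_{0y},e_{0z}} & V_{e_{0z}} \end{bmatrix} \qquad \text{(eqn 7b)}$$

where the between- ($r_{ind_{0y},ind_{0z}}$) and within-individual ($r_{e_{0y},e_{0z}}$) correlations can be calculated from the between- and within-individual variances and covariances (eqns 7c,d):

$$r_{ind_{0y},ind_{0z}} = \frac{Cov_{ind_{0y},ind_{0z}}}{\sqrt{V_{ind_{0y}} V_{ind_{0z}}}} \qquad \text{(eqn 7c)}$$

$$r_{e_{0y},e_{0z}} = \frac{Cov_{e_{0y},e_{0z}}}{\sqrt{V_{e_{0y}} V_{e_{0z}}}} \qquad \text{(eqn 7d)}$$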
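To illustrate what this decomposition does, the sketch below simulates a balanced data set in which both attributes are assayed together on every occasion (scenario 3 in Table 2) and recovers the two correlations. It deliberately swaps in a moment-based (MANOVA-style) decomposition for the bivariate MM, because that can be written with the standard library alone; in practice, a bivariate MM fitted in dedicated software would be used. All function names, and the assumption that both traits share one repeatability, are hypothetical choices for illustration.

```python
import math
import random


def correlated_pair(rng, sd_y, sd_z, r):
    """Draw one pair of normal deviates with standard deviations sd_y, sd_z and correlation r."""
    a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return sd_y * a, sd_z * (r * a + math.sqrt(1.0 - r * r) * b)


def simulate_bivariate(n_ind, n_rep, repeatability, r_ind, r_e, rng):
    """y and z assayed together n_rep times per individual; both traits share one repeatability."""
    sd_between, sd_within = math.sqrt(repeatability), math.sqrt(1.0 - repeatability)
    data = []
    for _ in range(n_ind):
        ind_y, ind_z = correlated_pair(rng, sd_between, sd_between, r_ind)
        rows = []
        for _ in range(n_rep):
            e_y, e_z = correlated_pair(rng, sd_within, sd_within, r_e)
            rows.append((ind_y + e_y, ind_z + e_z))
        data.append(rows)
    return data


def decompose_correlations(data):
    """Return (between-, within-individual) correlation estimates for balanced data."""
    n_ind, n_rep = len(data), len(data[0])
    means = [(sum(y for y, _ in rows) / n_rep, sum(z for _, z in rows) / n_rep) for rows in data]
    grand_y = sum(m[0] for m in means) / n_ind
    grand_z = sum(m[1] for m in means) / n_ind
    within = [0.0, 0.0, 0.0]   # sums of squares/products: yy, zz, yz
    between = [0.0, 0.0, 0.0]
    for rows, (mean_y, mean_z) in zip(data, means):
        for y, z in rows:
            dy, dz = y - mean_y, z - mean_z
            within[0] += dy * dy
            within[1] += dz * dz
            within[2] += dy * dz
        dy, dz = mean_y - grand_y, mean_z - grand_z
        between[0] += dy * dy
        between[1] += dz * dz
        between[2] += dy * dz
    within = [s / (n_ind * (n_rep - 1)) for s in within]   # within-individual (co)variances
    between = [n_rep * s / (n_ind - 1) for s in between]   # between-individual mean squares/products
    ind = [(b - w) / n_rep for b, w in zip(between, within)]  # between-individual (co)variances
    # Note: ind[0] or ind[1] can be negative in small samples, in which case r_ind is undefined.
    r_ind = ind[2] / math.sqrt(ind[0] * ind[1])
    r_e = within[2] / math.sqrt(within[0] * within[1])
    return r_ind, r_e


if __name__ == "__main__":
    rng = random.Random(7)
    data = simulate_bivariate(n_ind=3000, n_rep=4, repeatability=0.5, r_ind=0.6, r_e=-0.3, rng=rng)
    print(decompose_correlations(data))
```

The simulated correlations have opposite signs on purpose: a single raw phenotypic correlation computed from such data would hide the fact that the association differs between the two levels.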

Testing the Influence of Fixed Effects on Correlations

The questions asked about between- and within-individual variances can also be asked about between- and within-individual correlations. In the univariate MMs section, we detailed how the inclusion of fixed effects may help explain variance between and/or within individuals (eqns 3–5). One can apply the same approaches when asking questions about sources of covariance by fitting fixed effects into the bivariate MMs described in eqn 7. For example, inclusion of genotypic information as a fixed (or random) effect would reveal the extent to which $r_{ind_{0y},ind_{0z}}$ was determined by a particular gene with pleiotropic effects, whereas inclusion of the time of day a measurement was taken could reveal whether $r_{e_{0y},e_{0z}}$ is attributable to diurnal variation.

Sampling Designs and Sample Sizes Requirements

For the analyses of correlations between phenotypic attributes, repeated observations for two (or more) phenotypic attributes are needed to estimate between- and within-individual correlations (Table 2). Beyond these general requirements, some specific sampling designs are as follows:

  1. Two phenotypic attributes, y and z, are both assayed at the same (two or more) points in time (scenario 3 in Table 2; Fig. 4). For example, lay date (y) and clutch size (z) are both measured at the onset of each breeding season and repeated measures collected for all individuals that survive across seasons. Such a design allows for the estimation of between-individual and within-individual correlations (Table 2).
  2. Two phenotypic attributes are both assayed repeatedly but never at the same time (scenario 4 in Table 2). For example, both foraging behaviour (y) and nestling provisioning effort (z) are assayed more than once for the same set of individuals, but the former is assayed once a month in winter, whereas the latter is assayed once per month in summer. Such a design also allows for the estimation of between-individual correlations but within-individual correlations are nonestimable (Table 2).

Other scenarios, where each phenotypic attribute is assayed only once, either at the same point in time or at different points in time, do not allow the decomposition of phenotypic (co)variances (scenarios 1 and 2; Table 2). Unfortunately, beyond the above-mentioned general structural requirements there is little guidance regarding optimal sample sizes, an issue we address below using simulation studies.
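The structural requirements described above can be summarised as a simple rule, sketched here for a design in which every individual follows the same sampling schedule. The function name, the dictionary layout and the reduction to two rules (repeated measures of both attributes for the between-individual covariance; at least two shared sampling occasions for the within-individual covariance) are simplifying assumptions made for illustration, and they ignore sample size entirely.

```python
def estimable_components(design):
    """design maps each attribute ('y', 'z') to the time points at which it is assayed."""
    times_y, times_z = set(design["y"]), set(design["z"])
    both_repeated = len(times_y) >= 2 and len(times_z) >= 2
    shared_occasions = len(times_y & times_z)
    return {
        "between_individual_covariance": both_repeated,
        "within_individual_covariance": both_repeated and shared_occasions >= 2,
    }


# The four scenarios of Table 2, written as sampling schedules.
scenarios = {
    1: {"y": [1], "z": [1]},
    2: {"y": [1], "z": [2]},
    3: {"y": [1, 2], "z": [1, 2]},
    4: {"y": [1, 2], "z": [3, 4]},
}
for number, design in scenarios.items():
    print(number, estimable_components(design))
```

Applied to the four schedules, the rule reproduces the pattern of Table 2: nothing can be decomposed in scenarios 1 and 2, both covariances are estimable in scenario 3, and only the between-individual covariance is estimable in scenario 4.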

For different combinations of $r_{ind_{0y},ind_{0z}}$ and $r_{e_{0y},e_{0z}}$, we asked whether the accuracy with which these parameters are estimated is a function of the number of individuals and number of samples per individual, and which of these two aspects is most important (simulation details are discussed in Texts S11–S12, Supporting information). We also discuss how the statistical power necessary to detect significant between-individual correlations is a function of sample size for a range of between-individual correlations (where repeatability = 0·5 and inline image).

When traits have a repeatability of 0·5 and between-individual correlations are ≥|0·5|, most sample sizes provide acceptable power (≥0·8) (Fig. 1b). When $r_{ind_{0y},ind_{0z}}$ is |0·5| and 25–50 individuals are sampled, power to detect a between-individual correlation is initially lower than 0·8 but rapidly increases with sample size (Fig. 1b). In contrast, sample sizes greatly affect power to detect between-individual correlations below |0·5|: a large number of individuals (≥125) should be sampled more than twice to detect values of $r_{ind_{0y},ind_{0z}}$ of 0·3 with acceptable power. When $r_{ind_{0y},ind_{0z}}$ is 0·3 and individuals are sampled only twice, sufficient power is only reached with a total of 200 individuals. Power to detect between-individual correlations of |0·1| is always extremely low (Fig. 1b) and would require total sample sizes far larger than those considered in our simulations (2000). When the total sample size is limiting, our simulations suggest that power is somewhat increased by favouring more individuals rather than more samples per individual. For example, a power of 0·8 to statistically detect values of $r_{ind_{0y},ind_{0z}}$ of 0·5 may be achieved with either 75 individuals sampled for each attribute twice (total sample size = 150) or with 50 individuals sampled for each attribute four times (total sample size = 200) (Fig. 1b).

In addition to power, the accuracy of estimates of between- and within-individual correlations might also be of concern: across all total sample sizes, within-individual correlations are estimated with greater accuracy than between-individual correlations (Fig. 1a). When both correlations are of interest, sample sizes should thus be optimized with respect to between-individual correlations.

Even with large sample sizes, multivariate MMs should be applied with caution (see references cited above), as they carry a larger number of assumptions than univariate MMs, such as an assumption of multivariate normality (e.g. Snijders & Bosker 1999). Moreover, covariance estimates derived from multivariate MMs assume that the phenotypic attributes being examined are associated in a linear manner; transformation can sometimes alleviate violations of this assumption. Nonetheless, decomposition of correlations into between- and within-individual components necessitates these approaches (Dingemanse, Dochtermann & Nakagawa 2012).

Summary

Multivariate MMs can address a wide range of questions regarding covariation – and its sources – both between and within individuals. Specifically, multivariate MMs facilitate:

  1. The decomposition of phenotypic correlations into between- and within-individual correlations (Figs 4 and 5);
  2. Assignment of covariances to fixed effects;
  3. Comparison of variance components, and repeatabilities, among data sets (Text S7, Supporting information).

Discussion

We have detailed in this paper how mixed-effect models can be applied to estimate a suite of between- and within-individual variance components (Table 1) of key importance to ecologists. Our study focused primarily on key questions of sampling designs (Table 2), sample sizes (Figs 1 and 3) and models to estimate specific variance components, while detailing how bias such as pseudo-repeatability ('Glossary') can be investigated and controlled for statistically, or how one would model how variation in ecological variables (e.g. conspecific density) affects the magnitude of variance components.

We hope that our discussion of between- and within-individual variance components (Table 1) will help facilitate the introduction of statistical definitions of biological patterns of ecological relevance. For example, the animal personality literature is full of anthropomorphic and misleading verbal terminology that hampers progress. The statistical framework reviewed here provides a means to define distinct terms statistically. For example, defining personality vs. plasticity as behavioural reaction norm intercepts and slopes respectively enables these distinct patterns of variation to be studied within a single framework (Dingemanse et al. 2010; Westneat et al. 2011) and compared across studies (Mathot et al. 2012), while statistically defining behavioural syndromes as nonzero between-individual correlations clarifies the types of study designs necessary for their study (Dingemanse, Dochtermann & Nakagawa 2012). We further hope that the application of these statistical models will enable researchers to develop biological hypotheses not previously considered in their study organisms, for example for why phenotypic correlations might vary within vs. between their individual subjects. Finally, above all we hope that this paper helps researchers to construct appropriate statistical models to test ecological hypotheses about how variance components are shaped by natural and sexual selection.

Acknowledgements

We thank Dan Nussey for stimulating us to write this article, Jon Brommer, Wolfgang Forstmeier, Denis Réale, Dave Westneat and Jon Wright, for inspiring discussions on mixed-effect modelling, Tim Coulson, Jarrod Hadfield, Martijn van de Pol, Dave Westneat, and two anonymous reviewers for constructive editorial/reviewer comments, Yimen Araya-Ajoy, Cynthia Downs, and Shinichi Nakagawa for commenting on the manuscript. Dave Westneat kindly provided the SAS-code for the section ‘Do it yourself’ (Text S17, Supporting information). N.J.D. was supported by the Max Planck Society (MPG).

Glossary

Between-individual correlation: Phenotypic correlation at the between-individual level, that is, the individual average phenotypic responses of two traits are correlated; called a behavioural syndrome in the context of behaviour (Dingemanse, Dochtermann & Nakagawa 2012). [$r_{ind_{0y},ind_{0z}}$; eqn 7c]

Behavioural reaction norm: The function describing the relationship between the behavioural phenotype and environmental gradient within the same individual (Martin & Réale 2008). [eqn 5]

Between-individual variance: The amount of phenotypic variance attributable to differences between individuals in average phenotype. [$V_{ind_0}$; Table 1]

Individual repeatability: The proportion of phenotypic variance that is attributable to differences between individuals, where phenotypic variance represents the sum of the between- and within-individual variance (Falconer & Mackay 1996).

Personality: Variation among individuals in the intercept of their behavioural reaction norm (Dingemanse et al. 2010).

Pseudo-repeatability: Biased (inflated) repeatability estimate because predictor variables that influence the phenotype within individuals vary between individuals because of a biased sampling scheme; called pseudo-personality in the context of behaviour (Westneat et al. 2011).

Variance component: A random factor explaining phenotypic variance, such as individual or territory identity.

Within-individual correlation: Phenotypic correlation at the within-individual level, that is, two phenotypic attributes show correlated changes within individuals. [$r_{e_{0y},e_{0z}}$; eqn 7d]

Within-individual variance: Amount of phenotypic variance attributable to differences in phenotype among observations of the same individual. [$V_{e_0}$; Table 1]

Within-subject centring: Expressing the observation of a covariate that varies both within and between individuals as a deviation from its mean value over all observations of the same individual (van de Pol & Wright 2009). [eqn S4]
