#### Introduction

We introduce here the notation for the simplest univariate MM, which fits a constant (β_{0}) and models differences between individuals with what are known as random intercepts (ind_{0j}), thereby decomposing phenotypic variance (*V*_{P}) into between-individual (*V*_{ind0}) and within-individual (*V*_{e0}) components (eqn 1a):

- *y*_{ij} = (β_{0} + ind_{0j}) + *e*_{0ij} (eqn 1a)

where *y*_{ij} is the phenotype of individual *j* at instance *i*. We note that while *e*_{0ij} is the ‘residual error’ representing measurement error and general environmental variance, it has biological relevance as it includes average within-individual plasticity towards any stimulus that is statistically unaccounted for (Westneat *et al*. 2011). This notation, used throughout, differs from the typical statistical notation but is one that we find both unambiguous and intuitive: variances are abbreviated as *V*, covariances as *Cov* and random effects for individuals as ‘ind’.

Equation 1a can be expanded to include additional fixed effects (β terms), like environmental covariates, and the impact of doing so on variance components is a focus of various later sections. Notably, the inclusion of fixed effects requires considerable thought (e.g. whether within-subject centring ['Glossary'] and transformations should be applied), as detailed in Text S3 (Supporting information).

#### Simple Repeatability Analysis

Univariate MMs (eqn 1) can be used to decompose the ‘raw’ phenotypic variance in a single response variable (*y*) into between- and within-individual variances (Fig. 2a). Those components are informative in their own right (Jenkins 2011) – indicating the degree to which the expression of a trait differs between individuals vs. the degree to which a single observation differs from an individual's mean – and are also used to calculate repeatability (Falconer & Mackay 1996).

Repeatability is of key importance because it provides a standardized estimate of individuality that can be compared across studies and, within quantitative genetics theory, sets an upper limit to heritability (but see, e.g. Dohm 2002 for important caveats). Repeatability (*R*) represents the proportion of phenotypic variation (*V*_{P}) attributable to differences between individuals (eqn 2):

- *R* = *V*_{ind0} / (*V*_{ind0} + *V*_{e0}) (eqn 2)

where *V*_{ind0} and *V*_{e0} are the between- and within-individual variances, respectively, and *V*_{P} = *V*_{ind0} + *V*_{e0}. Confidence intervals for repeatabilities can be calculated following Nakagawa & Schielzeth (2010; see Text S17, Supporting information for worked examples). Equation 2 assumes a Gaussian error distribution and that repeated measures were taken under the same conditions (Lynch & Walsh 1998). When the first assumption is not met, alternative estimators of repeatability are available (detailed by Nakagawa & Schielzeth 2010), though researchers should also consider whether additional fixed effects – for example ‘sex’ when modelling variation in morphology for a sexually dimorphic species – can account for non-normality (additional fixed effects do change the interpretation of repeatability, detailed below). When the second assumption is not met, repeatability will be misestimated.
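To make the repeatability calculation concrete, the sketch below simulates balanced repeated measures with random intercepts and recovers the two variance components with the classical one-way ANOVA (method-of-moments) estimator. This is a stand-in for fitting the mixed model itself and is valid only for balanced designs without fixed effects; the function names, parameter values and seed are illustrative assumptions.

```python
import random
from statistics import mean


def simulate(n_ind, n_rep, beta0, v_ind, v_e, seed=1):
    """Simulate y_ij = beta0 + ind_0j + e_0ij for a balanced design."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_ind):
        ind0 = rng.gauss(0.0, v_ind ** 0.5)  # random intercept of this individual
        data.append([beta0 + ind0 + rng.gauss(0.0, v_e ** 0.5)
                     for _ in range(n_rep)])
    return data


def anova_components(data):
    """Method-of-moments estimates of (V_ind0, V_e0); balanced data only."""
    n_ind, k = len(data), len(data[0])
    grand = mean(y for row in data for y in row)
    ind_means = [mean(row) for row in data]
    ms_among = k * sum((m - grand) ** 2 for m in ind_means) / (n_ind - 1)
    ms_within = sum((y - m) ** 2
                    for row, m in zip(data, ind_means)
                    for y in row) / (n_ind * (k - 1))
    # E[MS_among] = V_e0 + k * V_ind0; negative estimates are truncated at zero.
    v_ind = max((ms_among - ms_within) / k, 0.0)
    return v_ind, ms_within


def repeatability(v_ind, v_e):
    """Between-individual variance as a proportion of total phenotypic variance."""
    return v_ind / (v_ind + v_e)


data = simulate(n_ind=500, n_rep=4, beta0=10.0, v_ind=1.0, v_e=1.0)
v_ind_hat, v_e_hat = anova_components(data)
print(round(repeatability(v_ind_hat, v_e_hat), 2))
```

With equal simulated between- and within-individual variances, the estimate should fall close to 0.5; a mixed-model fit would be needed for unbalanced data or when fixed effects are included.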

#### Avoiding ‘Pseudo-Repeatability’

Repeatability estimated from models controlling for between-individual fixed effects (eqn 3) represents the proportion of ‘phenotypic variance not accounted for by fixed effects’ (*V*_{P} − *V*_{FIXED}) explained by differences between individuals. This conditional repeatability ('Glossary') will often be the biologically relevant parameter; in our provisioning example, for instance, the raw repeatability was inflated due to failure to observe all nests for the same period of time and the conditional estimate represented the biological repeatability.

#### Explaining Variance Components

Between- and within-individual fixed effects can be added to the basic model (eqn 4):

- *y*_{ij} = (β_{0} + ind_{0j}) + β_{1B}*x*_{1j} + β_{2W}*x*_{2ij} + *e*_{0ij} (eqn 4)

where *y*_{ij} represents the level of aggressiveness of individual *j* at instance *i*. *x*_{1} could represent maternal hormone levels in the egg from which individual *j* was born and *x*_{2} the conspecific density experienced by individual *j* at instance *i*. As *x*_{1} is a covariate that differs between but not within individuals, it represents a between-individual effect (hence _{B}). In contrast, *x*_{2} is a covariate that varies within an individual and thus represents a within-individual effect (hence _{W}). β_{1B} and β_{2W} are the coefficients relating, respectively, *x*_{1} and *x*_{2} to *y*_{ij}. Comparison of the values of the between-individual variance (*V*_{ind0}) and the within-individual variance (*V*_{e0}) between a model where these between- and within-individual fixed effects were included (eqn 4) vs. excluded (e.g. eqn 1a) provides quantitative information on the amount of variance in each variance component explained by these fixed effects (for guidelines, see Snijders & Bosker 1999).
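The comparison of variance components between models can be sketched as a proportional reduction at each level. The numbers below are hypothetical variance components, not estimates from any data set, and the helper names are assumptions made for illustration.

```python
def proportion_explained(v_without, v_with):
    """Proportional reduction in a variance component once fixed effects are added."""
    return (v_without - v_with) / v_without


def repeatability(v_ind0, v_e0):
    """Between-individual variance as a proportion of the variance not
    accounted for by fixed effects in the fitted model."""
    return v_ind0 / (v_ind0 + v_e0)


# Hypothetical variance components (illustrative values only).
without_fixed = {"v_ind0": 2.0, "v_e0": 4.0}  # fixed effects excluded
with_fixed = {"v_ind0": 1.0, "v_e0": 3.0}     # fixed effects included

explained_between = proportion_explained(without_fixed["v_ind0"], with_fixed["v_ind0"])
explained_within = proportion_explained(without_fixed["v_e0"], with_fixed["v_e0"])
raw_r = repeatability(**without_fixed)
conditional_r = repeatability(**with_fixed)

print(explained_between, explained_within)
print(round(raw_r, 3), round(conditional_r, 3))
```

A variance component can also increase after a fixed effect is added, in which case this proportion is negative and should not be read as variance explained.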

Equation 4 assumes that all individuals experienced the *same* set of conditions for any within-individual fixed effect (*x*_{2}), such that the average value of such fixed effects does not vary among individuals. If this condition is not satisfied, the within-individual fixed effect is conflated with between-individual variation, and within-subject centring ['Glossary'] methods may be needed to distinguish within- from between-individual effects (Snijders & Bosker 1999; van de Pol & Wright 2009), as detailed in Text S9 (Supporting information). Furthermore, effects of predictor variables measured with considerable error, for example environmental states like predator density, will be estimated with bias and therefore require specific modelling approaches (e.g. Schafer 1987; Bartlett, De Stavola & Frost 2009). Care is also needed to avoid spurious results caused by failure to fit nonlinear effects of covariates, and to recognize that the exact choice of covariate (e.g. population vs. local density) may generate specific results.
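Within-subject centring amounts to splitting a covariate into each individual's mean (the between-individual part) and the deviation of every observation from that mean (the within-individual part); both parts are then fitted as separate fixed effects. A minimal sketch follows, in which the function name and the dictionary input format are assumptions made for illustration.

```python
from statistics import mean


def within_subject_centre(x_by_individual):
    """Split a covariate into between- and within-individual components.

    x_by_individual maps an individual identifier to its list of covariate
    values, one value per observation.
    """
    between = {}  # individual mean of the covariate
    within = {}   # deviation of each observation from that mean
    for ind, xs in x_by_individual.items():
        m = mean(xs)
        between[ind] = m
        within[ind] = [x - m for x in xs]
    return between, within


# Hypothetical densities experienced by two individuals.
density = {"a": [1.0, 3.0], "b": [10.0, 14.0]}
between, within = within_subject_centre(density)
print(between)
print(within)
```

Individual "b" experienced higher densities on average than "a", a between-individual difference that would otherwise be confounded with the within-individual response to density.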

When one wishes to estimate how much between- or within-individual variation in one phenotypic attribute (e.g. a behavioural response) remains after controlling for variation in another (e.g. circulating hormone levels), multivariate MMs (introduced below), where both phenotypic attributes are treated as response variables (*y* and *z*), may instead be applied (see eqns S2a–d of Text S8 (Supporting information) for how such conditional estimates are calculated). Multivariate models are particularly appropriate when it is not obvious which phenotypic attribute should be considered predictor vs. response.

#### Estimating Individual Variation in Plasticity

Phenotypes have thus far been considered a function of between- and within-individual fixed effects, implying that the range of phenotypes expressed by a single individual can be characterized by a regression line with the same slope for all individuals (eqns 1, 3, and 4; Fig. 2a). Individuals were allowed to differ in the intercept of this reaction norm through a random intercept for each individual (ind_{0j}; eqn 1b), but all individuals shared the same reaction norm slope (e.g. β_{2W} in eqn 4). In other words, individuals could vary in their average phenotype but not in their phenotypic plasticity (Fig. 2a). Here, we extend MMs to include individual variation in reaction norm slopes (Fig. 2b).

Consider our previous example wherein aggressiveness of individuals was a function of maternal hormone levels (β_{1B}*x*_{1j}; eqn 4) and density (β_{2W}*x*_{2ij}; eqn 4). In doing so, we assumed that all individuals responded in the same manner to density. However, this assumption may often not hold because plasticity varies among individuals (reviewed by Nussey, Wilson & Brommer 2007; Dingemanse *et al*. 2010; Mathot *et al*. 2012). In our previous example, this would be characterized by some individuals increasing (or decreasing) aggressive behaviour to a greater degree than others for the same change in conspecific density (Fig. 2b). We can statistically model this relationship by including a within-individual fixed effect covariate (*x*_{ij}; i.e. density) in our basic model (eqn 1), while also fitting random slopes (ind_{1j}) around the population-average slope β_{1} of the dependence of *y*_{ij} on *x*_{ij}, an approach called ‘random regression’ (Henderson 1982; Meyer 1998; Schaeffer 2004) (eqn 5a):

- *y*_{ij} = (β_{0} + ind_{0j}) + (β_{1} + ind_{1j})*x*_{ij} + *e*_{0ij} (eqn 5a)
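As a hedged aside on notation, random intercepts and random slopes in a random regression are conventionally assumed to follow a bivariate normal distribution, from which the intercept-slope correlation follows. The symbols for the variance in slopes and for the intercept-slope covariance and correlation below follow the naming convention stated in the Introduction, but they are assumed here because this section does not define them.

```latex
\begin{align*}
\begin{bmatrix} \mathrm{ind}_{0j} \\ \mathrm{ind}_{1j} \end{bmatrix}
  &\sim \mathrm{MVN}\!\left(\mathbf{0},\, \Omega_{\mathrm{ind}}\right),
  &
\Omega_{\mathrm{ind}} &=
\begin{bmatrix}
  V_{\mathrm{ind}0} & \mathrm{Cov}_{\mathrm{ind}0,\mathrm{ind}1} \\
  \mathrm{Cov}_{\mathrm{ind}0,\mathrm{ind}1} & V_{\mathrm{ind}1}
\end{bmatrix},
\\[4pt]
r_{\mathrm{ind}0,\mathrm{ind}1}
  &= \frac{\mathrm{Cov}_{\mathrm{ind}0,\mathrm{ind}1}}
          {\sqrt{V_{\mathrm{ind}0}\, V_{\mathrm{ind}1}}}.
\end{align*}
```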

Continuing our hypothetical example of aggression, a negative intercept-slope correlation, as depicted in Fig. 2b, would suggest that individuals that have low average aggression scores compared with others also *increase* their aggression at a greater than average rate in response to increases in conspecific density. Intercept-slope correlations often differ from zero (see, for example, Mathot *et al*. 2012).

Importantly, the application of random regression implies that the between-individual variance is no longer constant over the (density) gradient; the estimated intercept variance now *uniquely* represents the between-individual variance at a specific point along the environmental gradient (i.e. where all covariates have the value zero). Similarly, if one were to calculate repeatability using eqn 2, the estimated value would apply solely to that specific section of the data. Specifically, repeatability can vary dramatically over the gradient when the intercept-slope correlation (*r*_{ind0,ind1}) is tight (Text S5, Supporting information), requiring the evaluation of important assumptions (Text S6, Supporting information). For example, failure to acknowledge the presence of nonlinear effects of reaction norm slopes would automatically lead to inappropriate conclusions about the presence of individual variation in plasticity.
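The dependence of between-individual variance on the covariate follows from the variance of a sum of correlated random effects: at covariate value *x*, the variance of ind_{0j} + ind_{1j}*x* is the intercept variance plus 2*x* times the intercept-slope covariance plus *x*² times the slope variance. The sketch below evaluates this along a gradient. It assumes a homogeneous residual variance, and the function names and parameter values are hypothetical.

```python
def between_individual_variance(x, v_ind0, v_ind1, cov_ind01):
    """Variance of (ind_0j + ind_1j * x) across individuals at covariate value x."""
    return v_ind0 + 2.0 * x * cov_ind01 + x ** 2 * v_ind1


def repeatability_at(x, v_ind0, v_ind1, cov_ind01, v_e0):
    """Repeatability at covariate value x, assuming constant residual variance."""
    v_ind_x = between_individual_variance(x, v_ind0, v_ind1, cov_ind01)
    return v_ind_x / (v_ind_x + v_e0)


# Hypothetical parameters with a tight negative intercept-slope correlation.
v_ind0, v_ind1, v_e0 = 1.0, 0.25, 1.0
r_ind01 = -0.8
cov_ind01 = r_ind01 * (v_ind0 * v_ind1) ** 0.5

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(repeatability_at(x, v_ind0, v_ind1, cov_ind01, v_e0), 3))
```

Only at *x* = 0 does the calculation reduce to the intercept variance over the sum of intercept and residual variance, which is why centring the covariate changes the point on the gradient to which the estimate applies.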

#### Sampling Designs and Sample Size Requirements

What type of sampling designs and sample sizes are needed to estimate specific between- and within-individual (co)variance components? Minimum design requirements for the analysis of single phenotypic attributes are provided in scenarios 3 and 4 of Table 2. Sample sizes needed for the accurate estimation of these variance components are, in contrast, less obvious.

Table 2. Four distinct sampling schemes (scenarios) and their estimable (co)variance components (defined in Table 1). We print ‘Data’ for points in time (1–4) where phenotypic data of the phenotypic attribute *y* and/or *z* have been collected for the same individual (1–3), and ‘–’ when no data have been collected.

Individual | Time | Scenario 1: *y* | Scenario 1: *z* | Scenario 2: *y* | Scenario 2: *z* | Scenario 3: *y* | Scenario 3: *z* | Scenario 4: *y* | Scenario 4: *z* |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | Data | Data | Data | – | Data | Data | Data | – |
1 | 2 | – | – | – | Data | Data | Data | Data | – |
1 | 3 | – | – | – | – | – | – | – | Data |
1 | 4 | – | – | – | – | – | – | – | Data |
2 | 1 | Data | Data | Data | – | Data | Data | Data | – |
2 | 2 | – | – | – | Data | Data | Data | Data | – |
2 | 3 | – | – | – | – | – | – | – | Data |
2 | 4 | – | – | – | – | – | – | – | Data |
3 | 1 | Data | Data | Data | – | Data | Data | Data | – |
3 | 2 | – | – | – | Data | Data | Data | Data | – |
3 | 3 | – | – | – | – | – | – | – | Data |
3 | 4 | – | – | – | – | – | – | – | Data |
… | … | … | … | … | … | … | … | … | … |

Recommendations about optimal sample sizes vary substantially, depending on (i) what one aims to optimize (accuracy or power), (ii) the variance component of interest, and (iii) constraints imposed by the study system (Snijders & Bosker 1999; Maas & Hox 2004; Scherbaum & Ferreter 2009; Hox 2010; Martin *et al*. 2011; van de Pol 2012). Unfortunately, many published recommendations are drawn from situations relevant to the social sciences and have limited relevance for ecologists, as they typically focus on cases where a few subjects (for example schools) are sampled with a great number of repeats (for example students). In contrast, ecologists often work with larger numbers of subjects (individuals) but face constraints in the number of repeated samples that can be collected. Martin *et al*. (2011) and van de Pol (2012) recently discussed sample size requirements for the estimation of intercept-slope correlations (*r*_{ind0,ind1}) and provide software for sampling optimization for several variance components. Thus, we focus here on simulations (detailed in Texts S13–S14, Supporting information) asking how sample size affects the accuracy of, and power to detect, repeatabilities of different magnitudes.

Our simulations imply that when repeatability exceeds 0·5 it can be demonstrated with acceptable statistical power (~0·8) with few (e.g. 25) individuals sampled only twice (Fig. 3b). For lower repeatability values, ≥4 samples per individual are typically required whenever the total number of individuals is low (≤100; Fig. 3b). Moreover, it will generally not be possible to detect repeatabilities of 0·1 with only two repeats per individual (Fig. 3b), as noted previously (Martin *et al*. 2011). Furthermore, accuracy of the estimated value of repeatability increases with true repeatability (Fig. 3a), suggesting that sampling considerations are particularly relevant for traits with low values of repeatability. Finally, optimal sample sizes depend greatly on the level of inaccuracy deemed acceptable. When the total sample size (number of individuals × number of repeats) is a constraining factor, inaccuracy can be decreased by increasing the number of samples per individual at the cost of the number of sampled individuals (Fig. 3b), though only when repeatability is ≤0·3.
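The logic of such an accuracy simulation can be sketched in a few lines. This is not the procedure of Texts S13–S14: it uses a one-way ANOVA estimator on balanced data rather than a fitted mixed model, fixes total phenotypic variance at 1, and reports only the mean absolute error of the estimate. All function names, designs and the seed are illustrative assumptions.

```python
import random


def simulate_dataset(rng, n_ind, n_rep, r_true):
    """Balanced data with total variance 1, so V_ind0 = r_true and V_e0 = 1 - r_true."""
    sd_ind = r_true ** 0.5
    sd_e = (1.0 - r_true) ** 0.5
    data = []
    for _ in range(n_ind):
        ind0 = rng.gauss(0.0, sd_ind)  # random intercept
        data.append([ind0 + rng.gauss(0.0, sd_e) for _ in range(n_rep)])
    return data


def estimate_repeatability(data):
    """One-way ANOVA (method-of-moments) estimate; balanced designs only."""
    n_ind, k = len(data), len(data[0])
    ind_means = [sum(row) / k for row in data]
    grand = sum(ind_means) / n_ind  # equals the grand mean when data are balanced
    ms_among = k * sum((m - grand) ** 2 for m in ind_means) / (n_ind - 1)
    ms_within = sum((y - m) ** 2
                    for row, m in zip(data, ind_means)
                    for y in row) / (n_ind * (k - 1))
    v_ind = max((ms_among - ms_within) / k, 0.0)  # truncate negative estimates
    return v_ind / (v_ind + ms_within)


def mean_abs_error(n_ind, n_rep, r_true, n_sim=200, seed=42):
    """Average absolute deviation of the estimate from the simulated repeatability."""
    rng = random.Random(seed)
    errors = [abs(estimate_repeatability(simulate_dataset(rng, n_ind, n_rep, r_true))
                  - r_true)
              for _ in range(n_sim)]
    return sum(errors) / n_sim


for n_ind, n_rep in [(25, 2), (25, 4), (100, 2), (100, 4)]:
    print(n_ind, n_rep, round(mean_abs_error(n_ind, n_rep, r_true=0.3), 3))
```

Extending the sketch to statistical power would require a significance test for the between-individual variance, for example a likelihood ratio test from a fitted mixed model, which is outside what this minimal example covers.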