Author to whom correspondence should be sent. E-mail: dohm@hawaii.edu

# Repeatability estimates do not always set an upper limit to heritability

Article first published online: 19 APR 2002

DOI: 10.1046/j.1365-2435.2002.00621.x

#### How to Cite

Dohm, M. R. (2002), Repeatability estimates do not always set an upper limit to heritability. Functional Ecology, 16: 273–280. doi: 10.1046/j.1365-2435.2002.00621.x

#### Publication History

- Issue published online: 19 APR 2002
- Received 25 April 2001; revised 17 July 2001; accepted 2 August 2001

### Keywords:

- Behaviour;
- common environmental effects;
- genotype–environment interaction;
- maternal effects;
- performance measures

### Summary

1. The concept of repeatability, the measurement of consistent individual differences, has become an increasingly important tool in evolutionary and ecological physiology. Significant repeatability facilitates the study of selection acting on natural populations, and the concept has several practical implications for identifying traits.
2. When properly defined and measured, repeatability can set the upper limit to heritability. This is potentially a very useful interpretation of the repeatability of traits measured on natural populations because estimates of heritability often cannot be obtained. Many recent reports of repeatability of individual differences have made this interpretation.
3. However, repeatability estimates may not set an upper limit to heritability if: (a) measured traits are not genetically identical, (b) common environmental effects work in opposition to direct genetic effects, (c) the temporary environments for each trait are negatively correlated, (d) significant genotype–environment interaction is present, or (e) the traits are influenced by maternal effects.
4. The quantitative genetic theory that defines the concept of repeatability is reviewed, and the implications of violating the five assumptions are discussed in the context of interpreting repeatability as an upper bound for heritability.

### Introduction

The concept of repeatability, the proportion of total variance in multiple measurements of a trait that is due to differences among individuals, is a useful tool for quantifying the extent to which an individual’s performance or behaviour remains consistent over time (Bennett 1987; Lessells & Boag 1987; Boake 1989; Arnold 1994; Hayes & Jenkins 1997). Statistically significant repeatability estimates for ecologically relevant behaviours and physiological traits over periods of hours, days and years have now been reported for many natural populations (e.g. Arnold & Bennett 1984; Garland 1985; Huey & Dunham 1987; Djawdan & Garland 1988; Boake 1989; Van Berkum *et al*. 1989; Hayes & Chappell 1990; Jayne & Bennett 1990; Martins 1991; Austin & Shaffer 1992; Arnold, Peterson & Gladstone 1995; Chappell, Bachman & Odell 1995; Clark & Moore 1995; Godin & Dugatkin 1995; Kodric-Brown & Nicoletto 1997; Watkins 1997; Dohm *et al*. 1998; Hayes, Bible & Boone 1998; Hayes & O’Connor 1999; Rhodes *et al*. 2000; Dohm *et al*. 2001). Reports of a lack of repeatability for behavioural or performance traits seem to be less common but include mating preferences in Guppies (Kodric-Brown & Nicoletto 1997), push-up displays in Sagebrush Lizards (Martins 1991), running speed in Golden Marmots (Blumstein 1992), field metabolic rate (FMR) in both the Pouched Mouse (Speakman *et al*. 1994) and the Meadow Vole (Berteaux *et al*. 1996; Berteaux & Thomas 1999), and body mass in a high-altitude population of Deer Mice (Hayes & O’Connor 1999). Findings of low repeatability led some authors to raise the important issue of the ecological relevance and evolutionary significance of traits that apparently do not show consistent individual differences.

What, if anything, should be made of a given repeatability estimate? In particular, what does low repeatability for a trait such as mating preferences or FMR tell us? For traits with high repeatability (close to one), three interpretations are generally made. First, repeatability is said to set an upper bound for the broad- and therefore narrow-sense heritability of a trait because repeatability includes genetic and environmental sources of variation, whereas heritability includes only genetic differences among individuals (Boake 1989; Falconer & Mackay 1996; Lynch & Walsh 1998). Second, significant repeatability may be an important determinant of how effective natural selection will be in changing the trait over time because of its relationship to heritability (Huey & Dunham 1987; Boake 1989). Third, high repeatability indicates that individuals tend to perform consistently, so there may be little practical reason to obtain multiple measurements (Arnold *et al*. 1995; Falconer & Mackay 1996). Conversely, low repeatability (significantly less than 1) may suggest practical problems associated with the measure (Boake 1989; Falconer & Mackay 1996). For example, low repeatability may indicate that an ecologically relevant time-frame has not been selected for the assessment of trait consistency (Arnold *et al*. 1995).

Based on the standard definition of repeatability (e.g. Falconer & Mackay 1996), some authors have made the reasonable, but incorrect, assumption that heritability cannot *ever* exceed repeatability. For example, about one-quarter of the reports cited above explicitly state that repeatability sets an upper limit to heritability. This statement is true only if certain restrictions hold and, naturally, we must distinguish between true repeatability and estimates of repeatability. I wish to emphasize that all of the restrictions on interpretations of repeatability I will describe here follow directly from Falconer’s presentation of the concepts of repeatability and heritability. However, these caveats about the interpretation of repeatability estimates are mentioned only briefly in his book (Falconer & Mackay 1996), and the important points are scattered through the text. For this reason, I believe it is appropriate to state explicitly the underlying genetic and environmental assumptions that must hold if repeatability estimates are to indicate upper bounds for heritability. My primary objective in this report, therefore, is to highlight the model assumptions that can lead to estimates of repeatability *less* than the narrow-sense heritability, even for the case of two measures of the same genetic trait.

### The standard repeatability model

Estimation of repeatability, or heritability for that matter, assumes a specific model for the partitioning of the trait’s phenotypic variance (Lynch & Walsh 1998). Falconer (Falconer & Mackay 1996) defined repeatability (ρ) as

$$\rho = \frac{V_{G} + V_{Eg}}{V_{P}} \qquad \text{(eqn 1)}$$

Thus, the repeatability, or correlation between repeated measures of a trait on the same individuals, results from individuals having different genotypes (*V*_{G}) and general environments (*V*_{Eg}). The appropriate statistical model is

$$\mathbf{P}_{ij} = \mu + \mathbf{A}_{i} + \mathbf{E}_{ij}, \qquad \text{(eqn 2)}$$

where **P**_{ij} is the *j*th record on the *i*th individual for a trait, µ is a constant, **A**_{i} is the effect on the record of the genetic plus permanent environmental effects, and **E**_{ij} is the random temporary environmental effect on the record **P**_{ij} (Becker 1984). We assume no covariance between any two elements in **A** or in **E**, or between elements in **A** and **E**. A one-way anova may be used to obtain the observed variance components, and repeatability is then calculated as the intraclass correlation coefficient (Becker 1984; Lessells & Boag 1987; Hayes & Jenkins 1997). Of course, linear regression and other methods can also be used to obtain variance components for calculation of repeatability (Van Vleck 1993; Hayes & Jenkins 1997). For two measures of the same trait, repeatability can be viewed as the proportion of an individual’s deviation from the mean in one measure that is expected to recur in another measure on the same individual. Repeatability can then be calculated as the regression coefficient of the second measure on the first. When the two measures have equal variances, the product–moment correlation between records also defines repeatability (Van Vleck 1993). Note that although the regression and correlation approaches have the same expected value, the two estimates will often not be equal (Van Vleck 1993).
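
The one-way anova route to repeatability can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in any of the cited studies: `repeatability_anova` is a hypothetical helper that takes one list of repeated measures per individual, estimates the among-individual variance component as (MS_among – MS_within)/*n*_{0}, using the coefficient *n*_{0} for unequal numbers of measures given by Lessells & Boag (1987), and returns the intraclass correlation. The data are invented for illustration.

```python
def repeatability_anova(groups):
    """Repeatability as the intraclass correlation from a one-way anova.

    groups: one list of repeated measures per individual (at least two
    individuals, and more measures in total than individuals).
    """
    a = len(groups)                                   # number of individuals
    sizes = [len(g) for g in groups]                  # measures per individual
    n_total = sum(sizes)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total

    # Among- and within-individual sums of squares and mean squares
    ss_among = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (n_total - a)

    # n0: effective number of measures per individual (Lessells & Boag 1987)
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (a - 1)

    var_among = (ms_among - ms_within) / n0           # among-individual component
    return var_among / (var_among + ms_within)


# Hypothetical data: four individuals, each measured twice
print(f"{repeatability_anova([[10, 12], [14, 13], [8, 9], [11, 11]]):.3f}")  # 0.835
```

With balanced data, *n*_{0} reduces to the number of measures per individual. The estimate can be negative when the within-individual mean square exceeds the among-individual mean square.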

Falconer stated two assumptions. Repeated measures of a trait must: (1) have equal variances and (2) be measures of the same genetic trait (i.e. a genetic correlation of one among the repeated measures). He added that, if these assumptions were not valid, the concept of repeatability was ‘somewhat vague ... without precise meaning in relation to the components of variance’ (Falconer & Mackay 1996, p. 138). In practice, we may never know whether our estimates conform entirely to these assumptions even if more complicated experimental designs are used, but a variety of approaches can be employed, often with interesting results (e.g. see discussion in Boake 1989; Arnold *et al*. 1995; Hayes & Jenkins 1997; Watkins 1997; Burness *et al*. 2000; Hoffmann 2000).

The first of the two assumptions listed by Falconer is not directly related to the present topic. The requirement for equal variances applied to estimates of repeatability follows from basic anova theory (e.g. Lindman 1992). In practice, however, bias in tests of a repeatability estimate caused by unequal variances is probably not a chief concern, provided that the sample size within groups is large relative to the number of groups and/or that the variances are not substantially different (Lindman 1992). Repeatability studies conducted over relatively short periods of time often conform to this pattern because such studies involve at most a few measurements of the trait on dozens of the same individuals. Repeatability measured over long periods of time may be more problematic because of missing data.

### A general repeatability model

Now consider the second of Falconer’s two assumptions: whether repeated measures are estimates of the same genetic trait. For repeatability to set an upper bound to heritability, a particular model for how individual differences are manifested must apply. The model described by Falconer is probably the most often cited for repeatability, but there are other equally valid ways to model the partitioning of within- *vs* between-individual differences (e.g. Turner & Young 1969; Van Vleck 1993; Hayes & Jenkins 1997). The important point for this discussion is that if the model for the decomposition of the trait’s phenotypic variance is incomplete (e.g. an important component of variance has been omitted from the model), then estimates of repeatability cannot be taken to indicate upper bounds to heritability. The models I present below treat repeated measures as different traits and follow general models presented in Wright (1969), although I do not include all possible paths (Wright 1969; Lynch & Walsh 1998). For clarity, I assume traits are measured without error, although measurement error could be incorporated into the models (see Lynch & Walsh 1998; Hoffmann 2000). Where appropriate, I provide examples, mostly hypothetical, to clarify the implications of specific paths or equations.

For a single trait measured on individuals from a population (Fig. 1), the simplest partitioning of total phenotypic variance (**P**) yields two causal components of variance, genetic (**G**) and environmental (**E**), where *h* and *e* are path coefficients representing the direct effects of genes and environment, respectively, on trait variation. Note that **G** refers only to the additive genetic variance, and **E** contains genetic (e.g. dominance) and environmental components of variance (Lynch & Walsh 1998). For two measures of the same trait (Fig. 2), let **P**_{1} and **P**_{2} represent the first and second measures of the trait, **G**_{1} and **G**_{2} the genotypes, **E**_{1} and **E**_{2} the contributions of temporary environmental effects, and **E**c_{1} and **E**c_{2} the contributions from the common environment component of variance. The path coefficients *h*_{1}, *h*_{2}, *e*_{1}, *e*_{2}, *c*_{1} and *c*_{2} account for the direct effects of genes, temporary environment and common environment on phenotypic variance for the measurements. Extensions to more than two measures involve adding appropriate paths between **G**, **E**c, **E** and each **P**. From the rules of path analysis (Lynch & Walsh 1998), the structural equation for the correlation (ρ) between the first and second measures is

$$\rho_{\mathbf{P}_1\mathbf{P}_2} = h_1 r_G h_2 + c_1 r_C c_2. \qquad \text{(eqn 3)}$$

If we assume that *h*_{1} = *h*_{2}, *r*_{G} = +1, *r*_{C} = +1 and *c*_{1} = *c*_{2}, then equation 3 simplifies to

$$\rho_{\mathbf{P}_1\mathbf{P}_2} = h^2 + c^2, \qquad \text{(eqn 4)}$$

where equation 4 is the general equation for repeatability. The two environmental effects, **E**c and **E**, are best distinguished with some examples. The general component, **E**c, represents environmental influences that affect an individual’s performance permanently. For example, aerobic training significantly improves stamina in humans and other mammals, whereas dietary deficiencies can have long-lasting effects on an individual’s performance (Astrand & Rodahl 1986). Maternal effects (e.g. milk quality, parental care) can have short- or long-term influences on offspring phenotypes (Iverson *et al*. 1993; Arnold *et al*. 1995; Margulis & Altmann 1997). In contrast, **E**, the unique environmental component, represents temporary differences of environment affecting successive performances (Falconer & Mackay 1996). For example, recent feeding reduces burst speed performance in garter snakes (Garland & Arnold 1983).
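
The path-tracing rule behind equation 3 can be checked by simulation. The sketch below is an illustration under stated assumptions, not the author’s procedure: it builds standardized phenotypes from independent, unit-variance genetic (**G**), common-environment (**E**c) and temporary-environment (**E**) effects, uses hypothetical path coefficients, leaves the temporary environments uncorrelated, and compares the sample correlation between the two measures with the value predicted by equation 3. All function names are hypothetical helpers.

```python
import math
import random

def rho_eqn3(h1, h2, r_g, c1, c2, r_c):
    """Correlation between two measures predicted by path tracing (eqn 3)."""
    return h1 * r_g * h2 + c1 * r_c * c2

def correlated_normals(rng, r):
    """Return two standard normal deviates with correlation r."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = r * z1 + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
    return z1, z2

def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_rho(h1, h2, r_g, c1, c2, r_c, n=100_000, seed=1):
    """Simulate P = h*G + c*Ec + e*E for two measures on n individuals.

    e is chosen so that Var(P) = 1 (requires h**2 + c**2 <= 1); the
    temporary environments E1 and E2 are drawn independently.
    """
    rng = random.Random(seed)
    e1 = math.sqrt(1.0 - h1 ** 2 - c1 ** 2)
    e2 = math.sqrt(1.0 - h2 ** 2 - c2 ** 2)
    p1, p2 = [], []
    for _ in range(n):
        g1, g2 = correlated_normals(rng, r_g)   # direct genetic values
        k1, k2 = correlated_normals(rng, r_c)   # common (permanent) environments
        p1.append(h1 * g1 + c1 * k1 + e1 * rng.gauss(0.0, 1.0))
        p2.append(h2 * g2 + c2 * k2 + e2 * rng.gauss(0.0, 1.0))
    return pearson(p1, p2)

if __name__ == "__main__":
    h, c = math.sqrt(0.5), math.sqrt(0.2)   # hypothetical: h^2 = 0.5, c^2 = 0.2
    print("eqn 3/4 prediction:", rho_eqn3(h, h, 1.0, c, c, 1.0))   # h^2 + c^2
    print("simulated         :", simulate_rho(h, h, 1.0, c, c, 1.0))
```

Because the simulated phenotypes are standardized, the covariance between measures equals their correlation, which is why the path products in equation 3 can be read directly as contributions to repeatability.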

### Can repeatability ever underestimate heritability?

From Fig. 2, we see five conditions under which the true narrow-sense heritability (*h*^{2}) may be greater than the absolute value of repeatability (*h*^{2} > |ρ**P**_{1}**P**_{2}|): (1) the traits are not genetically identical, (2) common environmental effects oppose the genetic effects, (3) the temporary environments are negatively correlated, (4) genotype–environment interaction is present, and (5) the traits are influenced by maternal or paternal effects.

#### Traits are not genetically identical

Repeated measures may reflect genetically different traits. This can be accounted for in the general model by allowing *r*_{G} < +1 or by allowing the *h* path coefficients to differ (e.g. through random assortment in meiosis; see Wright 1969 for details). If *r*_{G} = +1 but *h*_{1} ≠ *h*_{2}, repeatability is given by

$$\rho_{\mathbf{P}_1\mathbf{P}_2} = h_1 h_2 + c^2 \qquad \text{(eqn 5)}$$

For clarity, assume that the resemblance between repeated measures is due solely to genetic causes (*c*^{2} = 0). If 0 < *h*_{2} < *h*_{1}, then *h*_{1}^{2} > |ρ**P**_{1}**P**_{2}|. If *r*_{G} < +1 but *h*_{1} = *h*_{2}, the equation for repeatability becomes

$$\rho_{\mathbf{P}_1\mathbf{P}_2} = h^2 r_G + c^2. \qquad \text{(eqn 6)}$$

Again, let *c*^{2} = 0; if –1 < *r*_{G} < +1, then *h*^{2} > |ρ**P**_{1}**P**_{2}|. Therefore, and not surprisingly, estimates of repeatability between genetically different traits can be less than the heritability of one of the traits.
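
A short worked example, using hypothetical path coefficients chosen only for illustration and setting *c*^{2} = 0, shows the size of the shortfall under equations 5 and 6:

$$
\begin{aligned}
\text{eqn 5:}\quad & h_1 = 0.8,\; h_2 = 0.5 && \Rightarrow\; \rho_{\mathbf{P}_1\mathbf{P}_2} = (0.8)(0.5) = 0.40 < h_1^2 = 0.64,\\
\text{eqn 6:}\quad & h^2 = 0.5,\; r_G = 0.6 && \Rightarrow\; \rho_{\mathbf{P}_1\mathbf{P}_2} = (0.5)(0.6) = 0.30 < h^2 = 0.50.
\end{aligned}
$$

In both cases a trait with substantial heritability would appear only moderately repeatable.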

#### Common environmental effects

Repeatability may underestimate heritability if the common environment has opposite effects on the two measures. Common environmental effects are generally thought to increase resemblance between relatives, and it does not seem likely that the common environment would have a positive effect on the first record but a negative effect on the repeat performance (Falconer & Mackay 1996). However, there are situations in which a shared environment can reduce resemblance within families. In natural populations, a negative common environmental effect may result if offspring do not move away from the parental territory, or must establish themselves at the periphery of the parent’s suitable habitat. In the laboratory, mice are often housed in groups of two or more per cage. If removing and then reintroducing individuals for testing disrupts the social milieu of the cage, a cage effect (common environment) may be induced. To the extent that social interactions are vigorous in some cages but not others, it is conceivable that the effects on performance might be opposite in sign, reducing repeatability. Thus, if *c*_{1} and *c*_{2} differ in sign (–*c*_{1} and +*c*_{2}), or if –1 ≤ *r*_{C} < 0, then repeatability can be less than heritability even if the separate records are the same genetic trait (*h*_{1} = *h*_{2}, *r*_{G} = +1). The equation for repeatability is now

$$\rho_{\mathbf{P}_1\mathbf{P}_2} = h^2 - c_1 r_C c_2. \qquad \text{(eqn 7)}$$

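Here the subtracted term *c*_{1}*r*_{C}*c*_{2} stands for the magnitude of the opposing common-environment contribution. If this term is appreciable but smaller than 2*h*^{2} (so that ρ**P**_{1}**P**_{2} does not fall below –*h*^{2}), then *h*^{2} > |ρ**P**_{1}**P**_{2}|.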
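
For instance, with hypothetical values *h*^{2} = 0·4 and *c*_{1}*r*_{C}*c*_{2} = 0·3 in equation 7:

$$\rho_{\mathbf{P}_1\mathbf{P}_2} = 0.4 - 0.3 = 0.1 < h^2 = 0.4,$$

so a moderately heritable trait would appear only weakly repeatable.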

#### Temporary environments

Repeatability may underestimate heritability if the temporary environments associated with each measure are negatively correlated (*r*_{E}, Fig. 2). For example, odour cues are an important component of a mouse’s response to novel environments (e.g. a treadmill belt). If these cues are not removed, or at least standardized across trials, mice may respond to the prior presence of individuals, which may alter their performance. Learning or acclimation to the apparatus over repeated measures is probably unavoidable. One factor under the control of the observer is the time at which an individual is measured on each occasion. However, the choice is not straightforward. Which is better: to measure individuals at the same time of day, perhaps inducing an order effect, or to randomize times, thereby potentially increasing the environmental differences between repeated measures? Repeatability, assuming genetically identical traits, must now include terms for the contributions of the unique environments,

$$\rho_{\mathbf{P}_1\mathbf{P}_2} = t - e_1 r_E e_2, \qquad \text{(eqn 8)}$$

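where *t* = *h*^{2} + *c*^{2} and the subtracted term *e*_{1}*r*_{E}*e*_{2} denotes the magnitude of the contribution from negatively correlated temporary environments. Then *h*^{2} > |ρ**P**_{1}**P**_{2}| whenever *c*^{2} < *e*_{1}*r*_{E}*e*_{2} < 2*h*^{2} + *c*^{2}; a term larger than *t* makes the repeatability negative.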
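
The effect of negatively correlated temporary environments can be explored numerically. The sketch below uses a hypothetical helper, `rho_temporary`, that writes the path term with the signed correlation *r*_{E}, so a negative *r*_{E} subtracts from *t* = *h*^{2} + *c*^{2} just as the subtracted term in equation 8 does. It assumes standardized phenotypes with equal temporary-environment paths, so that *e*_{1}*e*_{2} = 1 – *t*; the parameter values are invented for illustration.

```python
def rho_temporary(h2, c2, r_e):
    """Repeatability of two measures of one genetic trait (cf. eqn 8).

    h2, c2 : heritability and common-environment variance (squared paths)
    r_e    : signed correlation between the two temporary environments
    Phenotypes are assumed standardized with e1 = e2 = sqrt(1 - h2 - c2),
    so the product e1 * e2 equals 1 - h2 - c2.
    """
    t = h2 + c2
    e1_e2 = 1.0 - t
    return t + e1_e2 * r_e   # a negative r_e pulls the correlation below t


if __name__ == "__main__":
    h2, c2 = 0.4, 0.1        # hypothetical values
    for r_e in (0.0, -0.3, -0.6, -1.0):
        rho = rho_temporary(h2, c2, r_e)
        print(f"r_E = {r_e:+.1f}: rho = {rho:.2f}, h^2 > |rho|: {h2 > abs(rho)}")
```

With these values, repeatability falls from 0·5 to 0·0 as *r*_{E} goes from 0 to –1, passing below *h*^{2} = 0·4 along the way.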

#### Interaction between genotype and environment

Repeatability may underestimate heritability if there is significant interaction between genotype and environment. To the extent that environmental differences affect genotypes in a non-random way, an interaction is present. Genotype–environment interactions can be attributed to differences in the sensitivity of genotypes (individuals), or a specific difference of environment may have a greater or lesser effect on a particular genotype (Falconer & Mackay 1996). To accommodate genotype–environment interaction, the general path model can be altered by treating the problem as a correlation between two traits (*r*_{GxE}, Fig. 3). Thus, **P**_{1} and **P**_{2} are the same (genetic) traits if and only if *h*_{1} = *h*_{2}, *r*_{G} = +1, and *r*_{GxE} = 0. Note that this is also a general model for genotype–environment interaction or phenotypic plasticity (i.e. different sets of genes are expressed in the two environments; reviewed in Via 1994).

What are the effects of a model for genotype–environment interaction on repeatability and heritability? From the path model shown in Fig. 3, the relevant structural equation describing the repeatability now includes paths between genetic and environmental components:

$$\rho_{\mathbf{P}_1\mathbf{P}_2} = h_1 r_G h_2 + e_{c1} r_{Ec} e_{c2} + e_1 r_{G_2E_1} h_2 + e_2 r_{G_1E_2} h_1. \qquad \text{(eqn 9)}$$

With a few assumptions (*h*_{1} = *h*_{2} = *h*, *r*_{G} = +1, *e*_{c1} = *e*_{c2}, *r*_{Ec} = 1, *e*_{1} = *e*_{2} = *e*, *r*_{G2E1} = *r*_{G1E2}), we can simplify equation 9, yielding

$$\rho_{\mathbf{P}_1\mathbf{P}_2} = t + 2\,h\,r_{GE}\,e, \qquad \text{(eqn 10)}$$

where *t* = *h*_{1}*r*_{G}*h*_{2} + *e*_{c1}*r*_{Ec}*e*_{c2} and *r*_{GE} = *r*_{G2E1} = *r*_{G1E2}. Thus, the model describes repeatability between genetically identical traits. Because *t* = *h*^{2} + *e*_{c}^{2} under these assumptions, repeatability will be less than the heritability for the trait if *e*_{c}^{2} + 2*hr*_{GE}*e* < 0, that is, if the genotype–environment term is negative and larger in magnitude than the common-environment term. For example, if *r*_{GE} = –1, repeatability will underestimate *h*^{2} whenever 2*he* > *e*_{c}^{2}. This holds even for cases in which substantial heritability and a large, positive genetic correlation between the two measures are present.
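
To make the simplification explicit, the following sketch substitutes the stated assumptions into equation 9, taking the two temporary-environment path coefficients to be equal (*e*_{1} = *e*_{2} = *e*) and writing *e*_{c} for the common-environment path:

$$
\begin{aligned}
\rho_{\mathbf{P}_1\mathbf{P}_2} &= h\,(1)\,h + e_c\,(1)\,e_c + e\,r_{GE}\,h + e\,r_{GE}\,h \\
&= h^2 + e_c^2 + 2\,h\,r_{GE}\,e, \\
\rho_{\mathbf{P}_1\mathbf{P}_2} < h^2 \;&\Longleftrightarrow\; e_c^2 + 2\,h\,r_{GE}\,e < 0 .
\end{aligned}
$$

With hypothetical values *h*^{2} = 0·5, *e*_{c}^{2} = 0·1, *e* = 0·5 and *r*_{GE} = –0·3, for example, ρ ≈ 0·5 + 0·1 – 2(0·707)(0·3)(0·5) ≈ 0·39, which is below *h*^{2} = 0·5.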

#### Maternal effects

Lastly, maternal or paternal effects can also lead to unexpected relationships between repeatability and heritability. Models for maternal effects include links between the offspring’s phenotype and both the maternal genes and the maternal environment provided to the developing offspring (e.g. Kirkpatrick & Lande 1989; Cheverud & Moore 1994). Maternal effects are a special case of common environmental effects (Falconer & Mackay 1996). A path model including direct maternal effects is illustrated in Fig. 4, where the subscript m indicates a maternal characteristic; **P**_{1}, **P**_{2}, *r*_{E}, *r*_{G}, **E**, *h* and *e* are defined as before; and *r*_{GGm} is the correlation between direct genetic and direct maternal genetic effects. For simplicity, I assume that maternal effects describe all common environments. Note that several potential paths were omitted from Fig. 4, including those between genetic maternal effects for one trait and direct genetic effects for the second trait (e.g. *r*_{Gm1G2}, *r*_{Gm2G1}), between maternal and unique environments (e.g. *r*_{Em1E1}, *r*_{Em2E2}), and correlations between genes and environments.

With maternal effects, the structural model for the repeatability between **P**_{1} and **P**_{2} is now

$$\rho_{\mathbf{P}_1\mathbf{P}_2} = h_1 r_G h_2 + e_{m1} r_{Em} e_{m2} + h_{m1} r_{Gm} h_{m2} + h_{m1} r_{G_{m1}G_2} h_2 + h_{m2} r_{G_{m2}G_1} h_1. \qquad \text{(eqn 11)}$$

Given the number of terms in equation 11, it is not surprising that many combinations of signs of path coefficients and correlations can potentially lead to *h*^{2} > |ρ**P**_{1}**P**_{2}|. To simplify equation 11, let *h*_{1} = *h*_{2}, *r*_{G} = +1, *e*_{m1} = *e*_{m2}, *h*_{m1} = *h*_{m2}. Again, the model describes repeatability between genetically identical traits. Repeatability under a model with maternal effects will underestimate narrow-sense heritability if one or more of the following correlations are large and negative: (1) *r*_{Em}, maternal environments; (2) *r*_{Gm}, maternal genetic effects; or (3) *r*_{GmG}, direct genetic and maternal genetic effects.
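
Equation 11 is easy to explore numerically. The sketch below uses two hypothetical helpers with illustrative values not drawn from any study: `rho_eqn11` evaluates all five path terms, and `rho_eqn11_simplified` applies the equalities *h*_{1} = *h*_{2}, *r*_{G} = +1, *e*_{m1} = *e*_{m2}, *h*_{m1} = *h*_{m2}, plus the added assumption *r*_{Gm1G2} = *r*_{Gm2G1} = *r*_{GmG}. The example shows a negative direct–maternal genetic correlation pulling repeatability below *h*^{2}.

```python
import math

def rho_eqn11(h1, h2, r_g, em1, em2, r_em, hm1, hm2, r_gm, r_gm1g2, r_gm2g1):
    """Repeatability under the maternal-effects path model (eqn 11)."""
    return (h1 * r_g * h2            # direct genetic path
            + em1 * r_em * em2       # maternal environment path
            + hm1 * r_gm * hm2       # maternal genetic path
            + hm1 * r_gm1g2 * h2     # maternal genes (measure 1) with direct genes (measure 2)
            + hm2 * r_gm2g1 * h1)    # maternal genes (measure 2) with direct genes (measure 1)

def rho_eqn11_simplified(h, em, hm, r_em, r_gm, r_gmg):
    """Eqn 11 with equal paths for both measures, r_G = +1, and
    r_Gm1G2 = r_Gm2G1 = r_gmg (an added assumption)."""
    return h * h + em * em * r_em + hm * hm * r_gm + 2.0 * h * hm * r_gmg


if __name__ == "__main__":
    # Hypothetical: h^2 = 0.4, maternal environment and maternal genetic
    # variances of 0.1 each, negative direct-maternal genetic correlation.
    h, em, hm = math.sqrt(0.4), math.sqrt(0.1), math.sqrt(0.1)
    full = rho_eqn11(h, h, 1.0, em, em, 0.5, hm, hm, 1.0, -0.5, -0.5)
    short = rho_eqn11_simplified(h, em, hm, r_em=0.5, r_gm=1.0, r_gmg=-0.5)
    print(f"h^2 = {h * h:.2f}, repeatability = {full:.2f} (simplified: {short:.2f})")
```

Setting `r_gmg` to zero in the same example raises the repeatability above *h*^{2}, which illustrates how strongly the sign of this single correlation governs the comparison.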

### Interpreting repeatability estimates

These models and examples may seem trivial or lacking in biological relevance; obviously one should not talk about the relationship between heritability and repeatability for different traits, nor should we expect a simple relationship to hold if the environmental context under which two traits are expressed is complicated. Heritability estimates ‘refer to a particular population under particular conditions’ (Falconer & Mackay 1996, p. 161). The same concern applies to estimates of repeatability. On the other hand, a tantalizing application of the repeatability concept is that it can provide a means to quantify whether multiple measures represent the same or different traits, or whether substantial, genetically based differences among individuals are likely to be present, prior to conducting a full quantitative genetic analysis. For example, Watkins (1997) applied the concept of repeatability to maximum burst swimming speed of anuran tadpoles across metamorphic stages. No repeatability was found; these results were tentatively interpreted as evidence that there was no genetic correlation between the same trait measured at different life stages, and possible environmental differences between stages were discussed (Watkins 1997). Small *h*^{2} and correspondingly large environmental variance are sufficient to reduce the repeatability among successive measurements, but low repeatability may also result from a number of other genetic and environmental causes of variation among individuals (Figs 2–4).

When can repeatability set an upper limit to heritability? The answer will depend on the relative complexity of the genetic and environmental contributions to trait (co)variation. In practice, when an estimate is not statistically significantly different from 1, and the test has suitable power, repeatability may inform us about the upper bound for heritability. For example, in laboratory house mice, multiple measures of body mass taken on adults over short time periods typically have high repeatabilities (Falconer & Mackay 1996); these repeated measures certainly represent the same genetic trait, and the additional genetic and environmental assumptions of the simple repeatability model probably hold. Body mass measured on adult mice before and after a period of fasting, however, probably should be treated as two different traits. For example, in 337 genetically variable male and female laboratory house mice, the repeatability between body mass under *ad libitum* feeding and body mass measured after 24 h of fasting was 0·28 (significantly different from 1 and 0 at α = 0·05; M. R. Dohm, T. Garland Jr. & J. P. Hayes, unpublished results). Interestingly, the broad-sense heritabilities (*h*^{2}_{B}) were greater than the repeatability estimate (*h*^{2}_{B} = 0·92 for *ad libitum* body mass, *h*^{2}_{B} = 0·69 for fasting body mass, *r*_{G} = +0·94, all significantly different from 0) (Dohm 1994). (Note that only the heritability for *ad libitum* body mass was statistically different from 0·28 at α = 0·05.) A second, more ecologically relevant example was reported by Hayes & O’Connor (1999) in their study of natural selection on aerobic capacity in high-altitude Deer Mice. Repeatability of body mass differences among individuals was surprisingly low over about 2 months (product–moment correlation = 0·29, *P* = 0·097). At least for laboratory populations of House Mice (e.g. Falconer & Mackay 1996) and Deer Mice (e.g. Losvold 1986), the heritabilities of various measures of body size are often greater than the field repeatability estimate for body mass reported by Hayes & O’Connor (1999). Presumably, the low field repeatability can be explained by elevated environmental noise in the field compared with the laboratory setting, but as indicated by the path diagrams (Figs 2–4), additional sources of variation may also be important.

### Some modest suggestions for repeatability experiments

Certainly, no researcher would knowingly ignore environmental factors that had opposite effects on repeated measures of performance, whether a general factor with a lasting impact on performance or a unique factor specific to each measurement. Nor would a researcher knowingly apply repeatability to different traits, although a statistically significant ‘repeatability’ between different traits could be viewed as a rough test for genetic correlation (e.g. Watkins 1997). However, even after diligent effort by the researcher to control the experimental context in which the measures are conducted, there is no absolute guarantee that all relevant factors have been accounted for. This caveat is especially true when attempts are made to estimate field repeatability, the repeatability of two or more measures of a trait in a natural population. A genotype–environment interaction may arise in the field context simply because recently caught individuals are very likely to respond differently to the stresses of capture and captivity (Baker, Gemmell & Gemmell 1998).

In particular, repeatability estimates may be problematic for highly plastic or strongly context-dependent traits, at least in the sense of providing a bound on heritability from a relatively simple experimental design rather than a full-blown quantitative genetic study (cf. discussion in Boake 1989). To cite one possible example, low repeatability estimates (intraclass correlation < 0·3) for field metabolic rate (FMR) may have resulted from a failure of individuals to maintain energy balance or from differences in activity budgets during the period of study, as suggested by the authors (Speakman *et al*. 1994; Berteaux *et al*. 1996). The first explanation may represent a violation of the assumption that the unique environmental effects for successive measures are independent (equation 8, Fig. 2), whereas the second may represent a genotype-by-environment interaction, i.e. captivity may affect individuals differently (see Fynn *et al*. 2001 for an example of significant FMR repeatability).

When practical, repeatability studies on plastic traits may be improved if experimental designs can be used to rule out violations of one or more of the assumptions. For repeatability of FMR, for example, one might assess the consistency of individual differences in metabolic rate over spans of several days in captive mice subjected to different feeding or temperature regimens. Perhaps individuals tend to have high FMR under one treatment but low FMR under a second treatment, suggesting the presence of genotype-by-environment interaction. Are the correlations within treatment groups uniformly low, or do they differ among treatments? If the correlations are not similar, then additional causes of variation may apply and the assumptions of the standard model for repeatability may be violated. Structural equation modelling, including path analysis, provides a fruitful framework for designing and analysing more complex repeatability studies (Hayes & Shonkwiler 1996).
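
One way to ask whether correlations differ among treatment groups is the standard test of homogeneity of independent correlation coefficients based on Fisher’s *z* transformation. The sketch below is a hedged illustration, not a method from the studies cited: it assumes each treatment group supplies a product–moment correlation between two measures taken on *n* different individuals (intraclass correlations would need a different transformation), the helper names are hypothetical, and the input values are invented.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (integer df >= 1).

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) for the
    regularized upper incomplete gamma function Q, with y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                 # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))      # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def correlation_heterogeneity(rs, ns):
    """Fisher z test that k independent correlations share one value.

    Returns (Q, df, P, pooled_r); under H0, Q is approximately
    chi-square with k - 1 degrees of freedom.
    """
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]                     # 1 / var(z) for each group
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    df = len(rs) - 1
    return q, df, chi2_sf(q, df), math.tanh(z_bar)


if __name__ == "__main__":
    # Hypothetical within-treatment correlations between two FMR measures
    q, df, p, r_pooled = correlation_heterogeneity([0.7, 0.1], [40, 40])
    print(f"Q = {q:.2f} on {df} df, P = {p:.4f}, pooled r = {r_pooled:.2f}")
```

A small *P* suggests that the treatments do not share a common repeatability, which is the pattern that would point towards genotype-by-environment interaction in the design described above.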

### Conclusions

The concept of repeatability certainly has an important place in ecological and evolutionary studies of individual variation (Bennett 1987; Clutton-Brock 1988; Boake 1989; Arnold 1994; Hayes & Jenkins 1997). However, if an incorrect genetic and environmental model is employed (i.e. if any of the five cases listed above applies), then repeatability estimates may be virtually meaningless as a guide to the true heritability. Without simultaneous estimation of heritability, it is difficult to judge whether reported low repeatability estimates have been incorrectly taken as upper bounds to heritability. Among the few natural-population studies in which both repeatability and heritability of performance have been estimated (e.g. Arnold & Bennett 1984; Tsuji *et al*. 1989), I know of no example in which a statistically significant heritability estimate exceeds the repeatability estimate. On the other hand, we know virtually nothing about maternal effects or genotype–environment interaction for performance measures, and studies to elucidate these pathways will be an important area for evolutionary and ecological physiologists in the near future (e.g. see Sinervo & Huey 1990; Arnold *et al*. 1995; Burness *et al*. 2000).

I do not wish to imply that small repeatability estimates are meaningless. A statistically significant repeatability estimate provides a testable hypothesis about trait heritability. The concept of repeatability is also important for defining traits and for choosing appropriate statistical models for analysis of trait variation (Boake 1989; Arnold *et al*. 1995; Hayes & Jenkins 1997; Hoffmann 2000). Specifically, repeatability estimates less than one may point to additional tests based on the models in Figs 2–4. For the present, however, if the complications illustrated in the path models cannot be ruled out, then we must conclude that it is inappropriate to assume that the simple genetic model of phenotypic covariation is correct, and therefore repeatability cannot be taken as an upper bound to heritability.
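
One conventional way to check whether a repeatability estimate is statistically significant is to test for among-individual differences with a one-way ANOVA F ratio (Lessells & Boag 1987). The sketch below uses a permutation version of that test so that it needs only the Python standard library: observations are shuffled among individuals, which destroys individual identity while keeping the design, and the p-value is the proportion of shuffles whose F is at least the observed value. The data are hypothetical and equal numbers of measures per individual are assumed. A significant result supports consistent individual differences but, as argued above, does not by itself bound heritability.

```python
import random


def f_ratio(groups):
    """One-way ANOVA F = MS_among / MS_within (equal measures per individual)."""
    n, k = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (n * k)
    means = [sum(g) / k for g in groups]
    ms_among = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    if ms_within == 0.0:
        return float("inf")  # no within-individual variation at all
    return ms_among / ms_within


def permutation_p_value(groups, n_perm=999, seed=3):
    """Permutation test of H0: no consistent differences among individuals."""
    rng = random.Random(seed)
    k = len(groups[0])
    pooled = [x for g in groups for x in g]
    observed = f_ratio(groups)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled = [pooled[i:i + k] for i in range(0, len(pooled), k)]
        if f_ratio(shuffled) >= observed:
            exceed += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: six individuals, each measured twice.
speeds = [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2], [5.0, 4.8], [6.0, 6.1]]
print(f"F = {f_ratio(speeds):.1f}, permutation p = {permutation_p_value(speeds):.3f}")
```

A permutation p-value is used here only to avoid a dependency on a statistics library; with one available, the F ratio would normally be compared with the F distribution on (n − 1, n(k − 1)) degrees of freedom.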

### Acknowledgements

I thank M. Dentine and J. Hayes for helpful discussions on the concept of repeatability. J. Hayes, K. Lessells, D. Roff and an anonymous reviewer made insightful suggestions to improve earlier drafts of the manuscript.

### References

- (1994) Multivariate inheritance and evolution: a review of concepts. Quantitative Genetic Studies of Behavioral Evolution (ed. C. R. B. Boake), pp. 17–48. University of Chicago Press, Chicago.
- (1984) Behavioural variation in natural populations. III: Antipredator displays in the garter snake *Thamnophis radix*. Animal Behaviour 16, 1108–1118.
- (1995) Behavioural variation in natural populations. VII. Maternal body temperature does not affect juvenile thermoregulation in a garter snake. Animal Behaviour 50, 623–633.
- (1986) Textbook of Work Physiology: Physiological Bases of Exercise, 3rd edn. McGraw-Hill, New York.
- (1992) Short-, medium-, and long-term repeatability of locomotor performance in the Tiger Salamander *Ambystoma californiense*. Functional Ecology 6, 145–153.
- (1998) Physiological changes in brushtail possums, *Trichosurus vulpecula*, transferred from the wild to captivity. Journal of Experimental Zoology 280, 203–212.
- (1984) Manual of Quantitative Genetics, 4th edn. Academic Enterprises, Pullman, WA.
- (1987) Interindividual variability. New Directions in Ecological Physiology (eds M. E. Feder, A. F. Bennett, W. W. Burggren & R. B. Huey), pp. 147–169. Cambridge University Press, New York.
- (1999) Seasonal and interindividual variation in field water metabolism of female meadow voles *Microtus pennsylvanicus*. Physiological and Biochemical Zoology 72, 545–554.
- (1996) Repeatability of daily field metabolic rate in female Meadow Voles (*Microtus pennsylvanicus*). Functional Ecology 10, 751–759.
- (1992) Multivariate analysis of golden marmot maximum running speed: a new method to study MRS in the field. Ecology 73, 1757–1767.
- (1989) Repeatability: its role in evolutionary studies of mating behavior. Evolutionary Ecology 3, 173–182.
- (2000) Effect of brood size manipulation on offspring physiology: an experiment with Passerine birds. Journal of Experimental Biology 203, 3513–3520.
- (1995) Repeatability of maximal aerobic performance in Belding’s Ground Squirrels, *Spermophilus beldingi*. Functional Ecology 9, 498–504.
- (1994) Quantitative genetics and the role of the environment provided by relatives in behavioural evolution. Quantitative Genetic Studies of Behavioral Evolution (ed. C. R. B. Boake), pp. 67–100. University of Chicago Press, Chicago.
- (1995) Variation and repeatability of male agonistic hiss characteristics and their relationship to social rank in *Gromphadorhina portentosa*. Animal Behaviour 50, 719–729.
- (1988) Reproductive Success: Studies of Individual Variation in Contrasting Breeding Systems. University of Chicago Press, Chicago.
- (1988) Maximal running speeds of bipedal and quadrupedal rodents. Journal of Mammalogy 69, 765–772.
- (1994) Quantitative genetics of locomotor performance and physiology in house mice (*Mus domesticus*). PhD Thesis, University of Wisconsin, Madison.
- (1998) Physiological variation and allometry in western whiptail lizards (*Cnemidophorus tigris*) from a transect across a persistent hybrid zone. Copeia 1998, 1–13.
- (2001) Effects of ozone on evaporative water loss and thermoregulatory behavior of marine toads (*Bufo marinus*). Environmental Research A 86, 274–286.
- (1996) Introduction to Quantitative Genetics, 4th edn. Longman, Harlow.
- (2001) Individual variation in field metabolic rate of Kittiwakes (*Rissa tridactyla*) during the chick-rearing period. Physiological and Biochemical Zoology 74, 343–355.
- (1985) Ontogenetic and individual variation in size, shape and speed in the Australian agamid lizard *Amphibolurus nuchalis*. Journal of Zoology, London 207(A), 425–439.
- (1983) Effects of a full stomach on locomotory performance of juvenile garter snakes (*Thamnophis elegans*). Copeia 1983, 1093–1096.
- (1995) Variability and repeatability of female mating preference in the guppy. Animal Behaviour 49, 1427–1433.
- (1990) Individual consistency of maximal oxygen consumption in Deer Mice. Functional Ecology 4, 495–503.
- (1997) Individual variation in mammals. Journal of Mammalogy 78, 274–293.
- (1999) Natural selection on thermogenic capacity of high-altitude deer mice. Evolution 53, 1280–1287.
- (1996) Altitudinal effects on water fluxes of deer mice: a physiological application of structural equation modeling with latent variables. Physiological Zoology 69, 509–531.
- (1998) Repeatability of mammalian physiology: evaporative water loss and oxygen consumption of *Dipodomys merriami*. Journal of Mammalogy 79, 475–485.
- (2000) Laboratory and field heritability: some lessons from *Drosophila*. Adaptive Genetic Variation in the Wild (eds T. A. Mousseau, B. Sinervo & J. A. Endler), pp. 200–218. Oxford University Press, New York.
- (1987) Repeatability of locomotor performance in natural populations of the lizard, *Sceloporus merriami*. Evolution 41, 1116–1120.
- (1993) The effect of maternal size and milk energy output on pup growth in grey seals (*Halichoerus grypus*). Physiological Zoology 66, 61–88.
- (1990) Selection on locomotor performance capacity in a natural population of garter snakes. Evolution 44, 1204–1229.
- (1989) The evolution of maternal characters. Evolution 43, 485–503.
- (1997) Repeatability of female choice in the guppy: response to live and videotaped males. Animal Behaviour 54, 369–376.
- (1987) Unrepeatable repeatabilities: a common mistake. Auk 104, 116–121.
- (1992) Analysis of Variance in Experimental Design. Springer-Verlag, New York.
- (1986) Quantitative genetics of morphological differentiation in *Peromyscus*. I. Tests of the homogeneity of genetic covariance structure among species and subspecies. Evolution 40, 559–573.
- (1998) Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.
- (1997) Behavioural risk factors in the reproduction of inbred and outbred oldfield mice. Animal Behaviour 55, 427–438.
- (1991) Individual and sex differences in the use of the push-up display by the sagebrush lizard, *Sceloporus graciosus*. Animal Behaviour 41, 403–416.
- (2000) Body temperatures of house mice artificially selected for high voluntary wheel-running behavior: repeatability and effect of genetic selection. Journal of Thermal Biology 25, 391–400.
- (1990) Allometric engineering: an experimental test of the causes of interpopulational differences in locomotor performance. Science 248, 1106–1109.
- (1994) Inter- and intraindividual variation in daily energy expenditure of the Pouched Mouse (*Saccostomus campestris*). Functional Ecology 8, 336–342.
- (1989) Locomotor performance of hatchling fence lizards (*Sceloporus occidentalis*): Quantitative genetics and morphological correlates. Evolutionary Ecology 3, 240–252.
- (1969) Quantitative Genetics in Sheep Breeding. Cornell University Press, Ithaca, NY.
- (1989) Repeatability of individual differences in locomotor performance and body size during early ontogeny of the lizard *Sceloporus occidentalis* (Baird & Girard). Evolutionary Ecology 3, 97–105.
- (1993) Selection Index and Introduction to Mixed Model Methods. CRC Press, Boca Raton, FL.
- (1994) The evolution of phenotypic plasticity: what do we really know? Ecological Genetics (ed. L. A. Real), pp. 35–57. Princeton University Press, Princeton, NJ.
- (1997) The effect of metamorphosis on the repeatability of maximal locomotor performance in the Pacific tree frog *Hyla regilla*. Journal of Experimental Biology 200, 2663–2668.
- (1969) Evolution and the Genetics of Populations, Vol. 2. The Theory of Gene Frequencies. University of Chicago Press, Chicago.