Repeatability estimates do not always set an upper limit to heritability



  • 1 The concept of repeatability, the measurement of consistent individual differences, has become an increasingly important tool in evolutionary and ecological physiology. Significant repeatability facilitates the study of selection acting on natural populations and the concept has several practical implications for identifying traits.
  • 2 When properly defined and measured, repeatability can set the upper limit to heritability. This is potentially a very useful interpretation of the repeatability of traits measured on natural populations because often, estimates of heritability cannot be obtained. Many recent reports of repeatability of individual differences for traits have made this interpretation.
  • 3 However, repeatability estimates may not set an upper limit to heritability if: (a) measured traits are not genetically identical, (b) common environmental effects work in opposition to direct genetic effects, (c) the temporary environments for each trait are negatively correlated, (d) significant genotype–environment interaction is present, or (e) the traits are influenced by maternal effects.
  • 4 The quantitative genetic theory that defines the concept of repeatability is reviewed and implications of violations of the five assumptions are discussed in the context of interpreting repeatability as an upper estimate to heritability.


The concept of repeatability, the proportion of total variance in multiple measurements of a trait that is due to differences among individuals, is a useful tool for quantifying the extent to which an individual’s performance or behaviour remains consistent over time (Bennett 1987; Lessells & Boag 1987; Boake 1989; Arnold 1994; Hayes & Jenkins 1997). Statistically significant repeatability estimates for ecologically relevant behaviours and physiological traits over periods of hours, days and years have now been reported for many natural populations (e.g. Arnold & Bennett 1984; Garland 1985; Huey & Dunham 1987; Djawdan & Garland 1988; Boake 1989; Van Berkum et al. 1989; Hayes & Chappell 1990; Jayne & Bennett 1990; Martins 1991; Austin & Shaffer 1992; Arnold, Peterson & Gladstone 1995; Chappell, Bachman & Odell 1995; Clark & Moore 1995; Godin & Dugatkin 1995; Kodric-Brown & Nicoletto 1997; Watkins 1997; Dohm et al. 1998; Hayes, Bible & Boone 1998; Hayes & O’Connor 1999; Rhodes et al. 2000; Dohm et al. 2001). Reports of lack of repeatability for behavioural or performance traits seem to be less common, but include mating preferences in Guppies (Kodric-Brown & Nicoletto 1997), push-up displays in Sagebrush Lizards (Martins 1991), running speed in Golden Marmots (Blumstein 1992), field metabolic rate (FMR) in both the Pouched Mouse (Speakman et al. 1994) and the Meadow Vole (Berteaux et al. 1996; Berteaux & Thomas 1999), and body mass in a high-altitude population of Deer Mice (Hayes & O’Connor 1999). Findings of low repeatability led some authors to raise the important issue of the ecological relevance and evolutionary significance for traits that apparently do not show consistent individual differences.

What, if anything, should be made of a given repeatability estimate? In particular, what does low repeatability for a trait such as mating preferences or FMR tell us? For traits with high repeatability (close to one), three interpretations are generally made. First, repeatability is said to set an upper bound for the broad- and therefore narrow-sense heritability of a trait because repeatability includes genetic and environmental sources of variation whereas heritability includes only genetic differences among individuals (Boake 1989; Falconer & Mackay 1996; Lynch & Walsh 1998). Second, significant repeatability may be an important determinant of how effective natural selection will be on changing the trait over time because of its relationship to heritability (Huey & Dunham 1987; Boake 1989). Third, high repeatability indicates that individuals tend to perform consistently and therefore there may be little practical reason to obtain multiple measurements (Arnold et al. 1995; Falconer & Mackay 1996). Conversely, low repeatability (significantly less than 1) may suggest practical problems associated with the measure (Boake 1989; Falconer & Mackay 1996). For example, low repeatability may indicate that an ecologically relevant time-frame has not been selected for the assessment of trait consistency (Arnold et al. 1995).

Based on the standard definition of repeatability (e.g. Falconer & Mackay 1996), some authors have made the reasonable, but incorrect, assumption that heritability cannot ever exceed repeatability. For example, about one-quarter of the reports cited above explicitly state that repeatability sets an upper limit to heritability. This statement is true only if certain restrictions hold and, naturally, we must distinguish between true repeatability and estimates of repeatability. I wish to emphasize that all of the restrictions on interpretations of repeatability I will describe here follow directly from Falconer’s presentation of the concepts of repeatability and heritability. However, these admonishments with respect to the interpretation of repeatability estimates are mentioned only briefly in his book (Falconer & Mackay 1996) and important points occur at different places in the text. For this reason, I believe it is appropriate to clarify explicitly the underlying genetic and environmental assumptions that must hold if repeatability estimates are to indicate upper bounds for heritability. My primary objective in this report therefore is to highlight the model assumptions that can lead to estimates of repeatability less than the narrow-sense heritability even for the case of two measures of the same genetic trait.

The standard repeatability model

Estimation of repeatability, or heritability for that matter, assumes a specific model for the partitioning of the trait’s phenotypic variance (Lynch & Walsh 1998). Falconer (Falconer & Mackay 1996) defined repeatability (ρ) as

$\rho = (V_G + V_{Eg})/V_P$. (eqn 1)

Thus, the repeatability, or correlation between repeated measures of a trait on the same individuals, results from individuals having different genotypes ($V_G$) and general environments ($V_{Eg}$). The appropriate statistical model is

$P_{ij} = \mu + A_i + E_{ij}$, (eqn 2)

where $P_{ij}$ is the $j$th record on the $i$th individual for a trait, $\mu$ is a constant, $A_i$ is the effect on the record of the genetic plus permanent environmental effects, and $E_{ij}$ is the random temporary environmental effect on the record $P_{ij}$ (Becker 1984). We assume no covariance between any two elements in $A$, between any two elements in $E$, or between elements in $A$ and $E$. A one-way anova may be used to obtain the observed variance components and repeatability is then calculated as the intraclass correlation coefficient (Becker 1984; Lessells & Boag 1987; Hayes & Jenkins 1997). Of course, linear regression and other methods can also be used to obtain variance components for calculation of repeatability (Van Vleck 1993; Hayes & Jenkins 1997). For two measures of the same trait, repeatability can be viewed as the proportion of the difference from the mean in one measure expected in another measure on the same individual. Repeatability then can be calculated as the regression coefficient for the second measure on the first measure. When the two measures have equal variances, the product–moment correlation between records also defines repeatability (Van Vleck 1993). Note that although the expected values for the components of repeatability are the same for regression and correlation approaches, the estimates will often not be equal (Van Vleck 1993).
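The one-way anova calculation of the intraclass correlation can be illustrated with a short sketch. The Python function below is only an illustration under stated assumptions, not code from any of the works cited: it assumes a balanced design (every individual measured the same number of times), and the function name `anova_repeatability` and the example records are hypothetical.

```python
from statistics import mean


def anova_repeatability(records):
    """Intraclass correlation from a balanced one-way anova.

    records: one list of repeated measures per individual
             (assumption: all lists have the same length, at least 2).
    """
    a = len(records)        # number of individuals (groups)
    n = len(records[0])     # measures per individual
    if any(len(r) != n for r in records):
        raise ValueError("this sketch assumes a balanced design")

    grand_mean = mean(x for r in records for x in r)
    ind_means = [mean(r) for r in records]

    # Sums of squares among and within individuals.
    ss_among = n * sum((m - grand_mean) ** 2 for m in ind_means)
    ss_within = sum((x - m) ** 2
                    for r, m in zip(records, ind_means) for x in r)

    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (a * (n - 1))

    # Among-individual variance component, then the intraclass correlation.
    var_among = (ms_among - ms_within) / n
    return var_among / (var_among + ms_within)


# Hypothetical records: three individuals, two measures each.
example = [[10, 12], [20, 22], [30, 32]]
print(anova_repeatability(example))  # 99/101, about 0.98
```

With unequal numbers of measures per individual, the divisor `n` would need to be replaced by an adjusted group size, which this sketch does not attempt.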

Falconer stated two assumptions. Repeated measures of a trait must: (1) have equal variances and (2) be measures of the same genetic trait (i.e. a genetic correlation of one among the repeated measures). He added that, if these assumptions were not valid, the concept of repeatability was ‘somewhat vague ... without precise meaning in relation to the components of variance’ (Falconer & Mackay 1996, p. 138). In practice, we may never know whether our estimates conform entirely to these assumptions even if more complicated experimental designs are used, but a variety of approaches can be employed, often with interesting results (e.g. see discussion in Boake 1989; Arnold et al. 1995; Hayes & Jenkins 1997; Watkins 1997; Burness et al. 2000; Hoffmann 2000).

The first of the two assumptions listed by Falconer is not directly related to the present topic. The requirement for equal variances applied to estimates of repeatability follows from basic anova theory (e.g. Lindman 1992). In practice, however, the bias to testing of a repeatability estimate attributed to unequal variances is probably not going to be of chief concern provided that the sample size within groups is large relative to the number of groups and/or that the variances are not substantially different (Lindman 1992). Repeatability studies, conducted over relatively short periods of time, often conform to this pattern because such studies involve at most a few measurements of the trait on dozens of the same individuals. Repeatability measured over long periods of time may be more problematic because of missing data.

A general repeatability model

Now, consider the second of Falconer’s two assumptions, whether repeated measures are estimates of the same genetic trait. For repeatability to set an upper bound to heritability, a particular model for how individual differences are manifested must apply. The model described by Falconer is probably the most often cited for repeatability, but there are other equally valid ways to model the partitioning of within- vs between-individual differences (e.g. Turner & Young 1969; Van Vleck 1993; Hayes & Jenkins 1997). The important point for this discussion is that if the model for the decomposition of the trait’s phenotypic variance is incomplete (e.g. an important component of variance has been omitted from the model), then estimates of repeatability cannot be taken to indicate upper bounds to heritability. The models I present below treat repeated measures as different traits and follow general models presented in Wright (1969), although I do not include all possible paths (Wright 1969; Lynch & Walsh 1998). For clarity, I assume traits are measured without error, although measurement error could be incorporated into the models (see Lynch & Walsh 1998; Hoffmann 2000). Where appropriate, I provide examples, mostly hypothetical, to clarify the implications of specific paths or equations.

Figure 1.

A simple path model for decomposition of a phenotypic trait as the sum of genetic and environmental components: P = phenotype, G = genotype, E = unique (temporary) environmental effects, and h and e are path coefficients for the direct effects of genes and environment on the trait.

Figure 2.

Extension of the simple path model for one trait (Fig. 1) to two traits or repeated measures of the same trait. P1 and P2 are the two measures of the phenotype, G and E are defined as above, Ec = common environment effects, and h, e and c are path coefficients for direct effects of genes, unique environment and common environment on the phenotype. Correlations between unique environments (rE), common environments (rEc) and genes (rG) are represented by double-headed arrows.

For a single trait measured on individuals from a population (Fig. 1), the simplest partitioning of total phenotypic variance ($P$) yields two causal components of variance: genetic ($G$) and environmental ($E$) components, where $h$ and $e$ are path coefficients representing direct effects of genes and environment, respectively, on trait variation. Note that $G$ refers only to the additive genetic variance and $E$ contains genetic (e.g. dominance) and environmental components of variance (Lynch & Walsh 1998). For two measures of the same trait (Fig. 2), let $P_1$ and $P_2$ represent the first and second measures of the trait, $G_1$ and $G_2$ the genotypes, $E_1$ and $E_2$ the contribution of temporary environmental effects, and $E_{c1}$ and $E_{c2}$ the contribution from the common environment component of variance. The path coefficients $h_1$, $h_2$, $e_1$, $e_2$, $c_1$ and $c_2$ account for the direct effects of genes, temporary environment and common environmental effects on phenotypic variance for the measurements. Extensions to more than two measures involve adding appropriate paths between $G$, $E_c$, $E$ and each $P$. From the rules of path analysis (Lynch & Walsh 1998), the structural equation for the correlation ($\rho$) between the first and second measures is

$\rho_{P_1P_2} = h_1 r_G h_2 + c_1 r_C c_2$. (eqn 3)

If we assume that $h_1 = h_2$, $r_G = +1$, $r_C = +1$ and $c_1 = c_2$, then equation 3 simplifies to

$\rho_{P_1P_2} = h^2 + c^2$, (eqn 4)

where equation 4 is the general equation for repeatability. The two environmental effects, Ec and E, can best be distinguished with some examples. The general component, Ec, represents environmental influences that affect an individual’s performance permanently. For example, aerobic training significantly improves stamina in humans and other mammals, whereas dietary deficiencies often can have long-lasting effects on an individual’s performance (Astrand & Rodahl 1986). Maternal effects (e.g. milk quality, parental care) can have short- or long-term influences on offspring phenotypes (Iverson et al. 1993; Arnold et al. 1995; Margulis & Altmann 1997). In contrast, E, the unique environmental component, represents temporary differences of environment on successive performances (Falconer & Mackay 1996). For example, recent feeding induces a negative effect on burst speed performance in garter snakes (Garland & Arnold 1983).

Can repeatability ever underestimate heritability?

From Fig. 2, we see five conditions for which the true value for the narrow-sense heritability ($h^2$) may be greater than the absolute value of repeatability ($h^2 > |\rho_{P_1P_2}|$): (1) the traits are not genetically identical, (2) common environmental effects oppose the genetic effects, (3) the temporary environments are negatively correlated, (4) genotype–environment interaction is present, and (5) the traits are influenced by maternal or paternal effects.

Traits are not genetically identical

Repeated measures may reflect different traits genetically. This can be accounted for in the general model by allowing $r_G < +1$ or by allowing the $h$ path coefficients to differ (e.g. through random assortment in meiosis; see Wright 1969 for details). If $r_G = +1$ but $h_1 \neq h_2$, then repeatability is given by

$\rho_{P_1P_2} = h_1 h_2 + c^2$. (eqn 5)

For clarity, assume the repeated measures are due solely to genetic causes ($c^2 = 0$). If $h_2 < h_1$ ($h_1$ and $h_2 \neq 0$), then $h_1^2 > |\rho_{P_1P_2}|$. If $r_G < +1$ but $h_1 = h_2$, then the equation for repeatability becomes

$\rho_{P_1P_2} = h^2 r_G + c^2$. (eqn 6)

Again, let $c^2 = 0$; if $-1 < r_G < +1$, then $h^2 > |\rho_{P_1P_2}|$. Therefore, and not surprisingly, estimates of repeatability between genetically different traits can be less than the heritability for one of the traits.
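The two cases above can be checked numerically. The following Python sketch evaluates equation 3 directly and then applies the restrictions that yield equations 4, 5 and 6; the function name `repeatability_eq3` and all path coefficient values are hypothetical and chosen only for illustration.

```python
def repeatability_eq3(h1, h2, r_g, c1, c2, r_c):
    """Equation 3: correlation between two measures, traced through the
    genetic paths (h1, r_g, h2) and the common-environment paths (c1, r_c, c2)."""
    return h1 * r_g * h2 + c1 * r_c * c2


# Equation 4: same genetic trait (h1 = h2, rG = rC = +1, c1 = c2),
# so repeatability is h^2 + c^2 and cannot fall below h^2.
h, c = 0.6, 0.5
rho_standard = repeatability_eq3(h, h, 1.0, c, c, 1.0)

# Equation 5 with c = 0: rG = +1 but unequal genetic path coefficients.
h1, h2 = 0.8, 0.5
rho_unequal_h = repeatability_eq3(h1, h2, 1.0, 0.0, 0.0, 1.0)

# Equation 6 with c = 0: equal path coefficients but rG below +1.
rho_low_rg = repeatability_eq3(0.8, 0.8, 0.5, 0.0, 0.0, 1.0)

print(rho_standard >= h ** 2)           # True: the standard model bounds h^2
print(h1 ** 2 > abs(rho_unequal_h))     # True: heritability of the first measure exceeds repeatability
print(0.8 ** 2 > abs(rho_low_rg))       # True: heritability exceeds repeatability
```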

Common environmental effects

Repeatability may underestimate heritability if the common environment has opposite effects on the two measures. Common environmental effects are generally thought to increase resemblance between relatives and it does not seem likely that an effect of the common environment would have positive effects on the first record, but negative effects on the repeat performance (Falconer & Mackay 1996). However, there are situations in which a shared environment can reduce resemblance within families. In natural populations, if offspring do not move away from parental territory, or must establish themselves at the periphery of the parent’s suitable habitat, a negative common environmental effect may result. In the laboratory, mice are often housed in groups of two or more per cage. If the process of removing, then reintroducing, individuals for testing disrupts the social milieu of the cage, a cage effect (common environment) may be induced. To the extent that social interactions are vigorous in some cages but not others, it is conceivable that the effects on performance might be opposite in sign, reducing repeatability. Thus, if $c_1$ is negative and $c_2$ is positive, or if $-1 \le r_C < 1$, then repeatability can be less than heritability even if the separate records are the same genetic traits ($h_1 = h_2$, $r_G = +1$). The equation for repeatability is now

$\rho_{P_1P_2} = h^2 - c_1 r_C c_2$. (eqn 7)

If $c_1 r_C c_2$ is large, then $h^2 > |\rho_{P_1P_2}|$.

Temporary environments

Repeatability may underestimate heritability if temporary environments associated with each measure are negatively correlated (rE, Fig. 2). For example, odour cues are an important component of a mouse’s response to novel environments (e.g. a treadmill belt). If these cues are not removed or at least standardized across trials, mice may respond to the prior presence of individuals and this may alter their performance. Learning or acclimation to the apparatus by the individual over repeated measures is probably unavoidable. One factor, under the control of the observer, includes the time at which an individual is measured for repeated measures. However, resolution is not straightforward. Which is better: to measure individuals at the same time of day, perhaps inducing an order effect, or to randomize times thereby potentially increasing the environmental differences between repeated measures? Repeatability, assuming genetically identical traits, now must include terms for the contributions of the unique environments,

$\rho_{P_1P_2} = t - e_1 r_E e_2$, (eqn 8)

where $t = h^2 + c^2$. If $e_1 r_E e_2 > t$, then $h^2 > |\rho_{P_1P_2}|$.
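A small numerical sketch may help to show how opposing environmental paths pull repeatability below heritability. The Python below follows the sign convention of equations 7 and 8 as printed, which I read as treating the environmental terms as magnitudes with the opposition carried by the minus sign; that reading, the function names and all values are assumptions made for illustration.

```python
import math


def repeatability_eq7(h, c1, c2, r_c):
    """Equation 7: common environment acting in opposition on the two measures.
    c1, c2 and r_c are entered as magnitudes (assumed reading of the equation)."""
    return h ** 2 - c1 * r_c * c2


def repeatability_eq8(h, c, e1, e2, r_e):
    """Equation 8: negatively correlated temporary environments.
    r_e is entered as the magnitude of the negative correlation (assumed reading)."""
    t = h ** 2 + c ** 2
    return t - e1 * r_e * e2


# Hypothetical case for equation 7: h^2 = 0.49, opposing common environment.
rho_common = repeatability_eq7(0.7, c1=0.5, c2=0.5, r_c=1.0)

# Hypothetical case for equation 8: the path coefficients are standardized
# so that h^2 + c^2 + e^2 = 1.
h, c = 0.6, 0.3
e = math.sqrt(1.0 - h ** 2 - c ** 2)
rho_temporary = repeatability_eq8(h, c, e, e, r_e=0.5)

print(0.7 ** 2 > abs(rho_common))      # True: repeatability falls below h^2
print(h ** 2 > abs(rho_temporary))     # True: repeatability falls below h^2
```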

Interaction between genotype and environment

Repeatability may underestimate heritability if there is significant interaction between genotype and environment. To the extent that differences of environment influence genotypes in a non-random way, an interaction is present. Genotype–environment interactions can be attributed to differences of sensitivity of genotypes (individuals), or a specific difference of environment may have a greater or lesser effect on a particular genotype (Falconer & Mackay 1996). To accommodate genotype–environment interaction, the general path model of repeatability can be altered by treating the problem as a correlation between two traits ($r_{GE}$, Fig. 3). Thus, $P_1$ and $P_2$ are the same (genetic) traits if and only if $h_1 = h_2$, $r_G = +1$, and $r_{GE} = 0$. Note that this is also a general model for genotype–environment interaction or phenotypic plasticity (i.e. different sets of genes are expressed in the two environments [review in Via 1994]).

Figure 3.

Path model of two measures (P1 and P2) of the same trait but permitting an interaction between genotype and environment (rGE). P, G, E, Ec, h, c and e are defined as above.

What are the effects of a model for genotype–environment interaction on repeatability and heritability? From the path model shown in Fig. 3, the relevant structural equation describing the repeatability now includes paths between genetic and environmental components:

$\rho_{P_1P_2} = h_1 r_G h_2 + e_{c1} r_{Ec} e_{c2} + e_1 r_{G_2E_1} h_2 + e_2 r_{G_1E_2} h_1$. (eqn 9)

With a few assumptions ($h_1 = h_2 = h$, $r_G = +1$, $e_{c1} = e_{c2}$, $r_{Ec} = 1$, $e_1 = e_2 = e$, $r_{G_2E_1} = r_{G_1E_2}$), we can simplify equation 9, yielding

$\rho_{P_1P_2} = t + 2(h r_{GE} e)$, (eqn 10)

where $t = h_1 r_G h_2 + e_{c1} r_{Ec} e_{c2}$ and $r_{GE} = r_{G_2E_1} = r_{G_1E_2}$. Thus, the model describes repeatability between genetically identical traits. Repeatability will be less than the heritability for the trait if $h^2 > [e_c^2 r_{Ec} + 2(h r_{GE} e)]$. For example, if $r_{GE} = -1$, repeatability will always underestimate $h^2$. This holds even for cases in which substantial heritability and a large, positive genetic correlation between the two measures is present.
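The reduction from equation 9 to equation 10 can be verified numerically. In the sketch below, the simplified form carries the interaction term as $2 h r_{GE} e$, which is what substitution of the stated assumptions (together with $e_1 = e_2 = e$) into equation 9 gives. The function names and all values are hypothetical.

```python
def repeatability_eq9(h1, h2, r_g, ec1, ec2, r_ec, e1, e2, r_g2e1, r_g1e2):
    """Equation 9: genetic, common-environment and two
    genotype-environment cross paths between the measures."""
    return (h1 * r_g * h2
            + ec1 * r_ec * ec2
            + e1 * r_g2e1 * h2
            + e2 * r_g1e2 * h1)


def repeatability_eq10(h, ec, e, r_ge):
    """Simplified form under h1 = h2 = h, rG = rEc = +1,
    ec1 = ec2 = ec, e1 = e2 = e and equal cross correlations r_ge."""
    t = h * h + ec * ec
    return t + 2.0 * h * r_ge * e


# Hypothetical values with a negative genotype-environment correlation.
h, ec, e, r_ge = 0.6, 0.3, 0.7, -0.5
full = repeatability_eq9(h, h, 1.0, ec, ec, 1.0, e, e, r_ge, r_ge)
simple = repeatability_eq10(h, ec, e, r_ge)

print(abs(full - simple) < 1e-12)   # True: the two forms agree
print(h ** 2 > abs(simple))         # True: repeatability is below h^2 here
```

Setting `r_ge` to zero in the same call recovers the standard model, in which repeatability is at least as large as the heritability.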

Maternal effects

Lastly, maternal or paternal effects can also lead to unexpected relationships between repeatability and heritability. Models for maternal effects include links between the offspring’s phenotype and the maternal genes and maternal environment provided to the developing offspring (e.g. Kirkpatrick & Lande 1989; Cheverud & Moore 1994). Maternal effects are a special case of common environmental effects (Falconer & Mackay 1996). A path model including direct maternal effects is illustrated in Fig. 4, where the subscript m indicates a maternal characteristic, $P_1$, $P_2$, $r_E$, $r_G$, $E$, $h$ and $e$ are defined as before, and $r_{GG_m}$ is the correlation between direct genetic and direct maternal genetic effects. For simplicity, I assume that maternal effects describe all common environments. Note that several potential paths were omitted from Fig. 4, including those between genetic maternal effects for one trait with direct genetic effects for the second trait (e.g. $r_{G_{m1}G_2}$, $r_{G_{m2}G_1}$), between maternal and unique environments (e.g. $r_{E_{m1}E_1}$, $r_{E_{m2}E_2}$), and correlations between genes and environments.

Figure 4.

Path model of two measures (P1 and P2) of the same trait with maternal genetic (Gm, hm, rGm) and environmental effects (Em, m, rEm) contributing to the phenotypes. P, G, E, h and e are defined as above.

With maternal effects, the structural model for the repeatability between P1 and P2 is now

$\rho_{P_1P_2} = h_1 r_G h_2 + e_{m1} r_{Em} e_{m2} + h_{m1} r_{Gm} h_{m2} + h_{m1} r_{G_{m1}G_2} h_2 + h_{m2} r_{G_{m2}G_1} h_1$. (eqn 11)

Given the number of terms in equation 11, it is not surprising that many combinations of signs of path coefficients and correlations can potentially lead to $h^2 > |\rho_{P_1P_2}|$. To simplify equation 11, let $h_1 = h_2$, $r_G = +1$, $e_{m1} = e_{m2}$, $h_{m1} = h_{m2}$. Again, the model describes repeatability between genetically identical traits. Repeatability under a model with maternal effects will underestimate narrow-sense heritability if one or more of the following correlations are large and negative: (1) $r_{Em}$, maternal environments; (2) $r_{Gm}$, maternal genetic effects; or (3) $r_{G_mG}$, direct genetic and maternal genetic effects.
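As a rough illustration of how the sign of the direct–maternal genetic correlation changes the outcome, the Python sketch below evaluates equation 11 term by term. The function name and every value are hypothetical, and no attempt is made to confirm that the chosen correlations form an admissible correlation matrix.

```python
def repeatability_eq11(h1, h2, r_g, em1, em2, r_em,
                       hm1, hm2, r_gm, r_gm1g2, r_gm2g1):
    """Equation 11: direct genetic, maternal environmental, maternal genetic
    and the two direct-by-maternal genetic cross paths."""
    return (h1 * r_g * h2
            + em1 * r_em * em2
            + hm1 * r_gm * hm2
            + hm1 * r_gm1g2 * h2
            + hm2 * r_gm2g1 * h1)


# Simplifying assumptions from the text: h1 = h2, rG = +1, em1 = em2, hm1 = hm2.
h, em, hm = 0.6, 0.3, 0.4

# No correlation between direct and maternal genetic effects.
rho_zero = repeatability_eq11(h, h, 1.0, em, em, 1.0, hm, hm, 1.0, 0.0, 0.0)

# Large negative correlation between direct and maternal genetic effects.
rho_negative = repeatability_eq11(h, h, 1.0, em, em, 1.0, hm, hm, 1.0, -0.8, -0.8)

print(rho_zero >= h ** 2)            # True: repeatability bounds h^2
print(h ** 2 > abs(rho_negative))    # True: repeatability falls below h^2
```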

Interpreting repeatability estimates

These models and examples may seem trivial, or lack biological relevance; obviously one should not talk about the relationship between heritability and repeatability for different traits, nor should we expect a simple relationship to hold if the environmental context under which two traits are expressed is complicated. Heritability estimates ‘refer to a particular population under particular conditions’ (Falconer & Mackay 1996, p. 161). The same concern applies to estimates of repeatability. On the other hand, a tantalizing application of the repeatability concept is that it can provide a means to quantify whether multiple measures represent the same or different traits or whether substantial, genetically based differences among individuals are likely to be present prior to conducting a full quantitative genetic analysis. For example, Watkins (1997) used the concept of repeatability of maximum burst swimming speed of anuran tadpoles across metamorphic stages. No repeatability was found and these results were tentatively interpreted as evidence that there was no genetic correlation between the same trait measured at different life stages and possible environmental differences between stages were discussed (Watkins 1997). Small h2 and correspondingly large environmental variance are sufficient for reducing the repeatability among successive measurements, but low repeatability may result from a number of other genetic and environmental causes of variation among individuals (Figs 2–4).

When can repeatability set an upper limit to heritability? The answer will depend on the relative complexity of the genetic and environmental contributions to trait (co)variation. In practice, when an estimate is not statistically significantly different from 1, and the test has suitable power, repeatability may inform us about the upper bounds for heritability. For example, in laboratory house mice, multiple measures of body mass taken on adults over short time periods typically have high repeatabilities (Falconer & Mackay 1996); these repeated measures certainly represent the same genetic trait and the additional genetic and environmental assumptions of the simple repeatability model probably hold. Body mass measured on adult mice before and after a period of fasting, however, probably should be treated as different traits. For example, for 337 male and female, genetically variable laboratory house mice, the repeatability of ad libitum-fed body mass and body mass measured after 24 h of fasting was 0·28 (significantly different from 1 and 0 at α = 0·05; M. R. Dohm, T. Garland Jr. & J. P. Hayes, unpublished results). Interestingly, the broad-sense heritabilities ($h^2_B$) were greater than the repeatability estimate ($h^2_B$ = 0·92 for ad libitum-fed body mass, $h^2_B$ = 0·69 for fasting body mass, $r_G$ = +0·94, all significantly different from 0) (Dohm 1994). (Note that only the heritability for ad libitum-fed body mass was statistically different from 0·28 at α = 0·05.) A second, more ecologically relevant example was reported by Hayes & O’Connor (1999) in their study of natural selection on aerobic capacity of high-altitude Deer Mice. Repeatability of body mass differences among individuals was surprisingly low over about 2 months (product–moment correlation = 0·29, P = 0·097). At least for laboratory populations of House Mice (e.g. Falconer & Mackay 1996) and Deer Mice (e.g. Losvold 1986), the heritabilities of various measures of body size are often greater than the field repeatability estimate for body mass reported in Hayes & O’Connor (1999). Presumably, the low field repeatability can be explained by elevated environmental noise in the field compared to the laboratory setting, but as indicated from the path diagrams (Figs 2–4), additional sources of variation may also be important.

Some modest suggestions for repeatability experiments

Certainly, no researcher would knowingly ignore environmental factors that had opposite effects on repeated measures of performance, whether of a general factor with lasting impact on performance or a unique factor, specific to each measurement of performance. Nor would a researcher knowingly apply repeatability to different traits, although a statistically significant ‘repeatability’ between different traits could be viewed as a rough test for genetic correlation (e.g. Watkins 1997). However, even after diligent effort by the researcher to control the experimental context in which the measures are conducted, there is no absolute guarantee that all relevant factors have been accounted for. This caveat is especially true when attempts are made to estimate field repeatability, the repeatability of two or more measures of a trait in a natural population. A genotype–environment interaction may result in the field context simply because recently caught individuals are very likely to respond differently to the stresses of capture and captivity (Baker, Gemmell & Gemmell 1998).

In particular, repeatability estimates may be problematic for highly plastic or strongly context-dependent traits, at least in the sense of providing a bound for heritability using a relatively simple experimental design compared to a full-blown quantitative genetic study (cf. discussion in Boake 1989). To cite one possible example, low repeatability estimates (intraclass correlation < 0·3) for field metabolic rate, FMR, may have resulted from a failure of individuals to maintain energy balance or from differences in the activity budget during the period of study, as suggested by the authors (Speakman et al. 1994; Berteaux et al. 1996). The first explanation may represent a violation of the assumption of independence among the unique environmental effects for successive measures (equation 8, Fig. 2), whereas the second explanation may represent a genotype by environment interaction, i.e. captivity may affect individuals differently (but see Fynn et al. 2001 for an example of significant FMR repeatability).

When practical, repeatability studies on plastic traits may be improved if experimental designs can be used to rule out violations of one or more of the assumptions. For repeatability of FMR, for example, one might assess consistency of individual differences for metabolic rate over spans of several days on captive mice subjected to different feeding or temperature regimens. Perhaps individuals tend to have high FMR under one treatment, but low under a second treatment, suggesting the presence of genotype-by-environment interaction. Are correlations across treatment groups universally low or heterogeneous? If the correlations are not similar, then additional causes of variation may apply and the assumptions of the standard model for repeatability may be violated. Structural equation modelling, including path analysis, provides a fruitful framework for the design and analysis of more complex analyses of repeatability (Hayes & Shonkwiler 1996).
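One way to ask whether correlations are heterogeneous among treatments is sketched below. This is my own illustration rather than a method drawn from the works cited: it assumes that each treatment is applied to a separate group of individuals, computes the product–moment correlation between two repeated measures within each group, and compares two such correlations with the standard Fisher z approximation. The function names and the example numbers are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between first (x) and second (y) measures."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_difference(r1, n1, r2, n2):
    """Approximate z statistic for the difference between two correlations
    estimated on independent groups of sizes n1 and n2 (both > 3)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se


# Hypothetical repeatabilities from two treatment groups of 28 mice each.
z = fisher_z_difference(0.8, 28, 0.2, 28)
print(abs(z) > 1.96)   # True: these two correlations differ at about the 5% level
```

If the same individuals are measured under every treatment, the correlations are not independent and this comparison would not be appropriate.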


The concept of repeatability certainly has an important place in ecological and evolutionary studies of individual variation (Bennett 1987; Clutton-Brock 1988; Boake 1989; Arnold 1994; Hayes & Jenkins 1997). However, if an incorrect genetic and environmental model is employed (i.e. any of the five cases listed above apply), then repeatability estimates may be virtually meaningless with respect to what the true heritability might be. Without simultaneous estimation of heritability, it is difficult to judge whether reported estimates of low repeatability have incorrectly been taken as evidence for the upper bounds of heritability. Based on the few available studies for which both repeatability and heritability of performance are available for natural populations (e.g. Arnold & Bennett 1984; Tsuji et al. 1989), I do not know of any example of larger, statistically significant heritability compared to repeatability estimates. On the other hand, we know virtually nothing about maternal effects or genotype–environment interaction on performance measures and studies to elucidate these pathways will be an important area for evolutionary and ecological physiologists in the near future (e.g. see Sinervo & Huey 1990; Arnold et al. 1995; Burness et al. 2000).

I do not wish to imply that small repeatability estimates are meaningless. A statistically significant repeatability estimate provides a testable hypothesis about trait heritability. The concept of repeatability is also important for defining traits and for choosing appropriate statistical models for analysis of trait variation (Boake 1989; Arnold et al. 1995; Hayes & Jenkins 1997; Hoffmann 2000). Specifically, repeatability estimates less than one may provide direction for additional tests according to the models provided in Figs 2–4. For the present, however, if the complications illustrated in the path models cannot be ruled out, then we must conclude that it is inappropriate to assume that the simple genetic model of phenotypic covariation is correct and therefore repeatability cannot be taken as an upper bound to heritability.


I thank M. Dentine and J. Hayes for helpful discussions on the concept of repeatability. J. Hayes, K. Lessells, D. Roff and an anonymous reviewer made insightful suggestions to improve earlier drafts of the manuscript.