School of Biology, University of St Andrews.


Adaptive evolution occurs when fitness covaries with genetic merit for a trait (or traits). The breeder’s equation (BE), in both its univariate and multivariate forms, allows us to predict this process by combining estimates of selection on phenotype with estimates of genetic (co)variation. However, predictions are only valid if all factors causal for trait-fitness covariance are measured. Although this requirement will rarely (if ever) be met in practice, it can be avoided by applying Robertson’s secondary theorem of selection (STS). The STS predicts evolution by directly estimating the genetic basis of trait-fitness covariation without any explicit model of selection. Here we apply the BE and STS to four morphological traits measured in Soay sheep (Ovis aries) from St. Kilda. Despite apparently positive selection on heritable size traits, sheep are not getting larger. However, although the BE predicts increasing size, the STS does not, which is a discrepancy that suggests unmeasured factors are upwardly biasing our estimates of selection on phenotype. We suggest this is likely to be a general issue, and that wider application of the STS could offer at least a partial resolution to the common discrepancy between naive expectations and observed trait dynamics in natural populations.

In the absence of genetic constraint, evolutionary change is the expected outcome of natural selection, which can itself be defined as occurring when phenotypic variation causes variation in fitness (Endler 1986). Thus, adaptive evolution will generally occur if heritable traits have causative effects on fitness, although for any single trait our expectations for change are also contingent on patterns of genetic covariance with other traits under selection. Conceptually, if a heritable trait causes fitness variation, then breeding values for that trait (i.e., the influence of an individual’s genes relative to population trait mean; Falconer 1981) become breeding values for fitness. Consequently, the phenotypic distribution of the trait will change within a generation, and the resultant change in the distribution of the breeding values will transmit phenotypic change to future generations.

Quantitative genetic theory provides us with two key models for predicting the evolution of phenotypic traits under selection. The first, and most commonly used, is the breeder’s equation (BE) (Lush 1937), which in its univariate form (UVBE) is


$$\Delta\bar{z} = h^2 S \qquad (1)$$

where $\Delta\bar{z}$ is the predicted change in trait mean phenotype after one generation, $h^2$ is the heritability of the trait (defined as the ratio of additive genetic, $\sigma^2_a$, to total phenotypic, $\sigma^2_p$, variance), and $S$ is the selection differential, defined equivalently as the change in mean phenotype within a generation or as the phenotypic covariance of the trait with relative fitness, $\sigma_p(z,w)$. Less well known, at least to empiricists, is a second model known alternatively as the Robertson–Price identity or Robertson's secondary theorem of selection (STS) (Robertson 1966, 1968; Price 1970). This states that the evolutionary change is equal to the genetic covariance of a trait with relative fitness, such that:

$$\Delta\bar{z} = \sigma_a(z,w) \qquad (2)$$

These two equations follow directly from the definition of adaptive evolution given above, and are equivalent when the causative effect of the trait on fitness is solely responsible for the covariance between the trait and fitness (Queller 1992; Morrissey et al. 2010). In other words, they are equivalent when the statistical relationship between the trait and fitness, estimated as $S$ in the BE, arises completely from the fact that the trait variation causes fitness variation. If this is not the case and additional factors influence the covariance between the trait and fitness, a multivariate version of the BE (MVBE; Lande 1979) is required in which:


$$\Delta\bar{\mathbf{z}} = \mathbf{G}\mathbf{P}^{-1}\mathbf{S} \qquad (3)$$

where $\bar{\mathbf{z}}$ is a vector of population mean phenotypic values, $\mathbf{G}$ and $\mathbf{P}$ are the genetic and phenotypic variance–covariance matrices, and $\mathbf{S}$ is a vector of selection differentials. This multivariate form of the model is predictive, but requires the important assumption that all the factors (which may be additional traits or environmental effects such as food availability or temperature) contributing to the trait-fitness covariance are known, measured, and included (Morrissey et al. 2010). Thus, the key assumption of the BE, in both univariate and multivariate forms, is that the trait-fitness covariance is caused by the trait(s) in the model and that there are no “missing traits” (Queller 1992; Hadfield 2008; Morrissey et al. 2010). Under this condition the shift in breeding values, and consequently in population mean phenotypes, must follow according only to patterns of genetic variation and covariation captured in $\mathbf{G}$.
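The consequence of a missing trait can be illustrated numerically. The following is a minimal sketch with purely hypothetical (co)variances for two traits, in which selection is assumed to act directly on trait 2 only; the helper functions `inv2` and `matvec` are illustrative and not part of any analysis reported here. It compares the univariate prediction for trait 1 taken in isolation with the multivariate prediction from equation (3).

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


# Hypothetical values, chosen only for illustration.
G = [[1.0, 0.6], [0.6, 2.0]]   # additive genetic (co)variances
P = [[4.0, 2.1], [2.1, 6.0]]   # phenotypic (co)variances
beta = [0.0, 0.1]              # assumed causal effects on relative fitness
S = matvec(P, beta)            # selection differentials implied by beta

# Univariate breeder's equation for trait 1, ignoring trait 2.
uvbe_trait1 = (G[0][0] / P[0][0]) * S[0]

# Multivariate breeder's equation, G P^-1 S, using both traits.
mvbe = matvec(G, matvec(inv2(P), S))

print(f"UVBE, trait 1: {uvbe_trait1:.4f}")
print(f"MVBE, trait 1: {mvbe[0]:.4f}")
```

In this sketch the two predictions for trait 1 differ because its covariance with fitness arises entirely through trait 2, which is the situation the “missing traits” assumption rules out.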

The STS, on the other hand, requires no assumption about causation to be made (Morrissey et al. 2010). Instead trait evolution is predicted by directly estimating the genetic component of the association between a trait of interest and relative fitness. In fact, despite its name, the STS need not actually be about selection on the focal trait at all. For instance, covariance of focal trait breeding values and relative fitness could arise from selection on a genetically correlated trait (whether measured or not). As such the secondary theorem is arguably more about genetics than it is about selection, especially because the genetic covariance of a trait with fitness can actually be nonzero in the complete absence of selection (e.g., in the case of evolutionary change caused by drift). The important point is that the secondary theorem provides a direct prediction of evolutionary change. However, this prediction, for better or worse, is not generated in conjunction with any insight as to the true form of selection.

We have recently proposed that the STS might provide a more robust predictor of evolutionary change than the BE in studies of natural populations (Morrissey et al. 2010). The core argument for this is that the genetics, physiology, behavior, and ecology of phenotypic variation and fitness variation are so complex that we are unlikely to identify, much less meaningfully measure and adequately model, all of the factors contributing to trait-fitness covariation. Consequently, our estimates of selection acting on individual traits are often likely to be biased (Rausher 1992; Stinchcombe et al. 2002; Kruuk et al. 2003), in which case predictions from the STS should be more robust than those obtained from the BE. If this view is correct, application of the STS may offer a resolution to the problem that while phenotypic traits frequently appear to be selected (Kingsolver et al. 2001; Kingsolver and Pfennig 2004) and heritable (Roff and Mousseau 1987), there is a paucity of evidence for evolutionary responses in intensively studied populations (Merilä et al. 2001), and indeed phenotypic trends in trait values are often counter to expectations based on the direction of selection (Gienapp et al. 2008).

Here we test this proposal with an analysis of morphological trait variation in Soay sheep, Ovis aries, from the island of Hirta, St. Kilda, Scotland. Many morphological traits in this population have a partial genetic basis of variation (e.g., Robinson et al. 2006; Wilson et al. 2006), and also covary phenotypically with fitness such that they are subject to apparent natural selection (e.g., Preston et al. 2003; Milner et al. 2004). Body size in particular is heritable (Milner et al. 2000; Wilson et al. 2007), and is positively related to survival (Coltman et al. 1999b; Milner et al. 2004) and reproductive success (Coltman et al. 1999a; Preston et al. 2003). However, the body size of Soay sheep has actually declined during the course of the intensive ongoing individual-based study (Wilson et al. 2007; Ozgul et al. 2009). This decline has occurred in part because of within-generation changes in phenotypic distributions of body size, possibly influenced by a partial relaxation of viability selection (Ozgul et al. 2009). At the genetic level, it has proven very difficult to determine whether mean genetic merit of the population has also changed. Wilson et al. (2007) reported a small upward trend in breeding values, but see Hadfield et al. (2010) for a criticism of the methods by which statistical uncertainty was evaluated (the trend is nonsignificant and small in magnitude). Consequently, this system provides an ideal opportunity to test whether empirical application of the STS will (1) produce different predictions of adaptive phenotypic evolution than the BE and (2) yield predictions that are more consistent with the general pattern of stasis in selected phenotypic traits that predominates in data from long-term individual-based field studies (Merilä et al. 2001). We apply the animal model (Henderson 1973; Kruuk 2004) to estimate genetic parameters associated with both the BE and the STS. Specifically, this allows direct estimation of parameters such as heritabilities, phenotypic covariances of traits with relative fitness (i.e., selection differentials), and the genetic covariances of traits with relative fitness. Consequently, we can evaluate both BE-based predictions of evolutionary change and predictions based on the STS in a common framework. This allows unbiased estimation of key parameters in equations (1), (2), and (3), as well as implementation of the statistical tests advocated by Rausher (1992).


We applied linear mixed effect models, specifically various implementations of the animal model (Henderson 1973; Kruuk 2004), to estimate the parameters of the UVBE, MVBE, and the STS for four morphological traits in Soay sheep. All models were fitted by restricted maximum likelihood using ASReml (Gilmour et al. 2002). We first summarize the methodology relating to data collection and the estimation of pedigree before describing the quantitative genetic modeling in full detail.


The Soay sheep population inhabiting Village Bay on the island of Hirta, St. Kilda, has been the subject of intensive, individual-based study since 1985. Each year, extensive censusing and field work are conducted during which the majority of the lambs born in the study area are caught, individually tagged, and tissue samples are obtained to allow the determination of paternity by molecular methods (described below). Each August, a large proportion of the study population is captured and phenotyped for multiple traits, so that multiple measurements may be available for an individual across different years of its life. Here we focus our analysis on weight (in kg; number of observations, $n_{obs}$ = 4337; number of individuals, $n_{ind}$ = 1829); length of the hind leg, measured as metatarsus length (in mm; $n_{obs}$ = 4618, $n_{ind}$ = 2206); horn length (in mm; $n_{obs}$ = 1496, $n_{ind}$ = 878); and scrotal circumference (in mm; $n_{obs}$ = 322, $n_{ind}$ = 114). Note that while many females have horns we treat the latter two traits as sex-limited (i.e., consider horns in males only) and furthermore limit our analysis of horn length to those males with the “normal” morph (i.e., large, strong, curled horns that do not break during fights; Clutton-Brock et al. 1997).

The pedigree information required to parameterize the quantitative genetic models was obtained through a combination of observation (4373 maternities assigned), and molecular paternity assignment using microsatellite and allozyme marker data analyzed with the R package MasterBayes (Hadfield et al. 2006). Horn type and linear and quadratic terms of age were included as additional predictors of paternity in an approach similar to that recently described by Walling et al. (2010). Briefly, we simultaneously analyzed paternity of all lambs born between 1985 and 2009, using the molecular data as previously described (e.g., Overall et al. 2005), but with recent cohorts having been genotyped at a core set of 18 microsatellite loci, and also using unique vectors of candidate sires constructed for each year based on the total set of males known to be alive. MasterBayes estimates the pedigree and the effect of the phenotypic predictors of paternity jointly, providing posterior distributions for both. As expected from previous reports (Pemberton et al. 1999; Robinson et al. 2006), age (both linear and quadratic terms) and horn type are highly significant predictors of paternity (i.e., the posterior distributions of associated coefficients do not overlap zero). We categorically assigned paternity of all lambs for which a particular candidate sire was specified in at least 80% of the samples of the posterior distribution of the pedigree. Our pedigree analysis thus generated 2253 assignments of paternity. The mean individual-level posterior support for these assignments is approximately 98% (see Walling et al. 2010 for detailed discussion of the important distinction between individual- and pedigree-wide level statistical confidence in parentage assignments); this statistic is comparable to the assignment thresholds implemented in the program cervus (Marshall et al. 1998; Kalinowski et al. 2007), which has previously been used for pedigree estimation in this and many other such study systems.
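The 80% assignment rule can be sketched as a simple post-processing step over posterior samples of the pedigree. This is an illustrative sketch only: the function `assign_paternity`, the sire labels, and the sample counts are hypothetical, and it does not use or reproduce the MasterBayes interface.

```python
from collections import Counter


def assign_paternity(posterior_sires, threshold=0.8):
    """Return (sire, support) if one candidate sire reaches the threshold.

    posterior_sires holds the sire sampled for a single lamb in each
    posterior sample of the pedigree. Returns None when no candidate
    reaches the threshold or when there are no samples.
    """
    if not posterior_sires:
        return None
    sire, count = Counter(posterior_sires).most_common(1)[0]
    support = count / len(posterior_sires)
    return (sire, support) if support >= threshold else None


# Hypothetical posterior samples for two lambs.
samples_lamb_a = ["ram_12"] * 95 + ["ram_07"] * 5
samples_lamb_b = ["ram_12"] * 60 + ["ram_07"] * 40

print(assign_paternity(samples_lamb_a))
print(assign_paternity(samples_lamb_b))
```

Under this rule the first lamb receives a categorical assignment and the second is left unassigned, so the mean support among assigned lambs will generally exceed the threshold itself.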

We calculated lifetime breeding success (LBS) as our metric of fitness for both males and females. This quantity was defined as the number of newborn lambs attributed to an individual, as determined either by observation for females or by paternity assignment (as described) for males. LBS was only calculated for individuals born in the study area between 1985 and 2002, but with LBS calculated from lambs born up until and including 2010. This was to reduce bias in estimated fitness of individuals that only visit the study area occasionally, and to reduce censoring bias in those cohorts from which many individuals are still alive. Relative fitness, $w$, was calculated by dividing LBS for each individual by the appropriate sex-specific mean. Note that the true mean LBS must be equal in the two sexes and is known to be slightly greater than 2 because the population has increased in size over the course of the study. However, as life-history data are not complete for all individuals, especially males, this treatment should provide the closest approximation of true relative fitness. In total, estimates of LBS (and thus $w$) are available for 1107 females and 1007 males, including 392 females and 114 males with nonzero estimated LBS.
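The scaling of LBS to relative fitness by sex-specific means can be sketched as follows. The individual identifiers and LBS values are hypothetical, and the function `relative_fitness` is an illustration rather than the code used for the analysis.

```python
from statistics import mean


def relative_fitness(lbs_by_id, sex_by_id):
    """Divide each individual's LBS by the mean LBS of its own sex."""
    sex_means = {
        sex: mean(lbs for ind, lbs in lbs_by_id.items() if sex_by_id[ind] == sex)
        for sex in set(sex_by_id.values())
    }
    return {ind: lbs / sex_means[sex_by_id[ind]] for ind, lbs in lbs_by_id.items()}


# Hypothetical lifetime breeding success for three females and three males.
lbs = {"f1": 0, "f2": 2, "f3": 4, "m1": 0, "m2": 0, "m3": 9}
sex = {"f1": "F", "f2": "F", "f3": "F", "m1": "M", "m2": "M", "m3": "M"}

w = relative_fitness(lbs, sex)
print(w)
```

By construction, mean relative fitness is 1 within each sex, even though the raw LBS means differ between the sexes in the sample.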


We first tested for genetic variation in relative fitness, because in the absence of genetic variance for fitness, genetic covariances of traits with fitness are undefined. We fit the univariate animal model


$$\mathbf{w} = \mu + \mathbf{Z}_1\mathbf{by} + \mathbf{Z}_2\mathbf{a} + \mathbf{Z}_3\mathbf{m} + \mathbf{I}\mathbf{e} \qquad (4)$$

where $\mu$, the population mean, is the only fixed effect, and where $\mathbf{Z}_1$, $\mathbf{Z}_2$, and $\mathbf{Z}_3$ are design matrices relating observations to the random effects of natal year ($\mathbf{by}$), breeding value ($\mathbf{a}$), and maternal identity ($\mathbf{m}$), respectively. $\mathbf{I}$ is an identity matrix and $\mathbf{e}$ is a vector of residual errors. All random effects (and residuals) are assumed to be normal. Elements of $\mathbf{a}$ are further assumed to be drawn from $N(\mathbf{0}, \mathbf{G} \otimes \mathbf{A})$, where $\mathbf{G}$ is the additive genetic variance–covariance matrix (or rather the additive genetic variance in the univariate analyses) and $\mathbf{A}$ is the pedigree-derived additive genetic relatedness matrix. For brevity, we refer to $\mathbf{G}$ and its elements as “genetic,” and other components of variation as “environmental,” although strictly, $\mathbf{G}$ is expected to represent additive genetic variation, and nonheritable genetic variation, such as dominance and epistatic (co)variances, is included in other modeled terms, such as permanent environment effects (see below) and residual terms. We tested the significance of each random effect by comparing the likelihood of this model to one in which the appropriate random effect was omitted. This was done using likelihood ratio tests assuming that the test statistic, twice the difference in log-likelihood between the two models ($2\Delta\ln L$), is $\chi^2$ distributed with one degree of freedom.
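The likelihood ratio test described above can be sketched with the standard library alone, using the identity that the upper tail probability of a $\chi^2$ variable with one degree of freedom equals `erfc(sqrt(x / 2))`. The log-likelihood values below are hypothetical, and `lrt_pvalue_1df` is an illustrative helper, not an ASReml function.

```python
import math


def lrt_pvalue_1df(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and P value, assuming chi-square with 1 df."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods for models with and without one random effect.
stat, p_value = lrt_pvalue_1df(loglik_full=-1000.0, loglik_reduced=-1001.9207)
print(f"2 * delta lnL = {stat:.3f}, P = {p_value:.3f}")
```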

We also applied univariate animal models to partition phenotypic variation in each of the four morphometric traits among potential causal sources of variation, and particularly, to estimate additive genetic variances. For each adult (age one or older) morphometric trait, we then fitted the model


$$\mathbf{z} = \mathbf{X}\mathbf{b} + \mathbf{Z}_1\mathbf{by} + \mathbf{Z}_2\mathbf{a} + \mathbf{Z}_3\mathbf{m} + \mathbf{Z}_4\mathbf{y} + \mathbf{Z}_5\mathbf{pe} + \mathbf{I}\mathbf{e} \qquad (5)$$

where $\mathbf{X}$ is a design matrix relating individual observations to the fixed effects $\mathbf{b}$, which consisted of the population mean and age as a multilevel factor (a separate level for each age class in years). For weight and hind leg length only, additional fixed effects of sex and a sex-by-age interaction were included. Random effects were as for equation (4), but with the addition of measurement year ($\mathbf{y}$) and a permanent environment effect ($\mathbf{pe}$) to account for the nongenetic component of individual-level repeatability. Statistical significance of random effects was assessed by likelihood ratio tests as described above.


We used bivariate animal models to estimate the key parameters for both the UVBE (the selection differentials, or phenotypic covariances between focal traits and relative fitness) and the STS (the genetic covariances between trait and fitness). In each of these models, relative fitness and one of the phenotypic traits were treated together as dependent variables


$$[\mathbf{z}, \mathbf{w}] = \mathbf{X}\mathbf{b} + \mathbf{Z}_1\mathbf{by} + \mathbf{Z}_2\mathbf{a} + \mathbf{Z}_3\mathbf{m} + \mathbf{Z}_4\mathbf{y} + \mathbf{Z}_5\mathbf{pe} + \mathbf{I}\mathbf{e} \qquad (6)$$

This model is very similar in construction to that specified by equation (5), but the vectors of random effects are replaced by matrices describing variation in both dependent variables. Other than the mean, no fixed effects were fit for $w$, whereas the year and residual variance components for $w$ were constrained to be zero. Consequently, residual variance in relative fitness is represented in the permanent environment (co)variance matrix, $\mathbf{PE}$, allowing estimation of the covariance between relative fitness and the nongenetic component of individual-level repeatability (i.e., permanent environment component) of $z$. This is not to imply that we can somehow calculate a metric of the repeatable component of variation for a trait, relative fitness, that is measured only once. Rather, the biologically interesting, nongenetic aspect of the relationship between phenotype and relative fitness is between the residuals of fitness and the repeatable component of trait variation, and this constraint in the mixed model renders this relationship estimable. Critically, in this bivariate animal model with fitness and phenotypic traits treated as dependent variables, the additive genetic effects are assumed to be drawn from the distribution

$$\mathbf{a} \sim N\left(\mathbf{0},\ \begin{bmatrix} \sigma^2_a(z) & \sigma_a(z,w) \\ \sigma_a(z,w) & \sigma^2_a(w) \end{bmatrix} \otimes \mathbf{A}\right) \qquad (7)$$

This definition of $\mathbf{a}$ is simply an explicit bivariate description of the general definition of $\mathbf{a}$ following equation (4), above. We provide this second definition to highlight how the genetic covariance of a trait and relative fitness (i.e., $\sigma_a(z,w)$) can be obtained directly from the solution of the mixed model equations for an animal model.
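The structure implied by the Kronecker product in the bivariate definition of the additive genetic effects can be made concrete with a small sketch. The genetic (co)variances and the two-individual relatedness matrix (a mother and her offspring) are hypothetical, and `kron` is a plain-Python helper written for this illustration.

```python
def kron(g, a):
    """Kronecker product of two matrices given as nested lists."""
    return [
        [g_ij * a_kl for g_ij in g_row for a_kl in a_row]
        for g_row in g
        for a_row in a
    ]


# Hypothetical additive genetic (co)variances for a trait z and fitness w.
G = [[4.0, 0.5],
     [0.5, 0.25]]

# Additive relatedness matrix for a non-inbred mother and her offspring.
A = [[1.0, 0.5],
     [0.5, 1.0]]

# Rows and columns are ordered (z_mother, z_offspring, w_mother, w_offspring).
V = kron(G, A)
for row in V:
    print(row)
```

The element in the first row and last column is the covariance between the mother's breeding value for the trait and her offspring's breeding value for fitness, which is the genetic covariance scaled by their relatedness.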

We then calculated the phenotypic variance of each trait based only on those modeled effects that are associated with the individual, that is, $\sigma^2_i = \sigma^2_a + \sigma^2_{pe}$. We calculated the heritability of each trait as the quotient of $\sigma^2_a$ and this metric of the phenotypic variance. The corresponding selection differential for each trait was then estimated as the phenotypic covariance between trait and fitness, $\sigma_p(z,w)$, and predicted responses from the UVBE were determined as $h^2 S$. The procedure by which standard errors of these predictions were generated (and standard errors of the MVBE, see below) is provided in the appendix.

To test whether the conditions under which the UVBE is predictive hold, we applied the test recently advocated by Morrissey et al. (2010; based on equations presented in Hadfield 2008 and Queller 1992). Specifically, we compared, for each trait, the likelihood of the model specified by equation (6) to that of a model constrained such that $\beta_a = \beta_{pe}$, where $\beta_a$ and $\beta_{pe}$ are the regression coefficients of $w$ on $z$ for the genetic and the nongenetic but repeatable components of phenotype, respectively (assuming that the test statistic, $2\Delta\ln L$, is $\chi^2$ distributed with one degree of freedom). When the assumptions of the BE are met, that is, the trait in question is the sole cause of its covariance with relative fitness, the genetic and phenotypic regressions of relative fitness on phenotype will be equal. Thus, this test treats the case in which the BE is predictive of evolutionary change as the null hypothesis. This is a quantitative implementation of the type of analysis suggested by Rausher (1992), directly testing for differences between the relationship between phenotype and fitness at the phenotypic and genetic levels. This test is unbiased by assumptions made when working with predicted breeding values (Postma 2006; Hadfield et al. 2010), which have been problematic in the tests reported to date. This constraint can be imposed using ASReml by specifying the variance structures with the CHOL command (ASReml user guide release 2, p. 124), in conjunction with cross-covariance structure constraints (ASReml user guide release 2, p. 137). If the unconstrained model fits significantly better, the BE is shown not to be predictive. We similarly fitted constrained models where either $\sigma_a(z,w)$ alone, or both $\sigma_a(z,w)$ and $\sigma_{pe}(z,w)$, were constrained to values of zero to assess the statistical significance of the genetic and phenotypic (or among-individual) covariances of each trait with relative fitness. The former is a direct test of the predicted evolutionary change under the STS (against a null hypothesis of zero change), whereas the latter tests the significance of the selection differential used to parameterize the UVBE.
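The logic of this test can be illustrated with a short sketch that contrasts the UVBE and STS predictions computed from the same set of bivariate (co)variance components. All values are hypothetical, and the function assumes (as a simplification made for this illustration) that the individual-level phenotypic variance and covariance are the sums of their additive genetic and permanent environment parts.

```python
def compare_predictions(var_a, var_pe, cov_a, cov_pe):
    """Contrast UVBE and STS predictions from bivariate (co)variance components.

    Assumes individual-level variance and trait-fitness covariance are the
    sums of additive genetic (a) and permanent environment (pe) parts.
    """
    var_i = var_a + var_pe           # individual-associated phenotypic variance
    s = cov_a + cov_pe               # selection differential (assumed a + pe)
    h2 = var_a / var_i               # heritability
    return {
        "beta_a": cov_a / var_a,     # genetic regression of w on z
        "beta_pe": cov_pe / var_pe,  # nongenetic regression of w on z
        "uvbe": h2 * s,              # breeder's equation prediction
        "sts": cov_a,                # secondary theorem prediction
    }


# Hypothetical case 1: genetic and nongenetic regressions are equal.
equal = compare_predictions(var_a=2.0, var_pe=6.0, cov_a=1.0, cov_pe=3.0)

# Hypothetical case 2: the same selection differential, but no genetic covariance.
unequal = compare_predictions(var_a=2.0, var_pe=6.0, cov_a=0.0, cov_pe=4.0)

print(equal)
print(unequal)
```

When the two regressions are equal the UVBE and STS give the same prediction; when they differ, the UVBE prediction is unchanged (it depends only on the heritability and the total selection differential) while the STS prediction tracks the genetic covariance alone.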


To apply the MVBE, we used a multivariate animal model to determine the additive genetic variance–covariance structure among all four morphometric traits ($\mathbf{G}$). We fitted a model structure equivalent to that specified in equation (5), but where $\mathbf{z}$ is a matrix containing all records from all years for all individuals for all traits. To obtain estimates of the form of multivariate selection on the morphometric traits for which we have repeated measures, we also fitted a mixed model equivalent to that specified by equation (6), where $\mathbf{z}$ again contains records of all traits, in addition to records of estimated relative fitness. However, in this case we did not include the additive genetic random effect because our goal was to estimate the vector of phenotypic selection differentials (to parameterize the MVBE); because the “permanent environment” effect for each individual therefore represents all differences, genetic or otherwise, between individuals, we refer to it as id in this model, to distinguish it from pe, above. We defined the entries in $\mathbf{ID}$ relating to covariances between relative fitness and the repeatable component of phenotypic variation in each of the traits as $\mathbf{S}$ (the vector of selection differentials). We defined the repeatable individual (i.e., arising from genetic and permanent environment effects combined) and residual variance–covariance structures for the morphometric traits as $\mathbf{ID}_z$ and $\mathbf{R}_z$, respectively. We were then able to estimate the vector of selection gradients as

$$\boldsymbol{\beta} = (\mathbf{ID}_z + \mathbf{R}_z)^{-1}\mathbf{S} \qquad (8)$$

which is conceptually equivalent to a standard multiple regression analysis, but appropriate when repeated measures are available for the traits of interest. We include $\mathbf{R}_z$ in equation (8) for comparability to the situation in which a single measurement of the morphometric traits had been made, in which case the phenotypic variance would be the combination of the within- and among-individual variances, which we are able to separate in the current context. Having thus estimated $\mathbf{G}$ and $\boldsymbol{\beta}$, we obtained multivariate BE-based predictions, $\Delta\bar{\mathbf{z}}$, of evolutionary change for the morphological traits by applying the MVBE (in a version called the “Lande equation”; Lande 1979, 1982; Lande and Arnold 1983)

$$\Delta\bar{\mathbf{z}} = \mathbf{G}\boldsymbol{\beta} \qquad (9)$$
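The two-step calculation (selection gradients from the combined among- and within-individual (co)variances, then the response as the product of the genetic matrix and the gradients) can be sketched for two traits. The matrices below are the weight and hind leg length entries of Table 4; using the sum of the permanent environment and additive genetic panels as a stand-in for the among-individual matrix is an assumption made here, because that matrix was estimated in a separate model. The vector of selection differentials is hypothetical, as are the helper functions.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def matadd(m1, m2):
    """Element-wise sum of two matrices given as nested lists."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


# Weight and hind leg length entries from Table 4.
G = [[1.2, 2.4], [2.4, 19.1]]        # additive genetic, panel (c)
PE = [[7.7, 16.1], [16.1, 45.8]]     # permanent environment, panel (b)
R_z = [[8.4, 1.8], [1.8, 20.1]]      # residual, panel (d)
ID_z = matadd(PE, G)                 # assumed stand-in for among-individual matrix

S = [1.0, 2.0]                       # hypothetical selection differentials

beta = matvec(inv2(matadd(ID_z, R_z)), S)   # selection gradients
delta_z = matvec(G, beta)                   # predicted response, G times beta

print("beta:", beta)
print("delta z:", delta_z)
```

Because the selection differentials are hypothetical, the printed values illustrate the arithmetic only and are not estimates for this population.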



We detected significant (additive) genetic variance for LBS (scaled in the analysis as relative fitness), our measure of individual fitness in the Soay sheep (Table 1). Birth year, fitted as a random effect, is also a significant source of variance, and was therefore included for all subsequent analyses in which fitness was modeled. We did not detect a significant maternal effect on LBS, and indeed the maternal variance estimate was constrained at the lower limit (i.e., very close to zero) when estimates were constrained to positive parameter space. On this basis we chose to simplify model fitting by excluding maternal effects on fitness in subsequent analyses. We also detected significant additive genetic variance for the four morphological traits tested: adult weight, leg length, horn length, and scrotal circumference (Table 2). The other fitted random effects were statistically significant sources of variance for all traits, except for scrotal circumference, where both the nongenetic component of individual repeatability (permanent environment effect) and the maternal effects were nonsignificant and estimated to be very small.

Table 1. Univariate animal model-based genetic analysis of relative fitness in Soay sheep (Ovis aries) on St. Kilda. The maternal effect was bounded at zero, and therefore we do not report its parameters; maternal effects on relative fitness were excluded from subsequent analyses. $\sigma^2$ is the variance of each component, and the proportion is the fraction of the total phenotypic variance attributable to each component. Statistical significances were not calculated for the proportional metrics (i.e., heritability, etc.) nor for the residual or total phenotypic variance.

| Component | $\sigma^2$ ± 1 SE | Proportion ± 1 SE |
|---|---|---|
| Birth year | 3.27 ± 1.32*** | 0.0549 ± 0.0211 |
| Genetic | 1.55 ± 0.868** | 0.0259 ± 0.0145 |
| Residual | 54.9 ± 1.89 | 0.919 ± 0.0255 |
| Total phenotypic variance | 59.7 ± 2.17 | |

*P < 0.05; **P < 0.01; ***P < 0.001.

Table 2. Univariate animal model-based genetic analysis of adult morphometric traits in Soay sheep (Ovis aries) on St. Kilda. Variance component ($\sigma^2$) subscripts are as follows: $p$, phenotypic variance conditional on the fixed effects; $i$, the total variance associated with individuals; $pe$, the nongenetic component of individual variance; $a$, the genetic component of the individual variance; $by$, birth year; $y$, measurement year; $m$, mother; and $r$, the residual variance. Statistical significances of the residual variance, and the compound variance terms (i.e., $\sigma^2_p$ and $\sigma^2_i$), were not calculated.

| Trait | $\sigma^2_p$ ± 1 SE | $\sigma^2_i$ ± 1 SE | $\sigma^2_{pe}$ ± 1 SE | $\sigma^2_a$ ± 1 SE |
|---|---|---|---|---|
| Weight | 29.9 ± 2.00 | 9.65 ± 0.651 | 8.10 ± 0.7*** | 1.50 ± 0.70** |
| Leg length | 1.3 × 10² ± 10.6 | 67.9 ± 3.2 | 49.3 ± 3.7*** | 18.60 ± 4.2*** |
| Horn length | 4.6 × 10³ ± 5.2 × 10³ | 2.1 × 10³ ± 1.8 × 10² | 1.4 × 10³ ± 2.2 × 10²*** | 6.6 × 10² ± 2.6 × 10²** |
| Scrotal circ | 6.7 × 10² ± 64.8 | 1.7 × 10² ± 31.8 | 2.3 × 10⁻⁴ ± 1.5 × 10⁻⁵ | 1.7 × 10² ± 31.8*** |

| Trait | $\sigma^2_{by}$ ± 1 SE | $\sigma^2_y$ ± 1 SE | $\sigma^2_m$ ± 1 SE | $\sigma^2_r$ ± 1 SE |
|---|---|---|---|---|
| Weight | 4.0 ± 1.3*** | 4.6 ± 1.5*** | 1.5 ± 0.4*** | 10.2 ± 0.3 |
| Leg length | 32.0 ± 10.2*** | 0.8 ± 0.3*** | 8.4 ± 2.1*** | 19.9 ± 0.6 |
| Horn length | 1.4 × 10³ ± 5.0 × 10²*** | 2.1 × 10² ± 78.0*** | 3.4 × 10² ± 1.4 × 10²** | 5.9 × 10² ± 33.9 |
| Scrotal circ | 21.6 ± 17.5* | 1.3 × 10² ± 57.1*** | 3.8 × 10⁻⁵ ± 2.6 × 10⁻⁶ | 3.5 × 10² ± 24.0 |

*P < 0.05; **P < 0.01; ***P < 0.001.

For all four adult morphometric traits, we found a positive association between the repeatable component of phenotype and fitness (Table 3). This is shown in the estimated selection differentials, which are highly significant for weight, hind leg length, and horn length. Note that the nongenetic component of repeatability was very small for scrotal circumference (see above), and so the estimate of selection on this trait arises almost entirely from the genetic component of trait-fitness covariance, which was found to be positive, but not statistically significant (Table 3). Consequently, the UVBE predicts positive evolutionary change in all traits (Figure 1), although the predictions of change in leg length and scrotal circumference are very small (< 2 mm/generation), and the estimate for scrotal circumference has a very large sampling error relative to the magnitude of the estimate. Estimates of per-generation change ± 1 SE are as follows: weight, 0.57 ± 0.25 kg; hind leg length, 1.58 ± 0.53 mm; horn length, 12.22 ± 5.33 mm; scrotal circumference, 1.59 ± 1.33 mm.

Table 3. Bivariate animal model-based estimates of individual-level covariation between morphometric traits and relative fitness in Soay sheep (Ovis aries) on St. Kilda. The phenotypic covariance between each trait and relative fitness, $\sigma_p(z,w)$, is the selection differential, and the additive genetic covariance of each trait with relative fitness, $\sigma_a(z,w)$, is the prediction of evolutionary change based on the secondary theorem of selection. The two-tailed probability values associated with the test of the null hypothesis of the equivalence of the regressions ($\beta$) of relative fitness on the genetic and the nongenetic but repeatable components of phenotypic variance, $\beta_a$ and $\beta_{pe}$, are presented in the last column. This test pertains to whether the predictions of the univariate breeder's equation and the secondary theorem of selection are consistent with one another. The nongenetic component of the repeatability of scrotal circumference is fixed at the boundary (zero), and so we do not report it or P values from statistical tests that would involve this parameter.

| Trait | $\sigma_p(z,w)$ ± 1 SE | P($\sigma_p(z,w) = 0$) |
|---|---|---|
| Weight | 7.52 ± 0.73 | [missing in source] |
| Hind leg length | 7.48 ± 1.94 | 3.40 × 10⁻⁷ |
| Horn length | 49.5 ± 11.4 | 1.09 × 10⁻⁷ |
| Scrotal circ | | |

| Trait | $\sigma_a(z,w)$ ± 1 SE | P($\sigma_a(z,w) = 0$) | P($\beta_a = \beta_{pe}$) |
|---|---|---|---|
| Weight | −0.011 ± 0.407 | 0.157 | 0.0483 |
| Hind leg length | −1.09 ± 1.28 | 0.291 | 6.393 × 10⁻⁴ |
| Horn length | 12.57 ± 9.08 | 0.074 | 1.000 |
| Scrotal circ | 4.94 ± 4.07 | 0.121 | |

Figure 1. Predictions of phenotypic evolution of adult morphometric traits in Soay sheep (Ovis aries) on St. Kilda using the univariate breeder's equation (UVBE), the multivariate breeder's equation (MVBE), and the empirical application of the secondary theorem of selection (STS). Error bars show standard errors.

The multivariate genetic variance–covariance structure is predominantly characterized by positive relationships among the four adult sheep traits (Table 4). The only exception is a modest negative genetic correlation between leg length and horn length, which is not statistically significant, as inferred from the overlap of its standard error (i.e., an approximate 68% confidence interval) with zero. Although nearly all phenotypic and genetic correlations among traits are positive, the relationships are not so strong that we should reasonably consider the traits, as measured, to be biologically equivalent aspects of the phenotype. For example, the phenotypic and genetic correlations between weight and leg length are only 0.53 and 0.50, respectively, suggesting that these two measures of size capture at least partially distinct aspects of individual phenotype, and should be considered as genetically distinct traits (i.e., the genetic correlation, $r_g$, does not equal +1). An exception to this lies in the estimated genetic correlation between weight and scrotal circumference, which is actually estimated at slightly greater than the upper limit of +1. (Note that while correlations in excess of one are not biologically interpretable as such, convergence of the multivariate animal model was only possible with $\mathbf{G}$ modeled as a completely unstructured and unconstrained matrix.)

Table 4.  Multivariate animal model-based genetic analysis of individual-associated variance components for adult morphometric traits in Soay sheep (Ovis aries) on St. Kilda. Values on the diagonal are variances, covariances are reported below the diagonals and correlations are reported above the diagonals. All values are reported ±1 SE. Traits are (left to right and top to bottom) weight, hind leg length, (male) length of normal morph horns, and (male) scrotal circumference.
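(a) Phenotypic, conditional on all nonindividual associated
                   Weight          Hind leg length   Horn length       Scrotal circ
Weight             17.3 ± 0.5      0.53 ± 0.016      0.50 ± 0.027      0.56 ± 0.030
Hind leg length    20.3 ± 1.0      85.0 ± 2.99       0.35 ± 0.033      0.29 ± 0.042
Horn length        100.3 ± 7.2     158.4 ± 16.89     2350.9 ± 141.2    0.2828 ± 0.043
Scrotal circ       52.2 ± 4.0      61.4 ± 9.64       310.4 ± 50.81     512.5 ± 31.26

(b) Permanent environment
                   Weight          Hind leg length   Horn length       Scrotal circ
Weight             7.7 ± 0.6       0.86 ± 0.03       0.70 ± 0.07       0.90 ± 0.83
Hind leg length    16.1 ± 1.2      45.8 ± 3.48       0.64 ± 0.07       0.69 ± 0.71
Horn length        67.6 ± 8.1      151.8 ± 19.9      1226.8 ± 174.2    0.19 ± 0.42
Scrotal circ       10.8 ± 4.2      20.2 ± 10.5       29.0 ± 62.8       18.7 ± 41.7

(c) Additive genetic
                   Weight          Hind leg length   Horn length       Scrotal circ
Weight             1.2 ± 0.5       0.50 ± 0.15       0.33 ± 0.24       1.13 ± 0.19
Hind leg length    2.4 ± 1.2       19.1 ± 3.9        −0.18 ± 0.20      0.54 ± 0.15
Horn length        9.2 ± 7.6       −19.0 ± 20.5      612.8 ± 202.4     0.54 ± 0.20
Scrotal circ       16.2 ± 4.0      30.7 ± 10.4       174.2 ± 69.4      167.3 ± 49.0

(d) Residual
                   Weight          Hind leg length   Horn length       Scrotal circ
Weight             8.4 ± 0.2       0.14 ± 0.02       0.36 ± 0.03       0.48 ± 0.04
Hind leg length    1.8 ± 0.3       20.1 ± 0.6        0.25 ± 0.03       0.13 ± 0.05
Horn length        23.5 ± 2.2      25.5 ± 3.8        511.3 ± 28.4      0.26 ± 0.05
Scrotal circ       25.2 ± 2.3      10.6 ± 4.2        107.2 ± 21.2      326.5 ± 21.8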
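The correlations reported above the diagonal of Table 4 follow from the variances and covariances below it. As a minimal sketch (the function name `cov_to_corr` is ours, and the inputs are the rounded additive genetic point estimates from Table 4c), the following recomputes the genetic correlations and checks whether the estimated matrix is a valid covariance matrix:

```python
import numpy as np

# Additive genetic (co)variances, rounded point estimates from Table 4(c).
# Trait order: weight, hind leg length, horn length, scrotal circumference.
G = np.array([
    [1.2,    2.4,    9.2,   16.2],
    [2.4,   19.1,  -19.0,   30.7],
    [9.2,  -19.0,  612.8,  174.2],
    [16.2,  30.7,  174.2,  167.3],
])

def cov_to_corr(cov):
    """Convert a covariance matrix to a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

R = cov_to_corr(G)

# An unconstrained estimate can fall outside the valid parameter space:
# a correlation above +1 implies at least one negative eigenvalue.
min_eig = np.linalg.eigvalsh(G).min()

print(np.round(R, 2))
print("smallest eigenvalue of G:", min_eig)
```

The weight and scrotal circumference correlation computed this way exceeds +1, so the matrix has a negative eigenvalue, which is what an unconstrained estimate of G permits.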

The vector β contains the partial regression coefficients, or selection gradients, for the (phenotypic) regression of relative fitness on the repeatable component of individual phenotype, which are positive for all traits except scrotal circumference. Based on approximately doubling the standard errors, these selection gradients are statistically significant for weight and horn length only (Table 5). The MVBE-based predictions of evolutionary change are positive for all traits (Table 5, Fig. 1). Despite the negative selection gradient for scrotal circumference, positive evolutionary change is expected for this trait, in large part as a consequence of its strong genetic covariance with body weight (Table 4), which is apparently subject to strong positive selection as estimated here.

Table 5.  Selection gradients (β) and multivariate breeder’s equation-based predictions of the evolution (Δz̄) of adult morphometric traits in Soay sheep (Ovis aries) on St. Kilda.
Trait              Units   Selection gradient (units/inline image) ±1 SE   Predicted change (units) ±1 SE
Weight             kg      0.436 ± 0.060                                   0.547 ± 0.321
Hind leg length    mm      0.018 ± 0.022                                   1.39 ± 0.817
Horn length        mm      8.70 ⋅ 10−3 ± 2.90 ⋅ 10−3                       5.50 ± 4.37
Scrotal circ       mm      −9.55 ⋅ 10−3 ± 5.88 ⋅ 10−3                      7.60 ± 2.11
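The arithmetic behind the multivariate prediction can be sketched as the product of the additive genetic matrix and the vector of selection gradients. The sketch below uses the rounded point estimates from Tables 4c and 5. Because the inputs are rounded, and the published predictions may rest on unrounded estimates, the output approximates rather than reproduces Table 5; the variable names are ours.

```python
import numpy as np

traits = ["weight", "hind leg length", "horn length", "scrotal circ"]

# Additive genetic (co)variances, rounded point estimates from Table 4(c).
G = np.array([
    [1.2,    2.4,    9.2,   16.2],
    [2.4,   19.1,  -19.0,   30.7],
    [9.2,  -19.0,  612.8,  174.2],
    [16.2,  30.7,  174.2,  167.3],
])

# Selection gradients, rounded point estimates from Table 5.
beta = np.array([0.436, 0.018, 8.70e-3, -9.55e-3])

# contrib[i, j] is the part of the predicted change in trait i that is
# due to selection on trait j acting through the genetic covariance G[i, j].
contrib = G * beta
delta_z = contrib.sum(axis=1)  # identical to G @ beta

for name, dz, row in zip(traits, delta_z, contrib):
    print(f"{name:16s} predicted change = {dz:7.3f}  contributions = {np.round(row, 3)}")
```

For scrotal circumference, the direct contribution is negative (its own gradient is negative), but the contribution arising from selection on weight is larger and positive, so the net prediction is positive.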

In contrast to the predictions of both the univariate and multivariate applications of the BE, the predictions based on the application of the STS are negative for weight and for hind leg length, although the estimate is very small in magnitude for weight. Furthermore, the genetic and nongenetic regressions of relative fitness on the repeatable components of phenotypic variation for weight and hind leg length are significantly different (Table 3), with the genetic regressions being smaller than the nongenetic regressions. This result indicates that, for these two traits, the conditions under which the UVBE should be predictive (and equivalent to the STS) do not hold.


We detected covariance of morphometric traits with fitness, additive genetic variance for those traits, and also additive genetic variance for fitness in Soay sheep on St. Kilda. Thus, a number of parameters have values that are consistent with potential for evolutionary change. However, the point estimates of evolutionary change as predicted by the BE, in both its univariate and multivariate forms, and by the STS, are qualitatively different for body weight and leg length. Furthermore, for these two body size traits, the genetic and nongenetic regressions of fitness on phenotype are statistically significantly different, and as such we can demonstrate that the UVBE is not quantitatively predictive. This is because these regressions will differ when the assumption of causality that is required in the UVBE does not hold. In contrast, predictions of evolutionary change based on the alternate approaches are similar for horn size and scrotal circumference. Thus, we have provided an empirical demonstration of the potential for predictions of the BE to differ from those of the STS.

With respect to the evolution of body size, the qualitative contrast between evolutionary predictions based on the BE and the STS is quite stark. The former predicts that sheep should increase in size, while stasis, or indeed evolution of smaller size, is predicted by the latter approach (Fig. 1A, B). This result suggests that the positive (phenotypic) associations between body size and fitness do not arise, at least in totality, from a causal effect of size on fitness. Consequently it seems likely that estimates of positive selection on body size are upwardly biased. This conclusion may provide some resolution to the observation that stasis or counterintuitive phenotypic trends are often found for body size traits (Merilä et al. 2001), in spite of apparently widespread directional selection and heritable variation (e.g., Réale et al. 2003; Kingsolver and Pfennig 2004).

Importantly, although the application of the STS indicates that we should not expect evolution of size, it does not tell us why. Specifically, it provides no answers to the questions of (1) what the true form of natural selection is (with respect to size and/or other traits of interest), or (2) what genetic constraints may be acting to generate the lack of predicted evolution. It may be that body size is not selected, but rather that an unmeasured trait or aspect of the environment is a causative agent of both fitness variation and of trait (size) variation, but that this is independent of any effect of size on fitness. In this scenario, size is heritable and covaries with fitness, but there is in fact no selection (the scenario primarily advocated by Rausher 1992 and Kruuk et al. 2003). Alternatively, size may be positively selected, that is, have a causative effect on fitness, but genetically correlated traits may be antagonistically selected, a situation consistent with the notion that different aspects of the way body size is related to fitness variation should generate trade-offs (Blanckenhorn 2000). In both scenarios, which are not mutually exclusive, the model of selection, genetics, and ultimately evolutionary change provided in the BE is inadequate because of unmeasured quantities (traits, or environmental variation at the level of different experiences of individuals).
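The first scenario can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the sheep data: all effect sizes are hypothetical, the univariate BE is taken in its usual form (heritability multiplied by the phenotypic covariance of the trait with relative fitness), and the true breeding values are known to the simulation, whereas in real data the genetic covariance must be estimated (e.g., with an animal model).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

a = rng.normal(0.0, 1.0, n)   # breeding values for size
u = rng.normal(0.0, 1.0, n)   # unmeasured environmental factor
e = rng.normal(0.0, 1.0, n)   # other environmental effects on size
z = a + u + e                 # observed size

# Fitness depends on the unmeasured factor only: size has no causal effect.
W = 1.0 + 0.1 * u + rng.normal(0.0, 0.1, n)
w = W / W.mean()              # relative fitness

def cov(x, y):
    """Sample covariance of two vectors."""
    return np.cov(x, y)[0, 1]

h2 = a.var(ddof=1) / z.var(ddof=1)
S = cov(z, w)                 # phenotypic covariance of trait and fitness
be_prediction = h2 * S        # univariate breeder's equation
sts_prediction = cov(a, w)    # secondary theorem: genetic covariance

# Genetic and nongenetic regressions of relative fitness on the trait.
env = z - a
b_genetic = cov(a, w) / a.var(ddof=1)
b_nongenetic = cov(env, w) / env.var(ddof=1)

print(f"BE prediction:  {be_prediction:.4f}")
print(f"STS prediction: {sts_prediction:.4f}")
print(f"genetic regression: {b_genetic:.4f}, nongenetic regression: {b_nongenetic:.4f}")
```

In this setup the BE predicts a positive response because size covaries phenotypically with fitness, while the genetic covariance, and hence the STS prediction, is close to zero; the genetic and nongenetic regressions differ accordingly.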

The evolution of horn length and scrotal circumference has not been studied in detail in Soay sheep. Here, the predictions of evolutionary change are more consistently positive across the three models, although we note all predictions are associated with substantial statistical uncertainty. Detailed further analyses, similar to those that have been conducted for body size (Wilson et al. 2007; Ozgul et al. 2009), will have to be conducted for these traits before we can determine whether these evolutionary predictions are consistent with the temporal dynamics of these characters. The current analysis indicates, as is the case for these two traits, that the evolutionary predictions of the UVBE and the STS can be consistent.

The multivariate formulation of the BE is generally presented as an improvement on the univariate form in the sense that it is biologically unrealistic to assume that natural selection acts on single traits in isolation. Although the “missing trait” problem still persists (i.e., one can always posit the existence of an additional factor that should have been included), it seems intuitive that multivariate models must be preferable to univariate models in the sense that at least there will be fewer missing factors. However, there is also a hidden danger in the application of the MVBE. The danger arises from the fact that an inadequate model of selection for one trait will not only result in erroneous evolutionary predictions for that trait (as it would in the application of the UVBE), but that error may also be propagated to predictions for genetically correlated traits included in the analysis. Thus, it is entirely possible that UVBE-based evolutionary predictions may be more accurate than those of the MVBE. Although of course we do not know the “truth” here, it is notable that the predictions of evolutionary change for horn length are similar from UVBE and STS but less so from MVBE. This discrepancy arises from the negative genetic correlation of horn length with leg length, and our model of selection of leg length that we suspect may be inadequate, as discussed above. Similarly, the apparent over-prediction of evolutionary change of scrotal circumference by the MVBE, relative to the prediction based on the STS, may arise from an inadequate model of selection of body mass, combined with the strong genetic correlation of body mass and this trait.

It is of course somewhat unsatisfying to argue that (1) the predictions of the BE and the STS are qualitatively different, and (2) that the STS does not predict evolution of body size, when there are clearly very substantial uncertainties associated with all quantitative predictions (Fig. 1). Nevertheless, further dissection of the heritable and nonheritable components of the relationships between the traits and fitness appears useful in supporting our contention. The test for the inequality of the genetic and nongenetic regressions of fitness on the phenotypic traits provides us with more statistically justifiable grounds for the argument that the UVBE and the STS predictions differ for weight and leg length (Table 3). We hope that this test will prove a particularly useful addition to the set of tools available for studying the genetics, selection, and evolution of traits in natural populations.


Empirical application of the STS is in its infancy. The application of mixed models where both relative fitness and phenotypic traits are treated as dependent variables is a fairly novel methodological approach in this respect (but see Etterson and Shaw 2001; Kruuk et al. 2002; and Morrissey and Ferguson 2011), and to our knowledge, this is the first such application to calculate the phenotypic covariance of a repeated measures trait with relative fitness to apply the BE. A number of reports exist from studies where fitness has been treated as a dependent variable in a mixed model to estimate the genetic variance of fitness (e.g., Price and Schluter 1991; Merilä and Sheldon 2000; Coltman et al. 2005; Teplitsky et al. 2009). Thus, this methodology provides a pragmatic way forward, but several considerations should be noted, and additionally, a number of avenues for refinement of the methods are worth highlighting.

First, some discussion of previous similar analyses is worthwhile. Etterson and Shaw (2001) estimated genetic covariances between traits and relative fitness as predictors of evolution in data from a field experiment, and their approach was directly motivated by Robertson (1966) and Price (1970). In discussing their estimates, however, they suggested that their results might be different had other, unmeasured, traits been included in their analysis. Aside from the general tendency of models with different data to give numerically different results, we feel that their estimates should hold, because the genetic covariances of traits with relative fitness should hold as evolutionary predictors, regardless of whether they are generated by direct or indirect selection. A number of estimates of the genetic basis of trait-fitness relationships have been based on the regression of fitness on predicted breeding values (e.g., Kruuk et al. 2001, 2002). These have been informative in that they have demonstrated the potential for trait-fitness relationships to differ at the phenotypic and genetic levels. However, this approach is biased, potentially severely, toward the form of the environmental relationship between traits and fitness, and thus can be biased both in magnitude and direction (Postma 2006; Hadfield et al. 2010). This bias arises from the fact that errors in the prediction of breeding values represent a portion of the environmental variance. Finally, other investigations of the genetic basis of phenotype-fitness relationships have been less quantitative, at least with respect to the key parameter of the STS: Robinson et al. (2006), Morrissey and Ferguson (2011), and Kruuk et al. (2002) made inferences about the genetic basis of phenotype-fitness relationships, with respect to selection and evolutionary change, based on correlations of traits with absolute, rather than relative, fitness. These tests were thus not quantitatively predictive of evolutionary trajectories, but were unbiased.

The fitting of mixed models of the type used here relies on an assumption of normality of residuals. Because it is not always clear exactly what the consequences of violating this assumption will be, it is best to design studies in such a way that comparisons are made among quantities that are estimated in similar ways. The key comparisons that we present are designed this way: the covariances of traits with fitness that are at the heart of the BE (both univariate and multivariate) and the STS, that is, the phenotypic and genetic covariances of the traits with relative fitness, were all estimated in models with the same assumptions about the distribution of fitness residuals. Similarly, the comparisons of the genetic and nongenetic regressions of fitness on the traits were conducted in such a comparable manner.

However, it is possible that quantitative inferences of the sort we report here could be obtained from models that allow more sophisticated treatment of fitness distributions. Multivariate generalized linear mixed models (Bolker et al. 2009) should provide a means of testing relationships among nonnormal dependent variables, and allow estimation of covariance components between normally and nonnormally distributed traits. This is possible by assuming normality on an imaginary underlying scale, and linking observed data to this scale with a convenient “link” function. Sources of covariance among traits, or in the current context, among traits and fitness can then be estimated on this imaginary (but very useful) scale. Of course natural selection is probably best thought of as acting on the observed phenotype, rather than on some statistically convenient link scale, although this problem is not insurmountable. Both exact (Rice 2004) and numerical approximation techniques could be devised to recover observed-scale parameters from generalized linear mixed models. Recovering estimates of (observed) trait-relative fitness covariances may not be a particularly complex task, although evaluating differences between the genetic and nongenetic regressions of fitness on traits may be much more difficult. Additionally, because fitness is commonly measured as a count (e.g., LBS), it will not be normally distributed (and neither will be its residuals from any plausible model we can readily conceive of). However, it may not readily conform to a Poisson distribution either, but rather may often require a distribution that allows overdispersion (e.g., overdispersed Poisson). Again this is possible in a generalized linear mixed model framework, but derivation of the critical parameters on the observed scale will be more difficult still. Currently the most sophisticated available software for fitting generalized linear mixed models, the R package MCMCglmm (Hadfield 2010), cannot readily accommodate multivariate models where some traits have repeated records whereas others (e.g., fitness) do not. We also note that the numerical problems associated with these kinds of models are enormous in the most flexible framework in which fitting these generalized linear mixed models might be conducted, that is, the BUGS (Lunn et al. 2000) or JAGS (Plummer 2010) Bayesian programming environments.
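To illustrate why recovering an observed-scale covariance between a trait and relative fitness need not be complex, consider one special case, which is our assumption rather than a model fitted in this study: a normally distributed trait, Poisson-distributed fitness with a log link, and bivariate normality of the trait and the link-scale fitness value. In that case the covariance of the trait with relative fitness on the observed scale equals the covariance on the link scale (a consequence of Stein's lemma for jointly normal variables). The sketch below checks this numerically with hypothetical parameter values.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400_000

# Hypothetical covariance matrix of (trait z, link-scale fitness l).
# The link-scale variance of l generates overdispersion in the counts.
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
link_scale_cov = Sigma[0, 1]

z, l = rng.multivariate_normal([0.0, 0.0], Sigma, size=n).T

# Observed fitness: a Poisson count whose expectation is exp(l) (log link).
W = rng.poisson(np.exp(l))
w = W / W.mean()              # relative fitness on the observed scale

observed_cov = np.cov(z, w)[0, 1]

print(f"link-scale covariance:      {link_scale_cov:.3f}")
print(f"observed-scale cov(z, w):   {observed_cov:.3f}")
```

Other link functions and error distributions would generally need numerical integration, and the equivalence shown here does not extend to the regressions of fitness on traits.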

Finally, it is of note that there is little tradition in evolutionary quantitative genetics of reporting uncertainties associated with evolutionary predictions, whether obtained by application of the BE, or otherwise. Our attempts to calculate standard errors around predictions of evolutionary change should be considered approximations for a number of reasons. First, our estimated standard errors are all based on normal approximations of the sampling error. Second, our estimates for the UVBE assume independence between the estimates of heritability and selection (and estimates of the MVBE assumed independence between G and β). Third, and perhaps most importantly, all estimates are derived from mixed models that assume normality of residuals, as discussed above. Consequently, the standard errors we report here should not be used for hypothesis testing. However, we have included them to provide some feel of the relative uncertainties associated with the different estimated parameters and predictions that we present. In that context, it is certainly of note that the uncertainties in the evolutionary predictions based on the STS are not vastly greater than those based on the BE.

Associate Editor: M. Reuter


We thank the National Trust for Scotland and Scottish Natural Heritage for permission to work on St. Kilda, and the Royal Artillery Range (Hebrides) and QinetiQ for logistic support. We thank J. Hadfield and I. White for useful comments and discussions. The Soay sheep data were collected primarily by field assistants J. Pilkington, A. MacColl, T. Robertson, J. Kinsley, and many volunteers. The long-term data collection on St. Kilda has been supported by the Natural Environment Research Council, the Wellcome Trust, the Biotechnology and Biological Sciences Research Council, and the Royal Society. MBM is supported by a postdoctoral fellowship from the Natural Sciences and Engineering Research Council of Canada, LEBK was supported by a University Research Fellowship from the Royal Society, AJW is supported by a David Phillips Research Fellowship from the BBSRC, and PK is supported by a Marie Curie International Training Fellowship.



Unless specifically noted here, all standard errors were obtained directly from asreml, which uses the delta method (Lynch and Walsh 1998). For example, the standard errors of inline image, inline image and inline image were obtained this way, fully accounting for sampling variances of each parameter and the sampling covariances among the component parameters. No methods are implemented in asreml to obtain the standard errors of the product of the heritability and the selection gradient, and so we obtained approximate standard errors (indeed they are all approximate) assuming that squared relative standard errors are additive, that is,


where we use inline image followed by square brackets to denote sampling error. This approximation of the standard error of inline image assumes that the sampling covariance between inline image and inline image is zero. Although this may not be the case, we do expect it to be small in magnitude relative to the sampling variances of the terms in the summations (i.e., sampling variances of components of inline image and inline image). It is therefore our expectation that this should be a useful approximation for the standard error on the predicted selection response.
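The rule that squared relative standard errors are additive can be sketched for a generic product of two estimates with zero sampling covariance. The function name and the numerical inputs below are hypothetical and chosen only to show the arithmetic; they are not estimates from this study.

```python
import math

def se_of_product(x, se_x, y, se_y):
    """Approximate SE of the product x * y.

    Assumes the squared relative standard errors of x and y add,
    i.e., that the sampling covariance between x and y is zero.
    """
    relative_variance = (se_x / x) ** 2 + (se_y / y) ** 2
    return abs(x * y) * math.sqrt(relative_variance)

# Hypothetical heritability and selection estimates with their SEs.
h2, se_h2 = 0.30, 0.05
sel, se_sel = 0.50, 0.10

response = h2 * sel
se_response = se_of_product(h2, se_h2, sel, se_sel)

print(f"predicted response = {response:.3f} ± {se_response:.3f}")
```

A positive sampling covariance between the two estimates would make the true standard error larger than this approximation, and a negative one smaller.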

Similarly, no automated methods are available for calculation of standard errors for predictions of the MVBE, so we adopted a Monte Carlo simulation approach. We approximated the standard errors for the estimates of inline image and inline image by simulating 1000 replicate multivariate normal matrices each of inline image and inline image, using the maximum likelihood estimates of these matrices as the mean vectors, and the variance–covariance matrices of their component parameters as the variance. Specifically, we sampled


where the vector symbol (inline image) denotes a vector of the elements of a variance–covariance matrix, where inline image indicates a parametric bootstrap replicate sample of the vector of parameters, where the bar indicates the mean estimates, and where inline image indicates the sampling variances and covariances of the parameters. We then applied equations (8) and (9) to each of the 1000 sets of bootstrap matrices and took the standard deviations of the resulting 1000 estimates of inline image and inline image to be the standard errors.
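The parametric bootstrap described above can be sketched as follows. Several assumptions are ours and should be read as such: equations (8) and (9) lie outside this section, so we assume the usual forms (selection gradients as the inverse phenotypic matrix multiplied by the vector of phenotypic trait-fitness covariances, and predicted change as G multiplied by the gradients); the sampling covariance matrices are taken as diagonal, built from standard errors only, whereas the procedure in the text uses the full sampling variances and covariances; and the trait-fitness covariances `s_hat` are hypothetical. The G and P entries are the rounded Table 4 values for weight and hind leg length.

```python
import numpy as np

def vech_to_sym(v, k):
    """Rebuild a symmetric k x k matrix from its lower-triangle elements
    (row-wise order: m[0,0], m[1,0], m[1,1], ...)."""
    m = np.zeros((k, k))
    m[np.tril_indices(k)] = v
    return m + np.tril(m, -1).T

def mvbe_bootstrap_se(g_hat, V_g, p_hat, V_p, s_hat, V_s, n_rep=1000, seed=1):
    """Monte Carlo SEs for selection gradients and predicted responses."""
    rng = np.random.default_rng(seed)
    k = len(s_hat)
    g_draws = rng.multivariate_normal(g_hat, V_g, size=n_rep)
    p_draws = rng.multivariate_normal(p_hat, V_p, size=n_rep)
    s_draws = rng.multivariate_normal(s_hat, V_s, size=n_rep)
    betas = np.empty((n_rep, k))
    responses = np.empty((n_rep, k))
    for i in range(n_rep):
        G = vech_to_sym(g_draws[i], k)
        P = vech_to_sym(p_draws[i], k)
        betas[i] = np.linalg.solve(P, s_draws[i])  # assumed form of eq. (8)
        responses[i] = G @ betas[i]                # assumed form of eq. (9)
    # The SD across replicates is taken as the standard error.
    return betas.std(axis=0, ddof=1), responses.std(axis=0, ddof=1)

# Weight and hind leg length: (variance, covariance, variance) from Table 4.
g_hat = np.array([1.2, 2.4, 19.1]);   g_se = np.array([0.5, 1.2, 3.9])
p_hat = np.array([17.3, 20.3, 85.0]); p_se = np.array([0.5, 1.0, 2.99])
# Hypothetical phenotypic trait-fitness covariances and their SEs.
s_hat = np.array([1.0, 2.0]);         s_se = np.array([0.2, 0.5])

se_beta, se_response = mvbe_bootstrap_se(
    g_hat, np.diag(g_se**2), p_hat, np.diag(p_se**2), s_hat, np.diag(s_se**2)
)
print("SE of gradients:", se_beta)
print("SE of predicted responses:", se_response)
```

Replacing the diagonal matrices with the full sampling covariance matrices reported by the fitting software would bring the sketch closer to the procedure described in the text.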