Fluctuating (nondirectional) asymmetry (FA) of bilaterally paired structures on a symmetrical organism is commonly used to assay the developmental instability (DI) caused by environmental or genetic factors. Although evidence for natural selection to reduce FA has been reported, evidence that FA (and by extension DI) is heritable is weak. We report the use of artificial selection to demonstrate heritable variation in the fluctuating asymmetry of interlandmark distances within the wing in an outbred population of Drosophila melanogaster. Our estimates for the heritability of FA range from 0% to 1% and result in estimates for the heritability of DI as large as 20%, comparable to values typical for life-history traits. These values indicate the existence of evolutionarily relevant genetic variation for DI and the effectiveness of selection for reduced FA suggests that natural selection has not fixed all the genetic variants that would improve developmental stability in these populations.

As an organism develops it is subject to environmental perturbations. The degree to which these perturbations affect the developmental process and increase phenotypic variation is termed developmental instability (DI) (Van Valen 1962; Palmer and Strobeck 1986; Parsons 1990; Polak 2003; Leamy and Klingenberg 2005). Nongenetic variation in every trait is influenced by DI, including differences among twins and clones, developmental disorders such as cleft palate syndrome, and the exposure of latent genetic variation under stress (Rutherford and Henikoff 2003). Because DI is detected from phenotypic variation of traits with the same genotype, it is by definition an epigenetic phenomenon (Parsons 1990; Sollars et al. 2003). Nondirectional departures of paired structures from perfect symmetry, fluctuating asymmetry (FA), have been used as an indicator of DI (Whitlock 1998).

Intuition suggests that DI is deleterious to fitness, and there is direct evidence that natural selection acts to minimize FA in wild populations (Martín and López 2000; Santos 2001) and humans (Gangestad et al. 2001), although the generality and strength of this relationship are controversial (Lens et al. 2002). Interest has therefore been considerable in the genetic basis of DI (Leamy and Klingenberg 2005) and in whether additive genetic variation (VA) exists that would allow DI to respond to natural selection on FA (Fuller and Houle 2003). Because the relationship between FA and DI is weak, even a very small amount of additive genetic variance in FA could indicate a very large amount of genetic variation in DI (Houle 2000; Van Dongen et al. 2005), with correspondingly large effects on fitness.

Published evidence for additive genetic variance in FA is weak. Previous well-designed studies on the inheritance of FA using either parent–offspring regression or nested half-sibling designs have occasionally found statistically significant VA (Scheiner et al. 1991), but only at the rate expected as a result of Type I statistical error (Fuller and Houle 2003). Simulation results suggest that artificial selection on FA has much greater power to detect VA in FA (Fuller and Houle 2002), in particular when selection is for increased FA. One of the most convincing demonstrations of genetic variation in DI was obtained through artificial selection in a cross between two inbred lines (Mather 1953), although no genetic parameters were estimated. Two subsequent selection experiments have yielded equivocal results (Reeve 1960; Breuker and Brakefield 2003). We performed artificial selection to increase and decrease the FA of interlandmark distances within the wing in an outbred laboratory population of Drosophila melanogaster. Measurements of FA recorded during each generation allow us to estimate genetic variation in DI via the response of FA.



Wing measurements were made with the WINGMACHINE system (Houle et al. 2003). Flies were anesthetized with CO2, each wing was briefly drawn into a suction device, and a digital image of the wing was obtained. The user then marked two landmarks on each wing, which the WINGMACHINE software used in an algorithm that automatically fits a B-spline model to the locations of all the wing veins (Fig. 1). The parameters of this model were then used to calculate the appropriate selection index as described below. Before selection, the images of wings with the most extreme FA values were rechecked for splining errors and were reimaged and resplined if necessary to ensure that gross measurement errors were avoided.

Figure 1.

Measurement of FA. (A) Drosophila wing showing a B-spline model fit to the vein locations. Digital images of both wings were acquired from live flies and the model fit to the location of veins by automated image analysis (Houle et al. 2003). (B) Eight distances between vein intersections used to calculate the FA index.


We selected on an FA index based on the size-scaled asymmetries of eight distances between wing-vein intersections (Fig. 1). For the ith distance in the jth individual, size-scaled FA was calculated as $FA_{ij} = 2|D_{ijL} - D_{ijR}|/(D_{ijL} + D_{ijR})$, where $D_{ijL}$ and $D_{ijR}$ are the ith distance on the jth fly's left and right wings, respectively. Each $FA_{ij}$ was standardized by the mean for that sex, generation, and line, $\overline{FA}_i$, yielding the selection index $\sum_i FA_{ij}/\overline{FA}_i$. Because the means, $\overline{FA}_i$, change over the course of the experiment, scaling the contribution of each distance to the FA index by the current mean tends to equalize the contributions of the different distances to the overall index (i.e., it prevents the FA of a single distance from increasing and dominating the overall index). Although selection was performed using the index described above, analyses and reported FA values are for the mean and individual FA values themselves (typically size-scaled).
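As a rough illustration of the index calculation, the sketch below computes size-scaled FA for each distance and then combines the distances into a per-fly index. It assumes that the index is the sum over distances of each FA value divided by the group mean for that distance; the function names and the input layout are hypothetical and are not taken from the WINGMACHINE software.

```python
from statistics import mean


def size_scaled_fa(left, right):
    """Size-scaled FA for one distance: 2|L - R| / (L + R)."""
    return 2.0 * abs(left - right) / (left + right)


def fa_index(flies):
    """Compute per-distance FA and a mean-standardized index per fly.

    flies: list of flies from one sex, generation, and line; each fly is a
           list of (left, right) measurements, one pair per distance.
    Returns (fa, index), where fa[j][i] is the FA of distance i in fly j and
    index[j] is the sum over distances of FA divided by that distance's
    group mean (assumed form of the index).
    """
    fa = [[size_scaled_fa(l, r) for (l, r) in fly] for fly in flies]
    n_dist = len(fa[0])
    # Group mean FA for each distance; must be nonzero for the scaling to work.
    dist_means = [mean(row[i] for row in fa) for i in range(n_dist)]
    index = [sum(row[i] / dist_means[i] for i in range(n_dist)) for row in fa]
    return fa, index
```

Because each distance is divided by its own group mean, the index averages to the number of distances within a group, whatever the absolute FA of any single distance.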

The use of an index based on many values of FA is expected to reflect any underlying variation in DI better than the FA of any one trait (Leung et al. 2000). These distances were chosen from among all possible distances on the basis of low directional asymmetry, low correlations for FA, and low measurement error in the initial populations. Two-way mixed model ANOVAs of repeated measurements (Palmer 1994) confirmed that each of the eight distances had highly significant (P < 0.0001) side by fly interactions, indicating that measurement error was low enough to measure departures from symmetry reliably. The variance component associated with the side by fly interaction is an estimate of the true variance between sides. The median measurement error as a percentage of the between-sides variance was just 4.9% (mean 6.4%) and ranged from 3.2% for distance 4 to 12.7% for distance 1. The proportion of the variance among distances that was due to FA had a median over distances of 11% (mean 13%). This and other statistical analyses were performed in SAS 9.1.

The use of size-scaled FA values in our selection index (i.e., FA2 of Palmer and Strobeck 1986) minimizes selection on size, and therefore any indirect response in FA due to selection on size. The mean non-size-scaled FA value had a small positive correlation with size. To quantify the effect of our scaling, we calculated the correlations between size and each mean FA value (scaled and unscaled) in each line, sex, and generation over the entire experiment, then averaged them together. The mean correlation was −0.01522 ± 0.0055 for the mean of the scaled FA values and 0.03719 ± 0.0055 for the mean of the unscaled FA values. Both correlations are small but significantly different from 0. Size scaling resulted in a sevenfold reduction in $R^2$ for mean FA on size, from 0.14% for the unscaled mean FA index to 0.02% for the scaled mean FA.


In each generation, in each experimental line, 100 individuals of each sex were measured and 25 individuals of each sex were selected based upon their size-scaled and standardized FA index to continue the line. Two replicates were selected for increased FA values (U1 and U2, the Up treatments) and two for decreased FA values (D1 and D2, the Down treatments) starting from the same base population. An additional pair of unselected lines, each corresponding to one replicate, was maintained in parallel at a population size of 25 individuals of each sex. The base population for all lines was derived from the LHM stock, originally established from 400 D. melanogaster iso-female lines collected by L. Harshman in central California in 1991 and maintained since that time by L. Harshman (1991–1995) and W. R. Rice (1995–2004). This experiment was initiated in mid-2004, soon after the population arrived in the Houle lab. In our hands, the flies were fed a standard sucrose, corn flour, brewer's yeast medium and maintained in 45-mL vials under a 12:12 L:D cycle at 25°C.
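The selection step described here amounts to truncation selection within each sex and line. A minimal sketch, assuming each measured fly already has an FA index value, is shown below; the function names and the dictionary layout are hypothetical.

```python
from statistics import mean


def select_parents(scores, n_select=25, direction="up"):
    """Pick the n_select flies with the highest ("up") or lowest ("down") index.

    scores: dict mapping fly id -> FA index, for one sex in one line.
    """
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    ranked = sorted(scores, key=scores.get, reverse=(direction == "up"))
    return ranked[:n_select]


def selection_differential(scores, selected_ids):
    """Mean of the selected flies minus the mean of all measured flies."""
    return mean(scores[i] for i in selected_ids) - mean(scores.values())
```

With 100 flies measured and 25 retained per sex, the same routine serves both treatments; only the `direction` argument differs. The differential here is computed on whatever values are passed in, so FA values can be substituted for the index.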

Lines U1 and D1 were collected as virgins, measured, and selected in the same weeks, and U2 and D2 in the alternating weeks. For rearing, the 50 selected individuals were randomly mated in vials in sets of five males and five females and allowed to lay eggs. These flies were transferred to new vials after approximately 48 and 72 h, except in the first generation (see below). Offspring were collected as virgins from these vials approximately eight to nine days later. Most measured flies came from the first set of vials; we used flies from the second vials only in exceptional cases of low yield. Selection was carried out for 43 generations and was halted because of scheduling issues rather than because of a declining response or reduction in viability, the usual reasons in many selection experiments. In generations 24 through 33, males from the U2 line were inadvertently selected randomly. In generation 36, both males and females from the U1 line were inadvertently selected randomly. These generations were included in our analysis, although their selection differentials were consequently low.

FA values were markedly higher in the initial generation of selection than in subsequent generations. A likely explanation is that larval density was higher in that generation, as parents were transferred on a schedule different from that in subsequent generations. Because of this discrepancy, analyses including generation 0 slightly inflate the response to selection in the Down lines and depress it in the Up lines. Data from the initial generation were therefore omitted from the reported analyses. The overall conclusions about selection response remained qualitatively unchanged when all generations were analyzed together.


To estimate realized heritability of FA, we took explicit account of the sampling dependencies among generations using a generalized least squares (GLS) approach (Lynch and Walsh 1998). GLS analysis was performed on the selection differential in each generation and the corresponding selection response in the next generation. The realized heritability was calculated from

$$h^2_{FA} = (X^{T} W^{-1} X)^{-1} X^{T} W^{-1} R,$$

where X is the vector of selection differentials, W is a square matrix that incorporates the sampling variance of the selection responses on the diagonal and the sampling covariances between successive responses on the off diagonal, and R is the vector of observed responses to selection. The standard error of $h^2_{FA}$ is

$$SE(h^2_{FA}) = \sqrt{(X^{T} W^{-1} X)^{-1}}.$$

A separate GLS analysis using the cumulative selection differentials and responses generated extremely similar estimates. The heritability of DI was estimated as $h^2_{DI} = h^2_{FA}/\Re$, where $\Re = 2/\pi - (\pi - 2)/(\pi CV^2_{FA})$ (Whitlock 1998).
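The estimation steps can be sketched in a few lines. The code below assumes the standard single-regressor GLS estimator through the origin, $(X^{T} W^{-1} X)^{-1} X^{T} W^{-1} R$, with standard error $\sqrt{(X^{T} W^{-1} X)^{-1}}$, and then applies the conversion from the heritability of FA to that of DI given in the text. How W is built from the sampling variances and covariances is not shown; the helper names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def realized_h2(x, r, w):
    """GLS slope of responses r on selection differentials x, and its SE.

    w is the (symmetric) sampling covariance matrix of the responses.
    """
    w_inv_x = solve(w, x)
    w_inv_r = solve(w, r)
    xtwx = sum(xi * vi for xi, vi in zip(x, w_inv_x))
    xtwr = sum(xi * vi for xi, vi in zip(x, w_inv_r))
    return xtwr / xtwx, math.sqrt(1.0 / xtwx)


def repeatability(cv2_fa):
    """Repeatability of FA: 2/pi - (pi - 2)/(pi * CV^2_FA)."""
    return 2.0 / math.pi - (math.pi - 2.0) / (math.pi * cv2_fa)


def h2_di(h2_fa, cv2_fa):
    """Heritability of DI implied by the heritability of FA.

    Only meaningful when repeatability(cv2_fa) is positive.
    """
    return h2_fa / repeatability(cv2_fa)
```

The repeatability is zero when $CV^2_{FA} = (\pi - 2)/2$, the value expected for the absolute value of a normal variable with no variation in DI among individuals, so small excesses of $CV^2_{FA}$ over that value translate small $h^2_{FA}$ into much larger $h^2_{DI}$.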


The possibility that differing degrees of inbreeding in the lines may account for differences in FA between the lines was explored by assaying the fitness of each replicate line. At the conclusion of the selection experiment, 100 random pairs of individuals were mated within each of the selected lines and the unselected control lines and allowed to lay eggs for 48 h in 45-mL vials; all offspring eclosing from those vials were counted. The result is a composite measure of fecundity and viability.

The possibility that measured FA changed through an effect on measurement error was explored via the following assay. At generation 38 of selection, we measured left and right wings of 50 individuals of each sex from each selection line twice to determine whether measurement error had changed as a result of selection. As usual, lines U1 and D1 were measured in one week, and U2 and D2 in the following week. One female fly in line U2 was a strong outlier for FA, because of a genuinely aberrant wing, and was omitted from the analysis. We performed a Levene's test in the form of an ANOVA on the absolute values of the deviations between repeated measurements of the same distance (Palmer and Strobeck 1992), followed by separate analyses of each of the eight component FA distances.
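The idea behind this test can be sketched as a one-way ANOVA on the absolute differences between repeated measurements. This is a deliberate simplification: the analysis reported here was run in SAS with treatment, sex, side, and week effects, whereas the hypothetical functions below compare groups on a single factor and return only the F statistic and its degrees of freedom.

```python
from statistics import mean


def abs_deviations(pairs):
    """Absolute difference between two repeated measurements of a distance."""
    return [abs(first - second) for first, second in pairs]


def one_way_f(groups):
    """One-way ANOVA F statistic.

    groups: dict mapping group label -> list of values.
    Returns (F, df_between, df_within); undefined if within-group
    variation is exactly zero.
    """
    all_vals = [v for vals in groups.values() for v in vals]
    grand = mean(all_vals)
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def levene_on_repeats(repeats_by_group):
    """Levene-style test: ANOVA on |measurement 1 - measurement 2| per group."""
    return one_way_f({k: abs_deviations(p) for k, p in repeats_by_group.items()})
```

A large F would indicate that the groups differ in measurement error, which is the pattern that would be needed for measurement error to mimic a response in FA.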


Figure 2 shows the mean of the eight size-scaled FA values (see Methods) relative to the starting value during 43 generations of selection. At the beginning of the experiment, the degree of asymmetry in the separate wing intersection distances ranged from approximately 1.5% (distances d1 and d3) to 2.2% (distance d8) with a mean of 1.8%. Overall FA as measured by the mean size-scaled value increased an average of 23% in the Up treatments and decreased an average of 8% in the Down treatments.

Figure 2.

Mean FA over time. Mean size-scaled FA relative to the first generation plotted over the final 43 generations of selection. Symbols for lines are: U1 (dark blue triangles), U2 (light blue triangles), D1 (red circles), D2 (orange circles). Generation 0 data are omitted because of a large environmentally caused difference in mean FA (see Methods).

Figure 3 shows the relationship between the cumulative response and the cumulative selection differential in the mean size-scaled FA value; the slope of this relationship is the realized narrow-sense heritability. Heritability (±SE) of the mean size-scaled FA value was estimated to be 0.0097 ± 0.0009 in U1, 0.0059 ± 0.00011 in U2, 0.0000 ± 0.00011 in D1, and 0.0029 ± 0.00011 in D2.

Figure 3.

Plot of cumulative response versus cumulative selection differential of mean size-scaled FA for the last 43 generations of selection. Symbols for lines are: U1 (dark blue triangles), U2 (light blue triangles), D1 (red circles), D2 (orange circles). Generation 0 data are omitted because of a large environmentally caused difference in mean FA (see Methods).

Table 1 presents the complete set of $h^2_{FA}$, $I_{A.DI}$, and $h^2_{DI}$ values for each trait in each line. The heritability of DI ($h^2_{DI}$) for each individual distance trait was estimated via a standard model of the relationship between DI and FA (Whitlock 1998). Mean heritabilities of DI were estimated to be 0.19 in U1, 0.16 in U2, 0.01 in D1, and 0.10 in D2, much lower than some previous estimates (Fuller and Houle 2003), but comparable to values typical for life-history traits (Houle 1992). The ability of DI to evolve is best interpreted through the parameter $I_A$, a simple function of the heritability of FA (Pélabon et al. 2004), which gives the percentage change in a trait when subject to selection as strong as that on fitness (Houle 1992; Hansen et al. 2003). Mean $I_{A.DI}$ values are 0.63% in U1, 0.37% in U2, −0.04% in D1, and 0.04% in D2, suggesting that sustained selection can lead to substantial increases in DI, but at best modest decreases.

Table 1.  Realized heritability estimates for selected lines. Realized heritability of size-scaled FA ($h^2_{FA}$) and of DI ($h^2_{DI}$), and opportunity for response to selection in DI ($I_{A.DI}$), for each distance. $h^2_{FA}$ was estimated by generalized least squares from the generation-by-generation selection differentials and responses. Note that $h^2_{FA}$ and its SE are multiplied by 100. $I_{A.DI} = h^2_{FA} CV^2_{FA}$, where $CV^2_{FA}$ is the variance in FA divided by the square of the mean FA (Pélabon et al. 2004). $CV^2_{FA}$ was calculated for each generation in each replicate, and the values for each replicate were then averaged. The heritability of DI was estimated as $h^2_{DI} = h^2_{FA}/\Re$, where $\Re = 2/\pi - (\pi - 2)/(\pi CV^2_{FA})$ (Whitlock 1998).
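| Distance | U1 $h^2_{FA} \times 10^2$ | U1 SE $\times 10^2$ | U1 $I_{A.DI}$ (%) | U1 $h^2_{DI}$ | U2 $h^2_{FA} \times 10^2$ | U2 SE $\times 10^2$ | U2 $I_{A.DI}$ (%) | U2 $h^2_{DI}$ |
|---|---|---|---|---|---|---|---|---|
| d1 | 0.74 | 0.29 | 0.47 | 0.12 | 0.18 | 0.35 | 0.11 | 0.05 |
| d2 | 1.00 | 0.11 | 0.61 | 0.22 | 0.68 | 0.13 | 0.40 | 0.28 |
| d3 | 1.32 | 0.19 | 0.90 | 0.13 | 0.60 | 0.21 | 0.37 | 0.10 |
| d4 | 0.87 | 0.11 | 0.54 | 0.19 | 0.63 | 0.13 | 0.38 | 0.17 |
| d5 | 0.89 | 0.12 | 0.54 | 0.26 | 0.69 | 0.14 | 0.41 | 0.27 |
| d6 | 1.33 | 0.26 | 0.87 | 0.18 | 1.15 | 0.36 | 0.75 | 0.16 |
| d7 | 0.96 | 0.22 | 0.59 | 0.23 | 0.42 | 0.22 | 0.25 | 0.17 |
| d8 | 0.93 | 0.12 | 0.57 | 0.21 | 0.51 | 0.14 | 0.31 | 0.12 |
| Mean | 1.01 | | 0.63 | 0.19 | 0.61 | | 0.37 | 0.16 |

| Distance | D1 $h^2_{FA} \times 10^2$ | D1 SE $\times 10^2$ | D1 $I_{A.DI}$ (%) | D1 $h^2_{DI}$ | D2 $h^2_{FA} \times 10^2$ | D2 SE $\times 10^2$ | D2 $I_{A.DI}$ (%) | D2 $h^2_{DI}$ |
|---|---|---|---|---|---|---|---|---|
| d1 | −0.75 | 0.26 | −0.47 | −0.13 | −0.76 | 0.27 | −0.47 | −0.14 |
| d2 | 0.15 | 0.18 | 0.10 | 0.04 | 0.79 | 0.16 | 0.47 | 0.34 |
| d3 | 0.09 | 0.23 | 0.06 | 0.02 | 0.04 | 0.25 | 0.02 | 0.01 |
| d4 | 0.19 | 0.17 | 0.11 | 0.09 | 0.63 | 0.16 | 0.37 | 0.34 |
| d5 | 0.12 | 0.19 | 0.07 | 0.07 | 0.03 | 0.17 | 0.02 | 0.07 |
| d6 | −0.64 | 0.23 | −0.40 | −0.11 | −1.04 | 0.24 | −0.67 | −0.16 |
| d7 | 0.31 | 0.22 | 0.19 | 0.08 | 0.21 | 0.23 | 0.12 | 0.12 |
| d8 | −0.01 | 0.17 | 0.00 | 0.00 | 0.74 | 0.17 | 0.44 | 0.27 |
| Mean | −0.07 | | −0.04 | 0.01 | 0.08 | | 0.04 | 0.10 |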

Inbreeding has been shown to increase FA in some cases (Waldmann 1999; Carter et al. 2009), so one possible alternative explanation for the differences in mean FA is that the Up lines are more inbred than the Down lines. We tested for inbreeding depression by assaying the fitness of each line, with the results shown in Table 2. Replicates differed significantly in fitness (probably because of temporal environmental effects), but no significant differences between any of the three lines within each replicate were detected.

Table 2.  Mean productivity (SE) of each line at generation 44. Mean number of offspring eclosing per vial after 48 h of egg laying by 100 single pairs per line replicate combination.
| Line | Replicate 1 | Replicate 2 |
|---|---|---|
| U | 63.55 (2.14) | 57.39 (1.12) |
| D | 61.29 (2.65) | 61.91 (1.63) |
| C | 62.35 (2.34) | 56.93 (2.04) |

Another possible interpretation of our responses is that we inadvertently selected for differences in wings that changed measurement error (e.g., by making the location of vein intersections less easy to measure) rather than the actual asymmetry of wings. Measurement error will inflate the estimates of FA (Palmer and Strobeck 1986; Palmer and Strobeck 2003). Levene's tests of the measurement error in each distance at generation 38 were performed with treatment, sex, and side as fixed effects and week of measurement (replicate) as a random effect. The results showed no sex or side effects, so the final model involved just treatment and week effects and their interaction. These analyses revealed only one significant effect after Bonferroni correction for eight analyses (at P < 0.006), an interaction between treatment and week for distance d1, with line U1 high and line U2 low. For distance d1, the mean error was 0.00041 ± 0.00002 in the Down lines and 0.00043 ± 0.00003 in the Up lines. Over all eight distances, measurement error was higher in week 1 than in week 2, averaging 0.00046 ± 0.00003 in week 1 and 0.00038 ± 0.00002 in week 2. The overall measurement error differences that did exist clearly cannot explain the differences in mean FA between Up and Down lines. Line U1 had the highest average measurement error (0.00050 ± 0.00004), and U2 the lowest (0.00037 ± 0.00002).

Given the negative correlation between our size-scaled FA index and size, if the responses in FA were purely due to this allometry we would expect wing size to decrease in both Up lines and increase in both Down lines. In contrast, over the course of the experiment, mean wing size decreased in lines U1 (−5%) and D1 (−2%), but increased in U2 (+2%) and D2 (+1%). We can also calculate the response in the size-scaled FA values expected from selection on size, $R_{FA \cdot SIZE}$, if all variation in FA were in fact due to size, from the equation $R_{FA \cdot SIZE} = b_{FA \cdot SIZE} S_{SIZE}$, where $b_{FA \cdot SIZE}$ is the average regression of FA on size and $S_{SIZE}$ is the cumulative selection differential for size. The average regression of size-scaled FA on size during this experiment was −0.0338 ± 0.013. Under these assumptions, the predicted responses in mean size-scaled FA are only an average of 5.3% of those observed (ranging from 2.4% in line D1 to 9.7% in line D2). Given the weak relationship between selection on size and the response in size, this value probably overestimates the actual indirect response in FA to selection on size in this experiment.
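The size-based prediction is a single multiplication, sketched below for concreteness. The regression coefficient is the value reported in the text; the cumulative selection differential and the observed response in the usage lines are hypothetical placeholders, not data from this experiment.

```python
def indirect_response(b_fa_on_size, s_size):
    """Predicted response in FA if all FA variation were due to size.

    b_fa_on_size: regression of size-scaled FA on size.
    s_size: cumulative selection differential on size.
    """
    return b_fa_on_size * s_size


def share_of_observed(predicted, observed):
    """Predicted indirect response as a percentage of the observed response."""
    return 100.0 * predicted / observed


# Reported regression coefficient; the other two numbers are made up.
b = -0.0338
predicted = indirect_response(b, s_size=-0.5)
percent = share_of_observed(predicted, observed=0.338)
```

A small percentage, as in the values reported above, means that the indirect effect of selection on size can account for only a minor part of the change in FA.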


Over the course of 43 generations, the artificial selection procedure we performed resulted in average increases in overall size-scaled FA of 23% in the Up treatments and decreases of 8% in the Down treatments when comparing the final generation to the initial one; other similar comparisons (e.g., mean of first three compared to mean of last three) yield virtually identical changes. The realized heritabilities of the mean size-scaled FA value were 0.0097 ± 0.0009 and 0.0059 ± 0.00011 in the Up lines and 0.0000 ± 0.00011 and 0.0029 ± 0.00011 in the Down lines; using a standard model of the relationship between DI and FA (Whitlock 1998), these values correspond to heritabilities of DI of 0.19 and 0.16 in the Up lines and 0.01 and 0.10 in the Down lines. The magnitude of the responses did not appear to decline as the experiment progressed, suggesting that the genetic variation for DI initially present was not exhausted over the course of the 43 generations of selection. These data are very strong evidence for heritable variation in three of the four replicates and provide the strongest evidence yet reported for selectable genetic variation in DI.

The response of FA to selection in the Down lines was less pronounced than that in the Up lines. In addition to the lower heritabilities, this reduced response in the Down lines is expected for two reasons. First, the selection differentials on DI, the cause of FA, generated by upward selection on FA are much larger than those generated by downward selection on FA (Fuller and Houle 2002). Second, because natural selection is commonly thought to act to reduce FA and DI (e.g., Martín and López 2000; Gangestad et al. 2001), alleles increasing DI may be kept at low frequencies.

If the amount of variation in FA in Drosophila is typical, then the signal about an individual's DI offered directly by FA data is clearly small. Although our results strongly suggest that a substantial amount of genetic variation in DI for wing asymmetry is present in this population, FA offers a relatively meager amount of information about that variation. To detect variation in DI, we used more than 500,000 measurements from over 65,000 wing images. Evaluating the overall DI of a single individual from its FA will be inaccurate, even based on many individual measurements, as in our experiment. This suggests that models that explain mate choice on the basis of FA as arising due to the ability of FA to provide reliable information about developmental stability are implausible. Where mate choice is correlated with FA, this is more likely due to another causal factor that affects both, such as parasite resistance, as individuals with fewer parasites have been reported to exhibit lower FA (e.g., Alibert et al. 2002; Bize et al. 2004; but see Lajeunesse 2007; Martin and Hosken 2009), and may only secondarily result in selection for improved developmental stability. It will be more useful to develop measures of genotypic quality based directly on such causal factors, rather than FA.

On the other hand, our results provide the first robust quantitative indication that DI is capable of responding directly to selection. Over long periods of time, natural selection for reduced FA would tend to minimize DI, making the response we found to selection for decreased FA particularly interesting. One possible reason that capacity to evolve decreased DI remains in our population is that costs of developmental precision counteract selection for precision. The causes of DI, in the broad sense of phenotypic variation arising from the same genotype, are just beginning to be understood (Leamy and Klingenberg 2005; Raser and O'Shea 2005), so the basis of such costs is currently unknown. One potential mechanism is that the epigenetic systems that can preserve differences in gene expression within the same genotype have an optimal susceptibility to external signals. If they are too sensitive, development is imprecise. If they are not sensitive enough, developmental regulation itself becomes difficult.

Associate Editor: G. Mayer


We thank T. F. Hansen and C. Pélabon for helpful discussions; the associate editor and two anonymous reviewers for helpful comments; C. Evers for laboratory management; and R. Bates, W. Bevis, M. Fadon, M. Geary, K. Hudson, E. Hume, R. Mangali, S. Maraj, S. McCord, S. Roper, S. Schwinn, D. Simons, L. Smith, T. Weier, M. Welch, S. Zaman, and H. Zohourian for wing measurement. Financial support was provided by a National Institutes of Health Ruth L. Kirschstein National Research Service award (5F32GM070248–02) to AJRC.