Henrique Teotónio, Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08540–1003, USA Tel.: +609 2585587; fax: +609 2581334; e-mail: email@example.com
Abstract The evolution of fitness is central to evolutionary theory, yet few experimental systems allow us to track its evolution in genetically and environmentally relevant contexts. Reverse evolution experiments allow the study of the evolutionary return to ancestral phenotypic states, including fitness. This in turn permits well-defined tests for the dependence of adaptation on evolutionary history and environmental conditions. In the experiments described here, 20 populations of heterogeneous evolutionary histories were returned to their common ancestral environment for 50 generations, and were then compared with both their immediate differentiated ancestors and populations which had remained in the ancestral environment. One measure of fitness returned to ancestral levels to a greater extent than other characters did. The phenotypic effects of reverse evolution were also contingent on previous selective history. Moreover, convergence to the ancestral state was highly sensitive to environmental conditions. The phenotypic plasticity of fecundity, a character directly selected for, evolved during the experimental time frame. Reverse evolution appears to force multiple, diverged populations to converge on a common fitness state through different life-history and genetic changes.
One of the most important topics in evolutionary biology is adaptation, particularly the selective and genetic mechanisms that lead to adaptation. Among the many empirical problems that face the study of adaptation is the inherent difficulty of inferring adaptation from the evolution of characters whose relationship with fitness may be questionable. Another important problem facing many empirical studies of adaptation is that they are frequently carried out in environmental conditions which are of unclear evolutionary relevance, if not actually a source of artefacts (vid. Lewontin, 1974; Wright, 1977; Haymer & Hartl, 1982; Hedrick & Murray, 1983; Service & Rose, 1985; Leroi et al., 1994; Reznick & Travis, 1996; Rose et al., 1996; Chippindale et al., 2001). The obvious solution to these difficulties is to study the evolution of fitness itself in interpretable selective contexts, a solution that is also hard to achieve.
Historically, the direct estimation of fitness in sexually reproducing organisms has been difficult, because fitness measures of single genotypes in natural genetic and environmental contexts are usually impossible to obtain, unlike the situation for clonal organisms (e.g. Lenski et al., 1991; Crill et al., 2000). When isogenicity can be contrived, fitness may be estimated, but inbreeding can have devastating effects on the fitness of each line, and the whole set of inbred lines will often represent a biased sample of the genetic variation found in the original population. For a long time, intraspecific and interspecific competition experiments have been used in Drosophila to infer relative genotypic fitness (Latter & Robertson, 1962; Prout, 1971; Wright, 1977; Jungen & Hartl, 1979; Haymer & Hartl, 1982; Hedrick & Murray, 1983; Mackay, 1986; Wilton et al., 1989; Joshi & Thompson, 1995; Weber, 1996; Fowler et al., 1997; Barton & Partridge, 2000). Differences in relative fitness between wild-caught flies and their laboratory derivatives or between laboratory populations have usually been found. But inbreeding and genotype-by-environment interaction make it hard to establish a relationship between these fitness differences and the genetic and environmental circumstances in which the original populations evolved, which renders the scientific value of such findings highly uncertain (Rose et al., 1996).
To address these problems in the study of adaptation one needs, at a minimum, an experimental system in which it is possible to estimate the fitness of members of genetically defined populations relative to the fitness of members of control populations that have already adapted to the experimental environment. If the experimental populations are allowed to evolve in the environment of the previously adapted control populations, their evolution has a well-defined point of reference. One such design is experimental reverse evolution, where the ancestral populations or their equivalent are available for comparison with derived differentiated populations (e.g. Lenski, 1988; Service et al., 1988; Rainey & Travisano, 1998; Burch & Chao, 1999; Crill et al., 2000; Teotónio & Rose, 2000, 2001). This was the experimental paradigm used in the present study. Note that this type of experimental paradigm is very different from both relaxed artificial selection and reverse artificial selection, most importantly with respect to the population sizes involved, the opportunity for natural selection to operate, and the degree of replication. Experimental evolution, forwards or reverse, involves more and larger populations compared with artificial selection, and natural selection is deliberately allowed to occur (see Wright, 1977; Rose et al., 1996; Teotónio & Rose, 2001).
Reverse evolution in sexually outbreeding Drosophila populations appears to depend on previous evolutionary history, particularly with respect to the degree of convergence. Yet incomplete convergence to ancestral states could not be explained for most studied characters either by lack of genetic variation or the presence of gene-interaction systems in these Drosophila populations (Teotónio & Rose, 2000). However, this earlier study did not consider the reverse evolution of fitness itself, being limited to characters whose relationship to fitness is less obvious. Thus, it was not ascertained whether adaptation is itself dependent on previous evolutionary history. Also, the selective mechanisms that determine reverse evolution were not identified.
In this study we measure a total of 10 characters, including life-history characters and competitive fitness relative to a morphologically marked stock, for populations which have been kept in diverse environments for numerous generations, and for their descendant populations selected in the environment of their common ancestor for 50 generations (vid. Teotónio & Rose, 2000). We specifically address: (1) whether heterogeneous populations derived from a common ancestor will attain similar degrees of adaptation during reverse evolution; (2) whether the evolutionary dynamics of fitness characters are similar to those of characters whose relation with fitness is less obvious; and (3) the relationship between the ancestral environment and the selective mechanisms behind reverse evolution.
Materials and methods
All populations were derived from a common ancestor introduced to the laboratory in 1975, the ‘Ives’ population (Rose, 1984; Teotónio & Rose, 2000). In 1980, five populations were derived and ever since maintained under Ives-ancestral conditions (B1−5), which comprise a 2-week generation cycle. Another five populations were selected for increased late-life fertility (O1−5), being maintained on a 10-week generation cycle (Rose, 1984). In 1989, populations derived individually from the O populations were selected for increased starvation resistance (SO1−5), with another group maintained as the fed demographic control (CO1−5) (Rose et al., 1992). Both these population groups were maintained on a 3–4 week generation cycle. Lastly, from the CO populations, a set of five populations was selected for decreased developmental time and increased early fertility (ACO1−5) (Chippindale et al., 1997). These were maintained with generation lengths no longer than 9 days. All populations have been maintained at high population sizes (n > 1500), without systematic inbreeding or hybridization, throughout their history. By the time the experiments described here were performed, the O populations had undergone 110 generations of selection for late-life fitness, the SO and CO had undergone 130 generations in their selective environment, and the ACO populations 270 generations.
New populations were obtained from the four groups of selected populations (O, SO, CO and ACO), each new population derived from the same numbered replicate ancestor population (e.g. IO1 derived from O1, IO2 derived from O2, etc.), and cultured in the common ancestral Ives-environment for 50 generations (Teotónio & Rose 2000). This environment features discrete 2-week generations, with natural selection for early fertility in crowded conditions. Egg-laying occurs within 2 h, density being controlled at 50–100 eggs per vial. Each of the five IB populations was derived from a single B population. These were the control populations in the experimental design. Approximately 440 generations had elapsed since their formation by the time the reverse evolution experiment started (Teotónio & Rose, 2000).
A total of 45 populations were studied: the control IB1−5, the 20 differentially selected populations (O1−5, CO1−5, ACO1−5, and SO1−5), and the 20 returned to the Ives environment (IO1−5, ICO1−5, IACO1−5, and ISO1−5). All characters were measured at generation 50 of reverse evolution.
Competitive fitness assays
Three competitive fitness assays were done: male, female, and population fitness. In each of them, morphologically marked populations were competed against the experimental populations in Ives-environment conditions.
For the male fitness assay, five vials were used per population comparison, each containing 25 eggs of the population to be tested and 75 eggs of a brown-eyed bw competitor outbred stock (supplied by A.K. Chippindale, UCSB). After 14 days, 20 bw females from each assay vial were isolated into separate vials. The proportion of these females with wild-type (wt) progeny (as a consequence of being fathered by males from the experimental population) was taken as the male fitness estimate. This includes male survivorship, mating success, and viability of offspring (see also Chippindale et al., 2001). Although most vials had all bw or wt progeny, in approximately 20% of them the progeny were of varied parental origin, because of multiple mating. In such cases, each vial was given a score of 0.75 if the majority of the progeny had wt eye colour (65–95%), 0.5 if approximately half was wt (35–65%), and 0.25 if a minority of flies were wt (5–35%). This scoring system was adopted because the volume of data rendered exact counts impractical.
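The scoring rule for mixed-paternity vials can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used to tabulate the data: the function names `score_vial` and `male_fitness` are hypothetical, the treatment of vials with more than 95% or fewer than 5% wt progeny as 1 and 0 is an assumption, and so is the assignment of the shared boundary values (65% and 35%), which the text leaves open.

```python
def score_vial(wt_fraction):
    """Map the fraction of wild-type progeny in one female's vial to a score."""
    if not 0.0 <= wt_fraction <= 1.0:
        raise ValueError("wt_fraction must lie in [0, 1]")
    if wt_fraction > 0.95:
        return 1.0    # assumption: treated as all wt progeny
    if wt_fraction >= 0.65:
        return 0.75   # majority wt (65-95%)
    if wt_fraction > 0.35:
        return 0.5    # approximately half wt (35-65%)
    if wt_fraction >= 0.05:
        return 0.25   # minority wt (5-35%)
    return 0.0        # assumption: treated as all bw progeny


def male_fitness(wt_fractions):
    """Mean score over the isolated bw females of one assay vial."""
    scores = [score_vial(f) for f in wt_fractions]
    return sum(scores) / len(scores)
```

With exact progeny counts the binning would be unnecessary; the sketch only shows how coarse visual classes translate into a proportion-like estimate.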
For the female fitness assay, five pairs of wt adults were placed together with 15 pairs of bw adults in eight replicate vials per population tested. The females were allowed to lay eggs for 2 h after which all adults were discarded. Egg density was controlled by removing excess eggs. After 14 days, female fitness was estimated as the proportion of wt progeny. This estimate includes female survivorship, fertility, and offspring viability.
The population fitness assay employed a two-generation competition contrived so that the marked population could not hybridize with the experimental populations. This was done using a compound autosome stock as the marked competitor because all hybrid progeny with the experimental population will be inviable (Jungen & Hartl, 1979; Novitski et al., 1981). The marked stock no. 1113, with genotype C(2)EN, b bw; st, was obtained from the Bloomington Drosophila Stock Center. In this assay, seven replicate vials were used per population. In each vial, three pairs of adults from the experimental population were placed together with 17 pairs of the marker stock. These adults were allowed to lay eggs for 24 h, after which egg density was controlled. After 14 days all emerged adult flies were scored for their genotype and allowed to lay for 24 h under crowded conditions for a second generation, egg density again being controlled. The proportion of wt adult progeny, after the first and second generation of competition, was used as an estimate of population fitness.
For developmental time, 10 vials per population were collected with 60 eggs each and emerging adults were counted every 6 h. Population viability, regardless of sex, was taken as the egg-to-adult survivorship. In the starvation resistance assay, four same-sex flies were placed in each of 10 replicate vials with high humidity but no food. Survivorship was scored every 6 h (Teotónio & Rose, 2000).
Early fecundity under low crowding conditions was measured by placing one pair (one male and one female) of 14-day-old-flies in each of the 30 vials assayed per population. These flies were allowed to lay eggs for 24 h. For early fecundity under high crowding conditions, 20 pairs were used for each vial, with 1, 2 and 6 h for the laying period. Eight vials for the 1 h assay and five vials for the 2 and 6 h assays were collected from each population.
Data for developmental time, starvation resistance, and 1 h early fecundity under high crowding conditions for the reverse evolved populations IO1−5, ISO1−5, ICO1−5 and IACO1−5 were previously reported in Teotónio & Rose (2000).
For each selection history, there are five independent replicate populations. Because of this, our observational units are the mean values of these populations. All group comparisons were made relative to the IB populations.
Two sets of hypotheses were tested. First, we tested whether particular independent groups of populations with different evolutionary histories converged to control IB values. For example, we tested whether O1−5 populations were differentiated from the IB populations, and whether the IO1−5 converged to the IB control levels. This was done with unpaired two-tailed Student's t-tests, with α set to 0.05. Sequential Bonferroni was used for multiple comparison corrections on each character separately, under the composite null hypothesis that all group differences to control are equal to zero (Rice, 1989). The number of tests for all characters was eight, with the exception of fecundity, where the number of tests was 32.
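The sequential Bonferroni procedure can be illustrated with a short Python sketch. It assumes the step-down form described by Rice (1989): the smallest of k P-values is compared with α/k, the next with α/(k − 1), and so on, stopping at the first nonsignificant test. The function name `sequential_bonferroni` and the example P-values are hypothetical and are not taken from the data.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Return one boolean per test: True where the null hypothesis is rejected."""
    k = len(p_values)
    # Indices of the tests, ordered from smallest to largest P-value.
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for rank, idx in enumerate(order):
        # The threshold relaxes from alpha/k to alpha/1 as rank increases.
        if p_values[idx] <= alpha / (k - rank):
            reject[idx] = True
        else:
            break  # every remaining (larger) P-value is also retained
    return reject


# Eight hypothetical tests, as for a single non-fecundity character.
example = [0.001, 0.04, 0.03, 0.005, 0.2, 0.5, 0.012, 0.6]
decisions = sequential_bonferroni(example)
```

In this made-up example only the tests with P = 0.001 and P = 0.005 are rejected; P = 0.012 exceeds 0.05/6, so it and all larger values are retained.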
Secondly, we tested the general question of convergence to the ancestral values, and the role that previous history has on reverse evolution, using a mixed analysis of variance (anova) with a fixed effects ‘selection’ factor and a random effects ‘history’ factor. The historical factor was reflected in the division of populations into four groups according to their specific evolutionary history (the antecedent selection in O, CO, ACO or SO populations). For the selection factor, the populations were divided into two groups: those which had recently undergone Ives-culture and those which had not. A significant interaction would reveal historical effects on adaptation. For each population the data used were the absolute differences from the mean of the five IB populations. This was done because differences in sign might underestimate average differentiation among ‘selection’ categories and overestimate the interaction term between ‘history’ and ‘selection’, as a result of sign differences in initial differentiation.
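A minimal Python sketch of this analysis is given below, assuming a balanced design (five replicate populations per selection-by-history cell). It computes the absolute differences from the IB mean and the mean squares of the two-factor model, testing the fixed selection factor over the interaction and the interaction over the error term, which matches the degrees of freedom reported for these tests in the Results (1,3 and 3,32). The function names and data layout are hypothetical, the F ratio for history is omitted because its denominator depends on a model convention the text does not state, and P-values would require an F distribution from a statistics library.

```python
def mean(values):
    return sum(values) / len(values)


def abs_diff_from_control(population_means, control_means):
    """Absolute difference of each population mean from the mean of the controls."""
    control = mean(control_means)
    return [abs(m - control) for m in population_means]


def mixed_anova(cells):
    """Balanced two-factor ANOVA with selection fixed and history random.

    cells[i][j] holds the replicate values for selection level i and
    history level j. Assumes non-zero interaction and error mean squares.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]
    sel_mean = [mean(cell_mean[i]) for i in range(a)]
    hist_mean = [mean([cell_mean[i][j] for i in range(a)]) for j in range(b)]
    grand = mean(sel_mean)

    ss_sel = b * n * sum((m - grand) ** 2 for m in sel_mean)
    ss_hist = a * n * sum((m - grand) ** 2 for m in hist_mean)
    ss_int = n * sum(
        (cell_mean[i][j] - sel_mean[i] - hist_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_err = sum(
        (x - cell_mean[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )

    df_sel, df_hist = a - 1, b - 1
    df_int, df_err = df_sel * df_hist, a * b * (n - 1)
    ms_sel, ms_hist = ss_sel / df_sel, ss_hist / df_hist
    ms_int, ms_err = ss_int / df_int, ss_err / df_err
    return {
        "ms_selection": ms_sel, "ms_history": ms_hist,
        "ms_interaction": ms_int, "ms_error": ms_err,
        "df_selection": df_sel, "df_interaction": df_int, "df_error": df_err,
        "F_selection": ms_sel / ms_int,      # fixed factor tested over interaction
        "F_interaction": ms_int / ms_err,    # interaction tested over error
    }
```

For the design described here, `cells` would have two selection levels, four history levels and five replicates per cell, giving 3 and 32 degrees of freedom for the interaction and error terms.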
For the analysis of differential plasticity in fecundity, analysis of covariance (ancova) was performed on the 1, 2 and 6 h assays. A separate ancova was performed for each of the four evolutionary histories (that is, one for the O ancestry group, including both IO1−5 and O1−5 populations, one for the CO ancestry group, etc.). Selection (Ives selection vs. non-Ives selection) was the fixed factor with number of hours as the covariate. A significant interaction between ‘selection’ and ‘hours’ would indicate a change with reverse evolution in the relationship between observed fecundity and the number of hours allowed for egg laying.
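The interaction test in such an ancova amounts to asking whether the regression slopes of fecundity on hours differ between the two selection regimes. The Python sketch below illustrates this with an extra-sum-of-squares F test that compares a model with separate slopes against one with a common slope and separate intercepts. It is an illustration under stated assumptions, not the software actually used: the function names are hypothetical, and the observations are assumed to be population means (10 populations at three laying durations gives 30 points and 26 error degrees of freedom, consistent with the F1,26 values reported in the Results).

```python
def _sums(x, y):
    """Centred sums of squares and cross-products for one group."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return sxx, sxy, syy


def slope_interaction_f(groups):
    """F test for heterogeneity of slopes.

    groups is a list of (hours, fecundity) pairs of equal-length lists,
    one pair per selection regime. Returns (F, df_numerator, df_denominator).
    """
    stats = [_sums(x, y) for x, y in groups]
    g = len(groups)
    n_total = sum(len(x) for x, _ in groups)

    # Full model: each group has its own intercept and slope.
    rss_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in stats)

    # Reduced model: separate intercepts, one pooled within-group slope.
    pooled_sxx = sum(s[0] for s in stats)
    pooled_sxy = sum(s[1] for s in stats)
    pooled_syy = sum(s[2] for s in stats)
    rss_reduced = pooled_syy - pooled_sxy ** 2 / pooled_sxx

    df_num = g - 1
    df_den = n_total - 2 * g
    f = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f, df_num, df_den
```

A large F indicates that fecundity rises with laying time at different rates in the Ives-cultured and non-Ives-cultured populations of a given ancestry, which is the sense in which plasticity is said to have changed.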
The results for the male fitness assay are summarized in Fig. 1a and Table 1. Student t-tests revealed that the immediately ancestral O1−5, SO1−5, and ACO1−5 populations were differentiated from the IB populations, the CO1−5 being marginally differentiated (P = 0.07), whereas all reversely evolved populations, IO1−5, ISO1−5, ICO1−5, and IACO1−5, converged to control levels. anova indicated that type of selection (populations cultured in Ives vs. non-Ives conditions) was highly significant, whereas incidental evolutionary history (O, SO, CO, and ACO ancestry grouping) and the interaction between them were nonsignificant (see Table 1).
Table 1. Mean squares for a two-factor mixed ANOVA with the selection factor fixed and the history factor random (Selection × History interaction, df = 3). Significance testing of F statistics: *0.01 < P ≤ 0.05; **P ≤ 0.01. †Generation 1 of the angular-transformed population fitness assay.
Female fitness results are shown in Fig. 1b. As tested with t-tests no group of populations was different from the control populations. However, anova on female fitness data showed both a significant ‘selection’ and ‘history’ factor (Table 1).
For the population fitness assay, in the first generation, the SO populations were heteroscedastic, as measured by Bartlett's tests. Angular transformation partially corrected this and transformed data were used in subsequent analysis of all populations. Comparisons using t-tests revealed that ACO1−5 (P = 0.02) were significantly less fit in the assay's first generation. All other populations were undifferentiated (see Fig. 1c). However, the CO1−5 (P = 0.07), IO1−5 (P = 0.07), and O1−5 (P = 0.08) were marginally differentiated from control levels. By the second generation of the population fitness assay, most of the populations undergoing assay had cleared the experimental vials of morphologically marked competitor flies (not shown). For this reason, multigenerational statistical analyses could not be pursued. anova did not reveal any significant effects of any factor, when each generation was analysed separately.
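The angular transformation applied to proportions is conventionally the arcsine of the square root. The exact form used is not stated, so the short Python sketch below assumes that standard definition, expressed in radians; the function name `angular_transform` is hypothetical.

```python
import math


def angular_transform(p):
    """Arcsine square-root transform of a proportion, in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion in [0, 1]")
    return math.asin(math.sqrt(p))


# Proportions near 0 or 1 are stretched relative to those near 0.5,
# which tends to stabilize the variance of binomial proportions.
transformed = [angular_transform(p) for p in (0.0, 0.5, 1.0)]
```

The three example values map to 0, π/4 and π/2.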
Results from population viability, female developmental time, and female starvation resistance are shown in Fig. 2 and Table 1.
For female developmental time, all but the IO populations were differentiated from the control populations, by t-tests. Male developmental time results are similar to female results, with the exception of the ISO populations, for which differences were not significant. anova shows that both selection and the interaction between selection and history are significant.
The CO, ICO and O populations were different from the IB populations with respect to viability (Fig. 2). The interaction between selection and history in the anova for population viability was significant (Table 1).
Analysis of both female and male starvation resistance showed that all but the ICO and IACO populations were differentiated from the IB populations. Also for both sexes, only the interaction term in the anova was significant (Table 1). As the SO treatment has disproportionately higher starvation resistance values than those of any other group of populations (Fig. 2), we re-analysed the data using the absolute differences of the assayed populations from the control populations divided by the value of the immediately ancestral differentiated population group. This analysis was undertaken in order to minimize any effects of initial differentiation on the degree of convergence, which could adversely affect the estimation of the interaction term. The anova in this case showed for females that both type of selection (F1,3 = 17.5; P = 0.03) and the interaction term (F3,32 = 3.35; P = 0.03) are significant. In males, selection is marginally significant (F1,3 = 6.41; P = 0.09), whereas the interaction is highly significant (F3,32 = 10.3; P < 0.01).
The early fecundity results are presented in Fig. 3 and also in Table 1. At low adult density with 24 h of egg-laying, ACO and SO are different from the IB values as tested with t-tests. The interaction term was the only factor found significant in the anova.
All populations but ISO are differentiated from the control for early fecundity at high density with 1 h egg-laying. In the 2 h assay all but ISO and ICO groups are different from the control. Finally, analysis of the 6-h assay showed that only O, SO, ACO were significantly different from the control. The fecundity data indicate a return to the ancestral fecundity character with increased time of egg-laying in the reverse evolution populations (Fig. 3). anova revealed that type of selection had significant effects in the 1- and 2-h assays, although the interaction was significant for the 1, 2 and 6 h assays (for the 1 h at the 6% level; Table 1).
The ancova performed on the fecundity data shows that the fecundity characters of all populations responded to 50 generations of Ives-culture, as the selection factor was significant, irrespective of history (not shown). A change of the linear relationship of fecundity with hours was found for the populations with O and SO ancestry, shown by a significant interaction term (F1,26 = 9.2, P < 0.01 for O ancestry; F1,26 = 5.8, P = 0.02 for the SO ancestry), but it was not found for populations with CO and ACO ancestry.
Reverse evolution and measures of fitness
Our findings show that the evolutionary dynamics of reverse evolution depend in part on previous selective history, as many of the anova interaction terms reflecting selective ancestry are significant. Our results also show that phenotypic convergence is not universal, particularly for characters whose relationship to fitness is less clear. Similar results have been obtained in experimental studies of adaptation in bacteria (e.g. Travisano et al., 1995). These conclusions are in agreement with several other studies of reverse evolution in both sexual and asexual species (Lenski, 1988; Service et al., 1988; Rainey & Travisano, 1998; Crill et al., 2000; Teotónio & Rose 2000, 2001).
Among the three competitive fitness estimates, the one that gave the clearest results was male fitness. This character was differentiated considerably in the starting populations, converging to a level not significantly different from that of the common ancestral control by generation 50, for all populations selected in the ancestral environment. The particular components of male fitness which are being selected during reverse evolution are unknown and may be different from population to population. For example, rapidity to achieve sexual maturity might be important in populations that take more time to develop and mature than the ancestral populations. In populations selected for late-life fertility, increased starvation resistance, and intermediate age of reproduction, individuals may not attain their peak mating ability by the time that mating usually occurs in the ancestral environment (Rose, 1984; Leroi et al., 1994). In populations selected for accelerated development, on the other hand, selection should be different (Chippindale et al., 1997). Because the generation length of these particular populations is set to 9 days, a male fitness decline after this day may have evolved as a result of a trade-off with very early mating success, producing radically accelerated male senescence. But these possibilities are only speculative at this point.
The total fitness assays did not give much useful information. Differences in total population fitness between the differentiated populations and the controls were negligible, with the exception of the competitive fitness of those selected for accelerated development during the first generation of assay. Taken separately from those of the second assay generation, these results reveal that significant female competitive fitness differences were present, despite the fact that we did not observe those differences in the female fitness assay. However, multigeneration population competition did not reveal any fitness differentiation involving the accelerated development stocks, and thus we must conclude that all populations had similar competitive population fitness.
The multigenerational competition did not reveal significant population differences because of the very poor performance of the mutant stock employed. The use of this type of assay is attractive in strictly sexually reproducing organisms because it can directly assess a full generation of competitive success for a genetically unmanipulated experimental population, by contrast with techniques involving the use of inbreeding to estimate the fitness of single chromosomes (e.g. Fowler et al., 1997; Barton & Partridge, 2000). In our case, however, it proved a poor experimental technique.
Although there was a significant selection and historical effect in female competitive fitness estimates, these revealed little differentiation, before or after reverse evolution. This is surprising, given that the ancestral environment imposes selection for reproduction at day 14 after egg collection, and these conditions were closely replicated. The lack of differentiation for females may be due to one of two causes. First, perhaps the test employed is not sensitive enough to population differences because the competitor population has such a good performance in this environment that it masks any differences (but see Chippindale et al., 2001). Secondly, the lack of differentiation may be real. The question is then what component of female fitness is actively being selected, if any? The results for fecundity characters measured directly reveal extensive differentiation of populations. An explanation for the lack of female fitness differentiation might have to wait for experiments which can test the extent to which fecundity plasticity reveals female fitness differences (see below).
It is interesting to note that although male fitness was highly differentiated and responsive to selection, female fitness was neither. As in previous studies with Drosophila, the two characters are apparently able to evolve separately from each other (Rice, 1996; Chippindale et al., 2001). The fact that the Pearson correlation coefficient between male and female fitness is equal to −0.002 ± 0.023 reinforces this conclusion.
Alternative routes to adaptation
The specific life-history mechanisms behind reverse evolution were tackled by measuring life-history characters which are closely related to fitness in the ancestral environment: developmental time, starvation resistance and particularly viability and fecundity. Generally the life-history data do not indicate a widespread return to ancestral values after 50 generations of reverse evolution. Our results are not due to a lack of relevant genetic variation or presence of an epistatic genetic architecture (Teotónio & Rose, 2000). In contrast to this, and to the best of our resolution, the significant results from the male competitive fitness assays show a more pronounced convergence to ancestral levels.
These two patterns together suggest that similar degrees of adaptation were achieved through different underlying life-history and genetic mechanisms (cf. Prout, 1971; Wright, 1977; Cohan, 1984; Cohan & Hoffman, 1989; Travisano & Lenski, 1996; Crill et al., 2000). The populations previously selected for increased starvation resistance reverted to ancestral character values in fecundity, remaining differentiated for starvation resistance and developmental time. The populations with accelerated development did not return to ancestral values for fecundity or developmental time. The late-life fertility populations also did not completely return to ancestral fecundity or starvation resistance levels. Nevertheless, all these groups of populations returned to ancestral male fitness levels.
Alternative routes to adaptation have been experimentally observed in Drosophila through asymmetry of correlated responses to selection in alternative environments (Shiotsugu et al., 1997) or similar environments (e.g. Cohan & Hoffman, 1989; see also Bohren et al., 1966; Gromko, 1995). Our study, however, is one of the first to attempt a direct measure of fitness. Microbial systems have long permitted the simultaneous estimation of fitness and life-history characters. It is well established in the microbial literature that different genetic, physiological and life-history mechanisms are implicated in similar adaptive responses. Generally, this is supported by numerous studies on the evolution of bacterial resistance to phages and antibiotics (e.g. Lenski, 1988; Cohan et al., 1994; review in Lenski, 1998), bacterial adaptation to minimal media (e.g. Travisano et al., 1995; Travisano & Lenski, 1996), viral adaptation to alternative hosts (Crill et al., 2000), and also by the adaptive recovery of deleterious mutations in both viruses and bacteria (Burch & Chao, 1999; Moore et al., 2000).
The role that sexual recombination might have in these evolutionary patterns and their genetic basis has, however, not been adequately determined. For example, it has been experimentally determined that bacterial evolution during antibiotic exposure and recovery involves compensatory genetic changes, thus revealing an epistatic mode of action. In sexual populations, because of recombination, such cases might not be so common (Teotónio & Rose, 2001).
Environmental sensitivity of adaptation
Our study illustrates the importance of characterizing adaptation in evolutionarily relevant environments. In particular it shows some of the pitfalls that can occur when not describing the environment properly. For example, if we had measured early fecundity only at low density, no differences between selected populations and controls would have been detected. Furthermore, if early fecundity at high density with 1-h of egg-laying had been the only fecundity character studied, it might have been concluded that reverse evolution did not lead to convergence for fitness characters (Teotónio & Rose, 2000). Nonetheless, with more time for egg-laying, measurable evolutionary convergence occurs. The rapidity of egg-laying might be what is selected for in the ancestral environment, as well as being the evolutionary constraint involved during convergence. The reverse evolution of fitness might then be explained by different rates of response to an environmental transition at the time of egg-laying, such as the contact with fresh food, CO2, or the presence of laid eggs. Both amount of nutrients and adult crowding have been previously implicated in the evolution of early fecundity in some of our laboratory populations (Leroi et al., 1994). Here the data indicate that the speed of response to an environmental transition was the focus of selection for early fecundity. This is a striking example of the environmental specificity and dependence of adaptation (Travisano & Lenski, 1996). The findings on fecundity also indicate that the change in plasticity for a fitness character during reverse evolution depended on previous selection history.
Despite the intricacy of these details, it is nonetheless clear that plasticity is eminently evolvable, even in an evolutionary time frame as short as 50 generations.
Complexity in experimental evolution
Overall our findings illustrate the complexity of studying the evolutionary processes of adaptation. Clear expectations of the outcome of reverse evolution were defined a priori, but were not fully met. Presumably this reflected the effects of previous selection on the populations analysed, the variable associations between life-history characters and fitness, and the subtle environmental dependence of the expression of these characters. This complexity should not be underestimated, both when designing studies involving experimental evolution and when interpreting their results.
For essential technical help we thank Y. Chau, T. Vu, N. Vu, C. Hammerle and E. Gass. A. K. Chippindale and L. D. Mueller have given helpful advice throughout the project. C. L. Burch, G.A.C. Bell, and two anonymous reviewers improved the manuscript with their suggestions. M. Matos received a travel grant from FLAD. H. Teotónio was supported by the Gulbenkian Foundation, FLAD, and PRAXIS/FCT (Portugal).
Received: 31 October 2001; revised: 4 February 2002; accepted: 2 March 2002