When multiple substitutions affect a trait in opposing ways, they are often assumed to be compensatory, not only with respect to the trait, but also with respect to fitness. This type of compensatory evolution has been suggested to underlie the evolution of protein structures and interactions, RNA secondary structures, and gene regulatory modules and networks. The possibility for compensatory evolution results from epistasis. Yet if epistasis is widespread, then it is also possible that the opposing substitutions are individually adaptive. I term this possibility an adaptive reversal. Although possible for arbitrary phenotype-fitness mappings, it has not yet been investigated whether such epistasis is prevalent in a biologically realistic setting. I investigate a particular regulatory circuit, the type I coherent feed-forward loop, which is ubiquitous in natural systems and is accurately described by a simple mathematical model. I show that such reversals are common during adaptive evolution, can result solely from the topology of the fitness landscape, and can occur even when adaptation follows a modest environmental change and the network was well adapted to the original environment. The possibility of adaptive reversals warrants a systems perspective when interpreting substitution patterns in gene regulatory networks.

Advances in the functional annotation of DNA, combined with comparative genomic methods, allow for increasingly detailed characterizations of the evolution of biological function (Andolfatto 2005; Tuch et al. 2008; Field et al. 2009; Gerke et al. 2009; Chen et al. 2010). Explanations for observed sequence changes fall into several broad categories including random genetic drift, numerous types of adaptive change, compensatory evolution, relaxation of constraint, and others. Often several of these processes may be happening simultaneously, complicating efforts to understand the evolutionary forces behind the changes (Kreitman 1996; Lynch 2007; Barrett and Hoekstra 2011). The process of compensatory evolution is particularly compelling because it offers a means for the underlying encoding of a trait to evolve even as the correct functional output is conserved (Ancel and Fontana 2000; Moses et al. 2006).

Compensatory evolution has been invoked to explain the evolution of transcription factor binding sites (Gibson 1996; Ludwig et al. 2000; Moses et al. 2006; Lusk and Eisen 2010), RNA secondary structure (Stephan 1996; Chen et al. 1999; Ancel and Fontana 2000), protein structure and fold stability (DePristo et al. 2005; Callahan et al. 2011), nucleosome positioning (Kenigsberg et al. 2010), and cis/trans interactions (Landry et al. 2005; Tuch et al. 2008; Kuo et al. 2010). There are two commonly invoked modes of compensatory evolution. In the first, mutations are compensatory with respect to fitness, meaning that one or both of the mutations are individually deleterious, but jointly they are neutral or even advantageous. The functional effect of the alleles is not of primary relevance. In the second mode, the focus is the functional effect of the mutations. Jointly, the mutations preserve proper functioning of the system, but individually, one or both of the mutations would disrupt function. These two modes overlap when there is a direct relationship between function and fitness and often such a relationship is implicitly assumed.

In the case of cis-regulatory modules, compensatory evolution is commonly offered as an explanation for functional reorganization, with nearly all such comments referring to the work of Ludwig et al. (2000). By making and expressing chimeric enhancer constructs, Ludwig et al. (2000) showed that compensatory, epistatic changes must exist in the even-skipped stripe 2 enhancer despite conservation of functional output. In this setting, it is intuitive to imagine a direct relationship between function and fitness, particularly given that the expression of important developmental enhancers is most certainly under stabilizing selection, and any functional mutation would likely disrupt proper expression.

However, when one assumes a direct link between functional changes and fitness consequences, several difficulties arise. Even mutations of very small deleterious effect are usually eventually removed from the population (Ewens 2004), such that an initial deleterious substitution is unlikely and the process is likely to proceed very slowly. Even if compensatory evolution occurs via the joint fixation of conditionally neutral mutations, it is also likely to proceed very slowly (Kimura 1985; Stephan 1996; Iwasa et al. 2004; Durrett and Schmidt 2008), and thus may not account for the bulk of apparent compensatory evolution. Ludwig et al. (2000) had argued that nearly neutral changes might underlie the compensatory evolution. This requires either many very small effect mutations or a function-fitness mapping in which mutations of large functional effect can nonetheless be nearly neutral. This latter explanation seems plausible based on a model of enhancer function (Bullaughey 2011), and has been termed pseudo-compensatory change (Haag 2007).

Although there is a tendency to assume stabilizing selection on the trait, and conclude that compensatory evolution explains turnover, adaptation is also a possibility. He et al. (2011) examined nucleotide substitutions in Drosophila affecting known transcription factor binding sites in developmentally important enhancers. The authors characterized each substitution with respect to whether it increases or decreases the site’s affinity to the transcription factor. They found an excess of affinity-decreasing mutations in Drosophila simulans, suggesting adaptive loss of transcription factor binding sites. In addition, there were some affinity-increasing mutations, but not more than expected relative to the benchmark (substitutions that did not alter affinity). How should one interpret such changes? Are some of the affinity-decreasing mutations compensated by affinity-increasing mutations? What about the excess of affinity-decreasing mutations? If adaptation is really driving the loss of binding sites, how do we interpret the affinity-increasing mutations? Would these have been deleterious?

In answering these questions, the distinction between compensation with respect to function and compensation with respect to fitness becomes important. Two mutations that appear compensatory with respect to function need not be compensatory with respect to fitness. In particular, when a system is evolving adaptively, mutations that have opposing effects with respect to a function, and thus appear compensatory, may be individually adaptive due to other epistatic changes that are occurring in the genetic background. I term this phenomenon an adaptive reversal. In an adaptive reversal, the sign of the fitness effect of altering a phenotype in a particular direction flips, as changes affecting other traits occur. In this sense, it is similar to sign epistasis, in which the sign of the fitness effect of a mutation depends on the genetic background (Weinreich et al. 2005).

At its simplest, epistasis is a relationship among multiple loci or traits whereby the joint effect of several changes is not the outcome expected of the individual changes (see Phillips 2008 for a good review).

Epistasis can exist for many reasons, including structural considerations, molecular interactions, relationships mediated through the concentrations of molecules (e.g., transcriptional regulation), and even host–pathogen interactions. Despite its obvious importance (Wolf et al. 2000), we presently understand few of the seemingly infinite array of epistatic interactions, partly because epistasis is challenging to detect (Whitlock et al. 1995). Our limited knowledge of epistasis has prompted many investigators to consider flexible models of two- or three-locus interactions, in which the epistatic effects are parameters, and not based on assumptions regarding specific biological interactions (Cheverud and Routman 1996; Wolf et al. 2000; Takahasi and Tajima 2005; Gandon and Otto 2007). Another approach is to consider complex models that capture epistasis among many loci (Gavrilets et al. 1998; Weinreich et al. 2005; Yukilevich et al. 2008), at many nucleotides within a locus (Kauffman and Levin 1987; Orr 2006) or in many dimensions (Martin and Lenormand 2006), but again the epistasis present in these models is not based on explicit biological mechanisms.

One common framework for considering arbitrary epistatic relationships is the fitness landscape, with valleys and peaks representing regions of lower or higher fitness. Popularized early on by Wright (1932), fitness landscapes have continued to provide a fertile analogy for discussing evolutionary processes (Gavrilets 2004; Skipper 2004; Orr 2005; Weinreich et al. 2005), although see Kaplan (2008). A fitness landscape prescribes a fitness to each multilocus genotype (or combination of phenotypes), allowing for any type of epistasis. The simplest fitness landscape model is Fisher’s Geometrical model (Fisher 1930), which despite its simplicity has yielded important theoretical predictions (Orr 2005; Martin and Lenormand 2006) and a framework to test empirical measurements (Martin et al. 2007).

Yet like the models that directly prescribe epistatic interactions, most studies invoking fitness landscapes do not rely on known biological functional relationships to specify the landscape, with some notable exceptions (e.g., Ancel and Fontana 2000; Poelwijk et al. 2006; Perfeito et al. 2011). Instead, as a means of increasing generality and tractability at the cost of realism, landscapes are mainly parameterized by their dimensionality (Orr 2000), covariance matrix (Martin and Lenormand 2006), connectivity (Yukilevich et al. 2008; Macía et al. 2012), or degree of ruggedness (Orr 2006).

To evaluate whether adaptive reversals are plausible, in particular whether they stem naturally from the epistasis intrinsic to a gene network, I investigate them using an explicit model of a small regulatory network. This three-gene regulatory network, the type I coherent feed-forward loop, is a pathway that is ubiquitous in nature (Shen-Orr et al. 2002; Mangan et al. 2003), has been well-studied in Escherichia coli, has behavior that is accurately captured by a simple mathematical model (Mangan et al. 2003), and offers an interpretable basis for fitness (Dekel et al. 2005).



A broad overview of the model and the rationale is given in the Results; what follows here are the modeling details. I closely follow the modeling conventions and notation of Alon and colleagues (Shen-Orr et al. 2002; Alon 2006), modeling network dynamics with a system of differential equations and fitness as a cost-benefit trade-off. The topology of the type I coherent feed-forward loop is given in Figure 1A and the network dynamics in Figure 1B. The dynamics of genes Y and Z are modeled as a pair of differential equations


where inline image and inline image are parameters (or traits) giving the production rates of Y and Z respectively. In each case, X needs to be in its active form, inline image, and in order for Z to be induced, Y must be above a threshold k (Iverson square brackets indicate boolean functions). The genes are degraded/diluted at rates inline image and inline image, proportional to the current protein concentrations. The threshold, k, is an approximation to nonlinear activation dynamics of Z by Y (Alon 2006). Differential equations of this form have two solutions depending on whether production is turned on or not. For gene Y, the expression when the gene has been turned on for time t is

Figure 1.

Layout and dynamic behavior of a type I coherent feed-forward loop. (A) Layout of the feed-forward loop. Gene X is activated by an external signal, SX, and in its active state induces genes Y and Z. Gene Y also drives the expression of Z. The inputs to Z are combined via an AND-type logical operation whereby Z is only induced when X is active and Y is sufficiently expressed. (B) Dynamical behavior and cost-benefit basis for fitness. Each plot shows responses to three signals as a function of time (horizontal axis). Top row: There are two true signals and one false signal (dashed). Second row: Gene Y is induced when X is active, but it must reach a threshold k before it is sufficiently expressed to activate Z. Third row: Gene Z is only induced when Y is above k, although at a delay inline image relative to Y. Fourth row: Both Y and Z have costs proportional to their expression, with Z having a much larger cost, as it encapsulates the cost of activating the pathway downstream of Z. Fifth row: A benefit proportional to Z expression is derived from the pathway downstream of Z, but only in the presence of a true signal. Bottom row: The fitness rate is benefit − cost. Total fitness (see the Methods) is an integral over time for each signal and also an integral over the signal duration distributions for true and false signals. This figure follows the illustration conventions of Alon (2006), adjusted to reflect minor modeling differences.

Time inline image after getting turned off having reached expression level Y0, the dynamics follow an exponential decay


Expression of Z is similar, except that in addition to activation of X, Y must surpass the threshold k before Z is expressed. This occurs at some delay, inline image, that is a function of the production and degradation rates associated with Y:
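The displayed equations are not reproduced in this text, so the following is only a minimal sketch of the dynamics as described in the prose. It assumes the Alon-style forms dY/dt = beta_y [X active] − alpha_y Y and dZ/dt = beta_z [X active][Y > k] − alpha_z Z, with expression starting from zero. The names `beta_y`, `alpha_y`, `beta_z`, `alpha_z`, and all function names are assumptions introduced for illustration.

```python
import math

def y_on(t, beta_y, alpha_y):
    """Y level t time units after X becomes active, starting from Y = 0.

    Solution of the assumed ODE dY/dt = beta_y - alpha_y * Y.
    """
    return (beta_y / alpha_y) * (1.0 - math.exp(-alpha_y * t))

def y_off(t_off, y0, alpha_y):
    """Y level t_off time units after production stops at level y0 (exponential decay)."""
    return y0 * math.exp(-alpha_y * t_off)

def z_delay(beta_y, alpha_y, k):
    """Time for Y to rise from 0 to the threshold k, i.e. the delay before Z is induced.

    Returns math.inf when the steady state beta_y / alpha_y never reaches k.
    """
    steady_state = beta_y / alpha_y
    if k >= steady_state:
        return math.inf
    return -math.log(1.0 - k / steady_state) / alpha_y

def z_on(t, delay, beta_z, alpha_z):
    """Z level t time units after X becomes active; Z is silent until the delay has passed."""
    if t <= delay:
        return 0.0
    return (beta_z / alpha_z) * (1.0 - math.exp(-alpha_z * (t - delay)))

# Example using the low-noise optimum values listed in Table 1.
delay = z_delay(beta_y=1.31, alpha_y=1.31, k=0.04)
```

With these values the delay is short relative to the maximum true-signal duration, because the threshold k is small compared with the steady-state level of Y.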


Costs associated with a network may come in the form of explicit costs, such as energetic costs (i.e., transcriptional and translational costs), and in the form of implicit costs, such as the opportunity cost of deploying limited resources. For simplicity, I assume that all these costs are proportional to the expression level of the protein with constant of proportionality, inline image, for gene Y and inline image for gene Z. The cost of Y for a particular signal of duration, d, is thus


The first term in the sum is the cost while X is actively transcribing Y and the second term is the cost after X is deactivated and Y expression is subsequently shut down. These two terms are unlikely to have the same constant, inline image, but as a first-order approximation this may not be unreasonable.

The cost of Z expression is defined similarly, although upon activation of X, expression of Z starts at a delay, inline image, meaning Z is only expressed for a duration, inline image


Similarly, I model the benefit, which captures the contribution of the network to fitness (e.g., by increasing growth rate), purely as a result of Z expression and proportional to it at rate inline image. The benefit only applies while the signal, SX is active (i.e., when the metabolizable substrate is present), and when the signal duration is long enough for Y to pass the threshold, k, which happens at time inline image


These cost-benefit assumptions are oversimplifications for several reasons. Opportunity costs likely increase faster than linearly with expression and the benefit likely shows diminishing returns (Alon 2006). Also, not all costs are proportional to expression: some are proportional to the production rates, inline image and inline image, whereas others are proportional to the degradation rates, inline image and inline image. Nonetheless, these simple cost-benefit relationships capture some of the epistasis among the evolvable traits.
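The linear cost and benefit terms described above can be sketched as follows. This is an illustration under stated assumptions, not the expressions from the original equations, which are not shown in this text: expression is assumed to rise from zero while the gene is on and then decay exponentially forever once it is off, and the names `eta` (cost per unit of expression) and `delta` (benefit per unit of Z) are hypothetical.

```python
import math

def rise_area(duration, beta, alpha):
    """Integral of (beta/alpha) * (1 - exp(-alpha*t)) for t from 0 to duration."""
    if duration <= 0.0:
        return 0.0
    return (beta / alpha) * (duration - (1.0 - math.exp(-alpha * duration)) / alpha)

def decay_area(level, alpha):
    """Integral of level * exp(-alpha*t) for t from 0 to infinity."""
    return level / alpha

def expression_cost(duration, beta, alpha, eta):
    """Cost of a protein produced for `duration` and then left to decay (assumed form)."""
    if duration <= 0.0:
        return 0.0
    peak = (beta / alpha) * (1.0 - math.exp(-alpha * duration))
    return eta * (rise_area(duration, beta, alpha) + decay_area(peak, alpha))

def cost_y(d, beta_y, alpha_y, eta_y):
    """Cost of Y for a signal of duration d."""
    return expression_cost(d, beta_y, alpha_y, eta_y)

def cost_z(d, delay, beta_z, alpha_z, eta_z):
    """Cost of Z: Z is produced only for the part of the signal after the delay."""
    return expression_cost(max(d - delay, 0.0), beta_z, alpha_z, eta_z)

def benefit(d, delay, beta_z, alpha_z, delta):
    """Benefit accrued from Z expression while a true signal is still present."""
    return delta * rise_area(max(d - delay, 0.0), beta_z, alpha_z)
```

Under these assumptions the on-phase and decay-phase terms sum to eta * beta * duration / alpha, so the total cost is linear in the time the gene is switched on.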


Fitness is environment-specific, with the environment entirely described by the duration distributions of true and false signals. I desired the false signals to be primarily short, while allowing the true signals to be substantially longer. This pattern justifies the coherent type I feed-forward network that functions as a filter of short signals (Alon 2006). I consider two environments, low- and high-noise environments. To maintain a closed form for fitness, I use simple shapes for these distributions. The distribution of false signals, inline image, is quadratic in the range (0, inline image) and satisfies the following criteria:


making inline image


The distribution of true signals is uniform with durations spanning the range (0, inline image), and also integrates to unity. Thus, inline image in the interval (0, inline image) and zero elsewhere. For a given environment, the frequencies of false and true signals are inline image and inline image, respectively. For the simulations presented, inline image. The only difference between the low- and high-noise environments is that the abundance of false signals in the high-noise environment is sixfold higher (inline image) than in the low-noise environment (inline image).
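A small sketch of the two environments may help fix ideas. The uniform true-signal density and the numerical values follow the text and Table 1. The quadratic false-signal density used here, 3(D − d)²/D³ on (0, D), is only one assumed shape that integrates to unity and favors short signals; the criteria and resulting expression used in the original are not shown in this text. All names are hypothetical.

```python
def false_density(d, d_max=0.5):
    """Assumed quadratic density for false-signal durations (not taken from the source)."""
    if not 0.0 < d < d_max:
        return 0.0
    return 3.0 * (d_max - d) ** 2 / d_max ** 3

def true_density(d, d_max=4.0):
    """Uniform density for true-signal durations on (0, d_max)."""
    if not 0.0 < d < d_max:
        return 0.0
    return 1.0 / d_max

def midpoint_integral(f, lo, hi, n=10000):
    """Midpoint-rule approximation of the integral of f over (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Signal frequencies from Table 1; only the false-signal frequency differs
# between the two environments.
FREQ_FALSE = {"low_noise": 0.05, "high_noise": 0.30}
FREQ_TRUE = 0.15

total_false = midpoint_integral(false_density, 0.0, 0.5)
mean_false = midpoint_integral(lambda d: d * false_density(d), 0.0, 0.5)
mean_true = midpoint_integral(lambda d: d * true_density(d), 0.0, 4.0)
```

With this assumed shape the mean false-signal duration is one quarter of its maximum, far shorter than the mean true-signal duration, which is the contrast that makes a delay element useful as a filter.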

The total fitness component attributable to this network motif is then the sum of two integrals over the true, inline image, and false, inline image, signal duration distributions


Iverson square brackets indicate a boolean function (i.e., there is no benefit for a false signal). The penalties are introduced to constrain the evolvable traits from becoming too high or too low (see below).

Putting everything together, the closed form for the fitness function, integrating over the distributions of true and false signals, is the following:


The first five parameters of inline image are the evolvable traits and the remaining are for the cost and benefit functions and the signal duration distributions (see Table 1). On the right side of the equality are the following six terms: (1) the cost of Y expression, (2) the cost of Z expression due to false signals, (3) the cost of Z expression due to true signals, (4) the benefit due to Z expression, and (5–6) the penalties.
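The way the pieces combine can be sketched numerically. This is not the closed form referred to above, which is not reproduced in this text: it integrates by the midpoint rule, omits the two penalty terms, and assumes linear costs that include the decay phase and a quadratic false-signal density 3(D − d)²/D³. The symbol names (`eta_y`, `eta_z`, `delta`, `p_false`, `p_true`) are hypothetical; the parameter values are those listed in Table 1.

```python
import math

def z_delay(beta_y, alpha_y, k):
    """Delay before Z is induced: time for Y to rise from 0 to the threshold k."""
    steady_state = beta_y / alpha_y
    if k >= steady_state:
        return math.inf
    return -math.log(1.0 - k / steady_state) / alpha_y

def rise_area(t, beta, alpha):
    """Integral of expression rising from zero over an on-phase of length t."""
    if t <= 0.0:
        return 0.0
    return (beta / alpha) * (t - (1.0 - math.exp(-alpha * t)) / alpha)

def net_payoff(d, traits, env, is_true):
    """Benefit minus cost for one signal of duration d (penalties omitted)."""
    alpha_y, beta_y, alpha_z, beta_z, k = traits
    active = max(d - z_delay(beta_y, alpha_y, k), 0.0)
    # Assumed linear costs, including the decay phase after the signal ends.
    cost = env["eta_y"] * beta_y * d / alpha_y + env["eta_z"] * beta_z * active / alpha_z
    gain = env["delta"] * rise_area(active, beta_z, alpha_z) if is_true else 0.0
    return gain - cost

def midpoint(f, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of f over (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def fitness(traits, env):
    """Frequency-weighted sum of the integrals over false and true signal durations."""
    d_f, d_t = env["max_false"], env["max_true"]
    false_part = midpoint(
        lambda d: 3.0 * (d_f - d) ** 2 / d_f ** 3 * net_payoff(d, traits, env, False),
        0.0, d_f)  # assumed quadratic density
    true_part = midpoint(
        lambda d: net_payoff(d, traits, env, True) / d_t, 0.0, d_t)  # uniform density
    return env["p_false"] * false_part + env["p_true"] * true_part

ENV_LOW = {"delta": 5.0, "eta_y": 0.4, "eta_z": 2.0, "p_false": 0.05,
           "p_true": 0.15, "max_false": 0.5, "max_true": 4.0}
ENV_HIGH = dict(ENV_LOW, p_false=0.3)
# Trait order: alpha_y, beta_y, alpha_z, beta_z, k (low-noise optimum in Table 1).
TRAITS_LOW = (1.31, 1.31, 1.18, 45.6, 0.04)
```

Because the assumed cost and density forms differ from the original, this sketch should not be expected to reproduce the optima in Table 1; it only shows how the trait values, the delay, and the two signal distributions jointly determine fitness.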

Table 1. Notation used for fitness function.

Trait | Description | Can evolve | Low-noise optimum (1) | High-noise optimum (1, 2)
inline image | Degradation rate of gene Y | Yes | 1.31 | 0.96
inline image | Production rate of gene Y | Yes | 1.31 | 0.68
inline image | Degradation rate of gene Z | Yes | 1.18 | 1.26
inline image | Production rate of gene Z | Yes | 45.6 | 44.5
k | Y threshold required to express Z | Yes | 0.04 | 0.16
inline image | Benefit per unit of Z | No | 5 | –
inline image | Cost per unit of Y | No | 0.4 | –
inline image | Cost per unit of Z | No | 2 | –
inline image | Frequency of false signals | No | 0.05 | 0.3
inline image | Frequency of true signals | No | 0.15 | –
inline image | Maximum false signal duration | No | 0.5 | –
inline image | Maximum true signal duration | No | 4 | –
inline image | Delay before Z expression (3) | | |

(1) For the evolvable traits, values given are those producing the highest fitness network in the given environment. For the other parameters (i.e., the nonevolvable ones), the values given define the environment and cost function.
(2) A dash indicates that the value of the low-noise environment is also used for the high-noise environment.
(3) inline image is a function of inline image, inline image, and k (see Methods).


The model described thus far suffers from several unrealistic behaviors that arise when the evolvable traits reach very high or low values. There are two strategies to circumvent these problems. One strategy is to make the model more realistic, so that it captures the nonlinear consequences of very small or large trait values. The alternative is to simply introduce a fitness penalty that ensures that the system will not evolve into these implausible domains. I chose the latter strategy to keep the model as simple as possible while avoiding unrealistic outcomes.

As an example, the cost of protein expression should be a convex function of protein concentration (Dekel and Alon 2005) because, at very high expression levels, additional translated molecules deprive core cellular functions of needed protein, or become toxic. Rather than explicitly incorporate a nonlinear cost, I achieve essentially the same constraint by penalizing high values of inline image. As another example, consider two aspects of the model: First, for an arbitrarily low steady-state expression level of Y, an appropriately small value of k can be chosen to achieve any desired delay, inline image. Second, Z production is independent of Y expression conditional on a given inline image. These two properties lead to the unlikely behavior that Y expression can be tuned to be arbitrarily low while maintaining the same profile of Z expression. However, at very low expression levels, the concentration is no longer well approximated by a continuous, smooth curve, and instead is highly stochastic, depending on chance molecular interactions. Thus, a network relying on very low expression levels to activate a downstream pathway would not reliably maintain the pathway in either the on or off state. This would be costly, a feature not captured by the model. I prevent this from occurring by penalizing very small trait values.

I chose the following arbitrary penalties for penalizing high and low values of the traits respectively:


where qj is the value of the jth evolvable trait. By design, the penalty for low values is much more abrupt than the penalty for high values, reflecting the non-linearities missing from the simplified model. For what I consider the meaningful parameter range, these penalties have little effect on fitness, as desired.


To examine whether adaptive reversals are an intrinsic result of the topology of the fitness landscape, I consider an approximation to the adaptive process that is equivalent to computing the adaptive trajectory as the mutation step size becomes infinitesimally small. I refer to this as the small mutation size (SMS) approximation.

Consider the evolutionary trajectory of the population on a relatively smooth, continuous fitness landscape. The phenotype, inline image, starts at coordinate x0 at time t=0, and proceeds to evolve adaptively. The path traced out by evolution, x (t, x0), is thus a function of time and is dependent on the initial starting point. When the mutation rate is sufficiently high, and the mutational size is sufficiently small, adaptation will be nearly deterministic.

Another interpretation of this trajectory is as an expectation, inline image. If the fitness landscape is sufficiently smooth, then in the current local vicinity, inline image, the landscape is a hyperplane with a slope in the orientation of the gradient of the fitness function. The expected phenotype some time inline image in the future is simply a small step in the most up-hill direction (under the assumption of a symmetrical mutation kernel). Thus in this deterministic approximation, adaptation proceeds by marching smoothly up the landscape in the most up-hill direction. Because future evolution in a constant environment depends only on the current phenotype and is independent of the history, one can think of the expected adaptive trajectory as satisfying a system of time-homogeneous ordinary differential equations


where inline image is some function of the current phenotype. Knowing each inline image for some starting trait vector, x0, would then allow computation of the path using standard tools. Note that “time” in the SMS approximation relates to the evaluation of the path as a system of differential equations and does not necessarily correspond directly to evolutionary time. Under the simplifying assumptions that I have described, the system of differential equations, inline image, is simply the gradient of the fitness function, F, evaluated at inline image


or alternatively


I perform the numerical evaluation using the classic Runge–Kutta method (RK4) for solving systems of differential equations. To ensure that numerical evaluation of the gradient-ascent path is accurate given the curvature of the surface, I shrink the move at any step of the algorithm so that it corresponds to at most a 10% change in any single dimension. This has the effect of limiting the rate at which a parameter changes and serves to increase the accuracy of the computed path.
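The numerical scheme just described can be sketched as follows. This is a minimal illustration, not the code used for the results: the toy fitness function and the names `fitness_gradient`, `capped_step`, and `sms_trajectory` are hypothetical, and the 10% limit is implemented here by rescaling the whole RK4 move, which is one possible reading of the rule. Trait values are assumed to stay strictly positive.

```python
def fitness_gradient(x):
    """Gradient of a hypothetical toy fitness F(x) = -(x0 - 1)^2 - (x1 - 2)^2."""
    return [-2.0 * (x[0] - 1.0), -2.0 * (x[1] - 2.0)]

def rk4_delta(grad, x, h):
    """Classic RK4 increment for the gradient-ascent system dx/dt = grad(x)."""
    def shifted(base, direction, scale):
        return [b + scale * d for b, d in zip(base, direction)]
    k1 = grad(x)
    k2 = grad(shifted(x, k1, h / 2.0))
    k3 = grad(shifted(x, k2, h / 2.0))
    k4 = grad(shifted(x, k3, h))
    return [h * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for a, b, c, d in zip(k1, k2, k3, k4)]

def capped_step(grad, x, h, max_rel_change=0.10):
    """Take one RK4 step, shrunk so that no trait changes by more than 10%."""
    delta = rk4_delta(grad, x, h)
    largest = max(abs(d) / abs(v) for d, v in zip(delta, x))
    scale = min(1.0, max_rel_change / largest) if largest > 0.0 else 1.0
    return [v + scale * d for v, d in zip(x, delta)]

def sms_trajectory(grad, x0, h=0.1, steps=500):
    """Follow the most uphill path from x0, returning every visited point."""
    path = [list(x0)]
    for _ in range(steps):
        path.append(capped_step(grad, path[-1], h))
    return path

path = sms_trajectory(fitness_gradient, [0.5, 0.5])
```

The cap binds only early in this example, when the starting point is far from the peak; once the trajectory is close, the unmodified RK4 step is used.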


I simulate an adaptive walk using a standard technique in which candidate mutations are sampled from a mutation kernel and accepted as a substitution or rejected based on the fixation probability (Gillespie 1994). I compute fixation probabilities using the standard diffusion result for a haploid population (Ewens 2004)


where N= 1000 and s is the selection coefficient for the proposed mutation relative to the current fitness


A mutant network is arrived at as follows. First a trait is chosen among the evolvable traits with equal probability; each mutation affects only a single trait. Second, a mutational effect size is sampled. I chose to represent mutations on a multiplicative scale so that effect sizes are scale-invariant; in other words, a trait always has the same probability of changing by a given fraction regardless of the current trait value. Thus, for a mutational effect u and an original trait value of X, the new trait value is inline image. Had I assumed a mutation has an additive effect drawn from a fixed distribution, then traits that evolved to large trait values would have also evolved smaller relative mutational effect sizes.

For the simulations presented, I draw u from a two-component mixture of zero-centered Gaussian distributions with inline image and inline image and mixture proportions of 0.7 and 0.3, respectively. Although arbitrary, the mutation kernel allows for a small number of large effect mutations and a larger number of relatively modest effect ones.

The adaptive walk simulations were run for 10,000 mutational proposals, resulting in an average of 24 substitutions. Given N = 1000, a neutral proposal fixes with probability 1/N, so about 10 substitutions would be expected under neutrality; the observed average is thus 2.4-fold higher.
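The adaptive walk procedure can be sketched as below. Several details are assumptions because the corresponding expressions are not shown in this text: the fixation probability uses a commonly cited haploid diffusion form, (1 − e^(−2s)) / (1 − e^(−2Ns)); the selection coefficient is taken as the relative fitness difference; a mutation multiplies the trait by exp(u); and the two standard deviations (0.05 and 0.5) are placeholders, with only the 0.7/0.3 mixture proportions taken from the text. The toy fitness function is hypothetical.

```python
import math
import random

def fixation_probability(s, n=1000):
    """Assumed haploid diffusion form: (1 - exp(-2s)) / (1 - exp(-2Ns))."""
    if abs(s) < 1e-12:
        return 1.0 / n  # neutral limit
    exponent = -2.0 * n * s
    if exponent > 700.0:
        return 0.0  # strongly deleterious; also avoids floating-point overflow
    return math.expm1(-2.0 * s) / math.expm1(exponent)

def propose(traits, rng, sd_small=0.05, sd_large=0.5, p_small=0.7):
    """Mutate one randomly chosen trait on a multiplicative scale (assumed exp(u))."""
    mutant = list(traits)
    index = rng.randrange(len(mutant))
    sd = sd_small if rng.random() < p_small else sd_large  # placeholder sds
    mutant[index] *= math.exp(rng.gauss(0.0, sd))
    return mutant

def adaptive_walk(fitness, traits, proposals=10000, n=1000, seed=1):
    """Propose mutations one at a time; each fixes with its fixation probability."""
    rng = random.Random(seed)
    current, current_fitness = list(traits), fitness(traits)
    substitutions = 0
    for _ in range(proposals):
        mutant = propose(current, rng)
        mutant_fitness = fitness(mutant)
        s = mutant_fitness / current_fitness - 1.0  # assumes positive fitness
        if rng.random() < fixation_probability(s, n):
            current, current_fitness = mutant, mutant_fitness
            substitutions += 1
    return current, substitutions

def toy_fitness(traits):
    """Hypothetical positive fitness with a single peak at all traits equal to 1."""
    return math.exp(-sum((t - 1.0) ** 2 for t in traits))

final_traits, n_subs = adaptive_walk(toy_fitness, [2.0, 0.5])
```

A multiplicative kernel keeps every trait positive and makes the chance of a given fractional change independent of the current trait value, which is the scale-invariance property described above.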



Consider a single gene with substitutions in its promoter on a particular lineage. If the promoter is otherwise conserved, then the clustering of substitutions might suggest adaptation (Pollard et al. 2006; Kim and Pritchard 2007; Holloway et al. 2008). Further suppose that these substitutions are known to have sizeable functional effects. A cartoon illustrating such a scenario (Fig. 2) shows three substitutions that decrease and one that increases expression. How does one interpret that one substitution is of opposite effect relative to the others? For a single evolving trait and a constant environment, there are three basic alternatives.

Figure 2.

Single-gene viewpoint of nucleotide substitutions. (A) Cartoon phylogeny showing a cluster of four substitutions on a single lineage. (B) Cartoon of a gene illustrating the same four substitutions as in (A). Arrows indicate the direction of the marginal functional effect of each substitution on some aspect of gene expression. Three substitutions reduce expression and one substitution increases it. In this cartoon, all four substitutions affect regions bound by transcription factors (black boxes). Despite detailed functional knowledge, observing only this one gene in isolation without the ability to consider epistatic interactions with other genes limits the interpretations regarding mutations of opposing functional effect.

The first alternative is that these substitutions are neutral with respect to fitness, despite their demonstrable functional effects. I do not further discuss this possibility, as the present study is concerned with adaptation. In the second scenario, one mutation is deleterious and another compensatory (overcompensation is shown in Fig. 3A). However, only deleterious mutations of very small effect have any appreciable chance of substituting, unless the substitution occurs jointly with compensatory ones (Kimura 1985), and in either case the rate of such substitutions will likely be very low (Stephan 1996; Ewens 2004; Iwasa et al. 2004; Durrett and Schmidt 2008).

Figure 3.

Illustration of scenarios showing how mutations of opposing functional effect can be related to smooth fitness landscapes in one (A–B) and two (C) dimensions. (A) Compensatory evolution on a fitness landscape involving one evolvable trait. Points indicate three allelic values with the starting point of the adaptive walk at (1) followed by a deleterious substitution to allele (2) and a beneficial (overcompensating) substitution to allele (3). Horizontal arrows indicate the opposing functional changes whereas vertical arrows indicate the opposing changes in fitness. (B) Overshooting the optimum. Notation is as in (A); however, in this example, despite opposing functional effects, the substitutions are individually adaptive. (C) Adaptation on a two-dimensional fitness landscape. Two continuous traits, x and y, comprise the plane, while fitness is shown as a shaded topology with contour lines. The adaptive trajectory in the small mutation size (SMS) approximation, that is, the optimal hill-climbing path, is shown as the black curved arrow. Notice that with respect to trait x, the adaptive trajectory bends back such that initially decreasing x is adaptive and later increasing x is adaptive. Fitness is a carefully selected, but arbitrary function chosen for illustrative purposes: inline image.

In the third scenario, the functional effects are opposing, but both mutations are individually adaptive. Overshooting an optimum is an example of this (Fig. 3B), which requires that the curvature (or ruggedness) of the landscape be on the same scale as the mutational effect sizes. In the case when small mutations are more likely than larger ones (a common assumption), the overshooting alternative is less likely than simply ascending the fitness landscape.

All of the above involves consideration of only a single trait—no epistatically interacting traits are simultaneously evolving. This severe assumption limits the possible interpretations of opposing substitutions that affect a single trait. If there are epistatic interactions, more complex behaviors can arise. In particular, it is possible for mutations to be of opposite effect with respect to a trait but be individually adaptive (Fig. 3C).
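The two-trait case can be illustrated with a short sketch. The landscape below is a hypothetical quadratic with an oblique ridge and a single peak at (2, 2); it is not the function used for Fig. 3C, which is not shown in this text. The path is followed with small Euler steps up the gradient, and the function names are assumptions.

```python
def fitness(x, y):
    """Hypothetical landscape: single peak at (2, 2) with an oblique ridge."""
    dx, dy = x - 2.0, y - 2.0
    return -(dx * dx + 1.6 * dx * dy + dy * dy)

def gradient(x, y):
    """Partial derivatives of the hypothetical fitness with respect to x and y."""
    dx, dy = x - 2.0, y - 2.0
    return -(2.0 * dx + 1.6 * dy), -(1.6 * dx + 2.0 * dy)

def climb(x, y, step=0.01, steps=2000):
    """Follow the most uphill direction in small Euler steps, recording the path."""
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = gradient(x, y)
        x, y = x + step * gx, y + step * gy
        path.append((x, y))
    return path

path = climb(3.0, 5.0)
xs = [point[0] for point in path]
turn = xs.index(min(xs))  # step at which trait x stops decreasing and starts increasing
```

Starting from (3, 5), trait x first falls below its optimal value of 2 and then rises back toward it, while trait y decreases throughout and fitness increases at every step. Every change in x is therefore individually adaptive even though the early and late changes have opposite functional effects.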


In the preceding discussion, the relationships between evolvable traits and fitness were arbitrary. The main aim of the present study, however, is to investigate a more concrete, biologically realistic fitness landscape. Although little is known about the topology of fitness landscapes, biological systems are highly modular (Hartwell et al. 1999; Schlosser and Wagner 2004; Wagner et al. 2007), suggesting progress might be made by considering a small, tractable module, the function of which is relatively well understood and isolated from the many other subsystems of the organism (Alon 2006).

One such module that may be instructive is the type I coherent feed-forward loop (Shen-Orr et al. 2002). In its simplest incarnation, this small genetic network consists of only three genes (shown in Fig. 1A). Gene X, upon responding to a signal, induces gene Z both directly and indirectly via gene Y with the two inputs to gene Z integrated through an AND operation. This regulatory module occurs in many organisms including E. coli, Bacillus subtilis, Saccharomyces cerevisiae, Caenorhabditis elegans, and humans (see Kalir et al. 2005 for citations), and in the best characterized regulatory network of E. coli, it is highly over-represented (Shen-Orr et al. 2002). One reason this motif may be so common is that it performs an important biological function; namely, it acts as a sign-sensitive delay element (Mangan et al. 2003), in which induction of Z is delayed upon activation of the signal, SX, but turned off immediately upon cessation of the signal (Shen-Orr et al. 2002). As a result of this sign-sensitive delay, it functions to filter out short pulses in the signal, inducing whatever pathway is downstream of Z only when the signal is sustained. Figure 1B illustrates the temporal dynamics of this feed-forward loop. Fitness is modeled as a cost-benefit trade-off, closely following Dekel et al. (2005) and Alon (2006) and motivated by optimality principles (Rosen 1967). In the case of the most well-studied instantiation of this motif, the L-arabinose utilization system of E. coli, the benefit is primarily the ATP derived from metabolism of arabinose (Mangan et al. 2003).

Finally, and importantly here, the dynamics of this network motif are well understood and accurately captured by a simple mathematical model, and they appear accurate even when embedded in the full gene regulatory network of an organism (Mangan et al. 2003). For these reasons, the type I coherent feed-forward loop is an ideal test case for adaptation in multiple dimensions, having the property that epistasis present among the functional dimensions emerges from a biologically realistic set of interactions.
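The sign-sensitive delay described above can be illustrated with a minimal Python sketch. It is not the model specified in the Methods: it assumes step-function (logic) regulation, a common simplification for this motif, and the parameter names (alpha for degradation, beta for production, k for the threshold on Y) and their default values are illustrative assumptions. Under these assumptions Y rises as (beta_y/alpha_y)(1 - exp(-alpha_y t)), so the induction delay has a simple closed form that the simulation can be checked against.

```python
import math

def simulate_ffl(signal_on_until, t_end=10.0, dt=1e-3,
                 alpha_y=1.0, beta_y=1.0, alpha_z=1.0, beta_z=1.0, k=0.5):
    """Euler integration of a step-function coherent feed-forward loop.

    Assumed dynamics (illustrative, not taken from the paper's Methods):
      dY/dt = beta_y * [signal on]            - alpha_y * Y
      dZ/dt = beta_z * [signal on AND Y > k]  - alpha_z * Z
    Returns lists of times, Y values, and Z values.
    """
    n = int(round(t_end / dt))
    y = z = 0.0
    ts, ys, zs = [], [], []
    for i in range(n + 1):
        t = i * dt
        ts.append(t)
        ys.append(y)
        zs.append(z)
        x_active = t < signal_on_until
        dy = (beta_y if x_active else 0.0) - alpha_y * y
        dz = (beta_z if (x_active and y > k) else 0.0) - alpha_z * z
        y += dt * dy
        z += dt * dz
    return ts, ys, zs

def analytic_delay(alpha_y, beta_y, k):
    """Time for Y to rise from 0 to k under constant induction (step-function model)."""
    y_steady = beta_y / alpha_y
    if k >= y_steady:
        return math.inf  # Y never reaches the threshold, so Z is never induced
    return math.log(1.0 / (1.0 - k / y_steady)) / alpha_y

def first_induction_time(ts, zs):
    """First recorded time at which Z is above zero, or None if Z is never induced."""
    for t, z in zip(ts, zs):
        if z > 0.0:
            return t
    return None

# A short pulse (duration 0.3) is filtered out; a sustained signal induces Z after a delay.
_, _, z_short = simulate_ffl(signal_on_until=0.3)
ts, _, z_long = simulate_ffl(signal_on_until=5.0)
print(max(z_short), first_induction_time(ts, z_long), analytic_delay(1.0, 1.0, 0.5))
```

With these default values the pulse of duration 0.3 ends before Y reaches the threshold, so Z stays at zero, whereas the sustained signal induces Z after a delay close to ln 2. Once the signal stops, production of Z ceases at once, which is the sign-sensitive part of the behavior.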

The evolutionary maintenance of this motif rather than direct regulation of Z or no regulation of Z may occur in a number of circumstances. One scenario involves a particular cost-benefit trade-off, in which there is a fixed cost of expression, independent of level, and a benefit proportional to the expression level (Dekel et al. 2005). Another circumstance in which a sign-sensitive delay element will be favored is when the signal is noisy and the distribution of true and false signals differ sufficiently in their mean durations. I model this latter scenario and assume a heterogeneous mixture of true and false signals; when the signal is true, the benefit ensues, but a false signal has no benefit. When false signals are generally short, relative to true signals, this network will act as an effective filter.


I investigate the prevalence of adaptive reversals using the small feed-forward network I described earlier. I consider only five traits to be evolvable: the degradation and production rates of Y (inline image and inline image), the corresponding rates for Z (inline image and inline image), and the expression threshold, k. I assume that each of these is continuous, which may not be a poor approximation if sufficiently many nucleotides encode each of these traits.

I first consider an idealized approximation to the adaptive process that I refer to as the SMS approximation (see Methods). If mutations are tiny relative to the curvature of the landscape, and the mutation rate is sufficiently high, then the adaptive trajectory will track the gradient of the fitness landscape (i.e., it will climb to a local optimum via the most uphill path). I have formulated the fitness function in such a way that it has a closed form and known partial derivatives with respect to each evolvable trait. I am thus able to numerically evaluate the adaptive trajectory by taking the gradient of fitness and treating it as a system of differential equations formed by these partial derivatives, starting from some initial point in the five-dimensional parameter space.
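To make the idea of tracking the gradient concrete, the following Python sketch integrates a gradient flow on a toy two-trait landscape and counts reversals along each trait axis. The quadratic fitness function, the coupling coefficient 0.9, the starting point, and the Euler integrator are all assumptions chosen for illustration; they stand in for the closed-form feed-forward-loop fitness function and the numerical method described in the Methods.

```python
def toy_fitness(u):
    """Toy landscape with a single optimum at (0, 0); the x*y term is the epistasis."""
    x, y = u
    return -0.5 * (x * x + 2 * 0.9 * x * y + y * y)

def toy_gradient(u):
    """Partial derivatives of toy_fitness with respect to x and y."""
    x, y = u
    return (-(x + 0.9 * y), -(0.9 * x + y))

def gradient_flow(u0, grad, dt=1e-3, steps=20000):
    """Euler integration of du/dt = grad f(u), starting from u0."""
    path = [tuple(u0)]
    u = list(u0)
    for _ in range(steps):
        g = grad(u)
        u = [ui + dt * gi for ui, gi in zip(u, g)]
        path.append(tuple(u))
    return path

def count_reversals(values, tol=1e-12):
    """Count changes in the direction of travel along one trait axis."""
    reversals, last_sign = 0, 0
    for a, b in zip(values, values[1:]):
        d = b - a
        if abs(d) <= tol:
            continue  # ignore steps with no appreciable movement
        sign = 1 if d > 0 else -1
        if last_sign and sign != last_sign:
            reversals += 1
        last_sign = sign
    return reversals

path = gradient_flow((0.2, 1.0), toy_gradient)
xs = [p[0] for p in path]
ys = [p[1] for p in path]
print(count_reversals(xs), count_reversals(ys))
```

In this toy case fitness rises at every step, yet trait x first decreases and later increases, because the sign of the partial derivative with respect to x depends on the current value of y. Trait y changes monotonically. The same projection-and-count logic would apply to a five-trait trajectory.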

In this idealized adaptive process, both compensatory mutation and over-shooting the optimum are impossible, and so by projecting the adaptive trajectory onto one trait axis, any observed reversal will be a direct result of the topology and the epistasis encoded in the genetic network. An adaptive trajectory on this fitness landscape is a path in six dimensions—five trait dimensions and fitness. To investigate and visualize the prevalence of adaptive reversals, I computed 200 SMS adaptive paths from random starting points in the five-dimensional hypercube. Figure 4 shows several of these trajectories projected onto two-dimensional slices of the six-dimensional fitness landscape, each slice corresponding to one evolvable trait dimension and fitness (also see Table S1 for a summary).

Figure 4.

Adaptive reversals occur solely as a result of the fitness landscape topology. Each column highlights one adaptive trajectory (thick lines) projected onto each of the five trait dimensions (rows). The adaptive trajectory is a function of the evolvable trait (vertical axis) and fitness (horizontal axis). Because fitness is guaranteed to increase with time, adaptation always proceeds from left to right within each small plot. The adaptive trajectories shown correspond to evolution starting from points sampled uniformly from a five-dimensional hypercube. Several other adaptive trajectories are shown in the background (blue lines) for replicates 1 and 7 for reference and to illustrate how adaptation always converges to the same optimal point in the fitness landscape. The first six columns (from the left) exemplify adaptive trajectories with at least one adaptive reversal (red curves). The last three columns show adaptive trajectories in which none of the traits undergo adaptive reversals. All trajectories were numerically evaluated according to the small mutation size (SMS) approximation (see Methods).

There are several qualitative behaviors apparent in these projections: (1) four of the five traits exhibit reversals, but not all traits do, (2) some traits (e.g., k) show much larger, more dramatic reversals than other traits, and (3) the reversals do not happen strictly on fitness plateaus; that is, fitness can increase substantially even during the return portion. Thus, even in this very low-dimensional, continuous trait setting, adaptation of some traits may often reverse course as a result of adaptive changes elsewhere.

Thus far, I have considered adaptation from random starting points. However, evolution does not proceed in this way; rather, the ascent starts when a relatively well-adapted species encounters a new environment or when the prevailing environment changes.

In a feed-forward loop, the major influence of the environment is the signal SX, which in the case of the l-arabinose utilization network of E. coli, represents the presence or absence of l-arabinose. I therefore define the environment by the duration distributions of true and false signals. I then consider the case of adaptation to a new environment with a higher proportion of false signals (high-noise), starting from a population well adapted to a low-noise environment (see Fig. 5A).

Figure 5.

Adaptation to a high-noise environment of a network originally adapted to a low-noise environment. (A) The two environments are defined by the signal distributions (shaded gray curves, right vertical axis). False signals are generally short (quadratic, light gray curve), with none longer than 0.5. True signals have a uniform distribution (dark gray curve). The low-noise environment (left plot) has a much smaller density of false signals than does the high-noise environment (right plot). Partial fitness (left vertical axis), defined as the net fitness for a single signal, is shown as a function of the duration of the signal in each of two environments for each of two networks: the one initially adapted to the low-noise environment (red curve), and the one that evolves as a result of adaptation to the high-noise environment (gray curve). As can be seen, in the high-noise environment (right plot), the network adapted to the low-noise environment (red curve) suffers a major fitness cost for short signal durations, as these tend to be false signals, which this network fails to filter as effectively as the high-noise-adapted network. (B) Total fitnesses of each network in each environment. (C) Evolution of a trait (vertical axis) as a function of time (horizontal axis, same for each plot) for each of the five evolvable traits (first five plots). See the Methods for the interpretation of time. The sixth plot shows the increase in fitness during adaptation to the high-noise environment. (D) Evolution of the delay, inline image. The delay is a function of other parameters: inline image.

Upon introduction to the high-noise environment, the low-noise-adapted network has a much lower fitness than it had in the low-noise environment. In response, the network evolves toward a solution with higher fitness (Fig. 5B). The most pronounced reversal that occurs in this particular bout of adaptation affects trait k (Fig. 5C). Adaptation to the high-noise environment and the reversal of trait k can be interpreted as follows. In the low-noise environment, false signals were comparatively rare, and thus there was little need to delay activation of Z via a high threshold k. Thus, the low-noise-adapted network exhibits a low value for trait k. However, upon introduction to the high-noise environment, the most expeditious way to increase the network’s fitness was to increase k, as this prevents Z from unnecessary induction. As a result, k quickly evolved to a higher value, and the delay in Z induction, inline image, correspondingly increased (Fig. 5D). However, after inline image became long enough to filter out the false signals, k, in conjunction with inline image and inline image, evolved in a way that preserved inline image, but reduced the level of Y expression, thereby lowering costs and increasing fitness. As a result, k exhibited an adaptive reversal.


Although the SMS approximation is useful for showing that reversals can result solely from the topology of the fitness landscape, there are a number of unrealistic aspects to this model. First, real mutations are likely of some discrete, potentially large effect, and thus adaptation is not a smooth optimal ascent up the landscape. Second, separate biological functions are often encoded in separate locations on the DNA (e.g., Ptashne 1988; Kirchhamer et al. 1996; Barrière et al. 2011), and thus mutations may generally affect only one of the various functions.

To demonstrate adaptive reversals when mutational effect sizes are discrete and each mutation affects only a single trait, I simulate an adaptive walk consisting of a succession of such mutations (Gillespie 1994). I use rejection sampling, with the acceptance probability of a proposed mutation equal to its fixation probability (see Methods). In this approximation to a population process, drift and selection are both accounted for. However, the joint fixation of mutations that would not substitute individually is not possible, which removes a confounding type of compensatory change. This modeling decision also means that linkage relationships among traits can be ignored.
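The rejection-sampling walk can be sketched in Python as follows. Several pieces are assumptions made for illustration rather than details taken from the Methods: the fixation probability uses Kimura's diffusion approximation for a new mutation in a diploid population, mutational effects are drawn from a normal distribution, the selection coefficient is the relative fitness difference, and the landscape is a toy two-trait function rather than the feed-forward-loop fitness model.

```python
import math
import random

def fixation_probability(s, n):
    """Kimura's diffusion approximation for a new mutation (assumed diploid, size n)."""
    if abs(s) < 1e-12:
        return 1.0 / (2 * n)      # neutral limit
    x = -4.0 * n * s
    if x > 700.0:
        return 0.0                # strongly deleterious; also avoids overflow in expm1
    return math.expm1(-2.0 * s) / math.expm1(x)

def toy_fitness(u):
    """Positive toy landscape with an epistatic x*y term and an optimum at (0, 0)."""
    x, y = u
    return math.exp(-0.5 * (x * x + 2 * 0.9 * x * y + y * y))

def adaptive_walk(fitness, start, n=1000, effect_sd=0.1, n_mutations=20000, seed=0):
    """Propose single-trait mutations; accept each with its fixation probability.

    Returns a list of substitutions as tuples:
    (index of the mutation among all proposals, trait index, N*s, traits after fixation).
    """
    rng = random.Random(seed)
    traits = list(start)
    w = fitness(traits)
    substitutions = []
    for m in range(n_mutations):
        i = rng.randrange(len(traits))           # each mutation affects one trait only
        proposal = list(traits)
        proposal[i] += rng.gauss(0.0, effect_sd)
        w_new = fitness(proposal)
        s = w_new / w - 1.0                      # relative selection coefficient
        if rng.random() < fixation_probability(s, n):
            traits, w = proposal, w_new
            substitutions.append((m, i, n * s, tuple(traits)))
    return substitutions

subs = adaptive_walk(toy_fitness, (0.2, 1.0))
print(len(subs), toy_fitness(subs[-1][3]))
```

Because each proposal is accepted or rejected on its own, two mutations can never fix jointly, which mirrors the property noted above. Recording the trait index and Ns for every substitution is what allows the initial and reverse portions of a reversal to be compared afterwards.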

To characterize the propensity for reversals when mutational effects are discrete, I initiate 1000 simulations from random starting points and evolve the networks in the high-noise environment of Figure 5. As under the SMS approximation, adaptive reversals are common: they occur in 30% of these simulations and affect all the traits (see Table 2). By dividing the adaptive reversals into two components, the initial portion and the reverse portion (the portion in the opposing direction), it becomes apparent that the typical substitution on the initial portion is of much larger selective effect than the typical substitution on the reverse portion, even though mutations on the initial and reverse portions often have similar effects on the traits (Table 2). This finding underscores the need to distinguish between functional effects and selective effects, and it is consistent with adaptive walks on a fixed landscape offering diminishing returns (Orr 2005).

Table 2. Reversals during adaptation from random starting points with discrete-sized mutational effects.

| Trait | Reversals (1) | Forward portion: steps (2) | Forward portion: median Ns (3) | Forward portion: median inline image | Reversal portion: steps (2) | Reversal portion: median Ns (3) | Reversal portion: median inline image |
|---|---|---|---|---|---|---|---|
| inline image | 6 | 1.0 | 94.4 | 0.567 | 4.0 | 7.1 | 1.032 |
| inline image | 93 | 1.9 | 64.6 | 1.574 | 2.1 | 6.2 | 0.750 |
| inline image | 82 | 1.9 | 105.2 | 1.809 | 1.9 | 8.4 | 0.510 |
| inline image | 136 | 1.6 | 100.4 | 1.882 | 1.7 | 9.5 | 0.527 |
| k | 38 | 1.9 | 31.3 | 0.831 | 1.4 | 7.5 | 0.404 |

(1) Reversals observed out of 1000 simulations from random starting points conditional on fitness > 0. Adaptation is in the high-noise environment of Figure 5. Only certain reversals are counted: ones not including deleterious mutations, not overshooting an intervening optimum, and including substitutions with Ns > 5 on both initial and reverse portions.

(2) Average number of mutations affecting the trait on the indicated portion of the adaptive reversal.

(3) Median population selection coefficient for all substitutions affecting the trait on the indicated portion of the adaptive reversal.

(4) Median change in the trait observed over the reversal portion, normalized by average change across all runs.

Unlike in the SMS approximation, the discrete mutational effects in these simulations introduce the possibility of overshooting an optimum. To determine if this was the case for each substitution, I retrospectively assess fitness at a grid of effect sizes smaller than the observed substitution. An intervening optimum was overshot when a smaller effect size leads to a greater fitness than that associated with the observed substitution. I only count adaptive reversals that do not overshoot an intervening optimum.
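The retrospective check for overshooting can be sketched in Python as below. The function name, the evenly spaced grid, and the default of 100 grid points are hypothetical choices for illustration; the source states only that fitness is assessed at a grid of effect sizes smaller than the observed substitution.

```python
def overshoots_optimum(fitness, before, trait_index, delta, n_grid=100):
    """Return True if a smaller step in the same direction would have been fitter.

    `before` holds the trait values prior to the substitution and `delta` is the
    observed change in the trait at `trait_index`. Fitness is evaluated at evenly
    spaced fractions of `delta` (an assumed grid) and compared with the fitness
    reached by the full step.
    """
    after = list(before)
    after[trait_index] += delta
    w_observed = fitness(after)
    for j in range(1, n_grid):
        partial = list(before)
        partial[trait_index] += delta * j / n_grid
        if fitness(partial) > w_observed:
            return True
    return False

def one_trait_fitness(u):
    """Toy single-trait landscape with its optimum at a trait value of 1."""
    return -(u[0] - 1.0) ** 2

print(overshoots_optimum(one_trait_fitness, [0.0], 0, 0.8))  # step stops short of the optimum
print(overshoots_optimum(one_trait_fitness, [0.0], 0, 1.5))  # step passes the optimum
```

In the toy example the step of 0.8 does not overshoot, whereas the step of 1.5 lands beyond the optimum at 1, so a smaller step would have been fitter. A later substitution that moves such a trait back toward the optimum would therefore not be counted as an adaptive reversal.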

As in the case of the SMS approximation, it may be more meaningful to consider adaptation to a new environment starting from a network adapted to a distinct but similar environment, rather than from arbitrary points in the fitness landscape. Therefore, I also conduct 100 replicate simulations starting from a network that is initially well adapted to the low-noise environment of Figure 5, and evolve the network in the high-noise environment (all replicates start from the same point). Two examples, shown in Figure 6, exhibit adaptive reversals affecting the production and degradation rates of Z (inline image and inline image). One advantage of a model in which each substitution only affects a single trait is that the beneficial (or deleterious) effect is clearly attributable to that trait. In other words, substitutions that have opposite effects with respect to the trait are individually adaptive solely as a result of the mutation’s effect on that trait, and not because of a pleiotropic effect of the mutation.

Figure 6.

Adaptation to an environmental shift with discrete mutational effects. Two replicate bouts of adaptation are shown, starting from a network adapted to a low-noise environment, and evolving in response to a shift to a high-noise environment. Each small plot shows the evolution of one trait (vertical axes) as a function of time (horizontal axes). Vertical stripes indicate the accumulation of substitutions, with the bottom axis giving the number of accumulated substitutions and the top axis indicating when the substituted mutation occurred among all mutations (most of which are lost). A mutation only affects a single trait. All substitutions are indicated on every plot, but only the colored circles correspond to substitutions that affect the given trait. Thus each column features exactly one colored circle. Substitutions are colored according to the population selection coefficient, Ns (see legend; N=1000). Purple dotted lines connecting substitutions indicate that the substitution overshot an intervening optimum. Solid lines indicate that overshooting an optimum did not occur. Adaptive reversals not overshooting an optimum can be seen in both simulations for trait inline image and in the right-hand simulation for inline image.

The replicate simulations are often very similar, as is expected when adaptation is largely determined by the topology of the landscape and when the starting point is the same for all replicates. Nonetheless, the replicate simulations differ in whether an adaptive reversal is observed. Although adaptive reversals are observed in 70% of the replicate simulations, the others converge on near-optimal solutions without exhibiting any adaptive reversals. In this way adaptive reversals, as with many features of evolution, exhibit a high degree of stochasticity and historical contingency (Lenormand et al. 2009; Bullaughey 2011; Salverda et al. 2011).

Traversal of a fitness landscape with discrete mutational effects is not limited to exactly ascending the fitness gradient, and thus it is even possible to have adaptive reversals involving traits that do not generally show adaptive reversals in the SMS approximation. For example, under the SMS approximation when starting from arbitrary parameterizations of the feed-forward network, adaptive reversals were not found to affect the production rate of Z, inline image (see Fig. 4). Yet in the simulations detailed in Table 2, the most commonly affected trait is inline image and one example is shown in Figure 6 (right panel). Conversely, the threshold k does not show as many reversals in the discrete effect model as in the SMS approximation. These discrepancies appear to result from the SMS approximation allowing multiple parameters to change simultaneously. This suggests that discrete mutational effects may actually increase the opportunities for adaptive reversals.


Adaptive reversals in the SMS approximation are impossible in one dimension, whereas they appear common in five dimensions (Fig. 4). Clearly, the dimensionality of the landscape plays a role in whether adaptive reversals occur. But dimensionality is not a smoothly tunable knob; instead it is the combination of particular epistatic relationships intrinsic to the network construction, dynamics, and fitness model.

To gain insight into whether adaptive reversals are primarily the result of epistasis between certain subsets of the traits and how the number of evolving traits affects the prevalence of adaptive reversals, I again evolved networks using the SMS approximation starting from random points. However, I constrained evolution to a subset of the evolvable traits.

I considered all 31 nonempty subsets of the five evolvable traits. Fixing a trait—preventing it from evolving—has several interpretations. First, it removes this trait and all associated epistatic interactions with other traits from the model, thereby reducing the degree of epistasis. Second, the fixed traits can be thought of as having a mutational target that is so much smaller than the other traits that they rarely, or in this case never, evolve. Thus, this exercise is informative in that, in real biological systems, traits may have very different mutational target sizes, which in turn may shape which adaptive trajectories are most likely and whether adaptive reversals occur. The results of this analysis are presented in Figure 7.
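As a small illustration of this design, the Python sketch below enumerates the nonempty subsets of five traits and shows one way to hold the remaining traits fixed during gradient ascent, namely by zeroing their partial derivatives. The trait labels and the helper names are hypothetical placeholders, not notation from the paper.

```python
from itertools import combinations

# Hypothetical labels for the five evolvable traits.
TRAITS = ("deg_Y", "prod_Y", "deg_Z", "prod_Z", "k")

def nonempty_subsets(traits):
    """All nonempty subsets of the traits, ordered by subset size."""
    return [c for r in range(1, len(traits) + 1) for c in combinations(traits, r)]

def constrained_gradient(grad, evolvable_idx):
    """Wrap a gradient function so that fixed traits have zero partial derivative."""
    def g(u):
        full = grad(u)
        return tuple(gi if i in evolvable_idx else 0.0 for i, gi in enumerate(full))
    return g

subsets = nonempty_subsets(TRAITS)
sizes = [sum(1 for s in subsets if len(s) == r) for r in range(1, len(TRAITS) + 1)]
print(len(subsets), sizes)  # 31 subsets: 5, 10, 10, 5, and 1 of sizes one through five
```

Zeroing a partial derivative keeps the corresponding trait at its starting value throughout the ascent, which corresponds to the interpretation of a fixed trait as one with a vanishingly small mutational target.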

Figure 7.

Prevalence of reversals depends on what traits are evolvable and scales with dimensionality. In addition to allowing all five traits to evolve (right-most column, purple label), evolution is restricted to all possible nonempty subsets of the five traits. There are five possible one-trait subsets (left set, red label), 10 possible two-trait subsets (second to left, orange label), and so on, arrayed along the horizontal axis. This allows consideration of how the prevalence of reversals scales with dimensionality or how evolution would proceed if the mutation target for one or more traits is nearly zero relative to the other traits. A total of 200 random starting points were sampled from the same hypercube as used in Figure 4. Numbers indicate percentages of adaptive trajectories showing a reversal with respect to the trait. The fixed traits are set to the trait value that is optimal in the high-noise environment. The bar plot above (gray bars) indicates the percentage of replicate simulations in which there is at least one reversal observed for that trait subset. All tests for reversals are done as in Figure 4—the SMS gradient ascent path is computed numerically for each starting point and the high-noise environment is used.

As expected, adaptive reversals are absent when only one trait is allowed to evolve. In general, as the number of evolvable traits increases, there is a trend toward more reversals (Fig. 7, top) and reversals of larger effect (Fig. S1). But this trend is not perfect; some combinations of parameters are much more likely to exhibit adaptive reversals than others. This pattern reflects the nature of the epistasis. For example, in landscapes with only inline image and one other trait free to evolve, which trait exhibits reversals varies dramatically. When inline image evolves in conjunction with inline image, the former exhibits many reversals and the latter does not. When inline image evolves with inline image, the pattern is reversed, with inline image not exhibiting many reversals, whereas inline image does. When inline image and k evolve together, both show many reversals, and when inline image and inline image evolve together, neither shows many.

These patterns align with the general intuition that some traits are coupled more tightly by epistatic interactions than others. How the propensity to undergo adaptive reversals scales to fitness landscapes of more than five dimensions remains to be seen, and will require biologically plausible higher dimension models.


In principle, epistasis allows for what I call an adaptive reversal—a trait adaptively evolving in one direction and then, after a change elsewhere in the system, evolving adaptively in the reverse direction. In such adaptive reversals, mutations of opposing functional effect are individually adaptive, which is in contrast to compensatory evolution, in which changes of opposing functional effect also have opposing fitness effects (i.e., a damaging and a restorative mutation). Whether regulatory networks exhibit an epistatic structure that can lead to adaptive reversals is not immediately clear, however, as the epistasis is shaped by functional properties of the system.

There are a number of a priori reasons why one might not expect adaptive reversals to be a common feature of regulatory evolution. First, epistasis relates to the dimensionality of the biological subsystem under consideration. For example, epistasis internal to a protein may relate to the catalytic properties of the active domains, the shapes of the interaction surfaces, the structural scaffold, the stability of the protein, and the folding efficiency. The protein, in these regards, is a high-dimensional object. In contrast, interactions in regulatory networks are largely mediated through expression levels, which are continuous and perhaps of lower dimensionality, varying only over time and space. Furthermore, the bulk of epistatic interactions may be among nucleotides involved in encoding a single biological function. For example, epistasis may be common within a protein, where there may be many interacting residues (Lunzer et al. 2010), but less common between proteins, because interaction surfaces may be limited to a subset of domains and residues (Heger and Holm 2003). Similarly, epistasis in regulatory networks may be limited by the modularity characteristic of networks (Hartwell et al. 1999; Alon 2003); genes are encoded separately, transcriptional control is encoded separately from the determinants of transcript and protein stability, and the cis-regulatory encoding is itself modular. On the other hand, in a multi-component, adaptively evolving system, adaptation may be an especially strong driver of epistatic changes in the genetic background, making the assumption of no epistasis less plausible. So although adaptive reversals may not be surprising when one considers the dizzying possibilities for arbitrary high-dimensional fitness landscapes, it is important that adaptive reversals occur in a relatively low-dimensional, biologically plausible model for regulatory function.

To investigate the prevalence and nature of adaptive reversals in a setting in which the epistatic properties are determined by a biologically plausible functional model, I employed a ubiquitous and well-understood regulatory network: the type I coherent feed-forward loop (Shen-Orr et al. 2002; Mangan et al. 2003; Alon 2007). This network motif offers an interpretable, low-dimensional model for a component of fitness, and thus may be informative for considering the relationships among functional effects and fitness effects of mutations. The fitness model is based on a cost-benefit trade-off relating the benefit of pathway activation and the costs associated with gene expression. Adaptation is generally imagined to proceed in response to environmental changes (Takahata et al. 1975; Gillespie 1993), and thus by using a model with multiple environments—a low-noise environment and a high-noise environment—I was able to consider adaptation of a network that is already well-adapted to its original environment, but which can be improved by new mutations upon its introduction to a new environment. Considering only a modest environmental change and a network that was well adapted to a previous environment seems more relevant than considering adaptation from an arbitrary starting point in a multidimensional fitness landscape, as real populations are always reasonably well adapted to some environment, and successive environments are probably autocorrelated. Together, the model of a small network, the cost-benefit fitness function, and the influence of the environment comprise the necessary components to consider whether adaptive reversals are plausible in a regulatory network. Using this model, I have shown that substitutions that have opposing effects with respect to a trait need not be compensatory; instead there is good reason to expect such substitutions may often be individually adaptive and constitute adaptive reversals.

There are a number of predictions based on experimental data suggesting that adaptive reversals involving protein evolution or protein–DNA coevolution may be important. One such example involves the evolution of resistance to the antibiotic cefotaxime in E. coli. Weinreich et al. (2006) determined that 18 of the 120 possible five-step paths that confer resistance are strictly increasing in fitness. When paths are extended to include all 18 billion acyclic paths involving reversions, only nine of these are found to be strictly increasing in fitness (Depristo et al. 2007). These particular paths each involve at least two adaptive reversals at the nucleotide level because the start and end sequences differ at all five considered nucleotides, and thus each site that experiences a reversion must mutate a second time. As the authors note, these nine paths are a tiny fraction of the 18 billion possible; nonetheless, these additional paths involving adaptive reversals increase by half the total number of strictly increasing adaptive trajectories. Another example involves the coevolution of E. coli lac repressor and its operator-binding site. Using a model for fitness based in part on a large experimental dataset of repressor–operator affinities, Poelwijk et al. (2006) find that if one restricts evolution to adaptive trajectories that maintain a minimum level of affinity between repressor and operator, many of the strictly adaptive trajectories involve what they refer to as diversions, which are actually adaptive reversals at the nucleotide level. In both of these studies, adaptive reversals are at the nucleotide level, whereas in this study, I consider adaptive reversals affecting higher order functions, and thus the constituent substitutions may be distributed among sites. Further progress in understanding the functional basis of adaptation should shed more light on the prevalence of adaptive reversals in a variety of systems (Dean and Thornton 2007).
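The path counts quoted for the cefotaxime example can be checked with a line of arithmetic. The labels for the two classes of paths are introduced here only for convenience and are not notation from the cited studies.

```latex
\begin{align*}
5! &= 120 && \text{(orderings of the five mutations)},\\
n_{\text{direct}} &= 18, \qquad n_{\text{with reversion}} = 9,\\
\frac{n_{\text{with reversion}}}{n_{\text{direct}}} &= \frac{9}{18} = \frac{1}{2},
\qquad n_{\text{direct}} + n_{\text{with reversion}} = 27.
\end{align*}
```

The nine paths with reversions thus raise the number of strictly increasing trajectories from 18 to 27, which is the increase by half stated in the text.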

The existence of adaptive reversals has important implications for inferring the regime of selection under which a certain function or system has evolved. Focusing on a single trait at a time rather than the system in which it is embedded could mistakenly lead one to infer the trait to be evolving under stabilizing selection (via compensatory evolution), when it is actually one component of an adaptively evolving system. Although it remains to be shown, the substitutions of opposing functional effect may even be helpful in detecting adaptation, when combined with data from other coevolving traits. For example, when assessing whether an excess of functionally relevant substitutions has occurred on a particular lineage, rather than viewing functionally opposing substitutions as evidence of stabilizing selection on the one trait, one could incorporate them in the pool of evidence consistent with directional adaptive change of the overall system.

There are a number of ways in which the present model may depart from the biological reality. First, the mutational process employed in this study operates at the level of the trait rather than at the level of the sequence. This property is shared by Fisher’s Geometrical Model and related models (Fisher 1930; Martin and Lenormand 2006). Although this mutational process has the advantage that assumptions need not be made about the functional encoding, it implies that increases and decreases of any effect size are always possible. For quantitative traits, these may not be unrealistic approximations (Hansen 2006); however, the traits modeled here may be more granular. Implicit in this mutational scheme is the absence of pleiotropy. That the production and degradation rates are separately encoded and thus not pleiotropic is likely a reasonable assumption; most regulatory elements are outside the processed transcript (Maston et al. 2006) and it is the transcript and resultant protein that most determine stability and degradation (Ross 1996; Garneau et al. 2007), although exceptions to this have been recently uncovered in yeast (Depristo et al. 2007; Bregman et al. 2011; Shalem et al. 2011; Trcek et al. 2011). It is less clear whether the activation threshold, k, is independent from the production rate of Y, inline image, as these both may be encoded in the cis-regulatory elements associated with Y. Pleiotropy could also exist with traits not modeled. For example, the three transcription factors may participate in other regulatory circuits or have other functions, making otherwise adaptive mutations deleterious. Such pleiotropy can restrict the additive genetic variance available to selection and thereby limit adaptation (Walsh and Blows 2009). Although pleiotropy is clearly biologically important, excluding it in the present study is helpful for keeping the model simple.

I also assume the environment is constant after the initial change, and although also unrealistic, it is nonetheless a common assumption when modeling adaptation (Orr 2005), and it seems like a useful starting point. By assuming constancy of the environment, I ensure that adaptive reversals resulted from other, epistatic changes to the network, and not simply because the environment changed again. Although it is certainly possible that selective pressures change rapidly, the present study is concerned with the plausibility of adaptive reversals due solely to the fitness landscape topology. In fully specifying the fitness function for the feed-forward loops considered, I needed to make some assumptions regarding the exact nature of the cost functions. Finally, in the adaptive walk simulations, I accept or reject each mutation based on the fixation probability, and thus successive substitutions are independent. This approach does not model certain population dynamics including the joint substitution of mutations or interference among simultaneously segregating mutations. These dynamics are likely to be of very minor importance here, and by deliberately excluding them, I ensure that the substitutions comprising the adaptive reversal are indeed individually beneficial.


I have shown how adaptive reversals can occur during the evolution of a particular regulatory motif, the type I coherent feed-forward loop. I chose to investigate this motif because its dynamics are accurately captured by a simple mathematical model (Mangan et al. 2003), it is prevalent in nature (Shen-Orr et al. 2002; Alon 2007), has an interpretable basis for fitness, and is relatively low dimensional. I did not expect this particular model to have an unusual propensity to exhibit adaptive reversals. Additional work will be needed to show whether adaptive reversals are a common feature of multidimensional evolution in other systems, and this will require appropriate, biologically accurate models, which unfortunately remain rather few.

A necessary requirement for adaptive reversals is the existence of epistasis among beneficial mutations, which experimental data suggest may be common (Weinreich et al. 2006; MacLean et al. 2010). Furthermore, experimental evolution (e.g., Barrick et al. 2009; MacLean et al. 2010) may soon offer opportunities to test the predictions of this study. Several criteria must be met. First, the relevant pathways must be sufficiently understood and potential epistatic interactions identified. Second, the functional effects of observed substitutions must be measurable. Third, the fitness effects of individual mutations and of combinations of mutations must be measurable. Fourth, the population must evolve in an appropriate environment to drive adaptation of the pathway of interest. For select cases, all of these criteria can already be met or will likely be met imminently.

Associate Editor: J Wilkins


I would like to thank Molly Przeworski, Bin He, Ilya Ruvinsky, Hideki Innan, Richard Hudson, Martin Kreitman, Hisashi Ohtski, and John Reinitz for helpful discussions. I would also like to thank Yongtao Guan, Bin He, and Molly Przeworski for suggestions on the manuscript. This work was supported by a William Rainey Harper Fellowship to KB, a National Science Foundation East Asia Pacific Summer Institutes (EAPSI) and Japan Society for the Promotion of Science (JSPS) Fellowship to KB, and a Howard Hughes Medical Institute Early Career award to Molly Przeworski. The author declares that he has no conflicts of interest.