A long-standing question in evolutionary and developmental biology concerns the relative contributions of cis-regulatory and protein changes to developmental evolution. Central to this debate is the question of which mutations generate evolutionarily relevant phenotypic variation. A review of the growing body of evolutionary and developmental literature supports the notion that many developmentally relevant differences occur in the cis-regulatory regions of protein-coding genes, generally to the exclusion of changes in their coding regions. However, accumulating experimental evidence demonstrates that many of the arguments against a role for proteins in the evolution of gene regulation, and in developmental evolution in general, are no longer supported, and an increasing number of cases have demonstrated evolutionary changes in transcription factor proteins. Here, we review the evidence that cis-regulatory evolution is an important driver of phenotypic evolution and provide examples of protein-mediated developmental evolution. Finally, we argue that the evolution of proteins may play a more substantial, but thus far underestimated, role in developmental evolution.

Developmental evolutionary studies have played an important role in elucidating the molecular mechanisms that give rise to phenotypic diversity (Wilkins 2002; Carroll 2005b; Davidson 2006). Mechanistically, embryonic development is governed by the precise temporal and spatial deployment of gene regulatory networks (Wilkins 2002); thus, the evolution of development is fundamentally a question of evolving gene regulation. A priori, changes in gene regulation can occur either in cis, which broadly includes cis-regulatory regions such as promoters, enhancers, and silencers, as well as the untranslated regions of mRNAs, or in trans, which includes regulatory proteins, miRNAs, and other noncoding regulatory RNAs. Although many empirical studies and theoretical arguments support the importance of cis-regulatory change to developmental evolution (Carroll 2005b; Prud'homme et al. 2007; Wray 2007), often implicitly to the exclusion of changes in proteins, there is accumulating evidence implicating protein (particularly transcription factor) evolution as a major source of phenotypically relevant variation in developmental evolution. Indeed, it seems likely that regulatory elements and transcription factors evolve “hand-in-hand” to effect phenotypic evolution. The case for cis-regulatory change has been made eloquently in several recent reviews and books (Carroll 2005b; Prud'homme et al. 2007; Wray 2007); therefore, in this essay we focus on the potential role of transcription factor change in gene regulatory evolution.

Although protein-mediated evolution of developmental pathways is rarely excluded outright as a source of developmental evolutionary change (Hsia and McGinnis 2003; Wray 2007), mutations in protein-coding genes, primarily transcription factors, are generally not considered a likely source of evolutionary change. The primary argument against a substantive role for protein-mediated evolution of gene regulatory networks is the expectation that mutations in protein-coding genes have extensive negative pleiotropic effects (Stern 2000; Carroll 2005b; Prud'homme et al. 2007; Wray 2007). Generally, mutations with more extensive pleiotropic effects are less likely to be adaptive than mutations with fewer pleiotropic effects. The argument goes like this: a mutation with many pleiotropic effects may be beneficial in one context, but the probability that it is beneficial in all contexts is extremely low. Thus, mutations with much lower degrees of pleiotropy are expected to be the favored source of adaptive variation (Stern 2000; Carroll 2005b). For example, it is generally accepted that a mutation in the coding region of a transcription factor (or other protein) that is expressed in multiple tissues will likely affect the expression of all the genes that the transcription factor regulates (Stern 2000; Carroll 2005b). In contrast, a mutation in a cis-regulatory element (or other regulatory region) will affect gene expression only in the spatiotemporal domain controlled by that element (Carroll 2005b). Although this line of reasoning is powerful, it rests on the assumption that cis-regulatory elements are modular, which limits the pleiotropic effects of mutations in them, whereas proteins are not.
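The probabilistic core of this argument can be made concrete with a minimal sketch. It assumes, purely for illustration, that a mutation acts in k separate contexts (tissues or developmental stages) and that its effect in each context is independently beneficial with the same probability p; neither the independence assumption nor any particular value of p comes from the cited studies.

```python
def prob_beneficial_everywhere(p: float, k: int) -> float:
    """Probability that a mutation is beneficial in all k contexts.

    Hypothetical model: contexts are independent and each is beneficial
    with probability p, so the joint probability is p ** k.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if k < 0:
        raise ValueError("k must be non-negative")
    return p ** k


if __name__ == "__main__":
    p = 0.3  # hypothetical per-context probability of a beneficial effect
    for k in (1, 2, 5, 10):
        # k = 1 stands in for a change confined to one enhancer's domain;
        # larger k stands in for a coding change in a broadly expressed factor.
        print(f"k={k:2d}  P(beneficial in all contexts) = {prob_beneficial_everywhere(p, k):.6f}")
```

The probability falls geometrically with k, which is the intuition behind the pleiotropy argument. The question pursued in this essay is whether, for a transcription factor, the effective k is really as large as the number of tissues in which the protein is expressed.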

The assumption that proteins lack modular structural and functional architectures sufficient to reduce negative pleiotropic effects, combined with the early finding that transcription factors evolve slowly and rarely acquire novel functions, has led many evolutionary and developmental biologists to conclude that the primary source of adaptive phenotypic variation is cis-regulatory change (Carroll 2005b; Prud'homme et al. 2007; Wray 2007). Indeed, there is a large and growing body of data implicating regulatory elements in phenotypic evolution. But is it sufficient to conclude from these data that “regulatory element evolution must be the major contributor to the evolution of form” (Carroll 2005a: 1159, emphasis added)? Or is there room for the evolution of transcription factor function (Hsia and McGinnis 2003), and if so, in what contexts does it play a role? In this essay we develop the argument, based on theoretical considerations and empirical evidence, that protein evolution makes a much larger contribution to phenotypic evolution than is generally recognized. Although there is no doubt that regulatory element evolution plays an important role in phenotypic evolution, failure to recognize protein evolution as a source of developmental variation will seriously hamper a comprehensive understanding of the evolution of form.

Ascension of the cis-Regulatory Paradigm

The foundations of the cis-regulatory paradigm of developmental evolution were built upon several early studies on the molecular basis of morphological evolution. One of the earliest studies to consider the genetic basis for morphological change (King and Wilson 1975) concluded that the degree of protein divergence between humans and chimpanzees was so small that it could not account for the anatomical and behavioral differences between them; the authors proposed that the divergence in morphological and behavioral characters between humans and chimpanzees was based on changes in the mechanisms controlling gene expression. This and additional work by Wilson and colleagues (Wilson et al. 1974a,b; Prager and Wilson 1975) generalized the finding that the rates of protein and morphological evolution were largely uncoupled and supported the conclusion that morphological divergence in the absence of protein divergence could be reconciled only if regulatory gene mutations (i.e., in cis-regulatory regions), rather than structural gene mutations (i.e., in protein-coding regions), were the principal mechanism of phenotypic change. This conclusion agreed with the then recently formulated theory that the vast majority of amino acid differences between species are neutral (Kimura 1968; King and Jukes 1969; Kimura 1983) and thus could not be responsible for adaptive or any other kind of functional (or morphological) evolution (excluding exaptation).

Since these early studies, a growing body of empirical work has demonstrated the importance of cis-regulatory changes to developmental evolution; principal among these findings was the discovery that some genes are regulated by modular elements that direct gene expression to discrete spatial and temporal domains during development (Wilkins 2002; Davidson 2006; Wray 2007). Before proceeding with a critique of the assertion that proteins are unlikely to be responsible for developmental evolution, we first review some of the empirical evidence and theoretical justifications cited in support of the cis-regulatory paradigm.


Perhaps the most influential data used to support the cis-regulatory paradigm of developmental evolution has been the finding that transcription factors from strikingly divergent species seem to be functionally equivalent, often despite great sequence divergence. The first demonstration of functional equivalence of transcription factors involved the homologous human and Drosophila Hox/HOM genes HoxB-4 and Deformed (Dfd). McGinnis et al. (1990) tested whether the human HoxB-4 gene could substitute for the regulatory functions of Dfd in Drosophila embryos, including an autoregulatory function of Dfd, by inserting a heat-shock promoter driving HoxB-4 expression into the Drosophila genome. Amazingly, in developing embryonic and larval cells, the human HoxB-4 gene activated ectopic expression of the endogenous Dfd gene via its autoregulatory element. Thus, the human gene could replace the normal regulatory function of its Drosophila homolog after more than 800 million years of divergence (Hedges and Kumar 2003). These results were followed by additional studies showing that HoxB-4 cis-regulatory elements could mimic the function of the Dfd autoregulatory element in Drosophila embryos, with the HoxB-4 cis-regulatory elements driving expression in a posterior head segment of Drosophila (Malicki et al. 1992), whereas the reciprocal experiment found that the Drosophila Dfd autoregulatory element provided spatially localized expression in the hindbrain of mouse embryos (Awgulewitsch and Jacobs 1992). These results suggested that the Dfd/HoxB-4 autoregulatory circuit is conserved in both the arthropod and chordate lineages and likely arose before their divergence.

Although Dfd and HoxB-4 provided the first example of functional equivalence between transcription factors, the most celebrated case has been Pax6. The Pax6 gene codes for a transcription factor containing two highly conserved DNA-binding motifs, a paired domain and a paired-type homeodomain. Pax6 loss-of-function mutations result in similar eye malformations in mice, humans, and Drosophila (Hill et al. 1991; Jordan et al. 1992; Niimi et al. 2002), suggesting that these genes have similar functions in eye development despite the vast differences in the development and structure of vertebrate and invertebrate eyes. Ectopic expression of eyeless, the Drosophila homolog of the vertebrate Pax6 gene, induces eye development on the antennae, legs, and wings of flies, suggesting that it is a “master regulator” that switches on the eye development regulatory network (Halder et al. 1995). Astonishingly, ectopic expression of the mouse Pax6 gene in Drosophila induced the formation of well-developed compound eyes (Halder et al. 1995), whereas the reciprocal experiment, expression of the Drosophila eyeless and twin of eyeless genes in Xenopus embryos, resulted in the development of vertebrate eye structures (Onuma et al. 2002). These data strongly suggested that Pax6 genes function as conserved high-level regulatory switches that activate the developmental network leading to eye formation.

The homeodomain of vertebrate Hox genes is also remarkably well conserved in primary function (DNA binding), tertiary structure, and ability to compensate for loss-of-function mutations of paralogous genes. Striking functional redundancy of paralogs, for example, was demonstrated for HoxA-11 and HoxD-11: single knockout mice have no kidney defects and only mild limb defects, whereas compound HoxA-11/HoxD-11 null mice have rudimentary or missing kidneys and dramatically shortened radius and ulna bones in the forearm (Davis et al. 1995; Zakany et al. 1996; Patterson et al. 2001). More dramatic results were obtained from gene swaps between HoxA-3 and HoxD-3, which are nearly unalignable outside the homeodomain yet are completely functionally redundant (Condie and Capecchi 1994; Greer et al. 2000). Similarly, the interchangeability of HoxA-1 and HoxB-1 in mouse development was tested by swapping their protein-coding regions (Tvrdik and Capecchi 2006). Mice expressing the HoxB-1 protein from the HoxA-1 locus, and vice versa, are essentially normal except for a minor facial nerve hypomorphism in hemizygous HoxB-1(A1/-) mice and decreased viability in homozygous HoxA-1(B1/B1) embryos.

Numerous other examples of functional equivalence of transcription factors and structural proteins have been identified, both between orthologous genes from long-diverged organisms (Bickar et al. 1985; Yu et al. 1995; McIntosh and Bonham-Smith 2001; Wang et al. 2002, 2004; Zhang et al. 2004; Liu et al. 2006; Shimeld 2007) and between paralogous genes from ancient duplications (Roman et al. 1997; Bouchard et al. 2000; Fleischmann et al. 2000; Hirth et al. 2001; Acampora et al. 2003; Coronado et al. 2004; Chia and Costantini 2005; Pan et al. 2005). The lessons from these studies have been taken to heart: the functional specificity of transcription factors does not seem to change through evolution, and sequence divergence between species was likely neutral with respect to developmental function, at least as long as binding-site recognition remained unaltered. Thus, when looking for the mechanistic basis of morphological differences between species, transcription factors (and other proteins) seem to be an unlikely source of adaptive variation, and attention should instead focus on regulatory differences in target genes.


If changes in transcription factor proteins are not responsible for developmental evolution, then where do we look for the source of phenotypic variation? The answer, or at least the most popular proposal, has been the cis-regulatory regions of protein-coding genes (King and Jukes 1969; Wilson et al. 1974a,b; Prager and Wilson 1975; Stern 2000; Wilkins 2002; Carroll 2005b; Prud'homme et al. 2007; Wray 2007). Proponents of the cis-regulatory paradigm point to several features of enhancers that they consider uniquely suited to promoting phenotypic change; most important among them is the modular structure of cis-regulatory architecture (Carroll 2005b; Prud'homme et al. 2007; Wray 2007). Briefly, regulatory regions of genes are composed of multiple discrete elements that can either activate (promoters, enhancers) or repress (silencers) transcription. Enhancers, for example, are usually 200- to 300-nucleotide-long elements that have binding sites for multiple transcription factors and generally direct expression of genes to specific tissues or times during development. Thus, mutations that abolish or create an enhancer usually affect gene expression in only a single tissue or developmental time window; in other words, independent cis-regulatory regions independently affect the expression of a gene at different times and places.

As a consequence of the modular structure of cis-regulatory regions, the effect of a mutation that alters a single cis-regulatory element will be restricted to particular places and times and will not globally affect gene expression, that is, will not alter expression in every tissue in which a particular gene is expressed (Stern 2000; Prud'homme et al. 2007). Thus, cis-regulatory regions are thought to be largely free of the kinds of deleterious pleiotropic effects that constrain proteins. Proteins such as a broadly expressed transcription factor are thought to be an unlikely source of phenotypic change because a mutation in the coding region may directly affect the expression of all the genes the protein regulates in all the tissues in which it is expressed (Carroll 2005a,b). Given this rationale, it is not surprising that numerous cases of cis-regulatory divergence have been identified as the source of phenotypic variation (reviewed in Stern 2000; Wray 2007).
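The contrast between the two kinds of mutation can be sketched with a toy data structure. The gene, enhancer, and tissue names below are hypothetical. The sketch assumes that each enhancer drives expression in exactly one tissue and, following the cis-regulatory argument, that a coding change alters the protein's activity in every tissue where the gene is expressed.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """Toy gene whose (hypothetical) enhancers each drive expression in one tissue."""
    name: str
    enhancers: dict[str, str] = field(default_factory=dict)  # enhancer -> tissue

    def expression_domain(self) -> set[str]:
        """All tissues in which the gene is expressed (union of enhancer domains)."""
        return set(self.enhancers.values())


def tissues_hit_by_enhancer_mutation(gene: Gene, enhancer: str) -> set[str]:
    """Disabling one enhancer alters expression only in that enhancer's tissue."""
    return {gene.enhancers[enhancer]}


def tissues_hit_by_coding_mutation(gene: Gene) -> set[str]:
    """Assumption of the cis-regulatory argument: a coding change acts in
    every tissue of the expression domain."""
    return gene.expression_domain()


if __name__ == "__main__":
    tf = Gene("TF_X", {"enh_limb": "limb", "enh_eye": "eye", "enh_gut": "gut"})
    print(sorted(tissues_hit_by_enhancer_mutation(tf, "enh_limb")))
    print(sorted(tissues_hit_by_coding_mutation(tf)))
```

The second function encodes exactly the assumption this essay questions: whether a coding mutation must act in every tissue of the expression domain.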

Why Do We Need to Consider Transcription Factor Protein Evolution in Addition to Binding Site Evolution?

However powerful the logic of, and the evidence for, the cis-regulatory model may be, it nevertheless does not accommodate all the facts relevant to the evolution of gene regulation. One of the most important of these is that the functional properties of transcription factors, and of other proteins, evolve (Morris et al. 1997; Hanks et al. 1998; Park et al. 1998; Ranganayakulu et al. 1998; Hyman et al. 2003; Chen et al. 2004; Punzo et al. 2004; Stolt et al. 2004; Hanzawa et al. 2005; Siegal and Baker 2005; Kellerer et al. 2006). The challenge is to incorporate these findings meaningfully into the current view of regulatory evolution, to understand how such functional change happens, and to identify the role it plays in the evolution of gene regulatory networks. In this section we briefly summarize the kinds of evidence showing that transcription factor functions evolve and discuss some models of how this contributes to the evolution of gene regulation.


Much of the evidence for transcription factor evolution is explained in greater detail in later sections. Here we only summarize the kinds of evidence that challenge the assumption of an unchanging “genetic toolkit.” One kind of evidence is an extension of the work that originally suggested extensive conservation of transcription factor function, namely the transfer of a transcription factor gene from one species, for example mouse, into another species, for example Drosophila. Although such experiments have provided stunning cases of functional conservation, they have also uncovered cases of functional nonequivalence between homologous transcription factors. To our knowledge, the first evidence of functional divergence came from an attempt to rescue loss-of-function mutations in the Drosophila tinman gene, which codes for a transcription factor necessary for heart development, with its mammalian homolog Nkx2.5. The mouse gene was able to rescue the expression of some target genes, such as FascIII, but not that of others, such as eve, zfh-1, and D-MEF2 (Park et al. 1998; Ranganayakulu et al. 1998). Although this and other experimental studies (discussed below) clearly show that transcription factors do not remain conserved in all their functions, they do not illuminate the role these changes played in evolution, because they do not identify the specific evolutionary transitions that led to these functional differences.

Another kind of evidence comes from molecular evolutionary studies of transcription factor genes. Although parts of transcription factors, such as the DNA-binding homeodomain of Hox proteins, are conserved, there is also extensive evidence for adaptive change in transcription factor proteins (Maiti et al. 1996; Ting et al. 1998; Purugganan et al. 2000; Zhang et al. 2002; Barrier et al. 2003; Martinez-Castilla and Alvarez-Buylla 2003; Lynch et al. 2004; Bustamante et al. 2005; Casillas et al. 2006; Crow et al. 2006; Luo et al. 2006; Pollard et al. 2006; Mukherjee and Bürglin 2007; Wang and Zhang 2007); evidence of adaptive evolution has even been found within the “conserved” homeodomain itself (Sutton and Wilkinson 1997; Lynch et al. 2006). Much of this evidence has come from comparing the rate of amino acid (nonsynonymous) substitutions with the rate of synonymous nucleotide substitutions. For instance, it has recently been shown that the progesterone receptor has evolved under positive Darwinian selection in humans and chimpanzees (Chen et al. 2008). These changes occurred in parts of the protein important for transcriptional activity and coincided with changes in the mechanism of parturition in higher apes. Another type of evidence comes from comparing patterns of amino acid conservation and variation among clades; significant differences in stabilizing selection between clades suggest functional divergence (Gu 2003).
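The rate comparison mentioned above is conventionally summarized as ω = dN/dS, the ratio of nonsynonymous to synonymous substitutions per site; ω > 1 is taken as evidence of positive selection, ω near 1 as consistent with neutrality, and ω < 1 as evidence of purifying selection. The sketch below assumes that synonymous and nonsynonymous sites and differences have already been counted (as in Nei–Gojobori-style counting methods) and applies the Jukes–Cantor correction to each proportion. The counts in the example are hypothetical and are not taken from any of the cited studies; likelihood-based codon models are more commonly used in practice.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def omega(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """dN/dS from pre-counted differences and sites (counting method assumed)."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0.0:
        raise ValueError("dS is zero, so omega is undefined")
    return d_n / d_s


if __name__ == "__main__":
    # Hypothetical counts for a single pairwise comparison.
    w = omega(nonsyn_diffs=12, nonsyn_sites=300, syn_diffs=3, syn_sites=100)
    print(f"omega = {w:.2f}")
```

With these made-up counts ω comes out at roughly 1.34, which under this simple interpretation would point toward positive selection. Real analyses also have to assess statistical support, which this sketch omits.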

The final kind of evidence comes from genome-wide studies of transcription factor target gene evolution. For example, Borneman and colleagues used ChIP-chip to compare the target sequences of the pseudohyphal regulators Ste12 and Tec1 in three yeast species. Remarkably, the presence or absence of binding sites for these two transcription factors in potential target genes did not predict whether the factors actually bound those sites (Borneman et al. 2007). Hence, binding sites are neither necessary nor sufficient for the activity of a transcription factor protein. In a similar study, Tuch and colleagues investigated the target sequences of the MADS-box transcription factor Mcm1 in three species of fungi: Saccharomyces cerevisiae, Kluyveromyces lactis, and Candida albicans. As expected, the authors found Mcm1 binding sites in association with binding sites for transcription factors known to interact with Mcm1 in all three species, but in K. lactis binding sites for Mcm1 were also found in association with binding sites for Rap1 in the regulatory regions of ribosomal protein genes (Tuch et al. 2008a). No such association was found in the other two species, suggesting that a novel protein–protein interaction evolved between Mcm1 and Rap1 in K. lactis, which led to the recruitment of Rap1 proteins to the Mcm1 target genes. Similarly, a new interaction between Mcm1 and Wor1 has evolved in C. albicans, suggesting that the evolution of protein–protein interactions (PPIs) can lead to the acquisition of new target genes and novel gene expression patterns.


Transcriptional regulators need to interact with the regulatory region of a gene, for example by binding DNA directly through a DNA-binding domain, to influence its transcription. This mode of transcription factor recruitment is at the heart of the cis-regulatory model, but it is not the only way regulatory proteins can be recruited to target genes. For example, transcriptional cofactors are recruited exclusively through PPIs and do not bind DNA directly; even classic transcription factors such as homeodomain-containing proteins can regulate target genes without ever touching DNA (Zappavigna et al. 1994; Plaza et al. 1997; Vander Zwan et al. 2003). This mechanism of transcription factor function is not a rare exception, as genome-wide studies have shown (Borneman et al. 2007; Tuch et al. 2008a). Here, we consider some models of how the origin of novel PPIs between transcription factors can influence the evolution of gene regulatory networks.

The first such model was proposed by Tuch and colleagues (2008b), motivated by their finding that large numbers of new transcription factor binding sites arose adjacent to ancestral binding sites in certain fungal lineages. Their scenario is as follows (Fig. 1A): consider a transcription factor A (e.g., Mcm1) that regulates a set of target genes TG-1, TG-2, TG-3, etc., which form a functional module (e.g., ribosomal proteins). Let us further assume that there is another transcription factor B (e.g., Rap1) that ancestrally does not physically interact with A. Now consider a mutation in B, B*, that creates a novel protein–protein interaction between A and B, A::B*, which recruits the B* protein to the loci where A already binds. If the presence of B* at those loci is beneficial, selection will act to stabilize the binding of B* at these loci, for instance by evolving B*-binding sites adjacent to binding sites for A (Fig. 1A) or by further strengthening the interaction with A. Note that in this model the evolution of the protein–protein interaction is the initial step in the evolution of a binding site on the DNA. This mode of evolution is much more likely than one in which each target gene has to evolve B-binding sites independently, because the mutation that generates the novel protein–protein interaction instantaneously recruits the B* protein to all the target genes of A. These target genes are likely to form a functionally integrated module (like the ribosomal protein module) and thus influence biological processes in a coherent way. Hence, such mutations would be highly pleiotropic at the level of the individual target gene but modular with respect to function. This mode of evolution can even lead to the complete replacement of an ancestral regulator by another (Tanay et al. 2005).
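The bookkeeping in this scenario can be sketched with sets. Target names follow the TG-1, TG-2, ... convention of the text. Two simplifications are assumptions made for illustration only: a single coding mutation in B is taken to suffice to create the A::B* interaction, and each new B-binding site is taken to require exactly one mutation.

```python
def targets_of_b_after_ppi(a_targets: set[str], b_targets: set[str]) -> set[str]:
    """PPI route: B* is recruited to every locus where A is already bound."""
    return b_targets | a_targets


def targets_of_b_after_sites(b_targets: set[str], new_sites: set[str]) -> set[str]:
    """Binding-site route: B reaches only targets that each gained their own B site."""
    return b_targets | new_sites


def mutations_to_recruit_b(a_targets: set[str], b_targets: set[str], route: str) -> int:
    """Minimum number of mutations for B to reach every A target (toy model).

    'ppi'   : one coding mutation in B creates the A::B* interaction.
    'sites' : every A target not already bound by B needs its own new site.
    """
    missing = a_targets - b_targets
    if not missing:
        return 0
    if route == "ppi":
        return 1
    if route == "sites":
        return len(missing)
    raise ValueError(f"unknown route: {route!r}")


if __name__ == "__main__":
    a = {"TG-1", "TG-2", "TG-3"}  # A's target module
    b = {"TG-9"}                  # B's ancestral, unrelated target
    print(sorted(targets_of_b_after_ppi(a, b)))
    print(mutations_to_recruit_b(a, b, "ppi"), mutations_to_recruit_b(a, b, "sites"))
```

The gap between the two counts grows with the size of A's target module, which is the sense in which the PPI route reaches a whole module in one step.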

Figure 1.

Models of transcription factor recruitment via the evolution of novel protein–protein interactions (PPIs). (A) Model of target gene recruitment through the origin of new PPIs. In this scenario an interaction evolves between transcription factors A and B (New PPI), after which interactions between B and DNA are stabilized by the origin of new transcription factor binding sites (New CRE). Rewiring in this manner could avoid the fitness barriers imposed by changing regulation one gene at a time (after Tuch et al. 2008b). (B) Model of target gene recruitment through the origin of a new cis-regulatory element (New CRE). Note that in this model only one allele in a diploid organism possesses the new CRE, so only one allele is co-opted into the new expression pattern, likely with weak effects (small arrow) on target gene (TG) expression. (C) Model of target gene recruitment through the origin of new PPIs. The new PPI between A and B recruits B to all A target genes; thus, both alleles are co-opted and the mutation is dominant. The effects of this mutation on target gene expression can be much larger (large arrows).

The next model is similar in logic to the previous one but takes the viewpoint of a target gene newly recruited into a novel function. Let us compare two scenarios: one in which the novel expression domain evolves through the origin of a binding site first (Fig. 1B) and one in which transcription factor protein mutations happen first (Fig. 1C). In the first scenario, a nucleotide substitution would create exactly one binding site in one of the two alleles in the genome of a diploid organism. Hence, the new activity would be due to the binding of a single transcription factor molecule and would affect only one allele as long as the frequency of this new allele in the population is low (as it must be at the moment of its creation). For natural selection to act on this new mutation, the effect of a single binding site in a heterozygote must have a noticeable impact on fitness; that is, the new allele must be at least intermediate in dominance rather than recessive. Otherwise, the new binding site will be masked from natural selection.

In the second scenario the mutation B → B* happens first, creating a novel protein–protein interaction (A::B*) that recruits B* to all A target genes. This mutation immediately recruits as many copies of the mutant transcription factor to the locus as there are copies of A already bound there. The mutation also affects both alleles in a diploid genome if both alleles bind the protein A. Hence, the effect is likely to be dominant. Again, if the recruitment of B* to the target locus of A is beneficial, selection will stabilize the interaction through the evolution of a binding site for B* or strengthening the interaction with A.
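The tallies in the two scenarios can be sketched as follows. This is a counting exercise under stated assumptions, not a model of binding kinetics: each A molecule bound at a target allele is assumed, hypothetically, to recruit one B* molecule, and protein concentration and occupancy are ignored. The sketch only records how many target-gene alleles, and how many B molecules, each kind of new mutation reaches in a heterozygous individual.

```python
from typing import NamedTuple


class Effect(NamedTuple):
    alleles_affected: int        # of the two target-gene alleles in a diploid
    b_molecules_recruited: int   # total B or B* molecules brought to the target gene


def new_cre_first() -> Effect:
    """Fig. 1B: a new B site on one allele recruits a single B molecule."""
    return Effect(alleles_affected=1, b_molecules_recruited=1)


def new_ppi_first(a_bound_per_allele: int, both_alleles_bind_a: bool = True) -> Effect:
    """Fig. 1C: B* acts in trans and piggybacks on every bound A molecule.

    Assumption: one B* per bound A molecule.
    """
    if a_bound_per_allele < 1:
        raise ValueError("A must be bound at the target for B* to be recruited")
    alleles = 2 if both_alleles_bind_a else 1
    return Effect(alleles_affected=alleles,
                  b_molecules_recruited=alleles * a_bound_per_allele)


if __name__ == "__main__":
    print(new_cre_first())
    print(new_ppi_first(a_bound_per_allele=3))
```

Because B* is a diffusible protein, its effect on the target does not depend on which copy of the target gene carries a mutation. This is why the text expects the PPI-first mutation to behave as dominant.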

One point worth noting about these scenarios is that neither requires the binding specificity of the transcription factor to change, nor is any other generic function of the transcription factor affected, such as transcriptional activity or nuclear localization. Rather, these scenarios show that transcriptional regulation can be affected as much by PPIs among transcription factors as by the presence or absence of transcription factor binding sites.

Reconsidering the Role of Transcription Factors in Developmental Evolution

The models discussed above suggest a potential for regulatory rewiring through the gain and loss of PPIs between transcriptional regulators, and they may explain patterns of evolution that are hard to accommodate under cis-regulatory models. Indeed, numerous empirical studies have identified transcription factors (and other proteins) that have diverged functionally between species or among ancient paralogs in ways that depend on interactions between proteins (Morris et al. 1997; Hanks et al. 1998; Park et al. 1998; Ranganayakulu et al. 1998; Hyman et al. 2003; Chen et al. 2004; Punzo et al. 2004; Stolt et al. 2004; Hanzawa et al. 2005; Siegal and Baker 2005; Kellerer et al. 2006). In the remainder of this article we briefly review some of the best-documented examples of functional divergence between transcription factors, as well as some of the features of protein structure, function, and evolution that reduce pleiotropy and enhance the evolvability of transcription factors. These mechanisms, including sub- and neofunctionalization after gene duplication, tissue-specific expression, alternative splicing, domain shuffling, and the gain and loss of protein–protein interaction motifs (Fig. 2), facilitate the kinds of divergence in transcription factor function that generate novel gene regulatory links.

Figure 2.

Diverse mechanisms reduce pleiotropy and generate functional diversity in proteins. The general structure of a gene includes cis-regulatory elements that bind trans-acting proteins (transcription factors), exons (black boxes), and 5′- and 3′-UTRs (gray boxes). Exons may be spliced in different ways (shown by light gray lines). (A) Mechanisms that reduce pleiotropy. (B) Mutations that generate functional diversity in proteins. (C) Mutations that generate expression diversity.


The finding that transcription factors are functionally divergent between species and between paralogs demonstrates that transcription factors can evolve new functions, but the primary argument against a substantial role for transcription factor changes in developmental evolution is the perception that mutations in them are strongly pleiotropic. An important, but thus far largely unrecognized, mechanism that can limit the negative pleiotropic effects of mutations in transcription factors is tissue-specific gene expression. The expression of eukaryotic genes is controlled by arrays of common core promoter elements and diverse gene- and tissue-specific enhancer elements. These elements cooperate to generate specific gene expression patterns by binding appropriate transcription factors and recruiting the transcriptional machinery (reviewed in Lemon and Tjian 2000). The essential molecular steps that lead to gene expression are DNA binding through the DNA-binding domain(s) of transcription factors and the recruitment of other DNA-binding transcription factors, cofactors, and the basal transcriptional machinery through PPIs. The assembly of the enhancer complex can occur before, after, or simultaneously with DNA binding (Lemon and Tjian 2000).

Although the transcription factor–DNA interaction is a principal element of gene regulation, it is not the sole determinant of gene regulation, because DNA binding by a protein has, by itself, little effect on gene expression. It is the recruitment of other transcription factors, coactivators, corepressors, and finally the transcriptional machinery, mediated via PPIs, that ultimately leads to gene expression. DNA binding is an essential part of this process, but it is no more or less important than the PPIs that assemble the enhancer and polymerase complex. For example, repression of Distalless (Dll) in the abdomen of Drosophila embryos requires at least six different transcription factors (Ubx, AbdA, En, Hth, Exd, and Slp) to bind each other and their DNA-binding sites (Fig. 3) (Gebelein et al. 2004). Thus, gene regulation depends on two essential biomolecular interactions, DNA–protein and protein–protein. Although it is becoming clear that changes in both interactions contribute to developmental evolution, the question remains whether mechanisms analogous to the combinatorial logic of enhancers operate at the level of transcription factor function, decreasing the negative effects of pleiotropy and increasing evolvability. Considering the mechanisms of gene regulation by transcription factors in the light of modularity indicates that the answer to this question is yes.
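The all-or-none dependence of Dll repression on several interacting factors can be sketched as a Boolean check. The complex compositions below are a simplification: each compartment is assumed to need one Hox protein (Ubx or AbdA) together with Exd and Hth, plus Slp in the anterior compartment or En in the posterior compartment, following our reading of Gebelein et al. (2004). The `cannot_join` parameter is a hypothetical device for a factor that is expressed but has lost a required protein–protein interaction.

```python
HOX_PROTEINS = {"Ubx", "AbdA"}

# Simplified compartment-specific partners of the Hox protein (assumption, see lead-in).
PARTNERS = {
    "anterior": {"Exd", "Hth", "Slp"},
    "posterior": {"Exd", "Hth", "En"},
}


def dll_repressed(compartment: str, expressed: set[str],
                  cannot_join: frozenset[str] = frozenset()) -> bool:
    """True only if a Hox protein and every required partner are present AND able to join.

    `cannot_join` lists factors that are expressed but, in this toy model, have lost
    the protein-protein interaction needed to enter the repressor complex.
    """
    available = set(expressed) - set(cannot_join)
    has_hox = bool(HOX_PROTEINS & available)
    return has_hox and PARTNERS[compartment] <= available


if __name__ == "__main__":
    posterior = {"AbdA", "Exd", "Hth", "En"}
    print(dll_repressed("posterior", posterior))
    print(dll_repressed("posterior", posterior, cannot_join=frozenset({"Exd"})))
```

In this toy logic, losing a single interaction abolishes repression just as surely as losing a binding site or a factor's expression. That is the sense in which DNA binding and PPIs are equally essential.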

Figure 3.

Tissue-specific gene expression contributes to combinatorial regulation of Distalless in developing Drosophila abdominal segments. In the center is a summary of the expression patterns of Ubx, AbdA, Exd, Hth, Slp, and En in an abdominal segment. In both the anterior and posterior compartments of the segment, a Hox/Exd/Hth/Hox tetramer binds to the Distalless regulatory element (DMX-R); the specific members of the repressor complex, however, vary by compartment. The repressor complexes in the anterior and posterior compartments are shown to the left and right, respectively, of the expression pattern summary. Repression of Distalless requires at least six different transcription factors to bind each other and their DNA-binding sites (Gebelein et al. 2004).

Most genes are regulated by “mixing and matching” different transcription factors with additional transcription factors, coactivators, and corepressors in a combinatorial fashion (Lemon and Tjian 2000; Yu et al. 2006). A common assumption is that most transcription factors are broadly expressed; remarkably, recent data indicate that many are tissue specific. These findings raise a broader question: to what extent are the regulatory functions of transcription factors tissue specific? Although few studies have directly addressed this question, Yu et al. (2006) found ∼290 genes specific to each major tissue type in humans and over 7200 tissue-specific genes in the human genome. Thus, tissue-specific genes account for about one-third (7200/22,000) of all human protein-coding genes. Furthermore, most tissue-specific genes (85%) were specific to only a single tissue. Of the tissue-specific genes identified by Yu and colleagues (2006), about 8% (605/7261) encoded transcription factors. Given the recent estimate of 1962 transcription factors in the human genome, these data indicate that ∼30% (605/1962) of human transcription factors are tissue specific, even though transcription factors represent only ∼9% (1962/22,000) of the protein-coding genes in the genome. These data imply that a large number of transcription factors function in directing the appropriate tissue-specific expression of target genes. This wealth of tissue-specific transcription factors provides a large number of combinatorial possibilities on which selection can act to generate novel TF–TF interactions while keeping the negative pleiotropic effects of these new interactions to a minimum.
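The percentages in the preceding paragraph are simple ratios of the counts reported there. As a quick arithmetic check, the Python sketch below recomputes them; every count comes from the text, ∼22,000 is used as the approximate number of human protein-coding genes, and the helper name `percent` is our own.

```python
# Recompute the tissue-specificity ratios quoted in the text.
# All counts are taken from the paragraph above; 22,000 is the
# approximate number of human protein-coding genes used there.
TOTAL_GENES = 22_000
TISSUE_SPECIFIC_GENES = 7_261   # tissue-specific genes (Yu et al. 2006)
TISSUE_SPECIFIC_TFS = 605       # tissue-specific genes that encode TFs
TOTAL_TFS = 1_962               # estimated human transcription factors


def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


ratios = {
    "tissue-specific genes / all genes": percent(TISSUE_SPECIFIC_GENES, TOTAL_GENES),
    "TFs among tissue-specific genes": percent(TISSUE_SPECIFIC_TFS, TISSUE_SPECIFIC_GENES),
    "tissue-specific TFs / all TFs": percent(TISSUE_SPECIFIC_TFS, TOTAL_TFS),
    "TFs / all genes": percent(TOTAL_TFS, TOTAL_GENES),
}

if __name__ == "__main__":
    for label, value in ratios.items():
        print(f"{label}: {value}%")
```

Using 7261 rather than the rounded 7200, the four values come out at 33.0%, 8.3%, 30.8%, and 8.9%, in line with the text's approximations of one-third, about 8%, ∼30%, and ∼9%.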

Even transcription factors whose expression is not tissue specific can contribute to tissue-specific gene regulation through interactions with distinct sets of transcription factors in different tissues and tissue compartments. For example, although Exd and Hth are expressed in both anterior and posterior regions of abdominal segments in Drosophila (Fig. 3), they can have regionally specific effects on Dll expression by interacting with cofactors that have more restricted expression domains, such as Ubx, Slp, and En (Fig. 3). Similarly, by interacting with distinct partners in different tissues, Ubx can act in tissue-specific gene regulation in the abdominal and thoracic segments even though it is not itself specific to either of these tissues (Mann and Morata 2000). Thus, a mutation in Ubx that generated a novel protein–protein interaction site for an abdominal cofactor could create a novel regulatory link specific to the abdomen, without functional effects in other tissues such as the thorax and haltere. Based on this kind of regulatory logic, Yu et al. (2006) proposed that the function of a transcription factor is best defined by its interactions with other transcriptional regulators (i.e., “the company it keeps”).
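The “company it keeps” logic can be made concrete with a toy model in which a regulatory complex acts in a region only if all of its members are expressed there. The sketch below is purely illustrative: the expression sets are a hypothetical simplification loosely based on Fig. 3 (Slp assigned to the anterior and En to the posterior compartment, Ubx standing in for either abdominal Hox protein), and `CofX` is an invented abdomen-only cofactor used to model the hypothetical Ubx mutation described above.

```python
# Toy model of combinatorial, cofactor-dependent regulation.
# Expression sets are hypothetical simplifications loosely based on Fig. 3.
# "CofX" is an invented abdomen-only cofactor for the thought experiment.
EXPRESSED: dict[str, set[str]] = {
    "abdomen_anterior": {"Ubx", "AbdA", "Exd", "Hth", "Slp", "CofX"},
    "abdomen_posterior": {"Ubx", "AbdA", "Exd", "Hth", "En", "CofX"},
    "other_ubx_tissue": {"Ubx", "Exd", "Hth"},  # Ubx present, CofX absent
}

# Each complex is a set of proteins that must all be present to act.
# Ubx stands in for either abdominal Hox protein (Ubx or AbdA).
DLL_REPRESSORS = [
    frozenset({"Ubx", "Exd", "Hth", "Slp"}),  # anterior-type complex
    frozenset({"Ubx", "Exd", "Hth", "En"}),   # posterior-type complex
]

# Hypothetical new link created by a Ubx mutation that now binds CofX.
NEW_UBX_LINK = [frozenset({"Ubx", "CofX"})]


def active_in(region: str, complexes: list[frozenset[str]]) -> bool:
    """True if at least one complex has all of its members expressed in region."""
    present = EXPRESSED[region]
    return any(required <= present for required in complexes)


def regions_affected(complexes: list[frozenset[str]]) -> list[str]:
    """List the regions in which any of the given complexes can assemble."""
    return [region for region in EXPRESSED if active_in(region, complexes)]


if __name__ == "__main__":
    print("Dll repression:", regions_affected(DLL_REPRESSORS))
    print("New Ubx link:", regions_affected(NEW_UBX_LINK))
```

In this sketch the new Ubx–CofX link is active only in the two abdominal compartments, while a region that expresses Ubx but not CofX is unaffected, which is the pleiotropy-limiting effect described in the text.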

Just as tissue specificity can minimize the pleiotropic effects of mutations that create novel regulatory links via transcription factor evolution, it also limits the effects of loss-of-function mutations that break regulatory links. In plants, the evolution of flower color through changes in the regulation of the anthocyanin pathway has been studied extensively and serves as a model system for the evolution of gene regulation. This pathway is well conserved across angiosperms, and in several well-studied species, such as maize (Zea), petunia (Petunia), snapdragon (Antirrhinum), and morning glory (Ipomoea), there are several to many paralogs of the transcription factors that regulate the pathway (Rausher et al. 1999). Each paralog tends to be expressed in one or a few tissues, the paralogs diverge functionally after duplication (Yoshida et al. 2008), and inactivating any one of them generally produces only localized phenotypic effects. For example, in Petunia axillaris, white flowers have evolved from a blue-flowered ancestor via the loss of function of a myb transcription factor (Quattrocchio et al. 1999). This loss-of-function mutation generated an ecologically important novel phenotype, indicating that its pleiotropic effects were trivial or could be compensated for without long-term deleterious effects (Quattrocchio et al. 1999). Similarly, an inactivating mutation in a myb transcription factor (Ipmyb1) causes the white allele of the widespread blue–white flower-color polymorphism in the common morning glory, Ipomoea purpurea (Chang et al. 2005). The loss of Ipmyb1 function has very tissue-specific effects: the stem, vegetative tissues, and the nectar guides of the flowers are unaffected and accumulate anthocyanins; only the petals fail to accumulate pigment (Schoen et al. 1984).
Remarkably, numerous field experiments have failed to detect any deleterious effects of this mutation, indicating that its pleiotropic effects are extremely small or nonexistent (Rausher et al. 1993; Rausher and Fry 1993; Fry and Rausher 1997; Mojonnier and Rausher 1997).

The examples cited above demonstrate the power of tissue specificity to reduce the negative pleiotropic effects of mutations. Mixing and matching tissue-specific and general transcriptional regulators buffers against the spread of a mutation's deleterious effects, such as those of a new cofactor association, beyond the tissue in which it is beneficial. For example, a mutation that generated a novel protein–protein interaction site in Ubx may affect the expression of genes in tissues that express the cofactor, but it need not affect the expression of all Ubx target genes in every tissue in which Ubx is expressed. Similarly, the effects of a complete loss of function of myb genes in petunia and morning glory were limited to a single tissue (petals) and thus did not deleteriously affect plant survival.


Gene duplication is perhaps one of the most powerful mechanisms for evolving transcription factors with novel functional activities (Ohno 1970; Hoekstra and Coyne 2007). Indeed, Hoekstra and Coyne (2007) have recently suggested that duplication may be especially important for transcription factor divergence because, immediately after the duplication event, there is a period of functional redundancy that can reduce the negative consequences of pleiotropy. Neofunctionalization after gene duplication has been documented in many protein families, including transcription factor families such as the MADS box genes in plants (Hernandez-Hernandez et al. 2007) and the Hox3/zen/bcd genes in arthropods (Falciani et al. 1996; Stauber et al. 2002; Panfilio and Akam 2007), providing compelling evidence for its role in the diversification of transcription factor function.

Members of the type II MADS box family control many important aspects of plant development (reviewed in Becker and Theißen 2003), including floral organ identity. The ABC model of floral organ identity proposes that organ identity is determined by the partially overlapping expression of three gene classes, which produces a distinct combinatorial code: class A genes alone specify sepals in the first whorl; class A + B genes, petals in the second whorl; class B + C genes, stamens in the third whorl; and class C genes alone, carpels in the fourth whorl (Coen and Meyerowitz 1991). All but one of the ABC class genes are type II MADS box genes, which are also known as MIKC MADS box genes because of their conserved domain structure. Starting at the N-terminal end of the protein, the M or MADS domain is highly conserved across eukaryotes and mediates DNA binding and protein dimerization (Yanofsky et al. 1990; Riechmann and Meyerowitz 1997). The next two regions, known as the I and K domains, are primarily involved in mediating protein dimerization (Riechmann and Meyerowitz 1997), whereas the last, the C domain, has a number of different functions: it mediates higher-order interactions among MADS protein dimers (Egea-Cortines et al. 1999; Yang et al. 2003; Yang and Jack 2004), contributes to transcriptional activation (Cho et al. 1999; Honma and Goto 2001), and is a site of posttranslational modification (Yalovsky et al. 2000). Interestingly, although the C-terminal domain has a lower degree of overall sequence conservation than the other regions, each of the major MIKC subfamilies possesses a short, highly conserved diagnostic motif at its C-terminal end (reviewed in Vandenbussche et al. 2003).
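The combinatorial code of the ABC model can be written as a small lookup table from active gene classes to organ identity. The Python sketch below is a simplification under stated assumptions: it encodes only the four combinations listed above and ignores the mutual antagonism between A and C function, so it can only sensibly simulate loss of the B class (the class to which AP3 and PI belong). The function names are our own.

```python
# Minimal lookup for the ABC model of floral organ identity
# (Coen and Meyerowitz 1991), encoding only the four combinations in the text.
# Simplification: A-C antagonism is not modeled.
ABC_CODE: dict[frozenset[str], str] = {
    frozenset({"A"}): "sepal",
    frozenset({"A", "B"}): "petal",
    frozenset({"B", "C"}): "stamen",
    frozenset({"C"}): "carpel",
}

# Gene classes active in whorls 1-4 of a wild-type flower.
WILD_TYPE_WHORLS = [{"A"}, {"A", "B"}, {"B", "C"}, {"C"}]


def organ_identity(active_classes: set[str]) -> str:
    """Return the organ specified by a combination of ABC gene classes."""
    return ABC_CODE.get(frozenset(active_classes), "undefined in this sketch")


def flower(whorls: list[set[str]]) -> list[str]:
    """Map each whorl's active gene classes to an organ identity."""
    return [organ_identity(classes) for classes in whorls]


def remove_class(whorls: list[set[str]], lost: str) -> list[set[str]]:
    """Simulate loss of one gene class by deleting it from every whorl."""
    return [classes - {lost} for classes in whorls]


if __name__ == "__main__":
    print(flower(WILD_TYPE_WHORLS))
    print(flower(remove_class(WILD_TYPE_WHORLS, "B")))
```

Removing class B converts the four whorls to sepal, sepal, carpel, carpel, the familiar B-class mutant phenotype in which petals and stamens are replaced.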

The two major lineages of B-class genes in angiosperms, APETALA3 (AP3) and PISTILLATA (PI), arose by gene duplication shortly after the gymnosperm–angiosperm divergence; AP3 then duplicated again in the stem lineage of the core eudicots, generating TM6 and euAP3 (Hernandez-Hernandez et al. 2007). Molecular evolutionary analysis of AP3 and PI has identified episodes of positive selection immediately after the AP3/PI duplication and an additional episode after the duplication of AP3 in the core eudicots that gave rise to TM6 and euAP3 (Hernandez-Hernandez et al. 2007). Although TM6 continued to evolve under functional constraint after duplication, strong positive selection was identified in the K-domain and the C-terminal euAP3-motif of euAP3, suggesting that these regions evolved novel functional specificities after duplication (Fig. 4). Furthermore, site-directed mutagenesis studies indicate that positively selected sites in the K-domain of AP3 and PI are crucial for establishing PPIs among MADS-domain proteins (Fan et al. 1997; Lamb and Irish 2003; Yang et al. 2003; Yang and Jack 2004).

Figure 4.

Evolution of PI/AP3 transcription factor genes in flowering plants by duplication and functional divergence. The PISTILLATA (PI) and APETALA3 (AP3) genes of eudicots arose by a gene duplication early in the evolution of flowering plants, likely from a gymnosperm-like B-class gene (GYMNO B-Genes), followed by the acquisition of the PI-motif. The paralog AP3 itself acquired a novel AP3-specific motif (paleoAP3) and further diverged in the PI-motif. In the core eudicots, AP3 duplicated again, giving rise to TM6, which resembles the ancestral AP3 in function and structure, and euAP3, which has diverged significantly from the ancestral AP3 and from TM6 in function and structure by altering the more ancestral paleoAP3-motif into a new motif (euAP3). M, MADS domain; I, intervening domain; K, K-domain; C, C-terminal domain.

Functional divergence between AP3 and PI has also been tested using domain deletion and swap experiments. To test whether the lineage-specific motifs in the C-termini of AP3 and PI were critical for their function, Lamb and Irish (2003) used truncation mutants missing the C-terminal motifs and assayed their ability to induce an ectopic phenotype in wild-type plants and to rescue AP3 and PI null plants; neither AP3 nor PI was able to function without its C-terminal domain, indicating that the C-terminal domains are necessary for function. However, swapping the C-terminal domains showed that they are not functionally equivalent; furthermore, although regions N-terminal to the C-terminal domain are required for PI function, AP3-specific function is conferred by its C-terminal domain (the AP3-motif). Finally, Lamb and Irish (2003) tested whether the euAP3-motif had acquired novel functions distinct from those of PI and of the euAP3 paralog TM6 by generating chimeric AP3 proteins carrying the C-terminal domains of PI and paleoAP3. As with the C-terminal domains of PI and AP3, the euAP3 motif is essential for proper protein function and is not functionally equivalent to the paleoAP3 motif, consistent with the identification of positively selected sites in the C-terminal domain of euAP3 (Fig. 4). The remarkable congruence between the molecular evolutionary identification of sites fixed by positive selection and functional analyses identifying these sites as crucial for PPIs suggests that selection acted to modulate cofactor associations and thereby effect changes in floral morphology.

Like PI and AP3 in plants, the insect Hox genes Zerknüllt (zen) and Bicoid (bcd) arose by gene duplication followed by spectacular divergences in function (Fig. 5). Although there is evidence that cis-regulatory changes were associated with the divergence in Hox3/zen/bcd function, there is also evidence that changes in the proteins themselves played a major role in the functional transitions from Hox3 to zen and after the zen/bcd duplication (Fig. 5) (Falciani et al. 1996; Stauber et al. 2002; Hughes et al. 2004; Panfilio and Akam 2007; Papillon and Telford 2007). The divergence of Hox3 and zen functions in early insects was mediated by the evolution of a novel motif outside of the homeodomain (the zen motif), loss of the ancestral hexapeptide motif (FPWM), and migration of the homeodomain to the amino terminus (Falciani et al. 1996; Stauber et al. 2002; Panfilio and Akam 2007). Remarkably, the Hox3/zen gene from Thermobia, a basal insect lineage, appears to be intermediate in structure and tissue expression between the more ancestral Hox3 and the derived zen: its homeodomain has moved toward the amino terminus, and the gene is expressed both in a Hox3-like pattern and, like zen genes, in extra-embryonic tissue (Fig. 5) (Hughes et al. 2004). Following the duplication of zen in the stem lineage of the cyclorrhaphan Diptera, bcd evolved extremely rapidly, including a change in the conserved homeodomain that generated a novel RNA-binding function, loss of the zen motif, and acquisition of a new function in polarizing the anterior–posterior axis of the embryo (Fig. 5) (Stauber et al. 2002). The key events that likely led to the functional divergence of Hox3 and zen were the loss of the hexapeptide motif and the evolution of extra-embryonic expression, whereas for zen and bcd the key events were likely subfunctionalization of expression domains and neofunctionalization of the protein.

Figure 5.

Functional divergence in Hox3/zen/bcd genes via loss of protein–protein interaction motifs and gene duplication. The insect Hox gene Zerknüllt (zen) is derived from an ancestral Hox3 gene, and Bicoid (bcd) arose by gene duplication followed by functional divergence. Although there is evidence that cis-regulatory changes were associated with the divergence in Hox3/zen/bcd function (Expression Pattern), changes in the proteins themselves played a major role in the functional transitions from Hox3 to zen and after the zen/bcd duplication (Gene, Protein Structure). The divergence of Hox3 and zen functions in early insects was mediated by the evolution of a novel motif outside of the homeodomain (the putative zen motif), which occurred coincident with the loss of the ancestral hexapeptide motif (FPWM) and migration of the homeodomain to the amino terminus. The Hox3/zen gene from Thermobia, a basal insect lineage, has both Hox-like and extra-embryonic expression and retains the hexapeptide motif, suggesting it represents an early stage of the Hox3-to-zen transition. Following the duplication of zen in the stem lineage of the cyclorrhaphan Diptera, bcd evolved extremely rapidly, including a change in the conserved homeodomain that generated a novel RNA-binding function, loss of the zen motif, and acquisition of a new function in polarizing the anterior–posterior axis of the embryo. (Falciani et al. 1996; Stauber et al. 2002; Hughes et al. 2004; Panfilio and Akam 2007; Papillon and Telford 2007)

The principle emerging from the PI/AP3 and Hox3/zen/bcd examples is that gene duplication circumvents functional constraints on transcription factors by generating a period of redundancy that allows one copy to acquire a novel function while the other preserves the ancestral function. After gene duplication, transcription factors can diverge functionally in the domains that mediate PPIs and/or in their DNA-binding domains, altering their cofactor associations and binding-site preferences, respectively. In addition, duplicates can diverge in function through changes in splice patterns or domain shuffling, suggesting that duplication is an important route out of negative pleiotropy (Hoekstra and Coyne 2007) and plays an important role in the origin of novel gene regulatory links.


Although tissue-specific gene expression generates modules of gene expression (gene expression domains) that effectively limit the pleiotropic effects of mutations to one or a few tissues, it is not the only mechanism that can reduce pleiotropy in protein-coding genes. Alternative splicing is increasingly recognized as a widespread mechanism that enables multiple structurally and functionally distinct proteins, which may regulate gene expression differently, to be generated from a single gene. Thus, the transcript of a single gene can be spliced to produce different proteins in different tissues and developmental stages. Tissue-specific alternative splicing is a potentially powerful mechanism for increasing protein diversity and escaping the negative consequences of pleiotropy. For example, if an amino acid substitution in an exon included only in a tissue-specific transcript generates a novel function or a new downstream target gene, that effect will be specific to the tissue or developmental stage in which that transcript is expressed, effectively eliminating pleiotropic effects in other tissues. As noted by López (1995), alternative splicing in transcription factors is expected either to alter DNA-binding domains, affecting binding-site affinity or specificity, or to alter the pattern of cofactor interaction sites. Differences in alternative splicing patterns between species could therefore be functionally important with respect to gene regulatory networks.

An expanding body of evidence indicates that alternative splicing is common. For example, one study found that nearly 50% of mouse genes are alternatively spliced, and that 18% and 14% use alternative start and stop sites, respectively (Sharov et al. 2005). Another study found that at least 41% and possibly as many as 60% of mouse genes have multiple splice forms; over 70% of all alternatively spliced exons were part of the protein-coding region, indicating that alternative splicing likely alters function (Zavolan et al. 2007). Analysis of the human transcriptome found that at least 50% and potentially as many as 80% of human genes are alternatively spliced (Xu et al. 2002; Johnson et al. 2003).

Although alternative splicing produces proteins with different structural architectures, it does not necessarily lead to different functional specificities. Several studies, however, have found that alternative splicing does alter protein function, from minor functional tweaking to dramatic functional switching (reviewed in López 1995). The three isoforms of the Drosophila transcription factor CF2, for example, contain three, six, or seven Cys2-His2 zinc finger DNA-binding domains (Hsu et al. 1992). Although it is unclear whether the three-finger isoform (CF2-III) binds DNA, the six-finger (CF2-II) and seven-finger (CF2-I) isoforms have different binding-site preferences (Gogos et al. 1992; Hsu et al. 1992). Alternative splicing of CF2 is also differentially regulated: form II is the most common form during most of development in females, and form I is the most common in male tissues other than the testis, where form III is the most abundant. A particularly striking example of alternative splicing producing isoforms with opposite functions is the human transcription factor AML1, which can function as either an activator or a repressor depending on isoform structure (Tanaka et al. 1995). A recent genome-wide study by Taneri et al. (2004) found that alternative splicing of mouse transcription factors primarily affected the architecture of their DNA-binding domains, presumably altering their DNA-binding specificities. Interestingly, usually only a single splice variant is present within a tissue, although the variants differ between tissues; Taneri et al. suggest that transcription factors might regulate tissue-specific gene expression by having different alternatively spliced isoforms in different tissues.

What are the evolutionary consequences of this abundance of alternative transcripts? One of the main benefits of alternative splicing may be to localize the effects of function-altering amino acid substitutions to specific tissues and developmental stages, thus preventing the kind of global pleiotropy that can hinder the evolution of broadly expressed proteins. This suggests that alternatively spliced exons should evolve more rapidly than constitutive exons, because they are under fewer functional constraints imposed by negative pleiotropy. Consistent with this prediction, species-specific splice variants evolve significantly more rapidly than constitutive exons (Cusack and Wolfe 2005; Xing and Lee 2005). Rapid evolution of tissue-specific splice variants is particularly interesting because it directly contradicts the most prominent argument against a contribution of transcription factor protein evolution to phenotypic change, namely, that a mutation in a broadly expressed transcription factor that is beneficial in one tissue will be deleterious in others and selected against, so that such factors are under strong functional constraint and evolve slowly. These data indicate that tissue-specific alternative splicing directly reduces the negative consequences of pleiotropy, effectively relaxing functional constraints.

In addition to localizing function-altering amino acid substitutions to specific tissues and developmental stages, the gain and loss of alternatively spliced exons between species can dramatically alter protein function in specific tissues and developmental periods, potentially providing a large source of genomic variation for phenotypic change. Comparisons of human and mouse orthologous gene pairs indicate that only 10% of cassette-type alternative splicing events have been conserved over the ∼80 My since these lineages diverged; the remaining 90% of alternatively spliced transcripts have either species-specific novel exons or species-specific splice variants (Nurtdinov et al. 2003; Chen et al. 2006; Sorek et al. 2006). Several studies have demonstrated that species-specific exons are common in rodents (Nekrutenko 2004) and humans (Zhang and Chasin 2006) and originate at particularly high rates, about 2.71 × 10⁻³ per gene per million years in rodents (Wang et al. 2005). The rate of new exon formation is dramatically higher than both the nucleotide substitution rate and the gene duplication rate (Lynch and Conery 2003), suggesting that novel exon formation may play an important, though underappreciated, role in generating phenotypic diversity.

The functional consequences of alternative splicing and novel exon acquisition have recently been investigated in mammalian chromodomain Y-like (CDYL) genes (Li et al. 2007). CDYL genes function as transcriptional regulators and play an important role in mammalian spermatogenesis (Lahn et al. 2002; Caron et al. 2003). Comparative evolutionary analysis has found that, in the stem lineage of placental mammals, the CDYL gene evolved a new promoter upstream of the ancestral promoter and two new protein-coding exons. Comparison of nucleotide substitution rates between novel and ancestral exons indicates that the new exons evolved approximately 3× faster than the ancestral exons, driven by positive selection. Use of the derived promoter generates a protein that is 62 amino acids longer (CDYLa) and has significantly weaker repression activity than the isoform produced from the ancestral promoter (CDYLb) (Fig. 6). Consistent with the above proposal that tissue-specific alternative splicing promotes functional differentiation between isoforms, CDYLb is the major isoform in most somatic cells, whereas CDYLa has a more restricted expression (Lahn and Page 1999; Li et al. 2007).

Figure 6.

Functional differences between isoforms of the human CDYL gene. CDYL genes function as transcriptional regulators and play an important role in mammalian spermatogenesis (Lahn et al. 2002; Caron et al. 2003). Use of alternative promoters generates two distinct isoforms: CDYLa (745 aa) and CDYLb (683 aa), which is 62 aa shorter. The longer CDYLa isoform has significantly weaker repression activity than the shorter CDYLb isoform (84% versus 93% repression in a GAL4 expression assay) and a more restricted expression than the broadly expressed CDYLb (Lahn and Page 1999; Li et al. 2007).

Evolvable Toolkit Genes

The most popular conceptualization of developmental evolution is the “genetic toolkit.” Genes in the toolkit, primarily transcription factors and signaling molecules, are master regulators that govern the formation and patterning of bodies and body parts (Carroll 2005a). The finding that toolkit genes are shared among diverse organisms gave rise to an apparent paradox: how did diverse body plans arise if the set of genes used to build them is the same? The solution to the paradox appears to be that developmental evolution results from changes in where and when toolkit genes are deployed. According to the cis-regulatory view of developmental evolution, these changes in the deployment of toolkit genes result from changes in their regulatory sequences. Thus, where and when tools are used varies, but the tool remains functionally the same: a hammer is a hammer. However, a growing body of data reveals that toolkit genes have evolved novel functions, suggesting that although the developmental toolkit is common to diverse organisms, it evolves in response to each organism's functional needs. A hammer is a hammer, but it may be a mallet, a sledgehammer, or a ball-peen hammer, or it may be able to pull nails (Fig. 7). According to this view, changes in gene regulation result from changes in both the regulation of toolkit genes and their functional specificities. In the remainder of this section, we highlight some examples of how novel functions emerge in toolkit genes and how the mechanisms for reducing pleiotropy discussed above have facilitated functional divergence.

Figure 7.

Evolving toolkits. In the most prevalent conceptualization of “toolkit” genes, similar sets of transcription factors and signaling molecules are shared by diverse organisms, and development evolves by changing when and where these genes are expressed through changes in their regulatory elements. In an alternative view, similar sets of transcription factors and signaling molecules are shared by diverse organisms, and development evolves not only through changes in when and where these genes are expressed (regulatory element evolution) but also through the evolution of novel functions (functional tweaking).


Above the level of secondary structure, proteins are organized into discrete structural and functional domains, which are generally defined as self-stabilizing and independently folding regions of a protein chain (Darby and Creighton 1995; Voet et al. 1999). Because they fold independently, domains are structurally and functionally independent modules. Structural domains are generally more than 200 amino acids in length, although some, such as the MADS domain and the homeodomain, are much smaller, and they are often composed of smaller structural motifs. Domains have discrete activities such as catalyzing biochemical reactions and mediating binding to other proteins, peptides, ligands, and nucleic acids (Voet et al. 1999). Domain structures are frequently well conserved across diverse organisms, often despite dramatic divergence in primary (amino acid) sequence. Because domains are structurally and functionally semiautonomous units, they are commonly swapped between proteins during evolution. A fascinating example of domain swapping has recently been reported by Adamska and colleagues (2007), who showed that the hedgehog (Hh) protein evolved in metazoans by exon shuffling that combined a hedge-domain-containing protein, ancestrally involved in intercellular communication, and a hog/intein-domain-containing protein into a single gene (Fig. 8). They infer that the autocatalytic activity of the hog/intein-domain allowed the release of the hedge ligand, enabling long-range cell–cell signaling that was later coopted for complex morphogenetic patterning (Adamska et al. 2007).

Figure 8.

The Metazoan Toolkit Gene Hedgehog (Hh) evolved by domain shuffling. The intein-related hog domain (black) predates the origin of the Metazoa, is autocatalytic, and is associated with many proteins (white). The hedge-domain (red) arose in the ancestor of sponges and eumetazoans and was originally part of Hedgling (red, pink, and blue). Early in metazoan evolution, domain shuffling resulted in the emergence of the conventional Hh, composed of the hedge-domain and the more ancient hog/intein-domain. The autocatalytic activity of the hog/intein-domain (black arrow) in Hh could have allowed the release of the hedge ligand and hence longer-range signaling that could be used to control more complex morphogenetic patterning. Reprinted from Current Biology, 17(19), Adamska et al., The evolutionary origin of hedgehog proteins. Copyright (2007), with permission from Elsevier.


Large domains are burdened by severe structural constraints imposed by the need to fold into a stable structure and function correctly (Darby and Creighton 1995). Although many PPIs are mediated by contacts between secondary structural motifs and domains, a growing number of interactions are being identified that are mediated by short linear motifs, or SLiMs (Neduva and Russell 2005). The key features of linear motifs are their small size, usually 3–10 residues with only two or three residues critical for the interaction, and their low binding energies, which lead to weak interactions; they therefore tend to be the primary mediators of transient interactions such as ligand docking and the assembly of enhanceosomes and the basal transcription apparatus (Neduva and Russell 2005). In addition, SLiMs most often occur in poorly structured regions of proteins, with more than 85% of known motifs located in disordered regions, indicating that they are relatively free from structural constraints (Neduva and Russell 2005, 2006; Fuxreiter et al. 2007). This feature of SLiMs is particularly advantageous because it reduces the number of potentially structurally deleterious mutations in SLiMs, thus minimizing the intramolecular pleiotropic effects of amino acid substitutions.

SLiMs are particularly evolvable: their small size and lax sequence specificity mean that functional linear motifs appear and disappear more easily than domains and structural motifs (Neduva and Russell 2005). A single mutation is often enough to convert a nonfunctional stretch of amino acids into a functional SLiM, giving SLiMs a high degree of evolutionary plasticity (Neduva and Russell 2005). Correspondingly, SLiMs are poorly conserved compared to domains, even though the same kinds of motifs are used across diverse organisms. In a recent review, Neduva and Russell (2005) examined the conservation of experimentally determined linear motifs across eukaryotes and found that although domain architecture was well conserved, linear motifs were poorly conserved between lineages. Because of their small size, linear motifs are also likely to evolve convergently in unrelated proteins; for example, 1 in 20 proteins contains the SH3-binding motif RxPxxP (Neduva and Russell 2005). Thus, SLiMs are a large source of potential interactions that can be coopted into existing regulatory or interaction networks, leading to novel effects such as the acquisition of novel cofactors and target genes. These features of linear motifs, namely their small size, evolutionary plasticity, and rapid turnover, have led them to be considered “evolutionary interaction switches” (Neduva and Russell 2005).
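Because SLiMs are short, degenerate patterns, they are conveniently written as regular expressions, which also makes it easy to see how a single substitution can create one. The Python sketch below scans a sequence for the three motifs mentioned in this section (RxPxxP, LXXLL, and YPWM). The example sequence is an invented, hypothetical stretch of residues, not a real protein, and the helper names `find_slims` and `substitute` are our own.

```python
import re

# SLiM patterns written as regular expressions ("." = any residue).
SLIM_PATTERNS = {
    "SH3-binding (RxPxxP)": "R.P..P",
    "LXXLL": "L..LL",
    "YPWM": "YPWM",
}


def find_slims(sequence: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif, allowing overlaps."""
    hits = {}
    for name, pattern in SLIM_PATTERNS.items():
        # A zero-width lookahead lets re.finditer report overlapping matches.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", sequence)]
    return hits


def substitute(sequence: str, position: int, residue: str) -> str:
    """Return a copy of sequence with one residue replaced (0-based position)."""
    return sequence[:position] + residue + sequence[position + 1:]


if __name__ == "__main__":
    # Hypothetical disordered stretch; not a real protein sequence.
    before = "GSAKPAAPSEG"
    after = substitute(before, 2, "R")  # A -> R creates R-K-P-A-A-P
    print(find_slims(before)["SH3-binding (RxPxxP)"])  # []
    print(find_slims(after)["SH3-binding (RxPxxP)"])   # [2]
```

Under the simplifying assumption of uniform and independent residue frequencies, a given position matches RxPxxP with probability (1/20)³ = 1/8000, which illustrates why such short motifs can arise by chance in unrelated proteins.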

Perhaps the most dramatic example of a functional change in a transcription factor is the Drosophila Hox/HOM gene fushi tarazu (Ftz). Although Ftz from other insects functions as a homeotic gene, Drosophila Ftz has lost all homeotic functions and functions in segmentation instead (Löhr et al. 2001). To determine how Drosophila Ftz evolved from a homeotic gene into a novel segmentation gene, Löhr et al. (2001) ectopically expressed Ftz from different species in fruit flies to assess their potential to cause homeotic transformations and regulate segmentation. Although Ftz from Tribolium (a beetle) and Schistocerca (a grasshopper) possessed homeotic functions, for example repressing hth and causing transformations of antenna toward leg, the Drosophila gene had lost its homeotic function and had only segmentation potential.

Remarkably, although Drosophila Ftz functions solely in segmentation, Ftz from Schistocerca and Tribolium had extremely weak and moderate segmentation potential, respectively, suggesting that the switch from homeotic to segmentation function occurred in stages (Löhr et al. 2001). Segmentation function depends on the ability of Ftz to interact with Ftz-factor 1 (Ftz-F1) through the nuclear receptor-interaction SLiM (LXXLL), which is present in Drosophila and Tribolium Ftz but not in Schistocerca (Fig. 9). Conversely, loss of homeotic function in Drosophila Ftz depends on loss of the Extradenticle (Exd)-interaction SLiM YPWM upstream of the Ftz homeodomain; the YPWM motif is present in both Schistocerca and Tribolium (Fig. 9). Löhr et al. (2001) proposed a stepwise model to explain the evolution of the novel segmentation function of Drosophila Ftz. Ancestrally, all insect Ftz genes had homeotic functions dependent on the Exd-interaction motif YPWM. Sometime after the Drosophila-Tribolium lineage diverged from Schistocerca, the Ftz-F1 interaction motif evolved. Ftz in this intermediate stage had both a segmentation function, dependent on interaction with Ftz-F1, and homeotic functions, dependent on interaction with Exd. Subsequent loss of the Exd-interaction motif in the stem-lineage of Drosophila produced an Ftz with only segmentation functions. Thus, the evolution of a novel Ftz function depended on the gain and loss of short linear motifs that mediate PPIs (Fig. 9).
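The stepwise model is essentially a character-mapping argument: motif presence or absence is scored in each species and the changes are placed on the phylogeny. As a hedged illustration (not the analysis Löhr et al. performed), the sketch below applies Fitch parsimony, a standard method for counting the minimum number of state changes on a tree, to the motif states described above, assuming the topology ((Drosophila, Tribolium), Schistocerca). It recovers a single YPWM change with an ancestral state of "present", implying loss on the Drosophila branch. For LXXLL, the root state is ambiguous without an outgroup; the model's assumption that ancestral Ftz lacked LXXLL is what places its gain in the Drosophila-Tribolium stem.

```python
from typing import Mapping, Tuple, Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Assumed topology: Drosophila and Tribolium are sister taxa.
FTZ_TREE: Tree = (("Drosophila", "Tribolium"), "Schistocerca")

# Motif presence (1) or absence (0) in Ftz, as described in the text.
YPWM = {"Drosophila": 0, "Tribolium": 1, "Schistocerca": 1}
LXXLL = {"Drosophila": 1, "Tribolium": 1, "Schistocerca": 0}


def fitch(tree: Tree, states: Mapping[str, int]) -> Tuple[frozenset, int]:
    """Fitch bottom-up pass: (candidate states at this node, minimum changes)."""
    if isinstance(tree, str):                    # leaf: the observed state
        return frozenset({states[tree]}), 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:                                   # children can agree: no extra change
        return shared, left_cost + right_cost
    # Children disagree: one change is required somewhere below this node.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    print("YPWM:", fitch(FTZ_TREE, YPWM))    # root {1}, one change (loss in Drosophila)
    print("LXXLL:", fitch(FTZ_TREE, LXXLL))  # root {0, 1}, one change, direction ambiguous
```

Fitch's method only counts the minimum number of changes; with three taxa and no outgroup it cannot tell whether LXXLL was gained in the Drosophila-Tribolium lineage or lost in Schistocerca, which is why the ancestral-state assumption of the stepwise model matters.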

Figure 9.

Evolution of a novel segmentation function in Drosophila Ftz dependent on the gain and loss of short linear motifs (SLiMs). Insect Ftz was ancestrally a homeotic gene, dependent on interaction with Exd at the hexapeptide motif (YPWM). In the stem-lineage of beetles and flies (Endopterygota), a novel SLiM (LXXLL) originated in Ftz that mediated an interaction with Ftz-F1. During this intermediate stage Ftz had both homeotic function (dependent on the YPWM motif) and segmentation function (dependent on the LXXLL motif). In the Drosophila lineage, the YPWM motif was lost, resulting in loss of homeotic function in Ftz.

A particularly well-studied example of transcription factor functional divergence is the insect gene Ultrabithorax (Ubx). Ubx is a homeotic (HOM/Hox) transcription factor expressed in the third thoracic (T3) segment of insects and is necessary for proper development of T3 appendages such as hindwings in butterflies and beetles and halteres in fruit flies (Weatherbee et al. 1998, 1999; Tomoyasu et al. 2005). Averof and Akam (1995) have proposed that the insect body plan evolved from a crustacean-like plan in two phases: restriction of Ubx and AbdA expression to the proto-abdominal region, followed by acquisition of repressive activities in Ubx and AbdA that suppressed thoracic-type limbs in the abdomen. Although the first stage of this transition depends on alterations in Ubx and AbdA expression domains (Averof and Akam 1995), the second stage could involve changes in the coding regions of Ubx and AbdA or their cofactors, in the cis-regulatory regions of Ubx and AbdA target genes, or some combination of these changes.

Grenier and Carroll (2000) compared the activity of Ubx from the onychophoran velvet worm (Acanthokara kaputensis) and the fruit fly (Drosophila) using in vivo misexpression studies. Outside the highly conserved DNA-binding homeodomain, the two proteins share almost no sequence similarity, and much of the dissimilarity is due to indels. In spite of these differences, both proteins were able to transform antennae into legs and forewings into halteres, to repress srf in the wing disc, and to activate dpp in the visceral mesoderm. The velvet worm Ubx, however, was unable to produce all the phenotypes of fly Ubx misexpression. For example, the velvet worm protein did not transform thoracic cuticle into abdominal cuticle, nor did it repress Dll expression in the leg rudiments of larval Drosophila, both typical effects of Drosophila Ubx. Homeodomain swap experiments confirmed that these results were not due to differences in the homeodomains, indicating that the fly-specific activities of Ubx did not result from differences in DNA binding. Thus, functional divergence did not depend on cis-regulatory elements in Drosophila Ubx target genes and was likely due to differences in the ability of Drosophila and velvet worm Ubx to form the PPIs required for normal fly Ubx function. This interpretation was supported by functional mapping that identified an insect-specific QAQAQK(A)n motif (QA-motif) C-terminal to the homeodomain in Drosophila Ubx; this motif played a role in Dll repression and conferred limb repression activity when grafted onto velvet worm Ubx (Fig. 10) (Galant and Carroll 2002).

Figure 10.

Evolution of a novel leg repression function in insect Ubx. The insect thorax is made of three segments: the prothorax (T1), mesothorax (T2), and metathorax (T3). Each thoracic segment has a pair of ventral appendages (legs) and may have dorsal appendages (wings and wing derivatives). Averof and Akam (1995) have proposed that the insect body plan (fly and beetle) evolved from a crustacean-like plan (brine shrimp) in two phases: restriction of Ubx expression (shown in blue) to the proto-abdominal region, followed by acquisition of repressive activities in Ubx that suppressed Dll expression and limbs in the abdomen (Ubx shown in red). Ancestrally, the cryptic Dll repression function of Ubx was itself repressed by amino acids downstream of the homeodomain (HD); expansion of the QA-motif in the stem-lineage of hexapods uncovered the repression motif in the amino-terminal region of Ubx, allowing Dll repression and repression of abdominal appendages, likely via the recruitment of a new co-repressor (shown in green) or stabilization of other cofactor associations. Wild-type crustacean (brine shrimp) Ubx weakly represses Dll expression. Note that Ubx expression in the velvet worm is restricted to the extreme posterior region of the final body segment. Ubx is expressed in all thoracic and abdominal segments of crustaceans (brine shrimp), whereas in hexapod insects (fly and beetle) Ubx expression is restricted to the abdomen and T3, with weak expression in the posterior end of T2.

Ronshaugen and colleagues (2002) reached similar conclusions by dissecting functional differences between Ubx from Drosophila and brine shrimp (Artemia). They found the Dll repressor domain to be N-terminal to the homeodomain and the QA-motif to act as a facilitator of the repressor domain. Although brine shrimp Ubx lacks the QA-motif and has only mild repression function, adding a QA-motif in combination with experimentally removing several phosphorylation sites transforms Artemia Ubx into a strong repressor of leg development in Drosophila larvae. Based on these results, Ronshaugen et al. (2002) proposed that the gain of the QA-motif and the loss of serine/threonine phosphorylation sites in the common ancestor of insects uncovered a cryptic Dll-repression function in Ubx. Coupled with the restriction of Ubx expression to the posterior trunk, this novel function of Ubx contributed to the evolution of the insect body plan (Fig. 10).

Does the evolution of novel linear motifs affect global gene expression patterns, and therefore suffer the consequences of negative pleiotropy? Two recent studies of Ubx suggest that such motifs have few pleiotropic effects (Hittinger et al. 2005; Merabet et al. 2007). Hittinger et al. (2005) investigated the function of the QA-motif by deleting it in Drosophila using allelic replacement. Curiously, deletion of the QA-motif (UbxΔQA) neither affected Dll repression in the abdomen of UbxΔQA/UbxΔQA flies nor was strongly pleiotropic in most other tissues. However, manipulating the dose of Ubx and AbdA (which is functionally redundant with Ubx with respect to Dll repression in the abdomen, Fig. 3) uncovered a crucial role for the QA-motif in imparting full Dll repression activity to Ubx and revealed only slightly more pronounced pleiotropic effects (Hittinger et al. 2005). Hittinger and colleagues (2005) concluded that the QA-motif is required for only a subset of Ubx-regulated developmental processes, and that the differential pleiotropy observed for the QA-motif might allow selection to alter the development of individual characters with minimal pleiotropic fitness trade-offs.

In a similar study, Merabet and colleagues (2007) examined the ability of Drosophila Ubx to physically interact with Exd and to drive expression of Ubx target genes. They found that mutation of the UbdA motif (UbxUbdA) dramatically reduced binding to Exd and the ability of Ubx to repress Dll, whereas mutation of the hexapeptide motif (UbxHX) had no effect on either the interaction with Exd or Dll repression. Although the UbxUbdA mutant lost its ability to repress Dll in the abdomen, it still repressed the Ubx target genes spalt, Blistered/dSrf, and vestigial in the haltere. Unexpectedly, UbxUbdA, UbxHX, and the double mutant UbxUbdA/HX all retained their ability to activate decapentaplegic, a well-characterized Ubx-Exd target gene. These results demonstrate that mutations in Ubx protein–protein interaction motifs do not have globally deleterious effects on Ubx target gene expression, because different cofactor associations are used to regulate distinct sets of target genes and because Ubx-cofactor associations are tissue specific, which limits the spread of a mutation's effects.


Like linear motifs, simple sequence repeats (SSRs) are evolutionarily labile and often variable between species; they have therefore been called “evolutionary knobs” that fine-tune transcription factor function (King et al. 1997; Kashi and King 2006). SSRs, particularly long runs of the same amino acid (homopolymeric repeats), are common in transcription factors and are significantly overrepresented in this class of genes (Karlin and Burge 1996; Mar Albà et al. 1999; Young et al. 2000; Albà and Guigó 2004). There is increasing evidence that glutamine, alanine, proline, and glycine repeats can mediate protein–protein, protein–DNA, and protein–RNA interactions and thereby regulate gene expression (Mitchell and Tjian 1989; Emili et al. 1994; Gerber et al. 1994; Perutz 1994; Xiao and Jeang 1998). Several studies have also shown that repeats vary in length between species (Mortlock et al. 2000) and are associated with morphological evolution (Emili et al. 1994; Fondon and Garner 2004). These data suggest that SSRs play an active role in generating functional divergence and in phenotypic evolution in general.

In one of the earliest studies to demonstrate that homopolymeric amino acid repeats are functional, Gerber et al. (1994) showed that stretches of glutamine and proline could activate transcription when fused to the DNA-binding domain of the GAL4 transcription factor. In vitro, activity increased with repeat length, whereas in cell transfection assays maximal activity was achieved with 10–30 glutamines and ∼10 prolines. These authors proposed that homopolymeric amino acid stretches may be the main means of modulating transcription factor activity, but this suggestion has received little attention in developmental evolution.

Amino acid repeats are extremely rare in prokaryotes; in eukaryotes, however, glutamine, asparagine, and alanine repeats are fairly common (Faux et al. 2005). Interestingly, polyglutamine repeats are common in both vertebrate and invertebrate proteins, whereas polyasparagine repeats are rare in vertebrates (Faux et al. 2005). Albà and Guigó (2004) analyzed repeat content in a large set of 7039 human–mouse–rat orthologs and found that a high proportion of repeats were species specific: only 52% of mouse genes and 46.5% of rat genes had repeats conserved with human genes. Among human-specific repeats, polyalanine was most common, whereas among rodent-specific repeats, polyglutamine was most common (Albà and Guigó 2004).
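Repeat surveys of this kind rest on a simple operation: locating homopolymeric runs in each protein and then asking whether orthologs share them. A minimal Python sketch of that operation is shown below, assuming a simple definition of a repeat (a run of at least five identical residues; published surveys set their own thresholds and may tolerate impure runs) and a deliberately naive notion of conservation (both orthologs carry a qualifying run of the same amino acid). The threshold, function names, and sequences are all hypothetical.

```python
import re


def homopolymer_runs(seq: str, min_len: int = 5) -> list[tuple[str, int, int]]:
    """Return (residue, start, length) for each run of >= min_len identical residues."""
    runs = []
    # (.)\1* matches one residue followed by any number of copies of itself.
    for m in re.finditer(r"(.)\1*", seq):
        if len(m.group(0)) >= min_len:
            runs.append((m.group(1), m.start(), len(m.group(0))))
    return runs


def shared_repeat_types(seq_a: str, seq_b: str, min_len: int = 5) -> set[str]:
    """Amino acids that form a qualifying run in both sequences (naive 'conservation')."""
    types_a = {res for res, _, _ in homopolymer_runs(seq_a, min_len)}
    types_b = {res for res, _, _ in homopolymer_runs(seq_b, min_len)}
    return types_a & types_b


if __name__ == "__main__":
    # Hypothetical ortholog fragments, invented for illustration.
    human = "MS" + "Q" * 6 + "AP" + "A" * 7 + "G"
    mouse = "MS" + "Q" * 7 + "APAAG"
    print(homopolymer_runs(human))            # [('Q', 2, 6), ('A', 10, 7)]
    print(shared_repeat_types(human, mouse))  # {'Q'}: the polyalanine run is human-specific
```

Here the polyglutamine run counts as conserved even though its length differs between the two fragments, while the polyalanine run is scored as species specific; a real analysis would also need an ortholog alignment to confirm that the runs are homologous.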

SSRs are particularly abundant in proteins that regulate gene expression, and they evolve rapidly, yet few studies have examined the roles of SSRs in molecular or morphological divergence. In an elegant study of SSR variation in 17 developmental genes across 92 dog breeds, Fondon and Garner (2004) found high levels of tandem repeat variation and evidence that repeats were driven to fixation in breeds by selection. Although most of the variation consisted of small changes in repeat length, usually two or three amino acids, five genes had large expansions or contractions in SSRs: Six-3, HoxA-7, Runx-2, HoxD-8, and Alx-4.

Although the function of most of these repeats is not known, previous developmental and biomedical studies in mice and humans suggested that mutations in Runx-2 could have phenotypic effects. Variation in the glutamine/alanine repeat (QA-repeat) of Runx-2 is correlated with morphological divergence between closely related species, particularly the degree of dorsoventral nose bend (clinorhynchy) and midface length in dogs and other carnivores (Fondon and Garner 2004). Runx-2 regulates the rate and timing of bone development: upregulation accelerates and extends bone development, whereas downregulation decelerates and truncates it. These results suggest that mutations altering the activity of Runx-2, and not just its expression, may play a role in bone development. Opposing effects on transcriptional activity have been reported for polyglutamine and polyalanine repeats: polyglutamines have been observed to drive transcription (Gerber et al. 1994) and polyalanines to repress transcription in a length-dependent fashion (Briata et al. 1997; Brown and Brown 2004). Indeed, deletion of the QA-repeat has previously been shown to dramatically reduce the transactivation function of Runx-2, with the glutamine stretch bearing the activation function and the alanine stretch possessing repression activity. Sears et al. (2007) specifically tested whether variation in Runx-2 QA-repeat length altered its transcriptional activity, using reporter assays driven by a Runx-2 target gene promoter. The transcriptional activity of Runx-2 increased as the ratio of glutamines to alanines increased; thus, expansion and contraction of the QA-repeat modulated Runx-2 transcriptional activity (Fig. 11). To our knowledge, this is the first experimental demonstration that SSRs actively contribute to normal developmental variation and not just to disease (Stephens 2006). Interestingly, previous studies suggest that SSRs may adopt unique tertiary structures that are particularly well suited for intermolecular interactions (Hicks and Hsu 2004; Cubellis et al. 2005).
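The quantity tracked by Fondon and Garner (2004) and Sears et al. (2007) is simply the number of glutamines divided by the number of alanines in the Runx-2 QA-repeat. The Python sketch below computes that ratio from a repeat string and ranks alleles by it. The allele names and sequences are hypothetical placeholders rather than real dog Runx-2 alleles, and the ranking encodes only the qualitative expectation from the reporter assays (a higher Q/A ratio, higher transcriptional activity), not a fitted model of activity.

```python
from collections import Counter


def qa_ratio(repeat: str) -> float:
    """Glutamine-to-alanine ratio of a Q/A repeat region."""
    counts = Counter(repeat)          # missing residues count as 0
    if counts["A"] == 0:
        raise ValueError("repeat contains no alanines")
    return counts["Q"] / counts["A"]


def rank_by_expected_activity(alleles: dict[str, str]) -> list[tuple[str, float]]:
    """Order alleles from highest to lowest Q/A ratio.

    Assumes only a monotonic relationship (more Q per A, more activity),
    as suggested qualitatively by reporter assays; no activity values are predicted.
    """
    ratios = {name: qa_ratio(seq) for name, seq in alleles.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical alleles, for illustration only.
    alleles = {
        "allele_1": "Q" * 20 + "A" * 10,   # ratio 2.0
        "allele_2": "Q" * 15 + "A" * 15,   # ratio 1.0
        "allele_3": "Q" * 18 + "A" * 12,   # ratio 1.5
    }
    print(rank_by_expected_activity(alleles))
```

Because only the ratio enters the ranking, two alleles of different total length but the same Q/A ratio are treated as equivalent, which is a simplification of the repeat-length effects discussed above.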

Figure 11.

Simple sequence repeats in the Runx2 gene are correlated with morphological evolution of dog skull shape (A, B) and with transcriptional activity (C). The ratio of glutamines to alanines is positively correlated with the degree of dorsoventral nose bend (clinorhynchy) in purebred dogs (A). Purebred bull terrier skulls from 1931 (top), 1950 (middle), and 1976 (bottom). Analysis of the Runx2 repeats in the 1931 bull terrier revealed a more intermediate allele than in the contemporary bull terrier. Data in A and B are from Fondon and Garner (2004). The ratio of glutamines to alanines in Runx2 is positively correlated with transcriptional activity, as assayed in a β-reporter assay driven from the Col10 promoter (Sears et al. 2007). The size of the arrow is drawn in proportion to transcriptional activity. Pie charts indicate the ratio of glutamines (dark gray) to alanines (black); the glutamine/alanine ratio is given below each pie chart. Note that when the glutamine/alanine ratio is 1, transcriptional activity is 1% of wild-type activity (Thirunavukkarasu et al. 1998).

Do homopolymeric amino acid repeats suffer the consequences of negative pleiotropy? Biomedical studies of repeat expansion diseases suggest that SSR changes may have very few pleiotropic effects. For example, expansion of a polyalanine repeat in HoxD-13 by 7–14 residues causes synpolydactyly, a dominant developmental limb malformation characterized by duplication of digits and webbing between them (Goodman et al. 1997; Kjaer et al. 2002; Zhao et al. 2007). No other organ or tissue systems are affected in synpolydactyly, suggesting that SSR expansions have little effect on the functions of HoxD-13 outside of the autopod. Similarly, Anan et al. (2007) found that transgenic mice lacking an amniote-specific polyalanine tract in HoxD-13 had only one developmental defect: fusion of a single sesamoid bone in the wrist.

Functional Equivalence Revisited: Beyond Equivalent Morphological Outcomes

Functional equivalence between transcription factors in particular developmental contexts has been instrumental in the rise of the cis-regulatory paradigm, but is the ability to produce a particular character the only relevant aspect of the phenotype? Are there functions that are divergent between apparently equivalent proteins? If so, are we being misled by focusing exclusively on whether a particular protein can rescue the development of a specific morphological character? The answers to these questions are not so clear.

Few studies have examined the consequences of gene replacement experiments for the expression of all of a transcription factor's target genes; therefore, the genome-wide effects of gene swapping are still unclear. However, emerging data suggest that even transcription factors that are functionally equivalent in the development of some phenotypes may be divergent in others. For example, human Otx1/2 genes can functionally rescue their Drosophila ortholog (otd) in nervous system development, correctly regulating otd target genes and producing a normal phenotype (Leuzinger et al. 1998; Nagao et al. 1998). The reverse, however, is only partially true. Drosophila otd can replace most functions of Otx1/2 in mouse nervous system development, but cannot rescue the development of the mesencephalon, the formation of cerebellar foliation, or the development of the lateral semicircular canals of the inner ear (Acampora et al. 1998). These developmental differences likely reflect differences in the proteins' ability to regulate downstream target genes. Montalta-He et al. (2002) directly tested the conservation of otd/Otx2 target genes using whole-genome microarrays in Drosophila. Strikingly, 287 genes responded to otd overexpression whereas 682 genes responded to Otx2 overexpression; only 90 target genes were shared by otd and Otx2. Montalta-He et al. (2002) concluded that these 90 shared target genes were responsible for the functional equivalence of otd and Otx2 in Drosophila development. Thus, even transcription factors that are functionally equivalent with respect to the development of a particular structure can be nonequivalent with respect to the regulation of gene networks at the genomic level.
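The counts reported by Montalta-He et al. (2002) imply a strongly asymmetric overlap, which a few lines of arithmetic make explicit. The Python sketch below uses only the three numbers given above (287 otd-responsive genes, 682 Otx2-responsive genes, 90 shared); the choice of summary statistics (the fraction of each target set that is shared, and the Jaccard index) is ours, not the original authors'.

```python
def overlap_summary(n_a: int, n_b: int, n_shared: int) -> dict[str, float]:
    """Summarize the overlap of two target-gene sets from their sizes alone."""
    union = n_a + n_b - n_shared          # inclusion-exclusion
    return {
        "union": union,
        "shared_fraction_of_a": n_shared / n_a,
        "shared_fraction_of_b": n_shared / n_b,
        "jaccard": n_shared / union,      # |A intersect B| / |A union B|
    }


if __name__ == "__main__":
    # Counts from the text: otd-responsive, Otx2-responsive, shared genes.
    for key, value in overlap_summary(287, 682, 90).items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Of the 879 genes responding to either protein, about 31% of otd-responsive genes but only about 13% of Otx2-responsive genes fall in the shared set, giving a Jaccard index of roughly 0.10.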


One of the foremost issues for the evolution of gene regulation, and for developmental evolution in general, is how new gene regulatory links arise. Do they evolve more often by the origin of novel cis-regulatory elements, by the origin of novel functional specificities in transcription factors, or by a combination of these mechanisms? And do some kinds of phenotypes arise more often by one mechanism than another? The ultimate source of gene regulatory evolution has long been thought to be cis-regulatory. However, several features of protein function and structure reduce the negative pleiotropic effects of mutations, including tissue-specific expression, gene duplication, and alternative splicing to generate tissue-specific isoforms. Moreover, proteins can evolve novel functions in many ways, including domain shuffling and the gain or loss of PPIs. These properties make proteins highly evolvable while minimizing the negative fitness consequences that may arise from the evolution of a new function. Finally, some modes of gene regulatory network evolution strongly suggest that novel PPIs, rather than the origin of new transcription factor binding sites, initiated the recruitment of new target genes (Fig. 1; e.g., Tuch et al. 2008b). Thus, the evolution of transcription factor proteins themselves, and not just their binding sites, plays an active role in the evolution of development.

Associate Editor: M. Rausher


The authors are extremely grateful to A. Monteiro, J. Townsend, M. Rausher, and S.B. Carroll and an anonymous reviewer for comments on an earlier version of this manuscript. This work is supported by a grant from the John Templeton Foundation. The opinions expressed in this report are those of the authors and do not necessarily reflect the view of the John Templeton Foundation.