Evolutionary and comparative analysis of MYB and bHLH plant transcription factors


(fax +1 614 292 5379; e-mail grotewold.1@osu.edu).


The expansion of gene families encoding regulatory proteins is typically associated with the increase in complexity characteristic of multi-cellular organisms. The MYB and basic helix-loop-helix (bHLH) families provide excellent examples of how gene duplication and divergence within particular groups of transcription factors are associated with, if not driven by, the morphological and metabolic diversity that characterize the higher plants. These gene families expanded dramatically in higher plants; for example, there are approximately 339 and 162 MYB and bHLH genes, respectively, in Arabidopsis, and approximately 230 and 111, respectively, in rice. In contrast, the Chlamydomonas genome has only 38 MYB genes and eight bHLH genes. In this review, we compare the MYB and bHLH gene families from structural, evolutionary and functional perspectives. The knowledge acquired on the role of many of these factors in Arabidopsis provides an excellent reference to explore sequence–function relationships in crops and other plants. The physical interaction and regulatory synergy between particular sub-classes of MYB and bHLH factors is perhaps one of the best examples of combinatorial plant gene regulation. However, members of the MYB and bHLH families also interact with a number of other regulatory proteins, forming complexes that either activate or repress the expression of sets of target genes that are increasingly being identified through a diversity of high-throughput genomic approaches. The next few years are likely to witness an increasing understanding of the extent to which conserved transcription factors participate at similar positions in gene regulatory networks across plant species.


Regulation of gene expression is central to all organisms. It provides a complex control mechanism by which plants respond to abiotic and biotic stresses, and modulate developmental processes. This regulation is coordinated by a number of mechanisms that involve DNA methylation, chromatin organization, dimerization and sequence-specific DNA binding, which may be executed by transcription factors (TFs). In addition to recognizing specific DNA motifs in gene regulatory regions, a feature that we consider essential for classification of a protein as a TF, TFs can activate or repress transcription, possibly by interaction with other proteins. Depending on protein–protein interactions and probably the particular chromatin context, a TF can function as an activator of one set of genes and as a repressor of others. A significant proportion of protein-encoding genes is dedicated to the control of gene expression. For example, the genome of Arabidopsis thaliana includes 27 416 protein-coding genes (TAIR10, http://arabidopsis.org), of which 6% (more than 1700) encode TFs. The function of a few TFs has remained conserved between plants and animals (separated by over a billion years of evolution). Examples include members of the E2F family, which control core cell-cycle functions (Inzé and De Veylder, 2006). However, most other TFs have significantly diverged in function since the separation of plants and animals, and approximately 45% of Arabidopsis TFs belong to families that are specific to plants. As in animals, TF families have considerably expanded in particular plant lineages, suggesting that they are involved in the regulation of clade-specific functions (Dias et al., 2003; Shiu et al., 2005). These expansions significantly complicate the establishment of paralogous/orthologous relationships, making it particularly difficult to predict gene function on the basis of proximity in phylogenetic trees. 
Among the families of TFs that have significantly expanded in the past 600–100 million years are the MADS box proteins, basic-region leucine-zipper proteins (bZIP), and the MYB and bHLH families (Figure 1), which are described extensively below (Becker et al., 2000; Riechmann et al., 2000; Shiu et al., 2005; Chen and Rajewsky, 2007).

Figure 1.

 Eukaryotic phylogeny showing the distribution of selected TF families.
MYB3R and R2R3-MYB numbers are included in the total number of MYBs. The tree topology reflects a consensus of recent large-scale phylogenies (e.g. Burki et al., 2008 for eukaryotes, and http://www.mobot.org/MOBOT/research/APweb/ for flowering plants). Divergence time estimates were obtained from the TimeTree database (Hedges et al., 2006; http://www.timetree.org). The sizes of TF families were estimated by performing profile searches using HMMER3 (http://hmmer.janelia.org/). Profiles used for searches were generated using the following seed alignments downloaded from Pfam (Finn et al., 2010): PF00010 (bHLH), PF00170 (bZIP), PF00249 (MYB) and PF00319 (MADS). The numbers of MYB3R (c-MYB-like) and R2R3-MYB proteins were estimated by searching using a profile generated by the R2R3-MYB sequence alignment described by Dias et al. (2003). This profile results in low E-values for MYB3R and R2R3-MYB proteins and high E-values for other MYB proteins. Proteins with E-values ≤ 10^−20 in searches using the R2R3 profile were classified as MYB3R if they had three matches to the PF00249 profile and as R2R3-MYBs if they had two matches. Sequences with other numbers of MYB domains, sequences with domains that indicated they were not members of the MYB3R or R2R3 group, and sequences with E-values > 10^−20 were classified in the heterogeneous group of MYB proteins. Estimates of gene family sizes obtained using this protocol may differ slightly from those obtained using other approaches.
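The classification rule in this legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the E-value cutoff and the repeat counts come from the legend, whereas the record type, field names and function name are hypothetical, and the additional check for "domains that indicated they were not members of the MYB3R or R2R3 group" is not modeled.

```python
from dataclasses import dataclass

# Cutoff taken from the Figure 1 legend; all names below are hypothetical.
E_VALUE_CUTOFF = 1e-20


@dataclass
class MybHit:
    protein_id: str       # hypothetical identifier
    r2r3_evalue: float    # E-value of the protein against the R2R3-MYB profile
    pf00249_matches: int  # number of matches to the Pfam PF00249 (MYB) profile


def classify_myb(hit: MybHit) -> str:
    """Assign a protein to MYB3R, R2R3-MYB or the heterogeneous 'other MYB' group."""
    if hit.r2r3_evalue <= E_VALUE_CUTOFF:
        if hit.pf00249_matches == 3:
            return "MYB3R"
        if hit.pf00249_matches == 2:
            return "R2R3-MYB"
    # Weak hits, and strong hits with any other number of repeats,
    # fall into the heterogeneous group.
    return "other MYB"
```

A call such as classify_myb(MybHit("example", 1e-35, 2)) would place a strong two-repeat hit in the R2R3-MYB group, whereas the same repeat count with an E-value above the cutoff would be assigned to the heterogeneous group.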

MYB transcription factors

Structure of MYB domain proteins

General structure. The first gene described as encoding a MYB domain-containing protein was v-myb in the genome of avian myeloblastosis virus (AMV), which is also responsible for the name ‘MYB’ for this domain (Klempnauer et al., 1982). Following identification of the human c-MYB proto-oncoprotein and two related vertebrate MYB factors, A-MYB and B-MYB, MYB domain proteins have been found in all eukaryotic organisms studied to date (Lipsick, 1996; Kranz et al., 2000). In plants, the maize (Zea mays) C1 gene, which is responsible for the regulation of anthocyanin pigmentation (Paz-Ares et al., 1987), was the first plant TF as well as the first plant MYB gene cloned. Table 1 lists the expanded name for C1 and all other MYBs and bHLHs mentioned in this review.

Table 1.   Abbreviated names of proteins mentioned in this review, together with the fully expanded names and other names that can be found in the literature or in AGRIS (http://arabidopsis.med.ohio-state.edu) (Davuluri et al., 2003)
Abbreviated name | Expanded and additional names
HEC1/2/3 | HECATE1/2/3, AtbHLH088/037/043
MYB3R1,2 | PC-MYB1,2, AtMYB3R1, 2
PRE1/2/3/4/5 | PACLOBUTRAZOL RESISTANCE1/2/3/4/5, AtbHLH136/134/135/161/164
RL1/2/3/4/5/6 | RADIALIS-LIKE1/2/3/4/5/6, AtRL1/2/3/4/5/6
RSL2/3/4/5 | RHD6-LIKE2, AtbHLH085/084/054/139

  1. Only TFs mentioned in this review are included.

  2. Am, Antirrhinum majus; At, Arabidopsis thaliana; Ca, Capsicum annuum; Eg, Eucalyptus gunnii; G, Gerbera hybrida; Ga, Gossypium arboreum; Gh, Gossypium hirsutum; Gm, Garcinia mangostana; Go, Gratiola officinalis; Ib, Ipomoea batatas; In, Ipomoea nil; Lj, Lotus japonicus; Md, Malus domestica; Nt, Nicotiana tabacum; Os, Oryza sativa; Pm, Picea mariana; Ph, Petunia hybrida; Pp, Physcomitrella patens; Pt, Pinus taeda; Sb, Sorghum bicolor; Sl, Solanum lycopersicum; Vm, Veronica montana; Vv, Vitis vinifera; Zm, Zea mays.

The common feature of all MYB factors is the presence of one to four or more imperfect MYB repeats (R), which can function synergistically or individually in DNA binding and protein–protein interactions, respectively. Each MYB repeat is approximately 52 amino acids long, and contains three regularly spaced tryptophan (or aliphatic) residues that together form a hydrophobic core (Figure 2b,c) (Kanei-Ishii et al., 1990). Each repeat adopts a helix-turn-helix conformation, confirmed by NMR and X-ray crystallography (Ogata et al., 1996). The third α-helix (or DNA-recognition helix) of each repeat binds to the major DNA groove. In MYB domains with two or more MYB repeats, DNA binding involves synergistic DNA recognition by the third α-helix of the individual MYB repeats (Ogata et al., 1996; Jia et al., 2004).
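As a rough illustration of how the regularly spaced hydrophobic core described above could be located in a protein sequence, the following Python sketch scans for three core residues separated by fixed-length gaps. The spacing of 18 to 19 residues, the residue set used for the "aliphatic" alternative and the function name are assumptions made for this example, not values stated in the text; a profile search such as the one used for Figure 1 would be far more sensitive.

```python
import re


def find_myb_repeat_cores(protein: str, min_gap: int = 18, max_gap: int = 19,
                          allow_aliphatic: bool = False) -> list[tuple[int, int]]:
    """Return (start, end) spans containing three regularly spaced core residues.

    Assumptions (not from the source text): consecutive core residues are
    separated by min_gap..max_gap residues, and A/I/L/V stand in for the
    'aliphatic' alternatives to tryptophan when allow_aliphatic is True.
    """
    core = "WAILV" if allow_aliphatic else "W"
    gap = f"[A-Z]{{{min_gap},{max_gap}}}"
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(f"(?=([{core}]{gap}[{core}]{gap}[{core}]))")
    return [(m.start(), m.start() + len(m.group(1)))
            for m in pattern.finditer(protein.upper())]
```

With the strict tryptophan-only setting, a synthetic sequence carrying three tryptophans separated by 19 residues each yields a single 41-residue span; enabling allow_aliphatic makes the scan much more permissive and is only useful for inspecting candidate repeats by eye.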

Figure 2.

 Structure of the plant MYB domain proteins and the conserved sequence of their DNA binding domain.
(a) Schematic representation of the general structure of plant MYB3R and R2R3-MYBs. The MYB repeats are shown in green and additional conserved domains (the DUF3351 domain in CDC5 homologs and the DnaJ domain of MIDA/Zuotin homologs) are shown in red. As an example of the C-terminal regions that are present in various groups of R2R3-MYBs, the P_C domain (PF06640), which is present in the ZmP1 protein (Grotewold et al., 1994), is shown.
(b, c) Sequence logos of the general MYB (SANT) domain (b) and R2R3-MYB domains (c) based on the profiles used to generate Figure 1. Relative entropy is measured in bits; high entropy values indicate a high degree of certainty that the corresponding amino acid is present in MYB homologs. Conserved tryptophan residues are indicated by stars. Conserved helices are shown below each logo. The width of the amino acid positions indicates the probability of insertions (wider) and deletions (narrower). Logos were generated using the method described by Schuster-Boeckler et al. (2004), available at http://www.sanger.ac.uk/cgi-bin/software/analysis/logomat-m.cgi.

Plant MYB3R proteins. Plant MYB proteins can be classified into three major groups. A small number of MYBs, encoded by small gene families in Arabidopsis and other plants (Figure 1), correspond to MYB3R (Stracke et al., 2001) or pc-MYB (Braun and Grotewold, 1999) proteins, and are characterized by three imperfect MYB repeats (R1, R2 and R3) (Figure 2a). Although not formally tested for plant MYB3R proteins, it is likely, based on results obtained with c-MYB and v-MYB (Dini and Lipsick, 1993), that R1 is not essential, but contributes to DNA-binding affinity. Each repeat is more closely related to corresponding repeats in other gene family members than to other repeats within the same protein (Ito, 2005). In addition, the R2 sequence, particularly of the DNA-recognition helix, is present in both plant MYB3R proteins and their vertebrate counterparts (Kranz et al., 2000).

Plant R2R3-MYB proteins. The largest group of plant MYB factors are R2R3-MYBs, containing two MYB repeats that are most similar to R2 and R3 from c-MYB (Figure 2a). This family includes hundreds of members in all the terrestrial plants that have been investigated (Figure 1). Although the MYB domains are highly conserved within R2R3-MYBs, the C-termini are variable (Figure 2a), often containing transcriptional activation or repression domains and conserved serine and threonine residues, which may correspond to post-translational modification sites (Martin and Paz-Ares, 1997). Many R2R3-MYB proteins have C-terminal transcriptional activation domains (TADs), with the first identified in ZmC1 (Goff et al., 1991). ZmC1 has an acidic TAD, and extensive mutagenesis experiments identified several important amino acids, one of which (Asp256) is essential for TAD function (Sainz et al., 1997b). Similar domains were also described in other R2R3-MYBs (Urao et al., 1996; Liu et al., 1999). The Arabidopsis paralogs GL1 and WER share a conserved motif in their C-termini. In GL1, this motif is necessary for transcriptional activation (Larkin et al., 1993; Esch et al., 1994), which suggests evolutionary conservation of a TAD in the C-termini of these paralogs (Lee and Schiefelbein, 2001; Yoshida et al., 2009).

Recurring amino acid motifs in the C-termini have been used to divide the large Arabidopsis R2R3-MYB family into 22 sub-groups (Kranz et al., 1998; Stracke et al., 2001; Dubos et al., 2010). This classification can also be applied to R2R3-MYB proteins from other plants, although some differences exist. Variations in the numbers and sizes of sub-groups have been identified by comparative phylogenetic studies, suggesting specialization after divergence from the last common ancestor (Stracke et al., 2001; Wilkins et al., 2009). Interestingly, most of the protein sequence motifs used for R2R3-MYB classification appear to only be present in plant MYBs. Recurring motifs are used to identify and classify MYBs from expressed sequence tags (EST) or genomic sequences of other plant species.

Other plant MYB proteins. The third MYB group has more structural diversity, with the number of MYB repeats ranging from one to four. Some of these MYB proteins are broadly distributed across the eukaryotic branch of the tree of life (e.g. SNAP190, CDC5 and MIDA1; Figure 2a), and may contain additional protein domains. Other group members contain only a single MYB repeat. Some are plant-specific, but proteins harboring the MYB-related SANT domain (nomenclature based on the yeast Swi3, yeast Ada2, human N-CoR and yeast TFIIIB proteins) (Aasland et al., 1996) are broadly distributed. SANT domains are mainly found in proteins involved in chromatin functions, although they are unlikely to bind DNA (Gruene et al., 2003). Instead, SANT domain proteins often recognize histone tails (Boyer et al., 2002, 2004).

Several sub-groups of single MYB (R-MYB) proteins have been described (Table 2). The first includes proteins with a R3-like domain (Kirik and Baeumlein, 1996), which often act as inhibitors by competing with R2R3-MYBs for co-regulators, particularly the bHLH type (see below) (Schellmann et al., 2002; Kirik et al., 2004b; Wester et al., 2009). In Arabidopsis, these R3-MYBs are often involved in plant morphogenesis (Simon et al., 2007; Pesch and Huelskamp, 2009) or secondary metabolism (Dubos et al., 2008; Matsui et al., 2008). The second sub-group includes factors that have been proposed to be related to the evolutionarily older R1/R2 class (Feldbruegge et al., 1997; Romero et al., 1998). Examples include the CCA1 and LHY proteins involved in circadian clock control (Lu et al., 2009). Members of this second type bind DNA, and some contain a SHAQK(F/Y) motif in their DNA-recognition helix (Wang et al., 1997; Lu et al., 2002, 2009). SHAQKY-type R-MYBs are also found in the amoeba Dictyostelium discoideum, indicating broad distribution of this group (Figure 1) (Fukuzawa et al., 2006). The third sub-group is the GARP class (nomenclature based on the maize G2, Arabidopsis ARRs, Chlamydomonas reinhardtii PSR1 and Arabidopsis PHR1 proteins), which have a variety of functions, including control of organ polarity and response to phosphate starvation (Hosoda et al., 2002; Bustos et al., 2010). The fourth sub-group includes proteins related to snapdragon (Antirrhinum majus) AmRAD, and includes R-MYBs that participate in flower and fruit development (Barg et al., 2005; Baxter et al., 2007). Finally, there is a group that includes maize IBP and Arabidopsis RTBP1, which recognize the plant telomeric DNA consensus TTTAGGG (Lugert and Werr, 1994; Yu et al., 2000; Marian et al., 2003). Although some R-MYBs have DNA-binding activity, others lack conserved amino acids that are normally required for DNA binding. 
Moreover, some R-MYBs have hydrophobic residues in the predicted DNA-recognition helix (Boyer et al., 2004; Clapier and Cairns, 2009), or lack positively charged surface residues, presumably interfering with DNA binding (Hanaoka et al., 2001). As a result, R-MYBs appear to be functionally divergent from MYB3R or R2R3-MYB proteins, and may control gene expression directly or indirectly through histone modifications and chromatin remodeling, rather than direct DNA contact (Boyer et al., 2002, 2004; Marian and Bass, 2005; Clapier and Cairns, 2009).

Table 2.   Functions of plant MYBs described in this review
Phylogenetic group | Arabidopsis thaliana function | Genes and functions in other species
Sub-group 4 | Transcriptional repressors, inhibit proanthocyanidin synthesis: MYB3, MYB4, MYB7, MYB32 | Anthocyanins/flavonols: FaMYB1; lignin: AmMYB308, AmMYB330, EgMYB1, ZmMYB31, ZmMYB42; isoprenoid and flavonoid biosynthesis: conifers (sub-group expanded)
 | UV-dependent sinapate ester biosynthesis: MYB4 |
 | Pollen-wall composition: MYB32 |
Sub-group 5 | Proanthocyanidins/tannins in seed coat: MYBTT2 | Proanthocyanidins: LjTT2a, b, c, PtMYB134
 | | Anthocyanins: GMYB10, MdMYB1, MdMYB9/TT2, MdMYB10, MdMYB11, PhMYBAN4, PmMBF1, VvMYB1a, ZmC1 (COLORLESS1), ZmPl1
Sub-group 6 | Anthocyanin biosynthesis (vegetative tissues): PAP1, PAP2 | AmROSEA1, CaA, GmMYB19, GtMYB3, IbMYB1, InMYB1, 2, 3, PhMYBAN2, SlANT1
 | Secondary cell-wall formation (inflorescence stem): PAP1 |
 | Others: MYB113, MYB114 |
Sub-group 7 | Flavonol biosynthesis, all tissues: PFG2, PFG1, PFG3 | VvMYBF1
 | | Phlobaphenes, 3-deoxy-flavonoids: SbY1, ZmP1/P2
Sub-group 9 | Petal epidermis cell shape: MIXTA | AmMIXTA (conical epidermal cells), AmMYBML1 (trichome formation, conical cells, mesophyll cells)
 | Early inflorescence development, seed germination: MYB17 |
 | Negative regulator of trichome branching: NOK | GhMYB25 (cotton fiber initiation)
Sub-group 15 | Epidermal cell-type determination |
 | Trichome initiation in shoots: GL1, MYB23 | GhMYB1, GaMYB2
 | Root hair patterning, stomata formation: WER, positive regulation of MYB23 by WER and positive feedback loop |
 | Trichome extension and branching: MYB23 together with MYB5 |
Sub-group 16 | Seedling hypocotyl elongation (far-red light): LAF1 |
 | Others: MYB19, MYB45 |
Orphans/atypical | Seed coat differentiation, proanthocyanidins/tannins: MYB5 | Activator of vacuolar acidification, anthocyanins: PhPH4
 | Trichome branching: MYB5, with MYB23 |
 | Stomata cell divisions, terminal differentiation, cell cycle: FLP and MYB88 |
 | MYB20, MYB42 | Cell-wall formation, lignin biosynthesis: PtMYB1
 | MYB83 | Secondary cell-wall formation and lignin biosynthesis: EgMYB2
 | Shoot morphogenesis and leaf patterning: AS1 | Dorsi-ventral polarity of leaves: ZmRS2
CPC-like (R2R3 origin) | Trichomes/root hairs: TRY, CPC, ETC1, ETC2, ETC3, TCL1, MYBL2 |
 | Negative regulator of anthocyanin biosynthesis: CPC |
 | Negative regulator of flavonoid biosynthesis: MYBL2 |
CCA1-like (R1/R2 origin) | Circadian clock: CCA1, LHY | PpCCA1a,b
GARP-like | Phosphate starvation response: PHR1 |
RAD-like | RL1, RL2, RL3, RL4, RL5, RL6 | Promotion of asymmetric petal development: AmRAD, VmRAD, GoRAD; early fruit development: SlFSM
MYB3R | Cell cycle, G2/M phase transition: MYB3R1, MYB3R2, MYB3R3, MYB3R4, MYB3R5 | NtmybA1, A2, OsMYB3R-2
Unusual | Cell cycle, pre-mRNA splicing and transcriptional regulation of cyclins: CDC5 | Asymmetric cell division, in higher plants in reproductive tissues: GlsA
 | Unknown: MYB4R |
 | Cell growth and division, DnaJ domain: MIDA1 |

  1. The various sub-groups are defined in Dubos et al. (2010). For a full list of proteins and references, please refer to Table S1.

Evolution of MYBs in plants

Origin of R2R3-MYB proteins. What is the origin of the R2R3-MYB genes, one of the largest groups of plant regulatory genes? Two hypotheses have been proposed to explain the relationship between R2R3-MYB and MYB3R genes. The first hypothesis takes into consideration that MYB3R proteins are closely related to vertebrate c-MYB and similar proteins in other eukaryotic groups, such as slime molds and ciliates (Braun and Grotewold, 1999; Yang et al., 2003). These primordial proteins are thought to have existed before the divergence of animals and plants (Kranz et al., 2000). However, it is not absolutely clear whether R2R3-MYB proteins originated recently from MYB3R proteins by the loss of R1 (Braun and Grotewold, 1999; Dias et al., 2003), or whether MYB3R proteins arose by domain duplication and subsequent gain of R1 in an ancient R2R3 predecessor; this second hypothesis proposes that R2R3 is a precursor of MYB3R (Jiang et al., 2004a). Although two- and three-repeat MYB proteins are found throughout the eukaryotic tree of life, resolving the history of MYB repeats is difficult due to their divergence. Each hypothesis requires both gene loss and domain loss, which are important but poorly understood evolutionary processes (Braun, 2003).

Expansion of R2R3-MYB proteins. Plant R2R3-MYB proteins underwent an extensive and recent amplification around the time of the origin of land plants approximately 500 million years ago, well before the separation of monocots and dicots (Rabinowicz et al., 1999; Rensing et al., 2008). This explains why many members of the MYB family control plant-specific functions, suggesting that the expansion may have occurred in response to selection for the regulation of processes related to the sessile lifestyle of land plants. The expansion of several regulatory gene families of TFs in plants is greater than that in animals, partly as a result of more recent whole-genome duplications (Shiu et al., 2005; Chen and Rajewsky, 2007). Further segmental tandem duplication of large chromosome regions has been shown to play a prominent role in the evolution of Arabidopsis MYB genes, in particular sub-groups 6 and 7 (Cannon et al., 2004; Stracke et al., 2007; Matus et al., 2008), and also in poplar (Populus trichocarpa), in which more than 35% of the R2R3-MYB genes are part of tandem repeats (Wilkins et al., 2009). Other examples of MYBs amplified by tandem or whole-genome duplications include the maize genes ZmP1/ZmP2 (Zhang et al., 2000; Dias et al., 2003) and ZmC1/ZmPl1 (Cone et al., 1993), respectively. Moreover, a recent comparison of the genomes of Arabidopsis thaliana and Arabidopsis lyrata suggests intra-chromosomal recombination of genes, which belong to certain gene families, in conjunction with tandem gene duplication (Woodhouse et al., 2010).

It is well established that gene duplication can result in at least three functional outcomes: the duplicated genes together preserve the ancestral function by partitioning it between them (sub-functionalization), one of the duplicates acquires a new function (neo-functionalization), or one of the duplicates becomes non-functional (Lynch and Conery, 2000). Sub-functionalization after gene duplication has been observed for many MYB TFs, and is often associated with partition of gene expression domains (e.g. WER and GL1) (Lee and Schiefelbein, 2001). An interesting example is provided by the AmMIXTA and AmMYBML1 genes. AmMIXTA controls the conical shape of petal epidermal cells (Glover et al., 1998), whereas AmMYBML1 regulates the formation of leaf hairs (trichomes) and conical-cell and mesophyll-cell morphogenesis in ventral petals (Perez-Rodriguez et al., 2005; Baumann et al., 2007). However, there is no evidence that AmMIXTA can complement an Ammybml1 mutant, so it remains to be established whether this is a case of neo- or sub-functionalization. Examples of neo-functionalization are rare, in part because of the difficulties associated with reconstructing ancestral states. Phylogenetic analyses of MYB proteins regulating Arabidopsis trichome development show a close relationship of these proteins to regulators of the flavonoid pathway. Arabidopsis MYB regulators of trichome formation may have derived from gene duplication and subsequent divergence events of an anthocyanin biosynthesis regulator, implying a case of neo-functionalization, and perhaps multiple origins for the control of trichome formation (Serna and Martin, 2006). Furthermore, based on functional and structural relationships between maize flavonoid R2R3-MYB regulators, it has been proposed that, following gene duplications, mutations associated with partial loss of function, e.g. deficiency of protein–protein interactions, may have driven metabolic diversity through accumulation of pathway intermediates. 
Such a model implies that R2R3-MYB sub-functionalization may precede the appearance of new functions, as predicted by the sub-neo-functionalization model (He and Zhang, 2005). This has potential to drive the formation of new metabolic pathways (Grotewold, 2005), therefore broadening phenotypic diversity (Chatterjee and Yuan, 2006).

Gene structure of R2R3-MYBs. In addition to conserved R2R3-MYB functions, the gene structure organization is also maintained throughout various plant families. For instance, comparison of the intron/exon structure of R2R3-MYB genes from grapevine (Vitis vinifera) and Arabidopsis showed that exons 1 and 2 encode almost the entire R2R3-MYB domain in both species (Matus et al., 2008), with only a few exceptions, including the multi-exonic genes MYB88 and FLP (Lai et al., 2005), which have been previously classified as atypical, based on their evolutionary origin (Dias et al., 2003). Furthermore, a similar intron/exon structure between maize and rice (Oryza sativa) R2R3-MYB factors is evident (Dias et al., 2003), as well as between Arabidopsis and members of the Rosaceae (Jiang et al., 2004b; Lin-Wang et al., 2010). Indeed, the lengths of the first two exons among these genera are very similar and appear to be highly conserved (Matus et al., 2008) compared to the third exon (and sometimes the fourth and fifth exons), which may be highly variable in length and sequence (Stracke et al., 2001; Dubos et al., 2010; Lin-Wang et al., 2010). The restricted length of the first two exons could, to a certain extent, account for the conservation of the MYB domain throughout plant evolution, whereas the less restricted C-terminal region of R2R3-MYB proteins was able to rapidly diverge (Matus et al., 2008).

DNA-binding specificity. DNA binding by most MYB proteins (R2R3-MYBs and MYB3Rs) involves dimerization of two recognition helices that are tightly packed together (Sakura et al., 1989). The DNA sequence motifs recognized by plant and vertebrate MYB proteins differ, and it has been difficult to identify specific DNA-contacting residues that participate in the distinct DNA-binding specificities (Williams and Grotewold, 1997). Similar to several other R2R3-MYB proteins (Sablowski et al., 1994; Romero et al., 1998), maize ZmP1 binds the consensus DNA sequence CC(T/A)ACC (Grotewold et al., 1994), whereas vertebrate MYB proteins bind the (C/T)AACGG motif (Howe and Watson, 1991; Weston, 1992). In contrast, a single amino acid change from Leu71 to Glu was sufficient to change the dual DNA-binding specificity of PhMYB3 from ANNC(G/C)GTTA to AGTTAGTTA (a c-MYB binding site) (Solano et al., 1995, 1997). Analysis of representative R2R3-MYBs from Arabidopsis identified unique binding specificities for particular sub-families, suggesting similar DNA-binding preferences within the same phylogenetic group (Romero et al., 1998). Related DNA-binding preferences of MYB proteins controlling different pathways may be specified by combinatorial interactions with other TFs (Ramsay and Glover, 2005). In comparison, FLP controls the last step in the Arabidopsis stomatal pathway (Lai et al., 2005), and has unusual DNA-binding specificity that is very distinct from the canonical plant or animal MYB consensus sequences (Xie et al., 2010). In addition to featuring an unusual intron/exon structure and some characteristics of MYB3R proteins, FLP has an R3 repeat in which the third α-helix is not conserved. Similarly, AS1 also has a non-conserved R3 third α-helix, and depends on the zinc-finger AS2 for DNA binding and repression of its target genes (Guo et al., 2008).
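To make the consensus notation used in this section concrete, the following Python sketch converts a motif written with parenthesized alternatives, such as CC(T/A)ACC, into a regular expression and scans both strands of a DNA sequence. It is a minimal illustration only: the function names are hypothetical, positions such as N are not handled, and an exact string match says nothing about binding affinity in vivo.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Rewrite '(T/A)'-style alternatives as character classes: CC(T/A)ACC -> CC[TA]ACC."""
    return re.sub(r"\(([ACGT](?:/[ACGT])+)\)",
                  lambda m: "[" + m.group(1).replace("/", "") + "]",
                  consensus)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, consensus: str) -> list[tuple[int, str]]:
    """Return (0-based start on the forward strand, strand) for every exact match."""
    seq = seq.upper()
    # Lookahead so that overlapping sites are reported as well.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Map the coordinate on the reverse strand back to the forward strand.
        hits.append((len(seq) - m.start() - len(m.group(1)), "-"))
    return sorted(hits)
```

For example, find_sites("GGCCTACCGG", "CC(T/A)ACC") reports one site on the forward strand at position 2, whereas the sequence GGTAGG carries the same site on the reverse strand.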

Regulation of MYB DNA binding also depends on redox control. Interestingly, while animal and several plant R2R3-MYB proteins have a single conserved R2 cysteine residue, most plant R2R3-MYB domains have two proximal cysteines. In ZmP1, they form an intramolecular disulfide bond in vitro under oxidative conditions, preventing DNA binding (Williams and Grotewold, 1997; Heine et al., 2004). Nevertheless, animal proteins with a single cysteine in R3 are redox-controlled (Myrset et al., 1993; Brendeford et al., 1997), suggesting the existence of mechanisms other than intramolecular disulfide bond formation in MYB protein redox regulation. In addition, cysteine nitrosylation and phosphorylation of DNA-binding domain residues can result in alteration of the DNA-binding specificity, as shown for snapdragon and Arabidopsis R2R3-MYB proteins (Moyano et al., 1996; Serpa et al., 2007).

Functions of MYB proteins in plants

MYB3R proteins and cell-cycle regulation. Plant MYB3R factors are associated with the transcriptional control of cyclins, especially in late G2 and M phase (Ito et al., 2001; Ito, 2005). Indeed, many plant cell-cycle genes harbor the M phase-specific activator (MSA) motif (T/C)C(T/C)AACGG(T/C)(T/C)A in their regulatory region, which is recognized by plant and animal MYB3R proteins (Ito et al., 2001; Araki et al., 2004; Suzuki et al., 2006; Haga et al., 2007; Ma et al., 2009). Over-expression of NtMybA1 and NtMybA2 in tobacco (Nicotiana tabacum) BY-2 cells or their homologs AtMYB3R1 and AtMYB3R4 in Arabidopsis plants resulted in activation of many cytokinesis-related genes (Haga et al., 2007; Kato et al., 2009). OsMYB3R-2 controls a cyclin involved in the G2/M transition at low temperatures, and also participates in DREB/CBF pathway regulation (Ma et al., 2009), increasing tolerance to freezing, drought and salt stress (Dai et al., 2007). This role of plant MYB3R factors contrasts with that of their vertebrate counterparts, which have been primarily implicated in control of the G1/S transition (Egoh et al., 2010).
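The degenerate MSA motif given above stands for a small family of concrete sequences. As a hedged illustration (the function name is hypothetical and the code handles only A/C/G/T and parenthesized alternatives), the following Python sketch enumerates them:

```python
import re
from itertools import product

# MSA motif as written in the text above.
MSA_MOTIF = "(T/C)C(T/C)AACGG(T/C)(T/C)A"


def expand_motif(motif: str) -> list[str]:
    """List every concrete sequence encoded by a motif with '(X/Y)' alternatives."""
    # Each token is either a parenthesized set of alternatives or a single base.
    tokens = re.findall(r"\(([ACGT](?:/[ACGT])+)\)|([ACGT])", motif)
    choices = [alts.split("/") if alts else [base] for alts, base in tokens]
    return ["".join(combo) for combo in product(*choices)]


msa_sequences = expand_motif(MSA_MOTIF)
```

Because the motif has four two-fold degenerate positions, the expansion contains 2^4 = 16 sequences of 11 nucleotides. Each of them contains the (C/T)AACGG core bound by vertebrate MYB proteins, which is consistent with recognition of the MSA motif by both plant and animal MYB3R proteins.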

However, cell-cycle control may not be unique to MYB3R factors. Chromatin immunoprecipitation followed by microarray (ChIP-chip) experiments demonstrated that FLP binds and regulates several cell-cycle genes that are involved both in the G2/M and G1/S transitions (Xie et al., 2010), but are particularly associated with the last step of the division/differentiation of cells of the Arabidopsis stomatal lineage. These results suggest that, although MYB3R factors appear to be generally involved in cell-cycle regulation, proteins such as FLP may control more specialized aspects of the cell cycle, perhaps being part of a link between cell-cycle regulation and differentiation.

R2R3-MYBs and the regulation of plant form and metabolism. The functions of R2R3-MYB proteins are primarily related to regulation of plant form and plant metabolism (Table 2). In addition to phenylpropanoid biosynthesis control, R2R3-MYBs have been implicated, among many other functions, in plant hormone- and pathogen-mediated stress responses, control of glucosinolate biosynthesis, and organ determination (Table S1). Other functions of R2R3-MYBs involve the control of cell shape and the formation of root hairs and trichomes. To clarify how evolutionary forces have shaped plant R2R3-MYB function, we describe how duplication/divergence of particular MYBs may have affected the differentiation of these highly specialized epidermal cells in a number of plant species. Arabidopsis trichome development is directed by the interplay of specific sub-groups of R2R3-MYB and bHLH factors, in cooperation with the WD repeat (WDR) protein TTG1, forming the MYB–bHLH–WDR (MBW) complex (Figure 4b) (Walker et al., 1999; Pesch and Huelskamp, 2009). Various MBW complexes control trichome initiation (Higginson et al., 2003; Ishida et al., 2008; Zhao et al., 2008; Morohashi and Grotewold, 2009), trichome branching (Kirik et al., 2005; Li et al., 2009) and root hair formation (Figure 4b) (Tominaga et al., 2007; Ishida et al., 2008; Tominaga-Wada et al., 2009). Some of these functions are conserved in other plants, e.g. cotton fiber development (Machado et al., 2009; Zhang et al., 2010). For instance, R2R3-MYBs GhMYB1 (Gossypium hirsutum) or GaMYB2 (Gossypium arboreum) rescued the Arabidopsis gl1 mutant phenotype (Wang et al., 2004). Homologs of other regulators have also been described in several cotton species (Wang et al., 2004; Humphries et al., 2005; Guan et al., 2008). Despite the similarities, the cotton homolog GhMYB25 of the R2R3-MYB NOK, which negatively controls Arabidopsis trichome branching, was identified as a main player in cotton fiber initiation (Jakoby et al., 2008). 
However, over-expression of GhMYB25 in tobacco increased branching of trichomes (Wu et al., 2006), whereas over-expression in cotton plants resulted in an augmented number of fiber initials (Machado et al., 2009). These results suggest that GhMYB25 plays a key role in cotton fiber development, especially at the initiation phase (Zhang et al., 2010). AmMIXTA controls the conical shape of petal epidermal cells, but, similarly to GhMYB25, promotes trichome formation when ectopically expressed in tobacco. In addition, GhMYB25 also induces the formation of conical cells on all aerial epidermal surfaces, indicating a common developmental pathway for epidermal cell types in these plants (Glover et al., 1998; Payne et al., 1999; Perez-Rodriguez et al., 2005; Baumann et al., 2007). When considered together, these results suggest that the well-described Arabidopsis trichome formation mechanism may serve as a model for other regulatory networks such as cotton fiber development, after taking into account physiological differences (Zhao et al., 2008; Zhang et al., 2010).

Figure 4.

 Representation of MBW or MYB–bHLH complexes and their function in plants.
(a) Sequence logo of the interaction motif found in the R3 repeat of R2R3-MYB domain proteins. The profile is based on alignment of all bHLH-interacting MYBs mentioned in this review. For more information on the logo, see Figure 2. The conserved amino acids important for interaction specificity are indicated by arrows. Amino acids that appear to be conserved but are not indicated by arrows are conserved also in non-bHLH-interacting R2R3 MYB proteins (see Figure 2).
(b) Schematic representation of protein complexes of plant bHLH proteins (red), MYB domain proteins (blue) and WDR proteins (green) mentioned in this review. Dotted lines indicate that an interaction has been proposed but not experimentally proven. The Roman numerals adjacent to these complexes refer to the references listed below the figure. Some complexes are referred to in a more general fashion in the text (e.g. regulation of anthocyanins).

Another example in which the functional divergence of conserved R2R3-MYB proteins is evident is the regulation of particular branches of flavonoid accumulation. Maize ZmP1 is a member of the recently amplified sub-group 7, whose members in Arabidopsis and grapevine control flavonol accumulation, and whose targets possibly also include early flavonoid biosynthetic genes (Table 2) (Mehrtens et al., 2005; Czemmel et al., 2009). In contrast, ZmP1 activates a subset of flavonoid pathway genes required for biosynthesis of phlobaphene pigments and 3-deoxy-flavonoids (Grotewold et al., 1994). A similar function is performed by the Sorghum bicolor protein SbY1, which is highly similar to ZmP1 (Chopra et al., 1999; Ibraheem et al., 2010). However, the recent finding that ZmP1 can also control a flavonol synthase (ZmFLS1) gene (Ferreyra et al., 2010) suggests that what was previously perceived as functional divergence between monocot and dicot P1-like factors may actually not be so. Regulators of Arabidopsis anthocyanin biosynthesis are also conserved in maize, petunia and tobacco (Table 2, sub-groups 5 and 6) (Cone et al., 1993; Quattrocchio et al., 1999; Borevitz et al., 2000; Pattanaik et al., 2010). Based on sequence homology, orthologous factors controlling the anthocyanin pathway were also identified in less well-studied plants, such as mangosteen (Palapol et al., 2009). Other branches of the phenylpropanoid pathway include proanthocyanidin (PA) biosynthesis, which is controlled by TT2 in the Arabidopsis seed coat (Nesi et al., 2001). In Lotus japonicus, the TT2 orthologs LjTT2a–c (Yoshida et al., 2010) have also been shown to control the same subset of genes. Recently, isoprenoid and flavonoid biosynthesis in conifers such as white spruce (Picea glauca) and pine (Pinus taeda) was shown to be regulated by several R2R3-MYBs of sub-group 4, a sub-group that expanded following the divergence of gymnosperms and angiosperms (Table 2) (Bedon et al., 2010). 
In maize, the ZmMYB31 ortholog negatively regulates several genes involved in the biosynthesis of monolignols (Fornalé et al., 2006; Sonbol et al., 2009). In vitro SELEX experiments identified the DNA consensus sequence of ZmMYB31 as ACC(T/A)ACC, and ChIP demonstrated interaction with two lignin gene promoters in vivo (Fornalé et al., 2010). Further results suggest that repression of lignin biosynthesis may result in redirection of carbon flux toward another branch of the phenylpropanoid pathway, the biosynthesis of anthocyanins (Fornalé et al., 2010). In addition, cell-wall formation and lignin biosynthesis are regulated by the R2R3-MYB TFs PAP1 in Arabidopsis (Bhargava et al., 2010), PtMYB1 in pine (Pinus taeda) (Patzlaff et al., 2003) and EgMYB2 in eucalyptus (Goicoechea et al., 2005).
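
As an illustration of how a degenerate consensus such as ACC(T/A)ACC might be located in a promoter sequence, the following minimal Python sketch converts the consensus (written with the IUPAC code W for A or T) into a regular expression and scans both strands. The function names and the example sequence are hypothetical and are not taken from the cited studies; a match only marks a candidate site and says nothing about binding in vivo.

```python
import re

# Subset of IUPAC nucleotide codes needed here; W = A or T, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, consensus="ACCWACC"):
    """Return sorted (start, strand, match) tuples for every consensus hit.

    Coordinates are 0-based and always refer to the forward strand;
    the lookahead allows overlapping matches to be reported.
    """
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in consensus) + "))")
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Map the reverse-strand position back to forward-strand coordinates.
        start = len(seq) - m.start() - len(consensus)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    toy_promoter = "GGACCTACCTTGGTAGGTCC"  # made-up sequence for illustration
    for start, strand, site in find_sites(toy_promoter):
        print(start, strand, site)
```

Scanning both strands matters here because the consensus is not palindromic, so a site on the reverse strand would be missed by a forward-only search.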

Functions of unusual MYB proteins. The presence of two or three MYB domains does not automatically place a specific MYB protein into the R2R3 or MYB3R categories. For instance, orthologs of the fission yeast CDC5 gene encode two MYB repeats followed by a third MYB-like repeat (Figure 2a) (Ohi et al., 1998). The origin of CDC5 appears to differ from that of typical R2R3 or MYB3R TFs, as does its conserved role in pre-mRNA splicing (Burns et al., 1999; McDonald et al., 1999; Lei et al., 2000). Further, CDC5 is involved in cyclin transcriptional regulation, controlling the G2/M transition phase (Stracke et al., 2001; Lin et al., 2007). CDC5 can also function as a TF, and has been shown to recognize the DNA-binding consensus CTCAGCG, raising the possibility that plant CDC5 has multiple roles (Hirayama and Shinozaki, 1996; Palma et al., 2007).

Although Arabidopsis and many other plants contain an MYB protein with four repeats (MYB4R) (Dubos et al., 2010), similar to human SNAP190, which is part of a basal TF complex involved in small nuclear RNA (snRNA) gene transcription through interaction with the TATA binding protein, the function of such proteins in plants remains unknown (Wong et al., 1998; Hovde et al., 2002; Hinkley et al., 2003). Another protein, MIDA1, contains two C-terminal MYB repeats and an N-terminal DnaJ domain with similarity to the fungal Zuotin proteins, and is associated with cell growth and division (Figure 2a) (Shoji et al., 1995; Braun and Grotewold, 2001). In Volvox carteri, a mutation in the MIDA1 ortholog GlsA results in asymmetric cell divisions (Miller and Kirk, 1999; Cheng et al., 2003), and further orthologs have been identified in higher plants, which show preferential expression in reproductive tissues (Mori et al., 2003). However, it is the DnaJ domain rather than the MYB domain that is involved in most of the described functions (Cheng et al., 2003, 2006; Mori et al., 2003).

Functions of single MYB repeat proteins. A large number of single MYB repeat (R-MYB) proteins have been described recently. The redundantly acting regulators of trichome and root hair formation in Arabidopsis belong to the best described group (R3-MYB, Table 2) (Wada et al., 1997; Schellmann et al., 2002; Pesch and Huelskamp, 2009). By moving from one cell to another (Schellmann et al., 2002; Kirik et al., 2004b; Zhao et al., 2008), they can modulate trichome and root hair patterning through interference with protein complexes that promote cell fate determination (Wada et al., 1997; Serna and Martin, 2006; Li et al., 2009). R-MYB proteins are also involved in control of the circadian clock (Lu et al., 2009), which is probably an ancient function, as homologs of Arabidopsis CCA1/LHY (PpCCA1a and PpCCA1b) participate in photoperiod-related cell growth in the moss Physcomitrella patens (Okada et al., 2009a,b).

Control of floral asymmetry in snapdragon involves the R-MYB AmRAD, which is specifically expressed in dorsal petals (Corley et al., 2005). Homologs of AmRAD have been identified in several other plants that have asymmetric flowers, such as Veronica montana (VmRAD) and Gratiola officinalis (GoRAD), in which they are highly expressed in pre-anthesis flowers (Preston et al., 2009). In tomato (Solanum lycopersicum), the AmRAD homolog SlFSM1 is specifically expressed in the very early stages of fruit development, suggesting a role other than flower form determination (Barg et al., 2005). Furthermore, three rice R-MYB proteins (OsMYBS1–3) regulate expression of the α-amylase gene in response to sugar and hormone signals by binding to the DNA consensus motif TATCCA (Lu et al., 2002).

bHLH proteins

Structure of plant bHLH proteins

General features of the bHLH domain. bHLH proteins are the second largest class of plant TFs. Since discovery of the bHLH motif in the murine muscle development TFs E12 and E47 (Murre et al., 1989), structural and functional analyses of dozens of proteins from yeast, animals and plants have shown how important this domain is for the transcriptional regulation of genes that participate in many essential physiological and developmental processes. The Arabidopsis genome contains more than 150 bHLH proteins, and about 30% have been functionally characterized to varying degrees (Toledo-Ortiz et al., 2003; Pires and Dolan, 2010). Analysis of Arabidopsis bHLH proteins has led to a steady increase in the characterization of similar regulators from other plant species, including important crops, and more than 630 bHLH proteins from a diverse pool of photosynthetic eukaryotes have been reported (Carretero-Paulet et al., 2010).

The bHLH domain is highly conserved and comprises approximately 60 amino acids with two functionally distinct regions (Figure 3). The basic region at the N-terminus contains 13–17 primarily basic amino acids and binds to the hexanucleotide E-box DNA motif CANNTG, where N corresponds to any nucleotide. bHLH domains with at least five basic amino acids in the basic region and a highly conserved HER motif (His5–Glu9–Arg13), which is found in more than 50% of plant bHLHs, are predicted to bind DNA (Figure 3) (Atchley and Fitch, 1997; Massari and Murre, 2000; Toledo-Ortiz et al., 2003). Recent findings have shown that some Caenorhabditis elegans bHLHs also bind to non-canonical or E-box-like sequences (e.g. CACGCG and CATGCG), and that the DNA sequences flanking these hexamers contribute to DNA-binding specificity (Grove et al., 2009). Such properties have yet to be explored for plant bHLH factors. The helix-loop-helix (HLH) region comprises two amphipathic α-helices, mainly consisting of hydrophobic amino acids, which are connected by a loop of variable length (Figure 3). Proteins containing the HLH motif often form homo- or heterodimers with other bHLH proteins, which is a prerequisite for DNA recognition and contributes to DNA-binding specificity. The highly conserved Leu23 residue is structurally necessary for dimer formation in the human MAX and Arabidopsis PAR1 proteins, with the latter also requiring a conserved Leu52 residue (Figure 3) (Brownlie et al., 1997; Carretero-Paulet et al., 2010). Interestingly, the bHLH domain of some proteins is also able to interact with non-bHLH proteins (Herold et al., 2002; Hernandez et al., 2007). Although additional amino acids possibly dictate protein–protein interaction specificities (Littlewood and Evan, 1998; Massari and Murre, 2000; Ciarapica et al., 2003), more research is required to precisely understand how dimer formation is established.
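
The two sequence-level rules in this paragraph, the E-box CANNTG and the criteria used to predict DNA binding, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the domain is taken to be already aligned so that its first residue is position 1, the basic region is assumed to span the first 17 residues (the text gives 13–17), Lys, Arg and His are counted as basic, and the peptide strings are invented toy sequences rather than real bHLH domains.

```python
import re

# CANNTG with N = any nucleotide; lookahead reports overlapping hexamers.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")
BASIC = set("KRH")  # assumption: Lys, Arg and His are counted as basic


def find_e_boxes(dna):
    """Return (start, hexamer) for each CANNTG occurrence (0-based start).

    The pattern CANNTG is its own reverse complement, so scanning one
    strand is sufficient to locate every E-box.
    """
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(dna.upper())]


def predicted_dna_binding(bhlh_domain, basic_len=17):
    """Apply the two criteria from the text to an aligned bHLH domain.

    bhlh_domain: domain sequence whose first residue is position 1.
    basic_len:   assumed length of the basic region (13-17 in the text).
    """
    domain = bhlh_domain.upper()
    if len(domain) < 13:
        return False  # too short to contain positions 5, 9 and 13
    n_basic = sum(res in BASIC for res in domain[:basic_len])
    # His5, Glu9 and Arg13 in 1-based numbering are indices 4, 8 and 12.
    has_her = domain[4] == "H" and domain[8] == "E" and domain[12] == "R"
    return n_basic >= 5 and has_her


if __name__ == "__main__":
    print(find_e_boxes("ttCACGTGaaCAGCTGcc"))          # toy promoter
    print(predicted_dna_binding("KRRTHNASERRRRDKIN"))  # toy domain start
```

Non-canonical sites such as CACGCG would not be reported by this pattern; extending the search to E-box-like sequences would require a different expression.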

Figure 3.

 Sequence logo of the bHLH domain.
The profile is based on the full-length alignment of all bHLH proteins from the Pfam database used for the search shown in Figure 1. Relative entropy is measured in bits; high entropy values indicate a high degree of certainty that the corresponding amino acid is present in bHLH homologs. The H5, E9 and R13 amino acids in the basic domain that are important for DNA binding are indicated by stars. Amino acids important for dimerization of the helix-loop-helix domain are indicated by arrows. The logo was generated as described for Figure 2.

bHLH proteins have been classified on the basis of several characteristics. Originally, animal bHLH proteins were organized based upon tissue distribution, dimerization capability and DNA-binding specificities (Murre et al., 1994). The more frequently used classification is based on evolutionary relationships, and takes into account the DNA-binding specificity (if known), the conservation of amino acids at certain positions, and the presence or absence of conserved domains in addition to the bHLH domain (Atchley and Fitch, 1997). The classification of bHLHs into four monophyletic groups (A–D) was extended to six groups after more genomes were sequenced. The most recent phylogenetic analysis, which includes more than 500 plant bHLHs, allows for their classification into 26 sub-groups (Pires and Dolan, 2010), corresponding to how bHLH factors are listed in Table 3. Phylogenetic analysis of several atypical bHLH proteins extended the number of sub-groups to 32 (Carretero-Paulet et al., 2010).

Table 3. Functions of plant bHLHs mentioned in this review

Phylogenetic group | Arabidopsis thaliana function | Genes and functions in other species
Sub-group Ia | Stomata differentiation: MUTE, FAMA, SPEECHLESS | Stomata differentiation: OsMUTE, OsFAMA, OsSPCH2, ZmMUTE
Sub-group Ib(2) | Iron homeostasis: ORG2, ORG3 | Iron homeostasis: OsIRO2, HvIRO2
Sub-group III(a + c) | Regulation of iron uptake: FIT | Iron acquisition: SlFER
Sub-group IIIb | Stomata development and cold acclimatization response and freezing tolerance: ICE1, SCRM2 | Activator of cold-responsive genes: TalC41/87 (wheat)
Sub-group III(d + e) | ABA, JA and light signaling pathway: MYC2 | Phenylpropanoid biosynthetic pathway: PsGBF (pea)
 | ABA signaling: AIB |
 | Tryptophan biosynthesis: dominant mutation of ATR2 (atrD2) |
Sub-group IIIf | Anthocyanin biosynthesis: GL3, EGL3, TT8 | Anthocyanin biosynthesis: OsRa, OsRb, OsRc, OsB2, ZmR, ZmB-Peru, ZmIN1, AmDEL, GhGMYC1, InDEL, InIVS, PhJAF13, PfMyc-rp, PfMyc-F3G1, GtbHLH1, MdbHLH3, PhAN1
 | Seed coat differentiation: GL3, EGL3, TT8, MYC1 | Seed coat differentiation: PhAN1
 | Proanthocyanidin biosynthesis: TT8 | Proanthocyanidin biosynthesis: VvMYCA1, VvMYC1, PhAN1
 | Trichome/root hair formation: MYC1, GL3, EGL3 | Vacuolar acidification: PhAN1
Sub-group IVa | ER body formation: NAI1 |
Sub-group IVc | Metal homeostasis, auxin-conjugate metabolism: ILR3 |
Sub-group VII(a + b) | Light and gibberellin signaling: PIF1/PIL5, PIF3, PIF4, PIF5/PIL6 | Amylose synthesis: OsbHLH102
 | Fruit dehiscence: ALC |
 | Phytochrome and cryptochrome signaling: HFR1 |
 | Carpel margin development; mediator of germination responses to light and temperature: SPT | Fruit development: FaSPT
 | Fertilization process: UNE10 |
Sub-group VIIIb | Transmitting tract and stigma development: HEC1, HEC2, HEC3 | Axillary meristem generation: OsLAX
 | Fruit differentiation: IND |
Sub-group VIIIc(1) | Root hair formation: RHD6, RSL1 | Rhizoids and caulonemata formation: PpRSL1, PpRSL2
Sub-group VIIIc(2) | Root hair development: RSL4, RSL3, RSL2, RSL5 |
Sub-group XI | Root hair development: bHLH066, bHLH069, bHLH082 | Root hair development: LjRHL1
 | Fertilization: UNE12 | Phosphate deficiency stress: OsbHLH096/OsPTF1
Sub-group XIII | Root development: LHW |
Sub-group XV | Gibberellin signaling: PRE1, PRE2, PRE3, PRE4, PRE5 | Cell elongation in developing styles: SlSTYLE2.1
 | Light signal transduction: KDR |
Orphans | Anther development: AMS, DYT1 |
 | Early embryo development: MEE8 |
 | Shade avoidance response: PAR1, PAR2 |

  1. The various sub-groups are defined in Pires and Dolan (2010). Additional references for newly characterized bHLH proteins are listed below the table.

  2. Additional references: FaSPT (Tisza et al., 2010); SlSTYLE2.1 (Chen et al., 2007); MdbHLH3 (Espley et al., 2007); GtbHLH1 (Nakatsuka et al., 2008); InDEL, InIVS (Morita et al., 2006); VvMYCA1, VvMYC1 (Matus et al., 2010); PhJAF13 (Quattrocchio et al., 1998); AmDEL (Goodrich et al., 1992).

Features of additional domains. Outside the bHLH domain, despite low sequence conservation (Heim et al., 2003; Li et al., 2006), 28–50 additional conserved regions or motifs were identified, several of which were probably present before the invasion of land by plants, as evidenced by the presence of at least four of them in the red alga Cyanidioschyzon merolae (Figure 1) (Carretero-Paulet et al., 2010; Pires and Dolan, 2010). These motifs include the leucine zipper domain (present in sub-families IIIb, IVb and IVc; Table 3), which is usually found adjacent to the bHLH domain and is important for stabilizing dimerization and DNA-binding specificity (Blackwood and Eisenman, 1991; Bresnick and Felsenfeld, 1994; Kanaoka et al., 2008). It is the only motif that is also present in animal bHLH factors, but the absence of sequence similarity between the corresponding domains of plants and animals suggests that acquisition of the leucine zipper motif most likely occurred independently in both kingdoms (Heim et al., 2003; Li et al., 2006).

Several evolutionarily unrelated bHLH proteins contain a domain that is structurally related to the ACT fold (nomenclature based on the aspartate kinase, chorismate mutase and TyrA proteins) that has been identified in several metabolic enzymes as a ligand-binding domain (Chipman and Shaanan, 2001). In bHLH factors, the ACT motif functions as a dimerization domain (Feller et al., 2006). The presence of the ACT and leucine zipper domains in proteins from various bHLH sub-families indicates that incorporation of such domains into bHLH genes occurred multiple times in evolution, perhaps by a domain-shuffling process (Pires and Dolan, 2010).

Multiple bHLH proteins from various sub-groups have been shown to contain an acidic region, which facilitates transcriptional activation and/or dimerization. This region is usually N-terminal to the bHLH domain, as in ICE1 (Chinnusamy et al., 2003) or ZmR (Feller et al., 2006). Interestingly, the paralog of ZmR, ZmB1, does not harbor an obvious TAD (Goff et al., 1992), suggesting that either functional diversification between these proteins has been previously overlooked or that the TAD in ZmR does not play a significant role in the function of this regulator.

Evolution of plant bHLH proteins

Several excellent recent studies described the evolution of the plant bHLH family (Carretero-Paulet et al., 2010; Pires and Dolan, 2010). We will therefore briefly summarize the most important features and focus primarily on aspects not previously covered.

bHLH proteins represent an ancient family found in fungi, plants and animals, but not prokaryotes. The function of yeast bHLHs in general transcriptional enhancement and cell-cycle control suggests that these may have been some of the early roles of bHLH regulators in primitive eukaryotes (Riechmann et al., 2000; Ledent and Vervoort, 2001). In contrast to MYB factors, for which clearer lineages can be identified based on MYB repeat numbers and homology, it is unclear whether all bHLH proteins have evolved from a common ancestor (monophyletic), or whether bHLH proteins evolved by domain shuffling, perhaps from an ancestral protein that only contained the bHLH domain (Morgenstern and Atchley, 1999). The most recent evolutionary analysis from various land plants, chlorophytes and red algae suggests that the first plants had one or only a few bHLH genes, and that all modern plant bHLH proteins descended and radiated from these predecessors by a process that involved a significant number of gene duplications (Carretero-Paulet et al., 2010; Pires and Dolan, 2010).

Expansion in the bHLH family occurred after the split between green algae and land plants, but before the origin of the mosses, which is mirrored in the number of bHLHs found in these species (Figure 1). After divergence of the mosses from the vascular plants, a second expansion took place, and probably correlates with the specialization of vascular and flowering plants. The low sequence similarity between animal and plant bHLHs, which is often restricted to signature amino acids in the bHLH domain, is in agreement with these findings, and shows that expansion occurred independently and almost entirely after the divergence of plants and animals (Toledo-Ortiz et al., 2003).

It is still not entirely resolved how gene duplication occurred during early evolution. Studies in Arabidopsis and rice have suggested that expansion occurred by genome segment and tandem duplications (Toledo-Ortiz et al., 2003; Li et al., 2006), while studies in animals suggest single-gene duplication events (Amoutzias et al., 2004). With regard to the question of what selective pressures drove such diversification, two hypotheses have been proposed. First, it is possible that bHLH proteins expanded in parallel with the evolution of multicellularity (Simionato et al., 2007). This is in agreement with the idea that higher organismal complexity requires increased regulatory complexity and biological specificity. On the other hand, it is possible that bHLH proteins evolved in parallel with the colonization of land, and plants needed to develop novel regulatory networks to withstand the challenges associated with a sessile lifestyle (Carretero-Paulet et al., 2010). Both hypotheses are valid, and further research may show which scenario is more likely.

One interesting question that remains to be answered concerns the functional outcome of gene duplications for members of the bHLH gene family. Members of the same sub-group are often involved in the same biological process and their functions overlap. For example, GL3, EGL3 and TT8 are partially redundant in the control of anthocyanin biosynthesis (Nesi et al., 2000; Zhang et al., 2003), and HEC1, 2 and 3 partially overlap in development of the stigma and transmitting tract (Table 3) (Gremski et al., 2007). However, each of the anthocyanin regulatory genes has evolved additional functions, such as PA biosynthesis (TT8), trichome formation (GL3/EGL3) and production of seed coat mucilage (TT8/EGL3) (Nesi et al., 2000; Zhang et al., 2003; Baudry et al., 2004; Zimmermann et al., 2004), but no such additional functions have yet been found for the HEC genes (Table 3). This suggests that, in addition to sub-functionalization, neo-functionalization played an important role during evolution of bHLHs. In addition, as HEC genes are highly important for plant fertility, whereas sub-group IIIf bHLHs appear to be associated with functions that are less vitally important, it is possible that evolutionary selection had an impact on essential and non-essential plant processes.

Function of bHLH proteins in plants

Most bHLH proteins so far identified were functionally characterized in Arabidopsis, and their roles include regulation of fruit dehiscence, carpel, anther and epidermal cell development, phytochrome signaling, flavonoid biosynthesis, hormone signaling and stress responses. Only a few bHLH genes have been functionally characterized in other plant species, and these will be discussed here in the context of the corresponding genes from Arabidopsis.

Probably the best-described bHLH proteins are the members of sub-group IIIf. These ZmR-like TFs are involved in flavonoid biosynthesis, trichome and root hair formation, and have been best studied in Arabidopsis and maize (Quattrocchio et al., 2008). As these bHLHs function in cooperation with MYB domain proteins, forming MBW complexes (see above), they are discussed in the third section of this review.

bHLH proteins act as transcriptional activators or repressors, and have either a very broad or very restricted expression pattern, which is influenced by their dimerization properties. SPT, which was originally identified as a positive regulator of carpel and fruit development, has recently been shown to negatively control seed germination, and expansion of cotyledons, petals and leaves (Heisler et al., 2001; Penfield et al., 2005; Groszmann et al., 2010; Ichihashi et al., 2010). This broad spectrum of SPT functions may be a consequence of dimerization with other bHLH proteins such as HEC1–3 or IND (Table 3) (Gremski et al., 2007). Recently, orthologs of SPT were identified in tomato (SlSPT) and strawberry (Fragaria × ananassa Duch.; FaSPT), and were shown to complement the spt mutant (Groszmann et al., 2008; Tisza et al., 2010). ICE1 is another constitutively expressed bHLH protein. It interacts with MUTE, FAMA and SPCH to control stomata development. ICE1 also functions in cold-acclimatization responses and freezing tolerance, and its ability to bind to several E-box sequences in vitro and activate transcription in transient expression assays in Arabidopsis leaves suggests that it may not require other bHLH factors for these functions (Chinnusamy et al., 2003). ICE1, together with the closely related protein SCRM2, is required at each step of the cell-fate transition in the stomatal lineage, from meristemoid mother cell to guard cell, whereas the closely related tissue-specific MUTE, FAMA and SPCH proteins each regulate a single step in this differentiation process (Ohashi-Ito and Bergmann, 2006; MacAlister et al., 2007; Pillitteri and Torii, 2007; Kanaoka et al., 2008; Nadeau, 2009; Serna, 2009). Although monocots and dicots differ significantly in stomatal ontogeny, morphology and pattern (Evert, 2006), putative orthologs of these TFs were identified in rice and maize (Table 3) (Liu et al., 2009). 
Although the function of FAMA may be significantly conserved between these species, the functions of MUTE and SPCH appear to have diverged after separation of monocots and dicots (Liu et al., 2009). A functional parallel has been drawn between stomata development in plants and muscle or neuron development in animals, as all of these processes are sequentially regulated by a set of tissue-specific and non-tissue-restricted bHLH proteins (MacAlister et al., 2007; Nadeau, 2009).

Compared to the broad expression patterns of SPT and ICE1, other bHLH genes are only expressed in certain tissues or at particular developmental times. These include ZmR and ZmB1, which are expressed in the aleurone layer of the maize kernel and in green tissues, respectively, and ALC, which is expressed mainly in the valve margins of the silique (Goff et al., 1992; Ludwig et al., 1989; Rajani and Sundaresan, 2001). In Arabidopsis, a network of regulators, including MYB and bHLH TFs, has been shown to function in root epidermis development (Schiefelbein, 2003). The bHLH proteins identified include RHD6 and RSL1, which are involved specifically in the early stages of cell differentiation in root hairs (Menand et al., 2007), as well as targets of RHD6 (RSL2 and RSL4), which control post-mitotic growth in root hairs (Yi et al., 2010). Early ancestors of land plants, e.g. the bryophytes, possess tip-growing cells such as rhizoids and caulonemal cells, which are morphologically similar to root hairs and carry out rooting functions (Sakakibara et al., 2003). It is interesting that the moss P. patens contains seven RHD6-like genes, and that two of these (PpRSL1 and PpRSL2) are closely related to Arabidopsis RHD6 and RSL1 (Menand et al., 2007). Together, PpRSL1 and PpRSL2 regulate the development of rhizoids and caulonemal cells in the moss gametophyte and complement the rhd6 mutant (Menand et al., 2007). This indicates that bHLH sub-group VIIIc evolved from a common ancestor before the separation of bryophytes and vascular plants (Menand et al., 2007). Analysis of root hair mutants in lotus and rice, which both differ from Arabidopsis in terms of root hair patterns, revealed new developmental regulators (Karas et al., 2009; Ding et al., 2009). LjRHL1 and OsRHL1 are highly homologous to members of sub-group XI in Arabidopsis (bHLH066, bHLH069 and bHLH082) (Heim et al., 2003), and all of these proteins are necessary for root hair development.

Some sub-groups of bHLH proteins are quite large, with up to 15 members. They are closely related, and some have complete or partially redundant functions. Sub-group VII (a + b) (Table 3) contains proteins involved in light signaling, and fruit and flower development. Several PIFs and PILs have been identified, which repress aspects of early plant development such as seed germination (PIL5), flowering time and hypocotyl growth (PIF3 and PIF5), hypocotyl elongation in the shade-avoidance response (PIL1/2) or chlorophyll biosynthesis in the dark (PIF1) (Huq and Quail, 2002; Kim et al., 2003; Salter et al., 2003; Oh et al., 2004; Khanna et al., 2007; Leivar et al., 2008). Upon exposure to light, followed by phytochrome binding, these PIFs are degraded and photomorphogenesis is triggered (Leivar et al., 2008). Most of these PIFs and PILs are partially redundant, and have additional roles in ethylene and gibberellin signaling. The circadian clock regulates some PIFs, such as PIF4 and PIF5 (Fujimori et al., 2004; Lucyshyn and Wigge, 2009). With the anticipated identification and understanding of these regulators in Arabidopsis light signaling, it will be possible to identify orthologous genes from important agricultural plants (Wang and Deng, 2002). In fact, some phytochrome-mediated light-signaling regulators have already been identified in crops (Tsuge et al., 2001), and show high homology to their Arabidopsis counterparts.

Combinatorial gene regulation by MYB–bHLH interactions

Combinatorial interactions among TFs are central to gene regulation of any given cellular process (Martinez, 2002; Istrail and Davidson, 2005). Despite advances in genome-wide approaches for the elucidation of plant regulatory networks (Lee et al., 2007; Benhamed et al., 2008; Kaufmann et al., 2009; Morohashi and Grotewold, 2009; Oh et al., 2009), our understanding of combinatorial control of plant gene expression remains limited. The cooperative interaction between MYB and bHLH TFs serves as a paradigm to understand plant combinatorial gene regulation.

Flavonoid biosynthesis is probably the best-studied example of cooperation between MYB and bHLH proteins (Koes et al., 2005; Quattrocchio et al., 2008). Flavonoid regulatory genes and protein complexes in potentially important agronomic and ornamental plants have been identified by hypothesizing that orthologous proteins are likely to perform similar or identical functions in other organisms, such as apple, strawberry, rice, maize, cotton, snapdragon, Perilla frutescens, Gerbera hybrida or Gentiana triflora (Figure 4b and Tables 2 and 3) (Aharoni et al., 2001; Brueggemann et al., 2010; Elomaa et al., 2003; Espley et al., 2007; Furukawa et al., 2007; Grotewold et al., 2000; Hu et al., 2000; Humphries et al., 2005; Lin-Wang et al., 2010; Martin et al., 1991; Nakatsuka et al., 2008; Schwinn et al., 2006; Sompornpalin et al., 2002; Sweeney et al., 2006). Studies of the ZmR-dependent and ZmR-independent regulation of maize flavonoid pathway genes by ZmC1 and ZmP1 (Table 2), respectively, resulted in identification of the MYB domain residues that provide ZmC1, but not ZmP1, with the ability to interact with ZmR (Grotewold et al., 2000). Transfer of six residues from ZmC1 to ZmP1 resulted in a ZmP1 protein that maintained the ZmR-independent activity, but that also singularly responded to ZmR (Grotewold et al., 2000; Hernandez et al., 2004). Further studies expanded the set of six amino acids to the more general sub-group IIIf bHLH interaction motif [DE]Lx2[RK]x3Lx6Lx3R (Figure 4a), which contains residues that specify the interaction with the bHLH partner and contribute to the biological specificity and stability of the MYB–bHLH complex. The interaction motif is conserved in several R2R3-MYB sub-groups, and has been identified in all anthocyanin-accumulating plants studied (Zimmermann et al., 2004; Lin-Wang et al., 2010). It is conserved between angiosperms and gymnosperms (Bedon et al., 2007), suggesting that MYB–bHLH interaction arose early in land plant evolution.
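
The interaction motif [DE]Lx2[RK]x3Lx6Lx3R translates directly into a regular expression, with each x read as any single residue. The short Python sketch below shows one way in which a MYB protein sequence might be screened for the motif; the function name and the peptide are hypothetical and serve only to illustrate the notation. Presence of the motif is a sequence-based hint and not evidence of a physical interaction.

```python
import re

# [DE]Lx2[RK]x3Lx6Lx3R, where xN stands for N residues of any type.
# The lookahead allows overlapping candidate matches to be reported.
BHLH_INTERACTION_MOTIF = re.compile(
    r"(?=([DE]L.{2}[RK].{3}L.{6}L.{3}R))"
)


def find_bhlh_interaction_motif(protein):
    """Return (start, matched_peptide) for each motif hit (0-based start)."""
    return [
        (m.start(), m.group(1))
        for m in BHLH_INTERACTION_MOTIF.finditer(protein.upper())
    ]


if __name__ == "__main__":
    toy_myb = "MGDLAARAAALAAAAAALAAARKK"  # invented peptide, not a real MYB
    print(find_bhlh_interaction_motif(toy_myb))
```

Each match spans 20 residues, so in practice the search would usually be restricted to the R3 repeat, where the motif is reported to reside.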

A century of genetics (Coe et al., 1988) helped to establish that the function of ZmC1 is absolutely dependent on ZmR, despite the ability of ZmC1 to bind to DNA in vitro (Sainz et al., 1997b), and the presence of a functional TAD at its C-terminus (Sainz et al., 1997a). Although it was initially thought that the requirement for this interaction is a consequence of the intrinsically low affinity of ZmC1 for DNA (Sainz et al., 1997b), subsequent studies showed that a C1 mutant that binds DNA with high affinity remains dependent on R (Hernandez et al., 2004). GL1 and WER also contain TADs, and WER was shown to bind DNA in vitro, yet the function of both of these proteins remains dependent on the GL3/EGL3 bHLH factors (Lee and Schiefelbein, 2001; Zhang et al., 2003). This prompts the question of why this MYB–bHLH interaction is essential for activity and what the role of the WDR protein is in this interaction. Part of the answer is likely to reside in the increasing appreciation that chromatin plays a role in anthocyanin regulation or epidermal cell differentiation, for example through interaction of these regulators with chromatin-modifying factors (Caro et al., 2007; Hernandez et al., 2007). However, this possibility alone cannot explain the absolute requirement of ZmC1 for ZmR on non-chromatin templates, such as plasmids introduced transiently into maize cells, which are unlikely to form chromatin (Hernandez et al., 2007). It is possible that inhibitors, such as the ZmR-related ZmIN1 bHLH factor (Burr et al., 1996), sequester ZmC1 by formation of non-functional MYB–bHLH complexes, and that the function of ZmR is to release ZmC1 from this inhibition. More likely, in vivo DNA target site recognition and robust binding may require the increased DNA contact surface provided by the MBW complex. Interestingly, however, ChIP experiments were unable to detect DNA binding of ZmR to one of the pathway gene promoters in the absence of ZmC1 (Hernandez et al., 2007). 
These results suggest that R2R3-MYBs such as ZmC1 may influence the recruitment of ZmR-like factors to DNA, perhaps by conformational changes that expose the bHLH domain for dimerization and DNA binding. Furthermore, it has been shown that mutation in the N-terminal MYB-interacting region (MIR) of AmDEL and PfMYC-rp can positively affect transcriptional activation, and it is possible that binding of the R2R3-MYB protein to the bHLH mirrors the effect of such a mutation (Pattanaik et al., 2006, 2008).

The specific amino acids in the MIR of bHLH factors responsible for interaction with R2R3-MYB domains have not yet been identified in ZmR or any of the anthocyanin bHLH regulators. However, the gl3-sst allele (sst; shapeshifter) harbors a point mutation in this region (L78P), which decreases the interaction with GL1 and TTG1 (Figure 4b) (Esch et al., 2003).

Other cellular processes involving MBW complexes include regulation of trichome and root hair development and vacuolar acidification (Figure 4b). In Arabidopsis, GL1 and GL3/EGL3 form a complex that regulates trichome formation (Zhang et al., 2003). Homologs of GL1 and GL3/EGL3 have been identified in cotton, and have been shown to play a role in trichome development (Wang et al., 2004; Serna and Martin, 2006). Surprisingly, in non-trichome cells, GL3/EGL3 interact with the R3-MYB proteins CPC, TRY, ETC1 and ETC2. CPC-like MYBs inhibit bHLH binding by competing with R2R3-MYBs (e.g. GL1 and PAP1/2) for the same binding sites in the MIR of the bHLH protein (Kirik et al., 2004a; Schiefelbein, 2003; Zhu et al., 2009). In contrast, multicellular trichome formation in both snapdragon and Solanaceae species does not appear to require MYB–bHLH interaction (Serna and Martin, 2006). Zmpac1 mutants have only anthocyanin-related phenotypes (Selinger and Chandler, 1999), but ZmPAC1 over-expression in Arabidopsis can complement all ttg1-related phenotypes, including those resulting from developmental programs that are apparently different in maize and Arabidopsis (Carey et al., 2004). This suggests that differences in function between maize and Arabidopsis are not a result of differences in the WDR protein sequence, but rather are a result of differences in the regulatory circuits, possibly reflecting different protein–protein interactions. This results in functionally redundant WDR proteins with different spatial and temporal expression patterns (Carey et al., 2004). Given the large size of the WDR protein family, it is conceivable that a number of other transcription complexes exist in which these proteins participate (van Nocker and Ludwig, 2003). Many MYB–bHLH interactions have been identified for which no WDR partner has yet been found (Figure 4b).
Petunia PhAN11 forms a complex with PhAN1 and PhAN2 (petal limbs) or PhAN4 (anthers) for activation of anthocyanins in flowers (Figure 4b), and may be involved in a complex with PhAN1 and PhPH4 for activation of vacuolar acidification (Figure 4b) (Quattrocchio et al., 2006; Gerats and Strommer, 2009). Formation of the MBW complex has so far been assumed to be unique to plants. Interestingly, however, the N-terminal R2R3 region of vertebrate c-MYB interacts with the bHLH domain of MyoD in proliferating cells to negatively regulate skeletal muscle differentiation (Kaspar et al., 2005). Although the residues involved in this interaction are clearly different from those implicated in formation of the MBW complex, this finding poses the question of whether the MYB–bHLH synergy is much older than currently assumed.

WDR proteins interact with members of bHLH sub-group IIIf, but not with members of sub-group IIId or IIIe (Zimmermann et al., 2004). Although members of sub-groups IIId and IIIe contain the MIR at the N-terminus, interaction studies in yeast showed that none interacted with a ZmC1-like protein. Only one member of sub-group IIIe, MYC2, was shown to interact with MYB2 in abscisic acid signaling (Abe et al., 2003). ATR2, another member of sub-group IIIe, functions in tryptophan biosynthesis together with the R2R3-MYB protein ATR1. However, ATR1 and ATR2 do not interact in yeast and have an additive effect on tryptophan regulation, probably acting through distinct mechanisms (Smolen et al., 2002). It is possible that members of sub-groups IIId and IIIe interact with as yet unidentified MYB factors.

MYB–bHLH interactions do not occur exclusively through the MIR and the signature motif in the MYB domain (Figure 4a). MYB15 interacts with ICE1, and together they bind to the CBF3 promoter to regulate cold-stress tolerance (Figure 4b) (Agarwal et al., 2006). Comparative sequence-structure analysis using the PHYRE structure prediction package (Kelley and Sternberg, 2009) identified an ACT fold in the C-terminus of ICE1 (A. Feller, unpublished results). However, it remains to be determined whether this putative ACT domain promotes dimerization, as established for ZmR (Feller et al., 2006). MYB15 lacks essential amino acids for interaction with a bHLH partner, misleadingly suggesting that it may function in a bHLH-independent fashion. Another example of an unusual MYB–bHLH association is the interaction of HFR1 and LAF1 during regulation of phytochrome A signaling. HFR1 and LAF1 regulate part of the phytochrome A signaling pathway mostly independently of each other, as shown by mutant analysis (Jang et al., 2007; Yang et al., 2009). However, the two proteins interact in vivo through the MYB domain of LAF1 and the C-terminus of HFR1 (Jang et al., 2007). Interestingly, both proteins associate with the E3 ubiquitin ligase COP1, which contains a WDR motif (Figure 4b). The MYB–bHLH interaction inhibits ubiquitination of HFR1 or of LAF1 by COP1, stabilizing both factors (Jang et al., 2007). MYB–WDR and bHLH–WDR complexes have also been identified in vertebrates, suggesting possible conservation of these interactions. For example, c-MYB interacts with the WD domain of the E3 ubiquitin ligase Fbxw7α (F-box/WDR-containing protein 7) in the presence of NLK1 kinase, which induces c-MYB ubiquitination (Kanei-Ishii et al., 2008). Fbxw7α also interacts with c-MYC and targets it for degradation (Welcker et al., 2004).

Based on these studies, it is tempting to suggest that R2R3-MYBs usually function in a combinatorial fashion with bHLH factors. Although this is perhaps the case, there are not yet enough data to confirm or refute it. However, given that TFs often interact with other regulatory proteins, and given the large size of the MYB and bHLH families, it is not unexpected that interactions between these two types of regulatory proteins have been uncovered. It is possible that when the entire range of TF protein–protein interactions is identified by high-throughput methods, R2R3-MYB regulatory specificity will be shown to be provided by interaction with members of TF families other than just bHLHs.

Conclusions and future prospects

Almost 30 years ago, when MYB and bHLH factors were first identified in maize, it could not have been foreseen that they would prove to be members of two of the largest families of TFs, and that related factors are at the core of the regulation of many plant cellular processes. A wealth of genetic, biochemical and molecular data has shown that MYB and bHLH factors frequently participate in transcriptional regulation of genes involved in diverse cellular functions, often in combinatorial regulatory complexes involving bHLH–bHLH and sometimes MYB–bHLH interactions. Cellular targets for bHLH, MYB and MYB–bHLH complexes are starting to be identified by ChIP-chip or second-generation sequencing (ChIP-Seq), providing an indication of the intricacy of the gene regulatory networks that underlie plant development and metabolism. The results of such studies, combined with numerous high-throughput protein–protein interaction studies, are likely to provide a much clearer picture of the regulatory complexes that function in the control of most genes. However, such studies have so far only been possible in model plant systems, primarily Arabidopsis. Establishing the architecture of gene regulatory networks in agronomically important plants poses very significant challenges. A question that remains unanswered is to what extent details of TF function and gene regulatory network architecture in important crops can be inferred from results in Arabidopsis. MYB or bHLH sequence conservation between plants does not necessarily imply functional conservation, as a number of studies have demonstrated. A better understanding of how MYBs and bHLHs, as well as other regulatory factors, have functionally diverged is necessary, and this will require significantly more research in comparative regulatory genomics, that is, in understanding how gene regulatory networks have evolved over time. Our extensive knowledge of MYB and bHLH regulatory proteins in various plants provides a powerful starting point for such studies.


We thank Dr. Ling Yuan (Department of Plant and Soil Sciences, University of Kentucky) for insightful comments on this manuscript. Research in Erich Grotewold’s laboratory on MYB and bHLH factors is funded by National Science Foundation grant DBI-0701405, Department of Energy grant DE-FG02-07ER15881 and Agricultural and Food Research Initiative Competitive Grant number 2010-65115-20408 from the US Department of Agriculture National Institute of Food and Agriculture.