Transcription factors and their genes in higher plants

Functional domains, evolution and regulation


L. Liu, Department of Biology, Dalhousie University, Halifax, NS B3H 4J1, Canada. Fax: +902-494-3736, Tel.: +902-494-6525, E-mail:


A typical plant transcription factor contains, with few exceptions, a DNA-binding region, an oligomerization site, a transcription-regulation domain, and a nuclear localization signal. Most transcription factors exhibit only one type of DNA-binding and oligomerization domain, occasionally in multiple copies, but some contain two distinct types. DNA-binding regions are normally adjacent to or overlap with oligomerization sites, and their combined tertiary structure determines critical aspects of transcription factor activity. Pairs of nuclear localization signals exist in several transcription factors, and basic amino acid residues play essential roles in their function, a property also true for DNA-binding domains. Multigene families encode transcription factors, with members either dispersed in the genome or clustered on the same chromosome. Distribution and sequence analyses suggest that transcription factor families evolved via gene duplication, exon capture, translocation, and mutation. The expression of transcription factor genes in plants is regulated at transcriptional and post-transcriptional levels, while the activity of their protein products is modulated post-translationally. The purpose of this review is to describe the domain structure of plant transcription factors, and to relate this information to processes that control the synthesis and action of these proteins.



Within plant cells, some genes are expressed constitutively [1], whereas others respond to specific stimuli [2–10]. Both patterns depend on the interaction of transcription factors with cis-acting elements and/or with other transcription factors required for gene expression [11], and they are important in the regulation of cell activities. Therefore, alteration in the expression of transcription factor genes normally results in dramatic changes to a plant [12–14] and structural changes to these genes may represent a significant evolutionary force [15]. As a practical consequence, engineering of transcription factor genes provides a valuable means for manipulation of plants [13], but success in such endeavors depends on how well the genes are understood. To this end, numerous plant transcription factor genes and the proteins they encode have been characterized [11,16–28]. In this review, we analyze recent advances in the study of higher plant transcription factors, with emphasis on the domain structure of these proteins as well as the evolution and regulation of their genes.

Functional domains of transcription factors

General features of plant transcription factors

Many transcription factors have been examined by X-ray crystallography and NMR spectroscopy, but for plants only the structure of the Arabidopsis thaliana TATA box-binding protein 2 is known precisely [29]. The functional domains of plant transcription factors are usually derived by comparing amino acid sequences deduced from cDNA clones with their animal counterparts. Study of putative functional domains by mutational and functional analysis has demonstrated that typical plant transcription factors consist of a DNA-binding region, an oligomerization site, a transcription regulation domain and a nuclear localization signal (NLS) (Fig. 1), although some lack either a transcription regulation domain [30] or a specific DNA-binding region [27,31,32]. In addition, the identification of conserved RNA-binding motifs in the carnation ethylene-responsive element-binding protein-1 [33], a G-protein β-subunit-like motif in COP1 (constitutive photomorphogenic 1) [34], and a putative membrane-spanning region in PEND (plastid envelope DNA-binding), a bZIP (basic zipper) protein [21] demonstrates that novel functional domains occur in plant transcription factors.

Figure 1.

Schematic representation of functional domains within plant transcription factors. Functional domains vary in their location and number, but the arrangement shown here is typical of many plant transcription factors: solid box, DNA-binding domain; hatched box, oligomerization domain; ‘laddered’ box, regulation domain; open boxes, domains of unspecified function. The NLS, indicated by labelled arrows, may vary in number and be positioned in the DNA-binding, oligomerization and unspecified function domains. The domains are not drawn to scale because there is divergence in size from one family of transcription factors to another.

Classification of transcription factors depends on their structural features, with families sometimes subdivided according to the number and spacing of conserved residues in the most similar domain (Table 1, Fig. 2) [24–26,28,35–63]. For instance, owing to the quantity and arrangement of cysteine (C) and histidine (H) residues, the factors containing zinc fingers fall into five classes: C2H2, C3H, C2C2 (GATA finger), C3HC4 (RING finger), and C2HC5 (LIM finger) [51]. Alternatively, transcription factors of the same family are categorized by reference to a domain that falls outside the most conserved region. As one example, homeodomain factors exist as five groups, including the homeodomain zipper [54], homeodomain finger [41], GLABRA2, ELK-homeodomain [42,64], and twin-homeodomain factors [65]. The conserved regions upon which nomenclature is based were initially thought to engage only in sequence-specific DNA interactions. However, some of them consist of both oligomerization and DNA-binding domains arranged in positions that vary relative to one another (Fig. 3). Indeed, with the exception of HMG (high mobility group) finger [66], homeodomain finger [41], and the A. thaliana HAT homeobox zipper factors [54], each plant transcription factor contains only one type of combined DNA-binding/oligomerization domain, but it may be present in multiple copies.

Table 1. Structural features of conserved domains that are used to classify plant transcription factors. These domains, the most conserved regions within genes of the same family, are responsible for DNA-binding and oligomerization. For examples of domain tertiary structure, see [29]. EREBP (ethylene-responsive element-binding protein); ARF (auxin-response factor); ABI3 (abscisic-acid-insensitive 3); other acronyms are defined in the text of the review.

| Family | Structural features | Reference |
| --- | --- | --- |
| Zinc finger | Finger motif(s) each maintained by cysteine and/or histidine residues organized around a zinc ion | |
| bZIP | A basic region and a leucine-rich zipper-like motif | [36] |
| Myb-related | A basic region with one to three imperfect repeats each forming a helix–helix–turn–helix | [37,38] |
| Trihelix | Basic, acidic and proline/glutamine-rich motif which forms a trihelix DNA-binding domain | [39,40] |
| Homeodomain | Approximately 60 amino acid residues producing either three or four α-helices and an N-terminal arm | |
| Myc b/HLH | A cluster of basic amino acid residues adjacent to a helix–loop–helix motif | |
| MADS | Approximately 57 amino acid residues that comprise a long α-helix and two β-strands | |
| AT-hook motif | A consensus core sequence R(G/P)RGRP with the RGR region contacting the minor groove of A/T-rich DNA | |
| HMG-box | L-shaped domain consisting of three α-helices with an angle of about 80° between the arms | |
| AP2/EREBP | A 68-amino acid region with a conserved domain that constitutes a putative amphipathic α-helix | |
| B3 | A 120 amino acid conserved sequence at the C-termini of VP1 and ABI3 | |
| ARF | A 350 amino acid region similar to B3 in sequence | [24,49] |

Figure 2.

Consensus sequences and secondary structures for DNA-binding and oligomerization domains within major families of plant transcription factors. The consensus sequences were either derived by analysis of gene alignments in the cited references or, for ARF, from Arabidopsis sequences deposited in GenBank, including: Biposto (AF022368), ARF8 (AF042196), ARF7 (AF042195), ARF6 (AF013467), ARF4 (AF013466), ARF1-binding protein (U89771), ARF1 (U83245), ARF3 (U89296). A single letter at a site indicates that the amino acid residue appeared in at least 70% of the compared sequences, and if in bold, the residue was identical in all sequences. If two residues appear at a single site, the frequency of each was greater than 30%. Residues in italicized letters are highly conserved among different organisms, and with the exception of arginine at position 3 in MADS, all are identical (bold) within the plant transcription factors compared. The symbols are: --, gap introduced to maximize alignment; ., variable residue; ↓, ↑, residues at corresponding positions in animal transcription factors, although not necessarily the same, are involved in either DNA binding (↓) or oligomerization (↑). Underlined regions are as follows: single underline, enriched in basic residues; double dotted underline, α helix; double underline, β-pleated sheet; triple underline, loop structure; dashed underline, zipper structure.
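
The thresholds given in this legend can be expressed as a short procedure. The following Python sketch is illustrative only: the function name, the bracket notation for two-residue sites and the toy alignment are assumptions made for the example and are not taken from the cited alignments. Gap characters are treated as ordinary symbols, and the bold/italic distinctions of the figure are not reproduced.

```python
from collections import Counter


def consensus(aligned, major=0.70, minor=0.30):
    """Derive a consensus from equal-length aligned sequences.

    A column becomes a single letter if one residue reaches `major`,
    a bracketed pair if the two commonest residues each exceed `minor`,
    and '.' (variable residue) otherwise.
    """
    out = []
    for column in zip(*aligned):
        n = len(column)
        top = Counter(column).most_common(2)
        residue, count = top[0]
        if count / n >= major:
            out.append(residue)
        elif len(top) == 2 and all(c / n > minor for _, c in top):
            out.append("[" + "/".join(sorted(r for r, _ in top)) + "]")
        else:
            out.append(".")
    return "".join(out)


# Hypothetical four-sequence alignment, used only to exercise the rules.
toy_alignment = ["MKRA", "MKRG", "MRRT", "MRKS"]
print(consensus(toy_alignment))
```

In the toy alignment the first column is invariant, the second is split evenly between two residues, the third has one residue in three of four sequences, and the fourth is fully variable, so each branch of the rule is used once.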

Figure 3.

Relative locations of DNA-binding and oligomerization domains within representative families of plant transcription factors. The DNA-binding and oligomerization domains are either separated by several residues (AP2) [26], adjacent to one another (bZIP) [52,53] or overlapping (MADS) [44,59,60]. Solid box, DNA-binding domain; dotted box, oligomerization domain. The remaining symbols are defined in Fig. 2 which contains a detailed description of major plant transcription factor families.

DNA-binding domains

The DNA-binding domains of plant transcription factors, many of which are basic in character, contain amino acid residues that contact DNA bases at cis-acting elements, and these determine the specificity of the protein [59,67]. Other residues enhance transcription factor binding by contacting DNA nonspecifically through interaction with either phosphate or deoxyribose moieties [59]. The base-recognition residues are often highly conserved [59,67]. For example, one arginine residue in bZIP domains [67], either one or two cysteine and histidine residues in zinc-finger motifs [51], and single arginine and lysine residues in MADS (Mcm1, Agamous, Deficiens, and Serum response factor) domains [59] are nearly identical within plants, animals and fungi (Fig. 2). The spatial arrangement of these amino acid residues in DNA-binding domains is important for specificity, as evident in the cysteine-rich regions of the HMG-finger transcription factor ENBP1 (pea early nodulin gene promoter binding protein 1) and the Arabidopsis homeodomain-finger factor HAT3.1. The cysteine-containing regions form zinc-finger DNA-binding motifs, but unlike other zinc-finger motifs that recognize specific promoter sequences [68], they interact with DNA in a sequence-independent manner [41,66]. The loss of specificity is probably caused by changes in the three-dimensional structure of the DNA-binding domain, which is due mainly to the positioning of cysteine residues. Several plant transcription factors possess both specific and nonspecific DNA-binding domains, with the latter occasionally necessary for transactivation of target genes, as shown for VP1 (VIVIPAROUS 1), a regulator of plant genes including wheat Em [69]. VP1 has a weak nonspecific DNA-binding domain termed BR2 (B2) and a sequence-specific binding domain, BR3 (B3), that recognizes Sph elements (CATGCATG) found in Em and a few other plant genes. Interestingly, VP1 activates Em via BR2 instead of BR3. This BR2-dependent regulation requires members of the 14-3-3 family, originally found as soluble proteins in extracts of mammalian brain. The 14-3-3 proteins lack DNA-binding domains but link VP1 and a cis-element-specific transcription factor such as EmBP1 (Em-binding protein 1) to the promoter region of Em [32,69].

Secondary structure in DNA-binding domains seems to affect their affinity and selectivity. In this context, some neutral amino acid residues, such as proline and glycine, are important [40,70]. For example, the C-terminal DNA-binding domain of the rice trihelix factor, GT-2 (GT2-box-binding factor), loses its activity when helix-breaking prolines are substituted for other amino acid residues [40]. Moreover, probably because of inhibition of α-helix formation, a proline–glycine pair in the center of the basic DNA-binding region of some bHLH (basic helix–loop–helix)-type transcription factors is apparently essential for recognition of the E2F cis-element (TTT[G/C][G/C]CGC), while preventing interaction of these bHLH factors with the cis-element, E-box (CANNTG; N = A, T, G, C) [70].
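
The degenerate notation used for these cis-elements translates directly into pattern matching. The sketch below, written in Python as an illustration, encodes the E-box and E2F consensus sequences quoted above as regular expressions and reports where they occur in a DNA string. The helper name and the promoter fragment are hypothetical, and only the strand supplied is scanned.

```python
import re

# Consensus elements as quoted in the text: N = any base, [G/C] = G or C.
E_BOX = re.compile(r"CA[ACGT]{2}TG")
E2F = re.compile(r"TTT[GC][GC]CGC")


def find_sites(pattern, dna):
    """Return 0-based start positions of all matches, including overlapping ones."""
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    return [m.start() for m in lookahead.finditer(dna.upper())]


# Hypothetical promoter fragment, invented for illustration.
promoter = "ggCACGTGaaTTTCGCGCttCAGCTG"
print("E-box sites:", find_sites(E_BOX, promoter))
print("E2F sites:", find_sites(E2F, promoter))
```

A sequence match of this kind indicates only a candidate site; as the paragraph above notes, whether a given bHLH factor actually binds depends on residues within its basic region.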

Usually, each plant transcription factor has only one type of DNA-binding domain, occurring in either single or multiple copies. For instance, most plant Myb-related proteins have two Myb domains, whereas the potato MybSt1 (Myb Solanum tuberosum 1) [71] and the Arabidopsis CCA1 [38] each contain only one copy. One-, two- or three-fingered DNA-binding motifs occur for C2C2 and C2H2 zinc-finger transcription factors [50,72], and AP2 (APETALA2) factors may exhibit two DNA-binding domains connected by a conserved sequence [47]. In plants, each HMG-1/2 type factor possesses one HMG-1/2 DNA-binding domain, whereas an HMG-I/Y type factor has either four or seven repeated HMG-I/Y sites [61]. The repeated DNA-binding domains in animal transcription factors can interact co-operatively with the same cis-acting element on a target gene. As one example, the second (R2) and third (R3) DNA-binding domains of Myb-related proteins make sequence-specific DNA contacts [37], they are closely packed in the major groove of the DNA with their helices in contact, and they bind bases co-operatively. These types of data are not generally available for plant transcription factors, but similar DNA-binding domains may have different specificities, as demonstrated for GT-2 in rice. The sequence-related trihelix motifs located in the C- and N-terminal halves of GT-2, respectively, bind preferentially to GT2-bx (GGTAATT) and GT3-bx (GGTAAAT), GT-box motifs of the rice phytochrome A gene [40]. Although much information is available, there is an urgent need to further characterize this critical domain if plant transcription factors are to be understood as well as these proteins are in animals.

Oligomerization domains

Many plant transcription factors form hetero- and/or homo-oligomers, affecting DNA-binding specificity, the affinity of transcription factors for promoter elements [73,74] and nuclear localization [43]. Oligomers are either stabilized by hydrophobic interactions between coiled coils and β sheets, or by reactions between hydrophilic residues, wherein the alignment of residues affects oligomerization by altering ionic environments [59]. Amino acid sequences of transcription factor oligomerization domains are usually highly conserved, each type yielding, in combination with DNA-binding regions, discrete three-dimensional arrangements (Table 1, Fig. 2). The oligomerization domain of bZIP-type factors is characterized by several regularly spaced leucine residues and by a zipper-like structure [43]. In b/HLH type factors, a helix–loop–helix composition appears [43], while MADS factors have oligomerization domains that form two α helices and two β-pleated sheets [44]. Transcription factors of the same family may differ in oligomerization domain length. For most bZIP factors, the leucine zipper is composed of four or five heptad repeats, but in Arabidopsis ATB2, it contains nine repeats [75]. In other cases, dissimilar regions outside oligomerization domains influence subunit association, as shown for MADS transcription factors in which a keratin-like domain termed K [60] and an intervening domain designated I [76] are essential for interaction. These variations in oligomerization increase the versatility of the transcription machinery, and they have the capacity to modulate gene expression in plants.
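
The regular spacing of leucines in a zipper lends itself to a simple count. The following Python sketch is a deliberately strict illustration: it scores only leucines spaced exactly seven residues apart, whereas real zippers may tolerate other hydrophobic residues at some positions. The function name and the test peptides are hypothetical and do not represent ATB2 or any other sequenced factor.

```python
def longest_leucine_heptad_run(protein):
    """Return (repeat count, 0-based start) of the longest run of
    leucines spaced exactly seven residues apart."""
    seq = protein.upper()
    best = (0, -1)
    for start, residue in enumerate(seq):
        if residue != "L":
            continue
        # Skip leucines that merely continue a run counted from an earlier start.
        if start >= 7 and seq[start - 7] == "L":
            continue
        count, pos = 0, start
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += 7
        if count > best[0]:
            best = (count, start)
    return best


# Hypothetical peptides: five and nine heptads, each beginning with leucine.
five_repeat = "MK" + "LAEKQRT" * 5
nine_repeat = "LAEKQRT" * 9
print(longest_leucine_heptad_run(five_repeat))
print(longest_leucine_heptad_run(nine_repeat))
```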

Transcription regulation domains

Transcription factors of the same family generally have distinct actions because of differences in their regulation domains, regions of the proteins that tend to diverge from one another [77]. Regulation domains, and hence transcription factors, function as either repressors or activators, depending on whether they inhibit or stimulate the transcription of target genes.

Repression of gene expression may occur via exclusion of activators from target promoters by competitive binding between transcription factors for the same cis-acting element. Other possible mechanisms include masking of regulation domains by dimerization of transcription factors, as well as interaction of repression domains with transcription factors [78]. No evidence for competition between plant transcription factors is available, but data on the latter two processes are emerging. For example, two bZIP proteins from rice, namely OsZIP-2a and -2b, dimerize in vitro with the wheat bZIP factor EmBP1, preventing its interaction with target promoters [52]. The observations indicate that these rice bZIP factors are able to quench other bZIP proteins, but definitive conclusions can only be drawn when they are tested more extensively. Several observations suggest the existence of repression domains in plant transcription factors, but they remain poorly characterized. PvALF (Phaseolus vulgaris ABI-3-like factor) activates transcription from selected genes, including the French bean phytohemagglutinin gene DLEC2. ROM2 (regulator of maturation-specific protein 2), a bZIP protein, binds to the enhancer site of DLEC2 and represses PvALF-activated transcription of the gene, but this ability is lost after the portion of the molecule N-terminal to its bZIP domain is deleted. Interestingly, the truncated protein binds to the enhancer site, and if it is joined to the activation domain of PvALF, the chimeric protein activates DLEC2. This indicates that a repression domain in the N-terminal half of ROM2 inhibits the PvALF-activated transcription of DLEC2 [78].

Activation domains of plant transcription factors often exhibit sequence divergence, although the GCB motif found in many HBP-1a/GBF (histone promoter-binding protein-1a/G-box-binding factor) type bZIP factors is an exception [79]. The GCB (GBF-conserved box) motif, with the consensus sequence NLNIGMDXW, activates reporter genes when fused to the yeast GAL4 DNA-binding domain [79]. Modulation of gene expression by truncated transcription factors fused to the yeast GAL4 DNA-binding domain revealed that activation domains are enriched in either acidic amino acids, proline or glutamine, although this is not necessarily important for function [80–82]. Site-directed mutagenesis of residues in the activation domain of the maize Myb-like transcription factor, C1, demonstrated that only one of 11 acidic residues, namely aspartate 256, is essential. Leucine 253 is also involved in activation of transcription, and modification of other amino acids in the domain had no effect, indicating that single strategically placed residues determine activation [82]. Study of single amino acid changes in the activation domain of VP16, a herpes simplex virus transcription factor, and in the general transcription factors, TBP (TATA-binding protein) and TFIIB, also indicates that activation domain function depends on interactions between individual residues [82].

Amphipathic α-helices within acidic activation domains may be important functionally, but introduction of helix-incompatible amino acid residues into this region has little impact on C1 [82]. Moreover, the acidic activation domain of solubilized VP16 lacks α helix. β-Sheet in proline-rich activation domains was also thought to be required for function, but disruption of this secondary structure is without effect on HBP-1a(17) action [80]. In comparison with the results just described, the proline-rich region of HBP-1a(17) has several trans-activation modules that function when separate from one another, but not when they are present as a single unit [80]. As a consequence of this finding, it was proposed that intramolecular interactions cause conformational changes, thereby modulating activation potency. Obviously, comparing plant transcription factor regulation domains with corresponding regions of related proteins from other organisms is required and will provide valuable insight.

Nuclear localization signals

As for proteins from other organisms that selectively enter the nucleus, plant transcription factors contain NLSs characterized by a core peptide enriched in arginine (R) and lysine (K) [65,83–88]. The action of the basic core is influenced by flanking residues [85]. Within plant transcription factors, the NLSs vary in sequence, organization and number (Table 2). There may be a single NLS in which the basic residues are closely associated (SC, Table 2) [88] or a single NLS, wherein the basic residues form two functionally important groups separated by several non-conserved residues, the so-called bipartite structure (SB, Table 2) [88]. Other transcription factors contain multiple copies of the NLS [65,84–86] which may be functionally independent [84,86,87] and either clustered [65] or dispersed within the protein (Table 2). In the maize homeobox-finger proteins, ZmHox2a and 2b, for instance, eight putative bipartite NLS repeats are arranged in tandem at the N-terminus of each protein [65]. On the other hand, the rice and Arabidopsis trihelix factors GT-2 [84], the wheat homeodomain factor KNOTTED1 [87], and the maize bZIP factor Opaque2 [86] possess two functionally independent NLSs, either located within or exterior to DNA-binding domains.
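
The enrichment of arginine and lysine in NLS cores can be made concrete with a small calculation. The Python sketch below computes the fraction of basic residues in four of the NLS sequences listed in Table 2; the function name and the choice of sequences are made for illustration, and no threshold for calling a peptide an NLS is implied.

```python
# NLS sequences copied from Table 2 of this review.
NLS_EXAMPLES = {
    "Tomato HSFA1 (SC)": "KRIAEGSKKRRIKQD",
    "Tomato HSFA2 (SB)": "RKDKQRIEVGQKRRLTM",
    "Maize Opaque2, basic region of bZIP domain": "RKRKESNRESARRSRYRK",
    "Tobacco TAF-1, N-terminal region": "ERSKKRSRE",
}


def basic_fraction(peptide):
    """Fraction of residues in the peptide that are lysine (K) or arginine (R)."""
    return sum(peptide.count(aa) for aa in "KR") / len(peptide)


for name, peptide in NLS_EXAMPLES.items():
    print(f"{name}: {basic_fraction(peptide):.2f}")
```

Because lysine and arginine are only two of the twenty common amino acids, fractions approaching one half, as obtained for these peptides, are consistent with the strongly basic character described above.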

Table 2. Sequence, organization and location of NLSs in representative plant transcription factors. Plant transcription factors may contain single continuous (SC), single bipartite (SB), double bipartite (DB), multiple continuous (MC) or multiple bipartite (MB) NLSs. Actual NLS sequences are shown in all cases except for the MB-type which is a consensus derived by comparison of eight putative NLSs in each of ZmHox2a and ZmHox2b. Bold letters represent conserved arginine and lysine residues of NLS core structures; …, non-conserved residues in the NLS consensus sequence; *, the NLS has been shown experimentally to promote nuclear translocation; **, putative NLS; unspecified region refers to those sequences of the transcription factor that lack DNA-binding, oligomerization and regulation activities.

| NLS type | NLS amino acid sequence | Transcription factor | Location of NLS | Reference |
| --- | --- | --- | --- | --- |
| SC | KRIAEGSKKRRIKQD* | Tomato HSFA1 | Oligomerization domain | [88] |
| SB | RKDKQRIEVGQKRRLTM* | Tomato HSFA2 | Oligomerization domain | [88] |
| DB | KKCKEKFENVHKYYKRTK* | Arabidopsis, rice GT-2 | N-terminal trihelix domain | [84] |
| | KRCKEKWENINKYFKKVK* | Arabidopsis, rice GT-2 | C-terminal trihelix domain | [84] |
| | RKRKESNRESARRSRYRK* | Maize Opaque2 | Basic region of bZIP domain | [86] |
| | RRKLEEDLEAFKMTR* | Maize Opaque2 | Vicinity of activation domain | [86] |
| MC | ERSKKRSRE** | Tobacco bZIP TAF-1 | N-terminal, unspecified region | [85] |
| | ERELKREKRKQ** | Tobacco bZIP TAF-1 | Basic region of bZIP domain | [85] |
| | ARRSRLRKQ** | Tobacco bZIP TAF-1 | Basic region of bZIP domain | [85] |
| MB | PAA.KRK….S…SP…VRVLRS** | ZmHox2a and 2b | N-terminus, unspecified region | [65] |

Using site-directed mutagenesis, the NLS within the DNA-binding area of Opaque2 was shown to retain its nuclear translocation capability, even when DNA-binding ability is lost [86]. Therefore, these two activities are independent of one another. Other experiments indicate that the functional competence of NLSs varies. As one example, the NLS within the DNA-binding domain of Opaque2, when fused to β-glucuronidase, translocates the reporter protein into the nucleus more efficiently than does its companion NLS [86]. In addition, although two putative NLSs are found in tomato heat-shock-responsive transcription factors HSFA1 and 2, only the NLS adjacent to the oligomerization domain (Table 2) supports translocation [88]. Some plant transcription factors may lack an NLS, and they are thought to be imported into the nucleus by dimerizing with proteins that possess these signals [89].

Evolution of transcription factor genes

Transcription factor genes of the same family but from diverse eukaryotic organisms show structural and functional similarity, suggesting that they evolved from a common ancestor. Gene duplication undoubtedly played an important role during this evolution, an idea supported indirectly by the observation that related pairs of maize homeodomain knox genes reside in duplicated regions of the genome [55]. After duplication, transcription factor gene distribution may be altered through translocation, and related family members are either dispersed throughout the genome or clustered on one chromosome [55,90,91].

Sequence alignment of transcription factor genes indicates that nucleotide substitution played a central role in the evolution of conserved regions, whereas substitutions and small insertions/deletions contributed to variable region diversification [92]. In addition, exon capture through recombination of different genes or parts thereof formed new transcription factor genes [14,54]. Strong evidence for exon capture was obtained by demonstration of spontaneous fusion between a metabolic enzyme-encoding gene and a homeodomain factor gene [14]. Sequence comparisons suggest that homeodomain leucine zipper genes [54], homeodomain ring-finger genes [41], b/HLH leucine zipper genes [58] and HMG-finger genes [66] originated through exon capture.

The basic helix–loop–helix DNA-binding/oligomerization sequences in the myc-like R genes of seven monocot and dicot species have a much lower nucleotide replacement rate than other regions [92]. Similarly, nonsynonymous nucleotide substitution within sequences encoding the DNA-binding/oligomerization domain of MADS genes is significantly less than for other areas [60]. DNA-binding/oligomerization encoding domains of transcription factor genes thus appear to diverge at reduced rates, and one explanation is that these regions are critical for function, even though they constitute less than half of each gene [92]. Mutations in these positions are usually detrimental and they are eliminated by natural selection. Clarification of evolutionary events and their importance in the design of these proteins awaits the characterization of more plant transcription factor gene families.
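
The distinction between synonymous and nonsynonymous change that underlies these comparisons can be illustrated with a minimal codon-by-codon count. The Python sketch below uses the standard genetic code and two short hypothetical coding sequences; it simply tallies differing codons and is not the site-normalized substitution rate estimate used in the cited studies.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(cds_a, cds_b):
    """Return (synonymous, nonsynonymous) counts of differing codons
    between two aligned, gap-free coding sequences of equal length."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be equal in length and a multiple of 3")
    syn = nonsyn = 0
    for i in range(0, len(cds_a), 3):
        a, b = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        if a == b:
            continue
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Hypothetical sequences: CGT->CGC keeps arginine, AAA->AGA replaces lysine.
print(classify_codon_changes("ATGCGTAAA", "ATGCGCAGA"))
```

Under purifying selection of the kind proposed for DNA-binding/oligomerization regions, the nonsynonymous count would be expected to be low relative to the synonymous count in those regions.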


Biological and physical aspects of transcription factor gene control

Plant transcription factor genes may either be expressed constitutively or in organ-limited [12,50,93], stimulus-responsive [31,38,94–98], development-dependent [61,78,99,100] and cell-cycle-specific manners [101]. Although members of transcription factor gene families can differ in their time of expression [50], they may be transcribed simultaneously and co-ordinately, as shown for three Brassica GBFs, designated BnGBF1a, 1b and 2a. Their genes are transcribed at a constant ratio (1a > 2a > 1b) in various organs and developmental stages, with mRNA pools largest in photosynthetically active organs such as leaves and cotyledons [93].

Light [38,97], hypoxia [31,96], non-freezing low temperature [94,96], salt stress [96], abscisic acid [96,98] and gibberellic acid [95] individually influence transcription factor gene expression. In addition, plant transcription factor genes respond to multiple environmental signals, as shown for mLIP15 (maize low temperature-induced protein 15), a maize bZIP factor that binds to the promoters of wheat histone gene H3 and the low-temperature-inducible gene Adh1 (alcohol dehydrogenase-1). The amount of mlip15 transcript increases dramatically upon exposure to reduced temperature, salt stress and exogenous abscisic acid, but neither heat shock nor drought affects expression of this gene [96]. Equally interesting, members of a gene family are not necessarily responsive to the same stimulus; some genes of the bZIP family are regulated by light [53,102], while others respond to abscisic acid, auxin and salicylic acid [98,103–105]. The impressive range of effectors that modulate expression of transcription factor genes forecasts an equally large number of response mechanisms, and these are described in the following sections.

Expression kinetics of transcription factor genes

Coincidental accumulation of mRNA from plant transcription factor genes and their targets occurs in response to several stimuli, and consistent with a role in activation, the proliferation of many regulatory gene mRNAs precedes the expression of their effector genes [31,38,95,98]. In contrast, quantitative changes in transcripts from other regulatory and target genes are inversely correlated, indicating that products of the former are transcriptional repressors [53,106]. Members of the same plant transcription factor multigene family may be expressed differently. Examples are the bZIP factors, CPRF-1 (common plant regulatory factor-1), -2 and -3, which bind specifically to Box II, an important promoter region of the parsley light-responsive chalcone synthase gene (chs). CPRF-1 mRNA is made transiently when dark-grown parsley cells are exposed to light, suggesting that its product participates in light-mediated activation of the parsley chs gene. Concurrently, CPRF-3 mRNA decreases gradually, revealing an inverse correlation to the amount of chs mRNA. CPRF-2 mRNA is unaffected by light [53]. These distinct kinetics imply differences in the regulatory capacity of each of the related transcription factors.

The relationships between regulatory and regulated genes exhibit other nuances. Some genes encoding transcription factors that recognize stimulus-responsive cis-acting elements are not regulated by the stimulus; the tobacco bZIP factor, TFHP-1, interacts specifically with the wounding-responsive cis-acting element of prxC2, a horseradish peroxidase gene, but transcription of the TFHP-1 gene is indifferent to leaf damage [107]. Similarly, transcription of genes encoding GT-1 (GT-rich motif binding factor 1), and CGF-1 (chlorophyll a/b-binding protein gene GATA-box factor 1), factors that associate with GATA boxes of selected light-inducible genes, is constitutive rather than light-responsive [108]. As one explanation for this behavior, the GT-1 and CGF-1 regulator genes may lack appropriate stimulus-responsive cis-acting elements. In another case of atypical regulation, fluctuation in the amount of regulator gene transcripts does not influence expression of putative target genes, as occurs for ZmHox (Zea mays homeobox) 1a and 1b. These maize transcription factors recognize the promoter of Sh1 (shrunken-1); however, this gene is unaffected by altering the expression of either ZmHox 1a or ZmHox 1b [109]. Modulation of Sh1 potentially entails other stimulus-responsive regulators that co-operate with transcription factors, or the regulator genes may be controlled post-transcriptionally, as discussed below.

Regulation of transcription factor genes by cis-acting elements

Quantitative variations in transcription factor mRNA, achieved by suppression and over-expression experiments, cause substantial changes in plants [12]. Therefore, accurate regulation of transcription factor genes by their cis- and trans-acting elements is potentially very important and several examples of this are known. The seed-specific expression of Atmyc1, a Myc-related transcription factor gene of A. thaliana, is conferred by a cis-regulatory element designated as the Sph box (CATGCATG) [110]. Because of common cis-acting elements, regulatory and regulated genes are sometimes influenced by the same transcription factors [68,111]. In addition, some transcription factor genes have cis-acting elements that are affected by their own products [68,97,111].

The reaction of a single plant transcription factor gene to more than one stimulus may depend upon multiple cis-elements. The regulatory sequence of the Arabidopsis HMG-I/Y gene contains putative binding sites for bZIP, Myc, Myb, GT-2, and HBP-1 transcription factors [112]. The upstream sequence from position −142 to −132 (CGTCCATGCAT) of the maize transcription factor gene, C1, is essential for regulation by VP1, whereas a larger overlapping element, from −147 to −132 (CGTGTCGTCCATGCAT) modulates activation by abscisic acid. A separate light-sensitive cis-element, similar to those found in other genes that respond to light, is located between positions −116 and −59 of C1 [113]. These findings support the premise that differential expression of related transcription factor genes upon exposure to disparate environmental stimuli is due to cis-acting elements. However, the regulatory sequences of more transcription factor genes must be characterized before the general applicability of this rule to plants is certain.
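
The relationship between the two overlapping C1 elements can be checked directly from the sequences and coordinates quoted above. The Python sketch below assumes that both sequences are written 5′ to 3′ on the same strand and that the stated positions are inclusive; the function names are hypothetical.

```python
# Sequences and coordinates as quoted in the text for the maize C1 promoter.
VP1_ELEMENT = "CGTCCATGCAT"        # positions -142 to -132
ABA_ELEMENT = "CGTGTCGTCCATGCAT"   # positions -147 to -132


def span_length(start, end):
    """Number of positions in an inclusive coordinate range."""
    return end - start + 1


def locate_within(outer, outer_start, inner):
    """Return inclusive (start, end) coordinates of `inner` inside `outer`,
    given the coordinate of the first base of `outer`."""
    offset = outer.find(inner)
    if offset < 0:
        raise ValueError("inner element not found in outer element")
    start = outer_start + offset
    return start, start + len(inner) - 1


print(span_length(-142, -132), len(VP1_ELEMENT))
print(span_length(-147, -132), len(ABA_ELEMENT))
print(locate_within(ABA_ELEMENT, -147, VP1_ELEMENT))
```

The lengths agree with the stated coordinates, and the element required for VP1 regulation occupies the 3′ end of the larger abscisic acid-responsive element, which extends five bases further upstream.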

Alternative mRNA splicing, a post-transcriptional mechanism for regulation of transcription factor genes in plants

Alternative splicing has been documented for mRNAs originating from several plant transcription factor genes [72,114,115]. An example is the maize P gene, which generates two transcripts by alternative splicing at the 3′ end of the precursor mRNA. The larger message of 1802 nucleotides encodes a 43.7-kDa protein with an N-terminal region showing 40% identity with the DNA-binding domain of several Myb family proteins. A smaller message of 945 nucleotides produces a 17.3-kDa protein that contains most of the Myb domain, but it differs from the first protein at the C-terminus [114]. A second illustration of alternative splicing involves the constitutively transcribed LSD1 (lesion simulating disease resistance 1), an Arabidopsis zinc finger protein gene encoding a repressor that acts during pathogen-induced cell death [72]. Alternative splicing of LSD1 mRNA affords major and minor transcripts, with one and two start codons, respectively. The latter message produces the more prevalent protein, while the former gives rise to a protein with five extra amino acid residues at its N-terminus. In yet another case, alternative splicing of the rice myb7 gene transcript occurs. One mRNA yields a protein containing a partial Myb domain followed by a leucine zipper motif encoded by an unspliced intron; the other mature transcript, with two introns spliced, furnishes a complete Myb factor [115].

Transcription factors derived from the same precursor mRNA by alternative splicing may have distinct regulatory functions. As a consequence, adjusting the ratio between mRNAs, as occurs for rice myb7 during anoxia stress [115], acts as a switch that affects expression of other genes. More comparative analyses of sibling transcripts must be performed in order to determine how important alternative mRNA splicing is to the synthesis and function of plant transcription factors.

Repression of plant transcription factor mRNA translation by upstream open reading frames (ORFs)

Short upstream ORFs, usually with start codons in poor context when compared with the eukaryotic consensus (A > G)CCATGG, occur in the leader sequences of transcripts originating from members of several plant transcription factor gene families, including Arabidopsis HMG-I/Y, maize B/R and bZIP [112], and rice myb [115]. As one example, the upstream ORF of Arabidopsis HMG-I/Y mRNA, which encodes a polypeptide of 13 amino acid residues, ends 60 nucleotides upstream of the start codon for the transcription factor. Upstream ORFs in the transcripts of a few transcription factor genes repress translation from downstream ORFs. To demonstrate that expression of rice myb7 mRNA is repressed by its upstream ORF, the myb7 gene fragment containing the upstream ORF was fused upstream of a chloramphenicol acetyltransferase (CAT) reporter gene [115]. In a transient expression system, CAT activity increased threefold when the start codon in the upstream ORF was mutated, suggesting that the upstream ORF represses translation of the myb7 mRNA ORF. Because upstream ORFs only exist in some members of a multigene family, translational repression by upstream ORFs varies the expression of related transcription factor genes, offering another mechanism for differential regulation of structural genes in plants.
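As an illustration of how upstream ORFs of this kind might be located in a transcript leader, the following Python sketch scans for ATG codons ahead of the main start codon and reports the peptide length, the gap to the main ORF, and a rough start-codon context. Everything here is hypothetical. The transcript is synthetic, built to mimic the 13-residue upstream ORF ending 60 nucleotides before the main start codon described above, and is not the Arabidopsis HMG-I/Y sequence. The context rule (purine at −3, G at +4) is a common simplification of the consensus, not the criterion used in the cited studies.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class UpstreamOrf:
    start: int           # index of the A of the upstream ATG
    end: int             # index just past the stop codon
    peptide_length: int  # residues encoded, excluding the stop codon
    gap_to_main: int     # bases between uORF stop and main ATG (negative if overlapping)
    context: str         # "strong", "adequate", or "weak"


def start_context(mrna: str, atg_index: int) -> str:
    """Simplified context score: purine at -3 and G at +4 (assumed rule)."""
    has_purine = atg_index >= 3 and mrna[atg_index - 3] in "AG"
    has_g = atg_index + 3 < len(mrna) and mrna[atg_index + 3] == "G"
    if has_purine and has_g:
        return "strong"
    return "adequate" if (has_purine or has_g) else "weak"


def find_upstream_orfs(mrna: str, main_start: int) -> list[UpstreamOrf]:
    """Report every ATG lying wholly upstream of main_start that has an in-frame stop."""
    orfs = []
    for i in range(main_start - 2):
        if mrna[i:i + 3] != "ATG":
            continue
        for j in range(i + 3, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOP_CODONS:
                orfs.append(UpstreamOrf(i, j + 3, (j - i) // 3,
                                        main_start - (j + 3),
                                        start_context(mrna, i)))
                break
    return orfs


# Synthetic transcript (not a real sequence): a 13-codon uORF, a 60-nt spacer,
# then a main start codon in strong context.
uorf = "ATG" + "TCT" + "GCT" * 11 + "TAA"
spacer = "CT" * 28 + "CACC"
mrna = "CTCTCT" + uorf + spacer + "ATGGCT"
main_start = 6 + len(uorf) + len(spacer)

orfs = find_upstream_orfs(mrna, main_start)
for orf in orfs:
    print(orf)
print("main start context:", start_context(mrna, main_start))
```

In this toy transcript the single upstream ORF has a weak start context, whereas the main start codon matches the consensus at both scored positions.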

Post-translational regulation of transcription factor activity

Effect of translocation on transcription factor activity

Plant transcription factors enter the nucleus by a selective process, this being a prerequisite for their function [116,117]. The nuclear pores of higher plants contain proteins that bind the NLS of transcription factors [118], and mutations in localization sequences impair interaction with these proteins. Some transcription factors, such as the Arabidopsis bZIP protein, HY5 (hypocotyl 5), normally occur in the nucleus [116] but the position of many others is controlled by environmental stimuli [88,117]. Two pools of bZIP GBFs were found in dark-grown parsley by using a cell-free system prepared from protoplasts [117]. Nuclear translocation of one pool was determined by light-induced cytosolic phosphorylation of the constituent transcription factors, while localization of the other seemed dependent on phosphorylation caused by a different stimulus. Phosphorylation also regulates nuclear import of many proteins by altering cytoplasmic retention factors, affecting intra- and inter-molecular NLS masking, or by directly modifying the NLS [119]. However, it is not known if the translocation of plant transcription factors is controlled by these mechanisms.

As revealed by in-situ immunolocalization, several plant transcription factors move between cells via plasmodesmata, resulting in nonautonomous effects [120,121]. The details of this mechanism, reviewed recently [121], indicate that intercellular translocation plays an important role in the actions of transcription factors.

Post-translational modifications affect binding of transcription factors to DNA

Regulation of transcription factor binding to DNA via protein phosphorylation and dephosphorylation may determine the expression of many target genes, including those that encode transcription factors. The interaction of some transcription factors with DNA is abolished by dephosphorylation [122,123] and stimulated by phosphorylation [122,124], whereas the reverse is true for others [125–127], and still others are unaffected [123]. It is possible that several kinases influence the binding of various plant transcription factors to DNA, but only casein kinase II [122] and serine kinases [124,126,127] have been characterized in this regard. These kinases function in the cytoplasm [124], the nucleus [126], or other organelles where they may associate with the transcription machinery [128]. Both external and internal stimuli affect the regulatory mechanisms. For instance, serine residues in the DNA-binding domain of the bZIP transcription factor HBP-1a(17) are phosphorylated in a Ca2+-dependent manner [126], while phosphorylation of another bZIP factor, Opaque2, is controlled by a circadian-clock-related mechanism [125].


Functional domains within many plant transcription factors have been characterized, but much remains to be learned about these proteins. The cloning and sequencing of more plant transcription factor genes is necessary, as is examination of their expression, either during development or upon exposure to external stimuli. X-ray crystallography and NMR spectroscopy will provide essential data, revealing in high resolution the structure of transcription factors and their interaction with DNA. Most known transcription factors possess only one distinct DNA-binding/oligomerization region; however, a few contain two, and some have unusual sequences such as an RNA-binding site or a G-protein β-subunit homologous domain. A demonstration of the existence of additional novel transcription factors and their biological significance awaits further study. Clarification of transcription factor gene evolution and chromosomal distribution will increase our knowledge of organism diversification and, from a practical perspective, advance genetic engineering. In addition, identification of cis- and trans-acting elements associated with plant transcription factors will reveal mechanisms that regulate their expression. Examination of post-transcriptional and post-translational controls of transcription factor gene expression will further elucidate the biological consequences of alternative mRNA splicing, translational repression, modification of DNA-binding activity and differential translocation on transcription factor activity. Clearly, control of transcription factor gene expression and function involves an important network of interrelated processes. Discerning the biophysical and biochemical bases of these processes is a major challenge, as well as an excellent opportunity to make fundamental contributions to our understanding of plant cell/molecular biology.


We thank Drs R. Mackay, B. Pohajdak, D. Richardson, S. Douglas, and Y. Pan for their critical comments. We are grateful for support from Saint Mary’s University. This work was funded by grants from the Natural Sciences and Engineering Research Council of Canada to M.J.W. and T.H.M.