Spermatophyte seed-storage proteins have descended from a group of proteins involved in cellular desiccation/hydration processes. Conserved protein structures are found across all plant phyla and in the fungi and Archaea. We investigated whether conservation in the coding region sequence is paralleled by common gene regulatory processes. Seed- and spore-specific gene promoters of three phylogenetically diverse plants were analysed by transient and transgenic expression in Arabidopsis thaliana and tobacco. The transcription factors FUS3 and ABI3, which are central regulators of seed maturation processes, interact with cis-motifs of seed-specific promoters from distantly related plants. The promoter of a fern spore-specific gene encoding a seed-storage globulin-like protein exhibits strong seed-specific activity in both Arabidopsis and tobacco. The existence of phylogenetic footprints indicates good conservation of regulatory pathways controlling gene expression in fern spores and in gymnosperm and angiosperm seeds, reflecting the concerted evolution of coding and regulatory regions.
Plant seeds have evolved to nourish, protect and disperse the next-generation embryo, and represent a suitable experimental system for the study of tissue-specific gene regulation in differentiation and development. Moreover, human nutrition and animal feed depend mainly on the seeds of crop plants.
The function of FUS3 relies on its interaction with the RY motif, which is present in a number of seed-specific gene promoters. Using the approach known as phylogenetic shadowing (Boffelli et al., 2003) or footprinting (Gumucio et al., 1992), the RY motif was identified as the core of the legumin box (Bäumlein et al., 1986; Dickinson et al., 1988). Mutation of either FUS3 or the RY motif can drastically reduce the activity of target promoters, resulting in the loss of seed-storage protein synthesis (Bäumlein et al., 1992; Reidt et al., 2000). Similar outcomes have been described for ABI3 (Mönke et al., 2004). The RY motif (characterized in part by the consensus sequence CATGCA) lies approximately 100 bp upstream of the transcription start site of many seed-specific gene promoters, including those regulating the synthesis of globulins, napins and oleosins (Bäumlein et al., 1986, 1992, 1994; Dickinson et al., 1988; Mönke et al., 2004; Reidt et al., 2000). A nearly perfectly conserved copy of the RY motif is also present in the first intron of the rice FLO/LFY-like gene, where it has been identified as a candidate regulatory site (Bomblies and Doebley, 2005; Prasad et al., 2003). FUS3 is considered to be a nexus of hormone-controlled processes during embryogenesis (Gazzarini et al., 2004), and is required in the protoderm to restrict the expression domain of TTG1 (Tsuchiya et al., 2004). FUS3 has also been shown to interact with the RY motif to repress the gibberellin biosynthesis gene AtGA3ox2 (Curaba et al., 2004). Expression of FUS3, like that of LEC1 and LEC2, is negatively regulated by the CHD3 chromatin-remodelling factor PICKLE, a regulator of embryonic identity (Rider et al., 2003). Recent data suggest that the interaction between B3 domain transcription factors and RY motifs represents just one component of a complex regulatory pathway that is required for seed-specific expression in both dicot embryos and the monocot endosperm.
Thus, an ensemble of cis-motifs consisting of ACGT-like motifs interacting with bZIP factors, RY motifs interacting with B3 factors, CTTT-like motifs interacting with DOF factors, and AACA-like motifs interacting with MYB factors has been suggested as the common underlying principle behind seed-specific gene regulation (Vicente-Carbajosa and Carbonero, 2005).
The vicilins and legumins are representative of the two major classes of seed-storage globulin in the angiosperms and gymnosperms. The structure and evolution of these nutritionally important proteins, along with the enzyme machinery responsible for their molecular maturation and mobilization during seed maturation and germination, have been intensively studied (Shutov et al., 2003). The two globulin types share a two-domain tertiary structure (Adachi et al., 2001), which is presumed to have evolved from a common single-domain germin-like ancestor (Bäumlein et al., 1995; Dunwell, 2003) that may be traced back to the Archaea (Shutov et al., 1999). Similar single-domain proteins (the so-called spherulins) have been identified in the myxomycete Physarum polycephalum, where they have been shown to be necessary for the acquisition of cellular desiccation tolerance (Lane et al., 1991). It has thus been proposed that the ancestral seed-storage globulin was recruited from a developmentally regulated protein specific to desiccation-tolerant tissues. A vicilin-like holoprotein, specifically expressed in spores of the fern Matteuccia struthiopteris (Kakhovskaya et al., 2003; Shutov et al., 1998), has the structural features of an intermediate two-domain ancestor of both the vicilins and the legumins, as it is composed of two-domain subunits (Shutov et al., 2003). However, both domains of the Matteuccia vicilin-like protein (MVP) retain germin-like structures, lacking the characteristic variable inserts that are functionally important in advanced seed-storage proteins. The structural features of MVP are more like those of a non-spermatophyte ‘primitive’ storage protein, so the protein probably represents a progenitor of the spermatophyte storage globulins (Shutov et al., 1998).
Supporting evidence for this hypothesis has been obtained from Cycadophyta, Ginkgoophyta, Pinophyta, Gnetophyta and Magnoliophyta gene and protein sequences (Shutov and Bäumlein, 1999; Shutov et al., 1998, 1999).
Knowledge of the evolutionary pathway that has shaped extant seed protein coding regions, together with current understanding of the regulatory processes involved in seed-specific gene expression, now makes it possible to investigate the evolution of both coding and regulatory sequences in distantly related plant species. Here, we describe evidence for phylogenetic footprints in fern spore- and seed-specific gene promoters.
A promoter fragment of the MVP gene was cloned by genome walking. Its identity was confirmed by the presence of 98 bp of the known gene sequence (including 60 bp of coding sequence). A search for conserved cis-motifs identified a TATA consensus sequence 66 bp upstream of the ATG start codon and an RY motif (represented by the CATGCA consensus sequence) 56 bp upstream of the TATA box. Two extensive stretches of simple CT dinucleotide repeats are also present (Figure 1). We extended the previously known 282 bp 5′-flanking sequence of the Ginkgo biloba legumin gene GNK (Häger et al., 1995) to 424 bp. A TATA consensus sequence occurs 48 bp upstream of the ATG start codon; one RY motif (CATGCA) lies 107 bp upstream of the TATA box and a second lies 8 bp downstream of it (Figure 1). In the 492 bp promoter of the Zamia furfuracea vicilin-like protein gene (ZVP), a TATA consensus sequence occurs 60 bp upstream of the ATG start codon but, unlike in the MVP and GNK gene promoters, no RY consensus sequence was identified (Figure 1). Neither a TATA consensus sequence nor any RY consensus sequence was found in the 334 bp promoter of the P. polycephalum spherulin 1b gene (SPH).
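The distances above relate each consensus match to the ATG start codon. A minimal Python sketch of that bookkeeping follows. It assumes (our convention, not a description of the software actually used) that each promoter is stored 5′→3′ and ends with the base immediately upstream of the ATG, that only the forward strand is scanned, and that a distance is counted from the first base of a match. The helper names and the toy sequence are hypothetical.

```python
import re

# IUPAC symbols needed for the consensus sequences used here (W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}


def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus such as TATAWAW into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def distances_to_atg(promoter: str, consensus: str) -> list:
    """Distance (bp) from the first base of each forward-strand match to the ATG.

    Assumed (hypothetical) convention: `promoter` ends at the base immediately
    5' of the ATG, so a match starting at index i lies len(promoter) - i bp
    upstream. The lookahead pattern also reports overlapping matches.
    """
    pattern = re.compile(f"(?={consensus_to_regex(consensus)})")
    seq = promoter.upper()
    return [len(seq) - m.start() for m in pattern.finditer(seq)]


# Invented toy promoter: RY motif, spacer, TATA-like element, spacer.
toy = "CATGCAGG" + "TATAAAT" + "GGCC"
print(distances_to_atg(toy, "CATGCA"))   # [19]
print(distances_to_atg(toy, "TATAWAW"))  # [11]
```

The spacing between two elements is then the difference of their distances, here 19 − 11 = 8 bp between the RY and TATA matches.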
To test whether the RY motifs present in the MVP and GNK gene promoters could be responsible for the seed-specific promoter activity described below, all 24 322 promoters of the Arabidopsis genome were used as a control set. Of these promoters, 92.6% contain no RY consensus sequence (CATGCA), 6.7% contain exactly one, and 0.7% contain at least two. Because two or more occurrences are this rare in the control set, their presence in a promoter may be considered statistically significant. Hence, the GNK gene promoter and the Vicia faba USP (‘unknown seed protein’) gene promoter, each containing two RY consensus sequences, show a statistically significant enrichment of the RY motif.
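The binning behind these percentages can be sketched in Python. Whether overlapping or reverse-strand matches were counted in the original analysis is not stated; this sketch assumes forward-strand matches, overlapping ones included, and uses invented toy sequences rather than the Arabidopsis control set. Function names are hypothetical.

```python
from collections import Counter

RY = "CATGCA"


def count_overlapping(seq: str, motif: str) -> int:
    """Count forward-strand occurrences of `motif`, including overlapping ones.

    CATGCA can overlap itself (CATGCATGCA holds two copies), which str.count
    would miss because it only counts non-overlapping matches.
    """
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def ry_class_percentages(promoters: list) -> dict:
    """Percentage of promoters with 0, exactly 1, or >= 2 RY consensus sequences."""
    classes = Counter()
    for seq in promoters:
        n = count_overlapping(seq.upper(), RY)
        classes["0" if n == 0 else "1" if n == 1 else ">=2"] += 1
    total = len(promoters)
    return {k: 100.0 * classes[k] / total for k in ("0", "1", ">=2")}


toy_promoters = ["AAAACCCC", "GGCATGCAGG", "CATGCATGCA", "TTTTGGGG"]
print(ry_class_percentages(toy_promoters))  # {'0': 50.0, '1': 25.0, '>=2': 25.0}
```

As an arithmetic check on the reported figures, 6.7% + 0.7% = 7.4% of 24 322 promoters is about 1800, consistent with the 1801 RY-containing control promoters used in the Fisher's exact test.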
Induction by transiently co-expressed Arabidopsis transcription factors
The over-representation of RY consensus sequences in the MVP and GNK gene promoters suggests that regulatory pathways are conserved among dicots, gymnosperms and even pteridophytes. We tested this notion using promoter–GUS reporter gene constructs in transient expression assays in Arabidopsis protoplasts. Each experiment was repeated three times with independent preparations of plasmids and protoplasts. CaMV 35S promoter-driven ectopic expression of either FUS3 or ABI3 increased the activity of both the MVP and GNK gene promoters (Figure 2), and an even greater increase was induced when the two transcription factors were co-expressed. For the GNK gene promoter, the effect of FUS3 was smaller than that of ABI3, whereas the two transcription factors had similar effects on the MVP gene promoter. Two well-characterized seed-specific promoters, controlling the expression of USP and legumin B4 (LeB4) of V. faba (Bäumlein et al., 1991a,b), were used as controls. To demonstrate that the induced transient expression depends on an intact RY motif, both promoters were tested with ABI3 after mutating the RY motif from CATGCA to CATGGT (Figure 3). This mutation destroys the function of the RY motif in the LeB4 (Bäumlein et al., 1992) and USP gene promoters (Fiedler et al., 1993). As expected, the mutant MVP and GNK gene promoters were not induced. Neither the ZVP nor the SPH gene promoter showed any clear induction by either transcription factor.
Thus, four gene promoters (MVP, GNK, USP and LeB4) show an increase in transient activity when co-expressed with FUS3 and/or ABI3, whereas two promoters (ZVP and SPH) show no increase. To test whether the presence or absence of transient induction is correlated with the presence or absence of the RY consensus sequence, Fisher’s exact test (Fisher, 1992) was applied to the five promoters whose sequences were analysed (MVP, GNK, USP, LeB4 and ZVP) and to the control set of all 24 322 Arabidopsis promoters, which for the purposes of this analysis were assumed to show no increase in transient activity. All four promoters that show an increase in transient activity contain at least one RY consensus sequence, whereas the ZVP gene promoter, which shows no increase, contains none. As only 1801 of the 24 322 promoters in the Arabidopsis control set contain at least one RY consensus sequence, Fisher’s exact test yields a P-value of 3 × 10⁻⁵, indicating that the over-representation of the RY consensus sequence in the transiently induced promoters is statistically significant.
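Under stated assumptions, this P-value can be reproduced with the Python standard library. The sketch treats it as a one-sided (over-representation) Fisher's exact test on a 2 × 2 table whose rows are ‘induced’ (MVP, GNK, USP, LeB4) and ‘not induced’ (ZVP plus the 24 322 Arabidopsis promoters), and whose columns are ‘RY present’ and ‘RY absent’. The one-sided alternative and this exact table layout are our reading of the text, not a stated method. With them the result is about 3.02 × 10⁻⁵, which rounds to the reported 3 × 10⁻⁵ and matches the CATGCA entry in Table 1. scipy.stats.fisher_exact with alternative='greater' should give the same value.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment in the first row.

    Table layout (rows: induced / not induced; columns: motif present / absent):
        [[a, b],
         [c, d]]
    Returns P(X >= a), where X follows the hypergeometric distribution
    obtained by fixing all row and column totals.
    """
    row1 = a + b          # induced promoters
    col1 = a + c          # promoters containing the motif
    n = a + b + c + d     # all promoters
    denominator = comb(n, row1)
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / denominator


# Assumed table: induced = MVP, GNK, USP, LeB4 (all contain CATGCA);
# not induced = ZVP (no CATGCA) plus 24 322 Arabidopsis promoters, 1801 with CATGCA.
p = fisher_one_sided(a=4, b=0, c=1801, d=(24322 - 1801) + 1)
print(f"{p:.3g}")  # 3.02e-05
```

Because all four induced promoters carry the motif, the tail reduces to a single hypergeometric term, C(1805, 4) / C(24 327, 4).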
To investigate whether other hexamers are significantly over-represented in the promoter regions of the transiently expressed genes, we applied Fisher’s exact test to all 4096 possible hexamers, ranked them by P-value, and list the 10 hexamers with the lowest P-values in Table 1. Remarkably, the RY consensus sequence CATGCA has the second-lowest P-value, meaning that only one of the 4096 hexamers is more strongly associated with the expression data than the RY consensus sequence. Interestingly, the lowest P-value was obtained for the hexamer GCATGC, which differs from the RY consensus sequence only by a shift of a single nucleotide.
Table 1. The 10 hexamers with the most significant over-representation in the promoter regions of the transiently expressed genes
Rank   Hexamer   P-value
1      GCATGC    8.00 × 10⁻⁷
2      CATGCA    3.02 × 10⁻⁵
3                3.34 × 10⁻⁴
4                6.23 × 10⁻⁴
5                7.36 × 10⁻⁴
6                7.39 × 10⁻⁴
7                1.04 × 10⁻³
8                1.13 × 10⁻³
9                1.23 × 10⁻³
10               1.37 × 10⁻³
P-values were computed by Fisher’s exact test.
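The hexamer scan described above can be sketched in Python under stated assumptions: each promoter is scored for the presence or absence of a hexamer on the forward strand (consistent with the ‘contains at least one’ criterion used for CATGCA), and the same one-sided Fisher's exact test is applied to every one of the 4096 hexamers. Function names and the toy sequences are hypothetical; this is an illustration, not the authors' code.

```python
from collections import Counter
from itertools import product
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2 x 2 table [[a, b], [c, d]] (hypergeometric tail)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(n, row1)


def hexamer_presence(promoters):
    """Number of promoters in which each hexamer occurs at least once (forward strand only)."""
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        counts.update({seq[i:i + 6] for i in range(len(seq) - 5)})
    return counts


def rank_hexamers(induced, control, top=10):
    """Rank all 4**6 = 4096 hexamers by one-sided Fisher P-value (lowest first)."""
    pos, neg = hexamer_presence(induced), hexamer_presence(control)
    results = []
    for letters in product("ACGT", repeat=6):
        kmer = "".join(letters)
        a, c = pos[kmer], neg[kmer]  # Counter returns 0 for absent hexamers
        p = fisher_one_sided(a, len(induced) - a, c, len(control) - c)
        results.append((p, kmer))
    results.sort()  # ties are broken alphabetically by hexamer
    return results[:top]


# Invented toy data: both 'induced' sequences share GCATGC; controls do not.
induced = ["AAGCATGCAA", "TTGCATGCTT"]
control = ["AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG"]
print(rank_hexamers(induced, control, top=1))  # [(0.1, 'GCATGC')]
```

Precomputing per-promoter hexamer sets keeps the scan to one pass over the control promoters rather than 4096 separate substring searches per promoter.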
Seed-specific promoter activity in transgenic Arabidopsis and tobacco plants
The promoter–reporter constructs were transformed into Arabidopsis and tobacco to test whether the effects observed in transient assays were reproducible in stably transformed plants. Consistently across five independent transgenic events, the MVP and GNK gene promoters were strongly active in seeds but inactive in leaves (Figure 4). The same pattern was observed with the ZVP gene promoter–reporter construct, even though this promoter neither contains an RY consensus sequence nor responded to FUS3 and ABI3 in the transient assay. No promoter activity was observed when the GUS reporter gene was driven by the SPH gene promoter (data not shown). A ubiquitously active CaMV promoter–reporter construct, which shows GUS activity in both seeds and leaves, served as a positive control, and the USP gene promoter construct, used as a seed-specific positive control, showed the expected seed-specific activity. No GUS signal was detectable in non-transformed control plants. In tobacco, the same constructs showed patterns of promoter activity similar to those seen in Arabidopsis (Figure 5): the MVP, GNK and ZVP gene promoters were active in the developing embryo and in the endosperm dissected from it, but none showed any activity in leaves.
As histological GUS staining is not suited to estimating quantitative differences in expression, fluorimetric measurements were used to confirm the qualitative results of the GUS staining experiments. In five independent MVP and GNK transgenic Arabidopsis and tobacco lines, GUS activity in leaves was low and comparable to the level in wild-type plants. The USP gene promoter drove very high seed-specific activity, and ZVP gene promoter activity was still about fivefold above that of the control (Figure 6).
The correct tissue-specific and development-dependent activity of promoter constructs in transgenic hosts demonstrates the existence of a complex ‘cis-regulatory code’ (Berman et al., 2002; Davidson, 2002; Dermitzakis and Clark, 2002; Markstein et al., 2002; Sumiyama et al., 2001), which is much less well understood than the protein-coding machinery. It is generally accepted that this regulatory code, consisting of rather short and degenerate promoter sequence motifs arranged in combinatorial ensembles, is decoded by interacting transcription factor complexes. The frequently strong conservation of protein coding regions even between rather distantly related species has led to the conclusion that evolution occurs mainly through changes in gene regulation. Comparative screens of promoters in orthologous genes from closely related species (phylogenetic shadowing) or from more distantly related species (phylogenetic footprinting) have identified conserved, and therefore putatively functional, promoter motifs (Boffelli et al., 2003; Gumucio et al., 1992). Apart from studies on the MADS box genes (De Bodt et al., 2006), promoter evolution studies in plants have mostly been restricted to closely related species such as the cereals (Guo and Moose, 2003), and generally do not go beyond the family level (e.g. the Brassicaceae; Koch et al., 2001).
We have demonstrated that certain gene regulatory functions related to seed- and spore-specific gene expression are conserved across a wide range of species, from ferns through the gymnosperms (a cycad and G. biloba) to the angiosperms Arabidopsis and tobacco. The RY consensus sequence, which has been identified as the target of B3 domain-containing central regulators of seed development such as FUS3, ABI3 and LEC2 (Mönke et al., 2004; Reidt et al., 2000; To et al., 2005), appears to be strongly conserved in function. That the regulation of seed processes has been conserved over long evolutionary periods is also evident from the observation that the ABI3 gene of the moss Physcomitrella patens can partially complement the ABA insensitivity during germination of the Arabidopsis abi3-6 mutant (Marella et al., 2006).
Since the discovery of the alternation of plant generations over 150 years ago, the reproductive structures and processes of the heterosporous ferns and the spermatophytes have been considered to represent homologous developmental pathways. Specifically, this has prompted the homologous pairing of the megasporophyll with the carpel, the megasporangium with the nucellus, the megaspore mother cell with the embryo sac mother cell, the megaspore with the embryo sac cell, and the megaprothallium with both the primary endosperm of gymnosperms and the developed angiosperm embryo sac with its antipodals, polar cell and egg apparatus. The question remains as to how the seed habit evolved from plants which, like ferns, lack a dormant embryonic phase in their development. A prominent hypothesis suggests that this occurred via a heterochronic shift of a storage and dormancy programme from a pre-embryonic (spore) phase of the plant life cycle to a post-embryonic (seed) phase (Banks, 1999). Tissues of distinct histogenetic origin would then have been recruited to develop into desiccation-tolerant storage tissues: the primary endosperm of gymnosperms (1n), the storage cotyledons of dicotyledonous embryos (2n) and the triploid secondary endosperm (3n) of the grasses, which arises as a secondary fertilization product. The spore-specific expression of a gene encoding a storage globulin-like protein (Shutov et al., 1998) has been taken to suggest that fern spores may also function as a storage tissue, an idea further supported by the interaction between FUS3 or ABI3 and the MVP gene promoter (Figure 2). Alternatively, MVP may not be a genuine storage globulin, but rather a protein related to desiccation processes, as seed globulins are descended from germin- and spherulin-like ancestors involved in various cellular desiccation processes (Shutov and Bäumlein, 1999; Shutov et al., 1998; Wohlfarth et al., 1998).
Although MVP shares the typical β-barrel structure of the bicupins (Dunwell, 2003), it lacks the variable hydrophilic insertions. The acquisition of these insertions, as processing sites accessible to limited proteolysis, played a key role in the evolution of genuine storage proteins and in their co-evolution with specific proteases (Shutov et al., 2003).
The interaction between RY motifs and B3 domains represents just one component of a more complex cis-motif ensemble that is conserved among genes expressed in Arabidopsis cotyledons and barley endosperm. Additional important components required for gene expression in the seed include the ACGT, AACA and CTTT motifs as putative targets for bZIP factors such as AtbZIP10 and AtbZIP25, MYB transcription factors and DOF zinc-finger factors, respectively (Vicente-Carbajosa and Carbonero, 2005). The ZVP gene promoter lacks the RY consensus sequence and is not activated by transiently co-expressed FUS3 or ABI3. However, it still exhibits seed-specific activity when stably integrated as a transgene. The specificity of this promoter may require its full integration into the genome to ensure proper chromatin packaging, as has also been suggested for the phaseolin gene promoter (Li et al., 2001; Ng et al., 2006). Alternatively, it may depend on a regulatory pathway that is independent of B3 domains and RY motifs. The bZIP, MYB and DOF factors are good candidates for such a pathway, especially as three ACGT consensus sequences are found in the ZVP gene promoter. This suggests the existence of two separate, ancient regulatory pathways (the B3/RY pathway and a bZIP/ACGT pathway), which have become merged in the developing Arabidopsis embryo and the cereal endosperm (Vicente-Carbajosa and Carbonero, 2005). Taken together, our data demonstrate a high level of conservation of the spore- and seed-specific gene regulatory pathway involving the interaction between B3 domain-containing transcription factors and RY motifs. The system represents an example of the concerted evolution of protein coding regions and the corresponding regulatory promoter sequences.
PCR, DNA digestion and ligation were performed according to standard molecular biology protocols (Sambrook and Russell, 2001). Purified M. struthiopteris, G. biloba, Z. furfuracea and P. polycephalum genomic DNA was used to isolate, via genome walking, the promoter regions of the genes encoding, respectively, a vicilin-like protein (Z54364.1), a legumin-like protein (Z50778.1), a vicilin-like protein (Z50791.1) and spherulin 1b (M18429.1). Genomic DNA of M. struthiopteris was digested with StuI, DraI, EcoRV or ScaI, that of G. biloba with PvuII or SmaI, that of Z. furfuracea with PvuII or DraI, and that of P. polycephalum with Ecl136II or EcoRV. The restriction fragments were ligated to the adaptor sequence 5′-GTAATACGACTCACTATAGGGCACGCGTGGTCGACGGCCCGGGCTGGT-3′ (BD GenomeWalker universal kit, BD Biosciences; http://www.bdbiosciences.com) and used for genome-walking PCR with Advantage 2 polymerase mix (Advantage 2 PCR kit; BD Biosciences), with cycling parameters according to the manufacturer’s protocol. The adaptor primers AP1 5′-GTAATACGACTCACTATAGGGC-3′ and AP2 5′-ACTATAGGGCACGCGTGGT-3′ were used as forward primers for the primary and secondary (nested) genome-walking reactions, respectively. Reverse gene-specific primers matched 28–30 bp of coding sequence approximately 100–150 bp downstream of the ATG start codon. All promoter fragments were re-amplified by PCR to add a HindIII site at the 5′ end and a PstI or SalI site at the 3′ end. PCR amplification was performed using Pfu polymerase (Stratagene, http://www.stratagene.com/), and selected amplicons were isolated from agarose gels using a NucleoSpin Extract II kit (Macherey-Nagel; http://www.macherey-nagel.com). A SureClone ligation kit (Amersham Pharmacia Biotech, http://www5.amershambiosciences.com/) was used for blunt-end cloning into the pUC18 vector.
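The two adaptor primers are designed so that the secondary reaction primes from a site nested inside the primary product. A short Python check of that relationship, using only the adaptor and primer sequences quoted above (the helper name is hypothetical), confirms that both primers lie on the adaptor's top strand and that AP2 ends 9 bp further 3′, that is, closer to the ligated genomic DNA, than AP1.

```python
# Sequences as given in the text (5' -> 3').
ADAPTOR = "GTAATACGACTCACTATAGGGCACGCGTGGTCGACGGCCCGGGCTGGT"
AP1 = "GTAATACGACTCACTATAGGGC"   # primary PCR
AP2 = "ACTATAGGGCACGCGTGGT"      # secondary (nested) PCR


def primer_span(template: str, primer: str) -> tuple:
    """0-based, half-open (start, end) of `primer` on the adaptor's top strand."""
    start = template.find(primer)
    if start == -1:
        raise ValueError(f"{primer} not found in template")
    return start, start + len(primer)


ap1, ap2 = primer_span(ADAPTOR, AP1), primer_span(ADAPTOR, AP2)
# A nested primer should start and end further 3' (closer to the genomic insert).
is_nested = ap2[0] > ap1[0] and ap2[1] > ap1[1]
print(ap1, ap2, is_nested)  # (0, 22) (12, 31) True
```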
A DNA ligation kit (Stratagene) was used for sticky-end ligation of the HindIII–PstI promoter fragments into the vector pGUS1 cut with HindIII and PstI (Bäumlein et al., 1991a,b). Plasmids were isolated using a FlexiPrep kit (Amersham) for small-scale preparations and a QIAquick plasmid isolation kit (Qiagen, http://www.qiagen.com/) for midi-preps. DNA fragments were sequenced by primer walking using a 3730xl DNA sequencer (Applied Biosystems; http://www.appliedbiosystems.com).
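Directional HindIII–PstI cloning only works if the promoter fragment itself carries no internal HindIII or PstI (or, where used, SalI) sites. A minimal pre-cloning check is sketched below in Python. It is an illustration rather than part of the published procedure, the function name is hypothetical, and it is meant to be run on the insert before the terminal sites are added. Because all three recognition sequences are palindromic, scanning one strand is sufficient.

```python
# Recognition sequences (all palindromic, so scanning one strand is enough).
SITES = {"HindIII": "AAGCTT", "PstI": "CTGCAG", "SalI": "GTCGAC"}


def internal_sites(insert: str, enzymes=("HindIII", "PstI")) -> dict:
    """0-based start positions of each enzyme's site inside a promoter insert."""
    seq = insert.upper()
    hits = {}
    for name in enzymes:
        site = SITES[name]
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq.startswith(site, i)]
        if positions:
            hits[name] = positions
    return hits


print(internal_sites("AAGCTTGGGCTGCAG"))  # {'HindIII': [0], 'PstI': [9]}
print(internal_sites("ACGT" * 5))          # {}
```

An empty result means the fragment can be cut at its added ends without being cleaved internally.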
Promoter sequence analysis
Promoter sequences for the MVP, GNK and ZVP genes were obtained and analysed in this study. Previously published sequences (V. faba USP and LeB4 legumin; Bäumlein et al., 1991a,b) were used as positive controls. All promoters were truncated at the 5′ end to the length of the shortest of them (GNK, 424 bp). The Arabidopsis thaliana control set was obtained by extracting 424 bp of upstream sequence for all GenBank-annotated mRNAs (chromosome I, NC_003070.5; chromosome II, NC_003071.3; chromosome III, NC_003074.4; chromosome IV, NC_003075.3; chromosome V, NC_003076.4), ending 1 bp upstream of the presumed translation start site. For genes with multiple putative coding sequences, the promoter region associated with the most upstream translation start site was chosen. Sequences that overlapped the presumed coding region of another gene, or whose annotation was ambiguous (e.g. no start codon at the annotated translation start site), were excluded. Two transcription factor binding motifs were considered: the TATA motif (consensus sequence TATAWAW, where W is A or T) and the RY motif (consensus sequence CATGCA). The former is the common TATA box core of eukaryotic promoters (Bucher, 1990; Shahmuradov et al., 2003).
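The extraction rule for the control set (a 424 bp window ending 1 bp before the translation start, read in the gene's own orientation) and the 5′-end truncation of the test promoters can be sketched in Python. The coordinate convention and function names are assumptions made for illustration, and the exclusion of overlapping or ambiguously annotated genes described above is not implemented here.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def upstream_window(chrom: str, atg_a: int, strand: str, length: int = 424) -> str:
    """Return `length` bp ending 1 bp upstream of the ATG, written 5' -> 3'.

    Hypothetical coordinate convention (0-based, on the chromosome's forward
    strand): `atg_a` is the index of the A of the ATG. On the '-' strand that
    A appears as a T on the forward strand and the gene reads leftwards, so
    its upstream region lies to the right of `atg_a`.
    """
    if strand == "+":
        start, end = atg_a - length, atg_a
    elif strand == "-":
        start, end = atg_a + 1, atg_a + 1 + length
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(chrom):
        raise ValueError("window runs past the end of the chromosome")
    window = chrom[start:end]
    return window if strand == "+" else reverse_complement(window)


def truncate_promoter(promoter: str, length: int = 424) -> str:
    """Keep the 3' (ATG-proximal) `length` bp, i.e. truncate at the 5' end."""
    return promoter[-length:]


print(upstream_window("GGGCCCATGAAA", 6, "+", length=3))  # CCC
print(upstream_window("TTTCATCCCGGG", 5, "-", length=3))  # GGG
```

Applying the same window length to the test promoters and the control set keeps motif frequencies comparable between the two groups.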
In vitro mutagenesis
The RY motif in the MVP and GNK promoters was mutagenized in vitro using the QuikChange site-directed mutagenesis kit (Stratagene) with two primer pairs (MVP forward, 5′-TGGGTGTCGCGCATGGTGCAGCGCAATGTGGTG-3′ and MVP reverse, 5′-CACCACATTGCGCTGCACCATGCGCGACACCCA-3′; GNK forward, 5′-CGAGGAGGGGGCTGGCATGGTAGATAGCGGAG-3′ and GNK reverse, 5′-CTCCGCTATCTACCATGCCAGCCCCCTCCTCG-3′), each designed to introduce two non-wild-type nucleotides within the RY motif. These substitutions change the motif from CATGCA to CATGGT, which eliminates the SphI restriction site (GCATGC) overlapping the motif. To remove the non-mutated parental DNA template, the amplification product was digested with the methylation-dependent endonuclease DpnI in addition to SphI. Putative mutant plasmids were re-sequenced to confirm incorporation of the mutation.
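Because QuikChange primer pairs are fully complementary to each other and differ from the template only at the intended bases, the primer sequences quoted above can be checked directly. The Python sketch below (function names hypothetical) verifies that each reverse primer is the reverse complement of its forward partner, that the mutant CATGGT motif is present, and that the SphI site GCATGC is absent from the mutant but reappears once the motif is changed back to CATGCA. The ‘wild-type’ sequence here is inferred from the primer, not taken from the genomic sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SPHI_SITE = "GCATGC"

# Mutagenic primer pairs as given in the text (5' -> 3').
PRIMERS = {
    "MVP": ("TGGGTGTCGCGCATGGTGCAGCGCAATGTGGTG",
            "CACCACATTGCGCTGCACCATGCGCGACACCCA"),
    "GNK": ("CGAGGAGGGGGCTGGCATGGTAGATAGCGGAG",
            "CTCCGCTATCTACCATGCCAGCCCCCTCCTCG"),
}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def check_pair(forward: str, reverse: str) -> dict:
    """Sanity checks on a complementary mutagenic primer pair."""
    # Inferred wild type: undo the two substitutions (CATGGT -> CATGCA).
    wild_type = forward.replace("CATGGT", "CATGCA", 1)
    return {
        "complementary": reverse == reverse_complement(forward),
        "mutant_motif": "CATGGT" in forward,
        "sphI_in_mutant": SPHI_SITE in forward,
        "sphI_in_wild_type": SPHI_SITE in wild_type,
    }


for gene, (fwd, rev) in PRIMERS.items():
    print(gene, check_pair(fwd, rev))
# Both pairs are complementary, carry CATGGT, lack GCATGC,
# and regain GCATGC when the motif is reverted to CATGCA.
```

The last check illustrates why SphI digestion can be used alongside DpnI: only template that still carries the wild-type motif is cut.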
Protoplast transformation and transient expression
Arabidopsis thaliana suspension cultures provided a source of protoplasts, which were isolated as follows. The cell walls were digested using 1% cellulase R10 and 0.5% macerozyme R10 (Duchefa Biochemie; http://www.duchefa.com), and the resulting protoplasts were centrifuged (100 g) and washed twice in W5 medium (0.9% NaCl, 1.8% CaCl₂, 0.04% KCl, 0.1% glucose, pH 5.6). They were then concentrated to a density of approximately 3 × 10⁶ cells ml⁻¹ in 0.45 M mannitol, 15 mM MgCl₂, 0.1% MES, pH 5.6. Transformation was performed by adding 5 μg of plasmid and 160 μg of carrier DNA to 330 μl of the protoplast suspension (approximately 1 × 10⁶ cells), equilibrating for 10 min, and then adding an equal volume of PEG solution (40% PEG 6000, 0.1 M Ca(NO₃)₂, 0.4 M mannitol, 0.1% MES, pH 6.5). After 20 min, the protoplasts were diluted into 4 ml K3 medium (Nagy and Maliga, 1976), incubated in small Petri dishes for 16–18 h at 25°C in the dark, and harvested. GUS activity was determined by chemiluminescence assay using a Tropix GUS-Light™ kit (Applied Biosystems) and a Lumat LB9501 luminometer (Berthold; http://www.bertholdtech.com). A control construct consisting of the CaMV 35S promoter driving the GUS reporter gene was used to standardize the experiments, as it is expressed efficiently in this system. Three replicates of each experiment were performed, using independent plasmid preparations and independently isolated batches of protoplasts.
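Two pieces of arithmetic underlie this protocol: the number of cells per transformation (330 μl at about 3 × 10⁶ cells ml⁻¹ gives about 9.9 × 10⁵, i.e. roughly 1 × 10⁶ cells) and the standardization of GUS signals against the 35S::GUS control. The text does not say exactly how the 35S control was used for standardization; the Python sketch below assumes a per-replicate ratio of test to 35S luminescence, averaged over the three independent replicates. Function names and luminometer readings are hypothetical.

```python
from statistics import mean, stdev


def cells_in_aliquot(volume_ul: float, density_per_ml: float) -> float:
    """Number of protoplasts in an aliquot (1 ml = 1000 ul)."""
    return volume_ul / 1000.0 * density_per_ml


def relative_gus(test_rlu: list, ref_35s_rlu: list) -> tuple:
    """Mean and SD of test/35S::GUS luminescence ratios, replicate by replicate.

    Assumption: one 35S::GUS reference measurement per independent replicate,
    listed in the same order as the test measurements.
    """
    if len(test_rlu) != len(ref_35s_rlu) or len(test_rlu) < 2:
        raise ValueError("need >= 2 paired replicates")
    ratios = [t / r for t, r in zip(test_rlu, ref_35s_rlu)]
    return mean(ratios), stdev(ratios)


print(round(cells_in_aliquot(330, 3e6)))  # 990000, i.e. roughly 1 x 10^6 cells
# Invented luminometer readings (RLU) for three independent replicates.
print(relative_gus([2000, 4000, 6000], [10000, 10000, 10000]))
```

Pairing each test reading with the 35S reading from the same protoplast batch lets batch-to-batch differences in transformation efficiency cancel out of the ratio.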
An Agrobacterium tumefaciens-mediated leaf disc procedure was used to transform tobacco, as described by Bäumlein et al. (1991a,b). Arabidopsis thaliana was transformed using the vacuum infiltration method, as described by Bechtold et al. (1993).
Reporter gene detection
GUS staining followed standard procedures (Jefferson et al., 1987) and was visualized using either an Axioplan 2 light microscope (Zeiss, http://www.zeiss.com/) or a StereoLumar V12 binocular microscope (Zeiss). Quantitative GUS assays were performed using the Tropix GUS-Light kit (Applied Biosystems). Protein concentrations were determined using Bradford reagent (Bio-Rad, http://www.bio-rad.com/) with BSA as a standard.
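Quantitative GUS values are usually expressed per unit of total protein, so that extracts of different concentration can be compared, and then as a fold change relative to non-transformed plants, as in the comparison of ZVP promoter activity with the control. A minimal Python sketch follows. It assumes a linear Bradford standard curve over the BSA range used and a per-microgram normalization (the units are not stated here); all function names and numbers are invented for illustration.

```python
def fit_standard_curve(bsa_ug: list, absorbance: list) -> tuple:
    """Least-squares line A = slope * protein + intercept through the BSA standards."""
    n = len(bsa_ug)
    mx, my = sum(bsa_ug) / n, sum(absorbance) / n
    sxx = sum((x - mx) ** 2 for x in bsa_ug)
    sxy = sum((x - mx) * (y - my) for x, y in zip(bsa_ug, absorbance))
    slope = sxy / sxx
    return slope, my - slope * mx


def protein_from_absorbance(a: float, slope: float, intercept: float) -> float:
    """Invert the standard curve (valid only inside the linear range of the assay)."""
    return (a - intercept) / slope


def specific_activity(gus_signal: float, protein_ug: float) -> float:
    """GUS signal per microgram of total protein (assumed normalization)."""
    return gus_signal / protein_ug


# Invented example values.
slope, intercept = fit_standard_curve([0, 2, 4, 6], [0.0, 0.1, 0.2, 0.3])
p_transgenic = protein_from_absorbance(0.15, slope, intercept)  # about 3 ug
p_wild_type = protein_from_absorbance(0.30, slope, intercept)   # about 6 ug
transgenic = specific_activity(6000.0, p_transgenic)
wild_type = specific_activity(3000.0, p_wild_type)
print(round(p_transgenic, 6), round(transgenic / wild_type, 6))  # 3.0 4.0
```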
We are grateful to Elke Liemann and Sabine Skiebe for their excellent technical assistance, and to Professors A.D. Shutov (State University of Moldova) and K. Müntz for their long-term collaboration on the evolution of seed globulins. Financial support from the Leibniz Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK), the Deutsche Forschungsgemeinschaft (DFG grant to I.K.) and the Federal Ministry of Education and Research (BMBF grant number 0312706A) is acknowledged. We thank http://www.smartenglish.co.uk for linguistic help in the preparation of this manuscript.