The completion of the Arabidopsis genomic sequence offers the possibility to extract global information about regulatory mechanisms. Here, we describe a data mining strategy in combination with gene expression analysis to identify bona fide genes regulated by the E2F transcription factor. Starting with a genome-wide search of chromosomal sites containing E2F-binding sites, we studied in depth two of the most abundant E2F-binding sites within the Arabidopsis genome and identified over 180 potential E2F target genes. Among them and in addition to cell cycle-related genes, we have also identified genes belonging to other functional categories, e.g. transcription, stress and defense or signaling. We have determined the expression levels of genes selected from different categories under two experimental situations. Using cultured cells partially synchronized with aphidicolin, we found that most potential E2F targets identified in silico show a cell cycle-regulated expression pattern with a peak in early/mid S-phase. In addition, we used Arabidopsis transgenic plants expressing a DP gene containing a truncated DNA-binding domain, which likely has a dominant-negative effect on AtE2Fa, b and c (also named AtE2F3, 1 and 2, respectively), which require DP for efficient DNA binding. Contrary to the up-regulation observed in early/mid S-phase-cultured cells, the expression of a large number of potential E2F targets was decreased in the transgenic plants. Our results strongly support that the RBR/E2F pathway plays a crucial role in regulating the expression of the genes identified in this study.
Cell cycle progression is universally controlled by the sequential and coordinated activity of cyclin-dependent kinases (CDKs). One of the key stages in eukaryotic cell cycle control is the G1/S transition. In higher eukaryotes, the retinoblastoma (RB)/E2F/DP pathway is crucial because E2F/DP transcription factors regulate the expression of genes required for the G1/S transition and S-phase progression (Trimarchi and Lees, 2002). Hypophosphorylated forms of RB family members negatively regulate E2F/DP activity, whereas CDK/cyclin complexes inactivate RB function, thus releasing E2F/DP activity (Harbour and Dean, 2000; Trimarchi and Lees, 2002). Recent evidence demonstrates that E2Fs/DPs are important for regulating gene expression not only in proliferating cells but also during differentiation, development and the apoptotic response (Muller et al., 2001; Ren et al., 2002; Wells et al., 2002), although the underlying mechanisms are far from being fully understood. Genomic approaches carried out in human cells have uncovered a relatively large number of genes that are regulated by E2F/DP proteins, although no information could be extracted as to whether these genes are direct or indirect E2F targets (Ishida et al., 2001; Muller et al., 2001).
Significant progress has been made recently towards understanding cell cycle regulation in plants, where, somewhat surprisingly, core cell cycle components are similar to those present in metazoa (de Jager and Murray, 1999; Gutierrez, 1998; Gutierrez et al., 2002; Mironov et al., 1999). However, plants have unique growth and developmental characteristics and, consequently, it is conceivable that, in addition to sharing the basic cell cycle regulatory strategies, they have evolved unique pathways. For example, plant cells have some properties in common with stem cells, as they can dedifferentiate, enter the cell cycle, proliferate actively and, eventually, differentiate into a variety of cell types and regenerate a whole plant body. This reinforces the concept of the plasticity of plant cells in terms of cell cycle regulation.
The publication of the entire Arabidopsis genomic sequence (Arabidopsis Genome Initiative, 2000) has been instrumental in defining the presence of 1 RBR, 6 E2F and 2 DP genes (De Veylder et al., 2002; de Jager et al., 2001; Kosugi and Ohashi, 2002a,b; Magyar et al., 2000; Mariconti et al., 2002; Rossignol et al., 2002; Vandepoele et al., 2002). Interestingly, while AtE2Fa–c (also named AtE2F3, 1 and 2, respectively) require heterodimerization with AtDPs for efficient DNA binding and regulation of reporter gene expression (Kosugi and Ohashi, 2002a; Mariconti et al., 2002), the other three, atypical AtE2F family members, which have a duplicated DNA-binding domain, bind DNA efficiently in the absence of DP heterodimerization (Kosugi and Ohashi, 2002b). Although one of the primary benefits of genomic information is gene identification, studies on the regulation of gene expression could also benefit significantly because potential binding sites for transcription factors, e.g. E2F/DP, within putative promoter regions can be identified. However, this information is currently lacking, as it is not directly available from raw genomic sequence information. Thus, we reasoned that systematic data mining, combined with direct evaluation of the mRNA levels of target genes, could be a useful approach to complement microarray-derived information.
Here, we have carried out a genome-wide search for direct E2F/DP target genes in Arabidopsis. In this study, we have focused on one typical, well-defined E2F-binding site (TTTCCCGCC) and on a variant site (TCTCCCGCC). The three main conclusions are: (1) the approach can be extended to other E2F-binding sites or other transcription factors, (2) in addition to cell cycle and DNA replication genes, genes of other functional categories also contain putative E2F sites, and (3) most of the genes identified, but not all, are up-regulated at the G1/S transition or in mid S-phase and downregulated in Arabidopsis transgenic plants expressing a truncated DP protein, which has a dominant negative effect because it binds E2F but inhibits DNA binding of the E2F/DP heterodimer. The set of putative E2F targets identified in this study can be used as pathway-specific genes whose expression can be monitored under a variety of experimental conditions and genetic backgrounds.
Results and discussion
A preliminary search in the Arabidopsis genome using the degenerate E2F-binding site TTTSSSSSS (S being C or G) yielded several thousand hits. Inspecting each of them individually was impractical. To narrow the criteria for a refined search, we used the following rationale. The DNA-binding domain is highly conserved between human and plant E2Fs and DPs, including the Arabidopsis proteins (Figure 1a), in particular the α3-helices, which are directly involved in DNA contacts (Zheng et al., 1999). In agreement with the structural data, changes within the first three nucleotides of the binding motif (TTT) are better tolerated than changes in the internal CG dinucleotide at positions 6 and 7 (Ouellette et al., 1992; Tao et al., 1997). These characteristics are conserved in plant E2F/DP proteins, as revealed by EMSA (Figure 1b). Thus, it is conceivable that optimal binding sites for Arabidopsis E2Fs have similar sequence requirements.
We searched the Arabidopsis MIPS database for the presence of all possible combinations of the sequence TTTSSCGSS (excluding the T stretch; Table 1). The total number of sites (>9300) was approximately threefold higher than randomly expected, suggesting that these motifs may have been positively selected during evolution. Among all sites, TTTCCCGCC and TTTGGCGGG were the most abundant (25.8 and 24.9%, respectively; Table 1). It is known that a shorter sequence (TTTCCCGC) is able to direct binding of E2F/DP (Castellano et al., 2001; Chabouté et al., 2000; Mariconti et al., 2002). However, in vitro selection of oligonucleotides that bind to human E2F/DP revealed that the most frequently bound sequence is TTTCCCGCC (Ouellette et al., 1992). This motif binds E2Fs and is present in known E2F-regulated genes, such as the human c-myc (Wang et al., 2000), Drosophila DNA polymerase α (Yamaguchi et al., 1997), tobacco ribonucleotide reductase RNR1b (Chabouté et al., 2002) and PCNA (Egelkrout et al., 2001) genes. The other abundant site, TTTGGCGGG, has not yet been found to mediate E2F-dependent transcription in any organism. Thus, based on these observations, we focused first on the extended TTTCCCGCC site for a detailed survey of genes containing it in their putative promoters.
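The enumeration of TTTSSCGSS variants and the scoring of hits in both orientations can be illustrated with a minimal Python sketch. It is not the procedure used against the MIPS database; the function names (`expand_motifs`, `count_sites`) and the toy sequence are assumptions introduced only for illustration, and a hit on the reverse strand is credited to the motif as written in the forward orientation.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def expand_motifs(pattern="TTTSSCGSS"):
    """Expand each S (C or G) into every concrete sequence."""
    choices = [("C", "G") if ch == "S" else (ch,) for ch in pattern]
    return ["".join(combo) for combo in product(*choices)]


def count_sites(sequence, motifs):
    """Count occurrences of each motif on both strands (overlaps allowed)."""
    sequence = sequence.upper()
    counts = {}
    for motif in motifs:
        n = 0
        # A set avoids double counting if a motif equals its own reverse complement.
        for target in {motif, reverse_complement(motif)}:
            start = sequence.find(target)
            while start != -1:
                n += 1
                start = sequence.find(target, start + 1)
        counts[motif] = n
    return counts


# Hypothetical toy sequence: one TTTCCCGCC site in each orientation.
motifs = expand_motifs()
toy = "AATTTCCCGCCAAGGCGGGAAATT"
counts = count_sites(toy, motifs)
```

With four degenerate positions, the pattern expands to 16 concrete sequences, which is the set of variants a table such as Table 1 would enumerate.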
Table 1. Number of potential E2F-binding sites (TTTSSCGSS, S being C or G) within the Arabidopsis genome. Hits in both orientations were scored. Since promoters frequently contain more than one site, the total numbers of genes and sites were scored separately
Identification of putative E2F target genes by a refined database search
The analysis of the genomic context where the TTTCCCGCC site appears produced two main conclusions. One is that it was frequently found within stretches of repetitive elements, which did not appear concentrated in certain chromosomal locations, e.g. centromeric regions. Motifs with permutations of the same nucleotides were not abundantly represented in the genome. This suggests that they may actually bind E2F although their functional relevance remains to be determined. Second, the E2F sites, other than those in repetitive regions, were largely located upstream from putative ATG initiator codons (the A residue being position +1). The detailed survey allowed us to identify a total of 126 genes (Table S1) that contained the TTTCCCGCC motif in their 5′ non-coding regions. It should be kept in mind that genes containing only the core TTTCCCGC sequence were not retrieved in this search (except if they have a C after the core), as a full match was imposed. We also scored their location between positions −800 and +50. Interestingly, the distribution of E2F sites in the promoters was not random because 50 and 75% of the genes contained the E2F site within 200 and 400 bp, respectively, upstream from the putative ATG (Figure 2). This pattern is remarkably similar to that of a set of human promoters containing E2F elements, a characteristic feature of functionality (Kel et al., 2001).
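The positional analysis described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis actually performed: a site is located by its first base, the A of the putative ATG is position +1 with no position 0, each gene is represented by its site closest to the ATG, and the names `to_relative`, `site_positions` and `fraction_within`, as well as the example data, are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def to_relative(index, atg_index):
    """Convert a 0-based string index to a coordinate where the A of ATG is +1."""
    offset = index - atg_index
    return offset + 1 if offset >= 0 else offset


def site_positions(region, atg_index, motif="TTTCCCGCC"):
    """Positions (relative to the ATG) of motif hits on either strand."""
    region = region.upper()
    hits = []
    for target in {motif, reverse_complement(motif)}:
        start = region.find(target)
        while start != -1:
            hits.append(to_relative(start, atg_index))
            start = region.find(target, start + 1)
    return sorted(hits)


def fraction_within(positions_per_gene, window):
    """Fraction of genes whose closest upstream site lies within `window` bp of the ATG."""
    closest = [max(p for p in pos if p < 0)
               for pos in positions_per_gene
               if any(p < 0 for p in pos)]
    if not closest:
        return 0.0
    return sum(1 for p in closest if -p <= window) / len(closest)


# Hypothetical promoter: one site starting 39 bp upstream of the ATG.
upstream = "A" * 20 + "TTTCCCGCC" + "A" * 30
region = upstream + "ATGGCT"
atg_index = len(upstream)
positions = site_positions(region, atg_index)

# Hypothetical site positions for three genes.
genes = [[-39], [-250], [-500, 20]]
within_200 = fraction_within(genes, 200)
within_400 = fraction_within(genes, 400)
```

Applied to the real set of promoters, `fraction_within` with windows of 200 and 400 bp would give the kind of cumulative values reported in the text (50 and 75%).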
Recently, genes containing atypical variants of the E2F-binding site, e.g. TCTCCCGCC in the thymidine kinase gene (Dou et al., 1994), have been reported to be regulated by E2F, in agreement with structural data (Zheng et al., 1999). Thus, we also carried out a similar analysis with this atypical site that mediated DNA complex formation in vitro (Figure 1b). Such a site has not been identified yet in any plant gene. Our search yielded a set of 57 genes that contained at least one TCTCCCGCC site within their putative promoter regions. Also in this case, the distribution of this binding site along the promoters was not random (Figure 2).
Potential Arabidopsis E2F target genes belong to a variety of functional categories, not only to the cell cycle class of genes
The putative E2F target genes identified here can be classified according to the functional categories defined for the Arabidopsis genome. While 115 of the 183 genes could be ascribed to one of the proposed functional categories for genes (Figure 3), the rest appear as unknown or hypothetical proteins (Table S1).
The largest proportion of the genes identified belongs to the cell cycle and DNA replication (31.3%) and transcription (21.7%) categories. These values are significantly higher than the representation of these categories in the entire genome (12% and 17%, respectively; Arabidopsis Genome Initiative, 2000). This finding further supports the idea that the E2F sites found in promoters were not the consequence of a random distribution throughout the genome and that the in silico search is useful for identifying potential E2F target genes. One novel observation from our study is that, in addition to cell cycle, DNA replication and DNA repair genes, there is a significant number of genes that belong to other functional categories or were not previously defined, as has been reported for human cells (Ishida et al., 2001; Muller et al., 2001). These include genes such as the telomerase reverse transcriptase (TERT) gene, which is up-regulated in S-phase in tobacco cells (Tamura et al., 1999) and Arabidopsis (this work, see below), and the Arabidopsis homologue of the tumor suppressor BRCA2 gene which, as in human cells (Venkitaraman, 2002), may play a role in recombination and cell cycle checkpoint controls. In addition, genes involved in stress and defence-related functions, in signal transduction, in cellular biogenesis and in protein destination are also found (Figure 3). Future studies on specific genes should address the role of E2F in regulating their expression. It is also worth mentioning that genes belonging to some functional categories were not retrieved in this study, although some may contain E2F sites different from those studied here.
A more general outcome of our study is that the information derived from the putative E2F-regulated genes identified in Arabidopsis has allowed us to define genes in other organisms, e.g. humans, as putative primary E2F targets. Genomic approaches to identify E2F targets in the human genome have recently been reported (Ishida et al., 2001; Kel et al., 2001; Muller et al., 2001; Ren et al., 2002; Wells et al., 2002). In these cases, a microarray-derived collection of genes was obtained, although they may not necessarily be primary E2F targets. In most cases, sequence analysis of human gene promoters is not possible. However, some of the E2F-containing Arabidopsis genes identified here have an assigned homologue in the human genome. One class (Table 2A) contains genes previously identified as cell cycle-regulated by E2F (Helin, 1998). Except for tobacco RNR (Chabouté et al., 2000, 2002), PCNA (Egelkrout et al., 2001; Kosugi and Ohashi, 2002c) and AtCDC6 (Castellano et al., 2001; de Jager et al., 2001), there is no information available for plants. A second class (Table 2B) contains genes identified as E2F targets in human cells with a microarray approach (Ishida et al., 2001; Kalma et al., 2001; Kel et al., 2001; Muller et al., 2001). Among these, we found genes encoding the mismatch repair MSH6 protein, subunits of replication protein A (RPA) and replication factor C (RFC), and RAD51. In these cases, direct information about the human promoters is not available. However, the presence of E2F sites in the promoters of the Arabidopsis homologues identified in our study suggests that these genes may be direct E2F target genes also in human cells. A third class contains Arabidopsis genes whose human homologues have not been identified as putative E2F targets, such as ribosomal protein L10, the signalosome COP9 component or the RAD3 helicase (Table 2C).
Contrary to the previous class of genes, promoter information is available for the human counterparts and, interestingly, E2F sites could be found there. The fourth class contains genes of which no direct information about their cell cycle regulated expression is available, although some of them have a misregulated expression in tumor cells, e.g. HAT1, ISW1-like (Table 2D). In short, the presence of E2F-binding sites in the promoters of the Arabidopsis and human homologues strongly suggests that they can actually be E2F targets in both organisms.
Table 2. Human homologues of Arabidopsis genes identified in this work that contain E2F-binding sites in their promoters
Arabidopsis Protein Code (MIPS)
A. Human E2F-regulated genes (previously identified)
RNR large subunit
B. Human E2F-regulated genes (microarrays)
CCAAT transcription factor
DNA pol delta subunit
Replication factor C-like (RFC-5)
Replication protein A1
C. Unknown human genes with E2F sites in their promoters
Quinone oxidoreductase-like protein
Replication factor C-like (RFC-2)
RPL10 ribosomal protein
D. No direct information about cell cycle regulation
DNA-directed RNA polymerase
Pyridoxal kinase-like protein
Cell cycle regulated expression of putative Arabidopsis E2F target genes
To confirm that the database search served to identify cell cycle-regulated genes, the expression of selected genes chosen from different functional categories was analysed by semiquantitative RT–PCR. We compared the level of expression of these genes in three experimental conditions: asynchronously growing Arabidopsis cultured cells, cells treated with aphidicolin, and cells 4 h after release from the aphidicolin block. The latter two populations were largely accumulated in G1/early S-phase and mid/late S-phase, respectively, based on the mRNA levels of histone H4 (Figure 4), a well-known marker of S-phase (Reichheld et al., 1998). For each gene monitored, conditions were set up to ensure that the amplification reactions were not saturated (data not shown). Quantitation of the relative amounts in each case allowed us to conclude that, except for a few genes whose mRNA levels did not change, most of them were expressed in a cell cycle-dependent manner (Figure 4). We did not find any apparent correlation between their pattern of expression and the functional categories to which they belong. Furthermore, most of the genes analyzed were actually up-regulated with a peak either in early or mid S-phase, although a few of them were downregulated, e.g. a SKP6-related and a DNA helicase. In addition, we included other Arabidopsis genes that are likely E2F targets, although they do not contain the TTTCCCGCC sequence in their promoters (Figure 4, asterisks). In agreement with previous data, AtCDC6 (At2g29680) is up-regulated in aphidicolin-arrested cells and in early S-phase (Figure 4; Castellano et al., 2001). AtRNR1 (At2g21790) was also found to have a peak of expression in early/mid S-phase (Figure 4), while PCNA (At2g29570) did not show detectable changes in mRNA levels. This is possibly due to the complex regulation of this gene that has been reported in tobacco and rice (Egelkrout et al., 2001; Kosugi and Ohashi, 2002c).
Expression of E2F target genes in Arabidopsis expressing a truncated DP gene
To evaluate the expression of E2F target genes in planta, we decided to generate transgenic Arabidopsis plants where E2F function is impaired. To achieve this we used a truncated DP gene lacking the α1 helix of the DP DNA-binding domain (Zheng et al., 1999) that is highly conserved (DPΔBD; Figure 5a). Expression of a similar DPΔBD gene in animal cells has a dominant negative effect on E2F proteins that heterodimerize efficiently with DP (Wu et al., 1996). We showed that both full-length DP and the truncated form DPΔBD interact efficiently with E2F of plant and human origins (Figure 5b). This conserved E2F/DP interaction is consistent with the high homology among different DP proteins and suggests that DPΔBD may well interfere with the activity of AtE2Fa/b/c in planta, as these three AtE2Fs share a similar domain organization (de Jager et al., 2001). We also confirmed that DP and DPΔBD bind E2F with comparable efficiency in pull-down assays with purified proteins (Figure 5c) and showed that E2F/DPΔBD heterodimers are impaired in binding to a DNA containing the consensus E2F-binding site (Figure 5d). Therefore, we generated Arabidopsis transgenic plants expressing an HA-tagged version of DPΔBD under the control of the 35S CaMV promoter. Multiple lines with a single insertion were recovered and used to obtain homozygous lines for further analysis. Transgene expression was confirmed by RT–PCR (Figure 5e), because we have, so far, been unable to detect transgene expression at the protein level (neither by western nor immunoprecipitation assays), suggesting that it is present at very low levels. The DPΔBD transgenic plants used here did not show any gross morphological changes compared with either wild type or plants transformed with an empty vector. We chose two lines (L5.1 and L7.1) to evaluate in detail the expression levels of putative E2F target genes by RT–PCR. 
Contrary to the expression results obtained in aphidicolin-treated cells, where an increased expression level was detected for most of the target genes, their expression level in DPΔBD transgenic plants was significantly decreased (Figure 6). Furthermore, an inverse correlation between the two experimental situations (up-regulated in synchronized cells and downregulated in transgenic plants) was observed in the expression levels of genes belonging to different categories, e.g. ORC1 and RPA1 (cell cycle), HSP17 (stress), CCAAT factor, Myb and BRCA2 (transcription), or MSI3 (signal transduction). This strongly suggests that the DPΔBD construct is actually working as a dominant negative of one (or more) AtE2Fs, most likely of the AtE2Fa, b and c (also named AtE2F3, 1 and 2, respectively) class that interact efficiently with DP and need heterodimerization for DNA binding and regulation of gene expression (Kosugi and Ohashi, 2002a, 2002b; Magyar et al., 2000; Mariconti et al., 2002). Interestingly, both AtCDC6 and AtRNR1 were downregulated in transgenic plants (Figure 6). In this context, it is worth mentioning that Arabidopsis plants overexpressing E2Fa/DPa have elevated levels of AtCDC6 (De Veylder et al., 2002) while overexpression of E2Fc leads to downregulation of this gene (Del Pozo et al., 2002).
A genome-wide search for E2F/DP target genes in Arabidopsis has been carried out by identifying the chromosomal locations of E2F-binding sites, with a special focus on two very abundant sites (TTTCCCGCC and TCTCCCGCC). The three main conclusions are: (1) the strategy used here can be extended to other E2F-binding sites or even to other transcription factors, (2) genes belonging to different functional categories, not only those involved in regulation of the cell cycle and DNA replication, contain E2F-binding sites in their promoters, and (3) most of the genes identified, although not all, are up-regulated at the G1/S transition or in mid S-phase and downregulated in transgenic plants expressing a DP protein lacking part of the DNA-binding domain, which has a dominant negative effect on E2F binding. It is worth mentioning that more than 60% of the genes with a clear misregulation in the two experimental situations tested here have a cluster of two (or more) E2F consensus sites in their promoter (Table S1). In any case, it should be kept in mind that promoters without an E2F consensus site may also be, perhaps indirectly, E2F targets through interaction with other cellular factors (Weinmann et al., 2001, 2002). Our results strongly support the conclusion that the RBR/E2F pathway is crucial for the expression of the genes identified in this work, which can be used as pathway-specific genes whose expression can be monitored under a variety of experimental conditions and genetic backgrounds.
AGTGCAGTCGACAGCCCCCTTCCTTTG and SK primers, and GAGACAGTCGACGAAGCCAAAGGAAGAACAACA and T7 primers were used with plasmid pBS-TmDP (Ramirez-Parra and Gutierrez, 2000) to amplify DNA fragments that were digested with SalI and fused to generate the TmDPΔBD mutant lacking amino acids 48–78, located in the DNA-binding domain (see Figure 5a). Yeast two-hybrid assays were carried out as described (Ramirez-Parra et al., 1999). Plasmid pGBT-TmDPΔBD was generated by cloning TmDPΔBD in-frame into the pGBT8 vector (Clontech). Plasmid pGAD-AtE2F2 was constructed by cloning the full-length AtE2F2 cDNA (MIPS code At1g47870) in-frame into the pACT2 vector. Plasmid pGAD-HuE2F1 was provided by N. LaThangue and S. de la Luna. Pull-down experiments were done as previously described, using the polyclonal serum against TmE2F for detection (Ramirez-Parra et al., 1999). Plasmid pMBP-TmDPΔBD was constructed by cloning TmDPΔBD in-frame into the pMal-c2 vector (New England Biolabs) and transferred to Escherichia coli BL21(DE3), and the recombinant protein was purified using amylose beads (New England Biolabs).
Electrophoretic mobility shift assay (EMSA)
EMSA was carried out using purified GST-TmE2F, MBP-TmDP or MBP-TmDPΔBD, as described (Ramirez-Parra and Gutierrez, 2000). Oligonucleotides containing E2F sites (underlined) were: 5′AATCCGCTTTCCCGCCAATTCGACACCATA; 5′GCGTCTCATAATTTCGGCCCAAATCTTTTT; 5′CAATCAAGGAATTTGCGGCGACAATATGAA; 5′TCTCTGTTCCACTTCCGGCGATGTATTATA; 5′AGACCTTTAATCTCCCGCCTCTTTCACACC, and 5′ATTTAAGTTTCGCGCCCTTTCTCAA, plus the reverse oligonucleotides.
Plant cell culture
An Arabidopsis thaliana cell suspension culture was used. For partial synchronization, 8-day-old cultures were diluted 1 : 8 into fresh MSS medium plus aphidicolin (15 µg ml−1). After 24 h, the cells were washed and transferred to a medium without aphidicolin and collected for RNA extraction at the indicated times.
TmDPΔBD was cloned in-frame with the HA epitope using the pPily vector (Ferrando et al., 2000) and then into the pROK2 binary vector for transformation of Agrobacterium tumefaciens C58CRifR. To generate transgenic CaMV35S-TmDPΔBD overexpressing lines, A. thaliana (Columbia-0 ecotype) was transformed by the floral dip method (Clough and Bent, 1998). Transformed seedlings (T0 generation) were selected on MS agar plates containing 50 µg ml−1 kanamycin and transferred to soil. T2 homozygous plants were selected for further analysis. Transgenic and control (transformed with the empty pROK2 vector) Arabidopsis seedlings were grown on MS for 10 days and collected for RNA extraction.
RNA extraction and RT–PCR
Total RNA was extracted using Trizol reagent (Invitrogen) and RT–PCRs were carried out with the ThermoScript RT System (Invitrogen), using oligo-dT and gene-specific primers (Table S2). The products were quantified with a GS-710 Calibrated Imaging Densitometer (BIO-RAD). The values were normalized relative to a constitutive loading control (the RNA helicase DHR1 gene). The fold-change was then calculated relative to the values of asynchronous cells in the cell cycle analysis (Figure 4) or of control plants in the transgenic analysis (Figure 6). We took a minimum 2-fold change in the relative levels of the RT–PCR products as the criterion for scoring gene expression as increased (I) or decreased (D). To evaluate reproducibility, the data were derived from two independent experiments and, in the case of the analysis of transgenic plants, two independent lines were used.
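The normalization and scoring scheme described above can be expressed as a short Python sketch. The function names, the densitometry values and the "NC" (no change) label are assumptions introduced for illustration; only the normalization to a loading control, the reference condition and the 2-fold criterion for I and D come from the text.

```python
def fold_change(sample_signal, sample_control, ref_signal, ref_control):
    """Normalize each band to its loading control, then divide by the reference condition."""
    normalized_sample = sample_signal / sample_control
    normalized_ref = ref_signal / ref_control
    return normalized_sample / normalized_ref


def classify(fc, threshold=2.0):
    """Score a fold-change as increased (I), decreased (D) or no change (NC, hypothetical label)."""
    if fc >= threshold:
        return "I"
    if fc <= 1.0 / threshold:
        return "D"
    return "NC"


# Hypothetical densitometry readings (arbitrary units):
# (gene signal, loading control) for the sample and for the reference condition.
up = fold_change(300.0, 100.0, 100.0, 100.0)      # three times the reference level
down = fold_change(40.0, 100.0, 100.0, 100.0)     # 0.4 times the reference level
steady = fold_change(150.0, 100.0, 100.0, 100.0)  # 1.5 times the reference level

calls = [classify(fc) for fc in (up, down, steady)]
```

In this scheme the reference would be the asynchronous culture for the cell cycle analysis and the empty-vector control plants for the transgenic analysis.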
The authors are indebted to J.A.H. Murray for the A. thaliana cell suspension culture, to N. LaThangue and S. de la Luna for plasmid pGAD-HuE2F1, to M. M. Castellano and S. Diaz-Triviño for advice on PCR amplification of ORC-related genes, to Gema Gomez-Mariano for technical assistance during the initial stages of this work, and to E. Martinez-Salas for comments on the manuscript. This work has been partially supported by grants BMC2000-1004 (MCyT) and 07G/0033/00 (CAM), and by an institutional grant from Fundación Ramón Areces.
Table S1 Arabidopsis genes containing E2F sites in their promoters. The sites searched were TTTCCCGCC and TCTCCCGCC, both in the direct and reverse orientations. The position of the E2F-binding site, relative to the putative ATG, and the presence of E2F consensus sites, other than those specifically searched, are also indicated
Table S2 Gene-specific oligonucleotides used for PCR amplification experiments described in Figures 4 and 6