Full-length cDNAs are very important for genome annotation and functional analysis of genes. The number of full-length cDNAs from maize (Zea mays L.) remains limited. Here we report the construction of a full-length enriched cDNA library from osmotically stressed maize seedlings by using the modified CAP trapper method. From this library, 2073 full-length cDNAs (accession numbers DQ244142–DQ246214) were collected and further analyzed by sequencing from both the 5′- and 3′-ends. A total of 1728 (83.4%) sequences did not match known maize mRNA and full-length cDNA sequences in the GenBank database and represent new full-length genes. After alignment of the 2073 full-length cDNAs with 448 maize BAC sequences, it was found that 84 full-length cDNAs could be mapped to the BACs. Of these, 43 genes (51.2%) have been correctly annotated from the BAC clones, 37 genes (44.0%) have been annotated with a different exon–intron structure from our cDNA, and four genes (4.76%) had no annotations in the TIGR database. Expression analysis of 2073 full-length maize cDNAs using a cDNA macroarray led to the identification of 79 genes upregulated by stress treatments and 329 downregulated genes. Of the 79 stress-inducible genes, 30 genes contain ABRE, DRE, MYB, MYC core sequences or other abiotic-responsive cis-acting elements in their promoters. These results suggest that these cis-acting elements and the corresponding transcription factors take part in plant responses to osmotic stress either cooperatively or independently. Additionally, the data suggest that an ethylene signaling pathway may be involved in the maize response to drought stress.
Although many attempts have been made to predict transcription units using genomic sequence data, the accuracy of these predictions remains rather limited. Coding regions are often interspersed with non-coding DNA in genome sequences, and an individual gene may encode several peptides due to alternative splicing. Thus, genomic sequences do not always correspond to a certain transcript and the corresponding proteins. A more direct and efficient approach to collect information on the coding sequences that entails the analysis of full-length cDNA sequences has recently been developed (Carninci et al., 2003; Haas et al., 2002). The results from these studies have demonstrated that use of the full-length cDNA sequences could improve the quality of multiple aspects of genome annotation (Castelli et al., 2004; Haas et al., 2002, 2003; Seki et al., 2002a).
Although Lai et al. (2004) published 3160 full-length cDNA sequences from a maize endosperm EST library, discovery of full-length cDNA sequences in maize is still lacking compared with rice and Arabidopsis. In a previous report, polyethylene glycol (PEG) stress was demonstrated to be an effective strategy for simulating drought stress conditions (Zheng et al., 2004). In order to collect more full-length cDNAs from maize and understand the gene expression profiles under drought stress, a full-length enriched cDNA library was constructed from PEG-treated maize seedlings using the CAP trapper method with some modifications (Carninci et al., 1996, 1997, 1998, 2000; Sugahara et al., 2001; Suzuki et al., 1997). From that library, 2073 full-length cDNAs were identified, fully sequenced, and annotated. These results could improve maize genome annotation and supplement current available public databases. In addition, we also assessed the gene expression patterns of these full-length cDNAs under PEG-6000 stress treatment using the macroarray.
Assessment of the cDNA library and collection of full-length cDNA sequences
The full-length cDNA library was constructed from 20% PEG-treated maize seedlings (inbred line Han 21) using the CAP trapper method with some modifications. The library was composed of 1.4 × 10⁶ independent clones, with an average size of approximately 0.75–1.2 kb. A total of 20 000 cDNA clones were randomly selected and sequenced from the 5′-end. After eliminating low-quality sequences and contaminating clones, 13 557 5′-end reads were generated and clustered into 5867 groups. The longest 5′-end sequence in each group was selected as a representative of that group. Ultimately, 2842 candidate full-length clones were selected according to the criterion that the subset of 5′ ESTs should originate at or upstream of the translation start site of the known protein (Strausberg et al., 2002). According to this criterion, 68% of the clones in this library were scored as ‘full-length’ and were submitted for full-length sequencing, and, of these, 2073 cDNAs that represent the complete coding regions and UTRs of the original transcripts were fully sequenced.
Features of 5′ and 3′ untranslated regions
The average length of the 5′ untranslated regions (UTRs) from the 2073 full-length cDNA sequences was 99 bp, with wide variations in size ranging from 1 to 754 bp. The frequencies of the stop codons UAA, UAG and UGA were 26.8%, 41.0% and 32.2% respectively. The overall GC content in the 5′ UTRs was 57.9%, which was higher than that in the coding regions (56.8%) and 3′ UTRs (41.5%). This result is consistent with observations from maize mRNA resources in the database (Figure 1).
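As a minimal illustration of how region-specific GC content of this kind can be computed, the following Python sketch assumes that each cDNA is available as a DNA string and that the coordinates of its coding region are already known. The function names and the toy sequence are hypothetical and are not taken from the study.

```python
def gc_content(seq):
    """Return the GC percentage of a nucleotide sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def region_gc(cdna, cds_start, cds_end):
    """Split a cDNA into 5' UTR, CDS and 3' UTR and report GC% for each.

    cds_start and cds_end are 0-based, end-exclusive coordinates of the
    coding region (start codon through stop codon); both are assumed known.
    """
    return {
        "5utr": gc_content(cdna[:cds_start]),
        "cds": gc_content(cdna[cds_start:cds_end]),
        "3utr": gc_content(cdna[cds_end:]),
    }


# Toy example (not a real clone): 4-nt 5' UTR, 9-nt CDS, 6-nt 3' UTR.
toy = "GGCC" + "ATGGCGTAA" + "ATTTTA"
print(region_gc(toy, 4, 13))
```

Averaging the per-clone values, or pooling all nucleotides of a region across clones, gives different summary figures; which of the two was used for the percentages above is not stated in the text.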
The average length of 3′ untranslated regions (UTRs) was 206 bp (not including the poly(A) tail; Figure 2). Almost all clones had an identifiable polyadenylation tail. Of those with a polyadenylation tail, 1378 sequences (66.5%) had polyadenylation signals. Of these, 299 (21.7%) had one of the two canonical polyadenylation signals, AAUAAA and AUUAAA, while 1079 sequences (78.3%) had an alternative polyadenylation signal. Table 1 shows the distribution of polyadenylation signal usage for those sequences with a polyadenylation tail. In some cases, the translation stop codon TAA was part of the polyadenylation signal (e.g. in clone DQ246126 and DQ245350).
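A search for canonical polyadenylation signals of the kind described above might be sketched as follows. This is an illustration only: the 50-nt search window, the function name and the example sequence are assumptions and are not taken from the study, and sequences are written in the DNA alphabet (T in place of U).

```python
CANONICAL = ("AATAAA", "ATTAAA")  # DNA forms of AAUAAA and AUUAAA


def find_polya_signal(utr3, window=50, signals=CANONICAL):
    """Return (signal, distance) for the listed signal closest to the 3' end.

    utr3    : 3' UTR as a DNA string with the poly(A) tail already removed
    window  : number of 3'-terminal nucleotides to search (assumed value)
    distance: nucleotides from the first base of the signal to the 3' end
    Returns None when no listed signal occurs in the window.
    """
    region = utr3.upper()[-window:]
    best = None
    for sig in signals:
        pos = region.rfind(sig)  # right-most occurrence of this signal
        if pos != -1 and (best is None or pos > best[1]):
            best = (sig, pos)
    if best is None:
        return None
    sig, pos = best
    return sig, len(region) - pos


# Hypothetical 3' UTR fragment, not a sequence from the library.
print(find_polya_signal("GCGCGCAATAAAGCTAGCTAGCTAGC"))
```

Alternative signals can be screened in the same way by passing a longer tuple as `signals`.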
Table 1. Distribution of alternative polyadenylation signals
Repeat structures were found in 297 full-length cDNAs. Among them, 266 simple sequence repeats were found, which were the most frequently observed repeat structures in the cDNAs (89.6%), while 14 retrotransposon repeats (4.7%) and 10 transposon repeats (3.3%) were also found, and an additional 3.3% of the cDNA sequences had other types of repeat. Among the repeat structures, 42.4% (126) were found in the coding regions, 35.0% (104) in the 5′ UTR and 22.6% (67) in the 3′ UTR.
Alignment of the 2073 full-length cDNAs to maize BAC sequences
The 2073 full-length cDNAs were aligned to 448 maize BAC sequences from the public databases (http://www.tigr.org). Eighty-four full-length cDNAs could be mapped to the BAC sequences, and the remaining 1989 cDNAs could not. Of the 84 mapped cDNAs, 43 genes (51.2%) have been correctly annotated from the BAC sequences, 37 genes (44.0%) have been annotated with a different structure from our cDNAs, and four genes (4.8%) had no annotations in the TIGR database. After comparing the cDNAs with the annotations of BAC sequences in TIGR, 65 cDNAs (77.4%) from our library were found to carry longer 5′- and 3′-ends. These results could improve maize genome annotation and supplement currently available public databases.
The results also showed that three BAC sequences (AC155364, AC155375 and AC155575) each had two full-length cDNA alignments, which may indicate different gene splicing forms (Figure 3). In Figure 3(a), the cDNA clone DQ244763 has a cryptic exon compared with clone DQ245110. They may produce different protein sequences. In Figure 3(b) and (c), the transcripts from alternative splicing have the same coding region, meaning that they code for the same protein.
Mapping of the 2073 full-length cDNAs onto maize genome survey sequences (GSSs), rice and Arabidopsis genomic sequences
The 2073 maize full-length cDNAs were aligned to the maize GSS assembly, and to rice and Arabidopsis genomic sequences, as described in Experimental procedures. Of the 2073 full-length cDNA sequences, 1644 (79.3%) could be aligned to the maize GSS loci at >95% identity over the entire length (Figure 4). The fact that the remaining 429 clones did not match any sequence is probably due to the incompleteness of the maize genomic sequence.
The clones were also mapped to the rice and Arabidopsis chromosomes by a homology search. The criteria for matching clones to the genome sequence were set at >60% coverage of clone lengths and 85% sequence identity over the length of coverage. These criteria meant that 1877 full-length cDNA sequences could be aligned to rice genomic sequences and 1507 sequences to the Arabidopsis genome. Taken together, 1116 sequences could be mapped to all three genomic sequences, and 1567 common full-length cDNAs were identified in maize and rice genomes. However, there were only 1126 cDNA sequences that were common between maize and Arabidopsis genomes. Of the 2073 full-length maize cDNAs, 67 sequences (3.2%) were unique to maize (Figure 4). Furthermore, the homologous genes found in rice were spread over 12 chromosomes. The highest densities of homologous genes were found on chromosomes 1 and 3, which both had a density five times higher than that of chromosome 11. Fewer homologous genes were found on chromosomes 9–12.
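The matching criteria used for the rice and Arabidopsis comparisons can be expressed as a simple filter. The sketch below is illustrative: the function and parameter names are hypothetical, and treating the coverage threshold as strict (>60%) and the identity threshold as inclusive (at least 85%) is an assumption based on the wording above.

```python
def passes_criteria(clone_len, aligned_len, matches,
                    min_coverage=0.60, min_identity=0.85):
    """Return True if an alignment meets the coverage and identity thresholds.

    clone_len   : length of the cDNA clone (nt)
    aligned_len : number of clone nucleotides covered by the alignment
    matches     : identical positions within the aligned region
    """
    if clone_len <= 0 or aligned_len <= 0:
        return False
    coverage = aligned_len / clone_len   # fraction of the clone that aligns
    identity = matches / aligned_len     # identity over the covered length
    return coverage > min_coverage and identity >= min_identity


# Hypothetical alignments for an 800-nt clone.
print(passes_criteria(800, 600, 540))  # 75% coverage, 90% identity
print(passes_criteria(800, 600, 500))  # 75% coverage, about 83% identity
```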
Comparative analysis of the maize full-length cDNA clones with known mRNAs or full-length cDNAs in maize
The average length of the 2073 full-length cDNA sequences is 799 bp, which is similar to that reported for maize endosperm cDNAs (Lai et al., 2004), but shorter than the Arabidopsis cDNAs of approximately 1.2 kb (Seki et al., 1998). BLASTX searches were also performed against the GenBank database for the 2073 representative clones, and 225 clones were deduced to encode small proteins (<100 amino acids).
By homology search against known mRNA sequences (822) and endosperm full-length cDNAs (3160) of maize in the GenBank database, 91 and 285 sequences of the 2073 full-length cDNAs were found to be identical (95% cut-off) to sequences from the two public sources, respectively. A total of 1728 (83.4%) sequences did not match known maize mRNA and full-length cDNA sequences, and thus represented new full-length genes. When homologous full-length cDNAs were further analyzed, 219 transcripts were found in common. Moreover, 149 (68.0%) genes have longer 5′ UTRs than corresponding full-length genes in known mRNA sequences and endosperm full-length cDNAs, whereas 105 genes (47.9%) were found to have longer 3′ UTRs.
The codon usage of the 2073 full-length cDNAs is quite similar to that of the 330 predicted genes from 100 randomly selected maize BACs (Haberer et al., 2005) and the 42 653 rice genes from the International Rice Genome Sequencing Project, but differs greatly from that of Arabidopsis genes (Table S1). In the 2073 full-length cDNAs, codons with high GC content are used more frequently than those with low GC content. For example, the percentage of GCC and GCG is higher than that of GCT and GCA among the codons for alanine, the percentage of GAC is higher than that of GAT for aspartate, and the percentage of CAG is higher than that of CAA for glutamine (Table S1).
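A codon usage comparison of this kind rests on counting in-frame codons and expressing each codon as a fraction of its synonymous family. The following is a minimal sketch under the assumption that coding sequences are supplied in frame, starting at the start codon; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons in a coding sequence (DNA alphabet).

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def synonymous_fraction(counts, codons):
    """Fraction of each codon within one synonymous family (e.g. alanine)."""
    total = sum(counts[c] for c in codons)
    if total == 0:
        return {c: 0.0 for c in codons}
    return {c: counts[c] / total for c in codons}


ALANINE = ("GCT", "GCC", "GCA", "GCG")

# Toy coding sequence (hypothetical): Met-Ala-Ala-Ala-Ala-stop
counts = codon_counts("ATGGCCGCGGCCGCTTAA")
print(synonymous_fraction(counts, ALANINE))
```

Summing the counters over all coding sequences in a collection before calling `synonymous_fraction` gives collection-wide usage for one amino acid.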
To classify and identify the biological roles and molecular functions of the 2073 cDNA clones, we used the BLAST program to analyze gene ontology assignments against a non-redundant database of proteins from model species (Table 2). The results showed that 1081 cDNA clones (corresponding to 3241 GO terms) were classified into three broad categories, including ‘biological processes’, ‘cellular component’ and ‘molecular function’. These accounted for 79.7%, 55.2% and 88.9% of the 1081 cDNA clones, respectively.
Table 2. Numbers of clones assigned to GO functional categories
Any given sequence may have been assigned to more than one category. We assigned 3241 GO functional terms to 1081 genes as described in Results.
Biological process (total number = 862)
Cell growth and/or maintenance
Cell organization and biogenesis
Response to stress
Cell communication/signal transduction
Response to biotic stimulus
Response to external stimulus
Cellular component (total number = 597)
Molecular function (total number = 961)
Structural molecule activity
Molecular function unknown
Transcription regulator activity
Translation regulator activity
Enzyme regulator activity
Signal transducer activity
The GO assignment yielded 862 clones with GO terms associated with ‘biological processes’; these could be further divided into 12 smaller categories. Within the larger category (‘biological processes’), 83.4% of cDNAs were involved in metabolism and 32.8% of cDNAs were involved in protein biosynthesis. cDNA clones of other functional categories, such as cell growth and/or maintenance, transport, cell organization and biogenesis, transcription and response to stress, were also found in ‘biological processes’. The total number of GO terms associated with ‘cellular component’ was 597, and the total number related to ‘molecular function’ was 961; the latter could be further divided into 11 smaller categories. The three most dominant categories of cDNAs in ‘molecular function’ were binding, structural molecule activity and catalytic activity, and these accounted for 41.8%, 27.4% and 27.3%, respectively. Twenty-two and 11 cDNAs were associated with transcription regulatory and kinase activity, respectively.
Analysis of transcription factors
Transcription factors are important proteins in higher eukaryotes, especially in regulating plant responses to stress (Chen et al., 2002). Each clone was searched for the presence of known transcription factor domains using the InterPro database (Apweiler et al., 2000). This search led to the identification of 2472 protein domains and 566 transcription factors (Table 3). These transcription factors could be classified into 85 classes. The most highly represented class of transcription factors was the histone-fold type (103), followed by the zinc finger type (56), RNA binding region (23), nucleic acid binding OB (oligosaccharide/oligonucleotide binding) fold (23) and GTP binding protein. Additionally, RNA polymerase (14), KOW (14), HMG1/2 (high mobility group) box (11), KH (8), RINGv (RING variant) (6), bZIP transcription factor (6), WD-40 repeat (5), pathogenesis-related transcriptional factor and ERF (ethylene-responsive element binding factor) (5), and Myb and DNA-binding (5) motifs were also found.
Table 3. Transcription factors identified through a search of the InterPro database
Macroarray analysis of up- and downregulated cDNAs
Macroarray experiments were carried out to evaluate the expression pattern of the 2073 full-length cDNAs under PEG stress. The experiments were repeated twice with independent macroarray membranes from the same biological sample. Clones with more than twofold changes in expression level between treated and untreated samples were identified as differentially expressed. A gene was designated as being upregulated if the signal intensity from duplicate PEG-treated samples was equal to or greater than twice that of untreated control samples. Similarly, a gene was designated as being downregulated if the hybridization signal was less than half that of the untreated control samples.
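The twofold criteria described above can be written as a small classification rule. The sketch below is illustrative only: averaging the duplicate signals before taking the ratio, and the assumption that intensities have already been background-corrected and normalized, are not stated in the text, and the function name is hypothetical.

```python
def classify(treated, control):
    """Classify a clone from replicate signal intensities.

    treated, control : sequences of replicate intensities (assumed to be
                       background-corrected and normalized beforehand)
    Returns "up" if treated/control >= 2, "down" if the ratio is < 0.5,
    and "unchanged" otherwise.
    """
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    if mean_control <= 0:
        raise ValueError("control intensity must be positive")
    ratio = mean_treated / mean_control
    if ratio >= 2.0:
        return "up"
    if ratio < 0.5:
        return "down"
    return "unchanged"


# Hypothetical duplicate intensities for three clones.
print(classify([400, 420], [200, 210]))
print(classify([90, 100], [200, 210]))
print(classify([300, 300], [200, 200]))
```

A stricter variant would require each replicate pair to pass the threshold independently rather than the mean ratio.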
According to the macroarray and scatter plot results (Figures 5 and 6), 79 cDNAs were upregulated and 329 cDNAs were downregulated. The 79 upregulated genes in maize included functional and regulatory proteins (Table 4). The upregulated genes in the functional category included a late embryogenesis-abundant (LEA) protein, a lipid transfer protein, a proline-rich protein, ferredoxin, a senescence-associated protein, membrane proteins, detoxification enzymes including glutathione S-transferase (GST) and superoxide dismutase, photosynthesis-related proteins, brown plant hopper susceptibility proteins, tubby-like protein, proteases and protease inhibitors. These functional proteins are considered to play important roles in protecting cells from dehydration and active oxygen and in adjusting the osmotic pressure under stress conditions (Cushman and Bohnert, 2000; Hasegawa et al., 2000; Seki et al., 2002a). Proteases, including ubiquitin-conjugating enzyme, are thought to be related to protein turnover and recycling of amino acids (Seki et al., 2002d).
Table 4. PEG stress upregulated clones involved in various functions
Various transcription factors, enzymes involved in metabolism and other proteins were found among the regulatory proteins of the upregulated genes (Table 4), including ERF domain proteins, zinc finger proteins, RNA polymerases, general regulatory factors (14-3-3-like protein), etc. These transcription factors and regulatory factors might regulate various stress-inducible genes. In addition, alcohol dehydrogenase, adenine phosphoribosyl transferase, cytochrome, ribosomal proteins and other genes of unknown function were also identified. These regulatory proteins are thought to be important in regulating various functional genes under stress conditions.
Highly inducible genes with fivefold-induced expression included glutathione S-transferase, rhodanese-like family protein, zinc metallothionein class protein and two cDNAs for ethylene-responsive elements. Genes with fourfold-induced expression were ferredoxin and MPI subtilisin/chymotrypsin-like inhibitor. There were ten PEG-inducible genes with threefold-induced expression, which included genes encoding thioredoxin, lactoylglutathione lyase, blue copper protein, ribosomal protein L32, RING box protein, hypoxia-responsive family protein, oxygen-evolving enhancer protein and three hypothetical proteins (Table 4).
Of the downregulated cDNA clones, 177 genes (corresponding to 559 GO terms) could be classified into three categories, including ‘biological processes’, ‘cellular component’ and ‘molecular function’. Protein biosynthesis (52), intracellular components (53) and DNA binding (12) were the most abundant groups among the three categories, respectively (Table 5). The downregulated genes included detoxification enzymes, transcription factors, protein kinases, chaperones, phosphatases, enzymes involved in metabolism, signaling molecules including calcium-binding protein, calmodulin and calcium sensing receptor, ATPases, GTPases, proteases and proteinase inhibitors (Table S2). The transcription factor DnaJ protein was also downregulated; this had been reported as a heat shock-inducible protein and played an important role in regulating protein renaturation after stress (de Crouy-Chanel et al., 1995; Pellecchia et al., 1996). In addition, many photosynthesis-related genes, such as chlorophyll a/b-binding protein and the components of photosystems I and II, were also found to be downregulated under PEG stress. There were also 97 genes encoding hypothetical proteins (Table S3), 52 genes encoding unclassified proteins (Table S4) and 55 genes encoding protein synthesis-related proteins (Table S5) among the 329 downregulated genes.
Table 5. Number of clones in different functional groups downregulated by PEG stress
Biological process (total number = 84)
Response to stress
Cellular component (total number = 138)
Molecular function (total number = 46)
Molecular function unknown
Nucleic acid binding
Structural molecule activity
Northern blot analysis
Northern blot analysis was carried out to confirm the reliability of the macroarray. Two clones were chosen as hybridization probes from the functional group and regulatory group of upregulated genes, respectively. The functional protein genes are DQ245455 (encoding a Bowman–Birk-type trypsin inhibitor) and DQ244467 (encoding a GST7 protein); the regulatory protein genes are DQ244224 (encoding a zinc finger protein family-like protein) and DQ245906 (encoding a zinc finger protein). In general, the results of Northern blot analysis were consistent with the expression data obtained by array analysis (Figure 7).
Promoter analysis of stress response genes
Macroarray or microarray analysis could serve as one of the strategies for identifying novel cis-acting elements that regulate the expression of genes in response to various stresses (Seki et al., 2001). In this study, we combined sequence analysis with macroarray data to identify interesting cis-elements in the upstream regions of the 2073 full-length maize cDNAs. Based on the comparison of full-length cDNAs with maize GSS sequences, we obtained 1121 putative promoter sequences, each extending at least 1000 bp upstream of the 5′ terminus of the mapped cDNA clone. Of these, 356 were cis-acting elements. Of particular interest was the identification of ABRE, DRE-core, MYB and MYC core sequences in the putative promoters of 30 genes (Table 6). Of these 30 genes, nine contained the DRE-core sequence (CCGAC), 22 contained ABRE (ACGTG(T/G)), 18 contained MYB ((C/T)AAC(T/G)G) and 29 contained MYC (CANNTG) in their putative promoter regions. In addition, other cis-acting elements responsive to abiotic factors, including WRKY (ACGT), GRAZMRAB17 (CACTGGCCGCCC) and GCC box (GCCGCC), could also be found in the promoters of some genes.
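The core sequences listed above translate directly into regular expressions, so a promoter scan can be sketched in a few lines. This is an illustration under stated assumptions: only the given strand is searched, the function name and example fragment are hypothetical, and the software actually used for the promoter analysis is not specified here.

```python
import re

# Core sequences as given in the text, written as regular expressions.
MOTIFS = {
    "DRE-core": "CCGAC",
    "ABRE": "ACGTG[TG]",
    "MYB": "[CT]AAC[TG]G",
    "MYC": "CA[ACGT][ACGT]TG",
}


def scan_promoter(promoter, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} on the given strand.

    A zero-width lookahead is used so that overlapping occurrences are
    all reported. The reverse strand is not searched in this sketch.
    """
    promoter = promoter.upper()
    hits = {}
    for name, pattern in motifs.items():
        regex = re.compile("(?=(" + pattern + "))")
        hits[name] = [m.start() for m in regex.finditer(promoter)]
    return hits


# Hypothetical short fragment, not a sequence from the study.
example = "TTCCGACAAACGTGTCCTAACTGCACGTG"
print(scan_promoter(example))
```

Searching the reverse complement as well would be needed to find elements in either orientation; palindromic cores such as CACGTG are reported from a single strand.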
Table 6. ABRE, DRE, MYB and MYC core sequences observed in the putative promoter regions of the PEG-stress inducible genes
Full-length cDNAs are very useful for analyzing gene structure and function. Seki et al. (1998) constructed Arabidopsis full-length cDNA libraries using the biotinylated CAP trapper method from Arabidopsis plants grown under various conditions. They found that this method was effective in obtaining full-length cDNAs from Arabidopsis on a large scale. In this study, we constructed a full-length cDNA library from maize seedlings using the same method with some modifications. The library quality was analyzed by determining the titre of the library (1.4 × 10⁹ CFU ml⁻¹) and sequencing the 5′-end tags of the clones. Collectively, these data showed that this full-length cDNA library was of high quality and suitable for further analysis, and that the CAP trapper method was effective in collecting full-length cDNAs of maize.
Assessment of alternative polyadenylation
The 3′-end processing machinery of plant mRNA can recognize various AAUAAA-like sequences (Li and Hunt, 1997; Rothnie, 1996; Wu et al., 1993). Variations of AAUAAA-like sequences, including AAUAAA (Rothnie et al., 1994, 2001), AAUGAA (Wu et al., 1993) and those with a single pyrimidine substitution (AUUAAA, CAUAAA, UAUAAA, AAUACA and AAUAUA; Graber et al., 1999), have been reported. In addition, because maize polyadenylation signals have not been widely investigated, we adopted the alternative polyadenylation signals defined for the human and rat (Beaudoing et al., 2000; Gautheret et al., 1998; Scheetz et al., 2004). The most frequently observed alternative polyadenylation signals were AAUGAA, UUUAAA and AAAACA, whereas GGGGCU was the least frequently observed. The major difference from previous results was the relatively low number of sequences observed with a single pyrimidine substitution in maize polyadenylation signals. mRNAs with multiple poly(A) sites tend to use non-canonical polyadenylation signals (including the common AUUAAA), whereas mRNAs with a single poly(A) site do not. However, in the 2073 mRNAs collected in this experiment, the canonical polyadenylation signals were the most frequently observed for both multiple poly(A) sites and a single poly(A) site. This may suggest that variant signals are not processed as efficiently as the AAUAAA signal.
Comparison of the 2073 full-length cDNAs to annotations of maize BAC sequences
After aligning the 2073 full-length cDNAs to the maize BAC sequences, we found that 37 of 84 genes (44.0%) mapped in the BACs had a different structure from annotations with BAC sequences, and four genes (4.8%) had no annotations. Therefore, the annotations for maize BAC sequences should be further modified. Haas et al. (2002) also reported that about 35% of previously annotated Arabidopsis genes required modification according to the full-length cDNA sequences. These results could improve maize genome annotation and supplement currently available public databases. It should be pointed out that the 2073 full-length cDNA collection is from a different inbred line than B73. Therefore, the comparisons of the 2073 full-length cDNA to the BAC clones and the GSS sequences of B73 may be subject to haplotype variation (Brunner et al., 2005; Fu and Dooner, 2002; Song and Messing, 2003).
According to Haas et al. (2002), if two cDNAs can be mapped to the same locus in genomic sequences, but show distinct exon–intron structures, they are designated as alternative splicing, or another type of splicing abnormality. Alternative pre-mRNA splicing plays a major role in expanding protein diversity and regulating gene expression in higher eukaryotes (Black, 2000, 2003). In this study, we found that three genes from different BAC sequences had alternative splicing, producing two transcripts for each gene (Figure 3). The alternative splicing in Figure 3(a) would produce transcripts coding for different proteins, whereas the two transcripts in Figure 3(b) and (c) encode the same protein.
Comparative analysis of the maize full-length cDNA to the maize GSS, rice and Arabidopsis genomes
Maize full-length cDNA clones are useful for determining gene structure in maize and other Poaceae species by integrative analyses with the genomic sequences of these species (Bennetzen and Ma, 2003). Gene structures cannot be correctly predicted based on genome sequences only; however, mapping of full-length cDNA clones to the genome and further comparative analysis of genomic sequences could lead to a better elucidation of gene structures. The availability of rice and Arabidopsis genome sequences offers a unique opportunity for us to compare maize cDNA sequences with Arabidopsis and rice sequences and to search for regions conserved during evolution.
Comparative mapping on the rice and Arabidopsis genomes of the 2073 maize full-length cDNAs showed that 1877 and 1507 cDNA sequences could be aligned to the rice and Arabidopsis genomes, respectively, i.e. there are more conserved sequences between the maize and rice genomes than between the maize and Arabidopsis genomes. This was consistent with earlier findings that the individual chromosomes in maize are highly collinear with those of rice, wheat, sorghum and other grass species (Ahn et al., 1993; Bennetzen and Ramakrishna, 2002; Devos and Gale, 2000).
GC content and codon usage
The overall GC content of the 2073 full-length cDNA sequences is consistent with that of the maize mRNA available in public databases (Figure 1). The distribution of GC content is not uniform within the full-length cDNA, and an obvious polarity could be observed, with the highest GC content (57.9%) in the 5′ UTR, 56.8% in the coding regions and the lowest (41.5%) in the 3′ UTR. This polar distribution of GC content is consistent with findings in rice (Wong et al., 2002) and in 172 full-length genes from 100 randomly selected maize BACs (Haberer et al., 2005). The significant difference in GC content between monocot and dicot plants also leads to an obvious difference in codon usage between them (Table S1).
cDNA microarrays and macroarrays are valuable tools in analyzing gene expression on a large scale. Several studies have reported gene expression changes under abiotic stresses using cDNA microarrays in plant species such as rice (Bohnert et al., 2001) and Arabidopsis (Seki et al., 2001, 2002a,b, 2002d). Zheng et al. (2004) reported that PEG and drought stress have similar effects on maize seedlings. Based on the collected full-length cDNA clones, we prepared a maize full-length cDNA macroarray using 2073 full-length cDNAs to identify the candidate genes responsive to water deficit. The results showed that 79 genes were upregulated and 329 genes downregulated by PEG treatment, which accounts for 3.8% and 15.9% of the 2073 full-length cDNAs, respectively. Although the full-length cDNA library was constructed from PEG-stressed maize seedlings, the cDNAs were not screened, which may be one of the reasons for the low percentage of upregulated genes and higher percentage of downregulated genes. Another possible reason is that the cDNA clones were randomly chosen for sequencing without normalization, and these 79 upregulated genes are independent cDNAs, any one of which may represent a number of abundant transcripts.
These upregulated genes encode proteins that fall into two main categories: functional proteins and regulatory proteins (Table 4). The results of Northern blot using two representative probes from each of the two main categories were consistent with the macroarray analysis. It has been reported that functional proteins, such as LEA proteins, GST and proline-rich proteins, play important roles in avoiding or reducing stress injury in plants. The upregulated genes from the maize full-length cDNA macroarray analysis were similar to those induced by drought, cold and high-salinity stresses in Arabidopsis (Seki et al., 2002a). Transcription factors are important in regulating plant responses to environmental stresses. In our experiment, zinc finger proteins, a family of transcription factors that have previously been shown to be stress-inducible (Brinker et al., 2004), were the main transcription factors identified following treatment with PEG-6000. Two ethylene-responsive genes were also induced by PEG-6000, suggesting that the ethylene signaling pathway may be involved in the drought stress response in maize. In addition, a 14-3-3 protein was also found to be inducible in our experiment, which has previously been reported to be involved in signal transduction and response to abiotic stresses in animal cells and higher plants (Koskinen et al., 2004; Roberts et al., 2002).
In our experiment, photosynthesis-related genes such as chlorophyll a/b-binding protein and the components of photosystems I and II (Table 5) were downregulated. Several similar studies have reported that drought, cold, high salinity and ABA stress inhibit photosynthesis (Seki et al., 2002a; Tezara et al., 1999; Weatherwax et al., 1996). Additionally, our data showed that several genes involved in signal transduction, adenylate kinase B, ABA-responsive protein and cold-responsive proteins were also downregulated by PEG-6000 treatment. Of the transcription factors, it has been reported that DnaJ is a heat shock-induced protein, and that it had a negative auto-regulation effect on heat shock responses (de Crouy-Chanel et al., 1995; Pellecchia et al., 1996). In our experiment, DnaJ might also play the same role in regulating protein renaturation after PEG stress.
Analysis of promoters and transcription factors
In Arabidopsis, Oono et al. (2003) demonstrated that a full-length cDNA microarray was useful not only in analyzing patterns of gene expression, but also in identifying the target genes of stress-related transcription factors and potential cis-acting DNA elements by combining the expression data with the identification of cis-acting sequences in the corresponding genomic sequence data. Cis-acting elements are special DNA sequences that mediate the regulation of gene transcription. Several cis-acting elements, such as the G-box-containing ABREs and the recognition sequences for the MYB and MYC class of transcription factors, have been shown to contribute to ABA responses of individual genes (Busk and Pages, 1998; Leonhardt et al., 2004). In our study, 79 genes were identified as PEG stress-inducible genes. Of these, nine contained the DRE-core sequence (CCGAC) in their promoters, suggesting that they were regulated by DREB transcription factors. Twenty genes contained ABRE (ACGTG(T/G)) in their promoters. These results suggest that these cis-acting elements, combined with the corresponding transcription factors, are involved in the response to various environmental stresses cooperatively or separately.
Plant materials, stress treatments, and RNA preparation
Maize (Zea mays L. Han 21) was used in this study. Maize seeds were surface-sterilized in 5% sodium hypochlorite for 5 min and washed with distilled water. The seeds were then sown in blended soil (soil:sand 2:1) and incubated at 25°C. For PEG treatment, seedlings at the three-leaf stage were removed from the soil and subjected to treatment with 20% PEG-6000 in a sealed container through which air was continuously bubbled for the duration of treatment. Control and PEG-stressed leaves and roots were harvested 1 and 6 h after the start of the experimental treatment, frozen immediately in liquid nitrogen and stored at −80°C for further analysis. Total RNA was prepared as described previously (Kay et al., 1987) with some modifications. mRNA was isolated using an Oligotex™ mRNA isolation kit (Qiagen, Valencia, CA, USA).
The reverse transcription reaction was carried out in a 100 μl volume using RNaseH-free reverse transcriptase (SuperScript™ II reverse transcriptase, Invitrogen, Carlsbad, CA, USA). First, 12.5 μg pooled mRNA from maize seedlings treated with 20% PEG-6000 for 1 and 6 h was denatured at 65°C for 10 min, and 200 pmol oligo(dT)18 (5′-AGATTGGTCTCCTCGAGT(18)VN-3′, where N is G, A, T or C and V is A, C or G) was added. Then, 20 μl of first-strand buffer, 5 μl of 10 mM DTT, 10 μl of 10 mM 5-methyl-dNTP and 3 μl RNase inhibitor were added and pre-incubated at 42°C for 5 min, before incubation with 10 μl SuperScript II reverse transcriptase at 45°C for 30 min, 50°C for 30 min and 60°C for 30 min. To stop the reaction, 2 μl of 0.5 M EDTA, 2 μl of 10% SDS and 5 μl of 10 mg ml⁻¹ proteinase K were added, and the reaction mixture was further incubated at 45°C for 15 min. Subsequently, the cDNA/RNA was extracted once with phenol–chloroform, precipitated with ethanol, washed with 70% ethanol and resuspended in RNase-free water.
RNA oxidation and RNA biotinylation
The resuspended cDNA/RNA was oxidized in 66 mM sodium acetate (pH 4.5) and 5 mM NaIO4. The oxidation was carried out on ice in the dark for 45 min. To precipitate the oxidized RNA, 10% SDS, 5 M NaCl and isopropanol were added, and the mixture was centrifuged at 12 000 g for 30 min at 4°C. The pellet was washed with 70% ethanol and resuspended in RNase-free water. Then 1 M sodium acetate (pH 6.1), 10% SDS and 10 mM biotin hydrazide long-arm were added to the oxidized cDNA/RNA for biotinylation overnight at room temperature. Subsequently, the cDNA/RNA was precipitated at −80°C for 1 h by adding 1 M sodium acetate (pH 6.1), 5 M NaCl and 2.5 volumes of ethanol. Finally, the pellet was washed twice with 80% ethanol and resuspended in RNase-free water. RNase digestion of the first-strand cDNA reaction was performed using RNase I at 37°C for 1 h.
Full-length cDNA capture
MPG–streptavidin beads (500 μg; CPG, Lincoln Park, NJ, USA) and DNase-free tRNA (100 μg) were mixed and incubated on ice for 30 min. The beads were separated using a magnetic stand and washed three times with 2 M NaCl and 50 mM EDTA (pH 8.0). Finally, the beads were resuspended in 2 M NaCl and 50 mM EDTA (pH 8.0) and mixed with the cDNA/RNA sample at room temperature for 30 min with gentle mixing. After removal of unbound cDNA/RNA, the beads were washed three times with 2 M NaCl and 50 mM EDTA (pH 8.0). To release the cDNA from the beads, 100 μl of 50 mM NaOH/1 mM EDTA, pH 8.0, were added to the cDNA/RNA mixture and incubated at 65°C for 10 min. Eluted cDNA was added to a tube containing 100 μl of 1 M Tris-HCl (pH 7.5). Subsequently, the cDNA was extracted once with phenol–chloroform, precipitated with ethanol and resuspended in RNase-free water.
Ligation of cDNA with adapter
The adapter was prepared by annealing the following two primers: forward primer 5′-pCCTGACTGATCGACT-3′ (where p = phosphate) and reverse primer 5′-AGTCAGGNNN-3′ (where N = G, A, T or C). A 50 μl mixture of forward and reverse primers was added to 1x PCR buffer (50 mM KCl, 10 mM Tris-HCl (pH 8.3), 1.5 mM MgCl2), denatured at 95°C for 5 min in a water bath, and then cooled to room temperature. The adapter was diluted to 10 μM before use. cDNA was ligated with 5 pmol of the adapter in a 10 μl volume at 16°C overnight. After ligation, the samples were heated at 65°C for 5 min to inactivate the ligase, and purified with a Qiagen PCR purification column. cDNA was eluted in 80 μl water.
Construction of full-length cDNA library
The primer 5′-GTACGTAGGTCTCGAATTCAGTCGATCAGTCAGG-3′ was used for the dsDNA synthesis. To 80 μl of cDNA, 3 μl of 10 μM primer, 5 μl of 2.5 mM dNTP, 10 μl PCR buffer and 1 μl (5 U) of Taq polymerase were added. The reaction was performed on a GeneAmp PCR System 9700 (ABI, Foster City, CA, USA) with an initial denaturation for 5 min at 94°C, followed by five cycles of 50°C for 30 sec, 60°C for 30 sec and 72°C for 6 min. dsDNA was then purified with a Qiagen PCR spin column and digested with 1 U of the restriction enzymes EcoRI and XhoI at 37°C for 1 h. The digested dscDNA was size-fractionated on a 1% agarose gel. Fragments ranging from 0.5 to 8 kb were recovered from the agarose gel and cloned into a pre-digested pBluescript SK+ vector (Stratagene, La Jolla, CA, USA). The ligation mix was extracted once with phenol–chloroform, precipitated with ethanol, and resuspended in 10 μl water. A 1 μl aliquot of ligation mix was transformed into Escherichia coli strain DH10B by electroporation using a Gene-Pulser II according to the manufacturer's instructions (Bio-Rad, Richmond, CA, USA). Twenty thousand colonies were selected from LB agar plates containing ampicillin and transferred to 384-well plates containing LB medium with ampicillin. After overnight growth at 37°C, the plates were stored at −80°C.
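The relationship between the oligonucleotides used in library construction can be checked with a short Python sketch. It confirms that the 3′ end of the second-strand primer is the reverse complement of the adapter forward primer, and that the EcoRI (GAATTC) and XhoI (CTCGAG) recognition sites are present in the second-strand primer and the oligo(dT) primer, respectively. The sequences are those given in the Methods; the variable and function names are hypothetical, and the reading that the second-strand primer anneals to the adapter strand is inferred here from sequence complementarity rather than stated in the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Sequences as given in the Methods (5' to 3').
ADAPTER_FORWARD = "CCTGACTGATCGACT"
SECOND_STRAND_PRIMER = "GTACGTAGGTCTCGAATTCAGTCGATCAGTCAGG"
# Portion of the oligo(dT) primer upstream of the T(18)VN tail.
FIRST_STRAND_PRIMER_5PRIME = "AGATTGGTCTCCTCGAG"

# Standard recognition sequences of the two enzymes.
ECORI_SITE = "GAATTC"
XHOI_SITE = "CTCGAG"


def primer_matches_adapter(primer, adapter):
    """True if the primer's 3' end is the adapter's reverse complement."""
    return primer.endswith(reverse_complement(adapter))


if __name__ == "__main__":
    print(primer_matches_adapter(SECOND_STRAND_PRIMER, ADAPTER_FORWARD))
    print(ECORI_SITE in SECOND_STRAND_PRIMER)
    print(XHOI_SITE in FIRST_STRAND_PRIMER_5PRIME)
```

With an EcoRI site at the 5′ end of the cDNA and an XhoI site at the 3′ end, double digestion is consistent with directional cloning into the pre-digested vector.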
Twenty thousand clones were inoculated and grown overnight at 37°C with shaking at 200 rpm. Plasmids were extracted by alkaline lysis and purified with Multiscreen filter plates for high-throughput separations (Millipore, Bedford, MA, USA). Sequencing reactions were performed with 200 ng of plasmid as template and T3 and M13F as primers using an ABI PRISM Big Dye Terminator version 3.1 cycle sequencing kit (Applied BioSystems, Foster City, CA, USA) on a GeneAmp PCR System 9700 (ABI). Sequencing reactions were further cleaned by ethanol/EDTA/sodium acetate precipitation prior to capillary electrophoretic separation and detection on an ABI 3730 DNA Analyser.
Data processing and assembly
Each cDNA sequence was processed and assembled using PHRED (Ewing and Green, 1998; Ewing et al., 1998) and PHRAP (Green, 1999; http://www.phrap.org). Lucy was used for vector trimming (Chou and Holmes, 2001); trimmed data were stored in FASTA format using a Perl script. Clones containing cDNAs that were >90% similar over 80 bases or more were classed into the same cluster using the tgicl program (Osato et al., 2002; Pertea et al., 2003). The end sequences in each cluster were aligned to UniProt (Apweiler et al., 2004) using the FASTA homology search software. Based on the alignment of 5′- and 3′-end sequences, the clone carrying the longest cDNA insert in each cluster was selected as representative of the cluster.
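The final selection step, choosing the clone with the longest insert in each cluster, can be sketched in Python as follows. The record layout (clone identifier, cluster identifier, insert length) and all names are hypothetical; the actual output format of the clustering step is not described in the text, and ties are resolved here by keeping the first clone encountered, which is an assumption.

```python
from typing import Dict, Iterable, NamedTuple


class Clone(NamedTuple):
    clone_id: str
    cluster_id: str
    insert_length: int  # estimated insert length in bases


def pick_representatives(clones: Iterable[Clone]) -> Dict[str, Clone]:
    """Return, for each cluster, the clone with the longest insert."""
    best: Dict[str, Clone] = {}
    for clone in clones:
        current = best.get(clone.cluster_id)
        # Strict '>' keeps the first clone seen when lengths are equal.
        if current is None or clone.insert_length > current.insert_length:
            best[clone.cluster_id] = clone
    return best


if __name__ == "__main__":
    # Hypothetical clones, used only to exercise the function.
    example = [
        Clone("c001", "CL1", 1200),
        Clone("c002", "CL1", 1850),
        Clone("c003", "CL2", 900),
    ]
    for cluster, clone in pick_representatives(example).items():
        print(cluster, clone.clone_id, clone.insert_length)
```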
All the sequences were compared with the public sequence database as both nucleotides and amino acids using BLAST (E-value < 10⁻⁵). The features of 5′ and 3′ untranslated regions of full-length cDNAs, function ontology, transcription factors and promoters associated with PEG stress were then analyzed by macroarray. We aligned the full-length sequences with the maize GSSs, and the rice and Arabidopsis genomes using paracelblast, with an E-value of 10⁻⁵. We extracted the maize genomic region of the best locus assigned to each cDNA (>95% identity over >100 bases).
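The locus-assignment criteria above (>95% identity over >100 bases, E-value of 10⁻⁵) can be expressed as a simple filter. The Python sketch below assumes that the alignments are available in the common 12-column tab-separated BLAST layout (query, subject, percent identity, alignment length, ..., E-value, bit score) and that the "best" locus is the passing hit with the highest bit score; both points are assumptions, as the text does not describe the output format or the ranking criterion.

```python
def best_loci(lines, min_identity=95.0, min_length=100, max_evalue=1e-5):
    """Map each query cDNA to its best genomic hit passing the thresholds.

    `lines` is an iterable of tab-separated alignment records in the
    assumed 12-column layout. Thresholds follow the text: identity and
    length must exceed the stated minima.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        evalue, bitscore = float(fields[10]), float(fields[11])
        if identity <= min_identity or length <= min_length:
            continue
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


if __name__ == "__main__":
    # Hypothetical records, used only to exercise the filter.
    records = [
        "cdna1\tlocusA\t98.5\t450\t5\t1\t1\t450\t100\t549\t1e-120\t800",
        "cdna1\tlocusB\t96.0\t300\t10\t2\t1\t300\t1\t300\t1e-80\t500",
        "cdna2\tlocusC\t92.0\t500\t40\t0\t1\t500\t1\t500\t1e-90\t600",
    ]
    print(best_loci(records))
```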
cDNA macroarray preparation
The full-length cDNA macroarray was prepared following the method described by Ji et al. (2003), with slight modifications. The cDNA macroarray for each treatment, comprising two nylon membranes containing 2073 genes (6219 spots) and three controls (114 spots), was used for analyzing changes in the transcript level of 2073 maize full-length cDNAs in response to stress treatment with PEG-6000. Maize α-tubulin cDNAs were also printed on each membrane as an internal control. Pig β-TGF was used as a negative control. RAB17 was used as a positive control. PCR products from each unique full-length cDNA were arrayed from six 384-well plates onto nylon membranes (Amersham, Arlington Heights, IL, USA) using a Biomek 2000 Laboratory Automation Workstation (Beckman Coulter, Fullerton, CA, USA). Each clone was printed in triplicate with an average spot diameter of 1.125 mm and spot-to-spot distance of 1.25 mm. After air-drying, the membranes were denatured in 0.6 M NaOH for 5 min, neutralized in 0.5 M Tris-HCl (pH 7.5) for 5 min, and rinsed in distilled water for 3 min. The spotted samples were cross-linked to membranes using a low-energy UV source and baked for 2 h at 80°C.
Macroarray hybridization and scanning
Total RNA prepared from roots and leaves of untreated maize seedlings and of seedlings treated with PEG for 1 and 6 h was reverse-transcribed and used as a probe for expression profile analysis. The reverse transcription reaction was performed in a 20 μl solution and set up as follows: 1 μl oligo(dT)18, 10 μg total RNA and distilled water (up to 8 μl) were heated at 65°C for 5 min and quickly chilled on ice, and the contents were collected by 20 sec of centrifugation at 10 000 g. Then, 4 μl of 5x first-strand buffer, 2 μl of 0.1 M DTT, 1 μl of 10 mM dNTP mix (10 mM each dATP, dTTP and dGTP), 1 μl RNasin (40 U μl⁻¹), 3 μl [32P]-dCTP (10 μCi μl⁻¹) and 1 μl (200 U) of SuperScript™ II (Invitrogen, Carlsbad, CA, USA) were added, and the tubes were mixed by gentle vortexing and incubated at 42°C for 1 h. The probes were denatured in a heat block at 100°C for 5 min, and then chilled for 5 min on ice before hybridization. Membranes were pre-hybridized in 20 ml Church solution (1% BSA, 1 mM EDTA, 0.25 M Na2HPO4-NaH2PO4, 7% SDS) at 65°C for 12 h. The denatured probes were then added to the solution, and hybridization was carried out overnight at 65°C. After hybridization, the membranes were washed at 65°C in 2x SSC, 0.5% SDS for 15 min, in 1x SSC, 0.5% SDS for 15 min, in 0.5x SSC, 0.5% SDS for 15 min, and finally in 0.1x SSC, 0.1% SDS for 15 min. The membranes were then exposed to storage phosphor screens (Amersham Biosciences, Piscataway, NJ, USA) for 3 days. Images were acquired by scanning the membranes with a Typhoon 9210 scanner (Amersham Biosciences).
Northern blot analysis
Total RNA from PEG-stressed and unstressed maize seedlings was also used for RNA gel blot analysis. Total RNA (25 μg) was separated by electrophoresis in denaturing formaldehyde 1.2% w/v agarose gels and then transferred to Hybond N+ nylon membrane (Amersham Biosciences). DNA probes were purified from PCR-amplified fragments of selected genes and labeled with α-[32P]-dCTP using a Prime-a-Gene® Labeling System Kit (Promega, Madison, WI, USA) according to the manufacturer's protocol. 28S rRNA was also hybridized as a loading control. Hybridization, washing and scanning were performed as described previously (Zheng et al., 2004).
Data normalization and statistical analysis
Raw intensity measurements and data analysis were performed using GPC Visualgrid software (http://www.gpc-biotech.com). Median signal intensity measurements were obtained for each spot on the hybridized array. The signal value of the negative control (pig β-TGF) on the array was used as background and consequently subtracted from the raw intensity values for each gene on the array. Maize α-tubulin cDNAs were used as an internal control gene to equalize hybridization signals generated from different samples. The signal value after background subtraction and signal equalization was used for further analysis.
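The two normalization steps described above, background subtraction using the negative control and equalization against α-tubulin, can be sketched in Python as follows. The function and argument names are hypothetical. Averaging the replicate spots of each clone and dividing by the background-corrected α-tubulin signal are assumptions made for this sketch; the text does not specify how replicates were combined or how the equalization was computed.

```python
from statistics import mean


def normalize_membrane(spots, negative_control, tubulin):
    """Background-subtract and equalize spot signals for one membrane.

    spots: dict mapping gene name to the list of replicate spot
        intensities (median pixel values per spot).
    negative_control: intensities of the negative-control spots,
        whose mean is taken as background.
    tubulin: intensities of the alpha-tubulin internal-control spots.
    """
    background = mean(negative_control)
    tubulin_signal = mean(tubulin) - background
    if tubulin_signal <= 0:
        raise ValueError("internal control signal does not exceed background")
    return {
        gene: (mean(values) - background) / tubulin_signal
        for gene, values in spots.items()
    }


if __name__ == "__main__":
    # Hypothetical intensities, used only to exercise the function.
    print(normalize_membrane(
        {"geneA": [110, 130, 120]},
        negative_control=[20, 20],
        tubulin=[220, 220],
    ))
```

Values normalized in this way for control and PEG-treated membranes can then be compared directly for each gene.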
The authors would like to thank Dr Ruiguang Zhen and Dr John Klejnot for their critical reading and comments on the manuscript. This work was supported by the China High-Tech Program (863) and the Cultivation Fund of the Key Scientific and Technical Innovation Project, China Ministry of Education (number 705009).