Annotation and expression profile analysis of 2073 full-length cDNAs from stress-induced maize (Zea mays L.) seedlings

Authors


*(fax +86 10 62732012; e-mail gywang@cau.edu.cn).

Summary

Full-length cDNAs are very important for genome annotation and functional analysis of genes. The number of full-length cDNAs from maize (Zea mays L.) remains limited. Here we report the construction of a full-length enriched cDNA library from osmotically stressed maize seedlings using a modified CAP trapper method. From this library, 2073 full-length cDNAs (accession numbers DQ244142–DQ246214) were collected and further analyzed by sequencing from both the 5′- and 3′-ends. A total of 1728 (83.4%) sequences did not match known maize mRNA and full-length cDNA sequences in the GenBank database and represent new full-length genes. After alignment of the 2073 full-length cDNAs with 448 maize BAC sequences, it was found that 84 full-length cDNAs could be mapped to the BACs. Of these, 43 genes (51.2%) have been correctly annotated from the BAC clones, 37 genes (44.0%) have been annotated with a different exon–intron structure from our cDNA, and four genes (4.8%) had no annotations in the TIGR database. Expression analysis of the 2073 full-length maize cDNAs using a cDNA macroarray led to the identification of 79 genes upregulated by stress treatments and 329 downregulated genes. Of the 79 stress-inducible genes, 30 genes contain ABRE, DRE, MYB, MYC core sequences or other abiotic-responsive cis-acting elements in their promoters. These results suggest that these cis-acting elements and the corresponding transcription factors take part in plant responses to osmotic stress either cooperatively or independently. Additionally, the data suggest that an ethylene signaling pathway may be involved in the maize response to drought stress.

Introduction

Over the last few years, as a result of various large-scale genome sequencing projects, a wealth of genome sequences have become available for higher plants as well as for mammals, including the sequences of the Arabidopsis thaliana genome (Arabidopsis Genome Initiative, 2000; Lin et al., 1999; Mayer et al., 1999; Salanoubat et al., 2000; Tabata et al., 2000; Theologis et al., 2000), the rice genome (Dickson and Cyranoski, 2001; Feng et al., 2002; Goff et al., 2002; International Rice Genome Sequencing Project, 2005; Yu et al., 2002), the human genome (Lander et al., 2001; Venter et al., 2001), and the mouse genome (Waterston et al., 2002). Maize (Zea mays L.) is an important agronomic crop and a traditional genetic model plant (Moore et al., 1995; Swigonova et al., 2004). The maize genome is most likely the next plant genome that will be sequenced after Arabidopsis and rice (Jorgensen, 2004; Messing et al., 2004). It has been reported that the maize genome contains large duplications (Ahn and Tanksley, 1993; Goodman et al., 1980; Helentjaris et al., 1988; McMillin and Scandalios, 1980; Rhoades, 1951), so maize is also thought to be the next technical challenge in genome sequencing (Messing et al., 2004). To date, there are 2 681 812 maize genome survey sequences (GSSs) and 395 968 maize ESTs in the GenBank database.

Although many attempts have been made to predict transcription units using genomic sequence data, the accuracy of these predictions remains rather limited. Coding regions are often interspersed with non-coding DNA in genome sequences, and an individual gene may encode several peptides due to alternative splicing. Thus, genomic sequences do not always correspond to a certain transcript and the corresponding proteins. A more direct and efficient approach to collect information on the coding sequences that entails the analysis of full-length cDNA sequences has recently been developed (Carninci et al., 2003; Haas et al., 2002). The results from these studies have demonstrated that use of the full-length cDNA sequences could improve the quality of multiple aspects of genome annotation (Castelli et al., 2004; Haas et al., 2002, 2003; Seki et al., 2002a).

Full-length cDNAs are useful not only for accurately determining the genomic structure of genes (Mammalian Gene Collection Program Team, 2002), but also for functional analysis of genes (Hirochika et al., 2004; Seki et al., 2004). Construction of a cDNA library containing full-length cDNAs is an important method and the first step for rapid discovery of full-length cDNAs. Full-length cDNA libraries from rice (Kikuchi et al., 2003), Arabidopsis (Seki et al., 2002b,c), mouse (Carninci et al., 2003; RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium, 2001), and Drosophila (Rubin et al., 2000; Stapleton et al., 2002a,b) are well documented. However, there is little information on the construction of a full-length cDNA library and subsequent large-scale sequence analysis for maize.

Although Lai et al. (2004) published 3160 full-length cDNA sequences from a maize endosperm EST library, discovery of full-length cDNA sequences in maize still lags behind that in rice and Arabidopsis. In a previous report, polyethylene glycol (PEG) stress was demonstrated to be an effective strategy for simulating drought stress conditions (Zheng et al., 2004). In order to collect more full-length cDNAs from maize and understand the gene expression profiles under drought stress, a full-length enriched cDNA library was constructed from PEG-treated maize seedlings using the CAP trapper method with some modifications (Carninci et al., 1996, 1997, 1998, 2000; Sugahara et al., 2001; Suzuki et al., 1997). From that library, 2073 full-length cDNAs were identified, fully sequenced, and annotated. These results could improve maize genome annotation and supplement currently available public databases. In addition, we assessed the gene expression patterns of these full-length cDNAs under PEG-6000 stress treatment using a macroarray.

Results

Assessment of the cDNA library and collection of full-length cDNA sequences

The full-length cDNA library was constructed from 20% PEG-treated maize seedlings (inbred line Han 21) using the CAP trapper method with some modifications. The library was composed of 1.4 × 10⁶ independent clones, with an average size of approximately 0.75–1.2 kb. A total of 20 000 cDNA clones were randomly selected and sequenced from the 5′-end. After eliminating low-quality sequences and contaminating clones, 13 557 5′-end reads were generated and clustered into 5867 groups. The longest 5′-end sequence in each group was selected as a representative of that group. Ultimately, 2842 candidate full-length clones were selected according to the criterion that the subset of 5′ ESTs should originate at or upstream of the translation start site of the known protein (Strausberg et al., 2002). According to this criterion, 68% of the clones in this library were scored as ‘full-length’ and were submitted for full-length sequencing, and, of these, 2073 cDNAs that represent the complete coding regions and UTRs of the original transcripts were fully sequenced.

Features of 5′ and 3′ untranslated regions

The average length of the 5′ untranslated regions (UTRs) from the 2073 full-length cDNA sequences was 99 bp, with wide variations in size ranging from 1 to 754 bp. The frequencies of the stop codons UAA, UAG and UGA were 26.8%, 41.0% and 32.2% respectively. The overall GC content in the 5′ UTRs was 57.9%, which was higher than that in the coding regions (56.8%) and 3′ UTRs (41.5%). This result is consistent with observations from maize mRNA resources in the database (Figure 1).
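The regional GC content reported above can be computed with a short routine. The following is a minimal sketch, not the authors' pipeline: the function names and the convention of 0-based, end-exclusive CDS coordinates are assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def region_gc(cdna: str, cds_start: int, cds_end: int) -> dict:
    """GC content of the 5' UTR, CDS and 3' UTR of one cDNA.

    cds_start and cds_end are assumed to be 0-based, end-exclusive
    coordinates of the coding region (stop codon included).
    Empty regions are skipped.
    """
    regions = {
        "5' UTR": cdna[:cds_start],
        "CDS": cdna[cds_start:cds_end],
        "3' UTR": cdna[cds_end:],
    }
    return {name: gc_content(s) for name, s in regions.items() if s}
```

Averaging these per-clone values over all cDNAs, or pooling the bases of each region before dividing, gives slightly different figures; the text does not state which convention was used.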

Figure 1.

 Comparison of GC content in the 5′ UTR, 3′ UTR, coding regions (CDS) and total sequences between the collected full-length cDNAs and maize mRNAs in GenBank.

The average length of 3′ untranslated regions (UTRs) was 206 bp (not including the poly(A) tail; Figure 2). Almost all clones had an identifiable polyadenylation tail. Of those with a polyadenylation tail, 1378 sequences (66.5%) had polyadenylation signals. Of these, 299 (21.7%) had one of the two canonical polyadenylation signals, AAUAAA and AUUAAA, while 1079 sequences (78.3%) had an alternative polyadenylation signal. Table 1 shows the distribution of polyadenylation signal usage for those sequences with a polyadenylation tail. In some cases, the translation stop codon TAA was part of the polyadenylation signal (e.g. in clone DQ246126 and DQ245350).

Figure 2.

 Comparison of size distribution of the 5′ UTR, 3′ UTR, coding regions (CDS) and total sequences between the collected full-length cDNAs and maize mRNAs in GenBank.

Table 1.   Distribution of alternative polyadenylation signals
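| Polyadenylation signal | N |
| GGGGCT | 29 |
| ACTAAA | 46 |
| AATAGA | 50 |
| AGTAAA | 50 |
| GATAAA | 55 |
| AATACA | 57 |
| CATAAA | 58 |
| ATTAAA | 64 |
| AAGAAA | 93 |
| AATATA | 93 |
| AAAAAG | 98 |
| TATAAA | 105 |
| AAAACA | 109 |
| TTTAAA | 117 |
| AATGAA | 119 |
| AATAAA | 235 |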
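
A hexamer search of this kind can be sketched as follows. This is an illustrative example only: the 50-base search window upstream of the poly(A) site and the function name are assumptions, since the text does not state how the signals were located.

```python
CANONICAL = ("AATAAA", "ATTAAA")  # DNA forms of AAUAAA and AUUAAA


def find_polya_signals(utr3: str, window: int = 50, signals=CANONICAL):
    """Find signal hexamers near the 3' end of a 3' UTR.

    utr3 is the 3' UTR with the poly(A) tail already removed.
    Only the last `window` bases are searched (assumed value).
    Returns (signal, distance) pairs, where distance is the number of
    bases between the end of the hexamer and the 3' end, nearest first.
    """
    seq = utr3.upper().replace("U", "T")
    tail = seq[-window:]
    hits = []
    for sig in signals:
        start = tail.find(sig)
        while start != -1:
            hits.append((sig, len(tail) - (start + len(sig))))
            start = tail.find(sig, start + 1)
    return sorted(hits, key=lambda h: h[1])
```

Passing the alternative hexamers from Table 1 as `signals` would extend the same scan to the non-canonical variants.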

Repeat structures were found in 297 full-length cDNAs. Among them, 266 simple sequence repeats were found, which were the most frequently observed repeat structures in the cDNAs (89.6%), while 14 retrotransposon repeats (4.7%) and 10 transposon repeats (3.3%) were also found, and an additional 3.3% of the cDNA sequences had other types of repeat. Among the repeat structures, 42.4% (126) were found in the coding regions, 35.0% (104) in the 5′ UTR and 22.6% (67) in the 3′ UTR.
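
Simple sequence repeats such as those counted above can be located with a regular expression. The sketch below is an assumption-laden illustration rather than the method used in the study: the thresholds (unit length 2–6 bp, at least five tandem copies) and the function name are hypothetical, because the text does not give the repeat-detection criteria.

```python
import re


def find_ssrs(seq: str, min_repeats: int = 5, max_unit: int = 6):
    """Yield (start, unit, copies) for perfect tandem repeats.

    Units of 2..max_unit bases repeated at least min_repeats times are
    reported; both thresholds are illustrative assumptions. A long
    mononucleotide run is also reported, as a repeat of a 2-base unit
    such as "AA".
    """
    # The lazy quantifier prefers the shortest repeating unit.
    pattern = re.compile(
        r"([ACGT]{2,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        yield m.start(), unit, len(m.group(0)) // len(unit)
```

Combining each hit's start position with the CDS coordinates of the clone would then assign the repeat to the 5′ UTR, coding region or 3′ UTR.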

Alignment of the 2073 full-length cDNAs to maize BAC sequences

The 2073 full-length cDNAs were aligned to 448 maize BAC sequences from the public databases (http://www.tigr.org). Eighty-four full-length cDNAs could be mapped to the BAC sequences, and the remaining 1989 cDNAs could not. Of the 84 mapped cDNAs, 43 genes (51.2%) have been correctly annotated from the BAC sequences, 37 genes (44.0%) have been annotated with a different structure from our cDNAs, and four genes (4.8%) had no annotations in the TIGR database. After comparing the cDNAs with the annotations of BAC sequences in TIGR, 65 cDNAs (77.4%) from our library were found to carry longer 5′- and 3′-ends. These results could improve maize genome annotation and supplement currently available public databases.

The results also showed that three BAC sequences (AC155364, AC155375 and AC155575) each had two full-length cDNA alignments, which may indicate different gene splicing forms (Figure 3). In Figure 3(a), the cDNA clone DQ244763 has a cryptic exon compared with clone DQ245110. They may produce different protein sequences. In Figure 3(b) and (c), the transcripts from alternative splicing have the same coding region, meaning that they code for the same protein.

Figure 3.

 Alternative splice models discovered by cDNA alignments.
Red bars represent the coding regions. Black bars represent the 5′ and 3′ UTR of the full-length cDNA. Thin lines connecting the exons represent introns. The genes involved are: (a) hypothetical protein gene (DQ245110 and DQ244763) and BAC clone AC155364; (b) putative RNA polymerase gene (DQ244414 and DQ244706) and BAC clone AC155375; (c) hypothetical UPF0222 protein gene (DQ245949 and DQ245006) and BAC clone AC155575.

Mapping of the 2073 full-length cDNAs onto maize genome survey sequences (GSSs), rice and Arabidopsis genomic sequences

The 2073 maize full-length cDNAs were aligned to the maize GSS assembly, and to rice and Arabidopsis genomic sequences, as described in Experimental procedures. Of the 2073 full-length cDNA sequences, 1644 (79.3%) could be aligned to the maize GSS loci at >95% identity over the entire length (Figure 4). The fact that the remaining 429 clones did not match any sequence is probably due to the incompleteness of the maize genomic sequence.

Figure 4.

 Alignment of the maize full-length cDNAs with the maize GSSs, and rice and Arabidopsis genomic sequences.

The clones were also mapped to the rice and Arabidopsis chromosomes by a homology search. The criteria for matching clones to the genome sequence were set at >60% coverage of clone lengths and 85% sequence identity over the length of coverage. These criteria meant that 1877 full-length cDNA sequences could be aligned to rice genomic sequences and 1507 sequences to the Arabidopsis genome. Taken together, 1116 sequences could be mapped to all three genomic sequences, and 1567 common full-length cDNAs were identified in maize and rice genomes. However, there were only 1126 cDNA sequences that were common between maize and Arabidopsis genomes. Of the 2073 full-length maize cDNAs, 67 sequences (3.2%) were unique to maize (Figure 4). Furthermore, the homologous genes found in rice were spread over 12 chromosomes. The highest densities of homologous genes were found on chromosomes 1 and 3, which both had a density five times higher than that of chromosome 11. Fewer homologous genes were found on chromosomes 9–12.
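
The matching criteria above amount to a two-threshold filter on each alignment. The following minimal sketch shows one way to apply them; the `Hit` record, its field names, and the reading of "85% sequence identity" as "at least 85%" are assumptions for illustration, not details taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query_id: str      # cDNA clone identifier
    query_len: int     # full length of the cDNA clone (bases)
    aligned_len: int   # clone bases covered by the alignment
    identity: float    # percent identity over the aligned region


def passes(hit: Hit, min_coverage: float = 60.0, min_identity: float = 85.0) -> bool:
    """True if the hit covers >60% of the clone at >=85% identity."""
    coverage = 100.0 * hit.aligned_len / hit.query_len
    return coverage > min_coverage and hit.identity >= min_identity


def mapped_clones(hits, **thresholds) -> set:
    """Return the set of clone IDs with at least one passing hit."""
    return {h.query_id for h in hits if passes(h, **thresholds)}
```

With one such set per genome, the overlaps summarized in Figure 4 follow from ordinary set intersections, for example `maize & rice & arabidopsis` for clones mapped to all three.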

Comparative analysis of the maize full-length cDNA clones with known mRNAs or full-length cDNAs in maize

The average length of the 2073 full-length cDNA sequences is 799 bp, which is similar to that reported for maize endosperm cDNAs (Lai et al., 2004), but shorter than the Arabidopsis cDNAs of approximately 1.2 kb (Seki et al., 1998). BLASTX searches were also performed against the GenBank database for the 2073 representative clones, and 225 clones were deduced to encode small proteins (<100 amino acids).

By homology search against known mRNA sequences (822) and endosperm full-length cDNAs (3160) of maize in the GenBank database, 91 and 285 sequences of the 2073 full-length cDNAs were found to be identical (95% cut-off) to sequences from the two public sources, respectively. A total of 1728 (83.4%) sequences did not match known maize mRNA and full-length cDNA sequences, and thus represented new full-length genes. When homologous full-length cDNAs were further analyzed, 219 transcripts were found in common. Moreover, 149 (68.0%) genes have longer 5′ UTRs than corresponding full-length genes in known mRNA sequences and endosperm full-length cDNAs, whereas 105 genes (47.9%) were found to have longer 3′ UTRs.

The codon usage of the 2073 full-length cDNAs is quite similar to that of the 330 predicted genes from 100 randomly selected maize BACs (Haberer et al., 2005) and the 42 653 rice genes from the International Rice Genome Sequencing Project, but differs greatly from that of Arabidopsis genes (Table S1). Among the 2073 full-length cDNAs, codons with high GC content are used more frequently than those with low GC content. For example, the percentage of GCC and GCG is higher than that of GCT and GCA among the codons for alanine, the percentage of GAC is higher than that of GAT for aspartate, and the percentage of CAG is higher than that of CAA for glutamine (Table S1).
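
A comparison of synonymous codon frequencies like the one described for alanine can be sketched as below. This is an illustrative example with hypothetical function names; it assumes each input is an in-frame coding sequence and is not the procedure used to produce Table S1.

```python
from collections import Counter

ALANINE = ("GCT", "GCC", "GCA", "GCG")


def codon_counts(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence (trailing partial codon ignored)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def synonymous_fractions(counts: Counter, codons) -> dict:
    """Fraction of each codon among the given synonymous set."""
    total = sum(counts[c] for c in codons)
    if total == 0:
        return {c: 0.0 for c in codons}
    return {c: counts[c] / total for c in codons}
```

Summing the `Counter` objects of all coding regions before calling `synonymous_fractions` gives library-wide usage for any amino acid.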

Gene ontology

To classify and identify the biological roles and molecular functions of the 2073 cDNA clones, we used the BLAST program to analyze gene ontology assignments against a non-redundant database of proteins from model species (Table 2). The results showed that 1081 cDNA clones (corresponding to 3241 GO terms) were classified into three broad categories, including ‘biological processes’, ‘cellular component’ and ‘molecular function’. These accounted for 79.7%, 55.2% and 88.9% of the 1081 cDNA clones, respectively.

Table 2.   Numbers of clones assigned to GO functional categories
| Category | GO term | GO ID | N |
| Biological process (total number = 862) | Metabolism | GO:0008152 | 719 |
| | Protein biosynthesis | GO:0006412 | 283 |
| | Cell growth and/or maintenance | GO:0008151 | 171 |
| | Transport | GO:0006810 | 103 |
| | Cell organization and biogenesis | GO:0016043 | 70 |
| | Transcription | GO:0006350 | 46 |
| | Response to stress | GO:0006950 | 32 |
| | Cell communication/signal transduction | GO:0007154 | 29 |
| | Energy pathways | GO:0006091 | 26 |
| | Response to biotic stimulus | GO:0009607 | 18 |
| | Response to external stimulus | GO:0009605 | 9 |
| | Development | GO:0007275 | 3 |
| Cellular component (total number = 597) | Cell | GO:0005623 | 592 |
| | Extracellular region | GO:0005576 | 5 |
| Molecular function (total number = 961) | Binding | GO:0005488 | 402 |
| | Structural molecule activity | GO:0005198 | 264 |
| | Catalytic activity | GO:0003824 | 263 |
| | Transporter activity | GO:0005215 | 130 |
| | Molecular function unknown | GO:0005554 | 39 |
| | Transcription regulator activity | GO:0030528 | 22 |
| | Translation regulator activity | GO:0045182 | 18 |
| | Enzyme regulator activity | GO:0030234 | 17 |
| | Kinase activity | GO:0016301 | 11 |
| | Signal transducer activity | GO:0004871 | 3 |
| | Motor activity | GO:0003774 | 1 |

Any given sequence may have been assigned to more than one category. We assigned 3241 GO functional terms to 1081 genes as described in Results.

The GO assignment yielded 862 clones with GO terms associated with ‘biological processes’; these could be further divided into 12 smaller categories. Within this category, 83.4% of cDNAs were involved in metabolism and 32.8% were involved in protein biosynthesis. cDNA clones of other functional categories, such as cell growth and/or maintenance, transport, cell organization and biogenesis, transcription and response to stress, were also found in ‘biological processes’. A total of 597 clones had GO terms associated with ‘cellular component’, and 961 had terms related to ‘molecular function’, which could be further divided into 11 smaller categories. The three most dominant categories of cDNAs in ‘molecular function’ were binding, structural molecule activity and catalytic activity, which accounted for 41.8%, 27.4% and 27.3%, respectively. Twenty-two and 11 cDNAs were associated with transcription regulator activity and kinase activity, respectively.

Analysis of transcription factors

Transcription factors are important proteins in higher eukaryotes, especially in regulating plant responses to stress (Chen et al., 2002). Each clone was searched for the presence of known transcription factor domains using the InterPro database (Apweiler et al., 2000). This search led to the identification of 2472 protein domains and 566 transcription factors (Table 3). These transcription factors could be classified into 85 classes. The most highly represented class of transcription factors was the histone-fold type (103), followed by the zinc finger type (56), RNA binding region (23), nucleic acid binding OB (oligosaccharide/oligonucleotide binding) fold (23) and GTP binding protein (22). Additionally, RNA polymerase (14), KOW (14), HMG1/2 (high mobility group) box (11), KH (8), RINGv (RING variant) (6), bZIP transcription factor (6), WD-40 repeat (5), pathogenesis-related transcriptional factor and ERF (ethylene-responsive element binding factor) (5), and Myb and DNA-binding (5) motifs were also found.

Table 3.   Transcription factors identified through a search of the InterPro database
| Domain | Comments | No. genes |
| Histone-fold | TFIID, TAF, NF-Y | 103 |
| Zinc finger | A20-like, AN1-like, B-box, C2H2 type, CCHC type, Dof, Ran-binding RING, CSLCONSTANS, Tim10/DDP type, TRAF type, PHD finger, FYVE/PHD.HIT | 56 |
| RNA-binding region | RNP-1 | 23 |
| Nucleic acid-binding region | OB-fold | 23 |
| Small GTP-binding protein domain | | 22 |
| GTP-binding protein | Ran, 6 SAR1, HSR1-related | 22 |
| snRNP | | 18 |
| Eukaryotic translation initiation factor | TIF 3, IF5, IF6, 5 SUI1, eIF-4E, IF1 type | 17 |
| Transcription factor | CBF/NF-Y/archaeal histone, TFIIA, TFIIF, TAFII-31, TFIIS, K-box, MADS-box, TAFII28 | 15 |
| RNA polymerase | H/23 kDa subunit II, 4 RPB4, Rpb5, 2 Rpb7, 3 Rpb8, RPB5-like, N/8 kDa subunits | 14 |
| KOW | | 14 |
| Heat shock protein | DnaJ, Hsp20 | 14 |
| Translation protein | SH3-like | 11 |
| High mobility group box | HMG1/2, HMG-I, HMG-Y | 11 |
| Ubiquitin | | 9 |
| Cupin region | | 9 |
| KH | Prokaryotic-type, type 1, type 2 | 8 |
| Homeodomain-like | | 7 |
| ZIM | | 6 |
| Sterile alpha motif homology | SAM | 6 |
| RINGv | | 6 |
| Basic leucine zipper | | 6 |
| ARF/SAR superfamily | | 6 |
| Actin-binding | Cofilin/tropomyosin type | 6 |
| WD-40 repeat | | 5 |
| DNA-binding | SAP, TFAR19-related, HORMA | 5 |
| Pathogenesis-related and ERF | | 5 |
| NADH ubiquinone oxidoreductase | 20 kDa subunit, 17.2 kDa subunit, B18 subunit | 5 |
| Myb, DNA-binding | | 5 |
| Mitochondrial ribosomal protein L5 | | 5 |
| Light chain 3 (LC3) | | 5 |
| Transcription initiation factor | IIA, IIF, Spt4 | 4 |
| SWIB/MDM2 | | 4 |
| Rapid alkalinization factor | | 4 |
| Nucleoside diphosphate kinase | | 4 |
| Mss4-like | | 4 |
| Mov34/MPN/PAD-1 | | 4 |
| Helix-turn-helix motif | | 4 |
| GCN5-related N-acetyltransferase | | 4 |
| Eukaryotic initiation factor | 3 eIF-5A, eIF-1A | 4 |
| ssDNA-binding transcriptional regulator | | 3 |
| Ribonuclease CAF1 | | 3 |
| No apical meristem (NAM) protein | | 3 |
| Zinc-containing alcohol dehydrogenase | | 2 |
| Translationally controlled tumor protein | | 2 |
| Transcriptional co-activator p15 | | 2 |
| RNA-binding S4 | | 2 |
| Putative zinc-binding region DUF701 | | 2 |
| Peroxisomal biogenesis factor 11 | | 2 |
| Peptidase S24 and S26, C-terminal region | | 2 |
| Nuclear transport factor 2 | | 2 |
| Glyceraldehyde 3-phosphate dehydrogenase | | 2 |
| Eukaryotic transcription factor | DNA-binding | 2 |
| Concanavalin A-like lectin/glucanase | | 2 |
| AUX/IAA protein | | 2 |
| Zn-binding protein | LIM | 1 |
| UBX | | 1 |
| Tyrosine protein kinase | | 1 |
| Translation factor | | 1 |
| Transferase | | 1 |
| Transcriptional co-activator/pterin dehydratase | | 1 |
| Rubber elongation factor | | 1 |
| RNA recognition, region 1 | | 1 |
| PUA | | 1 |
| Pleckstrin-like | | 1 |
| Phosphatidylinositol transfer protein | | 1 |
| Paired amphipathic helix | | 1 |
| Ovarian tumor, otubain | | 1 |
| Macrophage migration inhibitory factor | | 1 |
| Lissencephaly type-1-like homology motif | | 1 |
| Initiation factor 2B | | 1 |
| Glycerophosphoryl diester phosphodiesterase | | 1 |
| Galactose-binding like | | 1 |
| Elongation factor 1 | Beta/beta’/delta chain | 1 |
| EGF-like | Subtype 2 | 1 |
| Early nodulin 93 (ENOD93) protein | | 1 |
| DNA-directed RNA polymerase | M/15 kDa subunit | 1 |
| Cytidine/deoxycytidylate deaminase | Zinc-binding region | 1 |
| CheY-like | | 1 |
| Cellular retinaldehyde-binding/triple function | N-terminal, C-terminal | 2 |
| CDK-activating kinase assembly factor | MAT1 | 1 |
| CD9/CD37/CD63 antigen | | 1 |
| Auxin-responsive SAUR protein | | 1 |
| Anti-hemostatic protein | | 1 |

Macroarray analysis of up- and downregulated cDNAs

Macroarray experiments were carried out to evaluate the expression pattern of the 2073 full-length cDNAs under PEG stress. The experiments were repeated twice with independent macroarray membranes from the same biological sample. Clones with more than twofold changes in expression level between treated and untreated samples were identified as differentially expressed. A gene was designated as being upregulated if the signal intensity from duplicate PEG-treated samples was equal to or greater than twice that of untreated control samples. Similarly, a gene was designated as being downregulated if the hybridization signal was less than half that of the untreated control samples.
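
The thresholds just described can be expressed as a small classification rule. The sketch below is illustrative only: the function name is hypothetical, the inputs are assumed to be already normalized signal intensities, and how the duplicate membranes were combined is not specified in the text.

```python
def classify(treated: float, control: float, fold: float = 2.0) -> str:
    """Classify a clone from normalized signal intensities.

    "up"        : treated signal is at least `fold` times the control
    "down"      : treated signal is less than 1/`fold` of the control
    "unchanged" : anything in between
    """
    if control <= 0 or treated < 0:
        raise ValueError("intensities must be positive after normalization")
    ratio = treated / control
    if ratio >= fold:
        return "up"
    if ratio < 1.0 / fold:
        return "down"
    return "unchanged"
```

For example, a clone with a treated-to-control ratio of 4.25 is classified as upregulated, whereas a ratio of exactly 0.5 is not classified as downregulated because the rule requires a signal below half of the control.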

According to the macroarray and scatter plot results (Figures 5 and 6), 79 cDNAs were upregulated and 329 cDNAs were downregulated. The 79 upregulated genes in maize included functional and regulatory proteins (Table 4). The upregulated genes in the functional category included a late embryogenesis-abundant (LEA) protein, a lipid transfer protein, a proline-rich protein, ferredoxin, a senescence-associated protein, membrane proteins, detoxification enzymes including glutathione S-transferase (GST) and superoxide dismutase, photosynthesis-related proteins, brown plant hopper susceptibility proteins, tubby-like protein, proteases and protease inhibitors. These functional proteins are considered to play important roles in protecting cells from dehydration and active oxygen and in adjusting the osmotic pressure under stress conditions (Cushman and Bohnert, 2000; Hasegawa et al., 2000; Seki et al., 2002a). Proteases, including ubiquitin-conjugating enzyme, are thought to be related to protein turnover and recycling of amino acids (Seki et al., 2002d).

Figure 5.

 Results of cDNA macroarray hybridization.
α-[32P]-labeled probe was prepared from the total RNA isolated from untreated and PEG-stressed maize seedlings.
(b, d) PEG-stressed macroarray; (a, c) controls corresponding to (b) and (d), respectively. A maize α-tubulin gene (spots B2, P2, B24, P24, E7, M7, E19, M19, I13) was used as the internal control. A pig β-TGF gene (spots A2, A24, O2, O24, H13) was used as the negative control. A maize RAB17 gene (spots D6, L6, H12, D18, L18) was used as the positive control.

Figure 6.

 Statistical analysis of cDNA hybridization results.
The scatter plot displays the normalized median signal intensities of all genes or spot locations and the changes in expression levels of genes under PEG stress compared with the unstressed controls. Each gene is represented by one dot.

Table 4.   PEG stress upregulated clones involved in various functions
| Protein category | Gene number | Clone accession number | Ratio | GenBank accession number | Annotation | E-value |
| Photosynthesis-related protein | 1 | DQ244549 | 2.536312849 | Q8W159 | Chlorophyll a/b binding protein | 1e-151 |
| LEA protein | 1 | DQ245698 | 2.58677686 | Q93XL5 | LEA III protein isoform 1 | 7e-44 |
| Low-temperature regulated protein | 1 | DQ245587 | 2.529616725 | Q42386 | Low-temperature regulated protein BN115 | 2e-65 |
| Membrane protein | 3 | DQ244537 | 2.8359375 | Q9SL69 | Membrane protein COV | 1e-131 |
| | | DQ245749 | 2.054945055 | Q94CS9 | Tonoplast membrane integral protein | 1e-118 |
| | | DQ244809 | 2.55799373 | Q7XE16 | Endoplasmic reticulum membrane fusion protein (putative) | 4e-28 |
| Lipid transfer protein | 1 | DQ245769 | 2.941176471 | Q42642 | Non-specific lipid transfer protein B precursor (LTP B) | 5e-63 |
| Proline-rich protein | 1 | DQ244895 | 2.439461883 | Q9LU14 | Proline-rich protein APG-like | 1e-173 |
| Alcohol dehydrogenase | 1 | DQ245412 | 2.027944112 | Q8S411 | Cinnamyl alcohol dehydrogenase | 1e-137 |
| Cytochrome | 2 | DQ244754 | 3.089005236 | P40934 | Cytochrome b5 | 5e-65 |
| | | DQ245593 | 2.612565445 | P48502 | Ubiquinol-cytochrome C reductase complex, 14 kDa protein | 1e-58 |
| Ferredoxin | 1 | DQ244886 | 4.25 | O23344 | Ferredoxin | 4e-63 |
| Thioredoxin | 1 | DQ245679 | 3.546875 | Q9SEU6 | Thioredoxin M type 4, chloroplast precursor (TRX-M4) | 4e-88 |
| Tubby-like protein | 1 | DQ244285 | 2.018867925 | Q68Y48 | Tubby-like protein (putative) | 5e-34 |
| Immunophilin | 1 | DQ244415 | 2.185116279 | Q5XLE1 | Immunophilin | 8e-51 |
| Enzyme involved in metabolism | 6 | DQ245022 | 2.068627451 | Q9SE42 | d-ribulose-5-phosphate 3-epimerase | 1e-111 |
| | | DQ245726 | 2.439461883 | Q9FPK7 | Inositol-3-phosphate synthase | 7e-70 |
| | | DQ246041 | 3.222826087 | Q39366 | Lactoylglutathione lyase (putative) | 1e-152 |
| | | DQ244146 | 2.295936396 | Q9LW89 | Adenine phosphoribosyl transferase | 4e-85 |
| | | DQ244731 | 5.671875 | O48529 | Rhodanese-like family protein | 1e-109 |
| | | DQ244297 | 2.064935065 | Q94HY4 | Aldo/keto reductase (putative) | 1e-175 |
| Detoxification enzyme | 2 | DQ244467 | 5.928104575 | Q9ZP60 | GST7 protein | 1e-130 |
| | | DQ244645 | 2.283018868 | P23345 | Superoxide dismutase [Cu-Zn] 4A | 2e-72 |
| Protease inhibitor | 3 | DQ244167 | 2.631184408 | Q852G9 | Putative protease inhibitor (root-specific protein/seed storage/LTP family protein) | 4e-30 |
| | | DQ244835 | 4.197761194 | Q6E676 | Proteinase inhibitor (MPI subtilisin/chymotrypsin-like inhibitor) | 9e-31 |
| | | DQ245455 | 2.246543779 | P81713 | Bowman–Birk-type trypsin inhibitor | 4e-16 |
| Senescence-associated protein | 1 | DQ244446 | 2.065088757 | Q6ET53 | Senescence-associated protein-like | 3e-24 |
| Protease | 2 | DQ245341 | 2.14 | Q8L458 | Ubiquitin-conjugating enzyme | 5e-84 |
| | | DQ245427 | 2.050955414 | Q9SJ44 | Ubiquitin-conjugating enzyme, E2 | 1e-66 |
| Blue copper protein | 1 | DQ244404 | 3.184931507 | Q851Z0 | Basic blue copper protein (putative) | 2e-39 |
| Brown plant hopper protein | 2 | DQ246027 | 2.946502058 | Q6K9S2 | Brown plant hopper susceptibility protein Hd002A (putative) | 7e-67 |
| | | DQ246028 | 4.325102881 | Q6K9S2 | Brown plant hopper susceptibility protein Hd002A (putative) | 7e-67 |
| Histone H2A | 1 | DQ244898 | 2.29972752 | Q42680 | Histone H2A | 1e-40 |
| Ribosomal protein | 10 | DQ245257 | 2.055749129 | Q6Z8E0 | Ribosomal protein L12 60S | 4e-82 |
| | | DQ244733 | 2.702194357 | Q9FKC0 | Ribosomal protein L13a-4 60S | 1e-110 |
| | | DQ245387 | 2.045751634 | P49637 | Ribosomal protein L27a-3 60S | 7e-82 |
| | | DQ244900 | 3.788617886 | Q8EUK2 | Ribosomal protein L32 50S | 1e-47 |
| | | DQ244882 | 2.75 | Q6L510 | Ribosomal protein L36 60S | 5e-45 |
| | | DQ245794 | 2.238372093 | Q8LBE4 | Ribosomal protein L7Ae-like | 1e-50 |
| | | DQ244358 | 2.032692308 | Q873A6 | Ribosomal protein s10-b 40S | 9e-40 |
| | | DQ245103 | 2.756097561 | Q00332 | Ribosomal protein S15a 40S | 1e-68 |
| | | DQ245702 | 2.376963351 | Q9FNP8 | Ribosomal protein S19-3 40S | 8e-74 |
| | | DQ245628 | 3.033472803 | P51430 | Ribosomal protein S6 40S | 1e-101 |
| Transcription factor | 8 | DQ245906 | 2.588571429 | O82113 | Zinc finger protein | 7e-77 |
| | | DQ244224 | 2.848837209 | Q6Z8T9 | Zinc finger protein family-like | 2e-31 |
| | | DQ245128 | 2.113333333 | Q6YWF1 | C2 domain-containing protein-like | 2e-45 |
| | | DQ245451 | 2.458174905 | Q75KE6 | C2H2-type zinc finger protein | 3e-37 |
| | | DQ244849 | 3.592857143 | Q8S3S0 | RING box protein 1 (putative) | 2e-53 |
| | | DQ245869 | 2.233128834 | Q8L7N4 | RING zinc finger protein (putative) | 6e-86 |
| | | DQ244157 | 2.627306273 | Q8L6T7 | Ring-H2 zinc finger protein | 9e-19 |
| | | DQ244289 | 2.076388889 | Q8GSD5 | RNA polymerase II subunit 14.5 kDa (putative) | 8e-58 |
| 14-3-3-like protein | 1 | DQ244738 | 2.220708447 | Q01525 | 14-3-3-like protein GF14 omega | 1e-139 |
| Ethylene-responsive element | 2 | DQ244273 | 5.132075472 | Q9C5I3 | Ethylene-responsive element binding factor (putative) (ERF domain protein 11) | 3e-26 |
| | | DQ244832 | 5.349775785 | Q69XD8 | Ethylene-responsive transcriptional co-activator | 2e-36 |
| Zinc metallothionein class | 1 | DQ244691 | 5.683760684 | P43401 | Zinc metallothionein class II | 4e-24 |
| Hypothetical protein | 16 | DQ244444 | 2.022058824 | Q9SKX9 | Hypothetical protein | |
| | | DQ244539 | 2.035874439 | Q6I627 | Hypothetical protein | 1e-60 |
| | | DQ245586 | 2.185542169 | Q8LC64 | Hypothetical protein | 3e-80 |
| | | DQ244163 | 2.382653061 | Q8LDY8 | Hypothetical protein | 4e-16 |
| | | DQ244642 | 2.566037736 | Q8LDU7 | Hypothetical protein | 1e-82 |
| | | DQ245172 | 3.865470852 | Q8LEK1 | Hypothetical protein | 3e-62 |
| | | DQ244957 | 2.065527066 | Q9FE18 | Hypothetical protein | 6e-59 |
| | | DQ244652 | 3.128526646 | O80895 | Hypothetical protein | 4e-65 |
| | | DQ244700 | 2.506024096 | Q9SJI8 | Hypothetical protein | 8e-69 |
| | | DQ244614 | 2.071661238 | Q949R9 | Hypothetical protein | 1e-56 |
| | | DQ244483 | 3.340206186 | Q6EPY3 | Hypothetical protein | – |
| | | DQ245190 | 2.793478261 | Q8S5V4 | Hypothetical protein | 5e-12 |
| | | DQ245499 | 2.741641337 | Q6KA54 | Hypothetical protein | 8e-55 |
| | | DQ244282 | 2.43495935 | Q688G0 | Hypothetical protein | – |
| | | DQ245429 | 2.706552707 | Q8LHY4 | Hypothetical protein | 4e-66 |
| | | DQ244509 | 2.304347826 | Q69KK0 | Hypothetical protein | 2e-62 |
| Other proteins | 8 | DQ244633 | 2.084870849 | Q75LJ2 | Flavoprotein alpha-subunit, having alternative splicing (putative) | 4e-50 |
| | | DQ245292 | 2.237668161 | Q9SY11 | Catalyzing the hydroxylation of phenazine-1-carboxylic acid to 2-hydroxy-phenazine-1-carboxylic acid | 1e-128 |
| | | DQ244586 | 2.136125654 | Q69WS1 | Synaptobrevin-like protein | 1e-107 |
| | | DQ245920 | 2.321937322 | Q9MA44 | T20M3.9 protein | 2e-39 |
| | | DQ245781 | 2.025157233 | Q9ZSK3 | Actin-depolymerizing factor 4 | 3e-73 |
| | | DQ244822 | 3.62369338 | Q69L71 | Hypoxia-responsive family protein-like | 1e-37 |
| | | DQ244542 | 3.546875 | Q9S841 | Oxygen-evolving enhancer protein 1-2, chloroplast precursor (OEE1) | 1e-178 |
| | | DQ245720 | 2.4765625 | Q9XFM6 | Progesterone-binding protein-like | 6e-97 |

–, no hit in the GenBank database.

Various transcription factors, enzymes involved in metabolism and other proteins were found among the regulatory proteins of the upregulated genes (Table 4), including ERF domain proteins, zinc finger proteins, RNA polymerases, general regulatory factors (14-3-3-like protein), etc. These transcription factors and regulatory factors might regulate various stress-inducible genes. In addition, alcohol dehydrogenase, adenine phosphoribosyl transferase, cytochrome, ribosomal proteins and other genes of unknown function were also identified. These regulatory proteins are thought to be important in regulating various functional genes under stress conditions.

Highly inducible genes with fivefold-induced expression included glutathione S-transferase, rhodanese-like family protein, zinc metallothionein class protein and two cDNAs for ethylene-responsive elements. Genes with fourfold-induced expression were ferredoxin and MPI subtilisin/chymotrypsin-like inhibitor. There were ten PEG-inducible genes with threefold-induced expression, which included genes encoding thioredoxin, lactoylglutathione lyase, blue copper protein, ribosomal protein L32, RING box protein, hypoxia-responsive family protein, oxygen-evolving enhancer protein and three hypothetical proteins (Table 4).

Of the downregulated cDNA clones, 177 genes (corresponding to 559 GO terms) could be classified into three categories, including ‘biological processes’, ‘cellular component’ and ‘molecular function’. Protein biosynthesis (52), intracellular components (53) and DNA binding (12) were the most abundant groups among the three categories, respectively (Table 5). The downregulated genes included detoxification enzymes, transcription factors, protein kinases, chaperones, phosphatases, enzymes involved in metabolism, signaling molecules including calcium-binding protein, calmodulin and calcium sensing receptor, ATPases, GTPases, proteases and proteinase inhibitors (Table S2). The transcription factor DnaJ protein was also downregulated; it has been reported to be a heat shock-inducible protein that plays an important role in regulating protein renaturation after stress (de Crouy-Chanel et al., 1995; Pellecchia et al., 1996). In addition, many photosynthesis-related genes, such as chlorophyll a/b-binding protein and the components of photosystems I and II, were also found to be downregulated under PEG stress. There were also 97 genes encoding hypothetical proteins (Table S3), 52 genes encoding unclassified proteins (Table S4) and 55 genes encoding protein synthesis-related proteins (Table S5) among the 329 downregulated genes.

Table 5. Number of clones in different functional groups downregulated by PEG stress

| Category | GO term | GO ID | No. genes |
| --- | --- | --- | --- |
| Biological process (total number = 84) | Protein biosynthesis | GO:0006412 | 52 |
| | Electron transport | GO:0006118 | 15 |
| | Protein modification | GO:0006464 | 5 |
| | Transport | GO:0006810 | 3 |
| | Transcription | GO:0006350 | 2 |
| | Response to stress | GO:0006950 | 2 |
| | Photosynthesis | GO:0015979 | 2 |
| | Metabolism | GO:0008152 | 2 |
| | Cell cycle | GO:0007049 | 1 |
| Cellular component (total number = 138) | Intracellular | GO:0005622 | 53 |
| | Ribosome | GO:0005840 | 52 |
| | Nucleus | GO:0005634 | 18 |
| | Membrane | GO:0016020 | 10 |
| | Mitochondrion | GO:0005739 | 4 |
| | Extracellular region | GO:0005576 | 1 |
| Molecular function (total number = 46) | DNA binding | GO:0003677 | 12 |
| | RNA binding | GO:0003723 | 9 |
| | Molecular function unknown | GO:0005554 | 7 |
| | Catalytic activity | GO:0003824 | 4 |
| | Nucleic acid binding | GO:0003676 | 7 |
| | Protein binding | GO:0005515 | 2 |
| | Hydrolase activity | GO:0016787 | 2 |
| | Nucleotide binding | GO:0000166 | 1 |
| | Binding | GO:0005488 | 1 |
| | Structural molecule activity | GO:0005198 | 1 |

Northern blot analysis

Northern blot analysis was carried out to confirm the reliability of the macroarray. Two clones each were chosen as hybridization probes from the functional and regulatory groups of upregulated genes. The functional protein genes are DQ245455 (encoding a Bowman–Birk-type trypsin inhibitor) and DQ244467 (encoding a GST7 protein); the regulatory protein genes are DQ244224 (encoding a zinc finger protein family-like protein) and DQ245906 (encoding a zinc finger protein). In general, the results of Northern blot analysis were consistent with the expression data obtained by array analysis (Figure 7).

Figure 7.

 RNA gel blot analysis of four genes and the corresponding cDNA macroarray data.
Total RNAs from unstressed (C) and PEG-stressed maize seedlings (S) were used for RNA gel blot analysis (25 μg per lane). RNA blots were hybridized with α-[32P]-labeled DNA probes of selected upregulated genes. The corresponding gene expression ratios from the macroarray are shown under the RNA blots. 28S rRNA was hybridized and used as the loading control.

Promoter analysis of stress response genes

Macroarray or microarray analysis can serve as one strategy for identifying novel cis-acting elements that regulate gene expression in response to various stresses (Seki et al., 2001). In this study, we combined sequence analysis with macroarray data to identify interesting cis-elements in the upstream regions of the 2073 full-length maize cDNAs. Based on the comparison of full-length cDNAs with maize GSS sequences, we obtained 1121 putative promoter sequences, each extending at least 1000 bp upstream of the 5′ terminus of the mapped cDNA clone. Of these, 356 contained cis-acting elements. Of particular interest was the identification of ABRE, DRE-core, MYB and MYC core sequences in the putative promoters of 30 genes (Table 6). Of these 30 genes, nine contained the DRE-core sequence (CCGAC), 22 contained ABRE (ACGTG(T/G)), 18 contained MYB ((C/T)AAC(T/G)G) and 29 contained MYC (CANNTG) in their putative promoter regions. In addition, other cis-acting elements responsive to abiotic factors, including WRKY (ACGT), GRAZMRAB17 (CACTGGCCGCCC) and GCC box (GCCGCC), could also be found in the promoters of some genes.
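
As an illustration only, a motif search of this kind can be sketched in Python with regular expressions. This is not the pipeline used in the study: the function names, the decision to scan both strands and the coordinate convention (the last promoter base is position −1 relative to the cDNA 5′ terminus) are assumptions made for the example. The patterns are the consensus sequences quoted in the paragraph above.

```python
import re

# Consensus sequences quoted in the text, written as regular expressions.
MOTIFS = {
    "DRE-core": "CCGAC",
    "ABRE": "ACGTG[TG]",
    "MYB": "[CT]AAC[TG]G",
    "MYC": "CA[ACGT]{2}TG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motifs(promoter):
    """Map each motif name to a sorted list of (start, end) positions.

    Positions are relative to the cDNA 5' terminus, so the last base of
    the promoter is -1 (assumed convention). Both strands are scanned;
    a site found on both strands at the same place is reported once.
    """
    promoter = promoter.upper()
    n = len(promoter)
    rc = reverse_complement(promoter)
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported.
        regex = re.compile("(?=(%s))" % pattern)
        spans = set()
        for m in regex.finditer(promoter):
            spans.add((m.start(1) - n, m.end(1) - 1 - n))
        for m in regex.finditer(rc):
            # Convert reverse-strand coordinates back to the forward strand.
            spans.add((-m.end(1), -m.start(1) - 1))
        hits[name] = sorted(spans)
    return hits
```

Because CANNTG is its own reverse-complement pattern, every MYC site matches on both strands, which is why the sketch collapses hits by position rather than by strand.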

Table 6. ABRE, DRE, MYB and MYC core sequences observed in the putative promoter regions of the PEG-stress inducible genes

| Accession number | ABRE-like sequence | CCGA(C/G) core motif | MYB | MYC |
| --- | --- | --- | --- | --- |
| Consensus sequence | ACGTG(T/G)C | CCGA(C/G) | C(A/G/C/T)GTT(A/G) | CANNTG |
| DQ244163 | ACGTG (−569 to −565) | CCGAG (−1541 to −1535) | CAGTTG (−866 to −861) | CAGTTG (−2092 to −2087) |
| DQ244167 | ACGTG (−23 to −18) | | CGGTTA (−164 to −159) | CAGGTG (−371 to −366) |
| DQ244224 | ACGTG (−951 to −947) | GCCGAC (−1341 to −1336) | CTGTTA (−92 to −87) | CATATG (−424 to −419) |
| DQ244273 | ACGTG (−386 to −382) | | | CACCTG (−73 to −68) |
| DQ244282 | | GCCGAC (−185 to −180) | | CACCTG (−131 to −126) |
| DQ244285 | | GCCGAC (−114 to −109) | | CATTTG (−299 to −294) |
| DQ244289 | CACGTGGC (−547 to −540) | | | CAGCTG (−109 to −104) |
| DQ244297 | | | CTGTTA (−251 to −246) | CATCTG (−917 to −912) |
| DQ246027 | ACGTG (−439 to −435) | | | CAGATG (−208 to −203) |
| DQ246028 | CACGTGGC (−1987 to −1983) | | | CATTTG (−2221 to −2216) |
| DQ244404 | ACGTG (−334 to −330) | | CGGTTG (−692 to −687) | CAAATG (−449 to −444) |
| DQ244415 | ACGTG (−187 to −183) | | | CAAATG (−62 to −57) |
| DQ244444 | ACGTG (−267 to −263) | ACCGAC (−389 to −384) | CAGTTG (−399 to −394) | CACATG (−310 to −305) |
| DQ244446 | ACGTG (−338 to −334) | | CCGTTG (−189 to −184) | CATTTG (−268 to −263) |
| DQ244467 | | | | CACATG (−104 to −99) |
| DQ244483 | ACGTG (−129 to −125) | GCCGAC (−1528 to −1523) | CCGTTG (−345 to −330) | CAGGTG (−130 to −125) |
| DQ244539 | | | | CAGATG (−56 to −51) |
| DQ244633 | ACGTG (−91 to −87) | ACCGAC (−66 to −61) | CTGTTG (−884 to −879) | CAGCTG (−86 to −81) |
| DQ244691 | ACGTG (−106 to −101) | | | CAAGTG (−21 to −16) |
| DQ244809 | ACGTG (−85 to −81) | | | CAGGTG (−91 to −86) |
| DQ245128 | ACGTG (−79 to −75) | | CTGTTG (−253 to −248) | CAACTG (−38 to −33) |
| DQ245292 | ACGTG (−7 to −3) | | CTGTTA (−155 to −150) | |
| DQ245412 | | | CAGTTA (−698 to −693) | CAGCTG (−366 to −361) |
| DQ245427 | | | | CAAATG (−147 to −142) |
| DQ245429 | ACGTG (−1008 to −1004) | | CCGTTG (−349 to −344) | CAGGTG (−332 to −327) |
| DQ245451 | ACGTG (−1678 to −1674) | | CTGTTA (−1505 to −1500) | CAATTG (−1699 to −1694) |
| DQ245455 | ACGTG (−85 to −81) | | CGGTTG (−230 to −225) | CAAGTG (−355 to −350) |
| DQ245726 | ACGTG (−399 to −395) | GCCGAC (−223 to −218) | CCGTTG (−1003 to −998) | CATGTG (−1814 to −1809) |
| DQ245869 | ACGTG (−2748 to −2744) | | CTGTTG (−2980 to −2975) | CATTTG (−2909 to −2904) |
| DQ245906 | | ACCGAC (−362 to −357) | CAGTTG (−12 to −7) | CACATG (−423 to −418) |

Discussion

Quality of the full-length cDNA library

Full-length cDNAs are very useful for analyzing gene structure and function. Seki et al. (1998) constructed Arabidopsis full-length cDNA libraries using the biotinylated CAP trapper method from Arabidopsis plants grown under various conditions. They found that this method was effective in obtaining full-length cDNAs from Arabidopsis on a large scale. In this study, we constructed a full-length cDNA library from maize seedlings using the same method with some modifications. The library quality was analyzed by determining the titre of the library (1.4 × 10⁹ CFU ml⁻¹) and sequencing the 5′-end tags of the clones. Collectively, these data showed that this full-length cDNA library was of high quality and suitable for further analysis, and that the CAP trapper method was effective in collecting full-length cDNAs of maize.

Assessment of alternative polyadenylation

The 3′-end processing machinery of plant mRNA can recognize various AAUAAA-like sequences (Li and Hunt, 1997; Rothnie, 1996; Wu et al., 1993). Variations of AAUAAA-like sequences, including AAUAAA (Rothnie et al., 1994, 2001), AAUGAA (Wu et al., 1993) and those with a single pyrimidine substitution (AUUAAA, CAUAAA, UAUAAA, AAUACA and AAUAUA; Graber et al., 1999), have been reported. In addition, because maize polyadenylation signals have not been widely investigated, we adopted the alternative polyadenylation signals defined for the human and rat (Beaudoing et al., 2000; Gautheret et al., 1998; Scheetz et al., 2004). The most frequently observed alternative polyadenylation signals were AAUGAA, UUUAAA and AAAACA, whereas GGGGCU was the least frequently observed. The major difference from previous results was the relatively low number of sequences observed with a single pyrimidine substitution in maize polyadenylation signals. mRNAs with multiple poly(A) sites tend to use non-canonical polyadenylation signals (including the common AUUAAA), whereas mRNAs with a single poly(A) site do not. However, in the 2073 mRNAs collected in this experiment, the canonical polyadenylation signals were the most frequently observed for both multiple poly(A) sites and a single poly(A) site. This may suggest that variant signals are not processed as efficiently as the AAUAAA signal.
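
To make the classification concrete, the following Python sketch scans the 3′-terminal region of a cDNA for the canonical hexamer and for the single-substitution variants named above. It is a simplified illustration rather than the procedure used here: the 40-nt search window, the function names and the three-way labelling are assumptions, and the input is expected to be the 3′ UTR with the poly(A) tail already removed.

```python
CANONICAL = "AAUAAA"
# Variants named in the text (Wu et al., 1993; Graber et al., 1999).
VARIANTS = ["AAUGAA", "AUUAAA", "CAUAAA", "UAUAAA", "AAUACA", "AAUAUA"]


def poly_a_signals(utr3, window=40):
    """List (hexamer, distance) pairs found in the last `window` bases.

    `distance` is the number of nucleotides from the first base of the
    hexamer to the 3' end of the sequence. The window size is an
    assumption for this sketch, not a value taken from the study.
    """
    rna = utr3.upper().replace("T", "U")
    tail = rna[-window:]
    hits = []
    for hexamer in [CANONICAL] + VARIANTS:
        start = tail.find(hexamer)
        while start != -1:
            hits.append((hexamer, len(tail) - start))
            start = tail.find(hexamer, start + 1)
    # Nearest signal to the poly(A) site first.
    return sorted(hits, key=lambda hit: hit[1])


def classify_signal(utr3, window=40):
    """Label a 3' end as 'canonical', 'variant' or 'none'."""
    found = {hexamer for hexamer, _ in poly_a_signals(utr3, window)}
    if CANONICAL in found:
        return "canonical"
    if found:
        return "variant"
    return "none"
```

Counting these labels separately for transcripts with one poly(A) site and for those with several would give the kind of comparison discussed in this section.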

Comparison of the 2073 full-length cDNAs to annotations of maize BAC sequences

After aligning the 2073 full-length cDNAs to the maize BAC sequences, we found that 37 of the 84 genes (44.0%) mapped to the BACs had an exon–intron structure that differed from the existing BAC annotation, and four genes (4.8%) had no annotation. The annotations of maize BAC sequences therefore need further revision. Haas et al. (2002) also reported that about 35% of previously annotated Arabidopsis genes required modification according to the full-length cDNA sequences. These results could improve maize genome annotation and supplement currently available public databases. It should be pointed out that the 2073 full-length cDNA collection is from a different inbred line than B73. Therefore, the comparisons of the 2073 full-length cDNAs to the BAC clones and the GSS sequences of B73 may be subject to haplotype variation (Brunner et al., 2005; Fu and Dooner, 2002; Song and Messing, 2003).

According to Haas et al. (2002), if two cDNAs can be mapped to the same locus in genomic sequences, but show distinct exon–intron structures, they are designated as alternative splicing, or another type of splicing abnormality. Alternative pre-mRNA splicing plays a major role in expanding protein diversity and regulating gene expression in higher eukaryotes (Black, 2000, 2003). In this study, we found that three genes from different BAC sequences had alternative splicing, producing two transcripts for each gene (Figure 3). The alternative splicing in Figure 3(a) would produce transcripts coding for different proteins, whereas the two transcripts in Figure 3(b) and (c) encode the same protein.

Comparative analysis of the maize full-length cDNA to the maize GSS, rice and Arabidopsis genomes

Maize full-length cDNA clones are useful for determining gene structure in maize and other Poaceae species by integrative analyses with the genomic sequences of these species (Bennetzen and Ma, 2003). Gene structures cannot be correctly predicted based on genome sequences only; however, mapping of full-length cDNA clones to the genome and further comparative analysis of genomic sequences could lead to a better elucidation of gene structures. The availability of rice and Arabidopsis genome sequences offers a unique opportunity for us to compare maize cDNA sequences with Arabidopsis and rice sequences and to search for regions conserved during evolution.

Comparative mapping on the rice and Arabidopsis genomes of the 2073 maize full-length cDNAs showed that 1877 and 1507 cDNA sequences could be aligned to the rice and Arabidopsis genomes, respectively, i.e. there are more conserved sequences between the maize and rice genomes than between the maize and Arabidopsis genomes. This was consistent with earlier findings that the individual chromosomes in maize are highly collinear with those of rice, wheat, sorghum and other grass species (Ahn et al., 1993; Bennetzen and Ramakrishna, 2002; Devos and Gale, 2000).

GC content and codon usage

The overall GC content of the 2073 full-length cDNA sequences is consistent with the maize mRNA available in public databases (Figure 1). The distribution of GC content is not uniform within the full-length cDNA, and an obvious polarity could be observed with the highest GC content (57.9%) in the 5′ UTR, 56.8% in the coding regions and the lowest (41.5%) in the 3′ UTR. This polar distribution of GC content is consistent with findings in rice (Wong et al., 2002) and in 172 full-length genes from 100 randomly selected maize BACs (Haberer et al., 2005). The significant difference in GC content between monocot and dicot plants also leads to an obvious difference in codon usage between them (Table S1).
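
The regional GC values quoted above follow from a simple per-region calculation, sketched here in Python. The function names and the 0-based, end-exclusive coordinates for the coding region are assumptions of this example, and the toy sequence in the usage note is invented for illustration.

```python
def gc_content(seq):
    """Return the percentage of G and C bases in a sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def regional_gc(cdna, cds_start, cds_end):
    """GC percentage of the 5' UTR, coding region and 3' UTR.

    `cds_start` is the 0-based index of the first coding base and
    `cds_end` is the index just past the stop codon (assumed convention).
    """
    return {
        "5' UTR": gc_content(cdna[:cds_start]),
        "CDS": gc_content(cdna[cds_start:cds_end]),
        "3' UTR": gc_content(cdna[cds_end:]),
    }
```

For example, `regional_gc("GCGCATGGCCTAAATAT", 4, 13)` splits the toy sequence into `GCGC`, `ATGGCCTAA` and `ATAT` and reports the GC percentage of each part.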

Macroarray analysis

cDNA microarrays and macroarrays are valuable tools in analyzing gene expression on a large scale. Several studies have reported gene expression changes under abiotic stresses using cDNA microarrays in plant species such as rice (Bohnert et al., 2001) and Arabidopsis (Seki et al., 2001, 2002a,b, 2002d). Zheng et al. (2004) reported that PEG and drought stress have similar effects on maize seedlings. Based on the collected full-length cDNA clones, we prepared a maize full-length cDNA macroarray using 2073 full-length cDNAs to identify the candidate genes responsive to water deficit. The results showed that 79 genes were upregulated and 329 genes downregulated by PEG treatment, which account for 3.8% and 15.9% of the 2073 full-length cDNAs, respectively. Although the full-length cDNA library was constructed from PEG-stressed maize seedlings, the cDNAs were not screened, which may be one of the reasons for the low percentage of upregulated genes and higher percentage of downregulated genes. Another possible reason is that the cDNA clones were randomly chosen for sequencing without normalization, and these 79 upregulated genes are independent cDNAs, any one of which may represent a number of abundant transcripts.

These upregulated genes encode proteins that fall into two main categories: functional proteins and regulatory proteins (Table 4). The results of Northern blot using two representative probes from each of the two main categories were consistent with the macroarray analysis. It has been reported that functional proteins, such as LEA proteins, GST and proline-rich proteins, play important roles in avoiding or reducing stress injury in plants. The upregulated genes from the maize full-length cDNA macroarray analysis were similar to those induced by drought, cold and high-salinity stresses in Arabidopsis (Seki et al., 2002a). Transcription factors are important in regulating plant responses to environmental stresses. In our experiment, zinc finger proteins, a family of transcription factors that have previously been shown to be stress-inducible (Brinker et al., 2004), were the main transcription factors identified following treatment with PEG-6000. Two ethylene-responsive genes were also induced by PEG-6000, suggesting that the ethylene signaling pathway may be involved in the drought stress response in maize. In addition, a 14-3-3 protein was also found to be inducible in our experiment, which has previously been reported to be involved in signal transduction and response to abiotic stresses in animal cells and higher plants (Koskinen et al., 2004; Roberts et al., 2002).

In our experiment, photosynthesis-related genes such as chlorophyll a/b-binding protein and the components of photosystems I and II (Table 5) were downregulated. Several similar studies have reported that drought, cold, high salinity and ABA stress inhibit photosynthesis (Seki et al., 2002a; Tezara et al., 1999; Weatherwax et al., 1996). Additionally, our data showed that several genes involved in signal transduction, adenylate kinase B, ABA-responsive protein and cold-responsive proteins were also downregulated by PEG-6000 treatment. Of the transcription factors, it has been reported that DnaJ is a heat shock-induced protein, and that it had a negative auto-regulation effect on heat shock responses (de Crouy-Chanel et al., 1995; Pellecchia et al., 1996). In our experiment, DnaJ might also play the same role in regulating protein renaturation after PEG stress.

Analysis of promoters and transcription factors

In Arabidopsis, Oono et al. (2003) demonstrated that a full-length cDNA microarray was useful not only in analyzing patterns of gene expression, but also in identifying the target genes of stress-related transcription factors and potential cis-acting DNA elements by combining the expression data with the identification of cis-acting sequences in the corresponding genomic sequence data. Cis-acting elements are special DNA sequences that mediate the regulation of gene transcription. Several cis-acting elements, such as the G-box-containing ABREs and the recognition sequences for the MYB and MYC class of transcription factors, have been shown to contribute to ABA responses of individual genes (Busk and Pages, 1998; Leonhardt et al., 2004). In our study, 79 genes were identified as PEG stress-inducible genes. Of these, nine contained the DRE-core sequence (CCGAC) in their promoters, suggesting that they were regulated by DREB transcription factors. Twenty-two genes contained ABRE (ACGTG(T/G)) in their promoters. These results suggest that these cis-acting elements, combined with the corresponding transcription factors, are involved in the response to various environmental stresses cooperatively or separately.

Experimental procedures

Plant materials, stress treatments, and RNA preparation

Maize (Zea mays L. Han 21) was used in this study. Maize seeds were surface-sterilized in 5% sodium hypochlorite for 5 min and washed with distilled water. The seeds were then sown in blended soil (soil:sand 2:1) and incubated at 25°C. For PEG treatment, seedlings at the three-leaf stage were removed from the soil and subjected to treatment with 20% PEG-6000 in a sealed container through which air was continuously bubbled for the duration of treatment. Control and PEG-stressed leaves and roots were harvested 1 and 6 h after the start of the experimental treatment, frozen immediately in liquid nitrogen and stored at −80°C for further analysis. Total RNA was prepared as described previously (Kay et al., 1987) with some modifications. mRNA was isolated using an Oligotex™ mRNA isolation kit (Qiagen, Valencia, CA, USA).

Reverse transcription

The reverse transcription reaction was carried out in a 100 μl volume using RNaseH-free reverse transcriptase (SuperScript™ II reverse transcriptase, Invitrogen, Carlsbad, CA, USA). First, 12.5 μg pooled mRNA from maize seedlings treated with 20% PEG-6000 for 1 and 6 h was denatured at 65°C for 10 min, and 200 pmol oligo(dT)18 (5′-AGATTGGTCTCCTCGAGT(18)VN-3′, where N is G, A, T or C and V is A, C or G) was added. Then, 20 μl of first-strand buffer, 5 μl of 10 mM DTT, 10 μl of 10 mM 5-methyl-dNTP and 3 μl RNase inhibitor were added and pre-incubated at 42°C for 5 min, before incubation with 10 μl SuperScript II reverse transcriptase at 45°C for 30 min, 50°C for 30 min and 60°C for 30 min. To stop the reaction, 2 μl of 0.5 M EDTA, 2 μl of 10% SDS and 5 μl of 10 mg ml⁻¹ proteinase K were added, and the reaction mixture was further incubated at 45°C for 15 min. Subsequently, the cDNA/RNA was extracted once with phenol–chloroform, precipitated with ethanol, washed with 70% ethanol and resuspended in RNase-free water.

RNA oxidation and RNA biotinylation

The resuspended cDNA/RNA was oxidized in 66 mM sodium acetate (pH 4.5) and 5 mM NaIO4. The oxidation was carried out on ice in the dark for 45 min. To precipitate the oxidized RNA, 10% SDS, 5 M NaCl and isopropanol were added, and the mixture was centrifuged at 12 000 g for 30 min at 4°C. The pellet was washed with 70% ethanol and resuspended in RNase-free water. Then 1 M sodium acetate (pH 6.1), 10% SDS and 10 mM biotin hydrazide long-arm were added to the oxidized cDNA/RNA for biotinylation overnight at room temperature. Subsequently, the cDNA/RNA was precipitated at −80°C for 1 h by adding 1 M sodium acetate (pH 6.1), 5 M NaCl and 2.5 volumes of ethanol. Finally, the pellet was washed twice with 80% ethanol and resuspended in RNase-free water. RNase digestion of the first-strand cDNA reaction was performed using RNase I at 37°C for 1 h.

Full-length cDNA capture

MPG–streptavidin beads (500 μg; CPG, Lincoln Park, NJ, USA) and DNase-free tRNA (100 μg) were mixed and incubated on ice for 30 min. The beads were separated using a magnetic stand and washed three times with 2 M NaCl and 50 mM EDTA (pH 8.0). Finally, the beads were resuspended in 2 M NaCl and 50 mM EDTA (pH 8.0) and mixed with the cDNA/RNA sample at room temperature for 30 min with gentle mixing. After removal of unbound cDNA/RNA, the beads were washed three times with 2 M NaCl and 50 mM EDTA (pH 8.0). To release the cDNA from the beads, 100 μl of 50 mM NaOH/1 mM EDTA, pH 8.0, were added to the cDNA/RNA mixture and incubated at 65°C for 10 min. Eluted cDNA was added to a tube containing 100 μl of 1 M Tris-HCl (pH 7.5). Subsequently, the cDNA was extracted once with phenol–chloroform, precipitated with ethanol and resuspended in RNase-free water.

Ligation of cDNA with adapter

The adapter was designed and prepared by annealing the following two primers: forward primer 5′-pCCTGACTGATCGACT-3′ (where p = phosphate) and reverse primer 5′-AGTCAGGNNN-3′ (where N = G, A, T or C). A 50 μl mixture of forward and reverse primers was added to 1x PCR buffer (50 mM KCl, 10 mM Tris-HCl (pH 8.3), 1.5 mM MgCl2), denatured at 95°C for 5 min in a water bath, and then cooled to room temperature. The adapter was diluted to 10 μM before use. cDNA was ligated with 5 pmol of the adapter in a 10 μl volume at 16°C overnight. After ligation, the samples were heated at 65°C for 5 min to inactivate the ligase, and purified with a Qiagen PCR purification column. cDNA was eluted in 80 μl water.

Construction of full-length cDNA library

The primer 5′-GTACGTAGGTCTCGAATTCAGTCGATCAGTCAGG-3′ was used for the dsDNA synthesis. To 80 μl of cDNA, 3 μl of 10 μM primer, 5 μl of 2.5 mM dNTP, 10 μl PCR buffer and 1 μl (5 U) of Taq polymerase were added. The reaction was performed on a GeneAmp PCR System 9700 (ABI, Foster City, CA, USA) by initially denaturing for 5 min at 94°C, and followed by five cycles of 50°C for 30 sec, 60°C for 30 sec and 72°C for 6 min. dsDNA was then purified by a Qiagen PCR spin column and digested with 1 U of the restriction enzymes EcoRI and XhoI at 37°C for 1 h. The digested dscDNA was size-fractionated on a 1% agarose gel. Fragments ranging from 0.5 to 8 kb were recovered from the agarose gel and cloned into a pre-digested pBluescript SK+ vector (Stratagene, La Jolla, CA, USA). The ligation mix was extracted once with phenol–chloroform, precipitated with ethanol, and resuspended in 10 μl water. A 1 μl aliquot of ligation mix was transformed into Escherichia coli strain DH10B by electroporation using a Gene-Pulser II according to the manufacturer's instructions (Bio-Rad, Richmond, CA, USA). Twenty thousand colonies were selected from LB agar plates containing ampicillin and transferred to 384-well plates containing LB medium with ampicillin. After overnight growth at 37°C, the plates were stored at −80°C.
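
The relationship between the primers and the cloning sites can be checked with a few lines of Python. This consistency check is an illustration and not part of the published protocol: the variable and function names are hypothetical, the sequences are copied from the text, and the recognition sites used are the standard ones for EcoRI (GAATTC) and XhoI (CTCGAG).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


# Sequences as given in the text (5' to 3').
ADAPTER_FORWARD = "CCTGACTGATCGACT"
ADAPTER_REVERSE_CORE = "AGTCAGG"  # reverse primer without its NNN overhang
SECOND_STRAND_PRIMER = "GTACGTAGGTCTCGAATTCAGTCGATCAGTCAGG"
OLIGO_DT_PREFIX = "AGATTGGTCTCCTCGAG"  # oligo(dT) primer before T(18)VN

ECORI_SITE = "GAATTC"
XHOI_SITE = "CTCGAG"


def primer_report():
    """Check how the primers relate to the adapter and the cloning sites."""
    return {
        # The reverse adapter primer pairs with the 5' end of the forward one.
        "adapter_strands_pair": (
            reverse_complement(ADAPTER_REVERSE_CORE) == ADAPTER_FORWARD[:7]
        ),
        # The 3' end of the second-strand primer anneals to the adapter.
        "primer_anneals_to_adapter": SECOND_STRAND_PRIMER.endswith(
            reverse_complement(ADAPTER_FORWARD)
        ),
        "EcoRI_in_second_strand_primer": ECORI_SITE in SECOND_STRAND_PRIMER,
        "XhoI_in_oligo_dT_primer": XHOI_SITE in OLIGO_DT_PREFIX,
    }
```

All four checks hold for the sequences above, which is consistent with the directional EcoRI/XhoI cloning described in this section.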

cDNA sequencing

Twenty thousand clones were inoculated and grown overnight at 37°C with shaking at 200 rpm. Plasmids were extracted by alkaline lysis and purified with Multiscreen filter plates for high-throughput separations (Millipore, Bedford, MA, USA). Sequencing reactions were performed with 200 ng of plasmid as template and T3 and M13F as primers using an ABI PRISM Big Dye Terminator version 3.1 cycle sequencing kit (Applied Biosystems, Foster City, CA, USA) on a GeneAmp PCR System 9700 (ABI). Sequencing reactions were further cleaned by ethanol/EDTA/sodium acetate precipitation prior to capillary electrophoretic separation and detection by an ABI 3730 DNA Analyser.

Data processing and assembly

Each cDNA sequence was processed and assembled using PHRED (Ewing and Green, 1998; Ewing et al., 1998) and PHRAP (Green, 1999; http://www.phrap.org). Lucy was used for vector trimming (Chou and Holmes, 2001); trimmed data were stored in FASTA format using a Perl script. Clones containing cDNAs that were >90% similar over 80 bases or more were classed into the same cluster using the tgicl program (Osato et al., 2002; Pertea et al., 2003). The end sequences in each cluster were aligned using the FASTA homology search software to Uniprot (Apweiler et al., 2004). Based on the alignment of 5′ and 3′-end sequences, the clone carrying the longest cDNA insert in each cluster was selected as representative of the cluster.
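
The final selection step, choosing the clone with the longest insert in each cluster, is simple to express in code. The Python sketch below is illustrative only: the function name, the tuple layout and the clone identifiers in the usage note are hypothetical, and insert length is taken as given rather than derived from the end-sequence alignments.

```python
def pick_representatives(clones):
    """Return {cluster_id: clone_id} for the longest insert in each cluster.

    `clones` is an iterable of (clone_id, cluster_id, insert_length)
    tuples. When two clones in a cluster tie, the first one seen is kept
    (an arbitrary choice made for this sketch).
    """
    best = {}
    for clone_id, cluster_id, insert_length in clones:
        current = best.get(cluster_id)
        if current is None or insert_length > current[1]:
            best[cluster_id] = (clone_id, insert_length)
    return {cluster: clone for cluster, (clone, _) in best.items()}
```

Given `[("c1", "A", 850), ("c2", "A", 1420), ("c3", "B", 600)]`, the function keeps `c2` for cluster `A` and `c3` for cluster `B`.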

Sequence analysis

All the sequences were compared with the public sequence database as both nucleotides and amino acids based on BLAST (E-value of <10⁻⁵). The features of 5′ and 3′ untranslated regions of full-length cDNAs, function ontology, transcription factors and promoters associated with PEG stress were then analyzed by macroarray. We aligned the full-length sequences with the maize GSSs, and the rice and Arabidopsis genomes using paracelblast, with an E-value of 10⁻⁵. We extracted the maize genomic region of the best locus assigned to each cDNA (>95% identity over >100 bases).
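
The locus-assignment rule in the last sentence can be sketched as a filter over alignment records. This Python example assumes a simplified record layout (cDNA, locus, percent identity, aligned length, score) and assumes that the 'best' locus is the passing hit with the highest alignment score; neither detail is specified in the text, and the identifiers in the usage note are invented.

```python
def best_loci(hits, min_identity=95.0, min_length=100):
    """Return {cdna_id: locus_id} for the best qualifying hit per cDNA.

    `hits` is an iterable of
    (cdna_id, locus_id, percent_identity, aligned_length, score) tuples.
    A hit qualifies only if identity > min_identity and aligned length >
    min_length, mirroring the thresholds quoted in the text.
    """
    best = {}
    for cdna_id, locus_id, identity, length, score in hits:
        if identity <= min_identity or length <= min_length:
            continue
        if cdna_id not in best or score > best[cdna_id][1]:
            best[cdna_id] = (locus_id, score)
    return {cdna: locus for cdna, (locus, _) in best.items()}
```

A cDNA whose hits all fall below either threshold is simply absent from the result, so it would not contribute a putative genomic region.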

cDNA macroarray preparation

The full-length cDNA macroarray was prepared following the method described by Ji et al. (2003), with slight modifications. The cDNA macroarray for each treatment, comprising two nylon membranes containing 2073 genes (6219 spots) and three controls (114 spots), was used for analyzing changes in the transcript level of 2073 maize full-length cDNAs in response to stress treatment with PEG-6000. Maize α-tubulin cDNAs were also printed on each membrane as an internal control. Pig β-TGF was used as a negative control. RAB17 was used as a positive control. PCR products from each unique full-length cDNA were arrayed from six 384-well plates onto nylon membranes (Amersham, Arlington Heights, IL, USA) using a Biomek 2000 Laboratory Automation Workstation (Beckman Coulter, Fullerton, CA, USA). Each clone was printed in triplicate with an average spot diameter of 1.125 mm and spot-to-spot distance of 1.25 mm. After air-drying, the membranes were denatured in 0.6 M NaOH for 5 min, neutralized in 0.5 M Tris-HCl (pH 7.5) for 5 min, and rinsed in distilled water for 3 min. The spotted samples were cross-linked to membranes using a low-energy UV source and baked for 2 h at 80°C.

Macroarray hybridization and scanning

Total RNA prepared from untreated and PEG-treated maize roots and leaves harvested at 1 and 6 h was reverse-transcribed and used as a probe for expression profile analysis. The reverse transcription reaction was performed in a 20 μl solution and set up as follows: 1 μl oligo(dT)18, 10 μg total RNA and distilled water (up to 8 μl), heating at 65°C for 5 min, quick chilling on ice, and collection of the contents by 20 sec of centrifugation at 10 000 g. Then, 4 μl of 5x first-strand buffer, 2 μl of 0.1 M DTT, 1 μl of 10 mM dNTP mix (10 mM each dATP, dTTP and dGTP), 1 μl RNasin (40 U μl⁻¹), 3 μl [32P]-dCTP (10 μCi μl⁻¹) and 1 μl (200 U) of SuperScript™ II (Invitrogen, Carlsbad, CA, USA) were added, and the tubes were mixed by gentle vortexing and incubated at 42°C for 1 h. The probes were denatured in a heat block at 100°C for 5 min, and then chilled for 5 min on ice before hybridization. Membranes were pre-hybridized in 20 ml Church solution (1% BSA, 1 mM EDTA, 0.25 M Na2HPO4-NaH2PO4, 7% SDS) at 65°C for 12 h. The denatured probes were then added to the solution, and hybridization was carried out overnight at 65°C. After hybridization, the membranes were washed at 65°C in 2x SSC, 0.5% SDS for 15 min, in 1x SSC, 0.5% SDS for 15 min, in 0.5x SSC, 0.5% SDS for 15 min, and finally in 0.1x SSC, 0.1% SDS for 15 min. The membranes were then exposed to storage phosphor screens (Amersham Biosciences, Piscataway, NJ, USA) for 3 days. Images were acquired by scanning the membranes with a Typhoon 9210 scanner (Amersham Biosciences).

Northern blot analysis

Total RNA from PEG-stressed and unstressed maize seedlings was also used for RNA gel blot analysis. Total RNA (25 μg) was separated by electrophoresis in denaturing formaldehyde 1.2% w/v agarose gels and then transferred to Hybond N+ nylon membrane (Amersham Biosciences). DNA probes were purified from PCR-amplified fragments of selected genes and labeled with α-[32P]-dCTP using a Prime-a-Gene® Labeling System Kit (Promega, Madison, WI, USA) according to the manufacturer's protocol. 28S rRNA was also hybridized as a loading control. Hybridization, washing and scanning were performed as described previously (Zheng et al., 2004).

Data normalization and statistical analysis

Raw intensity measurements and data analysis were performed using GPC Visualgrid software (http://www.gpc-biotech.com). Median signal intensity measurements were obtained for each spot on the hybridized array. The signal value of the negative control (pig β-TGF) on the array was used as background and consequently subtracted from the raw intensity values for each gene on the array. Maize α-tubulin cDNAs were used as an internal control gene to equalize hybridization signals generated from different samples. The signal value after background subtraction and signal equalization was used for further analysis.
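
The two normalization steps described above can be sketched in Python as follows. This is a minimal illustration and not the GPC Visualgrid implementation: combining the triplicate spots by their median, clipping negative background-corrected values at zero, dividing by the α-tubulin signal and all function and key names are assumptions of this example.

```python
from statistics import median


def normalize(raw, negative_control, internal_control):
    """Background-subtract and scale spot intensities for one membrane set.

    `raw` maps each gene (including the controls) to its replicate spot
    intensities. The negative control gives the background; the internal
    control, after background subtraction, gives the scaling reference.
    """
    background = median(raw[negative_control])
    reference = median(raw[internal_control]) - background
    if reference <= 0:
        raise ValueError("internal control signal does not exceed background")
    normalized = {}
    for gene, spots in raw.items():
        if gene in (negative_control, internal_control):
            continue
        # Clip at zero so spots below background do not go negative.
        signal = max(median(spots) - background, 0.0)
        normalized[gene] = signal / reference
    return normalized


def expression_ratio(stressed, control):
    """Stressed/control ratio for genes with a non-zero control signal."""
    return {
        gene: stressed[gene] / control[gene]
        for gene in stressed
        if control.get(gene, 0) > 0
    }
```

Normalizing the control and PEG-treated membranes separately and then taking the ratio gives one fold-change value per gene, comparable to the ratios reported under the RNA blots in Figure 7.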

Acknowledgements

The authors would like to thank Dr Ruiguang Zhen and Dr John Klejnot for their critical reading and comments on the manuscript. This work was supported by the China High-Tech Program (863) and the Cultivation Fund of the Key Scientific and Technical Innovation Project, China Ministry of Education (number 705009).
