These authors contributed equally to this work.
A dynamic gene expression atlas covering the entire life cycle of rice
Article first published online: 9 DEC 2009
© 2010 The Authors. Journal compilation © 2010 Blackwell Publishing Ltd
The Plant Journal
Volume 61, Issue 5, pages 752–766, March 2010
How to Cite
Wang, L., Xie, W., Chen, Y., Tang, W., Yang, J., Ye, R., Liu, L., Lin, Y., Xu, C., Xiao, J. and Zhang, Q. (2010), A dynamic gene expression atlas covering the entire life cycle of rice. The Plant Journal, 61: 752–766. doi: 10.1111/j.1365-313X.2009.04100.x
- Issue published online: 22 FEB 2010
- Received 3 October 2009; revised 14 November 2009; accepted 27 November 2009; published online 18 January 2010.
- Oryza sativa L.;
- expression profile;
- functional genomics;
- genome annotation;
Growth and development of a plant are controlled by the programmed expression of suites of genes at the appropriate time, in the appropriate tissue and at the appropriate abundance. Although genomic resources for rice, a model plant for cereal genome research, have developed rapidly in recent years, gene expression profiling data are still insufficient to relate developmental processes to transcriptomes, leaving a large gap between genome sequence and phenotype. In this study, we generated genome-wide expression data by hybridizing 190 Affymetrix GeneChip Rice Genome Arrays with RNA from 39 tissues collected throughout the life cycle of the rice plant from two varieties, Zhenshan 97 and Minghui 63. Analyses of the global transcriptomes revealed many interesting features of the dynamic patterns of gene expression across tissues and stages. In total, 38 793 probe sets were detected as expressed, and 69% of the expressed transcripts showed significantly variable expression levels among tissues/organs. The similarity of transcriptomes among organs corresponded well to their developmental relatedness. About 5.2% of the expressed transcripts showed tissue-specific expression in one or both varieties, and 22.7% exhibited constitutive expression, including 19 genes with high and stable expression in all tissues. This dataset provides a versatile resource for plant genomic research: it can be used to associate transcriptomes with developmental processes, to understand the regulatory networks of these processes, to trace the expression profiles of individual genes and to identify reference genes for quantitative expression analyses.
Transcriptional programs are important for the development of complex eukaryotic organisms. Suites of genes, expressed under temporal and spatial control by regulatory networks and in response to environmental cues, are the cornerstone of the morphological and physiological specification of tissue and organ systems. Thus, an important issue in developmental biology is to define the subsets of expressed genes, and their expression patterns, that are related to each organ or tissue system.
DNA microarray technology is a powerful high-throughput tool for constructing the transcriptional map at the whole-genome scale. This approach has been widely used to detect gene expression patterns in many model organisms (Walker et al., 2004; Son et al., 2005; Xia et al., 2007). The transcriptomes covering the developmental processes of Arabidopsis and the model legume Medicago truncatula were elucidated using this approach (Schmid et al., 2005; Benedito et al., 2008). A comparison revealed that the expression patterns of many orthologs had been altered between the two plant species, indicating that similarity of protein sequences may not be a good predictor of identical functions of genes in different species.
Functional divergence of orthologs is widespread among plants. For example, UFO promotes the conversion of the inflorescence meristem to the floral meristem in Arabidopsis (Wilkinson and Haughn, 1995), whereas its ortholog APO1 inhibits this change in rice (Ikeda et al., 2007); the two orthologous genes thus have opposite effects on inflorescence meristem fate. Furthermore, developmental processes must change when an ancestor gives rise to descendants with different morphological forms and hence altered body plans. For example, in Arabidopsis the floral meristem is generated directly from the peripheral zone of the inflorescence meristem. In grasses, however, a series of meristem cell fate transitions follows the development of the inflorescence meristem, and the floral meristem originates mainly from the peripheral zone of the branch meristem rather than the inflorescence meristem (Bommert et al., 2005). Because of such developmental divergence, transcriptional map data from Arabidopsis or Medicago may not be applicable for elucidating the developmental and transcriptional characteristics of grasses.
Rice is one of the most important food crops worldwide and also a model for genomic research in cereals (Zhang, 2007; Zhang et al., 2008). Genomic resources for rice have developed rapidly in recent years, including a high-quality genome sequence (International Rice Genome Sequencing Project, 2005), large-scale expressed sequence tags (ESTs), full-length cDNA collections (Kikuchi et al., 2003; Xie et al., 2005; Lu et al., 2008) and mutant libraries (Jeon et al., 2000; Wu et al., 2003; Sallaud et al., 2004; Krishnan et al., 2009). In particular, large amounts of whole-genome expression data have been generated in recent years using several whole-genome oligonucleotide microarray platforms, including the BGI/Yale 60K (Ma et al., 2005), NSF 45K (Jung et al., 2008), Agilent 44K (Shimono et al., 2007) and Affymetrix 57K (Li et al., 2007) arrays. Currently, datasets of 793 arrays from 47 experiments have been deposited in public databases (http://www.ncbi.nlm.nih.gov, http://www.ricearray.org, Table S1). Massively parallel signature sequencing has also been used for expression profiling in rice (Nobuta et al., 2007). However, most of these studies were designed to identify genes differentially expressed between controls and treatments under particular experimental conditions, such as biotic stress, abiotic stress or light, or to compare expression patterns between wild types and mutants of particular genes. Only in a few cases were the datasets designed to study the transcriptomes of a limited number of organs and cell types (Ma et al., 2005; Jain et al., 2007; Li et al., 2007; Jiao et al., 2009; Xue et al., 2009). The available datasets are therefore still insufficient to establish the expression patterns of suites of genes during developmental processes.
In this study, 39 tissues/organs were sampled throughout the life cycle of the rice plant from two indica varieties. Genome-wide expression data were generated by hybridizing RNA from the 39 tissues to the Affymetrix GeneChip Rice Genome Array (http://www.affymetrix.com). The objective was to develop a genomic resource describing the genome-wide dynamic transcriptome of the rice plant. The data can be used in various ways in genomic research, for example to understand the dynamics of global expression and thereby elucidate the mechanisms of developmental processes, to characterize the expression profiles of individual genes in order to identify those involved in specific processes, and to identify the organs in which particular genes may function.
Generation and quality assessment of the data
The Affymetrix GeneChip Rice Genome Array (http://www.affymetrix.com/products/arrays/specific/rice.affx) was used for the expression study. The array was designed mainly on the basis of the TIGR (The Institute for Genomic Research) annotation version 2.0 and contains 57 381 probe sets in total, of which 1347 represent 1260 transcripts of an indica variety and 54 168 represent 48 564 transcripts of a japonica variety.
A total of 31 tissues (organs) covering the entire life cycle of the rice plant were collected from two indica varieties, Minghui 63 and Zhenshan 97, for RNA extraction and gene expression analysis (Figure 1, Table S2). In addition, eight samples spanning the tissue culture process of genetic transformation were included, giving a total of 39 tissues. Two independent biological replicates were sampled for most tissues; the exceptions were two seedling tissues and three panicle tissues, for which three independent biological replicates, each with two technical replicates, were sampled. This resulted in a dataset of 190 microarrays.
The original data, in the form of scanned images stored in .cel files, were transformed into expression values, which were normalized and deposited in the CREP database (Collections of Rice Expression Profiling, http://crep.ncpgr.cn). The database can be queried for single or multiple genes using gene sequences, gene names or probe set identifiers, and returns items such as quantified expression values, co-expressed genes, and the tissues and varieties in which the genes are expressed.
The data quality was assessed using several measures (Figure S1). First, the average of the 95 correlation coefficients between biological replicates was 0.9658 ± 0.0031, and 91 (96%) of these coefficients exceeded 0.9 (Figure S1a). The average of the 30 correlation coefficients between technical replicates of the five tissues with three biological replicates was 0.9973 ± 0.0002 (Figure S1b). These results indicate that the data are of high quality, although the correlation coefficients for samples collected from field-grown plants were lower than those for tissues prepared in the laboratory. Correlation coefficients for three samples collected at the transition from the vegetative to the reproductive phase were below 0.9 (0.82 for Leaf 1 from Minghui 63; 0.88 for Leaf 1 and 0.87 for Sheath 1 from Zhenshan 97), suggesting that this transition is also reflected in the expression profiles.
Second, expression profiles for a number of gene families had been validated using RT-PCR or quantitative real-time PCR, which produced highly consistent results. Examples included the thioredoxin family (Nuruzzaman et al., 2008), PEX11 family (Nayidu et al., 2008), KT/HAK/KUP potassium transporter family (Gupta et al., 2008), tubby-like family (Kou et al., 2009), calcium-dependent protein kinase gene family (Ye et al., 2009), C3HC4-type RING finger gene family (Ma et al., 2009) and the ankyrin repeat gene family (Huang et al., 2009).
Third, we compared expression patterns of many genes with previous reports, with a few examples given in Figure S1(c–f). For instance, D54O, encoding a photosystem II 10-kDa polypeptide with green tissue-specific expression (Cai et al., 2007), was shown to be expressed in green tissues in our data (Figure S1d).
Taken together, these analyses indicated that the microarray data obtained in this study were of high quality.
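The replicate-based quality check can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes each replicate is a vector of log2-transformed signals over the same probe sets, and because the text does not say whether the ± values are standard deviations or standard errors, the sketch reports the standard error as an assumption. The helper names `pearson` and `summarize_replicates` are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signal vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def summarize_replicates(pairs, threshold=0.9):
    """Summarize replicate agreement.

    pairs: list of (replicate_a, replicate_b) log2 signal vectors (hypothetical input).
    Returns (mean r, standard error of r, fraction of pairs with r > threshold).
    """
    rs = [pearson(a, b) for a, b in pairs]
    mean_r = statistics.fmean(rs)
    # Assumption: the reported "±" is treated here as a standard error.
    se_r = statistics.stdev(rs) / math.sqrt(len(rs)) if len(rs) > 1 else 0.0
    frac_above = sum(r > threshold for r in rs) / len(rs)
    return mean_r, se_r, frac_above


# Toy example with two hypothetical replicate pairs.
rep_pairs = [
    ([5.1, 7.8, 9.2, 3.3], [5.0, 7.9, 9.0, 3.5]),
    ([6.0, 6.5, 8.1, 4.2], [6.4, 6.1, 8.0, 4.9]),
]
print(summarize_replicates(rep_pairs))
```

Applied to all biological replicate pairs, the first and third values correspond to the kind of summary reported above (mean r and the share of pairs above 0.9).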
Features of the expressed genes
In total, 38 793 probe sets were detected by RNA hybridization, of which 2233 were detected only in Zhenshan 97 and 2157 only in Minghui 63 (Table S3). Based on the combined results for the two varieties, the number of probe sets detected per tissue ranged from 20 992 in the endosperm 14 days after pollination to 26 295 in the spikelet 3 days after flowering. Fewer probe sets were detected in stamen, endosperm and germinating seed, and more were detected in panicle, spikelet, juvenile seedling and callus (Figure S2). Expression was detected for 8817 probe sets in all organs, whereas only 2376 probe sets were detected as tissue-specific, each of which was detected in only one tissue cluster (see below). The distribution of expression values differed considerably among tissues (Figure S3). Stamen stood apart, having large proportions of detected probe sets skewed toward both higher and lower expression levels compared with other tissues. Endosperm and panicle also had relatively more detected probe sets at the high end.
All probes were mapped to the genome to associate probe sets with annotated genes in the TIGR database, using the Nipponbare genome sequence as the reference. According to the criterion adopted (see Methods), 33 811 unique genes were represented by 40 525 probe sets on the array, and the remaining probe sets did not correspond to annotated gene loci. Based on the annotations of TIGR Rice Genome Pseudomolecules Release 5, 21 829 of the 33 811 genes were supported by ESTs or full-length cDNAs and were referred to as 'known genes', while the remaining 11 982 were predicted genes (Ouyang et al., 2007). In total, 25 093 genes produced detectable signals, including 20 795 (95.3%) of the 21 829 known genes and 4298 (35.9%) of the 11 982 predicted genes. The predicted genes thus had a much lower detection rate than the known genes in the microarray analysis, suggesting that low (or no) expression mainly accounts for the lack of EST support for the predicted genes (Figure S4). A similar tendency was observed using tissues from the japonica variety Nipponbare (Satoh et al., 2007).
Gene ontology (GO) categories were used to evaluate the average expression levels of genes encoding specific classes of proteins (Figure S5). Genes encoding transcription factors and DNA replication and cell cycle proteins were expressed at about the average level of all genes. Genes related to photosynthesis and carbohydrate catabolism were expressed at higher levels, whereas those associated with the cell wall were expressed well below the average.
Relatedness of organs revealed by the whole genome expression patterns
We used the entire gene expression profiles of all organs (tissues) to examine their relatedness. Six main groups comprising 10 clusters of organs (tissues) were identified by transcriptome similarity (Figure 2). Stamen (cluster 1) formed a group by itself, with a unique expression profile. The second group consisted of three clusters of organs, all of which are mostly green tissues: leaf and sheath (cluster 2), seedlings (cluster 3) and flower (cluster 4), which included palea/lemma, spikelet and mature panicle (panicle 5). Young panicles (cluster 5) of four developmental stages formed the third group. The fourth group contained two plumule tissues (cluster 6) and two stem tissues (cluster 7). The three endosperm samples (cluster 8) formed the fifth group. Roots (cluster 9), callus and germinating seed (cluster 10) were tightly clustered in the last group. The expression profiles at the whole-genome level thus matched the developmental relatedness of the organs. A similar result was obtained in an analysis of relatedness among cell types in rice (Jiao et al., 2009).
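Grouping tissues by transcriptome similarity can be illustrated with agglomerative clustering. The clustering method and distance behind Figure 2 are not stated in this section; the sketch below assumes average linkage on 1 - Pearson correlation, a common choice for expression data, and uses small hypothetical tissue profiles. The function name `average_linkage` is illustrative.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of tissues with average linkage on 1 - Pearson r.

    profiles: dict mapping tissue name -> expression vector over the same genes.
    Repeatedly merges the two closest clusters until n_clusters remain.
    """
    names = list(profiles)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[a, b] = dist[b, a] = 1 - pearson(profiles[a], profiles[b])

    clusters = [[n] for n in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                avg = statistics.fmean([dist[a, b] for a in clusters[i] for b in clusters[j]])
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical four-gene profiles for four tissues.
toy = {
    "leaf": [1.0, 2.0, 3.0, 4.0],
    "sheath": [1.1, 2.0, 3.2, 3.9],
    "root": [4.0, 3.0, 2.0, 1.0],
    "callus": [3.9, 3.1, 2.0, 1.2],
}
print(average_linkage(toy, 2))
```

Recording the merge order instead of stopping at a fixed number of clusters would give the full dendrogram from which groups such as those in Figure 2 are read off.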
We also investigated whether genes belonging to specific GO categories were differentially expressed in certain types of organs. As shown in Figure 3, the GO categories with elevated expression corresponded to the functions of the organs. For example, the average expression level of all genes involved in male gamete generation (GO:0048232) was highest in stamen, where the expression of genes in most other GO categories was reduced compared with other tissues. Similarly, genes related to the chlorophyll catabolic process (GO:0015996) were most highly expressed in leaves but expressed at low levels in roots and young panicles, and genes functioning in sex determination (GO:0007530) were most highly expressed in the panicle.
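The per-tissue comparison of GO categories amounts to averaging the expression of all genes annotated with a term in each tissue and asking where that average peaks. The sketch below assumes a gene-to-tissue expression table and a gene-to-GO mapping (both hypothetical inputs) and an unweighted arithmetic mean; the exact summary statistic behind Figure 3 is not specified here.

```python
import statistics


def category_means(expr, go_terms, term):
    """Mean expression per tissue of all genes annotated with a GO term.

    expr: gene -> {tissue: expression value}; every gene is assumed to have
    values for the same tissues. go_terms: gene -> set of GO IDs.
    """
    genes = [g for g, terms in go_terms.items() if term in terms and g in expr]
    if not genes:
        return {}
    tissues = sorted(expr[genes[0]])
    return {t: statistics.fmean([expr[g][t] for g in genes]) for t in tissues}


# Hypothetical toy data.
expr = {
    "g1": {"stamen": 10.0, "leaf": 2.0},
    "g2": {"stamen": 8.0, "leaf": 4.0},
    "g3": {"stamen": 1.0, "leaf": 9.0},
}
go_terms = {"g1": {"GO:0048232"}, "g2": {"GO:0048232"}, "g3": {"GO:0015996"}}
means = category_means(expr, go_terms, "GO:0048232")
print(means, max(means, key=means.get))
```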
We further examined genes associated with biological processes in each tissue by GO enrichment analysis (Jung et al., 2008; Lin et al., 2008). Genes associated with the term spermatogenesis (GO:0007283) were significantly enriched in the stamen and panicle 5, and genes related to flower development (GO:0009908) and reproductive structure development (GO:0048808) were enriched in the palea/lemma and spikelet (Figure S6, Table S4). Genes associated with embryonic development (GO:0009790) were enriched in the endosperm, suggesting possible signaling between embryo and endosperm. Genes in the transcription category (GO:0006350) were enriched in the developing panicle, suggesting a role for transcriptional regulation in panicle development. These results indicate that the functions of the expressed genes, assessed by either gene number or transcript abundance, correspond to the biological processes characteristic of each organ.
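The exact test used by the cited GO enrichment tools is not described here. A common formulation, shown below as an assumption, is the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the 2x2 table of (in tested set, annotated with the term). The counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric P value, P(X >= k).

    N: background genes (e.g. all expressed genes with GO annotation)
    K: background genes annotated with the GO term
    n: genes in the tested set (e.g. genes expressed in one tissue)
    k: tested genes annotated with the term
    This equals a one-sided Fisher's exact test on the corresponding 2x2 table.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 40 of 300 tissue genes carry a term held by 500 of 20 000 genes.
print(enrichment_p(40, 300, 500, 20000))
```

About 7.5 annotated genes would be expected by chance in this example, so observing 40 yields a very small P value; in practice, P values across many terms would also be corrected for multiple testing.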
The dynamic patterns of gene expression
We investigated the dynamic expression patterns of genes during developmental processes by identifying genes differentially expressed among tissues using F-tests with 1000 permutations. Overall, 69% (26 750) of the detected probe sets showed significantly variable expression among tissues (Benjamini-Hochberg adjusted P < 0.001 by ANOVA and P < 0.01 by permutation test), including large variation across developmental stages within a specific organ system. We describe the genes involved in panicle development in detail as an example.
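The per-probe-set test can be sketched as a one-way ANOVA F statistic whose null distribution is built by shuffling tissue labels, followed by Benjamini-Hochberg adjustment across probe sets. This is a sketch under stated assumptions, not the authors' code: the add-one permutation P value, the fixed random seed and the toy replicate values are choices made here for illustration.

```python
import random
import statistics


def f_statistic(groups):
    """One-way ANOVA F statistic; groups is a list of replicate lists (one per tissue)."""
    values = [v for g in groups for v in g]
    grand = statistics.fmean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p(groups, n_perm=1000, seed=0):
    """Permutation P value: shuffle tissue labels, recompute F, count F >= observed."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm, start = [], 0
        for s in sizes:
            perm.append(pooled[start:start + s])
            start += s
        hits += f_statistic(perm) >= observed
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical probe sets, each with replicates in three tissues.
probe_sets = [
    [[7.1, 7.3, 7.2], [9.0, 9.4, 9.1], [5.2, 5.0, 5.1]],
    [[6.0, 6.4, 6.1], [6.2, 5.9, 6.3], [6.1, 6.0, 6.2]],
]
raw = [permutation_p(g) for g in probe_sets]
print(raw, benjamini_hochberg(raw))
```

With only a few replicates per tissue, the number of distinct label permutations is limited, which bounds how small a permutation P value can become; this is one reason to combine it with a parametric ANOVA P value, as the text does.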
During young panicle development, the primary and secondary branches are determined at early stages and the floral organs and male and female gametes are formed at late stages, so that meristem activity decreases gradually from early to late stages. The analysis revealed 2667 probe sets differentially expressed across the four consecutive stages of young panicle development (panicle 1 to panicle 4) (Figure 4a). Strikingly, two groups of genes exhibited opposite expression trends from early to late stages (Figure 4b): the expression of 215 probe sets (first group) decreased gradually (Table S5), whereas that of 794 probe sets (second group) increased (Table S6). The first group included many transcription factors related to branch rather than flower development, such as RFL and LAX, which are known to regulate meristem activity and thereby determine rice inflorescence architecture (Komatsu et al., 2003; Rao et al., 2008). OSH1, a marker gene for meristem cells in rice (Sentoku et al., 1999), was also in this group. Some target genes of miR156 have been reported to be essential for rice inflorescence development (Xie et al., 2006), and the expression of three SPL family transcription factors that are targets of miR156 decreased gradually from early to late stages. Because these three genes also showed panicle-specific expression, they may be crucial for panicle development. The phytohormone auxin plays very important roles in determining inflorescence architecture in maize and Arabidopsis (McSteen et al., 2007; Gallavotti et al., 2008a,b; Skirpan et al., 2008), but its role in rice inflorescence development is not yet clear.
In this dataset, at least 11 genes related to auxin transport or response pathways were identified whose expression decreased gradually from early to late stages, suggesting that they function mainly at early rather than late stages of panicle development. These genes are good candidates for investigating the roles of auxin in rice inflorescence development, especially primary and secondary branch development.
Among the 794 probe sets in the second group (Figure 4b, Table S6), 10 MADS-box family transcription factors, known to have crucial roles in floral organ development, were identified. Ugp2, a gene involved in pollen development in rice (Chen et al., 2007), was also in this group.
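This section does not state the rule used to assign probe sets to the decreasing and increasing groups. A simple assumption, sketched below, is to require the stage means to change monotonically across panicle 1 to panicle 4, optionally by more than a minimum step. Both `stage_trend` and its `min_step` parameter are hypothetical.

```python
import statistics


def stage_trend(stage_reps, min_step=0.0):
    """Classify a probe set by its mean expression across consecutive stages.

    stage_reps: replicate values per stage, ordered panicle 1 -> panicle 4.
    min_step: hypothetical minimum change required between adjacent stages.
    Returns "decreasing", "increasing" or "other".
    """
    means = [statistics.fmean(r) for r in stage_reps]
    steps = [b - a for a, b in zip(means, means[1:])]
    if all(s < -min_step for s in steps):
        return "decreasing"
    if all(s > min_step for s in steps):
        return "increasing"
    return "other"


# Hypothetical log2 signals, two replicates per stage.
print(stage_trend([[8.0, 8.2], [7.0, 7.1], [6.0, 6.0], [5.0, 5.2]]))
```

In practice this rule would be applied only to probe sets already called differentially expressed across the four stages.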
These two groups of genes with opposite expression trends provide important information about gene function during this process. We performed GO enrichment analysis (Jung et al., 2008; Lin et al., 2008) of the two groups to gain insight into the biological processes involved (Tables S7 and S8). Genes involved in transcription were enriched in both groups. In particular, transcription factors were highly significantly enriched in both groups (χ2 = 73.4, P < 2.2e–16 in the first group; χ2 = 17.6, P = 2.7e–05 in the second group). Genes in the category of response to hormone stimulus were also enriched in the first group, in agreement with the observation that many auxin-related genes were in this group. Surprisingly, photosynthesis genes showed the highest enrichment in the second group. Similar light-independent induction of photosynthesis genes has been observed in maize and Arabidopsis during juvenile leaf development before chloroplast formation, as well as during embryogenesis (Spencer et al., 2007; Strable et al., 2008). These results suggest that photosynthesis genes are regulated not only by light but also by developmental cues.
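The transcription-factor enrichment is reported as a χ2 statistic with a P value. The sketch below assumes a 2x2 Pearson chi-square test with 1 degree of freedom and no continuity correction; the text does not specify the table layout or whether a correction was applied. With 1 df, the upper-tail P value has the closed form erfc(sqrt(χ2/2)), which the last line uses to check the reported value for the second group.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for [[a, b], [c, d]].

    Example layout (an assumption): rows = genes in the trend group / other
    genes; columns = transcription factors / other genes.
    Returns (chi2, P).
    """
    n = a + b + c + d
    chi2 = (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Converting the reported chi2 = 17.6 (second group) to a P value, roughly 2.7e-05.
print(math.erfc(math.sqrt(17.6 / 2)))
```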
Identification of tissue-specific expressed genes
We used Shannon entropy to evaluate the tissue specificity of the expressed genes (Schug et al., 2005; Kadota et al., 2006). A total of 2376 probe sets were identified as tissue-specific in one or both of the varieties Zhenshan 97 and Minghui 63 (Figure 5, Table S9). The floral organ cluster, including stamen, palea/lemma and panicle 5, had the largest numbers of tissue-specific genes; endosperm, root and callus also had relatively large numbers, while other vegetative tissues and developing panicles had smaller numbers.
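Entropy-based tissue specificity can be sketched as follows. Each gene's expression is treated as a distribution over tissues, p_t = x_t / sum(x), and H = -sum(p_t log2 p_t) is low for tissue-restricted genes and maximal, log2(T), for uniform expression across T tissues. The categorical statistic Q_t = H - log2(p_t) from Schug et al. (2005) is included as well. Which signal transformation and which thresholds defined tissue specificity in this study are not given in this section, so the sketch omits thresholds.

```python
import math


def tissue_entropy(values):
    """Shannon entropy H (bits) of one gene's expression across tissues.

    p_t = x_t / sum(x); H = -sum(p_t * log2 p_t).
    H = 0 for expression confined to one tissue; H = log2(T) for uniform
    expression across T tissues.
    """
    total = sum(values)
    return -sum((v / total) * math.log2(v / total) for v in values if v > 0)


def categorical_q(values):
    """Q_t = H - log2(p_t) per tissue (Schug et al., 2005); a low Q marks the
    tissue a gene is specific to. Tissues with zero expression get Q = inf."""
    total = sum(values)
    h = tissue_entropy(values)
    return [h - math.log2(v / total) if v > 0 else math.inf for v in values]


print(tissue_entropy([5, 5, 5, 5]))            # 2.0 (uniform across 4 tissues)
print(tissue_entropy([0.1, 0.2, 40.0, 0.3]))   # close to 0 (mostly one tissue)
```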
For 1878 probe sets, the difference in Shannon entropy between Zhenshan 97 and Minghui 63 was < 0.5; these genes were considered tissue-specific in both varieties (common tissue-specific genes). For 132 probe sets, the entropy values differed by 2 or more between the two varieties, and these were considered variety-specific and tissue-specific. Ninety (68.2%) of the 132 variety-specific, tissue-specific probe sets could be associated with 86 TIGR5-annotated genes, a lower proportion than for the 1878 common tissue-specific probe sets, of which 1603 (85.4%) corresponded to 1444 annotated genes. Furthermore, 19 251 (34.2%) of the TIGR5-annotated genes are annotated as hypothetical protein, expressed protein or conserved hypothetical protein and are thus regarded as genes of unknown function. Interestingly, 22.0% (317) of the 1444 common tissue-specific genes were annotated as genes of unknown function, compared with 37.2% (32) of the 86 variety-specific, tissue-specific genes.
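The variety comparison above reduces to a threshold rule on the absolute entropy difference. The sketch below encodes the two stated thresholds; the label "intermediate" for probe sets that fall between them is hypothetical, because the text does not name that class.

```python
def classify_by_entropy(h_zs97, h_mh63, common_max=0.5, variety_min=2.0):
    """Label a tissue-specific probe set by the entropy difference between varieties.

    Thresholds follow the text: difference < 0.5 -> common; >= 2 -> variety-specific.
    Probe sets in between are returned as "intermediate" (a hypothetical label).
    """
    diff = abs(h_zs97 - h_mh63)
    if diff < common_max:
        return "common"
    if diff >= variety_min:
        return "variety-specific"
    return "intermediate"


print(classify_by_entropy(0.3, 0.5), classify_by_entropy(0.2, 2.5))
```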
To examine whether certain gene families were overrepresented among the tissue-specific genes, we analyzed the distribution of 3865 paralogous protein families based on the TIGR5 annotation (Lin et al., 2008). Fisher's exact test indicated that 65 protein families contained significantly more tissue-specific genes than expected (P < 0.01; Figure S7, Table S10). For 14 of these families, all members were specifically expressed in certain tissues, especially in root and reproductive organs (seed, stamen and young panicle).
Identification of constitutively expressed genes (CEGs)
In our dataset, 8817 (22.7%) probe sets, corresponding to 7276 of the 25 093 expressed genes in the TIGR5 annotation, were detected with a 'present' call in all 190 arrays and were thus regarded as CEGs; the remaining 17 817 genes were not detected in one or more arrays (non-CEGs).
Although the CEGs had detectable transcripts in all samples, their expression levels varied drastically among tissues. In the extreme cases, five CEGs showed more than 500-fold differences in transcript abundance among tissues. For example, LOC_Os06g49840, encoding MADS-box transcription factor 16, was most highly expressed in stamen, where its transcript level was 615-fold higher than its lowest level, in callus. To assess statistically the variation in CEG expression among tissues, we calculated the mean, standard deviation (SD) and coefficient of variation (CV) of each CEG (Figure 6a), which revealed a wide range of variation.
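The CEG definition and the variation statistics can be sketched as below. This assumes detection calls coded as "P" (present) per array, in the style of Affymetrix MAS5 present/marginal/absent calls, and uses the sample SD; the text does not say which detection algorithm was used or whether the statistics were computed on individual arrays or on tissue means. The function names are hypothetical.

```python
import statistics


def is_ceg(detection_calls):
    """Constitutively expressed: a 'present' call ("P") on every array."""
    return all(call == "P" for call in detection_calls)


def variation_stats(signals):
    """Mean, sample SD, CV (%) and max/min fold range of one gene's signals."""
    mean = statistics.fmean(signals)
    sd = statistics.stdev(signals)
    return {
        "mean": mean,
        "sd": sd,
        "cv_percent": 100 * sd / mean,
        "fold_range": max(signals) / min(signals),
    }


print(is_ceg(["P", "P", "P"]), variation_stats([9.0, 11.0]))
```

Ranking CEGs by `cv_percent` gives the kind of stability list described in the next paragraph, and `fold_range` expresses contrasts such as the 615-fold difference noted above.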
Several CEGs, including actin, ubiquitin, tubulin and glyceraldehyde-3-phosphate dehydrogenase (GAPDH), are frequently used as references for signal normalization in quantitative expression studies. However, the stability of their expression throughout the entire life cycle has seldom been assessed, and the present dataset provides a rare opportunity to do so. We identified the 100 most stably expressed genes according to their CV values; 79 of them had CVs < 20% (Figure 6a, Table S11). We also used a second algorithm, geNORM (Vandesompele et al., 2002), to validate the top 100 stably expressed genes (Table S11). The results of the two methods agreed well (r = 0.89, P < 2.2e–16).
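The core geNORM stability measure can be sketched as follows. For each pair of candidate genes j and k, V_jk is the standard deviation across samples of log2(a_j / a_k), and M_j is the mean of V_jk over all other candidates; a lower M indicates more stable expression. The full geNORM procedure also iteratively drops the least stable gene and recomputes M, which this single-pass sketch omits. The inputs are hypothetical relative expression values on a linear scale.

```python
import math
import statistics


def genorm_m(expr):
    """geNORM-style stability measure M for each candidate reference gene.

    expr: gene -> relative expression values (linear scale) over the same samples.
    For genes j and k, V_jk is the SD across samples of log2(a_j / a_k);
    M_j is the mean of V_jk over all other genes k. Lower M = more stable.
    """
    m = {}
    for j in expr:
        pairwise = []
        for k in expr:
            if k == j:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise.append(statistics.stdev(log_ratios))
        m[j] = statistics.fmean(pairwise)
    return m


# Hypothetical candidates: A and B vary in lockstep; C fluctuates independently.
candidates = {
    "A": [1.0, 2.0, 4.0, 8.0],
    "B": [2.0, 4.0, 8.0, 16.0],
    "C": [5.0, 1.0, 9.0, 2.0],
}
print(sorted(genorm_m(candidates).items(), key=lambda kv: kv[1]))
```

Unlike the CV, M depends on how consistently genes vary relative to one another. This is one reason the two rankings can agree closely without being identical.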
GO enrichment analysis showed that genes involved in protein translation, protein transport, metabolic processes, ubiquitin-related processes and the cell cycle were overrepresented among the top 100 stably expressed genes (Table S12). The most stably expressed gene encodes a glycine-rich RNA-binding protein, with a CV of 9.98% across all organs. Interestingly, a gene encoding the transcription factor squamosa-promoter binding protein 12 (SPL12) also appeared in this list. Surprisingly, however, only two of the genes frequently used as references (GAPDH and actin) appeared in the top 100.
To assess the usefulness of these genes as internal references, we compared their stability and absolute expression levels with those of five genes that have frequently been used as internal controls in expression analyses (Kim et al., 2003; Liu et al., 2007; Park et al., 2008; Wang et al., 2009): actin-1 (LOC_Os03g50890), ubiquitin fusion protein (LOC_Os03g13170), elongation factor 1-alpha (LOC_Os03g08020), tubulin beta-6 chain (LOC_Os01g59150) and GAPDH (LOC_Os02g38920). The expression of all five genes fluctuated drastically over the course of development (Figure 6a,b). In contrast, 19 of the top 100 stably expressed genes had average transcript levels higher than that of actin in all tissues, with CVs lower than those of all five genes (Table 1, Figure 6a,c).
| Probe set | Gene ID | Mean | CV (%) | Annotation |
|---|---|---|---|---|
| Os.28425.1.S1_x_at | LOC_Os12g43600 | 12533 | 9.98 | Glycine-rich RNA-binding protein |
| Os.46231.1.S1_a_at | LOC_Os03g46770 | 10606 | 15.35 | Glycine-rich RNA-binding protein |
| Os.4157.1.S1_at | LOC_Os02g02890 | 10138 | 15.62 | Peptidyl-prolyl cis-trans isomerase |
| Os.7897.1.S1_at | LOC_Os08g27850 | 7446 | 15.80 | Endothelial differentiation factor |
| Os.7945.1.S1_at | LOC_Os07g34589 | 12145 | 16.36 | Protein translation factor SUI1 |
| Os.4705.1.S1_at | LOC_Os02g32030 | 5535 | 17.45 | Protein elongation factor |
| Os.318.1.S1_at | LOC_Os03g55150 | 5760 | 19.16 | Translation initiation factor |
| Os.8152.1.S1_at | LOC_Os05g49890 | 6154 | 19.51 | GTP-binding nuclear protein |
| Os.4746.1.S1_a_at | LOC_Os02g48660 | 6715 | 20.97 | 60S ribosomal protein L31 |
| *Os.12772.1.S1_at | LOC_Os03g13170 | 4365 | 25.00 | Ubiquitin fusion protein |
| *Os.7916.1.S1_at | LOC_Os01g59150 | 2532 | 76.93 | Tubulin beta-6 chain |

*Genes frequently used as internal controls (see text).
We also compared these 19 genes with the 25 genes recommended as novel internal control genes for rice based on microarray data from 15 tissues (Jain, 2009). Only three of the genes identified here were also in that list. For further comparison, we obtained 25 public datasets comprising 365 arrays from both indica and japonica varieties (http://www.plexdb.org). All 19 genes were detected as expressed in all of these datasets regardless of genetic background, confirming their constitutive expression. Overall, these 19 genes had more uniform expression than the 25 genes identified by Jain (2009), although some genes in that list also showed highly stable expression (Figure S8, Table S13).
Analyses of this large dataset have revealed a wealth of features of global gene expression with fundamental implications for understanding the biological processes underlying growth and development, too many to be examined in a single paper. Here we discuss only a few points, leaving most of the information for future study.
Implications for annotation of the rice genome
The results of genome annotation are still controversial and the estimated gene numbers are inconsistent, despite the large efforts invested over the last several years. Reported gene numbers include 38 000 to approximately 40 000 (Yu et al., 2005), 37 544 protein-coding genes (International Rice Genome Sequencing Project, 2005) and approximately 32 000 genes (The Rice Annotation Project, 2007). The largest number, 56 278, was reported in the TIGR annotation (Release 5) (Ouyang et al., 2007), including 24 435 known genes supported by ESTs and cDNAs and 31 843 predicted genes. However, the annotation and expression analyses published thus far have mostly been based on japonica genotypes, especially the variety Nipponbare, whereas substantial sequence diversity exists between the two subspecies and sometimes even within a subspecies. Indeed, a large difference was detected even between the two indica varieties used in this study: a 38-kb region carrying the Ghd7 locus is deleted in Zhenshan 97 relative to Minghui 63 (Xue et al., 2008). Expression data from more genotypes, especially indica varieties, may therefore be particularly helpful for annotation. In this dataset, 35.87% of the TIGR-predicted genes represented on the Affymetrix array had expression signals. By extrapolation, expression analysis of these two indica genotypes could potentially recover expression evidence for more than 11 000 of the 31 843 predicted genes, which may increase the number of 'known genes' to more than 35 000.
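The extrapolation is simple arithmetic. The snippet reproduces it from the counts given in the text, under the same assumption the text makes: that the detection rate observed for predicted genes on the array applies to all TIGR5 predicted genes.

```python
# Counts taken from the text.
predicted_on_array = 11982      # TIGR5 predicted genes represented on the array
predicted_detected = 4298       # of those, detected in this dataset
predicted_total = 31843         # all TIGR5 predicted genes
known_total = 24435             # TIGR5 known genes (EST/cDNA support)

detection_rate = predicted_detected / predicted_on_array   # about 0.3587
recovered = detection_rate * predicted_total               # about 11 422
known_after = known_total + recovered                      # about 35 857

print(f"{detection_rate:.4f} {recovered:.0f} {known_after:.0f}")  # prints 0.3587 11422 35857
```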
Implications for identifying genes functioning in inflorescence development
As a major determinant of yield and an important component of whole-plant architecture, inflorescence development has attracted much research attention over the past decades. The inflorescence architecture of rice is largely determined by the branches of the panicle, including primary, secondary and even tertiary branches. Several genes controlling inflorescence branching in rice have been isolated in recent years (Komatsu et al., 2003; Rao et al., 2008; Xue et al., 2008). We identified a total of 2667 probe sets differentially expressed across four consecutive stages of young panicle development, including two large groups of genes with opposite expression trends from early to late stages. Genes related to transcriptional regulation, especially transcription factors, were enriched in both groups. This is similar to the finding of Furutani et al. (2006) that many transcription factors begin to be expressed at a very early stage of rice panicle development. A similar result was obtained in Arabidopsis, where a large portion of the genes preferentially expressed in the young inflorescence encode transcription factors (Zhang et al., 2005b). These findings suggest that transcription factors play crucial roles in regulating plant inflorescence architecture, consistent with the observation that most of the cloned genes regulating rice panicle morphology encode transcription factors (Komatsu et al., 2003; Rao et al., 2008; Xue et al., 2008).
Some known transcription factors regulating branching rather than flower development, including LAX, RFL and three SPL family transcription factors, were found in the first group, whose expression decreased gradually during development. In contrast, many genes related to flower and gamete development, including ten MADS-box transcription factors, were found in the second group, whose expression increased from early to late stages. These two groups of genes may have fundamental implications for studying rice inflorescence development and provide an index for identifying novel genes related to rice inflorescence architecture and flower development. In particular, genes showing both decreasing expression from early to late stages and tissue-specific expression are good candidates for investigating the regulatory network of branch development.
Stamen has a unique transcriptome
Our data showed that the stamen had a transcriptional profile distinct from those of all other tissues. Although the total number of genes expressed in the stamen was relatively small compared with other tissues, it had the largest number of tissue-specifically expressed genes, including ones related to male gamete generation. The stamen also had the largest proportions of genes with expression levels at both the high and the low ends, indicating that many expressed genes were specifically down-regulated or silenced while many others were specifically up-regulated. Similar results were reported in Arabidopsis, in which pollen had a unique expression pattern compared with other vegetative and reproductive organs (Becker et al., 2003). Relatively large proportions of the genes expressed in Arabidopsis pollen also fell at the low or high ends of the expression range compared with other tissues, although pollen had the smallest number of expressed genes (Schmid et al., 2005). Consequently, pollen was a major source of variation in expression levels for many genes (Czechowski et al., 2005).
Thus, the overall transcriptional characteristics of pollen (stamen) were quite similar in rice and Arabidopsis, suggesting conserved developmental mechanisms in the reproductive organs. Such a distinctive transcriptional profile of the stamen (pollen) points to fundamental differences between the developmental regimes of the gametophyte and the sporophyte.
Utility of the tissue-specific and constitutively expressed genes
Quantitative RT-PCR has been widely used to quantify gene expression, and data normalization is indispensable for comparing measurements across samples. Such analyses require reference genes that are highly and stably expressed in all tissues and under different environmental conditions; otherwise the results may be ambiguous or contradictory.
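To make the role of reference genes concrete, the sketch below shows relative quantification in which a target gene's Ct value is normalized by the geometric mean of several reference genes, the normalization-factor approach of Vandesompele et al. (2002). It is an illustration only: it assumes 100% amplification efficiency (a doubling per cycle) and uses hypothetical Ct values, and the function names are not taken from any specific package.

```python
import math
from typing import Sequence


def relative_quantity(ct: float, ct_calibrator: float, efficiency: float = 2.0) -> float:
    """Template amount relative to a calibrator sample.

    Assumes each cycle multiplies the product by `efficiency` (2.0 = 100% efficiency),
    so a sample crossing the threshold n cycles earlier holds efficiency**n more template.
    """
    return efficiency ** (ct_calibrator - ct)


def normalization_factor(reference_quantities: Sequence[float]) -> float:
    """Geometric mean of the reference genes' relative quantities in one sample."""
    logs = [math.log(q) for q in reference_quantities]
    return math.exp(sum(logs) / len(logs))


def normalized_expression(target_ct: float, target_ct_calibrator: float,
                          ref_cts: Sequence[float], ref_cts_calibrator: Sequence[float],
                          efficiency: float = 2.0) -> float:
    """Target quantity divided by the normalization factor built from the reference genes."""
    target_q = relative_quantity(target_ct, target_ct_calibrator, efficiency)
    ref_q = [relative_quantity(c, c0, efficiency) for c, c0 in zip(ref_cts, ref_cts_calibrator)]
    return target_q / normalization_factor(ref_q)


# Hypothetical Ct values: the target crosses the threshold 2 cycles earlier than in the
# calibrator (4-fold raw difference); the two reference genes shift by 0 and 1 cycles.
print(normalized_expression(22.0, 24.0, [18.0, 20.0], [18.0, 21.0]))
```

With these made-up values the raw 4-fold difference shrinks to roughly 2.83, because one reference gene itself shifted by one cycle. If a reference gene fluctuates among tissues, the normalization factor shifts with it, which is why stably expressed references matter.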
Several genes, such as Actin, Ubiquitin, Elongation factor 1-alpha, Tubulin and GAPDH, have been widely used as references to normalize the results of quantitative expression studies (Kim et al., 2003; Liu et al., 2007; Park et al., 2008; Wang et al., 2009). They were chosen on the assumption that their housekeeping roles in basic cellular processes imply constant expression. However, we showed that the expression levels of these five genes fluctuated drastically among tissues (Figure 6b, Table 1), which calls into question their suitability as references in expression analyses.
This dataset provided a rare opportunity to identify highly and stably expressed genes that may serve as references for gene expression analysis in rice. A total of 19 genes were identified as candidate novel reference genes because their expression was higher and more stable than that of the widely used reference genes. Only three of these 19 genes were among the 25 genes recommended as references in another study (Jain, 2009). Validation using 25 external datasets comprising 365 microarrays, covering both indica and japonica varieties, showed that, overall, the expression of the 19 genes identified in this study was more stable than that of those 25 genes. This is probably because the genes in this study were identified from two varieties rather than one, which to a certain extent reduced the effect of genetic background. In addition, the larger number of tissues used in this study may also have contributed to the identification of stably expressed genes. Nonetheless, the results of the two studies may be considered together when selecting candidate internal references for future quantitative expression analyses in rice.
This dataset can also be used to search for promoters for transgenic research with various purposes, such as high and constitutive expression, tissue- or organ-specific expression at various levels, or any defined spatio-temporal expression pattern.
The usefulness of this data set in functional genomic research
Global gene expression profiles covering different tissues, developmental stages and genotypes can advance the understanding of gene functions in biological processes. This dataset thus provides a general resource for functional genomic research in rice.
In particular, Zhenshan 97 and Minghui 63, used here, are the parents of Shanyou 63, the most widely cultivated rice hybrid in China, owing mostly to its high yield and wide adaptability. Many studies have been performed using materials derived from the cross between these two parents, including the identification of hundreds of QTLs controlling a large array of traits and yield heterosis (Yu et al., 1997; Xing et al., 2002; Hua et al., 2003). More recently, genes of agronomic importance have been cloned and molecularly characterized based on information from this cross, including Xa26 for bacterial blight resistance (Sun et al., 2004), GS3 controlling grain size (Fan et al., 2006) and Ghd7 controlling heading date, plant height and panicle size (Xue et al., 2008). Genomic resources such as BAC libraries, ESTs and full-length cDNAs of one or both parents have also been constructed (Xie et al., 2005; Zhang et al., 2005a). The expression profiling data provided here, together with these existing resources, will greatly facilitate the identification of genes and molecular mechanisms contributing to the superior performance of this hybrid.
Moreover, as rice has become a model plant for cereal genomic research and genetic improvement (Xu et al., 2005; Zhang, 2007; Zhang et al., 2008), the expression data presented will become a useful resource for many cereals including a number of the most important food crops such as wheat and maize.
Experimental procedures
Two rice varieties, Zhenshan 97 and Minghui 63, the parents of Shanyou 63, one of the most widely cultivated rice hybrids in China, were used in this study. The 39 tissues or organs used included 31 tissues collected throughout the life cycle of the rice plant, plus eight tissues (calli) spanning the entire tissue-culture process for transformation (Figure 1, Table S2).
Germinating seeds were obtained by soaking dry seeds in water (changed every 12 h) for 72 h at 37°C. The germinated seeds were placed on moist, sterilized filter paper and divided into two groups: one was kept under normal natural light and the other in darkness, both at 30°C for 48 h, after which the plumule and radicle were collected.
Seedlings at 3 days after sowing and at the trefoil stage, and shoots and roots at the two-tiller stage, were collected from plants grown hydroponically as described by Lian et al. (2006). For phytohormone treatments, trefoil-stage seedlings were treated with 100 μM gibberellin (GA3), auxin (NAA) or cytokinin (kinetin); samples collected at 5, 15, 30 and 60 min after treatment were pooled for each hormone.
To collect leaf, leaf sheath, stem, flag leaf, palea, lemma, spikelet, endosperm and panicle tissues, germinated seeds were sown in a seed bed in mid-May, and 20-day-old seedlings were transplanted to the field and managed under normal agricultural conditions at the experimental farm of Huazhong Agricultural University, Wuhan, China. The tissues were harvested at the stages described in Table S2 and Figure 1. Young panicles were collected under a dissection microscope, and their developmental stages were determined from panicle length. The endosperm was collected after excising the embryo.
Calli at eight time points during the tissue culture process of Agrobacterium-mediated transformation were prepared following the reported protocol (Lin and Zhang, 2005).
All samples were harvested between 16:00 and 18:00, immediately frozen in liquid N2 and stored at −70°C until RNA extraction. For each biological replicate, tissues from at least five plants were pooled.
RNA extraction, microarray hybridization and measurement of expression levels
RNA isolation, purification and microarray hybridization were conducted by the CapitalBio Corporation (http://www.capitalbio.com) according to the Affymetrix standard protocols.
To avoid the influence of polymorphisms between the probe sequences on the array and the genomes of the varieties used, we hybridized Affymetrix rice microarrays with genomic DNA of Zhenshan 97 and Minghui 63 following the standard protocols. Single-feature polymorphisms were detected using the protocol of Borevitz (2006). After scanning with an Affymetrix 3000 GeneArray scanner, the probe intensity files (.cel files) were generated using the Microarray Analysis Suite (MAS version 5, Affymetrix). Raw probe intensities from the gDNA .cel files were read into R (http://www.R-project.org). Background correction and quantile normalization were performed using the robust multi-array average (RMA) method (Gautier et al., 2004; Gentleman et al., 2004), and the intensities of the perfect-match (PM) probes were then extracted. Probes with hybridization intensity below 8.4 (the 0.1 quantile of minimum probe intensity) in either Zhenshan 97 or Minghui 63 were masked as low-quality probes, and a customized chip description library (R package) containing only the unmasked probes was generated and used in the analyses.
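The masking rule can be sketched in Python as follows. This is not the authors' R code: it assumes each probe has one normalized log2 PM intensity per variety, reads "minimum probe intensity" as the smaller of a probe's two variety intensities, and uses R's default (type 7) quantile convention. The function names and toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def quantile(values: List[float], q: float) -> float:
    """Linear-interpolation quantile (the same convention as R's default, type 7)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def mask_low_quality_probes(
    intensities: Dict[str, Tuple[float, float]], q: float = 0.1
) -> Tuple[float, Set[str]]:
    """Return the intensity cut-off and the set of probes to mask.

    `intensities` maps a probe ID to its normalized log2 PM intensity on the gDNA
    arrays of (Zhenshan 97, Minghui 63).
    """
    # Per-probe minimum over the two varieties.
    minima = {probe: min(zs97, mh63) for probe, (zs97, mh63) in intensities.items()}
    # Cut-off = q-quantile of those minima (8.4 in the study, with q = 0.1).
    threshold = quantile(list(minima.values()), q)
    # Masking a probe whose signal is low in either variety is the same as
    # masking a probe whose minimum falls below the cut-off.
    masked = {probe for probe, m in minima.items() if m < threshold}
    return threshold, masked


# Toy data: eleven probes whose minima are 0, 1, ..., 10.
toy = {f"probe{i}": (float(i), float(i) + 1.0) for i in range(11)}
cutoff, masked = mask_low_quality_probes(toy)
print(cutoff, sorted(masked))
```

The unmasked probes would then form the customized chip description library; with real data, the cut-off plays the role of the value 8.4 reported above.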
The probe intensity files (.cel files) from the RNA hybridizations were read into R. Background correction, quantile normalization and summarization were again performed with the RMA method in the Bioconductor affy package (Bolstad et al., 2003; Irizarry et al., 2003; Gautier et al., 2004; Gentleman et al., 2004), using the customized chip description library. Expression flags (indicators of expressed genes) were determined with MAS 5.0 as present, marginal or absent calls. A probe set with present calls in at least two biological replicates of a tissue in at least one variety was considered to be expressed. MAS 5.0 calls were calculated with both the original and the customized chip description libraries, and the results were merged.
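The detection rule reads most naturally as "present in at least two biological replicates of one tissue in one variety". A minimal sketch under that reading follows. It treats marginal calls as not present and does not model the merging of calls from the two chip description libraries, whose exact procedure is not described here. The probe set IDs are placeholders.

```python
from typing import Dict, List, Tuple

# calls[(variety, tissue)] -> MAS 5.0 detection calls ('P', 'M' or 'A'), one per biological replicate
Calls = Dict[Tuple[str, str], List[str]]


def is_expressed(calls: Calls, min_present: int = 2) -> bool:
    """True if any tissue of any variety has at least `min_present` replicates called 'P'.

    Marginal ('M') calls are not counted as present (an assumption of this sketch).
    """
    return any(replicates.count("P") >= min_present for replicates in calls.values())


def expressed_probe_sets(all_calls: Dict[str, Calls], min_present: int = 2) -> List[str]:
    """IDs of the probe sets that pass the detection rule, sorted for stable output."""
    return sorted(ps for ps, calls in all_calls.items() if is_expressed(calls, min_present))


# Hypothetical probe set IDs and calls, for illustration only.
example = {
    "ps_root_only": {("Zhenshan 97", "root"): ["P", "P", "A"],
                     ("Minghui 63", "root"): ["A", "A", "A"]},
    "ps_sporadic": {("Zhenshan 97", "root"): ["P", "M", "A"],
                    ("Minghui 63", "leaf"): ["A", "P", "M"]},
}
print(expressed_probe_sets(example))
```

The second probe set is called present in several arrays, but never in two replicates of the same tissue and variety, so it is not counted as expressed.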
Gene annotation and probe set–gene association
Gene and gene-family annotations used in this study were downloaded from TIGR Rice Genome Pseudomolecules Release 5 (Ouyang et al., 2007). Because the genome assembly and gene annotation had been updated, we re-evaluated the definition of the probe sets. All perfect-match probes were mapped to the rice genome; 53 325 probe sets, each having at least six perfect-match probes (more than half of the 11 probes in a probe set) and a unique genomic location, were considered core probe sets. A core probe set with at least four perfect-match probes located within a TIGR-annotated gene was considered to represent that gene. Based on these criteria, 40 525 core probe sets on the array were associated with 33 811 unique genes annotated in TIGR5.
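A simplified sketch of the probe set filtering and gene association follows. It assumes probe alignments are already available as lists of perfect-match hits, interprets "unique genomic location" as each counted probe having exactly one hit, and tests only a probe's start coordinate against gene boundaries. `Gene`, the locus name and the coordinates are hypothetical stand-ins for TIGR5 annotation records.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Hit = Tuple[str, int]  # (chromosome, genomic position of a perfect-match alignment)


@dataclass
class Gene:
    locus: str  # hypothetical gene identifier
    chrom: str
    start: int
    end: int


def unique_hits(probe_hits: List[List[Hit]]) -> List[Hit]:
    """Keep only probes with exactly one perfect-match hit in the genome."""
    return [hits[0] for hits in probe_hits if len(hits) == 1]


def is_core(probe_hits: List[List[Hit]], min_probes: int = 6) -> bool:
    """Core probe set: at least `min_probes` (6 of 11) uniquely mapped PM probes."""
    return len(unique_hits(probe_hits)) >= min_probes


def associated_gene(probe_hits: List[List[Hit]], genes: List[Gene],
                    min_in_gene: int = 4) -> Optional[str]:
    """Return the first gene containing >= `min_in_gene` of the probe set's unique hits."""
    if not is_core(probe_hits):
        return None
    hits = unique_hits(probe_hits)
    for gene in genes:
        inside = sum(1 for chrom, pos in hits
                     if chrom == gene.chrom and gene.start <= pos <= gene.end)
        if inside >= min_in_gene:
            return gene.locus
    return None


# Toy probe set: 7 unique hits on Chr3, 2 multi-mapping probes, 2 unmapped probes (11 total).
probe_set = ([[("Chr3", 1000 + 100 * i)] for i in range(7)]
             + [[("Chr1", 5), ("Chr2", 9)]] * 2 + [[]] * 2)
genes = [Gene("geneA", "Chr3", 950, 1400)]
print(associated_gene(probe_set, genes))
```

Returning the first qualifying gene is a simplification; a fuller implementation would have to decide how to handle overlapping gene models.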
Gene Ontology (GO) analysis
GO terms for each gene were annotated based on TIGR5 (Jung et al., 2008; Lin et al., 2008). Of the three principal GO categories (biological process, molecular function and cellular component), we used the terms in the biological process category for GO analysis. GO enrichment analysis was performed using the weight method and Fisher's exact test provided by the Bioconductor topGO package (Alexa et al., 2006). To detect GO categories whose genes were highly expressed in certain tissues, the average Z-score-transformed expression values of all genes in each GO term were calculated, and only GO categories with an absolute Z-score of at least 2.0 in at least one organ were selected. Furthermore, to identify the biological processes associated with each tissue, the genes in each tissue with absolute Z-scores exceeding 3.0 were used for GO enrichment analysis.
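The Z-score-based selection of GO categories and a basic enrichment test can be sketched as below. The sketch assumes log2 expression values, standardizes each gene across tissues and averages the Z-scores of the genes in a term. For enrichment it substitutes the classic one-sided Fisher's exact (hypergeometric) test on a single term; it does not reproduce topGO's weight algorithm, which additionally decorrelates the GO graph structure. Gene and term names are hypothetical.

```python
import math
import statistics
from typing import Dict, List


def zscore(values: List[float]) -> List[float]:
    """Standardize one gene's log2 expression profile across tissues."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def go_term_profiles(expr: Dict[str, List[float]],
                     go_terms: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """Mean Z-score per tissue of the genes annotated with each GO term."""
    z = {gene: zscore(values) for gene, values in expr.items()}
    profiles = {}
    for term, genes in go_terms.items():
        rows = [z[g] for g in genes if g in z]
        if rows:
            # zip(*rows) walks tissue by tissue across the genes of this term.
            profiles[term] = [statistics.mean(col) for col in zip(*rows)]
    return profiles


def select_terms(profiles: Dict[str, List[float]], cutoff: float = 2.0) -> List[str]:
    """Terms whose mean |Z| reaches `cutoff` in at least one tissue."""
    return sorted(t for t, p in profiles.items() if any(abs(v) >= cutoff for v in p))


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact (hypergeometric upper-tail) P-value: probability of
    >= k term-annotated genes in a list of n genes drawn from N, of which K are annotated."""
    upper = min(n, K)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / math.comb(N, n)


# Toy data over six tissues: g1 and g2 peak sharply in the last tissue, g3 rises linearly.
expr = {"g1": [0, 0, 0, 0, 0, 6], "g2": [0, 0, 0, 0, 0, 3], "g3": [1, 2, 3, 4, 5, 6]}
terms = {"termA": ["g1", "g2"], "termB": ["g3"]}
print(select_terms(go_term_profiles(expr, terms)))
```

A single-tissue spike among six tissues gives a Z-score of about 2.04, so only termA passes the 2.0 cut-off; the linear profile of termB peaks at about 1.34.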
Identification of tissue-specific expressed genes
A total of 29 tissues/organs from plants under normal growth conditions were used to identify tissue-specific genes (excluding callus 2, 4, 6, 7 and 8, plumule 2, radicle 2 and seedling 3–5). Shannon entropy was used to assess the tissue specificity of the expressed genes (Schug et al., 2005; Kadota et al., 2006). The entropy of a gene ranges from 0, for a gene expressed in a single tissue only, to log2(N), for a gene with a flat expression pattern across all N tissues examined. All expressed genes were ranked by Shannon entropy, and the 2376 probe sets with entropy values below 3 were considered to show tissue-specific expression in this dataset. To identify the tissues in which a gene was specifically expressed, the log2-transformed expression values of the gene in all organs were ranked, and the largest difference (max gap) between the values of adjacent tissues was used as the cutoff for differential expression. The tissues ranked above the largest gap were regarded as those in which the gene was specifically expressed. In this way, the expression of the gene in these tissues is at least $2^{\text{max gap}}$-fold higher than in any other tissue.
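The two steps above, entropy filtering and max-gap assignment, can be sketched as follows. The sketch assumes linear-scale (not log2) expression values as input to the entropy, since the entropy is defined on each tissue's share of total expression. Tissue names and values are illustrative.

```python
import math
from typing import Dict, List, Tuple


def shannon_entropy(values: List[float]) -> float:
    """Entropy (bits) of a gene's expression distribution over tissues.

    0 when the gene is expressed in a single tissue; log2(N) when expression is
    identical in all N tissues. Expects non-negative, linear-scale values.
    """
    total = sum(values)
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


def is_tissue_specific(values: List[float], cutoff: float = 3.0) -> bool:
    """Apply the entropy < 3 criterion used in the text."""
    return shannon_entropy(values) < cutoff


def specific_tissues(profile: Dict[str, float]) -> Tuple[List[str], float]:
    """Split tissues at the largest gap between adjacent ranked log2 values.

    Returns the tissues above the gap and the gap itself (the 'max gap').
    """
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    logs = [math.log2(value) for _, value in ranked]
    gaps = [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    return [tissue for tissue, _ in ranked[: cut + 1]], gaps[cut]


profile = {"stamen": 800.0, "pistil": 50.0, "leaf": 50.0, "root": 50.0}
print(is_tissue_specific(list(profile.values())), specific_tissues(profile))
```

Here the entropy is about 0.88 bits, well under the cut-off of 3, and the log2 gap of 4 means the stamen value is 16-fold higher than in any other tissue.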
Identification of constitutively expressed genes
Transcripts with relatively invariant abundance were identified using the method described previously (Czechowski et al., 2005). Only probe sets with MAS 5.0 'present' calls on all arrays of both varieties were used to identify constitutively expressed genes. To select the most stably expressed genes, we retained probe sets whose inter-quartile range of log-transformed expression values was below 0.5 and whose minimum expression level exceeded 100. The 100 most stably expressed genes were then identified according to their coefficients of variation, and their stability measures were also calculated using geNORM (Vandesompele et al., 2002).
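Below is a sketch of the stability filters and ranking, together with the geNORM stability measure M (the mean standard deviation of a gene's pairwise log2 expression ratios with all other candidates, as defined by Vandesompele et al., 2002). The text does not state the logarithm base for the inter-quartile-range filter or the scale on which the coefficient of variation is computed, so log2 and linear-scale values are assumptions here. The probe set names and values are made up.

```python
import math
import statistics
from typing import Dict, List


def iqr(values: List[float]) -> float:
    """Inter-quartile range (Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def stable_candidates(expr: Dict[str, List[float]], iqr_max: float = 0.5,
                      min_level: float = 100.0, top: int = 100) -> List[str]:
    """Filter on log2-scale IQR and minimum level, then rank by coefficient of variation.

    `expr` maps probe set -> linear-scale expression on every array; only probe sets
    called 'present' on all arrays should be passed in.
    """
    scored = []
    for probe_set, values in expr.items():
        log_values = [math.log2(v) for v in values]
        if iqr(log_values) < iqr_max and min(values) > min_level:
            cv = statistics.stdev(values) / statistics.mean(values)
            scored.append((cv, probe_set))
    return [probe_set for _, probe_set in sorted(scored)[:top]]


def genorm_m(expr: Dict[str, List[float]]) -> Dict[str, float]:
    """geNORM stability M: mean SD of a gene's pairwise log2 ratios with every other gene."""
    m = {}
    for j in expr:
        sds = [statistics.stdev([math.log2(a / b) for a, b in zip(expr[j], expr[k])])
               for k in expr if k != j]
        m[j] = statistics.mean(sds)
    return m


expr = {"a": [200, 210, 205, 195], "b": [150, 300, 150, 300],
        "c": [50, 52, 51, 49], "d": [1000, 1010, 990, 1000]}
print(stable_candidates(expr))
```

In this toy input, "b" fails the IQR filter (its log2 values span a full unit), "c" fails the minimum-level filter, and "d" ranks ahead of "a" because of its smaller coefficient of variation. Lower M values indicate more stable genes.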
The microarray data have been submitted to the NCBI Gene Expression Omnibus (GEO) under accession number GSE19024.
Acknowledgements
This research was supported by grants from the National Special Key Project of China on Functional Genomics of Major Plants and Animals, the National Natural Science Foundation of China and National Program on Key Basic Research Project.
References
- (2006) Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics, 22, 1600–1607.
- (2003) Transcriptional profiling of Arabidopsis tissues reveals the unique characteristics of the pollen transcriptome. Plant Physiol. 133, 713–725.
- (2008) A gene expression atlas of the model legume Medicago truncatula. Plant J. 55, 504–513.
- (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19, 185–193.
- (2005) Genetics and evolution of inflorescence and flower development in grasses. Plant Cell Physiol. 46, 69–78.
- (2006) Genotyping and mapping with high-density oligonucleotide arrays. Methods Mol. Biol. 323, 137–145.
- (2007) A rice promoter containing both novel positive and negative cis-elements for regulation of green tissue-specific gene expression in transgenic plants. Plant Biotechnol. J. 5, 664–674.
- (2007) Rice UDP-glucose pyrophosphorylase1 is essential for pollen callose deposition and its cosuppression results in a new type of thermosensitive genic male sterility. Plant Cell, 19, 847–861.
- (2005) Genome-wide identification and testing of superior reference genes for transcript normalization in Arabidopsis. Plant Physiol. 139, 5–17.
- (2006) GS3, a major QTL for grain length and weight and minor QTL for grain width and thickness in rice, encodes a putative transmembrane protein. Theor. Appl. Genet. 112, 1164–1171.
- (2006) Genome-wide analysis of spatial and temporal gene expression in rice panicle development. Plant J. 46, 503–511.
- (2008a) Sparse inflorescence1 encodes a monocot-specific YUCCA-like gene required for vegetative and reproductive development in maize. Proc. Natl Acad. Sci. USA, 105, 15196–15201.
- (2008b) The Relationship between auxin transport and maize branching. Plant Physiol. 147, 1913–1923.
- (2004) Affy-analysis of Affymetrix GeneChip data at the probe level. Bioinformatics, 20, 307–315.
- (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80.
- (2008) KT/HAK/KUP potassium transporters gene family and their whole-life cycle expression profile in rice (Oryza sativa). Mol. Genet. Genomics, 280, 437–452.
- (2003) Single-locus heterotic effects and dominance by dominance interactions can adequately explain the genetic basis of heterosis in an elite rice hybrid. Proc. Natl Acad. Sci. USA, 100, 2574–2579.
- (2009) The ankyrin repeat gene family in rice: genome-wide identification, classification and expression profiling. Plant Mol. Biol. 71, 207–226.
- (2007) Rice ABERRANT PANICLE ORGANIZATION 1, encoding an F-box protein, regulates meristem fate. Plant J. 51, 1030–1040.
- International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature, 436, 793–800.
- (2003) Exploration, normalization and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, 249–264.
- (2009) Genome-wide identification of novel internal control genes for normalization of gene expression during various stages of development in rice. Plant Sci. 176, 702–706.
- (2007) F-box proteins in rice. Genome-wide analysis, classification, temporal and spatial gene expression during panicle and seed development and regulation by light and abiotic stress. Plant Physiol. 143, 1467–1483.
- (2000) T-DNA insertional mutagenesis for functional genomics in rice. Plant J. 22, 561–570.
- (2009) A transcriptome atlas of rice cell types uncovers cellular, functional and developmental hierarchies. Nat. Genet. 41, 258–263.
- (2008) Refinement of light-responsive transcript lists using rice oligonucleotide arrays: evaluation of gene-redundancy. PLoS ONE, 3, e3337.
- (2006) ROKU: a novel method for identification of tissue-specific genes. BMC Bioinformatics, 7, 294.
- (2003) Collection, mapping and annotation of over 28,000 cDNA clones from japonica rice. Science, 301, 376–379.
- (2003) Normalization of reverse transcription quantitative-PCR with housekeeping genes in rice. Biotechnol. Lett. 25, 1869–1872.
- (2003) LAX and SPA: major regulators of shoot branching in rice. Proc. Natl Acad. Sci. USA, 100, 11765–11770.
- (2009) Molecular analyses of the rice tubby-like protein gene family and their response to bacterial infection. Plant Cell Rep. 28, 113–121.
- (2009) Mutant resources in rice for functional genomics of the grasses. Plant Physiol. 149, 165–170.
- (2007) Genome-wide gene expression profiling reveals conserved and novel molecular functions of the stigma in rice. Plant Physiol. 144, 1797–1812.
- (2006) Expression profiles of 10,422 genes at early stage of low nitrogen stress in rice assayed using a cDNA microarray. Plant Mol. Biol. 60, 617–631.
- (2005) Optimising the tissue culture conditions for high efficiency transformation of indica rice. Plant Cell Rep. 23, 540–547.
- (2008) Characterization of paralogous protein families in rice. BMC Plant Biol. 8, 18.
- (2007) Oryza sativa dicer-like4 reveals a key role for small interfering RNA silencing in plant development. Plant Cell, 19, 2705–2718.
- (2008) RICD: a rice indica cDNA database resource for rice functional genomics. BMC Plant Biol. 8, 118.
- (2005) A microarray analysis of the rice transcriptome and its comparison to Arabidopsis. Genome Res. 15, 1274–1283.
- (2009) Sequence and expression analysis of the C3HC4-type RING finger gene family in rice. Gene, 444, 33–45.
- (2007) barren inflorescence2 Encodes a co-ortholog of the PINOID serine/threonine kinase and is required for organogenesis during inflorescence and vegetative development in maize. Plant Physiol. 144, 1000–1011.
- (2008) Comprehensive sequence and expression profile analysis of PEX11 gene family in rice. Gene, 412, 59–70.
- (2007) An expression atlas of rice mRNAs and small RNAs. Nat. Biotechnol. 25, 473–477.
- (2008) Sequence and expression analysis of the thioredoxin protein gene family in rice. Mol. Genet. Genomics, 280, 139–151.
- (2007) The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res. 35, D883–D887.
- (2008) Rice Indeterminate 1 (OsId1) is necessary for the expression of Ehd1 (Early heading date 1) regardless of photoperiod. Plant J. 56, 1018–1029.
- (2008) Distinct regulatory role for RFL, the rice LFY homolog, in determining flowering time and plant architecture. Proc. Natl Acad. Sci. USA, 105, 3646–3651.
- (2004) High throughput T-DNA insertion mutagenesis in rice: a first step towards in silico reverse genetics. Plant J. 39, 450–464.
- (2007) Gene organization in rice revealed by full-length cDNA mapping and gene expression analysis through microarray. PLoS ONE, 2, e1235.
- (2005) A gene expression map of Arabidopsis thaliana development. Nat. Genet. 37, 501–506.
- (2005) Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biol. 6, R33.
- (1999) Regional expression of the rice KN1-type homeobox gene family during embryo, shoot and flower development. Plant Cell, 11, 1651–1664.
- (2007) Rice WRKY45 plays a crucial role in benzothiadiazole-inducible blast resistance. Plant Cell, 19, 2064–2076.
- (2008) Genetic and physical interaction suggest that BARREN STALK 1 is a target of BARREN INFLORESCENCE2 in maize inflorescence development. Plant J. 55, 787–797.
- (2005) Database of mRNA gene expression profiles of multiple human organs. Genome Res. 15, 443–450.
- (2007) Transcriptional profiling of the Arabidopsis embryo. Plant Physiol. 143, 924–940.
- (2008) Microarray analysis of vegetative phase change in maize. Plant J. 56, 1045–1057.
- (2004) Xa26, a gene conferring resistance to Xanthomonas oryzae pv. oryzae in rice, encodes an LRR receptor kinase-like protein. Plant J. 37, 517–527.
- The Rice Annotation Project (2007) Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. Genome Res. 17, 175–183.
- (2002) Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 3, RESEARCH0034.
- (2004) Applications of a rat multiple tissue gene expression data set. Genome Res. 14, 742–749.
- (2009) The vacuolar processing enzyme OsVPE1 is required for efficient glutelin processing in rice. Plant J. 58, 606–617.
- (1995) UNUSUAL FLORAL ORGANS Controls Meristem Identity and Organ Primordia Fate in Arabidopsis. Plant Cell, 7, 1485–1499.
- (2003) Development of enhancer trap lines for functional analysis of the rice genome. Plant J. 35, 418–427.
- (2007) Microarray-based gene expression profiles in multiple tissues of the domesticated silkworm, Bombyx mori. Genome Biol. 8, R162.
- (2005) Isolation and annotation of 10828 putative full length cDNAs from indica rice. Sci. China C Life Sci. 48, 445–451.
- (2006) Genomic organization, differential expression and interaction of SQUAMOSA promoter-binding-like transcription factors and microRNA156 in rice. Plant Physiol. 142, 280–293.
- (2002) Characterization of the main effects, epistatic effects and their environmental interactions of QTLs on the genetic basis of yield traits in rice. Theor. Appl. Genet. 105, 248–257.
- (2005) How can we use genomics to improve cereals with rice as a reference genome? Plant Mol. Biol. 59, 7–26.
- (2008) Natural variation in Ghd7 is an important regulator of heading date and yield potential in rice. Nat. Genet. 40, 761–767.
- (2009) Characterization and expression profiles of miRNAs in rice seeds. Nucleic Acids Res. 37, 916–930.
- (2009) Expression profile of calcium-dependent protein kinase (CDPKs) genes during the whole lifespan and under phytohormone treatment conditions in rice (Oryza sativa L. ssp. indica). Plant Mol. Biol. 70, 311–325.
- (1997) Importance of epistasis as the genetic basis of heterosis in an elite rice hybrid. Proc. Natl Acad. Sci. USA, 94, 9226–9231.
- (2005) The genomes of Oryza sativa: a history of duplications. PLoS Biol. 3, e38.
- (2007) Strategies for developing Green Super Rice. Proc. Natl Acad. Sci. USA, 104, 16402–16409.
- (2005a) Features of the expressed sequences revealed by a large-scale analysis of ESTs from a normalized cDNA library of the elite indica rice cultivar Minghui 63. Plant J. 42, 772–780.
- (2005b) Genome-wide expression profiling and identification of gene activities during early flower development in Arabidopsis. Plant Mol. Biol. 58, 401–419.
- (2008) Rice 2020: a call for an international coordinated effort in rice functional genomics. Mol. Plant, 1, 715–719.
Supporting Information
Figure S1. Evaluation of data quality. (a, b) The distribution of correlation coefficients between biological replicates (a) and technical replicates (b). (c–f) Consistency of the tissue-specific genes identified in this study with previous reports: (c) OsCSLD1, a root-specific gene (Kim et al., 2007); (d) D54O, a green-tissue-specific gene (Cai et al., 2007); (e) GBSS1, an endosperm-specific gene (Kluth et al., 2002); and (f) RTS, an anther-specific gene (Luo et al., 2006). In (c–f), the vertical axis shows expression levels, and the numbers on the horizontal axis denote the samples in the order given in Table S2. Only the expression patterns detected in Zhenshan 97 are shown; the same expression trends were obtained in Minghui 63.
Figure S2. Number of detected probe sets in various organs. The numbers are based on the combined results of Zhenshan 97 and Minghui 63; that is, a transcript was counted for an organ if it was detected in that organ in at least one of the two varieties.
Figure S3. Distribution of relative expression levels (Z-score) of expressed genes in each organ cluster. The average transcript abundance was log2-transformed before calculating the Z-score value according to the 10 organ clusters presented in Figure 2.
Figure S4. The distribution of the highest signal of the expressed predicted (red) and known (gray) genes. The maximum expression value was log2-transformed. The green line denotes the expression value of 344.
Figure S5. Expression levels of genes belonging to certain Gene Ontology (GO) categories in all samples. The box plot denotes the distribution of geometric mean of expression values of genes involved in a given GO category across all samples. The red dashed line indicates the geometric mean expression levels of all genes.
Figure S6. GO enrichment analysis of the organs/tissues. The gene lists in each tissue/organ with absolute Z-scores exceeding 2 and 3 were used to perform GO enrichment analysis, and only the GO terms showing consistent results at both Z-score cutoffs are shown. The color shows the value of −log10(p).
Figure S7. Paralogous protein families enriched among the tissue-specifically expressed genes. A total of 65 paralogous protein families were enriched in the list of tissue-specifically expressed genes. The color shows the average Z-score of each family in each tissue.
Figure S8. Comparison of the best 19 CEGs with the 25 genes reported by Jain (2009) using public datasets. The datasets, comprising 365 microarrays from 25 experimental series involving both indica and japonica varieties, were obtained from the Plexdb database (http://www.plexdb.org), which also provides the coefficients of variation (CVs) of the genes in each experiment. The box plot shows the distribution of the 25 CV values for each gene. Block I shows the five commonly used reference genes, including actin-1 (LOC_Os03g50890), ubiquitin fusion protein (LOC_Os03g13170), elongation factor 1-alpha (LOC_Os03g08020) and tubulin beta-6 chain (LOC_Os01g59150). Blocks II and IV show the genes identified by this study and by Jain (2009), respectively. Block III shows the three genes identified in both this study and Jain (2009); note that two different probe sets were identified for the same locus, LOC_Os07g34589 (marked with an asterisk), in the two studies.
Table S1. Summary of the current rice whole-genome oligo microarrays in the public databases (as of Sept 30, 2009).
Table S2. The samples used in this study.
Table S3. Variety-specifically expressed genes.
Table S4. Enriched GO terms in each organ revealed by the weight method and Fisher's exact test (P < 0.05). The gene lists in each tissue/organ with absolute Z-scores exceeding 2 or 3 were used for GO enrichment analysis, and only the results consistent across the two Z-score cutoffs are shown.
Table S5. Expression levels of the 215 transcripts whose expression decreased gradually from early to late stages of panicle development.
Table S6. Expression levels of the 794 transcripts whose expression increased gradually from early to late stages of panicle development.
Table S7. Enriched GO terms of the group of genes with decreased expression levels from early to late stages of developing panicle (p < 0.05).
Table S8. Enriched GO terms of the second group of genes with increased levels from early to late stages of developing panicle (p < 0.05).
Table S9. A list of tissue-specifically expressed genes. “Z”, “M” and “ZM” denote organs in which the gene is tissue-specifically expressed in Zhenshan 97, Minghui 63 and both varieties, respectively. “0” indicates that the gene is not tissue-specifically expressed in the given organ in either variety. Max gap is the largest difference in expression between adjacent organs when the organs are ranked by the log2-transformed expression values of the gene.
Table S10. The gene families enriched in the tissue-specifically expressed genes identified by Fisher’s exact test (p < 0.01). The gene family classification is based on TIGR5 annotation.
Table S11. A list of the 100 most stably expressed genes.
Table S12. The gene ontology (GO) categories significantly enriched in the 100 most stably expressed genes revealed by weight method and Fisher’s exact test (p < 0.05).
Table S13. Evaluation of the 19 candidate novel reference genes using other datasets. The coefficients of variation (CVs, %) of the candidates in the 25 datasets comprising 365 arrays were taken from the Plexdb database (http://www.plexdb.org) and are based on log2-transformed expression values.
Data S1. References.