• Sequencing of the Populus trichocarpa genome creates an opportunity to describe the transcriptome of a woody perennial species and establish an atlas of gene expression. A comparison with the transcriptomes of other species can also define genes that are conserved or diverging in plant species.
• Here, the transcriptome in vegetative organs of the P. trichocarpa reference genotype Nisqually-1 was characterized. A comparison with Arabidopsis thaliana orthologs was used to distinguish gene functional categories that may be evolving differently in a woody perennial and an annual herbaceous species.
• A core set of genes expressed in common among vegetative organs was detected, as well as organ-specific genes. Statistical tests identified chromatin domains, where adjacent genes were expressed more frequently than expected by chance. Extensive divergence was detected in the expression patterns of A. thaliana and P. trichocarpa orthologs, but transcription of a small number of genes appeared to have remained conserved in the two species.
• Despite separation of lineages for over 100 million yr, these results suggest that selection has limited transcriptional divergence of genes associated with some essential functions in A. thaliana and P. trichocarpa. However, extensive remodeling of transcriptional networks indicates that expression regulation may be a key determinant of plant diversity.
The sequencing of the first woody perennial plant species, Populus trichocarpa (Torr. & Gray ex Brayshaw) (Tuskan et al., 2006), creates opportunities for novel comparative genomic studies in plants. Populus trichocarpa is a model species for tree genetic and genomic research (Bradshaw et al., 2000; Taylor, 2002; Brunner et al., 2004a) because of its relatively small genome size (480 Mbp), ease of genetic transformation and vegetative propagation, and abundant natural genetic variation. With the release of the first draft of the genome sequence, novel genomic tools such as whole-genome microarrays have become available, providing the first opportunity for analysis of the transcriptome of a woody species and comparison with the well-studied plant model Arabidopsis thaliana.
The relative phylogenetic proximity to A. thaliana makes P. trichocarpa valuable for comparison of the plant architecture, development and life history of a woody perennial relative to an annual herbaceous species. Populus trichocarpa and A. thaliana share a large common set of genes (c. 90%; Tuskan et al., 2006), suggesting that transcriptional regulation plays a significant role in the morphological and developmental differences that distinguish the two species. Differential regulation of gene expression, rather than the creation of novel transcriptional units, has been implicated in the wide diversity observed in animals (King & Wilson, 1975; Baltimore, 2001; Levine & Tjian, 2003). The same mechanism may be important in plants, judging from the high sequence similarity and contrasting developmental and morphological traits between distantly related plant species such as A. thaliana and the conifer Pinus taeda (Kirst et al., 2003). Gene duplication, cis-regulatory elements and protein interaction complexes that modulate gene expression may create the opportunity for the differential regulation of a common set of genes and, as a consequence, the evolution of the woody, perennial habit in P. trichocarpa.
Populus trichocarpa and other perennial woody plants have improved supportive and solute-conductive vegetative structures that arise through secondary growth of the cambial meristem. The aerial support of leaves by the woody stem creates a competitive advantage for light interception, while the perennial growth of the root system may confer a greater potential to explore soil for water and nutrients. Perennial species also undergo intermittent periods of cambial and shoot apical dormancy associated with seasonal and other environmental changes, presumably as a stress avoidance adaptation. Shifts in programs of gene transcription have been implicated in the development of the woody stem, including cell differentiation and lignin biosynthesis (Schrader et al., 2004b; Paux et al., 2005), auxin-stimulated cell signaling (Moyle et al., 2002), the synthesis, transport and remodeling of structural carbohydrates (Mellerowicz et al., 2001; Samuels et al., 2002; Aspeborg et al., 2005), and programmed cell death (Neill, 2005). However, the transcriptional regulation of genes implicated in essential physiological processes would be expected to remain conserved.
Genome-wide assessment of the transcriptome could aid in explaining the molecular basis of the variation in plant growth, development, environmental response and, ultimately, adaptation and evolution (Doebley & Lukens, 1998; Purugganan, 2000; Tautz, 2000; Wray et al., 2003). Gene expression analyses using whole-genome microarrays provide a time-defined snapshot of genes that are expressed in specific plant organs and growth stages. Recently, microarray-based whole-genome surveys of genes expressed in A. thaliana and rice (Oryza sativa) have become available (L. Ma et al., 2005; L. G. Ma et al., 2005; Schmid et al., 2005), permitting a comparison of the expression of orthologs on a whole-genome scale. Thus, the analysis of the P. trichocarpa vegetative transcriptome could provide evidence of which genes are expressed throughout the genome, and define those with significant roles in processes unique to trees, such as the development of woodiness and the perennial habit. A comparative analysis could also define the orthologous genes for which transcript abundance has diverged dramatically among species (implying functionalization) as well as those genes on which selection has acted to maintain transcript abundance (implying physiological relevance) in angiosperms.
Here we report a detailed whole-genome survey of the genes transcribed in the vegetative organs of the woody plant P. trichocarpa. A compendium of genes expressed in five vegetative organs of P. trichocarpa was created by analyzing their expression in whole-genome microarrays representing the majority of the 45 555 predicted transcriptional units. The analysis identified the woody stem as the vegetative organ with the greatest variety of expressed genes, but also the one with the highest proportion of uncharacterized transcripts. New statistical approaches were developed and implemented to determine whether adjacently expressed genes exhibit significant deviations from random chance. Comparisons of expression between P. trichocarpa and A. thaliana orthologs showed very little conservation in expression based on rank correlations. However, exceptions may identify conserved physiological mechanisms in plant species.
Materials and Methods
Greenwood cuttings of P. trichocarpa (Torr. & Gray ex Brayshaw) reference genotype Nisqually-1 were rooted in a misthouse for 2 wk. Four rooted cuttings (cloned biological replicates) were planted in separate pots in a glasshouse equipped with an ebb-and-flow flood bench system with daily supply of Peters Professional® 20-10-20 water-soluble fertilizer (Scotts, Marysville, OH, USA) diluted to a final concentration of 4 mM nitrogen. After 45 d, whole roots (R), young leaves (YL; leaf plastochron index (LPI) 0–5), mature leaves (ML; LPI 6–9), nodes (N) and internodes (IN) were collected from each of the four biological replicates, and immediately frozen in liquid nitrogen. RNA was extracted using standard methods (Chang et al., 1993), DNase-treated and purified in RNeasy columns (Qiagen, Valencia, CA, USA). The P. trichocarpa × P. deltoides hybrid genotype H11-11, used for a comparison of transcript abundance between poplar species, was grown under the same conditions as those described above. Mature leaves, stems (nodes and internodes) and whole roots were collected 45 d after rooting.
Total RNA was treated with RNase-free DNase I (1 unit per µg RNA) and cleaned using the RNeasy Mini Kit (Qiagen). Total RNA (c. 5 µg) was used to synthesize cDNA using a mixture of 500 ng oligo dT, 100 ng random primers, and M-MLV RT (Invitrogen, Carlsbad, CA, USA). Gene expression was analyzed using the SYBR Green kit (Stratagene, Cedar Creek, TX, USA), in an Mx3000P thermocycler (Stratagene). A total of 0.5 µl of the synthesized cDNA and 0.075 µl of a 0.25 µM solution of each primer were used for each 25-µl real-time PCR reaction. Primers were designed using NetPrimer (Premier Biosoft International, Palo Alto, CA, USA) and synthesized by Invitrogen. Reactions were carried out with annealing, extension, and melting temperatures of 55, 72 and 95°C, respectively. A melting curve was generated to check the specificity of the amplified fragments. Changes in gene expression relative to the geometric mean (Vandesompele et al., 2002) of three control genes (actin, ubiquitin, ubiquitin-ligase) (Brunner et al., 2004b) were determined using the program dart-pcr version 1.0 (Peirson et al., 2003).
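The normalization against the geometric mean of several control genes can be illustrated with a minimal sketch. It assumes a fixed amplification efficiency of 2 (perfect doubling per cycle) and uses hypothetical threshold-cycle (Ct) values; dart-pcr estimates efficiencies from the raw fluorescence data, which is not reproduced here, and the function names are illustrative only.

```python
import math

def geometric_mean(values):
    """Geometric mean of a list of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def relative_quantity(ct, efficiency=2.0):
    """Convert a Ct value to a relative quantity, efficiency ** (-Ct)."""
    return efficiency ** (-ct)

def normalized_expression(target_ct, reference_cts, efficiency=2.0):
    """Target quantity divided by the geometric mean of the reference quantities."""
    target_q = relative_quantity(target_ct, efficiency)
    reference_q = [relative_quantity(ct, efficiency) for ct in reference_cts]
    return target_q / geometric_mean(reference_q)

# Hypothetical Ct values: one target gene and three control genes in two organs.
root = normalized_expression(24.0, [20.0, 21.0, 22.0])
leaf = normalized_expression(26.0, [20.0, 21.0, 22.0])
fold_change = root / leaf  # root expression relative to leaf
```

With identical control-gene Ct values in both organs, the two-cycle difference in the target gene corresponds to a fourfold difference in normalized expression.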
Poplar whole-genome oligonucleotide microarrays
This study was based on hybridizations to whole-genome microarrays containing features representing 42 364 predicted transcriptional units from the P. trichocarpa nuclear genome. All transcriptional units were represented by three 60-mer probes, designed by NimbleGen (Madison, WI, USA) in collaboration with Oak Ridge National Laboratory and were synthesized using maskless lithography. cRNA was synthesized from total RNA extracted from individual plants. Labeling, hybridization and scanning were carried out by NimbleGen using standard procedures.
Microarray data analysis
The data were analyzed using a two-step strategy previously outlined by Chu et al. (2002). Data from the hybridizations of four biological replicates were used for identification of genes expressed in each organ. Each vegetative organ was analyzed separately. Initially, the signal intensity detected for each probe, in each microarray, was log2 transformed. The data were not background corrected. After inspection of the signal distribution (box plots) across all microarrays, log2 signal intensities were microarray-centered to zero. A mixed-model analysis of variance (ANOVA) was applied to estimate relative transcript levels for each gene with PROC MIXED in sas (SAS Institute, Cary, NC, USA) using the model $y_{ij} = \mu + G_i + P(G)_{j(i)} + \varepsilon_{ij}$, which included the sample mean $\mu$, gene ($G_i$) as a fixed effect and probe nested within gene ($P(G)_{j(i)}$) as a random effect. Probe was included in the model to account for general effects (for instance, melting temperature and potential for forming secondary structures) that may contribute to differences in signal detected among the three probes representing a gene, and are associated with specific probe properties. Residual plots indicated that residual assumption criteria (mean of zero, constant $\sigma^2$) were met across all microarrays. Least-square means were calculated for each gene and for the negative control probe set (20 probes). Pairwise comparisons (one-sided t-tests) were carried out to evaluate whether the least-square mean (estimated transcript level) of each gene was significantly higher than that of the negative control. P-values were adjusted for false discovery rate (Benjamini & Hochberg, 1995), with modifications (Storey & Tibshirani, 2003). Genes were considered expressed above background and were placed on a binary scale: a value of ‘1’ was given if they had a Q-value below 0.01 (false-discovery rate (FDR) < 1%). Genes with a Q-value above or equal to 0.01 (FDR ≥ 1%) were assigned a value of ‘0’.
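The final step, converting per-gene P-values into binary expression calls at a 1% FDR, can be sketched as follows. This is a simplified illustration that uses the Benjamini & Hochberg step-up adjustment only; it does not implement the Storey & Tibshirani Q-value modification (which additionally estimates the proportion of true null hypotheses), and the P-values shown are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values never increase as the raw P-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_expressed(pvalues, threshold=0.01):
    """Return 1 for genes with adjusted P-value below the threshold, else 0."""
    return [1 if q < threshold else 0 for q in bh_adjust(pvalues)]

# Hypothetical one-sided P-values (gene vs negative control) for six genes.
pvals = [0.0001, 0.002, 0.004, 0.03, 0.2, 0.9]
calls = call_expressed(pvals)
```

The resulting vector of 0 and 1 values is the kind of binary expression record used in the analyses of organ specificity and of the genomic distribution of expressed genes.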
All the analyses described above were carried out using the sas and jmp software (SAS Institute). Analyses carried out with other poplar genotypes followed the same procedures.
For contrasting the transcript abundance between vegetative organs, each gene was analyzed individually using sas (SAS Institute) in a mixed ANOVA model $y_{ij} = \mu + T_i + P_j + \varepsilon_{ij}$, which included the sample mean $\mu$, the plant organs node, internode, young leaf, mature leaf and root ($T_i$) as fixed effects and probe ($P_j$) as a random effect. The resulting gene list was then filtered to retain genes that were up- or down-regulated among organs (FDR < 1%) and differentially regulated by at least twofold.
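Because the data are on a log2 scale, a twofold difference corresponds to a difference of 1 between organ estimates. The filter can be sketched as below, assuming (this is not stated in the text) that the fold change is taken between the organs with the highest and lowest estimated transcript levels; the organ estimates and Q-value are hypothetical.

```python
import math

def is_differentially_regulated(organ_log2_means, q_value, fdr=0.01, min_fold=2.0):
    """Apply the FDR threshold and the minimum fold-change filter to one gene."""
    estimates = organ_log2_means.values()
    spread = max(estimates) - min(estimates)  # log2 fold change, highest vs lowest organ
    return q_value < fdr and spread >= math.log2(min_fold)

# Hypothetical least-square means (log2 scale) for one gene in the five organs.
gene = {"N": 1.0, "IN": 0.2, "YL": -0.5, "ML": 0.0, "R": 0.1}
selected = is_differentially_regulated(gene, q_value=0.001)
```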
Gene expression data are deposited in the Gene Expression Omnibus Database under the accession numbers GSM146141–GSM146299; series GSE6422, and platform GPL2618.
Test for random distribution of expressed genes
A new statistical test was developed to determine whether expressed genes were randomly distributed across each of the 19 poplar linkage groups. Expressed and assembled genes were assigned a value of 1 and nonexpressed genes were given a value of 0 (unassembled genes were not considered for these analyses). A search was then carried out through the sequence of 0 and 1 values to identify patterns, or runs of expressed genes, using a ‘runs test’ as first proposed by O’Brien & Dyck (1985). A run was defined as a succession of the same digit, bordered by different digits. The length of a run was defined by the number of digits in that run. The null distribution was obtained using a bootstrap approach. A set of 25 000 independent data sets was generated under the null hypothesis, where each data set was a sequence with a random pattern of 0 and 1 values, with sequence length and proportion P of 1 values identical to those of the observed data. The χ² statistic (O’Brien & Dyck, 1985) was calculated for each random data set and compared with the χ² value of the observed data, generating a P-value for the test of no co-regulation.
To define the extent of clusters of co-expressed genes we also evaluated the run lengths. For each data set generated under the null hypothesis we calculated the number of runs of length 1 (i.e. one expressed gene flanked by nonexpressed genes), the number of runs of length 2 (i.e. two consecutive expressed genes flanked by nonexpressed genes), and so on up to the longest run in the data set. This provided a null distribution for each run length. From the observed data, the number of runs of each length was calculated to assess whether the observed value deviated from what was expected under the null hypothesis. Finally, for each run length for which the null hypothesis was rejected, the position of the runs of that length was defined using a sliding window to detect the exact position of the nonrandomly expressed genes in the sequence.
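The run-length part of the procedure can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the null data sets are generated by shuffling the observed sequence (which fixes the number of expressed genes exactly), the χ² statistic of O’Brien & Dyck (1985) is not reproduced, and the function names and the short example sequence are hypothetical.

```python
import random
from collections import Counter
from itertools import groupby

def run_length_counts(states):
    """Count runs of consecutive 1s (expressed genes), keyed by run length."""
    counts = Counter()
    for value, group in groupby(states):
        if value == 1:
            counts[len(list(group))] += 1
    return counts

def null_run_counts(n_genes, n_expressed, n_sim=1000, seed=0):
    """Run-length counts for random sequences with the observed composition."""
    rng = random.Random(seed)
    sequence = [1] * n_expressed + [0] * (n_genes - n_expressed)
    simulations = []
    for _ in range(n_sim):
        rng.shuffle(sequence)
        simulations.append(run_length_counts(sequence))
    return simulations

def empirical_p(observed_counts, simulations, length, tail="upper"):
    """Empirical P-value for the number of runs of a given length."""
    obs = observed_counts.get(length, 0)
    if tail == "upper":
        hits = sum(1 for s in simulations if s.get(length, 0) >= obs)
    else:
        hits = sum(1 for s in simulations if s.get(length, 0) <= obs)
    return (hits + 1) / (len(simulations) + 1)

# Hypothetical linkage group with 16 genes, 10 of them expressed.
observed = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0]
observed_counts = run_length_counts(observed)
sims = null_run_counts(len(observed), sum(observed), n_sim=2000, seed=42)
p_long_runs = empirical_p(observed_counts, sims, length=5, tail="upper")
```

An upper-tail test asks whether runs of a given length occur more often than expected by chance; the lower tail applies to short runs that are suspected to be depleted.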
To what extent are genes dedicated to specific organ types during development of a woody plant? To evaluate this, we initially defined the set of genes expressed in five major organs of the tree species P. trichocarpa (stem nodes and internodes, young and mature leaves, and whole roots; Fig. 1) by contrasting the signal intensity detected for each of 42 364 predicted transcriptional units with that of a set of seven negative-control genes (20 negative-control probes). Each plant organ was represented by four biological replicates collected from the P. trichocarpa reference genotype Nisqually-1. All analyses were carried out on whole-genome microarrays representing 93% of predicted transcriptional units from the P. trichocarpa genome, each represented by three independent 60-mer probes based on the sequence of the reference genotype Nisqually-1. Expression was detected for 22 616 transcriptional units (1% FDR), 53% of the genes represented in the microarray (Supporting Information Table S1). The specificity of the microarray in detecting organ-specific expression and discriminating individual members of gene families was validated for a set of transcriptional units by real-time quantitative PCR (Fig. S1).
The highest diversity in expressed genes was detected in stem nodes, where expression evidence was detected for 21 081 transcriptional units. Stems (nodes and/or internodes) contained the largest proportion of organ-specific genes – that is, genes expressed exclusively in a particular organ. Transcripts for 3468 genes were detected exclusively in stems, 811 in leaves (young and/or mature) and 332 in roots (Fig. 2). Organ-specific genes were classified into GO functional categories (http://www.geneontology.org) and the frequency of genes in each category, and each major plant organ (stem, leaves and roots), was calculated (for example, 175 of 3468 genes specifically expressed in stems were categorized as ‘cell organization and biogenesis’, a frequency of 0.05 (175/3468)). Next, the frequency of organ-specific genes in each GO class was compared among the different organs to identify categories over- or underrepresented (Fig. 3). As expected, leaves revealed a higher proportion of tissue-specific genes assigned to the chloroplast and plastid cellular component, largely involved in carbon fixation. However, if the number (rather than the proportion) of organ-specific genes in each GO category is considered, the stems actually displayed a larger set of genes expressed exclusively in the chloroplast. Because the stem of juvenile poplar trees is photosynthetically active this may suggest that different chloroplast genes are transcribed in stems and leaves. Roots had a much higher proportion of organ-specific genes dedicated to the biological process of responding to external biotic and abiotic stimuli – 27% of root-specific genes versus ≈ 10% in stems and leaves – and stress. Therefore, the relatively limited number of root-specific genes (332) appears to be largely dedicated to condition root system-specific responses to belowground external stimuli encountered by perennial plants during their extended life cycle. 
Nodes and internodes significantly exceeded leaves and roots in the number of genes classified as both ‘biological process unknown’ and ‘cellular component unknown’, suggesting that there is a comparatively poor understanding of the genes that govern basic physiological and molecular mechanisms of wood development and vegetative bud dormancy relative to other vegetative organs. The same trend was observed when organ-specific genes were classified according to annotation. Nodes had the lowest proportion of annotated genes (31.9%) and the highest proportion of unknown genes (37.6%), when compared with mature leaves (41.4 and 28.3%) and roots (49.1 and 17.9%) (Fig. S2).
A large fraction of the transcriptome – 14 555 transcriptional units – was detected in all three main vegetative organs (stems, roots and leaves). Because the role of gene expression in organ identity may not only be evident from the presence or absence of transcripts, but also through differential quantitative regulation of the expressed gene (i.e. quantitative vs qualitative measure), we evaluated whether genes were also equally expressed across vegetative organs. A series of F-tests were carried out to contrast the transcript levels of each constitutively expressed gene among nodes, internodes, roots and young and mature leaves. Only one-third (4954 out of 14 555) of expressed genes were differentially regulated among the five organs (FDR 1%). Therefore, there appears to be a core set of ≈ 10 000 genes that are constitutively expressed at similar levels across the various vegetative organs.
Pairwise quantitative differences in expression levels among vegetative organs
Which genes are differentially regulated during development of the vegetative plant body of P. trichocarpa? Here we compared organ-preferred expression of genes among vegetative organs (Table 1). Below we describe the main features and unique characteristics of each vegetative organ.
Table 1. Summary of pairwise comparisons among five poplar (Populus trichocarpa reference genotype Nisqually-1) vegetative organs showing preferentially expressed genes according to main categories
Node- and internode-preferred genes A relatively small set of genes (83) was detected as node-preferred in the contrast with internodes, potentially reflecting the presence of unique meristematic structures (i.e. axillary buds) in this organ. In general, node-preferred expression was detected for several genes involved in the phenylpropanoid biosynthesis pathway, such as ferulate-5-hydroxylase, pinoresinol-lariciresinol reductase, cinnamyl alcohol dehydrogenase, caffeoyl-CoA 3-O-methyltransferase, and 4-coumarate:CoA ligase. Several genes encoding xyloglucan endotransglycosylase and the fasciclin-like arabinogalactan proteins FLA11 and FLA12 were detected at higher levels in internodes, compared with the nonwoody organs.
Young leaf-preferred genes Young leaves were enriched for transcripts related to pathogen defense, such as germin-like proteins, pathogenesis-related proteins and glycosyl hydrolases such as chitinase, when compared with mature leaves. Overall, young leaves also had higher expression of genes involved in lipid metabolism (e.g. lipases, lipid hydrolases and lipid transfer proteins). Leaves also showed a higher abundance of RNAs for photosynthesis-related genes, particularly when compared with roots. The same was not observed in the comparison of young leaves to poplar stem expression (internode plus node), probably because poplar stem tissues external to the cambium (‘bark’) contain photosynthetically active cells.
Mature leaf-preferred genes Mature leaves differed significantly from young leaves, particularly for genes encoding metalloproteinases and enzymes involved in cell wall biosynthesis and isoprenoid metabolism. Several highly expressed mature leaf-preferred genes did not have putative homologs in other plants. Genes related to carbon metabolism (e.g. starch biosynthesis) and sugar transporters were also preferentially expressed in mature leaves compared with young leaves. As expected, mature leaves presented preferential expression of genes involved in the photosynthesis machinery and the citric acid and carbon fixation cycles, when compared with roots. When compared with nodes and internodes, mature leaves presented a higher mRNA abundance for genes involved in defense response, oligosaccharide biosynthesis (galactinol synthase and fructose bisphosphate aldolase), photoassimilate response, the electron transport chain, photosynthesis, and the citric acid and carbon fixation cycles.
Root-preferred genes mRNA abundance profiles of root-preferred genes were consistent with the expected physiological role of roots and contrasts were similar regardless of the organ to which the comparison was made. Metal-binding proteins, such as metallothioneins, iron transport proteins, metal ion transporters and copper-binding proteins were consistently preferentially expressed in roots, as well as specific disease resistance proteins, water channel proteins and dehydration-induced proteins. Nitrate transporters were also preferentially expressed in roots in comparison with the other organs that were analyzed.
Nonrandom distribution of expressed genes in the genome
Are expressed genes distributed nonrandomly in the genome, suggesting epigenetic mechanisms of transcription regulation? Co-expression of adjacent genes could be influenced at the level of chromatin architecture; however, objective criteria for declaring the significance of adjacently expressed genes are needed. The phenomenon has been reported previously, but studies have typically focused on identifying correlations among the expression levels of contiguous genes in order to identify patterns of co-regulation (Cohen et al., 2000; Spellman & Rubin, 2002; Zhan et al., 2006). We approached the problem by testing whether expressed genes were observed in clusters, considering the presence or absence of transcripts (qualitative measure) rather than their level (quantitative measure) and that of its neighbor(s). This statistical approach uses a runs test based on run lengths (O’Brien & Dyck, 1985), adapted for plant genome analysis. Initially, a binomial system was established where genes expressed in any given vegetative organ were assigned a value of 1 whereas nonexpressed genes were assigned a value of 0. We detected several genomic regions with large numbers of expressed genes adjacent to each other, including up to 15 consecutively expressed genes in young leaves. To evaluate the statistical significance of these runs, null distributions were generated for runs of one to 13 adjacently expressed genes, for every linkage group and organ. The statistical significance of the observed data was assessed by comparing the data to the null distributions generated for each organ and linkage group. Most of the runs of short length, such as a single gene flanked by nonexpressed genes, showed significant departure from the null distribution (P < 0.01), occurring less frequently than would be expected by chance alone (Fig. 4, green). By contrast, larger run lengths occurred more frequently than expected by chance (Fig. 4, red). 
Our results reveal islands of genes for which there is a statistically significant tendency for co-expression.
We carried out a similar analysis to identify potential chromatin domains that would be specific to a given vegetative organ. The binomial system was defined so that a gene expressed in one organ but not another was recorded as 1, while if expressed or not expressed in both organs it was given a value of 0. Although runs of up to six organ-specific genes were detected in some instances, none departed significantly from what would have been expected by chance. Therefore, although chromatin domains can be identified in the poplar genome, they do not appear to be associated with the specific plant organs we analyzed.
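The recoding used for this organ-contrast analysis amounts to an exclusive-or of the two binary expression vectors, on the reading that a gene scores 1 when it is expressed in exactly one of the two organs being compared. A minimal sketch with hypothetical expression calls for four adjacent genes:

```python
def organ_contrast(expressed_a, expressed_b):
    """Return 1 where a gene is expressed in exactly one of two organs, else 0."""
    return [a ^ b for a, b in zip(expressed_a, expressed_b)]

# Hypothetical binary expression calls for four adjacent genes in two organs.
stem = [1, 1, 0, 0]
leaf = [1, 0, 1, 0]
contrast = organ_contrast(stem, leaf)
```

The recoded sequence can then be passed to the same runs analysis that was applied to the expressed/nonexpressed sequence.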
Is the origin of the woody habit in the Salicaceae attributable to novel, unique genes?
Among the 42 364 genes evaluated in the microarray, 5674 (13%) had no similarity to A. thaliana genes (E-value ≥ 1 × 10⁻³) and 3636 (9%) had no identifiable homolog in any species. We evaluated the pattern of expression of these predicted genes that appear to be unique to poplar and assessed whether there was evidence of expression and/or bias towards woody organs. Evidence of transcription could be identified for more than a third (1321) of the 3636 genes, with the majority (945) being detected in all five organs. Within the genes that appear to be unique to poplar, the small fraction that were organ-specific (147) were highly enriched for genes expressed in stems: 98 genes were detected only in nodes or internodes, compared with 32 in leaves and 17 in roots. For the majority (72/98) of those poplar-unique genes expressed exclusively in stems, the difference in signal intensity relative to the controls was relatively small (< 2-fold), although differences of up to 3.3-fold could be detected.
Extensive diversification of transcription regulation in P. trichocarpa and A. thaliana orthologs
The availability of whole-transcriptome microarrays from P. trichocarpa and A. thaliana offers one of the first opportunities to examine how genome-wide regulation of gene expression has evolved in angiosperms. A conserved expression pattern derived from a common ancestor (i.e. ortholog) could suggest that the transcriptional units are under balancing selection, because of conserved mechanisms of regulation. Gene expression regulation of paralogs – that is, genes derived from duplication events after lineage separation – diverged to a great extent in P. trichocarpa and may have played a significant role in the establishment of the woody habit in the Salicaceae (Tuskan et al., 2006). Here we compared expression patterns between A. thaliana and P. trichocarpa orthologs in corresponding organs, to evaluate the extent of transcript abundance conservation in these plant transcriptomes. Arabidopsis thaliana–Populus trichocarpa orthologs were identified using an Inparanoid analysis (Remm et al., 2001; O’Brien et al., 2005) carried out as part of the initial analysis of the poplar genome sequence (Tuskan et al., 2006). The analysis focused on 4188 Inparanoid clusters of orthologous genes with a single member from each of A. thaliana and P. trichocarpa, and gene expression information available for both species (Table S1). The most comprehensive atlas of the A. thaliana transcriptome, AtGenExpress (Schmid et al., 2005), was utilized as a reference for organ and developmental gene expression. AtGenExpress provides transcriptional information for a broad range of plant organs, including stem nodes and internodes, mature and young leaves, and roots, measured under several development and growth conditions. We focused on gene expression measured in A. thaliana plants harvested at 9 and 17 d, grown in soil. 
In cases where multiple sample types were collected for an organ (for instance, fully expanded mature rosette leaves numbers 2, 4 and 6 were analyzed separately), they were compared individually with the P. trichocarpa transcriptome.
We initially compared the qualitative patterns of expression (i.e. presence or absence of transcripts) of P. trichocarpa genes relative to the A. thaliana orthologs in internodes, nodes, leaves (young and mature) and root organs. Where both genes (i.e. the A. thaliana and P. trichocarpa orthologs) were identified as either expressed or nonexpressed in the two species, in each organ, the pattern was considered to be in agreement. Out of the 4188 pairs of orthologs, the lowest similarity in the expression pattern was detected in roots and young leaves (60% and 58% of genes expressed similarly, respectively). Better agreement was detected between the orthologs expressed in mature leaves and internodes (69%), and stem nodes (76%). In all cases, this proportion was higher (by 6–13%) than expected by chance alone (Fig. S3A). The proportion of orthologs expected to have the same expression pattern by chance was estimated by adding the product of the frequencies of expressed genes in P. trichocarpa and A. thaliana to the product of the frequencies of nonexpressed genes in the two species. The fraction of orthologous genes with the same pattern of expression in each organ was significantly higher (P-value < 0.0001) than would be expected by chance for all organs evaluated, based on a χ² test. Nonetheless, extensive diversification of gene expression regulation appears to have occurred between A. thaliana and P. trichocarpa orthologs.
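The chance expectation described above is the probability that two independent binary calls agree. The sketch below computes it and pairs it with a one-degree-of-freedom goodness-of-fit χ² statistic on agree/disagree counts; the exact form of the χ² test is not given in the text, so that part is an assumption, and the expression frequencies and counts used are hypothetical.

```python
def expected_agreement(freq_expressed_poplar, freq_expressed_arabidopsis):
    """Probability that two independent presence/absence calls agree."""
    both_expressed = freq_expressed_poplar * freq_expressed_arabidopsis
    both_absent = (1 - freq_expressed_poplar) * (1 - freq_expressed_arabidopsis)
    return both_expressed + both_absent

def chi_square_agreement(n_pairs, n_agree, expected_proportion):
    """Goodness-of-fit chi-square (1 d.f.) on agree vs disagree counts."""
    expected_agree = n_pairs * expected_proportion
    expected_disagree = n_pairs - expected_agree
    observed_disagree = n_pairs - n_agree
    return ((n_agree - expected_agree) ** 2 / expected_agree
            + (observed_disagree - expected_disagree) ** 2 / expected_disagree)

# Hypothetical frequencies of expressed orthologs in one organ of each species.
chance = expected_agreement(0.7, 0.6)
# Hypothetical counts: 60 of 100 ortholog pairs agree, against a chance level of 0.5.
statistic = chi_square_agreement(100, 60, 0.5)
```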
Expression of genes implicated in essential plant processes
We used rank correlation of orthologs expressed in the two species to evaluate the extent of conservation in relative transcript abundance. This analysis no longer assigns a presence (1) or absence (0) value to each gene, but assesses whether the relative transcript abundance of a given expressed gene is conserved or not between A. thaliana and P. trichocarpa. This analysis therefore focuses exclusively on genes expressed in both species in each organ. Gene transcript abundance was ranked in the two species, in each organ, and a Spearman correlation was estimated. For all plant organs, the estimated rank correlation indicated positive but limited conservation in the quantitative expression patterns (r² < 0.18), suggesting that there is significant remodeling of the transcription networks in the two species (Fig. S4). The highest degree of conservation in transcriptional pattern was observed in young and mature leaves, where the expression of a larger number of genes appears to remain highly consistent between the two species. The gene expression data distribution in the two species was similar in each of the five plant organs (Fig. S5).
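The rank correlation can be sketched without external libraries: transcript abundances are converted to ranks within each species (ties receive the average rank) and a Pearson correlation is computed on the ranks. The abundance values below are hypothetical, and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical transcript abundances of five orthologs in one organ.
poplar = [5.1, 7.3, 2.2, 9.0, 4.4]
arabidopsis = [3.0, 8.1, 2.5, 6.0, 6.5]
rho = spearman(poplar, arabidopsis)
r_squared = rho ** 2
```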
To further contrast the A. thaliana and P. trichocarpa transcriptomes, we narrowed the comparative analysis to genes that are expressed in multiple plant organs in both species. Genes broadly expressed may be under stronger selection pressure because mutations that lead to differential transcription regulation only need to be negative in one organ to be removed by selection (Khaitovich et al., 2006). Therefore, genes expressed in a large number of organs might be less divergent between species. We compared rank correlations in a subset of the orthologs expressed in all five organs in both A. thaliana and P. trichocarpa, and contrasted them to the correlations detected previously, including all orthologs. As predicted, the quantitative pattern of expression between the two species is more conserved for this subset of genes (Fig. 5) suggesting that, indeed, transcript abundance may be more conserved in genes that are expressed in a broad spectrum of organs in plants, compared with those that are organ-specific.
Finally, we identified the functional categories of the orthologs expressed at similar levels in the two species. For each organ we selected the 10% of orthologs with the smallest differences in transcript abundance rank between the two species. Next we compared the frequency at which they were observed in each GO category, relative to the entire set of orthologs expressed in both species (Fig. 6). Several GO cellular component categories are enriched for these conserved genes, particularly in young and mature leaves. In these organs, genes associated with the chloroplast and plastids, as well as other intracellular and cytoplasm components, are overrepresented. These genes include those encoding subunits of the photosystem I and II complexes and those involved in carbohydrate metabolism. By contrast, genes in categories related to the regulation of gene expression appear to show little conservation between A. thaliana and P. trichocarpa. The eight genes most highly conserved in the two species, across all five vegetative organs, are described in Supporting Information Table S2. Most of these genes are implicated in the endomembrane system and stress response, but together they encompass a relatively broad spectrum of functions.
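The selection and frequency comparison described above can be illustrated with a short Python sketch. All names, the toy ranks and the GO assignments are hypothetical, and the sketch reports only raw category frequencies; it does not reproduce any statistical test that may have been applied in this study.

```python
from collections import Counter


def most_conserved(rank_at, rank_pt, fraction=0.10):
    """Return the fraction of shared orthologs with the smallest
    absolute rank difference between species (ties broken by id)."""
    shared = sorted(set(rank_at) & set(rank_pt))
    by_diff = sorted(shared, key=lambda g: (abs(rank_at[g] - rank_pt[g]), g))
    n_keep = max(1, round(len(shared) * fraction))
    return by_diff[:n_keep]


def category_frequencies(genes, go_terms):
    """Fraction of `genes` annotated to each GO category."""
    counts = Counter(cat for g in genes for cat in go_terms.get(g, ()))
    return {cat: n / len(genes) for cat, n in counts.items()}


# Toy ranks for ten hypothetical ortholog pairs in one organ.
genes = [f"g{i}" for i in range(10)]
rank_at = {g: i + 1 for i, g in enumerate(genes)}
rank_pt = {g: 10 - i for i, g in enumerate(genes)}

# Hypothetical GO cellular component / process assignments.
go_terms = {g: {"transcription regulation"} for g in genes}
go_terms["g0"] = {"chloroplast"}
go_terms["g4"] = {"chloroplast"}
go_terms["g5"] = {"chloroplast", "cytoplasm"}

selected = most_conserved(rank_at, rank_pt, fraction=0.20)
freq_selected = category_frequencies(selected, go_terms)
freq_all = category_frequencies(genes, go_terms)

for cat in sorted(freq_all):
    print(cat, freq_selected.get(cat, 0.0), freq_all[cat])
```

A category counts as overrepresented among conserved orthologs when its frequency in the selected set exceeds its frequency in the full set of orthologs expressed in both species.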
The genus Populus encompasses the most important short-rotation woody species in North America for biomass production and plant-based carbon sequestration strategies. Populus trichocarpa is the first woody perennial plant species to have a sequenced genome (Tuskan et al., 2006), providing the foundation for understanding developmental properties of evolutionary interest in trees such as secondary growth, dormancy and plant architecture. Here we generated the first ‘compendium’ of expressed genes in a woody perennial, utilizing a microarray platform that includes the majority of predicted genes. We assessed the genes expressed in five vegetative organs of the reference genotype Nisqually-1 and detected evidence of transcription for approximately half of the 45 555 predicted transcriptional units. Previously, fewer than one-third of the gene models showed evidence of expression based on expressed sequence tag (EST) data (Tuskan et al., 2006). Our study does not address whether these genes are being actively translated into proteins. A subset of the genes for which we inferred transcription activity may be regulated post-transcriptionally, or may not be translated. Nonetheless, studies in other eukaryotes show that most of the mRNA captured in microarray analysis is associated with polyribosomes, suggesting translation (Arava et al., 2003). This study also identified segments of the poplar genome that contain a significantly larger than expected number of expressed genes, suggesting the existence of domains or regions that are transcriptionally active in certain plant organs. Factors such as histone distribution and modification could contribute to transcriptional activation in these regions.
While there is currently no further experimental evidence that genome structure in these segments differs from that in other parts of the genome, experiments such as chromatin immunoprecipitation coupled to tiling microarrays (ChIP-chip), supported by the availability of the poplar genome sequence, could be used to test this hypothesis.
The comparison among P. trichocarpa organs indicates that stem nodes contain transcripts from the highest number of genes in the genome. Nodes are presumed to have become more elaborate during the evolution of perenniality in woody plants such as P. trichocarpa (Groover, 2005). Nodes comprise most tissues represented in the stem internodes, but differ in that a dormant vegetative meristem, represented by the lateral axillary bud, is also present. Despite the similarity between the two organs, nodes had a strikingly higher diversity of actively transcribed genes, whereas all genes transcribed in internodes were also detected in nodes at a twofold change threshold. Part of the diversity of transcripts may also be attributable to suppression of mRNA degradation in dormant vegetative meristems. Transcriptional richness of the node may be necessary for the dormant shoot meristem (bud) to acquire novel vegetative or reproductive functions in subsequent growth periods. For example, the poplar dormant cambium has a surprisingly large number of genes (c. 1600) that are up-regulated when compared with the active cambium (Schrader et al., 2004a), suggesting that these genes are transcriptionally active in the dormant state. Similarly, nitrogen-starved plants entering eco-dormancy yielded c. 4× more up-regulated transcripts compared with rapidly growing plants grown in adequate nitrogen concentrations (Cooke et al., 2005). Epigenetic mechanisms may play a role in maintaining this transcriptionally active state in tissues of perennial species and have been proposed to play a role in controlling the timing of vegetative-to-floral transition in buds (Bohlenius et al., 2006). Genomic DNA in undifferentiated or juvenile tissues of perennial woody plants is generally undermethylated when compared with older, differentiated tissues (Bitonti et al., 2002; Fraga et al., 2002a,b).
If maintenance of a dormant state in perennial plants requires broad transcriptional activation, then, as growth commences after the dormant period, selective repression of transcriptionally active regulons may be a mechanism by which growth and development are initiated and maintained. The statistical tools applied in this manuscript should be useful for identifying such domains to test these hypotheses.
We also describe the first comparative genomic analysis of the transcriptome of the model tree species P. trichocarpa and that of the model plant A. thaliana. The two species belong to distinct clades within the eudicotyledonous angiosperms – P. trichocarpa is part of the Eurosid I clade, while A. thaliana occurs in the Eurosid II clade. Despite the differences, sequencing of the P. trichocarpa genome showed that the two species share a substantial number of genes – almost 90% of P. trichocarpa predicted genes are homologous to A. thaliana genes (Tuskan et al., 2006). To what extent is the P. trichocarpa transcriptome comparable to that of A. thaliana? In animal systems, it has been argued that, because of the high levels of gene sequence similarity among closely related species, most of the developmental and morphological diversity must have been created by evolution at the level of transcription regulation. Similarly, our data suggest that gene expression regulation has evolved to a large extent between A. thaliana and P. trichocarpa, as we detect very weak similarity between quantitative patterns of expression. Although they are both angiosperms, P. trichocarpa shares limited morphological similarity with A. thaliana. Poplars are woody perennials with an indeterminate growth habit, while A. thaliana is a herbaceous annual with a basal rosette of vegetative tissues. The distinct patterns of gene expression between orthologs – despite a large set of shared genes – support the hypothesis that transcriptional regulation directs expression of novel traits that are important for adaptation and evolution of plants. These results have implications for the functional annotation of genes in woody species, as comparative sequence analysis with model organisms has been widely used to make indirect inferences. Extensive divergence in gene function, inferred from low conservation in quantitative gene expression patterns, suggests that the use of A. thaliana as a model for functional genomics of woody species such as P. trichocarpa may be limited. Still, we have identified a small set of genes that appear to maintain highly consistent expression levels in the two lineages, despite 100 million yr of separation. These genes may be under some form of selection to maintain transcript abundance at physiologically relevant levels in diverse plant lineages, based perhaps on the levels of transcripts required for production of enzymes in core plant processes.
A compendium establishing the genes expressed throughout development, in distinct organs and tissues, is the first step for a comprehensive functional characterization of the P. trichocarpa genes. This study presents a first comprehensive contribution towards this goal and will assist in efforts to define target genes for genetic modification and candidates for genetic control of complex traits. The challenge ahead is highlighted by our observation that the most elaborated organs of trees – the stem nodes – contain the highest proportion of unknown predicted genes, and that function may not be immediately inferred from A. thaliana or other models. The path forward should take advantage of the tremendous nucleotide diversity of poplars and attempt to link genotype with phenotype, providing the information needed to assign function to genes whose role is still largely undefined.
This work was supported by a grant from the Department of Energy, Office of Science, Office of Biological and Environmental Research, Grant Award No. DE-AC05-00OR22725 (to JD) and Grant Award No. DE-FG02-05ER64114 (to MK). We thank Ron Sederoff for useful discussions and for reviewing the manuscript.