Global transcript profiling of primary stems from Arabidopsis thaliana identifies candidate genes for missing links in lignin biosynthesis and transcriptional regulators of fiber differentiation




Different stages of vascular and interfascicular fiber differentiation can be identified along the axis of bolting stems in Arabidopsis. To gain insights into the metabolic, developmental, and regulatory events that control this pattern, we applied global transcript profiling employing an Arabidopsis full-genome longmer microarray. More than 5000 genes were differentially expressed, among which more than 3000 changed more than twofold, and were placed into eight expression clusters based on polynomial regression models. Within these, 182 upregulated transcription factors represent candidate regulators of fiber development. A subset of these candidates has been associated with fiber development and/or secondary wall formation and lignification in the literature, making them targets for functional studies and comparative genomic analyses with woody plants. Analysis of differentially expressed phenylpropanoid genes identified a set known to be involved in lignin biosynthesis. These were used to anchor co-expression analyses that allowed us to identify candidate genes encoding proteins involved in monolignol transport and monolignol dehydrogenation and polymerization. Similar analyses revealed candidate genes encoding enzymes that catalyze missing links in the shikimate pathway, namely arogenate dehydratase and prephenate aminotransferase.


Differentiation of lignified secondary cell walls is an important plant-specific process that is critical to the life histories of both woody perennial plants such as trees, and short-lived herbaceous plants. Several biological systems have been exploited in recent years to probe the genetic, cellular, and biochemical events that underlie various aspects of xylem and fiber differentiation and maturation including Arabidopsis thaliana, Zinnia elegans, and Populus spp. (poplar). Although Arabidopsis is a short-lived herbaceous plant, it produces abundant interfascicular fibers in the inflorescence stems (Zhong and Ye, 2001) with secondarily thickened cell walls that are rich in both G and S lignin at maturity (Chapple et al., 1992; Dharmawardhana et al., 1992; Lee et al., 1997). These fibers provide mechanical support to the stem, but are distinct from xylem in the vascular bundles of the primary stem. It is also possible to induce secondary xylem differentiation in the Arabidopsis hypocotyl and inflorescence stem (Chaffey et al., 2002; Ko et al., 2004; Little et al., 2002; Oh et al., 2003). Recent reviews by Ye et al. (2002) and Nieminen et al. (2004) highlight the potential to exploit the genetic and genomic resources available in Arabidopsis to study xylem and fiber differentiation and maturation in this model plant.

Mutant screens in Arabidopsis focusing on defects in fiber differentiation in the inflorescence stem have identified a number of genes required for proper execution of this process. Mutations in these genes lead to either altered timing of fiber differentiation (REVOLUTA/INTERFASCICULAR FIBERLESS; Zhong et al., 1997) or the formation of fibers with defective secondary cell walls [e.g., irx mutants; reviewed in Nieminen et al., 2004; fragile fiber (fra) mutants, Burk et al., 2001; Hu et al., 2003]. REV/IFL encodes a class III HD-ZIP transcription factor that is required for interfascicular fiber differentiation, establishment of organ polarity, and regulation of tissue arrangement in vascular bundles (Emery et al., 2003), while IRX1, IRX3, and IRX5 encode cellulose synthase (CesA) gene family members that form a CesA enzyme complex required for secondary wall biosynthesis (Taylor et al., 2003).

These genetic results provide initial insights into the regulatory mechanisms required for fiber cell differentiation, and into the important roles played by specific CesA complexes and microtubules in secondary cell wall biosynthesis. Another important process during the final stages of interfascicular fiber differentiation is the deposition of lignin into the elongated and secondarily thickened walls. Lignin is a polymer of hydroxycinnamyl alcohols (monolignols) and other phenylpropanoids (Raes et al., 2003 and references therein). Biosynthesis of monolignols requires enzymes of the phenylpropanoid pathway, which carry out a series of reactions using intermediates derived from phenylalanine. These reactions lead from phenylalanine to coniferyl or sinapyl alcohols (monolignols for G and S lignin subunits, respectively; Boerjan et al., 2003). Enzymes involved in phenylpropanoid biosynthesis are usually encoded by gene families, which often vary in size according to plant species (Dixon et al., 2002; Hahlbrock and Scheel, 1989). Because lignin is the most abundant phenylpropanoid natural product, and an important constituent of wood and fiber, the monolignol biosynthetic pathway has been the subject of intensive research efforts (Boerjan et al., 2003; Humphreys and Chapple, 2002). This work, together with genomic approaches, has revealed a set of Arabidopsis phenylpropanoid gene family members that is likely involved in monolignol biosynthesis (Costa et al., 2003; Goujon et al., 2003; Raes et al., 2003). In addition, a number of phenylpropanoid-like enzymes of unknown biochemical function, which are related to members of phenylpropanoid gene families, are annotated in the Arabidopsis genome (The Arabidopsis Genome Initiative, 2000) and have been described in recent reports (Costa et al., 2003; Cukovic et al., 2001; Raes et al., 2003; Shockey et al., 2003).

There is limited information on the mechanisms underlying the transcriptional control of these genes, but Mele et al. (2003) showed that the Arabidopsis homeobox KNOX gene BREVIPEDICELLUS negatively regulates the expression of several phenylpropanoid genes, and lignin deposition, in interfascicular fibers of the Arabidopsis inflorescence stem. In addition, specific MYB type transcription factors are involved in regulating vascular differentiation and phenylpropanoid gene expression (Bonke et al., 2003; Borevitz et al., 2000; Jin et al., 2000).

While the Arabidopsis genome contains more than 1400 known and putative transcription factors (Davuluri et al., 2003), only very few transcriptional regulators that potentially control the transition from interfascicular parenchyma cells into the interfascicular fiber cells have been identified. Moreover, monolignol secretion to the secondary cell wall and monolignol polymerization during the final stages of fiber differentiation are still poorly understood (Boerjan et al., 2003). Monolignol secretion could occur via vesicular trafficking, as abundant vesicle activity correlated with hemicellulose secretion has been observed in developing pine xylem (Samuels et al., 2002). An alternative hypothesis is export of monolignols directly from their site of synthesis in the cytoplasm across the plasma membrane via ABC transporters (Boerjan et al., 2003; Samuels et al., 2002), a class of small molecule membrane transporters involved in Arabidopsis cuticular wax secretion (Pighin et al., 2004). Monolignol coupling to generate the complex lignin polymer is achieved in the cell wall via free radical intermediates produced via one or more oxidative enzymes. Several classes of oxidative enzymes have been associated with monolignol dehydrogenation including class III peroxidases and laccases. In addition, a role for NADPH oxidases, ascorbate peroxidases, oxalate oxidases (germin), and copper amine oxidases has also been suggested (Boerjan et al., 2003; Gavnholt and Larsen, 2002; Whetten et al., 1998). However, the multiplicity of physiological functions and the large size of gene families encoding these oxidative enzymes make it difficult to identify enzymes that function specifically in lignin polymerization.

In order to provide an adequate supply of carbon skeletons for phenylpropanoid biosynthesis, aromatic amino acid biosynthesis through the shikimate pathway must be coordinated with the metabolic demand for phenylalanine, but only limited evidence for coordination of shikimate and phenylpropanoid metabolism exists in the context of lignin formation (Lee et al., 1997). Plant genes have been characterized for all enzymes of the pre-chorismate pathway as well as for the tryptophan branch of shikimate metabolism (for references see Table S5). In contrast, our knowledge of the branch leading to tyrosine and phenylalanine is still fragmentary. While plant genes encoding chorismate mutase have been identified (Eberhard et al., 1993, 1996; Mobley et al., 1999), no plant gene encoding prephenate aminotransferase has been cloned, although this enzyme has been characterized biochemically (De-Eknamkul and Ellis, 1988; Siehl et al., 1986). Similarly, no plant gene encoding the only phenylalanine-specific step, arogenate dehydratase, has been characterized.

Global transcript profiling using cDNA or oligonucleotide-based microarrays is a powerful approach to identify genes involved in various aspects of fiber cell and xylem differentiation. Demura et al. (2002) used this approach to identify genes closely associated with morphogenic events during secondary cell wall biosynthesis in Zinnia cell cultures undergoing tracheary element trans-differentiation. Hertzberg et al. (2001) used poplar cDNA arrays to profile changes in gene expression at various stages of secondary xylem differentiation. Oh et al. (2003) and Ko et al. (2004) used short oligonucleotide arrays to identify genes that display preferred expression in secondary xylem, and during the transition from primary to secondary growth in Arabidopsis stems. Finally, Birnbaum et al. (2003) used a novel combination of cell sorting and microarrays to profile the developmentally regulated expression of genes in several different Arabidopsis root cell types and tissues, including the stele, where xylem and phloem differentiate.

We have used near-full-genome Arabidopsis-spotted 70-mer oligo arrays to profile global changes in gene expression along the developmental gradient of stem maturation and interfascicular fiber differentiation in the Arabidopsis inflorescence stem. Analysis of the profiling results has allowed us to identify discrete sets of candidate genes involved in various aspects of fiber cell differentiation and maturation, including novel candidates for transcriptional regulation, monolignol polymerization, monolignol transport, and phenylalanine biosynthesis.


Analysis of fiber differentiation

Arabidopsis plants (ecotype Landsberg erecta, Ler) were grown under short-day conditions for 8 weeks and were then transferred to a long-day regime in order to induce synchronized development of inflorescence stems. Plant material was harvested when these inflorescence stems were 4–6 cm (5 cm stems) or 9–11 cm (10 cm stems) in length (Figure 1). For histological characterization of the developmental stages at increasing distances from the shoot apical meristem, serial hand sections were taken of the primary stem and samples were analyzed using bright field and UV-fluorescence light microscopy. In 5 cm stems, at a distance of 1 cm from the apex, primary vascular bundles were already differentiated, but few, if any, vessel elements showed thickened cell walls and lignin deposition, as indicated by pale blue autofluorescence of phenolic compounds upon illumination with UV light (data not shown). Red autofluorescence indicated the presence of chloroplasts only in the parenchymatic layer outside the stele but not in vascular or pith cells. At 2 cm from the apex, vascular bundles had developed further, and several vessel elements (5–10 per vascular bundle) with thickened cell walls and lignin deposition were obvious (not shown). However, no signs of interfascicular fiber differentiation were obvious and acid-insoluble lignin content was below the detection limit up to this stage, while soluble phenolics were readily detectable (Table 1). In 10 cm stems, similar results were observed: vascular bundles with lignified vessel elements were almost completely developed at 3 cm from the tip, while no obvious differentiation of interfascicular fibers was observed at this stage (Figure 1c,d). Soluble phenolics were abundant in this section, but total lignin did not exceed 3% of stem dry weight and was predominantly composed of guaiacyl subunits (Table 1).
Starting at 4 cm from the tip, the first signs of fiber differentiation in the interfascicular region became apparent, and at 6 cm thickened secondary cell walls undergoing lignification were evident (Figure 1e–h). At 8 cm fiber differentiation was almost complete, and a full ring of interfascicular fibers characteristic of Arabidopsis (Little et al., 2002; Zhong and Ye, 2001) was established. Accordingly, total lignin content increased in consecutive sections of 10 cm stems with increasing relative amounts of syringyl subunits, while soluble phenolics decreased in abundance (Table 1). While limited material prevented us from performing lignin analysis on 7–9 cm sections, lignin content in the bottom parts of fully mature stems reached 18% with up to 31% syringyl subunits in other samples analyzed in parallel (data not shown), suggesting that lignin continued to be deposited in the 7–9 cm sections.

Figure 1.

Primary stem development in Arabidopsis.
Nine-week-old Arabidopsis Ler plants were harvested when primary stems reached a total length of 4–6 cm (a) or 9–11 cm (b). Stems were divided into different segments as indicated in inlays (marker unit in cm). Serial hand sections of the primary stem were taken at different distances from the tip and samples were analyzed using bright field (c, e, g, and i) and UV-fluorescence (d, f, h, and j) light microscopy. From 10 cm stems, sections were analyzed at 2 cm from the tip (c and d), at 4 cm (e and f), at 6 cm (g and h), and at 8 cm from the tip (i and j). All microscopical photographs were taken at a magnification of 200×.

Table 1.  Lignin and phenolic analysis of Arabidopsis floral stems at different stages of development
Section (cm) | Acid-insoluble lignin (% DW)^a | Acid-soluble phenolics (% DW)^b | Guaiacyl lignin (%)^c | Syringyl lignin (%)^c | Total yield (μmol g^−1 lignin)^d

ND, not determined.
^a Percentage of acid-insoluble lignin per total stem dry weight.
^b Percentage of soluble phenolics per total stem dry weight.
^c Percentage of guaiacyl (G) and syringyl (S) subunits, as determined by thioacidolysis.
^d Quantity of monolignols recovered by thioacidolysis based on the starting acid-insoluble lignin amount.


In order to characterize global expression patterns associated with this developmental progression, in which fiber differentiation is the most prominent developmental change, we collected the following samples for RNA extraction and gene expression profiling: (i) 0–2 cm from 5 cm stems (developing vascular bundles, no interfascicular fibers), (ii) 0–3 cm from 10 cm stems (vascular bundle differentiation almost completed, no interfascicular fibers), (iii) 2–4 cm from 5 cm stems (late stage of vascular bundle differentiation, early stage of interfascicular fiber differentiation), (iv) 3–5 cm from 10 cm stems (early to middle stages of interfascicular fiber differentiation), (v) 5–7 cm from 10 cm stems (middle to late stages of fiber differentiation), and (vi) 7–9 cm from 10 cm stems (fiber differentiation largely completed). Although vascular and fiber differentiation is visibly the predominant process occurring along the axis of primary stems, other tissue and cell types also clearly undergo differentiation and maturation over the developmental gradient sampled. For example, epidermal cells undergo massive cell elongation and synthesize large amounts of cuticular waxes, which might be modified in later stages of epidermal development. Similarly, parenchyma and pith cells will also undergo maturation processes and may change their metabolic states at different times of their lifespans. Thus, changes in transcript abundance related to fiber differentiation will be a subset of the overall change in gene expression associated with cell and tissue differentiation and maturation along the developing stem.

Global expression profiling of stem and fiber development

We employed a microarray for global expression profiling that was built from a set of 25 792 optimized 70-mer oligonucleotides based on the annotated Arabidopsis genome. We isolated total RNA from each of the primary stem samples using pooled sections from more than 20 plants. RNA derived from the 0–2 cm sample from 5 cm stems was used as a reference for two-channel microarray analysis of all remaining samples. Each experimental/reference hybridization was replicated four times, for a total of 20 arrays used in this experiment. We observed high signal-to-noise ratios, with 70% of gene-specific spots on average having signal intensities more than threefold higher than the local background surrounding each spot. Background-corrected signal intensities were used for Loess normalization (Yang et al., 2002), thus generating normalized log2 expression ratios comparing each sample with the 0–2 cm sample for each element on a given array. Normalized expression ratios for all probes on the array as well as results for all statistical analyses are provided in Table S1. For each probe, we first used the data from the four replicate arrays for each sample to perform a Student's t-test. In order to assess the type I error rate, we calculated q-values estimating the false discovery rate based on the parametric P-values obtained from the t-statistic (Storey and Tibshirani, 2003), shown in Figure S1.
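The step from per-probe P-values to a false discovery rate estimate can be illustrated with a short sketch. It does not implement the q-value method of Storey and Tibshirani (2003) used here; it substitutes the simpler Benjamini–Hochberg adjustment, which corresponds to the conservative case in which all null hypotheses are assumed to be true. The function name and the example P-values are hypothetical.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    This is a stand-in for the Storey-Tibshirani q-value, not that method.
    """
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P-values for four probes (not data from this study).
example_p = [0.01, 0.04, 0.03, 0.005]
example_adj = bh_adjust(example_p)
```

A probe would then be called significant at a chosen FDR level (for instance 0.05) when its adjusted value falls below that level.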

Table 2 lists the number of significantly differentially expressed genes using three different definitions. As expected, the number of differentially expressed genes increased with distance from the apex (reference), and more than 10 000 genes (41% of all genes present on the array) had q-values below 0.05. Application of a more stringent definition for a differentially expressed gene (t-test: P < 0.01 and fold-change > 2) yielded almost 3000 differentially expressed genes. The estimated false-discovery rate for these genes does not exceed 0.016 (Table 2), suggesting that <50 genes among these were falsely detected as differentially expressed.
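The combined definition (t-test: P < 0.01 and fold-change > 2) can be sketched for a single probe as follows. This is a minimal illustration that assumes a one-sample t-test of the four replicate log2 ratios against zero; the text does not specify the exact form of the test. A twofold change corresponds to an absolute mean log2 ratio above 1, and 5.841 is the standard two-sided critical t value for P < 0.01 with 3 degrees of freedom. The function name and example values are hypothetical.

```python
import math
import statistics

# Two-sided critical t for P < 0.01 with 3 degrees of freedom (n = 4 replicates).
T_CRIT_DF3_P01 = 5.841


def is_differential(log2_ratios, min_abs_log2=1.0):
    """Apply a 'P < 0.01 and fold-change > 2' rule to four log2 ratios.

    Assumes a one-sample t-test against zero (an assumption of this sketch).
    """
    if len(log2_ratios) != 4:
        raise ValueError("the critical value used here is only valid for n = 4")
    mean = statistics.mean(log2_ratios)
    if abs(mean) <= min_abs_log2:
        return False  # less than twofold change on average
    se = statistics.stdev(log2_ratios) / math.sqrt(len(log2_ratios))
    if se == 0.0:
        return True  # identical replicates with a nonzero mean
    return abs(mean / se) > T_CRIT_DF3_P01


# Hypothetical probes: consistent change, noisy change, small change.
consistent = is_differential([1.5, 1.6, 1.4, 1.5])
noisy = is_differential([1.5, 0.2, 2.8, 1.5])
small = is_differential([0.5, 0.5, 0.6, 0.4])
```

The noisy probe has the same mean as the consistent one but fails the t-test, and the small-change probe passes the t-test but fails the fold-change filter, which is why the two criteria are combined.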

Table 2.  Estimation of differentially expressed genes in developing primary stem samples
Section (cm)^a | Number of differentially expressed genes^e: P < 0.01 | P < 0.01, FC > 2 | q < 0.05 | Estimated FDR^f: P < 0.01, FC > 2

^a Each sample is compared with a 0–2 cm sample from 5 cm stems and Loess-normalized expression ratios of four replicate experiments were used for statistical analysis.
^b From 10 cm stems.
^c From 5 cm stems.
^d Identical RNA was used in four replicate hybridizations in both channels.
^e Differentially expressed genes (total 25 739) in each sample were defined as: P-value (false-positive probability) of a Student's t-test for a given gene is less than indicated and the fold change (FC) difference is greater than indicated, or the q-value (false discovery probability) for a given gene is less than indicated.
^f Estimation of the maximal proportion of false positives incurred when tests were called significant [max. false discovery rate (FDR) for differentially expressed genes].


In order to assess the false-positive rate experimentally, we also carried out four self–self hybridizations using an identical mixture of total RNAs in both channels. Statistical analysis of these self–self control hybridizations identified zero genes having a q-value of <0.05. As expected for multiple testing approaches, 618 genes (2.4%) with low P-values (t-test: P < 0.01) were observed in this control data set; however, when an arbitrary twofold cut-off was included, only 10 of 25 792 genes on the array were assigned to be differentially expressed (Table 2). This translates into an observed false-positive rate of 0.04%.

In order to estimate the number of genes that change their expression along the axis of primary stems, we performed an analysis of variance (anova) using the expression ratios from 10 cm stems only. This results in a total of 5459 genes with a significant statistic (anova: P < 0.01), of which 3658 genes are more than twofold different between at least two samples. The false-discovery rate among these genes was estimated to be 0.025 based on the q-value calculation proposed by Storey and Tibshirani (2003). Therefore, we expect 91 of the 3658 genes to be false positives.
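The two error estimates quoted for the anova gene set and for the self–self control follow from simple arithmetic, sketched here with the figures given in the text. The function names are illustrative only.

```python
def expected_false_positives(n_called, fdr):
    """Expected number of false positives among n_called significant genes."""
    return n_called * fdr


def observed_false_positive_rate(n_false, n_total):
    """Observed false-positive rate, in percent, from a control experiment."""
    return 100.0 * n_false / n_total


# anova set: 3658 genes at an estimated FDR of 0.025.
anova_fp = expected_false_positives(3658, 0.025)

# Self-self control: 10 of 25 792 genes passed P < 0.01 and twofold change.
self_self_rate = observed_false_positive_rate(10, 25792)

print(f"expected false positives: {anova_fp:.0f}")
print(f"observed false-positive rate: {self_self_rate:.2f}%")
```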

Given the high number of differentially expressed genes, we next used a second-degree polynomial regression model to place differentially regulated genes (F-statistic: P < 0.01) in eight expression categories, as illustrated in Figure 2 (see Experimental procedures for details). Relatively few genes were placed in the expression categories that represent genes upregulated or downregulated in the oldest part of the stem only (categories 1 and 3, respectively, in Figure 2). In contrast, more than 700 genes each followed regression curves that are characterized by downregulation or upregulation early in development (3–5 cm sample) and subsequent maintenance of expression levels (categories 2 and 4, respectively, in Figure 2). Similarly, more than 600 genes showed a pattern of linear upregulation or downregulation along the axis of primary stems, while comparatively few (<90) genes followed a transient expression profile with maximal differential expression ratios in the center of the developmental series (Figure 2).

Figure 2.

Expression profile categorization.
Normalized expression ratios from different stem sections of 10 cm stems compared with the tip (0–2 cm) of 5 cm stems were obtained for 25 739 probes using printed oligo microarrays. For each comparison four replicate arrays were hybridized. The 16 data points for each probe were used to estimate the parameters of a gene-specific regression model y = β₂x² + β₁x + β₀, with each developmental sample being assigned a median-centered x (−1.5, −0.5, 0.5, and 1.5 for the 0–3, 3–5, 5–7, and 7–9 cm sample, respectively). Significant results (F: P < 0.01) were subsequently divided based on the sign and significance of each β₂ and β₁: (a) β₂ positive and significant, β₁ positive and significant (category 1, late upregulated transcripts); (b) β₂ positive and significant, β₁ negative and significant (category 2, early downregulated transcripts); (c) β₂ negative and significant, β₁ negative and significant (category 3, late downregulated transcripts); (d) β₂ negative and significant, β₁ positive and significant (category 4, early upregulated transcripts); (e) β₂ not significant, β₁ positive and significant (category 5, linear upregulated transcripts); (f) β₂ not significant, β₁ negative and significant (category 6, linear downregulated transcripts); (g) β₂ positive and significant, β₁ not significant (category 7, transiently downregulated transcripts); (h) β₂ negative and significant, β₁ not significant (category 8, transiently upregulated transcripts). Shown in gray are the regression curves for transcripts in each category that change more than twofold (a and c) or more than threefold (b and d–h); in black the mean profile of all genes in that category is shown. The total number of transcripts in each category and the fraction that is changing more than twofold and threefold is given underneath each graph.
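The categorization rule in this legend can be sketched in a few lines. The sketch assumes the balanced design described above (four replicates at each of the four x values), which makes the linear term and the mean-centered quadratic term orthogonal, so the quadratic and linear coefficients can be estimated separately by least squares. It also assumes that coefficient significance is judged at P < 0.01, which the legend does not state explicitly. The critical values are standard table values for 13 residual degrees of freedom, and all names are hypothetical.

```python
import math

X_LEVELS = (-1.5, -0.5, 0.5, 1.5)  # 0-3, 3-5, 5-7 and 7-9 cm samples
T_CRIT = 3.012  # two-sided t, P < 0.01, 13 df (assumed coefficient threshold)
F_CRIT = 6.70   # F, P < 0.01, (2, 13) df

# (significant sign of quadratic term, significant sign of linear term)
CATEGORY = {
    (1, 1): 1, (1, -1): 2, (-1, -1): 3, (-1, 1): 4,
    (0, 1): 5, (0, -1): 6, (1, 0): 7, (-1, 0): 8,
}


def _sig_sign(coef, se):
    """Return +1 or -1 for a significant coefficient, 0 otherwise."""
    if abs(coef / se) <= T_CRIT:
        return 0
    return 1 if coef > 0 else -1


def categorize(replicates):
    """Assign a category (1-8) or None from four groups of four log2 ratios."""
    if len(replicates) != 4 or any(len(g) != 4 for g in replicates):
        raise ValueError("expects four samples with four replicates each")
    xs = [x for x, group in zip(X_LEVELS, replicates) for _ in group]
    ys = [y for group in replicates for y in group]
    n = len(ys)
    mean_x2 = sum(x * x for x in xs) / n
    q = [x * x - mean_x2 for x in xs]  # centered quadratic term
    sxx = sum(x * x for x in xs)
    sqq = sum(v * v for v in q)
    b1 = sum(x * y for x, y in zip(xs, ys)) / sxx  # linear coefficient
    b2 = sum(v * y for v, y in zip(q, ys)) / sqq   # quadratic coefficient
    mean_y = sum(ys) / n
    fitted = [mean_y + b1 * x + b2 * v for x, v in zip(xs, q)]
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    s2 = max(rss / (n - 3), 1e-300)  # guard against a perfect fit
    f_stat = ((b1 * b1 * sxx + b2 * b2 * sqq) / 2) / s2
    if f_stat <= F_CRIT:
        return None  # overall regression not significant
    key = (_sig_sign(b2, math.sqrt(s2 / sqq)), _sig_sign(b1, math.sqrt(s2 / sxx)))
    return CATEGORY.get(key)  # None if neither coefficient is significant


# Hypothetical profiles built from a fixed noise pattern (not study data).
noise = [0.1, -0.1, 0.05, -0.05]
linear_up = categorize([[x + e for e in noise] for x in X_LEVELS])
early_up = categorize([[x - 0.5 * x * x + e for e in noise] for x in X_LEVELS])
flat = categorize([list(noise) for _ in X_LEVELS])
```

In this toy input the steadily rising profile lands in category 5, the profile that rises and then levels off lands in category 4, and the flat profile is not assigned to any category.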

In order to group genes within each category according to predicted functions, we next identified Functional Catalogue (FunCat) terms (Ruepp et al., 2004) associated with each gene using the MAtDB database (MIPS Arabidopsis thaliana Data Base). For this analysis only probes with an annotated locus identifier were used; the current oligonucleotide annotation provided by Operon contains 22 890 locus identifiers. For each expression category, frequencies of genes in a given FunCat group were compared with the frequency found for all genes represented on the array. Only FunCat groups with the lowest hierarchical levels found to be significantly over-represented (hypergeometric distribution: P < 0.05) are shown in Figure 3 and Figure S2; if higher levels within the FunCat hierarchy were also over-represented they were excluded to avoid redundancy. Many of the over-represented FunCat groups in expression category 2 (highest expression in the tip of the stem, Figure S2b) are associated with rapid cell division and growth, as would be anticipated for a region of active growth (e.g., carbohydrate and fatty acid metabolism, transcription and translation, and biogenesis of primary cell walls). Other over-represented groups in this category are connected to chloroplast biogenesis and photosynthesis, in agreement with a reduced production and proportion of photosynthetically active cells in older parts of the stem. Similar functional groups were also over-represented in expression category 6 (linear decrease in expression ratios along the stem), in agreement with the broadly similar profile of these two categories. The most prominent morphological changes along the axis of the primary stem analyzed are associated with the development and differentiation of fibers in and between vascular bundles (Figure 1).
Based on a positive correlation between gene expression and fiber differentiation, it can be anticipated that genes which are transcriptionally activated as part of this developmental process would be found in expression categories 4, 5, and potentially 8. As expected, genes that are involved in the biosynthesis of phenylpropanoids, intermediates in lignin biosynthesis, were over-represented in category 4 (Figure 3). Interestingly, a prominent functional group over-represented both in categories 4 and 5 is ‘transcriptional control,’ suggesting that potential regulators of fiber differentiation are enriched in these expression categories. Therefore, we focused our further attention on a detailed analysis of genes potentially involved in these two aspects of fiber differentiation.
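The over-representation test applied to FunCat groups can be sketched as an upper-tail hypergeometric probability: the chance of seeing at least k genes of a given functional group in an expression category of a given size, when genes are drawn without replacement from all annotated probes. The function name is hypothetical, and the group and category sizes in the example are invented for illustration; only the total of 22 890 annotated locus identifiers comes from the text.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for a hypergeometric draw without replacement.

    population: annotated probes on the array
    successes:  probes on the array belonging to the functional group
    draws:      genes in the expression category
    k:          genes in the category that belong to the group
    """
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, draws)


# Hypothetical example: a FunCat group with 150 members on the array,
# 15 of which fall into an expression category of 700 genes.
p_value = hypergeom_upper_tail(15, 22890, 150, 700)
over_represented = p_value < 0.05
```

Exact integer binomial coefficients are used so that the tail probability stays accurate even for large gene sets.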

Figure 3.

Functional categories of upregulated genes.
Genes that have been placed into expression categories 4 and 5 as shown in Figure 2 were used to screen the Functional Catalogue (FunCat) terms (Ruepp et al., 2004) at the MAtDB database (MIPS Arabidopsis thaliana Data Base). The frequencies (in percentage of the total number of genes in that expression category) for the lowest hierarchical levels found to be over-represented (see text for details) are shown as black bars. As a comparison, the frequency for the same FunCat term observed with all 22 890 probes present on the array used and for which a locus identifier is available is shown as white bars. Functional groupings depicted in (a) correspond to expression cluster 4 (Figure 2d); those shown in (b) correspond to expression cluster 5 (Figure 2e).

Transcription factors

Of the 1410 transcription factor genes annotated and assembled into transcription factor gene families (Davuluri et al., 2003; AGRIS database, version of December 2003), 1287 are represented on the microarray used in this study. Among those, 270 could be placed in one of the eight expression categories depicted in Figure 2. A total of 191 of the differentially regulated transcription factor genes were placed into expression categories 1, 4, 5, and 8 (7% of all upregulated genes), while only 79 were placed in categories 2, 3, 6, or 7 (3% of downregulated genes, Table S2). Interestingly, with the exception of the EIL family, at least one member of every Arabidopsis transcription factor family was represented in one or more of the four upregulated expression clusters (Figure 4). However, half of all upregulated transcription factors fell into six gene families, with the AP2-EREBP, MYB, and bZIP families having the most members represented (22, 19, and 17 members, respectively, Figure 4). Only genes within the AP2-EREBP and bZIP classes, however, were represented at significantly higher frequencies within the stem upregulated gene sets compared with their frequency of representation within the whole set of transcription factors represented on the array.

Figure 4.

Categorization of differentially regulated transcription factors.
A total of 191 putative transcription factor genes were placed into expression categories 1, 4, 5, and 8 (upregulated, Figure 2). Shown is the transcription factor class distribution for these genes. For each class, the total number of genes in these expression clusters is given in parentheses. The proportion of transcription factors of a given class in these expression clusters was compared with the proportion of that class in the whole set of transcription factors analyzed. The P-value of a hypergeometric distribution indicating whether the class is over-represented is given in parentheses.

Genes encoding members of the smaller classes of C2C2-CO, GRAS, and Alfin-type transcription factors were also found to be over-represented within the group of upregulated genes (Figure 4), although functionally characterized members of these three classes (e.g., CONSTANS, SCARECROW, and Alfin1, respectively; Di Laurenzio et al., 1996; Putterill et al., 1995; Winicov, 2000) were not differentially upregulated over the course of stem development. In fact, among all the upregulated transcription factor genes identified in this experiment, no function has yet been ascribed to the great majority. Exceptions include ERF1 of the ERF subclass of AP2-EREBP genes, which is involved in ethylene and jasmonate-mediated defense response signaling (reviewed by Gutterson and Reuber, 2004), and APETALA2, which controls floral organ and meristem identity (Jofuku et al., 1994).

Certain auxin response factors (ARF) and homeodomain transcription factors, in particular ARF5/MONOPTEROS and ATHB8, respectively, are known to be involved in regulating vascular differentiation (Baima et al., 2001; Hardtke and Berleth, 1998). While neither of these two genes was differentially expressed over the course of stem development, other members of these families did show such differential expression (Figure 4; Table S2). Similarly, although a total of 28 of the approximately 120 Arabidopsis genes encoding R2R3 MYB transcription factors were differentially expressed (Figure 4; Table S2), this group did not include any previously identified MYB transcription factors that have been shown to be involved in regulating phenylpropanoid gene expression and vascular development (Bonke et al., 2003; Borevitz et al., 2000; Jin et al., 2000; Newman et al., 2004; Penfield et al., 2001).

Figure 5.

Expression profiles of genes involved in monolignol and cellulose biosynthesis.
A total of 68 phenylpropanoid and phenylpropanoid-like genes (Table S3) were represented on the array employed and 40 of these were differentially expressed (anova: P < 0.05, q < 0.083). Mean expression ratios [log2 (sample/0–2 cm sample)] of these genes were used for hierarchical cluster analysis with average linkage. Expression ratios are shown in (a) as a color-coded heatmap with more than eightfold higher expression indicated in red, and more than eightfold lower expression in a sample compared with the 0–2 cm reference indicated in blue. Locus identifiers and gene names are given to the right of each row. Genes known or inferred to encode bona fide monolignol biosynthetic enzymes are highlighted in green. Expression profiles of differentially expressed members of the CesA gene family are shown in (b). Of the 10 CesA genes present in Arabidopsis, nine were represented on the array used, and five were differentially expressed by the same criteria as above. Abbreviations of phenylpropanoid gene names are explained in Table S3. CesA, cellulose synthase.

Our results thus reflect a massive change in the expression of potential regulatory genes over the course of Arabidopsis stem development. However, due to the biological complexity of the developmental gradient sampled in our experiment, it is likely that many of the differentially expressed transcription factor genes are involved in regulating processes other than fiber differentiation and maturation. In order to filter the transcription factor data set and enrich it for candidate transcription factor genes most likely to be involved in fiber or vascular differentiation, we compared our data to those from an expression profiling experiment that focused on differential gene expression in various tissues, cell types, and developmental stages of the Arabidopsis root (Birnbaum et al., 2003). Of the 1287 transcription factor genes represented on our array, 1110 were also represented on the microarray used in that study. Among these, a common set of 71 genes was differentially expressed during the course of both stem and root development, i.e., placed in an expression cluster in both analyses (Table S2). Among those transcription factor genes, nine belonged to one of the upregulated expression clusters in our stem study (category 4, 5, or 8), were expressed in a stele-specific manner, and were upregulated during the course of stele development (expression cluster LED1 in Birnbaum et al., 2003). This group of candidate fiber development transcription factors, indicated in Table 3, has members from six different families: three MYB (MYB20, At1g66230; MYB43, At5g16600; and MYB63, At1g79180), two bHLH (bHLH068, At4g29100 and bHLH144, At1g29950), one bZIP (bZIP9, At5g24800), one AP2-EREBP (At5g07580), one C3H (At5g42200), and one homeodomain protein (KNAT7, At1g62990). Besides the study by Birnbaum et al. (2003), to our knowledge, none of these transcription factors has previously been associated with fiber or vascular development.
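The filtering step described above amounts to intersecting two cluster assignments. The following Python sketch illustrates the logic under stated assumptions: the gene names (`TF_a` to `TF_e`), the dictionaries, and the root label `LED7` are hypothetical toy values, not data from this study; only the upregulated stem categories (4, 5, and 8) and the root cluster label LED1 are taken from the text.

```python
# Toy illustration of the stem/root filtering logic; all gene names are hypothetical.
UP_STEM_CLUSTERS = {4, 5, 8}   # upregulated stem expression categories named in the text
ROOT_CLUSTER = "LED1"          # root expression cluster named in the text


def filter_candidates(stem_cluster, root_cluster):
    """Return genes clustered in both data sets that are upregulated in the
    stem and assigned to the LED1 root cluster."""
    shared = stem_cluster.keys() & root_cluster.keys()
    return sorted(
        gene for gene in shared
        if stem_cluster[gene] in UP_STEM_CLUSTERS
        and root_cluster[gene] == ROOT_CLUSTER
    )


# Hypothetical cluster assignments (gene -> cluster).
stem = {"TF_a": 4, "TF_b": 2, "TF_c": 8, "TF_d": 5}
root = {"TF_a": "LED1", "TF_b": "LED1", "TF_c": "LED7", "TF_e": "LED1"}

candidates = filter_candidates(stem, root)
print(candidates)
```

In this toy input only `TF_a` passes both criteria: `TF_b` is not in an upregulated stem category, `TF_c` is in a different root cluster, and `TF_d` and `TF_e` each appear in only one data set.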

Table 3.  Candidate transcription factors for fiber differentiation identified by expression profiling and expression pattern filtering
AGI code     Gene family   Name      Expression category (a)   Root LED category (b)
At5g07580    AP2-EREBP     No name   4                         1
At5g42200    C3H           No name   4                         1

(a) This study.
(b) Birnbaum et al. (2003).

The predicted association of these gene expression patterns with progressive stem maturation was validated by RNA blots and quantitative RT-PCR, using gene-specific primers for these transcription factor genes, as well as a control gene (AtbHLH110, At1g27660), which was not significantly differentially regulated, and one other transcription factor gene (TGA1, At5g65210), which was placed in expression cluster 4. For most genes, only weak signals were obtained using RNA blots (data not shown), but the RT-PCR results, presented in Figure S3, confirm differential expression of eight of the nine transcription factor genes tested. Only for bHLH144 was no clear increase in transcript amounts relative to the 0–2 cm sample observed (Figure S3).

Monolignol biosynthesis

The formation of thickened secondary cell walls that are heavily lignified is integral to the development of both tracheary elements and fibers. Most monolignol biosynthesis enzymes are encoded by small canonical gene families in Arabidopsis, although additional genes with significant sequence similarity but unknown function are present in the Arabidopsis genome. Based on previously published surveys of the Arabidopsis genome (Costa et al., 2003; Goujon et al., 2003; Raes et al., 2003; Shockey et al., 2003) and our own bioinformatic screens (see Experimental procedures for details), we identified a total of 79 genes in 10 families encoding bona fide phenylpropanoid enzymes, their relatives, and phenylpropanoid-related enzymes such as cytochrome P450 reductases, which are required for the activity of cytochrome P450 enzymes such as C4H, C3H, and F5H. Due to inconsistency in the naming of these genes in Arabidopsis, we used a revised nomenclature, as detailed in Table S3.

Among these 79 genes, 68 were represented on the array used and 40 of these were differentially expressed (anova: P < 0.05, q < 0.083) during stem maturation. For the following gene family analyses we used a less stringent definition of differentially expressed genes in order not to exclude possibly interesting candidate genes in this subset of genes. Expression data for these 40 phenylpropanoid genes were used for hierarchical cluster analysis with average linkage (Figure 5). The cluster analysis identified a prominent group of phenylpropanoid genes that were co-ordinately upregulated in stem samples as fiber differentiation proceeds, and this group included all but one of the genes encoding enzymes known or inferred to be involved in lignin biosynthesis (highlighted in green in Figure 5a; Goujon et al., 2003; Kim et al., 2002; Raes et al., 2003). Among the genes known or inferred to take part in monolignol biosynthesis, only CCR2 did not cluster with this group; instead it displayed up to sixfold lower expression in older parts of the stem compared with the top. Otherwise, the behavior of the identified gene set is entirely consistent with the expression pattern that would be predicted for these relatively well-characterized genes during the process of stem development and fiber differentiation.
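The clustering step can be illustrated with a small, self-contained Python sketch. It is not the software used in this study: the Euclidean distance metric, the function names, and the toy profiles are assumptions made for illustration, and only the use of log2 expression ratios relative to a reference sample and of average linkage comes from the text.

```python
import math
from itertools import combinations


def log2_ratios(sample_means, reference_mean):
    """Log2 expression ratios of each sample relative to the reference sample."""
    return [math.log2(s / reference_mean) for s in sample_means]


def euclidean(a, b):
    """Euclidean distance between two profiles (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of {gene: [log2 ratios]} with average linkage,
    merging until n_clusters groups remain."""
    clusters = [[gene] for gene in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean of all pairwise distances between members.
            total = sum(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
            dist = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or dist < best[0]:
                best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical log2 ratio profiles across three stem sections.
toy = {
    "up_1": [1.0, 2.0, 3.0],
    "up_2": [1.2, 2.1, 2.9],
    "down_1": [-1.0, -2.0, -2.5],
    "flat_1": [0.0, 0.1, -0.1],
}
clusters = average_linkage(toy, 3)
print(clusters)
```

On this scale a log2 ratio of 3 corresponds to eightfold higher expression than the reference, and a ratio of -1 to twofold lower expression. In the toy input the two upregulated profiles merge first, leaving the downregulated and flat profiles as separate clusters.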

In addition to lignin biosynthesis, cellulose biosynthesis is a prominent metabolic activity integrated into secondary wall formation in developing fibers. Of the 10 CesA genes in Arabidopsis, nine are represented on our array. As shown in Figure 5(b), five of these were differentially expressed. Consistent with their roles as part of a CesA complex specific to secondary wall biogenesis (Taylor et al., 2003), genes encoding CesA7 (IRX3), CesA8 (IRX1), and CesA4 (IRX5) were upregulated over the course of fiber differentiation, with expression patterns closely related to those of the monolignol biosynthetic genes (Figure 5b).

These gene sets thus provide a sound basis for mining our larger expression profiling data set for co-expressed candidate genes that could be involved in facets of lignin biosynthesis and deposition and secondary wall formation that are less well characterized. For example, the cluster analysis suggests that ATR2, one of three Arabidopsis genes encoding known or putative cytochrome P450 reductases (Mizutani and Ohta, 1998), encodes the reductase most likely to be involved in lignification by supporting the activity of lignin-related P450 enzymes such as C3H, C4H, and F5H, as it is the only ATR gene whose expression appears to be co-regulated with the phenylpropanoid gene set (Figure 5a).

Monolignol polymerization

In comparison with our knowledge about the monolignol biosynthetic pathway, the processes by which monolignols undergo dehydrogenation and polymerization to form the lignin polymer remain obscure. Several classes of oxidative enzymes have been associated with monolignol dehydrogenation (Boerjan et al., 2003; Gavnholt and Larsen, 2002). Based on previous publications, publicly available gene family collections, and our own bioinformatic surveys of the Arabidopsis genome (see Experimental procedures), we compiled lists encompassing all members of class III peroxidase, laccase, ascorbate peroxidase, NADPH oxidase, copper amine oxidase, and oxalate oxidase gene families (Table S4). Among the 324 genes included in these gene families, 287 were represented on the microarray. Of these, 39 genes displayed significant changes in expression (anova: P < 0.05, q < 0.083) and a more than twofold change in expression ratio along the stem developmental gradient.
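The selection criteria used for these gene families can be written as a simple filter. The sketch below is illustrative only: the `GeneResult` record, the function name, and the example values are hypothetical, and the fold-change test is interpreted here, as an assumption, as the spread between the highest and lowest log2 ratio along the gradient, with the reference sample included at a log2 ratio of 0. The thresholds (P < 0.05, q < 0.083, more than twofold) are those stated in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneResult:
    gene: str
    p_value: float            # anova P-value
    q_value: float            # false discovery rate estimate
    log2_ratios: List[float]  # one value per stem section, relative to the reference


def is_candidate(result, p_max=0.05, q_max=0.083, min_fold=2.0):
    """Apply the significance and fold-change thresholds stated in the text."""
    # Assumption: the reference sample (log2 ratio 0) counts toward the range.
    values = [0.0] + list(result.log2_ratios)
    fold_range = 2 ** (max(values) - min(values))
    return (result.p_value < p_max
            and result.q_value < q_max
            and fold_range > min_fold)


# Hypothetical results for three genes.
results = [
    GeneResult("gene_up", 0.01, 0.02, [0.5, 1.5]),     # significant, >2-fold
    GeneResult("gene_small", 0.01, 0.02, [0.2, 0.6]),  # significant, <2-fold
    GeneResult("gene_ns", 0.20, 0.30, [1.0, 2.0]),     # not significant
]
selected = [r.gene for r in results if is_candidate(r)]
print(selected)
```

Only `gene_up` passes in this toy input, because the other two genes fail either the fold-change or the significance threshold.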

We combined expression profiles of these 39 genes with the expression data compiled for known monolignol biosynthetic genes (highlighted in green in Figures 5 and 6) and performed hierarchical cluster analysis to identify co-expressed genes (Figure 6). Among the 39 differentially expressed oxidases and peroxidases, 22 displayed expression profiles comparable to those of the phenylpropanoid gene set described above. These are highlighted in blue in Figure 6. Four of the 15 laccases (LAC17, LAC04, LAC02, and LAC11) displayed higher expression in all older parts of the stem compared with the tip and clustered with most monolignol biosynthetic genes, whereas LAC05 and LAC12 showed highest expression in the 2–4 cm and 3–5 cm samples only. Only one laccase (LAC03) was downregulated in older parts of the stem compared with the tip, while the remaining eight laccase genes were not differentially expressed in the samples analyzed.

Figure 6.

Identification of monolignol polymerization and transport candidate genes by hierarchical clustering of expression profile data.
A total of 287 known or putative ascorbate peroxidases, NADPH oxidases, laccases, class III peroxidases, copper amine oxidases, oxalate oxidases (all potentially involved in monolignol dehydrogenation), dirigents, and ABC transporters were represented on the microarray; 39 of these genes displayed significant changes in expression (anova: P < 0.05, q < 0.083) and changed more than twofold between the samples analyzed. The mean expression ratios of these genes were combined with expression ratios of known or inferred monolignol biosynthetic genes and used for hierarchical cluster analysis using average linkage. Shown is a heatmap using the same color coding as in Figure 5. Genes that cluster with known monolignol biosynthetic genes (clusters Ia, Ib, and Ic) represent monolignol dehydrogenation, dirigent, and ABC transporter candidates and are highlighted in blue, yellow, and red, respectively. Potential polymerization and transport genes that are differentially regulated but do not cluster with monolignol biosynthetic genes are shown in clusters IIa and IIb. Abbreviations of gene names are detailed in Table S4.

Interestingly, none of the nine annotated NADPH oxidases (RhoA–RhoJ; Torres et al., 1998) were co-expressed with monolignol biosynthetic genes, whereas three NADPH oxidase-like genes (NADPHOL2, NADPHOL3, and NADPHOL7) were placed in expression cluster I (Figure 6). Within the group of 73 class III peroxidases present in Arabidopsis (Welinder et al., 2002), 69 were represented on the microarray; among those, eight peroxidases displayed expression profiles similar to that of the monolignol biosynthetic gene set (Figure 6). P2, P17, P37, P9, and P30, which were placed in expression cluster Ia/b, displayed the strongest similarity. Other oxidase genes placed in expression cluster Ia/b include three oxalate oxidases (germins GLP2a and GLP4, Table S4; Carter et al., 1998), and the ascorbate oxidase-like gene AOL12. In summary, only 22 candidate genes from four oxidative enzyme gene family classes were correlated with developmentally regulated lignin biosynthesis and deposition.

We also included the Arabidopsis dirigent gene family in this analysis. Dirigent proteins have the potential to control lignin polymerization by binding to and orienting free monolignol radicals prior to oxidative coupling (Gang et al., 1999). The Arabidopsis genome harbors 21 known or putative dirigent genes (Table S4), among which 19 are represented on the microarray. Interestingly, none of these genes was assigned to expression cluster Ia/b, which contains most of the monolignol biosynthetic genes (Figure 6). However, DIR11 exhibits an expression pattern similar to that of PAL3 (higher expression in the 3–5 cm and 5–7 cm samples compared with the tip), while DIR5 and DIR6 were characterized by higher expression in younger stem sections compared with the very tip of developing stems and were placed in expression category Ic (Figure 6). Similarly, DIR13 was expressed at highest levels in the 2–4 cm stem section, a zone that is devoid of developing fibers, but displays ongoing vascular differentiation. In contrast, DIR1, DIR3, DIR7, and DIR19 were significantly downregulated during stem maturation (Figure 6).

Monolignol transport

Very little is known about monolignol transport into the extracellular space. Monolignols may be transported as glycosides (Lim et al., 2001), but which molecular transport mechanism might be involved is unknown. Based on the ability of ATP-binding cassette (ABC) transporters to transport a diverse set of small molecules across membranes and the large number of such ABC transporters encoded in the Arabidopsis genome, ABC transporters could conceivably be involved in the secretion of monolignols (Samuels et al., 2002). To test this hypothesis, we examined the expression profiles of ABC transporter genes relative to that of the monolignol biosynthetic gene set. The Arabidopsis genome contains at least 129 genes encoding putative ABC-transporters (Sánchez-Fernández et al., 2001). Of the 118 ABC-transporter genes represented on the microarray used, 13 were differentially expressed along the axis of developing primary stems and among those, seven clustered with known monolignol biosynthetic genes (highlighted in red in Figure 6). Within this group of upregulated ABC-transporters, MDR13, MDR8, WBC23, and PDR8 most closely resembled the expression profile of known monolignol biosynthetic genes, while PDR13 and PDR1 displayed highest expression in the 3–5 cm sample only. We used quantitative RT-PCR to validate the expression patterns of MDR13 and PDR1, and observed strong induction of both genes along the axis of 10 cm stems compared with the 0–2 cm sample from 5 cm stems (Figure S3).

Shikimate pathway

The shikimate pathway channels the flow of carbon from sugar metabolism into the biosynthesis of the aromatic amino acids tryptophan, tyrosine, and phenylalanine, the latter being the precursor of the monolignol biosynthetic pathway. Given the major commitment of carbon to secondary cell wall synthesis in the maturing vasculature and fibers, it could be expected that genes encoding enzymes of the shikimate pathway are co-ordinately expressed with those of monolignol biosynthesis.

In order to generate an expression profile of the shikimate pathway, we first surveyed the Arabidopsis genome and identified complete gene families encoding known or putative shikimate pathway enzymes (Table S5). While most of these enzymes are encoded by small gene families (two to six members), three enzymes [3-dehydroquinate synthase (DQS), 3-dehydroquinate dehydratase/shikimate dehydrogenase (DHQD/SD), and chorismate synthase (CS)] are encoded by single-copy genes in Arabidopsis. The majority of these enzymes are targeted to the chloroplast based on in silico predictions (Table S5). However, for many families at least one isoform exists for which the subcellular localization could not be clearly predicted and/or they are likely localized in the cytoplasm (Table S5).

For most of the identified shikimate pathway genes, 70mer probes were present on the microarray used here, with the notable exception of the two genes encoding EPSP synthase. The expression profiles of these genes are depicted in Figure 7. Most genes on the array encoding enzymes of the pre-chorismate part of the shikimate pathway, as well as those of the phenylalanine-specific branch, were upregulated along the axis of stem development, in correlation with lignification of developing fibers. One exception is the gene encoding DHQD/SD, which was not significantly upregulated over the developmental series. In contrast, no genes encoding enzymes of the tryptophan-specific branch were differentially expressed (Figure 7). Similarly, the two genes encoding arogenate dehydrogenase (ADH; Rippert and Matringe, 2002a,b), the only tyrosine-specific step in the pathway, were not differentially expressed (Figure 7).

Figure 7.

Expression profiles of known and candidate shikimate pathway genes during stem development.
A total of 35 known or putative genes encoding enzymes of the shikimate pathway (Table S5) were analyzed. (a) Overview of the biochemical pathway and the mean expression ratios of all gene family members encoding the corresponding enzymes represented on the array. The number of members in each gene family is given in parentheses after the enzyme abbreviation (see Table S5 for an explanation of abbreviations used). Given beside each expression profile is the P-value of an analysis of variance for that gene, the locus identifier, and the gene name. Color coding of heatmaps is as described for Figure 5. In order to identify potential candidate genes that encode a prephenate aminotransferase (PNT), we identified all genes that contain an aminotransferase domain (Pfam motif PF00155, total of 57 genes present in Arabidopsis), were present on the array (55 genes), displayed significant changes in expression (anova: P < 0.05, q < 0.083), and changed more than twofold between the samples analyzed (12 genes). (b) Expression profiles of these 12 genes.

The enzyme that catalyzes the initial step of the shikimate pathway, DHS, is encoded by three gene family members in Arabidopsis (Table S5) of which DHS1 is not represented on the array. However, DHS3, which has not been previously characterized, is expressed at increasingly higher levels over the course of fiber differentiation, while DHS2 is expressed at higher levels only in mature stem sections. Similarly, among the four Arabidopsis genes with homology to a functionally characterized shikimate kinase (SK) from tomato (Schmid et al., 1992), SK1 and SK4 displayed an expression pattern similar to that of monolignol biosynthesis genes, while SK2 was only expressed at higher levels in the most mature stem sections. In contrast, SK3 was transcriptionally downregulated during primary stem development (Figure 7). Interestingly, of the three single-copy genes encoding enzymes in the pre-chorismate pathway (DQS, DHQD/SD, and CS) all were only slightly more highly expressed in older parts of the stem relative to the tip of young stems, and DQS was the only one which showed statistically significant differential expression over the course of development (Figure 7). Among the three genes encoding chorismate mutase (CM; Eberhard et al., 1993, 1996; Mobley et al., 1999), the first step of the Phe/Tyr-specific branch, only CM1 was more highly expressed in stem sections associated with lignification of developing fibers, while CM2 and CM3 were not differentially expressed (Figure 7).

The last two steps of phenylalanine and tyrosine biosynthesis in plants have been characterized biochemically and are distinct from those characterized in most fungi and bacteria. However, genes encoding the enzymes presumed to be involved, prephenate:glutamate aminotransferase (PNT), which is common to tyrosine and phenylalanine biosynthesis, and arogenate dehydratase (ADT), which is specific to phenylalanine biosynthesis, have not been identified from plants. Within the Arabidopsis genome, six genes that display limited sequence similarity to prephenate dehydratase from bacteria and fungi (Ackerman et al., 1992) have been suggested to be candidate genes encoding arogenate dehydratase in Arabidopsis (Hsieh and Goodman, 2002), and each of these is represented on our array. Among these, ADT5, ADT6, and possibly ADT3 displayed expression patterns over the course of stem development that are consistent with a role in phenylalanine biosynthesis, that is, coordinate expression with other shikimate pathway and phenylpropanoid genes (Figure 7).

As no genes have been identified encoding PNT, we retrieved all the Arabidopsis protein sequences that contain a putative aminotransferase domain (Pfam motif PF00155) from TAIR. Of those 57 genes, 55 were represented on the microarray employed and 12 of these were differentially expressed in the samples analyzed (Figure 7b). Among these differentially expressed genes, only At1g34060, At2g38400 (AGT3), and At2g20610 (SUR1/ALF1) displayed expression patterns along the stem development gradient that would be consistent with a function in the shikimate/lignin biosynthetic pathway, thus highlighting them as candidate prephenate aminotransferase genes.


Interfascicular fiber formation in Arabidopsis requires the regulation of cell fate and differentiation, and the integration of cell biological processes and metabolic pathways leading to cell maturation. A combination of approaches will be necessary to gain a more complete understanding of these complex processes and their regulation. In this study, we assayed global changes in the Arabidopsis transcriptome along a developmental gradient of inflorescence stem maturation that incorporates progressive fiber differentiation. While neither the expression profile of a given gene of unknown function nor sequence similarity to functionally characterized genes can prove biochemical or cellular function, a combination of both techniques on a whole genome level provides a powerful tool for hypothesis building and identification of candidate genes worthy of more detailed study. We have applied this approach to identify candidate genes potentially encoding missing links in lignin biosynthesis and transcriptional regulators of fiber differentiation and maturation. Our results shed light on the coordination of metabolic pathways with cellular differentiation and complement previous genetic, expression profiling, and bioinformatic studies to provide new insights into fiber differentiation and maturation.

Shikimate pathway

Upregulation of the shikimate pathway in conjunction with stem development might be anticipated as one mechanism by which maturing fiber cells could meet the metabolic demand for phenylalanine required for monolignol biosynthesis. Indeed, the expression of genes encoding individual shikimate pathway enzymes is both developmentally regulated and responsive to several environmental stimuli such as light, mechanical wounding, or pathogen infection (Eberhard et al., 1996; Keith et al., 1991; Lee et al., 1997; Mobley et al., 1999). Although most steps of the shikimate pathway have been well characterized in plants (for references on each enzyme see Table S5), no compilation of the gene families encoding this pathway has previously been published. Most Arabidopsis shikimate pathway enzymes are encoded by single genes or small gene families sharing high sequence similarity. This suggests that in most cases all gene family members encode enzyme isoforms with the same biochemical function. More distantly related genes are generally absent from the Arabidopsis genome. An exception is the SK family: two putative Arabidopsis SK genes that share relatively high sequence identity with a characterized SK gene from tomato (Schmid et al., 1992) also share somewhat lower similarity with two additional Arabidopsis genes (Table S4). In general, however, it appears that duplicated genes encoding enzymes of the shikimate pathway have not been recruited by other metabolic pathways, as has apparently happened with genes encoding enzymes related to phenylpropanoid enzymes (Table S2 and references therein).

Our global expression profiling results show that genes encoding enzymes of the shikimate pathway leading to the biosynthesis of phenylalanine are transcriptionally upregulated in primary stems, consistent with enhanced demand for this amino acid for lignin biosynthesis (Figure 1; Table 1). In contrast, the parts of the pathway that are specific for the other aromatic amino acids are not differentially expressed. This suggests a pattern of transcriptional control of the aromatic amino acid biosynthetic pathway that directly reflects the physiological requirement for individual amino acids. A tight crosstalk between phenylpropanoid metabolism and the shikimate pathway was also observed in Arabidopsis plants mutated in the phenylalanine ammonia lyase encoding genes PAL1 and PAL2, which results in the accumulation of aromatic amino acids and a transcriptional upregulation of shikimate pathway biosynthesis genes (Rohde et al., 2004). Within some gene families individual members respond differently along the developmental gradient. For example, only one of the three CM genes in Arabidopsis (CM1) is upregulated, while CM2 and CM3 are not differentially expressed along the axis of primary stems (Figure 7). CM1, CM2, and CM3 display different organ-specific and stress-induced expression patterns (Eberhard et al., 1996; Mobley et al., 1999). Based on enzymatic properties and expression patterns, Mobley et al. (1999) suggested that CM1 might become particularly important when flux through the pathway is rapidly increased, for example, in response to environmental stress, and our data support this hypothesis.

A similar pattern of apparent functional differentiation was detected in the case of DAHP synthase (DHS) genes. Previous studies have found that DHS1 is induced by wounding and pathogen attack, and may thus provide precursors for secondary metabolism, while DHS2 is more constitutively expressed, suggesting a role in providing aromatic amino acids for protein biosynthesis (Keith et al., 1991). Unfortunately, DHS1 was not represented on the array used, but DHS2 expression is only elevated in the oldest part of the stem, while DHS3, an uncharacterized third Arabidopsis gene, is strongly upregulated in coordination with ongoing lignin biosynthesis.

Taken together, in cases where multiple genes exist, the distinct isoforms appear to be differentially regulated and could potentially provide aromatic amino acid precursors for different physiological requirements, for example, protein biosynthesis and secondary metabolism/lignin biosynthesis. In contrast, those enzymes in the pre-chorismate pathway encoded by single-copy genes that are likely targeted to the chloroplast (DQS, DHQD/SD, and CS) must meet all physiological requirements during plant growth and development. In accordance with such broader roles, these genes are constitutively expressed or are only weakly upregulated in primary stems in concert with lignin biosynthesis (Figure 7).

Phenylalanine biosynthesis in plants follows an alternative pathway in which prephenate is first transaminated to form arogenate, which in turn is dehydrated to phenylalanine (De-Eknamkul and Ellis, 1988; Siehl and Conn, 1988). While neither a prephenate aminotransferase (PNT) nor an arogenate dehydratase (ADT) gene has been identified in plants to date, our global expression profiling approach showed that three Arabidopsis genes with similarity to prephenate dehydratase from yeast (ADT3, At2g27820; ADT5, At5g22630; and ADT6, At1g08250) have expression patterns consistent with roles in phenylalanine biosynthesis (Figure 7) and are predicted to be localized to the chloroplast (Table S5), making them candidates for arogenate dehydratase. Our profiling also identified three candidate PNT genes. Among these, At2g20610 has previously been identified in several mutant screens. Loss of At2g20610 function causes a variety of phenotypic abnormalities including excessive root formation and a drastic increase in endogenous auxin levels (Boerjan et al., 1995 [sur1]; Celenza et al., 1995 [alf1]; King et al., 1995 [rty]; Lehman et al., 1996 [hls3]). This phenotype could be explained by re-channeling of carbon flow into the tryptophan/auxin biosynthesis pathway caused by a block in the phenylalanine pathway. However, Mikkelsen et al. (2004) recently showed that sur1 plants lack glucosinolates and accumulate l-cysteine conjugate precursors, and that SUR1 encodes a C-S lyase that likely converts S-(alkylacetohydroximoyl)-l-cysteines to the corresponding thiohydroximic acids.

The second candidate PNT gene, At1g34060, encodes a homolog of Allium spp. alliinases that catalyze the cleavage of cysteine sulphoxide derivatives to produce the volatiles responsible for the typical flavor of onion and garlic (Jones et al., 2004) and thus also belongs to the C-S lyase protein family (Lancaster et al., 2000). The known or likely activity of SUR1 and At1g34060 as C-S lyases makes these two genes unlikely candidates to encode prephenate aminotransferase. However, as aminotransferases can also have C-S lyase activity (Gaskin et al., 1995), we cannot exclude the possibility that these gene products may also be capable of transaminating prephenate.

The third candidate PNT gene (At2g38400) is most likely to encode a true aminotransferase rather than a C-S lyase. It shares 37% amino acid sequence identity with an alanine:glyoxylate aminotransferase (AGT2) from rat and has been described as one of three Arabidopsis AGT2 homologs (Liepman and Olsen, 2003). However, the recombinant protein of one of these AGT2 homologs failed to exhibit glyoxylate aminotransferase activity using several amino acid donors (Liepman and Olsen, 2003), making it possible that AGT3 actually encodes the prephenate aminotransferase of the shikimate pathway.

The identification of candidate genes for prephenate aminotransferase and arogenate dehydratase opens the door to functionally test their candidacy. One approach would be to test their abilities to complement the pha2 (prephenate dehydratase) and aro8/aro9 (aromatic amino acid aminotransferase; Iraqui et al., 1998) mutants from yeast by simultaneously expressing combinations of both candidate genes and thereby potentially establishing the end point of plant phenylalanine metabolism in yeast.

Monolignol transport

We used the expression pattern of a set of known or inferred monolignol biosynthetic genes as a benchmark to identify candidate genes for less well-characterized steps of lignin biosynthesis. No direct experimental evidence exists regarding the nature of the transporters involved in monolignol export. Nevertheless, ATP binding cassette (ABC) type transporters are plausible candidates fulfilling this function, along with vesicle-mediated secretion of monolignols (Samuels et al., 2002). This class of transporters is involved in the import or the export of a wide variety of soluble metabolites ranging from carbohydrates, fatty acids, and proteins to aromatic molecules (Higgins, 2001). They have been associated with detoxification, xenobiotic, and heavy metal transport in all kingdoms, but it is becoming increasingly clear that in plants ABC transporters are also involved in the transport of secondary metabolites such as terpenoids, alkaloids, very long-chain fatty acids, and anthocyanins (Goodman et al., 2004; Jasinski et al., 2001; Pighin et al., 2004; Shitan et al., 2003). We identified seven candidate ABC transporters that display expression patterns in primary stems consistent with expression profiles of monolignol biosynthetic genes and increased lignin content. No function has yet been attributed to any of these, but two of them (MDR13 and MDR8) have expression patterns that most closely resemble those of monolignol biosynthesis genes. Both proteins are likely targeted to the cytoplasm and belong to the multiple drug resistance (MDR) subfamily of the ABCB class of full transporters featuring two trans-membrane domains and two nucleotide binding domains (TM-ABC)2 (Garcia et al., 2004; Sánchez-Fernández et al., 2001). Interestingly, MDR8 is the Arabidopsis homolog to an MDR from Coptis japonica (CjMDR1) that is involved in the translocation of berberine, a benzylisoquinoline alkaloid (Shitan et al., 2003). 
The other candidate transporter genes identified in our analysis, WBC23, PDR1, PDR8, and PDR13, are related to the human ABCG subfamily (Garcia et al., 2004). While WBC23 encodes a half transporter consisting of a single ABC and TM domain, PDR (pleiotropic drug resistance) genes encode full transporters with an (ABC-TM)2 organization (Sánchez-Fernández et al., 2001). While all three PDR proteins are predicted to be targeted to the chloroplast, making them unlikely monolignol transporter candidates, WBC23 is likely targeted to the cytoplasm. WBC23 is a homolog of the WHITE, BROWN, and SCARLET proteins from Drosophila melanogaster, which are involved in the export of aromatic metabolites such as 3-hydroxykynurenine, a tryptophan derivative (Mackenzie et al., 2000). Thus far, PDR-like genes in plants have only been implicated in the transport of the diterpene sclareol (van den Brûle et al., 2002), while the functions of most family members remain elusive.

Monolignol dehydrogenation and polymerization

Upon transport into the apoplast, polymerization of monolignols is initiated by dehydrogenation of monolignols followed by enzyme-independent radical coupling. Many different types of oxidative enzymes have been associated with the dehydrogenation of monolignols (for review see Boerjan et al., 2003). However, in addition to developmental lignification, oxidative enzymes such as peroxidases are involved in a variety of other physiological responses including auxin catabolism, defense against pathogens, salt tolerance, and oxidative stress (Hiraga et al., 2001). This functional diversity, and the large size of the gene families encoding these enzymes (Table S4 and references therein), makes it difficult to identify isoforms that are specifically involved in lignin polymerization. Our profiling study identified 22 oxidizing enzymes (class III peroxidases, laccases, NADPH oxidase-like enzymes, oxalate oxidases, and copper amine oxidases) that display co-expression with monolignol biosynthetic genes (Figure 6). These belong to five of the six classes analyzed, a range that might reflect the participation of a mixture of divergent oxidizing enzymes in the dehydrogenation process (for references see Table S4). Among these, eight class III peroxidases displayed expression profiles similar to monolignol biosynthetic genes and none of them has previously been implicated in lignin polymerization.

In Arabidopsis, only the peroxidase gene AtPA2 (At5g06720) has been shown, using promoter-reporter gene fusions, to be expressed in lignifying tissues, and its expression level is enhanced in a mutant characterized by elevated lignin levels (Østergaard et al., 2000). AtPA2 expression was also significantly elevated in middle sections of primary stems in our study. However, the fold change remained below twofold, and therefore it was not included in the cluster analysis depicted in Figure 6. Similarly, the two Arabidopsis genes most similar to a poplar peroxidase that has been correlated with lignin biosynthesis (At4g08770 and At4g08780 compared with PXP3-4; Christensen et al., 2001) as well as At4g21960, the closest Arabidopsis relative to a peroxidase from tobacco that has been implicated in lignin biosynthesis (Blee et al., 2003), were excluded from further analysis because they barely missed our stringent criteria for being differentially expressed. Taken together, these results suggest that the eight peroxidases identified may not represent the whole set of peroxidases involved in lignin polymerization, and that a plethora of different enzymes could be involved.

Although peroxidases have been traditionally implicated in monolignol polymerization, laccases, which use molecular oxygen as an electron acceptor and can oxidize monolignols in vitro, have also been implicated in lignification in many plant species (Gavnholt et al., 2002; Kiefer-Meyer et al., 1996; LaFayette et al., 1999; Ranocha et al., 2002). The Arabidopsis genome harbors many genes belonging to the blue copper superfamily to which laccases belong. However, only 17 deduced protein sequences are more similar to any of the characterized laccases from dicots (Table S4) than to an ascorbate oxidase from cucumber (Ohkawa et al., 1989). Among the six laccase genes transcriptionally upregulated over the course of stem development, two, LAC04 and LAC11, are most closely related to LAC01, LAC02, and LAC03 from poplar and a tobacco laccase (Kiefer-Meyer et al., 1996; Ranocha et al., 2002). Antisense suppression of LAC03 from poplar results in adhesion defects in cell walls of xylem fibers (Ranocha et al., 2002). Two other proteins, LAC02 and LAC17, are the only Arabidopsis proteins that group with poplar LAC110 (Ranocha et al., 2002) and all tulip tree laccases (LaFayette et al., 1999) in phylogenetic reconstructions, while the two remaining proteins, LAC05 and LAC12, group with the poplar LAC90 protein sequence (data not shown). This suggests that in Arabidopsis, multiple, phylogenetically divergent laccase isoforms are under similar transcriptional control and may serve redundant functions in lignin polymerization.

Original models for lignin monomer polymerization assumed that radical coupling occurs randomly, driven by the supply of monolignol radicals. However, the discovery of dirigent proteins showed that radical coupling reactions can be guided by binding proteins that provide stereoselectivity to the reaction in lignan (monolignol dimer) biosynthesis (Davin et al., 1997). Dirigent isoforms could fulfill the task of imparting ordered structure to the lignin polymer (Gang et al., 1999). However, an essential role for dirigent proteins in lignin polymerization has yet to be demonstrated (Boerjan et al., 2003).

We identified 21 genes in the Arabidopsis genome with significant sequence similarity to the Forsythia dirigent protein (Table S4), and each is represented on the microarray we used. While none of the dirigent genes is tightly co-expressed with the majority of monolignol biosynthetic genes in Arabidopsis stems (i.e., in expression categories Ia/b; Figure 6), three genes are co-expressed with some monolignol biosynthetic genes, and could play roles in incipient lignin polymerization. In phylogenetic reconstructions (data not shown), Arabidopsis dirigent proteins form three distinct clades. Three related members (DIR9, DIR10, and DIR18) form one cluster and these sequences all have large insertions with repetitive sequences in their coding region that are not found in other dirigent proteins. None of these genes is differentially expressed in primary stems. DIR5, 6, 12, 13, and 14 form a second cluster and are most closely related to the original lignan-specific protein from Forsythia. Among these, DIR5 and DIR6 were differentially expressed with patterns that most closely resemble the expression of monolignol biosynthetic genes in our study. The remaining proteins form a third distinct clade in phylogenetic reconstructions, and among these 16 genes, only DIR11 is co-expressed with PAL3 in expression cluster Ic (Figure 6). Interestingly, five dirigent genes are actually expressed more highly in young parts of the stem and are downregulated in stem sections with ongoing fiber lignification. This suggests that these genes serve biochemical functions in the upper part of developing stems, perhaps related to the high levels of soluble phenolic compounds in these samples. Expression profiling at higher resolution and reverse genetics approaches could shed more light on the functions of the three lignification-associated dirigent candidates we identified.

Transcriptional regulation

Transcriptional regulation is likely to play a key role in the complex series of events leading to fiber cell differentiation and maturation along the developmental axis we surveyed by global expression profiling. However, not all genes encoding transcription factors that are involved in regulating these events are necessarily differentially expressed; some could be expressed constitutively and/or be activated post-translationally. Alternatively, subtle but critical differences in expression levels over distances of only a few cells might not be detectable with our approach. Thus, it is not surprising that certain genes that have been implicated in the regulation of lignin biosynthesis and/or fiber differentiation were not differentially expressed in our experiment, for example, AtHB8 and REV/IFL (Baima et al., 2001; Zhong and Ye, 1999). Probes for other functionally characterized transcription factor genes were not represented on our array (e.g., PAP1/MYB75; Borevitz et al., 2000), or just barely failed to meet our statistical criteria (F: P < 0.01) for inclusion in an expression category. For example, MYB61, which when over-expressed results in ectopic lignification of roots and stems (Newman et al., 2004), was up to fourfold more highly expressed in older stem sections compared with the tip, with an ANOVA P-value of 0.007, but the analysis of the quadratic expression model for this gene resulted in a P-value of 0.014 for the F-statistic, therefore excluding it from further analysis. This suggests that our rigid statistical analysis likely excludes valid candidate genes. However, we still identified 271 transcription factor genes that were differentially expressed, among which 182 were upregulated along the axis of primary stem development in a manner consistent with potential regulatory functions in fiber differentiation.

Not all of the candidate differentially expressed (DE) transcription factors identified here are necessarily involved in regulating fiber differentiation; some may be involved in other developmental processes within the primary stem. To filter our data, we compared them with an expression map of Arabidopsis root cells and tissues over the course of root development (Birnbaum et al., 2003). Like xylem and interfascicular fiber cells of the stem, certain cells destined to form xylem in the root stele also undergo a differentiation process culminating in the formation of elongated, secondarily thickened, and lignified cells. Furthermore, auxin has been identified as a key signal that regulates both xylem differentiation in the stele and interfascicular fiber differentiation (Birnbaum et al., 2003; Little et al., 2002; Zhong and Ye, 2001). Of the 16 transcription factor genes upregulated specifically in the stele (Birnbaum et al., 2003), nine were also represented in our upregulated expression categories 4, 5, and 8 (Table S2). This suggests conserved and possibly redundant functions of this small set of candidate genes in primary stem fiber and root stele development.

Interestingly, this candidate gene set contains MYB and bHLH transcription factors with the potential to interact with each other to regulate target gene transcription. MYB/bHLH interactions are important in regulating anthocyanin and proanthocyanidin biosynthesis, trichome development, and other developmental processes (Baudry et al., 2004; Goff et al., 1992; Payne et al., 2000 and references therein). Among the genes we identified, MYB43 and MYB20 are closely related based on phylogenetic reconstructions (Stracke et al., 2001). As we can exclude cross-hybridization based on the oligo sequences used as probes, this suggests conserved and possibly redundant functions of these genes. In contrast, MYB63 has been placed in a different phylogenetic cluster (Stracke et al., 2001) and its closest relative, MYB58, is not differentially expressed in stems or roots. No information is available regarding the potential functions of these genes, or of the co-expressed bHLH genes At1g29950 (AtbHLH144) and At4g29100 (AtbHLH068), but it is interesting that MYB43, MYB63, and AtbHLH068 are at least twofold more highly expressed in developing Arabidopsis secondary xylem relative to bark (Oh et al., 2003; Table S2).

Included among the transcription factor candidate genes is the AP2-EREBP gene At5g07580. This class of transcription factors is unique to plants, and those members for which functions are known play regulatory roles in various developmental and defense-related processes (Riechmann and Meyerowitz, 1998). Within the large AP2-EREBP family, the closest relative of At5g07580 is At5g61590; these two genes are 71% identical, and could thus easily be distinguished by gene-specific probes on our array. Both genes were placed in expression cluster 4, but only At5g07580 was also placed in stele-specific LED 1, while At5g61590 is not associated with an LED in roots (Birnbaum et al., 2003). Therefore, both genes could have similar functions in stems, while no redundancy is expected in roots. The candidate gene KNAT7 (At1g62990), which belongs to class II of the KNOX-like homeobox genes (Serikawa et al., 1996), shares high sequence similarity with KNAT4 (At5g11060) and KNAT3 (At5g25220), neither of which was differentially expressed in our experiment or placed in a root LED expression category by Birnbaum et al. (2003). In comparison with class I KNOX genes that play roles in regulating cell fate and meristem indeterminacy (Tsiantis, 2001), little is known about this subgroup and to our knowledge no function has been assigned yet to any of its members. The Arabidopsis class I KNOX gene KNAT1/BREVIPEDICELLUS (BP) plays a role in regulating interfascicular fiber differentiation and lignin deposition, in addition to maintenance of meristem indeterminacy (Mele et al., 2003), and the expression of a putative ortholog was correlated with secondary wall formation in a poplar microarray experiment (Hertzberg et al., 2001). Unfortunately, BP was not represented on our array, but these results highlight the possible roles of KNAT genes in general and KNAT7 in particular in fiber differentiation.

Also represented among the nine candidate transcription factor genes is bZIP9 (At5g24880), an uncharacterized member of the basic leucine zipper motif (bZIP) transcription factor class. bZIP transcription factors form homodimers and heterodimers that play important roles in regulating defense responses and development (Jakoby et al., 2002). Interestingly, a second bZIP gene, TGA1 (At5g65210) was placed in expression category 4, and root LED 7 (upregulated in all root tissues over development). TGA1 together with other TGA partner proteins, is known to play a key role in defense signaling by interaction with the regulatory protein NPR1 (Després et al., 2003), and is also more than twofold more highly expressed in Arabidopsis secondary xylem relative to bark (Oh et al., 2003). The final candidate gene in this set is At5g42200, a member of the C3H ring zinc finger family, whose functions are poorly characterized. Interestingly, another C3H transcription factor, At4g26580, was also placed in our expression category 4, but was not represented on the array used by Birnbaum et al. (2003). This gene is highly upregulated in Arabidopsis secondary xylem (Oh et al., 2003), and a putative poplar At4g26580 ortholog is strongly upregulated in association with the final stages of secondary xylem development in poplar (Hertzberg et al., 2001; Table S2). Thus, these C3H genes are interesting candidates for further functional analyses.

We chose to filter our transcription factor candidate genes against the root expression data set of Birnbaum et al. (2003) as this is one of the most comprehensive and detailed microarray data sets in the literature. However, this would exclude candidate regulators that are stem or interfascicular fiber cell-specific or that regulate late stages of vascular differentiation, given that the root tissues analyzed by Birnbaum et al. (2003) likely do not contain cells with secondarily thickened walls. We thus performed filtering against other published microarray experiments, including Arabidopsis secondary xylem (Oh et al., 2003), weight-induced secondary xylem formation in Arabidopsis (Ko et al., 2004), Zinnia tracheary element trans-differentiation (Demura et al., 2002), and developing poplar xylem (Hertzberg et al., 2001). These results, highlighted in gray in Table S2, identified several other candidates among our set of transcription factors in expression categories 4, 5, and 8, including those in MYB, bZIP, C3H, AP2-EREBP, and other classes (Table S2), that may play important roles in stem fiber differentiation, but are not correlated with stele development.

In summary, by carefully comparing our data set with those of other, related experiments, we were able to identify a total of 19 putative transcription factors that are upregulated in our analysis and also in at least two other global expression profiling experiments. Although each experiment had a different focus, in all cases samples enriched for cells undergoing secondary cell wall biosynthesis were compared with samples that are mainly characterized by cells with primary walls. Therefore, we believe that these genes are strong candidates for transcriptional regulators of secondary cell wall formation, and are worthy of further investigation.

Experimental procedures

Plant material

Arabidopsis plants (ecotype Ler) were grown under short-day conditions (8 h cool white fluorescent light at 100 μE, 16 h dark) at 20 °C and ambient humidity in PGW36 phytochambers (Controlled Environment Ltd, Winnipeg, MB, Canada). Plants were grown in Redi soil (Grace & Co., Ajax, ON, Canada) in 12 cm pots at a density of five plants per pot without fertilization for 8 weeks. Light conditions were then changed to long day (16 h light, 8 h dark), and plants were fertilized once with standard 10-15-10 plant fertilizer (150 μl in 300 ml water per pot) (Schultz, St Louis, MO, USA). Twelve and 14 days after changing to long-day conditions we harvested sections from stems totaling 4–6 cm and 9–11 cm in length, respectively. All flowers, siliques, cauline leaves, and secondary branches were discarded and the remaining bolting stems were cut with a razor blade to generate the following sections: 0–2 and 2–4 cm from 5 cm stems; and 0–3, 3–5, 5–7, and 7–9 cm from 10 cm stems.

Plant material was transferred to liquid nitrogen and stored at −80°C until further use. For microscopical analysis, stem sections were placed in styrofoam blocks and serial hand sections were performed using a razor blade. Sections were mounted in tap water and examined with an Axiophot epifluorescence microscope (Zeiss, Jena, Germany) using either brightfield or UV illumination with a Zeiss filter 18 (excitation at BP390–440 nm and emission at 470 nm).

RNA isolation

Total RNA was isolated using a modified TRIZOL extraction method as follows. Approximately 1 g of plant material was ground in liquid nitrogen using a mortar and pestle, resuspended in 15 ml TRIZOL reagent (Invitrogen, Carlsbad CA, USA), vortexed and incubated at 65°C for 5 min with regular mixing. Cell debris was pelleted by centrifugation for 30 min at 12 000 g and 4°C and the supernatant was extracted with 3 ml chloroform twice. After centrifugation for 20 min at 12 000 g, the aqueous phase was recovered and RNA was precipitated at room temperature for 5 min with 0.5 volumes of 0.8 M sodium citrate and 0.5 volumes of isopropanol. After centrifugation for 30 min at 12 000 g, the pellet was washed with 70% ethanol and re-centrifuged. The RNA pellet was air dried for 5 min and resuspended in 200 μl RNAse free water. Following a spectrophotometric determination of RNA concentration, RNA was precipitated with 2.5 volumes of ethanol and a 1/10 volume of 3 M sodium acetate at −20°C overnight, and subsequently pelleted at 20 000 g for 30 min at 4°C. The precipitate was washed with 70% ethanol, re-centrifuged, air dried and resuspended in RNAse free water to an approximate concentration of 5 μg/μl. Actual concentration was determined spectrophotometrically, and RNA quality was determined using a 2100 Bioanalyzer (Agilent Technologies, Mississauga, ON, Canada).

Microarray design and production

The Arabidopsis Genome Oligo Set Version 1.0 (Operon Biotechnologies, Huntsville, AL, USA) consists of 26 090 70mer oligonucleotides. As positive controls, we included oligos for 12 housekeeping genes (Operon). As negative controls, we synthesized four oligonucleotides specific for human genes with no similarity to any Arabidopsis gene and included 12 oligonucleotides with no similarity to any Arabidopsis gene. Three of these oligos were complementary to the human cRNAs generated as internal standards (spikes, see below). We used a PCR-amplified green fluorescent protein (GFP) cDNA (Invitrogen) as an orientation marker. Oligonucleotides were resuspended in 384-well flat bottom plates (Nunc, Rochester, NY, USA) to a concentration of 100 mM in 3x SSC. Oligos were printed on MicroGrid II robots using Microspot 10 k pins (Biorobotics, Huntington, UK) depositing approximately 0.5 nl (0.0075 pmol) of each oligo onto EZ rays aminosilane slides (Apogent Discoveries, Hudson, NH, USA). The pitch of the grid used for this library was 0.3 mm. Oligos were UV cross-linked at 3000 × 100 μJ using a UV Stratalinker 2400 (Stratagene, La Jolla, CA, USA). We spotted single spots for 25 792 of the 26 090 target oligonucleotides (due to software problems the print run was terminated shortly before completion); in addition, each subgrid contained six replicate spots of each of the four human negative controls, three spots of equally distributed Operon-negative controls, a single spot of each of the 12 housekeeping controls, and a GFP marker on each corner of the subgrid. The location of each oligo is given in the platform file deposited to the GEO database (series GSE2000).

RNA labeling and microarray hybridization

Total RNA was used for a direct labeling procedure. Total RNA (80 μg) was incubated with 0.27 μM T17VN primer, 0.15 mM dATP, dCTP, and dGTP, 0.05 mM dTTP (Invitrogen), 0.025 mM Cyanine 3- or Cyanine 5-conjugated dUTP (Amersham, Piscataway, NJ, USA), 40 U RNAseInh (Promega, San Luis Obispo, CA, USA), and 400 U SuperscriptII (Invitrogen) in 10 mM DTT and 1x first strand buffer in a total volume of 40 μl. In addition, 0.3 fmol human cRNAs complementary to the human negative control oligonucleotides were used in labeling reactions (HsD17B1, KRT1, and MB). Prior to the addition of enzymes the solution was heated to 65°C for 5 min and then cooled to 42°C for primer annealing. Following an incubation at 42°C for 2.5 h, the RNA was degraded with 8 μl 1 M sodium hydroxide for 15 min at 65°C, neutralized with 8 μl 1 M hydrochloric acid and buffered with 4 μl 1 M Tris, pH 7.5. Subsequently, the labeled cDNA was purified using a PCR purification kit according to the manufacturer's protocol (Qiagen, Mississauga, ON, Canada). DNA was eluted in 100 μl 10 mM Tris, pH 8.5, the two labeling reactions were combined, and 1 μl Cyanine 5-labeled GFP was added. Following an ethanol/sodium acetate precipitation the air-dried cDNA pellet was resuspended in 3 μl water, denatured at 95°C for 3 min, added to 50 μl pre-warmed array hybridization buffer no. 1 (Ambion, Austin, TX, USA), and kept at 65°C until further use. We pre-hybridized microarray slides for 45 min at 48°C in 5x SSC, 0.1% SDS, 0.2% BSA. Slides were washed twice with water for 1 min, dipped five times in isopropanol, and spun dry in Falcon tubes at 100 g for 3 min. The hybridization solution was applied to the microarray slides and covered with untreated glass cover slips (Fisher Scientific, Nepean, ON, Canada). Arrays were incubated overnight in CMT hybridization chambers (Corning, Corning, NY, USA) submerged in a water bath at 42°C with moderate vertical shaking. 
Hybridization chambers were disassembled and slides were washed for 15 min at 42°C in 2x SSC, 0.5% SDS, and twice for 15 min in 0.5x SSC, 0.5% SDS. Subsequently, arrays were dipped five times in 0.1x SSC and spun dry as described above. Microarrays were scanned with a ScanArray Express (Perkin-Elmer, Woodbridge, ON, Canada) scanner with laser power set to 95% and photo-multiplier tube set to 54–64%.

Global expression profiling analysis

We identified and quantified spots using the ImaGene software (BioDiscovery, Marina Del Rey, CA, USA). Grids were manually placed and spot finding was performed using the ‘Auto adjust’ spot function repeated three times. Spot finding was subsequently verified by visual inspection and manually adjusted when necessary. Poor spots were manually flagged (flag 1) and were not used in further data analyses. For all analyses, the median pixel intensities for each spot were used. The raw data files were deposited into the GEO database (series GSE2000). Further analyses were performed with gene-specific elements only using customized scripts for R and Bioconductor (The R Development Core Team). For background correction, we defined the mean of the lowest 10% of spot intensities from a particular subgrid as the background for that subgrid. This mean was subtracted from each spot in the subgrid. We normalized using Loess curves (Yang et al., 2002). For each element, we first used the data from the four replicate arrays for each sample to perform a paired Student's t-test using the Welch approximation to the degrees of freedom. Subsequently, an ANOVA using data from all experimental samples was performed for each element. In order to assess the type I error rate, we calculated q-values estimating the false-discovery rate based on the parametric P-values (Storey and Tibshirani, 2003). We then categorized genes by using a quadratic polynomial regression model (Bryan, 2004) using the ratios for the samples from 10 cm stems only. Genes of interest should follow a smooth path across the four section types that should be describable by some section of a parabola with positive or negative leading coefficient. Each unique element gets its own model of the form y = β0 + β1*x + β2*x². The set of responses for each model, the y's, are the 16 log ratios from the 16 arrays where the treatment is material from the 10 cm sections. 
We numbered the four different section types, the x's, as −1.5, −0.5, 0.5, and 1.5 in increasing order from earliest to latest stages of stem development. This causes the linear and quadratic terms in our models, β1 and β2 respectively, to be uncorrelated which in turn allows inferring prevalence of genes with one expression pattern compared to another. We initially fitted a quadratic model to the set of responses. If the quadratic term had a P-value >0.05, a linear model was refitted. Overall, only genes with a P-value <0.01 for the model F-statistic were deemed significant. We can then categorize the genes according to properties of β1 and β2. Among the genes for which both β1 and β2 were deemed significantly different from zero, β1 and β2 may each be positive or negative so these genes give rise to four categories. We obtain two more categories from the genes for which β1 is either positive or negative and β2 is not significant and another two categories from the genes for which β2 is either positive or negative and β1 is not significant. We thus have eight mutually exclusive categories into which we may place any gene for which at least one of β1 and β2 is deemed significant (F-statistic: P < 0.01). Values obtained for F-statistic P, β1, p(β1), β2, p(β2), and β0 for all genes are given in Table S1.
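The categorization rules above can be illustrated with a short Python sketch. It is not the authors' R/Bioconductor code: the function names, the two-character category labels, and the use of the 0.05 threshold for β1 as well as β2 are assumptions made for illustration, and the coefficients and P-values are taken as inputs rather than estimated. The sketch also checks numerically why the centered section codes make the linear and quadratic terms uncorrelated.

```python
# Section codes from the text: youngest (-1.5) to oldest (1.5) stem section.
X_CODES = [-1.5, -0.5, 0.5, 1.5]


def correlation(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def categorize(beta1, p_beta1, beta2, p_beta2, p_model,
               alpha_model=0.01, alpha_term=0.05):
    """Return a two-character category label, or None if the gene is excluded.

    First character: sign of the linear term (beta1); second character: sign
    of the quadratic term (beta2); "0" marks a non-significant term.
    alpha_term = 0.05 for both terms is an assumption of this sketch.
    """
    if p_model >= alpha_model:          # model F-statistic must have P < 0.01
        return None
    sign1 = ("+" if beta1 > 0 else "-") if p_beta1 < alpha_term else "0"
    sign2 = ("+" if beta2 > 0 else "-") if p_beta2 < alpha_term else "0"
    if sign1 == "0" and sign2 == "0":   # at least one term must be significant
        return None
    return sign1 + sign2


# Centered codes: x and x^2 are uncorrelated; naive codes 1..4 are not.
centered_corr = correlation(X_CODES, [x ** 2 for x in X_CODES])
naive_corr = correlation([1, 2, 3, 4], [1, 4, 9, 16])

# Enumerate every label the rules can produce (hypothetical coefficient values).
all_labels = {
    categorize(b1, p1, b2, p2, p_model=0.001)
    for b1 in (-1.0, 1.0) for p1 in (0.001, 0.5)
    for b2 in (-1.0, 1.0) for p2 in (0.001, 0.5)
} - {None}
```

With these rules, a gene whose model F-statistic has P = 0.014 (as reported for MYB61 in the text) returns None regardless of its coefficients, and the enumeration yields exactly eight labels.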

Gene family analyses

Based on the available literature we first collected all plant protein sequences for a gene family of interest. These were used to search the Arabidopsis proteome using WU-BLAST at TAIR. The resulting Arabidopsis sequences were again searched against the Arabidopsis proteome at TAIR and the results were combined with FASTA scores for the same genes at MAtDB (Schoof et al., 2002) to identify all protein sequences with sequence similarity for the gene families analyzed. Instead of using a pre-defined e-value as a cut-off point, alignments were inspected manually and cut-off points were determined individually based on sequence similarity and length of the resulting hits. All protein sequences were aligned using the DiAlign algorithm (Morgenstern et al., 1998) provided by Genomatix (München, Germany). The resulting pairwise similarity matrices were used to identify protein sequence identities given in the corresponding Supplementary Tables. Similarities are given to the most closely related described plant protein based on phylogenetic analyses using parsimony methods (PAUP v4.0; Sinauer Associates, Sunderland, MA, USA). If proteins from Arabidopsis have been published, they were given preference. For cluster analyses of gene families, the expression data for these genes were first filtered based on the ANOVA P-value and only genes with P < 0.05 (q < 0.08) were used. The mean values of the normalized expression ratios were subjected to a hierarchical cluster analysis with complete linkage using Genesis v1.2 (Institute for Biomedical Engineering, Graz University of Technology, Graz, Austria).
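The filtering and clustering step can be sketched in plain Python as follows. This is an illustration only, not a reimplementation of Genesis: the Euclidean distance, the stopping rule (a fixed number of clusters), the function names, and the gene names and values in the example are all assumptions.

```python
def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def filter_by_anova(profiles, pvalues, alpha=0.05):
    """Keep only genes whose ANOVA P-value is below alpha (P < 0.05 in the text)."""
    return {gene: prof for gene, prof in profiles.items() if pvalues[gene] < alpha}


def complete_linkage(profiles, n_clusters):
    """Agglomerative clustering; cluster distance = largest pairwise distance."""
    clusters = [[gene] for gene in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return clusters


# Hypothetical mean log2 ratios across four stem sections.
profiles = {
    "geneA": [0.0, 1.0, 2.0, 3.0],
    "geneB": [0.0, 1.1, 2.1, 2.9],
    "geneC": [0.0, -1.0, -2.0, -3.0],
    "geneD": [0.0, -0.9, -2.2, -3.1],
    "geneE": [0.0, 0.1, -0.1, 0.0],
}
pvalues = {"geneA": 0.001, "geneB": 0.01, "geneC": 0.002,
           "geneD": 0.03, "geneE": 0.2}

kept = filter_by_anova(profiles, pvalues)
clusters = complete_linkage(kept, n_clusters=2)
```

In this toy example the non-significant gene is dropped before clustering, and the upregulated and downregulated profiles end up in separate clusters.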

Quantitative real-time RT-PCR

Total RNA (15 μg) was first digested with 15 U DNAse in 1x buffer (Invitrogen) for 15 min at room temperature. The reaction was stopped with EDTA (2.3 mM final concentration) and heat inactivation (65°C, 10 min). RNA was precipitated with 2.5 volumes of ethanol and a 1/10 volume of 3 M sodium acetate at −20°C overnight, and subsequently pelleted at 20 000 g for 30 min at 4°C. The precipitate was washed with 70% ethanol, re-centrifuged, air-dried, and resuspended in RNAse free water to an approximate concentration of 1 μg/μl. Actual concentration was determined spectrophotometrically. Total RNA (10 μg) was used for reverse transcription with 0.27 μM T17VN primer, 0.15 mM dNTPs, 40 U RNAseOut, and 400 U SuperscriptII (Invitrogen) in 10 mM DTT and 1x first strand buffer in a total volume of 40 μl. Prior to addition of enzymes the solution was heated to 65°C for 5 min and then cooled to 42°C for primer annealing. Following an incubation at 42°C for 2.5 h, the RNA was degraded with 8 μl 1 M sodium hydroxide for 15 min at 65°C, neutralized with 8 μl 1 M hydrochloric acid and buffered with 4 μl 1 M Tris, pH 7.5. For quantitative PCR reactions, 2 μl cDNA (the cDNA equivalent of 200 ng total RNA) was incubated with 10 μl QuantiTect SYBR Green Mastermix (Qiagen) and 30 nmol of each forward and reverse primer in a total volume of 20 μl. Oligonucleotide sequences of all primers are given in Table S6. After an initial denaturation at 95°C for 15 min, 35 cycles at 95°C for 30 sec, 60°C for 30 sec, and 72°C for 25 sec followed by a fluorescence reading were performed. After a final incubation at 72°C for 5 min, a melting curve was generated ranging from 95 to 52°C. Threshold cycles (CT) were adjusted manually, and the resulting CT values were subtracted from CT values obtained for an actin2 probe amplified in parallel on each plate, thus generating normalized ΔCT values. 
ΔCT values obtained for RNA from the 0–2 cm section were subtracted from ΔCT values for each sample, thereby generating the equivalent of log2-ratios comparing expression levels in each sample with the 0–2 cm sample as a reference. These ratios were visualized as heatmaps using Genesis v1.2 (Institute for Biomedical Engineering, Graz University of Technology).
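The normalization arithmetic described above can be written out as a minimal Python sketch. The function names and the CT values in the example are hypothetical; the sign convention follows the text (target CT subtracted from actin CT), and the conversion to fold change assumes exponential amplification with a doubling per cycle.

```python
def delta_ct(ct_target, ct_actin):
    """Normalized dCT: target CT subtracted from the actin CT of the same sample."""
    return ct_actin - ct_target


def log2_ratio(ct_target, ct_actin, ct_target_ref, ct_actin_ref):
    """ddCT: dCT of the sample minus dCT of the 0-2 cm reference sample."""
    return delta_ct(ct_target, ct_actin) - delta_ct(ct_target_ref, ct_actin_ref)


def fold_change(ddct):
    """Fold change relative to the reference, assuming doubling per cycle."""
    return 2.0 ** ddct


# Hypothetical example: the target crosses threshold three cycles earlier in
# the sample than in the reference, while actin is unchanged.
ddct = log2_ratio(ct_target=22.0, ct_actin=18.0,
                  ct_target_ref=25.0, ct_actin_ref=18.0)
fold = fold_change(ddct)
```

Under this convention a positive value means higher expression in the sample than in the 0–2 cm reference, so a value of 3 corresponds to an eightfold increase.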

Chemical analysis of lignin

The chemical composition of Arabidopsis stems was determined according to a modified micro-Klason analysis (Huntley et al., 2003). In brief, 0.1 g freeze-dried stems were ground to pass a 40-mesh screen using a Wiley mill, soxhlet-extracted with acetone for 6 h, digested with 72% H2SO4 for 2 h, and then hydrolyzed in 4% H2SO4 for 1 h at 121°C. The total weight of extractable components was determined gravimetrically (acid-insoluble lignin), while the filtrate was analyzed for acid-soluble lignin by absorbance at 205 nm according to TAPPI Useful Method UM250. Thioacidolysis of extracted material was conducted according to published methods (Lapierre et al., 1999), with the volumes scaled to accommodate 20 mg of starting material.


This project was financially supported by Genome British Columbia/Genome Canada and the Province of British Columbia (funds to J.B., C.J.D., B.E.E., and K.R.), and by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants to C.J.D. and A.L.S. We thank Jeff Zeznik for crucial input in microarray design and Kim Wiegand of the Genome BC Gene Array Facility headed by Colleen Nelson for technical assistance and printing of the microarrays. We appreciate fruitful discussions with Michael Friedmann, Dae Kyun Ro, and Björn Hamberger.

Supplementary Material

The following material is available from

Figure S1. Distribution of parametric P-values based on t-statistics.
Four replicate microarrays were co-hybridized with labeled cDNA from 0–2 cm stem sections (reference) and (a) 0–3 cm, (b) 2–4 cm, (c) 3–5 cm, (d) 5–7 cm, and (e) 7–9 cm stem sections. Loess-normalized log2-transformed expression ratios for each element were used for t-test statistics. Histograms describe the distribution of observed P-value frequencies. A blue line indicates the fraction of expected false positives in each P-value bin.
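How such a histogram and its expected false-positive level can be computed is sketched below in Python. The exact procedure behind the blue line is not described here, so the sketch assumes that P-values of non-differentially expressed elements are uniformly distributed and estimates their proportion (π0) at a single fixed λ of 0.5, a simplification of the approach of Storey and Tibshirani (2003). Function names and example values are hypothetical.

```python
def pvalue_histogram(pvalues, n_bins=20):
    """Count P-values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 goes in the last bin
        counts[idx] += 1
    return counts


def estimate_pi0(pvalues, lam=0.5):
    """Estimated proportion of true nulls: #{p > lam} / (m * (1 - lam))."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def expected_null_per_bin(n_tests, n_bins=20, pi0=1.0):
    """Expected number of null P-values per bin under a uniform null."""
    return pi0 * n_tests / n_bins


# Hypothetical P-values: an excess near zero plus a roughly flat tail.
pvals = [0.001, 0.002, 0.01, 0.03, 0.25, 0.45, 0.65, 0.85, 0.93, 0.97]
counts = pvalue_histogram(pvals, n_bins=10)
pi0 = estimate_pi0(pvals)
expected = expected_null_per_bin(len(pvals), n_bins=10, pi0=pi0)
```

Bins whose observed count clearly exceeds the expected null count, typically those closest to zero, are where differentially expressed elements are enriched.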

Figure S2. Functional categories of differentially expressed genes.
Genes placed into each of the expression categories shown in Figure 2 were used to screen the Functional Catalogue (FunCat) terms (Ruepp et al., 2004) using the MAtDB database (MIPS Arabidopsis thaliana Data Base). The frequencies (as a percentage of the total number of genes in that expression category) for the lowest hierarchical levels found to be over-represented (see text for details) are shown as blue bars. For comparison, the frequency of the same FunCat term among all 22 890 probes that are present on the array and for which a locus identifier is available is shown as white bars. Functional groupings depicted in (a)–(h) correspond to the respective expression clusters shown in Figure 2(a–h).

Figure S3. Validation of selected transcription factor and ABC transporter genes using quantitative real-time RT-PCR.
Real-time RT-PCR using SYBR Green with gene-specific primers (Table S6) was performed for the genes indicated to the right, with cDNA derived from RNA of the samples indicated at the top of the figure. As a control, expression of a gene encoding tubulin (TUB9) was included. As an internal standard, the ACTIN3 cDNA was amplified in parallel with each of the target genes. Threshold detection cycles (CT) were normalized using the corresponding actin CT values to generate ΔCT values. The ΔCT value for each gene was then compared with the ΔCT value obtained for the 0–2 cm sample, which served as the reference in the microarray experiments and was therefore also used as the reference here. The resulting ΔΔCT values, which depict the expression level in each sample relative to that in the 0–2 cm sample, are shown as heatmaps. Each RT-PCR was performed in duplicate, and results from both replicates are shown for each gene. Red indicates ΔΔCT values larger than 3 (eightfold higher in the sample than in the 0–2 cm sample, assuming exponential amplification for both genes), yellow indicates identical ΔCT values (ΔΔCT of 0, no difference), and blue indicates ΔΔCT values smaller than −3 (eightfold lower in the sample than in the 0–2 cm sample).

Table S1 Expression data for the 25,792 gene set

Table S2 Differentially expressed transcription factor genes grouped by gene family and expression category

Table S3 Locus information, gene names used in this study, and previously used gene names for monolignol biosynthesis genes in Arabidopsis thaliana

Table S4 Locus information, gene names used in this study, and source reference for genes potentially involved in monolignol transport, dehydrogenation, and polymerization

Table S5 Locus information and gene names used in this study for shikimate pathway genes in Arabidopsis thaliana

Table S6 Primer sequences used for real-time PCR analysis of transcription factor candidate genes