The integration of metabolomics and transcriptomics can provide precise information on gene-to-metabolite networks for identifying the function of unknown genes unless there has been a post-transcriptional modification. Here, we report a comprehensive analysis of the metabolome and transcriptome of Arabidopsis thaliana over-expressing the PAP1 gene encoding an MYB transcription factor, for the identification of novel gene functions involved in flavonoid biosynthesis. For metabolome analysis, we performed flavonoid-targeted analysis by high-performance liquid chromatography-mass spectrometry and non-targeted analysis by Fourier-transform ion-cyclotron mass spectrometry with an ultrahigh-resolution capacity. This combined analysis revealed the specific accumulation of cyanidin and quercetin derivatives, and identified eight novel anthocyanins from an array of putative 1800 metabolites in PAP1 over-expressing plants. The transcriptome analysis of 22 810 genes on a DNA microarray revealed the induction of 38 genes by ectopic PAP1 over-expression. In addition to well-known genes involved in anthocyanin production, several genes with unidentified functions or annotated with putative functions, encoding putative glycosyltransferase, acyltransferase, glutathione S-transferase, sugar transporters and transcription factors, were induced by PAP1. Two putative glycosyltransferase genes (At5g17050 and At4g14090) induced by PAP1 expression were confirmed to encode flavonoid 3-O-glucosyltransferase and anthocyanin 5-O-glucosyltransferase, respectively, from the enzymatic activity of their recombinant proteins in vitro and results of the analysis of anthocyanins in the respective T-DNA-inserted mutants. The functional genomics approach through the integration of metabolomics and transcriptomics presented here provides an innovative means of identifying novel gene functions involved in plant metabolism.
Plants produce a huge array of compounds that are potentially useful in developing novel medicines, flavors, industrial materials as alternatives for fossil fuel resources, and other specialty chemicals. Cumulatively plants are thought to produce about 200 000 natural products (Dixon and Strack, 2003). Unfortunately, only a limited number of genes involved in the production of these plant metabolites have been identified by classical genetic screening of mutants and enzyme purification.
The pap1-D mutant is a T-DNA activation-tagged line that overproduces anthocyanins through ectopic over-expression of the PAP1 gene, which encodes an MYB transcription factor. Over-expression is driven by an enhancer from the promoter of the cauliflower mosaic virus 35S transcript in the inserted T-DNA (Borevitz et al., 2000). In the pap1-D mutant, some structural genes for anthocyanin biosynthesis, such as those encoding phenylalanine ammonia lyase (PAL) and chalcone synthase (CHS), are expressed constitutively, and the accumulation of some phenylpropanoid derivatives such as anthocyanins is markedly enhanced (Borevitz et al., 2000). However, the transcriptome and metabolome have not been extensively characterized in this mutant. PAP1 over-expressing plants are an ideal model system for elucidating the whole cellular process at both transcriptome and metabolome levels under the expression of a single transcription factor.
The structures of flavonoids and their biosynthetic genes in A. thaliana have still to be completely elucidated. Recently, the structures of several anthocyanins (Bloor and Abrahams, 2002) and flavonol glycosides (Graham, 1998; Veit and Pauli, 1999) have been reported. Several genes encoding enzymes and regulatory proteins involved in the production of anthocyanins and proanthocyanidins have been isolated mainly by tt or ttg mutants of seed color (Winkel-Shirley, 2001). However, no genes encoding glycosyltransferase and acyltransferase for the modification of anthocyanin aglycones have been identified yet. For the identification of such genes involved in the production and modification of terminal metabolites in biosynthetic pathways, the combined analysis of transcripts and metabolites is a powerful technology (Jones et al., 2003).
Here, we performed the non-targeted comprehensive analysis of the metabolome and transcriptome of PAP1 over-expressing plants, with the following questions in mind: (1) what is the role of a single transcription factor in global gene expression and the subsequent cellular metabolite pattern? and (2) what are the specific gene-to-metabolite correlations resulting in the identification of the gene functions in the Arabidopsis genome? To answer these questions, we studied metabolomics by LC-MS for the targeted metabolite analysis of approximately 21 compounds combined with FT-MS for the non-targeted metabolite profiling of approximately 1800 putative metabolites, and transcriptomics using the DNA microarrays covering 22 810 genes of the Arabidopsis genome. We could show that a set of genes involved in anthocyanin accumulation were upregulated together with the production of cyanidin-type anthocyanins and quercetin-type flavonols; thus we determined induced gene functions in production of these compounds. Subsequently, two genes coding for flavonoid glucosyltransferases were identified by in vitro study using recombinant proteins and by anthocyanin analysis of T-DNA-inserted mutants. The present study shows a novel means of studying functional genomics through the integral analyses of the metabolome and transcriptome in plants.
Combined analysis by flavonoid-targeted and non-targeted methodologies indicates specific overaccumulation of cyanidin and quercetin derivatives and weak effects on global metabolome profiles by PAP1
Metabolome analysis involved a combination of flavonoid-targeted analysis by LC-MS, amino acid analysis by high-performance liquid chromatography (HPLC), anion and sugar analysis by capillary electrophoresis, and non-targeted large-scale metabolite analysis by FT-MS.
Anthocyanins. The flavonoid accumulation profiles of seven samples were analyzed by HPLC/photodiode array detection/electrospray ionization mass spectrometry (HPLC/PDA/ESI-MS). These samples included: (1) wild-type leaves grown on agar (WLA); (2) pap1-D leaves grown on agar (PLA); (3) PAP1 cDNA over-expressing transgenic Arabidopsis leaves grown on agar (OLA); (4) wild-type leaves grown on vermiculite (WLV); (5) pap1-D leaves grown on vermiculite (PLV); (6) wild-type roots grown on agar (WRA); and (7) pap1-D roots grown on agar (PRA).
The metabolites were putatively identified by comparing their UV-visible absorption spectra and their mass fragmentation patterns, obtained by tandem MS, with those of known compounds and reported data (cited in Table 1). Twenty-one peaks were detected, 17 of which were identified in the leaves and roots (Figures 1 and 2, Table 1). Eleven anthocyanin pigments (A1-A11) accumulated in the leaves of the PAP1 over-expressing lines (the pap1-D mutant and PAP1 cDNA over-expressing plant) (Figure 2c,e,i, Table 1). However, these pigments were only detected at trace levels in the wild-type plant (Figure 2a,g,k, Table 1). Among them, A5, A9 and A11 were the major anthocyanins in the leaves of the PAP1 over-expressing lines grown on agar and vermiculite.
Table 1. The flavonoid profiles in acidic MeOH-H2O extracts of the wild-type plant and PAP1 over-expressing lines
Column groups (nmol g−1 FW): Leaf, agar (wild type: WLA); Leaf, soil (wild type: WLV); Root, agar (wild type: WRA).
Flavonoids were quantified by measuring peak area (anthocyanins at 520 nm; flavonols at 320 nm) using standard curves of reference compounds (cyanidin for cyanidin derivatives; kaempferol for kaempferol glycosides; quercetin for quercetin glycosides; kaempferol for unknown flavonol derivatives). Cy, cyanidin; Km, kaempferol; Qr, quercetin; Glc, glucose; Xyl, xylose; Rha, rhamnose; Cou, p-coumaroyl moiety; Mal, malonyl moiety; Sin, sinapoyl moiety.
In the leaves, the total anthocyanin content of the pap1-D mutant was 50 times (grown on agar) and 11 times (grown on vermiculite) higher than that of the wild-type plant grown under the same condition. A11 accounted for approximately 75 and 44% of the total anthocyanin in the wild-type plant and pap1-D mutant grown on agar, respectively. A11 is the most highly modified anthocyanin, with four sugar residues and three acyl moieties attached to its molecule.
In the roots, five anthocyanins (A1, A2, A3, A5 and A8) accumulated in pap1-D mutant grown on agar (Figure 2m, Table 1). A5 was the most abundant anthocyanin amounting to approximately 74% of the total anthocyanin in the wild-type plant, and to approximately 79% of the total anthocyanin in the pap1-D mutant. The total anthocyanin in the roots of the pap1-D mutant is 14 times as high as that in the roots of the wild-type plant. Anthocyanins attached to a sinapoyl moiety (A4, A7, A9, A10 and A11) were not detected in roots, suggesting the lack of sinapoyl transferase activity or the very low supply of sinapoyl-CoA in roots.
Flavonols. In addition to anthocyanins, three kaempferol glycosides (F1–F3), three quercetin glycosides (F4–F6), and four unknown flavonol glycosides (F7–F10) were detected and identified (Figures 1 and 2, Table 1).
In the wild-type leaves grown on agar, the kaempferol dirhamnoside F1 is the major flavonol amounting to approximately 51% of the total flavonol (Figure 2b, Table 1). However, in the leaves of PAP1 over-expressing lines, F1 accumulation was repressed (Figure 2d,f,j, Table 1) amounting to less than approximately 37% of the total flavonol. The amounts of the other kaempferol glycosides F2 and F3 in leaves were almost the same in the wild-type plant and pap1-D mutant grown on agar.
Quercetin glycosides (F4–F6) accumulated more in the leaves of the PAP1 over-expressing lines than in those of the wild-type plant. The total quercetin glycoside in the leaves of the pap1-D mutant is more than 10 times as high as that in the leaves of the wild-type plant.
Higher levels of F5 and F6 accumulated in the roots (Figure 2l,n) than in the leaves. F5 was the major flavonol, amounting to approximately 41% of the total flavonol in the roots. In contrast to those in the leaves, no marked differences in the amounts of quercetin glycosides in the roots were observed between the wild-type plant and the pap1-D mutant. Trace amounts of F7–F10 were also found in the roots. The levels of these flavonol glycosides were the same in the roots of the wild-type plant and the pap1-D mutant. In general, lower amounts of flavonols accumulated in the leaves than in the roots. The exceptions were the flavonol 3-O-rhamnoside and 7-O-rhamnosides (F1 and F4), which were detected only in the leaves.
Amino acids, sugar and anions. In the PAP1 over-expressing lines, no significant changes in the levels of 16 amino acids were observed by HPLC with fluorescent detection, as well as in the amounts of 12 anions and sugars detected by capillary electrophoresis.
Non-targeted analysis by FT-MS. Non-targeted FT-MS metabolite analysis was conducted on seven leaf and root samples of the wild-type plant and PAP1 over-expressing lines grown on either agar or vermiculite. To identify the key determinant factors of the metabolome, principal component analysis (PCA) was conducted with approximately 1800 peaks of non-targeted FT-MS analysis and targeted anthocyanin metabolites (Figure 3). By this analysis (Figure 3a), seven experimental groups each of three independent plant lines were classified into three major clusters: leaves grown on agar (WLA, PLA and OLA), roots grown on agar (WRA and PRA) and leaves grown on vermiculite (WLV and PLV).
The first component of the PCA results (76% variance) predominantly reflects the difference in the type of organ (leaf or root), and the second component (9% variance) primarily indicates a difference in growth conditions (agar or vermiculite) as well as a secondary reflection of the total anthocyanin content (wild or pap1-D). Two major clusters (leaf on vermiculite and root on agar) formed two separate groups each reflecting two different genotypes (wild and pap1-D). This is presumably due to the small but significant difference in total anthocyanin content between the wild type and pap1-D plants as detected by FT-MS, supporting the results of the LC-MS analysis.
Altogether, these results suggest that the major determinant factors of the metabolome were the type of organ (leaf or root) and growth condition (agar or vermiculite). This implies that the global metabolome profiles of PAP1 over-expressing lines are relatively similar to those of wild-type plants despite the marked difference in total anthocyanin observed. Indeed, as shown in Figure 3(b), the PCA results of the anthocyanin-targeted analysis indicate that the major determinant factor of anthocyanin patterns is the genotype of plants reflected to the first component. The PAP1 over-expressing lines form three distinct clusters: (1) root on agar; (2) leaf on agar; and (3) leaf on vermiculite.
In contrast, the wild-type plants form a single cluster regardless of the type of organ and growth condition, exhibiting only slightly affected anthocyanin patterns. These results suggest that the PAP1 gene regulates anthocyanin accumulation in a relatively specific manner, causing only a small change in the metabolome.
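The kind of PCA summary described above (scores along the first component and the fraction of variance it explains) can be illustrated with a small sketch. This is not the GeneLinker Gold procedure used in this study; it is a minimal pure-Python power-iteration example, and the sample names and intensity values are hypothetical.

```python
# Minimal PCA sketch: first principal component by power iteration.
# Assumption: rows are samples, columns are peak intensities (or fold changes).
# All values below are hypothetical and are not data from this study.

def center_columns(rows):
    """Subtract the column mean from every value."""
    n_rows, n_cols = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in rows]

def project(rows, direction):
    """Score of each row along the given direction."""
    return [sum(x * d for x, d in zip(row, direction)) for row in rows]

def first_principal_component(rows, iterations=200):
    """Return (direction, scores, explained variance fraction) for PC1."""
    centered = center_columns(rows)
    n_cols = len(centered[0])
    # Arbitrary non-uniform start vector; it must not be orthogonal to PC1.
    direction = [float(j + 1) for j in range(n_cols)]
    for _ in range(iterations):
        scores = project(centered, direction)
        # Multiply by X^T X without building the covariance matrix.
        updated = [sum(s * row[j] for s, row in zip(scores, centered))
                   for j in range(n_cols)]
        norm = sum(c * c for c in updated) ** 0.5
        direction = [c / norm for c in updated]
    scores = project(centered, direction)
    total = sum(c * c for row in centered for c in row)
    explained = sum(s * s for s in scores) / total
    return direction, scores, explained

# Hypothetical intensities of three peaks in two leaf and two root samples.
samples = {
    "leaf_1": [10.0, 1.0, 5.0],
    "leaf_2": [11.0, 1.2, 5.1],
    "root_1": [1.0, 9.0, 5.0],
    "root_2": [1.2, 10.0, 4.9],
}
direction, scores, explained = first_principal_component(list(samples.values()))
for name, score in zip(samples, scores):
    print(name, round(score, 2))
print("fraction of variance on PC1:", round(explained, 3))
```

In this toy data set the leaf and root samples fall on opposite sides of the first component, mirroring the organ-type separation reported for Figure 3(a). The sign of the scores is arbitrary, as in any PCA.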
Transcriptome analysis using DNA microarrays indicates upregulated expression of novel genes by PAP1
The transcript levels of 22 810 genes on the Arabidopsis Genome ATH1 GeneChip array were determined. Details of the experimental designs and procedures of chip hybridizations are summarized as a web supplementary file (Table S1 in online data) compliant with the MIAME checklist format (http://www.mged.org/Workgroups/MIAME/miame.html).
Hybridizations were conducted for the samples of WLA (WT/leaf/agar), PLA (pap1-D mutant/leaf/agar), OLA (PAP1 over-expressing transgenic plant/leaf/agar), WRA (WT/root/agar) and PRA (pap1-D mutant/root/agar). Four different sets of comparison were made to sort out the candidate genes responsible for anthocyanin accumulation in PAP1 over-expressing lines. A fold increase or decrease in the normalized intensity was calculated for the following comparisons: PLA1 (PLA experiment 1) versus WLA1 (WLA experiment 1); OLA1 (OLA experiment 1) versus WLA1 (WLA experiment 1); PLA2 (PLA experiment 2) versus WLA2 (WLA experiment 2); and PRA versus WRA. Figure S1 shows a scatter plot of PLA1 versus WLA1 as a typical example of the comparisons.
To identify genes exhibiting reproducible changes in expression, genes whose expression was more than 1.5-fold higher in the PLA1 versus WLA1, OLA1 versus WLA1 and PLA2 versus WLA2 comparisons were selected as induced genes, whereas genes whose expression was less than 0.66-fold in the same comparisons were selected as repressed genes. The results are illustrated as Venn diagrams in Figure 4.
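The selection rule above can be sketched as a simple filter. The example assumes that a gene must pass the threshold in all three leaf comparisons (the intersection of the Venn diagram); the gene names and fold-change values are hypothetical placeholders, not values from Table 2 or Table S2.

```python
# Sketch of the fold-change selection, assuming a gene must pass the
# threshold in every comparison to count as reproducibly changed.
# Gene names and ratios are hypothetical.

INDUCED_THRESHOLD = 1.5    # fold change above which a gene counts as induced
REPRESSED_THRESHOLD = 0.66 # fold change below which a gene counts as repressed

def classify_genes(fold_changes,
                   induced_threshold=INDUCED_THRESHOLD,
                   repressed_threshold=REPRESSED_THRESHOLD):
    """Split genes into induced and repressed sets.

    fold_changes maps a gene name to its ratios in the three comparisons
    (PLA1/WLA1, OLA1/WLA1, PLA2/WLA2).
    """
    induced, repressed = set(), set()
    for gene, ratios in fold_changes.items():
        if all(r > induced_threshold for r in ratios):
            induced.add(gene)
        elif all(r < repressed_threshold for r in ratios):
            repressed.add(gene)
    return induced, repressed

# Hypothetical ratios in the order PLA1/WLA1, OLA1/WLA1, PLA2/WLA2.
example = {
    "gene_A": (4.2, 3.8, 5.1),   # above 1.5 in all three: induced
    "gene_B": (2.0, 1.2, 2.4),   # fails one comparison: not selected
    "gene_C": (0.4, 0.5, 0.6),   # below 0.66 in all three: repressed
    "gene_D": (1.0, 0.9, 1.1),   # unchanged
}
induced, repressed = classify_genes(example)
print(sorted(induced), sorted(repressed))
```

Requiring agreement across independent comparisons trades sensitivity for reproducibility: a gene such as the hypothetical gene_B, which responds in only two of three comparisons, is left out.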
Only a small portion of several paralogous genes for each biosynthetic enzyme were upregulated in the PAP1 over-expressing lines, suggesting that induced genes in these lines encode functional proteins involved in anthocyanin production. Combined with the metabolite profiles, these results suggest that the PAP1 gene specifically induces the expression of genes involved in anthocyanin production or accumulation, leading to anthocyanin accumulation.
Putative assignment of function of PAP1-upregulated genes
From the results of the metabolome and transcriptome analyses, we could putatively assign the function of PAP1-upregulated genes. In addition to the anthocyanin biosynthetic genes indicated above, several unconfirmed genes in certain gene families were upregulated. These include three glycosyltransferase-family genes (At5g54060, At4g14090 and At5g17050), two acyltransferase-family genes (At1g03940 and At3g29590), two glutathione S-transferase-family genes (At1g02930 and At1g02940) and two sugar-transporter-family genes (At1g34580 and At4g04750). Considering the accumulation of specific molecular species of anthocyanins in PAP1 over-expressing plants, the functions of these upregulated genes can be putatively assigned to be associated with the production of specific anthocyanin derivatives for their modification and transport (Figure 5).
Three glycosyltransferase genes are assigned to encode the proteins catalyzing one of four glycosylation reactions for the formation of the most extensively modified A11 anthocyanin. Two acyltransferases are assigned to one of three possible anthocyanin acyltransferases for the formation of A11 anthocyanin. Sugar-transporter-like proteins may be responsible for the uptake of anthocyanins into the vacuole. The AP2 domain transcription factor (At5g61600) and two Ca2+-binding EF-hand family proteins may be involved in the downstream regulation of anthocyanin biosynthesis by PAP1.
UGT78D2 and UGT75C1 as flavonoid 3-O-glucosyltransferase and anthocyanin 5-O-glucosyltransferase, respectively
Three glycosyltransferase genes, At5g54060 (UGT code; UGT79B1), At4g14090 (UGT75C1) and At5g17050 (UGT78D2), were induced in PAP1 over-expressing plants, suggesting the involvement of these three proteins in the modification of the sugar moieties of anthocyanins produced in PAP1 over-expressing plants. At5g17050 (UGT78D2) and At4g14090 (UGT75C1) were found to encode flavonoid 3-O-glucosyltransferase (3GT) and anthocyanin 5-O-glucosyltransferase (5GT), respectively.
Figure 6 shows the molecular phylogenetic tree of the amino acid sequences of the flavonoid glycosyltransferases. The phylogenetic tree shows that At5g17050 (UGT78D2) belongs to the subfamily of 3GT and At4g14090 (UGT75C1) to the subfamily of 5GT.
The T-DNA-inserted mutants of At5g17050 and At4g14090 were obtained from the collection of the Salk Institute (Alonso et al., 2003). Line 049338, designated ugt78d2, contained a T-DNA insertion in the second exon of At5g17050 (UGT78D2), and line 108458, designated ugt75c1, had a T-DNA insertion in the exon of At4g14090 (UGT75C1) (Figure 7a). The transcripts of At5g17050 and At4g14090 were not observed in the homozygotes of each T-DNA-inserted mutant (Figure 7b). In the homozygous ugt78d2 mutant, the total anthocyanin was reduced to 21% of that in the wild type, although the composition of the accumulated anthocyanins was the same as in the wild type (Figure 8a,c).
Because anthocyanin accumulation is also suppressed in the 3GT-deficient maize mutant (bz1) (Dooner et al., 1985; Fedoroff et al., 1984), the reduction in the anthocyanin level is most likely due to a decrease in UDP-glucose: cyanidin 3-O-glucosyltransferase activity. In addition to a reduction in the anthocyanin level, the pattern of accumulated flavonol glycosides also changed.
In the ugt78d2 mutant, the levels of four flavonol glycosides (F2, F3, F5 and F6) with glucose attached at the 3-position were reduced (Figure 8b,d). In contrast, the levels of two flavonol glycosides (F1 and F4) with a rhamnose residue attached at the 3-position were slightly elevated.
These results indicate that UGT78D2 is responsible for the glucosylation of both anthocyanins and flavonols at the 3-position. Furthermore, recombinant UGT78D2 with a 6X His tag at the N-terminus was produced in Escherichia coli BL-21 AI. UDP-glucose: cyanidin 3-O-glucosyltransferase activity was detected in the protein extract of E. coli expressing recombinant UGT78D2 (Figure 8j). Three anthocyanidins (cyanidin, pelargonidin and delphinidin) and three flavonols (kaempferol, quercetin and myricetin) were tested as substrates for the reaction catalyzed by recombinant UGT78D2. All of them were converted to the corresponding 3-glucosides (data not shown). These results indicate that UGT78D2 catalyzes the glucosylation of both cyanidin and flavonols at the 3-position as UDP-glucose: flavonoid 3-O-glucosyltransferase in planta.
The homozygous ugt75c1 mutant exhibited an altered anthocyanin pattern, accumulating six new anthocyanins, A12–A17, which are not produced in the wild-type plant (Figure 8e). Detailed investigation of the mass spectra obtained using MSn analysis indicated that A12, A13, A14, A15, A16 and A17 (Figure 8g) are A1, A5, A4, A8, A7 and A11 de-glucosylated at the 5-position, respectively, suggesting the lack of 5-glucosylation activity of anthocyanins in the ugt75c1 mutant. No substantial change was observed in the composition of flavonols (Figure 8f). These results clearly indicate that UGT75C1 is a functional UDP-glucose: anthocyanin 5-O-glucosyltransferase.
Holistic changes of metabolome and transcriptome caused by ectopic PAP1 expression
Ectopic PAP1 over-expression resulted in a marked overaccumulation of cyanidin-type anthocyanins and quercetin-type flavonols. Among the flavonoids, only the levels of kaempferol glycosides decreased in the PAP1 over-expressing lines, to approximately 30% of those in the wild-type plants. Regarding the intermediates of the biosynthetic pathways for such flavonoids, only phenylalanine did not exhibit a change in level; the other metabolic intermediates fell below the detection limits of the FT-MS and LC-MS analyses. Of the metabolites considered in this study, only the flavonoid patterns changed to an extent measurable with current technology.
Being associated with such metabolome changes, PAP1 expression resulted in the upregulation of almost all genes encoding anthocyanin biosynthetic enzymes (Figure 5). The expression of known flavonoid genes, such as TT4 (CHS) and TT5 (CHI) was upregulated in the leaves and roots of the PAP1 over-expressing lines.
In addition to these well-known anthocyanin biosynthetic genes, genes that are putatively annotated to anthocyanin biosynthetic genes, such as At5g05270 (CHI homologue) and At4g22870 (ANS homologue), were also upregulated. These paralogous genes, as well as previously characterized genes, are presumably involved in anthocyanin biosynthesis.
All these metabolome and transcriptome data suggest that PAP1 regulates flavonoid biosynthetic genes in a relatively specific manner, causing the accumulation of cyanidin- and quercetin-type flavonoids. This finding is in striking contrast to that of a recent study of the anthocyanin-accumulating pho3 mutant of the sucrose transporter gene, wherein the expression of a wide array of genes changed (Lloyd and Zakhleniuk, 2004).
Functional identification of two flavonoid glycosyltransferases
In the Arabidopsis genome, 107 UDP-sugar-dependent glycosyltransferase genes are present (Bowles, 2002). Only a few of them, however, have been functionally characterized. In our present study, two glycosyltransferases, UGT78D2 (At5g17050) and UGT75C1 (At4g14090), were predicted to be involved in anthocyanin biosynthesis, and these were subsequently identified as flavonoid 3-O-glucosyltransferase and anthocyanin 5-O-glucosyltransferase, respectively.
As the mutant of UGT78D2 (ugt78d2) still accumulated a small amount of anthocyanins, the presence of a secondary flavonoid 3-O-glucosyltransferase activity was suggested. UGT78D1, which is structurally similar to UGT78D2, has recently been identified as flavonoid 3-O-rhamnosyltransferase using UDP-rhamnose as the sugar donor (Jones et al., 2003). Both proteins belong to the same phylogenetic group of flavonoid 3-O-glycosyltransferases. However, the specificities of UGT78D1 and UGT78D2 toward UDP-sugar are strict, as determined from the distinct flavonoid accumulation patterns of mutants lacking the gene for each protein.
UGT75C1 belongs to the phylogenetic group of anthocyanin 5-O-glucosyltransferases together with functionally identified anthocyanin 5-O-glucosyltransferases from various plant species (Yamazaki et al., 1999, 2002). UGT75C1 is functionally non-redundant in A. thaliana, because its mutant (ugt75c1) completely lacks anthocyanin 5-O-glucosides.
Predicted functions of genes upregulated by PAP1 over-expression
In addition to the two glycosyltransferase genes functionally identified in our present investigation, two other genes, At5g54060 (UGT79B1) and At3g21560 (UGT84A2), were induced in PAP1 over-expressing lines, suggesting the possible participation of the proteins encoded by these genes in the production of anthocyanins. Due to the weak induction of At3g21560 by PAP1, this gene is not listed in Table 2; however, the induction in pap1-D was reproducible (Table S2). As the most extensively modified anthocyanin molecule A11 possesses, in addition to 3-O-glucose and 5-O-glucose, a xylose residue attached at the C2-position of 3-O-glucoside and a glucose residue attached at the p-position of a coumaroyl group, two unidentified proteins, UGT79B1 and UGT84A2, are assumed to be responsible for either of these two extra sugar attachments. Considering the differences in the pattern of anthocyanin accumulation and gene expression profile between the leaves and roots, UGT79B1 is assumed to be most likely responsible for xylosyltransfer to the C2-position of glucose, and UGT84A2 for glucosyltransfer to the p-position of a coumaroyl group. The clustering in the molecular phylogenic tree of the glycosyltransferase family is also consistent with these assumptions.
The Arabidopsis genome contains approximately 70 genes associated with acyl-CoA-dependent acyltransferase (Dudareva and Pichersky, 2000). Two putative acyltransferase genes, At1g03940 and At3g29590, were upregulated by PAP1 expression. The most extensively modified anthocyanin A11 contains three acyl groups: sinapoyl, p-coumaroyl and malonyl. Taking into account the distinct patterns of the expression of the two genes and anthocyanin accumulation in the leaves and roots, At1g03940 and At3g29590 would either be malonyltransferase or p-coumaroyltransferase. The patterns of gene expression and anthocyanin accumulation in stressed plants by sucrose treatment and UV irradiation (data not shown) suggest that sinapoyltransferase is expressed constitutively in such plants.
Glutathione S-transferase (GST) is required for the vacuolar sequestration of anthocyanin in maize (Bz2; Marrs et al., 1995) and petunia (An9; Alfenito et al., 1998). In the Arabidopsis genome, 47 GST family genes are present (Dixon et al., 2002). Recently, Arabidopsis GST TT19 (At5g17220, GST code; AtGSTF12) has been isolated as an anthocyanin-transport-facilitating protein (Kitamura et al., 2004). In our present study, in addition to the TT19 gene, two other genes, At1g02930 and At1g02940, located adjacently in chromosome 1 were induced by PAP1 expression. These results suggest that GSTs encoded by At1g02930 and At1g02940 are responsible, at least in part, for the vacuolar sequestration of anthocyanin in Arabidopsis in addition to TT19, as the tt19 mutant still accumulates a small amount of anthocyanins (Tohge, T. and Saito, K., Chiba University, Chiba, Japan, personal communication).
Networks of transcription factors
Recently, a network model of the TTG1-dependent transcriptional pathway including anthocyanin accumulation, seed coat pigmentation and trichome initiation has been proposed (Zhang et al., 2003). In the present study of PAP1 over-expressing plants, three transcription factor genes, TT8 (bHLH protein), TTG2 (WRKY protein) and At5g61600 (an AP2 domain factor), in addition to PAP1, were upregulated. The expression of the other transcription factor genes did not change (Table S2). In addition, the pap1-D mutant exhibited no distinct changes in its seed coat pigmentation and trichome initiation, though a dominant chimeric PAP1 repressor downregulates proanthocyanidin formation (Matsui et al., 2004). These results indicate that PAP1 is responsible for the anthocyanin-specific branch downstream in the transcription network. TTG1 (WD40 protein) is necessary in addition to PAP1 for anthocyanin production (Borevitz et al., 2000). A basic MYC protein, TT8, required for DFR and BAN gene expression in Arabidopsis siliques, is necessary for proanthocyanidin production (Nesi et al., 2000).
In addition, two sequences, CCCACC and CACGTG, were found as common motifs in the promoter regions of the upregulated genes. However, there is as yet no available information on the functions of these candidate cis-elements. Further detailed analysis is necessary to determine such functions.
Plant materials and growth conditions
Arabidopsis thaliana (ecotype Columbia) plants were used as the wild-type plant in this study. The pap1-D mutant was described previously (Borevitz et al., 2000). The PAP1 cDNA over-expressing transformant was obtained by transformation of A. thaliana with an engineered Ti plasmid carrying the cauliflower mosaic virus 35S promoter linked to the coding sequence of PAP1 cDNA. The plants were cultured on GM-agar medium containing 1% sucrose (Valvekens et al., 1988) in a growth chamber at 22°C under a 16/8 h light/dark cycle for 3 weeks, or in a standard greenhouse at 22°C under 16/8 h light for 4 weeks. Samples from the wild-type plant and PAP1 over-expressing lines were used, namely: WLA (wild-type leaves grown on GM agar medium); PLA (pap1-D mutant leaves grown on GM agar medium); OLA (PAP1-over-expressed transgenic leaves grown on GM agar medium); WLV (wild-type leaves grown on vermiculite); PLV (pap1-D mutant leaves grown on vermiculite); WRA (wild-type roots grown on GM medium); and PRA (pap1-D mutant roots grown on GM medium). The leaves and roots of plants were harvested, immediately frozen in liquid nitrogen and stored at −30°C until use. Identical plant materials were used for analysis of the transcriptome using DNA microarrays, the targeted flavonoid profile by HPLC/PDA/ESI-MS and the non-targeted metabolome by FT-MS.
Evaluation of T-DNA insertion mutants
The T-DNA-inserted mutants of A. thaliana, line 049338 and line 108458, were obtained from the Salk Institute. Genomic DNA of the mutants was extracted with the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). The left border of the T-DNA and the flanking sequence of each line were amplified by PCR using gene-specific primers (5′-CGGAGGTTGGTACGGAAGTGA-3′ for 049338, 5′-GCGGTCTTGTGGAGGTTGAGA-3′ for 108458) and LBb1 (5′-GCGTGGACCGCTTGCTGCAACT-3′). Nucleotide sequences of the PCR products were determined for confirmation of the T-DNA insertion sites. Total RNA of the mutants was extracted with the RNeasy Plant Mini Kit (Qiagen), and cDNA was synthesized with SuperScript II RNase H- reverse transcriptase (Invitrogen Corp., Carlsbad, CA, USA) following the manufacturer's instructions. The lack of transcripts of At5g17050 and At4g14090 was confirmed by RT-PCR in each line homozygous for the T-DNA insertion. The primer sequences for RT-PCR were 5′-CAACACCGCACAATCCAACTC-3′ and 5′-ACCCGTTGCTTCGTGTTTCA-3′ for UGT78D2, and 5′-CGACGGTCTCAAGTCATTCGA-3′ and 5′-TCAGCAAACTGCGGAAACG-3′ for UGT75C1.
Targeted flavonoid profiling by HPLC/PDA/MS, amino acid analysis and anion analysis
Frozen leaves and roots were homogenized in 5 μl extraction solvent (methanol:acetate:H2O = 9:1:10) per 1 mg fresh weight of tissue using a mixer mill (MM300; Retsch GmbH & Co. KG, Haan, Germany) at 30 Hz. After centrifugation at 12 000 g, cell debris was discarded and the extracts were centrifuged again. Fifty microliters of supernatant was applied to an HPLC/PDA/ESI-MS system comprising a Finnigan LCQ-DECA mass spectrometer (ThermoQuest, San Jose, CA, USA) and an Agilent HPLC 1100 series (Agilent Technologies, Palo Alto, CA, USA) as described previously (Jones et al., 2003; Yamazaki et al., 2003). HPLC was carried out on a TSK-GEL RP-18 column (φ4.6 mm × 150 mm; TOSOH, Tokyo, Japan) at a flow rate of 0.5 ml min−1. Elution was performed with solvent A [CH3CN-H2O-TFA (10:90:0.1)] and solvent B [CH3CN-H2O-TFA (90:10:0.1)] using the following profile (0 min 100% A, 40 min 60% A, 40.1 min 100% B, 45 min 100% B, 45.1 min 100% A, 52 min 100% A), with linear gradients between the time points. PDA was used for detection of UV-visible absorption in the range of 250–650 nm. Nitrogen was used as sheath gas for the positive-ion ESI-MS performed at a capillary temperature and voltage of 350°C and 5.0 kV, respectively. The tube lens offset was set at 10.0 V. Full scan mass spectra were acquired from m/z 200 to 1500 at 2 scans sec−1. Tandem MS analysis was carried out with helium as the collision gas. The normalized collision energy was set to 30%. Metabolites were identified based on UV-visible absorption spectra and mass fragmentation by tandem MS analysis in comparison with the known compounds of our laboratory stock (Jones et al., 2003; Yamazaki et al., 2003) and the reported data (cited in Table 1).
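The elution profile above can be expressed as a piecewise-linear function of time. The sketch below is illustrative only; it assumes a binary A/B mixture, so that "100% B" corresponds to 0% A, and the function name is a hypothetical helper rather than part of any instrument software.

```python
# Sketch of the elution profile as linear interpolation between time points.
# Assumption: solvents A and B always sum to 100%, so 100% B means 0% A.

# (time in min, % solvent A) taken from the profile described in the text.
GRADIENT = [
    (0.0, 100.0),
    (40.0, 60.0),
    (40.1, 0.0),
    (45.0, 0.0),
    (45.1, 100.0),
    (52.0, 100.0),
]

def percent_a(time_min):
    """Return % solvent A at a given time by linear interpolation."""
    if not GRADIENT[0][0] <= time_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-52 min run")
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= time_min <= t1:
            return a0 + (a1 - a0) * (time_min - t0) / (t1 - t0)

def percent_b(time_min):
    """Return % solvent B, the remainder of the binary mixture."""
    return 100.0 - percent_a(time_min)

for t in (0, 20, 40, 42, 50):
    print(t, "min:", round(percent_a(t), 1), "% A")
```

Halfway through the 40 min separation segment the mobile phase is 80% A and 20% B; the 40.1 to 45 min segment is the 100% B wash, and the final segment re-equilibrates the column at 100% A.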
Amino acid analysis was carried out by post-column derivatization method using HPLC coupled fluorescent detection as described previously (Hirai et al., 2004). Anion and sugar analysis was performed by capillary electrophoresis as reported previously (Hirai et al., 2004).
Non-targeted metabolome analysis by FT-MS
High-, middle- and non-polar extracts of plant materials were subjected to FT-MS (APEX III FT-ICMS; Bruker Daltonics, Billerica, MA, USA) as described previously (Aharoni et al., 2002; Hirai et al., 2004). The fold change in intensity of each observed mass peak was calculated as the ratio of the signal intensity in the mutant or transformant sample to that in the wild-type sample. Metabolites were identified by elemental composition calculations from accurate m/z values using DISCOVArray (Phenomenome Discoveries, Inc., Saskatoon, Canada; http://www.phenomenome.com/). The detailed procedure is described elsewhere (Table S3). The fold change values were used for PCA, which was conducted using GeneLinker Gold 3.0 (Molecular Mining Corp., Cambridge, MA, USA; http://www.molecularmining.com).
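To illustrate what an elemental composition calculation from an accurate mass involves, the following Python sketch enumerates C, H, N, O formulas whose monoisotopic mass falls within a given tolerance of a measured neutral mass. It is a brute-force illustration under stated assumptions, not the algorithm used by DISCOVArray, which is not described here. The function name, the element ranges and the 2 ppm default tolerance are all hypothetical, and no chemical plausibility rules (valence, isotope pattern) are applied.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def candidate_formulas(neutral_mass, ppm=2.0, max_c=40, max_n=5, max_o=25):
    """Return [((C, H, N, O), mass), ...] within `ppm` of `neutral_mass`.

    Assumption: `neutral_mass` is already corrected for the ionizing
    adduct (for example, a proton subtracted from an [M+H]+ m/z value).
    """
    tol = neutral_mass * ppm * 1e-6
    hits = []
    for c in range(max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                heavy = c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
                rest = neutral_mass - heavy
                if rest < -tol:
                    break  # adding more oxygen only overshoots further
                # Choose the hydrogen count that best fills the remainder.
                h = round(rest / MONO["H"])
                if h < 0:
                    continue
                mass = heavy + h * MONO["H"]
                if abs(mass - neutral_mass) <= tol:
                    hits.append(((c, h, n, o), mass))
    return hits


if __name__ == "__main__":
    # Quercetin, C15H10O7, has a monoisotopic mass of about 302.0427 u.
    for formula, mass in candidate_formulas(302.04265):
        print(formula, round(mass, 5))
```

Several formulas can fall inside the tolerance window for a single mass, which is why high mass accuracy and additional filtering matter for assigning compositions in practice.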
Transcriptome analysis using DNA microarrays
Total RNA was extracted from frozen plant materials using the RNeasy Plant Mini Kit (Qiagen). Labeled target cRNA was prepared according to the technical manual of the Arabidopsis Genome ATH1 DNA array (Affymetrix, Santa Clara, CA, USA). Double-stranded cDNA was prepared from 40 μg of total RNA using the SuperScript Choice System (Invitrogen). The resultant cDNA was transcribed in vitro using the BioArray High Yield RNA Transcript Kit (Enzo, New York, NY, USA). Following purification and fragmentation, the labeled cRNA was hybridized to an Arabidopsis Genome ATH1 GeneChip array (Affymetrix) in a Hybridization Oven model 640 (Affymetrix). Washing and staining of chips were carried out using a GeneChip Fluidics Station model 400. Scanning was carried out with a GeneArray Scanner (Agilent Technologies). The procedure is described elsewhere (Table S1).
Calculation and analysis of transcriptome data
GeneSpring 6.2 (Silicon Genetics, Redwood City, CA, USA, http://www.silicongenetics.com/cgi/SiG.cgi/index.smf) was used for GeneChip-array data calculation. The raw signal of each gene after subtraction of background was normalized to the median of all measurements for each sample on the chip. Negative values were set to a signal value of 0.01. Fold change was calculated as the ratio of normalized signal intensity in the mutant or transformant to that in the wild-type plant. To reduce false positives, we selected genes with a ‘present’ absolute call in the baseline data.
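The calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reproduction of GeneSpring. The sketch floors every normalized value below 0.01 (which covers negative values and avoids division by zero), takes the wild-type sample as the baseline, encodes a ‘present’ call as the string "P", and uses the two-fold threshold shown in Figure S1. All function and variable names are hypothetical, and the signal values in the usage example are invented for illustration.

```python
from statistics import median

FLOOR = 0.01  # value assigned to negative (here: all sub-0.01) signals


def normalize_chip(signals):
    """Divide background-subtracted signals by the chip median, then floor."""
    med = median(signals.values())
    if med <= 0:
        raise ValueError("chip median must be positive")
    return {gene: max(value / med, FLOOR) for gene, value in signals.items()}


def fold_changes(sample, wild_type, baseline_calls):
    """Fold change (sample / wild type) for genes called 'P' in the baseline."""
    s_norm = normalize_chip(sample)
    w_norm = normalize_chip(wild_type)
    return {
        gene: s_norm[gene] / w_norm[gene]
        for gene in w_norm
        if gene in s_norm and baseline_calls.get(gene) == "P"
    }


def induced(fc, threshold=2.0):
    """Genes at or above the fold-change threshold (y = 2x in Figure S1)."""
    return sorted(gene for gene, ratio in fc.items() if ratio >= threshold)


if __name__ == "__main__":
    # Invented example values, not data from this study.
    wild_type = {"g1": 100.0, "g2": 200.0, "g3": 300.0}
    mutant = {"g1": 400.0, "g2": 200.0, "g3": -5.0}
    calls = {"g1": "P", "g2": "P", "g3": "A"}
    fc = fold_changes(mutant, wild_type, calls)
    print(fc, induced(fc))
```

Filtering on the baseline call before ranking keeps genes with unreliable wild-type signals from producing large but meaningless ratios.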
Full-length cDNA (RAFL clone no. RAFL05-12-P07; Seki et al., 1998, 2002) of At5g17050 was obtained from RIKEN BioResource Center, Tsukuba, Japan. To express recombinant protein, At5g17050 cDNA was introduced into the Gateway™ system (Invitrogen Corp.) following the manufacturer's instructions. The attB sites were introduced by two steps of PCR, using gene-specific primers (5′-AAAAAGCAGGCTCCATGACCAAACCCTCCGAC-3′ and 5′-AGAAAGCTGGGTCACATTCAAATAATGTTTACAACTGCATCC-3′) in the first step and attB adaptor primers (5′-GGGGACAAGTTTGTACAAAAAAGCAGGCT-3′ and 5′-GGGGACCACTTTGTACAAGAAAGCTGGGT-3′) in the second. The entry clone pE5-17050 was then obtained by BP recombination with pDONR221, and its nucleotide sequence was determined for confirmation. The At5g17050 cDNA was then transferred from pE5-17050 to pDEST17 by LR recombination to produce pD17-5g17050. Recombinant UGT78D2 protein with a 6X His tag at the N-terminus was expressed in E. coli BL21-AI™ transformed with pD17-5g17050 as described before (Nakajima et al., 2001) with slight modification (0.2% L-arabinose was used to induce the expression of recombinant protein). After induction, cells were cultured at 16°C overnight. Detection of 3GT activity in the protein extracts of E. coli was performed as described previously (Taguchi et al., 2001).
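The two-step PCR works because the 5′ tail of each gene-specific primer repeats the 3′ end of the corresponding attB adaptor primer, so the adaptor can anneal to the first-round product. The following Python sketch checks this overlap for the primer sequences given above. It is an illustrative check only, and the function name is hypothetical.

```python
def longest_overlap(adaptor, primer):
    """Length of the longest adaptor 3' suffix that is also a primer 5' prefix."""
    for k in range(min(len(adaptor), len(primer)), 0, -1):
        if adaptor.endswith(primer[:k]):
            return k
    return 0


# Primer sequences as given in the text (5' to 3').
ATTB1_ADAPTOR = "GGGGACAAGTTTGTACAAAAAAGCAGGCT"
ATTB2_ADAPTOR = "GGGGACCACTTTGTACAAGAAAGCTGGGT"
FORWARD_PRIMER = "AAAAAGCAGGCTCCATGACCAAACCCTCCGAC"
REVERSE_PRIMER = "AGAAAGCTGGGTCACATTCAAATAATGTTTACAACTGCATCC"

if __name__ == "__main__":
    for name, adaptor, primer in (
        ("attB1/forward", ATTB1_ADAPTOR, FORWARD_PRIMER),
        ("attB2/reverse", ATTB2_ADAPTOR, REVERSE_PRIMER),
    ):
        k = longest_overlap(adaptor, primer)
        print(name, k, primer[:k])
```

For both primer pairs the shared segment is 12 nucleotides long (AAAAAGCAGGCT and AGAAAGCTGGGT, respectively).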
We thank Dr Richard A. Dixon (Samuel Roberts Noble Foundation, Ardmore, OK, USA) for providing the pap1-D mutant. We also thank the Salk Institute Genomic Analysis Laboratory for providing the sequence-indexed A. thaliana T-DNA insertion mutants, and the RIKEN BioResource Center for providing the full-length cDNA. We thank Ms Rebecca Friend-Heath for kindly editing the English in the manuscript. This work was supported in part by the Ministry of Education, Culture, Sports, Science and Technology (Japan; Grants-in-Aid for Scientific Research), by CREST of Japan Science and Technology Agency (JST), and by Research for the Future Program (grant no. 00L01605; Molecular Mechanisms on Regulation of Morphogenesis and Metabolism Leading to Increased Plant Productivity).
Figure S1. Scatter plot of normalized signal intensity. Genes with ‘present’ values in the absolute call of the baseline data were selected for the analysis. Normalized signal intensity of each spot in the wild-type leaf sample (WLA1) (x-axis) is plotted against that in the pap1-D-mutant leaf sample (PLA1) (y-axis). Black, red, blue and purple arrows indicate flavonoid biosynthetic genes, PAP1 gene, glycosyltransferase genes, acyltransferase genes and glutathione-S-transferase genes, respectively. Green lines represent the threshold lines (y = 2x and y = 0.5x) and the diagonal line (y = x).
Table S1 The Minimum Information About Microarray Experiment (MIAME) checklist of GeneChip experiments. Experimental designs and procedures were described following the MIAME checklist format proposed by the Microarray Gene Expression Data Society (http://www.mged.org/Workgroups/MIAME/miame.html).
Table S2 Expression of genes annotated or presumed to be related with phenylpropanoid production by DNA array analysis
Table S3 The detailed procedure for non-targeted metabolome analysis by FT-MS