Functional genomics by integrated analysis of metabolome and transcriptome of Arabidopsis plants over-expressing an MYB transcription factor


  • Takayuki Tohge,

    1. Department of Molecular Biology and Biotechnology, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 263-8522, Japan,
  • Yasutaka Nishiyama,

    1. Department of Molecular Biology and Biotechnology, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 263-8522, Japan,
    2. Institute of Life Science, Ehime Women's College, 421 Ibuki-cho Baba, Uwajima-shi, Ehime, 798-0025, Japan,
  • Masami Yokota Hirai,

    1. Department of Molecular Biology and Biotechnology, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 263-8522, Japan,
    2. CREST, JST (Japan Science and Technology Agency), Yayoi-cho 1-33, Inage-ku, Chiba-shi, Chiba 263-8522, Japan,
  • Mitsuru Yano,

    1. Department of Molecular Biology and Biotechnology, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 263-8522, Japan,
  • Jun-ichiro Nakajima,

    1. Department of Molecular Biology and Biotechnology, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 263-8522, Japan,
  • Motoko Awazuhara,

    1. Department of Molecular Biology and Biotechnology, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 263-8522, Japan,
  • Eri Inoue,

    1. RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan, and
  • Hideki Takahashi,

    1. RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan, and
  • Dayan B. Goodenowe,

    1. Phenomenome Discoveries Inc., 204-407 Downey Road, Saskatoon, SK S7N 4L8, Canada
  • Masahiko Kitayama,

    1. Institute of Life Science, Ehime Women's College, 421 Ibuki-cho Baba, Uwajima-shi, Ehime, 798-0025, Japan,
  • Masaaki Noji,

    1. Department of Molecular Biology and Biotechnology, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 263-8522, Japan,
  • Mami Yamazaki,

    1. Department of Molecular Biology and Biotechnology, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 263-8522, Japan,
  • Kazuki Saito

    Corresponding author
    1. Department of Molecular Biology and Biotechnology, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba 263-8522, Japan,
    2. CREST, JST (Japan Science and Technology Agency), Yayoi-cho 1-33, Inage-ku, Chiba-shi, Chiba 263-8522, Japan,
      (fax +81 43 290 2905; e-mail


The integration of metabolomics and transcriptomics can provide precise information on gene-to-metabolite networks for identifying the function of unknown genes unless there has been a post-transcriptional modification. Here, we report a comprehensive analysis of the metabolome and transcriptome of Arabidopsis thaliana over-expressing the PAP1 gene encoding an MYB transcription factor, for the identification of novel gene functions involved in flavonoid biosynthesis. For metabolome analysis, we performed flavonoid-targeted analysis by high-performance liquid chromatography-mass spectrometry and non-targeted analysis by Fourier-transform ion-cyclotron mass spectrometry with an ultrahigh-resolution capacity. This combined analysis revealed the specific accumulation of cyanidin and quercetin derivatives, and identified eight novel anthocyanins from an array of 1800 putative metabolites in PAP1 over-expressing plants. The transcriptome analysis of 22 810 genes on a DNA microarray revealed the induction of 38 genes by ectopic PAP1 over-expression. In addition to well-known genes involved in anthocyanin production, several genes with unidentified functions or annotated with putative functions, encoding putative glycosyltransferase, acyltransferase, glutathione S-transferase, sugar transporters and transcription factors, were induced by PAP1. Two putative glycosyltransferase genes (At5g17050 and At4g14090) induced by PAP1 expression were confirmed to encode flavonoid 3-O-glucosyltransferase and anthocyanin 5-O-glucosyltransferase, respectively, on the basis of the in vitro enzymatic activity of their recombinant proteins and the analysis of anthocyanins in the respective T-DNA-inserted mutants. The functional genomics approach through the integration of metabolomics and transcriptomics presented here provides an innovative means of identifying novel gene functions involved in plant metabolism.


Plants produce a huge array of compounds that are potentially useful in developing novel medicines, flavors, industrial materials as alternatives for fossil fuel resources, and other specialty chemicals. Cumulatively plants are thought to produce about 200 000 natural products (Dixon and Strack, 2003). Unfortunately, only a limited number of genes involved in the production of these plant metabolites have been identified by classical genetic screening of mutants and enzyme purification.

However, after the determination of the whole genome sequence of Arabidopsis thaliana (Arabidopsis Genome Initiative, 2000), it is now possible to determine gene-to-metabolite correlation through the comprehensive analysis of gene expression (transcriptomics) and metabolite accumulation (metabolomics) (Bino et al., 2004; Fiehn, 2002; Kopka et al., 2004; Sumner et al., 2003; Weckwerth, 2003). In particular, non-targeted transcriptome analysis is now feasible using DNA microarrays with A. thaliana.

For non-targeted metabolome analysis, it is necessary to combine several different analytical technologies, particularly those based on mass spectrometry such as gas chromatography-mass spectrometry (Fiehn et al., 2000; Weckwerth et al., 2004), high-performance liquid chromatography-mass spectrometry (LC-MS) (Roepenack-Lahaye et al., 2004; Yamazaki et al., 2003), and Fourier-transform ion-cyclotron mass spectrometry (FT-MS) (Aharoni et al., 2002).

The integration of the transcriptome and metabolome or detailed targeted chemical analysis would be a breakthrough in identifying the function of unknown genes and determining all gene-to-metabolite correlations in cells. Only a limited number of reports, however, have been available on successful identification of novel gene functions by this approach (Aharoni et al., 2000; Goossens et al., 2003; Guterman et al., 2002; Hirai et al., 2004; Mathews et al., 2003; Mercke et al., 2004).

The pap1-D mutant is a T-DNA activation-tagged line that overproduces anthocyanins through the ectopic over-expression of the PAP1 gene, which encodes an MYB transcription factor. Over-expression is driven by an enhancer from the promoter of the cauliflower mosaic virus 35S transcript in the inserted T-DNA (Borevitz et al., 2000). In the pap1-D mutant, some structural genes for anthocyanin biosynthesis, such as those encoding phenylalanine ammonia lyase (PAL) and chalcone synthase (CHS), are expressed constitutively, and the accumulation of some phenylpropanoid derivatives such as anthocyanins is markedly enhanced (Borevitz et al., 2000). However, the transcriptome and metabolome have not been extensively characterized in this mutant. PAP1 over-expressing plants are an ideal model system for elucidating the whole cellular process at both transcriptome and metabolome levels under the expression of a single transcription factor.

The structures of flavonoids and their biosynthetic genes in A. thaliana have yet to be completely elucidated. Recently, the structures of several anthocyanins (Bloor and Abrahams, 2002) and flavonol glycosides (Graham, 1998; Veit and Pauli, 1999) have been reported. Several genes encoding enzymes and regulatory proteins involved in the production of anthocyanins and proanthocyanidins have been isolated, mainly through the analysis of tt or ttg seed-color mutants (Winkel-Shirley, 2001). However, no genes encoding glycosyltransferase and acyltransferase for the modification of anthocyanin aglycones have been identified yet. For the identification of such genes involved in the production and modification of terminal metabolites in biosynthetic pathways, the combined analysis of transcripts and metabolites is a powerful technology (Jones et al., 2003).

Here, we performed the non-targeted comprehensive analysis of the metabolome and transcriptome of PAP1 over-expressing plants, with the following questions in mind: (1) what is the role of a single transcription factor in global gene expression and the subsequent cellular metabolite pattern? and (2) what are the specific gene-to-metabolite correlations resulting in the identification of the gene functions in the Arabidopsis genome? To answer these questions, we studied metabolomics by LC-MS for the targeted metabolite analysis of approximately 21 compounds combined with FT-MS for the non-targeted metabolite profiling of approximately 1800 putative metabolites, and transcriptomics using the DNA microarrays covering 22 810 genes of the Arabidopsis genome. We could show that a set of genes involved in anthocyanin accumulation were upregulated together with the production of cyanidin-type anthocyanins and quercetin-type flavonols; thus we determined induced gene functions in production of these compounds. Subsequently, two genes coding for flavonoid glucosyltransferases were identified by in vitro study using recombinant proteins and by anthocyanin analysis of T-DNA-inserted mutants. The present study shows a novel means of studying functional genomics through the integral analyses of the metabolome and transcriptome in plants.


Combined analysis of flavonoid-targeted and non-targeted methodologies indicates specific overaccumulation of cyanidin and quercetin derivatives and weak effects on global metabolome profiles by PAP1

Metabolome analysis involved a combination of flavonoid-targeted analysis by LC-MS, amino acid analysis by high-performance liquid chromatography (HPLC), anion and sugar analysis by capillary electrophoresis, and non-targeted large-scale metabolite analysis by FT-MS.

Anthocyanins.  The flavonoid accumulation profiles of seven samples were analyzed by HPLC/photodiode array detection/electrospray ionization mass spectrometry (HPLC/PDA/ESI-MS). These samples included: (1) wild-type leaves grown on agar (WLA); (2) pap1-D leaves grown on agar (PLA); (3) PAP1 cDNA over-expressing transgenic Arabidopsis leaves grown on agar (OLA); (4) wild-type leaves grown on vermiculite (WLV); (5) pap1-D leaves grown on vermiculite (PLV); (6) wild-type roots grown on agar (WRA); and (7) pap1-D roots grown on agar (PRA).

The metabolites were putatively identified by comparing their UV-visible absorption spectra and the mass fragmentation patterns obtained by tandem MS with those of known compounds and reported data (cited in Table 1). Twenty-one peaks were detected, 17 of which were identified in the leaves and roots (Figures 1 and 2, Table 1). Eleven anthocyanin pigments (A1-A11) accumulated in the leaves of the PAP1 over-expressing lines (the pap1-D mutant and PAP1 cDNA over-expressing plant) (Figure 2c,e,i, Table 1). However, these pigments were only detected at trace levels in the wild-type plant (Figure 2a,g,k, Table 1). Among them, A5, A9 and A11 were the major anthocyanins in the leaves of the PAP1 over-expressing lines grown on agar and vermiculite.

Table 1.  The flavonoid profiles in acidic MeOH-H2O extracts of the wild-type plant and PAP1 over-expressing lines
Columns: Peak no. | Rt (min) | λmax (nm) | ESI-MS (m/z) | Fragment(a) (m/z) | Leaf, agar (nmol g−1 FW): wild type (WLA), pap1-D (PLA), 35S:PAP1 (OLA) | Leaf, soil (nmol g−1 FW): wild type (WLV), pap1-D (PLV) | Root, agar (nmol g−1 FW): wild type (WRA), pap1-D (PRA) | Reference
  1. Flavonoids were quantified by measuring peak area (anthocyanins at 520 nm; flavonols at 320 nm) using standard curves of reference compounds (cyanidin for cyanidin derivatives; kaempferol for kaempferol glycosides; quercetin for quercetin glycosides; kaempferol for unknown flavonol derivatives). Cy, cyanidin; Km, kaempferol; Qr, quercetin; Glc, glucose; Xyl, xylose; Rha, rhamnose; Cou, p-coumaroyl moiety; Mal, malonyl moiety; Sin, sinapoyl moiety.

  2. (a) Detected in mass and/or tandem mass data.

A111.6278–514743 [M]+287 [Cy]+ND 0.04 0.03 0.94 
A215.0282–518829 [M]+287 [Cy]+ND 0.15 0.27 3.83 
535 [Cy + Glc + Mal]+
A322.4312–520889 [M]+287 [Cy]+ 0.01 0.56 0.710.243.56 0.14 2.11 
449 [Cy + Glc]+
727 [Cy + Glc + Xyl + Cou]+
A414.6330–524949 [M]+287 [Cy]+ND 0.18 
449 [Cy + Glc]+
A523.6314–524975 [M]+287 [Cy]+ 0.03 6.25 9.160.3233.39 4.2561.80 
535 [Cy + Glc + Mal]+
727 [Cy + Glc + Xyl + Cou]+
A616.9298–5261051 [M]+287 [Cy]+ 0.01 0.37 0.161.4211.02 0.02 0.34 
449 [Cy + Glc]+
889 [Cy + 2Glc + Xyl + Cou]+
A724.1314–5261095 [M]+287 [Cy]+ 0.02 1.85 0.510.052.77NDND 
535 [Cy + Glc + Mal]+
975 [Cy + 2Glc + Xyl + Cou + Mal]+
A817.7282–5241137 [M]+287 [Cy]+ 0.01 2.92 2.141.4320.85 0.75 4.58Bloor and Abrahams (2002)
535 [Cy + Glc + Mal]+
889 [Cy + 2Glc + Xyl + Cou]+
A925.3318–5341181 [M]+287 [Cy]+ 0.03 2.59 Bloor and Abrahams (2002)
535 [Cy + Glc + Mal]+
933 [Cy + Glc + Xyl + Sin + Cou]+
A1018.9306–5321257 [M]+287 [Cy]+ 0.04 0.98 0.370.584.22NDND 
449 [Cy + Glc]+
1095 [Cy + 2Glc + Xyl + Cou + Sin]+
A1119.7286–5341343 [M]+287 [Cy]+ 0.4312.60 5.316.9446.21NDNDBloor and Abrahams (2002)
535 [Cy + Glc + Mal]+
1095 [Cy + 2Glc + Xyl + Cou + Sin]+
F121.8264–342579 [M + H]+287 [Km + H]+12.16 5.0411.392.1555.67NDNDVeit and Pauli (1999)
433 [Km + Rha + H]+
F219.1266–346595 [M + H]+287 [Km + H]+ 4.53 2.57 6.7428.9133.8868.2361.22Veit and Pauli (1999)
433 [Km + Rha + H]+
F315.4266–346741 [M + H]+287 [Km + H]+ 6.75 7.77 8.6640.361.5746.0935.40Graham (1998)
433 [Km + Rha + H]+
595 [Km + Rha + Glc + H]+
F419.5254–332595 [M + H]+303 [Qr + H]+ND 1.99ND22.6129.87NDNDGraham (1998)
617 [M + Na]+449 [Qr + Rha + H]+
F517.7254–312611 [M + H]+303 [Qr + H]+0.408.721.5410.6834.3690.2794.55Graham (1998)
633 [M + Na]+449 [Qr + Rha + H]+
F614.1256–356757 [M + H]+303 [Qr + H]+0.052.660.93 7.6331.4215.4319.00Graham (1998)
779 [M + Na]+449 [Qr + Rha + H]+
611 [Qr + Rha + Glc + H]+
F816.8256–354757303, 449, 611NDNDNDNDND8.809.47 
F918.8266–346741287, 433, 595NDNDNDNDND17.2113.43 
F1019.6258–354625317, 463NDNDNDNDND21.3922.29 
Figure 1.

Cyanidin derivatives and flavonol glycosides accumulated in PAP1 over-expressing Arabidopsis.
Numbers correspond to compounds described in the text, Table 1 and Figure 2.
A1. Cyanidin 3-O-[2′′-O-(xylosyl) glucoside] 5-O-glucoside.
A2. Cyanidin 3-O-[2′′-O-(xylosyl) glucoside] 5-O-(6′′′-O-malonyl) glucoside.
A3. Cyanidin 3-O-[2′′-O-(xylosyl) 6′′-O-(p-coumaroyl) glucoside] 5-O-glucoside.
A4. Cyanidin 3-O-[2′′-O-(2′′′-O-(sinapoyl) xylosyl) glucoside] 5-O-glucoside.
A5. Cyanidin 3-O-[2′′-O-(xylosyl)-6′′-O-(p-coumaroyl) glucoside] 5-O-malonylglucoside.
A6. Cyanidin 3-O-[2′′-O-(xylosyl)-6′′-O-(p-O-(glucosyl)-p-coumaroyl) glucoside] 5-O-glucoside.
A7. Cyanidin 3-O-[2′′-O-(2′′′-O-(sinapoyl) xylosyl) 6′′-O-(p-coumaroyl) glucoside] 5-O-glucoside.
A8. Cyanidin 3-O-[2′′-O-(xylosyl) 6′′-O-(p-O-(glucosyl) p-coumaroyl) glucoside] 5-O-[6′′′-O-(malonyl) glucoside].
A9. Cyanidin 3-O-[2′′-O-(2′′′-O-(sinapoyl) xylosyl) 6′′-O-(p-O-coumaroyl) glucoside] 5-O-[6′′′′-O-(malonyl) glucoside].
A10. Cyanidin 3-O-[2′′-O-(2′′′-O-(sinapoyl) xylosyl) 6′′-O-(p-O-(glucosyl) p-coumaroyl) glucoside] 5-O-glucoside.
A11. Cyanidin 3-O-[2′′-O-(6′′′-O-(sinapoyl) xylosyl) 6′′-O-(p-O-(glucosyl)-p-coumaroyl) glucoside] 5-O-(6′′′′-O-malonyl) glucoside.
F1. Kaempferol 3-O-rhamnoside 7-O-rhamnoside.
F2. Kaempferol 3-O-glucoside 7-O-rhamnoside.
F3. Kaempferol 3-O-[6′′-O-(rhamnosyl) glucoside] 7-O-rhamnoside.
F4. Quercetin 3-O-rhamnoside 7-O-rhamnoside.
F5. Quercetin 3-O-glucoside 7-O-rhamnoside.
F6. Quercetin 3-O-[6′′-O-(rhamnosyl) glucoside] 7-O-rhamnoside.

Figure 2.

HPLC/PDA chromatograms of aqueous methanol extracts of Arabidopsis wild type, pap1-D mutant and PAP1 over-expressing transgenic plant.
(a, b) Wild-type leaves grown on agar (WLA).
(c, d) Pap1-D mutant leaves grown on agar (PLA).
(e, f) PAP1 over-expressing transgenic plant leaves grown on agar (OLA).
(g, h) Wild-type leaves grown on vermiculite (WLV).
(i, j) Pap1-D mutant leaves grown on vermiculite (PLV).
(k, l) Wild-type roots grown on agar (WRA).
(m, n) Pap1-D mutant roots grown on agar (PRA).
(a, c, e, g, i, k, m) Absorbance at 520 nm for detection of anthocyanins.
(b, d, f, h, j, l, n) Absorbance at 320 nm for detection of flavonoids.
The names and structures of 11 anthocyanins (A1–A11) and 10 flavonols (F1–F10) are indicated in Table 1 and Figure 1. S1 and S2 are sinapate conjugates. mAU, milliabsorbance units.

In the leaves, the total anthocyanin content of the pap1-D mutant was 50 times (grown on agar) and 11 times (grown on vermiculite) higher than that of the wild-type plant grown under the same condition. A11 accounted for approximately 75 and 44% of the total anthocyanin in the wild-type plant and pap1-D mutant grown on agar, respectively. A11 is the most highly modified anthocyanin, carrying four sugar and three acyl moieties.
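The per-peak ratios behind comparisons of this kind can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: it takes the agar-grown leaf values of three peaks (A5, A8 and A11) as read from Table 1 and divides the pap1-D value by the wild-type value. The dictionary layout and the function name are assumptions made for the example.

```python
# Leaf-on-agar contents (nmol per g fresh weight) for three peaks, as read from Table 1.
# Tuple order: (wild type WLA, pap1-D PLA).
LEAF_AGAR = {
    "A5": (0.03, 6.25),
    "A8": (0.01, 2.92),
    "A11": (0.43, 12.60),
}


def fold_change(wild_type, mutant):
    """Return mutant / wild type, or None when the wild-type value is zero or missing."""
    if wild_type is None or wild_type == 0:
        return None
    return mutant / wild_type


# Hypothetical helper structure: one ratio per peak.
fold = {peak: fold_change(wt, mut) for peak, (wt, mut) in LEAF_AGAR.items()}

for peak, value in fold.items():
    print(f"{peak}: {value:.1f}-fold higher in pap1-D than in wild type")
```

Returning None for a zero or missing wild-type value keeps peaks that were not detected in the wild type from producing a meaningless ratio.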

In the roots, five anthocyanins (A1, A2, A3, A5 and A8) accumulated in pap1-D mutant grown on agar (Figure 2m, Table 1). A5 was the most abundant anthocyanin amounting to approximately 74% of the total anthocyanin in the wild-type plant, and to approximately 79% of the total anthocyanin in the pap1-D mutant. The total anthocyanin in the roots of the pap1-D mutant is 14 times as high as that in the roots of the wild-type plant. Anthocyanins attached to a sinapoyl moiety (A4, A7, A9, A10 and A11) were not detected in roots, suggesting the lack of sinapoyl transferase activity or the very low supply of sinapoyl-CoA in roots.

Flavonols.  In addition to anthocyanins, three kaempferol glycosides (F1–F3) and three quercetin glycosides (F4–F6) were identified, and four unknown flavonol glycosides (F7–F10) were detected (Figures 1 and 2, Table 1).

In the wild-type leaves grown on agar, the kaempferol dirhamnoside F1 is the major flavonol amounting to approximately 51% of the total flavonol (Figure 2b, Table 1). However, in the leaves of PAP1 over-expressing lines, F1 accumulation was repressed (Figure 2d,f,j, Table 1) amounting to less than approximately 37% of the total flavonol. The amounts of the other kaempferol glycosides F2 and F3 in leaves were almost the same in the wild-type plant and pap1-D mutant grown on agar.

Quercetin glycosides (F4–F6) accumulated more in the leaves of the PAP1 over-expressing lines than in those of the wild-type plant. The total quercetin glycoside in the leaves of the pap1-D mutant is more than 10 times as high as that in the leaves of the wild-type plant.
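The flavonol shares and the quercetin ratio quoted above can be checked with a short calculation. The sketch below is illustrative only: it uses the agar-grown leaf values of F1–F6 as read from Table 1, treats "ND" as zero (an assumption), and ignores F7–F10, which were not detected in leaves. The names `share` and `quercetin_total` are hypothetical.

```python
# Leaf-on-agar flavonol contents (nmol per g fresh weight), as read from Table 1.
# Tuple order: (wild type WLA, pap1-D PLA). "ND" is treated as 0.0 (assumption).
FLAVONOLS = {
    "F1": (12.16, 5.04),
    "F2": (4.53, 2.57),
    "F3": (6.75, 7.77),
    "F4": (0.0, 1.99),
    "F5": (0.40, 8.72),
    "F6": (0.05, 2.66),
}
QUERCETIN_PEAKS = ("F4", "F5", "F6")
WILD_TYPE, PAP1_D = 0, 1  # column indices into the tuples above


def share(peak, column):
    """Percentage of the total flavonol content contributed by one peak."""
    total = sum(values[column] for values in FLAVONOLS.values())
    return 100.0 * FLAVONOLS[peak][column] / total


def quercetin_total(column):
    """Summed content of the quercetin glycosides F4-F6."""
    return sum(FLAVONOLS[peak][column] for peak in QUERCETIN_PEAKS)


f1_wild = share("F1", WILD_TYPE)
f1_pap1d = share("F1", PAP1_D)
quercetin_ratio = quercetin_total(PAP1_D) / quercetin_total(WILD_TYPE)

print(f"F1 share, wild type: {f1_wild:.1f}%")
print(f"F1 share, pap1-D:    {f1_pap1d:.1f}%")
print(f"Quercetin glycosides, pap1-D / wild type: {quercetin_ratio:.1f}")
```

With these inputs the F1 share in wild-type leaves comes out close to the 51% stated in the text, and the quercetin glycoside ratio is well above 10.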

Higher levels of F5 and F6 accumulated in the roots (Figure 2l,n) than in the leaves. F5 was the major flavonol, amounting to approximately 41% of the total flavonol in the roots. In contrast to the leaves, the roots showed no marked differences in the amounts of quercetin glycosides between the wild-type plant and the pap1-D mutant. Trace amounts of F7–F10 were also found in the roots, at the same levels in the wild-type plant and the pap1-D mutant. In general, lower amounts of flavonols accumulated in the leaves than in the roots. The exceptions were the flavonol 3-O-rhamnoside 7-O-rhamnosides (F1 and F4), which were detected only in the leaves.

Amino acids, sugar and anions.  In the PAP1 over-expressing lines, no significant changes were observed in the levels of 16 amino acids, measured by HPLC with fluorescence detection, or in the amounts of 12 anions and sugars, measured by capillary electrophoresis.

Non-targeted analysis by FT-MS.  Non-targeted FT-MS metabolite analysis was conducted on seven leaf and root samples of the wild-type plant and PAP1 over-expressing lines grown on either agar or vermiculite. To identify the key determinant factors of the metabolome, principal component analysis (PCA) was conducted with approximately 1800 peaks of non-targeted FT-MS analysis and targeted anthocyanin metabolites (Figure 3). By this analysis (Figure 3a), seven experimental groups each of three independent plant lines were classified into three major clusters: leaves grown on agar (WLA, PLA and OLA), roots grown on agar (WRA and PRA) and leaves grown on vermiculite (WLV and PLV).

Figure 3.

PCA of non-targeted metabolome and anthocyanin targeted analyses.
(a) A total of 1800 peaks detected by FT-MS analysis.
(b) Eleven anthocyanins analyzed by LC/PDA/ESI-MS.
The first and second principal components are shown as the x- and y-axis, respectively.

The first component of the PCA results (76% variance) predominantly reflects the difference in the type of organ (leaf or root), and the second component (9% variance) primarily indicates a difference in growth conditions (agar or vermiculite) as well as a secondary reflection of the total anthocyanin content (wild or pap1-D). Two major clusters (leaf on vermiculite and root on agar) formed two separate groups each reflecting two different genotypes (wild and pap1-D). This is presumably due to the small but significant difference in total anthocyanin content between the wild type and pap1-D plants as detected by FT-MS, supporting the results of the LC-MS analysis.
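How a first principal component and its share of the variance are obtained can be sketched on a toy data set. The example below is an assumption-laden illustration, not the software or data used in the study: the intensity matrix is invented (two leaf-like and two root-like samples, three peaks), and power iteration on the covariance matrix stands in for a full PCA.

```python
import math


def center(rows):
    """Subtract the column mean from every value."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of the columns (peaks)."""
    n = len(centered)
    p = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=200):
    """Dominant eigenvector and eigenvalue of cov, found by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return v, eigenvalue


# Hypothetical peak intensities (rows: samples, columns: three peaks).
labels = ["leaf_1", "leaf_2", "root_1", "root_2"]
intensities = [
    [10.0, 1.0, 5.0],
    [11.0, 1.2, 5.1],
    [2.0, 8.0, 5.0],
    [1.0, 9.0, 4.9],
]

centered = center(intensities)
cov = covariance(centered)
pc1, eigenvalue = first_component(cov)
# Explained variance = leading eigenvalue / total variance (trace of cov).
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
# Scores: projection of each centered sample onto the first component.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]

for label, score in zip(labels, scores):
    print(f"{label}: PC1 score {score:+.2f}")
print(f"PC1 explains {explained:.1%} of the variance")
```

In this toy case the two leaf-like samples and the two root-like samples fall on opposite sides of the first component, which is the kind of organ-driven separation described for Figure 3(a).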

Altogether, these results suggest that the major determinant factors of the metabolome were the type of organ (leaf or root) and growth condition (agar or vermiculite). This implies that the global metabolome profiles of PAP1 over-expressing lines are relatively similar to those of wild-type plants despite the marked difference in total anthocyanin observed. Indeed, as shown in Figure 3(b), the PCA results of the anthocyanin-targeted analysis indicate that the major determinant factor of anthocyanin patterns is the genotype of the plants, which is reflected in the first component. The PAP1 over-expressing lines form three distinct clusters: (1) root on agar; (2) leaf on agar; and (3) leaf on vermiculite.

In contrast, the wild-type plants form a single cluster regardless of the type of organ and growth condition, exhibiting only slightly affected anthocyanin patterns. These results suggest that the PAP1 gene regulates anthocyanin accumulation in a relatively specific manner, causing only a small change in the metabolome.

Transcriptome analysis using DNA microarrays indicates upregulated expression of novel genes by PAP1

The transcript levels of 22 810 genes on the Arabidopsis Genome ATH1 GeneChip array were determined. Details of the experimental designs and procedures of chip hybridizations are summarized as a web supplementary file (Table S1 in online data) compliant with the MIAME checklist format.

Hybridizations were conducted for the samples of WLA (WT/leaf/agar), PLA (pap1-D mutant/leaf/agar), OLA (PAP1 over-expressing transgenic plant/leaf/agar), WRA (WT/root/agar) and PRA (pap1-D mutant/root/agar). Four different sets of comparison were made to sort out the candidate genes responsible for anthocyanin accumulation in PAP1 over-expressing lines. A fold increase or decrease in the normalized intensity was calculated for the following comparisons: PLA1 (PLA experiment 1) versus WLA1 (WLA experiment 1); OLA1 (OLA experiment 1) versus WLA1 (WLA experiment 1); PLA2 (PLA experiment 2) versus WLA2 (WLA experiment 2); and PRA versus WRA. Figure S1 shows a scatter plot of PLA1 versus WLA1 as a typical example of the comparisons.

To identify genes exhibiting reproducible changes in expression, genes whose expression increased more than 1.5-fold in all three leaf comparisons (PLA1 versus WLA1, OLA1 versus WLA1 and PLA2 versus WLA2) were selected as induced genes, whereas genes whose expression fell below 0.66-fold in the same comparisons were selected as repressed genes. The results are illustrated as Venn diagrams in Figure 4.
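The selection rule can be sketched as a simple filter. The example below is a minimal illustration under two stated assumptions: a gene must pass the threshold in every one of the three leaf comparisons (the reading suggested by the intersections in Figure 4), and the gene labels and ratios are hypothetical values, not data from the study.

```python
INDUCED_THRESHOLD = 1.5     # fold change above which a gene counts as induced
REPRESSED_THRESHOLD = 0.66  # fold change below which a gene counts as repressed
LEAF_COMPARISONS = ("PLA1/WLA1", "OLA1/WLA1", "PLA2/WLA2")


def classify(ratios):
    """Classify one gene from its normalized-intensity ratios in the leaf comparisons."""
    values = [ratios[name] for name in LEAF_COMPARISONS]
    if all(value > INDUCED_THRESHOLD for value in values):
        return "induced"
    if all(value < REPRESSED_THRESHOLD for value in values):
        return "repressed"
    return "unchanged"


# Hypothetical ratios for three placeholder genes.
EXAMPLE_RATIOS = {
    "gene_a": {"PLA1/WLA1": 12.0, "OLA1/WLA1": 9.5, "PLA2/WLA2": 14.1},
    "gene_b": {"PLA1/WLA1": 2.3, "OLA1/WLA1": 1.2, "PLA2/WLA2": 1.9},
    "gene_c": {"PLA1/WLA1": 0.5, "OLA1/WLA1": 0.6, "PLA2/WLA2": 0.4},
}

calls = {gene: classify(ratios) for gene, ratios in EXAMPLE_RATIOS.items()}

for gene, call in calls.items():
    print(f"{gene}: {call}")
```

Requiring agreement across all comparisons means that a gene such as gene_b, which exceeds the threshold in only two of three comparisons, is not counted as induced.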

Figure 4.

Venn diagrams of genes whose expression was changed.
(a) Intersection of upregulated genes with PAP1 over-expression in leaves.
(b) Intersection of downregulated genes with PAP1 over-expression in leaves.
(c) Intersection of upregulated genes with PAP1 over-expression in leaves and roots.
(d) Intersection of downregulated genes with PAP1 over-expression in leaves and roots.
Genes that show more than 1.5-fold changes are selected in each comparison.

Thirty-nine upregulated genes, including PAP1 (Figure 4a), and 12 downregulated genes (Figure 4b) were identified and are listed in Table 2. Of the 39 genes upregulated in leaves, 17 were also induced in roots (Figure 4c). No genes were downregulated in both leaves and roots (Figure 4d). Eleven of the 39 genes upregulated in leaves were annotated as encoding previously characterized anthocyanin biosynthetic enzymes or regulatory proteins: TT3 (DFR; At5g42800), TT4 (CHS; At5g13930), TT5 (CHI; At3g55120), TT7 (F3′H; At5g07990), PAP1 (At1g56650), TT8 (At4g09820), TT19 (GST; At5g17220), TTG2 (At2g37260), ANS (At4g22870), 4CL (At1g20490) and CHI (At5g05270).

Table 2.  Genes regulated by PAP1
Gene family | MIPS code | Name | Annotation | Homologue of the identified gene | blastp score (e-value) | Leaf fold change (average) | Root fold change

Upregulated (39)
Flavonoid pathway | At4g22870 | ANS* | Leucoanthocyanidin dioxygenase, putative | TT18 (A. thaliana) | 227 (5e-59) | 20.2 | 16.3
 | At5g42800 | TT3* | Dihydroflavonol 4-reductase | | | 14.7 | 16.4
 | At5g07990 | TT7* | Flavonoid 3′-monooxygenase | | | 12.7 | 5.1
 | At5g13930 | TT4* | Chalcone synthase | | | 11.1 | 1.7
 | At1g20490 | 4CL* | 4-Coumarate:CoA ligase 1 family | 4CL1 (A. thaliana) | 308 (1.0e-82) | 4.3 | 1.2
 | At5g05270 | CHI* | Chalcone-flavanone isomerase family | Chalcone isomerase (E. umbellata) | 52 (8.0e-06) | 3.1 | 1.8
 | At3g55120 | TT5* | Chalcone-flavanone isomerase | | | 3.1 | 1.3
Flavonoid glycosyltransferase | At5g54060 | UGT79B1 | Glycosyltransferase family | Anthocyanidin 3-O-glucoside rhamnosyltransferase (P. integrifolia) | 322 (1.0e-86) | 70.8 | 11.5
 | At4g14090 | UGT75C1 | Similar to anthocyanin 5-O-glucosyltransferase | Anthocyanidin 5-O-glucosyltransferase (Petunia × hybrida) | 378 (1.0e-103) | 12.7 | 9.0
 | At5g17050 | UGT78D2 | Similar to flavonoid 3-O-glucosyltransferase | Flavonoid 3-O-rhamnosyltransferase (A. thaliana) | 649 (0.0) | 3.9 | 2.0
Flavonoid acyltransferase | At1g03940 | | Similar to anthocyanin 5-aromatic acyltransferase | Anthocyanin 3-O-glucoside coumaroyltransferase (P. frutescens) | 259 (1.0e-67) | 25.9 | 19.9
 | At3g29590 | | Similar to anthocyanin 5-aromatic acyltransferase | Anthocyanin 5-O-glucoside caffeoyltransferase (G. triflora) | 157 (6.0e-37) | 13.0 | 9.2
Glutathione S-transferase | At5g17220 | TT19*, AtGST12 | Glutathione transferase, putative | | | 19.1 | 8.3
 | At1g02930 | AtGST6 | Similar to glutathione S-transferase | AN9 (Petunia × hybrida) | 143 (3.0e-33) | 17.9 | 1.1
 | At1g02940 | AtGST5 | Similar to glutathione S-transferase | AN9 (Petunia × hybrida) | 136 (6.0e-31) | 3.6 | 4.0
Transcription factor | At1g56650 | PAP1* | myb-related protein anthocyanin2 | | | 25.6 | 7.5
 | At4g09820 | TT8* | bHLH protein; helix-loop-helix protein DEL | | | 18.3 | 6.0
 | At2g37260 | TTG2* | WRKY family transcription factor | | | 3.8 | 2.1
 | At5g61600 | | AP2 domain transcription factor, putative | Pti4 (L. esculentum) | 101 (2.0e-20) | 3.6 | 0.9
Transporter | At1g34580 | | Monosaccharide transporter, putative | STP1 (A. thaliana) | 516 (1.0e-145) | 2.5 | 1.2
 | At4g04750 | | Sugar transporter family | GLUT8 (G. gallus) | 203 (9.0e-51) | 2.5 | 1.2
Ca2+ binding protein | At4g27280 | | Calcium-binding EF-hand family protein | CCD1 (T. aestivum) | 104 (5.0e-22) | 2.1 | 0.7
 | At1g70670 | | Ca2+-binding EF-hand common family protein | GmPM13 (G. max) | 166 (4.0e-40) | 2.0 | 1.4
Other | At3g51030 | AtTRX1 | Thioredoxin H-type 1 | | | 3.1 | 1.3
 | At1g74420 | | Similar to xyloglucan fucosyltransferase | Alpha-1,2-fucosyltransferase (P. sativum) | 435 (1.0e-120) | 2.4 | 1.4
 | At4g24570 | | Mitochondrial carrier protein family | | | 2.3 | 0.7
 | At2g23000 | | Serine carboxypeptidase-related | SNG1 (A. thaliana) | 615 (1.0e-175) | 2.2 | 2.4
 | At3g47260 | | Ulp1 protease family | | | 2.1 | 1.2
 | At3g22290 | | Expressed protein | | | 2.1 | 1.2
 | At5g47500 | | Pectinesterase family | | | 1.9 | 1.1
 | At2g22890 | | Hypothetical protein | | | 1.8 | 1.0
 | At4g30590 | | Plastocyanin-like domain containing protein | ENOD16 (M. truncatula) | 96 (4.0e-19) | 1.8 | 1.1
 | At2g31090 | | Expressed protein | | | 1.8 | 0.9
 | At2g47240 | | Long-chain-fatty-acid-CoA ligase family | Acyl CoA synthetase (B. napus) | 564 (1.0e-159) | 1.8 | 1.0
 | At4g10280 | | Expressed protein | | | 1.7 | 0.3
 | At3g13190 | | Expressed protein | | | 1.7 | 1.0
 | At5g45550 | | Expressed protein | | | 1.7 | 1.1
 | At4g32105 | | Expressed protein | | | 1.7 | 1.1
 | At4g14580 | | CBL-interacting protein kinase 4 | wpk4 (T. aestivum) | 320 (4.0e-86) | 1.6 | 1.5

Downregulated (12)
 | At2g05540 | | Glycine-rich protein | | | 0.5 | 1.7
 | At4g15660 | | Glutaredoxin protein family | | | 0.6 | 1.4
 | At3g28740 | | Cytochrome P450 family | CYP81E8 (M. truncatula) | 4822 (1.0e-135) | 0.5 | 1.0
 | At3g47340 | ASN1 | Glutamine-dependent asparagine synthetase | | | 0.6 | 0.9
 | At4g32810 | | Retinal pigment epithelial membrane protein family | RAMOSUS1 (P. sativum) | 749 (0.0) | 0.5 | 1.2
 | At5g20250 | DIN10 | Glycosyl hydrolase family 36 | | | 0.5 | 0.7
 | At3g51790 | ATG1 | Transmembrane protein G1p-related | | | 0.5 | 1.0
 | At2g29420 | AtGSTU7 | Glutathione transferase, putative | GST7 (Z. mays) | 182 (5.0e-45) | 0.6 | 1.0
 | At1g75750 | GASA1 | GAST1 protein homologue | RSI-1 (L. esculentum) | 62 (3.0e-09) | 0.6 | 0.8
 | At1g01780 | | LIM domain protein-related | | | 0.6 | 1.1
 | At4g30270 | MERI5B | Xyloglucan endotransglycosylase (meri5B) | | | 0.6 | 0.8
 | At2g47180 | | Galactinol synthase, putative | GolS-1 (A. reptans) | 511 (1.0e-144) | 0.6 | 1.0

  1. Expression data under PAP1 over-expression are shown. Genes with a fold change >1.50 or <0.66 in the leaf experiments are listed in order of fold change.

  2. *Eleven genes previously reported to be involved in anthocyanin biosynthesis.

Only a small portion of several paralogous genes for each biosynthetic enzyme were upregulated in the PAP1 over-expressing lines, suggesting that induced genes in these lines encode functional proteins involved in anthocyanin production. Combined with the metabolite profiles, these results suggest that the PAP1 gene specifically induces the expression of genes involved in anthocyanin production or accumulation, leading to anthocyanin accumulation.

Putative assignment of function of PAP1-upregulated genes

From the results of the metabolome and transcriptome analyses, we could putatively assign the function of PAP1-upregulated genes. In addition to the anthocyanin biosynthetic genes indicated above, several unconfirmed genes in certain gene families were upregulated. These include three glycosyltransferase-family genes (At5g54060, At4g14090 and At5g17050), two acyltransferase-family genes (At1g03940 and At3g29590), two glutathione S-transferase-family genes (At1g02930 and At1g02940) and two sugar-transporter-family genes (At1g34580 and At4g04750). Considering the accumulation of specific molecular species of anthocyanins in PAP1 over-expressing plants, the functions of these upregulated genes can be putatively assigned to be associated with the production of specific anthocyanin derivatives for their modification and transport (Figure 5).

Figure 5.

Summary of integrated metabolomics and transcriptomics on phenylpropanoid and flavonoid biosynthetic pathways.
The genes and metabolites indicated in red are those upregulated by PAP1. The contents of kaempferol glycosides were decreased. PAL, phenylalanine ammonia-lyase; C4H, cinnamate 4-hydroxylase; 4CL, 4-coumarate-CoA ligase; C3H, cinnamate 3-hydroxylase; COMT, cinnamate O-methyltransferase; F5H, ferulate 5-hydroxylase; OMT, O-methyltransferase; CCoA3H, cinnamoyl-CoA 3-hydroxylase; CCoMT, cinnamoyl O-methyltransferase; CCR, cinnamoyl-CoA reductase; CAD, cinnamoyl-alcohol dehydrogenase; CHS, chalcone synthase; CHI, chalcone isomerase; F3H, flavanone 3-hydroxylase; F3′H, flavonoid 3′-hydroxylase; FLS, flavonol synthase; FGT, flavonol glycosyltransferase; DFR, dihydroflavonol reductase; LAR, leucocyanidin reductase; ANS, anthocyanidin synthase; BAN, anthocyanidin reductase BANYULS; AGT, anthocyani(di)n glycosyltransferase; AAT, anthocyanin acyltransferase. *Upregulated genes besides 39 genes listed in Table 2 (see Table S2).

Three glycosyltransferase genes are assigned to encode the proteins catalyzing one of four glycosylation reactions for the formation of the most extensively modified A11 anthocyanin. Two acyltransferases are assigned to one of three possible anthocyanin acyltransferases for the formation of A11 anthocyanin. Sugar-transporter-like proteins may be responsible for the uptake of anthocyanins into the vacuole. The AP2 domain transcription factor (At5g61600) and two Ca2+-binding EF-hand family proteins may be involved in the downstream regulation of anthocyanin biosynthesis by PAP1.

UGT78D2 and UGT75C1 as flavonoid 3-O-glucosyltransferase and anthocyanin 5-O-glucosyltransferase, respectively

Three glycosyltransferase genes, At5g54060 (UGT code; UGT79B1), At4g14090 (UGT75C1) and At5g17050 (UGT78D2), were induced in PAP1 over-expressing plants, suggesting the involvement of these three proteins in the modification of the sugar moieties of anthocyanins produced in PAP1 over-expressing plants. At5g17050 (UGT78D2) and At4g14090 (UGT75C1) were found to encode flavonoid 3-O-glucosyltransferase (3GT) and anthocyanin 5-O-glucosyltransferase (5GT), respectively.

Figure 6 shows the molecular phylogenetic tree of the amino acid sequences of the flavonoid glycosyltransferases. The phylogenetic tree shows that At5g17050 (UGT78D2) belongs to the subfamily of 3GT and At4g14090 (UGT75C1) to the subfamily of 5GT.

Figure 6.

Molecular phylogenetic tree of the amino acid sequences of the flavonoid glycosyltransferases.
The amino acid sequences were aligned using the multiple sequence alignment program ClustalW. The GenBank accession numbers for the sequences are as follows: At5g17050, UGT78D2, 3GT (NM_121711); At4g14090, UGT75C1, 5GT (NM_117485); At5g54060, UGT79B1 (NM_124785); At2g36790, UGT73C6, 3G-7GT (NM_129234); At1g30530, UGT78D1, 3RT (NM_102790); eggplant 3GT (X77369); petunia 3GT (AB027454); gentiana 3GT (D85186); grape 3GT (AF000371); barley 3GT (X15694); maize 3GT (X13501); petunia 5GT (AB027455); perilla 3GT (AB002818); torenia 5GT (AB076698); verbena 5GT (BAA36423); perilla 5GT (AB013596); petunia 3G-2′′RT (Z25802); scutellaria 7GT (BAA83484); gentiana 3′GT (AB076697).

The T-DNA-inserted mutants of At5g17050 and At4g14090 were obtained from the collection of the Salk Institute (Alonso et al., 2003). Line 049338, designated ugt78d2, contained a T-DNA insertion in the second exon of At5g17050 (UGT78D2), and line 108458, designated ugt75c1, had a T-DNA insertion in the exon of At4g14090 (UGT75C1) (Figure 7a). The transcripts of At5g17050 and At4g14090 were not observed in the homozygotes of each T-DNA-inserted mutant (Figure 7b). In the homozygous ugt78d2 mutant, the total anthocyanin content was reduced to 21% of that in the wild type, although the composition of the accumulated anthocyanins was the same as in the wild type (Figure 8a,c).

Figure 7.

T-DNA-inserted mutants of At5g17050 and At4g14090.
(a) Schematic structure of the T-DNA-inserted lines of At5g17050 (line 049338) and At4g14090 (line 108458).
(b) RT-PCR analysis of cDNA expression for UGT78D2 and UGT75C1 in the respective homozygous mutants.

Figure 8.

Functional identification of At5g17050 (UGT78D2) and At4g14090 (UGT75C1) by analyses of T-DNA-inserted mutants and by enzymatic assay in vitro.
(a–f) Flavonoid analysis of leaves of T-DNA-inserted mutants grown on GM agar medium containing 1% sucrose for 2 weeks, and transferred to GM-agar medium containing 12% sucrose for 1 week in a growth chamber.
(a, b) Wild-type leaves (WLA).
(c, d) ugt78d2 mutant leaves.
(e, f) ugt75c1 mutant leaves.
(a, c, e) Absorbance at 520 nm for anthocyanin analysis.
(b, d, f) Absorbance at 320 nm for flavonoid analysis.
(g) Structures of anthocyanins (A12–A17) accumulated in ugt75c1.
(h–k) In vitro enzymatic assay of recombinant UGT78D2 protein.
(h) Standard cyanidin 3-O-glucoside.
(i) Standard cyanidin.
(j) Product by the protein extracts of Escherichia coli expressing recombinant UGT78D2.
(k) Product by the protein extracts of E. coli expressing β-glucuronidase used as a negative control.
Enzymatic conversion of cyanidin into cyanidin 3-O-glucoside was carried out in the presence of protein extracts of E. coli transformed with the vectors carrying UGT78D2 cDNA and UDP-glucose as described in Experimental procedures. The reaction products were analyzed by HPLC.

Anthocyanin accumulation is also suppressed in the 3GT-deficient maize mutant (bz1) (Dooner et al., 1985; Fedoroff et al., 1984), so the reduction in the anthocyanin level in the ugt78d2 mutant is most likely due to a decrease in UDP-glucose: cyanidin 3-O-glucosyltransferase activity. In addition to the reduction in the anthocyanin level, the pattern of accumulated flavonol glycosides also changed.

In the ugt78d2 mutant, the levels of four flavonol glycosides (F2, F3, F5 and F6) with glucose attached at the 3-position were reduced (Figure 8b,d). In contrast, the levels of two flavonol glycosides (F1 and F4) with a rhamnose residue attached at the 3-position were slightly elevated.

These results indicate that UGT78D2 is responsible for the glucosylation of both anthocyanins and flavonols at the 3-position. Furthermore, recombinant UGT78D2 with a 6X His tag at the N-terminus was produced in Escherichia coli BL-21 AI. UDP-glucose: cyanidin 3-O-glucosyltransferase activity was detected in the protein extract of E. coli expressing recombinant UGT78D2 (Figure 8j). Three anthocyanidins (cyanidin, pelargonidin and delphinidin) and three flavonols (kaempferol, quercetin and myricetin) were tested as substrates for recombinant UGT78D2. All six were converted to the corresponding 3-glucosides (data not shown). Together with the mutant analysis, these results indicate that UGT78D2 catalyzes the glucosylation of both cyanidin and flavonols at the 3-position as UDP-glucose: flavonoid 3-O-glucosyltransferase in planta.

The homozygous ugt75c1 mutant exhibited an altered anthocyanin pattern, accumulating six new anthocyanins, A12–A17, which are not produced in the wild-type plant (Figure 8e). Detailed investigation of the mass spectra obtained using MSn analysis indicated that A12, A13, A14, A15, A16 and A17 (Figure 8g) are A1, A5, A4, A8, A7 and A11 de-glucosylated at the 5-position, respectively, suggesting the lack of 5-glucosylation activity of anthocyanins in the ugt75c1 mutant. No substantial change was observed in the composition of flavonols (Figure 8f). These results clearly indicate that UGT75C1 is a functional UDP-glucose: anthocyanin 5-O-glucosyltransferase.


Holistic changes of metabolome and transcriptome caused by ectopic PAP1 expression

Ectopic PAP1 over-expression resulted in a marked overaccumulation of cyanidin-type anthocyanins and quercetin-type flavonols. In contrast, the levels of kaempferol glycosides in the PAP1 over-expressing lines decreased to approximately 30% of those in the wild-type plants. Regarding the intermediates of the biosynthetic pathways for such flavonoids, phenylalanine showed no change in level, whereas the other metabolic intermediates decreased to below the detection limits of the FT-MS and LC-MS analyses. Of the metabolites examined in this study, only the flavonoids showed changes in their patterns that were detectable with the current technology.

In association with these metabolome changes, PAP1 expression resulted in the upregulation of almost all genes encoding anthocyanin biosynthetic enzymes (Figure 5). The expression of known flavonoid genes, such as TT4 (CHS) and TT5 (CHI), was upregulated in the leaves and roots of the PAP1 over-expressing lines.

In addition to these well-known anthocyanin biosynthetic genes, genes that are putatively annotated as anthocyanin biosynthetic genes, such as At5g05270 (CHI homologue) and At4g22870 (ANS homologue), were also upregulated. These paralogous genes, as well as the previously characterized genes, are presumably involved in anthocyanin biosynthesis.

All these metabolome and transcriptome data suggest that PAP1 regulates flavonoid biosynthetic genes in a relatively specific manner, causing the accumulation of cyanidin- and quercetin-type flavonoids. This finding is in striking contrast to that of a recent study of the anthocyanin-accumulating pho3 mutant of the sucrose transporter gene, in which the expression of a wide array of genes changed (Lloyd and Zakhleniuk, 2004).

Functional identification of two flavonoid glycosyltransferases

In the Arabidopsis genome, 107 UDP-sugar-dependent glycosyltransferase genes are present (Bowles, 2002). Only a few of them, however, have been functionally characterized. In our present study, two glycosyltransferases, UGT78D2 (At5g17050) and UGT75C1 (At4g14090), were predicted to be involved in anthocyanin biosynthesis, and these were subsequently identified as flavonoid 3-O-glucosyltransferase and anthocyanin 5-O-glucosyltransferase, respectively.

As the mutant of UGT78D2 (ugt78d2) still accumulated a small amount of anthocyanins, the presence of a secondary activity of flavonoid 3-O-glucosyltransferase was suggested. UGT78D1, which is structurally similar to UGT78D2, has recently been identified as flavonoid 3-O-rhamnosyltransferase using UDP-rhamnose as the sugar donor (Jones et al., 2003). Both proteins belong to the same phylogenic group of flavonoid 3-O-glycosyltransferases. However, the specificities of UGT78D1 and UGT78D2 toward UDP-sugar are strict, as determined from the distinct flavonoid accumulation patterns of mutants lacking the gene for each protein.

UGT75C1 belongs to the phylogenic group of anthocyanin 5-O-glucosyltransferases together with functionally identified anthocyanin 5-O-glucosyltransferases from various plant species (Yamazaki et al., 1999, 2002). UGT75C1 is functionally non-redundant in A. thaliana, because its mutant (ugt75c1) completely lacks anthocyanin 5-O-glucosides.

Predicted functions of genes upregulated by PAP1 over-expression

In addition to the two glycosyltransferase genes functionally identified in our present investigation, two other genes, At5g54060 (UGT79B1) and At3g21560 (UGT84A2), were induced in PAP1 over-expressing lines, suggesting the possible participation of the proteins encoded by these genes in the production of anthocyanins. Due to the weak induction of At3g21560 by PAP1, this gene is not listed in Table 2; however, the induction in pap1-D was reproducible (Table S2). As the most extensively modified anthocyanin molecule A11 possesses, in addition to 3-O-glucose and 5-O-glucose, a xylose residue attached at the C2-position of 3-O-glucoside and a glucose residue attached at the p-position of a coumaroyl group, two unidentified proteins, UGT79B1 and UGT84A2, are assumed to be responsible for either of these two extra sugar attachments. Considering the differences in the pattern of anthocyanin accumulation and gene expression profile between the leaves and roots, UGT79B1 is assumed to be most likely responsible for xylosyltransfer to the C2-position of glucose, and UGT84A2 for glucosyltransfer to the p-position of a coumaroyl group. The clustering in the molecular phylogenic tree of the glycosyltransferase family is also consistent with these assumptions.

The Arabidopsis genome contains approximately 70 genes associated with acyl-CoA-dependent acyltransferase (Dudareva and Pichersky, 2000). Two putative acyltransferase genes, At1g03940 and At3g29590, were upregulated by PAP1 expression. The most extensively modified anthocyanin A11 contains three acyl groups: sinapoyl, p-coumaroyl and malonyl. Taking into account the distinct patterns of the expression of the two genes and anthocyanin accumulation in the leaves and roots, At1g03940 and At3g29590 would either be malonyltransferase or p-coumaroyltransferase. The patterns of gene expression and anthocyanin accumulation in stressed plants by sucrose treatment and UV irradiation (data not shown) suggest that sinapoyltransferase is expressed constitutively in such plants.

Glutathione S-transferase (GST) is required for the vacuolar sequestration of anthocyanin in maize (Bz2; Marrs et al., 1995) and petunia (An9; Alfenito et al., 1998). In the Arabidopsis genome, 47 GST family genes are present (Dixon et al., 2002). Recently, Arabidopsis GST TT19 (At5g17220, GST code; AtGSTF12) has been isolated as an anthocyanin-transport-facilitating protein (Kitamura et al., 2004). In our present study, in addition to the TT19 gene, two other genes, At1g02930 and At1g02940, located adjacently on chromosome 1, were induced by PAP1 expression. These results suggest that the GSTs encoded by At1g02930 and At1g02940 are responsible, at least in part, for the vacuolar sequestration of anthocyanin in Arabidopsis in addition to TT19, as the tt19 mutant still accumulates a small amount of anthocyanins (Tohge, T. and Saito, K., Chiba University, Chiba, Japan, personal communication).

Networks of transcription factors

Recently, a network model of the TTG1-dependent transcriptional pathway including anthocyanin accumulation, seed coat pigmentation and trichome initiation has been proposed (Zhang et al., 2003). In the present study of PAP1 over-expressing plants, three transcription factor genes, TT8 (bHLH protein), TTG2 (WRKY protein) and At5g61600 (an AP2 domain factor), in addition to PAP1, were upregulated. The expression of the other transcription-factor genes did not change (Table S2). In addition, the pap1-D mutant exhibited no distinct changes in its seed coat pigmentation and trichome initiation, though a dominant chimeric PAP1 repressor downregulates proanthocyanidin formation (Matsui et al., 2004). These results indicate that PAP1 acts in the anthocyanin-specific downstream branch of the transcription network. TTG1 (WD40 protein) is necessary in addition to PAP1 for anthocyanin production (Borevitz et al., 2000). A basic MYC protein, TT8, which is required for DFR and BAN gene expression in Arabidopsis siliques, is necessary for proanthocyanidin production (Nesi et al., 2000).

Regarding common cis-acting elements related to PAP1, we found statistically significant motifs in the approximately 1000 bp promoter regions of 38 upregulated genes using the PLACE program and motif analysis. Two motifs, MYBPLANT (A/CACCA/TAA/CC) and MYBPZM (CCA/TACC), were found as candidate target cis-elements of the PAP1 transcription factor. These motifs have been identified as cis-elements responsible for the binding of MYB proteins in the anthocyanin pathways in Antirrhinum (Sablowski et al., 1994; Tamagnone et al., 1998) and maize (Grotewold et al., 1994).

In addition, two sequences, CCCACC and CACGTG, were found as common motifs in the promoter regions of the upregulated genes. However, there is as yet no available information on the functions of these candidate cis-elements. Further detailed analysis is necessary to determine such functions.
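As an illustration of how a promoter sequence could be screened for the four sequences named above, the following minimal sketch scans both strands with regular expressions. It is not the PLACE program or the motif analysis used in this study. It assumes that A/CACCA/TAA/CC is read as (A or C)ACC(A or T)A(A or C)C and CCA/TACC as CC(A or T)ACC; the function names and the test sequences are hypothetical.

```python
import re

# Motifs written as regular expressions. The degenerate positions are an
# assumed reading of the notation in the text (e.g. A/C -> [AC]).
MOTIFS = {
    "MYBPLANT": "[AC]ACC[AT]A[AC]C",
    "MYBPZM": "CC[AT]ACC",
    "CCCACC": "CCCACC",
    "CACGTG": "CACGTG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motifs(promoter):
    """Return {motif name: [(strand, 0-based start on the forward strand), ...]}."""
    seq = promoter.upper()
    rc = reverse_complement(seq)
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are all reported.
        regex = re.compile("(?=(" + pattern + "))")
        found = [("+", m.start()) for m in regex.finditer(seq)]
        for m in regex.finditer(rc):
            # Convert the reverse-strand position to forward-strand coordinates.
            length = len(m.group(1))
            found.append(("-", len(seq) - m.start() - length))
        hits[name] = found
    return hits


if __name__ == "__main__":
    # Hypothetical 12-nt test sequence, not a real promoter.
    print(find_motifs("TTAACCAACCGG"))
```

Because CACGTG is its own reverse complement, each occurrence is reported once per strand; a real analysis would also need a background model to judge statistical significance, which this sketch does not attempt.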

Experimental procedures

Plant materials and growth conditions

Arabidopsis thaliana (ecotype Columbia) plants were used as the wild-type plant in this study. The pap1-D mutant was described previously (Borevitz et al., 2000). The PAP1 cDNA over-expressing transformant was obtained by transformation of A. thaliana with the engineered Ti plasmid carrying cauliflower mosaic virus 35S promoter linked with the coding sequence of PAP1 cDNA. The plants were cultured on GM-agar medium containing 1% sucrose (Valvekens et al., 1988) in a growth chamber at 22°C in 16/8 h light/dark cycles for 3 weeks, or in a standard greenhouse at 22°C in 16/8 h light for 4 weeks. Samples from wild-type plants and PAP1 over-expressing lines were used, namely: WLA (wild-type leaves grown on GM agar medium); PLA (pap1-D mutant leaves grown on GM agar medium); OLA (PAP1-over-expressed transgenic leaves grown on GM agar medium); WLV (wild-type leaves grown on vermiculite); PLV (pap1-D mutant leaves grown on vermiculite); WRA (wild-type roots grown on GM medium); and PRA (pap1-D mutant roots grown on GM medium). The leaves and roots of plants were harvested, immediately frozen with liquid nitrogen and stored at −30°C until use. Identical plant materials were used for analysis of transcriptome using DNA microarrays, targeted flavonoid profile by HPLC/PDA/ESI-MS and non-targeted metabolome by FT-MS.

Evaluation of T-DNA insertion mutants

The T-DNA-inserted mutants of A. thaliana, line 049338 and line 108458, were obtained from the Salk Institute. Genomic DNA of the mutants was extracted with DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). The left border of T-DNA and the flanking sequence of each line were amplified by PCR using gene-specific primers (5′-CGGAGGTTGGTACGGAAGTGA-3′ for 049338, 5′-GCGGTCTTGTGGAGGTTGAGA-3′ for 108458) and LBb1 (5′-GCGTGGACCGCTTGCTGCAACT-3′). Nucleotide sequences of the PCR products were determined for confirmation of T-DNA insertion sites. Total RNA of the mutants was extracted with RNeasy Plant Mini Kit (Qiagen), and cDNA was synthesized with SuperScript II RNase H- reverse transcriptase (Invitrogen Corp., Carlsbad, CA, USA) following the manufacturer's instructions. By RT-PCR, the lack of transcripts of At5g17050 and At4g14090 was confirmed in each line homozygous for the T-DNA insertion. The primer sequences for RT-PCR were 5′-CAACACCGCACAATCCAACTC-3′ and 5′-ACCCGTTGCTTCGTGTTTCA-3′ for UGT78D2, and 5′-CGACGGTCTCAAGTCATTCGA-3′ and 5′-TCAGCAAACTGCGGAAACG-3′ for UGT75C1.

Targeted flavonoid profiling by HPLC/PDA/MS, amino acid analysis and anion analysis

Frozen leaves and roots were homogenized in 5 μl extraction solvent (methanol:acetate:H2O = 9:1:10) per 1 mg fresh weight of tissues by mixer mill (MM300; Retsch GmbH & Co. KG, Haan, Germany) at 30 Hz. After centrifugation at 12 000 g, cell debris was discarded and extracts were centrifuged again. Fifty microliters of supernatant was applied to an HPLC/PDA/ESI-MS system comprising a Finnigan LCQ-DECA mass spectrometer (ThermoQuest, San Jose, CA, USA) and an Agilent HPLC 1100 series (Agilent Technologies, Palo Alto, CA, USA) as described previously (Jones et al., 2003; Yamazaki et al., 2003). HPLC was carried out on a TSK-GEL RP-18 (φ4.6 mm × 150 mm; TOSOH, Tokyo, Japan) at a flow rate of 0.5 ml min−1. Elution was performed with solvent A [CH3CN-H2O-TFA (10:90:0.1)] and solvent B [CH3CN-H2O-TFA (90:10:0.1)] using the following profile (0 min 100% A, 40 min 60% A, 40.1 min 100% B, 45 min 100% B, 45.1 min 100% A, 52 min 100% A), with linear gradients between the time points. PDA was used for detection of UV-visible absorption in the range of 250–650 nm. Nitrogen was used as sheath gas for the positive-ion ESI-MS performed at capillary temperature and voltage of 350°C and 5.0 kV, respectively. The tube lens offset was set at 10.0 V. Full scan mass spectra were acquired from 200–1500 m/z at 2 scans sec−1. Tandem MS analysis was carried out with helium gas as the collision gas. The normalized collision energy was set to 30%. Metabolites were identified based on UV-visible absorption spectra and mass fragmentation by tandem MS analysis in comparison with the known compounds of our laboratory stock (Jones et al., 2003; Yamazaki et al., 2003) and the reported data (cited in Table 1).
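The elution profile above is a piecewise-linear gradient, so the solvent composition at any time point can be worked out by linear interpolation between the listed breakpoints. The sketch below is only an illustration of that arithmetic; the function names are hypothetical, 100% B is written as 0% A, and the approximate acetonitrile content ignores the 0.1 parts of TFA.

```python
# (time in min, % solvent A) breakpoints taken from the elution profile in the text.
GRADIENT = [
    (0.0, 100.0),
    (40.0, 60.0),
    (40.1, 0.0),   # 100% B
    (45.0, 0.0),   # 100% B
    (45.1, 100.0),
    (52.0, 100.0),
]


def percent_a(t_min):
    """Percentage of solvent A at time t_min, by linear interpolation."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-52 min run")
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            frac = (t_min - t0) / (t1 - t0)
            return a0 + frac * (a1 - a0)


def percent_acetonitrile(t_min):
    """Approximate % CH3CN in the mobile phase (A = 10%, B = 90%; TFA ignored)."""
    a = percent_a(t_min)
    return 0.10 * a + 0.90 * (100.0 - a)


if __name__ == "__main__":
    for t in (0, 20, 40, 42, 50):
        print(t, percent_a(t), percent_acetonitrile(t))
```

Under these assumptions, the mobile phase at 20 min is 80% A and 20% B, which corresponds to roughly 26% acetonitrile.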

Amino acid analysis was carried out by post-column derivatization method using HPLC coupled fluorescent detection as described previously (Hirai et al., 2004). Anion and sugar analysis was performed by capillary electrophoresis as reported previously (Hirai et al., 2004).

Non-targeted metabolome analysis by FT-MS

High-, middle- and non-polar extracts of plant materials were subjected to FT-MS (APEX III FT-ICMS; Bruker Daltonics, Billerica, MA, USA) as described previously (Aharoni et al., 2002; Hirai et al., 2004). The fold change value of the intensity of each observed mass peak was calculated as the ratio of the signal intensity in the mutant and transformant samples to that in the wild-type sample. Metabolite identification was carried out based on elemental composition calculations from accurate m/z values using DISCOVArray (Phenomenome Discoveries, Inc., Saskatoon, Canada). The detailed procedure is described elsewhere (Table S3). The fold change values were used for PCA. PCA was conducted using GeneLinker Gold 3.0 software (Molecular Mining Corp., Cambridge, MA, USA).

Transcriptome analysis using DNA microarrays

Total RNA was extracted using RNeasy Plant Mini Kit (Qiagen) from frozen plant materials. Labeled target cRNA was prepared according to the technical manual of Arabidopsis Genome ATH1 DNA array (Affymetrix, Santa Clara, CA, USA). Double-stranded cDNA was prepared from 40 μg of total RNA using SuperScript Choice System (Invitrogen). The resultant cDNA was transcribed in vitro using BioArray High Yield RNA Transcript Kit (Enzo, New York, NY, USA). Following purification and fragmentation, the labeled cRNA was hybridized to Arabidopsis Genome ATH1 GeneChip array (Affymetrix) in a Hybridization Oven model 640 (Affymetrix). Washing and staining of chips were carried out using GeneChip Fluidics Station model 400. Scanning was carried out with a GeneArray Scanner (Agilent Technologies). The procedure is described elsewhere (Table S1).

Calculation and analysis of transcriptome data

GeneSpring 6.2 (Silicon Genetics, Redwood City, CA, USA) was used for GeneChip-array data calculation. The raw signal of each gene after subtraction of background was normalized to the median of all measurements for each sample on the chip. Negative values were set to a signal value of 0.01. Fold change was calculated as the ratio of the normalized signal intensity in the mutant or transformant to that in the wild-type plant. To reduce false positives, we selected only genes with a ‘present’ absolute call in the baseline data.
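The calculation described above can be sketched in a few lines. This is an illustrative reimplementation, not GeneSpring itself: the function names and gene identifiers are hypothetical, a single background value per chip is assumed, and values of exactly zero are floored together with negative values (an assumption made here to avoid division by zero).

```python
from statistics import median

FLOOR = 0.01  # signal value assigned to negative values, as stated in the text


def normalize_chip(raw, background=0.0):
    """Normalize one chip: {gene id: raw signal} -> {gene id: normalized signal}.

    The background is subtracted, each value is divided by the chip median,
    and non-positive results are replaced by FLOOR.
    """
    corrected = {gene: signal - background for gene, signal in raw.items()}
    chip_median = median(corrected.values())
    if chip_median <= 0:
        raise ValueError("chip median must be positive for normalization")
    normalized = {}
    for gene, signal in corrected.items():
        value = signal / chip_median
        normalized[gene] = value if value > 0 else FLOOR
    return normalized


def fold_changes(sample, wild_type):
    """Ratio of normalized signal in the mutant or transformant to the wild type."""
    return {gene: sample[gene] / wild_type[gene]
            for gene in sample if gene in wild_type}


if __name__ == "__main__":
    # Hypothetical raw signals for three genes on two chips.
    wt = normalize_chip({"geneA": 100.0, "geneB": 200.0, "geneC": 300.0})
    mutant = normalize_chip({"geneA": 400.0, "geneB": 200.0, "geneC": -50.0})
    print(fold_changes(mutant, wt))
```

Restricting the result to genes with a ‘present’ call in the baseline data would be an additional filtering step on the gene identifiers, which is omitted here because the call itself comes from the array software.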

In vitro assay of recombinant UGT78D2

Full-length cDNA (RAFL clone no. RAFL05-12-P07; Seki et al., 1998, 2002) of At5g17050 was obtained from RIKEN BioResource Center, Tsukuba, Japan. To express recombinant protein, At5g17050 cDNA was introduced into the Gateway™ system (Invitrogen Corp.) following the manufacturer's instructions. The attB sites were introduced by two steps of PCR using gene-specific primers (5′-AAAAAGCAGGCTCCATGACCAAACCCTCCGAC-3′ and 5′-AGAAAGCTGGGTCACATTCAAATAATGTTTACAACTGCATCC-3′) and attB adaptor primers (5′-GGGGACAAGTTTGTACAAAAAAGCAGGCT-3′ and 5′-GGGGACCACTTTGTACAAGAAAGCTGGGT-3′), respectively. The entry clone pE5-17050 was then obtained by BP recombination with pDONR221. The nucleotide sequence of the entry clone was determined to confirm the sequence. Then, At5g17050 cDNA was transferred from pE5-17050 into pDEST17 by LR recombination to produce pD17-5g17050. Recombinant UGT78D2 protein with a 6X His tag at the N-terminus was expressed in E. coli BL-21 AI™ transformed with pD17-5g17050 as described before (Nakajima et al., 2001) with slight modification (0.2% of L-arabinose was used to induce the expression of recombinant protein). After induction, cells were cultured at 16°C overnight. Detection of 3GT activity in the protein extracts of E. coli was performed as described previously (Taguchi et al., 2001).


We thank Dr Richard A. Dixon (Samuel Roberts Noble Foundation, Ardmore, OK, USA) for providing the pap1-D mutant. We also thank the Salk Institute Genomic Analysis Laboratory for providing the sequence-indexed A. thaliana T-DNA insertion mutants, and the RIKEN BioResource Center for providing the full-length cDNA. We thank Ms Rebecca Friend-Heath for kindly editing the English in the manuscript. This work was supported in part by the Ministry of Education, Culture, Sports, Science and Technology (Japan; Grants-in-Aid for Scientific Research), by CREST of Japan Science and Technology Agency (JST), and by Research for the Future Program (grant no. 00L01605; Molecular Mechanisms on Regulation of Morphogenesis and Metabolism Leading to Increased Plant Productivity).

Supplementary Material

The following material is available from

Figure S1. Scatter plot of normalized signal intensity. Genes with ‘present’ values in the absolute call of the baseline data were selected for the analysis. Normalized signal intensity of each spot in the wild-type leaf sample (WLA1) (x-axis) is plotted against that in the pap1-D-mutant leaf sample (PLA1) (y-axis). Black, red, blue and purple arrows indicate flavonoid biosynthetic genes, PAP1 gene, glycosyltransferase genes, acyltransferase genes and glutathione-S-transferase genes, respectively. Green lines represent the threshold lines (y = 2x and y = 0.5x) and the diagonal line (y = x).

Table S1 The Minimum Information About Microarray Experiment (MIAME) checklist of GeneChip experiments. Experimental designs and procedures were described following the MIAME checklist format proposed by the Microarray Gene Expression Data Society.

Table S2 Expression of genes annotated or presumed to be related with phenylpropanoid production by DNA array analysis

Table S3 The detailed procedure for non-targeted metabolome analysis by FT-MS