• Open Access

MS/MS spectral tag-based annotation of non-targeted profile of plant secondary metabolites


  • Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.

(fax +81 45 503 9489; e-mail ksaito@psc.riken.jp).


The MS/MS spectral tag (MS2T) library-based peak annotation procedure was developed for informative non-targeted metabolic profiling analysis using LC-MS. An MS2T library of Arabidopsis metabolites was created from a set of MS/MS spectra acquired using the automatic data acquisition function of the mass spectrometer. By using this library, we obtained structural information for the detected peaks in the metabolic profile data without performing additional MS/MS analysis; this was achieved by searching for the corresponding MS2T accession in the library. In the case of metabolic profile data for Arabidopsis tissues containing more than 1000 peaks, approximately 50% of the peaks were tagged by MS2Ts, and 90 peaks were identified or tentatively annotated with metabolite information by searching the metabolite databases and manually interpreting the MS2Ts. A comparison of metabolic profiles among the Arabidopsis tissues revealed that many unknown metabolites accumulated in a tissue-specific manner, some of which were deduced to be unusual Arabidopsis metabolites based on the MS2T data. Candidate genes responsible for these biosyntheses could be predicted by projecting the results to the transcriptome data. The method was also used for metabolic phenotyping of a subset of Ds transposon-inserted lines of Arabidopsis, resulting in clarification of the functions of reported genes involved in glycosylation of flavonoids. Thus, non-targeted metabolic profiling analysis using MS2T annotation methods could prove to be useful for investigating novel functions of secondary metabolites in plants.


The objective of ‘non-targeted’ metabolic profiling analysis is to describe metabolic events in plants by determining all detectable metabolites. Of the various profiling techniques, non-targeted analysis using LC-MS is a promising tool for investigating the diversity of phytochemicals (Bottcher et al., 2008; Dettmer et al., 2007; Villas-Boas et al., 2005); it is said to be as effective as methods employing GC-MS (Moco et al., 2007b; Tikunov et al., 2005). Many applications have been reported in various fields of plant sciences (Broeckling et al., 2005; Farag et al., 2008; Grata et al., 2008; Keurentjes et al., 2006; Kim et al., 2007; Schliemann et al., 2007), including functional genomic studies for the identification of metabolism-related genes (Messerli et al., 2007; Mintz-Oron et al., 2008; Schauer et al., 2006). The methodology of LC-MS-based metabolic profiling has recently been improved in terms of data acquisition with the development of peak-picking software packages such as xcms (Smith et al., 2006), MZmine (Katajamaa and Oresic, 2005) and MetAlign (de Vos et al., 2007). The current state of the art of LC-MS metabolomics has been summarized in the experimental protocol by the Wageningen group (de Vos et al., 2007) as well as in review articles (Dettmer et al., 2007; Dunn, 2008).

One of the most difficult technical challenges encountered in LC-MS metabolomics is the development of an annotation strategy for the many unknown peaks (Bino et al., 2004; Moco et al., 2007a; de Vos et al., 2007). In microarray analyses, gene expression profile data are analyzed by using various data-mining methods. In addition, functional annotations for each gene spotted on the array can be deduced from the sequence data by performing a homology search of the databases. The results are interpreted on the basis of the gene expression and annotation data, promoting further understanding of plant functions. However, metabolite information has not been fully assigned to peaks in LC-MS profile data. For example, only the peaks derived from six flavonoids, several glucosinolates and a few phenylpropanoids have been annotated in the case of the aerial tissues of intact Arabidopsis (Keurentjes et al., 2006; von Roepenack-Lahaye et al., 2004), while the metabolic profile data often contain more than 1000 peaks (rows). Thus, the current state of non-targeted metabolic profiling using LC-MS may be considered analogous to an EST-based custom-made microarray that lacks sequence information.

With regard to GC-MS profiling, the peak annotation procedure has been facilitated by creation of a spectral database of authentic compound data (Wagner et al., 2003), as well as improvements in the methods of processing complex profiling data (Jonsson et al., 2006; Kopka, 2006; Lisec et al., 2006; Tikunov et al., 2005; Wiklund et al., 2008). However, only a few peaks in the metabolic profile data were annotated by using a standard compound-based method in LC-MS profiling because the collection of authentic compounds of plant secondary metabolites is incomplete. Therefore, considerable efforts have been made in annotation of metabolites using tandem mass spectral data (MS/MS) (Bottcher et al., 2008; Farag et al., 2007; Moco et al., 2006; Rochfort et al., 2008; von Roepenack-Lahaye et al., 2004; Suzuki et al., 2007). Although the MS/MS spectra are insufficient for metabolite identification in a strict sense, they can provide an indication of putative structures of metabolites via databases and/or manual interpretation of the fragmentation pattern (Bottcher et al., 2008; Rochfort et al., 2008).

In non-targeted metabolic profiling analyses, MS/MS data have usually been acquired for several interesting peaks observed by data mining (Bottcher et al., 2008; Cao et al., 2008; Soga et al., 2006; Takahashi et al., 2008). Thus, additional MS/MS analyses are required whenever other peaks are observed by means of a different mining method (Figure 1a). This situation can be improved if the MS/MS spectra of most of the peaks in the profile data are acquired and stored in a library prior to metabolic profiling analyses (Figure 1b). A spectral library can be created from MS/MS spectra obtained using the automatic data acquisition function of the mass spectrometer in an experiment distinct from conventional metabolic profiling analyses. Once the library is created, the MS/MS spectra of the metabolite peaks observed in the profile data can be obtained from the library. This will enable deduction of the structure of the metabolites by manual and/or database-assisted interpretation of the fragmentation pattern without additional MS/MS analysis. On the basis of this information, a hypothesis can be formulated for a metabolic event in sample plants to facilitate further functional characterization of plant metabolism, as performed in microarray analyses. Identification of unusual plant constituents by interpretation of the MS2T data may reveal the existence of a pathway and the genes responsible for such biosynthesis in plants. Additionally, metabolic phenotyping of a loss-of-function mutant could provide an understanding of the function of the mutated gene.

Figure 1.

 Usual (a) and modified (b) procedures for non-targeted metabolic profiling analysis using LC-MS.
The new and improved steps in this study are highlighted in gray.

In this study, a strategy for non-targeted metabolic profiling analysis using LC-MS with MS2T-based peak annotation was investigated by developing an MS2T library of Arabidopsis metabolites. The performance of the developed method was evaluated by analyzing the tissue specificity of the metabolites and by metabolic phenotyping of Ds transposon-tagged mutant lines of Arabidopsis. Using this method, more than 1000 peaks were quantitatively analyzed, and approximately 50% of these peaks were tagged by MS2Ts. The MS2T-based peak annotation procedure appended metabolite information to approximately 100 of these peaks. The metabolic profile data revealed not only novel aspects of tissue-specific secondary metabolism in Arabidopsis but also metabolic functions of the mutated genes by describing the metabolic events occurring in plant tissues.


Creation of MS2T libraries

In order to create MS2T libraries of Arabidopsis shoot metabolites, sample extracts derived from the shoot and inflorescence tissues of 6-week-old Arabidopsis seedlings were analyzed using liquid chromatography-quadrupole-time-of-flight/mass spectrometry (LC-Q-TOF/MS) by operating the mass spectrometer in the data-dependent acquisition mode (Hernandez et al., 2006; Ishihama, 2005). MS/MS spectra of many metabolites eluted from the column were thus automatically obtained (see Experimental procedures). The MS/MS spectral data obtained using the above method are referred to as MS/MS spectral tags (MS2Ts). As the data-dependent acquisition function did not provide MS/MS spectra in the case of overlapping metabolites due to the slow data-acquisition cycle, a slower gradient curve program with half the flow rate was employed for LC methods (see Experimental procedures). Additionally, the analyses were repeated 25 times by altering the mass ranges (60 Da) used to select precursor ions in order to obtain as many MS2Ts as possible. Finally, two MS2T libraries were prepared using shoot (ATH01p, 6491 entries) and inflorescence (ATH02p, 3703 entries) tissue extracts (Table S1). Each MS2T accession was labeled in a fixed format, for example ‘ATH02p01290’; this denotes the 1290th spectrum (01290) derived from the 2nd library of Arabidopsis thaliana (ATH02) extracts obtained in the positive ion mode (p, positive). To visualize the MS/MS spectral data of the MS2T accessions, a web-based tool named ‘MS2T viewer’ is provided on our website (http://prime.psc.riken.jp/) (Figure 2). It should be noted that the MS2T libraries contain a large amount of data derived from artifacts or low-intensity ions, and there is redundancy due to the iterative acquisition of MS/MS spectra of the same metabolite. The quality and technical problems of the MS2T library data are discussed in Appendix S1.
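The accession format described above can be split into its parts mechanically. The following is a minimal sketch, not part of the published toolbox: the class and function names are hypothetical, and the three-letter species code and the `n` code for negative ion mode are assumptions, since only positive-mode Arabidopsis libraries are described here.

```python
import re
from typing import NamedTuple


class MS2TAccession(NamedTuple):
    species: str   # species code, e.g. "ATH" for Arabidopsis thaliana
    library: int   # library number, e.g. 2 for ATH02
    polarity: str  # "p" = positive ion mode ("n" is an assumed counterpart)
    spectrum: int  # running number of the spectrum within the library


# Three capital letters, two-digit library number, polarity, five-digit spectrum number.
_ACCESSION_RE = re.compile(r"^([A-Z]{3})(\d{2})([pn])(\d{5})$")


def parse_accession(accession: str) -> MS2TAccession:
    """Split an accession such as 'ATH02p01290' into its components."""
    match = _ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not a valid MS2T accession: {accession!r}")
    species, library, polarity, spectrum = match.groups()
    return MS2TAccession(species, int(library), polarity, int(spectrum))


print(parse_accession("ATH02p01290"))
```

Validating the full pattern, rather than slicing the string at fixed positions, makes truncated accessions fail loudly instead of silently pointing at another spectrum.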

Figure 2.

 Screenshot of the MS2T viewer.
The spectrum image of MS/MS data (upper panel) and other text records (retention time, precursor ion m/z, etc., in the lower text box) of the queried MS2T accession (ATH02p01290) are displayed in the web-based tool. The MS2T viewer is available on our website (http://prime.psc.riken.jp/).

Acquisition and processing of metabolic profile data

To compare metabolite profiles among the tissues, metabolites were extracted from the rosette leaves, cauline leaves, stems and inflorescence tissues of 6-week-old Arabidopsis seedlings (n = 8) and analyzed using a profiling method developed in this study (see Experimental procedures) (Figure 1b, step 1, and Figure S1). The raw chromatogram data were organized into a peak intensity table (hereafter referred to as a ‘matrix’, Table S2) using MetAlign (Moco et al., 2006; de Vos et al., 2007) (Figure 1b, step 2). In this matrix, peak intensity data derived from a chromatographic peak of a metabolite commonly observed among the samples (eluted at similar retention times with identical mass numbers) were recorded in a single row. Therefore, each row in the matrix consists of data including the retention time (‘Ret.’ and ‘Scan Nr’ columns in Table S2), unit mass number (‘Mass’ column) and peak intensity values obtained from each sample (Table 1). The peak-picking parameters of MetAlign were selected for sensitive detection of low-intensity peaks derived from metabolites (Appendix S2). Therefore, many signals not derived from metabolites, such as spikes, baseline drifts and noise, were inevitably included in the matrix (data not shown), indicating that matrix filtering is essential for discarding rows containing non-metabolite peaks. In this study, the processing of the original data matrix was performed by using methods for normalization, filtering of low-intensity data, and the deconvolution of isotope peaks to produce a matrix containing fewer biased and redundant data (Figure 1b, step 2). A toolbox consisting of six tools (‘Nprefilter’, ‘Nnormalizer’, ‘Nfilter’, ‘Nisotoperemover’, ‘Nannotator’ and ‘Nmotifsearch’) has been developed to execute the corresponding data-processing steps (Appendices S2 and S3).
The precision of the peak intensity was estimated to be approximately 10%, although peak height instead of peak area was used to determine peak intensity (Appendix S2); further, the drift in the retention time was restricted to within 0.1 min (data not shown). Consequently, a data matrix comprising 32 columns (samples) with 1233 rows (peaks) (Table S3) was generated from the original matrix comprising 14 946 rows (Table S2). The metabolic profiles of four tissues of Arabidopsis are shown in Figure 3. The results revealed that Arabidopsis synthesizes many phytochemicals in a tissue-specific manner.
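To illustrate the kind of redundancy that isotope deconvolution removes, the sketch below drops rows that look like the M+1 isotope of a co-eluting, more intense peak. It is a generic illustration under stated assumptions, not the ‘Nisotoperemover’ algorithm (which is described in Appendix S2 and not reproduced here); the `Peak` record, the function name and the retention-time tolerance of 0.02 min are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Peak(NamedTuple):
    peak_no: int      # row number in the matrix
    rt: float         # retention time (min)
    mass: int         # unit mass number
    intensity: float  # peak height


def remove_isotope_peaks(peaks: List[Peak], rt_tol: float = 0.02) -> List[Peak]:
    """Drop peaks that co-elute with a more intense peak one mass unit lower.

    rt_tol is an assumed tolerance, not a value taken from the study.
    """
    by_mass: Dict[int, List[Peak]] = {}
    for peak in peaks:
        by_mass.setdefault(peak.mass, []).append(peak)

    kept = []
    for peak in peaks:
        # Candidate parents are all peaks exactly one mass unit below this one.
        parents = by_mass.get(peak.mass - 1, [])
        is_isotope = any(
            abs(peak.rt - parent.rt) <= rt_tol and peak.intensity < parent.intensity
            for parent in parents
        )
        if not is_isotope:
            kept.append(peak)
    return kept


# Illustrative rows only; the intensities are made up.
rows = [
    Peak(1, 2.64, 220, 1000.0),  # monoisotopic peak
    Peak(2, 2.64, 221, 110.0),   # co-eluting M+1, removed
    Peak(3, 2.64, 222, 15.0),    # co-eluting M+2, removed via the M+1 row
    Peak(4, 4.00, 221, 500.0),   # same mass as row 2 but elutes later, kept
]
print([p.peak_no for p in remove_isotope_peaks(rows)])
```

Because candidate parents are looked up among all rows rather than only the surviving ones, a chain of M+1 and M+2 peaks is removed in a single pass.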

Table 1.   MS2T-based peak annotation results
Peak no. | Retention time (Rt) (min) | m/z (Da) | Annotation | MS2T (ΔRt < 0.15 min) | Compound (ΔRt < 0.05 min) | KNApSAcK (Δm/z < 5 mDa) | MassBank (hit score > 0.8) | Literature (hit score > 0.8)
Identified by cross-validation of standard compound data and database information (15 peaks)
2040.888138TrigonellineATH02p00017, ATH02p00339Trigonelline hydrochloride_CAS: 6138-41-6, CAS: 535-83-1: pyridine-2-aldoxime methochloride_CAS: 51-15-0 Trigonelline 
32931.314803-Methylsulfinyl-n- propylglucosinolateATH02p03162, ATH02p033883-(methylsulfinyl)propyl glucosinolate_CAS: 554-88-1   
34171.5734944-Methylsulfinyl-n- butylglucosinolateATH01p01271, ATH01p01502, ATH02p03393, ATH02p036044-(methylsulfinyl)butyl glucosinolate_CAS: 21414-41-5   
44841.818613Glutathione (oxidized form)ATH02p04203, ATH02p04412Glutathione (oxidized form)_CAS: 27025-41-8C28H37O15: durantoside III  
991.911121[Tyramine-NH3]+ATH02p00031  Tyramine  p-aminobenzoate 
2051.919138Tyramine Tyramine_CAS: 51-67-2   
13501.928268AdenosineATH01p05470, ATH01p05473, ATH01p05679, ATH02p01278, ATH02p01281, ATH02p01572Adenosine_CAS: 58-61-7C9H18N1O8: miserotoxin,  C10H14N5O4: adenosine, C13H18N1O3S1: U68204Adenosine 
8252.638220PantothenateATH01p05252, ATH02p00999, ATH02p01290Sodium d-pantothenate_CAS: 867-81-2, CAS: 79-83-4: d-Pantothenic acid hemicalcium salt_CAS: 137-08-6,  CAS: 79-83-4: trans-zeatin_CAS: 1637-39-4: Pantothenate 
35203.357505Indol-3-ylmethyl glucosinolateATH01p01520, ATH01p01756, ATH02p03409, ATH02p03412, ATH02p03624, ATH02p036264-methoxyindole-3-ylmethyl- glucosinolate_CAS: 4356-52-9 Cocarboxylase 
60903.5757Quercetin-3-O-α- l-rhamnopyranosyl(1,2)- β-d-glucopyranoside-7-O- α-l-rhamnopyranosideATH01p03512, ATH02p05004Quercetin-3-O-α-l-rhamnopyranosyl (1,2)-β-d-glucopyranoside-7-O- α-l-rhamnopyranoside_CAS: 161993-01-7C33H41O20: luteolin 7-rutinoside-3′-glucoside Herbacetin-7-O-rha, quercetin-3′/4′-rha
58793.686741Kaempferol-3-O-α- l-rhamnopyranosyl(1,2)-β- d-glucopyranoside-7-O-α- l-rhamnopyranosideATH01p03327, ATH02p05006, ATH02p05009Kaempferol-3-O- α-l-rhamnopyranosyl(1,2)-β-d-glucopyranoside-7-O- α-l-rhamnopyranoside_CAS: 162062-89-7C33H41O19: apigenin 7-rutinoside-4′-glucoside Cyanidin 3-(glucoside) rhamnoside
37803.8735354-Methoxyindol-3-ylmethyl glucosinolateATH01p017624-methoxyindole-3-ylmethyl- glucosinolate_CAS: 83327-21-3   
44553.923611Quercetin-3-O-β- glucopyranosyl-7-O- α-rhamnopyranosideATH01p02728, ATH01p02938, ATH02p04216, ATH02p04219, ATH02p04429Quercetin-3-O-β-glucopyranosyl-7- O-α-rhamnopyranoside_CAS: 18016-58-5C27H31O16: isoscutellarein 7-allosyl-(1→2)-glucoside, luteol, C28H35O15: hesperidin, neohesperidin, C31H31O13: 4′-O-methyl carthamidin 7-(2-p-coumaroyl glucoside) Herbacetin-7-O-rha, quercetin-3′/4′-rha, delphinidin 3-(6′’-coumaroyl)glucoside
42854.211595Kaempferol-3-O-β-glucopyranosyl-7- O-α-rhamnopyranoside; quercetin-3,7- O-α-l-di-rhamnopyranosideATH01p02244, ATH01p02732, ATH01p02735, ATH02p04021, ATH02p04024, ATH02p04220, ATH02p04223Kaempferol-3-O-β-glucopyranosyl-7- O-α-rhamnopyranoside_CAS: 2392-95-2: Quercetin-3,7-O-α- l-dirhamnopyranoside_CAS: 28638-13-3C30H27O13: apigenin 7- (6′’-E-caffeoylglucoside);7- [[6-O-[3-(3,C27H31O15: paniculatin, apigenin 7-allosyl-(1→2)-glucoside, C23H39N4O14: didemethylallosamidin Cyanidin 3-glucoside, cyanidin 3-galactoside, cyanidin 3-(6′’-coumaroyl) glucoside
41154.557579Kaempferol 3,7- O-dirhamnopyranosideATH01p02248, ATH01p02737, ATH01p02740, ATH02p03840, ATH02p04030Kaempferol 3,7-O-dirhamno pyranoside_CAS: 482-38-2C28H35O13: podorhizol β-d-glucoside, C27H31O14: chrysin 7-gentiobioside, 7,3′,4′-trihydroxyflavone Cyanidin 3-(glucoside) rhamnoside
Tentatively identified by cross-validation of database information (eight peaks)
16491.446308Glutathione (reduced form)ATH01p05951, ATH01p06241, ATH02p01565, ATH02p01876 C10H18N3O6S1: l-glutathione, C14H14N1O7: lycoricidinol Glutathione (reduced form)
4651.835182TyrosineATH01p04635, ATH01p04938  Tyr 
3502.486166PhenylalanineATH01p03885, ATH01p04646, ATH02p00363, ATH02p00684, ATH02p00687 C9H12N1O2:l-phenylalaninePhe N-acetyl phenylalanine, Bestatin 
6663.145205TryptophanATH01p04959, ATH01p05257, ATH02p01007 C11H13N2O2:l-tryptophan, vasicinol, 11-oxocytisine, C7H13N2O5: trehalamine, C12H13O3: 3-butylidene-7- hydroxyphthalide, C9H17O3S: 2-oxo-8-methyl thiooctanoic acidTrp 
30004.245449Quercetin-3,7-O-α- l-di-rhamnopyranoside (fragment)ATH01p00839, ATH01p01074, ATH02p02980, ATH02p02983, ATH02p03204, ATH02p03207Luteolin-8-C-glucoside_CAS: 28608-75-5C21H21O11: fisetin 8-C-glucoside C25H21O8: artonin P, C18H25O13: aralidioside  
28494.557433Kaempferol 3,7-O-dirhamnopyranoside (fragment)ATH01p00842, ATH01p00845, ATH01p01079, ATH02p02985, ATH02p02988, ATH02p03209Apigenin 8-C-glucoside_ CAS: 3681-93-4C25H21O7: calomelanol G; 3,4,7,8- tetrahydro-5-hydroxy-4-(4-hy, C24H33O5S1: (S)-furanopetasitin  
5163.137188[Trp-NH3]+ATH01p04655, ATH01p04960, ATH02p00374, ATH02p00695 C11H10N1O2: indole-3-acrylic acidTrp 
32875.073479Isorhamnetin-3-O-glucoside Isorhamnetin-3-O-glucoside_ CAS: 5041-82-7   
Peaks of flavonol glycosides tentatively annotated by motif analysis (24 peaks)
40104.228565Kaempferol(tetrahydroxy flavone)- Rha-pentosideATH01p02012, ATH02p03835   Cyanidin 3-(glucoside) rhamnoside
42843.83595Kaempferol (tetrahydroxy flavone)- Hex-RhaATH01p02239, ATH01p02726, ATH01p02729, ATH02p04018 C34H27O10: agathisflavone tetramethyl ether, cupressuflavone,C27H31O15: paniculatin, apigenin 7-allosyl-(1→2)-glucoside Cyanidin 3-glucoside, cyanidin 3-galactoside, kaempferol-7-O-neohesperidoside
58834.312741Kaempferol (tetrahydroxy flavone)- Hex-diRhaATH01p03336, ATH01p03339, ATH02p05017 C44H37O11: guibourtinidol-(4α→ 2)-3,5,4′- trihydroxystilben Cyanidin 3-(glucoside) rhamnoside, cyanidin 3-(6′’-coumaroyl)glucoside
60914.05757Kaempferol (tetrahydroxy flavone)- diHex-Rha; quercetin-Hex-diRhaATH01p03518, ATH02p05012 C44H37O12: guibourtinidol- (4α→2)-3,5,3′,4′-tetrahydroxys, Cyanidin 3-(glucoside)rhamnoside
44564.312611Kaempferol (tetrahydroxy flavone)-diHexATH01p02734, ATH01p02943, ATH02p04222, ATH02p04225, ATH02p04434, ATH02p04436 C28H35O15: hesperidin, neohesperidin, 4,2′,4′-trihydroxy-6′-m, C31H31O13: 4′-O-methylcarthamidin 7-(2-p-coumaroyl glucoside), C34H31N2O9: atalanine, C27H31O16: isoscutellarein 7-allosyl- (1→2)-glucoside, luteol, C30H27O14: prodelphinidin B4 Cyanidin 3-glucoside, cyanidin 3-galactoside, cyanidin 3-sophoroside, cyanidin 3-diglucoside, cyanidin 3-laminaribiose
30014.312449Kaempferol (tetrahydroxy flavone)- diHex (fragment)ATH01p00839, ATH01p01074, ATH02p02983, ATH02p03204, ATH02p03207 C21H21O11: fisetin 8-C-glucoside C25H21O8: artonin P, C18H25O13: aralidioside  
28484.211433Kaempferol (tetrahydroxy flavone)- 3-O-β-glucopyranosyl-7-O- α-rhamnopyranoside (fragment)ATH01p00837, ATH01p01072, ATH02p02979, ATH02p02982, ATH02p03203, ATH02p03206 C25H21O7: calomelanol G, C21H21O10: apigenin 7-O-glucoside, isovitexin, C26H25O6: artocommunol CA; (+)-6-hydroxy-11-methoxy-3,3-dimet  
42833.686595Kaempferol (tetrahydroxy flavone)- 3-O-α-l-rhamnopyranosyl(1,2)-β- d-glucopyranoside-7-O-α- l-rhamnopyranoside (fragment)ATH01p02235, ATH01p02726, ATH02p04014, ATH02p04212 C27H31O15(3): paniculatin, apigenin 7-allosyl-(1→2)-glucoside, C34H27O10: agathisflavone tetramethyl ether, cupressuflavone, Cyanidin 3-(glucoside) rhamnoside, cyanidin 3-(6′’-coumaroyl)glucoside
28473.686433Kaempferol (tetrahydroxy flavone)- 3-O-α-l-rhamnopyranosyl(1,2)- β-d-glucopyranoside-7-O-α- l-rhamnopyranoside (fragment)ATH01p00829, ATH01p01063, ATH02p02971, ATH02p03196 C21H21O10: apigenin 7-O-glucoside, isovitexin, C24H17O8: kaempferol 3-p-coumarate  
52004.27681Kaempferol (tetrahydroxy flavone)- Hex-Rha-malonylATH01p03162, ATH02p04625, ATH02p04818 C34H33O15: okanin 4′-(2′’,4′’-diacetyl-6′’- p-coumarylglucoside, C30H33O18: luteolin 7-(6′’-malonylneohesperidoside), kaempfer,  
41274.033581Quercetin (pentahydroxy flavone)- Rha-pentosideATH01p02241, ATH01p02730, ATH02p04020, ATH02p04217 C33H25O10: sciadopitysin, 7,7′’,4′’’- tri-O-methylagathisflavo Herbacetin-7-O-rha, quercetin-3′/4′-rha, herbacetin-7-O-rha-8-O-glu
44543.517611Quercetin (pentahydroxy flavone)- Hex-RhaATH01p02723, ATH01p02932, ATH02p04210, ATH02p04420, ATH02p04423 C31H31O13: 4′-O-methylcarthamidin 7- (2-p-coumaroylglucoside) Delphinidin 3- (6′’-coumaroyl)glucoside, delphinidin 3-rutinoside, delphinidin 3-glucoside, rutin, delphinidin 3-galactoside
46614.042627Quercetin (pentahydroxy flavone)- diHexATH01p02939, ATH02p04218, ATH02p04430 C27H31O17: 6-hydroxyluteolin 7-sophoroside, 6-hydroxyluteolin, C35H31O11: kuwanon L, C30H27O15: 6-hydroxykaempferol 7-(6′’- (E)-caffeylglucoside) Delphinidin 3-glucoside, delphinidin 3-galactoside
31554.042465Quercetin (pentahydroxy flavone)-diHex (fragment)ATH01p01069, ATH01p01301, ATH02p03202, ATH02p03419 C21H21O12: gossypetin 8-rhamnoside, C18H25O12S1: paederosidic acid,  
29993.94449Quercetin (pentahydroxy flavone)- 3-O-β-glucopyranosyl- 7-O-α-rhamnopyranoside (fragment)ATH01p00833, ATH01p01067, ATH02p02977, ATH02p03201 C21H21O11: fisetin 8-C-glucoside; 8-C-glucosylfisetin, isoorie, C25H21O8: 8,9-dihydro-6,11- dihydroxy-3,3-dimethyl-  
29983.517449Quercetin (pentahydroxy flavone)-3- O-α-l-rhamnopyranosyl(1,2)- β-d-glucopyranoside- 7-O-α-l-rhamnopyranosideATH01p00826, ATH01p01060, ATH02p02969 C21H21O11: fisetin 8-C-glucoside; 8-C-glucosylfisetin, isoorie,  
42864.371595Isorhamnetin (tetrahydroxymethoxy flavone)-Rha-pentosideATH01p02735, ATH02p04024, ATH02p04027, ATH02p04223 C27H31O15: paniculatin, apigenin 7-allosyl-(1→2)- glucoside, C34H27O10: agathisflavone tetramethyl ether  
55454.625711Isorhamnetin(tetrahydroxymethoxy flavone)-Hex-Rha-malonylATH02p04819    
46353.915625Isorhamnetin(tetrahydroxymethoxy flavone)-Hex-RhaATH02p04215, ATH02p04427 C25H33N6O13: nikkomycin Petunidin 3-glucoside, petunidin 3-galactoside
46374.27625Isorhamnetin(tetrahydroxymethoxy flavone)-Hex-RhaATH02p04221, ATH02p04224, ATH02p04435 C27H29O17: luteolin 7-glucuronide- 3′-glucoside Petunidin 3-(6′’-coumaroyl) glucoside
31394.261463Isorhamnetin(tetrahydroxymethoxy flavone)-Hex-Rha (fragment)ATH01p01305, ATH02p03205, ATH02p03423   Brevifoliol
44324.625609Isorhamnetin(tetrahydroxymethoxy flavone)-diRhaATH01p02738, ATH01p02947, ATH02p04029, ATH02p04032, ATH02p04228 C28H33O15: physcion 8-gentiobioside, luteolin 3′-methyl ether, C35H29O10: olivieriflavone  
31404.625463Isorhamnetin(tetrahydroxymethoxy flavone)-diRha (fragment)ATH02p03210, ATH02p03428, ATH02p03431 C25H19O9: sapurimycin Brevifoliol
Peaks of sinapoylmalate tentatively annotated by the motif analysis (four peaks)
6914.845207Sinapoylmalate (fragment)ATH01p04987, ATH01p05285, ATH02p00722, ATH02p01034, ATH02p01037 C10H11N2O1S1: 3-indolylmethylthiohydroximate, C8H15O2S2: (R)-lipoic acid  
6924.921207Sinapoylmalate (isomer, fragment)ATH01p04991, ATH01p05285, ATH01p05288, ATH02p00727, ATH02p01034, ATH02p01037 C10H11N2O1S1: 3-indolylmethylthiohydroximate, C11H11O4: lathodoratin, scoparone, C8H15O2S2: (R)-lipoic acid  
58324.921737Sinapoylmalate (isomer, adduct)ATH01p03345, ATH02p05025    
58314.837737Sinapoylmalate (adduct)ATH01p03345, ATH02p05025    
Peaks of glucosinolates tentatively annotated by the motif analysis (26 peaks)
21990.9813584-Methylsulfinyl-n-butyl glucosinolate (fragment)ATH01p00006, ATH01p00268, ATH02p02175, ATH02p02178, ATH02p02438 C20H12N3O4: BE 13793C  
6020.9811964-Methylsulfinyl-n-butyl glucosinolate (fragment)ATH01p04622, ATH02p00659, ATH02p00662, ATH02p00971, ATH02p00974    
22001.5643584-Methylsulfinyl-n-butyl glucosinolate (fragment)ATH01p00011, ATH01p00278, ATH02p02185 C20H12N3O4: BE 13793CTyrosine methyl ester, glucosaminate 
6031.5641964-Methylsulfinyl-n-butyl glucosinolate (fragment)ATH01p04630, ATH02p00668, ATH02p00980    
23101.9623725-Methylsulfinyl-n-pentyl glucosinolate (fragment)ATH01p00015   2α-Acetoxy-2′β-deacetyl austrospicatine
7271.9622105-Methylsulfinyl-n-pentyl glucosinolate (fragment)     
35411.9625085-Methylsulfinyl-n-pentyl glucosinolate_1ATH01p01504, ATH01p01741, ATH01p01743, ATH02p03397, ATH02p03610 C10H16N5O13P2S1: 3′-phosphoadenosine 5′-phosphosulfateLoperamide, albendazole, N6-methyl-2′- deoxyadenosine 
36692.3255226-Methylsulfinyl-n-hexyl glucosinolateATH01p01508, ATH01p01746, ATH02p03614, ATH02p03812    
25722.7994007-Methylsulfinyl-n-heptylglucosinolate (fragment)ATH01p00563, ATH01p00566, ATH01p00815, ATH02p02722, ATH02p02959 C20H22N3O6: pelagiomicin ACloquintocet-mexyl, ketamine 
11022.7992387-Methylsulfinyl-n-heptylglucosinolate (fragment)ATH02p01000, ATH02p01292    
37852.7995367-Methylsulfinyl-n-heptylglucosinolateATH01p01747, ATH01p01991, ATH02p03619, ATH02p03621, ATH02p03816, ATH02p03819  Simeconazole, triadimefon 
32753.0444784-Methylthio-n-butylglucosinolateATH01p01287, ATH01p01290, ATH01p01515, ATH02p03188, ATH02p03406    
20573.0523424-Methylthio-n-butylglucosinolate (fragment)ATH01p00029, ATH01p06265, ATH02p02207 C17H16N3O5: pelagiomicin C  
39033.3145508-Methylsulfinyl-n-octylglucosinolateATH01p02001, ATH01p02230, ATH01p02233, ATH02p03821, ATH02p04007, ATH02p04010    
26903.3234148-Methylsulfinyl-n-octylglucosinolate (fragment)ATH01p00573, ATH01p00825, ATH02p02728, ATH02p02966 C17H24N3O9: SB 219383, C22H24N1O7: α-narcotine, synerazol  
12303.3232528-Methylsulfinyl-n-octylglucosinolate (fragment)ATH01p05491, ATH01p05699, ATH02p01299, ATH02p01302, ATH02p01595 C10H14N5O3: cordycepin, oxetanocin  
22843.348369Indol-3-ylmethylglucosinolate (fragment)ATH01p00036, ATH02p02211, ATH02p02472 C20H17O7: averufin, velloquercetin, malaccol C16H21N2O6S1(1): 3-indolylmethyl desulfoglucosinolate  
6853.348207Indol-3-ylmethylglucosinolate (fragment)ATH01p04963, ATH01p05260, ATH02p00699 C11H11O4: lathodoratin, scoparone, C10H11N2O1S1: 3-indolylmethylthio hydroximate, C8H15O2S2: (R)-lipoic acid  
25573.8733994-Methoxyindol-3-ylmethylglucosinolate (fragment)ATH01p00310, ATH01p00581    
10913.8812374-Methoxyindol-3-ylmethylglucosinolate (fragment)ATH01p05268, ATH01p05499 C14H9N2O2: 11-hydroxycanthin-6-one, C12H13O5: 5,6,7-trimethoxycoumarin, orthosporin, NSC 118343  
25584.4643991-Methoxyindol-3-ylmethylglucosinolate (fragment)ATH01p00320, ATH01p00590    
8595.2512227-Methylthio-n-heptylglucosinolate (fragment)ATH01p05291    
36495.2515207-Methylthio-n-heptylglucosinolateATH01p01544, ATH01p01780    
10785.9872368-Methylthio-n-octylglucosinolate (fragment)ATH01p05300, ATH01p05525    
25495.9873988-Methylthio-n-octylglucosinolate (fragment)ATH01p00342    
37745.9875348-Methylthio-n-octylglucosinolateATH01p01786, ATH01p02036    
Peaks of hydroxycinnamoylspermidines tentatively annotated by the motif analysis (16 peaks)
56815.53722Spermidine-trihydroxyferuloylATH01p03349, ATH02p04829, ATH02p05032    
31844.6468Spermidine-p-coumaroyl-feruloylATH01p01080, ATH01p01310, ATH02p03212, ATH02p03430 C27H34N1O6: (+)-pyripyropene G  
38695.877544Spermidine-hydroxyferuloyl-sinapylATH01p02033, ATH02p03657, ATH02p03660, ATH02p03854, ATH02p03856    
60226.232750Spermidine-hydroxyferuloyl-disinapylATH01p03534, ATH02p05042    
37114.557528Spermidine-feruloyl-sinapylATH02p03642, ATH02p03842    
35894.388514Spermidine-feruloyl-hydroxyferuloylATH02p03425, ATH02p03636    
39564.676558Spermidine-disinapylATH01p02018, ATH02p03841    
43413.83600Spermidine-di-p-coumaroyl-caffeoylATH02p04015, ATH02p04214   β-d-Glucopyranoside, (2E)-3-(4-methoxyphenyl)- 2-propenyl 6-O-α-l-arabino pyranosyl-
29054.659438Spermidine-di-p-coumaroylATH01p00843, ATH01p00846, ATH01p01078, ATH01p01081, ATH02p02987, ATH02p02991, ATH02p03211, ATH02p03214 C25H32N3O4: lunarine, C12H24N1O10S3: 4-methylsulfinyl butyl glucosinolate, C26H32N1O5: decaline  
58275.902736Spermidine-dihydroxyferuloyl-sinapylATH01p03353, ATH02p05036, ATH02p05038    
53016.325690Spermidine-diferuloyl-hydroxyferuloylATH02p04640, ATH02p04838    
34544.87498Spermidine-diferuloylATH01p01312, ATH01p01539, ATH02p03432, ATH02p03644    
54935.911706Spermidine-caffeoyl-hydroxyferuloyl-sinapylATH01p03354, ATH02p04834, ATH02p05037    
Figure 3.

 Metabolic profiles of four distinct Arabidopsis tissues.
The log2-transformed values are represented using a heat map. Hierarchical clustering of peaks was performed for the entire metabolic profile dataset (32 columns × 1233 rows).

Annotation of peaks using standard compounds

For annotation of peaks in the matrix, the retention time (min) and mass number (m/z) of commercially available standard compounds in addition to those of the authentic Arabidopsis standards (280 compounds in total) were acquired by the same profiling analysis method (Table S4). For each peak in the matrix, we searched standard compound data for a compound with an identical m/z value (unit mass data) that eluted at a similar retention time (within 0.05 min) (Figure 1b, step 3). Thirty-five matched pairs were obtained, and the annotation information is described under the heading ‘Compound’ in Table S3.

MS2T-based peak annotation

As MS2T data contain information about the retention time and m/z value of the precursor ion (Figure 2), the peaks in the matrix with identical m/z values that eluted at similar retention times (within 0.15 min) could be tagged with MS2T accessions (Figure 1b, step 3). A total of 614 peaks in the matrix were tagged by at least one MS2T. The results are listed in the ‘MS2T’ column in Table S3. The MS2T data tagged to each peak in the matrix were queried against three databases: KNApSAcK (Oikawa et al., 2006; Shinbo et al., 2006), MassBank (Taguchi et al., 2007) and our in-house database of MS/MS spectral data taken from the literature (Figure 1b, step 3). Putative structural information was obtained for 207, 69 and 41 peaks in the matrix, as described in the ‘KNApSAcK’, ‘MassBank’ and ‘Literature’ columns of the matrix, respectively (Table S3). However, these tentative annotations are likely to include many false positives. Thus, the annotation information was cross-validated among the annotation methods to find plausible annotations. For example, the 825th peak (m/z 220; retention time 2.64 min) in the matrix was annotated as the protonated molecule [M + H]+ of d-pantothenate based on standard compounds and the MS2T data (ATH02p01290, Figure 2), which is essentially identical to the MassBank MS/MS spectrum (KO003696, pantothenate), with a hit score of 0.950. In total, 15 peaks were identified and eight peaks were tentatively annotated on the basis of the standard compound and MS2T data.
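The tagging rule (identical unit m/z, retention time within a tolerance) applies to both MS2T accessions (0.15 min) and standard compounds (0.05 min). The sketch below shows one way to express it; the function name and tuple layout are hypothetical, and the reference retention times in the example are illustrative values, not taken from the library.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PeakRow = Tuple[int, float, int]   # (peak number, retention time in min, unit m/z)
Reference = Tuple[str, float, int]  # (accession or compound name, retention time, unit m/z)


def tag_peaks(
    peaks: List[PeakRow], references: List[Reference], rt_tol: float
) -> Dict[int, List[str]]:
    """Return {peak number: [matching reference ids]} for peaks with at least one hit."""
    # Index references by unit m/z so each peak is compared only with same-mass entries.
    refs_by_mz: Dict[int, List[Tuple[float, str]]] = defaultdict(list)
    for ref_id, ref_rt, ref_mz in references:
        refs_by_mz[ref_mz].append((ref_rt, ref_id))

    tags: Dict[int, List[str]] = {}
    for peak_no, rt, mz in peaks:
        hits = [
            ref_id
            for ref_rt, ref_id in refs_by_mz.get(mz, [])
            if abs(ref_rt - rt) <= rt_tol
        ]
        if hits:
            tags[peak_no] = hits
    return tags


# Peak 825 from Table 1; the reference retention times below are made up.
matrix = [(825, 2.638, 220)]
library = [
    ("ATH02p01290", 2.70, 220),  # same mass, 0.062 min away
    ("other_mass", 2.64, 221),   # close in time, different mass
    ("far_away", 2.90, 220),     # same mass, 0.262 min away
]
print(tag_peaks(matrix, library, rt_tol=0.15))  # MS2T tolerance
print(tag_peaks(matrix, library, rt_tol=0.05))  # standard-compound tolerance
```

With the wider MS2T tolerance the first reference is matched; with the narrower standard-compound tolerance the same entry is rejected, which shows how the two tolerances trade coverage against specificity.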

Detection of structurally related metabolites by a spectral motif search

It is well recognized that plants often contain a series of metabolites with similar structures. For example, it is expected that Arabidopsis will produce dozens of flavonols with various glycosylation patterns. The MS/MS spectra of two kaempferol glycosides identified above [ATH01p03327 of the 5879th peak (kaempferol-3-O-rhamnosyl(1,2)-glucoside-7-O-rhamnoside, Figure 4a) and ATH01p02248 of the 4115th peak (kaempferol-3,7-O-dirhamnoside, Figure 4b)] indicated that the occurrence of the fragment ion of the kaempferol aglycon moiety (C15H11O6; m/z 287.0556) together with the neutral loss of glucose (C6H10O5; 162.0528 Da) and rhamnose (C6H10O4; 146.0579 Da) is a common spectral ‘motif’ in these MS/MS spectra. These results suggest that the peaks of structurally related metabolites can be extracted from the matrix by identifying MS2Ts containing the same spectral ‘motif.’ Here, the motif of kaempferol glycosides was defined by a logical expression over the MS/MS spectral data as follows: frg (C15H11O6) && (nl (C6H10O5) || nl (C6H10O4)).

Figure 4.

 MS/MS spectra of the MS2Ts tagged to (a) the 5879th peak (ATH01p03327, kaempferol-3-O-rhamnosyl(1,2)-glucoside-7-O-rhamnoside), (b) the 4115th peak (ATH01p02248, kaempferol-3,7-O-dirhamnoside), and (c) the 4465th peak (ATH02p04222, kaempferol dihexoside).
The deduced neutral losses of hexose (Δ162.0528 Da) and rhamnose (Δ146.0579 Da) are indicated in the spectra.

The above formula indicates the spectral motif containing the fragment ion of kaempferol aglycon (tetrahydroxy flavone, in the strict sense) [frg (C15H11O6)] with the neutral loss of hexose [nl (C6H10O5)] or deoxyhexose [nl(C6H10O4)]. The formula was queried against MS2T libraries to search for peaks derived from structurally related metabolites using an ‘Nmotifsearch’ program written in Perl/Tk. Consequently, 10 additional peaks of kaempferol (tetrahydroxy flavone) glycosides or their fragment ions were tentatively determined (Table 1). Among them, kaempferol (tetrahydroxy flavone) dihexose (ATH02p04222 of the 4465th peak; Figure 4c) has not been reported previously as an Arabidopsis metabolite. Using this procedure, molecular-related or fragment ions of flavonol and glucosinolate derivatives were assigned to 24 and 26 peaks in total, respectively. Thus a total of 95 peaks derived from 44 metabolites were identified or tentatively annotated by this procedure (Table 1 and Table S5).
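The motif query frg (C15H11O6) && (nl (C6H10O5) || nl (C6H10O4)) can be sketched in a few lines. The m/z and mass-difference values are those given in the text; everything else is an assumption for illustration: the function names, the 0.01 Da tolerance, and the choice to look for neutral losses between any pair of peaks in a spectrum. The actual ‘Nmotifsearch’ program may work differently.

```python
from itertools import combinations

KAEMPFEROL_AGLYCON_MZ = 287.0556  # frg(C15H11O6), value from the text
HEXOSE_LOSS = 162.0528            # nl(C6H10O5), value from the text
DEOXYHEXOSE_LOSS = 146.0579       # nl(C6H10O4), value from the text
TOL_DA = 0.01                     # assumed tolerance, not from the paper


def has_fragment(mzs, target, tol=TOL_DA):
    """True if any peak lies within `tol` of the target m/z."""
    return any(abs(mz - target) <= tol for mz in mzs)


def has_neutral_loss(mzs, loss, tol=TOL_DA):
    """True if any two peaks differ by `loss` (within `tol`)."""
    return any(abs(abs(a - b) - loss) <= tol for a, b in combinations(mzs, 2))


def matches_kaempferol_glycoside_motif(mzs):
    """frg(C15H11O6) && (nl(C6H10O5) || nl(C6H10O4))."""
    return has_fragment(mzs, KAEMPFEROL_AGLYCON_MZ) and (
        has_neutral_loss(mzs, HEXOSE_LOSS)
        or has_neutral_loss(mzs, DEOXYHEXOSE_LOSS)
    )


# Hypothetical spectrum of a kaempferol dirhamnoside: aglycon fragment,
# aglycon + one rhamnose, and the protonated molecule.
dirhamnoside = [287.0556, 433.1135, 579.1714]
# Di-p-coumaroylspermidine peak list from Table 2 (no aglycon fragment).
spermidine_amide = [91.0556, 119.0524, 147.0492, 204.1110, 292.2118, 438.2540]

print(matches_kaempferol_glycoside_motif(dirhamnoside))
print(matches_kaempferol_glycoside_motif(spermidine_amide))
```

Because the test uses only peak positions, it does not depend on fragment-ion intensities.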

Inter-tissue comparison of metabolite profiles in the aerial parts of Arabidopsis

It has been suggested that plants produce various types of phytochemicals in a tissue-specific manner. However, the overall difference in metabolic profiles among the tissues has not been thoroughly investigated. To understand the tissue-specific metabolism in Arabidopsis, the metabolic profiles of the 44 metabolites identified or tentatively deduced by the MS2T annotation method were compared among the tissues (Figure 5). The metabolic profiles in cauline leaves, rosette leaves and stem tissues were similar to each other, except for a significant decrease in the levels of methylthioglucosinolates in the stem (Figure 5). This downregulation can partly be explained by upregulation of the S-oxygenating enzyme gene (At1g65860) that catalyzes the conversion of methylthioglucosinolates to the corresponding methylsulfinylglucosinolates in stem tissue (Hansen et al., 2007) (Figure S2a). In contrast, the profiles in the inflorescence tissues changed drastically due to accumulation of tyramine, quercetin and isorhamnetin glycosides as well as methylsulfinylglucosinolates (Brown et al., 2003). This coincided with the active expression of these biosynthesis-related genes in the flower, such as the OMT1 gene (At5g54160), which has a dual function in methylation of quercetin aglycon to isorhamnetin (Tohge et al., 2007) in addition to lignin biosynthesis. Comparison of the gene expression data of OMT1 with the metabolic profile data revealed that the methylation of quercetin to isorhamnetin in the stem was less than that in the inflorescence tissues, while OMT1 was also highly expressed in stem, probably for active lignin biosynthesis (Figure S2b). These results suggest that flavonol glycosides and lignin are specifically biosynthesized in stem tissues.

Figure 5.

 Inter-tissue comparison of the levels of 44 identified metabolites.
The log2-transformed intensity values are represented using a heat map.

To further investigate tissue-specific secondary metabolism in Arabidopsis, the metabolic profile data shown in Figure 3 were characterized to identify novel Arabidopsis metabolites by interpreting the MS2T data (Figure 1b, step 4). Despite their morphological differences, the metabolite profiles of rosette and cauline leaves are very similar to each other, suggesting that these leaves have similar metabolic functions. However, one peak (peak number 1408, m/z 277) that eluted at 3.23 min was observed specifically in rosette leaf samples (Table 2). The metabolite responsible for this peak was determined to be p-coumaroylagmatine by manual interpretation of MS2T data (ATH01p05697, Figure 6a and Figure S3a), and this was confirmed by data from the literature (von Ropenack et al., 1998). The most remarkable metabolic phenotype was observed in the inflorescence tissues, where there was accumulation of several metabolites (clusters B and C in Figure 3). Of the peaks in cluster B, the intensities of five peaks drastically increased in an inflorescence tissue-specific manner (Table 2). Interpretation of the MS2T data tagged to these peaks revealed that the five metabolites corresponding to these peaks were di- or trihydroxycinnamic acid amides of spermidines such as di-p-coumaroylspermidine (ATH02p02987, Figure 6b and Figure S3b); this was supported by literature data (Bottcher et al., 2008; Youhnovski et al., 2001). A spectral motif search [query text: nl(C3H7N)] revealed that an additional 11 structurally related metabolites accumulated during the inflorescence process (Table 1). Among them, di-sinapoylspermidine has recently been reported as a seed metabolite of Arabidopsis (Bottcher et al., 2008; Meissner et al., 2008). Another inflorescence-specific metabolite (peak number 2156, retention time 3.957 min, m/z 344) was tentatively identified from cluster C in Figure 3 as sinapoylglutamate by interpreting the MS2T data (ATH01p00314, Figure 6c and Figure S3c).
The identification of p-coumaroylagmatine, di-p-coumaroylspermidine and sinapoylglutamate in Arabidopsis tissues suggests that Arabidopsis has many unknown metabolic functions that remain to be uncovered. In addition, it should be noted that the peak annotations given here were obtained by referring to MS2Ts without additional MS/MS data acquisition work (Figure 1b).

Table 2.   Deduced annotation, MS2T data and relative peak intensity of the inflorescence tissue-specific metabolites
Relative intensities in the four tissue columns are expressed with the internal standard = 1.0.

Peak no. | Retention time (min) | Mass (m/z) | Tentative annotation | MS2T code (representative) | MS2T data m/z (relative intensity) | Inflorescence | Cauline leaf | Rosette leaf | Stem
1408 | 3.23 | 277 | p-Coumaroylagmatine, putative | ATH01p05697 | 91.0494 (37), 114.1023 (14), 119.0486 (48), 147.0452 (100), 218.1225 (6), 260.1430 (8) | 0.004 ± 0.002 | 0.003 ± 0.001 | 0.065 ± 0.037 | 0.003 ± 0.001
2156 | 3.957 | 344 | Sinapoylglutamate, putative | ATH01p00314 | 91.0492 (13), 119.0483 (13), 147.0459 (14), 175.0428 (30), 207.0664 (100) | 0.045 ± 0.008 | 0.010 ± 0.005 | 0.004 ± 0.001 | 0.006 ± 0.003
2905 | 4.66 | 438 | Di-p-coumaroylspermidine, putative | ATH02p02987 | 91.0556 (18), 119.0524 (41), 147.0492 (100), 204.1110 (19), 292.2118 (8), 438.2540 (20) | 0.880 ± 0.292 | 0.003 ± 0.001 | 0.003 ± 0.000 | 0.003 ± 0.001
3453 | 4.69 | 498 | Di-feruloylspermidine, putative | ATH02p03429 | 72.0766 (5), 117.0284 (5), 145.0239 (17), 177.0498 (45), 234.1055 (24), 305.1812 (8), 322.2060 (29), 498.2472 (100) | 0.669 ± 0.320 | 0.003 ± 0.001 | 0.003 ± 0.000 | 0.003 ± 0.001
5681 | 5.53 | 722 | Tri-hydroxyferuloylspermidine, putative | ATH02p04827 | 193.0454 (46), 250.0949 (50), 530.2488 (100), 722.2715 (65) | 0.877 ± 0.593 | 0.014 ± 0.012 | 0.003 ± 0.000 | 2.974 ± 0.623
5827 | 5.90 | 736 | Di-hydroxyferuloyl-sinapoyl spermidine, putative | ATH02p05036 | 161.0204 (6), 175.0361 (8), 193.0451 (15), 207.0619 (14), 250.1024 (23), 321.1768 (7), 338.2016 (9), 352.2197 (9), 526.2460 (18), 544.2592 (76), 736.2964 (100) | 3.695 ± 0.045 | 0.003 ± 0.001 | 0.003 ± 0.000 | 0.003 ± 0.001
6022 | 6.23 | 750 | Hydroxyferuloyl-di-sinapoylspermidine, putative | ATH02p05039 | 147.0427 (7), 175.0362 (18), 193.0451 (24), 207.0594 (65), 250.0997 (34), 264.1103 (11), 321.1663 (12), 338.1973 (8), 352.2212 (17), 526.2408 (33), 544.2575 (91), 545.2682 (9), 558.2371 (13), 750.3192 (100) | 0.393 ± 0.208 | 0.003 ± 0.001 | 0.003 ± 0.000 | 0.003 ± 0.001

Figure 6.

 MS/MS spectra of (a) the 1408th peak (ATH01p05697, p-coumaroylagmatine, putative), (b) the 2905th peak (ATH02p02987, di-p-coumaroylspermidine, putative) and (c) the 2156th peak (ATH01p00314, sinapoylglutamate, putative).
Tentatively deduced structures are also shown.

Metabolic phenotyping of Ds transposon insertion lines

To evaluate the suitability of the MS2T-based method for phytochemical genomics studies, metabolite profiling was conducted using Ds insertional mutants of Arabidopsis that were developed for phenome analysis (Kuromori et al., 2004, 2006). First, we analyzed 2-week-old seedlings of all homozygous mutants with transposon insertions in the coding regions of genes encoding UDP-dependent glycosyltransferase (UGT) or methyltransferase. The metabolic profile data for 73 lines (219 samples by triplicate analysis) were acquired within four working days, and a data matrix containing 1808 rows was obtained. The MS2T libraries created above (ATH01p and ATH02p) could tag MS2T data to 604 rows (33%), and 58 rows were annotated using the above-mentioned annotation data. The low coverage of MS2T tagging was due to the lack of root-specific metabolite data in the MS2T libraries.

A comparison of the metabolic profiles revealed that drastic changes were observed in mutant lines 11-3689-1, 13-3337-1, 13-1020-1 and 11-5836-1 (Figure 7). The functions of the disrupted genes in these lines could easily be ascertained from the changes in metabolites deduced by MS2T-based peak annotation information. For example, the levels of flavonol 7-rhamnoside derivatives were significantly reduced and that of quercetin dihexoside (ATH02p04218, data not shown) was increased in 11-3689-1 and 13-3337-1, suggesting that these mutants lacked the ability to produce 7-O-rhamnosyl flavonols. These lines are two mutant alleles of an identical gene, At1g06000, which has recently been identified as encoding UDP-rhamnose:flavonol-7-O-rhamnosyltransferase (UGT89C1) (Yonekura-Sakakibara et al., 2007). The metabolite phenotype of 13-1020-1, with a decrease in flavonol-3,7-dirhamnoside, could also be explained by the function of its disrupted gene, UGT78D1 (At1g30530, UDP-rhamnose:flavonol-3-O-rhamnosyltransferase) (Jones et al., 2003).

Figure 7.

 Metabolic profiles of Arabidopsis mutant lines with disruption in putative UDP-dependent glycosyltransferase (UGT) or methyltransferase family genes by insertion of the Ds transposon.
Two-week-old seedlings of 60 mutant lines and 13 wild-type (F-Nossen) parent lines were analyzed (219 samples in total, n = 3), and the metabolic profile data were processed to obtain a matrix containing 1808 rows. The log2-transformed intensity data are normalized and hierarchically clustered using average linkage methods with the Euclidean distance function. The intensities of 59 peaks in the matrix are represented using a heat map.

In 11-5836-1 (Ds inserted into gene AtUGT84A2, At3g21560), the levels of sinapoylmalate and 1-glucosylsinapate were slightly decreased due to knockout of UGT84A2 that is responsible for the conversion of sinapate to 1-glucosylsinapate (Sinlapadech et al., 2007). In addition, the levels of two metabolites tentatively identified as sinapoylglutamate (ATH01p00314, Figure 6c) and quercetin deoxyhexosyl hexoside (ATH02p04020, data not shown) were increased. This may suggest an inter-connection of those increased metabolites with 1-glucosylsinapate that is affected by disruption of the UGT84A2 gene. A loss-of-metabolite phenotype was found in 15-1724-1 but not the allelic mutant 13-2882-1, indicating that there was no correlation between the phenotype in 15-1724-1 and disruption of the Ds-inserted gene (AtUGT85A7, At1g22340) (Woo et al., 2007). No significant metabolic phenotype was observed in other mutants.


One remarkable technical advance achieved by non-targeted metabolic analyses using LC-MS is that a metabolic event occurring in plants can be elucidated by determining a wide range of secondary metabolites, which will assist in formulation of a working hypothesis for further characterization of plant metabolic functions. Although many peaks in metabolite profile data must be annotated for this purpose, they can rarely be annotated using standard compound information (see ‘Compound’ column in Table S3). This situation can be improved if the metabolite peaks are already tagged with MS/MS spectral data prior to the data-mining process. Recently, several MS/MS spectra-based strategies involving flow-injection MS and Fourier transform MS methods have been reported (Beckmann et al., 2008; Cao et al., 2008; Iijima et al., 2008; Overy et al., 2008; Wrona et al., 2005). The methodology was improved in this study by introducing the concept of MS2T and creating MS2T libraries of many known and unknown metabolites that could be used as a basis for peak annotation of LC-MS metabolome data (Figure 2). One of the most significant technical advances of this MS2T-based strategy was that the MS2T libraries were created prior to metabolic profiling analysis; this was achieved by using optimized methods for acquisition of a large amount of MS/MS data. As the MS2T data for most peaks have already been acquired, the MS/MS data acquisition function can be excluded from routine metabolic profiling analysis, which enables high-throughput acquisition of metabolic profiling data (20 min per sample, Figure 7). Once the MS2T libraries have been created, they can be used for annotating data with similar metabolic profiles. Indeed, the MS2T libraries created in this study were used for the annotation of data from Ds transposon-tagged lines (Figure 7) as well as the inter-tissue comparison (Figures 3 and 5). 
Furthermore, the MS2T library can be applied for analysis of metabolic profile data acquired by using other LC-MS methods that employ identical or compatible LC conditions. In addition, it is notable that the entire peak annotation process described in this study was completed in a ‘dry’ lab (Figure 1b), without performing any additional ‘wet’ MS/MS analysis.

MS2T-based peak annotation

In this study, metabolic profile data were acquired using LC-Q-TOF/MS (Figure 1b, step 1), and the data matrices were generated using MetAlign (Figure 1b, step 2) (de Vos et al., 2007). Each row (peak) in the matrix was annotated using two sets of metabolite-related information, including the standard compound data and the MS2T libraries, by comparing the m/z and retention time data (Figure 1b, step 3). Consequently, approximately 3% and 50% of the peaks (rows) in the matrix were tentatively annotated and tagged using the standard compound and MS2T data, respectively (Table S3). On the basis of the MS2T data, structural information was assigned by referring to databases of plant metabolites, such as KNApSAcK (Oikawa et al., 2006; Shinbo et al., 2006) and MassBank (Taguchi et al., 2007). However, as the tentative annotation information may contain many false positives, only 2% of the peaks in total were finally annotated despite application of a large amount of data and many databases (Table 1). One of the reasons for this disappointing result is the incomplete MS/MS spectral database of phytochemicals. Interpretation of MS/MS data requires reference spectral data as estimation of the de novo structure from the MS/MS spectrum is often difficult even though high-resolution m/z data are available (Bocker and Rasche, 2008; Werner et al., 2008). Another reason is that there is no existing method to estimate the false-positive ratio in database search results. Because of these technical problems, cross-validation of the annotation data is necessary to obtain plausible annotations; however, many correct annotations are likely to be discarded. This indicates that further development of the informatics basis is required in terms of integration of the MS/MS spectral database of plant secondary metabolites (Baumann et al., 2000; Fredenhagen et al., 2005; Halket et al., 2005; Taguchi et al., 2007; Wishart et al., 2007) and its search algorithm. 
For this purpose, we are creating an MS/MS spectral database of authentic standards of plant secondary metabolites that are available from MassBank (http://www.massbank.jp/). However, matching of MS/MS spectra poses technical problems because the fragmentation patterns of the MS/MS spectral data depend on the type of mass spectrometer and its operating conditions, especially collision energy (Werner et al., 2008). The cosine product method used in this study, which was originally developed for comparing GC-MS spectra, cannot adequately deal with these problems. To mitigate these problems, all the MS/MS spectral data in the MS2T library in this study were obtained using the ‘ramp’ mode, by which fragments detected at various collision energies are combined into one spectrum (Figure 4). In addition, a method termed a ‘spectral motif search’ was developed for searching for similar MS/MS spectra in MS2T libraries. Comparison of metabolite structures with these MS/MS spectra allowed us to obtain a ‘spectral motif’, which represents the common structural patterns of neutral losses and fragment ions in a series of metabolites (Figure 4a,b). The spectral motifs are abstract expressions of MS/MS spectra and are partly independent of the nature of the MS/MS spectra, such as the fragment-ion intensities. Although information on neutral losses and fragment ions in MS/MS spectra has been used for metabolite identification, searching MS2T libraries using ‘spectral motifs’ as queries enabled us to identify structurally related metabolites from the metabolic profile data; this technique was then applied for annotation of a series of flavonol glycosides, glucosinolates and hydroxycinnamoylspermidines (Figure 4 and Table 1).
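To illustrate why an intensity-weighted comparison is sensitive to acquisition conditions, the following is a minimal sketch of a cosine (normalized dot product) score between two centroided spectra. It is a generic formulation, not necessarily the exact variant used in this study; the 1 Da binning, the absence of any intensity or m/z weighting, and the function names are assumptions. The two example peak lists are taken from Table 2.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities into m/z bins; `peaks` is a list of (m/z, intensity)."""
    binned = {}
    for mz, intensity in peaks:
        key = math.floor(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(spec_a, spec_b, bin_width=1.0):
    """Cosine of the angle between two binned intensity vectors (0 to 1)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Peak lists (m/z, relative intensity) from Table 2.
coumaroylagmatine = [(91.0494, 37), (114.1023, 14), (119.0486, 48),
                     (147.0452, 100), (218.1225, 6), (260.1430, 8)]
sinapoylglutamate = [(91.0492, 13), (119.0483, 13), (147.0459, 14),
                     (175.0428, 30), (207.0664, 100)]

print(cosine_score(coumaroylagmatine, coumaroylagmatine))
print(cosine_score(coumaroylagmatine, sinapoylglutamate))
```

The two spectra share three fragment bins (m/z 91, 119 and 147), yet the score is low because the base peaks differ. The same effect lowers the score for one compound measured at different collision energies, whereas a motif query that tests only for the presence of fragments and neutral losses is unaffected by such intensity shifts.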

Using these methods, a total of 97 peaks of 48 metabolites in a matrix comprising 1233 rows were identified or tentatively annotated by means of the MS2T method (Table S5). The number of annotatable peaks will increase with further interpretation of the MS2T data, as approximately 600 peaks have already been tagged by MS2Ts. Recently, much effort has been invested in the annotation of metabolites by interpreting MS/MS spectral data. For example, Bottcher et al. (2008) reported the annotation of 75 Arabidopsis seed metabolites by manual interpretation of MS/MS spectra. As MS2T libraries of Arabidopsis shoot metabolites have been created, the published information can be used for further annotation of MS2T library data by performing spectral motif searches. It should be noted that most of the annotation information is tentative or involves putative estimation of the metabolite structure; therefore, co-characterization with authentic standards of secondary metabolites prepared from plant extracts is still necessary for rigorous identification of metabolites (Glauser et al., 2008; Ishihara et al., 2006).

Application of the MS2T-based method for elucidating metabolic events in Arabidopsis

In this study, we demonstrated that the LC-MS profiling technique could elucidate metabolic events in plants to provide a working hypothesis for further characterization of plant metabolic functions by quantitative determination of metabolite levels and MS2T-based peak annotation. The profiling of four distinctive Arabidopsis tissues revealed that the leaves, stems and inflorescence tissues of Arabidopsis have their own unique metabolites (Figure 3); this is probably due to tissue-specific expression of genes responsible for biosynthesis of these metabolites (Schmid et al., 2005). Further, the biosynthesis of two major classes of Arabidopsis secondary metabolites, flavonoids and glucosinolates, was controlled by the tissue-specific expression of genes responsible for their biosynthesis (Figure 5 and Figure S2). This was also true in the case of tyramine accumulation in inflorescence tissues (Figure 5), which was accompanied by flower tissue-specific expression of a putative tyrosine decarboxylase gene (At4g28680, Figure S2c). Although no role for tyramine or for alkaloids derived from tyramine has been reported in Arabidopsis, the above result suggests that activation of tyramine biosynthesis has a role in the reproductive tissues of Arabidopsis, similar to the reproductive tissue-specific biosynthesis of various tyramine-derived alkaloids in other plant species (Negrel and Martin, 1984; Page, 2005).

In this study, peak annotations by the interpretation of the MS2T data can reveal, at least in part, novel aspects of tissue-specific secondary metabolism in Arabidopsis. For example, a rosette tissue-specific metabolite was putatively concluded to be p-coumaroylagmatine (Figure 6a) (von Ropenack et al., 1998). p-coumaroylagmatine is a precursor for the biosynthesis of hordatines, which play an important role in resistance to fungal attack in barley seedlings (Ishihara et al., 2002; von Ropenack et al., 1998). Although no hordatine-like metabolites have been detected in healthy Arabidopsis tissues (data not shown), this finding suggests that some biotic stress conditions might stimulate the biosynthesis of similar metabolites in Arabidopsis. A BLASTP search (http://www.tair.org/) revealed that HvACT1, which is responsible for the synthesis of p-coumaroylagmatine in barley (Burhenne et al., 2003) (GenBank accession number AB334132) showed the highest homology to AtHCT (At5g48930) of all Arabidopsis genes. AtHCT has already been characterized as an acyltransferase for synthesis of p-coumaroylshikimate in the lignin biosynthesis pathway (Hoffmann et al., 2003, 2004), and is highly expressed in the stem tissue as it is required for xylem formation (Figure S2d). This suggests that other acyltransferase genes might be responsible for the rosette leaf-specific biosynthesis of p-coumaroylagmatine.

Furthermore, several metabolites specific to inflorescence tissues (Figure 3 and Table 2) were estimated to be hydroxycinnamoylspermidines, such as di-p-coumaroylspermidine (Figure 6b). The occurrence of hydroxycinnamoylspermidines in reproductive tissues, e.g. in the pollen of several plant species (Martin-Tanguy et al., 1978; Meurer et al., 1986), and their biological activities (Fixon-Owoo et al., 2003) have been reported; however, their role in the reproductive process has not been investigated genetically or functionally in any plant. Recently, it has been demonstrated that agmatine is the first intermediate of the spermidine biosynthetic pathway from l-arginine in Arabidopsis (Illingworth et al., 2003; Janowitz et al., 2003). These results indicate that tissue-specific synthesis of various types of hydroxycinnamic acid amides from two metabolically related amines in Arabidopsis is probably due to tissue-specific expression of biosynthesis-related acyltransferase genes. These findings facilitate narrowing down of the candidate genes responsible for metabolic functions. For example, evaluation of the expression profiles of 89 genes in the acyltransferase family revealed that several genes showed rosette leaf-specific (e.g. At5g07870, Figure S2e) or pollen-specific (e.g. At4g29440, Figure S2f) expression profiles. This result must be confirmed by the unambiguous identification of metabolites, biochemical characterization of the expressed proteins, and metabolic phenotyping of loss/gain-of-function mutants.

Link to genetic resources of Arabidopsis

Non-targeted metabolic profiling analysis will play an important role in functional genomic studies as it enables metabolic phenotyping of mutants to investigate the functions of disrupted genes in planta. Thus, it is believed that high-throughput metabolic phenotyping of a number of mutant lines by non-targeted profiling analysis will reveal novel gene functions without a priori knowledge of disrupted genes. The metabolic phenotyping of Ds insertion mutants of Arabidopsis demonstrated that the MS2T-based metabolome analysis is an effective tool in terms of high-throughput elucidation of metabolic phenotypes. The clear correlation between the metabolic phenotypes and disrupted genes revealed the gene function in planta (Figure 7). As other major changes were not observed in our non-targeted analysis, the functions of these genes were further clarified as specific to those characterized previously.

These results demonstrated that non-targeted metabolic profiling analysis using LC-MS together with the MS2T annotation methods developed in this study could prove to be a useful tool for investigating the novel function of plant secondary metabolites. The developed method is capable of analyzing the metabolic profiles of other plant species, including major crops such as rice and wheat (data not shown), and is also applicable in various fields of metabolomics research. However, a detailed investigation of Arabidopsis to detect functionally and genetically uncharacterized secondary metabolites as a model of other plant species is also important because the various genetic and informatics resources, as well as the ‘omics’ techniques (Hirai et al., 2007; Kuromori et al., 2006; Saito et al., 2008; Tohge et al., 2005; Yonekura-Sakakibara et al., 2007), enable us to perform phytochemical genomics studies to reveal novel functions of plant secondary metabolism.

Availability of source programs

The data and programs produced in this study are freely available on the Platform for Riken Metabolomics (PRIMe) website (http://prime.psc.riken.jp/lcms/).

Experimental procedures


All the chemicals used in this study were purchased from Tokyo Kasei (http://www.tciamerica.com), Sigma-Aldrich (http://www.sigmaaldrich.com/), Wako Pure Chemical (http://wako-chem.co.jp/english/), Nacalai Tesque (http://www.nacalai.co.jp/en/index) and AnalytiCon Discovery GmbH (http://www.ac-discovery.com/english/go.html). Indole-3-ylmethylglucosinolate, 1-methoxyindole-3-ylmethylglucosinolate and 4-methoxyindole-3-ylmethylglucosinolate were prepared as previously described (Ishihara et al., 2006). A total of 29 metabolites derived from Arabidopsis were isolated from whole plants of A. thaliana (Nakabayashi et al., unpublished results).

Plant materials

Seedlings of Arabidopsis thaliana (Col-0) were grown in pots containing soil at 20°C with a 16 h daily photoperiod. Six weeks after germination, the 12th or 13th expanded rosette leaves (rosette leaf), the 1st and 2nd expanded cauline leaves (cauline leaf), the upper part of the inflorescence (inflorescence), and first internode tissues (stem) were collected from eight individual Arabidopsis plants at stage 6.3 (Boyes et al., 2001) and stored at −80°C until use. For metabolic phenotyping of Ds transposon insertion lines (Kuromori et al., 2004, 2006), 60 lines of homozygous seeds were grown on the half-strength MS medium plates at 20°C with a 16 h daily photoperiod. Two weeks after germination, whole tissues of 20 seedlings were collected, weighed, and stored at −80°C.

Non-targeted metabolic profiling analysis using LC-ESI-MS

The frozen tissues were homogenized in five volumes of 80% aqueous methanol containing 0.5 mg l−1 lidocaine and d-camphor sulfonic acid (Tokyo Kasei) using a mixer mill (MM 300, Retsch, http://www.retsch.com) with a zirconia bead for 6 min at 20 Hz. Following centrifugation at 15 000 g for 10 min and filtration (Ultrafree-MC, 0.2 μm pore size; Millipore, http://www.millipore.com), the sample extracts (2 μl) were analyzed using an LC-MS system equipped with an electrospray ionization (ESI) interface (HPLC, Waters Acquity UPLC system; MS, Waters Q-Tof Premier, http://www.waters.com). The analytical conditions were as follows. HPLC: column, Acquity bridged ethyl hybrid (BEH) C18 (particle size 1.7 μm, 2.1 × 100 mm, Waters); solvent system, acetonitrile (0.1% formic acid):water (0.1% formic acid); gradient program, 1 : 99 v/v at 0 min, 1 : 99 v/v at 0.1 min, 99.5 : 0.5 at 15.5 min, 99.5 : 0.5 at 17.0 min, 1 : 99 v/v at 17.1 min and 1 : 99 at 20 min; flow rate, 0.3 ml min−1; temperature, 38°C; MS detection: capillary voltage, +3.0 kV; cone voltage, 22.5 V; source temperature, 120°C; desolvation temperature, 450°C; cone gas flow, 50 l h−1; desolvation gas flow, 800 l h−1; collision energy, 2 V; detection mode, scan (m/z 100–2000; dwell time 0.45 sec; interscan delay 0.05 sec, centroid). The scans were repeated for 19.5 min in a single run. The data were recorded using MassLynx version 4.1 software (Waters).

Data processing and MS2T-based peak annotation

The data matrix was generated from the metabolic profile data using MetAlign software (de Vos et al., 2007) and processed using in-house software written in Perl/Tk (‘N toolbox’, Appendix S3). Detailed methods for processing and interpretation of the MS2T data are described in Appendix S2. The processed data matrix was analyzed using MeV4.0 (TIGR, http://www.tm4.org; Saeed et al., 2003, 2006).

MS2T data acquisition

The sample extracts prepared by the method above (2 μl) were subjected to the same LC-Q-TOF-MS system operated under the same conditions mentioned above, except for the following changes: gradient program, 1 : 99 v/v at 0 min, 1 : 99 v/v at 0.2 min, 99.5 : 0.5 at 31 min, 99.5 : 0.5 at 34.0 min, 1 : 99 v/v at 34.2 min and 1 : 99 at 40 min; flow rate 0.15 ml min−1; survey detection mode for MS detection. In this mode, following acquisition of the MS spectrum (m/z 100–1000; dwell time 0.45 sec, inter-scan delay 0.05 sec), the MS/MS data of the most abundant ions were automatically obtained (m/z 50–1000; dwell time 2.5 sec; inter-scan delay 0.5 sec, data acquisition, centroid mode; collision energy ramped from 5 to 60 V). The mass/charge ratio (m/z) was calibrated using the lock-mass function with leucine enkephalin. The analyses were repeated 25 times by shifting the m/z ranges of the target ion selection window for the MS/MS analysis (m/z 100–160, 130–190, 160–220 … 880–940, 940–1000). The data were converted into ASCII format using DataBridge (Waters). The information in each MS/MS spectrum was formatted to the MS2T libraries using in-house Perl scripts. Low-intensity signals of fewer than 5 counts/sec were discarded in this process. The original retention time values were divided by two to compensate for the difference in peak elution conditions.
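The last two formatting steps (discarding signals below 5 counts/sec and halving the retention time) can be sketched as follows. This is an illustration under stated assumptions, not the in-house Perl script: the function name, the dictionary layout and the example values are hypothetical. The factor of 0.5 reflects the gradient described above, which reaches 99.5 : 0.5 at 31 min in MS2T acquisition and at 15.5 min in profiling.

```python
MIN_INTENSITY = 5.0  # counts/sec; signals below this are discarded (from the text)
RT_SCALE = 0.5       # retention times are divided by two (from the text)


def to_ms2t_entry(accession, precursor_mz, rt_min, peaks):
    """Build one library entry from a raw MS/MS spectrum.

    `peaks` is a list of (m/z, intensity in counts/sec) tuples.
    The dictionary layout is a hypothetical stand-in for the MS2T format.
    """
    kept = [(mz, inten) for mz, inten in peaks if inten >= MIN_INTENSITY]
    return {
        "accession": accession,
        "precursor_mz": precursor_mz,
        # Rescale so the value is comparable with the profiling run.
        "rt_min": rt_min * RT_SCALE,
        "peaks": kept,
    }


# Hypothetical raw spectrum: one signal falls below the threshold.
entry = to_ms2t_entry(
    "EXAMPLE0001", 220.118, 5.28,
    [(90.055, 120.0), (116.034, 3.2), (202.107, 48.0)],
)
print(entry["rt_min"], entry["peaks"])
```

After rescaling, library retention times can be compared directly with those in the profiling data matrix, which is what the 0.15 min tagging window relies on.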


We wish to thank Prof. A. Ishihara (Kyoto University, Japan), R. Nakabayashi (Chiba University, Japan) and Prof. H. Takayama (Chiba University, Japan) for providing us with authentic Arabidopsis metabolites. We thank Drs Y. Sawada, A. Oikawa, E. Inoue, K. Akiyama and T. Sakurai (Riken Plant Science Center, Japan) for their useful comments on this manuscript and technical support. This study was partly supported by Japan Science and Technology institute for Bioinformatics Research and Development (JST-BIRD) and the Research and Development Program for New Bio-industry Initiatives of Bio-oriented Technology Research Advancement Institution (BRAIN).