Elemental formula annotation of polar and lipophilic metabolites using 13C, 15N and 34S isotope labelling, in combination with high-resolution mass spectrometry

Authors


(fax +49 331 5678236; e-mail giavalisco@mpimp-golm.mpg.de).

Summary

The unbiased and comprehensive analysis of metabolites in any organism presents a major challenge if proper peak annotation and unambiguous assignment of the biological origin of the peaks are required. Here we provide a comprehensive multi-isotope labelling-based strategy using fully labelled 13C, 15N and 34S plant tissues, in combination with a fractionated metabolite extraction protocol. The extraction procedure allows for the simultaneous extraction of polar, semi-polar and hydrophobic metabolites, as well as for the extraction of proteins and starch. After labelling and extraction, the metabolites and lipids were analysed using a high-resolution mass spectrometer providing accurate MS and all-ion fragmentation data, providing an unambiguous readout for every detectable isotope-labelled peak. The isotope labelling assisted peak annotation process employed can be applied in either an automated database-dependent or a database-independent analysis of the plant polar metabolome and lipidome. As a proof of concept, the developed methods and technologies were applied and validated using Arabidopsis thaliana leaf and root extracts. Along with a large repository of assigned elemental compositions, which is provided, we show, using selected examples, the accuracy and reliability of the developed workflow.

Introduction

Mass spectrometry (MS)-based analysis of metabolites has become more and more advanced in recent years (Dettmer et al., 2007; Dunn, 2008; Griffiths and Wang, 2009; Allwood and Goodacre, 2010; Saito and Matsuda, 2010). The development of this field has been driven by the continuous improvements of mass spectrometers, coupled with various chromatographic separation techniques, improvements in open-source and commercial software developments, allowing data extraction and initial processing (Broeckling et al., 2006; Smith et al., 2006; Katajamaa and Oresic, 2007; Lommen, 2009; Benton et al., 2010; Pluskal et al., 2010), and the increasing quality and accessibility of public databases (Williams, 2008; Tohge and Fernie, 2009; Wang et al., 2009). These tools provide the basis for the efficient and reliable extraction of information from high-quality raw MS data. Accordingly, increasing numbers of targeted and untargeted metabolomic studies of plants (Saito and Matsuda, 2010), as well as other organisms (van der Greef et al., 2006; Mashego et al., 2007; Rabinowitz, 2007; Silva et al., 2007; Frickenschmidt et al., 2008), have been published.

In spite of these impressive developments several problems remain unsolved, specifically when dealing with untargeted metabolomic studies (De Vos et al., 2007). The main problems are peak annotation and the discrimination of biological from non-biological compounds (Saito and Matsuda, 2010). The gold standard for a reliable and accurate peak annotation remains an authentic reference compound. However, in many cases, e.g. for plant secondary metabolites, reference compounds are not available (Saito and Matsuda, 2010). NMR-based analysis (Dettmer et al., 2007; Eisenreich and Bacher, 2007; Pan and Raftery, 2007), which in principle is excellently suited for compound annotation, suffers from limited sensitivity. This is especially true if highly complex mixtures are to be analysed, allowing only a small number of compounds to be annotated, irrespective of the large sample quantities used (Nakabayashi et al., 2009). MS/MS-based structure elucidation is more sensitive; however, it requires a trained user and still does not often lead to the final structure (Bottcher et al., 2008; Matsuda et al., 2009b).

A first step towards the annotation of a chemical structure is the unambiguous annotation of metabolite peaks with regard to their elemental composition. We have previously described a 13C isotope labelling-based metabolomic analysis strategy in combination with high-resolution MS (Giavalisco et al., 2008, 2009) for the relative quantification and annotation of metabolites from the model plant Arabidopsis thaliana. Here we describe a significant improvement of this approach based on the parallel analysis of three differentially isotope-labelled, i.e. 13C, 15N and 34S, and one unlabelled plant extract. As described in the results section, the parallel analysis of four differentially isotope-labelled metabolite extracts allows the direct readout of the underlying chemical sum formula from the generated chromatogram, providing a database-like repository of raw files to confirm peaks of interest from cross-bench experiments.

Results and discussion

An improved extraction protocol for metabolomic and proteomic analysis

Detailed metabolic analysis of diverse metabolites, such as polar amino acids or more apolar triacylglycerols, requires specific extraction procedures in combination with specific analytical instrumentation (t’Kindt et al., 2008; Kim and Verpoorte, 2010). Unfortunately, in high-throughput metabolomic studies that use large numbers of samples and attempt to cover as many metabolites as possible, performing more than a single extraction protocol is often not feasible. Therefore, to cover a reasonable metabolic diversity from a single sample, fractionated extraction protocols need to be used (Kim and Verpoorte, 2010). A commonly used fractionated extraction method, which enables the separation and subsequent analysis of polar and semi-polar from hydrophobic metabolites, relies on a methanol : chloroform : water extraction system (Giavalisco et al., 2009). In theory this extraction procedure also allows for the recovery of the total proteins (Weckwerth et al., 2004), but as the protein fraction sediments into the interphase between the heavy chloroform and the lighter methanol : water phases, the recovery of these compounds is often tedious, and is not usually fully quantitative. We therefore decided to modify the previously employed fractionated extraction procedure to provide a method that allows the full recovery of metabolites, lipids, starch and proteins.

Methyl-tert-butyl-ether (MTBE) was recently described in a mammalian lipidomic study as an organic solvent that can substitute for chloroform in lipid extraction protocols (Matyash et al., 2008). This solvent, which has a lower density than chloroform, therefore seems to be well suited for an optimized fractionated extraction system to extract all the compounds of interest in distinct, easily separable fractions. As illustrated in Figure 1, the MTBE : methanol : water-based extraction protocol provides a phase separation between the now upper organic phase (containing the lipids) and the now lower aqueous phase (containing the polar and semi-polar metabolites). Because of the achieved phase inversion, as compared with the methanol : chloroform system, it is now possible to recover a compact, solid whole protein/starch pellet at the bottom of the vial. This pellet can be easily and reproducibly collected, and also provides a source of non-degraded proteins. To visualize the protein quality and its suitability for proteomic studies, we loaded 100 μg of the extracted protein on a 2D SDS-PAGE gel (Figure S1).

Figure 1.

 UPLC/GC-MS workflow for the metabolomic and proteomic analysis of the different fractions of the methyl-tert-butyl-ether (MTBE) extraction procedure.

High-resolution UPLC-FT-MS and all-ion fragmentation for the analysis of secondary metabolites and lipids

The main focus of our study was on the analysis and annotation of metabolites and lipids using high-resolution Fourier transform mass spectrometry (FT-MS), a technique that until recently has been restricted to specialized labs because of the high costs and technical complexity of the instrument (Dettmer et al., 2007). Recent advancements in MS technology development have made these machines easier to handle and more cost effective. Because of this development, we were able to adapt and improve a previously described ultra-performance liquid chromatography-FT-ion cyclotron resonance MS-based method (Giavalisco et al., 2008, 2009), for a standalone high-resolution Orbitrap mass spectrometer (Lu et al., 2010) coupled to an ultra-performance liquid chromatography (UPLC) system. As well as meeting all the demands of high-resolution metabolomic analysis, namely fast scanning (up to 10 Hz), high resolution (up to 100 000) and high mass accuracy (<2 ppm), this new instrument is comparable in cost with common low-resolution gas chromatography-MS (Lisec et al., 2006) or low-resolution triple quadrupole-MS systems (Arrivault et al., 2009), making high-resolution MS affordable for many laboratories.

As a result of its higher scanning speed and improved electronics, another significant difference between the Orbitrap MS and the instrument used in our previous FT-ICR-MS-based method is that the new standalone Orbitrap MS allows us to acquire high-resolution MS as well as all-ion fragmentation spectra from the same measurement. The basic principle of this measurement, which has been previously used with quadrupole time-of-flight (QTOF) MS (Rainville et al., 2007), is to alternate between low-energy (full-scan MS) and high-energy (full-scan all-ion fragmentation) scans (Figure S2 and see below), thereby enabling the extraction of parent and fragmentation data from a single chromatographic run.

As well as using a different mass spectrometer, compared with our initial procedure, the workflow presented here was expanded further by including the analysis of lipids in the overall analysis (Figure 1 and see below). This became possible mainly because of the improved extraction protocol and the efficient adaptation of our UPLC-FT-MS system to a reverse-phase C8-chromatographic separation system (Rainville et al., 2007), working with an acetonitrile : isopropanol : water mobile phase system (for details see Experimental procedures). The development of this highly reproducible chromatographic system allows for the separation of several thousand peaks within a 20-min gradient (Figure 2a and see below).

Figure 2.

 Lipidomic analysis of the methyl-tert-butyl-ether (MTBE) fraction from the fractionated extraction protocol.
(a) Total ion chromatogram of the positive ion mode spectrum.
(b) Extracted ion chromatogram of phosphocholine 34 : 2 from the total ion chromatogram.
(c) Extracted ion chromatogram of all phosphocholine-containing peaks from the total ion chromatogram.

Multiple isotope labelling enables accurate metabolite annotation

The annotation of metabolites in untargeted metabolomic analysis (De Vos et al., 2007) can be extremely tedious, and often produces ambiguous or even no results (Matsuda et al., 2009b; Saito and Matsuda, 2010). As we and others have shown previously (Hegeman et al., 2007; Giavalisco et al., 2008, 2009), isotope-labelling approaches in combination with complex database searches allow not only for the bona fide distinction of biological from contaminating background peaks, but also increase the reliability of the elemental composition annotation by decreasing the false discovery rate (Hegeman et al., 2007; Matsuda et al., 2009a).

To further reduce ambiguities in the elemental formula annotation, we increased the stringency of our method by including, along with the 13C labelling, 15N and 34S metabolically labelled metabolomes in our analysis. The labelling efficiencies for these isotopes were, as we have shown in previous studies (Huege et al., 2007; Giavalisco et al., 2008), on average greater than 90% (data not shown).

Isotope-labelled compounds have, except for their molecular mass, identical physico-chemical properties, leading to almost identical chromatographic behaviour. Because of the mass difference between the monoisotopic peaks of the differently labelled co-eluting compounds, the absolute number of atoms of each labelled element in a detected molecule can be directly deduced. This principle, and its use for the annotation of elemental compositions, is illustrated in Figure 3(a) for a semi-polar compound extracted from Arabidopsis leaf samples. In all four extracts, a major peak with a retention time of 4.17 ± 0.02 min can be observed. Zooming into the mass spectra of these peaks shows that they are reproducibly shifted according to the number of carbon (+12 = C12), nitrogen (+1 = N1) or sulphur (+6 = S3) atoms, providing important information for the annotation of the elemental formula (Figure 3a). The missing elements, namely the number of hydrogen, phosphorus and/or oxygen atoms, can now, after having fixed the number of the labelled elements, easily be deduced from the accurate 12C monoisotopic mass of the measured compound by using a de novo elemental formula calculation (Kind and Fiehn, 2006).
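The shift-to-atom-count deduction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: it assumes singly charged ions and fully labelled isotopologue peaks, and the function name `atom_count` is hypothetical. The labelled m/z values in the example are theoretical positions calculated from the unlabelled value and the nominal shifts quoted in the text (+12, +1, +6); they are not measured data.

```python
# Mass difference (u) between the heavy label and the natural light isotope.
ISOTOPE_SHIFT = {
    "13C": 1.0033548,  # 13C - 12C
    "15N": 0.9970349,  # 15N - 14N
    "34S": 1.9957959,  # 34S - 32S
}


def atom_count(unlabelled_mz, labelled_mz, isotope, charge=1):
    """Number of atoms of the labelled element, deduced from the m/z shift
    between co-eluting unlabelled and fully labelled monoisotopic peaks."""
    mass_shift = (labelled_mz - unlabelled_mz) * abs(charge)
    return round(mass_shift / ISOTOPE_SHIFT[isotope])


# Theoretical peak positions for the example compound (assumed, not measured).
unlabelled = 420.0622
labelled = {"13C": 432.1025, "15N": 421.0592, "34S": 426.0496}

counts = {iso: atom_count(unlabelled, mz, iso) for iso, mz in labelled.items()}
print(counts)
```

Rounding to the nearest integer is only safe when the labelling is close to complete; with partial labelling, the most intense isotopologue no longer sits at the fully shifted position.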

Figure 3.

 Systematic overview showing the manual isotope labelling-assisted elemental composition annotation of glucoerucin.
(a) Extracted ion chromatograms of the different isotope masses of glucoerucin are shown on the left, whereas the spectra on the right show the corresponding full-scan mass spectra. The arrows within the mass spectrum indicate the isotope labelling-derived mass shift.
(b) Flow chart showing the stringency in elemental formula annotation in large databases (ChemSpider) or for the de novo calculation.

In Figure 3(b) the theoretical outcome of the de novo and the database-based elemental composition annotation is illustrated, displaying the major differences if the numbers of carbon, nitrogen and sulphur atoms are unknown or if they are fixed. Accordingly, the de novo elemental composition calculation from the measured 12C m/z value of 420.0622, using a mass error tolerance of up to 10 ppm, results, if the different elements are not fixed on the isotope labelling-deduced values, in 357 possible elemental compositions (Figure 3b). Restricting the de novo calculation to 12 carbon atoms reduces the list to 11 possible elemental compositions. Performing the de novo calculation, fixing all isotope-labelled elements (C, N and S), finally leads to the extraction of a single elemental composition from the measured mass (Figure 3b).

Pick a peak: making use of the all-ion fragmentation in combination with isotope labelling to annotate differential peaks

Metabolomic studies can generally follow two routes: one is to annotate and catalogue as many compounds as possible, which is already a major challenge (Last et al., 2007), whereas the second strategy is focused on the sole annotation of peaks that have been shown to be statistically significantly different between different conditions. A major aim of our method was to develop a technological platform that allows one to measure as many metabolites as possible while providing a methodology to unambiguously annotate the elemental composition of most, if not all of the peaks of interest from the sample. Again, a peak of interest is usually a compound that has significantly different levels between two conditions, and therefore might be causal for the differential phenotype or properties.

To illustrate the general procedure for the annotation of a peak using the accurate masses of the isotope-labelled MS spectra along with selected masses from the high-energy all-ion fragmentation spectra, we decided to select a peak that is significantly different between root and leaf tissue. Not unexpectedly, there were several hundred significantly differential peaks in the spectra. We therefore picked a prominent, abundant leaf peak, with a retention time of 15.79 min, which is more than 1500-fold more abundant in the leaf than in the root tissue. This compound has an accurate 12C m/z of 593.27759 (Figure 4a) and was identified in all isotopically labelled leaf and root samples. Searching the accurate 12C mass by itself against the comprehensive ChemSpider database (Williams, 2008) resulted in 42 different hits, of which four elemental compositions matched the carbon shift of +35. When we included the deduced nitrogen number of four and the absence of a detectable sulphur shift, we ended up with a single candidate, namely C35H36N4O5 (Figure 4b). To further confirm a possible structure from this hit we used the all-ion fragmentation data. Interestingly, these spectra provide us with two abundant fragments at m/z 533.25720 and 459.21964, respectively, which could be annotated by the observed isotope shift patterns as C33H32N4O3 and C30H26N4O (Figure 4c). These two fragments, which were derived from the loss of a C2H4O2 and a C5H10O4 group from the parent molecule, matched perfectly with the structure of pheophorbide A (Figure 4d), which is a commonly described intermediate in the chlorophyll breakdown pathway, explaining why this peak was much higher in leaves than in roots (Schelbert et al., 2009; Nagane et al., 2010).
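The effect of fixing the labelled elements can also be illustrated with a de novo calculation on this peak. The following Python sketch is an assumption-laden illustration rather than the calculator used in the study: it assumes a protonated [M + H]+ ion, considers only H, O and P as free elements, applies no chemical plausibility rules, and uses hypothetical names (`de_novo`, `to_formula`) and search limits.

```python
# Monoisotopic masses (u) of the lightest stable isotopes, and the proton mass.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "P": 30.9737616, "S": 31.9720710}
PROTON = 1.00727647


def to_formula(counts):
    """Render (element, count) pairs as a formula string, skipping zeros."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in counts if n > 0)


def de_novo(mz, n_c, n_n, n_s, ppm=10.0, charge=1, max_o=30, max_p=4, max_h=120):
    """Enumerate H/O/P counts that explain mz once C, N and S are fixed
    to the values read from the isotope shifts."""
    neutral = mz - charge * PROTON  # [M+H]+ for charge=+1, [M-H]- for charge=-1
    fixed = n_c * MASS["C"] + n_n * MASS["N"] + n_s * MASS["S"]
    tolerance = mz * ppm * 1e-6
    hits = []
    for p in range(max_p + 1):
        for o in range(max_o + 1):
            rest = neutral - fixed - p * MASS["P"] - o * MASS["O"]
            h = round(rest / MASS["H"])  # only the nearest H count can fit
            if 0 <= h <= max_h and abs(rest - h * MASS["H"]) <= tolerance:
                hits.append(to_formula(
                    [("C", n_c), ("H", h), ("N", n_n), ("O", o), ("P", p), ("S", n_s)]))
    return hits


# Leaf peak from the text: 12C m/z 593.27759, shifts give C35, N4, no sulphur.
print(de_novo(593.27759, n_c=35, n_n=4, n_s=0))
```

Under these assumptions the search space collapses to two nested loops, because the hydrogen count follows from the residual mass once oxygen and phosphorus are chosen.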

Figure 4.

 Mass spectrometry (MS) and all-ion fragmentation-based elemental composition annotation of pheophorbide A.
(a) Extracted ion chromatogram of the leaf (lower chromatogram) and root (upper chromatogram), indicating two differential peaks.
(b) Differential isotope-labelled mass spectra (MS) of the leaf-specific peak at retention time 15.80 min. The arrows within the mass spectrum indicate the isotope labelling-derived mass shift.
(c) Differential isotope-labelled mass spectra (all-ion fragmentation spectra) of the leaf-specific peak at retention time 15.80 min. The arrows within the mass spectrum indicate the isotope-labelling-derived mass shift.
(d) Chemical structure of pheophorbide A matching the MS and all-ion fragmentation-derived elemental compositions. Red lines on the structure indicate the fragmentation sites.

Automated peak annotation strategy for lipids and secondary metabolites using multiple isotope-labelled metabolomes

Having demonstrated that the data quality of the acquired spectra is high, and the concept of peak association between the different isotope-labelled samples is functional, we decided to develop an automatic, database-dependent strategy for the annotation of elemental compositions of all the peaks in the recorded chromatograms (Figure 5).

Figure 5.

 Flow chart indicating the database-assisted elemental composition assignment for isotope-labelled UPLC-FT-MS chromatograms.

The strategy of this approach is based on the independent extraction and alignment of the peaks from the four different samples (unlabelled 12C, 13C, 15N and 34S), providing four independent data matrices containing the masses, retention times and intensity values of each measured peak (Figure 5). In a subsequent step, the masses from the aligned matrices (each isotope-labelled sample is used separately) were used to perform independent database searches against four databases (unlabelled 12C, 13C, 15N and 34S). The unlabelled database contains the exact mass of each compound calculated from the monoisotopic masses of its elements, whereas the masses of the compounds in the other three databases (isotope-labelled databases) are calculated using the mass of the stable isotope used for the respective labelling experiment (13C, 15N or 34S). These four independent database searches again yield four matrices, each containing the measured accurate mass and retention time of every measured peak connected to one or several matching elemental compositions. These four matrices can now, in a final step, be merged by matching the identical elemental compositions between the different isotope-labelled samples and their corresponding retention times (Figure 5 and Experimental procedures).
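The search-and-merge logic described above can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: it assumes protonated [M + H]+ ions only, a 10 ppm mass window and a retention time window of 0.05 min (both windows are hypothetical defaults), and the peak lists in the example are synthetic values generated from the formula itself.

```python
import re

# Monoisotopic masses (u); a labelling regime swaps one light isotope for the heavy one.
LIGHT = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
         "O": 15.9949146, "P": 30.9737616, "S": 31.9720710}
HEAVY = {"13C": ("C", 13.0033548), "15N": ("N", 15.0001089), "34S": ("S", 33.9678669)}
PROTON = 1.00727647


def ion_mz(formula, label=None):
    """m/z of the [M+H]+ ion of a formula under a given labelling regime."""
    masses = dict(LIGHT)
    if label in HEAVY:
        element, heavy_mass = HEAVY[label]
        masses[element] = heavy_mass
    atoms = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return sum(masses[el] * int(n or 1) for el, n in atoms) + PROTON


def search(peaks, formulas, label, ppm=10.0):
    """Match (mz, rt) peaks of one sample against the database for its label."""
    hits = {}
    for formula in formulas:
        target = ion_mz(formula, label)
        for mz, rt in peaks:
            if abs(mz - target) / target * 1e6 <= ppm:
                hits.setdefault(formula, []).append(rt)
    return hits


def merge(hits_by_label, rt_tol=0.05):
    """Keep formulas found in all four samples at a matching retention time."""
    reference = hits_by_label["unlabelled"]
    others = [h for lab, h in hits_by_label.items() if lab != "unlabelled"]
    confirmed = []
    for formula, rts in reference.items():
        for rt in rts:
            if all(any(abs(rt - r) <= rt_tol for r in h.get(formula, [])) for h in others):
                confirmed.append((formula, rt))
    return confirmed


database = ["C35H36N4O5", "C12H23NO9S3"]
labels = ["unlabelled", "13C", "15N", "34S"]
# Synthetic peak lists: one true compound per sample plus an unrelated background peak.
samples = {lab: [(ion_mz("C35H36N4O5", lab), 15.79), (300.0, 2.0)] for lab in labels}
hits = {lab: search(samples[lab], database, lab) for lab in labels}
annotated = merge(hits)
print(annotated)
```

A formula that lacks one of the labelled elements (here, no sulphur) still has to be found in the corresponding sample, at its unshifted mass. Background peaks that match the unlabelled database by chance are rejected unless they also appear at the predicted shifted masses in the labelled samples.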

Depending on the size and the biological relevance of the databases used for these searches [we used KEGG and KNApSAcK for the polar fraction, and TargetLipids (Table S1) for the organic fraction], we were able to match 4908 polar and semi-polar elemental compositions (Table S2), whereas 2392 lipophilic elemental compositions were assigned (see next section).

Annotating peaks from the organic fraction: a lipidomic analysis

The MTBE-based extraction protocol enabled us to efficiently separate the polar from the hydrophobic metabolites (Figure 1). As can be seen from Figure 2(a), the UPLC-FT-MS spectra of the organic fraction provide rich elution profiles, especially in the positive ionization mode, where more than 30 000 extractable chromatographic peaks could be resolved. The average peak width is ∼15 s (Figure 2b), and is thus comparable with the peak dimensions achieved for the previously established reverse-phase chromatography of the polar fractions (Giavalisco et al., 2009).

Results obtained using the in-house assembled database (TargetLipids; Table S1) led to the isotope-assisted assignment of more than 2000 peaks to an elemental composition. In a further step, these annotated peaks were filtered and manually curated to provide a fully validated dataset (Table S2). The manual curation of the data relied, next to the removal of unexpected adducts (e.g. triacylglycerols do not ionize as [M + H]+ ions), on the highly systematic physico-chemical behaviour of the fatty acid-containing lipids within their different lipid classes (Fahy et al., 2009).

The main differences between lipid species within a certain class are often explained by the degrees of saturation and the acyl chain lengths of the fatty acids. Accordingly, longer fatty acid chains lead to a higher mass and increased retention time, whereas fatty acids with higher degrees of unsaturation result in lipids with lower masses and decreased retention times. This systematic m/z–retention time behaviour could now be employed to visualize and validate the lipids within their classes by simply plotting them in a scatter plot with m/z on the x-axis and retention time on the y-axis (Figure 6). Wrongly annotated lipids can be easily detected as dots outside the systematic scatter pattern, whereas possibly missing lipid species can be anticipated, as they appear as missing dots within a systematic series (Figures 6 and S3).
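The same consistency rule can be checked programmatically instead of by eye. The sketch below is one possible formalization, not the curation procedure used in the study: within one lipid class it compares every pair of species that differ only in chain length or only in the number of double bonds, and reports the species involved in the most ordering violations. The function name and the retention times in the example are invented for illustration. For orientation, each added C2H4 unit raises the mass by about 28.0313 u and each additional double bond (loss of H2) lowers it by about 2.0157 u.

```python
from collections import Counter
from itertools import combinations


def flag_outliers(lipids):
    """lipids maps (acyl_carbons, double_bonds) -> retention time in min.
    Returns the species that break the expected elution order most often."""
    violations = Counter()
    for (a, rt_a), (b, rt_b) in combinations(lipids.items(), 2):
        (carbons_a, bonds_a), (carbons_b, bonds_b) = a, b
        if bonds_a == bonds_b and carbons_a != carbons_b:
            # Same unsaturation: the longer chain should elute later.
            consistent = (rt_a < rt_b) == (carbons_a < carbons_b)
        elif carbons_a == carbons_b and bonds_a != bonds_b:
            # Same chain length: more double bonds should elute earlier.
            consistent = (rt_a > rt_b) == (bonds_a < bonds_b)
        else:
            continue  # pair differs in both properties, no simple expectation
        if not consistent:
            violations[a] += 1
            violations[b] += 1
    worst = max(violations.values(), default=0)
    return sorted(species for species, n in violations.items() if n == worst)


# Invented retention times; (54, 2) is deliberately placed far too early.
example = {
    (50, 1): 16.9, (50, 2): 16.5,
    (52, 1): 17.4, (52, 2): 17.0,
    (54, 1): 17.9, (54, 2): 15.0,
}
print(flag_outliers(example))
```

A pairwise ordering test of this kind only needs the annotation and the retention time, so it can be run on each class before the scatter plots are inspected manually.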

Figure 6.

 Graphical validation strategy for the class-specific analysis of lipid UPLC-MS spectra using simple two-dimensional scatter plots. An example of 41 [M + NH4]+ triacylglycerol peaks, measured in this study, is displayed. The x-axis represents the mass of the compound (m/z), whereas the retention time (min) is plotted on the y-axis. The two arrows in the scatter plot indicate the expected shifts in time and mass if double bonds (H2) or the C2H4 group are introduced or removed. Each diagonal line within the scatter plot is a series of lipids with the same number of carbons in the fatty acid chains.

As the complexity of our data is quite high and the main aim of this study was to demonstrate the suitability of the developed methods for the annotation of small molecules, we decided to use a small, but well-annotated database (Table S1). Employing the automated annotation strategy (Figure 5) and the stringent filtering of unexpected adducts, we assigned 439 of the peaks to a distinct elemental composition and a unique lipid name (Table S1). These 439 peaks were associated with 259 different, unique lipid species, which we grouped into 20 distinct lipid classes (Figure 7).

Figure 7.

 Histogram showing the qualitative and quantitative distribution of annotated lipid classes from the analysed lipid spectra (Table S3). Cer, ceramide; DAG, diacylglycerol; DGDG, digalactosyldiacylglycerol; GlcCer, glucosylceramide; GIPC, glycosylinositolphosphoceramide; lysoPC, lysophosphatidylcholine; lysoPE, lysophosphatidylethanolamine; MGDG, monogalactosyldiacylglycerol; PA, phosphatidic acid; PC, phosphatidylcholine; PE, phosphatidylethanolamine; PG, phosphatidylglycerol; PI, phosphatidylinositol; PS, phosphatidylserine; SQDG, sulfoquinovosyldiacylglycerol; TAG, triacylglycerol.

Looking in more detail at this data indicated that we do indeed qualitatively cover the majority of the phospho- and galactolipid species [phosphatidic acids (PAs), phosphatidyl cholines (PCs), phosphatidyl ethanolamines (PEs), phosphatidyl glycerols (PGs), phosphatidyl inositols (PIs), phosphatidyl serines (PSs), monogalactosyldiacylglycerols (MGDGs), digalactosyldiacylglycerols (DGDGs) and sulfoquinovosyldiacylglycerols (SQDGs)], which have been analyzed and quantified in a number of studies using either direct infusion-based TripleQuad-MS (Devaiah et al., 2006) or thin-layer chromatography (Dormann and Benning, 2002).

As well as these highly abundant polar lipids we also detected 33 different sphingolipids (Figure 7; Table S3), including ceramide (Cer), glucosylceramide (GlcCer) and glycosylinositolphosphoceramide (GIPC) species. This was somewhat surprising, because specific and sophisticated extraction protocols have usually been employed for the analysis of these lipid species (Sperling and Heinz, 2003; Markham and Jaworski, 2007). The authenticity of these lipids was confirmed by analysing either the positive-mode all-ion fragmentation spectra for the presence of the long chain base or by checking the identity of the fatty acid fragments in the negative ionization mode all-ion fragmentation spectra (data not shown).

Another unusual class of lipids that has been shown to be associated with wounding responses and jasmonic acid synthesis (Buseman et al., 2006; Bach et al., 2008; Glauser et al., 2008), namely the oxylipins, could be frequently (18 different species) detected in our samples (Table S3). Interestingly, these lipids do have one or two unusual fatty acids (Buseman et al., 2006; Glauser et al., 2008), which can be used as reporter ions to validate the obtained identity of the class members by checking the all-ion fragmentation spectra for the presence or absence of these characteristic fragment peaks. In Figure S4 we exemplified the procedure for the direct annotation of a number of representative oxophytodienoic acid (OPDA)-containing lipids by extracting the mass of OPDA from the chromatogram.

The aforementioned strategy is of course not restricted to OPDA-containing lipids. Looking at the all-ion fragmentation spectra of the different lipid classes indicates that most of them contain one or several reporter ions that can help to extract the group members from the total ion chromatogram. Another representative example is given for the PCs (Figure 2c).

As well as these previously described lipid species we could additionally detect and confirm, by a proper isotope shift and the systematic class-specific behaviour, a large number of highly apolar, late-eluting, di- and triacylglycerols, which are commonly believed to serve as seed storage or signalling lipids (Fraser et al., 2000; Arisz et al., 2009). These lipids, which we have localized to the cytosol, based on non-aqueous fractionation (NAF) data using mature A. thaliana leaf extracts (Krueger et al., 2011), have thus far not been described for mature leaf or root tissue extracts (Figure 6; Table S3).

Secondary metabolite analysis

Isotope labelling-assisted elemental composition annotation of the positive- and negative-mode spectra of the polar fraction of the fractionated extract resulted in the collection of 4908 peaks with a clear and reproducible mass shift (Table S2). These were assigned elemental compositions derived solely from the biological databases KEGG (Kanehisa et al., 2007) and KNApSAcK (Shinbo et al., 2006). This number could easily be increased by referring to more comprehensive, but less biologically oriented databases such as PubChem (Wang et al., 2009) or ChemSpider (Williams, 2008).

These 4908 annotated peaks do not necessarily mean that we have identified this many distinct compounds, as many of the listed formulae are covered by two or more different ions, originating primarily from different ionization adducts. Once these redundant entries were removed, the list still contained 4118 elemental compositions with unique m/z–retention time properties. However, this highly complex list still contains redundant elemental compositions with different retention times (essentially different molecules). For example, we could detect the elemental composition C8H8O2, matching methylbenzoate, 39 times in our results table (Table S2). Other abundant formulae, which appeared more than 30 times in our chromatograms, included C6H10O5 (anhydrohexose), C9H8O3 (coumaric acid) and C6H12O6 (hexose), indicating that although we are extracting the data from low-energy MS spectra, a reasonable level of fragmentation can still be observed, leading to the frequent appearance of these common substructures. One way to use these highly redundant formulae would be to employ them for formula or even structure validation. For example, we found the elemental composition C15H10O6, a common fragment of kaempferol-containing flavonoids, six times in the results list (Table S2, and see below).

Filtering the whole list of 4908 hits for unique elemental compositions led to the assignment of 1672 formulae, whereas filtering for formulae that have previously been associated with the model organism A. thaliana (using the taxonomy entry from the KNApSAcK database) led to a total of 1203 (25%) peaks. Interestingly, the number of unique elemental compositions among these 1203 A. thaliana-associated peaks reached only 211, indicating that there is still a huge discrepancy between the 1672 unique elemental compositions in total and the 211 Arabidopsis-associated unique elemental compositions. Therefore, we can conclude that we are already covering a large number of compounds, of which the vast majority still awaits a proper annotation for the sequenced model plant.

Looking in detail at these 211 unique elemental compositions indicates that 114 of them contain one or several sulphur atoms, whereas 399 contained one or more nitrogen atoms (Table S2). These compounds could be interesting for a scientist working in the field of sulphur and nitrogen homeostasis.

Finally, to illustrate these numbers and the underlying data in more detail, and to further demonstrate the accuracy, stringency and convenience of our annotation strategy, we plotted the isotope-shifted mass spectra of 100 of the 211 annotated formulae, which could be associated with the A. thaliana metabolome (Figure S5).

Revalidation of selected secondary metabolites

To further validate the data from the results table (Table S2), we tried to follow a similar strategy as the one employed for the lipid analysis (Figure S3). Unfortunately, this was not straightforward, as even though many secondary metabolites are grouped into classes (D’Auria and Gershenzon, 2005), only a few can be validated on their class-specific and systematic retention time–m/z dependencies. One example can be illustrated for the highly abundant methionine-derived aliphatic glucosinolates (Brown et al., 2003). Here, the iterative addition of CH2 groups, catalysed by a methylthioalkylmalate synthase (Textor et al., 2007), leads to systematically increased retention times (Figure S6). As can be seen from this example, the differential isotope labels not only help to identify the elemental composition of the compound, but also provide, even in the absence of authentic standards, evidence for the structure of the analysed compounds.

An additional way to validate compounds from the results table could be, as we have mentioned in the previous section, to select frequently occurring ions (likely to be fragments of several different precursor molecules) and extract them from the MS and also the all-ion fragmentation spectra. A very straightforward example can be illustrated for the previously mentioned quercetin- or kaempferol-containing flavonoids. Because quercetin- and kaempferol-based flavonoids are decorated with different numbers and kinds of sugars (Yonekura-Sakakibara et al., 2008), it is possible to extract the exact m/z value of 287.05501 for the kaempferol-containing and the m/z value of 303.04993 for the quercetin-containing peaks from the all-ion fragmentation spectra. This directly reveals the distribution of these flavonoids within the chromatogram. If the retention time information and the isotope labelling-derived mass-shift data are combined, the separated flavonoids can be unambiguously annotated. In Figure S7 we show how the all-ion fragmentation spectra helped detect and annotate the six most abundant quercetin- and kaempferol-containing flavonoids. As we have previously shown for the lipid data, reporter ions such as sulphate, lost from glucosinolates, can also be used to find and identify the parent molecule.
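A reporter-ion extraction of this kind can be sketched in Python as shown below. The sketch makes several assumptions that are not stated in the text: the two reporter values are taken to be the protonated aglycones (kaempferol C15H10O6 and quercetin C15H10O7), the high-energy scans are represented as a plain list of (retention time, spectrum) pairs, the 5 ppm window is a hypothetical default, and the scans in the example are invented.

```python
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.9949146}  # monoisotopic masses (u)
PROTON = 1.00727647


def protonated_mz(c, h, o):
    """m/z of the [M+H]+ ion of a CHO compound."""
    return c * MASS["C"] + h * MASS["H"] + o * MASS["O"] + PROTON


# Assumed identity of the reporter ions: the protonated flavonol aglycones.
REPORTERS = {
    "kaempferol": protonated_mz(15, 10, 6),
    "quercetin": protonated_mz(15, 10, 7),
}


def reporter_trace(scans, target_mz, ppm=5.0):
    """Extracted ion chromatogram of one reporter ion.
    scans: list of (retention_time, [(mz, intensity), ...]) from the
    high-energy (all-ion fragmentation) channel."""
    tolerance = target_mz * ppm * 1e-6
    return [
        (rt, sum(i for mz, i in spectrum if abs(mz - target_mz) <= tolerance))
        for rt, spectrum in scans
    ]


# Invented high-energy scans for illustration only.
scans = [
    (6.10, [(287.0551, 5e4), (449.1080, 2e5)]),
    (6.15, [(303.0500, 8e4)]),
    (6.20, [(150.0000, 1e3)]),
]
kaempferol_trace = reporter_trace(scans, REPORTERS["kaempferol"])
quercetin_trace = reporter_trace(scans, REPORTERS["quercetin"])
print(kaempferol_trace)
print(quercetin_trace)
```

Retention times at which a trace is non-zero mark candidate flavonoids; the corresponding low-energy scans can then be inspected for the intact, isotope-shifted parent ion.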

Conclusion

Multi-omics technologies and holistic, instead of reductionist, cellular analysis facilitate the detection and interpretation of multiple cellular phenotypes (Palsson, 2009). The fractionated extraction method presented here, which was used in combination with high-resolution MS and all-ion fragmentation analysis of multiple isotope-labelled samples, provides the basis for unambiguous qualitative and quantitative metabolomics. The data quality and annotation capability of our system are best illustrated by the fact that the spectra derived from the four different isotope-labelling regimes are stringent enough to restrict the number of possible elemental compositions to one, even if highly complex databases or unrestricted de novo calculations are used for the formula calculations (Kind and Fiehn, 2006). Even though no fully automated strategy is yet available, the method in its current form allows, in most cases, the accurate annotation of a distinct elemental composition of each biological peak present in a sample.
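The constraint that the labelling regimes place on the elemental composition can be made concrete with a minimal sketch. It assumes complete labelling and a singly charged ion; the function name and the example m/z values (a hypothetical ion containing 12 C, 1 N and 3 S atoms) are illustrative assumptions, not values taken from the results table.

```python
# Mass gained per atom when the light isotope is replaced by the heavy one (Da):
# 13C - 12C, 15N - 14N and 34S - 32S.
SHIFT = {"C": 1.0033548, "N": 0.9970349, "S": 1.9957959}


def atom_count(mz_unlabelled, mz_labelled, element, charge=1):
    """Estimate the number of atoms of `element` from the label-induced shift.

    Assumes full labelling; `charge` is the absolute charge state of the ion.
    """
    shift = (mz_labelled - mz_unlabelled) * abs(charge)
    return round(shift / SHIFT[element])


# Hypothetical co-eluting peaks of one compound in the four sample types.
mz_12c = 420.0462
mz_13c = 432.0865
mz_15n = 421.0432
mz_34s = 426.0336

counts = {
    "C": atom_count(mz_12c, mz_13c, "C"),
    "N": atom_count(mz_12c, mz_15n, "N"),
    "S": atom_count(mz_12c, mz_34s, "S"),
}
```

With the C, N and S counts fixed in this way, only the numbers of H, O and P remain to be resolved by the accurate mass, which is why the candidate list collapses so sharply.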

In the long run we will try to further improve the method and increase the number of identifiable compounds by including hydrophilic interaction liquid chromatography (HILIC) (Spagou et al., 2010) or high-resolution reversed phase ion pairing liquid chromatography (Lu et al., 2010) on the polar fraction of our isotope-labelled metabolite extracts, which will hopefully extend the coverage of our methodology.

Experimental procedures

Plant growth

Unlabelled and 13C isotope-labelled plants: A. thaliana Col-0 plants used for the metabolite extraction were grown under either 12CO2 or 13CO2 using a BioBox growth chamber (GMS Gaswechsel-Messsysteme GmbH, http://www.gms-biobox.de). The plant material preparation and the experimental settings for the BioBox were as described by Huege et al. (2007). Plant growth in the BioBox was performed for 42 days, with approximately 33–35 days required for full initial labelling. The aerial parts of the plants were separated from the roots by cutting and were immediately frozen in liquid nitrogen.

15N-labelled plants: A. thaliana Col-0 plants were grown in a 12-h day/12-h night cycle with a light flux of 150 μmol photons m−2 s−1, 75/85% humidity day/night and 20°/18°C day/night. For germination the seeds were pre-incubated for 5 min on 0.1% agar containing 3.5 ng ml−1 gibberellic acid 4 and 7 solution (Duchefa, http://www.duchefa.com). A sowing pan was filled with Rockwool covered by a perforated black plastic foil, which was wetted with an Arabidopsis nutrient solution modified from Loque et al. [1 mM 15NH415NO3 (98% 15N) or 1 mM 14NH414NO3 (Sigma-Aldrich, http://www.sigmaaldrich.com), 250 μM CaCl2, 100 μM Na-EDTA, 1 mM KH2PO4, 1 mM MgSO4, 100 μM H3BO3, 1.5 μM CuSO4, 50 μM KCl, 10 μM MnSO4, 0.1 μM Na2MoO4, 100 μM Na2O3Si and 2 μM ZnSO4]. The pan was mounted on an empty 5-L box. The seeds were sown into the holes of the plastic foil covering the Rockwool and covered with wrapping foil (for at least 1 week) until they developed fully expanded cotyledons. To slowly reduce the humidity, holes were punched in the wrapping foil before removing it completely after 3 weeks. The 5-L box was filled with the Arabidopsis nutrient solution 1 week after sowing. The solution was then exchanged weekly until 1 week before harvest, when it was exchanged every second day. The plants were harvested after 5 weeks.

34S-labelled plants: A. thaliana seeds were sterilized by incubation in 70% ethanol for 5 min, followed by 15 min of incubation in a solution containing 20% sodium hypochlorite and 0.01% Triton X-100. Subsequently, seeds were washed three times with sterile water, suspended in 0.15% agarose and transferred onto agar plates containing modified half-strength MS medium solidified with 1% agar. All sulphur-containing micronutrients were substituted with the corresponding chloride salt. 32S was supplied as MgSO4 to a concentration of 250 μM in 32S medium. A similar concentration of sulphate was achieved in 34S medium by adding the respective volumes of 34S solution (50 mM 34SO4, 2.7 M NO3). NO3 concentrations in 34S and 32S media were adjusted to 13.5 mM.
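The stated concentrations are mutually consistent, as the following short calculation suggests. It is a sketch under the assumption that the 34S stock solution is simply diluted into the medium and is the only source of sulphate there; the variable names and the 1-L batch volume are illustrative.

```python
# Stock and target concentrations from the text, in mol per litre.
STOCK_SULPHATE = 50e-3    # 34SO4 in the 34S stock solution
STOCK_NITRATE = 2.7       # NO3 in the same stock solution
TARGET_SULPHATE = 250e-6  # sulphate concentration wanted in the medium

# Dilution factor needed to reach the target sulphate concentration.
dilution_factor = STOCK_SULPHATE / TARGET_SULPHATE

# Nitrate carried along with the stock at that dilution.
nitrate_from_stock = STOCK_NITRATE / dilution_factor

# Volume of stock for a hypothetical 1-L batch of medium, in millilitres.
stock_volume_ml = 1000.0 / dilution_factor

print(f"dilution 1:{dilution_factor:.0f}, "
      f"nitrate from stock {nitrate_from_stock * 1e3:.1f} mM, "
      f"stock per litre {stock_volume_ml:.1f} ml")
```

Under these assumptions a 1:200 dilution delivers 250 μM sulphate together with 13.5 mM nitrate, which matches the nitrate concentration to which both media were adjusted.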

Seed-loaded plates were stratified for three nights at 4°C, and were then transferred to a growth chamber and grown under a photon flux density of 40 μmol m−2 s−1 at 22°C with 16 h of light/8 h of dark. After 14 days leaf and root material was harvested.

Fractionated metabolite and protein extraction

Approximately 100 mg of the frozen plant tissue (roots or leaf) was homogenized in 2-ml Eppendorf tubes (Eppendorf, http://www.eppendorf.com) twice for 1 min at maximum speed in a Retsch mill (MM 301; Retsch, http://www.retsch.com). The metabolites were extracted from each aliquot in 1 ml of a homogeneous mixture of −20°C methanol : methyl-tert-butyl-ether : water (1 : 3 : 1), with shaking for 30 min at 4°C (Thermo Stat Plus; Eppendorf), followed by another 10 min of incubation in an ice-cooled ultrasonication bath. After adding 650 μl of UPLC-grade methanol : water 1 : 3, the homogenate was vortexed and spun for 5 min at 4°C in a table-top centrifuge (Eppendorf). The addition of methanol : water led to a phase separation, providing an upper organic phase, containing the lipids, a lower aqueous phase, containing the polar and semipolar metabolites, and a pellet of starch and proteins at the bottom of the Eppendorf tube. The separate phases were isolated, dried down in a speed vac (Centrivac; Heraeus, http://www.heraeus.com) and stored at −80°C until use in the different metabolomic or lipidomic analyses. The protein pellet was washed twice with methanol before it was stored at −80°C until further use.

UPLC-FT-MS measurement of lipids and semipolar metabolites

UPLC separation of the semipolar fraction of the fractionated metabolite extract was performed using a Waters Acquity UPLC system (Waters, http://www.waters.com), using an HSS T3 C18 reversed-phase column (100 mm × 2.1 mm × 1.8 μm particles; Waters). The mobile phases were 0.1% formic acid in H2O (Buffer A, ULC MS grade; BioSolve, http://www.biosolve-chemicals.com) and 0.1% formic acid in acetonitrile (Buffer B, ULC MS grade; BioSolve). A 2-μl sample (the dried-down aqueous fraction was re-suspended in 100 μl of UPLC grade water) was loaded per injection, and the gradient, which was carried out at a flow rate of 400 μl min−1, was: 1 min 99% A, 13-min linear gradient from 99% A to 65% A, 14.5-min linear gradient from 65% A to 30% A, 15.5-min linear gradient from 30% A to 1% A, hold 1% A until 17 min, 17.5-min linear gradient from 1% A to 99% A, and re-equilibrate the column for 2.5 min (20-min total run time).

UPLC separation of the lipid fraction of the fractionated metabolite extract was performed on the same UPLC system using a C8 reversed-phase column (100 mm × 2.1 mm × 1.7 μm particles; Waters). The mobile phases were water (UPLC MS grade; BioSolve) with 1% 1 M NH4Ac, 0.1% acetic acid (Buffer A) and acetonitrile : isopropanol (7 : 3, UPLC grade; BioSolve) containing 1% 1 M NH4Ac, 0.1% acetic acid (Buffer B). A 2-μl sample (the dried-down organic fraction was re-suspended in 500 μl of UPLC-grade acetonitrile : isopropanol 7 : 3) was loaded per injection, and the gradient, which was carried out at a flow rate of 400 μl min−1, was: 1 min 45% A, 3-min linear gradient from 45% A to 35% A, 8-min linear gradient from 25% A to 11% A, 3-min linear gradient from 11% A to 1% A. After washing the column for 3 min with 1% A, the buffer was set back to 45% A and the column was re-equilibrated for 4 min (22-min total run time).

The mass spectra were acquired using an Exactive mass spectrometer (Thermo-Fisher, http://www.thermofisher.com). The spectra were recorded alternating between full-scan and all-ion fragmentation-scan modes, covering a mass range from 100 to 1500 m/z. The resolution was set to 10 000, with 10 scans per second, restricting the loading time to 100 ms. The capillary voltage was set to 3 kV with a sheath gas flow value of 60 and an auxiliary gas flow of 35 (values are in arbitrary units). The capillary temperature was set to 150°C, whereas the drying gas in the heated electrospray source was set to 350°C. The skimmer voltage was set to 25 V, whereas the tube lens was set to a value of 130 V. The spectra were recorded from 1 min to 17 min of the UPLC gradients.

Data analysis

Chromatograms from the UPLC-FT MS runs were analysed and processed with Refiner MS® 5.3 (GeneData, http://www.genedata.com). Molecular masses, retention time and associated peak intensities for the replicates of each sample group (12C, 13C, 15N and 34S samples) were extracted from the .raw files. The processing of the MS data included the removal of the fragmentation information from the .raw files. Additionally, chemical noise (constantly eluting non-biological compounds derived from the column or the employed UPLC solvents) was automatically removed. The chromatogram alignments were performed using a pairwise alignment-based tree using m/z windows of five points and RT windows of five scans within a sliding frame of 200 scans. Further peak filtering on the aligned data matrices was performed in Excel® (Microsoft, http://www.microsoft.com), removing peaks with intensities below 10 000 counts. Additionally, a consistency check was performed, removing all peaks that did not show up in all of the replicates of a measurement.
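The two filtering steps can be sketched as follows. This is a minimal illustration, not the spreadsheet procedure actually used: the data layout, the function name and the example peaks are assumptions, and the sketch assumes that an intensity below the threshold in a replicate is treated as "not detected" before the consistency check.

```python
MIN_INTENSITY = 10_000  # counts, threshold quoted in the text


def filter_peaks(matrix, min_intensity=MIN_INTENSITY):
    """Keep peaks that pass the intensity threshold in every replicate.

    matrix: dict mapping a peak key, here (m/z, retention time), to a list of
            replicate intensities; None marks a peak missing in a replicate.
    """
    kept = {}
    for peak, intensities in matrix.items():
        # Step 1: missing or sub-threshold values count as not detected.
        detected = [i is not None and i >= min_intensity for i in intensities]
        # Step 2: consistency check, the peak must be detected in all replicates.
        if intensities and all(detected):
            kept[peak] = intensities
    return kept


# Hypothetical aligned data matrix for one sample group (three replicates).
aligned = {
    (420.0462, 4.31): [52_000, 48_500, 50_100],  # passes both steps
    (287.0550, 6.02): [15_000, None, 14_200],    # missing in one replicate
    (303.0499, 5.40): [9_000, 12_000, 11_000],   # below threshold once
}

filtered = filter_peaks(aligned)
```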

The resulting peak and retention time lists were searched against different databases (ChemSpider, KEGG, KNApSAcK and TargetLipids) using an in-house developed database search tool [Golm Biochemical Space (GoBioSpace); Hummel et al., 2011, in press]. For all the employed databases we calculated the accurate 12C, 13C, 15N and 34S masses. We searched the databases with the corrected masses derived from the measured peak list. Corrected means that we introduced an accurate mass correction according to the expected ionization mode, and therefore the expected ionization adducts (positive ion mode, [M + H]+, [M + NH4]+, [M + Na]+; negative ion mode, [M−H]−, [M + formic acid−H]−). The search criteria can be restricted to a mass error of between 0.1 and 100 ppm (we used 7 ppm), and the collected results were restricted to chemical formulae containing only the elements C, H, N, O, P or S. The 12C, 13C, 15N and 34S data sets were analysed individually, and the result files, including the database annotations of each mass, containing chemical formula, retention time, m/z value, compound ID, and possible substance names and synonyms, were exported as text files. The content of these files was then sorted and filtered either directly in the database search tool or using Access (Microsoft). The sorting of the data included the matching of 12C, 13C, 15N and 34S elemental compositions within the same ionization mode and the retention time alignment of matched chemical formulae. All other spectra manipulations and peak extractions were performed using Xcalibur 2.1 (Thermo-Fisher).
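The adduct correction and the ppm-restricted search can be sketched as below. This is not the in-house search tool: the function names, the two-entry toy database and the decision to compute the ppm error on the neutral mass are assumptions made for illustration, and only unlabelled (12C) masses are considered. The adduct offsets are calculated from standard monoisotopic masses for singly charged ions.

```python
# m/z offsets of the adducts named in the text: m/z = M + offset (charge 1).
ADDUCTS = {
    "pos": {"[M+H]+": 1.007276, "[M+NH4]+": 18.033826, "[M+Na]+": 22.989221},
    "neg": {"[M-H]-": -1.007276, "[M+formic acid-H]-": 44.998203},
}

# Toy database: formula -> unlabelled monoisotopic neutral mass (Da).
DATABASE = {
    "C15H10O6": 286.047738,  # e.g. kaempferol
    "C15H10O7": 302.042653,  # e.g. quercetin
}


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def search(mz, mode, database=DATABASE, tol_ppm=7.0):
    """Return (formula, adduct) pairs whose mass matches within tol_ppm."""
    hits = []
    for adduct, offset in ADDUCTS[mode].items():
        neutral = mz - offset  # corrected (neutral) mass for this adduct
        for formula, mass in database.items():
            if abs(ppm_error(neutral, mass)) <= tol_ppm:
                hits.append((formula, adduct))
    return hits


hits_kaempferol = search(287.05501, "pos")
hits_quercetin = search(303.04993, "pos")
```

In the full workflow the same search would be repeated with the 13C, 15N and 34S masses of each database entry, and only formulae matched in all four data sets at the same retention time would be retained.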

Acknowledgements

Gudrun Wolter is kindly acknowledged for her help with plant preparation and MS measurements. Jean-Christope Avice (University Caen, Basse-Normandie, France) is kindly acknowledged for providing us with the 34S substrate. Furthermore, we would like to thank Antony Williams and Dr Shigehiko Kanaya for providing the ChemSpider and the KNApSAcK databases. Dr Leonard Krall is kindly acknowledged for his proofreading of the manuscript. The Max Planck Society is acknowledged for financial support.
