We conducted a comparative analysis of the root metabolome of six parental maize inbred lines and their 14 corresponding hybrids showing fresh weight heterosis. We demonstrated that the metabolic profiles not only exhibit distinct features for each hybrid line compared with its parental lines, but also separate reciprocal hybrids. Reconstructed metabolic networks, based on robust correlations between metabolic profiles, display a higher network density in most hybrids as compared with the corresponding inbred lines. With respect to metabolite level inheritance, additive, dominant and overdominant patterns are observed with no specific overrepresentation. Despite the observed complexity of the inheritance pattern, for the majority of metabolites the variance observed in all 14 hybrids is lower compared with inbred lines. Deviations of metabolite levels from the average levels of the hybrids correlate negatively with biomass, which could be applied for developing predictors of hybrid performance based on characteristics of metabolite patterns.
Hybrid vigor (heterosis) describes the phenomenon of improved performance of a hybrid progeny compared to both homozygous parents (Shull, 1908). It is a central concept in plant breeding: more than 95% of US corn production is due to hybrid varieties, some of which exhibit a more than 100% increase in performance over the better-performing parent. Despite the agronomical importance of heterosis, its underlying molecular mechanisms remain elusive. Historically, three quantitative genetic theories have been employed to explain heterosis: the dominance theory (Davenport, 1908; Bruce, 1910), assuming that deleterious alleles present in either parent are complemented in the hybrid; the overdominance theory (Hull, 1945; Crow, 1948), postulating that the hybrid trait values exceeding the parental range are due to the heterozygous state at a single locus; and the epistasis theory (Powers, 1944; Williams, 1959), attributing heterosis to the increased number of possible interactions between non-allelic genes at two or more loci in hybrids. However, there is increasing evidence that these models are not sufficient to explain the complexity associated with heterosis (cf. Birchler et al. (2010)). Adding to this complexity, another recent study provides evidence that at least in some cases heterosis can be narrowed down to a single locus (Krieger et al., 2010).
Here, we are interested in heterosis of crop plants, specifically maize, with respect to biomass and yield. Biomass of a plant system reflects the integration of biosynthetic and catabolic activities over time. Therefore, it is tightly linked to the metabolic state of the plant system manifested in the metabolite levels. Based on these considerations, we set out to characterize the metabolic state of maize hybrids and their parental lines by using an established mass spectrometry-based metabolic profiling technology.
Specifically, we asked the following questions:
(i) Is the metabolic profile of hybrids different from that of the corresponding inbred lines?
(ii) Are the metabolite inheritance patterns (i.e. overdominant, dominant, additive) different between hybrids displaying different levels of heterosis?
(iii) Do the metabolite levels display a different distribution in hybrids showing biomass heterosis compared with that in inbred lines?
(iv) Do metabolic networks, reconstructed from correlations between metabolic profiles, of hybrids and inbred lines differ in their properties?
(v) What is the extent to which metabolite levels correlate with the degree of biomass heterosis?
Our findings demonstrate that the metabolic profile of each hybrid line is distinct from those of the corresponding inbred lines and also allows reciprocal hybrids to be separated. Moreover, with respect to metabolite level inheritance, all three patterns – additive, dominant, and overdominant – are observed without significant overrepresentation. Metabolic networks, reconstructed from robust correlations between metabolic profiles, mostly display a higher network density in hybrids than in inbred lines. Finally, when comparing all hybrids against all inbred lines, metabolite levels generally display a lower variance amongst hybrids. The deviation of the metabolite level in individual lines from the average metabolite level calculated over all hybrids is significantly correlated with biomass (and thus with the degree of heterosis).
Description of the experimental setup
To characterize biomass and heterosis in crop plants, we decided to use corn as probably the most suitable plant system for investigating heterosis. In contrast to many of the previous studies, limited to only two inbred lines and the resulting hybrid, we wanted to cover lines displaying large genetic diversity as well as different levels of heterosis. To this end, we used four European parents, two each from the flint (UH002 and UH005) and the dent (UH250 and UH301) pool, and all their reciprocal hybrids. Furthermore included in this study were the two American lines (B73 and Mo17), most commonly used in corn heterosis research, together with their respective crosses. The full experimental design is displayed in Figure S1a in the Supporting Information.
As metabolism is sensitive to changing environmental conditions (Fernie et al., 2004), our experiment was designed to keep environmental influences to a minimum by identifying the heterotic trait as early as possible. Therefore, we performed our study on the germinating root system in maize due to its early accessibility in a relatively stable environment. Germinating maize roots have previously been shown to display heterosis at a very early developmental stage, 3–7 days after germination, by increased root length and fresh weight (Hoecker et al., 2006, 2008). As shown in Figure S1b, in agreement with published data, the root fresh weight of all but three hybrids showed significant best-parent heterosis.
Six samples, each containing 10 pooled roots, were analyzed by gas-chromatography time-of-flight mass-spectrometry (GC-TOF-MS) to obtain the metabolic profiles. This allowed reproducible identification of 112 metabolites whose relative concentration levels are given in Table S1.
Metabolic profiling allows easy separation of hybrids from their corresponding inbred lines
Subjecting the metabolite data to a principal components analysis (PCA) separates all inbreds from their respective hybrids (PC 1, Figure 1), demonstrating that there exist major differences between hybrids and corresponding parental lines. While genotypic mean values were plotted in Figure 1 for reasons of clarity, grey lines indicating the respective standard deviation of individual data points show the high general reproducibility of metabolic profiles. A remarkable separation can be observed between the American and European cultivars along PC 2; moreover, it is worth noting that fresh weight, indicated by different sizes of the data points, is not a primary source of variation in the first two components. A more surprising observation concerns the behavior of the reciprocal hybrids: Although one may expect these cultivars to cluster together, only two of the seven combinations (UH005 × UH002 and UH005 × UH250) exhibit a high proximity, while in some of the remaining cases a dominant maternal (e.g. UH301 × UH002, UH301 × UH005, and UH301 × UH250) and, in the case of the American lines, a weaker paternal dominant effect can be observed (also shown in Figure S2).
Most metabolites display both positive and negative heterosis patterns depending on the actual combination of parental lines
The deviation of a hybrid from the parental mean for any measurable trait can be quantified as relative mid-parent heterosis (rMPH) and calculated using the formula rMPH = 100d/a, where d is the difference between the hybrid and parental mean and a is the parental mean itself. Figure 2 shows the average rMPH for each of the 112 metabolites over all 14 hybrids based on log-transformed normalized data. Despite the fact that the degree of heterosis for any metabolite differs between individual hybrids, we observe some conserved patterns such as predominantly positive heterosis for most amines and amino acids (blue boxes) and negative heterosis for most sugar compounds (red boxes). A t-test comparing rMPH values of these two compound classes is significant (P-value < 10⁻⁵, Figure 2). We also observed exceptionally large variation for two metabolites, GDIBOA and DIBOA, showing positive and negative heterosis depending on the cross. Calculating correlations of metabolite rMPH values and rMPH for fresh weight did not yield any significant results when stringent correction for multiple testing was applied.
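The rMPH formula can be written out as a short Python sketch. The function name and the numerical values are hypothetical and serve only to illustrate the arithmetic; they are not taken from the data set.

```python
def relative_mid_parent_heterosis(hybrid, parent_a, parent_b):
    """Return rMPH = 100 * d / a (in percent) for a single trait."""
    a = (parent_a + parent_b) / 2.0  # parental mean
    if a == 0:
        raise ValueError("parental mean is zero; rMPH is undefined")
    d = hybrid - a                   # deviation of the hybrid from the parental mean
    return 100.0 * d / a


# Hypothetical trait values, chosen only for illustration
print(relative_mid_parent_heterosis(hybrid=12.0, parent_a=8.0, parent_b=12.0))  # 20.0
```

A hybrid value equal to the parental mean gives an rMPH of zero, and values below the parental mean give a negative rMPH.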
While these data clearly demonstrate that heterosis is displayed on the level of individual metabolites and, furthermore, that some heterotic effects seem conserved throughout all crosses, it is important to note that for most metabolites the general strength and direction of heterosis depends on the individual hybrid.
Analysis of metabolite heterosis in crossings reveals the existence of all possible inheritance patterns
Often heterosis is classified by different proposed modes of action/inheritance on the genetic level, such as dominance, overdominance, or epistasis. Figure 2 demonstrates that both positive and negative heterosis is observed for individual metabolites. Next we determined the mode of inheritance for all hybrids and metabolites. On a molecular level, the mode of inheritance can be classified as additive, when the level approximates the average of the parental levels, and non-additive otherwise. The non-additive class encompasses four patterns: higher or lower than the higher/lower parent (termed positive/negative best-parent heterosis or overdominance) and higher or lower than the average of both parents but not exceeding the parental range (termed positive/negative dominance). The results of such a classification of molecular traits may favor one of the aforementioned heterosis theories.
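The five classes described above can be sketched as a simple decision rule. This is a minimal illustration and not the procedure used in this study: the text does not state how "approximates the average" was decided, so the tolerance `tol` (a fraction of the parental range) and the function name are assumptions.

```python
def classify_inheritance(hybrid, parent_a, parent_b, tol=0.05):
    """Assign one of five inheritance classes to a hybrid metabolite level.

    tol is a hypothetical tolerance: the hybrid counts as additive if it lies
    within tol * (parental range) of the mid-parent value.
    """
    low, high = sorted((parent_a, parent_b))
    mid = (low + high) / 2.0
    if hybrid > high:
        return "positive overdominance"  # above the higher parent
    if hybrid < low:
        return "negative overdominance"  # below the lower parent
    if abs(hybrid - mid) <= tol * (high - low):
        return "additive"                # close to the parental mean
    # Inside the parental range but away from the parental mean
    return "positive dominance" if hybrid > mid else "negative dominance"


# Hypothetical levels for one metabolite with parents at 8 and 12
for level in (13.0, 11.5, 10.1, 8.5, 7.0):
    print(level, classify_inheritance(level, 8.0, 12.0))
```

In practice, a statistical test on the replicate values would probably replace the fixed tolerance used here.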
Figure 3 shows the classification of our metabolome data for one of the crossings, i.e. the two reciprocal crosses between B73 and Mo17. Two remarkable features become immediately apparent: first, all inheritance patterns are observed and second, their distributions differ to a large extent between the reciprocal crosses. This may be due to the dominant influence of one parent, as described above. Analysis of the remaining 12 hybrids (results shown in Figure S3) essentially confirms this observation. Furthermore, no discernible patterns were recognizable when comparing inter- versus intra-group hybrids.
To see whether the observed inheritance pattern is linked to the level of heterosis displayed by the hybrid, we compared the metabolite inheritance patterns of UH250 × UH005 and UH005 × UH002, which display the highest and lowest biomass mid-parent heterosis, respectively (157 and 5%) (Table S2). No large difference can be detected between these two hybrids with respect to the number of metabolites falling in the different categories, and the overlap of metabolites falling in identical categories is only modestly increased over the values expected by chance (Table S2).
Taking all hybrids into account, our analysis shows that two-thirds of all metabolite levels follow an overdominance inheritance pattern (990 of 1568 genotypic mean values), the majority displaying negative overdominance. For individual metabolic traits the mode of inheritance is highly specific for each individual hybrid and cannot be directly linked to fresh weight or fresh weight heterosis. Therefore, our data are not in favor of a single genetic theory, but suggest that dominance, overdominance, and epistasis are not mutually exclusive.
Metabolic networks of corn hybrids show a higher average degree compared with their parents
Metabolites do not act in isolation, but rather are connected by multiple transformations carried out via a metabolic network. We thus tested whether or not the analyzed genotypes differ in properties of metabolic correlation networks generated from experimental data. Metabolic networks were reconstructed based on pairwise Pearson correlation between experimentally determined metabolite levels for each of the 20 genotypes (cf. Experimental procedures). The resulting networks were subsequently analyzed for seven network properties, including: average degree, average path length, number of clusters, diameter, authority, betweenness and closeness centrality indices, as summarized in Table S3. To ensure that the property values calculated for the experimentally derived networks were significantly different from random data, we employed permutation tests. Figure S4 shows that all network properties are significantly different from those of networks reconstructed from adequately randomized data. To establish the networks, we fixed α at 0.01 resulting in a critical value (τc) of 0.89 for the Pearson correlation coefficient, τ. That is, all metabolite correlations exceeding τc are kept in the final network. The critical value of τ was also used as a cutoff for the respective random datasets. To evaluate the influence of the chosen α-level on the results, we investigated α-levels ranging from 0.001 up to 0.1 (Figure S5).
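The network reconstruction described above can be sketched in plain Python. Only the cutoff of 0.89 is taken from the text. The function names and the toy profiles are hypothetical, and thresholding the absolute value of the correlation is an assumption, since the text does not say whether negative correlations were retained.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def correlation_network(profiles, threshold=0.89):
    """Return the set of metabolite pairs whose |r| exceeds the threshold.

    profiles maps a metabolite name to its levels across the replicates of
    one genotype. Using |r| rather than r is an assumption of this sketch.
    """
    edges = set()
    for m1, m2 in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[m1], profiles[m2])) > threshold:
            edges.add((m1, m2))
    return edges


def average_degree(n_nodes, edges):
    """Average number of edges per node (2E / N) of an undirected network."""
    return 2.0 * len(edges) / n_nodes if n_nodes else 0.0


# Hypothetical profiles of three metabolites across six replicates
profiles = {
    "m1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "m2": [2.0, 4.0, 6.0, 8.0, 10.0, 12.5],
    "m3": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
edges = correlation_network(profiles)
print(sorted(edges), average_degree(len(profiles), edges))
```

Running the same functions on profiles with shuffled replicate order would give a random reference, in the spirit of the permutation tests mentioned above.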
More detailed analysis of the network average degree, which deviates most significantly from a random distribution, reveals that most European hybrids display a best-parent or mid-parent heterosis with respect to this property illustrated for the cross UH005 and UH250 (Figure 4a). A different picture is displayed by the American lines. In agreement with the PCA result, a strong paternal influence is revealed on the network level as shown in Figure 4b, indicating a different behaviour of the American cross from the European lines. However, none of the tested network properties were significantly correlated to biomass or biomass heterosis.
Distribution of overdominance, dominance and additive behavior of metabolites results in lower metabolite variability in hybrids as compared to parental lines
As described above, all three modes of inheritance (dominance, overdominance, and additivity) are observed for individual metabolites. More importantly, the mode of inheritance for a specific metabolite is in general not fixed but varies as a function of the parental lines. This is exemplified for metabolite lysine, where positive overdominance (UH301 × UH250 and UH250 × UH301), negative overdominance (B73 × Mo17), and various intermediate levels can be found (Figure S6a).
Puzzled by this observation, we wanted to see if this seemingly random assignment of inheritance is associated with some other features distinguishing hybrids from parental lines. To this end, we decided to determine if metabolites display a similar variance in both parental lines and hybrids. In order to test this hypothesis, we determined the coefficient of variation (CV) for each metabolite in two groups, the first including all 14 hybrids and the second all six inbred lines. Subsequently, for each metabolite we calculated its log2-transformed CV-ratio, resulting in a negative (positive) value if the metabolite displays a lower (higher) variance in hybrids as compared to inbred lines (Figure 5a). To our surprise, we observed a clear shift in the distribution towards negative values (distribution mean of −0.27), demonstrating that the variance for metabolite means within parental lines is on average 21% larger compared with hybrids.
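A minimal sketch of the log2-transformed CV-ratio and of the conversion from a log2 shift to a percentage is given below. The function names and the toy values are hypothetical, and the use of the sample standard deviation is an assumption, as the text does not specify the estimator.

```python
from math import log2
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation, using the sample standard deviation (assumed)."""
    return stdev(values) / mean(values)


def log2_cv_ratio(hybrid_means, inbred_means):
    """Negative if the metabolite varies less among hybrids than among inbreds."""
    return log2(cv(hybrid_means) / cv(inbred_means))


def percent_higher_in_inbreds(log2_ratio):
    """Translate a log2 CV-ratio into how much larger the inbred CV is, in percent."""
    return (2.0 ** (-log2_ratio) - 1.0) * 100.0


# Hypothetical genotypic means of one metabolite
print(log2_cv_ratio([10.0, 11.0, 9.0, 10.0], [10.0, 14.0, 6.0, 10.0]))

# The shift reported in the text corresponds to roughly 21%
print(round(percent_higher_in_inbreds(-0.27)))  # 21
```

The conversion follows from 2^0.27 ≈ 1.21, which is how a mean log2 CV-ratio of −0.27 translates into the 21% stated above.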
If hybrids showed solely intermediate metabolite levels, all log2-transformed CV-ratios would be negative. To verify the significance of this shift, we therefore calculated the distribution means from permutation tests (μperm) and compared them with the mean of the observed distribution (μobs = −0.27). The distance between the mean of the random distribution means and the observed value amounts to 13 standard deviations (Figure 5b), which for normal distributions corresponds to a P-value of < 10⁻³⁸.
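The conversion of a distance of 13 standard deviations into a P-value can be checked with the complementary error function. The sketch assumes a one-sided test against a standard normal distribution, and the function name is hypothetical.

```python
from math import erfc, sqrt


def upper_tail_p(z):
    """One-sided tail probability P(Z > z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2.0))


p = upper_tail_p(13.0)
print(p < 1e-38)  # True
```

A two-sided test would double this probability, so the reported bound appears to correspond to the one-sided case.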
Each hybrid metabolite value can be split into two components, an additive effect (a), which is equivalent to the parental mean, and the remaining dominance deviation (d). Here, in each of the 10 000 permutations, we kept the six parental values and the 14 hybrid additive effects (a1–a7; similar for reciprocal hybrids) for each metabolite constant and assigned the 14 observed dominance deviations (d1–d14; different for individual hybrids) at random to the hybrids. Then, we computed the resulting CV-ratio distribution. The rationale of this permutation scheme is based on the assumption that under the null hypothesis dominance values vary randomly, while under the alternative they take on values leading to more similar levels in the hybrid. As a result, on the level of individual metabolites, 33 display a significantly lower CV-ratio than expected while only for one metabolite was a significantly higher CV-ratio found.
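The permutation scheme for a single metabolite can be sketched as follows. The data are hypothetical (seven hybrids instead of 14, for brevity), the function names are illustrative, and the sample standard deviation and the one-sided empirical P-value are assumptions of this sketch.

```python
import random
from math import log2
from statistics import mean, stdev


def log2_cv_ratio(hybrids, inbreds):
    """log2 of CV(hybrids) / CV(inbreds), with CV = sample SD / mean."""
    return log2((stdev(hybrids) / mean(hybrids)) / (stdev(inbreds) / mean(inbreds)))


def permutation_null(additive, dominance, inbreds, n_perm=10_000, seed=0):
    """Null distribution of the log2 CV-ratio for one metabolite.

    Parental values and additive effects stay fixed; the observed dominance
    deviations are reassigned to the hybrids at random.
    """
    rng = random.Random(seed)
    shuffled = list(dominance)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        permuted = [a + d for a, d in zip(additive, shuffled)]
        null.append(log2_cv_ratio(permuted, inbreds))
    return null


# Hypothetical data for one metabolite
inbreds = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
additive = [5.0, 6.0, 7.0, 9.0, 11.0, 13.0, 14.0]            # parental means
hybrids = [9.5, 9.8, 10.1, 10.0, 10.2, 10.4, 10.5]           # observed hybrid levels
dominance = [h - a for h, a in zip(hybrids, additive)]       # dominance deviations

observed = log2_cv_ratio(hybrids, inbreds)
null = permutation_null(additive, dominance, inbreds, n_perm=200)

# One-sided empirical P-value: how often is a permuted ratio as low as the observed one?
p_value = (sum(v <= observed for v in null) + 1) / (len(null) + 1)
print(observed, mean(null), p_value)
```

In this toy example the dominance deviations exactly compensate for the differences in the additive effects, so any reshuffling can only increase the variance among hybrids.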
Taken together the approach described above, permuting the inheritance patterns of the metabolites for the different hybrids, shows that the observed inheritance patterns are non-random and a necessary prerequisite resulting in the lower variance of metabolite levels observed for the hybrids.
Deviation from average levels is correlated with biomass
Moving on from this surprising observation of lower variance observed for the hybrids, we wanted to test if metabolite average levels can be linked to biomass, and thus heterosis. To this end, we analyzed whether or not the sum of the differences between all metabolite levels measured in an individual line and the average metabolite levels (AML) observed for different subgroups of hybrids would show some correlation with biomass. More specifically, we determined AML for the best performing lines (AMLbest; achieving a mean root fresh weight of >100 mg) and the worst performing lines (AMLworst; with mean root fresh weights <60 mg), respectively. A significant correlation is only observed when the best performing lines are taken into account (P = 2 × 10⁻⁵, R² = 0.64, Figure 6a), whereas no such correlation is observed when the worst performing lines are taken into account (P = 0.76, R² = 0.006, Figure 6b). In a more systematic approach we then calculated AML based on different combinations of lines, including up to all 20 lines depending on their biomass performance. Figure 6c shows that highest correlation is indeed found for AMLbest. Including more genotypes in the calculation of AML leads to lower R² values and further stripping of the best performing lines from the AML accelerates the decrease of R².
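The AML-based deviation score can be sketched as follows. Only the fresh weight cutoff of 100 mg comes from the text. The profiles, line names and function names are hypothetical, and summing absolute differences is an assumption, since the text speaks only of "the sum of the differences".

```python
from math import sqrt
from statistics import mean


def average_metabolite_levels(profiles, reference_lines):
    """Mean level of each metabolite across the chosen reference lines."""
    n_metabolites = len(profiles[reference_lines[0]])
    return [mean(profiles[line][j] for line in reference_lines)
            for j in range(n_metabolites)]


def deviation_from_aml(profile, aml):
    """Summed absolute difference between a profile and the AML (assumed metric)."""
    return sum(abs(x - m) for x, m in zip(profile, aml))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical metabolite profiles (two metabolites) and root fresh weights in mg
profiles = {"A": [10.0, 10.0], "B": [10.0, 12.0], "C": [12.0, 14.0], "D": [15.0, 16.0]}
fresh_weight_mg = {"A": 120.0, "B": 110.0, "C": 80.0, "D": 50.0}

best_lines = sorted(line for line, w in fresh_weight_mg.items() if w > 100.0)
aml_best = average_metabolite_levels(profiles, best_lines)

lines = sorted(profiles)
deviations = [deviation_from_aml(profiles[line], aml_best) for line in lines]
weights = [fresh_weight_mg[line] for line in lines]
r = pearson(deviations, weights)
print(aml_best, deviations, r)
```

A negative r in such a calculation means that lines whose profiles lie further from AMLbest tend to have a lower fresh weight.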
Analysis of the secondary metabolites and the lipidome confirms the lower variance observed for the hybrids to different extents
The analysis described up to now has been restricted largely to metabolites of primary metabolism. The pools of primary metabolites constitute the source for the formation of secondary metabolites and lipids. While it was shown previously that primary metabolite profiles are tightly linked to biomass (Meyer et al., 2007a), we wanted to test to what extent our observation of reduced metabolite variance in hybrids is also apparent in secondary metabolites and the lipidome. To this end, we measured 5754 and 1992 different features from the polar and non-polar phase of extracts of the same sample set using a UPLC-Orbitrap platform (see Experimental procedures).
In agreement with the GC-MS data, a comparative analysis of the variance displayed by the metabolic features measured in the polar phase, containing primary and secondary metabolites, in hybrids as compared to parental lines confirms our previous results. Metabolite variance in hybrids is significantly lower compared with that in parental lines (Figure S7a). The average shift is −0.61, equivalent to CV-values being 53% higher in parental lines. The correlation coefficient between deviations from the calculated average metabolite levels (AMLbest) and fresh weight is even higher than for the smaller data set of primary metabolites (r = 0.88, P = 4 × 10⁻⁷, Figure S7b). Performing the same analysis for the lipids reveals values comparable to the primary metabolite analysis [mean CV-ratio of −0.28, equivalent to CV-values being 21% higher in parents (Figure S7c) and correlation of fresh weight with deviations from calculated lipid average levels (AMLbest) of r = 0.82 and P = 1 × 10⁻⁵ (Figure S7d)].
Here, we describe the analysis of the metabolic patterns of germinating roots of corn hybrids and their corresponding parental lines. Based on the metabolic profiles containing 112 distinct metabolites, we find a clear separation of hybrids from corresponding parental lines and European lines from the American cross and reciprocal crosses.
It is important to note that the observed separation is not caused by potential differences in developmental stage, which would be reflected in differences in fresh weight. Fresh weight does not contribute more variance and thus cannot be employed to separate the considered genotypes with respect to PC1 and PC2. Regarding the observed maternal influence, one might at first glance suggest this to reflect the gene dosage effect of the endosperm, which represents the resource for the developing seedling. However, given that the American lines displayed a weaker paternal effect, this interpretation becomes less convincing. It should be noted though that this unidirectional influence of either parent has already been described for heterosis manifested during embryo development (Meyer and Scholten, 2007) and on the transcriptome level of field grown seedlings (Swanson-Wagner et al., 2009).
Analysis of the pattern of the different metabolites in hybrids has clearly proven that both positive and negative mid and best-parent heterosis can be observed. The predominantly negative heterotic effect observed for sugars and members of the central energy metabolism in this study is in accordance with the findings by Meyer et al. (2007a) concerning the growth of A. thaliana plants; they observed that most of the central metabolites display a negative correlation with biomass, supporting the idea of draining the pools for these metabolites to promote growth.
The comprehensive analysis of all metabolites in all combinations has demonstrated that all modes of action of heterosis can be found within individual lines and for individual metabolites; moreover, no unifying pattern can be found to be present in all hybrids. Our results confirm and extend similar findings for transcripts, where, depending on the hybrid and the analyzed tissue, different representations of all patterns were observed as compared to their corresponding parents in both plant (Springer and Stupar, 2007; Stupar et al., 2008; Wei et al., 2009) and animal systems (Cui et al., 2006). The inability to identify a predominant pattern by analyzing the genetic mode of action of individual molecular traits in a parent–hybrid comparison led us to analyze other features associated with metabolite levels in the hybrids as compared to parental lines. To our surprise, we found the variance observed for metabolites in the group of hybrids to be smaller compared with the variance observed for the group of parental lines. Permuting the inheritance patterns leads to an increase in variability, thus strongly linking the two observations (i.e. lower variance in metabolite levels in the hybrids and complex inheritance patterns).
In this context, it is worth mentioning that the lower variance observed for the hybrids is not limited to primary metabolites, but is also observed to the same extent for secondary metabolites. The fact that the lipids still display the same trend, though to a lesser extent than secondary metabolites, is interesting given that most of the lipids measured in the lipidome platform are derived from membrane structures. This might suggest that cellular structures displaying a lower turnover rate as compared with metabolites of primary and secondary metabolism are more conserved independently of the actual growth rate.
It should be mentioned that the concept of lower variance in hybrids as compared to parental lines has been followed independently by other groups. Thus Springer and Stupar (2007) suggested a comparable model concerning gene expression levels, but without providing experimental support.
Work described by Phelan and Austad (1994) and Knight (1973) compared individual hybrids with individual parents and to some extent found the postulated lower variance in the hybrids. However, it is important to emphasize that the latter two works differ in one major point from our study. Whereas they compare individual hybrids with individual parental lines we do not follow this route at all. In contrast, we compare the group of all hybrids with the group of all parental lines to determine the CV of the metabolite levels. Only this approach allows the detection of the smaller variance in hybrids compared to the parental lines. Finally, we would like to point out that the finding of lower variance in the hybrids would be trivial if a predominance of additive behavior (mid-parent ranges for metabolite levels in hybrids) were true for the metabolites. Analysis of our metabolite data demonstrates, however, that two-thirds of all genotypic mean values show overdominance. Overdominance in contrast should lead to an increase in variation in the hybrids which would be just the opposite of what has been observed. As shown by permuting the observed dominance deviations, depending on the parental levels all modes of action (positive/negative overdominance, partial dominance, and additivity) will be observed and are often necessary for any particular metabolite to reach a certain level. A possible hypothesis of how the lower variation in hybrid metabolite levels could be linked to their biomass heterosis would be the concept of optimal metabolite levels. These optimal levels could result in an optimal flux through the metabolic system and thereby promote faster growth under the given experimental conditions. However, further investigations are necessary to identify the mechanisms that potentially allow hybrids to better adjust their metabolic levels compared to homozygous lines.
In conclusion, by characterizing the metabolic state of corn hybrids and comparing it with that of the corresponding parents, we identified several novel features typical for the hybrid lines. Our findings demonstrate that hybrids display less variance with respect to metabolite levels as compared to the parental lines. In order to attain these levels, specific complex inheritance patterns, both metabolite- and cross-dependent, are required. The observation that deviations of metabolite levels from the average metabolic levels correlate negatively with biomass might be helpful for the development of predictors of hybrid performance based on characteristics of metabolite patterns.
Plant material and growth conditions
The maize inbred lines UH002, UH005, UH250, UH301, B73 (Iowa Stiff Stalk Synthetic), and Mo17 (Lancaster) as well as the 14 hybrid combinations were generated in the nursery of the University of Hohenheim in the summer season of 2003 as well as in the nursery of the University of Tübingen in Puerto Rico in the winter season of 2005–06. Seeds were surface sterilized and germinated according to the method of Hoecker et al. (2006). For each replicate of a genotype 10 kernels were wrapped in a half-germination paper. The germination of the kernels was carried out in three periods with three, three, and four replicates for each genotype within 2 weeks. For further analyses, the 3.5-day-old roots were excised with a razor blade, the roots growing on the same germination paper were pooled, weighed, immediately snap frozen in liquid nitrogen, and stored at −70°C. The frozen samples were randomly homogenized in 2.0-ml round-bottomed micro-vials (Eppendorf, http://www.eppendorf.com/) with pre-washed 0.25-inch steel ball bearings in a ball mill (Retsch, http://www.retsch.com/).
Metabolomics analyses and data normalization
A targeted analysis evaluating the levels of 112 distinct metabolites was conducted for six biological replicates (pooled from 10 root samples) of each individual genotype following the procedure outlined in Lisec et al. (2006). Sixty-nine metabolites were identified by comparison with a reference database. For 19 of the remaining 43 unidentified metabolites we could assign a putative chemical class (aa, amino acid; acid, organic acid; cho, sugar; chop, sugar phosphate) according to selective masses from the spectra. All samples were measured in completely randomized order in three consecutive batches (measurement days).
Metabolite intensities were log-transformed to better resemble a normal distribution. A two-way analysis of variance (ANOVA) was applied using genotype and sample batch (measurement day) as factors. Systematic differences due to the latter factor (measurement day) were removed. Values with Studentized residuals larger than four were eliminated. In a further normalization step, we corrected for differences in metabolite levels due to variation in initial sample amount. Here, we calculated a correction factor for each sample as the ratio of its median peak height (i.e. metabolite level) and the median peak height for all replicates of the same genotype. By dividing each sample by its correction factor, we scaled biological replicates to a similar median peak size, in order to prevent spurious correlations in the reconstruction of networks.
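The median-based correction for sample amount can be sketched as follows. The function name and the toy values are hypothetical. Taking the genotype median over the pooled values of all replicates is an assumption, as the text could also be read as the median of the replicate medians.

```python
from statistics import median


def median_scale(replicates):
    """Scale the replicates of one genotype to a common median peak height.

    replicates is a list of lists, one list of peak heights per replicate.
    The genotype median is taken over all pooled values (assumed reading).
    """
    genotype_median = median(v for rep in replicates for v in rep)
    scaled = []
    for rep in replicates:
        factor = median(rep) / genotype_median  # correction factor of this sample
        scaled.append([v / factor for v in rep])
    return scaled


# Hypothetical case: the second replicate contains twice as much material
scaled = median_scale([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
print([median(rep) for rep in scaled])
```

After scaling, both replicates share the same median, so that differences in sample amount no longer make all metabolites of a sample rise or fall together.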
Analysis of secondary metabolites and lipids using a UPLC-Orbitrap platform
In an untargeted analysis we measured 5754 and 1992 different features from the polar and non-polar phase of extracts from six biological replicates (pooled from 10 root samples) of each individual genotype.
The non-polar fraction of the extract was measured following the procedure outlined in Giavalisco et al. (2009), while the polar phase was measured on a UPLC system using a C8 reversed phase column (100 mm × 2.1 mm × 1.7 μm particles, Waters). The mobile phases used for the separation were water (UPLC MS grade, Biosolve, http://www.biosolve-chemicals.com/) with 1% 1 M NH4Ac (Ac, acetate), 0.1% acetic acid (Buffer A) and acetonitrile/isopropanol (7:3, UPLC grade, Biosolve) containing 1% 1 M NH4Ac, 0.1% acetic acid (Buffer B). A 2 μl sample (the dried down organic fraction was re-suspended in 500 μl of UPLC grade acetonitrile:isopropanol 7:3) was loaded per injection and the gradient, which was run at a flow rate of 400 μl min⁻¹, was: 1 min 45% A, 3-min linear gradient from 45% A to 35% A, 8-min linear gradient from 25% A to 11% A, 3-min linear gradient from 11% A to 1% A. After washing the column for 3 min with 1% A, the buffer was set back to 45% A and the column was re-equilibrated for 4 min, resulting in a total run time of 22 min.
The mass spectra were acquired using an Exactive mass spectrometer (Thermo-Fisher, http://www.thermofisher.com/). The spectra were recorded using alternating full scan and all ion MS/MS scan mode, covering a mass range from 100 to 1500 m/z. The resolution was set to 10 000 with 10 scans per second, restricting the loading time to 100 ms. Capillary voltage was set to 3 kV with a sheath gas flow value of 60 and an auxiliary gas flow of 35. The capillary temperature was set to 150°C, while the drying gas in the heated electrospray source was set to 350°C. The skimmer voltage was set to 25 V while the tube lens was set to a value of 130 V. The spectra were recorded from 1 min to 17 min of the UPLC gradients.
Each metabolic feature represents a measured ion mass matching the following quality criteria: (i) it could be measured in all replicates of at least one genotype with an average intensity of 10 000; (ii) it was possible to assign at least one chemical sum formula after comparison with biological data bases KEGG (http://www.genome.jp/kegg/compound/) and KNApSAcK (http://www.kanaya.naist.jp/KNApSAcK/) using in-house software; and (iii) it was non-redundant (that is, no other peak within a 5-sec time window was highly (r > 0.95) correlated). The ion intensities were log-transformed.
Principal components analysis and hierarchical clustering analysis (HCA)
Normalized metabolite data were Pareto-scaled and then subjected to a PCA using the Bayesian algorithm (bpca) from the pcaMethods package (Stacklies et al., 2007), available for the statistical software framework R (http://www.r-project.org/). To improve the readability of the figure, only mean values for individual genotypes are depicted, with the size of each symbol being proportional to the average sample weight of the genotype. Variation in the data is indicated by grey lines showing the corresponding standard deviations for the PCs of each data point. The Euclidean distances between genotypic means of metabolic profiles were calculated using the ‘dist’ function in R.
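The two preprocessing steps that are fully specified here, Pareto scaling and Euclidean distances between genotypic means, can be sketched in plain Python. This is only an illustration and not the R code used for the analysis; the Bayesian PCA itself is not reproduced. Pareto scaling is taken in its usual form (mean-centering each variable and dividing by the square root of its standard deviation), and the use of the sample standard deviation is an assumption. The function names are hypothetical.

```python
import math
import statistics

def pareto_scale(matrix):
    """Pareto-scale each column: (x - mean) / sqrt(sd).
    Rows are samples, columns are metabolites; columns must not be constant."""
    scaled_cols = []
    for col in zip(*matrix):
        m = statistics.mean(col)
        s = statistics.stdev(col)  # sample standard deviation (assumption)
        scaled_cols.append([(v - m) / math.sqrt(s) for v in col])
    return [list(row) for row in zip(*scaled_cols)]

def euclidean_distances(profiles):
    """Pairwise Euclidean distances between named mean profiles.
    profiles: dict mapping genotype name -> list of metabolite levels."""
    names = sorted(profiles)
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Compared with unit-variance scaling, Pareto scaling shrinks large-variance metabolites less strongly, so abundant features keep somewhat more weight in the PCA.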
Mode of action of the heterosis
To visualize the mode of action of the heterosis for metabolic levels in a given hybrid we adopted a plot design from Swanson-Wagner et al. (2006). For each individual metabolite, its mode of action describes the ratio of the metabolite level in a hybrid compared with both parents, thus reflecting either an additive or non-additive behavior. The mode of action of the heterosis is indicated by the angle in polar coordinates, while the radius represents fold changes of the two extreme phenotypes. Metabolites falling directly on a dashed line connecting 1 and 7 o’clock, 3 and 9 o’clock, or 5 and 11 o’clock exhibited equal differences in metabolite levels between the low- and the middle-expressing genotype and between the middle- and the high-expressing genotype. Metabolites falling on the horizontal and vertical lines exhibited pure additivity and over- (or under-)dominance, respectively (see plot side labels). The 12 outer panels indicate, in a symbolic fashion, the metabolite levels for data points falling exactly on the radial lines towards 1–12 o’clock. For example, at the 2 o’clock line, the metabolite level of the hybrid is equal to that of the paternal line and both are higher than that of the maternal line. The radius at which a data point is plotted represents the log2 of the fold change between the high- and low-level genotypes, and grey circles indicate $2^n$-fold changes, where n is a natural number.
To remove metabolites with small or insignificant changes we display only metabolites that showed significant differences among genotypes (P ≤ 0.05, ANOVA F-test). Further, we reversed the log transformation to increase the spread in radius and therefore allow a better differentiation between metabolites with higher or lower dynamic ranges.
Additionally, we classified metabolic effects according to their |d/a| ratio as additive (A; |d/a| < 0.2), partially dominant (PD; 0.2 ≤ |d/a| < 0.8), dominant (D; 0.8 ≤ |d/a| < 1.2) or overdominant (OD; |d/a| ≥ 1.2) as described by Stuber et al. (1987). Here, a is equivalent to half of the parental line difference and d is equivalent to the difference between hybrid value and parental mean. Negative deviations of hybrids from the parental mean are preceded by a minus sign (Table S2).
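The classification thresholds above translate directly into a small function. The following Python sketch is illustrative only: the function name is hypothetical, a is taken as the absolute half-difference of the parents, and the measurement scale of the inputs (log-transformed or raw) is left to the caller.

```python
def classify_mode(parent1, parent2, hybrid):
    """Classify a metabolite effect by its |d/a| ratio.
    a: half of the (absolute) parental difference,
    d: hybrid value minus parental mean.
    Returns (label, d/a); a leading '-' marks hybrids below the parental mean."""
    a = abs(parent1 - parent2) / 2.0
    d = hybrid - (parent1 + parent2) / 2.0
    if a == 0:
        raise ValueError("identical parental levels: d/a is undefined")
    ratio = abs(d / a)
    if ratio < 0.2:
        label = "A"    # additive
    elif ratio < 0.8:
        label = "PD"   # partially dominant
    elif ratio < 1.2:
        label = "D"    # dominant
    else:
        label = "OD"   # overdominant
    sign = "-" if d < 0 else ""
    return sign + label, d / a
```

For parental levels of 10 and 20, a hybrid level of 15 is classified as additive, 20 as dominant, 27 as overdominant, and 12 as negative partial dominance.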
Metabolite stability and average metabolite levels
Given 112 metabolites, for each metabolite s we first calculated its coefficient of variation (CV = standard deviation/mean) over all 14 genotypic mean values of hybrid lines, $CV_s^{H}$. Similarly, we calculated the coefficient of variation of each metabolite over all six genotypic mean values of parental lines, $CV_s^{P}$. For each metabolite s we then calculated the log2-transformed ratio of the corresponding coefficients of variation for hybrids and parents, i.e. $C_s = \log_2(CV_s^{H}/CV_s^{P})$. Let C denote the observed distribution of $C_s$ values over metabolites. Our data set shows a shift in the mean ($\bar{C}$) of C towards negative values ($\bar{C} = -0.27$, Figure 5a).
The null hypothesis that we test is $H_0: \bar{C} = E(\bar{C}^{rand})$, where $E(\bar{C}^{rand})$ is the expected value of $\bar{C}^{rand}$, the mean of C over metabolites from our randomization scheme. We estimate $E(\bar{C}^{rand})$ from the mean of the distribution of $\bar{C}^{rand}$ obtained by random sampling. We generated the random distribution C by applying the following permutation scheme:
(i) Calculate the mean values of the seven parental combinations (a1–a7), which are equivalent to the hybrid additive effects (note that each value of a exists twice, because reciprocal hybrids have identical values of a).
(ii) Calculate the dominance deviation from a1–a7 for all 14 hybrids (d1–d14).
For k in 1…10 000, repeat steps (iii)–(v):
(iii) Assign d1–d14 randomly to a1–a7.
(iv) Calculate the CV-ratio distribution C based on the original parental data and the randomized hybrid data.
(v) Store the resulting distribution C and its mean over metabolites.
The permutation distribution had a mean of −0.018 and a standard deviation of 0.018. Compared with the observed mean of −0.27, and assuming a normal distribution, this corresponds to a P-value of < $10^{-38}$ (Figure 5b).
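The permutation scheme can be sketched in Python as follows. This is a minimal illustration, not the code used for the analysis. It assumes strictly positive metabolite levels, the sample standard deviation in the CV, and one shared permutation of the 14 dominance deviations across all metabolites per iteration; the text does not state whether deviations were permuted jointly or per metabolite. All names are hypothetical.

```python
import math
import random
import statistics

def cv(values):
    """Coefficient of variation (sample standard deviation / mean)."""
    return statistics.stdev(values) / statistics.mean(values)

def mean_cv_ratio(hybrids, parents):
    """Mean over metabolites of log2(CV of hybrids / CV of parents)."""
    ratios = [math.log2(cv(h) / cv(p)) for h, p in zip(hybrids, parents)]
    return statistics.mean(ratios)

def permutation_means(parents, hybrids, crosses, n_perm=10_000, seed=0):
    """parents[s][i]: level of metabolite s in parental line i.
    hybrids[s][h]: level of metabolite s in hybrid h.
    crosses[h]: (i, j) indices of the two parents of hybrid h."""
    rng = random.Random(seed)
    # (i) additive (mid-parent) value of every hybrid, per metabolite
    additive = [[(p[i] + p[j]) / 2 for i, j in crosses] for p in parents]
    # (ii) dominance deviation of every hybrid from its mid-parent value
    dominance = [
        [h - a for h, a in zip(h_row, a_row)]
        for h_row, a_row in zip(hybrids, additive)
    ]
    order = list(range(len(crosses)))
    means = []
    for _ in range(n_perm):
        rng.shuffle(order)  # (iii) reassign deviations to parental combinations
        randomized = [
            [a_row[h] + d_row[order[h]] for h in range(len(order))]
            for a_row, d_row in zip(additive, dominance)
        ]
        # (iv) + (v) CV-ratio against the original parents, store its mean
        means.append(mean_cv_ratio(randomized, parents))
    return means
```

With the values reported above, the observed mean of −0.27 lies (−0.27 − (−0.018))/0.018 = 14 standard deviations below the mean of the permutation distribution.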
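For individual metabolites we can compare the observed CV-ratio $C_s$ with the CV-ratios $C_{s,i}^{rand}$ from the k permutations and calculate an empirical P-value for $H_0: C_s = E(C_s^{rand})$ using the formula
$$P_s = \frac{1}{k}\sum_{i=1}^{k} I\left(C_{s,i}^{rand} \le C_s\right),$$
where I() is the indicator function.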
The average level for a metabolite s is defined as the average level of s for a given subset Z of genotypes ($\mu_{s,Z}$). A subset can be obtained by selecting only genotypes exceeding a threshold t for root fresh weight. For t ≥ 100 mg, five of 20 genotypes would be selected as a subset to calculate $\mu_{s,Z}$. The vector of average metabolite levels for a subset is termed $AML_Z$.
We then calculated, for each of the 20 genotypes, the sum of all squared differences between its metabolite levels and $AML_Z$. We log-transformed the resulting values and multiplied them by −1, rendering a positive correlation with fresh weight.
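The deviation score described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original implementation: the base of the logarithm is not given in the text, so the natural logarithm is used, and genotypes are selected with fresh weight greater than or equal to the threshold. Function names are hypothetical.

```python
import math

def average_metabolite_levels(levels, fresh_weight, threshold):
    """Vector of average metabolite levels over the subset of genotypes
    whose root fresh weight is at least `threshold`.
    levels: dict genotype -> list of metabolite levels
    fresh_weight: dict genotype -> fresh weight (same unit as threshold)."""
    subset = [g for g in levels if fresh_weight[g] >= threshold]
    n_metabolites = len(next(iter(levels.values())))
    return [
        sum(levels[g][s] for g in subset) / len(subset)
        for s in range(n_metabolites)
    ]

def deviation_score(profile, aml):
    """Negative log of the summed squared differences from the average
    metabolite levels; larger scores mean a profile closer to the average.
    Natural log is an assumption; the profile must not equal aml exactly."""
    ssq = sum((x - m) ** 2 for x, m in zip(profile, aml))
    return -math.log(ssq)
```

A genotype whose profile lies close to the average of the high-biomass subset receives a high score, which is the direction needed for a positive correlation with fresh weight.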
Metabolic network calculation and analyses
For each of the investigated genotypes $g_i$, 1 ≤ i ≤ 20, a correlation network was first reconstructed. To this end, for a chosen significance level α, we calculated the corresponding critical value, $\tau_c$, of the Pearson correlation coefficient for the n available replicates (n = 6). The network was then obtained by including edges between any two metabolites (nodes) whose correlation exceeds the critical value $\tau_c$. In addition, seven network properties (the average degree, closeness centrality, betweenness centrality, authority index, number of clusters, diameter, and average path length) were calculated for each of the correlation networks using the igraph package in R. An excellent review of complex networks and their properties can be found in Newman (2003).
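The network reconstruction can be illustrated with a short Python sketch. It is a plain-Python stand-in for the R and igraph workflow and covers only the average degree. The critical correlation is derived from a critical t value with n − 2 degrees of freedom via r = t / sqrt(t² + n − 2); the caller supplies the t value for the chosen α. Thresholding the absolute correlation is an assumption, as are all function names.

```python
import math
from itertools import combinations

def critical_r(t_crit, n):
    """Critical Pearson r for n observations, given the critical t value
    (n - 2 degrees of freedom) at the chosen significance level."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_network(data, r_crit):
    """Edges between metabolites whose |r| exceeds r_crit.
    data: dict metabolite -> list of replicate values for one genotype."""
    return {
        (u, v)
        for u, v in combinations(sorted(data), 2)
        if abs(pearson(data[u], data[v])) > r_crit
    }

def average_degree(edges, n_nodes):
    """Average number of edges per node in an undirected network."""
    return 2 * len(edges) / n_nodes
```

As an example, with n = 6 replicates and a two-sided α of 0.05 (chosen here only for illustration; the level used is not stated above), the critical t value for 4 degrees of freedom is 2.776, and `critical_r(2.776, 6)` gives a critical correlation of about 0.81.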
We then tested the null hypothesis $H_0: p_j = E(p_j^{rand})$, 1 ≤ j ≤ 7, where $p_j$ is the value of the property j in the original network and $E(p_j^{rand})$ is the expected value of this property. We estimate $E(p_j^{rand})$ by the mean value of the property from the randomized network variants. In this way we ensured that metabolite values are sampled from their original distribution, not mixing values between any genotypes or different metabolite values within a genotype. The randomized networks are obtained by using the critical value $\tau_c$ found from the original data set and following the same procedure used in reconstructing the original network. Altogether, k = 200 randomized networks are generated for each genotype, resulting in a distribution of k values for each network property. This distribution was used to derive an empirical P-value for each of the seven investigated properties. Comparing the average degree of random and observed metabolic networks, 1.1 and 3.5, respectively, we can estimate a false positive rate for edges (metabolite correlations) of about 31%.
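The randomization test can be sketched in Python under one explicit assumption: each metabolite's replicate values are shuffled independently within a genotype, which keeps every metabolite's own distribution and mixes neither genotypes nor metabolites. The exact randomization procedure is not spelled out above, so this is only one plausible reading. The one-sided (upper-tail) P-value and all names are likewise assumptions; `property_fn` is a hypothetical callable that rebuilds the network with the original critical value and returns one network property.

```python
import random

def randomize_within_metabolite(data, rng):
    """Shuffle the replicate order of every metabolite independently.
    data: dict metabolite -> list of replicate values for one genotype."""
    return {m: rng.sample(vals, len(vals)) for m, vals in data.items()}

def empirical_p_value(observed, data, property_fn, k=200, seed=0):
    """Empirical upper-tail P-value of a network property.
    property_fn maps a (randomized) data set to the property value.
    Returns (P-value, mean of the property over the k randomized networks)."""
    rng = random.Random(seed)
    null = [
        property_fn(randomize_within_metabolite(data, rng)) for _ in range(k)
    ]
    p = sum(1 for v in null if v >= observed) / k
    return p, sum(null) / k

def edge_false_positive_rate(degree_random, degree_observed):
    """Share of observed edges expected by chance, from average degrees."""
    return degree_random / degree_observed
```

With the average degrees reported above, `edge_false_positive_rate(1.1, 3.5)` is 1.1/3.5 ≈ 0.31, which is the source of the estimate of about 31%.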
To test for significant differences of a network property between any two genotypes one would need to apply other methods, such as bootstrapping, to determine confidence intervals. However, with the current experimental setting (six replicates per genotype) bootstrapping is not feasible.
We thank Dr A. Melchinger (University of Hohenheim), Dr F. Hochholdinger (University of Tübingen) and their co-workers for providing seeds of the inbred lines and hybrids used in this study. This project was supported by the Deutsche Forschungsgemeinschaft (DFG) grant ‘Heterosis in plants’ (SPP 1149).