Metabolite profiling of Douglas-fir (Pseudotsuga menziesii) field trials reveals strong environmental and weak genetic variation


Author for correspondence: Shawn D. Mansfield; Tel: +1 604 8220196; Fax: +1 604 8229104; E-mail:


  • The primary objective of this study was to assess the capacity of metabolomics to discern biological variation among 10 full-sib families of a Douglas-fir tree breeding population, replicated on two sites.
  • The differential accumulation of small metabolites in developing xylem was examined using metabolite profiles (139 metabolites common to 181 individual trees) generated by gas chromatography-mass spectrometry, together with a series of statistical analyses incorporating family, site, tree growth and quantitative phenotypic wood traits (wood density, microfibril angle, wood chemistry and fibre morphology).
  • Multivariate discriminant, canonical discriminant and factor analyses, together with broad-sense heritabilities, revealed that metabolic and phenotypic traits alike were strongly related to site, whereas associations with genetic (family) structure were comparatively weak. Canonical correlation analysis subsequently identified correlations between specific phenotypic traits (i.e. tree growth, fibre morphology and wood chemistry) and metabolic traits (i.e. carbohydrate and lignin biosynthetic metabolites), demonstrating a coherent relationship among genetics, metabolism, environment and phenotypic expression in wood-forming tissue.
  • The association between cambial metabolites and tree phenotype revealed by metabolite profiling demonstrates the value of metabolomics for systems biology approaches to understanding tree growth and secondary cell wall biosynthesis in plants.


Recently, nontargeted metabolite analysis (metabolomics) has emerged as a new branch of functional genomics that complements transcriptomics and proteomics. Ideally, metabolomics aims to identify and quantify the full complement of low-molecular-weight, soluble metabolites in actively metabolizing tissues (Fiehn & Weckwerth, 2003). In practice, however, the narrow molecular specificity of individual analytical techniques, and the difficulty of combining large datasets acquired with multiple techniques, have so far generally restricted analyses to ‘targeted’ subsets of the greater metabolite pool (e.g. phenolics, carbohydrates, anthocyanins). Once collected, such data may be associated with measurements of plant genotype and of overt quantitative or qualitative phenotypic traits, permitting correlative associations to be drawn between a plant's metabolite ‘pools’ and its genetic background, inherent phenotypic characteristics, responses to biotic and abiotic stress and/or genetic mutations (e.g. the Arabidopsis pkl mutant, in which seedlings retain some metabolic traits of embryos; Rider et al., 2004). Through this connectivity, metabolomic data may help to establish causal relationships among genetic, metabolic and phenotypic phenomena. In recent years, metabolomics has been applied successfully to numerous plant genera, including Arabidopsis (Fiehn et al., 2000; Roepenack-Lahaye et al., 2004), Populus (Jeong et al., 2004; Robinson et al., 2005), Medicago (Huhman & Sumner, 2002), Solanum (Roessner et al., 2001; Szopa, 2002), Cucurbita (Fiehn, 2003), Pinus (Morris et al., 2004) and, most recently, Triticum (Baker et al., 2006).

Metabolomics has revealed relationships between plant metabolite pools, genotype and phenotype, and has helped to elucidate biological processes involving abiotic and biotic plant interactions in a variety of species. It is clear that metabolomics is a useful approach that promises to contribute further to our understanding of plant systems – particularly tree growth and development. To date, most comparative metabolomics investigations have focused on model plant systems subjected to environmental extremes (Rizhsky et al., 2004; Urbanczyk-Wochniak & Fernie, 2005), mutation and/or targeted genetic modification (Roessner et al., 2001; Le Gall et al., 2003; Robinson et al., 2005). This approach has been effective in that well-defined systems, exhibiting single-gene alterations with corresponding phenotypes or acute responses to specific nutritional or environmental stresses, have allowed the underlying concepts and utility of metabolomics to be evaluated. However, experiments involving model systems under extreme, controlled conditions bear limited resemblance to the development of plant populations in ‘real world’ settings. It is under exposure to variable genetic and environmental factors that the plastic nature of plant development is revealed, giving rise to the observed variability in phenotypic parameters. Presumably, such variation is accompanied by corresponding shifts in metabolism that can be detected in the metabolite pools. Therefore, a broad-scale characterization of metabolic structure, and of its association with the genotypic, phenotypic and/or environmental characteristics of plant populations, may help to link these aspects and further our understanding of plant development as a whole.

The research described herein evaluated a global metabolomics approach for investigating natural variability in wood formation and tree growth resulting from the influence of family and site, using multiple full-sib Douglas-fir (Pseudotsuga menziesii) families selected from an advanced second-generation breeding population and replicated on two sites. This research represents a fundamental, nontargeted assessment of one of the newest branches of functional genomics for discerning biological variation in tree species, and demonstrates its technical capacity to reveal the expected coherence between metabolic traits and other biotic and abiotic parameters in tree populations.

Materials and Methods

Plant material and sampling

Ten full-sib, 26-yr-old Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco) families from the British Columbia Ministry of Forests second-generation breeding program were used in this study. The families represent a subset of an extensive multifamily, multisite progeny trial, bred predominantly for superior growth performance. Each family was represented by 10 (out of a possible 16) individuals randomly selected from four blocks of four-tree row plots, randomly planted on each of two sites (10 families × 10 trees × 2 sites = 200 trees in total). The two sites, Adam River and Gold River, are located on Vancouver Island, British Columbia, and represent a more productive and a less productive site, respectively, as defined by Douglas-fir height growth classification. Nineteen samples were randomly lost during transit and processing, leaving a total of 181 samples across the 10 families.

Sampling was conducted over a 4 d period in late summer (6–9 August 2003). This period represents the latter part of the growing season, when latewood is being formed; the cambial tissue was very fluid during sampling, suggesting that wood-forming metabolism was still active. Developing xylem tissue was obtained from each tree by first peeling back a section of bark/phloem/outer cambium from the main bole at breast height and then scraping the exposed developing xylem with a fresh razor blade. The collected material was immediately transferred to a cryovial, snap-frozen in liquid nitrogen, held in a liquid nitrogen vapor tank in the field and then stored at −80°C in the laboratory. At the same time, a 10 mm increment core was extracted at breast height for wood fibre evaluation, and the diameter at breast height (dbh) and total tree height were recorded.

Quantitative wood traits

A concurrent study focusing on the genetic mapping of growth and wood traits used tree measurements and the increment core from each sample tree to measure a set of 16 quantitative traits: tree diameter at breast height (dbh), height (HT), volume (VOL), microfibril angle (MFA), fibre length (FL), fibre coarseness (Cs), earlywood density (ED), latewood density (LWD), average density of the entire increment core (AD), latewood proportion (LWP), and the wood chemistry traits total lignin (TL), arabinose (Ara), galactose (Gal), glucose (Glu), mannose (Man) and xylose (Xyl) contents.

Calculation of site index

Site index, a measure of site productivity defined as the height of dominant and codominant trees at age 50, was estimated for each site. The thirty trees with the largest dbh in the sample population at each site were used for this estimate. Breast height age was determined from increment cores, and top height was measured with a Vertex hypsometer (Vertex III; Haglöf, Sweden). Site index was then determined for each tree using the British Columbia Ministry of Forests growth intercept tables for coastal Douglas-fir (Nigh, 1997), and the individual tree values were averaged for each site to estimate site productivity.
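The per-site averaging step can be sketched as below. This is a minimal illustration only: the growth intercept tables (Nigh, 1997) are not reproduced, so `site_index_lookup` is a hypothetical callable standing in for them, and the `dbh` field name is an assumed record layout.

```python
from statistics import mean


def site_productivity(trees, site_index_lookup, n_dominant=30):
    """Average per-tree site index over the n_dominant largest-dbh trees.

    trees: list of dicts with at least a "dbh" key (assumed layout).
    site_index_lookup: hypothetical stand-in for the growth intercept
    tables; maps one tree record to its site index (m).
    """
    # Keep the n_dominant trees with the largest diameter at breast height
    dominant = sorted(trees, key=lambda t: t["dbh"], reverse=True)[:n_dominant]
    # Site productivity is the mean of the individual-tree site indices
    return mean(site_index_lookup(t) for t in dominant)


# Hypothetical usage: each record already carries a looked-up site index ("si")
trees = [{"dbh": 30, "si": 40.0}, {"dbh": 25, "si": 38.0}, {"dbh": 10, "si": 20.0}]
print(site_productivity(trees, lambda t: t["si"], n_dominant=2))  # 39.0
```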

Metabolite sample preparation

Frozen tissue was macerated to a fine powder with a 15 s burst using a dental amalgam mixer, employing a liquid N2-chilled copper/plastic capsule and steel ball bearings. Samples were kept frozen at all times and, once ground, were returned to −80°C.

Metabolites were extracted from tissue samples and prepared for gas chromatography-mass spectrometry (GC-MS) using a two-phase methanol/chloroform method developed for metabolite extraction from Populus cambium and developing xylem (Robinson et al., 2005). Approximately 100 mg of frozen, ground developing xylem was accurately weighed into a prechilled 2 ml lock-cap centrifuge tube. To this, 600 µl of HPLC-grade methanol (CH3OH) was immediately added and the tube vortexed for 10 s to halt biological activity and minimize degradation. Distilled deionized water (40 µl) and 10 µl of an internal standard (10 mg ml−1 ribitol in H2O) were then added. The sample was incubated for 15 min at 70°C with constant agitation and centrifuged at 14 000 g for 5 min, and the supernatant containing the extracted metabolites was retained. Chloroform (CHCl3; 800 µl) was then added to the pellet, which was resuspended by vortexing for 10 s and incubated for 5 min at 35°C with constant agitation. The supernatant recovered after a second 5 min centrifugation at 14 000 g was pooled with the supernatant from the initial CH3OH extraction. Water (600 µl) was added to the combined supernatant, which was vortexed for 10 s and centrifuged for 15 min at 1350 g to separate the polar (methanol/water) and nonpolar (methanol/chloroform) phases. Combining and then separating the phases in this way allowed metabolites extracted into one phase but with greater affinity for the other to repartition. A 1 ml aliquot of the polar (upper) phase was taken and either processed immediately or stored at −20°C until further analysis. Metabolites in the nonpolar phase were not analyzed in this study.

The soluble polar metabolites were derivatized before GC-MS analysis. An aliquot of the methanol/water phase (900 µl) was dried in a speedvac (3–4 h, 30°C) and methoximated to protect carbonyl moieties, by resuspending the pellet in 50 µl of methoxyamine hydrochloride solution (20 mg ml−1 in pyridine) and incubating with constant agitation for 2 h at 60°C. Acidic protons were then trimethylsilylated by adding 200 µl of N-methyl-N-trimethylsilyltrifluoroacetamide (MSTFA) and incubating at 60°C with constant agitation for 30 min. Samples were left to stand at room temperature overnight to ensure complete derivatization, and then filtered before GC-MS analysis.

GC-MS analysis

Gas chromatography-mass spectrometry analysis was conducted on a ThermoFinnigan Trace GC-PolarisQ ion trap system fitted with an AS2000 autosampler and a split/splitless injector. The GC was equipped with a low-bleed Restek Rtx-5MS column (fused silica, 30 m × 0.25 mm ID; stationary phase 5% diphenyl/95% dimethyl polysiloxane). GC conditions were as follows: inlet temperature 250°C; helium carrier gas at a constant flow of 1 ml min−1; split ratio 10 : 1; resting oven temperature 70°C; GC-MS transfer line temperature 300°C. Following injection of 1 µl of sample, the oven was held at 70°C for 2 min and then ramped to 325°C at 8°C min−1. The temperature was held at 325°C for a further 6 min before the oven was cooled rapidly to 70°C in preparation for the next run.
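The oven program fixes the run timing: the 255°C ramp at 8°C min−1 takes 31.875 min, so the ramp ends 33.875 min after injection and the full program lasts 39.875 min. The hypothetical helper below (it assumes a perfectly linear ramp and ignores the cool-down) maps a retention time to the nominal oven temperature at elution.

```python
def ramp_end_time(start=70.0, hold1=2.0, rate=8.0, final=325.0):
    """Minutes after injection at which the linear ramp reaches the final temperature."""
    return hold1 + (final - start) / rate


def program_length(start=70.0, hold1=2.0, rate=8.0, final=325.0, hold2=6.0):
    """Total oven program length (min): initial hold + ramp + final hold."""
    return ramp_end_time(start, hold1, rate, final) + hold2


def oven_temperature(t_min, start=70.0, hold1=2.0, rate=8.0, final=325.0, hold2=6.0):
    """Nominal oven temperature (°C) t_min minutes after injection."""
    ramp_end = ramp_end_time(start, hold1, rate, final)
    if t_min <= hold1:
        return start  # initial isothermal hold
    if t_min <= ramp_end:
        return start + rate * (t_min - hold1)  # linear ramp
    if t_min <= ramp_end + hold2:
        return final  # final isothermal hold
    raise ValueError("time is beyond the end of the temperature program")


print(program_length())        # 39.875
print(oven_temperature(10.0))  # 134.0
```

For example, the end of the detector window (35.5 min) falls within the final 325°C hold.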

Mass spectrometry was conducted in positive electron ionization (EI) mode, with the fore-line evacuated to approx. 40 mTorr and the helium flow into the chamber set at 0.3 ml min−1. The source temperature was held at 250°C, with an electron ionization potential of 70 eV. The detector signal was recorded from 3.35 to 35.5 min after injection, and ions were scanned across the range m/z 50–650 with a total scan time of 0.58 s.

Data acquisition and processing

ThermoFinnigan ‘Xcalibur’ (v1.3) software was used for GC-MS data acquisition, peak detection and peak-area integration. GC-MS total ion chromatograms (TICs) of the TMS derivatives of developing xylem extracts (sampled at breast height) were collected for all trees of the full-sib Douglas-fir families replicated on the two sites, in order to characterize the common ‘metabolite pools’ present in the actively metabolizing developing xylem of each tree.

To normalize the raw TIC peak data, the area of each peak in a chromatogram was expressed relative to the area of the ribitol internal standard peak, and then further standardized across all chromatograms by adjusting for the exact amount of tissue (mg fresh weight) used in each extraction.
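A minimal sketch of the two-step normalization, assuming that adjusting for the tissue amount means dividing by the fresh weight in mg; the function and argument names are hypothetical and are not taken from Xcalibur or PeakMatch.

```python
def normalize_peak_area(peak_area, ribitol_area, fresh_weight_mg):
    """Peak area relative to the ribitol internal standard, per mg fresh weight."""
    if ribitol_area <= 0 or fresh_weight_mg <= 0:
        raise ValueError("internal-standard area and tissue weight must be positive")
    # Step 1: express the peak relative to the internal standard
    relative = peak_area / ribitol_area
    # Step 2 (assumed form): scale by the exact tissue mass used in the extraction
    return relative / fresh_weight_mg


def normalize_chromatogram(peaks, ribitol_area, fresh_weight_mg):
    """Apply the normalization to every peak of one chromatogram (peak id -> raw area)."""
    return {peak_id: normalize_peak_area(area, ribitol_area, fresh_weight_mg)
            for peak_id, area in peaks.items()}


# Hypothetical usage with made-up areas for a 100 mg extraction
print(normalize_chromatogram({"peak_12": 5000.0, "peak_40": 1250.0}, 2500.0, 100.0))
```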

Peaks representing the same compound in different chromatograms were aligned automatically using the purpose-built ‘PeakMatch’ software (Robinson et al., 2005). Once compiled, the dataset consisted of 251 distinct compound peaks across all 181 samples (an array of approx. 45 000 values), and the peaks were numbered 1–251 accordingly. To minimize artefacts arising from sample processing and analysis, the dataset was further reduced to only those peaks that appeared in at least 10% of the samples from each site. This yielded a dataset of 139 peaks across the 181 samples (an array of approx. 25 000 values), which was used in all statistical analyses.
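The 10% presence filter can be sketched as follows. This assumes one reading of the criterion (a peak is kept only if it is detected in at least 10% of the samples at each site considered separately) and treats a zero or missing area as not detected; the data layout is hypothetical.

```python
def detected_fraction(values):
    """Fraction of samples in which a peak was detected (non-zero, non-missing area)."""
    return sum(1 for v in values if v) / len(values)


def filter_peaks(matrix, sites, min_fraction=0.10):
    """Keep peaks detected in >= min_fraction of samples at every site.

    matrix: dict peak_id -> list of areas, one per sample (0 or None = absent)
    sites: list of site labels aligned with the sample order
    """
    site_labels = sorted(set(sites))
    kept = {}
    for peak_id, areas in matrix.items():
        # Split this peak's areas by site
        per_site = [[a for a, s in zip(areas, sites) if s == site] for site in site_labels]
        if all(detected_fraction(values) >= min_fraction for values in per_site):
            kept[peak_id] = areas
    return kept


# Hypothetical usage: peak 1 passes at both sites, peak 2 is absent at GR
sites = ["AR"] * 10 + ["GR"] * 10
matrix = {
    1: [5.0] + [0.0] * 9 + [3.0] + [None] * 9,
    2: [1.0] * 10 + [0.0] * 10,
}
print(sorted(filter_peaks(matrix, sites)))  # [1]
```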

Intermediate data handling and manipulation were carried out using Microsoft Excel 2000 and Corel Quattro Pro 12.

Multivariate statistical analyses

The metabolite and quantitative phenotypic trait datasets were further analyzed by multivariate discriminant analysis (MDA), factor analysis (FA), canonical correlation analysis (CCA) and canonical discriminant analysis (CDA), using the ‘proc discrim’, ‘proc factor’, ‘proc cancor’ and ‘proc candisc’ procedures, respectively, of the Statistical Analysis System software (SAS v9.1).

Multivariate discriminant analysis, a statistical approach that assesses variation in preclassified multivariate data and can generate predictive models, was applied to the metabolite data array. The structure of this dataset allowed MDA models to be developed under two classification schemes: by site (Adam River, Gold River) or by family (2, 26, 38, 46, 62, 75, 92, 130, 151, 156). For the site analysis, the data were split into four equal subsets. Four models were generated, each developed using three of the four subsets, with the remaining subset serving as an independent validation array to assess model accuracy. This process was repeated until every subset had been used for validation, and the final accuracy was calculated as the average of the four models. For the family analysis, the data were split into two rather than four equal sets because of the limited number of samples per class (at most 10 replicates per site); two models were generated and tested, and their average accuracy calculated. With equal prior probabilities, the accuracies expected by random chance are 0.5 (50%) for the two-class site model and 0.1 (10%) for the 10-class family model. Model accuracy above these values indicates that MDA distinguishes between classes better than random chance.
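The k-subset validation scheme (k = 4 for site, k = 2 for family) can be sketched in plain Python. SAS ‘proc discrim’ fits a normal-theory discriminant function; to keep the example self-contained, a simple nearest-centroid classifier is swapped in as a stand-in, so only the splitting, held-out scoring, averaging and chance-level comparison mirror the text. The random shuffle and its seed are assumptions, as the text does not state how samples were allocated to subsets.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Class centroids: a simple stand-in for the discriminant model."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def k_split_accuracy(X, y, k, seed=0):
    """Train on k-1 subsets, test on the held-out one, and average over all k."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k near-equal subsets
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit_centroids([X[j] for j in train_idx], [y[j] for j in train_idx])
        hits = sum(predict(model, X[j]) == y[j] for j in test_idx)
        accuracies.append(hits / len(test_idx))
    return mean(accuracies)


def chance_accuracy(y):
    """Accuracy expected from random assignment with equal priors."""
    return 1 / len(set(y))


# Hypothetical usage: two well-separated "sites" in a 2-metabolite space
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1], [0.0, 0.0],
     [10.0, 10.1], [10.2, 10.0], [9.9, 10.0], [10.0, 9.8]]
y = ["AR"] * 4 + ["GR"] * 4
print(k_split_accuracy(X, y, k=4), chance_accuracy(y))
```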

Factor analysis allows the variation in the metabolite and quantitative trait data arrays to be explored without the constraint of preclassifying the data (as is required for MDA). Initial exploratory analyses were carried out without limiting the number of factors generated (essentially making the factor analysis a principal components analysis). Eigenvalues and shifts in the slope of the scree plot (Tabachnick & Fidell, 2001) were used to select factors representing significant portions of the variation in a dataset. The FA was then rerun, specifying an orthogonal ‘varimax’ rotation and the number of factors to be rotated. Factor scores were plotted on the axes of scatter plots to give a graphical representation of the variation in the original data captured by the analysis. Separation of sample clusters in such plots is considered to reflect differences between distinct metabolic systems (Fiehn et al., 2000; Roessner et al., 2001a,b; Chen et al., 2003; Fiehn, 2003; Morris et al., 2004).
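Factor retention from the unrotated solution can be illustrated with the eigenvalues alone. The sketch below reports each component's share of variance, counts eigenvalues above 1 (the Kaiser criterion, a common convention for correlation-matrix eigenvalues that is assumed here rather than stated in the text) and lists the successive drops between eigenvalues that a scree plot displays graphically. The varimax rotation itself is not shown, and the eigenvalues in the usage are made up.

```python
from itertools import accumulate


def variance_explained(eigenvalues):
    """Proportion and cumulative proportion of total variance per component."""
    total = sum(eigenvalues)
    proportions = [ev / total for ev in eigenvalues]
    return proportions, list(accumulate(proportions))


def kaiser_count(eigenvalues, threshold=1.0):
    """Number of components whose eigenvalue exceeds the threshold."""
    return sum(1 for ev in eigenvalues if ev > threshold)


def scree_drops(eigenvalues):
    """Successive differences between sorted eigenvalues; the elbow is where they level off."""
    return [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]


# Hypothetical eigenvalues from an unrotated solution on 8 standardized variables
eigs = [4.0, 2.0, 1.5, 0.25, 0.25]
props, cumulative = variance_explained(eigs)
print(kaiser_count(eigs), cumulative[2], scree_drops(eigs))
```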

Canonical correlation analysis investigates the relationship between two sets of variables (X and Y) by transforming them into pairs of canonical variables constructed to maximize the correlation between the sets (SAS Institute Inc., 1999). In this study, the technique was used to explore the relationships between the metabolite array and quantitative phenotypic traits relevant to tree growth and wood quality. The first set of variables comprised the 139 metabolites from both the Adam River and Gold River sites, and the second the 16 quantitative phenotypic traits described above. Canonical variables were considered important if the canonical correlation was large and significant at α = 0.05. The transformed variables were also required to explain a considerable proportion of the standardized variation in the original data, as assessed by canonical redundancy analysis. Structure correlation coefficients (between the canonical variables and the original metabolite or growth trait variables) were used to identify the variables in the two sets that were related through the canonical correlation. Variables with absolute structure correlations > 0.3 (i.e. sharing approx. 10% or more of their variance with the canonical variable, since 0.3² = 0.09) were considered part of the canonical variable.
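Structure correlations are ordinary Pearson correlations between each original variable and the scores of a canonical variable. The sketch below assumes the canonical scores have already been obtained (e.g. from ‘proc cancor’) and applies the |r| > 0.3 rule; the variable names and values in the usage are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def loading_variables(canonical_scores, variables, cutoff=0.3):
    """Variables with |structure correlation| > cutoff, as (r, r**2) pairs.

    r**2 is the share of the variable's variance shared with the canonical variable.
    """
    result = {}
    for name, values in variables.items():
        r = pearson_r(values, canonical_scores)
        if abs(r) > cutoff:
            result[name] = (r, r ** 2)
    return result


# Hypothetical usage: one variable tracks the canonical scores, one does not
scores = [1, 2, 3, 4, 5]
variables = {"metabolite_A": [2, 4, 6, 8, 10], "metabolite_B": [1, -1, 1, -1, 1]}
print(loading_variables(scores, variables))
```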

Canonical discriminant analysis is a multivariate technique that derives linear combinations of variables (here, metabolites) that maximize the separation between classes (families or sites) relative to the variation within classes. The multivariate analysis of variance (MANOVA) output generated by CDA was used to test whether families and sites could be distinguished on the basis of the metabolite data, and to confirm the results generated by MDA.

Calculation of heritabilities

Broad-sense heritability estimates the proportion of total phenotypic variation that can be explained by genetic effects (additive, dominance and epistatic variation), and ranges from 0 (no genetic control) to 1 (complete genetic control). Here, these estimates indicate how much of the observed variation is attributable to family rather than environmental (site) effects.

SAS was used to estimate variance components for the calculation of metabolite heritabilities and to test the model parameters for family, site and family × site interaction. ‘Proc GLM’ was used to conduct analysis of variance for all metabolites, using the components of variance in Table 1 and the following linear model:

Table 1.  Components of variance used in analysis of variance in calculation of heritabilities

Source                    d.f.
Family                    f – 1
Site                      s – 1
Family × site             (f – 1)(s – 1)
Block (site)              s(b – 1)
Family × block (site)     s(b – 1)(f – 1)
Sampling error            sfb(n – 1)

F, family; B, block; S, site; f, no. of families; b, no. of blocks; s, no. of sites; n, no. of trees.

$$Y_{ijlp} = \mu + F_i + S_l + B_{j(l)} + FB_{ij(l)} + FS_{il} + E_{p(ijl)}$$

($Y_{ijlp}$, individual phenotypic observation; $\mu$, overall mean; $F_i$, fixed family effect; $S_l$, random site effect; $B_{j(l)}$, random block effect nested within site; $FB_{ij(l)}$, random family × block interaction nested within site; $FS_{il}$, random family × site interaction; $E_{p(ijl)}$, random residual effect).

Variance components for broad-sense heritability calculations were estimated using the REML method of ‘proc VARCOMP’. Broad-sense heritability was calculated for all metabolites showing significant family and site variation (F-test, α = 0.05) but no family × site interaction (F-test, α = 0.01) using the following formula:


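$$H^2 = \frac{\sigma^2_F}{\sigma^2_F + \sigma^2_{FS} + \sigma^2_{FB} + \sigma^2_E}$$

($\sigma^2_F$, family variance; $\sigma^2_{FS}$, variance of family × site interaction; $\sigma^2_{FB}$, variance of family × block interaction nested within site; $\sigma^2_E$, residual variance).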
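The screening rule and heritability calculation can be sketched as follows, assuming the individual-tree form in which the family variance is divided by the sum of the four variance components listed above (family, family × site, family × block within site, residual). In practice the components would come from REML estimation (‘proc VARCOMP’); the numbers in the usage are made up for illustration.

```python
def eligible_for_h2(p_family, p_site, p_interaction,
                    alpha_main=0.05, alpha_interaction=0.01):
    """Screening rule: significant family and site effects, no family x site interaction."""
    return (p_family < alpha_main and p_site < alpha_main
            and p_interaction >= alpha_interaction)


def broad_sense_h2(var_family, var_family_site, var_family_block, var_residual):
    """Assumed form: family variance over the sum of the four components."""
    total = var_family + var_family_site + var_family_block + var_residual
    return var_family / total if total > 0 else 0.0


# Hypothetical metabolite: passes the screen, then H^2 from made-up components
if eligible_for_h2(p_family=0.01, p_site=0.001, p_interaction=0.20):
    print(broad_sense_h2(1.0, 0.5, 0.5, 2.0))  # 0.25
```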

Compound identification

National Institute of Standards and Technology (NIST) MS Search software with the NIST mass spectral library, together with the Max-Planck-Institute trimethylsilyl (TMS) mass spectral library (Golm Metabolome Database; Kopka et al., 2005) and our own (Mansfield laboratory, UBC) TMS mass spectral library, was used to identify the metabolites of interest highlighted by the statistical analyses.

Results and discussion

Family related variation

Factor analysis and MDA were performed on the metabolite dataset (181 trees, 10 families, two sites, 139 metabolites), focusing on family variation. In the FA, five factors that collectively accounted for 51% of the total variance were included in the varimax rotation. Although marked clustering and separation of samples were observed for certain factors, this was not family-related (Fig. 1a). Given the apparent dominance of site over other effects when both sites were analyzed together, separate FAs were conducted for the samples from each site, as a potential means of revealing distinctions between families free of the complexities of site interactions. In these analyses, some individual family clusters did separate from one another in factor score plots of various factor pairs (data not shown).

Figure 1.

Scatter plots of factor analysis (FA) factor scores for metabolite profiles of developing xylem from Douglas-fir (Pseudotsuga menziesii) trees, with plot axes derived from FA factors 1–3. Analysis represents the differentiation of 181 individual trees (93 from the Adam River site and 88 from the Gold River site), across 139 metabolites, and clearly demonstrates the clustering and separation of samples based on site. Dashed lines suggest plane of separation only. (a) Samples classified by family, representing individuals from families 2, 26, 38, 46, 62, 75, 92, 130, 151, and 156, designated by 0–9, respectively. (b–d) Samples classified by site: A, Adam River; G, Gold River.

A family-based FA was also conducted on the data for a set of 16 quantitative phenotypic traits, which gave very similar results to the metabolite FA (Fig. 2a). The first four factors, which accounted for 67% of the variation in that data, were used in the varimax rotation. When both sites were analyzed together, no separation of family clusters was evident. However, when each site was analyzed separately, some family separation was apparent, but as with the metabolites, no clear distinctions were observed (data not shown).

Figure 2.

Scatter plots of factor analysis (FA) factor scores for quantitative phenotypic traits from Douglas-fir (Pseudotsuga menziesii) trees, with plot axes derived from FA factors 1, 3 and 4. Analysis represents the differentiation of 181 individual trees (93 from the Adam River site and 88 from the Gold River site), across 16 quantitative phenotypic traits, and clearly demonstrates the clustering and separation of samples based on site. Dashed lines suggest plane of separation only. (a) Samples classified by family, representing individuals from families 2, 26, 38, 46, 62, 75, 92, 130, 151, and 156, designated by 0–9, respectively. (b–d) Samples classified by site: A, Adam River; G, Gold River.

When the dataset included samples from both sites (Adam River and Gold River), the MDA was only 18% accurate on average and 37% accurate at best (Table 2); this is nevertheless an improvement over the 10% expected by random chance, and implies that families can be distinguished to some extent. These findings were supported by a CDA of the same data, whose MANOVA results showed that families could be clearly distinguished (Fig. 3) on the basis of the 139 metabolites used in the analysis (P < 0.05). MDA accuracy improved further when samples from the two locations were analyzed separately, moderately for Adam River (37% on average, 67% at best) and more markedly for Gold River (65% on average, 90% at best). This improvement is noteworthy and points to a confounding influence of site (i.e. family × site interactions) when investigating genetic variation in this and other tree populations.

Table 2.  Prediction accuracies of multiple discriminant analyses (MDA) of metabolite profiles of developing xylem from 181 Douglas-fir (Pseudotsuga menziesii) trees, by family and site

Average prediction accuracy of MDA (%)

              By family                                                      By site
Sites         F2   F26   F38   F46   F62   F75   F92   F130   F151   F156    Adam River   Gold River
AR and GR      0    17    12    12    17    37     0     25     37     25            80           92
Adam River    40    45    70    10    10    20    53     67     12     40             –            –
Gold River    37    46    90    70    46    77    70     90     57     70             –            –

1. AR, Adam River; GR, Gold River.
2. The ‘percentage accuracy’ represents the average frequency with which the discriminant model correctly predicted the family (one of 10) or growth site (one of two) of individual trees of known origin, based on their metabolite profiles (139 metabolites).

Figure 3.

Scatter plots of canonical discriminant analysis canonical scores for metabolite profiles of developing xylem from Douglas-fir (Pseudotsuga menziesii) trees, with plot axes derived from canonical factors 1 and 2. Analysis represents the differentiation of 181 individual trees (93 from the Adam River site and 88 from the Gold River site), across 139 metabolites, and clearly demonstrates the clustering and separation of samples based on genetics (family). Samples classified by family, representing individuals from families 2, 26, 38, 46, 62, 75, 92, 130, 151, and 156, designated by 0–9, respectively.

Site-related variation

Analyses focusing on site-related variation were conducted as a complement to the family-related analyses described above. Adam River and Gold River differed in productivity: Adam River was the more productive site, with a site index of 39.7 m, and Gold River the less productive, with a site index of 35.4 m. Both sites are located on Vancouver Island, in the CWHvm (Adam River) and CWHxm (Gold River) biogeoclimatic subzones. Adam River (50°24′00″N, 126°10′00″W) is 576 m above sea level and has very little understory vegetation, whereas Gold River (49°51′30″N, 126°04′45″W) is 561 m above sea level and has an understory composed primarily of Vaccinium spp. The largest difference between the sites relates to precipitation regime: Adam River is classified as a ‘very wet’ (v) environment, whereas Gold River lies in the ‘very dry’ (x) biogeoclimatic region. Both sites were on relatively flat terrain, free of stumps, and surrounded by even-aged stands that did not restrict light access and protected the trees from wind damage. The major biogeoclimatic difference between the sites was therefore water availability, which also influences understory and soil composition.

The site-related factor analysis of the metabolite dataset was the same as that described in the previous section, except that samples were labeled by site rather than by family (Fig. 1b–d). The three highest-ranking factors (F-1, F-2 and F-3, accounting for 16.7, 12.1 and 11.5% of the dataset variance, respectively) were responsible for the clustering and separation of samples, with site being the dominant influence (Fig. 1b–d). F-1 was the primary source of separation between site clusters, and a positive relationship between scores on F-1 and F-3 improved the separation (Fig. 1c). A small cluster of four Adam River samples that grouped with the Gold River cluster on F-1 was effectively isolated by F-2 (Figs 1b, 2c); these samples presumably represent a variant metabolic subset.

The site-related FA of the phenotypic trait dataset was also the same as that used in the family analysis (described earlier), involving a varimax rotation of the first four factors, which collectively accounted for 67% of the total variance. In this analysis, F-1 was primarily responsible for clustering and separating the trees by site, with some improvement offered by F-3 and F-4 (Fig. 2b–d); these three factors accounted for 25.5, 14.0 and 8.9% of the dataset variance, respectively.

The MDA for site, based on the metabolite dataset, showed strong predictive accuracy (Table 2), indicative of large and/or consistent metabolic differences between the populations at the two sites. MANOVA results derived from the CDA confirmed that sites can be distinguished on the basis of the 139 metabolites used in the analysis (P < 0.05).

The results from the MDA and CDA of GC-MS metabolite profiles of developing xylem, and FA of metabolite profiles and quantitative phenotypic traits indicate that in this Douglas-fir population, a much clearer distinction can be made between trees based on site, compared with genetic origin (family); however, both can be differentiated. It is apparent from the metabolite profiles that differences between sites have had a detectable influence on the wood-forming metabolism of the trees. Although it is generally accepted that growing conditions can significantly influence metabolism and phenotypic traits in trees, to date there have been few demonstrations of the influences of uncontrolled site (climatic and environmental) factors on global metabolism in plant species. The findings of this study are consistent with those of Baker et al. (2006), for whom PCA of NMR-derived metabolic profiles demonstrated a much clearer distinction between transgenic and control wheat lines on the basis of site, rather than genotype.

Interaction between genetic and environmental elements

The identification of metabolites exhibiting significant family- or site-related variation and family × site interaction, and the subsequent calculation of broad-sense heritabilities for the metabolite pools, provided a quantitative representation of the trends observed in the MDA, CDA and FA. Of the complete set of 139 metabolites, 78 (56.1%) showed significant family variation and 108 (77.7%) significant site variation (ANOVA, α = 0.05), while 37 (26.6%) showed significant family × site interaction (ANOVA, α = 0.01). Broad-sense heritability estimates for the individual metabolites ranged from 0 to 0.67, with only one exceeding 0.5. The generally low values of these estimates (mean = 0.12) further demonstrate that genetics (family) has a smaller influence than environmental (site) factors on the observed variation in cambial metabolism. Moreover, over a quarter of all metabolites showed significant family × site interaction, indicating that families often produce different metabolic responses to similar environmental cues. This analysis illustrates that cambial metabolism is a complex response to genetic and environmental stimuli and to the interaction between the two. This result agrees with a previous study of the relative influence of genetic and specific environmental factors in Pinus sylvestris, in which significant family × temperature and family × temperature × water interactions were observed in the absence of significant family main effects (Sonesson & Eriksson, 2000). The interaction also helps to explain why the MDA family predictions improved when the Adam River and Gold River sites were analyzed separately. Of the 108 metabolites with significant site variation, 37 (34.3%) showed significant family variation in the absence of family × site interaction.
For this subset of site-distinguishing metabolites, heritability was only slightly lower than that of the complete set (ranging from 0.00 to 0.67 and with a mean of 0.11), lending further support to the hypothesis that environment (site) plays a greater part in the observed metabolic variation than genetics (family origin). For a complete list of 37 compounds, with mass spectral data and possible chemical class assignments, see the Supplementary Material (Table S1).
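The heritabilities discussed here are ratios of variance components. This section does not state the exact estimator used in the study, so the following is only a minimal sketch under an assumed model: a one-way random-effects ANOVA on family within a single site (or on data from which site effects have already been removed), with H² taken as σ²_F / (σ²_F + σ²_e). The function `broad_sense_h2` is hypothetical; negative variance-component estimates are truncated at zero (a common convention), so an estimate of exactly 0.00 here simply means the between-family mean square did not exceed the within-family mean square.

```python
import numpy as np

def broad_sense_h2(values, families):
    """Hypothetical helper: H2 = var_F / (var_F + var_e) from a one-way
    random-effects ANOVA (method-of-moments variance components)."""
    y = np.asarray(values, dtype=float)
    families = np.asarray(families)
    groups = [y[families == f] for f in np.unique(families)]
    k = len(groups)                                   # number of families
    n_i = np.array([len(g) for g in groups], dtype=float)
    N = n_i.sum()                                     # number of trees
    grand_mean = y.mean()

    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (N - k)

    # Effective family size for unbalanced designs (equals n when balanced).
    n0 = (N - (n_i ** 2).sum() / N) / (k - 1)
    var_family = max((ms_between - ms_within) / n0, 0.0)  # truncate at zero
    var_error = ms_within
    total = var_family + var_error
    return var_family / total if total > 0 else 0.0


# Toy data: three families with clearly different means, so H2 is close to 1.
vals = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9, 9.0, 9.1, 8.9]
fams = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
print(round(broad_sense_h2(vals, fams), 3))  # prints 0.999
```

With identical family means the same function returns 0.0, which is the situation in which a metabolite shows no detectable genetic (family) component.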

Based on GC retention time and mass-spectral matches, positive identities could be assigned to over half of the 37 compounds that exhibited significant site and family variation with no family × site interaction (Table 3). Several aspects of metabolism are represented, with some notable inclusions from branches of metabolism involved in wood formation. The list includes participants in the tricarboxylic acid (TCA) cycle (e.g. malic acid), the major sugar pools and hexose phosphates (fructose, fructose-6P, glucose and glucose-6P), and metabolites related to lignin biosynthesis (shikimic acid and quinic acid). Most of the metabolites with higher heritabilities are related to carbohydrate metabolism. This agrees with the heritabilities calculated for quantitative traits, among which the glucan (i.e. cellulose), arabinose and xylose contents of wood were high relative to other traits.

Table 3.  Positively identified metabolites exhibiting significant site variation, for which broad-sense heritabilities could be calculated

| Peak no. | Compound IDᵃ | Heritabilityᵇ (H²) |
|---|---|---|
| 92 | Alanine, β- | 0.00 |
| 173 | Quinic acid | 0.06 |
| 120 | Threonic acid | 0.09 |
| 73 | Glyceric acid | 0.10 |
| 104 | Malic acid | 0.11 |
| 208 | Glucaric acid | 0.13 |
| 223 | Glucose 6P {BP} | 0.16 |
| 67 | Maleic acid | 0.16 |
| 182 | Glucose {BP} | 0.20 |
| 222 | Glucose 6P | 0.21 |
| 177 | Fructose {BP} | 0.23 |
| 164 | Shikimic acid | 0.30 |
| 135 | Xylose {BP} | 0.34 |

ᵃ Compound identity determined through mass-spectral and gas chromatography retention time matches with standard compounds. {BP} indicates a metabolite by-product, as suggested by the Golm Metabolite Database.
ᵇ Of the metabolites for which significant site and family variation existed (anova, α = 0.05) in the absence of site × family interaction (i.e. G × E effects) (anova, α = 0.01), allowing calculation of broad-sense heritability (37 of 139), only those that could be positively identified (19) are presented.

Heritabilities were also calculated for the 16 quantitative phenotypic traits; although larger on average than those of the metabolites, the estimates were still fairly low (Table 4). Among the traits measured, tree height and arabinose, xylose and glucose contents all had heritabilities greater than 0.35, and those of arabinose and glucose content exceeded 0.5. The broad-sense heritability estimate for glucose (1.28) exceeds the theoretical maximum of 1 and is therefore an overestimate, most likely a result of the small number of families used in the calculations. This indicates that these values should be compared with one another in a relative sense rather than interpreted as absolute values. Nevertheless, the general observation that the phenotypic traits have low heritabilities should still hold. As with the metabolites, genetics (family) does not appear to have much influence on the observed variation in phenotypic traits.
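To see why estimates from only 10 families are best read in a relative sense, the sketch below simulates many hypothetical trials with a known H² of 0.25 and applies a ratio-of-variance-components estimator to each. The true H², the 9 trees per family and the helper `h2_balanced` are all illustrative assumptions, not values or methods from the study. This ratio form is bounded by 1, so it cannot by itself reproduce a value such as 1.28; the point is only to show how widely estimates scatter when so few families are available.

```python
import numpy as np

def h2_balanced(y):
    """Ratio H2 estimates for balanced data of shape (..., k families, n trees)."""
    k, n = y.shape[-2], y.shape[-1]
    fam_means = y.mean(axis=-1)
    grand = fam_means.mean(axis=-1, keepdims=True)
    ms_between = n * ((fam_means - grand) ** 2).sum(axis=-1) / (k - 1)
    ms_within = ((y - fam_means[..., None]) ** 2).sum(axis=(-2, -1)) / (k * (n - 1))
    var_f = np.maximum((ms_between - ms_within) / n, 0.0)  # truncate at zero
    return var_f / (var_f + ms_within)

rng = np.random.default_rng(0)
reps, k, n = 2000, 10, 9                 # 2000 simulated trials of 10 families
var_f_true, var_e_true = 0.25, 0.75      # illustrative: true H2 = 0.25
fam_effects = rng.normal(0.0, np.sqrt(var_f_true), size=(reps, k, 1))
noise = rng.normal(0.0, np.sqrt(var_e_true), size=(reps, k, n))
estimates = h2_balanced(fam_effects + noise)

lo, hi = np.percentile(estimates, [5, 95])
print(f"true H2 = 0.25; 90% of estimates fall in [{lo:.2f}, {hi:.2f}]")
```

With 10 families the between-family mean square rests on only 9 degrees of freedom, so individual estimates can easily differ from the true value by a factor of two or more.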

Table 4.  Broad-sense heritabilities of quantitative phenotypic traits

| Quantitative trait | Heritabilityᵃ (H²) |
|---|---|
| Total lignin | 0.00 |
| Fibre coarseness | 0.00 |
| Fibre length | 0.16 |
| Diameter at breast height | 0.20 |
| Latewood proportion | 0.21 |
| Latewood density | 0.22 |
| Tree volume | 0.22 |
| Average density | 0.25 |
| Earlywood density | 0.30 |
| Microfibril angle | 0.34 |
| Tree height | 0.40 |

ᵃ Quantitative traits are sorted by heritability. Significant family-related variation in the absence of site × family interaction (i.e. G × E effects) was observed in all traits, permitting broad-sense heritability to be calculated for each.

Interaction between metabolic and phenotypic elements

A CCA including the 139 metabolites and 16 phenotypic traits was conducted. In this analysis, the first pair of canonical variables (metabolite 1 and growth 1) was the only relevant set. Although the canonical correlations for all 16 variate pairs were high (ranging from 0.74 to 0.99), only the first two pairs were significant at α = 0.05 (P = 0.0006 and P = 0.0282, respectively). In addition, canonical redundancy analysis showed that only the first variate pair had predictive power across the two sets of original variables, and that this was limited to predicting variance in the growth traits. The first canonical variables, metabolite 1 and growth 1, each accounted for a small proportion of the variation in their original data sets (0.2165 and 0.2207, respectively). Although low, these values are considerably higher than those of the second and subsequent pairs of canonical variables.
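The contrast between uniformly high canonical correlations and only two significant pairs is easier to appreciate with a small numerical sketch. Below, classical CCA is computed with NumPy through QR decompositions and an SVD (`cca` is a hypothetical helper, not the software used in the study) and applied to pure random noise with the same dimensions as this data set: 181 trees, 139 metabolites and 16 traits. Because the number of metabolites approaches the number of trees, even unrelated variables yield large sample canonical correlations, which is why significance tests rather than raw magnitudes decide which variate pairs are meaningful.

```python
import numpy as np

def cca(X, Y):
    """Classical CCA via QR + SVD (sketch; assumes more samples than
    features in each block and full column rank).

    Returns canonical correlations in descending order and the canonical
    variate scores U (from X) and V (from Y), one column per pair.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)   # orthonormal basis of the centred X columns
    Qy, _ = np.linalg.qr(Yc)
    A, s, Bt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    d = min(X.shape[1], Y.shape[1])
    corrs = np.clip(s[:d], 0.0, 1.0)  # singular values = canonical correlations
    U = Qx @ A[:, :d]
    V = Qy @ Bt.T[:, :d]
    return corrs, U, V


# Pure noise with the dimensions of this study; no real association exists.
rng = np.random.default_rng(0)
X = rng.normal(size=(181, 139))
Y = rng.normal(size=(181, 16))
corrs, U, V = cca(X, Y)
print("largest and smallest canonical correlation:", np.round(corrs[[0, -1]], 2))
```

Each column pair of `U` and `V` has unit norm and zero mean, so their dot product equals the corresponding canonical correlation.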

The canonical correlation coefficients (canonical factor loadings) of the metabolites and growth traits for the first canonical variate pair are given in Tables 5 and 6. In total, 52 of 139 metabolites and 10 of 16 growth traits were significantly correlated (|r| > 0.3) with their respective canonical variates (metabolite 1 and growth 1); the correlation for latewood density (−0.295) fell just short of the 0.3 cutoff. Owing to space limitations and to aid clarity, only metabolites that were significantly correlated and could be positively identified are presented here. For the complete list of 52 compounds, with mass spectral data and possible chemical class assignments, see the Supplementary Material (Table S2).
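The loadings in Tables 5 and 6 are correlations between each original variable and its canonical variate. The following minimal sketch shows how such loadings, the |r| > 0.3 selection, and variance and redundancy summaries relate to one another. The helper names are hypothetical; this section does not say exactly how the proportions of 0.2165 and 0.2207 were computed, and the mean squared loading used here is simply the conventional "variance extracted" measure, with the Stewart-Love redundancy index obtained by multiplying it by the squared canonical correlation.

```python
import numpy as np

def structure_coefficients(X, variate):
    """Correlation of each column of X with a canonical variate
    (canonical loadings, also called structure coefficients)."""
    Xc = X - X.mean(axis=0)
    v = variate - variate.mean()
    return (Xc.T @ v) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(v))

def variance_extracted(loadings):
    """Share of a block's standardised variance reproduced by its own variate."""
    return float(np.mean(np.asarray(loadings) ** 2))

def redundancy_index(loadings, canonical_corr):
    """Stewart-Love redundancy: variance extracted x squared canonical correlation."""
    return variance_extracted(loadings) * canonical_corr ** 2


# Toy block: column 0 tracks the variate, column 1 is its mirror image,
# column 2 is uncorrelated with it.
variate = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([variate, -2.0 * variate, [1.0, -1.0, 0.0, -1.0, 1.0]])
loadings = structure_coefficients(X, variate)        # approximately [1, -1, 0]
selected = np.flatnonzero(np.abs(loadings) > 0.3)    # the 0.3 cutoff used in the text
print(selected)                                      # columns 0 and 1 pass
print(round(variance_extracted(loadings), 3))        # 0.667
print(round(redundancy_index(loadings, 0.5), 3))     # 0.167
```

Negative loadings such as column 1 are retained because the cutoff is applied to the absolute value, matching the negatively correlated metabolites and traits listed in the tables.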

Table 5.  Positively identified metabolites exhibiting significant canonical correlation coefficients, presented in conjunction with factor analysis scores and broad-sense heritability values for the same compounds

| Peak no. | Compound IDᵃ | Metabolite 1 (CCA)ᵇ | F-1 (FA)ᶜ | F-2 (FA)ᶜ | F-3 (FA)ᶜ | Heritabilityᵈ (H²) |
|---|---|---|---|---|---|---|
| 92 | Alanine, β- | 0.509 | 0.71 | | | 0.00 |
| 178 | Glucopyranose | 0.397 | | 0.65 | | |
| 54 | Acetic acid, bisoxyl | 0.372 | 0.65 | | | 0.00 |
| 138 | Arabinose | 0.363 | 0.46 | 0.41 | | 0.31 |
| 120 | Threonic acid | 0.349 | 0.49 | | | 0.09 |
| 182 | Glucose {BP} | 0.346 | | | | 0.00 |
| 135 | Xylose {BP} | 0.337 | 0.51 | | | 0.34 |
| 111 | Pyroglutamic acid | 0.333 | 0.54 | | | |
| 20 | Acetic acid | 0.323 | 0.59 | | | |
| 175 | Fructose | 0.310 | | | | 0.06 |
| 177 | Fructose {BP} | 0.308 | | | | 0.11 |
| 179 | Glucose | 0.300 | | | | 0.20 |
| 173 | Quinic acid | −0.304 | −0.31 | −0.49 | | 0.06 |
| 150 | Rhamnose | −0.308 | | | 0.31 | |
| 74 | Fumaric acid | −0.373 | | | 0.35 | |
| 244 | Coniferin | −0.459 | | | 0.66 | |
| 164 | Shikimic acid | −0.487 | −0.45 | 0.39 | 0.56 | 0.30 |
| 73 | Glyceric acid | −0.546 | −0.52 | | | 0.10 |
| 169 | Pinitol | −0.659 | −0.70 | | 0.32 | |

ᵃ Compound identity determined through mass-spectral and retention time matches with standard compounds. Compounds are sorted by correlation coefficient. Peak no. is the unique numerical identity of a metabolite in the 251-compound set originally resolved from the chromatographic data. {BP} indicates a metabolite by-product, as suggested by the Golm Metabolite Database.
ᵇ Of the 51 metabolites with significant (|r| > 0.3) canonical correlation coefficients among all 139 metabolites analysed, 21 were positively identified and are presented in this table.
ᶜ For the metabolites presented, scores on the site-differentiating factors F-1, F-2 and F-3 are shown only where significant (|score| > 0.3).
ᵈ Broad-sense heritabilities were calculated only for metabolites exhibiting significant family and site variation (anova, α = 0.05) in the absence of family × site interaction (i.e. G × E effects) (anova, α = 0.01).
Table 6.  Canonical correlation coefficients of quantitative traits presented in conjunction with factor analysis scores and broad-sense heritabilities

| Quantitative trait | Growth 1 (CCA)ᵃ | F-1 (FA)ᵇ | F-3 (FA)ᵇ | F-4 (FA)ᵇ | Heritabilityᶜ (H²) |
|---|---|---|---|---|---|
| Diameter at breast height | 0.867 | 0.90 | | | 0.20 |
| Tree volume | 0.825 | 0.91 | | | 0.22 |
| Tree height | 0.783 | 0.91 | | | 0.40 |
| Microfibril angle | 0.575 | | | 0.81 | 0.34 |
| Total lignin | 0.484 | | | 0.76 | 0.00 |
| Arabinose | 0.153 | | | | 0.69 |
| Galactose | 0.107 | | | | 0.25 |
| Earlywood density | 0.069 | | 0.33 | 0.52 | 0.30 |
| Latewood proportion | 0.012 | | | | 0.21 |
| Average density | −0.076 | | | | 0.25 |
| Latewood density | −0.295 | | −0.69 | | 0.22 |
| Glucose | −0.309 | | 0.73 | | 1.28 |
| Xylose | −0.342 | | 0.90 | | 0.37 |
| Fibre coarseness | −0.412 | −0.49 | | | 0.00 |
| Mannose | −0.418 | | −0.80 | | 0.31 |
| Fibre length | −0.481 | 0.43 | | −0.52 | 0.16 |

ᵃ Quantitative traits are sorted by canonical correlation coefficient.
ᵇ Scores on the site-differentiating factors F-1, F-3 and F-4 are shown only where significant (|score| > 0.3).
ᶜ Broad-sense heritabilities are presented for each quantitative trait.

For the phenotypic traits (Table 6), measures of wood yield (dbh, tree volume and height) were highly correlated with growth 1, and indicators of wood fibre quality (microfibril angle, fibre length and fibre coarseness) were also well correlated. This suggests that growth 1 is strongly related to wood yield and wood cell morphology. The contents of the primary chemical constituents of wood (total lignin, glucose, mannose, xylose) contributed less to growth 1, with lower but still significant correlation coefficients. Traits related to wood density (latewood and earlywood density, average density and latewood proportion) had correlation coefficients below 0.3 in absolute value and therefore did not contribute significantly to growth 1.

Many metabolite pools are well correlated with metabolite 1. Metabolites associated with the TCA cycle (fumaric acid), ascorbate and aldarate metabolism (threonic acid), amino acid metabolism (glyceric acid, pyroglutamic acid, alanine), carbohydrate storage (rhamnose) and stress tolerance (pinitol) are all represented. Significant correlations are also apparent for the major (glucose and fructose) and minor (xylose, arabinose and maltose) sugar pools. Glucose and fructose are products of the breakdown of sucrose, the major transported photoassimilate, and represent a starting point for many branches of metabolism, most notably cell wall biosynthesis. The minor pools observed are involved in ascorbate, nucleotide and more specific aspects of cell wall metabolism. All three have structural roles in cell walls, and xylose in particular is a key cell wall carbohydrate associated with primary wall deposition and a component of wood hemicellulose.

Precursors of lignin biosynthesis (shikimic acid, coniferin, quinic acid) also correlate well with metabolite 1. Coniferin is believed to be involved in the transport and storage of the monolignol coniferyl alcohol, and consequently plays an integral role in cell wall lignification in softwoods (Samuels et al., 2002). Shikimic and quinic acids, on the other hand, are more broadly associated precursors, acting as intermediates in the synthesis of aromatic amino acids, flavonoids and a range of other secondary metabolites in addition to their involvement in lignin biosynthesis. It is therefore fitting that both shikimic and quinic acids form pools in the developing xylem of Douglas-fir, since pooling is frequently associated with metabolites that serve alternative downstream pathways (Srere, 1987). Aside from their roles in the broadly serving shikimate pathway (reviewed by Herrmann & Weaver, 1999), there is support for their participation in forming the shikimate and quinate esters of p-coumarate, as part of the meta-hydroxylation of that molecule in the phenylpropanoid pathway responsible for monolignol biosynthesis (Humphreys & Chapple, 2002). The lignin-related metabolites shikimic acid and coniferin are among the metabolites most strongly correlated with metabolite 1, along with several amino acid metabolites and pinitol. These compounds predominate over the precursors of structural carbohydrates, which, although relevant, have a weaker influence on metabolite 1.

Collectively, the correlations between metabolites and growth traits, together with the high correlation between their canonical variates, indicate a clear link between the wood yield and fibre quality of a tree and the pooling, in the developing xylem, of a series of metabolites directly related to wood biosynthesis. First, there is an inverse relationship between the pools of metabolite precursors to the major carbohydrate components of wood and the contents of the structural components themselves (glucose, mannose and xylose). This suggests that increased pooling of these metabolites results from limited metabolic flux beyond the pool, and that reduced incorporation into the cell wall matrix is a consequence not of limited precursor availability but of low demand. A similar but stronger inverse relationship exists between the pools of lignin precursors and the total lignin content of wood: the pools of coniferin, shikimic acid and quinic acid become larger as the total lignin content of wood decreases. Again, such a relationship implies that the limiting factor in lignin biosynthesis is deposition rather than precursor supply. Finally, there is a simple inverse, but perhaps telltale, relationship between the measures of yield (dbh, VOL, HT) and the pinitol pool. Because pinitol is a ‘compatible osmoticum’ that has been associated with responses to drought stress (Keller & Ludlow, 1993; Griffin et al., 2004), the negative correlation between this metabolite and wood yield is readily explained.
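The inverse relationships described above are inferred indirectly, through the two highly correlated canonical variates, rather than from direct pairwise correlations. The sketch below makes that reasoning explicit using loadings copied from Tables 5 and 6. It rests on a simplifying assumption, stated in the code, that each variable is linked to the other block only through the first canonical pair; under that assumption the direction of the implied association is the product of the two loading signs, and the magnitudes are only rough indications. The value rho = 0.99 is taken as the upper end of the reported range of canonical correlations, and `implied_direction` is a hypothetical helper.

```python
# Simplified model (an assumption, not the study's analysis): a metabolite
# pool m and a wood trait t are related only through the first canonical
# pair, so corr(m, t) ~ loading_m * rho * loading_t, with rho >= 0.
LOADINGS_METABOLITE_1 = {      # Table 5
    "coniferin": -0.459,
    "shikimic acid": -0.487,
    "quinic acid": -0.304,
    "glucose pool": 0.300,
    "xylose {BP} pool": 0.337,
    "pinitol": -0.659,
}
LOADINGS_GROWTH_1 = {          # Table 6
    "total lignin": 0.484,
    "glucose content": -0.309,
    "xylose content": -0.342,
    "tree volume": 0.825,
}

def implied_direction(metabolite, trait, rho=0.99):
    """Sign (and rough size) of the association implied by the loadings."""
    approx = LOADINGS_METABOLITE_1[metabolite] * rho * LOADINGS_GROWTH_1[trait]
    return ("inverse" if approx < 0 else "direct"), round(approx, 2)

for m, t in [("coniferin", "total lignin"), ("glucose pool", "glucose content"),
             ("xylose {BP} pool", "xylose content"), ("pinitol", "tree volume")]:
    print(f"{m} vs {t}: {implied_direction(m, t)}")
```

All four pairs come out as inverse, matching the precursor-pool, lignin and pinitol relationships described in the text; confirming them would still require the direct metabolite-trait correlations.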

A further, more unified relationship is evident in the data of Tables 5 and 6. Where the chemical composition of wood varies within a species, there is typically an inverse relationship between the major carbohydrate and lignin contents. This appears to be the case in these Douglas-fir trees: total lignin content is positively correlated with the growth canonical variate, whereas mannose, xylose and glucose contents are all negatively correlated with it. Interestingly, the pools of metabolite precursors to the carbohydrate and lignin polymers show similar but opposite-signed correlations with the metabolite canonical variate. The metabolomics approach applied here has therefore revealed a set of wood formation-related phenotypic and metabolic traits that broadly reflect one another. Broad relationships such as this provide a starting point from which a detailed understanding of specific interactions between metabolism and phenotype may be developed.

The relationships demonstrated by the CCA appear to be rooted in the metabolic and phenotypic variation associated with site differences. Almost all of the metabolites that correlated highly in the CCA load highly on one or more of the factors responsible for site-related sample clustering and separation in the FA. Furthermore, broad-sense heritabilities, which could only be calculated where significant site and family variation existed, were obtained for approximately half of the identified high-correlating metabolites in the CCA (Table 5). A similar trend is seen for the phenotypic traits in the CCA (Table 6), where all traits except average density, latewood proportion, and arabinose and galactose contents load highly on at least one of the site-differentiating factors F-1, F-3 and F-4. On average, the heritabilities of these traits were greater than those of the metabolites, but in general they remained low. All of these observations point toward the importance of site over family in the CCA, directly supporting the qualitative, visual evidence provided by the FA factor score plots (Figs 1, 2).

In summary, this study demonstrates that broad-scale, nontargeted metabolic profiles of actively metabolizing developing xylem can be correlated with extensive phenotypic data describing aspects of tree growth and wood properties in a population of siblings from high-growth-performance families of Douglas-fir. Furthermore, metabolic and phenotypic variation are both strongly associated with environmental (site) factors; a comparable association with genetics (family) exists but is less dominant.

Additionally, significant correlations were observed between phenotypic indicators of tree growth (dbh, tree height and volume), cell morphology (microfibril angle, fibre length, fibre coarseness) and cell wall chemistry, and metabolite pools related to major components of cell wall biosynthesis, including cellulose (glucose, fructose), hemicellulose (xylose, arabinose and maltose) and lignin (quinic acid, shikimic acid and coniferin). The existence of linear, quantitative relationships between tree and wood phenotype and wood-forming metabolism, together with the parallel influences of family (genetics) and site (environment) on both phenotype and the metabolite pools of actively growing tissue, establishes a clear biological connection between genetics, metabolism, phenotype and growing environment. As such, it illustrates the importance of metabolomics within the framework of functional biology and demonstrates the potential of metabolic data in a unified approach to studying tree and plant growth and wood biosynthesis.

Future studies should aim to increase the sampling population (number of families) to satisfy the requirements of quantitative genetic calculations, and to replicate site conditions so that relationships between geoclimatic and biotic factors can be more clearly defined. A notable outcome of this research was the weaker correlation between genetics (i.e. family) and metabolic or phenotypic traits. It is not clear whether this result accurately reflects the situation in tree populations in general or simply reflects characteristics of the specific sample population used in this study. Future attempts to demonstrate links between genetic and metabolic factors should therefore look to tree populations that include families exhibiting a wider range of genetic and/or phenotypic diversity, rather than the somewhat narrow selection of ‘high performance’ families employed in the current study. Alternatively, clonal lines could be used in place of full-sib families to control dataset variation, although this approach would move away from the goal of understanding tree and wood development where genetic variability exists within families. Further resource-intensive but potentially enlightening studies could also track wood-forming metabolism in multiple families or clones, under a variety of geoclimatic conditions, throughout the growing season. The metabolic data could then be related back to other biotic and abiotic factors, as in the current study, to build a more complete picture of wood-forming metabolism and its relationship with these factors. It is, however, apparent that broad-scale profiling of ‘global’ plant metabolism can contribute to our understanding of biological processes in trees and other plants, and may be used to diagnose specific genetic or phenotypic characteristics or responses.


Funding for this project was provided by the NSERC Strategic Program held by SDM. The authors gratefully acknowledge support from the Ministry of Forests, British Columbia in providing expert consultation and access to their tree breeding materials. The authors would also like to recognize the Top Achiever Doctoral Scholarship (Bright Future Scholarships, Tertiary Education Commission, Wellington, NZ) held by ARR from 2002 to 2005.