A central goal of genetics is to dissect phenotypes into their genetic components, especially ecologically relevant phenotypes (Stinchcombe & Hoekstra, 2008). Knowledge about the genes underlying a diversity of phenotypes has accumulated in tandem with developments in forward genetic methodologies and technologies, such as high-throughput genotyping and phenotyping platforms. Much of this work in nonmodel species has focused on forest trees, where genetic associations have been discovered for a range of phenotypic traits (Neale, 2007; Neale & Ingvarsson, 2008), many of which are clearly adaptive (Neale & Kremer, 2011). Missing from most of these studies, however, are investigations of how genotypes relate to cellular phenotypes, such as gene expression and metabolite concentrations. Yet polymorphisms that affect cellular phenotypes often give the strongest signals in genome-wide association studies in humans (Nicolae et al., 2010), and analyses of such traits in plants have revealed novel links between genotypes and phenotypes (Kirst et al., 2004, 2005; Wentzell et al., 2007; West et al., 2007; Potokina et al., 2008; Chan et al., 2010a,b; Dorst et al., 2010).
A number of studies in plants have dissected transcript abundance (i.e. gene expression) into its genetic components (Joosen et al., 2009). As exemplified by studies of forest tree transcriptomes, these genetical genomics studies (Jansen & Nap, 2001) have focused on describing gene expression variation among individuals (Holliday et al., 2008; Palle et al., 2011), on analysing a set of candidate genes and the cis-acting regulatory polymorphisms affecting their expression (Thumma et al., 2009; Beaulieu et al., 2011), and on the quantitative genetic analysis of overall gene expression patterns (Kirst et al., 2004, 2005). These analyses have discovered a substantial number of expression quantitative trait loci (eQTLs), highlighted the importance of cis- and trans-regulation for transcript abundance, and revealed the complex genetic architecture underlying gene expression. This complexity, observed across a wide range of organisms, has been attributed to dominance effects, changes in the strength of purifying selection across gene networks, and epistatic interactions among genes comprising metabolic pathways (Kacser & Burns, 1973, 1981; Hartl et al., 1985; Keightley, 1989; Whitlock et al., 1995; Bost et al., 1999; Kondrashov & Koonin, 2004; Rowe et al., 2008; Ramsay et al., 2009). Links between eQTLs and metabolite QTLs have also revealed interactions among cellular phenotypes, suggesting that natural variation in metabolites may create feedback loops to the transcriptome (Wentzell et al., 2007). Cellular phenotypes beyond gene expression should therefore be investigated in scans that attempt to dissect adaptive traits into their genetic components.
The metabolome of a tree is the entire set of small-molecule metabolites produced through cellular processes. In model plants, quantitative genetic analysis of a range of metabolites has established that they also have a complex genetic architecture, and has identified polymorphisms underlying variation in metabolite concentrations (Schauer et al., 2008; Kliebenstein, 2009; Chan et al., 2010a,b). By contrast, previous work on metabolites in forest trees has focused primarily on profiling (Fiehn, 2002; but see Robinson et al., 2007; Külheim et al., 2011), in which metabolites are studied in an experimental design suited to testing specific hypotheses about their function. These studies have established correlations between metabolite concentrations and adaptive, whole-plant phenotypes such as drought-stress responses (e.g. Schwanz & Polle, 2001), seed dormancy (reviewed by Finkelstein et al., 2008) and disease resistance (reviewed by Witzell & Martin, 2008). The genetic basis of variation in metabolite concentrations, however, remains largely unknown for forest trees, despite its clear genetic basis in other plants (Kliebenstein et al., 2001; Keurentjes et al., 2006; Rowe et al., 2008; Kliebenstein, 2009; Chan et al., 2010a,b). An association analysis using a large and representative sample of the gene space and metabolome of a forest tree is therefore warranted.
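A first check on whether metabolite concentrations have a genetic basis is the size of clonal effects. Fig. S2 summarizes these as the fraction of phenotypic variance accounted for by clonal identifiers in an ANOVA with clone as a fixed effect. The following is a minimal Python sketch of that quantity, assuming replicate measurements (ramets) for each clone; the function name `clonal_h2` and the toy values are hypothetical illustrations, not the authors' code.

```python
from collections import defaultdict


def clonal_h2(clone_ids, values):
    """Fraction of phenotypic variance explained by clone identity.

    Computes SS_between / SS_total from a one-way ANOVA with clone as a
    fixed effect (an R^2-style measure of clonal effects, hypothetical helper).
    """
    if len(clone_ids) != len(values):
        raise ValueError("clone_ids and values must have the same length")
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("phenotype has zero variance")

    # Group replicate measurements (ramets) by clone.
    groups = defaultdict(list)
    for clone, v in zip(clone_ids, values):
        groups[clone].append(v)

    # Between-clone sum of squares: n_i * (clone mean - grand mean)^2.
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values()
    )
    return ss_between / ss_total


# Hypothetical concentrations for one metabolite, two ramets per clone.
clones = ["c1", "c1", "c2", "c2", "c3", "c3"]
conc = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
print(round(clonal_h2(clones, conc), 3))  # 0.985
```

Because this ratio is an ANOVA R² rather than a variance-components estimate, it does not adjust for the number of ramets per clone and tends to be inflated when replication is low; with a single ramet per clone it is trivially 1.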
Life history characteristics of forest trees make them amenable to complex trait dissection using forward genetic approaches (Neale & Savolainen, 2004). Genetic associations with cold-hardiness (Ingvarsson et al., 2008; Eckert et al., 2009a; Holliday et al., 2010), wood properties (Thumma et al., 2005, 2009; González-Martínez et al., 2007; Dillon et al., 2010; Beaulieu et al., 2011), lignin content (Wegrzyn et al., 2010), drought-stress responses (González-Martínez et al., 2008; Cumbie et al., 2010), disease resistance (Quesada et al., 2010) and secondary metabolites (Külheim et al., 2011) have been discovered for a variety of forest tree species. In general, the effect sizes are small, effects are additive, and associated markers span coding and noncoding portions of genes. These results are largely consistent with quantitative genetic theory describing the genetic architecture of complex traits (Hill et al., 2008).
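The additive and dominance effects mentioned above (and summarized for SNPs with three genotypic classes in Fig. S10) can be illustrated with the classical genotypic-value definitions: a = (mean_AA − mean_aa)/2 and d = mean_Aa − (mean_AA + mean_aa)/2, so a purely additive SNP has d = 0. This is a minimal Python sketch under those textbook definitions; the function name and data are hypothetical, and the estimator used in the paper (for example, any scaling by phenotypic standard deviation) may differ.

```python
from statistics import mean


def additive_dominance(phenotypes_by_genotype):
    """Estimate additive (a) and dominance (d) effects for one biallelic SNP.

    phenotypes_by_genotype maps the genotype classes "AA", "Aa" and "aa"
    to lists of phenotypic values (hypothetical input format).
    """
    for g in ("AA", "Aa", "aa"):
        if not phenotypes_by_genotype.get(g):
            raise ValueError(f"need at least one observation for {g}")
    m_AA = mean(phenotypes_by_genotype["AA"])
    m_Aa = mean(phenotypes_by_genotype["Aa"])
    m_aa = mean(phenotypes_by_genotype["aa"])
    a = (m_AA - m_aa) / 2          # half the difference between homozygotes
    d = m_Aa - (m_AA + m_aa) / 2   # heterozygote deviation from the midpoint
    return a, d


# Hypothetical data: the heterozygote sits at the homozygote midpoint,
# so the SNP acts purely additively.
a, d = additive_dominance({"AA": [4.0, 4.5], "Aa": [3.0, 3.5], "aa": [2.0, 2.5]})
print(a, d)  # 1.0 0.0
```

The ratio d/a summarizes the degree of dominance: values near 0 indicate additivity, and values near ±1 indicate complete dominance of one allele.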
Much of the association mapping in forest trees has focused on loblolly pine (Pinus taeda). This species is the most important commercial forest tree species in the southern United States, and extensive genetic resources have been developed for it, ranging from deep expressed sequence tag (EST) libraries, from which thousands of single nucleotide polymorphisms (SNPs) have been discovered through resequencing, to multiple association populations. Here, we take an association mapping approach to dissect the loblolly pine metabolome into its genetic components. We establish that concentrations of many metabolites are heritable and significantly associated with SNPs, and that multi-SNP models can explain large portions (i.e. > 50%) of these heritabilities. We also show that SNPs associated with at least one metabolite display nonrandom attributes relative to the entire dataset, and we discuss the relevance of this pattern to association mapping in nonmodel species.
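The supporting figures describe single-SNP tests in which genetic principal components (Fig. S5) are used to correct SNPs and phenotypes for ancestry before computing an (n − k − 1) r² statistic (Figs S6, S9), where n is the number of individuals and k the number of PCs. That resembles the EIGENSTRAT approach. The sketch below illustrates it under stated assumptions: genotypes are coded as minor-allele dosage (0/1/2), ancestry is removed by least-squares regression on the top k PCs, and the statistic is referred to a χ² distribution with 1 df. All function names and the simulated data are hypothetical.

```python
import math

import numpy as np


def ancestry_pcs(genotypes, k):
    """Top-k principal components (n x k) of a standardized genotype matrix.

    genotypes: n individuals x m SNPs, coded as minor-allele dosage 0/1/2.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)
    sd = centered.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing monomorphic SNPs by zero
    u, s, _ = np.linalg.svd(centered / sd, full_matrices=False)
    return u[:, :k] * s[:k]


def residualize(x, covariates):
    """Residuals of x after least-squares regression on intercept + covariates."""
    design = np.column_stack([np.ones(len(x)), covariates])
    beta, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ beta


def snp_association(snp, phenotype, pcs):
    """Ancestry-corrected (n - k - 1) * r^2 statistic and its chi-square(1) P-value."""
    snp = np.asarray(snp, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)
    n, k = len(phenotype), pcs.shape[1]
    g_res = residualize(snp, pcs)        # ancestry-corrected SNP
    y_res = residualize(phenotype, pcs)  # ancestry-corrected phenotype
    r = np.corrcoef(g_res, y_res)[0, 1]
    stat = (n - k - 1) * r**2
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Simulated example: 200 trees, 50 SNPs, a hypothetical metabolite driven by SNP 0.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(200, 50))
pcs = ancestry_pcs(G, k=2)
y = 0.8 * G[:, 0] + rng.normal(size=200)
stat, p = snp_association(G[:, 0], y, pcs)
print(f"stat={stat:.1f}, P={p:.2e}")
```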
Supporting Information
Fig. S1 Workflow for generation of the phenotypic and genotypic data sets for association mapping.
Fig. S2 The distribution of clonal effects (H²) as measured by the fraction of phenotypic variance accounted for by clonal identifiers in an ANOVA with clone as a fixed effect.
Fig. S3 Clonal means for the 292 metabolites were largely uncorrelated with one another, as assessed with Spearman’s rank correlation (ρ).
Fig. S4 Standardized Gene Ontology (GO) terms for the 1487 of 2488 EST contigs that had a significant BLAST hit to an Arabidopsis gene model annotated with a term nested under molecular function (GO:0003674).
Fig. S5 Pairwise plots of the top four genetic principal components (PCs) derived from a principal components analysis (PCA) on the full 3563 SNP data set.
Fig. S6 Cumulative distribution plots of P-values from single SNP association tests using the (n − k − 1) r² statistic.
Fig. S7 The distribution of minor allele frequencies (MAFs) for the top 24 SNPs, as identified using the FDR Q-value across 500 randomizations of phenotypic vectors relative to genotypic vectors.
Fig. S8 SNPs associated with at least one metabolite exhibited different minor allele frequencies (MAFs) and magnitudes of genetic differentiation (FST) among populations relative to the entire set of SNPs.
Fig. S9 Distribution of the fraction of clonal effects captured by the adjusted R² from a linear model relating multiple ancestry-corrected SNPs to ancestry-corrected phenotypes for unknown metabolites.
Fig. S10 Distributions of additive and dominance effect sizes for unknown and known metabolites for genetic associations involving SNPs with three genotypic categories.
Fig. S11 Additive and dominance effect sizes were correlated with one another and with the minor allele frequency (MAF).
Table S1 A summary of known metabolites detected using GC-TOF-MS
Table S2 A summary of unknown metabolites detected using GC-TOF-MS
Table S3 Summary of linkage disequilibrium among SNPs used for association mapping
Table S4 Summary of associations for mannitol as identified using Bayesian mixed linear models
Table S5 Metabolomic phenotype data
Table S6 SNP genotype data
Table S7 Attributes of the genetic loci containing the SNPs used for association mapping
Table S8 Genetic associations detected using a multilocus Bayesian model (BAMD)
Methods S1 Discovery of single nucleotide polymorphisms.
Methods S2 Functional annotation of EST contigs.
Methods S3 Site annotations of SNPs.
Methods S4 Mode of inheritance.
Methods S5 Randomization tests.
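Methods S5 and Fig. S7 refer to randomization tests in which phenotypic vectors are shuffled relative to genotypic vectors (500 randomizations). The following is a minimal Python sketch of that idea. It assumes the squared Pearson correlation as the test statistic (a hypothetical choice; the supporting methods may use a different one) and returns only an empirical P-value, not the FDR Q-values reported in Fig. S7. The function name and simulated data are hypothetical.

```python
import numpy as np


def permutation_pvalue(snp, phenotype, n_perm=500, seed=None):
    """Empirical P-value for a SNP-phenotype association by randomization.

    The phenotype vector is shuffled relative to the genotype vector
    n_perm times, and the squared Pearson correlation is recomputed each time.
    Uses the (1 + b) / (1 + n_perm) convention so P is never exactly 0.
    """
    g = np.asarray(snp, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(g, y)[0, 1] ** 2
    exceed = 0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)  # break any genotype-phenotype link
        if np.corrcoef(g, y_perm)[0, 1] ** 2 >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


# Hypothetical data: 150 trees, one SNP with a strong effect on a metabolite.
rng = np.random.default_rng(1)
snp = rng.binomial(2, 0.4, size=150)
metabolite = 1.0 * snp + rng.normal(size=150)
print(permutation_pvalue(snp, metabolite, n_perm=500, seed=2))
```

With 500 randomizations the smallest attainable P-value is 1/501, which is one reason multiple-testing control across many SNP-metabolite pairs typically needs an additional step, such as converting P-values to Q-values.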
Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.