Association genetics of oleoresin flow in loblolly pine: discovering genes and predicting phenotype for improved resistance to bark beetles and bioenergy potential


Author for correspondence:

John M. Davis

Tel: +1 352 846 0879



  • Rapidly enhancing oleoresin production in conifer stems through genomic selection and genetic engineering may increase resistance to bark beetles and terpenoid yield for liquid biofuels.
  • We integrated association genetic and genomic prediction analyses of oleoresin flow (g 24 h−1) using 4854 single nucleotide polymorphisms (SNPs) in expressed genes within a pedigreed population of loblolly pine (Pinus taeda) that was clonally replicated at three sites in the southeastern United States.
  • Additive genetic variation in oleoresin flow (h2 ≈ 0.12–0.30) was strongly correlated between years in which precipitation varied (ra ≈ 0.95), while the genetic correlation between sites declined from 0.8 to 0.37 with increasing differences in soil and climate among sites. A total of 231 SNPs were significantly associated with oleoresin flow, of which 81% were specific to individual sites. SNPs in sequences similar to ethylene signaling proteins, ABC transporters, and diterpenoid hydroxylases were associated with oleoresin flow across sites.
  • Despite this complex genetic architecture, we developed a genomic prediction model to accelerate breeding for enhanced oleoresin flow that is robust to environmental variation. Results imply that breeding could increase oleoresin flow 1.5- to 2.4-fold in one generation.

Introduction


In the last decade, outbreaks of bark beetles in coniferous forests of North America have caused unprecedented tree mortality and economic losses (Nowak et al., 2008; van Mantgem et al., 2009; Waring et al., 2009), converting forests that were previously atmospheric carbon sinks into carbon sources (Kurz et al., 2008). Native species of bark beetle rapidly kill healthy trees by aggregating on their hosts, boring into the stem, and vectoring pathogenic fungi that are tolerant of conifer defenses (Paine et al., 1997; Wang et al., 2013). Climate change is thought to have exacerbated tree mortality from bark beetle infestations by increasing the number of beetle generations yr–1, expanding the range of beetles and their associated pathogens, and by weakening host defenses (Raffa et al., 2008; Bentz et al., 2010).

Increasing oleoresin production in conifer stems through breeding and biotechnology may enhance baseline resistance to bark beetles in managed plantations (Phillips & Croteau, 1999). Oleoresin is a viscous mixture of terpenoids stored under positive pressure within resin canals in the stems of conifers (Trapp & Croteau, 2001). Oleoresin that flows from the stem upon wounding obstructs beetle entry and inhibits germination and growth of pathogenic fungi (Franceschi et al., 2005; Kopper et al., 2005). Among pines, survival after a bark beetle infestation is positively correlated with the rate of oleoresin flow (Strom et al., 2002) and the number of resin canals within the stem (Kane & Kolb, 2010). However, increased stem oleoresin production alone may not be sufficient to protect conifer stands from severe bark beetle population eruptions, where healthy trees with the greatest oleoresin flow can become the preferred hosts of beetles (Boone et al., 2011).

The potential to utilize terpenoids in liquid biofuels is an additional incentive to genetically enhance oleoresin production in conifer stems. Whereas the energy content of bioethanol is only 70% of that of gasoline (Peralta-Yahya & Keasling, 2010), advanced biofuels derived from terpenoids have similar energy content to gasoline and diesel, with densities and hygroscopicities that are amenable to blending with fossil fuels (Harvey et al., 2010; Peralta-Yahya et al., 2011). Conifers require fewer inputs of fertilizer and herbicide than annual food crops even under intensive management, and the net energy balance of producing cellulosic ethanol from conifer wood compares favorably with ethanol derived from maize starch (Evans & Cohen, 2009).

The capacity to genetically enhance terpenoid production in conifer stems is largely untapped. Under normal growing conditions, pines accumulate oleoresin to 1–5% of their stem mass, but stem oleoresin contents of 20% have been observed after treatment with chemical elicitors of resinosis (Stubbs et al., 1984; Wolter & Zinkel, 1984). Although it remains to be demonstrated whether these high terpenoid concentrations can be achieved through genetics, previous studies indicate that variation in oleoresin flow is heritable and positively correlated with growth (Squillace & Bengtson, 1961; Roberds et al., 2003; Romanelli & Sebbenn, 2004).

Developing a detailed knowledge of how allelic variation in conifer breeding populations relates to phenotypic variation in oleoresin flow can accelerate selective breeding for enhanced oleoresin production. Conifer breeding has traditionally relied on phenotypic characterization of the breeding population near the harvest age to infer genetic merit for selection, requiring ≥ 20 yr to complete one breeding cycle (White & Carson, 2004). Genomic selection, or the prediction of breeding values from the summed effects of a panel of genetic markers in linkage disequilibrium with alleles controlling a trait (Meuwissen et al., 2001), circumvents the need to phenotype the breeding population for each generation. When combined with top-grafting to induce early seed development, genomic selection could reduce the breeding cycle of Pinus taeda from 12–20 yr to 4–7 yr (Resende et al., 2012a).

Genetic engineering is an alternative strategy to rapidly increase oleoresin production in conifer stems. This approach may include overexpressing and increasing the catalytic efficiency of terpenoid biosynthetic enzymes (Aharoni et al., 2006; Leonard et al., 2010), down-regulating competing pathways (e.g. lignin biosynthesis), and reprogramming stem development to favor resin canal formation (Zulak & Bohlmann, 2010). While many genes in the terpenoid biosynthetic pathway have been cloned in conifers (Kim et al., 2008, 2009; Schmidt & Gershenzon, 2008; Hamberger et al., 2011; Keeling et al., 2011), the genes involved in the regulation of terpenoid synthesis (e.g. transcriptional and post-translational regulation), the development of resin canals, and the transport of oleoresin into resin canals have yet to be characterized. Coordinated up-regulation of the terpenoid biosynthetic pathway and resin canal development may be achieved through transgenic manipulation of regulatory genes, as has generally been suggested for the metabolic engineering of plant defense pathways (Jirschitzka et al., 2012).

The genomic selection and biotechnological approaches to increase oleoresin production in conifers are potentially complicated by genotype × environment (G × E) interactions. Substantial G × E in oleoresin flow was observed among families of Pinus sylvestris and Pinus elliottii planted at different sites (Bridgen, 1980; Romanelli & Sebbenn, 2004). For traits in which G × E is prevalent, the effects of alleles may depend on the environment (Gillespie & Turelli, 1989), which reduces the prediction accuracy of genomic selection models (Resende et al., 2012a) and leads to uncertainty about the performance of transgenic varieties in diverse field environments (Zeller et al., 2010).

We measured oleoresin flow in a pedigreed population of loblolly pine that was clonally replicated at three sites in the southeastern United States and estimated heritability, site × genotype interactions, and genetic correlations with growth. By comparing genetic correlations in oleoresin flow across years vs between sites, we assessed the relative effect of weather vs climate and soils, respectively, on the genetic control of oleoresin flow. We then used an association genetic approach to discover genes underlying additive genetic variation in oleoresin flow and compared their allelic effects among sites. We specifically tested for associations with single nucleotide polymorphisms (SNPs) in genes with potential roles in terpenoid biosynthesis. Finally, we used cross-validation to assess how accurately the significantly associated loci could predict additive genetic variation within and across sites.

Materials and Methods

Oleoresin flow phenotyping in the CCLONES population

Oleoresin flow and tree growth were measured in the loblolly pine (Pinus taeda L.) CCLONES (Comparing Clonal Lines ON Experimental Sites) series I population. The CCLONES population was generated from a circular mating design of 54 parents from Florida, the Atlantic coastal plain, and lower gulf regions of the southeastern United States. Each parent was crossed with two to six other parents to generate 71 full-sib families with 14–23 progeny per family (Baltunis et al., 2007; Munoz et al., 2013). Clones of these progeny were sampled for oleoresin flow in the summer of 2010, when the trees were in their seventh growing season, at three US field sites (Cuthbert, GA; Nassau, FL; and Palatka, FL) that varied in climate and soil properties (Table 1). The Nassau site was also sampled in the sixth growing season to estimate correlations among repeated measures of oleoresin flow between years that varied in precipitation (Table 1). Within each site, oleoresin flow was sampled from three intensively managed replicates of clonal genotypes (Baltunis et al., 2007). The position of clonal genotypes (ramets) within replicates was randomized to incomplete blocks of 10–14 trees according to a resolvable alpha-lattice design (Patterson & Williams, 1976).

Table 1. Description of the Pinus taeda CCLONES sites and oleoresin flow sampling

| | Cuthbert, GA | Nassau, FL | Palatka, FL |
|---|---|---|---|
| Location and date | | | |
| Sampling dates | 26–29 July 2010 | 20–23 July 2009; 19–22 July 2010 | 16–19 August 2010 |
| Climate and weather | | | |
| Annual minimum temperature | −9.5 to −12.5°C | −6.7 to −9.4°C | −3.9 to −6.7°C |
| Average temperature, 1 July–31 July in year of sampling | Year 7: 28.7°C | Year 6: 27.5°C; Year 7: 28.9°C | Year 7: 28.4°C |
| Precipitation, 1 February–31 July in year of sampling | Year 7: 547 mm | Year 6: 918 mm; Year 7: 537 mm | Year 6: 649 mm |
| Long-term average precipitation, 1 February–31 July | 737 mm | 707 mm | 635 mm |
| Soil properties | | | |
| Soil drainage | Well drained | Somewhat poorly drained | Poorly to very poorly drained |
| Soil taxonomy | Fine-loamy, kaolinitic, thermic Rhodic Kandiudult | Fine, mixed, thermic Typic Albaqualf | Sandy, siliceous, hyperthermic Ultic Alaquod |
| Stand properties – year 6 | | | |
| Trees ha−1 | 1653 | 1700 | 1725 |
| Basal area (m2 ha−1) | 130.4 | 129.3 | 91.4 |
| Oleoresin sampling design | | | |
| Nramets | Year 7: 2437 | Year 6: 1829; Year 7: 2575 | Year 7: 2586 |
| Nclones | Year 7: 847 | Year 6: 891; Year 7: 908 | Year 7: 913 |
| Ngenotyped | Year 7: 809 | Year 6: 836; Year 7: 850 | Year 7: 842 |

Precipitation and temperature data were compiled from weather stations within 20 km of the CCLONES sites. Annual minimum temperatures were derived from the USDA plant hardiness zones map. Nramets, number of ramets with oleoresin dry mass, total tree height (ht), and diameter at breast height (dbh) measurements (up to three ramets sampled per clone within a site); Nclones, number of clonal genotypes sampled; Ngenotyped, number of phenotyped clones with single nucleotide polymorphism (SNP) genotype information.

To sample oleoresin flow, a single circular wound, 1.27 cm in diameter, was made through the bark and phloem at breast height with an arch punch (Strom et al., 2002; Roberds et al., 2003). Immediately after wounding, taps with 15 ml tubes were affixed to the stem over the wound site and oleoresin samples were collected after 24 h. Oleoresin samples were weighed twice – within 1 wk of collection (wet mass) and after the samples were lyophilized for 3 d (dry mass) – to detect any potential bias resulting from occasional rainwater contamination and phenotyping errors (Supporting Information, Fig. S1a). The heritability was slightly higher for the dry mass measurements (Fig. S1b) and the additive genetic correlation between oleoresin wet mass and dry mass was 0.99, indicating that lyophilization did not change the genetic rankings of the clones. Therefore, oleoresin dry mass (in g) was used in quantitative genetic analyses. Diameter at breast height (dbh) and total tree height (ht) were measured in cm in the sixth growing season from the trees that were sampled for oleoresin.

SNP genotyping

A total of 7535 SNPs, discovered in expressed genes from loblolly pine of diverse origin (Eckert et al., 2010), were genotyped with the Illumina Infinium platform (Illumina, San Diego, CA, USA) in 926 clones. A subset of 4854 SNP loci that were polymorphic in CCLONES was utilized in association analyses and genomic prediction modeling. Mean and median missingness of SNP genotypes among loci were 3.81 and 1.3%, respectively.
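As a minimal sketch of how per-locus missingness summaries like those above can be computed, the example below counts missing calls at each locus and then takes the mean and median across loci. The function name, the `None` coding for missing calls, and the toy genotype values are assumptions made for illustration; they are not taken from the study's pipeline.

```python
from statistics import mean, median


def locus_missingness(genotypes):
    """Return the fraction of missing calls (None) at each locus.

    `genotypes` holds one row per clone; each row has one call per locus,
    coded 0/1/2, or None when the call is missing (hypothetical coding).
    """
    n_clones = len(genotypes)
    n_loci = len(genotypes[0])
    return [
        sum(row[j] is None for row in genotypes) / n_clones
        for j in range(n_loci)
    ]


# Toy data: 4 clones x 3 loci (hypothetical values).
calls = [
    [0, 1, None],
    [2, None, None],
    [1, 1, 0],
    [0, 2, None],
]
rates = locus_missingness(calls)
mean_rate = mean(rates)
median_rate = median(rates)
```

When a few loci have many missing calls, the mean exceeds the median, which is the pattern reported above.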

Quantitative genetic analysis

Oleoresin dry mass, dbh, and ht data were analyzed in ASReml v.3 (Gilmour et al., 2009) with the following multivariate mixed models:

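$$y = X_{\mu}\mu + X_{r}r + Z_{b}b + Z_{a}a + Z_{f}f + Z_{c}c + e$$

$$y = X_{\mu}\mu + X_{s}s + X_{r}r + Z_{b}b + Z_{a}a + Z_{f}f + Z_{c}c + Z_{sa}\mathit{sa} + Z_{sf}\mathit{sf} + e$$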

Data from individual sites were analyzed with the first model to estimate site-specific heritabilities and correlations among traits. The data from all sites were combined and analyzed with the second model to partition site × genotype interactions from genetic effects. A Box–Cox power transformation (Box & Cox, 1964) was applied to the oleoresin dry mass data (hereafter abbreviated as ‘tdm’ for transformed dry mass) to normalize the model residuals (Methods S1). For both the individual site and across site models, y is a 3 × Nramets (Table 1) matrix of observed phenotypes of tdm, dbh, and ht, and X and Z are incidence matrices associating phenotypic observations with fixed and random factors, respectively. The effects of fixed factors, which included μ (the overall trait means), s (site effects), and r (the effects of replicate within site), were assessed with approximate F-tests (Kenward & Roger, 1997). Random effects were modeled with unstructured variance-covariance matrices of tdm, dbh, and ht and assumed to be multivariate normal with mean 0. Random factors (followed by their phenotypic variance components) included b (incomplete block within site and replicate ($\sigma_b^2 I$)), a (additive genetic effect of clone ($\sigma_a^2 A_{obs}$)), f (nonadditive genetic effect of family ($\sigma_f^2 I$)), c (nonadditive genetic effect of clone ($\sigma_c^2 I$)), sa (site × additive genetic ($\sigma_{sa}^2 I$)), sf (site × family ($\sigma_{sf}^2 I$)), and e (error ($\sigma_e^2 I$)). Additive genetic variance ($\sigma_a^2$) and estimated breeding values (EBVs) for clones were calculated with the observed genomic relationship matrix (Aobs), which models identity-by-descent (IBD) as a result of pedigree as well as Mendelian segregation within families (Visscher et al., 2006). The observed relationship matrix was reconstructed from 4854 polymorphic SNP loci by modifying the method of Yang et al. (2010) such that Mendelian variance in marker-estimated IBD coefficients was estimated from the deviations from IBD expected from the pedigree. The individual site model was modified to estimate correlations between years at Nassau (y – tdm from years 6 and 7) and correlations between sites (y – tdm from Cuthbert, Nassau, and Palatka in year 7). Table S1 contains the formulas for estimating heritabilities and correlations. The significance of correlations was assessed by likelihood ratio tests (Methods S2).
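To make the idea of a marker-based relationship matrix concrete, the sketch below averages cross-products of standardized SNP genotypes over loci. This is a generic baseline in the spirit of marker-based estimators, not the modified procedure used in this study: the pedigree-based adjustment of Mendelian variance is not reproduced, and the diagonal here uses the same cross-product as the off-diagonal, which is a simplification. The function name and toy genotypes are hypothetical.

```python
def genomic_relationship(genotypes):
    """Marker-based relationship matrix from 0/1/2 allele counts.

    genotypes[i][j] is the allele count at locus j in individual i.
    Assumes no missing data and that every locus is polymorphic.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency p_j at each locus.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Standardize each genotype: (x - 2p) / sqrt(2p(1 - p)).
    z = [
        [(row[j] - 2 * p) / (2 * p * (1 - p)) ** 0.5
         for j, p in enumerate(freqs)]
        for row in genotypes
    ]
    # Average cross-product of standardized genotypes over loci.
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / m for k in range(n)]
        for i in range(n)
    ]


# Toy data: 4 individuals x 3 loci (hypothetical values).
toy_genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
]
grm = genomic_relationship(toy_genotypes)
```

Because genotypes are centered on the sample allele frequencies, each row of the resulting matrix sums to zero, so relationships are expressed relative to the average of the sampled population.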

Association genetic analysis

Analysis to detect significant associations between individual SNP loci and additive genetic variation in tdm within and across sites was carried out in two stages. First, all 4854 SNPs were ranked according to the magnitude of their effect on additive genetic variation in tdm. A total of 400 SNPs, which included the most highly ranked loci and 16 SNPs in sequences similar to terpenoid biosynthetic enzymes, were selected for the second stage where missing SNP genotypes were imputed and the significance of SNP effects was assessed in a multiple regression framework with the Bayesian Association with Missing Data (BAMD) module for R (Gopal et al., 2011; Li et al., 2012). The purpose of SNP preselection was to reduce multicollinearity and computational times in the BAMD analyses (Li et al., 2012).

We tested three SNP preselection methods (additive genetic variance reduction, ridge regression, and Bayes Cπ). Additive genetic variance reduction tests each SNP individually for the proportion of additive genetic variance explained, while ridge regression and Bayes Cπ estimate SNP effects simultaneously with multiple regression. Ridge regression shrinks SNP effects toward zero and assumes that the effects are normally distributed, whereas Bayes Cπ estimates the proportion of SNPs with zero effect, and then assumes that effects of the remaining SNPs are multivariate Student's t distributed (Habier et al., 2011; Methods S3). Among preselection methods, SNPs selected with Bayes Cπ that were significant in BAMD analysis most accurately predicted additive genetic variation in tdm in cross-validation (Fig. S2). Therefore, only the association results using SNPs selected with Bayes Cπ are reported.

Association analyses were conducted in BAMD with the model:

$$y = X\gamma + \varepsilon$$

The response variable y comprised the best linear unbiased predictions (BLUPs) of estimated breeding values (EBVs) of tdm divided by their reliabilities (hereafter referred to as deregressed EBVs). Dividing EBVs by their reliability ($r_i^2$, computed from SEPi, the standard error of prediction for clone i = 1…Nclones; 0 < $r_i^2$ < 1) corrected for variable shrinkage of BLUPs as a result of imbalance in the number of phenotypic observations per clone and in the amount of information available from relatives (Garrick et al., 2009). No subpopulation structure was detected in CCLONES (Methods S4); therefore, it was unnecessary to include an additional effect of subpopulation to control for spurious associations arising from population structure (Pritchard et al., 2000).
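The deregression step can be sketched as follows. The reliability formula used here, one minus the squared standard error of prediction divided by the additive genetic variance, is the common textbook form and is an assumption; the exact expression used in the study is not recoverable from the text above. Function names and numeric values are hypothetical.

```python
def reliability(sep, sigma2_a):
    """Reliability of an EBV, assumed here to be 1 - SEP^2 / sigma_a^2."""
    r2 = 1.0 - sep ** 2 / sigma2_a
    if not 0.0 < r2 < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    return r2


def deregress(ebv, sep, sigma2_a):
    """Divide an EBV by its reliability to undo BLUP shrinkage."""
    return ebv / reliability(sep, sigma2_a)


# Two clones with the same EBV but different amounts of information
# (hypothetical values; additive genetic variance set to 1.0).
sigma2_a = 1.0
well_measured = deregress(0.3, 0.5, sigma2_a)    # reliability 0.75
poorly_measured = deregress(0.3, 0.8, sigma2_a)  # reliability 0.36
```

The clone with the larger standard error of prediction was shrunk more strongly toward zero by BLUP, so its EBV is scaled up more when deregressed.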

The SNP effects were modeled by the term $X\gamma$. The incidence matrix X contained the SNP genotypes of the 400 preselected loci. BAMD iteratively imputed missing genotypes with a Gibbs sampler over 50 000 generations based on correlations among the observed SNP values, phenotypic observations, and the additive genetic relationships (Li et al., 2012). After each imputation, SNP effects were estimated assuming a normal distribution of effect sizes. An SNP locus was deemed to have a significant effect on y if the 95% confidence interval of estimated effects over the last 40 000 generations did not intersect zero. Significant loci were ranked according to the magnitude of their standardized effects γstd = ½ (CIupper + CIlower)/σa, where CIupper and CIlower are the upper and lower bounds of the 95% confidence interval of effect size, and σa is the additive genetic standard deviation of the trait.
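The significance rule and the standardized effect can be illustrated with a short post-processing sketch. It takes a chain of sampled effects for one SNP, discards an initial burn-in, takes the 2.5th and 97.5th percentiles as interval bounds, and applies the γstd formula given above. It does not call BAMD; the function name, the percentile method, and the toy chains are assumptions for illustration.

```python
from statistics import quantiles


def summarize_snp(draws, sigma_a, burn_in):
    """Summarize sampled effects of one SNP after discarding burn-in.

    Returns (ci_lower, ci_upper, significant, gamma_std), where the
    interval bounds are the 2.5th and 97.5th percentiles of the kept draws.
    """
    kept = draws[burn_in:]
    cuts = quantiles(kept, n=40, method="inclusive")  # steps of 2.5%
    ci_lower, ci_upper = cuts[0], cuts[-1]
    # Significant when the interval does not intersect zero.
    significant = ci_lower > 0 or ci_upper < 0
    # Interval midpoint scaled by the additive genetic standard deviation.
    gamma_std = 0.5 * (ci_upper + ci_lower) / sigma_a
    return ci_lower, ci_upper, significant, gamma_std


# Toy chains (hypothetical): 10 burn-in draws, then evenly spaced draws.
positive_chain = [0.0] * 10 + [0.1 + 0.2 * i / 200 for i in range(201)]
null_chain = [0.0] * 10 + [-0.1 + 0.2 * i / 200 for i in range(201)]

positive_summary = summarize_snp(positive_chain, sigma_a=2.0, burn_in=10)
null_summary = summarize_snp(null_chain, sigma_a=2.0, burn_in=10)
```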

Population structure from the pedigree and uncertainty in the estimation of breeding values were modeled in the error term, ε ~ N(0, $\sigma^2$R), where R was constructed from Aexp and W. The expected relationship matrix (Aexp) modeled correlation among the residuals as a result of pedigree structure and the diagonal matrix W weighted deregressed EBVs in inverse proportion to the BLUP variances after factoring out error variance as a result of nongenetic factors (Garrick et al., 2009).

Functional annotation and mapping of significant loci

Expressed sequence tags that contained the significant SNPs were assessed for similarity to genes of known function with Blast2GO (Conesa et al., 2005). A subset of the significant SNPs from each site were located on a P. taeda genetic map from Eckert et al. (2010), which included 1495 of the 4854 SNPs used in association analysis.

Prediction of breeding values from genetic markers within and among sites

Prediction of breeding values from marker genotypes within sites was implemented in R (R Core Team, 2012) with 10-fold cross-validation as described in Resende et al. (2012a). Briefly, clones present within each site were randomly divided into 10 groups. Marker effects were estimated with ridge regression of marker genotypes on deregressed EBVs from nine-tenths of the population. Parental effects were removed from the deregressed EBVs to correct for gametic phase disequilibrium of unlinked markers arising from pedigree structure (Garrick et al., 2009). Breeding values from the remaining one-tenth of the population were predicted by multiplying the incidence matrix of marker genotypes by the vector of estimated marker effects. This procedure was repeated 10 times to predict breeding values for all clones present within a site. To predict breeding values among sites, the genotypes of markers significantly associated with tdm were regressed on EBVs from the site where they were significant. The estimated marker effects were then used to predict the breeding values from other sites. The accuracy of prediction within and among sites was assessed with the Pearson correlation between deregressed EBVs and SNP-estimated breeding values (rEBV, GEBV).
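The within-site cross-validation procedure can be sketched in a few lines. The example below is a simplified stand-in rather than the study's implementation: it fits ridge regression without an intercept on centered genotypes, uses a fixed penalty, and omits the removal of parental effects. All function names, the penalty value, and the simulated data are hypothetical.

```python
import random
from statistics import mean


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(x, y, lam):
    """Marker effects b minimizing |y - x b|^2 + lam * |b|^2."""
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def cross_validated_accuracy(x, y, lam=1.0, k=10, seed=0):
    """Predict each clone from a model trained on the other k-1 folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    pred = [0.0] * len(y)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        beta = ridge_fit([x[i] for i in train], [y[i] for i in train], lam)
        for i in held_out:
            pred[i] = sum(g * b for g, b in zip(x[i], beta))
    return pearson(y, pred)


# Simulated example: 60 clones, 5 markers coded -1/0/1 (hypothetical).
rng = random.Random(1)
true_effects = [0.5, -0.3, 0.2, 0.0, 0.4]
genotypes = [[rng.choice([-1, 0, 1]) for _ in true_effects] for _ in range(60)]
debvs = [sum(g * b for g, b in zip(row, true_effects)) + rng.gauss(0, 0.1)
         for row in genotypes]
accuracy = cross_validated_accuracy(genotypes, debvs)
```

Because every clone is predicted only from a model that never saw it, the resulting correlation estimates how well marker effects generalize rather than how well they fit.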


Results

Phenotypic variation in constitutive oleoresin flow and cumulative growth

The untransformed oleoresin dry mass distributions were positively skewed at all sites (Fig. 1a), but were approximately normally distributed after transformation. Transformed oleoresin dry mass (tdm) differed significantly among sites (P < 0.001 for all pairwise site contrasts, Table S2), but site rankings for tdm from year 7 (Nassau > Palatka > Cuthbert) did not correspond to site rankings for dbh (Cuthbert > Nassau > Palatka) (Fig. 1b). At Nassau, where oleoresin flow was measured in consecutive years, mean tdm was significantly greater in year 7, during a dry summer, relative to the wet growing season in year 6 (Tables 1, S2).
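The transformation referred to here is the Box–Cox power transformation described in the Methods. A minimal sketch of the transform and its inverse is given below. The exponent used in the study is documented in Methods S1 and is not reproduced here, so the value of `lam` in the example is purely hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with exponent lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


# Hypothetical exponent and dry masses (g); exponents below 1 compress
# the long right tail of a positively skewed distribution.
lam = 0.25
dry_mass = [0.5, 1.0, 2.0, 8.0]
tdm = [box_cox(y, lam) for y in dry_mass]
recovered = [inverse_box_cox(z, lam) for z in tdm]
```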

Figure 1.

Boxplot distributions of oleoresin dry mass by site and year (a) and diameter at breast height by site (b) measured at age 6 yr in the Pinus taeda CCLONES population. Outlying data points, denoted by circles, fall outside the interval defined by the interquartile range (IQR) ± 1.5 × IQR.

Genetic and site × genotype effects on constitutive oleoresin flow

Within sites, additive genetic variance ($\sigma_a^2$) accounted for 11.8–30.3% of the phenotypic variance in tdm, and nonadditive genetic variance explained an additional 8.4–17.1% of the phenotypic variance. Across sites, phenotypic variance in tdm consisted of 11.8% additive genetic variance, 9.4% nonadditive genetic variance, and 6.9% site × genotype variance. The heritabilities and site × genotype effects on tdm were comparable to those of dbh and ht (Fig. 2).

Figure 2.

Proportion of phenotypic variance in transformed oleoresin dry mass (tdm), diameter at breast height (dbh), and total tree height (ht) comprising additive genetic, nonadditive genetic, site × genotype, incomplete block, and environmental error variance components estimated in the Pinus taeda CCLONES population. Refer to Table S1 for equations to estimate variance proportions.

Repeated measurements of tdm at the Nassau site were moderately correlated between years 6 and 7 (rp = 0.41, Fig. 3a), in which precipitation varied (Table 1); however, strong additive (ra = 0.95) and total genetic (rg = 0.90) correlations were observed across years (Fig. 3b). Stochastic variation in tdm between years was attributable mainly to environmental effects (Fig. 3c, Table 1). By contrast, the additive genetic correlation of tdm between sites declined from 0.8 to 0.37 (Fig. 4) with increasing differences in soil and climate among sites (Table 1).
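For orientation, the additive genetic correlation between the same trait measured in two environments (years or sites, indexed 1 and 2) is conventionally defined as shown below. This is the textbook definition and is given here as an assumption; the estimators actually used are those in Table S1, which is not reproduced in this text. The subscript notation is introduced only for this illustration.

```latex
r_a = \frac{\sigma_{a(1,2)}}{\sqrt{\sigma^2_{a(1)}\,\sigma^2_{a(2)}}}
```

Here $\sigma_{a(1,2)}$ is the additive genetic covariance between environments and $\sigma^2_{a(1)}$, $\sigma^2_{a(2)}$ are the additive genetic variances within each. Because additive genetic variance made up only a minority of the phenotypic variance in tdm, a high additive genetic correlation between years can coexist with a moderate phenotypic correlation.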

Figure 3.

Phenotypic, genetic, and environmental correlations ± 1 SE of transformed constitutive oleoresin dry mass (tdm) measured from the same ramets in consecutive years (ages 6 and 7 yr) in the Pinus taeda CCLONES population at Nassau, FL, USA. (a) Phenotypic correlation (rp) adjusted for the fixed effect of replication; (b) genetic correlation between years (ra, additive genetic; rg, total genetic) visualized by plotting tdm breeding values; and (c) environmental correlation (re) visualized by plotting the model residuals. All correlations were estimated according to equations in Table S1 and were significant (P < 0.05) according to the likelihood ratio test.

Figure 4.

Genetic correlations (ra, additive genetic; rg, total genetic) ± 1 SE between transformed oleoresin dry mass (tdm) estimated from Pinus taeda CCLONES genotypes that were clonally replicated at different sites. (a–c) Panels are ordered from sites that are closest geographically to sites that are most distant. Breeding values of tdm are plotted to visualize the genetic correlations between sites, which were estimated from the equations in Table S1. All correlations were significant (P < 0.05) according to the likelihood ratio test.

Correlations between oleoresin flow and cumulative growth

Genetic correlations between oleoresin dry mass and cumulative tree growth (i.e. ht and dbh in an even-aged stand) were positive at Nassau and Palatka, but were not significant at Cuthbert and in the combined site analysis. Site × genotype and environmental effects on tdm and growth were also positively correlated. Generally, correlations between tdm and tree size were weaker than correlations between dbh and ht (Table 2).

Table 2. Correlations (± 1 SE) between transformed oleoresin dry mass and cumulative growth in the Pinus taeda CCLONES population

| | ra | rg | re | rp | rsg |
|---|---|---|---|---|---|
| Cuthbert, GA | | | | | |
| tdm vs dbh | 0.26 ± 0.14 | 0.17 ± 0.09 | 0.20 ± 0.03* | 0.19 ± 0.03* | |
| tdm vs ht | 0.16 ± 0.25 | 0.18 ± 0.14 | 0.19 ± 0.03* | 0.19 ± 0.03* | |
| ht vs dbh | 0.73 ± 0.11* | 0.72 ± 0.04* | 0.65 ± 0.02* | 0.67 ± 0.02* | |
| Nassau, FL | | | | | |
| tdm vs dbh | 0.43 ± 0.13* | 0.39 ± 0.07* | 0.27 ± 0.03* | 0.31 ± 0.03* | |
| tdm vs ht | 0.55 ± 0.15* | 0.34 ± 0.08* | 0.19 ± 0.02* | 0.24 ± 0.03* | |
| ht vs dbh | 0.67 ± 0.10* | 0.71 ± 0.04* | 0.63 ± 0.02* | 0.65 ± 0.02* | |
| Palatka, FL | | | | | |
| tdm vs dbh | 0.19 ± 0.18 | 0.26 ± 0.08* | 0.32 ± 0.02* | 0.32 ± 0.02* | |
| tdm vs ht | 0.39 ± 0.19* | 0.39 ± 0.07* | 0.31 ± 0.02* | 0.31 ± 0.02* | |
| ht vs dbh | 0.86 ± 0.04* | 0.82 ± 0.03* | 0.83 ± 0.01* | 0.83 ± 0.01* | |
| Across sites | | | | | |
| tdm vs dbh | 0.15 ± 0.13 | 0.22 ± 0.07* | | | 0.40 ± 0.11* |
| tdm vs ht | 0.27 ± 0.15 | 0.31 ± 0.07* | | | 0.36 ± 0.12* |
| ht vs dbh | 0.75 ± 0.06* | 0.75 ± 0.03* | | | 0.72 ± 0.06* |

An asterisk (*) next to a correlation indicates that the correlation was significant (P < 0.05) by likelihood ratio test (Methods S2). Environmental and phenotypic correlations were not estimated across sites because environmental deviations were assumed to be independent among sites. tdm, transformed oleoresin dry mass; ht, total tree height; dbh, diameter at breast height; ra, additive genetic correlation; rg, total genetic correlation; re, environmental correlation (within sites); rp, phenotypic correlation (within sites); rsg, site × genotype correlation (across sites).

Associations between SNPs and additive genetic variation in oleoresin flow

Between 41 and 65 SNPs were associated with tdm at individual sites, and 73 SNPs were associated with tdm in the across-site analysis (Table 3). While few significant tdm associations were shared between sites (two to six common associations), greater numbers of common associations were detected between years at Nassau (10 common associations) and between individual sites and the across-site analysis (eight to 20 common associations). The additive genetic effects of individual SNP loci were small, ranging from 0.05 to 0.15 additive genetic standard deviations (Table S3). No changes in the sign of the significant SNP effects were detected between sites, and correlations in the magnitude of common SNP effects varied from −0.375 to 0.9 among site pairs (Table 3). A total of 231 SNP loci were associated with tdm among all analyses (Table S3), and 132 of these loci were located on the genetic map from Eckert et al. (2010). Mapped SNPs that were significantly associated with tdm were scattered throughout the genome (Fig. 5).

Table 3. Single nucleotide polymorphisms (SNPs) associated with transformed oleoresin dry mass within and between sites in the Pinus taeda CCLONES population

| | Across sites | Cuthbert | Nassau, year 6 | Nassau, year 7 | Palatka |
|---|---|---|---|---|---|
| Across sites | 73 | 0.038 | 0.900 | 0.478 | 0.551 |
| Nassau, year 6 | 8 | 2 | 41 | 0.354 | 0.169 |
| Nassau, year 7 | 20 | 5 | 10 | 65 | 0.120 |

Diagonal elements are the number of SNPs significantly associated with transformed oleoresin dry mass (tdm) at each site. Elements below the diagonal are the number of loci that were significant in pairs of association analyses. Elements above the diagonal are the correlations among the magnitude (i.e. absolute values) of the effects of SNPs that were significant between analyses.

Figure 5.

Genetic map positions of single nucleotide polymorphisms (SNPs) significantly associated with transformed oleoresin dry mass and their effects by site. Of the 231 loci that were significantly associated with transformed oleoresin dry mass (tdm) among all analyses, 132 were mapped on the Pinus taeda linkage map from Eckert et al. (2010). SNP effects are in units of additive genetic standard deviations. Vertical dashed lines demarcate linkage groups (LG1–LG12).

Many significant tdm associations were unique to specific sites (Table 3; Fig. 5). Site-specific associations may be attributed to unbalanced representation of clonal genotypes at different sites, statistical significance of different SNP loci linked to the same quantitative trait locus (QTL) at different sites, and site × QTL interactions. To control for imbalance in clonal genotypes among sites, the association analyses of tdm from individual sites were repeated with 722 clones that were present at all three sites. Furthermore, a common set of 157 mapped SNPs, where adjacent loci were > 10 cM apart (12.5 cM average distance between adjacent loci), were utilized in the association analysis of each site to control for linkage and SNP preselection. After implementing these controls, 81% of the tdm associations were site-specific (Table S4), supporting the hypothesis that site-specific associations are attributable to site × QTL interactions.
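One way to obtain a common marker set in which adjacent mapped loci are more than a fixed distance apart is a greedy pass along each linkage group, as sketched below. The text above does not say how the 157 loci were chosen, so this is an illustrative assumption rather than the study's procedure; the function name, marker names, and positions are hypothetical.

```python
def thin_markers(markers, min_gap_cm=10.0):
    """Greedy left-to-right thinning within each linkage group.

    `markers` is an iterable of (name, linkage_group, position_cM).
    A marker is kept only if it lies more than `min_gap_cm` beyond the
    last kept marker on the same linkage group.
    """
    kept = []
    last_pos = {}
    for name, lg, pos in sorted(markers, key=lambda m: (m[1], m[2])):
        if lg not in last_pos or pos - last_pos[lg] > min_gap_cm:
            kept.append(name)
            last_pos[lg] = pos
    return kept


# Toy map (hypothetical marker names and positions in cM).
toy_map = [
    ("s1", 1, 0.0),
    ("s2", 1, 4.0),
    ("s3", 1, 10.5),
    ("s4", 1, 12.0),
    ("s5", 1, 21.0),
    ("s6", 2, 3.0),
    ("s7", 2, 13.0),
]
thinned = thin_markers(toy_map)
```

Spacing markers this way reduces the chance that two significant SNPs at different sites are simply tagging the same QTL through linkage.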

Within sites, the significant SNPs predicted additive genetic variation in tdm more accurately than randomly selected SNPs (Fig. 6). Maximum prediction accuracy was reached before the stepwise inclusion of all SNPs that were associated with tdm, and was greater than the accuracies obtained from the estimated effects of all 4854 polymorphic SNPs. Furthermore, SNPs associated with tdm in year 6 were significant predictors of tdm breeding values from year 7 at Nassau (Fig. 6b).

Figure 6.

Accuracies by which the summed effects of single nucleotide polymorphisms (SNPs) associated with transformed oleoresin dry mass (tdm) predicted tdm breeding values from year 7 within individual sites (a–c) and across sites (d). Prediction accuracies, measured as the Pearson correlation between genomic estimated breeding values (GEBV) and estimated breeding values from quantitative genetic analyses (EBV), were computed with 10-fold cross-validation. The prediction accuracies of SNPs associated with tdm in year 7 were compared with prediction accuracies of SNPs associated with tdm in year 6 (c), accuracies obtained from fitting all 4854 polymorphic SNP loci, and accuracies from randomly sampled loci. The 95% confidence intervals (CIs) of prediction accuracy from random loci were constructed by randomly sampling SNP subsets with replacement 1000 times. The maximum values on the x-axes are numbers of loci significantly associated with tdm within sites (Nsig). SNP loci were added to the prediction model in decreasing order of their average effect on tdm among 10 random partitions of the Pinus taeda CCLONES population.

The accuracy by which significant SNPs from one site predicted additive genetic variation from another site (Fig. 7) was proportional to the additive genetic correlation (ra) between sites (Fig. 4). Cuthbert and Palatka, with an additive genetic correlation of 0.37, had the lowest between-site prediction accuracies (rEBV,GEBV = 0.33–0.36). By contrast, Nassau and Palatka, with an additive genetic correlation of 0.80, had the greatest between-site prediction accuracies (rEBV,GEBV = 0.42–0.43) and the prediction accuracy between years at Nassau (ra = 0.95; Fig. 3) varied from 0.4 to 0.5. Notably, the SNPs that were significant in the across-site association analysis (Table S3), in which site × genotype effects were partitioned from additive genetic variation (Fig. 2), predicted breeding values from single sites with accuracies (rEBV,GEBV = 0.42–0.56) similar to within-site prediction accuracies (rEBV,GEBV = 0.51–0.62).

Figure 7.

Prediction accuracies of single nucleotide polymorphism (SNP) loci significantly associated with transformed oleoresin dry mass (tdm) across sites and years in the Pinus taeda CCLONES population. The effects of significant loci were estimated with ridge regression on deregressed estimated breeding values (EBVs) from the site where the loci were significant. These locus effects were then used to predict the deregressed EBVs from another site. Prediction accuracies of SNP loci within sites where the loci were significant (circular arrows) were estimated with 10-fold cross-validation (Fig. 6). CUT, Cuthbert, GA; NAS, Nassau, FL; PAL, Palatka, FL; ALL, across-site analysis.
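The between-site procedure in this caption (estimate locus effects at one site, then predict breeding values at another) can be illustrated with a small simulation. This is a hedged sketch rather than the analysis used in the study. The helper names, the ridge penalty `lam`, the simulated genotypes, and the way correlated marker effects are constructed are all assumptions. The only values taken from the text are the two genetic correlations used as inputs (0.80 and 0.37).

```python
import numpy as np

def ridge_effects(X, y, lam=1.0):
    """Ridge estimates of marker effects from centered genotypes and values."""
    Xc = X - X.mean(axis=0)
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]),
                           Xc.T @ (y - y.mean()))

def correlated_effects(a, r, rng):
    """Effects b with the same norm as a and cosine similarity r to a."""
    e = rng.normal(size=a.size)
    e -= (e @ a) / (a @ a) * a                  # remove the component along a
    e *= np.linalg.norm(a) / np.linalg.norm(e)  # match the norm of a
    return r * a + np.sqrt(1.0 - r**2) * e

def cross_site_accuracy(r, n=400, p=40, seed=1):
    """Train on site A, predict site B; r sets the similarity of marker effects."""
    rng = np.random.default_rng(seed)
    # The same clones are grown at both sites, so genotypes are shared.
    X = rng.integers(0, 3, size=(n, p)).astype(float)
    effects_a = rng.normal(size=p)
    effects_b = correlated_effects(effects_a, r, rng)
    y_a = X @ effects_a + rng.normal(scale=2.0, size=n)
    y_b = X @ effects_b + rng.normal(scale=2.0, size=n)
    beta_a = ridge_effects(X, y_a)                 # effects estimated at site A
    gebv_b = (X - X.mean(axis=0)) @ beta_a         # applied to site B
    return np.corrcoef(gebv_b, y_b)[0, 1]

acc_high = cross_site_accuracy(0.80)
acc_low = cross_site_accuracy(0.37)
print(f"accuracy at r = 0.80: {acc_high:.2f}; at r = 0.37: {acc_low:.2f}")
```

In this simplified setting, with independent markers of similar variance, the cosine similarity of the two effect vectors approximates the genetic correlation between sites, so prediction accuracy falls as that correlation falls.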

Associations with SNPs in sequences similar to terpene biosynthetic genes

We tested for tdm associations with SNPs in sequences similar to four of seven steps of the DXP pathway, three geranyl-geranyl pyrophosphate synthases (GGPPS), one terpene synthase, and seven cytochrome P450s (Table S5). The DXP pathway synthesizes the five-carbon isoprenoid precursors of mono- and diterpenoids (Rodriguez-Concepcion & Boronat, 2002); GGPPS condenses isoprenoids into the 20-carbon skeletons of diterpenoids (Schmidt & Gershenzon, 2008); terpene synthases are a large gene family in conifers that synthesize diverse cyclic terpenoids (Bohlmann et al., 1998; Keeling et al., 2011); and cytochrome P450s (CYP450) catalyze the final oxidation steps in diterpenoid resin acid synthesis (Ro et al., 2005; Hamberger et al., 2011). Only SNPs in two CYP450s were significantly associated with tdm. One significant CYP450 (EST contig 0_14468; Table S3) was similar to 3-epi-6-deoxocathasterone 23-monooxygenase, a gene involved in brassinosteroid biosynthesis (Kim et al., 2005). The other CYP450 (contig 2_9017) was similar to taxane 13-alpha-hydroxylase, a gene involved in the synthesis of the diterpenoid taxol (Jennewein et al., 2000).


Phenotypic variation in oleoresin flow is heritable

Our results confirm that variation in oleoresin flow is heritable in loblolly pine and can therefore be increased through selective breeding. Although additive genetic variation accounted for only 12–30% of phenotypic variation within sites and 12% of the variation across sites (Fig. 2), we predict that oleoresin flow could be increased 1.5- to 2.4-fold in one generation by crossing clones in the 90th to 99th percentile of the additive genetic distribution (Methods S5, Table S6). These large predicted gains from selection are the result of a strong positive skew in the phenotypic distribution of oleoresin flow (Fig. 1; Roberds et al., 2003).
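To show why selection on a positively skewed trait can yield large fold changes, the following sketch computes the expected gain from crossing clones above a percentile threshold. It does not reproduce Methods S5. It assumes that breeding values are normally distributed on a log-transformed scale, that selection acts on true breeding values, and that progeny inherit the mean breeding value of the selected parents. The additive standard deviation `sigma_a = 0.3` is a hypothetical value, and the function names are illustrative.

```python
from math import exp
from statistics import NormalDist

def selection_intensity(top_fraction):
    """Mean of a standard normal truncated to its upper `top_fraction` tail."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - top_fraction)
    return std_normal.pdf(threshold) / top_fraction

def fold_gain(top_fraction, sigma_a):
    """Fold change on the original scale, assuming a log-transformed trait."""
    # A shift of i * sigma_a on the log scale is multiplicative once back-transformed.
    return exp(selection_intensity(top_fraction) * sigma_a)

sigma_a = 0.3  # hypothetical additive standard deviation on the log scale
for fraction in (0.10, 0.01):
    print(f"top {fraction:.0%}: intensity = {selection_intensity(fraction):.3f}, "
          f"fold gain = {fold_gain(fraction, sigma_a):.2f}")
```

Under these assumptions, an additive shift on the transformed scale becomes a multiplicative change after back-transformation, which is how a trait with modest heritability can still show a large proportional response to selection.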

The within-site broad-sense heritability estimates reported here (H2 = 0.25–0.38; Fig. 2) are lower than previous heritability estimates of oleoresin flow from a loblolly pine progeny test grown on one site in Florida (H2 = 0.66–0.69; Roberds et al., 2003). Roberds et al. (2003) averaged two observations of oleoresin mass per tree before estimating heritability, which reduced the within-individual component of variation. By contrast, we sampled each tree once per time point, but sampled up to three clonal replicates of each genotype per site. Environmental variation within and between cloned genotypes may have reduced our heritability estimates in comparison to Roberds et al. (2003), but clonal replication yielded more precise estimates of breeding values and a way to examine genotype × environment interactions (Gezan et al., 2006; Baltunis et al., 2007).
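The effect of averaging repeated observations on heritability can be made concrete with a short calculation. The sketch below uses the standard expression for the heritability of a mean of `n_obs` measurements per tree. The variance components are hypothetical and are not estimates from either study.

```python
def broad_sense_h2(var_g, var_env, var_within, n_obs=1):
    """Broad-sense heritability of the mean of n_obs observations per tree.

    var_g      : genetic variance among genotypes
    var_env    : environmental variance among trees
    var_within : within-tree variance among repeated observations
    """
    # Averaging n_obs observations divides the within-tree variance by n_obs.
    return var_g / (var_g + var_env + var_within / n_obs)

# Hypothetical variance components, for illustration only.
h2_single = broad_sense_h2(var_g=1.0, var_env=1.0, var_within=2.0, n_obs=1)
h2_mean_of_two = broad_sense_h2(var_g=1.0, var_env=1.0, var_within=2.0, n_obs=2)
print(f"one observation: {h2_single:.2f}; mean of two: {h2_mean_of_two:.2f}")
```

With the genetic variance held constant, the estimate rises solely because the within-tree component in the denominator shrinks. This is the mechanism proposed above for the higher values reported by Roberds et al. (2003).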

Effects of weather vs climate and soils on the genetic control of oleoresin flow

Within the Nassau site, average oleoresin flow was greatest in the seventh growing season (Fig. 1a; Table S2), which was relatively dry compared with the previous year (Table 1). This result supports the hypothesis that moderate water deficit increases oleoresin flow by limiting growth more than photosynthesis (Lorio, 1986; Lorio & Sommers, 1986). The strong genetic correlation in tdm between years at Nassau (Fig. 3b) indicates that oleoresin flow in loblolly pine genotypes of diverse origin responds similarly to variation in weather. By contrast, substantial site × genotype interactions in oleoresin flow were observed between Cuthbert and Palatka, the sites that were most distinct in climate and soils (Fig. 4c, Table 1). Together, these trends indicate that soil and climatic variation are stronger drivers of genotype × environment interactions in oleoresin flow than year-to-year variation in weather.

Genetic correlations between oleoresin flow and growth among sites

Estimates of the genetic correlation between oleoresin flow and growth varied from positive to nonsignificant among sites (Table 2), indicating that selective breeding for enhanced oleoresin production would not negatively impact growth. We hypothesize that variation in the genetic correlation between oleoresin flow and growth can be attributed, in part, to differences in tree–tree competition among sites. The weakest oleoresin flow–growth correlations (Table 2) and the lowest average oleoresin flow rates (Fig. 1a; Table S2) were observed at Cuthbert, the site with the largest trees (Fig. 1b) and highest basal area (Table 1). Previous studies have indicated that competition reduces allocation to herbivory defense (Moreno et al., 2009; Aspinwall et al., 2011), especially among shade-intolerant species such as loblolly pine (Calder et al., 2011). Conversely, thinning increased oleoresin flow and bark beetle resistance in Pinus ponderosa (Wallin et al., 2008). Thus, the results of this study and others (Nowak et al., 2008 and references therein) suggest the importance of managing competition to fully realize the genetic capacity for oleoresin flow in managed pine plantations.

Environmental variation interacts with the genetic architecture of oleoresin flow

Association genetic analysis of transformed oleoresin dry mass yielded over 200 significant associations within and across sites (Tables 3, S3), indicating that oleoresin flow is a quantitative trait (Fig. 5). Furthermore, 81% of the significant associations were site-specific (Table S4), implying that the effects of alleles underlying quantitative genetic variation in oleoresin flow depend on the environmental context. The molecular mechanisms hypothesized to mediate genotype × environment interactions in plants include environmental regulation of gene expression, epigenetic modification, and post-translational modification (Nicotra et al., 2010). Numerous regulatory genes were associated with oleoresin flow (Table S3), including transcription factors (regulators of gene expression), genes involved in histone modification and the processing of microRNAs (regulators of epigenetic modifications), and protein kinases and phosphatases (regulators of post-translational modification), indicating that any or all of these processes might contribute to site × genotype interactions in oleoresin flow.

Prospects for genomic selection for enhanced oleoresin flow

Genomic selection has the potential to accelerate breeding for enhanced oleoresin production by alleviating the need to phenotype the breeding population in each generation (Meuwissen et al., 2001; Grattapaglia & Resende, 2011; Resende et al., 2012a,b). Prediction models that included only the significantly associated SNPs predicted additive genetic variation in oleoresin flow with greater accuracy than models with either randomly selected SNPs or all polymorphic loci (Fig. 6). This result indicates that our association genetic pipeline efficiently selected markers linked to causative polymorphisms and suggests that SNP preselection can increase accuracy in genomic prediction modeling by reducing model overparameterization (Schulz-Streeck et al., 2011; Resende et al., 2012b). Despite the prevalence of site × genotype effects (Figs 2, 4) and site-specific associations (Tables 3, S4), SNPs that were significantly associated with breeding values from the across-site quantitative genetic analysis predicted additive genetic variation at single sites with accuracies comparable to within-site prediction accuracies (Fig. 7). Thus, to develop genomic selection models that accurately predict genetic variation in diverse environments, marker effects should be estimated on breeding values in which genotype × environment interactions have been partitioned from genetic effects.

Association analysis to discover candidate genes for transgenic manipulation

The association genetic results reported here (Table S3) are a preliminary step towards targeting candidate genes for overexpression or silencing to validate function in oleoresin production. We highlight a few genes associated with oleoresin flow that are intriguing within the context of previous research.

Ethylene synthesis and signaling

Ethylene induces oleoresin synthesis and the differentiation of resin canals in members of the Pinaceae (Hudgins & Franceschi, 2004; Schmidt et al., 2011), including loblolly pine (Telewski et al., 1983; Stubbs et al., 1984). One SNP in an ACC synthase, a rate-limiting enzyme that catalyzes the first step of ethylene synthesis, was significant across sites (contig 0_17633). Furthermore, SNPs in ETHYLENE-INSENSITIVE 2 (contig 0_14532), which is involved in the early detection of ethylene within the cell (Alonso et al., 1999), and an APETALA2 domain transcription factor (contig 0_3648) that functions downstream of EIN2 (Ogawa et al., 2007) were also associated with tdm (Table S3).

Oleoresin transport

The mechanism(s) by which oleoresin is transported from living cells in the stem to the extracellular space of resin canals in conifers is unknown (Zulak & Bohlmann, 2010). An ATP-binding cassette (ABC) transporter (NpABC1) mediates the export of a diterpene (sclareol) from Nicotiana plumbaginifolia cells (Jasinski et al., 2001); therefore, it is plausible that ABC transporters are also involved in the transport of conifer oleoresin. Variation in tdm was significantly associated with two putative ABC transporters (UMN_2415 and 0_8686).

Associations detected in an independent population

Eckert et al. (2012) conducted association genetic analyses of stem diterpenoids in a loblolly pine population of unrelated individuals that is independent of the CCLONES population studied here. SNPs in two genes (contigs 0_9534 and UMN_596) that were associated with stem abietic acid content in Eckert et al. (2012) were also associated with oleoresin flow in CCLONES. Contig 0_9534 is a putative acid phosphatase, while the function of contig UMN_596 is unknown.


Both genomic selection and genetic engineering to enhance oleoresin production in conifer stems could benefit from an improved understanding of the underlying genetic architecture. Although heritable variation in oleoresin flow in loblolly pine is controlled by many genes with allelic effects that are dependent on variation in climate and soils, we showed that it is feasible to develop genomic selection models that can accurately predict genetic variation in diverse environments. Work is underway to verify the significant oleoresin flow associations through analysis of gene expression, association genetic analysis in an independent population, and genetic transformation.


The Forest Biology Research Cooperative managed the CCLONES population and funded oleoresin collection. SNP genotyping was funded by the National Science Foundation Plant Genome Research Program award no. 0501763. Special thanks are due to L.H. Lott and J.H. Roberds (Southern Institute of Forest Genetics) for oleoresin taps; members of the University of Florida Forest Genomics Laboratory, especially J. Zhang, for assistance with oleoresin collection; M. Chavez Monte-Alegre for weighing the oleoresin tubes; and C. Dervinis and K. Smith for assistance with the BLAST analyses. J.W.W. was supported by a United States Department of Agriculture CSREES Food and Agricultural Sciences National Needs Graduate Fellowship.