• association genetics;
  • biofuels;
  • black cottonwood (Populus trichocarpa);
  • genotyping;
  • lignin biosynthesis;
  • linkage disequilibrium;
  • resequencing;
  • single nucleotide polymorphism (SNP)


Summary
  • An association genetics approach was used to examine individual genes and alleles at the loci responsible for complex traits controlling lignocellulosic biosynthesis in black cottonwood (Populus trichocarpa). Recent interest in poplars as a source of renewable energy, combined with the vast genomic resources available, has enabled further examination of their genetic diversity.
  • Forty candidate genes were resequenced in a panel of 15 unrelated individuals to identify single nucleotide polymorphisms (SNPs). Eight hundred and seventy-six SNPs were successfully genotyped in a clonally replicated population (448 clones). The association population (average of 2.4 ramets per clone) was phenotyped using pyrolysis molecular beam mass spectrometry. Both single-marker and haplotype-based association tests were implemented to identify associations for composite traits representing lignin content, syringyl : guaiacyl ratio and C6 sugars.
  • Twenty-seven highly significant, unique, single-marker associations (false discovery rate < 0.10) were identified across 40 candidate genes in three composite traits. Twenty-three significant haplotypes within 11 genes were discovered in two composite traits.
  • Given the rapid decay of within-gene linkage disequilibrium and the high coverage of amplicons across each gene, it is likely that the numerous polymorphisms identified are in close proximity to the causative SNPs and the haplotype associations reflect information present in the associations between markers.


Introduction

Forest trees are a potential source of net-zero carbon emission lignocellulosic biofuels. The production of biofuels involves the collection of biomass, deconstruction of cell wall polymers into component sugars (pretreatment and saccharification) and conversion of these sugars to ethanol (fermentation) (Rubin, 2008). Woody bioenergy crops from which biomass is derived have not been domesticated for this purpose and the current methods for lignocellulosic saccharification and fermentation are inefficient. The recent need to develop viable fuel alternatives is now taking advantage of genomics resources and technologies to discover the potential gain that can be achieved through breeding. Traits of interest in trees with applications in bioenergy include growth rate, branching habit, stem thickness and cell wall chemistry (Bradshaw et al., 2000). Rapid growth, moderate genome size, woody tissues and economic importance make black cottonwood (Populus trichocarpa) an ideal model organism to examine biofuels-related traits (Bradshaw et al., 2000). Black cottonwood possesses tremendous genetic and phenotypic diversity, is obligately outcrossing, is able to hybridize with many other species and is easily clonally propagated (Davis, 2008). To further complement the advantages, black cottonwood is the first tree and bioenergy feedstock to have its genome sequenced and annotated. Derived from a single wild individual (Nisqually-1), the genome sequence represents an estimated 45 500 genes across 19 chromosomes (Tuskan et al., 2006). In addition to the genome, resources such as controlled cross-populations, cross-species’ molecular markers, expressed sequence tag (EST) collections and full-length cDNAs are available to the research community (Strauss & Martin, 2004; Ralph et al., 2006a,b; Tuskan et al., 2006).

Improvement of biofuels feedstocks focuses on increasing both the relative carbon partitioning in woody tissues above ground and the accessibility of cellulose for enzymatic digestion (Ragauskas et al., 2006). As with other woody species, the major components of black cottonwood secondary cell walls are cellulose, hemicellulose and lignin (Harris et al. 2008). Lignin inhibits saccharification in processes aimed at producing simple sugars for fermentation to ethanol. Many studies have focused on the molecular biology of wood and secondary wall formation (Sterky et al., 1998, 2004; Plomion et al., 2001; Schrader et al., 2004). The pathways and genes involved in lignin and cellulose biosynthesis, and microfibril deposition, are increasingly becoming well understood through biochemical analysis and expression studies (Whetten et al., 1998; Plomion et al., 2001; Li et al., 2003a,b; Peter & Neale, 2004; Schrader et al., 2004; Boerjan, 2005; Oakley et al., 2007). The specific roles of genes in these pathways have been verified through forward and reverse genetic mutation studies (Dixon & Reddy, 2003; Ralph et al. 2006a,b; Davis, 2008). A relatively unexplored area of research is the identification of the natural allelic variation controlling phenotype variation and the exploitation of this variation in breeding.

A major goal of population and quantitative genetics is the identification of the polymorphisms responsible for phenotypic variation (Feder & Mitchell-Olds, 2003; Stinchcombe & Hoekstra, 2008). Many traits of interest in forest trees, such as wood quality, are complex in nature and occur later in development (Groover, 2007). Recent advances in high-throughput marker technologies, combined with the wealth of genomic resources available to species such as black cottonwood, have enabled a closer examination of the number and effect sizes of genes responsible for traits of interest through complex trait dissection using association mapping. Tree species are ideal for association mapping as they are predominantly outcrossing and have large, relatively unstructured, populations, resulting in high levels of nucleotide diversity and low linkage disequilibrium (Neale & Savolainen, 2004; Gonzalez-Martinez et al., 2006). Significant associations between single nucleotide polymorphisms (SNPs) within candidate genes have been established in forest trees. Associations with wood quality traits in eucalyptus (Thumma et al., 2005), wood quality and drought tolerance traits in loblolly pine (Gonzalez-Martinez et al., 2007, 2008), bud phenology traits in European poplar (Ingvarsson et al., 2008) and cold hardiness-related traits in coastal Douglas fir (Eckert et al., 2009a) have been identified. In general, individual SNPs explain a small proportion of the phenotypic variance (0.5–5.0%), which is consistent with the complex nature of these traits.

In this study, statistical models were applied to perform association tests and to account for population structure in 579 SNPs from 40 candidate genes involved in lignocellulosic cell wall synthesis in black cottonwood. Single-marker and haplotype-based tests were performed to identify associations with natural variation in composite traits evaluating lignin and cellulose content.

Materials and Methods


Association population and phenotypic data

Association population  GreenWood Resources (Portland, OR, USA) assembled a collection of 1189 black cottonwood (Populus trichocarpa Torr. & A. Gray) clones from 101 provenances from 12 river drainages located west of the Cascade Mountains between 48°56′N (Nooksack River, Whatcom County, WA, USA) and 43°47′N (Middle Fork, Willamette River, Lane County, OR, USA) latitudes during the period 1990 to 1999 (Fig. 1). The collection was established in clone banks where it was annually coppiced to remove C effects from planting stock used in the establishment of clonally replicated field trials in 1994, 1996, 1999 and 2003. All four trials were planted at an alluvial site on the lower Columbia River floodplain at Westport, OR, USA (46°08′N). The soil is deep and moderately well drained, with a loam–silt loam surface overlaying a sandy loam to fine sand horizon. Annual precipitation averages 2034 mm and the average maximum temperature during the April–September growing season is 20°C.


Figure 1.  Descriptive information on the distribution, sampling localities and population structure across the range of black cottonwood. (a) Range map for black cottonwood. (b) Sample locations across Oregon and Washington. Each point denotes a single tree (n = 448). (c) Population structure estimates across all the sampled range of black cottonwood. Colors designate the five significant genetic clusters detected using principal components analysis (PCA). Multiple colors denote points with multiple clones assigned to different genetic clusters.


Sample preparation and wood chemistry phenotyping  Wood samples were collected from a subset of 448 clones representing all of the original provenances. Two Haglof 5 mm increment cores were taken from the bark to the pith of up to three ramets per clone growing in the four Westport clone trials (Fig. 1b, Supporting Information Table S1). Cores were extracted at breast height (1.37 m) and placed in a −8°C freezer until sectioning. Sample preparation consisted of removing the two outermost complete growth rings of each core because of the different ages of the trees.

Ground wood samples (c. 4 mg) were prepared in stainless steel sample cups, and pyrolyzed using a Frontier Pyrolyzer PY2020iD (Frontier Laboratories, Ltd. Fukushima, Japan). Pyrolysis was performed at 500°C using helium carrier gas flowing at 2.0 l min−1 (at STP). The transfer line connecting the pyrolysis unit to the molecular beam mass spectrometer was heated to c. 400°C. The pyrolysis vapors were expanded through a ruby sampling orifice that was mated directly to the faceplate of the molecular beam mass spectrometer. The total pyrolysis time was 30 s, although the pyrolysis reaction was completed in < 12 s. A custom-built molecular beam mass spectrometer using an Extrel™ Model TQMS C50 mass spectrometer was used for pyrolysis vapor analysis. Mass spectral data from mass to charge ratio (m/z) 30–450 were acquired on a Merlin data acquisition system using 22.5 eV electron impact ionization. Using this system, both light gases and heavy tars are sampled simultaneously and in real time. The mass spectrum of the pyrolysis vapor provides a rapid, semi-quantitative depiction of the molecular fragments. Data analysis was performed using Unscrambler v. 9.7 (CAMO A/S, Trondheim, Norway).

Resequencing, SNP discovery and genotyping

Candidate gene selection  Forty candidate genes associated with lignocellulosic cell wall development, and well annotated in the JGI Poplar Genome Assembly v. 1.1, were selected for resequencing (Table 1). These included 22 genes from 11 gene families involved in lignin biosynthesis and polymerization, six genes from four families involved in one-carbon metabolism associated with lignin biosynthesis, and 12 genes from five families involved in cellulose biosynthesis and microfibril deposition. The corresponding gene models were obtained from the poplar genome and manually curated (Table 1).

Table 1.   Details of candidate genes selected for resequencing

Gene | Amplicons | SNPs targeted | SNPs converted | Gene family | JGI gene model
4CL1 [1] | 10 | 42 | 28 | 4-Coumarate:CoA ligase (4CL) | estExt_fgenesh4_pg.C_1210004
4CL3 [1] | 3 | 9 | 5 |  | grail3.0100002702
4CL5 | 3 | 19 | 15 |  | fgenesh4_pg.C_LG_III001773
C3H3 | 3 | 10 | 10 | Coumarate 3-hydroxylase (C3H) | fgenesh4_pg.C_LG_VI000268
C4H1 [2] | 2 | 9 | 7 | Cinnamate 4-hydroxylase (C4H) | grail3.0094002901
C4H2 [2] | 3 | 15 | 10 |  | estExt_fgenesh4_pg.C_LG_XIII0519
CAD [1,2] | 7 | 56 | 30 | Cinnamyl alcohol dehydrogenase (CAD) | estExt_Genewise1_v1.C_LG_IX2359
CCR [1,2] | 5 | 21 | 17 | Cinnamoyl-CoA reductase (CCR) | estExt_fgenesh4_kg.C_LG_III0056
CesA1A [1,2] | 10 | 43 | 27 | Cellulose synthase (CesA) | gw1.XI.3218.1
CesA1B [1,2] | 10 | 65 | 39 |  | eugene3.00040363
CesA2A [2] | 6 | 25 | 19 |  | gw1.XVIII.3152.1
CesA2B [1,2] | 6 | 32 | 18 |  | estExt_Genewise1_v1.C_LG_VI2188
CesA3A [1] | 7 | 38 | 29 |  | eugene3.00002636
HCT1 [1,2] | 5 | 33 | 18 | Hydroxycinnamoyl-CoA quinate/shikimate hydroxycinnamoyltransferase (HCT) | fgenesh4_pg.C_LG_III001559
HCT6 [2] | 5 | 27 | 18 |  | eugene3.02080010
KOR1 [2] | 4 | 12 | 9 | Cellulase (KOR) | estExt_fgenesh4_pg.C_LG_I0683
LAC1A [2] | 5 | 15 | 13 | Laccase (LAC) | estExt_fgenesh4_pg.C_LG_XVI1027
LAC2 | 2 | 7 | 5 |  | estExt_fgenesh4_pg.C_LG_VIII0541
LAC90A | 5 | 29 | 16 |  | estExt_fgenesh4_pm.C_LG_VIII0291
PAL2 [2] | 5 | 17 | 11 | Phenylalanine ammonia-lyase (PAL) | estExt_fgenesh4_pg.C_LG_VIII0293
PAL4 | 4 | 24 | 11 |  | gw1.X.2713.1
PAL5 | 4 | 24 | 11 |  | estExt_fgenesh4_pg.C_LG_X2023
SAM1 [1,2] | 3 | 20 | 12 | S-Adenosylmethionine synthetase (SAMS) | eugene3.00080928
SHMT1 | 3 | 17 | 9 | Serine hydroxymethyltransferase (SHMT) | eugene3.00012227
SHMT3 | 6 | 26 | 14 |  | grail3.0003095602
SHMT6 | 3 | 14 | 10 |  | estExt_fgenesh4_pm.C_880008
SUSY1 [1,2] | 6 | 24 | 16 | Sucrose synthase (SUSY) | estExt_fgenesh4_pm.C_LG_XVIII0009
TUA1 | 3 | 12 | 9 | α-Tubulin (TUA) | gw1.II.3483.1
TUA5 [1] | 3 | 19 | 11 |  | eugene3.00090803
TUB15 | 4 | 27 | 18 | β-Tubulin (TUB) | estExt_Genewise1_v1.C_LG_I1970
TUB16 | 3 | 13 | 7 |  | estExt_fgenesh4_pm.C_LG_IX0457
TUB9 | 3 | 18 | 12 |  | eugene3.00010909
CoAOMT1 [2] | 3 | 14 | 7 | Caffeoyl CoA O-methyltransferase (CCoAOMT) | grail3.0001059501
CoAOMT2 | 3 | 24 | 16 |  | estExt_fgenesh4_pm.C_LG_I1023
COMT1 | 4 | 26 | 14 | Caffeate O-methyltransferase (COMT) | estExt_fgenesh4_pm.C_LG_XV0035
COMT2 [2] | 4 | 17 | 13 |  | estExt_fgenesh4_pm.C_LG_XII0129
F5H1 | 4 | 14 | 9 | Ferulate 5-hydroxylase (F5H) | estExt_fgenesh4_pm.C_570058
F5H2 | 3 | 9 | 5 |  | eugene3.00071182
gdcH1 | 6 | 36 | 22 | Glycine decarboxylase complex, H (gdcH) | estExt_fgenesh4_pg.C_LG_XII1299
gdcT2 | 3 | 10 | 8 | Glycine decarboxylase complex, T (gdcT) | eugene3.02520018

Single nucleotide polymorphisms (SNPs) targeted, SNPs identified and sent for genotyping on the Illumina GoldenGate assay; SNPs converted, SNPs successfully genotyped on the Illumina GoldenGate assay.
[1] Genes with significant haplotype-based associations.
[2] Genes with significant single-marker associations.

DNA isolation, primer design and resequencing  Leaf tissue from the diversity panel of 15 unrelated poplar clones (one ramet per clone), selected to represent the latitudinal range of the entire clone collection, was sampled as leaf punches, dried with silica gel and shipped at room temperature to DNA Landmarks (Saint-Jean-sur-Richelieu, QC, Canada) for DNA extraction, utilizing their proprietary microscale protocol. All DNA extractions were standardized to 2.5 ng μl−1 for resequencing. The same protocol was used to extract DNA for the 448 clones, with all extractions standardized to 50 ng μl−1 prior to genotyping.

Primers were designed at Ampure Agencourt Bioscience Corporation (Beverly, MA, USA), utilizing custom software against the Poplar Genome Assembly v. 1.1. Genomic sequences covering the entire protein-coding regions, including introns and 1000 bp upstream and 300 bp downstream noncoding sequences, were retrieved for primer design. The program was set to design primers every 700 bp, which yielded 517 primer pairs across the 40 genes. Of these, Agencourt utilized in-house software to select 200 nonoverlapping primer pairs based on a quality metric representing the redundancy in the genome and how likely the amplicon is to be a homopolymer locus. The best-scoring pairs were tagged with M13F (GTAAAACGACGGCCAGT) and M13R (CAGGAAACAGCTATGACC) primers for high-throughput sequencing.

Genomic DNA was amplified in a 384-well format polymerase chain reaction (PCR) set-up. Each PCR contained 10 ng DNA, 1 × HotStar buffer, 0.8 mM deoxynucleoside triphosphates (dNTPs), 1 mM MgCl2, 0.2 U HotStar enzyme (Qiagen, Valencia, CA, USA) and 0.2 μM forward and reverse primers in a 10 μl reaction. PCR cycling parameters were: one cycle of 95°C for 15 min, 35 cycles of 9°C for 20 s, 60°C for 30 s and 72°C for 1 min, followed by one cycle of 72°C for 3 min. The resulting PCR products were purified using solid-phase reversible immobilization chemistry followed by dye-terminator fluorescent sequencing with universal M13 primers. PCR for sequencing was initiated at 95°C for 15 min followed by 40 cycles for 10 s, 50 cycles for 5 s and, finally, 60 cycles for 2 min 30 s. Dye-terminator removal was performed using solid-phase reversible immobilization. Bidirectional Sanger sequencing of PCR fragments was carried out via capillary electrophoresis using ABI Prism 3730xl DNA analyzers (Applied Biosystems, Foster City, CA, USA).

SNP discovery and selection  Sanger resequencing produced a total of 202 amplicons (600–700 bp in length) representing 40 genes (3–12 amplicons per gene). The package, PineSAP (Pine Sequence Alignment and SNP Identification) (Wegrzyn et al., 2009), applied a combination of ProbConsRNA (Do et al., 2005), Polyphred (Nickerson et al., 1997), Polybayes (Marth et al., 1999) and machine learning techniques to align sequences from 195 of the 202 amplicons and to computationally identify 1485 polymorphisms (an average of seven SNPs per amplicon). SNP detection of the resulting calls was based on information gathered on quality scores, coverage and alignment metrics computed during the sequence alignments. The identified polymorphisms and their flanking sequences were formatted for the GoldenGate assay (Illumina, San Diego, CA, USA) and submitted to their in-house software package responsible for assigning design scores. An additional 1233 SNPs from 232 genes were identified for population structure inference through eSNP methods, utilizing ESTs from male and female catkin tissue aligned to the reference genome (Unneberg et al., 2005). To construct the 1536 assay, we selected 948 high scoring SNPs from the 40 lignin/cellulose genes and 588 high scoring eSNPs from the 232 catkin ESTs.

SNP genotyping  Genotyping was carried out using the Illumina GoldenGate SNP genotyping platform (Landegren et al., 1998; Oliphant et al., 2002; Fan et al., 2003; Eckert et al., 2009b) at the DNA Technologies Core Facility (University of California at Davis, Davis, CA, USA). The assay involves the generation of templates with specific target and address sequences using allele-specific extension, followed by ligation and amplification with universal primers. Fluorescent products are hybridized to coded beads on an array matrix and the signal intensities are subsequently determined using the BeadArray Reader (Illumina). Signal intensities are quantified and matched to specific alleles using BeadStudio v. 3.1.14 (Illumina). Manual adjustments to genotypic clusters were made when necessary. For the inclusion of SNPs into the final dataset, we used thresholds of 0.20 and 0.60 for the GenCall50 (GC50) and call rate (CR) indices, respectively (Table S2). These are established quality metrics that have been used to evaluate Illumina genotyping data (Pavy et al., 2008; Eckert et al., 2009b). The scores reflect the quality of the genotypic clusters (GC50) and the fraction of samples having a genotype defined for a particular SNP (CR).

Tests for association

Genetic diversity, population structure and linkage disequilibrium  For each SNP, we estimated the expected and observed heterozygosity, Wright’s inbreeding coefficient (FIS) and hierarchical fixation indices using the Genetics and HIERFSTAT packages available in R (Goudet, 2005; Warnes and Leisch, 2006; R Development Core Team, 2007). We excluded those SNPs with |FIS| > 0.25 from further analyses. The significance of multilocus fixation indices was tested via bootstrapping across loci (n = 10 000 replicates) to obtain 99% confidence intervals (99% CI). Patterns of population structure were further examined using principal components analysis. Population structure coefficients were estimated using Eigenstrat v. 2.0 (Price et al., 2006). For association analyses, a Q matrix defined by significant principal components, as assessed using the Tracy–Widom distribution (Patterson et al., 2006), was utilized. Cluster membership was determined via hierarchical cluster analysis using Ward’s linkage and Euclidean distances on the significant principal components. The number of clusters was identified as k + 1, where k is the number of significant principal components. We identified FST outliers using the bivariate distribution of expected heterozygosity and FST among inferred clusters observed for the 297 eSNPs to define the genome-wide expectation of background levels of genetic structure. Lignin SNPs falling outside this distribution were identified as FST outliers.
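The per-SNP filter described above can be illustrated with a minimal Python sketch. It assumes the simplest uncorrected estimators (expected heterozygosity as one minus the sum of squared allele frequencies, and FIS = 1 − Ho/He); the R packages used in the study may apply sample-size corrections, and the function names and genotype encoding here are hypothetical.

```python
from collections import Counter

def heterozygosity_and_fis(genotypes):
    """Return (observed het., expected het., FIS) for one SNP.

    genotypes: list of two-character strings such as "AG"; None marks a
    missing call. Uncorrected estimators are used (an assumption).
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    allele_counts = Counter(allele for g in called for allele in g)
    total_alleles = 2 * n
    h_exp = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    h_obs = sum(1 for g in called if g[0] != g[1]) / n
    # FIS is undefined for a monomorphic site (expected heterozygosity of 0).
    fis = 1.0 - h_obs / h_exp if h_exp > 0 else float("nan")
    return h_obs, h_exp, fis

def passes_fis_filter(fis, threshold=0.25):
    """Keep a SNP only if |FIS| does not exceed the threshold (0.25 in the text)."""
    return abs(fis) <= threshold

# Genotype counts in Hardy-Weinberg proportions give FIS = 0.
hw_sample = ["AA"] * 25 + ["AG"] * 50 + ["GG"] * 25
h_obs, h_exp, fis = heterozygosity_and_fis(hw_sample)
print(h_obs, h_exp, fis, passes_fis_filter(fis))
```

A SNP with no heterozygotes at intermediate allele frequency gives FIS = 1 and would be excluded by this filter.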

Linkage disequilibrium (LD) was measured as the squared correlation of allele frequencies r2 (Hill & Robertson, 1968), which is affected by both recombination and differences in allele frequencies between sites. The r2 value between pairs of informative SNP sites in candidate genes was calculated using the Genetics package in R (Warnes and Leisch 2006; R Development Core Team, 2007). Patterns of LD were investigated among SNPs from 39 of the 40 candidate genes. CesA1A was not included in this analysis because of physical annotation differences in the reference genome. To assess the extent of LD in the sequenced genomic regions, the decay of LD with physical distance (base pairs) between SNP sites within each candidate locus and over all candidate genes was evaluated by nonlinear regression analysis of r2 values (Remington et al., 2001). The expectation of r2 for low mutation rates and taking into account sample size is given by:

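$$E(r^2) = \left[\frac{10 + C}{(2 + C)(11 + C)}\right]\left[1 + \frac{(3 + C)(12 + 12C + C^2)}{n(2 + C)(11 + C)}\right]$$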

where C is the population recombination parameter (C = 4Ner) and n is the sample size; C was replaced by C × distance in base pairs when fitting the formula to our data using the nonlinear regression (nls) function in R (R Development Core Team, 2007).
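As a hedged illustration of the curve being fitted, the sketch below evaluates the expected r2 as a function of distance. It assumes the Hill and Weir form of the expectation as applied by Remington et al. (2001); the per-base-pair value of C and the sample size in the example are arbitrary placeholders, not estimates from this study, and the function name is hypothetical.

```python
def expected_r2(c_per_bp, distance_bp, n):
    """Expected r^2 under drift-recombination balance with low mutation rate.

    c_per_bp:    population recombination parameter per base pair (assumed value)
    distance_bp: physical distance between the two sites, in base pairs
    n:           sample size
    """
    # As in the text, C is replaced by C x distance when fitting to data.
    C = c_per_bp * distance_bp
    base = (10 + C) / ((2 + C) * (11 + C))
    sample_correction = 1 + ((3 + C) * (12 + 12 * C + C ** 2)) / (
        n * (2 + C) * (11 + C)
    )
    return base * sample_correction

# Placeholder parameters, chosen only to show the shape of the decay.
for distance in (0, 100, 200, 400, 1000):
    print(distance, round(expected_r2(0.01, distance, 896), 3))
```

In practice the per-base-pair C would be the free parameter estimated by nonlinear least squares from the observed pairwise r2 values.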

Statistical models

Single-marker models were utilized for all SNP–trait combinations. A general linear model was fitted to each trait–SNP combination (Yu et al., 2006), with SNP markers as fixed effects and elements of the Q matrix as covariates. P values were generated for each test using 10 000 permutations of genotypes with respect to phenotypic trait values. All analyses were conducted using TASSEL v. 2.0.1 (Bradbury et al., 2007). Corrections for multiple testing were performed using the positive false discovery rate (FDR) method (Storey, 2002; Storey & Tibshirani, 2003). All the necessary data to perform these analyses are available in Tables S4 and S5. Modes of gene action were quantified using the ratio of dominance (d) to additive (a) effects estimated from least-square means for each genotypic class. Partial or complete dominance was defined as values in the range 0.50 < |d/a| < 1.25, whereas additive effects were defined as values in the range −0.50 ≤ d/a ≤ 0.50. Values of |d/a| > 1.25 were equated with over- or underdominance.
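The classification of gene action can be sketched as follows. This is an illustration only: it assumes a is half the absolute difference between the two homozygous class means and d is the heterozygote mean minus the midpoint of the homozygotes, and it assigns a ratio of exactly 1.25 to the dominance class because the text leaves that boundary unassigned. The function name and return format are hypothetical.

```python
def gene_action(mean_BB, mean_Bb, mean_bb):
    """Classify gene action from the three genotypic class means.

    Returns (d/a ratio, label), or (None, "undefined") when the two
    homozygous classes have identical means.
    """
    a = abs(mean_BB - mean_bb) / 2.0            # additive effect (assumed definition)
    d = mean_Bb - 0.5 * (mean_BB + mean_bb)     # dominance deviation (assumed definition)
    if a == 0:
        return None, "undefined"
    ratio = d / a
    if abs(ratio) <= 0.50:
        label = "additive"
    elif abs(ratio) <= 1.25:                    # boundary handling is an assumption
        label = "partial or complete dominance"
    else:
        label = "over- or underdominance"
    return ratio, label

# Hypothetical least-square means for the BB, Bb and bb classes.
print(gene_action(10.0, 9.0, 8.0))    # heterozygote at the midpoint
print(gene_action(10.0, 10.0, 8.0))   # heterozygote equals one homozygote
print(gene_action(10.0, 12.0, 8.0))   # heterozygote outside the homozygote range
```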

Haplotypes were inferred and their frequencies were estimated using the modified expectation maximization method of haplotype inference included in the haplo.stats (v. 2.0.1) program available in R (Schaid et al., 2002; R Core Development Team, 2007). Input consisted of genotype matrices with principal components analysis values and phenotypic values organized by tree sample (Table S6). Singleton alleles were ignored when constructing the haplotypes, and haplotypes with a frequency < 5 were also discarded. Output in the form of global score statistics and haplotype-specific scores was derived from generalized linear models (haplo.score). Corrections for multiple testing were performed using the positive FDR method (Storey, 2002; Storey & Tibshirani, 2003).
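To make the multiple-testing step concrete, the sketch below implements the Benjamini–Hochberg adjustment. This is a deliberately swapped-in, simpler procedure, not the positive FDR (q-value) method of Storey used in the study; the q-value method additionally estimates the proportion of true null hypotheses. The P values in the example are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented P values for four hypothetical marker-trait tests.
p = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(p)
significant = [i for i, q in enumerate(adj) if q < 0.10]
print(adj, significant)
```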


Results


Wood samples were analyzed using pyrolysis molecular beam mass spectrometry. The intensities of the major peaks assigned to lignin were summed in order to estimate the lignin content, syringyl : guaiacyl (S : G) ratios, C5 sugars and C6 sugars across the range of samples (Table 2). Lignin content was calculated with peaks at mass to charge ratio (m/z) of 124, 137, 138, 150, 152, 164 and 178; these were summed and then averaged for the different samples. S : G ratios were determined by summing the S peaks and then dividing by the sum of G peaks. C5 and C6 sugars were calculated as the sum of their respective peaks. Visualization of each phenotype demonstrated a strongly bimodal distribution for the C5 trait as opposed to the distributions for the other three composite traits, which were approximately normal. As a result, C5 was not included in subsequent analyses. S : G ratios ranged from 1.2 to 2.4, and lignin content ranged from 15.8 to 27.5%.

Table 2.   Major peak assignments from pyrolysis molecular beam mass spectrometry

m/z | Assignment | (S) or (G) precursor
57, 73, 85, 96, 114 | C5 sugars |
57, 60, 73, 98, 126, 144 | C6 sugars |
137 | Ethylguaiacol, homovanillin, coniferyl alcohol | G
150 | Vinylguaiacol, coumaryl alcohol | G
152 | 4-Ethylguaiacol, vanillin | G
164 | Allyl-*propenyl guaiacol | G
167 | Ethylsyringol, syringylacetone, propiosyringone | S
178 | Coniferyl aldehyde | G
180 | Coniferyl alcohol, vinylsyringol, α-d-glucose | G, S
208 | Sinapyl aldehyde | S
210 | Sinapyl alcohol | S

m/z, mass to charge ratio; S, syringyl peaks; G, guaiacyl peaks.
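The composite traits can be sketched as simple sums over peak intensities. The lignin m/z list below is the one given in the text; the S and G peak sets are only the unambiguous assignments from Table 2 and are an assumption, since the full peak lists used for the S : G ratio are not enumerated here. The spectrum values and function name are hypothetical.

```python
LIGNIN_MZ = (124, 137, 138, 150, 152, 164, 178)   # from the text
S_MZ = (167, 208, 210)                            # assumed subset (Table 2, S only)
G_MZ = (137, 150, 152, 164, 178)                  # assumed subset (Table 2, G only)

def composite_traits(spectrum):
    """Compute lignin and S:G composites from a {m/z: intensity} mapping."""
    lignin = sum(spectrum.get(mz, 0.0) for mz in LIGNIN_MZ)
    s_sum = sum(spectrum.get(mz, 0.0) for mz in S_MZ)
    g_sum = sum(spectrum.get(mz, 0.0) for mz in G_MZ)
    return {"lignin": lignin, "s_to_g": s_sum / g_sum}

# Hypothetical normalized intensities for one sample.
sample = {124: 1.0, 137: 1.0, 138: 1.0, 150: 1.0, 152: 1.0,
          164: 1.0, 178: 1.0, 167: 2.0, 208: 2.0, 210: 2.0}
print(composite_traits(sample))
```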

Genotyping results

The 1536 SNPs chosen for genotyping using the Illumina GoldenGate platform represent 948 from 40 candidate genes (20 gene families and 202 amplicons), with 7–65 SNPs per gene, and 588 from the 232 catkin ESTs (Table 1). Of the 1536 SNPs, 876 (57%) yielded data consistent with our quality thresholds (579 candidate gene SNPs and 297 eSNPs). A conversion rate of 61% (579 SNPs) was observed among the 948 SNPs from the resequenced 40 lignin/cellulose candidate genes, as opposed to 51% for the eSNPs. The median GC50 score across all usable SNPs was 0.71 and the median CR score was 0.72. Quality scores across the genotyped loci are summarized in Table S2. The distribution of the quality metrics for genotyped SNPs, grouped by dataset, is shown in Fig. S1. The majority of the 579 successfully genotyped SNPs were silent, with nonsynonymous SNPs accounting for 19% of the total.
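The conversion rates quoted above follow directly from the SNP counts; a short check, using only numbers stated in this section (the function name is hypothetical):

```python
def conversion_rate(converted, attempted):
    """Fraction of attempted SNPs that passed the quality thresholds."""
    return converted / attempted

candidate = conversion_rate(579, 948)             # candidate gene SNPs
esnp = conversion_rate(297, 588)                  # eSNPs
overall = conversion_rate(579 + 297, 948 + 588)   # 876 of 1536

print(f"candidate genes: {candidate:.1%}")
print(f"eSNPs: {esnp:.1%}")
print(f"overall: {overall:.1%}")
```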

Population structure

Principal components analysis on the 448 clones using 297 eSNPs revealed four significant principal components, explaining 10% of the overall variance. From these four principal components, five clusters were formed using hierarchical clustering with Ward’s linkage method. All five clusters illustrated a latitudinal trend, with the Columbia River delineating a major geographical north–south separation (Fig. 1c). These five clusters also illustrated significant genetic structuring as estimated using FST, as well as significant differences among means for the three composite traits. The average FST was low for both sets, but greater for the lignocellulosic SNPs (FST = 0.034; 99% CI, 0.028–0.042) than for the eSNPs (FST = 0.013; 99% CI, 0.011–0.016). A comparison of the distribution of FST for each set revealed that seven genes had values of FST greater than any observed for the eSNPs (Fig. S4). These outliers were concentrated within the CesA3A, CAD, SUSY1, 4CL1, CesA2B, TUB15 and CesA1B genes (Fig. S4). Polymorphisms within these genes had values of FST approximately five to 10-fold greater than the multiple locus average. Cluster 1, which was distributed primarily south of the Columbia River, also had significantly different means for lignin, S : G and C6 (ANOVA: P < 2.0 × 10−6; Tukey multiple comparison tests: P < 0.01). Additional summaries of genetic diversity across all SNPs and clusters are given in Table S3 and Figs S1–S5.

Linkage disequilibrium

All r2 values were pooled to assess the overall behavior of LD for the candidate genes and to estimate the genome-wide degree of LD in black cottonwood. Fig. 2(b) shows the extent of LD across the sequenced regions. The fitted curve indicates that LD is generally low in black cottonwood, rapidly decaying by over 50% (from 0.50 to 0.20) within a distance of c. 200 bp (Fig. 2b,c). Within candidate genes, the average distance associated with LD decline to r2 = 0.1 varies from c. 200 to c. 600 bp (Fig. 2a).


Figure 2.  (a) Decay of linkage disequilibrium (LD) with distance in base pairs between sites in two candidate genes: SUSY1 and C4H1. Squared coefficients of allele frequency (r2) are plotted against distance in base pairs. The fitted curve represents the trend of decay of LD. (b) Decay of LD with distance in base pairs between sites pooled across 39 genes. (c) Decay of LD across all candidate genes for the first 400 base pairs from that presented in (b).


Overall summary of single SNP and haplotype-based associations

A total of 1734 (579 SNPs × 3 traits) single-marker association tests were performed. Of these, 65 were significant at the threshold of P < 0.05. Multiple test corrections using the FDR method reduced this number to 37 at a significance threshold of Q < 0.10. A total of 13 lignin content, one S : G and 23 C6 sugar content associations were identified (Table 3). The 37 associations represent 27 unique SNPs from 40 candidate genes. Many of the markers that exhibited significant associations with at least one trait were consistent with codominance (Table 4). Four of the 34 markers for which dominance and additive effects could be calculated were consistent with overdominance (|d/a| > 1.25). The remaining 30 markers were split between modes of gene action that were codominant (|d/a| < 0.50, 25) or partially to fully dominant (0.50 < |d/a| < 1.25, 5).

Table 3.   List of significant marker–trait pairs after a correction for multiple testing [false discovery rate (FDR) Q ≤ 0.10]
Trait | Gene symbol | SNP | F | P | N | R2 | Q
  1. ns, nonsynonymous polymorphism; s, synonymous polymorphism; nc, noncoding polymorphism.

S : G
Table 4.   List of marker effects for significant marker–trait pairs
  1. Calculated as the difference between the phenotypic means observed within each homozygous class (2a = |GBB − Gbb|, where Gij is the trait mean in the ijth genotypic class).

  2. Calculated as the difference between the phenotypic mean observed within the heterozygous class and the average phenotypic mean across both homozygous classes [GBb − 0.5(GBB + Gbb), where Gij is the trait mean in the ijth genotypic class].

  3. sp, standard deviation for the phenotypic trait under consideration.

  4. Allele frequency of either the derived or minor allele. Single nucleotide polymorphism (SNP) alleles corresponding to the frequency listed are given in parentheses.

  5. The additive effect was calculated as pB(GBB) + pb(GBb) − G, where G is the overall trait mean, Gij is the trait mean in the ijth genotypic class and pi is the frequency of the ith marker allele. These values were always calculated with respect to the minor allele.

S : G

For the haplotype-based tests, 181 amplicons were analyzed (after the removal of singletons) and 17 amplicons from 13 unique genes were significant at a global significance threshold of P < 0.05 (Table 5). Multiple test corrections using the FDR method reduced this number to 14 amplicons (13 unique genes and 71 haplotypes) at a global significance threshold of Q < 0.10.

Table 5.   List of haplotypes with significant associations to phenotype after a correction for multiple testing [false discovery rate (FDR) Q ≤ 0.10]

Amplicon | Trait | P | Q | Haplotypes | Significant haplotypes | Haplotype frequency | Single-marker associations
4CL1_11 | Lignin | 0.0042 | 0.0539 | 3 | TGC | 0.31 | 4CL1_11-108 (0.2278) [1]
4CL3_14 | Lignin | 0.0021 | 0.0519 | 6 | CGT | 0.02 | 4CL3_13-464 (0.2041) [1]
CAD_04 | Lignin | 0.0065 | 0.0578 | 9 | CAAAAT | 0.03 | CAD_04-185 (S/G, C6) [2]
CCR_12 | Lignin | 0.0060 | 0.0578 | 4 | 0 |  | CCR_12-366 (0.2168) [1]
CesA1B_10 | Lignin | 0.0038 | 0.0539 | 6 | AGA | 0.15 | CesA1B_10-41 (0.3726) [1]
CesA2B_16 | Lignin | 0.0055 | 0.0576 | 2 | 0 |  | CesA2B_16-423 (0.2967) [1]
CesA3A_09 | Lignin | 0.0018 | 0.0519 | 5 | TAAAAA | 0.01 | CesA3A_09-93 (0.2068) [1]
CesA3A_13 | Lignin | 0.0022 | 0.0519 | 7 | CGGAA | 0.15 | CesA3A_13-535 (0.1777) [1]
HCT1_12 | Lignin | 0.0016 | 0.0519 | 3 | AA | 0.73 | HCT1_12-156 (0.1828) [1]
SUSY1_02 | Lignin | 0.0053 | 0.0576 | 3 | AAAA | 0.77 | SUSY1_02-108 (lignin, C6) [2]; SUSY1_02-396, SUSY1_02-503 (C6) [2]
TUA5_09 | Lignin | 0.0027 | 0.0521 | 7 | 0 |  | TUA5_09-73 (0.1899) [1]
4CL1_01 | C6 | 0.0000 | 0.0018 | 5 | AGA | 0.12 | 4CL1_01-468 (0.1668) [1]
CesA1A_12 | C6 | 0.0005 | 0.0231 | 6 | AGA | 0.09 | CesA1A_12-40 (C6) [2]
SAM1_07 | C6 | 0.0008 | 0.0239 | 5 | AGAA | 0.01 | SAM1_07-480 (0.2874) [1]

[1] Single-marker associations with the lowest Q value relating to the significant haplotype–trait association.
[2] Significant single-marker associations (FDR Q ≤ 0.10) listed with the associated traits.

Lignin associations

Lignin composition was represented by averaging values of guaiacyl precursor peaks. A total of 13 significant single-marker associations were found for nine candidate genes associated with lignin content (Table 3). Three of the significant marker–trait associations were located in the coding region and 10 in the noncoding region. Two of the significant associations were nonsynonymous (C4H1, CESA2A) and one was synonymous (HCT6). Individually, each of the 13 markers explained a small proportion of the phenotypic variance, with effects ranging from 1.2% to 3.8%.
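To make "proportion of the phenotypic variance explained" concrete, the following sketch computes it as the between-genotype sum of squares divided by the total sum of squares (a one-way ANOVA R²). This is an illustrative assumption: the association model used in the study may include additional terms, such as population structure covariates, that would change the estimate. The function name and the phenotype values are hypothetical.

```python
def variance_explained(phenotypes_by_genotype):
    """Fraction of phenotypic variance explained by genotype class.

    phenotypes_by_genotype maps a genotype label (e.g. "AA") to the list
    of trait values observed for individuals with that genotype.
    """
    all_values = [v for values in phenotypes_by_genotype.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    # Total sum of squares around the overall trait mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Between-class sum of squares: class size times squared mean offset.
    ss_between = 0.0
    for values in phenotypes_by_genotype.values():
        class_mean = sum(values) / len(values)
        ss_between += len(values) * (class_mean - grand_mean) ** 2
    return ss_between / ss_total


# Hypothetical lignin percentages for three genotype classes.
example = {
    "AA": [21.2, 22.4, 21.9, 22.6],
    "AC": [22.1, 23.0, 22.5, 22.9],
    "CC": [22.8, 23.5, 23.1],
}
r2 = variance_explained(example)
```

Values of 1.2% to 3.8%, as reported here, correspond to R² between 0.012 and 0.038, meaning that genotype class means differ only slightly relative to the spread within each class.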

Eleven significant haplotype associations from 10 unique genes were identified for lignin content (Table 5). Eight amplicons, representing seven unique genes, had at least one significant haplotype after multiple test corrections (Table 5). Three of the amplicons did not have significant individual haplotypes and included regions of three candidate genes (CCR, CesA2B and TUA5). From the eight amplicons with at least one significant haplotype, just one (SUSY1) was supported with a single-marker association in the same trait (SUSY1_02-108). The remaining five candidate genes (4CL1, 4CL3, CesA1B, CesA3A and HCT1) had at least one supporting single-marker association, with a P value of < 0.05 before multiple test corrections.

S : G ratio associations

The S : G ratio phenotype is the ratio of the seven S peaks to the six G peaks. Analysis of the S : G trait resulted in one significant marker–trait association (Table 3). This marker is noncoding and explained a small proportion of the phenotypic variance (3.2%). Haplotype-based tests did not reveal significant associations.

C6 sugar associations

C6 sugars were represented by summing the values of six peaks. A total of 23 significant associations was found in 13 candidate genes associated with C6 sugars (Table 3). Four of the significant marker–trait associations were located in coding regions. Three of these SNPs were synonymous for three different candidate genes (CESA1A, C4H2, HCT6) and one significant association was nonsynonymous (CESA2A). Four marker–trait associations in two candidate genes were highly significant and unique only to the C6 phenotype (SUSY1, CESA1B). All 23 markers explained a small proportion of the phenotypic variance, with individual effects ranging from 1.1% to 3.7%.

Three amplicons representing three unique candidate genes (4CL1, CesA1A and SAM1) were significant in terms of haplotype-based associations with C6 (Table 5). All three amplicons were highly significant (P < 0.05) with respect to C6 sugars and contained at least one significant individual haplotype after multiple test corrections (Q < 0.10). One candidate gene (CesA1A) contained a significant single-marker association in the same amplicon and associated with the same trait (CesA1A_12-40).


Discussion

Strategies for the domestication of forest trees using either conventional or novel molecular breeding approaches are centered around the exploitation of existing genetic diversity. Over the past few decades, genetic maps have been made for many forest tree species and quantitative trait loci have been mapped for a range of traits (Brown et al., 2003). The lack of resolution in mapping candidate genes and quantitative trait loci alleles can be overcome by association genetics, using natural populations in which the long evolutionary history has decreased the extent of LD in populations (Neale & Savolainen, 2004). An important prerequisite for association mapping is the availability of large allelic variation in the population. LD describes a key aspect of genetic variation in natural populations of plants. This study is the first examination of genome-wide LD in black cottonwood, and enables comparison with other poplars. We examined LD across 39 of the candidate genes (Fig. 2b,c), and observed a rapid decay of LD within just a few hundred base pairs, indicating the potential of association genetics to identify the genes responsible for variation in the trait. Previous studies in both P. tremula (five genes) and P. nigra (nine genes) showed a similar rapid decay of LD (Ingvarsson, 2005; Chu et al., 2009). LD was demonstrated to decay over significantly longer distances in a recent study across over 300 randomly selected gene fragments in the closely related P. balsamifera (Olsen et al., 2010).
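As an illustration of how pairwise LD between two SNPs is commonly quantified, the sketch below computes the squared allele-frequency correlation, r², from phased haplotypes. It assumes biallelic markers and known phase; the text above does not state which LD statistic or software was used, so this is only one standard choice. The function name and example data are hypothetical.

```python
def ld_r_squared(haplotypes):
    """Squared correlation (r^2) between two biallelic SNPs.

    haplotypes is a list of (allele_at_snp1, allele_at_snp2) pairs,
    one pair per phased chromosome.
    """
    n = len(haplotypes)
    # Use the alleles on the first chromosome as the reference alleles.
    a_ref, b_ref = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(1 for h in haplotypes if h[0] == a_ref) / n
    p_b = sum(1 for h in haplotypes if h[1] == b_ref) / n
    p_ab = sum(1 for h in haplotypes if h[0] == a_ref and h[1] == b_ref) / n
    # D is the deviation of the observed haplotype frequency from the
    # frequency expected if the two SNPs were independent.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Hypothetical phased chromosomes at two SNPs in one amplicon.
sample = [("A", "G"), ("A", "G"), ("C", "T"), ("C", "T"), ("A", "T"), ("C", "G")]
r2 = ld_r_squared(sample)
```

Plotting r² for every SNP pair against the physical distance between the two SNPs gives the decay curve; rapid decay means r² falls to low values within a few hundred base pairs.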

This study examined both single-marker associations and haplotype-based tests to account for information present in the associations between markers, as well as directly between an SNP and the trait. Given the structure of our data, a natural way to apply the knowledge of LD within and between genes is to perform haplotype-based association tests. The power of a single-marker association test is often limited because LD information contained in flanking markers is ignored. Intuitively, haplotypes (which are essentially a collection of ordered markers) may be more powerful than individual, nonordered markers. This study demonstrates that the use of haplotypes can significantly increase the ability to map traits of interest.
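The idea of a haplotype as an ordered collection of markers can be sketched as follows. The example joins the alleles observed along each phased chromosome into a haplotype string and tallies haplotype frequencies within an amplicon. It assumes phase is already known; in practice phase is usually inferred statistically, and the text above does not describe that step. The function name and chromosome data are hypothetical.

```python
from collections import Counter


def haplotype_frequencies(phased_chromosomes):
    """Frequency of each haplotype within one amplicon.

    phased_chromosomes holds one sequence of SNP alleles per chromosome,
    with alleles listed in their physical order along the amplicon.
    """
    counts = Counter("".join(alleles) for alleles in phased_chromosomes)
    total = sum(counts.values())
    return {haplotype: n / total for haplotype, n in counts.items()}


# Hypothetical chromosomes typed at three SNPs in a single amplicon.
chromosomes = [
    ["A", "G", "A"],
    ["A", "G", "A"],
    ["A", "A", "A"],
    ["G", "A", "G"],
]
frequencies = haplotype_frequencies(chromosomes)
```

A haplotype-based test then compares trait values among carriers of the different haplotypes, rather than among genotypes at one SNP at a time.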

Candidate genes known to be involved in lignocellulosic cell wall development were examined for genetic associations. There are two major steps of lignin biosynthesis in plants: monolignol biosynthesis and the subsequent polymerization of lignin monomers to form polymers. This biochemical pathway is highly conserved throughout vascular plants, and many of the enzymes have been identified and characterized (Boerjan et al., 2003; Xu et al., 2009). The cellulose biosynthesis pathway involves the synthesis and assembly of β-1,4 glucan chains at the rosette terminal complex, and their orderly deposition to form cell wall microfibrils. Although several candidate genes have been identified, the precise molecular mechanism of cellulose biosynthesis and microfibril deposition in plants is still not clearly understood. Genetic improvement of lignin and cellulose biosynthesis in trees continues to be a major research priority. Similar to other commercial applications for black cottonwood, modified lignin structure (chemical reactivity) and increased cellulose content are desirable traits. Mechanisms that can increase C6 sugar content and decrease C5 sugar content of hemicelluloses are favorable for fermentation.

In the monolignol biosynthetic pathway, the first step consists of a deamination of phenylalanine by phenylalanine ammonia-lyase (PAL) that produces cinnamic acid. PAL is encoded by a small multigene family (Appert et al., 1994; Osakabe et al., 1995; Cochrane et al., 2004), and five isoforms have been annotated in the poplar genome (Tsai et al., 2006). In this study, markers in PAL2, PAL4 and PAL5 were genotyped. A single-marker noncoding association was identified with PAL2 that explained 1.4% of the phenotypic variation in C6 sugars (Table 3). In aspen (P. tremuloides) stem, PAL2 transcripts have been localized to developing xylem cells, consistent with its involvement in lignin biosynthesis (Kao et al., 2002).

C4H catalyzes the first oxidative reaction in phenylpropanoid metabolism, namely the conversion of cinnamic acid to p-coumaric acid (Sewalt et al., 1997). Three C4H genes have been characterized in black cottonwood (Lu et al., 2006). C4H1 is proposed to be associated with G lignin deposition, whereas C4H2 is thought to be involved in S lignin biosynthesis (Lu et al., 2006). Four unique single-marker associations were identified in the C4H1 and C4H2 genes examined in this study. A significant nonsynonymous association in exon 1 of C4H1 with lignin demonstrated modes of gene action consistent with additive effects (Table 3; Fig. 3). The C allele at C4H1_02-219 is the minor allele and causes a histidine (H) → proline (P) amino acid substitution. Heterozygotes for the marker had a percentage value of lignin composition that was intermediate to either homozygote class (21.9% for A/A, 22.7% for A/C, 23.2% for C/C). A similar study in European maize identified a nonsynonymous SNP in the first exon of C4H1 associated with forage quality traits (Andersen et al., 2008). Physiological studies of these genes describe unique functions for the isoforms within the lignin biosynthetic pathway.
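The three genotypic means quoted above can be used to illustrate why this marker is described as consistent with additive gene action. The sketch below takes the dominance deviation as the heterozygote mean minus the midpoint of the two homozygote means, and uses half the difference between the homozygote means as a simple additive effect. The latter is a common textbook convention and is an assumption here; it is not the frequency-weighted additive effect defined in the table footnotes. The function name is hypothetical.

```python
def marker_effects(mean_major_hom, mean_het, mean_minor_hom):
    """Return (additive, dominance) effects from three genotypic means."""
    # Midpoint between the two homozygous class means.
    midpoint = 0.5 * (mean_major_hom + mean_minor_hom)
    # Half the homozygote difference (textbook additive effect, assumed).
    additive = 0.5 * (mean_minor_hom - mean_major_hom)
    # Departure of the heterozygote from the midpoint.
    dominance = mean_het - midpoint
    return additive, dominance


# Lignin percentages reported for the A/A, A/C and C/C classes.
additive, dominance = marker_effects(21.9, 22.7, 23.2)
# A ratio near 0 suggests additive action; near 1 suggests full dominance.
dominance_ratio = dominance / additive
```

For these means the additive effect is about 0.65 percentage points and the dominance deviation about 0.15, giving a ratio of roughly 0.23, which is closer to purely additive than to fully dominant gene action.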


Figure 3.  Marker effects on the significant nonsynonymous single nucleotide polymorphisms (SNPs) found in C4H1 and CesA2A. (a) The C4H1_04-219 nonsynonymous marker in the first exon of the C4H1 gene illustrates patterns of gene action consistent with additive effects. The C allele at C4H1_04-219 causes a histidine (H) to proline (P) amino acid substitution. (b) The CesA2A_08-38 nonsynonymous marker is located in the sixth exon of the CesA2A gene. This SNP is significant for both lignin content and C6 traits. For lignin content, the homozygote decreases the percentage content, whereas, in C6, the sugar content is elevated. The G allele at CESA2A is the derived state and is responsible for an isoleucine (I) to valine (V) amino acid substitution. In both gene models, solid boxes denote untranslated region, solid lines are introns and open boxes indicate exons.


4-Coumarate:CoA ligase (4CL), which catalyzes the formation of CoA esters of p-coumaric acid and its derivatives, has a pivotal role in channeling phenylpropanoid precursors into different downstream pathways, each leading to a variety of functionally distinct end products (Harding et al., 2002). 4CL is also encoded by multigene families, with five isoforms annotated in the poplar genome (Tsai et al., 2006). Although we were unable to identify significant single-marker associations in 4CL1, 4CL3 and 4CL5, significant associations with haplotypes in 4CL1 and 4CL3 were observed for both lignin and C6 traits. Of the five haplotypes (spanning 389 bp) in 4CL1_01, two significant associations demonstrated an effect on C6 sugar content (35.1% for AGA and 34.1% for AAA). In lignin composition, two haplotypes of 4CL1_11 demonstrated a difference of > 1% in lignin composition (Table 5; Fig. 4b). Three single markers in 4CL1_11 at P < 0.05 were found to be in LD, and their individual genotypic effects on lignin composition were small in comparison with the spanning haplotype block (Fig. 4b). The reduction of 4CL expression in transgenic poplar has resulted in significant reductions of lignin, ranging from 5% to 45% (Hu et al., 1999; Li et al., 2003a,b).


Figure 4.  Haplotype and single-marker associations are illustrated for SUSY1 and 4CL1. (a) The genotypic effects of the three proposed haplotypes (two significant) of SUSY1 are shown. The haplotypes yield significantly different median phenotypic values for the lignin content trait. The marker effects of four significant single-marker associations are also shown. SUSY1_02-108 is significant with respect to lignin. The remaining markers are significant with respect to the related trait, C6 sugars. All four markers are within linkage disequilibrium (LD) with one another. (b) The genotypic effects of the three haplotypes (two significant) of 4CL1 are shown. The significant haplotypes yield different median phenotypic values for the lignin content trait. No significant single-marker associations were identified after multiple testing; however, the box plots for single markers with P < 0.05 are shown. Two of the three markers are in LD with one another.


Hydroxycinnamoyl-CoA transferase (HCT) is the most recently identified enzyme in monolignol biosynthesis and belongs to a large family of acyltransferases (Hoffmann et al., 2003a,b). It catalyzes the conversion of p-coumaroyl-CoA and caffeoyl-CoA to their corresponding shikimate or quinate esters. Two of the six annotated HCT genes in the Populus genome (HCT1 and HCT6) are expressed in developing xylem (Tsai et al., 2006). HCT6_13-225 was a significant synonymous marker in both lignin and C6 (Table 3). Two significant haplotypes in HCT1_12 were associated with lignin composition (Table 5). HCT has not been transgenically manipulated in poplar; however, RNAi-mediated silencing of HCT in conifers (Pinus radiata) that do not produce S lignin had a strong impact on lignin content (42% reduction), monolignol composition and interunit linkage distribution (Wagner et al., 2007). A similar study of HCT in Arabidopsis showed a reduction in lignin content and an increased G lignin deposition (Hoffmann et al., 2004).

p-Coumaroyl-CoA shikimate proceeds through a series of transformations into caffeoyl-CoA shikimate, caffeoyl-CoA, feruloyl-CoA and coniferaldehyde by the action of the enzymes p-coumaroyl-CoA 3′-hydroxylase (C3′H), HCT, caffeoyl-CoA O-methyltransferase (CCoAOMT) and cinnamoyl CoA reductase (CCR), respectively. CCoAOMT, catalyzing the methylation of caffeoyl-CoA to feruloyl-CoA, is critical in maintaining lignin structural integrity (Meyermans et al., 2000; Zhong et al., 2000). In the two independent studies referenced, antisense downregulation of CCoAOMT1 in transgenic hybrid poplar (P. tremula × P. alba) resulted in reduced lignin content as well as altered S : G ratio. In this study, markers from CCoAOMT1 and CCoAOMT2 were genotyped. CCoAOMT1 had one significant noncoding SNP associated with C6 sugar content (Table 3).

Cinnamoyl-CoA reductase (CCR) catalyzes the conversion of hydroxycinnamoyl-CoA esters (p-coumaroyl-CoA, feruloyl-CoA, sinapoyl-CoA) into their corresponding cinnamyl aldehydes (Pichon et al., 1998). Downregulation of CCR in transgenic poplar (P. tremula × P. alba) is associated with up to 50% reduced lignin content (Leple et al., 2007). In this study, a single noncoding two-state marker in CCR was found to be strongly associated with lignin composition (Table 3). A different amplicon in the same gene (CCR_12) was globally significant in terms of haplotype associations, but did not report any significant individual haplotypes (Table 5). Haplotype associations have been identified previously in eucalyptus with CCR in relation to wood property traits (Thumma et al., 2005).

Coniferaldehyde can be converted to coniferyl alcohol by the action of cinnamyl alcohol dehydrogenase (CAD), or to 5-hydroxy-coniferaldehyde and sinapyl aldehyde by the action of ferulate 5-hydroxylase (F5H) and caffeic/5-hydroxyferulic acid O-methyltransferase (COMT). CAD catalyzes the reduction of p-hydroxycinnamaldehydes into their corresponding alcohols and is the last enzyme in monolignol biosynthesis. In this study, CAD_04-185, a noncoding marker, illustrated patterns of gene action consistent with additive effects in relation to S : G and C6 sugars. This was the only single-marker association identified with the S : G ratio. Three of the nine individual haplotypes (spanning 407 bp) in the same amplicon of CAD were significant for lignin composition. Differences in genotypic effects on lignin content were minimal (22.2% for CAAAAT, 22.8% for CATAAT and 22.5% for GATAAT). The CAD gene family has been studied extensively in Arabidopsis, rice and poplar (Barakat et al., 2009). The downregulation of CAD in transgenic poplar did not affect the overall lignin content and composition, but led to an increased incorporation of the hydroxycinnamaldehydes into lignin (Baucher et al., 1996; Pilate et al., 2002). Field trials of CAD-deficient transgenic poplar showed improved Kraft pulping performance (Pilate et al., 2002).

COMT was originally thought to be a bifunctional enzyme that sequentially methylated caffeic and 5-hydroxyferulic acids. More recently, it has been shown to act downstream in monolignol biosynthesis by methylating the aldehyde and alcohol backbones (Osakabe et al., 1999; Parvathi et al., 2001). In this study, markers from COMT1 and COMT2 were successfully genotyped (Table 1). A single noncoding COMT2 marker was identified as significant with C6 sugar content (Table 3). Suppression of COMT in both P. tremula × P. alba and P. tremuloides lines did not change the lignin content, but resulted in a reduction in the S : G lignin ratio (as a result of a decrease in S and an increase in G), as well as the incorporation of an abnormal 5-hydroxyguaiacyl unit into the lignin (Van Doorsselaere et al., 1995; Tsai et al., 1998).

After their biosynthesis, monolignols are transported from the cytoplasm to the cell wall and polymerized to a lignin matrix. In the cell wall, the monolignols are oxidized to their radicals and polymerized. Laccases (Lac), peroxidases and other phenol oxidases have long been thought to be involved in this polymerization (Baucher et al., 2003), but conclusive evidence for their role is still lacking. In our study, we examined Lac1a, Lac2 and Lac90a. Lac1a was found to have two noncoding single-marker associations with C6 sugars (Table 3). In poplars, several laccases (Ranocha et al., 1999) have been cloned and characterized. At least eight of these laccases were identified in association with lignin biosynthetic pathways by microarray analysis (Andersson-Gunneras et al., 2006). Subsequent studies with antisense Lac3 in transgenic hybrid poplar showed little variation in lignin content; however, the soluble phenolics and structure of the secondary wall were altered (Ranocha et al., 2002).

Variations in the quantity and quality of cellulose in plants are suspected to be a primary result of enzymatic activities of different types of cellulose synthases (CesAs) (Haigler & Blanton, 1996). The CesA gene family contains 17 members in the sequenced poplar genome, five of which are highly expressed during wood formation (Joshi et al., 2004; Djerbi et al., 2005a,b; Suzuki et al., 2006; Kumar et al., 2009). All five isoforms were evaluated for association in this study (CesA1A, CesA1B, CesA2A, CesA2B and CesA3A), and all had at least one single-marker or haplotype association (Table 1). In lignin and C6 sugar traits, the same nonsynonymous marker in the sixth exon of CesA2A was strongly associated. The G allele at CesA2A is the minor allele and causes an isoleucine (I) → valine (V) amino acid substitution (Table 3). The genotypic effects of the two-state SNP are shown in Fig. 3(b). In lignin traits, the differences in content were significant (22% for AA and 23.6% for AG); the same is true for C6 sugar content (34.9% for AA and 32.1% for AG). Three single-marker associations between CesA1B and lignin composition were identified (Table 3; Fig. 5). Two of these three noncoding SNPs were also associated with C6 sugar content. CesA1B_10 had one significant haplotype associated with lignin composition. CesA1A had two noncoding and one synonymous association (CesA1A_12-40) for C6 sugars. One of the noncoding SNPs (CesA1A_20-226) was also associated with lignin content. CesA3A had two different amplicons with significant haplotype associations with lignin. Three significant haplotypes from six were highly associated in CesA1A_12 (spanning 183 bp), and their genotypic effects on C6 were also significant (33.6% for AGA, 34.2% for AAA, 35.3% for GAG) (Table 5).


Figure 5.  (a–c) An example of marker effects in the CesA1B gene on the lignin content phenotype. Each marker explains a small proportion of the phenotypic variance (r² ∼ 2–3%) and is consistent with an additive model of gene action. Whiskers in the box plots represent 1.5 times the interquartile range. (d) Illustrated are the 39 single nucleotide polymorphisms (SNPs) genotyped for the CesA1B gene relative to the reference gene model, as well as three of the 39 that were significant (indicated with an asterisk). Solid boxes denote UTR, solid lines are introns and open boxes indicate exons in the gene model.


CesA proteins in the rosette terminal complex use cytosolic uridine diphosphate (UDP)-glucose as substrate, which is provided directly by particulate sucrose synthase (SUSY) (Haigler et al., 2001). This enzyme produces UDP-glucose and fructose from sucrose and UDP. Of the six SUSY genes annotated in the poplar genome, only two were highly expressed in wood-forming tissues based on microarray analysis (Geisler-Lee et al., 2006; Meng et al., 2007). In this study, amplicons from SUSY1 were successfully genotyped (Table 1). Single-marker tests with SUSY1 revealed six noncoding associations with C6 and two with lignin composition (Table 3). Two of the three individual haplotypes (spanning 386 bp) identified in SUSY1_02 were significant. Genotypic differences between haplotypes were observed for lignin composition (21.8% for AAAA and 22.9% for TGGG) (Table 5). Three of the four markers that compose the SUSY1_02 haplotype are in strong LD (Fig. 4a). Recently, overexpression of SUSY in transgenic poplar has led to an increase in both cellulose production and cellulose crystallinity (Coleman et al., 2009), confirming the previous suggestion that SUSY could be one of the limiting steps of cellulose biosynthesis (Haigler et al., 2001).

This study represents the most comprehensive evaluation of LD and genetic association in poplars. High-throughput genotyping technologies and the vast genomic resources in black cottonwood allowed a large number of candidate genes to be evaluated for associations with lignocellulosic cell wall development. The genes studied are those known to be associated with these pathways and those that have been extensively studied for commercial applications, such as pulp and feedstock production, and are now being further evaluated for improvement in relation to biofuels production. Given the rapid decay of within-gene LD in black cottonwood and the high coverage of amplicons across each gene, it is likely that the numerous polymorphisms identified are in close proximity to the causative SNPs, and the haplotype associations accurately reflect the information present in the associations between markers. This study demonstrates that a forward genetics approach (association genetics) can be used to discover naturally occurring allelic variation in genes associated with commercially important traits. The association approach provides estimates of the size of effects of these alleles on a phenotype. Understanding the size of the effects as well as the existing variation is critical in applying the knowledge gained on a particular SNP to marker-based breeding programs with goals to increase cellulose yield and, therefore, cellulosic ethanol production.


Acknowledgements

We thank Charles Nicolet and Vanessa Rashbrook for performing the SNP genotyping, and John Liechty and Benjamin Figueroa for bioinformatics support. Funding for this project was made available through the Chevron Technology Ventures-UC Davis Biofuels Project.


References
  • Andersson-Gunneras S, Mellerowicz EJ, Love J, Segerman B, Ohmiya Y, Coutinho PM, Nilsson P, Henrissat B, Moritz T, Sundberg B. 2006. Biosynthesis of cellulose-enriched tension wood in Populus: global analysis of transcripts and metabolites identifies biochemical and developmental regulators in secondary wall biosynthesis. Plant Journal 45: 144–165.
  • Andersen JR, Zein I, Wenzel G, Darnhofer B, Eder J, Ouzunova M, Lubberstedt T. 2008. Characterization of phenylpropanoid pathway genes within European maize (Zea mays L.) inbreds. BMC Plant Biology 8: 2.
  • Appert C, Logemann E, Hahlbrock K, Schmid J, Amrhein N. 1994. Structural and catalytic properties of the 4-phenylalanine ammonia-lyase isoenzymes from parsley (Petroselinum crispum Nym). European Journal of Biochemistry 225: 491–499.
  • Barakat A, Bagniewska-Zadworna A, Choi A, Plakkat U, DiLoreto DS, Yellanki P, Carlson JE. 2009. The cinnamyl alcohol dehydrogenase gene family in Populus: phylogeny, organization, and expression. BMC Plant Biology 9: 26.
  • Baucher M, Chabbert B, Pilate G, VanDoorsselaere J, Tollier MT, PetitConil M, Cornu D, Monties B, VanMontagu M, Inze D et al. 1996. Red xylem and higher lignin extractability by down-regulating a cinnamyl alcohol dehydrogenase in poplar. Plant Physiology 112: 1479–1490.
  • Baucher M, Halpin C, Petit-Conil M, Boerjan W. 2003. Lignin: genetic engineering and impact on pulping. Critical Reviews in Biochemistry and Molecular Biology 38: 305–350.
  • Boerjan W. 2005. Biotechnology and the domestication of forest trees. Current Opinion in Biotechnology 16: 159–166.
  • Boerjan W, Ralph J, Baucher M. 2003. Lignin biosynthesis. Annual Review of Plant Biology 54: 519–546.
  • Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. 2007. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23: 2633–2635.
  • Bradshaw HD, Ceulemans R, Davis J, Stettler R. 2000. Emerging model systems in plant biology: poplar (Populus) as a model forest tree. Journal of Plant Growth Regulation 19: 306–313.
  • Brown GR, Bassoni DL, Gill GP, Fontana JR, Wheeler NC, Megraw RA, Davis MF, Sewell MM, Tuskan GA, Neale DB. 2003. Identification of quantitative trait loci influencing wood property traits in loblolly pine (Pinus taeda L.). III. QTL verification and candidate gene mapping. Genetics 164: 1537–1546.
  • Chu Y, Su X, Huang Q, Zhang X. 2009. Patterns of DNA sequence variation at candidate gene loci in black poplar (Populus nigra L.) as revealed by single nucleotide polymorphisms. Genetica 137: 141–150.
  • Cochrane FC, Davin LB, Lewis NG. 2004. The Arabidopsis phenylalanine ammonia lyase gene family: kinetic characterization of the four PAL isoforms. Phytochemistry 65: 1557–1564.
  • Coleman HD, Yan J, Mansfield SD. 2009. Sucrose synthase affects carbon partitioning to increase cellulose production and altered cell wall ultrastructure. Proceedings of the National Academy of Sciences, USA 106: 13118–13123.
  • Davis JM. 2008. Genetic improvement of poplar (Populus spp.) as a bioenergy crop. In: Vermerris W, ed. Genetic improvement of bioenergy crops. New York, NY, USA: Springer New York, 397–419.
  • Dixon RA, Reddy MSS. 2003. Biosynthesis of monolignols. Genomic and reverse genetic approaches. Phytochemistry Reviews 2: 289–306.
  • Djerbi S, Lindskog M, Arvestad L, Sterky F, Teeri TT. 2005a. The genome sequence of black cottonwood (Populus trichocarpa) reveals 18 conserved cellulose synthase (CesA) genes. Planta 221: 739–746.
  • Do CB, Mahabhashyam MSP, Brudno M, Batzoglou S. 2005. ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Research 15: 330–340.
  • Eckert AJ, Bower AD, Wegrzyn JL, Pande B, Jermstad KD, Krutovsky KV, St Clair JB, Neale DB. 2009a. Association genetics of coastal Douglas fir (Pseudotsuga menziesii var. menziesii, Pinaceae). I. Cold-hardiness related traits. Genetics 182: 1289–1302.
  • Eckert AJ, Wegrzyn JL, Pande B, Jermstad KD, Lee JM, Liechty JD, Tearse BR, Krutovsky KV, Neale DB. 2009b. Multilocus patterns of nucleotide diversity and divergence reveal positive selection at candidate genes related to cold hardiness in coastal Douglas fir (Pseudotsuga menziesii var. menziesii). Genetics 183: 289–298.
  • Fan JB, Oliphant A, Shen R, Kermani BG, Garcia F, Gunderson KL, Hansen M, Steemers F, Butler SL, Deloukas P et al. 2003. Highly parallel SNP genotyping. Cold Spring Harbor Symposia on Quantitative Biology 68: 69–78.
  • Feder ME, Mitchell-Olds T. 2003. Evolutionary and ecological functional genomics. Nature Reviews Genetics 4: 651–657.
  • Geisler-Lee J, Geisler M, Coutinho PM, Segerman B, Nishikubo N, Takahashi J, Aspeborg H, Djerbi S, Master E, Andersson-Gunneras S et al. 2006. Poplar carbohydrate-active enzymes. Gene identification and expression analyses. Plant Physiology 140: 946–962.
  • Gill GP, Brown GR, Neale DB. 2003. A sequence mutation in the cinnamyl alcohol dehydrogenase gene associated with altered lignification in loblolly pine. Plant Biotechnology Journal 1: 253–258.
  • Gonzalez-Martinez SC, Ersoz E, Brown GR, Wheeler NC, Neale DB. 2006. DNA sequence variation and selection of tag single-nucleotide polymorphisms at candidate genes for drought-stress response in Pinus taeda L. Genetics 172: 1915–1926.
  • Gonzalez-Martinez SC, Huber D, Ersoz E, Davis JM, Neale DB. 2008. Association genetics in Pinus taeda L. II. Carbon isotope discrimination. Heredity 101: 19–26.
  • Gonzalez-Martinez SC, Wheeler NC, Ersoz E, Nelson CD, Neale DB. 2007. Association genetics in Pinus taeda L. I. Wood property traits. Genetics 175: 399–409.
  • Goudet J. 2005. HIERFSTAT, a package for R to compute and test hierarchical F-statistics. Molecular Ecology Notes 5: 184–186.
  • Groover AT. 2007. Will genomics guide a greener forest biotech? Trends in Plant Science 12: 234–238.
  • Haigler CH, Blanton RL. 1996. New hope for old dreams: evidence that plant cellulose synthase genes have finally been identified. Proceedings of the National Academy of Sciences, USA 93: 12082–12085.
  • Haigler CH, Ivanova-Datcheva M, Hogan PS, Salnikov VV, Hwang S, Martin K, Delmer DP. 2001. Carbon partitioning to cellulose synthesis. Plant Molecular Biology 47: 29–51.
  • Harding SA, Leshkevich J, Chiang VL, Tsai CJ. 2002. Differential substrate inhibition couples kinetically distinct 4-coumarate:coenzyme A ligases with spatially distinct metabolic roles in quaking aspen. Plant Physiology 128: 428–438.
  • Harris AT, Riddlestone S, Bell Z, Hartwell PR. 2008. Towards zero emission pulp and paper production: the BioRegional MiniMill. Journal of Cleaner Production 16: 1971–1979.
  • Hoffmann B, Chabbert B, Monties B, Speck T. 2003a. Mechanical, chemical and X-ray analysis of wood in the two tropical lianas Bauhinia guianensis and Condylocarpon guianense: variations during ontogeny. Planta 217: 32–40.
  • Hoffmann L, Besseau S, Geoffroy P, Ritzenthaler C, Meyer D, Lapierre C, Pollet B, Legrand M. 2004. Silencing of hydroxycinnamoyl-coenzyme A shikimate/quinate hydroxycinnamoyltransferase affects phenylpropanoid biosynthesis. Plant Cell 16: 1446–1465.
  • Hoffmann L, Maury S, Martz F, Geoffroy P, Legrand M. 2003b. Purification, cloning, and properties of an acyltransferase controlling shikimate and quinate ester intermediates in phenylpropanoid metabolism. Journal of Biological Chemistry 278: 95–103.
  • Hu WJ, Harding SA, Lung J, Popko JL, Ralph J, Stokke DD, Tsai CJ, Chiang VL. 1999. Repression of lignin biosynthesis promotes cellulose accumulation and growth in transgenic trees. Nature Biotechnology 17: 808–812.
  • Ingvarsson PK. 2005. Nucleotide polymorphism and linkage disequilibrium within and among natural populations of European aspen (Populus tremula L., Salicaceae). Genetics 169: 945–953.
  • Ingvarsson PK, Garcia MV, Luquez V, Hall D, Jansson S. 2008. Nucleotide polymorphism and phenotypic associations within and around the phytochrome B2 locus in European aspen (Populus tremula, Salicaceae). Genetics 178: 2217–2226.
  • Joshi CP, Bhandari S, Ranjan P, Kalluri UC, Liang X, Fujino T, Samuga A. 2004. Genomics of cellulose biosynthesis in poplars. New Phytologist 164: 53–61.
  • Kao YY, Harding SA, Tsai CJ. 2002. Differential expression of two distinct phenylalanine ammonia-lyase genes in condensed tannin-accumulating and lignifying cells of quaking aspen. Plant Physiology 130: 796–807.
  • Kumar M, Thammannagowda S, Bulone V, Chiang V, Han KH, Joshi CP, Mansfield SD, Mellerowicz E, Sundberg B, Teeri T et al. 2009. An update on the nomenclature for the cellulose synthase genes in Populus. Trends in Plant Science 14: 248–254.
  • Landegren U, Nilsson M, Kwok PY. 1998. Reading bits of genetic information: methods for single-nucleotide polymorphism analysis. Genome Research 8: 769–776.
  • Leple JC, Dauwe R, Morreel K, Storme V, Lapierre C, Pollet B, Naumann A, Kang KY, Kim H, Ruel K et al. 2007. Downregulation of cinnamoyl-coenzyme A reductase in poplar: multiple-level phenotyping reveals effects on cell wall polymer metabolism and structure. Plant Cell 19: 3669–3691.
  • Li L, Zhou Y, Cheng X, Sun J, Marita JM, Ralph J, Chiang VL. 2003a. Combinatorial modification of multiple lignin traits in trees through multigene cotransformation. Proceedings of the National Academy of Sciences, USA 100: 4939–4944.
  • Li Y, Kajita S, Kawai S, Katayama Y, Morohoshi N. 2003b. Down-regulation of an anionic peroxidase in transgenic aspen and its effect on lignin characteristics. Journal of Plant Research 116: 175–182.
  • Lu SF, Zhou YH, Li LG, Chiang VL. 2006. Distinct roles of cinnamate 4-hydroxylase genes in Populus. Plant and Cell Physiology 47: 905–914.
  • Marth GT, Korf I, Yandell MD, Yeh RT, Gu ZJ, Zakeri H, Stitziel NO, Hillier L, Kwok PY, Gish WR. 1999. A general approach to single-nucleotide polymorphism discovery. Nature Genetics 23: 452–456.
  • Meng M, Geisler M, Johansson H, Mellerowicz EJ, Karpinski S, Kleczkowski LA. 2007. Differential tissue/organ-dependent expression of two sucrose- and cold-responsive genes for UDP-glucose pyrophosphorylase in Populus. Gene 389: 186–195.
  • Meyermans H, Morreel K, Lapierre C, Pollet B, De Bruyn A, Busson R, Herdewijn P, Devreese B, Van Beeumen J, Marita JM et al. 2000. Modifications in lignin and accumulation of phenolic glucosides in poplar xylem upon down-regulation of caffeoyl-coenzyme A O-methyltransferase, an enzyme involved in lignin biosynthesis. Journal of Biological Chemistry 275: 36899–36909.
  • Neale DB, Savolainen O. 2004. Association genetics of complex traits in conifers. Trends in Plant Science 9: 325–330.
  • Nickerson DA, Tobe VO, Taylor SL. 1997. PolyPhred: automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic Acids Research 25: 2745–2751.
  • Oakley RV, Wang YS, Ramakrishna W, Harding SA, Tsai CJ. 2007. Differential expansion and expression of alpha- and beta-tubulin gene families in Populus. Plant Physiology 145: 961973.
  • Oliphant A, Barker DL, Stuelpnagel JR, Chee MS. 2002. BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping. BioTechniques 32: S56S61.
  • Olsen MS, Robertson AL, Takebayashi N, Salim S, Schroeder WR, Tiffin P. 2010. Nucleotide diversity and linkage disequilibrium in balsam poplar (Populus balsamifera). New Phytologist 186: 25262536.
  • Osakabe K, Tsao CC, Li LG, Popko JL, Umezawa T, Carraway DT, Smeltzer RH, Joshi CP, Chiang VL. 1999. Coniferyl aldehyde 5-hydroxylation and methylation direct syringyl lignin biosynthesis in angiosperms. Proceedings of the National Academy of Sciences, USA 96: 89558960.
  • Osakabe Y, Ohtsubo Y, Kawai S, Katayama Y, Morohoshi N. 1995. Structure and tissue-specific expression of genes for phenylalanine ammonia-lyase from a hybrid aspen, Populus kitakamiensis. Plant Science 105: 217226.
  • Parvathi K, Chen F, Guo DJ, Blount JW, Dixon RA. 2001. Substrate preferences of O-methyltransferases in alfalfa suggest new pathways for 3-O-methylation of monolignols. Plant Journal 25: 193202.
  • Patterson N, Price A, Reich D. 2006. Population structure and Eigenanalysis. PLoS Genet 2(12).
  • Pavy N, Pelgas B, Beauseigle S, Blais S, Gagnon F, Gosselin I, Lamothe M, Isabel N, Bousquet J. 2008. Enhancing genetic mapping of complex genomes through the design of highly-multiplexed SNP arrays: application to the large and unsequenced genomes of white spruce and black spruce. BMC Genomics 9: 21.
  • Peter G, Neale D. 2004. Molecular basis for the evolution of xylem lignification. Current Opinion in Plant Biology 7: 737742.
  • Pichon M, Courbou I, Beckert M, Boudet AM, Grima-Pettenati J. 1998. Cloning and characterization of two maize cDNAs encoding cinnamoyl-CoA reductase (CCR) and differential expression of the corresponding genes. Plant Molecular Biology 38: 671676.
  • Pilate G, Guiney E, Holt K, Petit-Conil M, Lapierre C, Leple JC, Pollet B, Mila I, Webster EA, Marstorp HG et al. 2002. Field and pulping performances of transgenic trees with altered lignification. Nature Biotechnology 20: 607612.
  • Plomion C, Leprovost G, Stokes A. 2001. Wood formation in trees. Plant Physiology 127: 15131523.
  • Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics 38: 904909.
  • Ragauskas AJ, Williams CK, Davison BH, Britovsek G, Cairney J, Eckert CA, Frederick WJ Jr, Hallett JP, Leak DJ, Liotta CL et al. 2006. The path forward for biofuels and biomaterials. Science 311: 484489.
  • Ralph J, Akiyama T, Kim H, Lu FC, Schatz PF, Marita JM, Ralph SA, Reddy MSS, Chen F, Dixon RA. 2006a. Effects of coumarate 3-hydroxylase down-regulation on lignin structure. Journal of Biological Chemistry 281: 88438853.
  • Ralph S, Oddy C, Cooper D, Yueh H, Jancsik S, Kolosova N, Philippe RN, Aeschliman D, White R, Huber D et al. 2006b. Genomics of hybrid poplar (Populus trichocarpax deltoides) interacting with forest tent caterpillars (Malacosoma disstria): normalized and full-length cDNA libraries, expressed sequence tags, and a cDNA microarray for the study of insect-induced defences in poplar. Molecular Ecology 15: 12751297.
  • Ranocha P, Chabannes M, Chamayou S, Danoun S, Jauneau A. 2002. Laccase down-regulation causes alterations in phenolic metabolism and cell wall structure in poplar. Plant Physiology 129: 145.
  • Ranocha P, McDougall G, Hawkins S, Sterjiades R, Borderies G, Stewart D, Cabanes-Macheteau M, Boudet AM, Goffner D. 1999. Biochemical characterization, molecular cloning and expression of laccases – a divergent gene family – in poplar. European Journal of Biochemistry 259: 485495.
  • Remington DL, Thornsberry JM, Matsuoka Y, Wilson LM, Whitt SR, Doebley J, Kresovich S, Goodman MM, Buckler ESt. 2001. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proceedings of the National Academy of Sciences, USA 98: 1147911484.
  • Rubin EM. 2008. Genomics of cellulosic biofuels. Nature 454: 841845.
  • Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA. 2002. Score tests for association between traits and haplotypes when linkage phase is ambiguous. American Journal of Human Genetics 70: 425434.
  • Schrader J, Nilsson J, Mellerowicz E, Berglund A, Nilsson P, Hertzberg M, Sandberg G. 2004. A high-resolution transcript profile across the wood-forming meristem of poplar identifies potential regulators of cambial stem cell identity. Plant Cell 16: 22782292.
  • Sewalt VJH, Ni WT, Jung HG, Dixon RA. 1997. Lignin impact on fiber degradation: increased enzymatic digestibility of genetically engineered tobacco (Nicotiana tabacum) stems reduced in lignin content. Journal of Agricultural and Food Chemistry 45: 19771983.
  • Sterky F, Bhalerao RR, Unneberg P, Segerman B, Nilsson P, Brunner AM, Charbonnel-Campaa L, Lindvall JJ, Tandre K, Strauss SH et al. 2004. A Populus EST resource for plant functional genomics. Proceedings of the National Academy of Sciences, USA 101: 1395113956.
  • Sterky F, Regan S, Karlsson J, Hertzberg M, Rohde A, Holmberg A, Amini B, Bhalerao R, Larsson M, Villarroel R et al. 1998. Gene discovery in the wood-forming tissues of poplar: analysis of 5,692 expressed sequence tags. Proceedings of the National Academy of Sciences, USA 95: 1333013335.
  • Stinchcombe JR, Hoekstra HE. 2008. Combining population genomics and quantitative genetics: finding the genes underlying ecologically important traits. Heredity 100: 158170.
  • Storey JD. 2002. A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B (Methodological) 64: 479498.
  • Storey JD, Tibshirani R. 2003. Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, USA 100: 94409445.
  • Strauss SH, Martin FM. 2004. Poplar genomics comes of age. New Phytologist 164: 14.
  • Suzuki S, Li LG, Sun YH, Chiang VL. 2006. The cellulose synthase gene superfamily and biochemical functions of xylem-specific cellulose synthase-like genes in Populus trichocarpa. Plant Physiology 142: 12331245.
  • R Development Core Team. 2007. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
  • Thumma BR, Nolan MF, Evans R, Moran GF. 2005. Polymorphisms in cinnamoyl CoA reductase (CCR) are associated with variation in microfibril angle in Eucalyptus spp. Genetics 171: 12571265.
  • Tsai CJ, Harding SA, Tschaplinski TJ, Lindroth RL, Yuan Y. 2006. Genome-wide analysis of the structural genes regulating defense phenylpropanoid metabolism in Populus. New Phytologist 172: 4762.
  • Tsai CJ, Popko JL, Mielke MR, Hu WJ, Podila GK, Chiang VL. 1998. Suppression of O-methyltransferase gene by homologous sense transgene in quaking aspen causes red–brown wood phenotypes. Plant Physiology 117: 101112.
  • Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A et al. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313: 15961604.
  • Unneberg P, Stromberg M, Sterky F. 2005. SNP discovery using advanced algorithms and neural networks. Bioinformatics 21: 25282530.
  • Van Doorsselaere J, Baucher M, Chognot E, Chabbert B, Tollier MT, PetitConil M, Leple JC, Pilate G, Cornu D, Monties B et al. 1995. A novel lignin in poplar trees with a reduced caffeic acid 5-hydroxyferulic acid O-methyltransferase activity. Plant Journal 8: 855864.
  • Wagner A, Ralph J, Akiyama T, Flint H, Phillips L, Torr K, Nanayakkara B, Kiri LT. 2007. Exploring lignification in conifers by silencing hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyltransferase in Pinus radiata. Proceedings of the National Academy of Sciences, USA 104: 1185611861.
  • Warnes G, Leisch F. 2006. Genetics: population genetics. R Package.
  • Wegrzyn JL, Lee JM, Liechty J, Neale DB. 2009. PineSAP – sequence alignment and SNP identification pipeline. Bioinformatics 25: 26092610.
  • Whetten RW, MacKay JJ, Sederoff RR. 1998. Recent advances in understanding lignin biosynthesis. Annual Review of Plant Physiology and Plant Molecular Biology 49: 585609.
  • Xu Z, Zhang D, Hu J, Zhou X, Ye X, Reichel KL, Stewart NR, Syrenne RD, Yang X, Gao P et al. 2009. Comparative genome analysis of lignin biosynthesis gene families across the plant kingdom. BMC Bioinformatics 10(Suppl. 11): S3.
  • Yu XQ, Mei HW, Luo LJ, Liu GL, Liu HY, Zou GH, Hu SP, Li MS, Wu JH. 2006. Dissection of additive, epistatic effect and Q × E interaction of quantitative trait loci influencing stigma exsertion under water stress in rice. Yi Chuan Xue Bao 33: 542550.
  • Zhong RQ, Morrison WH, Himmelsbach DS, Poole FL, Ye ZH. 2000. Essential role of caffeoyl coenzyme A O-methyltransferase in lignin biosynthesis in woody poplar plants. Plant Physiology 124: 563577.

Supporting Information

Fig. S1 Distribution of quality metrics for genotyped single nucleotide polymorphisms (SNPs) grouped by dataset.

Fig. S2 Cluster assignments illustrated across pairwise plots of the four significant principal components (PCs) derived using principal components analysis (PCA).

Fig. S3 Summaries of population genetic parameters across all samples and samples placed into clusters.

Fig. S4 Differentiation among inferred genetic clusters for Populus trichocarpa reveals FST outliers within the set of focal single nucleotide polymorphisms (SNPs).

Fig. S5 Cluster assignment is correlated with phenotypic traits.

Table S1 Sample localities for the 448 individuals used for association mapping in Populus trichocarpa

Table S2 Summaries of quality scores across genotyped single nucleotide polymorphism (SNP) loci

Table S3 Summaries of genotyped single nucleotide polymorphisms (SNPs) for focal and control SNPs

Table S4 Genotype data

Table S5 Phenotype data

Table S6 Haplotype data

NPH_3415_sm_TableS1-3andFigS1-5.doc (885K)
NPH_3415_sm_TableS4.txt (1647K)
NPH_3415_sm_TableS5.txt (21K)
NPH_3415_sm_TableS6.csv (1061K)