Forest trees are a potential source of net-zero carbon emission lignocellulosic biofuels. The production of biofuels involves the collection of biomass, deconstruction of cell wall polymers into component sugars (pretreatment and saccharification) and conversion of these sugars to ethanol (fermentation) (Rubin, 2008). Woody bioenergy crops from which biomass is derived have not been domesticated for this purpose and the current methods for lignocellulosic saccharification and fermentation are inefficient. Recent efforts to develop viable fuel alternatives are now taking advantage of genomics resources and technologies to discover the potential gain that can be achieved through breeding. Traits of interest in trees with applications in bioenergy include growth rate, branching habit, stem thickness and cell wall chemistry (Bradshaw et al., 2000). Rapid growth, moderate genome size, woody tissues and economic importance make black cottonwood (Populus trichocarpa) an ideal model organism to examine biofuels-related traits (Bradshaw et al., 2000). Black cottonwood possesses tremendous genetic and phenotypic diversity, is obligately outcrossing, is able to hybridize with many other species and is easily clonally propagated (Davis, 2008). In addition to these advantages, black cottonwood is the first tree and bioenergy feedstock to have its genome sequenced and annotated. Derived from a single wild individual (Nisqually-1), the genome sequence represents an estimated 45 500 genes across 19 chromosomes (Tuskan et al., 2006). In addition to the genome, resources such as controlled cross-populations, cross-species molecular markers, expressed sequence tag (EST) collections and full-length cDNAs are available to the research community (Strauss & Martin, 2004; Ralph et al., 2006a,b; Tuskan et al., 2006).
Improvement of biofuels feedstocks focuses on increasing both the relative carbon partitioning in woody tissues above ground and the accessibility of cellulose for enzymatic digestion (Ragauskas et al., 2006). As with other woody species, the major components of black cottonwood secondary cell walls are cellulose, hemicellulose and lignin (Harris et al., 2008). Lignin inhibits saccharification in processes aimed at producing simple sugars for fermentation to ethanol. Many studies have focused on the molecular biology of wood and secondary wall formation (Sterky et al., 1998, 2004; Plomion et al., 2001; Schrader et al., 2004). The pathways and genes involved in lignin and cellulose biosynthesis, and microfibril deposition, are increasingly well understood through biochemical analysis and expression studies (Whetten et al., 1998; Plomion et al., 2001; Li et al., 2003a,b; Peter & Neale, 2004; Schrader et al., 2004; Boerjan, 2005; Oakley et al., 2007). The specific roles of genes in these pathways have been verified through forward and reverse genetic mutation studies (Dixon & Reddy, 2003; Ralph et al., 2006a,b; Davis, 2008). A relatively unexplored area of research is the identification of the natural allelic variation controlling phenotypic variation and the exploitation of this variation in breeding.
A major goal of population and quantitative genetics is the identification of the polymorphisms responsible for phenotypic variation (Feder & Mitchell-Olds, 2003; Stinchcombe & Hoekstra, 2008). Many traits of interest in forest trees, such as wood quality, are complex in nature and occur later in development (Groover, 2007). Recent advances in high-throughput marker technologies, combined with the wealth of genomic resources available to species such as black cottonwood, have enabled a closer examination of the number and effect sizes of genes responsible for traits of interest through complex trait dissection using association mapping. Tree species are ideal for association mapping as they are predominantly outcrossing and have large, relatively unstructured, populations, resulting in high levels of nucleotide diversity and low linkage disequilibrium (Neale & Savolainen, 2004; Gonzalez-Martinez et al., 2006). Significant associations between single nucleotide polymorphisms (SNPs) within candidate genes have been established in forest trees. Associations with wood quality traits in eucalyptus (Thumma et al., 2005), wood quality and drought tolerance traits in loblolly pine (Gonzalez-Martinez et al., 2007, 2008), bud phenology traits in European poplar (Ingvarsson et al., 2008) and cold hardiness-related traits in coastal Douglas fir (Eckert et al., 2009a) have been identified. In general, individual SNPs explain a small proportion of the phenotypic variance (0.5–5.0%), which is consistent with the complex nature of these traits.
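The proportion of phenotypic variance attributed to a single SNP can be illustrated with a minimal sketch. It assumes genotypes are coded as allele dosage (0, 1 or 2 copies of one allele) and uses a plain linear regression of phenotype on dosage. The function name and the toy data are hypothetical, and the sketch omits the correction for population structure that a real association test would include.

```python
def variance_explained(dosages, phenotypes):
    """Return r^2 from a simple linear regression of phenotype on allele dosage.

    Illustrative only: no covariates and no population-structure correction.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need two equal-length sequences with at least two entries")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and phenotype must both vary")
    # For one predictor, r^2 equals the squared Pearson correlation.
    return (sxy ** 2) / (sxx * syy)


# Hypothetical toy data (not from the study): dosage and a lignin-like trait (%).
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_lignin = [21.5, 22.5, 22.0, 23.0, 22.5, 23.5]
toy_r2 = variance_explained(toy_dosages, toy_lignin)
```

The toy SNP explains far more variance than the 0.5–5.0% range reported for real associations; the exaggerated effect only keeps the arithmetic easy to follow.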
In this study, statistical models were applied to perform association tests and to account for population structure in 579 SNPs from 40 candidate genes involved in lignocellulosic cell wall synthesis in black cottonwood. Single-marker and haplotype-based tests were performed to identify associations with natural variation in composite traits evaluating lignin and cellulose content.
Strategies for the domestication of forest trees using either conventional or novel molecular breeding approaches are centered around the exploitation of existing genetic diversity. Over the past few decades, genetic maps have been made for many forest tree species and quantitative trait loci have been mapped for a range of traits (Brown et al., 2003). The lack of resolution in mapping candidate genes and quantitative trait loci alleles can be overcome by association genetics, using natural populations in which the long evolutionary history has decreased the extent of linkage disequilibrium (LD) (Neale & Savolainen, 2004). An important prerequisite for association mapping is the availability of large allelic variation in the population. LD describes a key aspect of genetic variation in natural populations of plants. This study is the first examination of genome-wide LD in black cottonwood, and enables comparison with other poplars. We examined LD across 39 of the candidate genes (Fig. 2b,c), and observed a rapid decay of LD within just a few hundred base pairs, indicating the potential of association genetics to identify the genes responsible for variation in the trait. Previous studies in both P. tremula (five genes) and P. nigra (nine genes) showed a similar rapid decay of LD (Ingvarsson, 2005; Chu et al., 2009). LD was demonstrated to decay over significantly longer distances in a recent study across over 300 randomly selected gene fragments in the closely related P. balsamifera (Olsen et al., 2010).
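As an illustration of the quantity whose decay is described above, the following sketch computes the squared allelic correlation (r²), a common pairwise measure of LD, between two biallelic SNPs. It assumes phased haplotypes with alleles coded 0/1, which is a simplification: the function name and toy data are hypothetical, and the estimator actually used in the study is not stated in this section.

```python
def ld_r2(haplotypes):
    """Squared allelic correlation between two biallelic loci.

    haplotypes: list of (allele_at_locus_a, allele_at_locus_b) tuples,
    one per phased chromosome, with alleles coded 0 or 1 (an assumption).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical toy chromosomes: complete LD versus no LD.
complete_ld = ld_r2([(0, 0), (0, 0), (1, 1), (1, 1)])
no_ld = ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)])
```

Plotting such pairwise values against the physical distance between SNPs is the usual way to visualize LD decay.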
This study examined both single-marker associations and haplotype-based tests to account for information present in the associations between markers, as well as directly between an SNP and the trait. Given the structure of our data, a natural way to apply the knowledge of LD within and between genes is to perform haplotype-based association tests. The power of a single-marker association test is often limited because LD information contained in flanking markers is ignored. Intuitively, haplotypes (which are essentially a collection of ordered markers) may be more powerful than individual, nonordered markers. This study demonstrates that the use of haplotypes can significantly increase the ability to map traits of interest.
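The notion of a haplotype as an ordered collection of markers can be made concrete with a short sketch. It assumes the alleles on each chromosome are already phased and listed in marker order. The function names and the toy chromosomes are hypothetical, and no haplotype inference or association test is attempted here.

```python
from collections import Counter


def build_haplotypes(phased_alleles):
    """Join the ordered per-marker alleles of each chromosome into one string."""
    return ["".join(alleles) for alleles in phased_alleles]


def haplotype_frequencies(haplotypes):
    """Return the relative frequency of each distinct haplotype string."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {hap: count / total for hap, count in counts.items()}


# Hypothetical phased chromosomes over three ordered SNPs.
toy_chromosomes = [
    ("A", "G", "A"),
    ("A", "A", "A"),
    ("A", "G", "A"),
    ("G", "A", "G"),
]
toy_haplotypes = build_haplotypes(toy_chromosomes)
toy_frequencies = haplotype_frequencies(toy_haplotypes)
```

A haplotype-based test then compares trait values among carriers of these strings rather than among genotype classes at a single SNP, which is how information from flanking markers enters the test.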
Candidate genes known to be involved in lignocellulosic cell wall development were examined for genetic associations. There are two major steps of lignin biosynthesis in plants: monolignol biosynthesis and the subsequent polymerization of lignin monomers to form polymers. This biochemical pathway is highly conserved throughout vascular plants, and many of the enzymes have been identified and characterized (Boerjan et al., 2003; Xu et al., 2009). The cellulose biosynthesis pathway involves the synthesis and assembly of β-1,4 glucan chains at the rosette terminal complex, and their orderly deposition to form cell wall microfibrils. Although several candidate genes have been identified, the precise molecular mechanism of cellulose biosynthesis and microfibril deposition in plants is still not clearly understood. Genetic improvement of lignin and cellulose biosynthesis in trees continues to be a major research priority. Similar to other commercial applications for black cottonwood, modified lignin structure (chemical reactivity) and increased cellulose content are desirable traits. Mechanisms that can increase C6 sugar content and decrease C5 sugar content of hemicelluloses are favorable for fermentation.
In the monolignol biosynthetic pathway, the first step consists of a deamination of phenylalanine by phenylalanine ammonia-lyase (PAL) that produces cinnamic acid. PAL is encoded by a small multigene family (Appert et al., 1994; Osakabe et al., 1995; Cochrane et al., 2004), and five isoforms have been annotated in the poplar genome (Tsai et al., 2006). In this study, markers in PAL2, PAL4 and PAL5 were genotyped. A single-marker noncoding association was identified with PAL2 that explained 1.4% of the phenotypic variation in C6 sugars (Table 3). In aspen (P. tremuloides) stem, PAL2 transcripts have been localized to developing xylem cells, consistent with its involvement in lignin biosynthesis (Kao et al., 2002).
C4H catalyzes the first oxidative reaction in phenylpropanoid metabolism, namely the conversion of cinnamic acid to p-coumaric acid (Sewalt et al., 1997). Three C4H genes have been characterized in black cottonwood (Lu et al., 2006). C4H1 is proposed to be associated with G lignin deposition, whereas C4H2 is thought to be involved in S lignin biosynthesis (Lu et al., 2006). Four unique single-marker associations were identified in the C4H1 and C4H2 genes examined in this study. A significant nonsynonymous association in exon 1 of C4H1 with lignin demonstrated modes of gene action consistent with additive effects (Table 3; Fig. 3). The C allele at C4H1_02-219 is the minor allele and causes a histidine (H) to proline (P) amino acid substitution. Heterozygotes for the marker had a percentage value of lignin composition that was intermediate between the two homozygote classes (21.9% for A/A, 22.7% for A/C, 23.2% for C/C). A similar study in European maize identified a nonsynonymous SNP in the first exon of C4H1 associated with forage quality traits (Andersen et al., 2008). Physiological studies of these genes describe unique functions for the isoforms within the lignin biosynthetic pathway.
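The genotype class means quoted above can be used for a small worked example of what "consistent with additive effects" means. The sketch applies the textbook definitions of the additive effect a (half the difference between the homozygote means) and the dominance deviation d (heterozygote mean minus the homozygote midpoint). The function name is hypothetical, and this arithmetic is not presented as the model-fitting procedure used in the study.

```python
def additive_dominance(mean_hom_first, mean_het, mean_hom_second):
    """Return (a, d) from the three genotype class means.

    a: additive effect, half the difference between the homozygote means.
    d: dominance deviation, heterozygote mean minus the homozygote midpoint.
    """
    midpoint = (mean_hom_first + mean_hom_second) / 2
    a = (mean_hom_second - mean_hom_first) / 2
    d = mean_het - midpoint
    return a, d


# Lignin content (%) for A/A, A/C and C/C at the C4H1 marker, as quoted in the text.
a, d = additive_dominance(21.9, 22.7, 23.2)
dominance_ratio = d / a  # values near 0 are usually read as largely additive
```

With these means, a is about 0.65 percentage points of lignin per C allele and d is about 0.15, so the heterozygote lies close to the homozygote midpoint.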
Figure 3. Marker effects on the significant nonsynonymous single nucleotide polymorphisms (SNPs) found in C4H1 and CesA2A. (a) The C4H1_04-219 nonsynonymous marker in the first exon of the C4H1 gene illustrates patterns of gene action consistent with additive effects. The C allele at C4H1_04-219 causes a histidine (H) to proline (P) amino acid substitution. (b) The CesA2A_08-38 nonsynonymous marker is located in the sixth exon of the CesA2A gene. This SNP is significant for both lignin content and C6 traits. For lignin content, the homozygote decreases the percentage content, whereas, in C6, the sugar content is elevated. The G allele at CesA2A is the derived state and is responsible for an isoleucine (I) to valine (V) amino acid substitution. In both gene models, solid boxes denote untranslated regions, solid lines are introns and open boxes indicate exons.
4-Coumarate:CoA ligase (4CL), which catalyzes the formation of CoA esters of p-coumaric acid and its derivatives, has a pivotal role in channeling phenylpropanoid precursors into different downstream pathways, each leading to a variety of functionally distinct end products (Harding et al., 2002). 4CL is also encoded by a multigene family, with five isoforms annotated in the poplar genome (Tsai et al., 2006). Although we were unable to identify significant single-marker associations in 4CL1, 4CL3 and 4CL5, significant associations with haplotypes in 4CL1 and 4CL3 were observed for both lignin and C6 traits. Of the five haplotypes (spanning 389 bp) in 4CL1_01, two significant associations demonstrated an effect on C6 sugar content (35.1% for AGA and 34.1% for AAA). In lignin composition, two haplotypes of 4CL1_11 demonstrated a difference of > 1% in lignin composition (Table 5; Fig. 4b). Three single markers in 4CL1_11 at P < 0.05 were found to be in LD, and their individual genotypic effects on lignin composition were small in comparison with the spanning haplotype block (Fig. 4b). The reduction of 4CL expression in transgenic poplar has resulted in significant reductions of lignin, ranging from 5% to 45% (Hu et al., 1999; Li et al., 2003a,b).
Figure 4. Haplotype and single-marker associations are illustrated for SUSY1 and 4CL1. (a) The genotypic effects of the three proposed haplotypes (two significant) of SUSY1 are shown. The haplotypes yield significantly different median phenotypic values for the lignin content trait. The marker effects of four significant single-marker associations are also shown. SUSY1_02-108 is significant with respect to lignin. The remaining markers are significant with respect to the related trait, C6 sugars. All four markers are within linkage disequilibrium (LD) with one another. (b) The genotypic effects of the three haplotypes (two significant) of 4CL1 are shown. The significant haplotypes yield different median phenotypic values for the lignin content trait. No significant single-marker associations were identified after multiple testing; however, the box plots for single markers with P < 0.05 are shown. Two of the three markers are in LD with one another.
Hydroxycinnamoyl-CoA transferase (HCT) is the most recently identified enzyme in monolignol biosynthesis and belongs to a large family of acyltransferases (Hoffmann et al., 2003a,b). It catalyzes the conversion of p-coumaroyl-CoA and caffeoyl-CoA to their corresponding shikimate or quinate esters. Two of the six annotated HCT genes in the Populus genome (HCT1 and HCT6) are expressed in developing xylem (Tsai et al., 2006). HCT6_13-225 was a significant synonymous marker for both lignin and C6 (Table 3). Two significant haplotypes in HCT1_12 were associated with lignin composition (Table 5). HCT has not been transgenically manipulated in poplar; however, RNAi-mediated silencing of HCT in conifers (Pinus radiata) that do not produce S lignin had a strong impact on lignin content (42% reduction), monolignol composition and interunit linkage distribution (Wagner et al., 2007). A similar study of HCT in Arabidopsis showed a reduction in lignin content and an increased G lignin deposition (Hoffmann et al., 2004).
p-Coumaroyl-CoA shikimate proceeds through a series of transformations into caffeoyl-CoA shikimate, caffeoyl-CoA, feruloyl-CoA and coniferaldehyde by the action of the enzymes p-coumaroyl-CoA 3′-hydroxylase (C3′H), HCT, caffeoyl-CoA O-methyltransferase (CCoAOMT) and cinnamoyl CoA reductase (CCR), respectively. CCoAOMT, catalyzing the methylation of caffeoyl-CoA to feruloyl-CoA, is critical in maintaining lignin structural integrity (Meyermans et al., 2000; Zhong et al., 2000). In the two independent studies referenced, antisense downregulation of CCoAOMT1 in transgenic hybrid poplar (P. tremula × P. alba) resulted in reduced lignin content as well as altered S : G ratio. In this study, markers from CCoAOMT1 and CCoAOMT2 were genotyped. CCoAOMT1 had one significant noncoding SNP associated with C6 sugar content (Table 3).
Cinnamoyl-CoA reductase (CCR) catalyzes the conversion of hydroxycinnamoyl-CoA esters (p-coumaroyl-CoA, feruloyl-CoA, sinapoyl-CoA) into their corresponding cinnamyl aldehydes (Pichon et al., 1998). Downregulation of CCR in transgenic poplar (P. tremula × P. alba) is associated with up to 50% reduced lignin content (Leple et al., 2007). In this study, a single noncoding two-state marker in CCR was found to be strongly associated with lignin composition (Table 3). A different amplicon in the same gene (CCR_12) was globally significant in terms of haplotype associations, but did not report any significant individual haplotypes (Table 5). Haplotype associations have been identified previously in eucalyptus with CCR in relation to wood property traits (Thumma et al., 2005).
Coniferaldehyde can be converted to coniferyl alcohol by the action of cinnamyl alcohol dehydrogenase (CAD) or to 5-hydroxy-coniferaldehyde and sinapyl aldehyde by the action of ferulate 5-hydroxylase (F5H) and caffeic/5-hydroxyferulic acid O-methyltransferase (COMT). CAD catalyzes the reduction of p-hydroxycinnamaldehydes into their corresponding alcohols and is the last enzyme in monolignol biosynthesis. In this study, CAD_04-185, a noncoding marker, illustrated patterns of gene action consistent with additive effects in relation to S : G and C6 sugars. This was the only single-marker association identified with the S : G ratio. Three of the nine individual haplotypes (spanning 407 bp) in the same amplicon of CAD were significant for lignin composition. Differences in genotypic effects on lignin content were minimal (22.2% for CAAAAT, 22.8% for CATAAT and 22.5% for GATAAT). The CAD gene family has been studied extensively in Arabidopsis, rice and poplar (Barakat et al., 2009). The downregulation of CAD in transgenic poplar did not affect the overall lignin content and composition, but led to an increased incorporation of the hydroxycinnamaldehydes into lignin (Baucher et al., 1996; Pilate et al., 2002). Field trials of CAD-deficient transgenic poplar showed improved Kraft pulping performance (Pilate et al., 2002).
COMT was originally thought to be a bifunctional enzyme that sequentially methylated caffeic and 5-hydroxyferulic acids. More recently, it has been shown to act downstream in monolignol biosynthesis by methylating the aldehyde and alcohol backbones (Osakabe et al., 1999; Parvathi et al., 2001). In this study, markers from COMT1 and COMT2 were successfully genotyped (Table 1). A single noncoding COMT2 marker was identified as significant with C6 sugar content (Table 3). Suppression of COMT in both P. tremula × P. alba and P. tremuloides lines did not change the lignin content, but resulted in a reduction in the S : G lignin ratio (as a result of a decrease in S and an increase in G), as well as the incorporation of an abnormal 5-hydroxyguaiacyl unit into the lignin (Van Doorsselaere et al., 1995; Tsai et al., 1998).
After their biosynthesis, monolignols are transported from the cytoplasm to the cell wall and polymerized to a lignin matrix. In the cell wall, the monolignols are oxidized to their radicals and polymerized. Laccases (Lac), peroxidases and other phenol oxidases have long been thought to be involved in this polymerization (Baucher et al., 2003), but conclusive evidence for their role is still lacking. In our study, we examined Lac1a, Lac2 and Lac90a. Lac1a was found to have two noncoding single-marker associations with C6 sugars (Table 3). In poplars, several laccases (Ranocha et al., 1999) have been cloned and characterized. At least eight of these laccases were identified in association with lignin biosynthetic pathways by microarray analysis (Andersson-Gunneras et al., 2006). Subsequent studies with antisense Lac3 in transgenic hybrid poplar showed little variation in lignin content; however, the soluble phenolics and structure of the secondary wall were altered (Ranocha et al., 2002).
Variations in the quantity and quality of cellulose in plants are suspected to be a primary result of enzymatic activities of different types of cellulose synthases (CesAs) (Haigler & Blanton, 1996). The CesA gene family contains 17 members in the sequenced poplar genome, five of which are highly expressed during wood formation (Joshi et al., 2004; Djerbi et al., 2005a,b; Suzuki et al., 2006; Kumar et al., 2009). All five isoforms were evaluated for association in this study (CesA1A, CesA1B, CesA2A, CesA2B and CesA3A), and all had at least one single-marker or haplotype association (Table 1). The same nonsynonymous marker in the sixth exon of CesA2A was strongly associated with both lignin and C6 sugar traits. The G allele at CesA2A is the minor allele and causes an isoleucine (I) to valine (V) amino acid substitution (Table 3). The genotypic effects of the two-state SNP are shown in Fig. 3(b). In lignin traits, the differences in content were significant (22% for AA and 23.6% for AG); the same is true for C6 sugar content (34.9% for AA and 32.1% for AG). Three single-marker associations between CesA1B and lignin composition were identified (Table 3; Fig. 5). Two of these three noncoding SNPs were also associated with C6 sugar content. CesA1B_10 had one significant haplotype associated with lignin composition. CesA1A had two noncoding and one synonymous association (CesA1A_12-40) for C6 sugars. One of the noncoding SNPs (CesA1A_20-226) was also associated with lignin content. CesA3A had two different amplicons with significant haplotype associations with lignin. Three of the six haplotypes in CesA1A_12 (spanning 183 bp) were highly associated, and their genotypic effects on C6 were also significant (33.6% for AGA, 34.2% for AAA, 35.3% for GAG) (Table 5).
Figure 5. (a–c) An example of marker effects in the CesA1B gene on the lignin content phenotype. Each marker explains a small proportion of the phenotypic variance (r² ∼ 2–3%) and is consistent with an additive model of gene action. Whiskers in the box plots represent 1.5 times the interquartile range. (d) Illustrated are the 39 single nucleotide polymorphisms (SNPs) genotyped for the CesA1B gene relative to the reference gene model, as well as three of the 39 that were significant (indicated with an asterisk). Solid boxes denote UTR, solid lines are introns and open boxes indicate exons in the gene model.
CesA proteins in the rosette terminal complex use cytosolic uridine diphosphate (UDP)-glucose as substrate, which is provided directly by particulate sucrose synthase (SUSY) (Haigler et al., 2001). This enzyme produces UDP-glucose and fructose from sucrose and UDP. Of the six SUSY genes annotated in the poplar genome, only two were highly expressed in wood-forming tissues based on microarray analysis (Geisler-Lee et al., 2006; Meng et al., 2007). In this study, amplicons from SUSY1 were successfully genotyped (Table 1). Single-marker tests with SUSY1 revealed six noncoding associations with C6 and two with lignin composition (Table 3). Two of the three individual haplotypes (spanning 386 bp) identified in SUSY1_02 were significant. Genotypic differences between haplotypes were observed for lignin composition (21.8% for AAAA and 22.9% for TGGG) (Table 5). Three of the four markers that compose the SUSY1_02 haplotype are in strong LD (Fig. 4a). Recently, overexpression of SUSY in transgenic poplar has led to an increase in both cellulose production and cellulose crystallinity (Coleman et al., 2009), confirming the previous suggestion that SUSY could be one of the limiting steps of cellulose biosynthesis (Haigler et al., 2001).
This study represents the most comprehensive evaluation of LD and genetic association in poplars. High-throughput genotyping technologies and the vast genomic resources in black cottonwood allowed a large number of candidate genes to be evaluated for associations with lignocellulosic cell wall development. The genes studied are those known to be associated with these pathways and those that have been extensively studied for commercial applications, such as pulp and feedstock production, and are now being further evaluated for improvement in relation to biofuels production. Given the rapid decay of within-gene LD in black cottonwood and the high coverage of amplicons across each gene, it is likely that the numerous polymorphisms identified are in close proximity to the causative SNPs, and the haplotype associations accurately reflect the information present in the associations between markers. This study demonstrates that a forward genetics approach (association genetics) can be used to discover naturally occurring allelic variation in genes associated with commercially important traits. The association approach provides estimates of the size of effects of these alleles on a phenotype. Understanding the size of the effects as well as the existing variation is critical in applying the knowledge gained on a particular SNP to marker-based breeding programs with goals to increase cellulose yield and, therefore, cellulosic ethanol production.