Linkage disequilibrium in trees
For association mapping, understanding the patterns of LD in the species under consideration is an important prerequisite, because the rate of decay of LD determines whether genome-wide association is feasible or whether a candidate-gene-based approach has to be considered. Previous studies have generally suggested very low LD in trees; for example, Brown et al. (2004) found a rapid decline in LD within several kilobases in loblolly pine. Similar findings of limited LD were reported for candidate genes in other species of conifers (Dvornyk et al., 2002; Neale & Savolainen, 2004; Krutovsky & Neale, 2005; Gonzalez-Martinez et al., 2007). LD analysis in the Cinnamoyl CoA Reductase (CCR) gene and a gene encoding a COBRA-like protein (EniCOBL4A) in Eucalyptus nitens showed that LD does not extend over the entire gene (Thumma et al., 2005, 2009). In Populus, previous studies based on SNP markers have indicated that a rapid decay of LD occurs within just 300–1700 bp in candidate genes among related species of Populus (Ingvarsson, 2005; Ingvarsson et al., 2008; Xu et al., 2009; Wegrzyn et al., 2010), which is consistent with the LD decay observed in some PtoCesA genes using SSR markers in this study, indicating the potential of association genetics to identify the genes responsible for variation in key traits. However, the assessment of LD using genic SSR markers applies only to gene regions in this species, and may not hold at the genome-wide level. To date, all previous studies on LD decay in related species of Populus have focused on gene regions; whether there is LD in nongenic regions in Populus remains to be seen (Ingvarsson, 2005; Ingvarsson et al., 2008; Wegrzyn et al., 2010).
The greater resolution power of SSRs in the detection of LD, compared with biallelic SNPs, has been demonstrated in other species (for a review, see Abdurakhmonov & Abdukarimov, 2008), suggesting the possibility of using a combination of SNP and SSR markers in LD mapping. These observations can be readily explained by noting that different marker types capture different historical information in a genome because of dissimilar mutation rates (e.g. SNP vs SSR or AFLP vs SSR) and nonuniform LD distribution among the chromosomes; in addition, population background is also a key factor influencing LD (Neale & Savolainen, 2004).
LD has been found to decline rapidly in some PtoCesA genes using limited, but evenly spaced markers (minor allele frequency was above c. 20%; Andreescu et al., 2007), suggesting that the resolution of marker–trait associations may be high in this study. The LD does not extend over the entire gene region, indicating that a candidate-gene-based LD approach may be the best way to understand the molecular basis underlying quantitative variation in this species (Thumma et al., 2005).
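As an illustration of the quantity discussed in this section, the following minimal sketch computes the LD coefficient D and the squared allele-frequency correlation r2 for a pair of loci. It assumes phased haplotypes and biallelic loci coded 0/1; SSR loci are multiallelic, so this is a simplification for exposition and not the estimator used in this study. The function name ld_r2 is hypothetical.

```python
def ld_r2(haplotypes):
    """Return (D, r2) for two biallelic loci.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B) tuples,
    each allele coded 0 or 1 (assumed phased; illustrative only).
    """
    n = len(haplotypes)
    # Frequencies of allele 1 at each locus and of the 1-1 haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    # D is the excess of the observed haplotype frequency over
    # the frequency expected if the loci were independent.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Two loci whose alleles always co-occur are in complete LD.
complete = [(1, 1)] * 5 + [(0, 0)] * 5
# All four haplotypes at equal frequency: no LD.
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]
```

Plotting r2 for all marker pairs against their physical distance gives the LD decay pattern referred to above; values near zero within a few hundred base pairs correspond to the rapid decay reported for Populus candidate genes.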
Comparison and identification of associations in P. tomentosa
To date, the candidate-gene-based association approach has been widely used to identify candidate gene alleles associated with growth and wood properties in several tree species (Thumma et al., 2005, 2009; Gonzalez-Martinez et al., 2007; Dillon et al., 2010, 2012; Wegrzyn et al., 2010; Beaulieu et al., 2011; Sexton et al., 2011). Generally, in a high-LD species, the power of a single-marker association test is often limited because LD information contained in flanking markers is ignored. Intuitively, haplotypes (a collection of ordered markers) may be more powerful than individual, nonordered markers (Akey et al., 2001). However, the comparison of single-marker and haplotype-based associations in this low-LD tree species demonstrated that the effect of the haplotype is mainly derived from the significant individual marker, and haplotype analysis may not be more powerful than single-marker analysis, although several haplotype-based associations (P < 0.05) were identified in the absence of significant single markers. Therefore, both single-marker and haplotype-based analyses should be evaluated when judging significant associations.
Besides the direct coding of CesA subunit proteins, genetic evidence suggests that CesA genes may participate in the pathway(s) of lignin and C6 sugar formation (Song et al., 2010; Wegrzyn et al., 2010); also, other candidate genes have been implicated in the pathway of cellulose biosynthesis, although whether they are directly involved has not been verified (Szyjanowicz et al., 2004; Coleman et al., 2009; Wegrzyn et al., 2010). Cell wall synthesis is coordinated with several other biological processes, and the genes in these shared pathways often are functional homologs (Persson et al., 2005; Beaulieu et al., 2011).
Lignin is a complex phenolic heteropolymer, and plays a key role in plant structure by providing strength, rigidity, and hydrophobicity to xylem cell walls (Demura et al., 2002). Marker C3-SSR2 and a haplotype-based association representing PtoCesA3 were highly significantly (Q < 0.05) associated with lignin content, but this association was not validated in the linkage population (Table 2). Microarray profiling across developing xylem in Populus (Persson et al., 2007; Rajangam et al., 2008) showed that the PtiCesA8 gene (97% identity at the protein level with PtoCesA3) was strongly expressed during secondary cell wall deposition. Genes encoding lignin monomer-polymerizing laccases and lignin monomer synthesis enzymes are among the genes most closely co-expressed with AtCesA8 (Persson et al., 2005). Significant individual SNP associations in CesA1A (96–97% identity at the protein level with PtoCesA3) with lignin have been identified in black cottonwood (Populus trichocarpa) (Wegrzyn et al., 2010). Therefore, it is essential to expand our understanding of the action of PtoCesA3.
In PtoCesA10, the marker C10-SSR2 located in the coding region (PtoCesA10 exon 3) was uniquely associated with lignin content. The (TGA)4 allele is the minor allele in marker C10-SSR2, and it produces an insertion mutation, adding an Asp to the amino acid sequence (Fig. 3a). The results of association and validation (Table 2, Fig. 3a) strongly suggest that C10-SSR2 may be a functional polymorphism that is in or near a locus involved in the control of lignin content. Further analysis of the protein structure encoded by PtoCesA10 showed that the inserted amino acid (AA) lies 15 AAs from the zinc-binding domain, which has been shown to be involved in CESA protein–protein interactions (Joshi et al., 2004), suggesting that the AA insertion may be associated with the zinc-binding domain for regulation of gene expression related to lignin composition. This conjecture was also supported by the significant expression differences among three groups of trees (Fig. 5a). Lignin deposition is largely associated with secondary wall formation, and genes linked to the lignin-related pathway for suberin synthesis are highly co-expressed with secondary cell wall AtCesA7 (91% identity at the protein level with PtoCesA10; Persson et al., 2005). This result is also in agreement with those of previous studies showing that PtiCesA7-A (96% identity at the protein level with PtoCesA10) is specifically expressed in the secondary cell wall (Rajangam et al., 2008; Song et al., 2010). Similarly, Wegrzyn et al. (2010) have identified significant SNP and haplotype-based associations in CesA1B (a homolog of PtoCesA10) with lignin composition in black cottonwood.
Microfibril orientation may be related to the rate of cellulose synthesis (Paredez et al., 2006; Rajangam et al., 2008; Beaulieu et al., 2011). C4-SSR4, a noncoding marker within PtoCesA4, was the only single-marker association identified with the MFA, and illustrated a pattern of gene action consistent with additive effects (Table 2). Both single-marker and haplotype-based association results demonstrated that allelic polymorphism at the C4-SSR4 could be linked to some co-expressed processes involved in microfibril orientation. Furthermore, this marker was also detected in the validation population, with significant differences in MFA among three genotypes (data not shown). AtCesA6 (94% identity at the protein level with PtoCesA4) has been shown to affect both microtubule and microfibril orientation (Paredez et al., 2006). This research may provide a possible path for exploration of the genetic basis of microfibril orientation, but the role of the variant in controlling the trait needs further testing.
Holocellulose is the total polysaccharide fraction of the secondary xylem cell walls and is composed of cellulose and hemicelluloses; it makes up c. 80% of the secondary xylem tissue (Li et al., 2006). In the present study, the differences in holocellulose content for marker C12-SSR1 were significant among three of four genotypes (73.15% for (TTA)6/(TTA)4, 73.20% for (TTA)6/(TTA)6, 74.05% for (TTA)9/(TTA)6, and 74.38% for (TTA)9/(TTA)9), illustrating that patterns of gene action are consistent with additive gene effects (Fig. 4). Significant differences in holocellulose content for two haplotypes in PtoCesA12 were shown (73.17% for (TTA)6-(ATT)4-(AT)5 and 74.22% for (TTA)9-(ATT)3-(AT)5) (Table 3, Fig. 4), which was supported by two adjacent markers (C12-SSR1 and C12-SSR2) in high LD (r2 > 0.8; P < 0.001; Fig. 2). RT-PCR testing also indicated that the levels of relative expression were significantly different between these two haplotypes (Fig. 5b). These observations reveal the potential importance of this gene in the variability of holocellulose content. CesA6-related genes, which are homologous to PtoCesA12, have been identified and are expressed during cellulose biosynthesis or deposition in Arabidopsis and P. trichocarpa (Desprez et al., 2007; Persson et al., 2007). However, this marker was not successfully validated in the linkage population. Numerous reasons have been proposed to explain why some true associations may not be replicated across independent data sets, including sample size, variability in phenotype definitions, genetic heterogeneity, environmental interactions, age-dependent effects, and gene–gene interactions (Neale & Savolainen, 2004; Greene et al., 2009; Beaulieu et al., 2011; Dillon et al., 2012). Generally, wood traits are expected to be influenced by many genes, with small effects; gene–gene interactions are also likely to be of critical importance. 
Additionally, populations with different genetic and environmental backgrounds may have unfavorable pedigree linkage disequilibrium and phenotypic variation (Neale & Savolainen, 2004). For example, the female parent of this validation population is the related species P. alba × P. glandulosa, and diverse gene-by-environment interactions within and between sites were not accounted for. Alternatively, the association result may be a false positive. These reasons might explain the ‘lack of validation’ for this important association in the linkage mapping population.
Figure 4. Haplotype and single-marker associations with holocellulose content are illustrated for the cellulose synthase gene 12 (PtoCesA12) in Populus tomentosa. The genotypic effects of the two significant haplotypes (Q < 0.05) of PtoCesA12 are shown. The haplotypes yield significantly different mean phenotypic values for holocellulose content. The marker effects of three markers (one significant) are also shown. C12-SSR1 located in the PtoCesA12 promoter region was significantly associated with holocellulose content. The other marker C12-SSR2 was associated with holocellulose content at P < 0.05 and 0.05 < Q < 0.1 while C12-SSR3 was not significantly associated with this trait. All three markers were in linkage disequilibrium (LD) with one another.
Figure 5. Relative transcript levels for candidate genes in different groups representing different significant haplotypes (the error bars represent + SD). (a) The relative levels of cellulose synthase gene 10 (PtoCesA10) transcripts in three groups involving a total of 30 Populus tomentosa individuals. (b) The relative mRNA levels of PtoCesA12 in two groups representing two significant haplotypes.
Variations in the quantity and quality of cellulose in plants are suspected to be primarily a result of enzymatic activities of different types of cellulose synthases (CesAs; Atanassov et al., 2009; Kumar et al., 2009). The CesA7 genes PtiCesA7-A and AtCesA7, homologs of PtoCesA2, are expressed in developing xylem tissue undergoing secondary wall thickening in Populus or in the xylem of Arabidopsis (Suzuki et al., 2006; Atanassov et al., 2009). In PtoCesA2, the result that marker C2-SSR1 was associated with α-cellulose and holocellulose content traits in both discovery and validation populations (Table 2) suggests that C2-SSR1 may be a functional polymorphism in or near a locus involved in cellulose synthesis during secondary cell wall formation in P. tomentosa. Significant individual SNP markers or haplotype associations have been reported in the homologous gene CesA2A in P. trichocarpa (Wegrzyn et al., 2010).
Factors affecting the power of association mapping
Gene-derived SSR markers may have functional significance in regulating gene expression and function (Li et al., 2004; Varshney et al., 2005). In this study, a vast amount of genetic variation in P. tomentosa natural populations (Zhang et al., 2007) was coupled with a low level of homoplasy of SSR markers derived from conserved gene regions (Li et al., 2004), providing an appropriate tool for candidate-gene-based association studies, although previous studies have reported that size homoplasy of SSR alleles and allele reversion could be a problem in association studies (Ching et al., 2002). The selection of polymorphic genic SSR markers with a low level of size homoplasy, along with SNPs, a traditional marker type used for association analysis, would provide better potential to detect functional allelic variation underlying quantitative traits. In addition, knowledge and selection of optimal candidate genes using different approaches, such as microarray analysis, EST database searches and quantitative trait locus (QTL) mapping, in model or related plant species (Neale & Savolainen, 2004; Thumma et al., 2005) provide an important basis for identifying useful alleles located within functional genes controlling traits of interest. Deviations from Hardy–Weinberg equilibrium (HWE) for SSR loci can be indicative of genotyping errors, inbreeding, population subdivision, or selection (Balding, 2006). In this study, genotyping errors and inbreeding can be excluded because genotyping errors were corrected and FIS values were low for each locus. Population subdivision is thought to be the most important explanation for deviations from HWE, which is in agreement with the various geographic origins of wild P. tomentosa (Du et al., 2012). Population structure can generate spurious genotype–phenotype associations. Thus, use of the unified mixed-model method (MLM) would improve control of both type I and type II error rates (Yu et al., 2006).
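The reasoning about HWE deviation and FIS can be made concrete with a small sketch. The code below estimates FIS at one locus as 1 - Ho/He, where Ho is the observed heterozygosity and He is the heterozygosity expected under HWE from the sample allele frequencies. The function name fis and the example genotypes are hypothetical, and no small-sample correction is applied to He, so this illustrates the statistic and is not the procedure used in this study.

```python
from collections import Counter


def fis(genotypes):
    """Estimate FIS = 1 - Ho/He for one locus.

    genotypes: list of (allele1, allele2) tuples for diploid individuals;
    alleles may be any hashable label, e.g. SSR repeat counts.
    """
    n = len(genotypes)
    # Allele frequencies from the 2n sampled gene copies.
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    # Expected heterozygosity under HWE: 1 - sum of squared frequencies.
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    # Observed heterozygosity: fraction of heterozygous individuals.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    if h_exp == 0:
        raise ValueError("locus is monomorphic; FIS is undefined")
    return 1.0 - h_obs / h_exp


# Genotypes in exact HWE proportions (p = q = 0.5): FIS is 0.
hwe_sample = [(6, 6), (6, 9), (9, 6), (9, 9)]
# Only homozygotes, as under complete inbreeding or strong subdivision: FIS is 1.
homozygous_sample = [(6, 6), (9, 9)]
```

An FIS near zero at each locus argues against inbreeding as the cause of an HWE deviation, which is the logic applied above; a heterozygote deficit that persists across loci points instead towards population subdivision.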
Phenotyping is an important part of association mapping for forest trees. A typical association population is usually composed of a diverse set of unrelated individuals at the same location, and to increase precision in phenotypic measurements, one must usually clonally replicate individuals to reduce environmentally induced noise and measurement errors. Hence, in this study, we used a total of 1380 phenotypes (460 genotypes × 3 ramets) to compensate for the deficiency of having a limited number of SSR markers. Furthermore, when the entire collection is replicated across multiple environments, data on replicates of each individual can be combined to produce a phenotype mean value for the accession analysis, which is less influenced by environment or measurement errors (Long & Langley, 1999). Therefore, replication of genotype–phenotype associations is also crucial in association mapping to distinguish false-positive associations and to provide less biased estimates of the size of allelic effects. Additionally, validation of biological function through transgenic experiments and other molecular biology techniques can be used to verify associations (Thumma et al., 2005; Abdurakhmonov & Abdukarimov, 2008). For example, real-time PCR and linkage analysis were employed in this study to confirm the results obtained from association mapping.
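The role of clonal replication can be sketched as follows: measurements from the ramets of each genotype are averaged so that a single, less noisy phenotype value per genotype enters the association analysis. The record layout, identifiers and values below are hypothetical and serve only to illustrate the averaging step.

```python
from collections import defaultdict
from statistics import mean


def clonal_means(records):
    """Average ramet-level measurements within each genotype.

    records: iterable of (genotype_id, trait_value) pairs, one per ramet.
    Returns a dict mapping genotype_id to the mean trait value.
    """
    by_genotype = defaultdict(list)
    for genotype_id, value in records:
        by_genotype[genotype_id].append(value)
    return {g: mean(values) for g, values in by_genotype.items()}


# Hypothetical trait measurements (%) for three ramets of two genotypes.
example_records = [
    ("G001", 73.0), ("G001", 74.0), ("G001", 75.0),
    ("G002", 70.0), ("G002", 71.0), ("G002", 72.0),
]
example_means = clonal_means(example_records)
```

With 460 genotypes and 3 ramets each, the same step reduces 1380 measurements to 460 genotype means.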
Polymorphisms of SSR markers and significant haplotypes representing PtoCesA candidate genes were used to evaluate the functional loci or genes associated with lignocellulosic cell wall development. We demonstrated that the candidate gene-based association approach, along with validation in a large linkage-mapping population and confirmation using real-time PCR testing, can be employed to identify naturally occurring allelic variation in genes associated with important wood-quality traits. This study provides insights into the genetic mechanisms underlying wood development, and identifies particular markers for tree MAS breeding programs with the goals of improving the quality and quantity of wood products.