Chinese white poplar (Populus tomentosa), an important commercial tree species for timber and pulp production in northern China, has been used to examine the individual genes and allelic diversity responsible for complex traits controlling growth and lignocellulosic biosynthesis. Taking advantage of the low degree of linkage disequilibrium (LD) within P. tomentosa association populations, we examined associations between 15 cellulose synthase (PtoCesA) genes and traits including growth and wood properties.
Thirty-six novel simple sequence repeat (SSR) markers within PtoCesA genes were detected by re-sequencing and genotyped in an association population (460 individuals). Single-marker and haplotype-based LD approaches were used to identify significant marker–trait associations. Family-based linkage studies and real-time PCR testing were conducted to validate the functional significance of SSR variation.
Fifteen single-marker associations from seven PtoCesA genes and nine haplotype-based associations within six genes were identified in the association population (false discovery rate Q <0.05). Next, five SSR marker–trait associations (Q <0.05) from four PtoCesA genes were successfully validated in a linkage mapping population (1200 individuals).
The results imply a functional role for these genes in mediating wood properties, demonstrating the potential of combining single-marker and haplotype-based LD approaches to detect functional allelic variation underlying quantitative traits in a low-LD population.
Wood formation distinguishes trees from herbaceous plants, and represents a major metabolic sink for woody plants, as trees convert much of their photosynthesized products into woody tissues, which make up c. 20% of the total terrestrial carbon storage (Schlesinger & Lichter, 2001; Li et al., 2006). Woody tissues are composed of various biopolymers; cellulose and lignin supply mechanical strength to secondary walls, and hemicelluloses form cross-links among cellulose microfibrils. These polymers provide an enormous, renewable feedstock for pulp and paper, biofuel, and solid wood products (Mellerowicz & Sundberg, 2008).
Cellulose is the most abundant biopolymer in trees; its biosynthesis is catalyzed by cellulose synthase (CesA) and involves the synthesis and assembly of β-1,4-glucan chains by cellulose synthase complexes (CSCs) and the orderly deposition of the chains to form microfibrils in cell walls (Somerville, 2006). Previous studies have indicated that the proteins for cellulose biosynthesis are encoded by CesA genes, a multigene family with many members (Li et al., 2006; Suzuki et al., 2006). Plant CesA genes were first identified in cotton (Gossypium hirsutum) fibers (Pear et al., 1996). There are at least 10 CesA genes in Arabidopsis, 12 in rice (Oryza sativa) and at least nine in maize (Zea mays); distinct CesA genes dominate cellulose synthesis in different types of cell wall (Pear et al., 1996; Richmond & Somerville, 2000; Tanaka et al., 2003; Burton et al., 2004; Persson et al., 2007).
Recently, the Populus trichocarpa genome has been sequenced and annotated (http://genome.jgi-psf.org/Poptr1/Poptr1.home.html), and databases of expressed sequence tags (ESTs) from different developmental stages of wood formation are being rapidly generated (Sterky et al., 2004; Tuskan et al., 2006). Poplar (Populus) has emerged as a model species for the identification of CesA homologs and exploration of the mechanisms of cellulose biosynthesis (Djerbi et al., 2004; Zhang et al., 2010b). The first CesA gene from trees was isolated from aspen (Populus tremuloides) by Wu et al. (2000). Since then, 17 members of the CesA gene family have been cloned from aspen and hybrid aspen (Populus tremula × P. tremuloides) (Djerbi et al., 2004; Joshi et al., 2004; Supporting Information Table S1). Eighteen CesA genes (17 protein sequences) have been identified in P. trichocarpa (Suzuki et al., 2006; Table S1). In addition, several CesA genes are specifically expressed during primary or secondary wall synthesis in several plant species, such as Arabidopsis, black cottonwood (Populus trichocarpa) and loblolly pine (Pinus taeda L.) (Djerbi et al., 2005; Nairn & Haselkorn, 2005; Suzuki et al., 2006; Atanassov et al., 2009; Song et al., 2010). To further explore the functions of the CesA genes in wood formation, 17 CesA (PtoCesA) genes have been isolated from Populus tomentosa (Chinese white poplar; Table S1). Populus tomentosa, which belongs to the section Populus in the genus Populus, is an important commercial tree species for timber and pulp production in northern China. A vast amount of genetic variation has arisen during the evolution of P. tomentosa, as is evident in the natural populations (Zhang et al., 2007); this variation provides a potential source of beneficial alleles for marker-assisted breeding for improvement of wood fiber traits. Ongoing research has also examined the structure of the natural populations.
Huang (1992) was the first to provide climatic regionalization in the distribution zones of P. tomentosa and show that three climatic zones can be treated as genetic regions. A P. tomentosa population with 460 individuals has been divided into 11 subpopulations using 20 genomic microsatellites in the model-based program structure (Du et al., 2012), and this population structure information was used in our subsequent association analysis in this study.
Linkage disequilibrium (LD)-based association mapping provides a valuable opportunity to identify the natural allelic variation responsible for a particular phenotype (Thumma et al., 2005). Tree species are ideal for the fine mapping of candidate genes and functional analysis of gene variants (Ingvarsson, 2005; Wegrzyn et al., 2010). Recently, advances in high-throughput marker technologies and new genomic resources have enabled a closer examination of the number and effect of candidate genes related to traits of interest, through complex trait dissection using LD mapping (Nordborg et al., 2002; Ingvarsson et al., 2008; Eckert et al., 2009). Wood-quality traits are quantitative traits controlled by multiple genes, with a moderate to high degree of heritability (Thumma et al., 2010). Significant associations between single nucleotide polymorphisms (SNPs) within candidate genes affecting wood formation have been established for forest trees (Thumma et al., 2005, 2009; Gonzalez-Martinez et al., 2007; Wegrzyn et al., 2010). In marker-assisted selection (MAS) breeding, simple sequence repeat (SSR) markers are ideal because they are hypervariable, codominant, and highly informative (Varshney et al., 2005). Unlike random genomic SSR markers, gene-derived SSR markers include microsatellites exclusively within candidate genes, including promoters, 5′ untranslated regions (UTRs), 3′ UTRs, introns, splice sites and exons (Varshney et al., 2005). Indeed, the presence of SSRs in the coding and/or regulatory regions can alter function, transcription or translation (Li et al., 2004; Varshney et al., 2005).
In this study, 36 polymorphic SSR markers developed from the PtoCesA gene family were used for single-marker and haplotype-based association mapping to explore allelic effects on natural variation in growth and wood-property traits in P. tomentosa. Furthermore, we also present experimental evidence to confirm the power of LD mapping to identify useful alleles located within functional genes controlling phenotypic traits.
Materials and Methods
In 1982, the Institute of Chinese White Poplars (Beijing Forestry University, Beijing, China) assembled a collection of 1047 individuals from the entire natural distribution region of Chinese white poplars (Populus tomentosa Carrière), covering an area of 1 million km² (30–40°N, 105–125°E; Zhang et al., 2010b). Root segments from this collection were used to establish a clonal arboretum using a randomized complete block design with three replications in Guan Xian County, Shandong Province, China (36°23′N, 115°47′E). In this study, a set of 460 unrelated individuals of P. tomentosa from the collection, representing all of the original provenances, was randomly sampled for association analysis.
In this study, 1200 hybrid individuals were randomly selected from 5000 F1 progeny established by controlled crossing between two elite poplar parents, clone ‘YX01’ (Populus alba × Populus glandulosa) as the female and clone ‘LM 50’ (P. tomentosa) as the male; both parents belong to the section Populus. The progeny were grown in 2008 in the Xiao Tangshan horticultural fields of Beijing Forestry University, Beijing, China (40°2′N, 115°50′E) using a randomized complete block design with three replications and were subsequently used for linkage analysis of phenotypic traits.
The 460 individuals of the association population were scored on the basis of seven quantitative traits, with at least three ramets per genotype. The growth traits, including tree height (H), diameter at breast height (D), and stem volume (V), were measured in 2009 using the methods described by Zhang et al. (2006). Wood-property traits included microfiber angle (MFA) and holocellulose, α-cellulose, and lignin contents. First, wood cores were collected from each tree at a height of 1.35 m above ground level, and the variation in MFA in these cores was characterized using an X-ray powder diffractometer (Philips, Eindhoven, the Netherlands). These wood cores were then ground into wood meal, in which the holocellulose, α-cellulose, and lignin contents were measured using near-infrared reflectance spectroscopy (NIRS), as described by Schimleck et al. (2004).
The same seven phenotypic traits were measured in all three replicates of the 1200 clones in the hybrid population in 2010 using the same methods described in the preceding paragraph for the association population. The software sas for Windows, ver. 8.2 (SAS Institute, Cary, NC, USA) was used for analysis of variance (ANOVA) and phenotypic correlations for these seven traits.
DNA extraction, SSR discovery, and genotyping
Total genomic DNA was isolated from young leaves using the DNeasy Plant Mini Kit (Qiagen China, Shanghai, China) following the manufacturer's protocol. The genomic DNA sequences of 17 PtoCesA genes were obtained from the Joint Genome Institute (JGI) Database (http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html; Suzuki et al., 2006; Kumar et al., 2009). In total, 105 903 bp of genomic DNA sequences from these 17 unique PtoCesA genes, with an average of 4645 bp per gene, was obtained by re-sequencing 40 P. tomentosa individuals, and the gene length ranged from 4462 bp (PtoCesA2) to 7633 bp (PtoCesA17; Table S1). We detected 36 polymorphic SSR loci within the PtoCesA gene family using ssrit software (http://www.gramene.org/db/markers/ssrtool; Temnykh et al., 2001), with the criterion that the minor allele frequency was ≥ 5% (Fig. 1). Detailed information on these 17 candidate genes and their homologous reference genes in Populus species, and the 36 screened SSR markers, is presented in Tables S1 and S2.
The SSR amplification reaction and PCR were conducted following the procedure of Zhang et al. (2010a). The PCR products were finally separated by capillary electrophoresis using an ABI3730xl DNA Analyzer (Applied Biosystems, Carlsbad, CA, USA), after confirmation of PCR amplification on a 1.5% agarose gel. The analysis of polymorphic loci was performed with GeneMapper v4.0 software (Applied Biosystems) using the LIZ 600 size standard (Applied Biosystems). Subsequently, micro-checker 2.2.3 (http://www.microchecker.hull.ac.uk/) was used for identifying and correcting genotyping errors (van Oosterhout et al., 2004).
Real-time PCR testing
Real-time PCR (RT-PCR) was performed using cDNA samples, which were reverse-transcribed from the total RNA in mature xylem tissue of P. tomentosa individuals (10 individuals per group, and each tree was homozygous for the particular haplotype). The quantitative PCR program and the generated real-time data analysis were performed as described by Zhang et al. (2010a). The specific primer pairs were individually designed for the PtoCesA genes (depending on the single-marker and haplotype-based associations) and an internal control (Actin) using Primer Express 3.0 software (Applied Biosystems). Primer details are shown in Table S3.
Genetic diversity, Hardy–Weinberg equilibrium (HWE) and LD tests
The summary statistics for population diversity, including the observed number of alleles per locus (NA), polymorphism information content (PIC), expected heterozygosity (HE), and Wright's inbreeding coefficient (FIS), were calculated using popgen version 1.32 (Yeh et al., 1999). HWE tests were performed using the software Arlequin version 3.11 (http://cmpg.unibe.ch/software/arlequin3/), and the Bonferroni correction was then applied for multiple testing. Patterns of LD were investigated among SSR loci from 15 PtoCesA candidate genes (Table S2). The squared correlation of allele frequencies, r² (Hill & Robertson, 1968), was used to test the LD between pairs of SSR markers, with 10⁵ permutations using the software package tassel version 2.0.1 (http://www.maizegenetics.net/).
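As a minimal sketch of the r² statistic used here, the following Python function computes r² for a pair of loci from two-locus haplotype counts. It assumes biallelic loci and phased haplotypes, which is a simplification: SSRs are multi-allelic and the genotypes in this study are unphased, so this is not the procedure implemented in tassel. The function name and the "AB"/"Ab"/"aB"/"ab" labels are illustrative only.

```python
def ld_r_squared(hap_counts):
    """Return r^2 (Hill & Robertson, 1968) for two biallelic loci.

    hap_counts maps the two-locus haplotypes "AB", "Ab", "aB", "ab"
    to their observed counts (hypothetical labels, phased data assumed).
    """
    total = sum(hap_counts.values())
    n_ab_upper = hap_counts.get("AB", 0)
    p_hap = n_ab_upper / total                            # frequency of haplotype AB
    p_a = (n_ab_upper + hap_counts.get("Ab", 0)) / total  # frequency of allele A
    p_b = (n_ab_upper + hap_counts.get("aB", 0)) / total  # frequency of allele B
    d = p_hap - p_a * p_b                                 # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


if __name__ == "__main__":
    # Toy counts, not data from this study.
    print(ld_r_squared({"AB": 40, "Ab": 10, "aB": 10, "ab": 40}))
```

Values near 0 indicate that alleles at the two loci are nearly independent, whereas values near 1 indicate that one locus almost fully predicts the other.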
In the association population (discovery population), all association tests between the 36 SSR markers and seven traits were conducted using the unified mixed-model method (MLM) with 10⁴ permutations in the software package tassel version 2.0.1 (Yu et al., 2006; Bradbury et al., 2007). The effects of all the genotype classes in each SSR marker (Table S2) were tested by performing a χ² test at the 0.01 probability level, but the rare genotypes (minor genotypes with a frequency < 5% and genotypes involving the null allele) in each marker were removed from the genotype effect analysis. The MLM can be described as follows: y = μ + Qv + Zu + e, where y is a vector of phenotype observations, μ is a vector of intercepts, v is a vector of population effects, u is a vector of random polygene background effects, e is a vector of random experimental errors, Q is a matrix defining the population structure from structure, and Z is a matrix relating y to u. Var(u) = G = σ_A²K, with σ_A² as the unknown additive genetic variance and K as the kinship matrix (Yu et al., 2006). In this Q + K model, the relative kinship matrix (K) was obtained using the method proposed by Ritland (1996), which is built into the program SPAGeDi, version 1.2 (Hardy & Vekemans, 2002), and the population structure matrix (Q) was identified based on the significant subpopulations (K = 11; Du et al., 2012), as assessed according to the statistical model described by Evanno et al. (2005), using 20 neutral genomic SSR markers. Corrections for multiple testing of smoothed P-values for all associations were performed using the positive false discovery rate (FDR) method with 10⁴ permutations (Storey & Tibshirani, 2003).
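To illustrate how a false discovery rate correction turns a set of P-values into adjusted values, the sketch below implements the Benjamini-Hochberg adjustment. This is a simpler, typically more conservative relative of the Storey & Tibshirani (2003) positive FDR method cited above (it does not estimate the proportion of true null hypotheses), so it is an assumption-laden stand-in and not the procedure used in this study. The function name and example P-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy P-values, not results from this study.
    for p, q in zip([0.01, 0.04, 0.03, 0.20], bh_adjust([0.01, 0.04, 0.03, 0.20])):
        print(f"P = {p:.2f} -> adjusted = {q:.4f}")
```

An association would then be retained when its adjusted value falls below the chosen threshold (0.05 in this study).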
Inheritance tests of all significant SSR loci identified in the association population were examined in the F1 hybrid population (validation population), by performing a χ² test at the 0.01 probability level, and then SSR markers following Mendelian expectations (P ≥ 0.01) were used in single-marker analysis in this hybrid population (excluding the genotype data involving the null allele at each locus). Significant SSR loci were detected by fitting the data to the model y = μ + m_i + e_ij, where y is the trait value, μ is the mean, m_i is the genotype of the ith marker, and e_ij is the residual associated with the jth individual in the ith genotypic class. The per cent phenotypic variance explained by the most significant marker was calculated, and the FDR method was used to perform a correction for multiple testing (Storey & Tibshirani, 2003).
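The per cent phenotypic variance explained under a single-marker model of this form can be sketched as the one-way ANOVA R², that is, the between-genotype sum of squares divided by the total sum of squares. The code below is a minimal illustration under that assumption; the function name, genotype labels, and trait values are hypothetical and are not taken from this study.

```python
def variance_explained(trait_by_genotype):
    """Per cent of phenotypic variance explained by genotype class.

    trait_by_genotype maps each genotype class to a list of trait values.
    Returns 100 * SS_between / SS_total (one-way ANOVA R^2).
    """
    all_values = [v for values in trait_by_genotype.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    if ss_total == 0:
        raise ValueError("trait shows no variation")
    ss_between = 0.0
    for values in trait_by_genotype.values():
        class_mean = sum(values) / len(values)
        # Each class contributes its size times the squared mean deviation.
        ss_between += len(values) * (class_mean - grand_mean) ** 2
    return 100.0 * ss_between / ss_total


if __name__ == "__main__":
    # Toy data for two genotype classes.
    print(variance_explained({"AA": [1.0, 2.0, 3.0], "Aa": [4.0, 5.0, 6.0]}))
```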
The haplotype (a block of linked ordered markers) frequencies of locus genotypes were estimated and the tests of haplotype association with the trait values were carried out using the software famhap version 19 (http://famhap.meb.uni-bonn.de/index.html; Becker & Knapp, 2004; Herold & Becker, 2009). famhap estimates haplotype frequencies using maximum likelihood. Singleton alleles were ignored when constructing the haplotypes, and haplotypes with a frequency < 5% were also discarded. The input consisted of genotype matrices with structure analysis matrices (Q) and phenotypic value matrices, and significances of the haplotype associations were determined based on 10⁴ permutation tests. A correction for multiple testing was performed using the positive FDR method (Storey & Tibshirani, 2003).
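The idea behind a permutation-based significance test can be sketched generically: trait values are repeatedly shuffled across individuals, and the P-value is the fraction of shuffles giving a test statistic at least as extreme as the observed one. The example below uses the absolute difference in trait means between carriers and non-carriers of a haplotype as the statistic. This is an assumption made for illustration and not the statistic or algorithm implemented in famhap; all names and values are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(carriers, non_carriers, n_perm=10_000, seed=0):
    """Two-sided permutation P-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = abs(_mean(carriers) - _mean(non_carriers))
    pooled = list(carriers) + list(non_carriers)
    k = len(carriers)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any link between haplotype and trait
        if abs(_mean(pooled[:k]) - _mean(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy trait values, not data from this study.
    print(permutation_p_value([10, 11, 12, 13, 14], [1, 2, 3, 4, 5], n_perm=2000))
```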
Modes of gene action
The modes of gene action were quantified using the ratio of dominance (d) to additive (a) effects estimated from least-square means for each genotypic class. Partial or complete dominance was defined as values in the range 0.50 < |d/a| < 1.25, whereas additive effects were defined as values in the range |d/a| ≤ 0.5. Values of |d/a| > 1.25 were equated with under- or overdominance. Details of the algorithm and formulas for calculating gene action were previously described (Eckert et al., 2009; Wegrzyn et al., 2010).
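As a worked illustration of this classification, the sketch below computes d/a from three genotypic means, with 2a taken as the absolute difference between the two homozygous means and d as the heterozygote mean minus the homozygote midpoint, and then applies the thresholds given above. The example inputs are the lignin-content genotype means reported in the Results for markers C3-SSR2 and C10-SSR2. Function names are illustrative, and assigning a value of exactly 1.25 to the under- or overdominance class is an assumption, because the text leaves that boundary undefined.

```python
def dominance_ratio(g_hom1, g_het, g_hom2):
    """Return d/a from the trait means of the three genotypic classes."""
    two_a = abs(g_hom1 - g_hom2)           # 2a: difference between homozygotes
    if two_a == 0:
        raise ValueError("homozygous classes have identical means")
    d = g_het - 0.5 * (g_hom1 + g_hom2)    # heterozygote deviation from midpoint
    return 2.0 * d / two_a


def classify_gene_action(ratio):
    """Map |d/a| to the mode of gene action using the stated thresholds."""
    magnitude = abs(ratio)
    if magnitude <= 0.5:
        return "additive"
    if magnitude < 1.25:
        return "partial or complete dominance"
    return "under- or overdominance"       # includes exactly 1.25 (assumption)


if __name__ == "__main__":
    # Lignin content (%) by genotype, as reported in the Results.
    for marker, means in {"C3-SSR2": (24.70, 23.95, 23.06),
                          "C10-SSR2": (24.72, 24.58, 23.83)}.items():
        ratio = dominance_ratio(*means)
        print(marker, round(ratio, 3), classify_gene_action(ratio))
```

With these inputs the first marker falls in the additive range and the second in the partial-to-complete dominance range, matching the modes of gene action described for these markers in the Results.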
Phenotypic data distribution and correlations
In the association population, holocellulose and α-cellulose contents ranged from 64.13% to 87.40% (mean 73.58%) and from 40.63% to 47.74% (mean 44.53%), respectively. Descriptive statistics of the trait distributions are presented in Table S4. In the hybrid population, the two parent lines showed significant differences in all seven measured traits. The trait measurements of all 1200 F1 progeny were intermediate between those of the two parents for six traits, excluding holocellulose content; the F1 progeny had higher holocellulose contents than either parent. Table S5 shows the descriptive trait-distribution statistics of the F1 population. As expected, the frequency distributions for each trait measured in these two populations followed an approximately normal distribution (data not shown).
The wood and growth traits in the association population showed significant correlations (Table 1). Of these, lignin content was significantly negatively correlated with holocellulose content (P < 0.01), α-cellulose (P <0.01), and D (P <0.05); similar results were observed in the hybrid population (Table 1). In addition, lignin content was strongly and positively correlated with MFA (P <0.01), and significant positive pairwise correlations were observed between holocellulose and α-cellulose contents and D in the association population (P <0.01). The details of the phenotypic correlations among these traits in these two populations are shown in Table 1.
Table 1. Estimates of phenotypic correlations (R) for these seven traits in the Populus tomentosa association (above diagonal) and linkage mapping (below diagonal) populations
H, tree height; D, the diameter at breast height; V, stem volume; MFA, microfiber angle.
In total, 36 novel polymorphic SSR markers (minor allele frequency ≥ 5%) were developed from 15 candidate genes of the PtoCesA gene family, with an average density of one SSR every 2.62 kb. No polymorphic microsatellite loci were found in PtoCesA1 and PtoCesA5 (Table S2). Of these SSRs, c. 62% were derived from the intron regions and 22% from the promoter regions, and the numbers of microsatellite markers detected in the 5′ UTR, exon, and 3′ UTR regions were 3, 2, and 1, respectively (Table S2). A survey of the repeat motif types indicated that dinucleotide repeats were the most abundant (42%), followed by trinucleotide repeats (31%) (Table S2). One hundred and forty-four alleles among the 460 samples were identified, and the NA ranged from 2 to 8 with an average of 4.0 (Table S2). The mean PIC and HE of loci were 0.436 and 0.505, respectively (Table S2). The HWE test in the 36 microsatellites indicated that seven loci departed from HWE (P < 0.01); however, no individual locus deviated significantly from HWE after applying the Bonferroni correction for multiple testing (Table S2). In agreement with tests for HWE, all FIS values were small (the mean FIS = −0.065) and did not suggest the occurrence of inbreeding in our samples.
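For readers unfamiliar with the two diversity statistics reported above, the following sketch computes expected heterozygosity and PIC for a single locus from its allele frequencies, using the standard textbook formulas HE = 1 − Σp_i² and PIC = 1 − Σp_i² − ΣΣ 2p_i²p_j² (summed over allele pairs i < j). No sample-size correction is applied, so the values may differ slightly from those produced by the software used in this study. Function names and the example frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """HE = 1 - sum of squared allele frequencies (no sample-size correction)."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = HE - sum over allele pairs i < j of 2 * p_i^2 * p_j^2."""
    cross = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return expected_heterozygosity(freqs) - cross


if __name__ == "__main__":
    # Toy allele frequencies for one three-allele locus.
    freqs = [0.6, 0.3, 0.1]
    print(expected_heterozygosity(freqs), polymorphism_information_content(freqs))
```

Because the cross term is never negative, PIC cannot exceed HE for a locus, which agrees with the mean PIC being lower than the mean HE here.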
All r² values were pooled to assess the overall behavior of LD within the PtoCesA genes. Figure 2 shows that a large number of SSRs were in linkage equilibrium (r² < 0.3; P < 0.001) across the sequenced regions. The limited LD among SSR loci within a candidate gene did not extend over the entire gene region. However, the average decay distance associated with LD within the PtoCesA genes was not calculated because of the limited number of SSR markers in this study. Several loci within the same candidate gene were in significant LD, such as markers C12-SSR1, C12-SSR2, and C12-SSR3 in PtoCesA12 (r² > 0.6; P < 0.001; Fig. 2).
Summary of single-marker and haplotype-based associations
Single SSR marker–trait associations
All 252 (36 SSRs × 7 traits) single-marker association tests were conducted with 10⁴ permutations using the MLM. In all, 24 associations were significant at the threshold of P < 0.05, representing 16 SSR loci from nine PtoCesA genes (Table 2). Multiple test corrections for all 24 associations reduced this number to 15 at a significance threshold of Q < 0.05. These loci explained a small proportion of the phenotypic variance, ranging from 2.9% to 8.7% (Table 2). Of these, four SSR markers were associated with holocellulose. Lignin and α-cellulose had three significant associations each; two MFA associations and one association each with H, D, and V traits were observed in the association population (Q < 0.05; Table 2). The 15 associations represent 10 SSR loci from seven PtoCesA genes. Several of the 10 SSR markers exhibited significant associations with more than one trait, consistent with the extent of codominance, and also suggesting a pleiotropic effect of these loci responsible for certain traits (Tables 1, 2). For two of the 15 associations, the mode of gene action is consistent with overdominance (|d/a| > 1.25); the remaining 13 associations were split between modes of gene action that were partially to fully dominant (0.50 < |d/a| < 1.25, n = 7) or codominant (|d/a| ≤ 0.5, n = 6; Table 2). The majority of gene effects explained a small to moderate fraction of the phenotypic variation.
Table 2. Summary of significant simple sequence repeat (SSR) marker-trait pairs from the association test results in the Populus tomentosa discovery (association population) and validation (linkage mapping population) populations after correction for multiple testing errors
H, tree height; D, the diameter at breast height; V, stem volume; MFA, microfiber angle; N, number of trees sampled; P-value, significance level for association (significance is P ≤0.05); R2, percentage of the phenotypic variance explained; Q-value, a correction for multiple testing (false discovery rate FDR (Q) ≤ 0.05).
d/a = 2 × d/2a (the ratio of dominance (d) to additive (a) effects); 2a, calculated as the difference between the phenotypic means observed within each homozygous class (2a = |G_BB − G_bb|, where G_ij is the trait mean in the ijth genotypic class); d, calculated as the difference between the phenotypic mean observed within the heterozygous class and the average phenotypic mean across both homozygous classes (d = G_Bb − 0.5(G_BB + G_bb), where G_ij is the trait mean in the ijth genotypic class).
sp, standard deviation for the phenotypic trait under consideration. Details of the algorithm and formulas for calculating gene action have been described by Eckert et al. (2009) and Wegrzyn et al. (2010).
Represent the different allele/genotype effects at the same locus for the same trait between the discovery and validation populations.
For the haplotype-based associations, 44 regions (amplicons) from 10 PtoCesA genes were analyzed, and the number of haplotypes per region varied from 2 to 13 with an average of 7.0. Twelve significant regions from six unique genes were identified using the software famhap version 19, with a significance threshold of P < 0.05 (details not shown). Multiple test corrections reduced this number to nine regions, which derived from six genes and included 69 haplotypes, at a significance threshold of Q < 0.05 (Table 3). Eighteen haplotypes were significantly associated with five phenotypic traits (all traits except D and V; Q < 0.05), and eight single-marker associations (Q < 0.05) strongly supported the haplotype-based associations for the same traits (Tables 2, 3).
Table 3. List of haplotypes with significant associations with wood quality and growth traits in the Populus tomentosa association population (n =460) after a correction for multiple testing (false discovery rate FDR (Q) ≤ 0.05)
H, tree height; D, the diameter at breast height; V, stem volume; MFA, microfiber angle; P-value, the significant level for haplotype-based association (significance is P ≤0.05).
Single-marker associations with the lowest Q value (FDR Q ≤ 0.05) relating to the significant haplotype–trait association.
/, no data was identified in this study.
C3-SSR2 (lignin, Q =0.0350)
C10-SSR2 (lignin, Q =0.0172)
C2-SSR1 (holocellulose, Q =0.0311)
C3-SSR2 (holocellulose, Q =0.0413)
C12-SSR1 (holocellulose, Q =0.0241)
C4-SSR2 (α-cellulose, Q =0.0487)
C10-SSR4 (α-cellulose, Q =0.0244)
C4-SSR4 (MFA, Q =0.0172)
Confirmation of association studies in a linkage mapping population
Thirty-three of 36 genic SSR markers followed Mendelian expectations (P ≥0.01), with a segregation ratio close to 1 : 2 : 1 for eight SSR loci, 1 : 1 for 10 loci, and 1 : 1 : 1 : 1 for 15 markers. The 16 significant SSR markers (P <0.05; Table 2) identified in the association population were all in accordance with Mendelian expectations (P ≥0.01), and no novel allele was discovered in the hybrid population. Therefore, single-marker association analysis (112; 16 SSRs × 7 traits) was conducted in this linkage mapping population, and we first observed seven marker–trait associations (P <0.05; Table 2). A multiple test correction reduced this number to five (Q <0.05; Table 2), and the proportion of phenotypic variation explained varied from 4.0% to 7.3% (Table 2). No SSR associations with growth traits were identified in the validation population (Table 2), which is consistent with the hypothesis that growth traits have relatively low heritability compared with wood-property traits (Thumma et al., 2010). The significant markers identified for three growth traits in the association population (Table 2) are probably derived from a causal relationship between wood-property and growth traits, and may represent false positive associations (Dillon et al., 2012).
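The Mendelian segregation check described above amounts to a χ² goodness-of-fit test of observed genotype counts against an expected ratio such as 1 : 2 : 1, 1 : 1, or 1 : 1 : 1 : 1. The sketch below is a minimal version that compares the statistic with the standard χ² critical values at the 0.01 level for 1 to 3 degrees of freedom. The function name and the genotype counts are hypothetical and are not data from this study.

```python
# Upper 1% critical values of the chi-square distribution by degrees of freedom.
CRITICAL_0_01 = {1: 6.635, 2: 9.210, 3: 11.345}


def follows_mendelian_ratio(observed, ratio):
    """Chi-square goodness-of-fit test against an expected segregation ratio.

    Returns (chi2, ok), where ok is True when the locus is consistent with
    the expected ratio at the 0.01 level (that is, P >= 0.01).
    """
    if len(observed) != len(ratio):
        raise ValueError("observed counts and ratio must have the same length")
    total = sum(observed)
    ratio_sum = sum(ratio)
    chi2 = 0.0
    for obs, part in zip(observed, ratio):
        expected = total * part / ratio_sum
        chi2 += (obs - expected) ** 2 / expected
    df = len(ratio) - 1
    return chi2, chi2 <= CRITICAL_0_01[df]


if __name__ == "__main__":
    # Toy genotype counts for 1200 progeny tested against 1 : 2 : 1.
    print(follows_mendelian_ratio([290, 610, 300], (1, 2, 1)))
```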
Two significant SSR markers (C3-SSR2 and C10-SSR2) explaining 4.0–6.6% of the phenotypic variance for lignin content were identified in the validation population (Table 2). In the association population, the differences in lignin content of C3-SSR2 genotypes were significant (24.70% for (CT)4/(CT)4, 23.95% for (CT)4/(CT)3, and 23.06% for (CT)3/(CT)3), which was consistent with the additive effects of gene action on lignin content. However, the (CT)4 allele generated in the association population was not found in the validation population, and two genotypes ((CT)5/(CT)3 and (CT)5/(CT)5) of the parents were segregated in the linkage population. The differences in lignin content among the three genotypes (two significant) of marker C10-SSR2 were 24.72% for (TGA)4/(TGA)4, 24.58% for (TGA)4/(TGA)3, and 23.83% for (TGA)3/(TGA)3, indicating that patterns of gene action are consistent with dominant effects on lignin content (Table 2). Three haplotypes in the same amplicon of PtoCesA10 were significantly associated with lignin composition, which was supported by the significant marker C10-SSR2 (Table 3 and Fig. 3a). This marker was validated in the linkage mapping population (R2 = 6.6%; Table 2), and the mean values showed significant differences between genotypes, indicating that the allelic effect of C10-SSR2 is consistent in both association and validation populations (Fig. 3).
For the α-cellulose and holocellulose content traits, we observed that the significant marker C2-SSR1 for α-cellulose content was similarly associated with holocellulose content in the discovery and validation populations. In the association population, the differences in holocellulose content were significant (72.70% for (TTAA)3/(TTAA)3, 72.89% for (TTAA)3/(TTAA)4, 73.75% for (TTAA)3/(TTAA)5, and 73.68% for (TTAA)5/(TTAA)5). The same patterns were found for α-cellulose content (44.56, 44.49, 45.23, and 45.36%, respectively); this suggested that gene action was consistent with dominant effects in relation to these two traits (Table 2). The (TTAA)5 allele in marker C2-SSR1 is the minor allele for the holocellulose trait, and the same alleles in C2-SSR1 were also detected in the validation population for α-cellulose and holocellulose traits (data not shown).
C4-SSR4 was associated with MFA in the association population (6.0%, Q < 0.05) and was also successfully validated in the linkage mapping population (Tables 2, 3). Heterozygotes (TACTGC)5/(TACTGC)4 for the marker had a difference of > 1° in MFA with either homozygote class. One of the five individual haplotypes in the amplicon of PtoCesA4 was significant for MFA, with a high MFA of 19.2° (Table 3), and the same allelic effect of C4-SSR4 was identified in the validation population.
Allelic relative expression based on real-time PCR
To test whether these significant allelic associations affect the relative mRNA expression levels for these genes, we quantified the mRNA levels among different groups with different genotypes or haplotypes. In total, 19 tests (nine haplotypes and 10 individual markers) representing seven PtoCesA genes (Table 3) were used to quantify the mRNA levels for these genes among different groups. Measurement of differential expression by real-time PCR indicated that only two candidate genes (PtoCesA10 and PtoCesA12) had different expression levels among the different groups (Fig. 5a,b). The mRNA products of the PtoCesA10 transcripts were detected among three groups with different haplotypes (10 individuals per group; each individual selected was homozygous for the particular haplotype). The highest relative expression level of mRNA products (0.9376) was in the group with (CAAACA)5-(TGA)4-(TA)7, followed by (CAAACA)4-(TGA)4-(TA)7 (0.7168), and (CAAACA)3-(TGA)3-(TA)5 (0.2641) (Fig. 5a). The mRNA products of the PtoCesA12 transcripts were detected in two groups representing two significant haplotypes, with some differences in expression levels. A relatively high expression level (0.8513) was detected in the group with (TTA)9-(ATT)3-(AT)5, while a lower expression level (0.3910) was detected in the group with (TTA)9-(ATT)3-(AT)5 (Fig. 5b).
Linkage disequilibrium in trees
For association mapping, understanding the patterns of LD in the species under consideration is an important prerequisite, because the rate of decay of LD is needed to determine whether genome-wide associations are feasible or whether a candidate gene-based approach has to be considered. Previous studies have generally suggested a very low LD in trees; for example, Brown et al. (2004) found a rapid decline in LD within several kilobases in loblolly pine. Similar findings of limited LD were reported for candidate genes in other species of conifers (Dvornyk et al., 2002; Neale & Savolainen, 2004; Krutovsky & Neale, 2005; Gonzalez-Martinez et al., 2007). LD analysis in the Cinnamoyl CoA Reductase (CCR) gene and a gene encoding a COBRA-like protein (EniCOBL4A) in Eucalyptus nitens showed that LD does not extend over the entire gene (Thumma et al., 2005, 2009). In Populus, previous studies based on SNP markers have indicated that a rapid decay of LD occurs within just 300–1700 bp in candidate genes among related species of Populus (Ingvarsson, 2005; Ingvarsson et al., 2008; Xu et al., 2009; Wegrzyn et al., 2010), which is consistent with the LD decay observed in some PtoCesA genes using SSR markers in this study, indicating the potential of association genetics to identify the genes responsible for variation in key traits. However, the assessment of LD using genic SSR markers only applies to gene regions in this species, and may not be applicable at the whole genome-wide LD level. To date, all previous studies on LD decay in related species of Populus have focused on gene regions; whether there is LD in nongenic regions in Populus remains to be seen (Ingvarsson, 2005; Ingvarsson et al., 2008; Wegrzyn et al., 2010).
The greater resolution power of SSRs in the detection of LD, compared with biallelic SNPs, has been demonstrated in other species (for a review, see Abdurakhmonov & Abdukarimov, 2008), suggesting the possibility of using a combination of SNP and SSR markers in LD mapping. These observations can be readily explained by noting that different marker types capture different historical information in a genome because of dissimilar mutation rates (e.g. SNP vs SSR or AFLP vs SSR) and nonuniform LD distribution among the chromosomes; in addition, population background is also a key factor influencing LD (Neale & Savolainen, 2004).
LD has been found to decline rapidly in some PtoCesA genes using a limited number of evenly spaced markers (minor allele frequency was above c. 20%; Andreescu et al., 2007), suggesting that the resolution of marker–trait associations may be high in this study. The LD does not extend over the entire gene region, demonstrating that a candidate-gene-based LD approach may be the best way to understand the molecular basis underlying quantitative variation in this species (Thumma et al., 2005).
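As a point of reference for the LD values discussed in this section, the squared allele-frequency correlation (r2) between two loci can be computed from phased haplotypes. The following is a minimal sketch under stated assumptions, not the procedure used in this study: it assumes phased two-locus haplotypes and collapses each multi-allelic SSR to a biallelic contrast (one focal allele versus all others). The function name `r_squared`, the allele labels, and the toy haplotypes are hypothetical.

```python
def r_squared(haplotypes, allele_1, allele_2):
    """Return r^2 between a focal allele at locus 1 and a focal allele at locus 2.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples, assumed phased.
    Each locus is treated as biallelic: focal allele versus all other alleles.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    # Frequencies of the focal alleles and of the haplotype carrying both.
    p1 = sum(1 for a, _ in haplotypes if a == allele_1) / n
    p2 = sum(1 for _, b in haplotypes if b == allele_2) / n
    p12 = sum(1 for a, b in haplotypes if a == allele_1 and b == allele_2) / n
    # D is the deviation of the observed haplotype frequency from independence.
    d = p12 - p1 * p2
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("a locus is monomorphic for the focal allele; r^2 is undefined")
    return d * d / denom


# Toy data (hypothetical): complete association versus independence.
linked = [("A", "B")] * 6 + [("a", "b")] * 4
independent = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]

print(r_squared(linked, "A", "B"))
print(r_squared(independent, "A", "B"))
```

In the toy data, the fully associated haplotype set gives an r2 at its maximum and the balanced set gives an r2 of zero. Computing such values for marker pairs at increasing physical distance is one common way to describe LD decay within a gene.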
Comparison and identification of associations in P. tomentosa
To date, the candidate-gene-based association approach has been widely used to identify candidate gene alleles associated with growth and wood properties in several tree species (Thumma et al., 2005, 2009; Gonzalez-Martinez et al., 2007; Dillon et al., 2010, 2012; Wegrzyn et al., 2010; Beaulieu et al., 2011; Sexton et al., 2011). Generally, in a high-LD species, the power of a single-marker association test is often limited because LD information contained in flanking markers is ignored. Intuitively, haplotypes (a collection of ordered markers) may be more powerful than individual, nonordered markers (Akey et al., 2001). However, the comparison of single-marker and haplotype-based associations in this low-LD tree species demonstrated that the effect of a haplotype is mainly derived from the significant individual marker within it, and that haplotype analysis may not be more powerful than single-marker analysis, although several haplotype-based associations (P <0.05) were identified in the absence of significant single markers. Therefore, both single-marker and haplotype-based LD analyses should be evaluated when judging significant associations.
Besides the direct coding of CesA subunit proteins, genetic evidence suggests that CesA genes may participate in the pathway(s) of lignin and C6 sugar formation (Song et al., 2010; Wegrzyn et al., 2010); also, other candidate genes have been implicated in the pathway of cellulose biosynthesis, although whether they are directly involved has not been verified (Szyjanowicz et al., 2004; Coleman et al., 2009; Wegrzyn et al., 2010). Cell wall synthesis is coordinated with several other biological processes, and the genes in these shared pathways often are functional homologs (Persson et al., 2005; Beaulieu et al., 2011).
Lignin is a complex phenolic heteropolymer, and plays a key role in plant structure by providing strength, rigidity, and hydrophobicity to xylem cell walls (Demura et al., 2002). Marker C3-SSR2 and a haplotype-based association representing PtoCesA3 were significantly (Q <0.05) associated with lignin content, but these associations were not validated in the linkage population (Table 2). Microarray profiling across developing xylem in Populus (Persson et al., 2007; Rajangam et al., 2008) showed that the PtiCesA8 gene (97% identity at the protein level with PtoCesA3) was strongly expressed during secondary cell wall deposition. Genes encoding lignin monomer-polymerizing laccases and lignin monomer synthesis enzymes are among the genes most closely co-expressed with AtCesA8 (Persson et al., 2005). Significant individual SNP associations in CesA1A (96–97% identity at the protein level with PtoCesA3) with lignin have been identified in black cottonwood (Populus trichocarpa) (Wegrzyn et al., 2010). Therefore, it is essential to expand our understanding of the action of PtoCesA3.
In PtoCesA10, the marker C10-SSR2 located in the coding region (PtoCesA10 exon 3) was uniquely associated with lignin content. The (TGA)4 allele is the minor allele in marker C10-SSR2, and it produces an insertion mutation, adding an Asp to the amino acid sequence (Fig. 3a). The results of association and validation (Table 2, Fig. 3a) strongly suggest that C10-SSR2 may be a functional polymorphism that is in or near a locus involved in the control of lignin content. Further analysis of the protein structure encoded by PtoCesA10 showed that the inserted amino acid (AA) lies 15 AAs from the zinc-binding domain, which has been shown to be involved in CESA protein–protein interactions (Joshi et al., 2004), suggesting that the AA insertion may interact with the zinc-binding domain in regulating gene expression related to lignin composition. This conjecture was also supported by the significant expression differences among three groups of trees (Fig. 5a). Lignin deposition is largely associated with secondary wall formation, and genes linked to the lignin-related pathway for suberin synthesis are highly co-expressed with secondary cell wall AtCesA7 (91% identity at the protein level with PtoCesA10; Persson et al., 2005). This result is also in agreement with those of previous studies showing that PtiCesA7-A (96% identity at the protein level with PtoCesA10) is specifically expressed in the secondary cell wall (Rajangam et al., 2008; Song et al., 2010). Similarly, Wegrzyn et al. (2010) have identified significant SNP and haplotype-based associations in CesA1B (a homolog of PtoCesA10) with lignin composition in black cottonwood.
Microfibril orientation may be related to the rate of cellulose synthesis (Paredez et al., 2006; Rajangam et al., 2008; Beaulieu et al., 2011). C4-SSR4, a noncoding marker within PtoCesA4, was the only marker showing a single-marker association with MFA, and illustrated a pattern of gene action consistent with additive effects (Table 2). Both single-marker and haplotype-based association results demonstrated that allelic polymorphism at C4-SSR4 could be linked to some co-expressed processes involved in microfibril orientation. Furthermore, this marker was also detected in the validation population, with significant differences in MFA among three genotypes (data not shown). AtCesA6 (94% identity at the protein level with PtoCesA4) has been shown to affect both microtubule and microfibril orientation (Paredez et al., 2006). This research may provide a possible path for exploration of the genetic basis of microfibril orientation, but the role of the variant in controlling the trait needs further testing.
Holocellulose is the total polysaccharide fraction of the secondary xylem cell walls and is composed of cellulose and hemicelluloses; it makes up c. 80% of the secondary xylem tissue (Li et al., 2006). In the present study, the differences in holocellulose content for marker C12-SSR1 were significant among three of four genotypes (73.15% for (TTA)6/(TTA)4, 73.20% for (TTA)6/(TTA)6, 74.05% for (TTA)9/(TTA)6, and 74.38% for (TTA)9/(TTA)9), illustrating that patterns of gene action are consistent with additive gene effects (Fig. 4). Significant differences in holocellulose content for two haplotypes in PtoCesA12 were shown (73.17% for (TTA)6-(ATT)4-(AT)5 and 74.22% for (TTA)9-(ATT)3-(AT)5) (Table 3, Fig. 4), which was supported by two adjacent markers (C12-SSR1 and C12-SSR2) in high LD (r² > 0.8; P < 0.001; Fig. 2). RT-PCR testing also indicated that the levels of relative expression were significantly different between these two haplotypes (Fig. 5b). These observations reveal the potential importance of this gene in the variability of holocellulose content. CesA6-related genes, which are homologous to PtoCesA12, have been identified and are expressed during cellulose biosynthesis or deposition in Arabidopsis and P. trichocarpa (Desprez et al., 2007; Persson et al., 2007). However, this marker was not successfully validated in the linkage population. Numerous reasons have been proposed to explain why some true associations may not be replicated across independent data sets, including sample size, variability in phenotype definitions, genetic heterogeneity, environmental interactions, age-dependent effects, and gene–gene interactions (Neale & Savolainen, 2004; Greene et al., 2009; Beaulieu et al., 2011; Dillon et al., 2012). Generally, wood traits are expected to be influenced by many genes, with small effects; gene–gene interactions are also likely to be of critical importance.
Additionally, populations with different genetic and environmental backgrounds may have unfavorable pedigree linkage disequilibrium and phenotypic variation (Neale & Savolainen, 2004). For example, the female parent of this validation population is the related hybrid P. alba × P. glandulosa, and diverse gene-by-environment interactions within and between sites were not accounted for. Alternatively, the association result may be a false positive. These reasons might explain the ‘lack of validation’ for this important association in the linkage mapping population.
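The additive pattern described above for C12-SSR1 can be illustrated with a short calculation from the three genotype means reported in the text for the (TTA)6 and (TTA)9 alleles. This is a hedged sketch: it uses the standard quantitative-genetic definitions of the additive effect (a, half the difference between the two homozygote means) and the dominance effect (d, the heterozygote mean minus the homozygote midpoint). The function name `gene_action` is hypothetical, and the cutoff of |d/a| ≤ 0.5 for calling gene action additive is a convention assumed here, not one stated in this section. The (TTA)6/(TTA)4 genotype is left out because it involves a third allele.

```python
def gene_action(mean_11, mean_12, mean_22):
    """Return (a, d, d/a) from the trait means of the three genotypes at a biallelic contrast.

    mean_11, mean_22: trait means of the two homozygotes.
    mean_12: trait mean of the heterozygote.
    """
    a = (mean_22 - mean_11) / 2.0             # additive effect
    d = mean_12 - (mean_11 + mean_22) / 2.0   # dominance deviation from the midpoint
    if a == 0:
        raise ValueError("homozygote means are equal; d/a is undefined")
    return a, d, d / a


# Holocellulose content (%) as reported in the text for marker C12-SSR1:
# (TTA)6/(TTA)6 = 73.20, (TTA)9/(TTA)6 = 74.05, (TTA)9/(TTA)9 = 74.38.
a, d, ratio = gene_action(73.20, 74.05, 74.38)
print(round(a, 2), round(d, 2), round(ratio, 2))

# Assumed convention: |d/a| <= 0.5 is treated as additive gene action.
print("additive" if abs(ratio) <= 0.5 else "non-additive")
```

With these means, a is about 0.59 percentage points, d is about 0.26, and d/a is about 0.44, which falls on the additive side of the assumed cutoff.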
Variations in the quantity and quality of cellulose in plants are suspected to be primarily a result of enzymatic activities of different types of cellulose synthases (CesAs; Atanassov et al., 2009; Kumar et al., 2009). The CesA7 genes PtiCesA7-A and AtCesA7, homologs of PtoCesA2, are expressed in developing xylem tissue undergoing secondary wall thickening in Populus or in the xylem of Arabidopsis (Suzuki et al., 2006; Atanassov et al., 2009). In PtoCesA2, the result that marker C2-SSR1 was associated with α-cellulose and holocellulose content traits in both discovery and validation populations (Table 2) suggests that C2-SSR1 may be a functional polymorphism in or near a locus involved in cellulose synthesis during secondary cell wall formation in P. tomentosa. Significant individual SNP markers or haplotype associations have been reported in the homologous gene CesA2A in P. trichocarpa (Wegrzyn et al., 2010).
Factors affecting the power of association mapping
Gene-derived SSR markers may have functional significance in regulating gene expression and function (Li et al., 2004; Varshney et al., 2005). In this study, a vast amount of genetic variation in P. tomentosa natural populations (Zhang et al., 2007) was coupled with a low level of homoplasy of SSR markers derived from conserved gene regions (Li et al., 2004), providing an appropriate tool for candidate-gene-based association studies, although previous studies have reported that size homoplasy of SSR alleles and allele reversion could be a problem in association studies (Ching et al., 2002). The selection of polymorphic genic SSR markers with a low level of size homoplasy, along with SNPs, a traditional marker type used for association analysis, would provide better potential to detect functional allelic variation underlying quantitative traits. In addition, knowledge and selection of optimal candidate genes using different approaches, such as microarray analysis, EST database searches and quantitative trait locus (QTL) mapping, in model or related plant species (Neale & Savolainen, 2004; Thumma et al., 2005) provide an important basis for identifying useful alleles located within functional genes controlling traits of interest.
Deviations from HWE for SSR loci can be indicative of genotyping errors, inbreeding, population subdivision, or selection (Balding, 2006). In this study, genotyping errors and inbreeding can be excluded because genotyping errors were corrected and FIS values were low for each locus. Population subdivision is thought to be the most important explanation for deviations from HWE, which is in agreement with the various geographic origins of wild P. tomentosa (Du et al., 2012). Population structure can generate spurious genotype–phenotype associations. Thus, use of the unified mixed-model method (MLM) would improve control of both type I and type II error rates (Yu et al., 2006).
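To make the HWE and FIS reasoning above concrete, the sketch below computes a chi-square goodness-of-fit statistic against Hardy–Weinberg proportions and a simple inbreeding coefficient estimate (1 minus observed over expected heterozygosity) from genotype counts. It is an illustration under simplifying assumptions, not the analysis performed in this study: the locus is treated as biallelic (an SSR allele versus all others), the function name `hwe_summary` and the counts are hypothetical, and 3.841 is the chi-square critical value for one degree of freedom at the 0.05 level.

```python
def hwe_summary(n_11, n_12, n_22):
    """Return (chi2, f_is) for genotype counts at a biallelic locus.

    chi2: goodness-of-fit statistic against Hardy-Weinberg expectations.
    f_is: 1 - H_obs / H_exp, a simple estimate of the inbreeding coefficient.
    """
    n = n_11 + n_12 + n_22
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_11 + n_12) / (2 * n)   # frequency of allele 1
    q = 1 - p
    observed = (n_11, n_12, n_22)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic locus).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    h_obs = n_12 / n
    h_exp = 2 * p * q
    f_is = 1 - h_obs / h_exp if h_exp > 0 else 0.0
    return chi2, f_is


CRITICAL_DF1_ALPHA_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05

# Hypothetical counts: one locus in HWE proportions, one with a heterozygote deficit.
for counts in [(36, 48, 16), (50, 20, 30)]:
    chi2, f_is = hwe_summary(*counts)
    verdict = "deviates from HWE" if chi2 > CRITICAL_DF1_ALPHA_05 else "consistent with HWE"
    print(counts, round(chi2, 2), round(f_is, 2), verdict)
```

A heterozygote deficit of the kind shown in the second set of counts gives a positive FIS and can arise from inbreeding or from pooling subpopulations with different allele frequencies, which is why a low FIS together with a significant deviation points toward other explanations such as population subdivision.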
Phenotyping is an important part of association mapping for forest trees. A typical association population is usually composed of a diverse set of unrelated individuals at the same location, and to increase precision in phenotypic measurements, one must usually clonally replicate individuals to reduce environmentally induced noise and measurement errors. Hence, in this study, we used a total of 1380 phenotypes (460 genotypes × 3 ramets) to compensate for the deficiency of having a limited number of SSR markers. Furthermore, when the entire collection is replicated across multiple environments, data on replicates of each individual can be combined to produce a phenotype mean value for the accession analysis, which is less influenced by environment or measurement errors (Long & Langley, 1999). Therefore, replication of genotype–phenotype associations is also crucial in association mapping to distinguish false-positive associations and to provide less biased estimates of the size of allelic effects. Additionally, validation of biological function through transgenic experiments and other molecular biology techniques can be used to verify associations (Thumma et al., 2005; Abdurakhmonov & Abdukarimov, 2008). For example, real-time PCR and linkage analysis were employed in this study to confirm the results obtained from association mapping.
Polymorphisms of SSR markers and significant haplotypes representing PtoCesA candidate genes were used to evaluate the functional loci or genes associated with lignocellulosic cell wall development. We demonstrated that the candidate gene-based association approach, along with validation in a large linkage-mapping population and confirmation using real-time PCR testing, can be employed to identify naturally occurring allelic variation in genes associated with important wood-quality traits. This study provides insights into the genetic mechanisms underlying wood development, and identifies particular markers for tree MAS breeding programs with the goals of improving the quality and quantity of wood products.
We thank Profs Ronald R. Sederoff and Zhao-Bang Zeng (NC State University, Raleigh, NC, USA) for their detailed comments and specific suggestions for improving the manuscript. This work was supported by the State Key Basic Research Program of China (No. 2012CB114506) and the Project of the National Natural Science Foundation of China (No. 31170622, 30872042).