Nucleotide variation in genes involved in wood formation in two pine species

Authors


Author for correspondence: Christophe Plomion Tel: +33 5 57 12 28 30 Fax: +33 5 57 12 28 81 Email: plomion@pierroton.inra.fr

Summary

  • Nucleotide diversity in eight genes related to wood formation was investigated in two pine species, Pinus pinaster and P. radiata.
  • The nucleotide diversity patterns observed and their properties were compared between the two species according to the specific characteristics of the samples analysed.
  • A lower diversity was observed in P. radiata compared with P. pinaster. In particular, for two genes (Pp1, a glycine-rich protein homolog, and CesA3, a cellulose synthase) the magnitude of the reduction of diversity potentially indicates the action of nonneutral factors. For both, particular patterns of nucleotide diversity were observed in P. pinaster (high genetic differentiation for Pp1 and close to zero differentiation associated with a positive Tajima's D-value for CesA3). In addition, KORRIGAN, a gene involved in cellulose–hemicellulose assembly, showed a negative Tajima's D-value in P. radiata accompanied by a high genetic differentiation in P. pinaster.
  • The consistency of the results obtained at the nucleotide level, together with the physiological roles of the genes analysed, indicates their potential susceptibility to artificial and/or natural selection.

Introduction

Identification of genes controlling quantitative trait variation is one of the great challenges of the post-genomic era. This knowledge is important not only for biomedicine but also for agriculture. In this latter field, such information would provide a way to manage and use the genetic variability in breeding and gene conservation programmes. The availability of markers linked to economically and ecologically relevant traits would be of particular interest in long-lived forest tree species such as conifers. Such tools would enhance the efficiency of artificial selection by reducing the duration of breeding cycles and increasing the genetic gain in each cycle. They would also provide criteria to manage functional genetic diversity, which is key to preserving adaptability of forest trees to their changing environment. Traditional forest tree breeding programmes have provided the forest industry with improved genotypes for wood production (e.g. 30% realized genetic gain for the volume of the bole in Pinus pinaster; Alazard & Raffin, 2003). The introduction of wood quality criteria into selection is now considered an important objective to ensure the sustainability of the wood market through the availability of raw material well suited to end-use products (Pot et al., 2002).

Wood property quantitative trait loci (QTLs) have been identified in many forest tree species, attesting to the existence of major gene effects controlling part of the variation of wood and its end-use properties (Bradshaw & Stettler, 1995; Grattapaglia et al., 1996; Kumar et al., 2000; Lerceteau et al., 2000; Arcade et al., 2002; Moran et al., 2002; Neale et al., 2002; Brown et al., 2003; Markussen et al., 2003). Co-localizations of QTLs and candidate genes (Moran et al., 2002; Brown et al., 2003; Chagné et al., 2003) have also been reported. However, given the large confidence intervals generally associated with QTLs (Mangin et al., 1994), these findings did not permit their validation. Complex trait dissection allowing the identification of individual genes is currently underway through association studies in humans and model animals (e.g. Drosophila). Recently, Thornsberry et al. (2001) have successfully transferred this approach to plants. In theory, association studies should be performed at a whole-genome level (also known as a genome scan); however, due to the specific features of conifers (short-distance linkage disequilibrium (Brown et al., 2004; Neale & Savolainen, 2004) and extremely large genome size (Wakamiya et al., 1993)), a candidate-gene approach is the only possible way to understand the molecular basis underlying quantitative variation in these species.

Wood formation includes four major steps: cell division, cell expansion, secondary cell wall formation and cell death. These steps involve expression of a number of structural genes, coordinated by transcription factors, mainly involved in the biosynthesis of polysaccharides (cellulose: 40–50% of dry wood; hemi-cellulose: 25%; and pectins), lignins (25–35%), and cell wall proteins. A number of genes that determine cell wall composition and cell shape have been identified by classical biochemical analysis (e.g. lignification genes, reviewed in Whetten et al., 1998), and more recently by the application of the genomic tools such as gene or protein expression profiling (Plomion et al., 2000; Hertzberg et al., 2001; Le Provost et al., 2003; Gion et al., 2005) and the screening of large collections of Arabidopsis thaliana mutants (Fagard et al., 2000; Mouille et al., 2003).

Several studies have shown that wood structure and composition are influenced by environmental changes (Liphschitz & Waisel, 1970; Barber et al., 2000). The extent of these modifications has also been shown to be genetically regulated (Rozenberg et al., 2002), suggesting a potential functional role of xylogenesis genes in forest tree adaptation (Costa et al., 1998; Riccardi et al., 1998). In this context, it is possible that nucleotide diversity of these genes and their homologs in pine is involved in genetic variation of wood properties and, as such, may be subject to natural selection pressures in pine species.

For this study, eight candidate genes were selected based on their likely involvement in the determination of wood properties. Three were homologous to Arabidopsis thaliana cell wall mutant genes specifically involved in the cellulose and hemicellulose biosynthesis (a membrane-bound endo-1,4-beta-glucanase, KORRIGAN, and two cellulose synthases, CESA3 and CESA4). Five expressional candidate genes were also analysed. These genes have been identified through differential expression studies between different types of wood characterized by distinct chemical composition and structure (reviewed in Plomion et al., 2001). Pp2 (MYB-like transcriptional factor), Pp4 (ACC oxidase) and Pp6 (25S ribosomal gene) have been identified as being up-regulated in early wood, whereas Pp1 (glycine-rich protein homolog) was found to be up-regulated in late wood-forming tissue (Le Provost et al., 2003). Pr1 (unknown function protein) was isolated from wood forming tissue in P. radiata (S. Cato, unpublished data).

In the present study, nucleotide variation of these eight genes was analysed within and between two pine species: Pinus pinaster Ait. and Pinus radiata D. Don, both of which are economically and ecologically important. Both species are currently the target of conservation efforts, and the accurate determination of their genetic structure at the functional level would help refine conservation strategies.

P. pinaster has a highly fragmented distribution over 4 Mha in the Mediterranean basin. This natural range includes highly variable climatic conditions, from more than 1000 mm rainfall in Tova (Corsica) to less than 100 mm in Oria (Spain), and soil structure that varies from sandy dunes to shallow rocky soils. The genetic structure of the species has been described using several sets of markers (reviewed in Burban & Petit, 2003) and reveals 18 geographically structured races belonging to three major groups: an Atlantic group, comprising populations from western France and the greater part of Spain and Portugal; a Mediterranean group, consisting of all eastern European populations, and including eastern Spanish populations up to Andalucía and the small stand of Punta Cires in Morocco; and a North African group comprising all the other African populations. Because of the fragmentation of its natural range, maritime pine exhibits a relatively high genetic differentiation among populations at nuclear markers in comparison to other conifer species. A high level of genetic differentiation was also observed for survival, adaptation to different climatic conditions, growth and phenology, resistance to insects and drought tolerance (reviewed in González-Martínez et al., 2002).

P. radiata grows naturally in five locations: Año Nuevo, Monterey and Cambria on the Californian mainland coast and Guadalupe and Cedros islands off the coast of Baja California. These five locations differ substantially from each other with respect to soil, elevation, temperature, rainfall and ecosystem associates. At the genetic level, significant differentiation was observed between the different populations (ranging between 0.119 and 0.26, depending on the type of markers and the populations considered; Moran et al., 1988; Wu et al., 1999; Karhu, 2001). Although the natural range of P. radiata is extremely small, it is the world's most widely planted fast-growing softwood species. It is cultivated on a commercial scale in Australia, Chile, South Africa and New Zealand.

The objectives of this study were twofold. The first was to study the patterns of nucleotide diversity of the eight chosen candidate genes in P. radiata and P. pinaster. More explicitly, we described for the first time in these two species, the type (SNP vs INDEL), nature (silent vs nonsynonymous) and genomic location (coding vs noncoding) of nucleotide polymorphisms. The second goal was to investigate whether nucleotide diversity patterns were compatible with neutral models or not.

Materials and methods

Plant material and DNA extraction

Haploid megagametophytes, a maternal tissue surrounding the diploid embryo in conifer seeds, were harvested from germinated seedlings just before the seed coat was cast off. Genomic DNA was extracted as described by Plomion et al. (1995). P. pinaster nucleotide diversity was assessed using megagametophytes collected from natural stands across the species natural range (Table 1). Twenty-four gametes from 13 provenances belonging to the three main groups identified by Baradat and Marpeau (1988) were included in this exploratory analysis. In a second step, for one of the genes (CesA3) the sample size was extended to 91 megagametophytes (Table 1). P. radiata nucleotide diversity was estimated using 23 megagametophytes collected from individual trees of the New Zealand breeding population (Forest Research, Rotorua, New Zealand). Previous studies, based on monoterpene analysis (Burdon et al., 1997a) and morphological traits (Burdon et al., 1997b) have shown that the local race was introduced from the USA during the 19th century and mostly derived from the Año Nuevo population, with some admixture from the Monterey population.

Table 1.  List of Pinus pinaster and Pinus radiata populations

| Species | Country | Population | Latitude | Longitude | Altitude (m) | Sample size^a | Group^b |
|---|---|---|---|---|---|---|---|
| P. pinaster | Tunisia | Tabarka | 36°57′ N | 8°46′ E | 200 | 2 (8) | Mediterranean |
| | France | Corsica Porto Vecchio | 41°28′ N | 9°12′ E | 150 | 2 (7) | Mediterranean |
| | | Corsica Vivario | 41°20′ N | 9°09′ E | 600 | 2 (5) | Mediterranean |
| | | Corsica Zonza | 41°45′ N | 9°11′ E | 760 | 2 (6) | Mediterranean |
| | | Aquitaine Castets | 43°52′ N | 1°08′ W | 60 | 2 (3) | Atlantic |
| | | Aquitaine Mimizan | 44°08′ N | 1°18′ W | 35 | 2 (9) | Atlantic |
| | | Aquitaine Souston | 43°41′ N | 1°25′ W | 35 | 2 (4) | Atlantic |
| | | Aquitaine Hourtin | 45°10′ N | 1°08′ W | 40 | 4 (10) | Atlantic |
| | | Aquitaine Medoc | 45°34′ N | 1°13′ W | 40 | 2 (18) | Atlantic |
| | Portugal | Leiria Mata | 40°00′ N | 8°45′ W | 50 | 1 (5) | Atlantic |
| | | Leiria Velha | 40°00′ N | 8°45′ W | 50 | 1 (4) | Atlantic |
| | Morocco | Punta Cires | 35°55′ N | 5°28′ W | 20 | 1 (6) | Atlantic |
| | | Tamjout | 33°52′ N | 4°02′ W | 1600 | 1 (4) | North African |
| P. radiata | New Zealand | NZ breeding population (land race) | | | | 23 | Año Nuevo and Monterey |

  • a  Number of megagametophytes analysed per gene. For CesA3, a wider sample was studied. The sample size analysed for each population for this gene is indicated in parentheses.
  • b  P. pinaster groups based on Burban & Petit (2003); P. radiata groups based on Burdon et al. (1997a,b).

Primer design, PCR amplification and DNA sequencing

For each gene, a BLAST search (Altschul et al., 1997) was first run to identify homologs in pine expressed sequence tag (EST) databases available at http://cbi.labri.fr/outils/SAM/COMPLETE/index.php for P. pinaster and http://fungen.org/Projects/Pine/Pine.htm for Pinus taeda (Table 2). From the multiple alignments of the retrieved sequences, a consensus sequence was then derived for each candidate using Sequencher v4.1.4 (Genecodes, Inc., Ann Arbor, Michigan, USA). Primer pairs (Table 3) were designed from the consensus sequence using Primer3 (Rozen & Skaletsky, 2000).

Table 2.  Summary of the studied genes

| Gene ID | Function | Accession^a | Total (bp) | Exon (bp) | Intron (bp) | 3′ UTR (bp) | Homologs in P. pinaster ESTs^b | Homologs in P. taeda ESTs^c |
|---|---|---|---|---|---|---|---|---|
| KORRIGAN | membrane-bound endo-(1-4)-β-glucanase (EC:2.4.1.12) | BV079723 | 937 | 566 | 371 | 0 | 8 | 47 |
| CesA3 | cellulose synthase (EC:2.4.1.12) | BV079715+BV079717 | 1048 | 810 | 238 | 0 | 8 | 49 |
| CesA4 | cellulose synthase (EC:2.4.1.12) | BV079716 | 489 | 396 | 0 | 93 | 7 | 22 |
| Pp1 | glycine-rich protein homolog | BV079718 | 493 | 240 | 0 | 253 | 22 | 133 |
| Pp2 | MYB-like transcriptional factor MBF1 | BV079719 | 494 | 494 | 0 | 0 | 1 | 3 |
| Pp4 | ACC oxidase | BV079720 | 270 | 270 | 0 | 0 | 1 | 42 |
| Pp6 | 25S rRNA gene | BV079721+BV079722 | 902 | 902 | 0 | 0 | 33 | 176 |
| Pr1 | unknown protein | BQ701569 | 113 | 113 | 0 | 0 | 0 | 15 |

The four bp columns give the base pairs screened; the last two columns give the number of homologs with pine EST (E-value < 1e−10).

Table 3.  List of primer pairs and amplification conditions

| Gene ID | Primer pair (forward primer followed by reverse primer) | Ta (°C) | Mg2+ (mM) |
|---|---|---|---|
| KORRIGAN | GCAGGACTATGGTGTTTTAAGCTATTCCCCCAGTATCACCCC | 59–50 | 3 |
| CesA4 | AGATCTTGCTCAATGCCTCGCCAAACTTCACTGTCACATCG | 59–50 | 3 |
| CesA3a^a | GCTTTGAGAAGTCGTTTGGCGTATGCCAGTCTTTCCAGCC | 64–55 | 2 |
| CesA3b^a | CATTGGTTCGAGTCTCTGCCTAACACACCAAGAGGCCACC | 59–50 | 3 |
| Pp1 | GAGTTCTCAAGGGATGTCGGTAACACACCAAGAGGCACC | 59–50 | 3 |
| Pp2 | AACAGATCATCCATCTCGGGACAGATGGTCATTGATCGCC | 59–50 | 3 |
| Pp4 | GAACATCTACCCTGCTTGCCTGAAATTCCTAACATGCTCCC | 59–50 | 3 |
| Pp6a^a | TTTTGATCCTTCGATGTCGGGAATCTCAGTGGATCGTGGC | 59–50 | 3 |
| Pp6b^a | AAATTCAACCAAGCGCGGCTTTTAACAGATGTGCCGCC | 59–50 | 2 |
| Pr1 | ATCGCATGGGAGTTGCAGCATGTCAGCCTCGGTTTGG | 64–55 | 2 |

  • a  For Pp6 and CesA3, two primer pairs were designed.

PCR products were sequenced using the Big Dye terminator kit (Amersham Biosciences, Uppsala, Sweden) and an ABI 3100 automatic sequencer (Applied Biosystems, Foster City, CA, USA) according to the manufacturers’ specifications. A single sequence was obtained per megagametophyte for each candidate gene. Singleton polymorphisms were verified by re-sequencing the affected megagametophyte sample.

Landscape of nucleotide diversity

Sequence alignment and nucleotide polymorphism detection were performed with Sequencher v4.1.4. Each polymorphic site was visually checked on the chromatograms in order to distinguish true polymorphisms from scoring errors. The use of haploid tissues greatly facilitated the sequence analysis, allowing the direct definition of the haplotypes (multilocus combinations of polymorphisms) without cloning or using an expectation maximization (EM) algorithm (Long et al., 1995).

Basic parameters including the number of single nucleotide polymorphisms (SNPs), insertion–deletions (INDELs), synonymous (S) and nonsynonymous (NS) mutations were calculated using the SITES software (Hey & Wakeley, 1997). Nucleotide diversity was estimated as θw (based on the number of segregating sites; Watterson, 1975) and π (based on the average number of nucleotide differences per site between sequences; Nei, 1987). These parameters were computed with SITES, without considering INDELs, at three different levels: (i) the whole sequenced region; (ii) noncoding regions (including introns, 3′ and 5′ untranslated regions (UTRs)); and (iii) coding regions, subdivided into two components: S and NS. The number of haplotypes and the haplotype diversity were calculated using the DnaSP software (Rozas & Rozas, 1999).
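As an illustration of the two diversity estimators named above, the following is a minimal Python sketch, assuming equal-length, gap-free haploid sequences. It is not the implementation of the software used in the study; the function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns in which more than one base is observed."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's theta per site: S / (a_n * L), a_n = sum_{i=1}^{n-1} 1/i."""
    n, length = len(seqs), len(seqs[0])
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / (a_n * length)


def nucleotide_diversity(seqs):
    """pi per site: mean number of pairwise differences divided by length."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical toy alignment of four haploid sequences (not study data).
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
print(segregating_sites(toy), watterson_theta(toy), nucleotide_diversity(toy))
```

Because megagametophytes are haploid, each sequence can be passed in directly, with no phasing step.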

Tests for selection were performed to assess whether the considered genes followed the model of neutral evolution (Kimura, 1983) or not. Tajima's D-test, based on the allelic distribution (Tajima, 1989), was carried out using the Arlequin 2.000 software (Schneider et al., 2000). As implemented in this software, the significance of this test was assessed by generating random samples under the hypothesis of neutrality and population demographic equilibrium. This test was performed assuming the absence of recombination, making it conservative.
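For readers unfamiliar with the statistic, the sketch below computes Tajima's D from the number of sequences, the number of segregating sites and the mean number of pairwise differences, following the constants given by Tajima (1989). It is a minimal illustration under those definitions, not the Arlequin implementation, and it does not reproduce the simulation-based significance test; the function name and arguments are hypothetical.

```python
from math import sqrt


def tajimas_d(n, s, k):
    """Tajima's D.

    n: number of sequences (at least 4, otherwise the variance term is zero)
    s: number of segregating sites in the fragment
    k: mean number of pairwise differences in the fragment (not per site)
    """
    if n < 4 or s < 1:
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Numerator: difference between the two estimates of theta (k and S/a1).
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical example: 4 sequences, 2 segregating sites, k = 7/6.
print(tajimas_d(4, 2, 7 / 6))
```

A positive value indicates an excess of intermediate-frequency variants relative to the neutral expectation, a negative value an excess of rare variants.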

Levels of differentiation (FST) between Corsican and Aquitaine populations were estimated for all the studied genes with the analysis of molecular variance (AMOVA) (Excoffier et al., 1992) as implemented in the Arlequin 2.000 software (Schneider et al., 2000). In addition, differentiation among all the studied populations was estimated for CesA3.

Considering first the small number of sequences analysed for most of the genes, and second the existence of significant differentiation for some of them, linkage disequilibrium (LD) was only computed for CesA3, for which a larger sample size was available. Given the absence of significant differentiation for this gene, LD between polymorphic sites was estimated using the whole set of sequences with DnaSP. Fisher's exact tests and Bonferroni correction for multiple tests were computed to determine whether the detected associations were significant or not.

Total divergence between P. pinaster and P. radiata, estimated as the average number of nucleotide substitutions per site, was finally calculated using DnaSP.
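The divergence measure described above can be sketched as the mean number of differences per site over all between-species sequence pairs. The Python example below is a simplified illustration that assumes aligned, gap-free sequences and applies no multiple-hit correction; whether such a correction was applied in the study is not stated, and the function name and toy data are hypothetical.

```python
def dxy(seqs_x, seqs_y):
    """Average number of nucleotide differences per site between two samples."""
    length = len(seqs_x[0])
    total = sum(
        sum(a != b for a, b in zip(sx, sy))
        for sx in seqs_x
        for sy in seqs_y
    )
    return total / (len(seqs_x) * len(seqs_y) * length)


# Hypothetical toy data: two sequences from species X, one from species Y.
species_x = ["ACGT", "ACGA"]
species_y = ["TCGT"]
print(dxy(species_x, species_y))
```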

Results

Nucleotide variation at the intraspecific level

Sequence data for almost the complete set of gametes were obtained for six out of the eight genes analysed. For Pp1 and Pp2, only 12 and 14 high-quality sequences, respectively, were obtained in P. radiata, probably as a result of the coamplification of other family members. Sequences were deposited in dbSTS (http://www.ncbi.nlm.nih.gov/dbSTS/) and SNPs in dbSNP (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=snp).

The regions analysed covered a total of 4.7 kb, corresponding to 3.8 kb of coding sequence and 0.9 kb of noncoding regions (intron and 3′ UTR) (Table 2). A total of 32 (29 SNPs and three INDELs) and 13 (exclusively SNPs) intraspecific polymorphisms were detected in P. pinaster and P. radiata, respectively. All the INDELs were single-base and located in noncoding regions. A total of 10 singletons were identified (seven in P. pinaster and three in P. radiata). All the nonsynonymous polymorphisms were conservative or moderately conservative according to the classification of Grantham (1974).

The average nucleotide diversity was slightly higher for P. pinaster (0.00241) than for P. radiata (0.00186). This difference mainly relied on Pp1 and CesA3, for which 11 and nine polymorphic sites were detected in P. pinaster, whereas only one and two polymorphic sites, respectively, were detected in P. radiata. Although the numbers of sequences analysed were smaller for P. radiata for these two genes, π will not be better estimated with a sample of sequences above 10 as its variance levels off very quickly (Tajima, 1983), thus the divergence of the estimates probably does not result from these unequal sample sizes. Apart from these two genes, close correspondence between the nucleotide diversity estimates in the two species was observed.

The average number of haplotypes (3.375 in P. pinaster vs 2.375 in P. radiata) and the average haplotype diversity (0.425 in P. pinaster vs 0.376 in P. radiata) were, like the total nucleotide diversity, slightly higher in P. pinaster. Large variations in haplotype number and haplotype diversity were observed among the genes in both species. The number of haplotypes varied from one to six. With the exception of CesA3 and Pp1, the numbers of haplotypes were consistent between species.
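Haplotype number and haplotype diversity can be obtained directly from haploid sequences. The sketch below uses Nei's (1987) unbiased estimator, H = n/(n − 1) × (1 − Σ p_i²); it is assumed here, not confirmed by the text, that this is the estimator reported by DnaSP. The function name and the haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are required")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


# Hypothetical sample: four gametes carrying three distinct haplotypes.
sample = ["h1", "h1", "h2", "h3"]
print(len(set(sample)), haplotype_diversity(sample))
```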

Neutrality tests

Tajima's D-tests were performed exclusively for the genes presenting at least five polymorphic sites (Table 4). Significant departure from the null hypotheses of neutrality and demographic equilibrium at P < 0.05 was observed only for KORRIGAN in P. radiata. For all the genes, these tests were performed on the whole set of sequences available.

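Table 4.  Pattern of nucleotide variation

Pinus pinaster

| Parameter | KORRIGAN | CesA3 | CesA4 | Pp1 | Pp2 | Pp4 | Pp6 | Pr1 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Number of sequences | 24 | 91 | 23 | 24 | 22 | 24 | 24 | 24 | 256 |
| INDEL | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 3 |
| SNP, total: S^a | 5 | 9 | 1 | 9 | 2 | 1 | 0 | 2 | 29 |
| SNP, total: singletons | 0 | 1 | 1 | 3 | 1 | 0 | 0 | 1 | 7 |
| SNP, total: π | 0.00176 | 0.00260 | 0.00019 | 0.00696 | 0.00121 | 0.00083 | 0 | 0.00573 | 0.00241 |
| SNP, total: θw | 0.00173 | 0.00177 | 0.00058 | 0.00515 | 0.00116 | 0.00107 | 0 | 0.0056 | 0.00213 |
| SNP, total: Tajima's D | 0.63089 | 1.12147 | | 1.20137 | | | | | |
| Noncoding: S | 3 | 4 | | 7 | | 1 | | | 15 |
| Noncoding: π | 0.00132 | 0.00102 | | 0.0066 | | 0.00083 | | | 0.00244 |
| Noncoding: θw | 0.00104 | 0.00082 | | 0.004 | | 0.00107 | | | 0.00173 |
| Coding, total: S | 2 | 5 | 1 | 2 | 2 | 0 | 0 | | 12 |
| Coding, total: π | 0.00045 | 0.00084 | 0.00019 | 0.00036 | 0.00121 | 0 | 0 | | 0.00043 |
| Coding, total: θw | 0.00069 | 0.00102 | 0.00058 | 0.00114 | 0.0116 | 0 | 0 | | 0.00214 |
| Coding, synonymous: S | 2 | 3 | 1 | 2 | 1 | 0 | 0 | | 9 |
| Coding, synonymous: π | 0.00045 | 0.00078 | 0.00019 | 0.00036 | 0.00018 | 0 | 0 | | 0.00028 |
| Coding, synonymous: θw | 0.00035 | 0.00061 | 0.00058 | 0.00114 | 0.00058 | 0 | 0 | | 0.00046 |
| Coding, nonsynonymous: S | 0 | 2 | 0 | 0 | 1 | 0 | 0 | | 3 |
| Coding, nonsynonymous: π | 0 | 0.00006 | 0 | 0 | 0.00103 | 0 | 0 | | 0.00015 |
| Coding, nonsynonymous: θw | 0 | 0.00041 | 0 | 0 | 0.00058 | 0 | 0 | | 0.00014 |
| Number of haplotypes | 5 | 6 | 2 | 6 | 3 | 2 | 1 | 3 | 3.375 |
| Haplotype diversity | 0.809 | 0.607 | 0.091 | 0.656 | 0.511 | 0.268 | 0 | 0.537 | 0.425 |
| (SE) | (0.057) | (0.059) | (0.081) | (0.079) | (0.091) | (0.113) | | (0.052) | |

Pinus radiata

| Parameter | KORRIGAN | CesA3 | CesA4 | Pp1 | Pp2 | Pp4 | Pp6 | Pr1 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Number of sequences | 18 | 21 | 20 | 12 | 14 | 23 | 23 | 21 | 153 |
| INDEL | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| SNP, total: S^a | 5 | 2 | 1 | 1 | 2 | 0 | 0 | 1 | 13 |
| SNP, total: singletons | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 3 |
| SNP, total: π | 0.00198 | 0.00048 | 0.00101 | 0.00114 | 0.00374 | 0 | 0 | 0.00645 | 0.00186 |
| SNP, total: θw | 0.00291 | 0.00100 | 0.00064 | 0.00089 | 0.00359 | 0 | 0 | 0.00588 | 0.00191 |
| SNP, total: Tajima's D | −1.97^b | | | | | | | | |
| Noncoding: S | 3 | 1 | | 1 | | 0 | | | 5 |
| Noncoding: π | 0.00139 | 0.00026 | | 0.00114 | | 0 | | | 0.00069 |
| Noncoding: θw | 0.00175 | 0.0005 | | 0.00089 | | 0 | | | 0.00078 |
| Coding, total: S | 2 | 1 | 1 | 0 | 2 | 0 | 0 | | 6 |
| Coding, total: π | 0.00058 | 0.00022 | 0.00101 | 0 | 0.00384 | 0 | 0 | | 0.00081 |
| Coding, total: θw | 0.00117 | 0.0005 | 0.00064 | 0 | 0.04 | 0 | 0 | | 0.00604 |
| Coding, synonymous: S | 1 | 0 | 1 | 0 | 2 | 0 | 0 | | 4 |
| Coding, synonymous: π | 0.00043 | 0 | 0.00101 | 0 | 0.00384 | 0 | 0 | | 0.00075 |
| Coding, synonymous: θw | 0.00058 | 0 | 0.00064 | 0 | 0.04 | 0 | 0 | | 0.00588 |
| Coding, nonsynonymous: S | 1 | 1 | 0 | 0 | 0 | 0 | 0 | | 2 |
| Coding, nonsynonymous: π | 0.00016 | 0.00022 | 0 | 0 | 0 | 0 | 0 | | 0.00005 |
| Coding, nonsynonymous: θw | 0.00058 | 0.0005 | 0 | 0 | 0 | 0 | 0 | | 0.00015 |
| Number of haplotypes | 4 | 2 | 2 | 2 | 3 | 1 | 1 | 4 | 2.375 |
| Haplotype diversity | 0.673 | 0.327 | 0.456 | 0.556 | 0.604 | 0 | 0 | 0.339 | 0.376 |
| (SE) | (0.123) | (0.153) | (0.085) | (0.09) | (0.076) | | | (0.138) | |

  • a  Number of SNPs.
  • b  Significant Tajima's D-value (P < 0.05).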

For CesA3, Tajima's D-test was first performed in the Aquitaine provenance exclusively, and then, according to the nonsignificant level of differentiation observed for this gene (see the next section, ‘Populations differentiation in P. pinaster’), performed also considering all the sequences available. Both calculations yielded the same result, that is, a positive but nonsignificant Tajima's D-value (D = 1.12147, P = 0.117 for the whole area of distribution and D = 1.03263, P = 0.131 for the Aquitaine provenance). Estimation of the local recombination parameter R (Hudson, 1987) for this gene, and its subsequent integration in the calculation of the expected distribution of Tajima's D-value using coalescent simulation in DnaSP, did not change its significance.

Populations differentiation in P. pinaster

FST estimated using all the polymorphic sites revealed a significant differentiation between Corsican and Aquitaine provenances (FST = 0.22). However, this high level of differentiation relied exclusively on two genes (KORRIGAN and Pp1), for which highly significant differentiation was observed (0.45 and 0.23, respectively; Table 5). In comparison, Mariette et al. (2001) observed significant GST values of 0.049 and 0.092 for amplified fragment length polymorphism (AFLP) and simple sequence repeat (SSR) markers, respectively. Accounting for the difference in estimation methods, under which FST values are equivalent to twice the GST values (Nei, 1987), the differentiation between both groups of populations for both Pp1 and KORRIGAN is more than twice as high as that obtained for AFLP markers (0.098). Compared with SSR (0.184), although the differentiation values observed for both genes remain higher, the Pp1 value is only slightly higher, whereas that of KORRIGAN is still twice as high.
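The comparison made in this paragraph is simple arithmetic, which the following Python sketch reproduces. It takes the doubling of GST stated in the text as given (an assumption of the paragraph, not a general identity) and uses the gene values reported in Table 5; the function and variable names are hypothetical.

```python
def gst_to_fst(gst):
    """Rescale GST to the FST scale, assuming FST = 2 x GST as stated in the text."""
    return 2.0 * gst


# Values quoted in the text and in Table 5.
fst_genes = {"KORRIGAN": 0.45267, "Pp1": 0.23280}
gst_markers = {"AFLP": 0.049, "SSR": 0.092}

# Ratio of each gene's FST to the rescaled marker value.
ratios = {
    (gene, marker): fst / gst_to_fst(gst)
    for gene, fst in fst_genes.items()
    for marker, gst in gst_markers.items()
}
for (gene, marker), ratio in sorted(ratios.items()):
    print(f"{gene} vs {marker}: {ratio:.2f}x")
```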

Table 5. FST estimates between Corsican and Aquitaine populations

| Gene | FST |
|---|---|
| CesA3 | −0.05482 |
| KORRIGAN | 0.45267^a |
| Pp1 | 0.23280^a |
| Pp2 | −0.14549 |
| Pp4 | 0.14286 |
| Pr1 | −0.05504 |
| All (31 polymorphic sites) | 0.22395^a |
| GST AFLP–SSR (Mariette et al. 2001) | 0.049–0.092 |

  • a  Significant test.

In addition, a wider sampling for CesA3 allowed us to test the differentiation between the 13 populations. No significant differentiation was observed and the estimated value is very close to zero. This result deviates from the significant differentiation observed with neutral markers at the level of the whole geographic distribution of maritime pine (Petit et al., 1995).

Although the FST estimates for CesA4, Pp2, Pp4 and Pr1 probably depend on the very few polymorphic sites (from one to four) detected for these genes, the differentiation estimated for KORRIGAN, CesA3 and Pp1, for which at least five polymorphic sites were analysed, is certainly more representative of the gene values.

Linkage disequilibrium

Linkage disequilibrium was only calculated for CesA3, as the other genes presented either strong population differentiation combined with a small sample size per population or a low level of polymorphism. Out of the 36 pairwise tests performed (nine polymorphic sites), 11 were significant after Bonferroni's correction for multiple testing.
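The number of tests follows from the number of site pairs: nine polymorphic sites give 9 × 8 / 2 = 36 comparisons. The Python sketch below shows this count, the corresponding Bonferroni threshold for a nominal level of 0.05 (the nominal level is an assumption, as the text does not state it), and a two-sided Fisher's exact test on a 2 × 2 table of haplotype counts. It is an illustration only, not the DnaSP procedure, and the function names and example table are hypothetical.

```python
from math import comb


def pairwise_tests(n_sites):
    """Number of distinct pairs among n_sites polymorphic sites."""
    return comb(n_sites, 2)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        prob(x) for x in range(low, high + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )


n_tests = pairwise_tests(9)
threshold = bonferroni_threshold(0.05, n_tests)
# Hypothetical counts of the four two-site haplotypes in six gametes.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(n_tests, threshold, p_value, p_value < threshold)
```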

Nucleotide variation at the interspecific level

Sixty-three polymorphisms including 59 SNPs and four INDELs distinguished P. pinaster from P. radiata (Table 6). All INDELs were located in the noncoding region. The total number of interspecific fixed differences varied from 0 for Pr1 to 21 for CesA3 (Table 6). NS fixed differences were found for five genes (KORRIGAN, CesA3, Pp1, Pp2 and Pp4). Three of these NS fixed differences were moderately radical regarding the amino acid modification (Grantham, 1974): two sites in Pp2 (modifications SER to ARG and GLY to ARG) and one site in Pp4 (VAL to SER).

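Table 6.  Fixed differences between Pinus pinaster and Pinus radiata and estimates of total divergence D (x,y)

| Gene ID | Number of INDELs | SNP: noncoding | SNP: coding synonymous | SNP: coding nonsynonymous | D (x,y) |
|---|---|---|---|---|---|
| KORRIGAN | 3 | 2 | 3 | 1 | 0.00686 |
| CesA3 | 0 | 8 | 8 | 5 | 0.02098 |
| CesA4 | 1 | 1 | 3 | 0 | 0.0083 |
| Pp1 | 0 | 6 | 3 | 2 | 0.0234 |
| Pp2 | 0 | 0 | 6 | 6 | 0.02444 |
| Pp4 | 0 | 0 | 2 | 2 | 0.0155 |
| Pp6 | 0 | 2 | 0 | 0 | 0.00228 |
| Pr1 | 0 | 0 | 0 | 0 | 0 |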

Under neutral evolution, interspecific divergence is expected to be proportional to intraspecific nucleotide diversity. Comparison of divergence and nucleotide diversity revealed that only Pr1 diverged from this pattern. However, due to the small size of the fragment analysed (113 bp), no particular hypothesis could be proposed. A wider exploration of the diversity of this gene will be required before any conclusion can be drawn.

Discussion

Adequacy between the sampling strategy and SNP detection probability

The probability P of detecting the two alleles at a SNP locus depends on three parameters: (i) the number of gametes sampled, N; (ii) the frequency of the rare allele in the population, p; and (iii) the organization of gene diversity. In the absence of differentiation among populations, $P = 1 - (1 - p)^N$. In the present study, for each species, on average 21 gametes were sequenced for each DNA fragment, resulting in a detection probability of 89% for a rare allele frequency of 10% ($1 - 0.9^{21} \approx 0.89$).
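The detection probability above is easy to evaluate for other sample sizes and allele frequencies. The following Python sketch implements the formula as given, under the same assumption of no differentiation among populations; the helper that inverts the formula to obtain a minimum sample size is an addition for illustration, and both function names are hypothetical.

```python
from math import ceil, log


def detection_probability(p, n):
    """Probability of sampling the rare allele (frequency p) at least once in n gametes."""
    return 1.0 - (1.0 - p) ** n


def min_sample_size(p, target):
    """Smallest n for which detection_probability(p, n) reaches the target."""
    return ceil(log(1.0 - target) / log(1.0 - p))


# Values from the text: 21 gametes on average, rare allele frequency of 10%.
print(round(detection_probability(0.10, 21), 2))
print(min_sample_size(0.10, 0.95))
```

With the study's average of 21 gametes, alleles rarer than 10% are correspondingly more likely to be missed.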

With respect to P. pinaster, the probability of detecting polymorphic loci was probably maximized considering: (i) the scattered sample used in our study covering the three main groups of diversity; (ii) the moderate level of genetic differentiation at the neutral level between geographical provenances (GST = 0.14–0.17; Petit et al. 1995) for isozymes, proteins and terpenes, with populations from France, Portugal, Corsica, Spain, Italy, Sardinia; and (iii) the rather low differentiation within provenances (GST = 0.04 for isozymes, cpSSR, nuclear SSR and AFLP markers, within Spain, Portugal, Aquitaine and Corsica: Mariette et al., 2001; González-Martínez et al., 2002; Ribeiro et al., 2002). However, it is important to note that the North African group, which constitutes a singular mitochondrial lineage with highly differentiated populations, was under-represented in this study, which probably led to an underestimation of the nucleotide diversity of some of the genes.

Concerning P. radiata, as reported in the Materials and methods section, the sample used in this study corresponds to the first generation of the New Zealand breeding population, which derived from the Año Nuevo population with some admixture from the Monterey population. Johnson and Lipow (2002) showed that first-generation seed orchards retain most of the genetic diversity present in the natural populations from which they were derived. As a consequence, the results obtained for P. radiata should reflect the nucleotide diversity present in its ancestral populations. Indeed, using nuclear and chloroplast microsatellite loci, no significant changes in diversity were found between the five natural populations of P. radiata and the current New Zealand breeding populations (T. Richardson, Forest Research, New Zealand, pers. comm.). It is, however, important to note that, given the criteria used to select the first-generation breeding population (i.e. growth and form), some of the genes controlling these traits could have been subjected to artificial selection events leading to a reduction in their diversity.

Nucleotide diversity in wood formation related genes

Polymorphic sites were found in almost all the genes analysed, providing the basis to initiate association studies to test the involvement of these genes in the variability of the traits of interest. The availability of haploid tissue enabled the definition of the different haplotypes, allowing a reduction of the polymorphic sites to be genotyped. For instance in P. pinaster, only 21 markers (SNPs and INDELs) will have to be genotyped to define the haplotypic composition, instead of the 32 polymorphic sites discovered. This subset of SNP tags was defined exclusively based on the haplotypes observed in the studied sample. Although linkage disequilibrium analysis would allow a reduction of this SNP tag set, such analysis was not performed, given the high differentiation observed for some of the analysed genes and the small sample size of each population analysed.

In spite of the exploratory nature of this study, limited to a restricted set of genes, it is interesting to note that the results obtained here agree with previous nucleotide surveys in conifers (Table 7). Although comparative diversity analyses using allozymes have shown that conifers are among the most genetically diverse organisms (Hamrick & Godt, 1990), nucleotide data do not support this statement. Indeed, the nucleotide diversity of conifers is higher than in humans but lower than in Zea mays. Interestingly, the nucleotide diversity levels reported in broadleaved trees such as Populus or Quercus are also significantly higher than in conifers (Table 7); the reasons for this divergence remain to be found.

Table 7.  Estimates of nucleotide diversity in different species

| Species | Number of loci | Number of genotypes | Length (bp) | Coverage of the natural distribution | Total nucleotide diversity (π) | Reference |
|---|---|---|---|---|---|---|
| Pinus pinaster | 10 | 22–91 | 4 746 | yes | 0.00241 | this study |
| Pinus radiata | 10 | 12–24 | 4 746 | no | 0.00186 | this study |
| Pinus taeda | 19 | 32 | 17 580 | yes | 0.00395 | Brown et al. (2004) |
| Pinus taeda | 28 | NA | NA | NA | 0.00489 (θw) | Neale & Savolainen (2004) |
| Pinus taeda | 18 | 32 | 10 116 | yes | 0.00533 | S. C. González-Martínez, CIFOR-INIA, Madrid, pers. comm. |
| Pinus sylvestris | 2 | 12–15 | 4 136 | yes | 0.0007 | García-Gil et al. (2003) |
| Pinus sylvestris | 1 | 20 | 2 045 | yes | 0.0014 | Dvornyk et al. (2002) |
| Cryptomeria japonica | 7 | 48 | 10 158 | yes | 0.00252 | Kado et al. (2003) |
| Pseudotsuga menziesii | 12 | NA | NA | NA | 0.00853 | Neale & Savolainen (2004) |
| Populus tremula | 5 | 24 | 6 188 | no | 0.0111 | Ingvarsson (2005) |
| Quercus petraea | 7 | 27 | 3 083 | yes | 0.00722 | J. Derory, INRA Pierroton, pers. comm. |
| Glycine max L. Merr. | 142 | 25 | 76 000 | no, restricted to ancestors of North American cultivars | 0.00125 | Zhu et al. (2003) |
| Arabidopsis thaliana | 9 | 20 | | yes | 0.0067 | reviewed in Aguadé (2001) |
| Beta vulgaris | 37 | 2 | 18 002 | no | 0.0076 | Schneider et al. (2001) |
| Zea mays | 18 | 36 | 6 935 | no, restricted to US elite maize breeding pool | 0.0063 | Ching et al. (2002) |
| Zea mays | 6 | 12–25 | NA | yes | 0.00871 | reviewed in White & Doebley (1999) |

NA, data not available.

Lower diversity in P. radiata: consequence of neutral processes or of genes controlling traits subjected to selection

A trend towards lower nucleotide diversity was observed for P. radiata compared with P. pinaster. This result is consistent with what is known about the populations analysed. Whereas P. pinaster has a large geographic distribution, the natural range of P. radiata is extremely small. In addition, the populations analysed in this study covered different proportions of the distribution of each species: for P. pinaster, almost the whole geographic distribution was sampled, whereas for P. radiata, only a subset of the total variation was analysed. As a consequence, the lower nucleotide diversity observed for P. radiata agrees with its smaller effective population size compared with P. pinaster.
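The link between diversity and effective size can be made explicit with a rough worked example. Under the standard neutral model for a diploid population at mutation-drift equilibrium, expected nucleotide diversity is proportional to the effective population size N_e and to the per-site mutation rate μ. The ratio below uses the total π values of Table 7 and assumes, purely for illustration, equal mutation rates in the two species and comparable sampling, which is not the case here because only part of the P. radiata variation was analysed.

```latex
E[\pi] = \theta = 4 N_e \mu
\qquad\Longrightarrow\qquad
\frac{N_e^{\text{radiata}}}{N_e^{\text{pinaster}}}
  \approx \frac{\pi_{\text{radiata}}}{\pi_{\text{pinaster}}}
  = \frac{0.00186}{0.00241} \approx 0.77
```

Under those assumptions the ratio would correspond to an effective size in P. radiata of roughly three-quarters of that of P. pinaster. This is an order-of-magnitude reading only, not an estimate made in this study.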

Although a lower diversity is expected in P. radiata under a neutral model of evolution, two genes (Pp1 and CesA3) presented an abnormally strong reduction of diversity in this species. A plausible hypothesis is that the smaller effective population size acted together with natural and/or artificial selection on these genes. Such a scenario would have led to the elimination of some alleles, resulting in an unusually low diversity level. Several concordant results in P. pinaster support this hypothesis.

For Pp1 in P. pinaster, a higher differentiation than at the neutral level was observed. Such a differentiation pattern would be consistent with a ‘diversifying selection’ acting at this locus in P. pinaster. Evidence of selection at the molecular level for this gene would be consistent with its physiological role: Pp1 is a glycine-rich protein (GRP) that has been shown to be differentially expressed between differentiating xylem associated with different types of wood characterized by different physical and chemical properties; in other words, early vs late wood (Le Provost et al., 2003) and opposite vs compression wood (Allona et al., 1998; Zhang et al., 2000; Le Provost et al., 2003). Cell wall GRPs are localized in vascular tissues and are thought to provide elasticity as well as tensile strength during vascular development (Cassab, 1998). Polymorphisms inducing variation of these properties would definitely affect the adaptation of the tree to its environmental conditions and thus be preferentially fixed in certain conditions. In the case of P. radiata, according to the negative genetic correlations often reported between growth and wood quality in conifers (Rozenberg & Cahalan, 1997; Pot et al., 2002), the reduction of diversity observed may have resulted from the artificial selection on growth applied to the New Zealand land race.

The absence of differentiation observed for CesA3 in P. pinaster compared with the significant level observed for neutral markers (Petit et al., 1995) provides a strong indication of balancing selection acting on this gene. The positive Tajima's D-values reported for this gene for the whole area of distribution and for the Aquitaine provenance tend to confirm this hypothesis. Indeed, such values would not be expected in the case of no differentiation. Furthermore, the relatively strong haplotype structure observed for this gene (high haplotype diversity, low number of haplotypes compared with the number of polymorphic sites, high level of linkage disequilibrium) also points towards the action of balancing selection. These hypotheses of possible deviations from neutrality for CesA3 are consistent with its role in cellulose biosynthesis. Cellulose is one of the major components of the cell wall. In temperate zones, climatic variation during the annual course of the vascular cambium gives rise to early wood, formed early during the growing season, and late wood, formed in late summer. This environmental pressure could strongly affect the major change in cellulose content recognized between these two types of wood.

As in the case of Pp1, the reduction of diversity observed for CesA3 in P. radiata would be consistent with the involvement of this gene in the genetic determinism of wood quality, a trait negatively correlated with growth.

KORRIGAN, a gene involved in polysaccharide biosynthesis, as a putative target of natural selection

Several results that include a high differentiation between Corsican and Aquitaine populations in P. pinaster and a significant negative Tajima's D-value in P. radiata suggest KORRIGAN as a potential target of selection in these species.

The high differentiation observed in P. pinaster is consistent with the existence of diversifying selection that would have led to the prevalence of different haplotypes, as a consequence of their role in local adaptation to the particular environmental conditions encountered.
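One common way to quantify differentiation between populations from haplotype frequencies is Nei's G_ST, which may differ from the estimator used in this study. The sketch below is a minimal, uncorrected version with equal population weights; the function names and the haplotype labels are hypothetical and serve only to show the two extreme cases.

```python
from collections import Counter

def gene_diversity(haps):
    """Expected heterozygosity 1 - sum(p_i^2) from a list of haplotype labels."""
    n = len(haps)
    return 1.0 - sum((c / n) ** 2 for c in Counter(haps).values())

def gst(populations):
    """Nei's G_ST = (H_T - H_S) / H_T, equal weights, no sample-size correction."""
    k = len(populations)
    # H_S: mean within-population diversity
    hs = sum(gene_diversity(p) for p in populations) / k
    # H_T: diversity computed from haplotype frequencies averaged over populations
    labels = {h for p in populations for h in p}
    mean_freq = [sum(p.count(h) / len(p) for p in populations) / k for h in labels]
    ht = 1.0 - sum(f * f for f in mean_freq)
    return (ht - hs) / ht if ht > 0 else 0.0

# Hypothetical samples: two populations fixed for different haplotypes,
# and two populations with identical haplotype frequencies.
fixed_differences = gst([["H1", "H1", "H1"], ["H2", "H2", "H2"]])
no_differentiation = gst([["H1", "H2"], ["H1", "H2"]])
print(fixed_differences, no_differentiation)
```

Values near 1 correspond to populations dominated by different haplotypes, the pattern discussed above for KORRIGAN; values near 0 correspond to the absence of differentiation discussed for CesA3.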

In P. radiata, the significant negative Tajima's D-value may result from a selection event on this gene, possibly a recent one, which would be consistent with the relatively strong haplotype structure (only four haplotypes for 18 sequences and five polymorphic sites). Thus the excess of low-frequency polymorphisms would be consistent with a hitchhiking event in the P. radiata population. An alternative neutral hypothesis would be the recent expansion of the New Zealand breeding population.
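The sign of Tajima's D follows from the comparison of two estimators of θ: the mean number of pairwise differences and the number of segregating sites scaled by the sample size. The sketch below implements the standard statistic (Tajima, 1989) for aligned sequences of equal length. The function name and the toy sequences are hypothetical and are not the P. radiata data; they only show that an excess of singletons gives a negative value and that variants at intermediate frequency give a positive one.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for a list of aligned sequences; None if no site segregates."""
    n = len(seqs)
    # S: number of segregating (polymorphic) sites
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if s == 0:
        return None
    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Hypothetical samples of six sequences with four segregating sites each.
singletons = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT", "AAAA"]    # rare variants only
intermediate = ["AAAA", "AAAA", "AAAA", "TTTT", "TTTT", "TTTT"]  # two balanced haplotypes
d_rare = tajimas_d(singletons)
d_balanced = tajimas_d(intermediate)
print(d_rare < 0, d_balanced > 0)  # True True
```

The first case mirrors the pattern expected after a selective sweep or a population expansion, the second the pattern expected under balancing selection or population structure.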

The role of KORRIGAN is consistent with deviation from neutrality. Indeed, KORRIGAN is involved in the biosynthesis of cellulose, the main component of the cell wall, whose amount is genetically controlled (Zobel & Buijtenen, 1989; Pot et al., 2002; Sewell et al., 2002) and which provides strength and flexibility to plant tissue. It encodes a β-1,4 endoglucanase, which catalyses the cleavage of the cellodextrin from the sitosterol cellodextrin (Nicol et al., 1998; Peng et al., 2002) before the proper synthesis of the cellulose microfibrils by the cellulose synthase complex. Its importance in this pathway has already been underlined. It is indeed strongly differentially expressed between early and late wood, being overexpressed in late wood, which is characterized by a higher proportion of cellulose (accession AL750476 in Le Provost, 2003).

Recent studies tend to confirm the central role of this gene in the genetic variability of cell wall composition. Indeed, significant relationships between KORRIGAN polymorphisms and polysaccharide content were detected (coincident with QTLs in a three-generation outbred pedigree; Pot, 2004). A significant association was also observed in the P. pinaster first-generation breeding population between one KORRIGAN SNP and cellulose content (P. Garnier-Géré, pers. comm.). These observations reveal the potential importance of this gene in the variability of polysaccharide content, a trait that may be subjected to natural selection pressures.

Conclusion and perspectives

This exploratory study allowed the identification of polymorphisms in eight wood formation related genes in P. pinaster and P. radiata. This information is currently used in association studies to test their involvement in the phenotypic variability of economically important traits linked to wood structure and chemical composition in these two species.

The analysis of the patterns of nucleotide diversity obtained at the intra- and interspecific levels provided some indications of adaptive evolution at the molecular level for KORRIGAN, Pp1 and CesA3. These interpretations are consistent with the demonstrated physiological role of these genes, and with recent data obtained in QTL mapping experiments and association studies.

Acknowledgements

We gratefully acknowledge the financial support from the French Ministry of Foreign Affairs, who granted a collaborative project «Déterminisme génétique et moléculaire de la qualité du bois chez les conifères» between INRA (Cestas, France) and Forest Research (Rotorua, New Zealand), the European Union (GEMINI: QLRT-1999-00942, FEDER:2003227) and the Aquitaine Region (2004-03-05-003FA). We thank Drs Valérie Lecorre, Philippe Rozenberg, Santiago C. González-Martínez and Antoine Kremer for their comments on earlier versions of the manuscript.
