Identification of genes controlling quantitative trait variation is one of the great challenges of the post-genomic era. This knowledge is important not only for biomedicine but also for agriculture. In this latter field, such information would provide a way to manage and use the genetic variability in breeding and gene conservation programmes. The availability of markers linked to economically and ecologically relevant traits would be of particular interest in long-lived forest tree species such as conifers. Such tools would enhance the efficiency of artificial selection by reducing the duration of breeding cycles and increasing the genetic gain in each cycle. They would also provide criteria to manage functional genetic diversity, which is key to preserving adaptability of forest trees to their changing environment. Traditional forest tree breeding programmes have provided the forest industry with improved genotypes for wood production (e.g. 30% realized genetic gain for the volume of the bole in Pinus pinaster; Alazard & Raffin, 2003). The introduction of wood quality selection criteria is now considered an important objective to ensure the sustainability of the wood market through the availability of raw material well suited to end-use products (Pot et al., 2002).
Wood property quantitative trait loci (QTLs) have been identified in many forest tree species, attesting to the existence of major gene effects controlling part of the variation of wood and its end-use properties (Bradshaw & Stettler, 1995; Grattapaglia et al., 1996; Kumar et al., 2000; Lerceteau et al., 2000; Arcade et al., 2002; Moran et al., 2002; Neale et al., 2002; Brown et al., 2003; Markussen et al., 2003). Co-localizations of QTLs and candidate genes (Moran et al., 2002; Brown et al., 2003; Chagné et al., 2003) have also been reported. However, given the large confidence intervals generally associated with QTLs (Mangin et al., 1994), these findings did not permit their validation. Complex trait dissection allowing the identification of individual genes is currently underway through association studies in humans and model animals (e.g. Drosophila). Recently, Thornsberry et al. (2001) have successfully transferred this approach to plants. In theory, association studies should be performed at a whole-genome level (also known as a genome scan); however, due to the specific features of conifers (short-distance linkage disequilibrium (Brown et al., 2004; Neale & Savolainen, 2004) and extremely large genome size (Wakamiya et al., 1993)), a candidate-gene approach is the only possible way to understand the molecular basis underlying quantitative variation in these species.
Wood formation includes four major steps: cell division, cell expansion, secondary cell wall formation and cell death. These steps involve expression of a number of structural genes, coordinated by transcription factors, mainly involved in the biosynthesis of polysaccharides (cellulose: 40–50% of dry wood; hemi-cellulose: 25%; and pectins), lignins (25–35%), and cell wall proteins. A number of genes that determine cell wall composition and cell shape have been identified by classical biochemical analysis (e.g. lignification genes, reviewed in Whetten et al., 1998), and more recently by the application of the genomic tools such as gene or protein expression profiling (Plomion et al., 2000; Hertzberg et al., 2001; Le Provost et al., 2003; Gion et al., 2005) and the screening of large collections of Arabidopsis thaliana mutants (Fagard et al., 2000; Mouille et al., 2003).
Several studies have shown that wood structure and composition are influenced by environmental changes (Liphschitz & Waisel, 1970; Barber et al., 2000). The extent of these modifications has also been shown to be genetically regulated (Rozenberg et al., 2002), suggesting a potential functional role of xylogenesis-related genes in forest tree adaptation (Costa et al., 1998; Riccardi et al., 1998). In this context, it is possible that the nucleotide diversity of these genes and their homologs in pine is involved in the genetic variation of wood properties and, as such, may be subject to natural selection pressures in pine species.
For this study, eight candidate genes were selected based on their likely involvement in the determination of wood properties. Three were homologous to Arabidopsis thaliana cell wall mutant genes specifically involved in the cellulose and hemicellulose biosynthesis (a membrane-bound endo-1,4-beta-glucanase, KORRIGAN, and two cellulose synthases, CESA3 and CESA4). Five expressional candidate genes were also analysed. These genes have been identified through differential expression studies between different types of wood characterized by distinct chemical composition and structure (reviewed in Plomion et al., 2001). Pp2 (MYB-like transcriptional factor), Pp4 (ACC oxidase) and Pp6 (25S ribosomal gene) have been identified as being up-regulated in early wood, whereas Pp1 (glycine-rich protein homolog) was found to be up-regulated in late wood-forming tissue (Le Provost et al., 2003). Pr1 (unknown function protein) was isolated from wood forming tissue in P. radiata (S. Cato, unpublished data).
In the present study, nucleotide variation of these eight genes was analysed within and between two pine species: Pinus pinaster Ait. and Pinus radiata D. Don, both of which are economically and ecologically important. Both species are currently the target of conservation efforts, and the accurate determination of their genetic structure at the functional level would help refine conservation strategies.
P. pinaster has a highly fragmented distribution over 4 Mha in the Mediterranean basin. This natural range includes highly variable climatic conditions, from more than 1000 mm rainfall in Tova (Corsica) to less than 100 mm in Oria (Spain), and soil structure that varies from sandy dunes to shallow rocky soils. The genetic structure of the species has been described using several sets of markers (reviewed in Burban & Petit, 2003) and reveals 18 geographically structured races belonging to three major groups: an Atlantic group, comprising populations from western France and the greater part of Spain and Portugal; a Mediterranean group, consisting of all eastern European populations, and including eastern Spanish populations up to Andalucía and the small stand of Punta Cires in Morocco; and a North African group comprising all the other African populations. Because of the fragmentation of its natural range, maritime pine exhibits a relatively high genetic differentiation among populations at nuclear markers in comparison to other conifer species. A high level of genetic differentiation was also observed for survival, adaptation to different climatic conditions, growth and phenology, resistance to insects and drought tolerance (reviewed in González-Martínez et al., 2002).
P. radiata grows naturally in five locations: Año Nuevo, Monterey and Cambria on the Californian mainland coast and Guadalupe and Cedros islands off the coast of Baja California. These five locations differ substantially from each other with respect to soil, elevation, temperature, rainfall and ecosystem associates. At the genetic level, significant differentiation was observed between the different populations (ranging between 0.119 and 0.26, depending on the type of markers and the populations considered; Moran et al., 1988; Wu et al., 1999; Karhu, 2001). Although the natural range of P. radiata is extremely small, it is the world's most widely planted fast-growing softwood species. It is cultivated on a commercial scale in Australia, Chile, South Africa and New Zealand.
The objectives of this study were twofold. The first was to study the patterns of nucleotide diversity of the eight chosen candidate genes in P. radiata and P. pinaster. More explicitly, we described for the first time in these two species, the type (SNP vs INDEL), nature (silent vs nonsynonymous) and genomic location (coding vs noncoding) of nucleotide polymorphisms. The second goal was to investigate whether nucleotide diversity patterns were compatible with neutral models or not.
Materials and methods
Plant material and DNA extraction
Haploid megagametophytes, a maternal tissue surrounding the diploid embryo in conifer seeds, were harvested from germinated seedlings just before the seed coat was cast off. Genomic DNA was extracted as described by Plomion et al. (1995). P. pinaster nucleotide diversity was assessed using megagametophytes collected from natural stands across the species' natural range (Table 1). Twenty-four gametes from 13 provenances belonging to the three main groups identified by Baradat and Marpeau (1988) were included in this exploratory analysis. In a second step, for one of the genes (CesA3), the sample size was extended to 91 megagametophytes (Table 1). P. radiata nucleotide diversity was estimated using 23 megagametophytes collected from individual trees of the New Zealand breeding population (Forest Research, Rotorua, New Zealand). Previous studies, based on monoterpene analysis (Burdon et al., 1997a) and morphological traits (Burdon et al., 1997b), have shown that the local race was introduced from the USA during the 19th century and mostly derived from the Año Nuevo population, with some admixture from the Monterey population.
Table 1. List of Pinus pinaster and Pinus radiata populations
| Species | Country | Population | Latitude | Longitude | Altitude (m) | No. of gametes | Group / origin |
|---|---|---|---|---|---|---|---|
| P. pinaster | Tunisia | Tabarka | 36°57′ N | 8°46′ E | 200 | 2 (8) | Mediterranean |
| P. pinaster | France | Corsica Porto Vecchio | 41°28′ N | 9°12′ E | 150 | 2 (7) | Mediterranean |
| P. pinaster | France | Corsica Vivario | 41°20′ N | 9°09′ E | 600 | 2 (5) | Mediterranean |
| P. pinaster | France | Corsica Zonza | 41°45′ N | 9°11′ E | 760 | 2 (6) | Mediterranean |
| P. pinaster | France | Aquitaine Castets | 43°52′ N | 1°08′ W | 60 | 2 (3) | Atlantic |
| P. pinaster | France | Aquitaine Mimizan | 44°08′ N | 1°18′ W | 35 | 2 (9) | Atlantic |
| P. pinaster | France | Aquitaine Souston | 43°41′ N | 1°25′ W | 35 | 2 (4) | Atlantic |
| P. pinaster | France | Aquitaine Hourtin | 45°10′ N | 1°08′ W | 40 | 4 (10) | Atlantic |
| P. pinaster | France | Aquitaine Medoc | 45°34′ N | 1°13′ W | 40 | 2 (18) | Atlantic |
| P. pinaster | Portugal | Leiria Mata | 40°00′ N | 8°45′ W | 50 | 1 (5) | Atlantic |
| P. pinaster | Portugal | Leiria Velha | 40°00′ N | 8°45′ W | 50 | 1 (4) | Atlantic |
| P. pinaster | Morocco | Punta Cires | 35°55′ N | 5°28′ W | 20 | 1 (6) | Atlantic |
| P. pinaster | Morocco | Tamjout | 33°52′ N | 4°02′ W | 1600 | 1 (4) | North African |
| P. radiata | New Zealand | NZ breeding population (land race) | | | | 23 | Año Nuevo and Monterey |
Primer design, PCR amplification and DNA sequencing
For each gene, a BLAST search (Altschul et al., 1997) was first run to identify homologs in pine expressed sequence tag (EST) databases available at http://cbi.labri.fr/outils/SAM/COMPLETE/index.php for P. pinaster and http://fungen.org/Projects/Pine/Pine.htm for Pinus taeda (Table 2). From the multiple alignments of the retrieved sequences, a consensus sequence was then derived for each candidate gene using sequencher v4.1.4 (Genecodes, Inc., Ann Arbor, Michigan, USA). Primer pairs (Table 3) were designed from the consensus sequence using primer 3 (Rozen & Skaletsky, 2000).
Table 2. Summary of the studied genes
|KORRIGAN||membrane-bound endo-(1-4)-β-glucanase (EC 3.2.1.4)||BV079723|| 937||566||371|| 0|| 8|| 47|
|CesA3||cellulose synthase (EC 2.4.1.12)||BV079715+BV079717||1048||810||238|| 0|| 8|| 49|
|CesA4||cellulose synthase (EC 2.4.1.12)||BV079716|| 489||396|| 0|| 93|| 7|| 22|
|Pp1||glycine-rich protein homolog||BV079718|| 493||240|| 0||253||22||133|
|Pp2||MYB-like transcriptional factor MBF1||BV079719|| 494||494|| 0|| 0|| 1|| 3|
|Pp4||ACC oxidase||BV079720|| 270||270|| 0|| 0|| 1|| 42|
|Pp6||25S rRNA gene||BV079721+BV079722|| 902||902|| 0|| 0||33||176|
|Pr1||unknown protein||BQ701569|| 113||113|| 0|| 0|| 0|| 15|
Table 3. List of primer pairs and amplification conditions
PCR products were sequenced using the Big Dye terminator kit (Amersham Bioscience, Uppsala, Sweden) and an ABI 3100 automatic sequencer (Applied Biosystems, Foster City, CA, USA) according to the manufacturers’ specifications. A single sequence was obtained per megagametophyte for each candidate gene. Singleton polymorphisms were verified by re-sequencing the affected megagametophyte sample.
Landscape of nucleotide diversity
Sequence alignment and nucleotide polymorphism detection were performed with sequencher v4.1.4. Each polymorphic site was visually checked on the chromatograms in order to distinguish true polymorphisms from scoring errors. The use of haploid tissues greatly facilitated the sequence analysis, allowing the direct definition of the haplotypes (multilocus combinations of polymorphisms) without cloning or using an expectation maximization (EM) algorithm (Long et al., 1995).
Basic parameters including the number of single nucleotide polymorphisms (SNPs), insertion–deletions (INDELs), synonymous (S) and nonsynonymous (NS) mutations were calculated using the site software (Hey & Wakeley, 1997). Nucleotide diversity was estimated as θw (based on the number of segregating sites; Watterson, 1975) and π (based on the average number of nucleotide differences per site between sequences; Nei, 1987). These parameters were computed with site, without considering INDELs, at three different levels: (i) the whole sequenced region; (ii) noncoding regions (including introns, 3′ and 5′ untranslated regions (UTRs)); and (iii) coding regions, subdivided in two components – S and NS. The number of haplotypes and the haplotype diversity were calculated using the dnasp software (Rozas & Rozas, 1999).
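As an illustration of the summary statistics named above, the following is a minimal sketch, not the site or dnasp implementation. It assumes equal-length, gap-free haploid sequences (INDELs excluded, as in the text); the function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns showing more than one nucleotide state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's theta per site: S / (a1 * L), with a1 = sum_{i=1}^{n-1} 1/i."""
    n, length = len(seqs), len(seqs[0])
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / (a1 * length)


def nucleotide_diversity(seqs):
    """pi per site: mean number of pairwise differences divided by length."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


def haplotype_diversity(seqs):
    """Nei's gene diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [seqs.count(h) / n for h in set(seqs)]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


# Hypothetical toy alignment: four haploid sequences of 10 sites each.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
print("S =", segregating_sites(toy), "haplotypes =", len(set(toy)))
print("theta_w =", watterson_theta(toy), "pi =", nucleotide_diversity(toy))
print("haplotype diversity =", haplotype_diversity(toy))
```

Because each megagametophyte yields one haploid sequence, every input string can be treated directly as a haplotype, which is why no phasing step appears in the sketch.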
Tests for selection were performed to assess whether the genes considered followed the model of neutral evolution (Kimura, 1983). Tajima's D-test, based on the allelic distribution (Tajima, 1989), was carried out using arlequin 2.000 software (Schneider et al., 2000). As implemented in this software, the significance of the test was assessed by generating random samples under the hypothesis of neutrality and population demographic equilibrium. The test was performed assuming the absence of recombination, making it conservative.
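The D statistic itself can be sketched as follows, using the standard constants of Tajima (1989). This is an illustrative computation only: it assumes gap-free haploid sequences, the function name is hypothetical, and it does not reproduce the simulation-based significance testing performed in arlequin.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length, gap-free haploid sequences."""
    n = len(seqs)
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)  # segregating sites
    if n < 4 or s == 0:
        return None  # variance term is zero or there is no polymorphism
    pairs = list(combinations(seqs, 2))
    # k: average number of pairwise nucleotide differences
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical samples: an excess of singletons versus two balanced haplotypes.
rare = ["AAAAA"] * 7 + ["TAAAA", "ATAAA", "AATAA"]
balanced = ["AAA"] * 5 + ["TTT"] * 5
print(tajimas_d(rare), tajimas_d(balanced))
```

An excess of rare variants pulls D below zero, whereas variants at intermediate frequency push it above zero, which is the contrast the two toy samples are meant to show.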
Levels of differentiation (FST) between Corsican and Aquitaine populations were estimated for all the studied genes with the analysis of molecular variance (amova) (Excoffier et al., 1992) as implemented in the arlequin 2.000 software (Schneider et al., 2001). In addition, differentiation among all the studied populations was also estimated for CesA3.
Considering first the small number of sequences analysed for most of the genes, and second the existence of significant differentiation for some of them, linkage disequilibrium (LD) was only computed for CesA3, for which a larger sample size was available. Given the absence of significant differentiation for this gene, LD between polymorphic sites was estimated using the whole set of sequences with dnasp. Fisher's exact tests and Bonferroni correction for multiple tests were used to determine whether the detected associations were significant.
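The pairwise testing scheme described above can be sketched as follows. This is a standard-library illustration rather than the dnasp procedure: it assumes biallelic sites scored on haploid sequences, and the function names and example data are hypothetical.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def pairwise_ld_tests(haplotypes, alpha=0.05):
    """Test every pair of biallelic sites; return (i, j, p, significant) tuples."""
    columns = [col for col in zip(*haplotypes) if len(set(col)) == 2]
    pairs = list(combinations(range(len(columns)), 2))
    if not pairs:
        return []
    threshold = alpha / len(pairs)  # Bonferroni-corrected significance level
    results = []
    for i, j in pairs:
        ref_i, ref_j = columns[i][0], columns[j][0]
        table = [[0, 0], [0, 0]]
        for x, y in zip(columns[i], columns[j]):
            table[int(x != ref_i)][int(y != ref_j)] += 1
        p = fisher_exact_two_sided(table[0][0], table[0][1], table[1][0], table[1][1])
        results.append((i, j, p, p < threshold))
    return results


# Hypothetical haplotypes: sites 0 and 1 are fully associated, site 2 is independent.
haps = ["AAC"] * 5 + ["AAG"] * 5 + ["TTC"] * 5 + ["TTG"] * 5
for row in pairwise_ld_tests(haps):
    print(row)
```

With nine polymorphic sites there are comb(9, 2) = 36 pairs, so a nominal level of 0.05 corresponds to a per-test threshold of 0.05/36 under this correction.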
Total divergence between P. pinaster and P. radiata, estimated as the average number of nucleotide substitutions per site, was finally calculated using dnasp.
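For orientation, an uncorrected version of this divergence measure can be sketched as below. The sketch is an assumption-laden simplification: it takes equal-length, gap-free sequences and applies no multiple-hit correction, so it is not a substitute for the dnasp estimate; the function name and toy data are hypothetical.

```python
def dxy(seqs_x, seqs_y):
    """Average number of nucleotide differences per site between two samples."""
    length = len(seqs_x[0])
    total = sum(
        sum(a != b for a, b in zip(x, y)) for x in seqs_x for y in seqs_y
    )
    return total / (len(seqs_x) * len(seqs_y) * length)


# Hypothetical four-site alignments for two species.
species_x = ["AAAA", "AAAT"]
species_y = ["TAAA", "TAAT"]
print(dxy(species_x, species_y))
```

Every between-species sequence pair contributes equally, so both fixed differences and polymorphisms segregating within either species enter the average.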
Nucleotide variation at the intraspecific level
Sequence data for almost the complete set of gametes were obtained for six out of the eight genes analysed. For Pp1 and Pp2, only 12 and 14 high-quality sequences, respectively, were obtained in P. radiata, probably as a result of the coamplification of other family members. Sequences were deposited in dbSTS (http://www.ncbi.nlm.nih.gov/dbSTS/) and SNPs in dbSNP (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=snp).
The regions analysed covered a total of 4.7 kb, corresponding to 3.8 kb of coding sequence and 0.9 kb of noncoding regions (intron and 3′ UTR) (Table 2). A total of 32 (29 SNPs and three INDELs) and 13 (exclusively SNPs) intraspecific polymorphisms were detected in P. pinaster and P. radiata, respectively. All the INDELs were single-base and located in noncoding regions. A total of 10 singletons were identified (seven in P. pinaster and three in P. radiata). All the nonsynonymous polymorphisms were conservative or moderately conservative according to the classification of Grantham (1974).
The average nucleotide diversity was slightly higher for P. pinaster (0.00241) than for P. radiata (0.00186). This difference mainly relied on Pp1 and CesA3, for which 11 and nine polymorphic sites were detected in P. pinaster, whereas only one and two polymorphic sites, respectively, were detected in P. radiata. Although the numbers of sequences analysed were smaller for P. radiata for these two genes, π will not be better estimated with a sample of sequences above 10 as its variance levels off very quickly (Tajima, 1983), thus the divergence of the estimates probably does not result from these unequal sample sizes. Apart from these two genes, close correspondence between the nucleotide diversity estimates in the two species was observed.
The average number of haplotypes (3.375 in P. pinaster vs 2.375 in P. radiata) and the average haplotype diversity (0.425 in P. pinaster vs 0.376 in P. radiata) were, like the total nucleotide diversity, slightly higher in P. pinaster. Large variations in haplotype number and haplotype diversity were observed among the genes in both species. The number of haplotypes varied from one to six. With the exception of CesA3 and Pp1, the numbers of haplotypes were consistent between species.
Tajima's D-tests were performed exclusively for the genes presenting at least five polymorphic sites (Table 4). Significant departure from the null hypotheses of neutrality and demographic equilibrium at P < 0.05 was observed only for KORRIGAN in P. radiata. For all the genes, these tests were performed on the whole set of sequences available.
Table 4. Pattern of nucleotide variation
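| | KORRIGAN | CesA3 | CesA4 | Pp1 | Pp2 | Pp4 | Pp6 | Pr1 | All genes |
|---|---|---|---|---|---|---|---|---|---|
| P. pinaster | | | | | | | | | |
| Number of sequences | 24 | 91 | 23 | 24 | 22 | 24 | 24 | 24 | 256 |
| Tajima's D | 0.63089 | 1.12147 | – | 1.20137 | – | – | – | – | |
| Number of haplotypes | 5 | 6 | 2 | 6 | 3 | 2 | 1 | 3 | 3.375 |
| (SE) | (0.057) | (0.059) | (0.081) | (0.079) | (0.091) | (0.113) | | (0.052) | |
| P. radiata | | | | | | | | | |
| Number of sequences | 18 | 21 | 20 | 12 | 14 | 23 | 23 | 21 | 153 |
| Tajima's D | −1.97b | – | – | – | – | – | – | – | |
| Number of haplotypes | 4 | 2 | 2 | 2 | 3 | 1 | 1 | 4 | 2.375 |
| (SE) | | (0.123) | (0.153) | (0.085) | (0.09) | (0.076) | | (0.138) | |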
For CesA3, Tajima's D-test was first performed in the Aquitaine provenance exclusively, and then, according to the nonsignificant level of differentiation observed for this gene (see the next section, ‘Populations differentiation in P. pinaster’), performed also considering all the sequences available. Both calculations yielded the same result, that is a positive but nonsignificant Tajima's D-value (D = 1.12147, P = 0.117 for the whole area of distribution and D = 1.03263, P = 0.131 for the Aquitaine provenance). Estimation of the local recombination parameter R (Hudson, 1987) for this gene, and its subsequent integration in the calculation of the Tajima's D-value expected distribution using coalescence simulation in dnasp, did not change its significance.
Populations differentiation in P. pinaster
FST estimated using all the polymorphic sites revealed a significant differentiation between Corsican and Aquitaine provenances (FST = 0.22). However, this high level of differentiation relied exclusively on two genes (KORRIGAN and Pp1), for which highly significant differentiation was observed (0.45 and 0.23, respectively, in Table 5). In comparison, Mariette et al. (2001) observed significant GST values of 0.049 and 0.092 for amplified fragment length polymorphism (AFLP) and simple sequence repeat (SSR) markers, respectively. Accounting for the difference in estimation methods, under which FST values are equivalent to twice the GST values (Nei, 1987), the differentiation between both groups of populations for both Pp1 and KORRIGAN is more than twice as high as that obtained for AFLP markers (0.098). Compared with SSR (0.184), although the differentiation values observed for both genes remain higher, the Pp1 value is only slightly higher, whereas that of KORRIGAN is still more than twice as high.
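The arithmetic behind this comparison can be laid out explicitly. The sketch below simply applies the factor of two stated in the text to the published GST values and forms the ratios; the variable names are illustrative.

```python
# GST values from Mariette et al. (2001) and gene FST values quoted in the text.
gst = {"AFLP": 0.049, "SSR": 0.092}
fst_genes = {"KORRIGAN": 0.45, "Pp1": 0.23}

# Conversion used in the text: FST taken as twice the GST value.
fst_equiv = {marker: 2 * value for marker, value in gst.items()}

# Ratio of each gene's FST to the FST-equivalent value of each marker type.
ratios = {
    (gene, marker): fst / fst_equiv[marker]
    for gene, fst in fst_genes.items()
    for marker in fst_equiv
}

for (gene, marker), ratio in sorted(ratios.items()):
    print(f"{gene} vs {marker}: {ratio:.2f}")
```

Under this conversion both genes exceed twice the AFLP-based value, while against the SSR-based value only KORRIGAN does.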
Table 5. FST estimates between Corsican and Aquitaine populations
|All (31 polymorphic sites)|| 0.22395a|
|GST AFLP–SSR (Mariette et al. 2001)|| 0.049–0.092|
In addition, a wider sampling for CesA3 allowed us to test the differentiation between the 13 populations. No significant differentiation was observed and the estimated value is very close to zero. This result deviates from the significant differentiation observed with neutral markers at the level of the whole geographic distribution of maritime pine (Petit et al., 1995).
Although the FST estimates for CesA4, Pp2, Pp4 and Pr1 probably depend heavily on the very few polymorphic sites detected (from one to four), the differentiation estimated for KORRIGAN, CesA3 and Pp1, for which at least five polymorphic sites were analysed, is likely to be more representative of the gene values.
Linkage disequilibrium was only calculated for CesA3, as the other genes presented either strong population differentiation combined with only small population size analysed or low level of polymorphism. Out of the 36 tests performed (nine polymorphic sites), 11 were significant, after Bonferroni's correction for multiple testing.
Nucleotide variation at the interspecific level
Sixty-three polymorphisms including 59 SNPs and four INDELs distinguished P. pinaster from P. radiata (Table 6). All INDELs were located in the noncoding region. The total number of interspecific fixed differences varied from 0 for Pr1 to 21 for CesA3 (Table 6). NS fixed differences were found for five genes (KORRIGAN, CesA3, Pp1, Pp2 and Pp4). Three of these NS fixed differences were moderately radical regarding the amino acid modification (Grantham, 1974): two sites in Pp2 (modifications SER to ARG and GLY to ARG) and one site in Pp4 (VAL to SER).
Table 6. Fixed differences between Pinus pinaster and Pinus radiata and estimates of total divergence D (x,y)
Under neutral evolution, interspecific divergence is expected to be proportional to intraspecific nucleotide diversity. Comparison of divergence and nucleotide diversity revealed that only Pr1 diverged from this pattern. However, due to the small size of the fragment analysed (113 bp), no particular hypothesis could be put forward. A wider exploration of the diversity of this gene will be required before any conclusion can be drawn.
Adequacy between the sampling strategy and SNP detection probability
The probability P of detecting the two alleles at a SNP locus depends on three parameters: (i) the number of gametes sampled, N; (ii) the frequency of the rare allele in the population, p; and (iii) the organization of gene diversity. In the absence of differentiation among populations, $P = 1 - (1 - p)^{N}$. In the present study, for each species, on average 21 gametes were sequenced for each DNA fragment, resulting in a detection probability of 89% for a rare allele frequency of 10%.
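The detection probability quoted above, that is, the chance that the rare allele appears at least once among N gametes drawn from an undifferentiated population, can be checked with a short sketch. The function names are hypothetical, and the second function, which inverts the expression to give the sample size needed for a target probability, is an addition for illustration only.

```python
from math import ceil, log


def detection_probability(n_gametes, rare_allele_freq):
    """P = 1 - (1 - p)^N: probability that the rare allele is sampled at least once."""
    return 1.0 - (1.0 - rare_allele_freq) ** n_gametes


def gametes_needed(target, rare_allele_freq):
    """Smallest N for which the detection probability reaches the target."""
    return ceil(log(1.0 - target) / log(1.0 - rare_allele_freq))


print(round(detection_probability(21, 0.10), 2))  # about 0.89, as in the text
print(gametes_needed(0.95, 0.10))
```

The expression assumes independent draws, so it would overstate the detection probability if the rare allele were confined to provenances that were not sampled.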
With respect to P. pinaster, the probability of detecting polymorphic loci was probably maximized considering: (i) the scattered sample used in our study covering the three main groups of diversity; (ii) the moderate level of genetic differentiation at the neutral level between geographical provenances (GST = 0.14–0.17; Petit et al. 1995) for isozymes, proteins and terpenes, with populations from France, Portugal, Corsica, Spain, Italy, Sardinia; and (iii) the rather low differentiation within provenances (GST = 0.04 for isozymes, cpSSR, nuclear SSR and AFLP markers, within Spain, Portugal, Aquitaine and Corsica: Mariette et al., 2001; González-Martínez et al., 2002; Ribeiro et al., 2002). However, it is important to note that the North African group, which constitutes a singular mitochondrial lineage with highly differentiated populations, was under-represented in this study, which probably led to an under-estimation of the nucleotide diversity of some of the genes.
Concerning P. radiata, as reported in the Material and Methods section, the sample used in this study corresponds to the first generation of the New Zealand breeding population, which derived from the Año Nuevo population with some admixture from the Monterey population. Johnson and Lipow (2002) showed that first-generation seed orchards retain most of the genetic diversity present in the natural populations from which they were derived. As a consequence, the results obtained for P. radiata should reflect the nucleotide diversity present in its ancestral populations. Indeed, using nuclear and chloroplast microsatellite loci, no significant changes in diversity were found between the five natural populations of P. radiata, and the current New Zealand breeding populations (T. Richardson, Forest Research, New Zealand, pers. comm.). It is, however, important to note that, according to the selection criteria used to select the first-generation breeding population (i.e. growth and form), some of the genes controlling these traits could have been submitted to artificial selection events leading to a reduction in their diversity.
Nucleotide diversity in wood formation related genes
Polymorphic sites were found in almost all the genes analysed, providing the basis to initiate association studies to test the involvement of these genes in the variability of the traits of interest. The availability of haploid tissue enabled the definition of the different haplotypes, allowing a reduction of the polymorphic sites to be genotyped. For instance in P. pinaster, only 21 markers (SNPs and INDELs) will have to be genotyped to define the haplotypic composition, instead of the 32 polymorphic sites discovered. This subset of SNP tags was defined exclusively based on the haplotypes observed in the studied sample. Although linkage disequilibrium analysis would allow a reduction of this SNP tag set, such analysis was not performed, given the high differentiation observed for some of the analysed genes and the small sample size of each population analysed.
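One way to derive such a reduced marker set from observed haplotypes is a greedy search for sites that still separate every pair of distinct haplotypes. The sketch below illustrates that general idea only; it is not the procedure used in this study, a greedy search does not guarantee the smallest possible set, and the function name and example data are hypothetical.

```python
from itertools import combinations


def tag_sites(haplotypes):
    """Greedily pick site indices that distinguish all distinct haplotypes."""
    distinct = sorted(set(haplotypes))
    n_sites = len(distinct[0])
    unresolved = set(combinations(range(len(distinct)), 2))
    chosen = []
    while unresolved:
        # For each site, the haplotype pairs it would newly separate.
        gains = [
            {(i, j) for i, j in unresolved if distinct[i][s] != distinct[j][s]}
            for s in range(n_sites)
        ]
        best = max(range(n_sites), key=lambda s: len(gains[s]))
        chosen.append(best)
        unresolved -= gains[best]
    return sorted(chosen)


# Hypothetical haplotypes: sites 0 and 1 are redundant, as are sites 2 and 3.
observed = ["AAAA", "AATT", "TTAA", "TTTT"]
print(tag_sites(observed))
```

In this toy case two of the four sites are enough to identify each of the four haplotypes, which mirrors the kind of reduction described above for P. pinaster.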
In spite of the exploratory nature of this study, limited to a restricted set of genes, it is interesting to note that the results obtained here agree with previous nucleotide surveys in conifers (Table 7). Although comparative diversity analyses using allozymes have shown that conifers are among the most genetically diverse organisms (Hamrick & Godt, 1990), nucleotide data do not support this statement. Indeed, the nucleotide diversity of conifers is higher than in humans but lower than in Zea mays. Interestingly, the nucleotide diversity levels reported in broadleaved trees such as Populus or Quercus are also significantly higher than in conifers (Table 7); the reasons for this divergence remain to be found.
Lower diversity in P. radiata: consequences of neutral process or genes controlling traits submitted to selection
A trend towards lower nucleotide diversity was observed for P. radiata compared with P. pinaster. This result is consistent with our previous knowledge regarding the populations analysed. Whereas P. pinaster is characterized by a large geographic distribution, the natural range of P. radiata is extremely small. In addition, the populations analysed in this study covered different proportions of the distribution of each species. For P. pinaster, almost the whole geographic distribution was analysed, whereas for P. radiata, only a subset of the total variation was analysed. As a consequence, the lower nucleotide diversity observed for P. radiata agrees with its lower effective population size compared with P. pinaster.
Although a lower diversity is expected in P. radiata under a neutral model of evolution, two genes (Pp1 and CesA3) presented an abnormally strong reduction of diversity in this species. A plausible hypothesis would be the combined effects of the smaller effective population size and of natural and/or artificial selection acting on these genes. Such a scenario would have led to the elimination of some alleles, resulting in an unusually low diversity level. Several concomitant results in P. pinaster support this hypothesis.
For Pp1 in P. pinaster, a higher differentiation than at the neutral level was observed. Such a differentiation pattern would be consistent with a ‘diversifying selection’ acting at this locus in P. pinaster. Evidence of selection at the molecular level for this gene would be consistent with its physiological role: Pp1 is a glycine-rich protein (GRP) that has been shown to be differentially expressed between differentiating xylem associated with different types of wood characterized by different physical and chemical properties; in other words, early vs late wood (Le Provost et al., 2003) and opposite vs compression wood (Allona et al., 1998; Zhang et al., 2000; Le Provost et al., 2003). Cell wall GRPs are localized in vascular tissues and are thought to provide elasticity as well as tensile strength during vascular development (Cassab, 1998). Polymorphisms inducing variation of these properties would definitely affect the adaptation of the tree to its environmental conditions and thus be preferentially fixed in certain conditions. In the case of P. radiata, according to the negative genetic correlations often reported between growth and wood quality in conifers (Rozenberg & Cahalan, 1997; Pot et al., 2002), the reduction of diversity observed may have resulted from the artificial selection on growth applied to the New Zealand land race.
The absence of differentiation observed for CesA3 in P. pinaster compared with the significant level observed for neutral markers (Petit et al., 1995) provides a strong indication of balancing selection acting on this gene. The positive Tajima's D-values reported for this gene for the whole area of distribution and for the Aquitaine provenance tend to confirm this hypothesis. Indeed, such values would not be expected in the case of no differentiation. Furthermore, the relatively high haplotype structure observed for this gene (high haplotype diversity, low number of haplotypes compared with the number of polymorphic sites, high level of linkage disequilibrium) also points toward the action of balancing selection. These hypotheses of possible deviations from neutrality for CesA3 are consistent with its role in cellulose biosynthesis. Cellulose is one of the major components of the cell wall. In temperate zones, climatic variation during the annual course of vascular cambium activity gives rise to early wood, formed early in the growing season, and late wood, formed in late summer. This environmental pressure could strongly affect the major change in cellulose content recognized between these two types of wood.
As in the case of Pp1, the reduction of diversity observed for CesA3 in P. radiata would be consistent with the involvement of this gene in the genetic determinism of wood quality, a trait negatively correlated to growth.
KORRIGAN, a gene involved in polysaccharides biosynthesis, as a putative target of natural selection
Several results that include a high differentiation between Corsican and Aquitaine populations in P. pinaster and a significant negative Tajima's D-value in P. radiata suggest KORRIGAN as a potential target of selection in these species.
The high differentiation observed in P. pinaster is consistent with the existence of diversifying selection that would have led to the prevalence of different haplotypes, as a consequence of their role in local adaptation to the particular environmental conditions encountered.
In P. radiata, the significant negative Tajima's D-value may result from a selection event on this gene, either past or recent; a recent event would be consistent with the relatively strong haplotype structure (only four haplotypes for 18 sequences and five polymorphic sites). Thus the excess of low-frequency polymorphisms would be consistent with a hitchhiking event in the P. radiata population. An alternative neutral hypothesis would be the recent expansion of the New Zealand breeding population.
The role of KORRIGAN is consistent with deviation from neutrality. Indeed, KORRIGAN is involved in the biosynthesis of cellulose, the main component of the cell wall, whose amount is genetically controlled (Zobel & Buijtenen, 1989; Pot et al., 2002; Sewell et al., 2002) and which provides strength and flexibility to plant tissue. It encodes a β-1,4 endoglucanase, which catalyses the cleavage of the cellodextrin from the sitosterol cellodextrin (Nicol et al., 1998; Peng et al., 2002) before the proper synthesis of the cellulose microfibrils by the cellulose synthase complex. Its importance in this pathway has already been underlined. It is indeed strongly differentially expressed between early and late wood, being overexpressed in late wood, which is characterized by a higher proportion of cellulose (accession AL750476 in Le Provost, 2003).
Recent studies tend to confirm the central role of this gene in the genetic variability of cell wall composition. Indeed, significant relationships between KORRIGAN polymorphisms and polysaccharide content were detected (coincident with QTLs in a three-generation outbred pedigree; Pot, 2004). A significant association was also observed in the P. pinaster first-generation breeding population between one KORRIGAN SNP and cellulose content (P. Garnier-Géré, pers. comm.). These observations reveal the potential importance of this gene in the variability of polysaccharide content, a trait that may be subjected to natural selection pressures.
Conclusion and perspectives
This exploratory study allowed the identification of polymorphisms in eight wood formation related genes in P. pinaster and P. radiata. This information is currently used in association studies to test their involvement in the phenotypic variability of economically important traits linked to wood structure and chemical composition in these two species.
The analysis of the patterns of nucleotide diversity obtained at the intra- and interspecific levels provided some indications of adaptive evolution at the molecular level for KORRIGAN, Pp1 and CesA3. These interpretations are consistent with the demonstrated physiological role of these genes, and with recent data obtained in QTL mapping experiments and association studies.
We gratefully acknowledge the financial support from the French Ministry of Foreign Affairs, who granted a collaborative project «Déterminisme génétique et moléculaire de la qualité du bois chez les conifères» between INRA (Cestas, France) and Forest Research (Rotorua, New Zealand), the European Union (GEMINI: QLRT-1999-00942, FEDER:2003227) and the Aquitaine Region (2004-03-05-003FA). We thank Drs Valérie Lecorre, Philippe Rozenberg, Santiago C. González-Martínez and Antoine Kremer for their comments on earlier versions of the manuscript.