Keywords:

  • candidate genes;
  • drought;
  • Eperua falcata;
  • flooding;
  • neotropics;
  • outlier loci;
  • tree genetics

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Materials and methods
  5. Data analyses
  6. Results
  7. Discussion
  8. Authorship
  9. Acknowledgments
  10. References
  11. Supporting Information

Unveiling the genetic basis of local adaptation to environmental variation is a major goal in molecular ecology. In rugged landscapes characterized by environmental mosaics, living populations and communities can experience steep ecological gradients over very short geographical distances. In lowland tropical forests, interspecific divergence in edaphic specialization (for seasonally flooded bottomlands and seasonally dry terra firme soils) has been demonstrated by ecological studies on adaptive traits. Some species are nevertheless capable of covering the entire span of the gradient; intraspecific variation for adaptation to contrasting conditions may explain the distribution of such ecological generalists. We investigated whether local divergence happens at small spatial scales in two stands of Eperua falcata (Fabaceae), a widespread tree species of the Guiana Shield. We investigated Single Nucleotide Polymorphisms (SNPs) and sequence divergence as well as spatial genetic structure (SGS) at four genes putatively involved in stress response and three genes with unknown function. Significant genetic differentiation was observed among sub-populations within stands, and eight SNP loci showed patterns compatible with disruptive selection. SGS analysis showed genetic turnover along the gradients at three loci, and at least one haplotype was found to be in repulsion with one habitat. Taken together, these results suggest genetic differentiation at a small spatial scale in spite of gene flow. We hypothesize that heterogeneous environments may cause molecular divergence, possibly associated with local adaptation in E. falcata.


Introduction

Environmental gradients (the more or less continuous spatial variations of biotic and abiotic conditions) and environmental patchiness produce spatially variable selective pressure on biological populations, inducing their genetic diversification through local adaptation (Antonovics, 1971; Linhart & Grant, 1996; Kawecki & Ebert, 2004; Savolainen et al., 2004; Fine et al., 2005; Hedrick, 2006; Namroud et al., 2008). Correlation between environmental variables and frequencies of adaptive genetic variants has been repeatedly observed, and such patterns have generally been interpreted as signatures of selection forcing genetic pools to adjust to the local environment (e.g. Hedrick, 2006; Storz & Kelly, 2008). The observation of adaptive genetic divergence between populations occupying different parts of an environmental gradient is therefore suggestive of the action of disruptive selection in favour of local adaptation.

The study of how genetic diversity is coupled with environmental gradients rests on solid theory, and stems from the rather intuitive idea that genetic turnover can be quantified through changes in allele frequencies, and that if a gradient influences allele frequencies, then an association should be present between the two (Epperson, 2003). Although considerably refined since, this approach is the basis of all studies of ecological-genetic gradients (Bergmann, 1978; Ingvarsson et al., 2005; Joost et al., 2007; Ingvarsson, 2008; Eckert et al., 2009b; Coop et al., 2010; Eckert et al., 2010a; Fournier-Level et al., 2011; Montesinos-Navarro et al., 2011; Hancock et al., 2011; Chen et al., 2012), including those of populations inhabiting contrasting habitats.

Conventionally, the effect of environmental gradients has been sought at scales that go from regional to continental (Acheré et al., 2005; Tsumura et al., 2007; Eveno et al., 2008; Namroud et al., 2008), implicitly assuming that at shorter scales migration will systematically overwhelm selection. Nevertheless, there are reasons to think that disruptive selection acts even at very local scales. Even in the absence of selection gradients, genetic relatedness tends to be spatially structured in plant populations (and particularly in trees) because of preferential dispersal in the close neighbourhood. Limitations to dispersal can therefore reinforce differential, spatially structured disruptive selection. Conversely, moderate levels of gene flow may increase the rate of adaptation, by enabling the emergence of novel multilocus genotypes and by exposing alleles to multiple environments, thus facilitating the action of selective filters (Goudet et al., 2009; Kremer & Le Corre, 2012). Finally, most plant populations produce a large excess of seeds and seedlings each season, which should set the stage for very strong selection, even if it is partially confounded by random processes. The existence of local adaptation in spite of gene flow has been reported at very short spatial scales in animals (Storz, 2005a) and in artificial plots for outcrossing wind-pollinated annual plants (Freeland et al., 2010), but also on larger scales for wind-pollinated trees (Savolainen et al., 2007; Eveno et al., 2008; Eckert et al., 2009b, 2010a,b). Jump & Peñuelas (2005) have reviewed evidence that intra-population genetic variation for traits and genes related to response to climatic gradients exists in plant species.
Their analysis rests on a long tradition of studies on local adaptation to patchy or continuously varying environments, of which clear examples are found in annual plants at both the landscape (Angert & Schemske, 2005; Manel et al., 2010; Poncet et al., 2010) and within-population scale (Schmitt & Gamble, 1990). For instance, local adaptation has been identified at the molecular level for tree species within a range of less than 3 km (Jump et al., 2006), and parapatric or sympatric speciation for palms has likely occurred on a single 12 km² island (Savolainen et al., 2006; Babik et al., 2009). Thus, even for long-lived organisms, such as trees and palms, it is possible to observe genetic divergence at a very local scale, in spite of the (real or expected) presence of recurrent gene flow among environmental patches or portions of the gradient. It is therefore legitimate to ask whether locally variable selection contributes to the diversification of sub-populations and to the build-up and maintenance of genetic diversity and adaptive potential in tree species.

With the development of genomic methods, several strategies for testing the association of Expressed Sequence Tags (ESTs), Single Nucleotide Polymorphisms (SNPs) or anonymous markers with traits and/or ecological preferences (association mapping; population genomics) have been introduced (Luikart et al., 2003; Neale & Savolainen, 2004; Gonzalez-Martinez et al., 2006b; Eckert et al., 2009a,b). These methods usually require a priori information that may not be easily accessible for nonmodel taxa (Luikart et al., 2003), while enabling gene-level selection studies without prior knowledge about the relationship of phenotype to genotype or the precise function of candidate loci (Storz, 2005b; Vasemägi & Primmer, 2005). Higher (or lower)-than-expected levels of divergence among populations at a given locus are then taken as suggestive of disruptive (or stabilizing) selection (Beaumont & Nichols, 1996; Luikart et al., 2003; Beaumont & Balding, 2004; Storz, 2005b; Gonzalez-Martinez et al., 2006b; Riebler et al., 2008). This strategy can be applied at the genome level, when extensive genomic information is available, or to sets of candidate genes (Phillips, 2005; Wright & Gaut, 2005) when a particular ecological and physiological process is targeted.

When environmental factors are spatially structured, for example in the case of habitat patches or gradients, the study of Spatial Genetic Structure (SGS) can also help to test the association of genotypes and environmental conditions. SGS can result from a variety of processes, including spatially structured selection and limited dispersal (Condit et al., 1996; Clark et al., 1998; Plotkin et al., 2000). It is therefore necessary to distinguish the relative role of the different evolutionary forces (Heywood, 1991; Manel et al., 2003; Vekemans & Hardy, 2004). In structured environments, the distribution of genotypes relative to habitat gradients can be compared with the overall distribution of genotypes (or to a null distribution). Specifically, at loci under divergent selection, the turnover of alleles is expected to be steeper along the gradient than in any other direction (and steeper between ecologically contrasted zones than between randomly drawn zones; Oden & Sokal, 1986).

Landscapes with abrupt habitat changes occurring over short spatial scales and with an alternation of ecologically divergent habitat patches provide a suitable opportunity for the study of the strength of selective forces leading to local adaptation. Seasonally flooded lowland forests of the Guiana Shield occur in a rugged landscape characterized by small creeks alternating with small hills, where edaphic (i.e. related to soil characteristics) conditions can vary steeply from bottomlands to the top of hills and hillocks, resulting in environmental mosaics (Baraloto & Couteron, 2010). Therefore, forest tree populations growing in this region provide the opportunity to test the occurrence of local adaptation phenomena. Habitat specialization has been repeatedly observed in tropical trees (Plotkin et al., 2000; Harms et al., 2001; Lopez & Kursar, 2003; Palmiotto et al., 2004; John et al., 2007). Several studies have tested responses to edaphic constraints in trees from the Guiana Shield (Baraloto et al., 2005, 2006, 2007) and analysed the interspecific variability of traits related to edaphic stress response (Bonal et al., 2000a,b,c; Bonal & Guehl, 2001; Coste et al., 2005; Bonal et al., 2007; Scotti et al., 2010), but the presence of intraspecific local adaptation in species occupying several habitats (and its possible genetic basis) has never been tested. The present work focuses on populations of E. falcata, a common tree species of the Fabaceae family growing in relatively dense clusters of up to several hundreds of trees and densities of up to 40 stems (diameter at breast height > 10 cm) per hectare. Seed dispersal is barochorous and pollination is mostly performed by bats. E. falcata was found to be significantly positively associated with flooded forest (Collinet, 1997; Baraloto et al., 2005, 2007).
However, distribution maps show that it can occur on a large spectrum of edaphic conditions, up to hilltops, thus showing a somewhat generalist behaviour and therefore potential for local adaptation. The present study focuses on three main edaphic habitats occupied by this species: bottomlands, seasonally flooded by heavy rainfall during the rainy season; slopes, with thin soil and highly variable soil water content; and terra firme plateaus, with deep, well-drained soil possibly prone to drought during the dry season (Wright, 1992). We have analysed genetic diversity in a set of seven genes, of which four have a known function related to response to hypoxic stress (Catalase), drought stress (Farnesyltransferase) and plant water balance (two aquaporins; Audigeos et al., 2010), and three were randomly drawn from an EST library obtained from seedlings from one of the study areas of this paper. We assessed SGS and performed multilocus scans for genetic differentiation at a small spatial scale (~6 ha), in two forest plots presenting environmental patchiness. We tested local differentiation of E. falcata populations as a function of variation of edaphic conditions and found loci potentially undergoing disruptive selection.

Materials and methods

Sampling and DNA isolation

Trees were sampled from two forest inventory plots in French Guiana, one with an environmental mosaic (Paracou, Fig. 1a) and one with two homogeneous patches of strongly contrasted environments (Nouragues, Fig. 1b). The Paracou experimental station, 50 km from Kourou (5°15'N, 52°55'W), is formed by 15 plots of 6.25 ha (Gourlet-Fleury et al., 2004). The Nouragues research station (4°05'N, 52°41'W) has an area of over 100 ha, subdivided in 1-ha square plots. Both sites represent relatively accessible but undisturbed forest areas. At both sites, the study area was subdivided based on a discrete categorization (Ferry et al., 2010) of environmental (and particularly edaphic) conditions (Fig. 1): vertical drainage (VD), corresponding to terra firme forests, with deep soil rarely undergoing drought and never undergoing flooding; surface lateral drainage (SLD), which could experience drought during the dry season due to its very variable soil water content; and surface hydromorphy (SH), corresponding to seasonally flooded forest with soil saturated with water to the surface, undergoing hypoxic stress during the rainy season. This environmental partition also loosely corresponds to the topography classes plateau, slope and bottomland respectively. In Paracou, Plot 6 (with an area of 6.25 ha) was chosen. The three environmental conditions described above are represented in this plot (Fig. 1a). In Nouragues, six contiguous 1-ha plots crossed by a river (10J, 11J, 10K, 11K, 10L, 11L) were chosen, for a total area of 6 ha, approximately equivalent to Paracou's Plot 6. Only the VD and SH ecological conditions were found at the Nouragues study site (Fig. 1b). The majority of Nouragues individuals are found in VD zones and a few in SH zones. Cambium was collected from 440 E. falcata trees with a diameter larger than 10 cm at breast height: 258 in Paracou (48, 172 and 38 in the SH, SLD and VD zones respectively) and 182 in Nouragues (29 and 153 in the SH and VD zones respectively). DNA was extracted following a CTAB method (Doyle & Doyle, 1987; Colpaert et al., 2005).

Figure 1. E. falcata cartography on drainage map for each sampling site: Paracou Plot 6 (A) and Nouragues Plots 10J, 10K, 10L, 11K, 11J, 11L (B). Light grey represents vertical drainage (VD), medium grey represents surface lateral drainage (SLD) and dark grey represents surface hydromorphy (SH). The axes used in each plot for directional autocorrelation analyses are shown. Dashed line: Paracou ‘Eastern half’. Dotted line: Paracou ‘Northern half’.

Amplicon choice, sequencing and polymorphism detection

To study the evolutionary processes responsible for genetic diversity in E. falcata, we analysed sequences with a putative role in the response to stresses related to edaphic conditions and sequences with unknown function. Of the seven loci used in the study, two were aquaporin gene fragments, PIP1.1 and PIP1.2, characterized in a previous study (Audigeos et al., 2010), whereas the other nuclear fragments were obtained by sequencing clones from both cDNA and genomic libraries. The five additional sequences included: a fragment of the gene coding for Catalase (CAT), involved in the response to oxidative stress caused by flooding; a fragment of the gene coding for Farnesyltransferase (FTase), involved in the abscisic acid (ABA) metabolic pathway; a DNA fragment coding for a hypothetical protein (HYP5) and two ESTs with unknown function (UNK7 and UNK14). The chosen loci represent a mix of candidates for a putative role in the response to edaphic constraints (CAT, FTase), genes with a housekeeping function in plant water balance (PIPs) and ‘randomly drawn’ gene functions (HYP5, UNK7, UNK14). Although we cannot assume that any of these sequences are ‘neutral’ in the general sense, we have no special reason to think that they undergo disruptive selection in this particular habitat gradient, with the notable exception of CAT and FTase. Therefore, we consider the sampled gene panel as representative of the general behaviour of the transcriptome with respect to this particular gradient. Moreover, we decided to make use only of EST sequences because other kinds of regions, such as anonymous genomic sequences or microsatellites, may have different molecular properties (e.g. different substitution rates, nucleotide composition and linkage disequilibrium levels for the former, different mutation model and rates for the latter), making the data set inhomogeneous from the evolutionary point of view.
By restricting our analysis to a single sequence type, we have tried to avoid any bias in data interpretation that may arise from comparisons between data sets with different underlying structures. The libraries were obtained using the Lambda ZAP II kit (Stratagene, La Jolla, CA, USA) following the manufacturer's protocol. About 200 clones were sequenced and their putative function assigned based on their comparison with public databases using BLASTn and BLASTx. A set of 47 sequences with length between 300 and 600 base pairs (bp) was selected, including proteins with known function, retrotransposons, hypothetical proteins and sequences without a match in public databases. Primers were designed using Primer-BLAST (www.ncbi.nlm.nih.gov/tools/primer-blast/) and OligoCalc (http://www.basic.northwestern.edu/biotools/oligocalc.html). Preliminary tests for amplification were conducted on two individuals. Fifteen fragments produced specific amplicons; PCR and sequencing on 16 individuals were performed to evaluate sequence quality and polymorphism level. Five chosen fragments plus the two aquaporin genes were then sequenced in all samples. Haplotypes have been deposited in GenBank under accession numbers JQ801740 to JQ801745 (Table 1).

Table 1. Description of the fragments and their amplification conditions. Locus code: short name used throughout the text to indicate the locus; Accession numbers: EMBL/GenBank accession numbers; BlastX: closest BlastX match for each sequence in the EMBL/GenBank data base; Function: function of the closest BlastX match; Putative role in stress: role in the response to environmental stress, if known; Primers: primer pair for the amplification of each fragment; Annealing temperature: temperature used for annealing in PCRs (see Methods for details)
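Columns: Locus code; Accession numbers; Protein prediction; BlastX; Function; Putative role in stress; Primers (5'→3'); Annealing temperature. One record per locus follows.

CAT
  Accession numbers: JQ801740 (a); JQ801745 (b)
  Protein prediction: Catalase
  BlastX: AAR84578
  Function: Convert hydrogen peroxide in water and oxygen
  Putative role in stress: Oxidative stress
  Primers (5'→3'): F: TCCAGCTTCCTGTCAATGC; R: ACAACGCACATGGCACAC
  Annealing temperature: 64 °C

FTase
  Accession numbers: JQ801744
  Protein prediction: Putative farnesyltransferase alpha subunit
  BlastX: XP002534116
  Function: Add a farnesyl group to the –SH of the cysteine
  Putative role in stress: ABA stress signalling pathway
  Primers (5'→3'): F: GCCCACCCTGAGAATGAAAG; R: TGCCTGAACCTGAAAACAAG
  Annealing temperature: 55→52 °C (c)

HYP5
  Accession numbers: JQ801741
  Protein prediction: Hypothetical protein
  BlastX: EEF45947
  Function: Unknown
  Putative role in stress: Unknown
  Primers (5'→3'): F: AATGCAATGGACCTTGAGC; R: TTCATGAAACGTGATCAACC
  Annealing temperature: 55→52 °C (c)

PIP1.1
  Accession numbers: FJ807642
  Protein prediction: Putative aquaporin PIP1
  BlastX: ABD63904
  Function: Plasma membrane water channel
  Putative role in stress: Water-balance
  Primers (5'→3'): F: CCCAGCAGTGACCTTCG; R: AACCAAGAACACAGCGAACC
  Annealing temperature: 64→57 °C (c)

PIP1.2
  Accession numbers: FJ807646
  Protein prediction: Putative aquaporin PIP1
  BlastX: ABR68794
  Function: Plasma membrane water channel
  Putative role in stress: Water-balance
  Primers (5'→3'): F: CAACCCGGCTGTGACC; R: GCCAAATGGACCAAGAACAC
  Annealing temperature: 64→57 °C (c)

UNK7
  Accession numbers: JQ801742
  Protein prediction: Unknown
  BlastX: (no entry)
  Function: Unknown
  Putative role in stress: Unknown
  Primers (5'→3'): F: GACCGGAACAGTAATTCGTTG; R: ATTTCGCTAAAAAGGCCTGC
  Annealing temperature: 64→61 °C (c)

UNK14
  Accession numbers: JQ801743
  Protein prediction: Unknown
  BlastX: (no entry)
  Function: Unknown
  Putative role in stress: Unknown
  Primers (5'→3'): F: GTATTGGGGGTATTCTCCGC; R: GCTGCCACTTCATGTGACC
  Annealing temperature: 64→61 °C (c)

Footnotes:
  (a) Partial cds (coding sequence).
  (b) Partial genomic sequence.
  (c) Touchdown PCR: annealing temperature decreases from the highest to the lowest temperature over the first seven cycles (see Methods for details).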

PCRs were carried out in a 15 μL volume containing 15 ng of DNA, 1× Taq buffer, 2 mM MgCl2, 0.25 mM of each dNTP, 0.3 U Taq polymerase (all products from New England Biolabs) and 0.5 μM of each primer. An initial denaturation at 94 °C for 10 min was followed by 35 cycles of (45 s at 94 °C; 20 s at the annealing temperature shown in Table 1; 1 min 30 s at 72 °C) and a final extension at 72 °C. PCR products were purified with ExoSAP-IT (USB Corporation). Sequencing reactions were performed with the BigDye® Terminator v3.1 cycle sequencing kit (Applied Biosystems) in a total volume of 10 μL containing 0.5 μL of BigDye, 1.5 μL of buffer, 1 μL of 2 μM primer, 4 μL of cleaned-up PCR product and 3 μL of milli-Q water. All fragments were sequenced in both directions. Sequencing reactions were then purified by ethanol precipitation and sequence data were obtained on an ABI 3130xl capillary sequencer (Applied Biosystems).

Base calling and contig assembly were done using CodonCode Aligner v2.0.1 (CodonCode Corporation, Dedham, MA, USA). All polymorphisms were visually checked. As DNA samples were diploid, the identification of haplotypes (i.e. sequence variants) for individuals with more than one SNP was performed using PHASE (Stephens et al., 2001; Stephens & Donnelly, 2003) as implemented in DnaSP v5 (Librado & Rozas, 2009) to produce two haploid sequences per individual.

Data analyses

We performed our analyses at the ‘site’ level (Paracou and Nouragues) and at the ‘habitat’ level (VD, SLD and SH) within a site. We use the following terms throughout the article: ‘amplicon’ for a sequenced PCR fragment; ‘haplotype’ for each distinct sequence variant of an amplicon; and ‘SNP’ for each polymorphic site (including indels).

Molecular diversity and differentiation

Nucleotide diversity of each amplicon was estimated by both θS (Watterson, 1975), based on the number of segregating sites, and θπ (Nei, 1987), based on the average number of pairwise nucleotide differences between sequences in a sample. Haplotype diversity Hd (Nei, 2009) was also calculated for each amplicon. Analyses of diversity were conducted in DnaSP v5 (Librado & Rozas, 2009). Linkage disequilibrium (LD) among amplicons was estimated only for haplotypes occurring with > 5% frequency, using a likelihood ratio test (Slatkin & Excoffier, 1996) as implemented in Arlequin v3.5 (Excoffier & Lischer, 2010). LD within amplicons and departure from Hardy-Weinberg equilibrium were tested on a contingency table of observed vs. predicted genotype frequencies using a modified Markov-chain random walk algorithm as described by Guo & Thompson (1992) and implemented in Arlequin. LD was tested at the 5% significance level before and after applying the sequential Bonferroni correction for multiple testing. We also computed the LD descriptive statistic r² (Hill & Robertson, 1968), as it summarizes both recombination and mutation history and is less sensitive to sample size than other common LD statistics, such as D' (Flint-Garcia et al., 2003). r² was calculated on SNPs using DnaSP v5, and the statistical significance of r² was assessed with a one-tailed Fisher's exact test, applying Bonferroni corrections for multiple testing. The decay of LD with physical distance was estimated using nonlinear regression of LD between SNPs (r²) onto their distance in base pairs (Remington et al., 2001; Ingvarsson, 2005). The expected value of r² under drift-recombination equilibrium, taking mutation into account, was computed according to Hill & Weir (1988).
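To make the r² statistic concrete, the following minimal Python sketch computes it for a pair of biallelic sites from phased haploid sequences. It is an illustration only and not the DnaSP implementation; the function name, the 0/1 allele coding and the toy data are assumptions introduced here.

```python
from typing import Sequence


def ld_r_squared(haplotypes: Sequence[Sequence[int]], i: int, j: int) -> float:
    """Squared allele-frequency correlation between biallelic sites i and j.

    `haplotypes` holds one row per phased haploid sequence, with alleles
    coded 0/1 (hypothetical coding chosen for this sketch).
    """
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n          # frequency of allele 1 at site i
    p_b = sum(h[j] for h in haplotypes) / n          # frequency of allele 1 at site j
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Toy data: complete association, then complete independence.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r_squared(linked, 0, 1), ld_r_squared(unlinked, 0, 1))
```

Because r² is normalized by the allele frequencies at both sites, it ranges from 0 to 1 whatever the frequencies are, which makes values comparable across SNP pairs.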

The genetic structure of populations was investigated by the analysis of molecular variance (AMOVA) (Excoffier et al., 1992) implemented in Arlequin v3.5. AMOVA was performed among sites and among environments within site, for each amplicon as NST (Pons & Petit, 1996) and for all amplicons as FST (Weir & Cockerham, 1984).
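As a rough illustration of what a differentiation index measures, the sketch below computes a simple heterozygosity-based FST for one biallelic SNP. It is deliberately not the Weir & Cockerham (1984) estimator nor the AMOVA framework used in this study: it ignores sample-size weighting and bias correction, and the function name and example frequencies are hypothetical.

```python
from typing import Sequence


def heterozygosity_fst(freqs: Sequence[float]) -> float:
    """Unweighted (HT - HS) / HT for one biallelic SNP.

    `freqs` holds the frequency of the same allele in each sub-population.
    Simplified stand-in, not the Weir & Cockerham (1984) estimator.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two sub-populations")
    # Mean expected heterozygosity within sub-populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    # Expected heterozygosity of the pooled population.
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        raise ValueError("SNP is monomorphic across sub-populations")
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in two habitats of one site.
print(heterozygosity_fst([0.2, 0.8]))
```

Identical frequencies in all sub-populations give 0, and fixation of alternative alleles gives 1.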

Detection of ‘outlier’ loci

Departures from the standard neutral model of molecular evolution were investigated by two different methods: the frequentist method described by Beaumont & Nichols (1996) and the more refined Bayesian method described in Beaumont & Balding (2004). To compare the results obtained with the two methods, we assigned confidence levels of 99% and 90% for FDIST2 and BayesFst respectively. The use of these two significance thresholds confers comparable false discovery rates to the two methods (Beaumont & Balding, 2004). Identification of polymorphisms carrying a possible signature of natural selection (‘outlier’ loci) was first performed with the FDIST2 program, which uses the summary-statistics approach described in Beaumont & Nichols (1996) and further developed in Beaumont & Balding (2004). Twenty thousand coalescent simulations were performed with three and two populations of 50 individuals for Paracou and Nouragues respectively. Because sample size was unequal between sub-populations at each site, and because only one sample size can be entered as a parameter in FDIST2, we also ran the analyses with three populations, sample size 170, and two populations, sample size 150, for Paracou and Nouragues respectively; this corresponds to the largest sample size for each site. The numbers of populations and samples to simulate were chosen to model as closely as possible the populations that have been analysed at each site. Expected FST for simulations was determined as the mean of observed FST values. To comply with the assumption of independence of loci required for the estimation of population diversity and divergence, three independent subsets of 21 SNPs (three per amplicon) with zero pairwise LD were used to compute FST values.
This led to three independent simulations, each of which is based on 21 statistically independent loci; as these are statistically uncorrelated, we consider them as being effectively independent loci, although they come from a restricted number of physical genome locations. The neutral envelope was constructed for each simulation at the 99% confidence level. A single envelope was obtained by selecting, in each diversity bin computed by the algorithm, the most conservative FST values (i.e. the largest upper bound and the smallest lower bound). Loci with an FST value exceeding the upper limit of the neutral envelope conditional on heterozygosity were considered as potentially under divergent selection. The Bayesian inference method implemented in the BayesFst program (Beaumont & Balding, 2004) was also used to identify genes under selection. This algorithm relies on a Bayesian model to identify locus-specific population divergence between samples, by implementing a Metropolis-Hastings Markov Chain Monte Carlo (MCMC) process based on the likelihood of allele counts. It has the advantage of disentangling locus effect (αi), population effect (βj) and optional interaction between locus and population effects (γij). A positive value of αi indicates the presence of disruptive selection at the locus, whereas a negative value suggests balancing selection. The γij values also have an interpretation in terms of selection: a large positive γij could indicate a potentially advantageous mutation that would be locally adapted in a particular population (Beaumont & Balding, 2004). Default prior distributions were used to generate 10 000 parameter series and convergence was checked using the CODA package of R version 2.10.1. Outlier values for αi and γij were identified setting the confidence level at 90%.
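The way the three simulated envelopes are combined into a single conservative envelope can be sketched as follows. This is a minimal Python illustration under stated assumptions: each envelope is represented as a dictionary mapping a heterozygosity bin index to a (lower, upper) pair of FST bounds, which is a hypothetical data layout and not the FDIST2 output format, and all numbers are invented for the example.

```python
from typing import Dict, List, Tuple

Envelope = Dict[int, Tuple[float, float]]  # heterozygosity bin -> (lower, upper) FST bound


def merge_envelopes(envelopes: List[Envelope]) -> Envelope:
    """Keep, per bin, the smallest lower bound and the largest upper bound."""
    shared_bins = set.intersection(*(set(e) for e in envelopes))
    return {
        b: (min(e[b][0] for e in envelopes), max(e[b][1] for e in envelopes))
        for b in sorted(shared_bins)
    }


def is_divergent_outlier(fst: float, het_bin: int, envelope: Envelope) -> bool:
    """A locus is flagged only if its FST exceeds the upper bound of its bin."""
    return fst > envelope[het_bin][1]


# Three hypothetical simulation envelopes over two heterozygosity bins.
sims = [
    {0: (0.00, 0.10), 1: (0.01, 0.12)},
    {0: (0.01, 0.14), 1: (0.00, 0.11)},
    {0: (0.02, 0.09), 1: (0.02, 0.15)},
]
merged = merge_envelopes(sims)
print(merged, is_divergent_outlier(0.20, 0, merged))
```

Taking the widest bounds per bin means a locus is flagged only if it falls outside all three simulated envelopes, which lowers the chance of false positives at the cost of some power.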

Spatial analyses

We tested whether the distribution of genotypes was likely to have arisen by chance, given the spatial structure of stems and habitats, using a method adapted from Harms et al. (2001). We compared the relative abundance of each haplotype in each habitat to its expectation under the null hypothesis of random distribution of haplotypes. The null distribution of each haplotype's relative abundance was simulated by 10 000 torus-translations of stem locations to conserve their spatial pattern. The limits of the neutral confidence interval were defined as values excluding 5% of the highest and lowest values. If the relative frequency of a genotype, determined from the true habitat map, was outside the confidence interval, then it was considered to be statistically associated with the habitat (if the frequency had a positive value) or dissociated from the habitat (negative value). Habitat association of haplotypes and genotypes for each amplicon and of SNPs was tested for each site.
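The logic of the torus-translation test can be sketched in Python as below. This is a simplified illustration, not the procedure actually run for this study: it assumes a rectangular habitat map stored as a grid of square cells, stems given as (x, y, carries_haplotype) tuples, random rather than systematic translations, and a `tail` parameter for the share of null values excluded at each end. All names and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Stem = Tuple[float, float, int]  # x, y, 1 if the stem carries the focal haplotype else 0


def relative_abundance(stems: Sequence[Stem], grid: List[List[str]], cell: float,
                       habitat: str, dx: float = 0.0, dy: float = 0.0) -> float:
    """Share of stems falling in `habitat` that carry the focal haplotype,
    after shifting all stems by (dx, dy) with wrap-around at the plot edges."""
    rows, cols = len(grid), len(grid[0])
    width, height = cols * cell, rows * cell
    carriers = total = 0
    for x, y, carries in stems:
        col = int(((x + dx) % width) // cell) % cols
        row = int(((y + dy) % height) // cell) % rows
        if grid[row][col] == habitat:
            total += 1
            carriers += carries
    return carriers / total if total else float("nan")


def torus_test(stems: Sequence[Stem], grid: List[List[str]], cell: float, habitat: str,
               n_perm: int = 10_000, tail: float = 0.05, seed: int = 0) -> str:
    """Compare the observed abundance with a null built from random torus translations."""
    rng = random.Random(seed)
    width, height = len(grid[0]) * cell, len(grid) * cell
    observed = relative_abundance(stems, grid, cell, habitat)
    null = [relative_abundance(stems, grid, cell, habitat,
                               rng.uniform(0, width), rng.uniform(0, height))
            for _ in range(n_perm)]
    null = sorted(v for v in null if v == v)  # drop NaN (habitat left empty by the shift)
    if not null:
        raise ValueError("no usable translation")
    low = null[int(tail * len(null))]
    high = null[max(int((1 - tail) * len(null)) - 1, 0)]
    if observed > high:
        return "associated"
    if observed < low:
        return "dissociated"
    return "neutral"


# Toy 20 m x 20 m plot with 10 m cells: left column SH, right column VD.
grid = [["SH", "VD"], ["SH", "VD"]]
stems = [(5, 5, 1), (5, 15, 1), (15, 5, 0), (15, 15, 0)]
print(relative_abundance(stems, grid, 10.0, "SH"), torus_test(stems, grid, 10.0, "SH", n_perm=200))
```

Shifting all stems together preserves their relative positions, so the null distribution retains the spatial clustering of stems while breaking any link between genotype and habitat.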

Spatial genetic structure was assessed at plot scale using directional spatial autocorrelation analyses (Epperson, 2003) of the pairwise kinship coefficient between individuals (fij; Loiselle et al., 1995), which was computed for haplotypes and individual SNPs. Calculations were performed with SPAGeDi (Hardy & Vekemans, 2002). Kinship coefficient values were computed for a set of nine 20 m-wide distance intervals (from 0 to 180 m) and the significance of the slope of fij as a function of geographical distance was tested based on the permutation procedure implemented in SPAGeDi with 10 000 permutations. Significance of negative slopes (indicating that genetic similarity decreases with geographical distance) was tested at one-tailed α = 5% with Bonferroni correction for multiple tests. Directional autocorrelation was performed by taking into account all and only the pairs of points connected by a segment aligned in the desired direction, with a tolerance of π/12 radians on each side. The matrix of distances for suitable pairs of points was computed using an R script written for this purpose and available from the authors. Autocorrelation was performed: (a) for Paracou, along the (orthogonal) X and Y axes indicated in Fig. 1a, and omnidirectionally, for the whole plot as well as for its Northern and Eastern halves (Fig. 1a; these sub-plots were included in the analyses because visual inspection of the landscape revealed that they contained a habitat gradient along one of the axes); (b) for Nouragues, along the X and Y axes indicated in Fig. 1b and omnidirectionally. The Y axis corresponds to the presumed direction of the environmental gradient for all cases except the Northern half of the Paracou plot, where the presumed cline direction is the X axis (for the whole plot in Paracou, despite the ruggedness of the pattern, the proportion of points sampled in the VD condition steadily increases along the Y axis; Fig. 1a).
SGS was conservatively considered as anisotropic (i.e. strength of autocorrelation varied between directions) when the slope of fij values with distance was significant in one, but not the other, of the mono-directional tests, although some degree of autocorrelation is expected to occur in all directions due to neutral processes, such as limitations to seed and pollen dispersal.
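The selection of pairs for the directional analyses can be illustrated with the short Python sketch below. It is not the R script mentioned above; the function names, the convention of expressing the axis as an angle in radians and the toy coordinates are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def aligned(p: Point, q: Point, axis_angle: float, tol: float = math.pi / 12) -> bool:
    """True if the segment p-q lies within `tol` radians of the axis direction.

    A segment has no orientation, so angles are compared modulo pi.
    """
    angle = math.atan2(q[1] - p[1], q[0] - p[0])
    diff = (angle - axis_angle) % math.pi
    return min(diff, math.pi - diff) <= tol


def directional_pairs(coords: Sequence[Point], axis_angle: float,
                      tol: float = math.pi / 12) -> List[Tuple[int, int, float]]:
    """All index pairs (i, j, distance) whose connecting segment follows the axis."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if coords[i] != coords[j] and aligned(coords[i], coords[j], axis_angle, tol):
                pairs.append((i, j, math.dist(coords[i], coords[j])))
    return pairs


# Hypothetical stem coordinates in metres; axis angle 0 is X, pi/2 is Y.
coords = [(0.0, 0.0), (100.0, 5.0), (3.0, 80.0)]
print(directional_pairs(coords, 0.0))
print(directional_pairs(coords, math.pi / 2))
```

The retained distances can then be binned into the 20 m classes used for the kinship-distance regression; pairs falling outside both tolerance cones contribute only to the omnidirectional analysis.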

Results

DNA polymorphism

Sequence polymorphism data obtained from the seven EST loci for the two experimental sites are shown in Table 2. The total number of polymorphic sites per amplicon ranged from 6 for HYP5 to 31 for FTase and the number of haplotypes per amplicon ranged from 10 for HYP5 to 36 for FTase. The average nucleotide diversity θπ across polymorphic fragments was 0.00254 and varied from 0.00111 for PIP1.2 to 0.00445 for PIP1.1. The average of θS was higher (0.0041) than that of θπ; values ranged from 0.00188 for PIP1.2 to 0.01126 for FTase, which had the highest number of SNPs and haplotypes. In this amplicon, the great majority of SNPs were nonsynonymous (18), one was tri-allelic and one was a heterozygous singleton coding for a termination codon that shortens the protein sequence by its last 10 amino acids.

Table 2. DNA polymorphism for seven amplicons across the two sampling sites. N: number of analysed haploid sequences, L: sequence length, ST: number of polymorphic sites, h: number of haplotypes; Hd: Nei's gene diversity computed on haplotype frequencies; θπ, θW: estimates of population-level diversity based on the average number of pairwise differences per site between sequences and on the number of segregating sites respectively
Amplicon   N     L     ST   h    Hd      θπ (×10³)   θW (×10³)
CAT        309   657   8    15   0.567   1.53        1.93
FTase      578   397   31   36   0.637   3.12        11.26
HYP5       642   361   6    10   0.618   2.12        2.36
PIP1.1     456   552   17   23   0.796   4.45        4.60
PIP1.2     664   525   7    12   0.394   1.11        1.88
UNK7       482   417   8    13   0.645   3.18        2.84
UNK14      622   412   11   24   0.502   2.30        3.83
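
As a worked check on the diversity estimates, the sketch below computes a per-site Watterson estimator from the number of segregating sites. It assumes the standard definition θW = S / (a_n · L), with a_n the sum of 1/i for i from 1 to n − 1, and it assumes that the FTase row of Table 2 reads N = 578, L = 397 and ST = 31; the function name is hypothetical.

```python
def watterson_theta_per_site(n_sequences, n_segregating, length_bp):
    """Per-site Watterson estimator: S / (a_n * L).

    a_n is the harmonic-type sum over 1..n-1, with n the number of
    haploid sequences sampled.
    """
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return n_segregating / a_n / length_bp

# FTase, under the assumed reading of Table 2
theta_ftase = watterson_theta_per_site(578, 31, 397)
# PIP1.2, assumed reading: N = 664, L = 525, ST = 7
theta_pip12 = watterson_theta_per_site(664, 7, 525)
print(round(theta_ftase * 1e3, 2), round(theta_pip12 * 1e3, 2))
```

Under these assumptions the two values come out close to the 11.26 and 1.88 (×10³) reported for these amplicons, which suggests the table columns are read in the intended order.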

Linkage disequilibrium

We did not find clear evidence of tight linkage disequilibrium (LD) among amplicons. Three loci (CAT, PIP1.1 and PIP1.2) showed significant linkage disequilibrium after Bonferroni correction: CAT and PIP1.1 were associated in Nouragues, and PIP1.1 and PIP1.2 in Paracou. As no pair of loci was in LD in both Nouragues and Paracou, all loci were considered independent. Between 2% (FTase) and 40% (HYP5) of within-amplicon linkage disequilibrium tests between SNP loci were significant at P < 0.01. Decay of linkage disequilibrium within amplicons was rapid (Fig. 2). Nonlinear fitting of the squared correlation of allele frequencies r² as a function of distance between SNPs gave expected values of ~0.10 at 100 bp (determination coefficient for the fitted model R² = 3.5%). Note that the locus with the highest density of SNPs (FTase) also shows the lowest level of linkage disequilibrium, which excludes the possibility that its high level of polymorphism is due to co-amplification of two different loci (which would cause strong linkage disequilibrium between variants belonging to the two loci).
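
For readers unfamiliar with the statistic, the squared correlation of allele frequencies between two biallelic SNPs can be computed from phased haplotypes as r² = D² / (pA(1 − pA) pB(1 − pB)), with D = pAB − pA·pB. The sketch below is a generic illustration of that textbook definition, not the software used in the study; the function name and the 0/1 allele coding are assumptions.

```python
def r_squared(haplotypes):
    """Squared allele-frequency correlation between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased haplotype, with
    alleles coded 0/1 at the first and second SNP (hypothetical coding).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom

complete_ld = r_squared([(0, 0), (1, 1), (0, 0), (1, 1)])
no_ld = r_squared([(0, 0), (0, 1), (1, 0), (1, 1)])
```

Values of r² computed this way for every SNP pair within an amplicon, plotted against the distance in bp between the two SNPs, give the kind of decay curve described above.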

Figure 2. Plot showing the squared correlation of allele frequency (r²) as a function of physical distance between sites for seven amplicons in E. falcata. A nonlinear fitting was performed using Equation 1 (Remington et al., 2001).

Population structure

Population differentiation analyses are presented in Table 3. The average level of genetic differentiation between sites was very low (FST = 0.010), but significantly different from zero (P < 10−5). NST values at the amplicon level ranged from −0.01 to 0.03, with three amplicons having a value significantly different from zero: CAT, FTase and UNK14. The level of global genetic differentiation among habitats within Paracou was similar, with a significant overall FST of 0.01. NST values at the amplicon level ranged from 0.00 to 0.04, with two amplicons showing significant differentiation: CAT and FTase. The situation was similar for pairwise comparisons between environments, with significant divergence for CAT in three cases and for FTase in two cases. In Nouragues, the mean level of genetic differentiation was null. NST values varied between −0.10 and 0.10 among amplicons, with two amplicons (FTase and UNK14) showing significant positive values.

Table 3. Results of the analysis of molecular variance (amova). Genetic differentiation (F-statistics) at the haplotype level for each amplicon (NST) and at the multilocus level (FST) for different hierarchical levels (between sites and among and between environments)
                       Paracou vs.   Paracou   Paracou      Paracou     Paracou      Nouragues
                       Nouragues     Global    VD vs. VLD   VD vs. SH   VLD vs. SH   VD vs. SH
Locus      Statistic
CAT        NST         0.03*         0.04*     0.06*        0.06*       0.02*        −0.12
FTase      NST         0.02*         0.02*     0.04*        0.03*       0.00         0.10*
HYP5       NST         0.00          0.00      0.01         0.01        0.00         0.01
PIP1.1     NST         0.00          0.00      0.01         0.01        0.00         0.00
PIP1.2     NST         0.00          0.01      0.01         0.00        0.00         −0.01
UNK7       NST         −0.01         0.00      −0.01        0.00        0.00         −0.02
UNK14      NST         0.01*         0.00      0.01         −0.01       0.00         0.05*
All loci   FST         0.01*         0.01*     0.01         0.02*       0.01         0.00

All values are F-statistics; the last four Paracou columns and the Nouragues column are pairwise comparisons between environments. Significant values (α = 5%) are indicated by an asterisk.

Outlier detection

The summary-statistic simulation method implemented in fdist2 identified two SNPs of 74 and six of 60 as outliers showing footprints of disruptive selection at the 99% confidence level in Paracou and Nouragues respectively (Fig. 3). The outliers found for the Paracou site belong to two amplicons (CAT and UNK7). Outlier detection by pairs of habitats in Paracou (Supplementary Figure 1) shows that the results obtained in the global analysis are mainly due to divergence between VD and SH. The six outliers detected in Nouragues belong to three amplicons (CAT, FTase and UNK14). One SNP (CAT_S355) was a significant outlier at both sites (outlier detection based on simulations with larger sample sizes provided much more liberal results; Supplementary Figure 2). The more robust Bayesian method, implemented in bayesfst, provided different results at a comparable 90% confidence level: no SNP was significantly different from neutral expectations. However, the SNPs detected as significant by the coalescent-based method showed the highest αi values with the Bayesian method.
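
Outlier detection of this kind compares each SNP's observed FST with the range expected under neutrality for loci of similar heterozygosity. The sketch below computes the two coordinates of one SNP using a simplified Nei-type formulation (FST = (HT − HS) / HT with equally weighted sub-populations). This is an assumption chosen for illustration: fdist2 uses its own estimator and coalescent simulations, neither of which is reproduced here.

```python
def nei_fst_biallelic(subpop_freqs):
    """Return (H_S, F_ST) for one biallelic SNP.

    subpop_freqs: allele frequency of the same allele in each
    sub-population (equal weights assumed; no sample-size correction).
    """
    k = len(subpop_freqs)
    # Mean expected heterozygosity within sub-populations
    h_s = sum(2 * p * (1 - p) for p in subpop_freqs) / k
    # Expected heterozygosity of the pooled population
    p_bar = sum(subpop_freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        raise ValueError("SNP is monomorphic across sub-populations")
    return h_s, (h_t - h_s) / h_t

# Hypothetical SNP with strongly contrasted frequencies in two habitats
h_s, fst = nei_fst_biallelic([0.2, 0.8])
```

A SNP is flagged as an outlier when its FST falls outside the simulated neutral envelope at its heterozygosity value.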

Figure 3. Distribution of observed FST values for each locus as a function of its average within-population heterozygosity (HS). The simulated median (dotted line) and 99% neutral envelope confidence limits (dashed lines), obtained by coalescent simulation, are shown. The names of loci lying outside the neutral envelope are displayed. (a) Paracou (b) Nouragues. Note that the scale on the y axis differs between the two plots.

Spatial genetic structure

Torus translation tests detected 20 significant independent habitat associations (not counting tightly linked SNP loci): nine in Nouragues and 11 in Paracou, of which 13 were for haplotypes and seven for SNPs, of 454 tests (4%) at two-tailed α = 5%; one remained significant after Bonferroni correction (Table 4 and Supplementary Table 1). Six associations with SLD and with VD, as well as one with SH, were detected (Supplementary Table 1), along with six cases of repulsion with SLD and one with VD (no case of repulsion with SH was detected). The most frequent haplotype at the FTase locus (h1) showed strong association with SH and repulsion with SLD in Paracou, and was associated with VD in Nouragues, together with two SNPs of the same gene. In Paracou, one PIP1.1 haplotype (h15) was associated with SLD, and the most common PIP1.2 haplotype (h1), as well as one PIP1.2 SNP (S145), were associated with VD. The repulsion between FTase haplotype h1 and SLD in Paracou was the only test that remained significant after Bonferroni correction.

Table 4. Torus-translation tests for habitat associations on the two study plots
                     Habitat association
                     VD+   VD−   SLD+   SLD−     SH+   SH−   Total
Paracou
  Haplotypes [115]   5     0     1      1*       1     0     8 (1*)
  SNPs [149]         1     1     0      1        0     0     3
Nouragues
  Haplotypes [75]    2     3     NA     NA       0     0     5
  SNPs [115]         3     1     NA     NA       0     0     4
Total                11    5     1      2 (1*)   1     0     20 (1*)

For each plot, 'haplotypes': results for all haplotypes over the seven amplicons; 'SNPs': results for all SNP variants. For each habitat, '+' indicates significant positive association and '−' indicates significant negative association (two-tailed α = 5%). Numbers of significant tests after Bonferroni correction are marked by '*'. For SNPs, only the number of independent (i.e. without linkage disequilibrium) loci is given. Numbers in brackets indicate the number of different haplotypes, genotypes and SNP variants tested. NA = test not applied (the SLD habitat level is absent in Nouragues).
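
The logic of a torus-translation test can be sketched as follows. The habitat map is shifted across the plot as if the plot were wrapped on a torus, which keeps the spatial structure of both the habitats and the trees, and the frequency of a variant among trees falling in the focal habitat is recomputed for each shift. This is a simplified illustration under stated assumptions (a regular grid of habitat cells, translations only, carrier frequency as the test statistic); the function and argument names are hypothetical and the authors' implementation may differ.

```python
import numpy as np

def torus_translation_test(habitat_map, tree_cells, carries_variant, focal_habitat):
    """Return (observed frequency, upper-tail p, lower-tail p).

    habitat_map: 2-D array of habitat labels, one per grid cell.
    tree_cells: list of (row, col) grid cells, one per sampled tree.
    carries_variant: booleans, True if the tree carries the variant.
    """
    habitat_map = np.asarray(habitat_map)
    rows = np.array([r for r, _ in tree_cells])
    cols = np.array([c for _, c in tree_cells])
    carriers = np.asarray(carries_variant, dtype=bool)

    def carrier_frequency(shifted_map):
        in_focal = shifted_map[rows, cols] == focal_habitat
        if not in_focal.any():
            return None  # no sampled tree falls in the focal habitat
        return carriers[in_focal].mean()

    observed = carrier_frequency(habitat_map)
    if observed is None:
        raise ValueError("no sampled tree in the focal habitat")
    n_rows, n_cols = habitat_map.shape
    null = []
    for dr in range(n_rows):
        for dc in range(n_cols):
            if dr == 0 and dc == 0:
                continue  # the untranslated map is the observation
            shifted = np.roll(habitat_map, shift=(dr, dc), axis=(0, 1))
            freq = carrier_frequency(shifted)
            if freq is not None:
                null.append(freq)
    if not null:
        raise ValueError("no usable translation")
    null = np.array(null)
    p_upper = float(np.mean(null >= observed))  # evidence for association
    p_lower = float(np.mean(null <= observed))  # evidence for repulsion
    return float(observed), p_upper, p_lower

# Toy plot: left half is SH, right half is VD; carriers grow only on the left
toy_map = np.array([["SH", "SH", "VD", "VD"]] * 4)
toy_cells = [(r, c) for r in range(4) for c in range(4)]
toy_carriers = [c < 2 for _, c in toy_cells]
obs, p_up, p_low = torus_translation_test(toy_map, toy_cells, toy_carriers, "SH")
```

With 454 tests, a Bonferroni-corrected two-tailed threshold would be far stricter than the nominal 5%, which is consistent with only one test remaining significant.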

Directional and omnidirectional autocorrelation was tested for each individual SNP, and for all amplicons at the haplotype level, at the two sites. After Bonferroni correction, 26 autocorrelograms (of a total of 894, or 3%), involving six SNPs and four amplicons, showed a significant negative slope at the α = 5% threshold (Supplementary Table 2). In eight cases (Fig. 4, Supplementary Table 2), there was significant autocorrelation along the direction of the gradient (Y axis for all tests except for Paracou, Northern Half), but not for the direction orthogonal to the gradient. Two of these tests involved amplicon UNK14 in Nouragues, for one SNP (UNK14_194) and for the whole amplicon; four involved amplicon HYP5 in Paracou, two for one SNP (HYP5_160) and two for the whole amplicon; two involved amplicon CAT in Paracou, one for a SNP (CAT_299) and one for the whole amplicon.
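
The significance of a negative kinship-distance slope can be illustrated with a permutation test in which individuals are reshuffled among spatial positions. The sketch below is a generic, omnidirectional version written under that assumption; it is not the spagedi implementation, and the function name and arguments are hypothetical. A directional variant would restrict the regression to the pairs aligned with the chosen axis.

```python
import numpy as np

def slope_permutation_test(kinship, distances, n_perm=10000, seed=0):
    """One-tailed permutation test for a negative slope of pairwise
    kinship regressed on pairwise geographical distance.

    kinship, distances: symmetric n x n matrices for the same individuals.
    Returns (observed slope, p-value).
    """
    kinship = np.asarray(kinship, dtype=float)
    distances = np.asarray(distances, dtype=float)
    n = kinship.shape[0]
    upper = np.triu_indices(n, k=1)  # each pair counted once
    d = distances[upper]

    def slope(k_matrix):
        return np.polyfit(d, k_matrix[upper], 1)[0]

    observed = slope(kinship)
    rng = np.random.default_rng(seed)
    as_extreme = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Reassign individuals to positions: permute rows and columns together
        permuted = kinship[np.ix_(perm, perm)]
        if slope(permuted) <= observed:
            as_extreme += 1
    p_value = (as_extreme + 1) / (n_perm + 1)
    return float(observed), p_value

# Toy transect: kinship declines linearly with distance
positions = np.array([0.0, 3.0, 7.0, 12.0, 20.0, 31.0, 45.0, 60.0])
toy_dist = np.abs(positions[:, None] - positions[None, :])
toy_kin = -0.01 * toy_dist
obs_slope, p_val = slope_permutation_test(toy_kin, toy_dist, n_perm=199, seed=1)
```

Permuting rows and columns together keeps each individual's set of kinship values intact while breaking its link to a position, which is what makes the null distribution appropriate for pairwise data.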

Figure 4. Directional autocorrelograms of estimated kinship coefficient fij (Loiselle et al., 1995) for all tests that were significant in the direction of the gradient and nonsignificant in the orthogonal direction. Thick lines: observed values. Thin lines: upper and lower 95% neutral confidence limits. Solid lines: plot of (significant) autocorrelation values obtained along the expected gradient (Y axis for all plots except for Paracou, Northern half, for which the gradient is along the X axis). Dotted lines: plot of (nonsignificant) autocorrelation values obtained along the direction orthogonal to the gradient. Left panels: haplotype-level autocorrelograms. Right panels: SNP-level autocorrelograms.

Discussion

The results presented here show patterns of genetic differentiation associated with micro-geographical habitat variations at fine spatial scale in populations of E. falcata. These results were obtained with four independent methods and suggest that divergent selection may be strong even between sub-populations belonging to a continuous population.

The results obtained for each amplicon and each kind of analysis are summarized in Table 5. Analysis of molecular variance (Table 3) shows that the Catalase and Farnesyltransferase genes can reach very high levels of sub-population divergence within a plot (several SNPs displayed strongly negative FST values in Nouragues: because the SH population is much smaller than the VD population, both demographically and in surface area, the lower portion of the VD population may be more similar to SH at neutral loci than to the upper part of VD, owing to neutral SGS; this would generate negative FST, i.e. closer relatedness between alleles from different populations than between alleles from the same population). Coalescent-based outlier detection methods revealed eight SNPs under disruptive selection belonging to four amplicons (although none was significant with the more conservative Bayesian approach). Directional autocorrelation identified three SNPs (and the amplicons they belong to) as significantly associated with the expected direction of the gradient (although none was associated with the orthogonal direction). Finally, allele (or genotype)-by-habitat association tests obtained by torus permutation identified 20 significant tests at the two-tailed 5% threshold; one of these remained significant after Bonferroni correction. Table 5 shows that at least six SNP variants, haplotypes or amplicons turned out to be significant in at least two independent analyses. The Catalase amplicon showed significant results (mostly in Paracou) both at the amplicon and at the SNP variant levels; one SNP of the Catalase amplicon (CAT_S355) was a significant outlier in both populations. Farnesyltransferase displayed significant results at all levels and in both plots, with two SNP variants showing significant results in outlier detection and in torus permutations. The latter analysis, both at the haplotype and at the SNP level, indicates that the detected variants are generally less represented than expected in drought-prone SLD habitats: all significant tests show either association with VD or SH, or repulsion with SLD; the only result that remains significant after Bonferroni correction is the repulsion between FTase haplotype h1 and SLD in Paracou. HYP5 also shows a SNP variant with clear trends of habitat association, as does the UNK14 amplicon.

Table 5. Summary of significant results for all amplicons, all statistics and all sites. Amplicon/Haplotype/SNP level indicates whether the analysis was performed, respectively, for all haplotypes of an amplicon, for each of its haplotypes or for each of its SNPs. For amplicon-level tests, only the names of the populations having shown significant tests are listed. In toroid tests, ‘ass’ and ‘rep’ indicate whether a given haplotype or SNP variant was found to be in association (ass) or in repulsion (rep) with a given habitat
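Catalase (CAT)
  amova, amplicon level: Paracou
  Outlier detection, SNP level: Paracou: CAT_S355; Nouragues: CAT_S355
  Toroidal permutation tests, haplotype level: -
  Toroidal permutation tests, SNP level: -
  Directional spatial autocorrelation, amplicon level: Paracou Eastern half
  Directional spatial autocorrelation, SNP level: Paracou Northern half: CAT_S221

Farnesyltransferase (FTase)
  amova, amplicon level: Paracou; Nouragues
  Outlier detection, SNP level: Nouragues: FTase_S36, FTase_S269
  Toroidal permutation tests, haplotype level: Paracou: h1/SH (ass), h1/SLD (rep)*; Nouragues: h1/VD (ass)
  Toroidal permutation tests, SNP level: Paracou: FTase_S242/SLD (rep); Nouragues: FTase_S36/VD (ass), FTase_S269/VD (ass)
  Directional spatial autocorrelation, amplicon level: -
  Directional spatial autocorrelation, SNP level: -

PIP1.1
  amova, amplicon level: -
  Outlier detection, SNP level: -
  Toroidal permutation tests, haplotype level: Paracou: h15/VLD (ass)
  Toroidal permutation tests, SNP level: -
  Directional spatial autocorrelation, amplicon level: -
  Directional spatial autocorrelation, SNP level: -

PIP1.2
  amova, amplicon level: -
  Outlier detection, SNP level: -
  Toroidal permutation tests, haplotype level: Paracou: h1/VD (ass)
  Toroidal permutation tests, SNP level: Paracou: PIP1.2_S145/VD (ass)
  Directional spatial autocorrelation, amplicon level: -
  Directional spatial autocorrelation, SNP level: -

Hypothetical Protein 5 (HYP5)
  amova, amplicon level: -
  Outlier detection, SNP level: -
  Toroidal permutation tests, haplotype level: Paracou: h10/VD (ass); Nouragues: h2/VD (ass), h8/VD (rep), h11/VD (rep)
  Toroidal permutation tests, SNP level: Paracou: HYP5_S267/VD (rep); Nouragues: HYP5_S160-S201/VD (ass)
  Directional spatial autocorrelation, amplicon level: Paracou; Paracou Eastern half
  Directional spatial autocorrelation, SNP level: Paracou: HYP5_S160; Paracou Eastern half: HYP5_S160

Unknown amplicon 7 (UNK7)
  amova, amplicon level: -
  Outlier detection, SNP level: Paracou: UNK7_S204
  Toroidal permutation tests, haplotype level: Paracou: h1/VD (ass)
  Toroidal permutation tests, SNP level: -
  Directional spatial autocorrelation, amplicon level: -
  Directional spatial autocorrelation, SNP level: -

Unknown amplicon 14 (UNK14)
  amova, amplicon level: Nouragues
  Outlier detection, SNP level: Nouragues: UNK14_S86, UNK14_S194, UNK14_S328
  Toroidal permutation tests, haplotype level: Paracou: h4/VD (ass), h7/VD (ass); Nouragues: h9/VD (rep)
  Toroidal permutation tests, SNP level: Nouragues: UNK14_S86/VD (rep)
  Directional spatial autocorrelation, amplicon level: Nouragues
  Directional spatial autocorrelation, SNP level: Nouragues: UNK14_S194

SNPs that appear as significant in at least two independent analyses are shown in bold. The asterisk (*) indicates the only toroid test that remained significant after Bonferroni correction.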

These results suggest that the forces behind the differentiation between sub-populations are very strong even at short spatial distances, and that these forces are structured by variation in habitat rather than by neutral dispersal processes. The processes underlying the observed divergence occur over distances on the order of a few hundred metres, well within the gene dispersal distances predicted for the genus (Hardy et al., 2006). Therefore, it is likely that at least part of the observed differentiation is caused by disruptive selection (Linhart & Grant, 1996). On the one hand, our findings support the idea that environmental heterogeneity generates genetic heterogeneity within populations. On the other hand, the differences between the results observed in Nouragues and Paracou suggest that the habitat contrasts we have studied are of different kinds. The structure of the gradients may differ between the two plots, as suggested by their differences in topography. Moreover, and more generally, it is likely that environmental conditions other than the limited set of edaphic properties that we have taken into account differentiate habitats in the two sites.

Differences among the results obtained with the three methods suggest that each captured different aspects of the distribution of genetic diversity. For instance, both the outlier detection and torus permutation tests stress the idea of differences in gene frequencies between (sub)populations, but the latter also takes into account the spatial distribution of genotypes; moreover, autocorrelation rests on the explicit spatial layout of pairwise individual relatedness, while ignoring population-level distributions (except for the determination of neutral envelopes). Thus, the three methods may be able to detect different patterns, which in turn may be the result of different dynamic processes: outlier detection methods stress the quantitative difference between the effects of selection and drift on divergence between groups; torus-based tests also compare groups, but stress departures from random distribution of individual variants; autocorrelation methods detect departures from the random distribution of individual relationships and test for continuous turnover of genotypes.

As our analyses are based on seven loci only, a possible source of incoherence among results obtained with the three methods may also lie in limited robustness. Seven loci certainly are far from providing a satisfactory representation of the whole transcriptome, let alone of the genome. Even without the ambition to evaluate genome-level processes, our study nevertheless shows that genetic divergence can be detected at the within-stand level, at least for some loci. Moreover, the robustness of each of the three methods used here resides in the number of SNPs (not ESTs) for (i) outlier detection, and in the number of genotypes per locus (not in the number of loci, which are analysed individually) for (ii) torus-translation and (iii) autocorrelation analyses. For the latter analysis, it is not uncommon to obtain results from data sets containing between five and 10 loci (Collevatti & Hay, 2011; Oddou-Muratorio et al., 2010).

The partial incoherence shown by the results suggests a pattern of moderate divergence affecting multiple loci, occurring at the micro-geographical scale in relation with habitat conditions. It is important to underline that the diffuse signal of divergence that we observe must not be interpreted as straightforward indication of disruptive selection acting upon the observed loci. Other mechanisms, such as isolation by adaptation (Nosil et al., 2008), genomic hitch-hiking (Via & West, 2008) or partial restrictions to mating (e.g. by environmentally cued flowering time differences) may produce moderate levels of divergence at neutral loci. Such divergence is observed against a background of overall weak but diffuse SGS patterns (Supplementary Table 2: omnidirectional autocorrelation is significant for ‘all loci’ in three cases of four), probably caused by limited pollen and seed dispersal, as already observed at Paracou in a closely related species (Hardy et al., 2006). Population structuring is not, however, strong enough to prevent long-term genetic mixing, as shown by the rate of decay of intragenic linkage disequilibrium. The pattern shown in Fig. 2 indicates that historical genetic mixing at the population (and species) level is globally as intense as in other angiosperm tree species (e.g. Ingvarsson, 2008) and at least as intense as in most conifers (Brown et al., 2004; Gonzalez-Martinez et al., 2006a; Heuertz et al., 2006). Current mixing at the stand level appears to be relatively intense, because only a minority of loci showed significant spatial autocorrelation and because FST values between sub-populations were overall small. This is not inconsistent with the possibility that the observed divergence is caused by selection, because moderate levels of gene flow may facilitate divergence, as indicated by theoretical predictions of divergence with gene flow (Goudet et al., 2009; Kremer & Le Corre, 2012). 
Moreover, it is possible that the weak but detectable background spatial genetic structure contributes to creating the conditions for divergence to operate: preferential mating between spatially close trees would tend to enrich sub-populations with locally adapted genotypes, thus enhancing the outcome of ecological filtering and facilitating sub-population divergence. This hypothesis can be put to the test by building spatially explicit, individual-based models describing simultaneously pollination, seed dispersal and selection in divergent habitat patches (e.g. by combining models for dispersal and selection in continuous environmental patches (Débarre & Gandon, 2010) with models for pollen and seed dispersal (Klein & Oddou-Muratorio, 2011)).

The indication of the action of diversifying processes observed in E. falcata motivates further studies on the genetics of divergence, which will need to take advantage of population genomic approaches (Luikart et al., 2003) now accessible to nonmodel species in general (Ekblom & Galindo, 2011) and to trees in particular (Gonzalez-Martinez et al., 2006b). To take advantage of the wealth of data that can be produced by genomic approaches, these studies will need to be matched by breakthroughs in the modelling of processes of divergence with gene flow. The combination of theoretical advances and large data sets will make it possible to disclose the mechanisms underlying patterns of ecological-genetic divergence such as those observed in E. falcata, and perhaps ultimately provide the key to understanding the maintenance of reservoirs of adaptive variation in natural populations.

Authorship

DA, CSS and IS contributed to experimental conception and setup; DA, ST, CSS and IS contributed to sampling strategy choice and sampling; DA and LB contributed to marker development and sequence data collection; DA, LB, ST, CSS and IS contributed to data analyses. All authors contributed to the writing of the manuscript.

Acknowledgments

The authors gratefully acknowledge the CIRAD field assistants for their contribution to data collection (M. Baisie, A. Etienne, K. Ficadici, F. Kwasie, K. Martinus, O. N'Gwete, P. Naisso, P. Pétronelli and R. Santé) and A. Jolivot for her work in data checking and data management. The authors wish to thank Saint-Omer Cazal for DNA extractions and Valérie Troispoux for her helpful technical assistance. Part of the experiments presented in the present publication was performed at the Genotyping and Sequencing facility of Bordeaux (grants from the Conseil Régional d'Aquitaine no 20030304002FA and 20040305003FA and from the European Union, FEDER no 2003227). This work was funded by the EU-funded INCO ‘SEEDSOURCE’ project, by the French Ministry of Ecology and Sustainable Development ‘ECOFOR – ECOSYSTEMES TROPICAUX’ program and by the EU-funded PO-FEDER ‘ENERGIRAVI’ program.

The Authors also wish to thank Olivier Hardy and Pauline Garnier-Géré for discussions on the methods and results reported here and for critically reading earlier versions of the manuscript, and Lino Ometto as well as two anonymous reviewers for carefully and critically reading the submitted manuscript.

References

Supporting Information


Filename: jeb12069-sup-0001-FigureS1-S2-TableS1-S2-DataS1.docx
Format: Word document
Size: 126K
Description:

Table S1 Results of toroidal permutation tests: list of individual SNPs and haplotypes having shown at least one significant habitat association.

Table S2 Slopes of spatial autocorrelation plots. Only results for loci whose tests provided at least one negative, significant slope are displayed.

Figure S1 Distribution of observed FST values for each locus (y axis) as a function of its heterozygosity (HS) (x axis) for pairs of habitats at the Paracou site (see Figure 1).

Figure S2 Outlier detection tests (a) for Paracou with sample size N = 170 and (b) for Nouragues with sample size N = 150.

Data S1 Description of genes and polymorphisms for genes with known functions.