Plants offer excellent models for investigating how gene flow shapes the organization of genetic diversity. Their three genomes can have different modes of transmission and will hence experience different levels of gene flow. We have compiled studies of genetic structure based on chloroplast DNA (cpDNA), mitochondrial DNA (mtDNA) and nuclear markers in seed plants. Based on a data set of 183 species belonging to 103 genera and 52 families, we show that the precision of estimates of genetic differentiation (GST) used to infer gene flow is mostly constrained by the sampling of populations. Mode of inheritance appears to have a major effect on GST: maternally inherited genomes experience considerably more subdivision (median value of 0.67) than paternally or biparentally inherited genomes (∼0.10). GST values at cpDNA and mtDNA markers covary closely when both genomes are maternally inherited. GST values at paternally and biparentally inherited markers also covary positively, but more loosely, whereas GST values at maternally inherited markers are largely independent of those based on nuclear markers. A model-based gross estimate suggests that, at the rangewide scale, historical levels of pollen flow are generally at least an order of magnitude larger than levels of seed flow (median of the pollen-to-seed migration ratio: 17) and that pollen and seed gene flow vary independently across species. Finally, we show that measures of subdivision that take into account the degree of similarity between haplotypes (NST or RST) make better use of the information inherent in haplotype data than standard measures based on allele frequencies only.
Gene flow shapes the organization of genetic diversity within and among populations (e.g. Wright 1931). In contrast to many vertebrates, seed plants have intrinsically complex and asymmetrical dispersal behaviours. Because adults are fixed, dispersal is mediated by two very distinct vehicles that operate in sequence, the male gametophyte (pollen) and the young sporophyte (seed). Another peculiarity of plants is that two organelle genomes coexist in the cytoplasm of their cells: the mitochondrial genome, nearly ubiquitous in eukaryotes, and the plastid genome, specific to plants. The modes of inheritance of these two genomes are varied and not always coincident, ranging from strictly maternal to strictly paternal (Harris & Ingram 1991; Reboud & Zeyl 1994; Mogensen 1996; Röhr et al. 1998). The contrasting patterns of inheritance of organelle and nuclear genes can be used to unravel the complexity of gene flow in plants, as they are predicted to result in very different distributions of genetic diversity within and among populations (Birky et al. 1989; Petit et al. 1993). Inferences about the relative importance of the two main components of dispersal, seed and pollen, can therefore be drawn from theoretical models that relate levels of gene flow to population genetic structure (e.g. Petit 1992; Ennos 1994; Hu & Ennos 1997; Oddou-Muratorio et al. 2001; Hamilton & Miller 2002). However, these models are based on assumptions that are unlikely to be met in natural systems, so interpretations must be made with care.
Relying on a large body of case studies should help evaluate the extent to which these models are useful. For more than two decades, research on plant organelle DNA (oDNA) diversity lagged behind that on animal mtDNA (Soltis et al. 1997; Schaal et al. 1998; Brunsfeld et al. 2001; Petit & Vendramin 2005). The situation has changed during the last 10 years, following the emergence of efficient molecular techniques to identify and screen oDNA diversity. Sufficient data are now available to examine the realism of several predictions made by theoretical models and to provide reference values for comparative purposes. A general introduction to oDNA variation in plants has been published recently (Petit & Vendramin 2005), but there has been no comprehensive review so far of the primary data generated during population surveys of oDNA variation in plants, despite some early or limited attempts (e.g. Ennos 1994; McCauley 1995; El Mousadik & Petit 1996; Ouborg et al. 1999; Morjan & Rieseberg 2004).
Here we compile studies that report the partitioning of genetic diversity within and among populations for organelle genes in seed plants and, whenever possible, identify estimates from the same species based on nuclear markers. We first examine the precision of estimates of population subdivision and test whether it is more limited by the sampling of populations than by the sampling of individuals within populations, as suggested earlier (Pons & Petit 1995). The findings should be useful for planning future studies of organelle diversity. Furthermore, we ask whether biases are introduced into GST estimates by the genotyping method or by the examination of only a restricted part of the range. We then estimate mean levels of differentiation at markers having different modes of inheritance (maternal, paternal or biparental) and examine how these measures of subdivision covary across species. Relying on previously developed (neutral) equilibrium expectations should provide rough indirect measures of the relative importance of gene flow through pollen vs. seed, and hence some insight into how gene flow is generally achieved in plants. For instance, does gene flow take place predominantly through pollen, as often assumed (e.g. Levin & Kerster 1974; Ellstrand 1992)? Is there some compensation between the two components of gene flow, with plants relying little on seed gene flow being subject to correspondingly higher levels of pollen gene flow, and vice versa? Because a minimum rate of gene flow is likely to be necessary for species cohesion and survival, some compensation might exist, particularly in species with very low dispersal rates. On the other hand, pollen and seed flow do not play equivalent ecological roles, and other processes could overwhelm any compensatory effects between these two components of dispersal. Empirical data are therefore needed to answer this fundamental question, which has so far received little if any attention.
Finally, we test whether a phylogeographical structure at organelle genes is a general feature of seed plants, by systematically comparing the partitioning of organelle DNA diversity among populations when similarities between haplotypes are taken into account (NST or RST) and when they are not (GST). No clear consensus exists regarding the usefulness of taking allele similarity into account when empirically measuring population genetic structure (e.g. Gaggiotti et al. 1999; Balloux & Lugon-Moulin 2002), so this question seems relevant.
Materials and methods
The primary literature was searched until April 2004 for population studies of seed plant oDNA diversity, using bibliographical databases, checking references in published papers and contacting colleagues. Only studies with ≥ 5 populations and having ≥ 2 individuals per population analysed were considered. We sought to obtain the raw frequencies of haplotypes in populations and their full molecular characterization by examining published information or directly soliciting it from the authors. When several surveys were available for a single species, we included only those based on the highest number of populations and covering the largest fraction of the range, to avoid attributing too much weight to a few well-studied taxa. When a suitable oDNA entry was detected, we further attempted to identify a corresponding survey based on nuclear markers, by screening the literature for results from that particular species. Here as well, priority was given to range-wide studies.
The following information was recorded for each species: taxonomic group (gymnosperm or angiosperm), family, genome investigated (cpDNA, mtDNA or nuclear), its known or presumed predominant mode of inheritance (paternal, maternal or biparental), molecular technique used [the most frequent were: Probe-RFLP (restriction fragment length polymorphism, where the fragments are identified using a labelled DNA fragment, the probe), Purif-RFLP (RFLP on DNA purified after isolation of organelles), PCR-RFLP (PCR amplification with specific primers followed by RFLP), SSR (simple sequence repeats, also called microsatellites), sequencing, AFLP (amplified fragment length polymorphism), RAPD (random amplified polymorphic DNA) and isozymes], proportion of the range sampled (local; regional, i.e. less than half of the range covered; or species-wide), number of populations investigated, total number of individuals sampled, arithmetic and harmonic mean number of individuals per population, number of haplotypes detected (for oDNA studies), number of ‘characters’ studied (number of polymorphic bands, variable nucleotide sites or SSRs for oDNA; number of loci for nuclear markers), and diversity statistics. For nuclear data, these included mean expected heterozygosity HE, FIS, and FST, GST or ΦST, as provided in the papers. Although it would be preferable to use identical measures of genetic subdivision, FST, GST and ΦST estimate the same quantity and the differences between them are generally quite small (Morjan & Rieseberg 2004). For oDNA data, the diversity statistics were computed by us from the raw data as described below. In the few cases where this proved impossible, the estimates provided in the paper were used instead.
For oDNA, the following parameters were included in the database: total diversity HT, GST and its standard error, NST and RST and their standard errors (whenever appropriate: for NST, when there were ≥ 3 studied characters and when the description of haplotypes was available; for RST, when details on length variation at microsatellite loci were available). The program permut (by RJP, available at http://www.pierroton.inra.fr/genetics/labo/Software/) was used to estimate the parameters of population subdivision GST and NST and their standard errors. The difference between these two parameters is that NST takes similarities between haplotypes into account, contrary to GST (Pons & Petit 1996). These similarities were measured by counting the number of characters that differ between all pairs of haplotypes. The difference between NST and GST was tested through 1000 random permutations of haplotype identity (Burban et al. 1999). In the case of chloroplast microsatellites (cpSSRs), we used the program cpssr (available at http://www.pierroton.inra.fr/genetics/labo/Software/) to derive RST. This program is essentially identical to permut except that the distance between haplotypes is adapted to the mode of evolution of SSRs (it is the sum across loci of the squared differences in number of repeats). Note that the estimators of these parameters, proposed by Pons & Petit (1995, 1996), are based on a random model of population variation, instead of the fixed model procedure assumed by Nei & Chesser (1983). Hence, the variation resulting from the sampling of populations is taken into consideration, allowing comparisons across species. On the other hand, all populations are given the same weight, regardless of their sample sizes, as in Nei & Chesser (1983), but contrary to Weir & Cockerham (1984), where populations with higher sample sizes are given more weight. 
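The logic behind GST, NST and RST can be illustrated with a minimal sketch. It assumes plug-in (sample-frequency) estimates and does not reproduce the small-sample corrections or standard errors of the Pons & Petit (1995, 1996) estimators as implemented in permut; all function names are hypothetical. Each measure is computed as 1 − vS/vT, where v is the expected distance between two haplotypes drawn at random: unit distances between distinct haplotypes give GST, counts of differing characters give NST, and summed squared repeat-number differences give RST. Populations are weighted equally, and the permutation test shuffles haplotype labels in the distance matrix, in the spirit of Burban et al. (1999).

```python
import random


def diversity(freqs, dist):
    """Return (v_s, v_t) from per-population haplotype frequencies.

    freqs: one list of haplotype frequencies (summing to 1) per population.
    dist:  square matrix of pairwise haplotype distances (0 on the diagonal).
    Every population gets the same weight, whatever its sample size.
    """
    k = len(dist)
    n_pop = len(freqs)

    def v(p):
        # expected distance between two haplotypes drawn at random from p
        return sum(p[i] * p[j] * dist[i][j] for i in range(k) for j in range(k))

    v_s = sum(v(p) for p in freqs) / n_pop
    mean_freq = [sum(p[i] for p in freqs) / n_pop for i in range(k)]
    return v_s, v(mean_freq)


def subdivision(freqs, dist):
    """1 - v_s / v_t: G_ST with unit distances, N_ST or R_ST otherwise."""
    v_s, v_t = diversity(freqs, dist)
    if v_t == 0:
        raise ValueError("no total diversity: subdivision is undefined")
    return 1.0 - v_s / v_t


def unit_distance(k):
    """All distinct haplotypes equally different (yields G_ST)."""
    return [[0 if i == j else 1 for j in range(k)] for i in range(k)]


def character_distance(haplotypes):
    """Number of characters differing between each pair of haplotypes (N_ST)."""
    return [[sum(a != b for a, b in zip(h1, h2)) for h2 in haplotypes]
            for h1 in haplotypes]


def ssr_distance(haplotypes):
    """Sum over loci of squared differences in repeat number (R_ST)."""
    return [[sum((a - b) ** 2 for a, b in zip(h1, h2)) for h2 in haplotypes]
            for h1 in haplotypes]


def permutation_test(freqs, dist, n_perm=1000, seed=1):
    """One-sided test of N_ST > G_ST by permuting haplotype identities."""
    rng = random.Random(seed)
    observed = subdivision(freqs, dist)
    k = len(dist)
    hits = 0
    for _ in range(n_perm):
        order = list(range(k))
        rng.shuffle(order)
        shuffled = [[dist[order[i]][order[j]] for j in range(k)] for i in range(k)]
        if subdivision(freqs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: haplotypes A and B are similar, C is distant.
haps = [[0, 0, 0], [1, 0, 0], [1, 1, 1]]
pops = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
g_st = subdivision(pops, unit_distance(3))          # 0.6
n_st = subdivision(pops, character_distance(haps))  # about 0.818
```

In this toy case NST exceeds GST because the haplotypes shared within the first population are more similar to each other than to the haplotype fixed in the second population.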
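When sample sizes are identical in all populations, the estimator of GST proposed by Pons & Petit (1995) becomes identical to Weir & Cockerham's (1984) estimator of θ.
The pollen-to-seed migration ratio (mp/ms) was estimated following Ennos (1994):

$$\frac{m_p}{m_s} = \frac{\left(\frac{1}{G_{STb}} - 1\right)\left(1 + F_{IS}\right) - 2\left(\frac{1}{G_{STm}} - 1\right)}{\frac{1}{G_{STm}} - 1} \qquad (1)$$

where GSTm and GSTb correspond to the estimates of subdivision at maternally inherited markers and at nuclear (biparentally inherited) markers, and FIS is the heterozygote deficit estimated with nuclear codominant markers. When information from paternally and biparentally inherited markers is available (e.g. in conifers), this ratio can also be estimated as follows:

$$\frac{m_p}{m_s} = \frac{2\left(\frac{1}{G_{STp}} - 1\right) - \left(\frac{1}{G_{STb}} - 1\right)}{\left(\frac{1}{G_{STb}} - 1\right) - \left(\frac{1}{G_{STp}} - 1\right)} \qquad (2)$$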
assuming strict allogamy, with GSTp corresponding to the estimate of subdivision at paternally inherited markers. Interpretation of mp/ms values requires particular caution because of a number of unrealistic assumptions underlying the island model used to derive Equations 1 and 2. These assumptions include (1) no mutation and no selection; (2) equal number of migrants among all populations, implying a lack of spatial genetic structure; (3) identical male and female effective population sizes; and (4) equilibrium between genetic drift and gene flow (e.g. Whitlock & McCauley 1999; Oddou-Muratorio et al. 2001).
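The two migration-ratio estimators can be sketched directly. This sketch assumes the island-model relations 1/GSTm − 1 = 2N·ms, 1/GSTp − 1 = 2N·(ms + mp) and (1/GSTb − 1)(1 + FIS) = 4N·ms + 2N·mp, which are consistent with the limits on GSTb discussed in the Results; the function names are hypothetical. Generating GST values from chosen migration rates and recovering the ratio checks that both formulas invert the model.

```python
import math


def mp_ms_maternal(gst_m, gst_b, f_is=0.0):
    """Equation 1: pollen-to-seed migration ratio from GSTm and GSTb."""
    seed_term = 1.0 / gst_m - 1.0                        # 2N * ms
    if seed_term == 0.0:
        return math.inf                                  # GSTm = 1: ratio undefined
    nuclear_term = (1.0 / gst_b - 1.0) * (1.0 + f_is)    # 4N * ms + 2N * mp
    return (nuclear_term - 2.0 * seed_term) / seed_term


def mp_ms_paternal(gst_p, gst_b):
    """Equation 2 (strict allogamy, FIS = 0): ratio from GSTp and GSTb."""
    paternal_term = 1.0 / gst_p - 1.0                    # 2N * (ms + mp)
    nuclear_term = 1.0 / gst_b - 1.0                     # 4N * ms + 2N * mp
    return (2.0 * paternal_term - nuclear_term) / (nuclear_term - paternal_term)


# Hypothetical equilibrium values for N = 50, ms = 0.01, mp = 0.17.
n, ms, mp = 50, 0.01, 0.17
gst_m = 1 / (1 + 2 * n * ms)
gst_b = 1 / (1 + 4 * n * ms + 2 * n * mp)
gst_p = 1 / (1 + 2 * n * (ms + mp))
ratio_1 = mp_ms_maternal(gst_m, gst_b)   # close to 17
ratio_2 = mp_ms_paternal(gst_p, gst_b)   # close to 17
```

Because both estimators invert the same idealized model, they agree on simulated equilibrium values; with real data they need not, as the assumptions listed above are rarely met.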
Most statistical analyses were performed with SYSTAT version 10.2. Standard parametric tests were used after transforming data to meet normality assumptions; in particular, GST data were arcsine square-root transformed before further analyses. Because the untransformed GST and mp/ms values were not normally distributed, we report medians and their quartiles rather than arithmetic means, as this permits a more straightforward comparison of categories (see also Morjan & Rieseberg 2004).
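The transformation and the robust summaries described above can be sketched with the Python standard library. This is an illustration only, not the SYSTAT procedure used in the analyses; the function names are hypothetical, and the choice of the "inclusive" quartile method is an assumption (quartile conventions differ slightly between packages).

```python
import math
import statistics


def arcsine_sqrt(p):
    """Variance-stabilizing transform for proportions such as GST (0 <= p <= 1)."""
    return math.asin(math.sqrt(p))


def median_and_quartiles(values):
    """Return (median, Q25, Q75) using linear interpolation between order statistics."""
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q25, q75


transformed = [arcsine_sqrt(g) for g in (0.1, 0.25, 0.9)]
summary = median_and_quartiles([0.1, 0.2, 0.3, 0.4, 0.5])  # median 0.3, quartiles 0.2 and 0.4
```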
Data were obtained for 183 species belonging to 103 genera and 52 families (Table 1). The raw data and the list of source references are provided in Table S1. Most studies have been published recently (30% since 2003) in a variety of journals (29 in total), with at most 27% of all entries in a single journal (Molecular Ecology). Population studies based on organelle markers in plants have relied more on cpDNA markers (175 species) than on mtDNA markers (34 species), as stressed elsewhere (Petit & Vendramin 2005). Twenty-six species have been investigated with both cpDNA and mtDNA markers. Gymnosperms have been relatively well studied (42 of the c. 600 existing species investigated). They are particularly interesting models for studies of gene flow, as many species have paternally inherited cpDNA and maternally inherited mtDNA (whereas both genomes are maternally inherited in most angiosperms). Raw data on haplotype frequencies were obtained for a large fraction of entries (92%). The most frequently used technique for studying oDNA variation has been PCR-RFLP (46%), followed by Probe-RFLP (28%), microsatellites (11%), direct sequencing (9%) and cpDNA purification followed by RFLP (4%). In terms of sampling, the median values across studies were 157 individuals screened in total, distributed in 14 populations with an average of 10 individuals per population, allowing the identification of approximately seven haplotypes on the basis of around seven polymorphic ‘characters’. However, sampling strategies were very heterogeneous across species, and each of these parameters varied by at least one order of magnitude. Studies of genetic structure based on nuclear markers were identified for 119 of the 183 species (65%), most of them (72%) based on isozyme markers (Table S2).
Table 1. Number of entries in the database, classified according to taxonomy, genome and mode of inheritance
Total number of species represented; this is lower than the sum of the values to its left because a given species may have been studied with markers from more than one genome.
Relationships between parameters
As expected, the total number of individuals analysed per species is positively related to the number of populations sampled (Pearson r = 0.62, n = 207, P << 0.001) and, to a lesser extent, to the mean sample size within populations (r = 0.44, P << 0.001). There is a negative correlation between the number of populations investigated and the number of individuals per population (r = −0.42, P << 0.001), reflecting the trade-off between sampling many individuals per population and sampling many populations. On the other hand, sampling did not affect GST (all correlations between sample sizes and GST lie between −0.1 and 0.1), as expected for an unbiased estimate. However, there was a significant negative correlation between GST and the number of haplotypes (r = −0.44, n = 171, P << 0.001). Two other variables affect GST at maternally inherited markers: the molecular technique used and the coverage of the range. The effect of the technique (one-way ANOVA: F = 3.03, d.f. = 5, n = 171, P = 0.01) mostly resulted from the higher GST of the eight studies based on purification of cpDNA followed by restriction enzyme digestion. Similarly, the effect of range coverage (F = 3.93, d.f. = 2, P = 0.02) resulted from the lower GST of species studied in a restricted area (i.e. locally) compared with those studied across a larger fraction of the species’ range. For nuclear data, an effect of the technique was also noted (F = 5.11, d.f. = 6, n = 110, P < 0.001). It resulted from a higher mean subdivision in studies based on RAPD (11 cases), whereas estimates of genetic subdivision based on the other techniques were smaller and of similar magnitude (results not shown). By contrast, no effect of range coverage was noted with these markers (F = 1.98, d.f. = 2, n = 109, P = 0.14).
Precision of GST estimates
The precision (standard error) of the estimates of population subdivision (GST) at organelle genes was also investigated. The 162 available estimates of the standard error of GST (at both maternally and paternally inherited markers) were plotted as a function of the number of populations, the total number of individuals analysed per species and the harmonic mean sample size per population (Fig. 1). The best predictor of the standard error of GST (as assessed by the coefficient of determination R2) was the total number of individuals analysed, followed by the number of populations and the mean sample size per population (all after log transformation). The relation between total diversity HT and the standard error of GST was also estimated; it was negative and significantly different from zero (Pearson r = −0.22, P < 0.001).
Selection of studies for further analysis
The above results pointed to two possible sources of bias: the inclusion of species sampled in too restricted a part of their range and the inclusion of studies relying on prior purification of cpDNA. In the latter case, we attribute the bias to the fact that, in five out of eight cases, sampled individuals had been pooled to minimize the number of DNA isolation and purification experiments. Although the authors of these papers claimed that they would have detected any mixture of haplotypes from the resulting banding pattern (e.g. Soltis et al. 1991), some bias seems likely. All studies in which such a procedure had been used were therefore discarded from further analysis (12 in total, all based on maternally inherited markers; in fact, the authors detected some intrapopulation variation in only one of these 12 studies). Similarly, the seven oDNA studies in which only a small part of the species range had been investigated were removed, as well as one study with a particularly low level of total diversity (cf. our finding that a low level of diversity reduces the precision of GST). This left a total of 152 entries based on maternally inherited markers, corresponding to 144 different species, as results from both maternally inherited cpDNA and mtDNA markers were available in eight cases. No study involving paternally inherited markers was deleted (grand total of 166 species).
The effect of taxonomic identity on GST was investigated using a nested ANOVA with ‘family’ and ‘genus nested within family’ as factors. For maternally inherited markers, the model explained 73% of the variance in GST. For paternally inherited markers, only the genus effect could be tested, because most species involved belonged to the Pinaceae; it was not significant.
Genetic subdivision at markers with different modes of inheritance
The distributions of GST values and their means were computed separately for angiosperms and gymnosperms and for maternally, paternally or biparentally inherited markers (Table 2; Fig. 2). The mode of inheritance has a major effect on the partitioning of genetic diversity: in both gymnosperms and angiosperms, studies based on maternally inherited markers have considerably higher GST than those based on paternally or biparentally inherited markers. On the other hand, there is no significant difference between GST at biparentally inherited markers and at paternally inherited markers in gymnosperms (separate variance t-test: t = −0.91, d.f. = 54.4, P = 0.37). At maternally inherited markers, gymnosperms have significantly higher GST than angiosperms (t = −2.35, d.f. = 36.1, P = 0.024), whereas at nuclear markers angiosperms have significantly higher GST than gymnosperms (t = 2.66, d.f. = 77.2, P = 0.009).
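The separate-variance (Welch) t-tests reported above, with their fractional degrees of freedom, can be sketched with the Python standard library. The sketch assumes the inputs are already arcsine square-root transformed GST values, and it returns only t and the Welch-Satterthwaite degrees of freedom; a P value additionally requires the t distribution, for example via scipy.stats.ttest_ind(a, b, equal_var=False). The function name is hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Separate-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)   # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)   # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # t about -1.90, df about 5.88
```

Unlike the pooled-variance test, this version does not assume equal variances in the two groups, which is why the degrees of freedom are generally non-integer.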
Table 2. Genetic differentiation according to mode of inheritance in angiosperms and gymnosperms (conifers)
When both cpDNA and mtDNA data were available, only one data set was used in case of identical mode of inheritance (that based on the highest sample size or if similar on the most polymorphic markers, see Table S1). Superscript letters indicate significant differences between means at P < 0.05 (see text for details);
Q25 and Q75 are the first and third quartiles of the distribution of GST values.
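Gymnosperms, paternally inherited: mean ± SE 0.165ᵃᵇ ± 0.036; median [Q25, Q75] 0.099 [0.033, 0.163]
Gymnosperms, maternally inherited: mean ± SE 0.764ᵈ ± 0.008; median [Q25, Q75] 0.759 [0.655, 0.890]
Gymnosperms, biparentally inherited: mean ± SE 0.116ᵃ ± 0.003; median [Q25, Q75] 0.088 [0.044, 0.152]
Angiosperms, maternally inherited: mean ± SE 0.637ᶜ ± 0.002; median [Q25, Q75] 0.646 [0.416, 0.871]
Angiosperms, biparentally inherited: mean ± SE 0.184ᵇ ± 0.002; median [Q25, Q75] 0.137 [0.064, 0.230]
All species, paternally inherited: mean ± SE 0.165 ± 0.036; median [Q25, Q75] 0.099 [0.033, 0.163]
All species, maternally inherited: mean ± SE 0.655 ± 0.002; median [Q25, Q75] 0.673 [0.459, 0.879]
All species, biparentally inherited: mean ± SE 0.163 ± 0.001; median [Q25, Q75] 0.115 [0.057, 0.199]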
Covariation between GST based on markers from different genomes
There were 23 species for which both cpDNA and mtDNA data were available, including eight in which both genomes are maternally inherited. Data from 93 species were available for the combination of maternally inherited (cpDNA, mtDNA or both) and biparentally inherited (nuclear) markers. There were 29 species with data from both paternally inherited (cpDNA) and biparentally inherited markers (all conifers); for 13 of these, data were also available from mtDNA (i.e. from three differentially inherited genomes).
In conifers, GST is nearly always larger at mtDNA markers than at cpDNA markers (Fig. 3). In contrast, GST estimates in angiosperms are similar for the two genomes and covary rather closely, especially when they derive from the same study (i.e. are based on the same individuals; see Table S1). In both angiosperms and gymnosperms, GST at nuclear markers (GSTb) was lower than at maternally inherited markers (GSTm), with only three exceptions out of 93 (Fig. 4). We also checked whether GSTb < 1/(2/GSTm − 1) by plotting the corresponding curve in Fig. 4. This limit corresponds to the maximum value that GSTb can reach for a given value of GSTm in an island model at equilibrium between migration and drift (Petit 1992; Petit & Vendramin 2005). Seven cases out of 93 (including the three previous cases) reach this limit. GST at paternally inherited markers (GSTp) was also compared with GSTb (Fig. 4). Most cases (25 out of 29) do not fall in the zone of covariation predicted by theory, which lies between the diagonal (GSTb = GSTp) and the previously described limit (Petit 1992; Petit & Vendramin 2005). Interestingly, in 17 out of 29 cases, GSTb is higher than the maximum predicted value (i.e. > GSTp), compared with only eight cases where GSTb is below the lower threshold [i.e. < 1/(2/GSTp − 1)]. That is, GSTp is often ‘too low’ or, alternatively, GSTb ‘too high’, compared with neutral equilibrium expectations. However, there remains a large positive correlation between GSTp and GSTb (Pearson r = 0.52, P = 0.004), which is not observed between GSTm and GSTb (r = 0.13, P = 0.22).
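The theoretical bounds used above can be checked case by case. This minimal sketch assumes the same neutral island model at equilibrium; the limit 1/(2/G − 1) is rewritten as G/(2 − G), an algebraically identical form that also handles G = 0. Function names and the example values are hypothetical.

```python
def gst_b_limit(gst_uni):
    """1/(2/G - 1), written as G/(2 - G) so that G = 0 is handled."""
    return gst_uni / (2.0 - gst_uni)


def maternal_exceeds_limit(gst_m, gst_b):
    """True when GSTb exceeds its maximum for a given GSTm (implies mp/ms < 0)."""
    return gst_b > gst_b_limit(gst_m)


def paternal_zone(gst_p, gst_b):
    """Locate GSTb relative to the predicted band [GSTp/(2 - GSTp), GSTp]."""
    if gst_b > gst_p:
        return "above"   # would imply negative seed flow
    if gst_b < gst_b_limit(gst_p):
        return "below"   # would imply negative pollen flow
    return "within"


zones = [paternal_zone(1 / 19, g) for g in (0.01, 0.05, 0.10)]  # below, within, above
```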
Pollen-to-seed migration ratios
Equation 1 was used to derive the pollen-to-seed migration ratio, with estimates of FIS, GSTb and GSTm as input values. As FIS estimates were available for only a subset of studies (73%), we first checked whether results were much affected when this parameter was taken into account (Equation 1). Migration ratios were little affected (Pearson r > 0.999 between estimates based on actual FIS values and those obtained by setting FIS to zero), so we set FIS to zero in all subsequent analyses. The median of the pollen/seed migration ratio estimates was 17, based on 93 individual measures [mp/ms was set to an arbitrary high value for species where GSTm = 1, and to zero for the seven species where GSTb exceeds 1/(2/GSTm − 1), which would otherwise yield negative ratios]. This result suggests an overall predominance of gene flow by pollen, although in 25 species (27% of the total) seed dispersal appears to account for a large (> 20%) component of total gene flow (i.e. mp/ms < 5). Alternative estimates of the same migration ratio were obtained using Equations 1 and 2 in those conifers for which data were available from all three genomes. One species (Pinus pinaster) was excluded because mtDNA markers were completely fixed. For the remaining 12 species, there was no correlation between the two estimates of mp/ms (Pearson r = −0.12, P = 0.71), indicating that estimates of this parameter lack stability when different sources of data and/or markers are used.
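The substitutions used before taking the median can be sketched as follows. The sketch assumes the raw ratios have already been computed from Equation 1, with an infinite value standing for GSTm = 1; the default "high" value is a hypothetical placeholder. It also shows why a median is convenient here: as long as fewer than half of the species are completely fixed, the arbitrary high value does not change the result.

```python
import math
import statistics


def median_ratio(ratios, high=1e6):
    """Median of mp/ms after the substitutions described in the text.

    Negative ratios (GSTb above its theoretical maximum) become zero, and
    infinite ratios (GSTm = 1) become the arbitrary value `high`.
    """
    cleaned = []
    for r in ratios:
        if r == math.inf:
            cleaned.append(high)
        elif r < 0:
            cleaned.append(0.0)
        else:
            cleaned.append(r)
    return statistics.median(cleaned)


example = [-2.0, 3.0, 17.0, 40.0, math.inf]   # hypothetical raw ratios
same_a = median_ratio(example, high=1e3)       # 17.0
same_b = median_ratio(example, high=1e9)       # 17.0
```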
Does gene flow by seeds covary with gene flow by pollen?
The mp/ms ratio obtained by contrasting GSTm and GSTb increases with GSTm (Fig. 5). However, this could result either from a reduction of seed flow across species without any concomitant increase in pollen flow, or from an increase of pollen flow in species with lower seed flow (i.e. compensation between dispersal by pollen and by seed). A null hypothesis was constructed by assuming that mp is independent of ms. This implies that mp/ms = K/(1/GSTm − 1), where K is a numerical constant (i.e. the migration ratio mp/ms varies only through its denominator ms, to which 1/GSTm − 1 is proportional under the island model). We fitted this function to the data presented in Fig. 5 by minimizing the sum of squared deviations, yielding K = 8.9. Observed values fall on both sides of the curve (Fig. 5), regardless of the value of the abscissa (GSTm). This means that mp/ms values are neither systematically too high nor too low for a given value of GSTm, suggesting that pollen flow is independent of levels of seed flow and hence that pollen flow does not tend to compensate for low levels of seed flow.
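Because the null model is linear in x = 1/(1/GSTm − 1) = GSTm/(1 − GSTm) with no intercept, the least-squares constant has a closed form, K = Σxy / Σx². The sketch below assumes that this ordinary (unweighted) least-squares criterion is the one meant above and that species with GSTm = 1 are excluded beforehand; function names and example values are hypothetical.

```python
def fit_k(gst_m, ratios):
    """Least-squares K for the null model mp/ms = K / (1/GSTm - 1)."""
    x = [g / (1.0 - g) for g in gst_m]   # equals 1/(1/GSTm - 1); requires GSTm < 1
    return sum(xi * yi for xi, yi in zip(x, ratios)) / sum(xi * xi for xi in x)


def residual_signs(gst_m, ratios, k):
    """+1 where the observed ratio lies above the fitted curve, -1 otherwise."""
    return [1 if y > k * g / (1.0 - g) else -1 for g, y in zip(gst_m, ratios)]


k = fit_k([0.5, 0.8], [8.9, 35.6])                 # data lying on the curve: K = 8.9
signs = residual_signs([0.5, 0.5], [10.0, 5.0], k)  # [1, -1]
```

Checking the residual signs across the range of GSTm is one simple way to look for the pattern the text describes, i.e. whether points fall on both sides of the curve at all values of the abscissa.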
Alternative ways to measure differentiation: GST vs. NST vs. RST
For 140 entries, both GST and NST could be estimated (118 with maternally inherited markers and 22 with paternally inherited markers). On average, NST is higher than GST (overall mean of 0.69 compared with 0.65 for maternally inherited markers, and 0.23 instead of 0.15 for paternally inherited ones; the mean GST values differ slightly from those in Table 2 because they are based on a subset of all species). There are 99 cases (71%) where NST > GST, and the difference between the two estimates is significant (at P < 0.05) in 41 cases (including 32 cases based on maternally inherited markers and nine based on paternally inherited markers; Fig. 6). By contrast, a single case study generated a GST significantly higher than its corresponding NST. For maternally inherited markers, NST tends to depart more from GST at increasing levels of differentiation: there were 26 out of 79 (33%) significant tests when GST > 0.5, compared with six out of 39 (15%) for GST < 0.5. However, this trend did not hold for paternally inherited markers.
There were 20 data sets where RST could be estimated. All are based on chloroplast microsatellite (cpSSR) data (16 conifers and four angiosperms). The average RST is higher than NST, itself higher than GST, and differences are often large (RST higher than GST by an average of 0.1) (Table 3). RST is significantly higher than GST in nine out of the 20 cases (Table 3).
Table 3. Comparison of differentiation estimates for cpSSR data
Asterisks indicate that NST (or RST) is significantly higher than GST (P < 0.05).
Our survey of population studies of organelle DNA in plants includes data from 183 species, which should come close to an exhaustive compilation of the data available in spring 2004. The first population genetic studies based on oDNA that were suitable for inclusion (i.e. that had sampled different individuals within several populations without bulking them in the molecular analyses) were by Banks & Birky (1985) on Lupinus texensis (98 individuals in 15 populations) and by Neale et al. (1986) in Hordeum vulgare (229 individuals in 21 populations, excluding cultivated accessions). Although a high level of subdivision was already apparent in these studies, direct estimates of GST for oDNA markers were first calculated in the 1990s (Kremer et al. 1991; in Quercus robur and Quercus petraea), making it clear that subdivision at cpDNA markers could be considerably larger than at nuclear ones.
For this compilation, measures of oDNA population structure are based in most cases on our re-analyses of the original raw data sets, to improve comparability across studies. Indeed, different estimators of genetic differentiation have often been used in the literature, somewhat limiting the value of comparative studies (e.g. Nybom & Bartish 2000; Nybom 2004; but see Morjan & Rieseberg 2004). In principle, the type of molecular technique used could also affect measures of subdivision of diversity for a given genome. For instance, selection could affect estimates of genetic subdivision at protein markers (McDonald 1994), high mutation rates could affect subdivision at SSR markers (Hedrick 1999), and anonymous detection techniques such as RAPD or AFLP could include not only fragments from the nuclear genome but also oDNA fragments, resulting in upwardly biased measures of nuclear subdivision if undetected (Aagaard et al. 1995). However, the differences in levels of subdivision across markers found in the present paper were limited and seemed to result from other factors (such as bulked analyses of DNA samples, which were subsequently removed from the analyses) or involved only a small fraction of the studies (in the case of nuclear differentiation). Previous comparisons have shown some discrepancy between estimates of subdivision based on different nuclear markers for a given species (e.g. Nybom 2004), but no systematic bias has been reported (Vandewoestijne & Baguette 2002; Nybom 2004). As a consequence, we do not expect the main conclusions of our study to be affected by the heterogeneity of techniques used, although the field would certainly gain from further harmonization and standardization of techniques and methods of data analysis.
We also computed standard errors of the examined parameters to assess the effects of sampling on the precision of estimates. As expected, the total number of sampled individuals is the best predictor of the precision of GST. However, we could also confirm an earlier observation (Pons & Petit 1995) that the precision of parameter estimates is more affected by the number of populations sampled than by the sampling of individuals within populations. This suggests that future studies of the geographical structure of plant populations based on oDNA should devote most effort to sampling as many populations as possible, even at the expense of within-population sampling. Sequential approaches that first analyse several individuals from the same population simultaneously (i.e. bulking) and then analyse each individual separately when within-population variation is detected do not appear advisable, given the prevalence of intrapopulation diversity. Indeed, complete fixation (GST = 1) was detected in only 11 out of 152 studies based on maternally inherited markers.
The finding that the portion of the range sampled affects GST estimates confirms similar results by Morjan & Rieseberg (2004) with data from both plants and animals. Several studies have shown that differentiation generally increases with distance (e.g. Dumolin-Lapègue et al. 1997; Grivet & Petit 2002a; Palmé & Vendramin 2002; Heuertz et al. 2004), so detecting higher GST values in species-wide surveys makes sense. It would be desirable for future studies to provide complete curves of differentiation as a function of distance, instead of single estimates of differentiation, allowing GST to be standardized to a common reference distance for comparison purposes, as recently done in an analysis of mtDNA in vertebrates (Martin & McKay 2004).
In the present paper, the genera with the highest number of species studied at maternally inherited markers were Pinus and Quercus (11 and 10, respectively, out of a total of 144 species). Although these genera might be expected to bias overall estimates, averaging across genera instead of species changed mean GST estimates by less than 1%, so the values presented here should be representative, at least taxonomically. However, the finding that the taxonomic identity of species explains a considerable amount of variation in GST does challenge the significance of previously identified relationships between levels of genetic differentiation, life history traits or ecological attributes. Because GST values of species belonging to the same genus or family tend to be similar, analysing them independently increases the risk of statistical pseudoreplication. In fact, a recent study relying on phylogenetically informed analytical methods has demonstrated that the use of direct species comparisons without consideration of their phylogenetic relationships might result in many false positives when seeking to identify relations between GST and life history traits or ecological attributes of species (Aguinagalde et al. 2005).
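Averaging across species and averaging across genera are simply two weightings of the same species-level values: in the first, a genus with many surveyed species counts many times; in the second, each genus counts once. A minimal sketch with made-up records (the GST values below are hypothetical, not taken from the data set):

```python
from collections import defaultdict


def mean_over_species(records):
    """Each species counts once: plain mean of all species-level GST values."""
    return sum(g for _, g in records) / len(records)


def mean_over_genera(records):
    """Each genus counts once: average species within genera, then across genera."""
    by_genus = defaultdict(list)
    for genus, g in records:
        by_genus[genus].append(g)
    genus_means = [sum(vals) / len(vals) for vals in by_genus.values()]
    return sum(genus_means) / len(genus_means)


# Hypothetical (genus, GST) records: a well-sampled genus next to a singleton genus.
records = [("Pinus", 0.2), ("Pinus", 0.4), ("Quercus", 0.9)]
print(mean_over_species(records), mean_over_genera(records))
```

When the two means differ little, as reported above for the real data, heavily sampled genera are not distorting the overall estimate.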
The most striking contrast between GST estimates based on markers from different plant genomes is that between maternally inherited markers and markers with other modes of inheritance (biparental or paternal). Compared with biparentally inherited nuclear markers, maternally inherited oDNA markers display not only larger but also more heterogeneous GST values, spanning the full spectrum from little population structure (although values below 0.1 are unusual: only two examples were found in this review) to near-complete fixation (which is not rare, even in species that have been well sampled and that display sufficient levels of total diversity; see, e.g. Grivet & Petit 2002b or Burban & Petit 2003). GST estimates based on paternally inherited markers are more heterogeneous than those based on biparentally inherited ones, which might be attributed to the stronger action of drift on effectively haploid genomes.
The lower levels of differentiation observed at nuclear genes in conifers compared with angiosperms merely confirm earlier studies (e.g. Hamrick et al. 1992), but the fact that this is matched by greater (not lower) levels of differentiation at maternally inherited markers is a first indication that measures of genetic differentiation at different markers need not covary positively. In fact, no correlation was found between GST at nuclear loci and at maternally inherited oDNA genes. On the other hand, a strong covariation of GST was observed between cpDNA and mtDNA in angiosperms, where both genomes are generally maternally inherited. Strictly co-transmitted uniparentally inherited genomes represent a single ‘locus’ and should be characterized by similar levels of differentiation (Dumolin-Lapègue et al. 1998; Desplanque et al. 2000; Olson & McCauley 2000; Belahbib et al. 2001; Huang et al. 2001). In conifers, paternally and biparentally inherited markers also have very similar levels of genetic structure, as shown by the relatively tight positive correlation between the two sets of GST estimates. This makes sense, as both the cpDNA and nuclear genomes are moved by pollen and by seeds, i.e. they use the same vehicles to achieve gene flow (note that paternal transmission does not imply transmission by pollen only, as sometimes stated in the literature, e.g. Morjan & Rieseberg 2004). The only difference is that paternally inherited genes experience dispersal by pollen at each reproductive cycle, whereas only 50% of nuclear genes experience gene flow by pollen in a given cycle (all genes experience gene flow by seeds). However, GSTb is often ‘too large’ (or GSTp ‘too low’) compared with equilibrium conditions, which predict that GSTb ≤ GSTp (e.g. Petit et al. 1993). One possible interpretation of this observation has been proposed by Petit et al. (1993; see their Fig. 2): as cpDNA is effectively haploid, its effective population size is lower than that of a nuclear gene (Birky et al. 1983, 1989). Hence, GST values at cpDNA markers will reach equilibrium faster than GST values at nuclear genes, resulting in transient situations where GSTp < GSTb (assuming that for both genomes the initial situation is characterized by larger-than-equilibrium GST values).
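The transient argument can be illustrated with Wright's infinite-island recurrence, F' = (1 - m)^2 [1/c + (1 - 1/c) F], where c is the number of gene copies per deme and m the migration rate. The sketch rests on simplifying assumptions that are ours, not those of Petit et al. (1993): N hermaphrodite adults per deme, so that nuclear genes have c = 2N and paternally inherited cpDNA has c = N; seed and pollen migration lumped into a single rate, m = ms + mp for cpDNA and m = ms + mp/2 for nuclear genes; and hypothetical values N = 50, ms = 0.001 and mp = 0.01, with both genomes starting at F = 0.95.

```python
def island_trajectory(f0, m, copies, generations):
    """Iterate Wright's infinite-island recurrence for F (read here as GST):
    F' = (1 - m)^2 * (1/c + (1 - 1/c) * F), with c gene copies per deme."""
    f = f0
    trajectory = [f]
    for _ in range(generations):
        f = (1.0 - m) ** 2 * (1.0 / copies + (1.0 - 1.0 / copies) * f)
        trajectory.append(f)
    return trajectory


N = 50                 # hypothetical number of hermaphrodite adults per deme
ms, mp = 0.001, 0.01   # hypothetical seed and pollen migration rates
F0 = 0.95              # both genomes start above their equilibrium values

# Nuclear gene: 2N copies; seeds move all copies, pollen moves the paternal half.
nuclear = island_trajectory(F0, ms + mp / 2, 2 * N, 2000)
# Paternally inherited cpDNA: N copies, moved by both seeds and pollen.
paternal = island_trajectory(F0, ms + mp, N, 2000)

print(paternal[50] < nuclear[50])   # True: transient GSTp < GSTb
print(paternal[-1] > nuclear[-1])   # True: at equilibrium GSTp > GSTb
```

With these values the cpDNA trajectory falls below the nuclear one early on and stays there for an extended transient period, before the equilibrium ordering GSTb ≤ GSTp is restored.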
The median mp/ms value across all species was c. 17, suggesting that gene flow through pollen is quantitatively much more important than gene flow through seeds. From these results, one might conclude that gene flow is generally asymmetrical in seed plants, with most species relying predominantly on pollen for gene exchange. However, 27% of the species have mp/ms estimates below five, indicating that seeds can also play a significant role in overall gene flow (i.e. > 20% of the total). Typical examples from this category include insect-pollinated and fleshy-fruited species, in which pollen dispersal is limited while frugivorous vertebrates make long-distance seed dispersal relatively common (e.g. Oddou-Muratorio et al. 2001).
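To show how a ratio such as mp/ms can be obtained from GST values, the sketch below assumes the island-model relations usually attributed to Ennos (1994) for hermaphrodite plants: 1/GSTm - 1 = 2N ms, 1/GSTp - 1 = 2N(ms + mp) and, under random mating, 1/GSTb - 1 = 4N ms + 2N mp, with a (1 + FIS) factor applied to the biparental term when FIS is non-zero. These forms, the function names and the numbers are assumptions for illustration, not a reproduction of the calculations behind the values above. Note that they imply GSTb ≤ GSTp at equilibrium, as stated earlier in this section.

```python
def gst_maternal(n, ms):
    """Island-model GST for a maternally inherited (seed-dispersed) haploid genome."""
    return 1.0 / (1.0 + 2.0 * n * ms)


def gst_paternal(n, ms, mp):
    """Island-model GST for a paternally inherited genome, moved by seeds and pollen."""
    return 1.0 / (1.0 + 2.0 * n * (ms + mp))


def gst_biparental(n, ms, mp):
    """Island-model GST for a nuclear gene under random mating (FIS = 0)."""
    return 1.0 / (1.0 + 4.0 * n * ms + 2.0 * n * mp)


def ratio_from_biparental(gst_m, gst_b, fis=0.0):
    """mp/ms from maternal and biparental GST, with the (1 + FIS) correction."""
    a_m = 1.0 / gst_m - 1.0
    return ((1.0 / gst_b - 1.0) * (1.0 + fis) - 2.0 * a_m) / a_m


def ratio_from_paternal(gst_m, gst_p):
    """mp/ms from maternal and paternal GST."""
    a_m = 1.0 / gst_m - 1.0
    return ((1.0 / gst_p - 1.0) - a_m) / a_m


# Round trip with hypothetical values: N = 50, ms = 0.002, mp = 0.02 (mp/ms = 10).
n, ms, mp = 50, 0.002, 0.02
g_m, g_p, g_b = gst_maternal(n, ms), gst_paternal(n, ms, mp), gst_biparental(n, ms, mp)
print(round(ratio_from_biparental(g_m, g_b), 6), round(ratio_from_paternal(g_m, g_p), 6))
# 10.0 10.0
```

At equilibrium both routes recover the same ratio; with real data they need not, which is one reason the two migration ratios discussed below can disagree.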
Comparisons between the various GST estimates indicate departures from migration-drift equilibrium, from neutrality or from an island model of population structure, so these conclusions are at best very rough indications and could even be misleading. Furthermore, the absence of correlation between the migration ratios obtained by contrasting either GSTp or GSTb with GSTm indicates that these estimates are unstable. Nevertheless, given the large variation in GST values observed across species, the mp/ms ratio should provide an approximate idea of the relative importance of the two components of historical (rather than current) gene flow. Morjan & Rieseberg (2004), while acknowledging the unrealistic assumptions made in such studies, similarly concluded that estimates of the number of migrants (Nm) do not appear to be so biased as to mask expected interspecific trends in patterns of gene flow. Moreover, an earlier study that estimated this ratio with two different methods obtained similar values, thus providing some support for this approach (Oddou-Muratorio et al. 2001).
In principle, it would be interesting to check whether species with high (or low) pollen dispersal ability also tend to have high (or low) seed dispersal ability. To our knowledge, this question has not been raised before. In the near future, direct estimates of dispersal distances obtained by parentage analysis might provide an answer. In the meantime, we have used an indirect approach, open to the same criticisms as those raised above, to check for some form of compensation between the two forms of gene flow. For this purpose, we built a null hypothesis in which mp is independent of ms, i.e. mp does not increase when GSTm increases (i.e. when ms decreases). When this scenario was fitted to the data of Fig. 5, no trend was visible beyond the expected increase of mp/ms with decreasing ms (i.e. increasing GSTm). In other words, there is no indication of either compensation or positive covariation between gene flow through pollen and through seeds.
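The logic of this null hypothesis can be checked by simulation: if mp and ms are drawn independently, mp/ms still rises with GSTm, simply because ms sits in the denominator of the ratio while GSTm decreases as ms increases. The sketch below draws log-uniform rates over hypothetical ranges, uses GSTm = 1/(1 + 2N ms) (an island-model assumption) with a hypothetical N = 50, and reports two Pearson correlations.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_null(n_species=2000, n=50, seed=1):
    """Draw pollen and seed migration rates independently (log-uniform, hypothetical
    ranges) and return corr(ln mp, ln ms) and corr(GSTm, ln(mp/ms))."""
    rng = random.Random(seed)
    log_ms = [rng.uniform(math.log(1e-4), math.log(1e-1)) for _ in range(n_species)]
    log_mp = [rng.uniform(math.log(1e-3), math.log(1.0)) for _ in range(n_species)]
    gst_m = [1.0 / (1.0 + 2.0 * n * math.exp(v)) for v in log_ms]
    log_ratio = [p - s for p, s in zip(log_mp, log_ms)]
    return pearson(log_mp, log_ms), pearson(gst_m, log_ratio)


r_independent, r_ratio_vs_gstm = simulate_null()
print(f"corr(ln mp, ln ms)    = {r_independent:.3f}")
print(f"corr(GSTm, ln mp/ms)  = {r_ratio_vs_gstm:.3f}")
```

A positive association between GSTm and mp/ms is therefore expected even without any biological link between pollen and seed dispersal; only a departure from this baseline would signal compensation or positive covariation.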
Our last analysis consisted of testing whether a phylogeographical component of population structure is common in plants. By this we mean an additional component of geographical structure that is not detected when considering only differences in allele frequencies between populations. It is obtained by subtracting standard GST (or FST) estimates from measures such as NST (e.g. Pons & Petit 1996) or RST (Slatkin 1995) that take into account similarities between haplotypes. Under an island model, a significant phylogeographical component would support historical (non-equilibrium) population genetic structure, as RST and FST (= GST) are expected to converge to the same values at equilibrium (Slatkin 1995); however, it could equally point to an equilibrium situation under an isolation-by-distance model, as RST is predicted to be higher than FST in that case (Rousset 1996; see his Fig. 1).
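A minimal sketch of the difference between GST and NST: both can be written as (vT - vS)/vT, where v is the sum of p_i p_j d_ij over pairs of distinct haplotypes, vS is averaged over populations and vT uses the mean haplotype frequencies. With d_ij = 1 for every pair of distinct haplotypes this reduces to the plug-in GST; with d_ij set to a measure of haplotype divergence (here a hypothetical matrix of mutational distances) it gives an NST-like measure. The sample-size corrections of Pons & Petit (1996) are omitted and the function names are hypothetical.

```python
def pairwise_diversity(freqs, dist):
    """Sum of p_i * p_j * d_ij over ordered pairs of distinct haplotypes."""
    k = len(freqs)
    return sum(freqs[i] * freqs[j] * dist[i][j]
               for i in range(k) for j in range(k) if i != j)


def differentiation(pops, dist):
    """(vT - vS) / vT: plug-in GST when all d_ij = 1, NST-like otherwise."""
    n, k = len(pops), len(pops[0])
    v_s = sum(pairwise_diversity(f, dist) for f in pops) / n
    mean_freqs = [sum(f[i] for f in pops) / n for i in range(k)]
    v_t = pairwise_diversity(mean_freqs, dist)
    return (v_t - v_s) / v_t


def unordered(k):
    """Distance matrix treating all distinct haplotypes as equally different."""
    return [[0.0 if i == j else 1.0 for j in range(k)] for i in range(k)]


# Hypothetical haplotypes A, B, C: A and B differ by 1 mutation, C by 5 from each.
mutational = [[0.0, 1.0, 5.0],
              [1.0, 0.0, 5.0],
              [5.0, 5.0, 0.0]]
pops = [[0.5, 0.5, 0.0],   # population 1 carries A and B
        [0.0, 0.0, 1.0]]   # population 2 is fixed for C

g_st = differentiation(pops, unordered(3))
n_st = differentiation(pops, mutational)
print(round(g_st, 3), round(n_st, 3))
# 0.6 0.905
```

In this toy example the haplotype found only in the second population is the most divergent, so NST exceeds GST: that excess is the phylogeographical component. Substituting squared differences in repeat number for d_ij gives a variance-based quantity in the spirit of RST (Slatkin 1995).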
Regardless of the cause for the difference between estimators, it appears that measuring genetic differentiation without including information on similarities between haplotypes results in the loss of useful information. In fact, in many cases NST > GST, and the trend is even stronger for RST. The same result was found on the basis of a smaller data set analysed by Petit et al. (2003) and Aguinagalde et al. (2005). In contrast, earlier studies by other authors had often questioned the relevance of RST estimates, in view of its higher variance compared to other estimators of FST (e.g. Streiff et al. 1998; Gaggiotti et al. 1999; Balloux et al. 2000; Balloux & Goudet 2002; Balloux & Lugon-Moulin 2002; Kalinowski 2002; Neigel 2002). Our findings clearly support the use of NST or RST for estimating organelle population genetic structure, especially at broad geographical scales. This fits with theoretical findings that have shown that ‘the relative efficiency of allele size-based vs. allele identity-based statistics depends on the relative contributions of mutations vs. drift to population differentiation’ (Hardy et al. 2003), and hence on the spatio-temporal scale considered.
In any case, the use of these estimators further emphasizes the high level of geographical structure found in most plant species with maternally inherited markers, in stark contrast with biparentally inherited nuclear markers. This finding has attracted much interest over the past 15 years or so, giving birth to a new discipline, plant phylogeography, and leading to many important practical applications, such as the traceability and ecocertification of forest products and the identification of plant populations for conservation.
The research was supported by the EC research program FAIR5-CT97-3795 (CYTOFOR) to RJP, SF and GGV and by a grant from the Bureau des Ressources Génétiques to RJP. AH received financial support from the FPI grant PB98-1144 of the Spanish Ministry of Science and Technology. We thank the following colleagues who have contributed unpublished data or complementary information for inclusion in the database: Richard Abbott, Itziar Aguinagalde, Erika Aguirre Planter, François Balfourier, Guillaume Besnard, Christiane Bittkau, Isabelle Bonnin, Jean Bousquet, Linda Broadhurst, Steven Brunsfeld, Margaret Byrne, Henri Caron, Stephen Cavers, Catherine Clark, Els Coart, Rosane Collevatti, Salvatore Cozzolino, Dan Crawford, Mitch Cruzan, Nicolas Devos, Alexis Ducousso, Cyril Dutech, Richard Ennos, Colin Ferris, Isabelle Gamache, Leonardo Gallo, Pauline Garnier-Géré, Felix Gugerli, Myriam Heuertz, Juan Pablo Jaramillo, Andy King, Marcus Koch, Monika Konnert, Antoine Kremer, Martin Lascoux, Sascha Liepelt, Catarina Lira, Andy Lowe, Roselyne Lumaret, Carlos Magni, Alessio Mengoni, Jose Gabriel Segarra Moragues, Nicole Muloko-Ntoutoume, Gerhard Müller-Starck, Brigitte Musch, Sylvie Oddou-Muratorio, Matt Olson, Anna Palmé, Daniel Piñero, Daniel Prat, Jim Provan, Olivier Raspé, Sarah Rendell, Bryce Richardson, Joëlle Ronfort, Stéphanie Roux, Luis Gil Sanchez, Pierre Saumitou-Laprade, Vladimir Semerikov, Tim Sharbel, Marco Soliva, Christoph Sperisen, Ivana Stehlik, Naoki Tani, Michèle Tarayre, Yoshihiko Tsumura, Lisa Wallace and Birgit Ziegenhagen.
Table S1. Measures of population subdivision at cpDNA and mtDNA markers in seed plants
Table S2. Measures of population subdivision at nuclear markers for plants from Table S1
Rémy J. Petit studies the population genetics and evolution of trees, with a special focus on phylogeography, organelle genetics and evolution, genetic diversity, palaeogenetics and conservation genetics of trees. Jérôme Duminil is a PhD student at INRA working on comparative studies of genetic diversity in plants. He is particularly interested in the ecological, life-history and chorological features of species that can explain the distribution of genetic diversity. He also uses comparative methods to study organelle genome and sequence evolution. Silvia Fineschi is interested in the phylogeography and ecological genetics of angiosperm tree species. Daniela Salvini is a PhD student working on the ecological and spatial genetics of white oaks, with a particular interest in gene flow and environmental features as factors shaping the natural genetic diversity of plant populations. Giovanni G. Vendramin's studies focus on conservation and population genetics, with particular emphasis on range-wide phylogeography and fine-scale population gene dynamics. Arndt Hampe has recently completed his PhD on the phylogeography and reproductive ecology of a climatic relict tree. He is now working on plant–animal interactions using both ecological and genetic approaches. The collaboration between the authors on this topic started in 1998, during a stay of SF in Bordeaux, when the database was initiated. Since then, numerous exchanges, especially during the EU CYTOFOR project and then during stays of DS and AH in Bordeaux and of JD in Sevilla, have given further impetus to this work.