Genetic methods are routinely used to estimate contemporary effective population size (Ne) in natural populations, but the vast majority of applications have used only the temporal (two-sample) method. We use simulated data to evaluate how highly polymorphic molecular markers affect precision and bias in the single-sample method based on linkage disequilibrium (LD). Results of this study are as follows: (1) Low-frequency alleles upwardly bias N̂e, but a simple rule can reduce bias to less than about 10% without sacrificing much precision. (2) With datasets routinely available today (10–20 loci with 10 alleles; 50 individuals), precise estimates can be obtained for relatively small populations (Ne < 200), and small populations are not likely to be mistaken for large ones. However, it is very difficult to obtain reliable estimates for large populations. (3) With ‘microsatellite’ data, the LD method has greater precision than the temporal method, unless the latter is based on samples taken many generations apart. Our results indicate that the LD method has widespread applicability to conservation (which typically focuses on small populations) and to the study of evolutionary processes in local populations. Considerable opportunity exists to extract more information about Ne in nature through wider use of single-sample estimators and by combining estimates from different methods.
Effective population size (Ne) is widely regarded as one of the most important parameters in both evolutionary biology (Charlesworth 2009) and conservation biology (Nunney and Elam 1994; Frankham 2005), but it is notoriously difficult to estimate in nature. Logistical challenges that constrain the ability to collect enough demographic data to calculate Ne directly have spurred interest in genetic methods that can provide estimates of this key parameter, based on measurements of genetic indices that are affected by Ne (reviewed by Wang 2005). Although some early proponents suggested that indirect genetic estimates of Ne would only be useful in cases where the natural population was so large it could not be counted effectively, it was subsequently pointed out that these methods have much greater power if population size is small. Indeed, the rapid increase in applications in recent years has been fueled largely by those interested in conservation issues or the study of evolutionary processes in local populations that often are small (Schwartz et al. 1999, 2007; Leberg 2005; Palstra and Ruzzante 2008).
Estimates of contemporary effective size (roughly, Ne that applies to the time period encompassed by the sampling effort) can be based on either a single sample (Hill 1981; Pudovkin et al. 1996) or two samples (Krimbas and Tsakas 1971; Nei and Tajima 1981). The two-sample (temporal) method, which depends on random changes in allele frequency over time, has been by far the most widely applied, and it was the only method considered in a recent meta-analysis of genetic estimates of Ne in natural populations (Palstra and Ruzzante 2008). This is a curious result, given that every temporal estimate requires at least two samples that could each be used to provide a separate, single-sample estimate of Ne. Furthermore, whereas the amount of data used by the temporal method increases linearly with increases in numbers of loci (L) or alleles (K), the amount of data used by the most powerful single-sample estimators increases with the square of L and K. This suggests that, given the large numbers of highly polymorphic molecular markers currently available, there is a large, untapped (or at least under-utilized) resource that could be more effectively exploited to extract information about effective size in nature.
Specifically, we address the following questions:
- How is precision affected by factors under control of the investigator (L, K, number of individuals sampled) and those that are not [true (unknown) Ne]?
- What effect do rare alleles have on precision and bias?
- What practical guidelines can help balance tradeoffs between precision and bias?
- Under what conditions can the LD method provide useful information for practical applications? If Ne is small, how often does the method mistakenly estimate a large Ne? If Ne is large, how often does the method mistakenly estimate a small Ne?
- What kind of performance can we expect when data consist of a very large number of diallelic, single-nucleotide-polymorphism (SNP) markers?
- How does performance of the LD method compare to that of other methods for estimating contemporary Ne?
Genotypic data were generated for ‘ideal’ populations (constant size, equal sex ratio, no migration or selection, discrete generations, and random mating and random variation in reproductive success) using the software EasyPop (Balloux 2001). One thousand replicate populations were generated for each size considered (N = 50, 100, 500, 1000, 5000 ideal individuals). In the standard parameter set, each simulated individual had data for L = 20 independent gene loci, which had a mutational model approximating that of microsatellites (mutation rate μ = 5 × 10−4; k-allele model with A = 10 possible allelic states; see Table 1 for a definition of notation). In some runs, we used 5, 10, or 40 loci and/or 5 or 20 alleles per locus. Each simulation was initiated with maximal diversity (initial genotypes randomly drawn from all possible allelic states) and run for successive generations until the mean within-population expected heterozygosity (HE) reached 0.8 (comparable to levels found in many studies of natural populations using microsatellites). Simulations with N = 5000 used a lower mutation rate (μ = 5 × 10−5) because μ = 5 × 10−4 leads to mutation–drift equilibrium values of HE that are larger than 0.8. After the HE = 0.8 criterion was met, samples of S = 25, 50, 100, or 200 (for N ≥ 200) individuals were taken in the final generation. As the populations were ‘ideal,’ apart from random sampling errors the effective size and census size were the same (more precisely, for otherwise ideal populations in species with separate sexes, Ne ≈ N + 0.5; Balloux 2004).
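For readers who want to experiment with this design, the simulation can be sketched in a few lines of Python. This is a minimal Wright-Fisher analog, not EasyPop itself: it omits separate sexes, uses a monoecious random-mating model, and all function and parameter names are illustrative.

```python
import random

def expected_het(pop, L):
    """Mean within-population expected heterozygosity (HE) across L loci."""
    het = 0.0
    for locus in range(L):
        counts = {}
        for ind in pop:
            for allele in ind[locus]:
                counts[allele] = counts.get(allele, 0) + 1
        tot = 2 * len(pop)
        het += 1.0 - sum((c / tot) ** 2 for c in counts.values())
    return het / L

def simulate_ideal_population(N=50, L=20, A=10, mu=5e-4, target_he=0.8, seed=1):
    """Run discrete Wright-Fisher generations until mean HE falls to target_he."""
    rng = random.Random(seed)
    # Maximal initial diversity: every gene copy drawn at random from A states.
    pop = [[(rng.randrange(A), rng.randrange(A)) for _ in range(L)]
           for _ in range(N)]

    def gamete(ind):
        g = []
        for a1, a2 in ind:
            a = a1 if rng.random() < 0.5 else a2  # Mendelian segregation
            if rng.random() < mu:                 # k-allele mutation model
                a = rng.randrange(A)
            g.append(a)
        return g

    while expected_het(pop, L) > target_he:
        # Random mating: each offspring unites gametes from two random parents.
        pop = [list(zip(gamete(rng.choice(pop)), gamete(rng.choice(pop))))
               for _ in range(N)]
    return pop
```

With A = 10 possible allelic states, initial HE is about 0.9, and drift erodes it to the 0.8 sampling criterion within a few dozen generations for small N.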
Table 1. Notation used in this study.
| Symbol | Definition |
| --- | --- |
| N | Population size, equal to the number of ideal individuals |
| Ne | Effective population size per generation |
| Nb | Effective number of breeders in a specific time period |
| N̂e | An estimate of effective size based on genetic data |
| LD | Denotes the linkage disequilibrium method for estimating Ne |
| T | Denotes the temporal method for estimating Ne |
| CV | Coefficient of variation |
| S | Number of individuals sampled for genetic analysis |
| L | Number of (presumably independent) gene loci |
| A | Maximum number of allelic states for a gene locus |
| K | Actual number of alleles at a locus |
| Pcrit | Criterion for excluding rare alleles; alleles with frequency < Pcrit are excluded |
| n | Total number of independent allelic combinations (degrees of freedom) for the LD method (given by eqn 1) |
| n′ | Total number of independent alleles (degrees of freedom) for the temporal method (given by eqn 5) |
| t | Elapsed number of generations between samples in the temporal method |
| Vk | Variance among adults in lifetime contribution of gametes to the next generation |
The LD method is based on the following theoretical relationship between the mean squared correlation of allelic states at pairs of loci (r̂²) and Ne (Hill 1981):

E(r̂²) ≈ 1/(3Ne) + 1/S

Thus, r̂² has two components: one due to drift (1/(3Ne)) and one due to sampling a finite number of individuals (1/S). Subtracting the expected contribution of sampling error produces an unbiased estimate of the drift contribution to LD, which can be used to estimate Ne:

N̂e = 1/[3(r̂² − 1/S)]    (2a)
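The conversion in eqn (2a) from mean r̂² to an estimate of Ne can be sketched as follows. This is the unadjusted moment estimator only (function name is ours); the analyses in this study use the bias-adjusted expectations implemented in LDNe.

```python
def ldne_point_estimate(r2_mean, S):
    """Moment-based Ne estimate from mean squared correlation r2 (eqn 2a):
    Ne = 1 / (3 * (r2 - 1/S)).  Illustrative sketch only; it lacks the
    bias adjustments of Waples (2006) used in the actual analyses."""
    drift_r2 = r2_mean - 1.0 / S   # subtract the sampling-error component
    if drift_r2 <= 0:
        return float("inf")        # no detectable drift signal in the sample
    return 1.0 / (3.0 * drift_r2)
```

For example, with S = 50 and mean r̂² = 1/S + 1/(3 × 100), the estimate recovers Ne = 100; when observed r̂² falls at or below the sampling expectation 1/S, the point estimate is infinite.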
Equation (2) is only approximate, as it ignores second-order terms in S and Ne, which can lead to substantial bias in N̂e. Therefore, the adjusted expectations for the drift and sampling error components of r̂² developed by Waples (2006), as implemented in the software LDNe (Waples and Do 2008), were used to calculate r̂² and estimate effective size. To assess possible biases from numerous low-frequency alleles, N̂e was computed separately after excluding alleles with frequencies below the following cutoffs: Pcrit = 0.1, 0.05, 0.02, 0.01. With S = 25, the lowest possible allele frequency is 1/(2S) = 0.02, which means that for this sample size Pcrit = 0.02 and 0.01 both fail to screen out any alleles that actually occur in the sample. Therefore, for S = 25 we used Pcrit = 0.03 rather than 0.02; this provided a contrast between the criterion 0.01 (which allows all alleles) and 0.03 (which excludes only alleles that occur in a single copy).
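The Pcrit screening just described can be expressed as a small helper. These are hypothetical function names for illustration; the actual screening is performed inside LDNe.

```python
def pcrit_for_sample(S):
    """Cutoff used in this design: Pcrit = 0.02 in general, but 0.03 when
    S = 25, so that single-copy alleles (frequency 1/(2S) = 0.02) are
    still excluded."""
    return 0.03 if S == 25 else 0.02

def screen_alleles(freqs, pcrit):
    """Drop alleles whose sample frequency is below Pcrit before
    computing r2 (freqs maps allele label -> sample frequency)."""
    return {a: p for a, p in freqs.items() if p >= pcrit}
```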
Precision of the LD method can be evaluated with the coefficient of variation of the estimate, which is approximately (Hill 1981)

CV(N̂e(LD)) ≈ √(2/n) × (1 + 3Ne/S)    (3)

where n is the total number of independent allelic combinations summed over all pairs of loci,

n = Σi<j (Ki − 1)(Kj − 1)    (1)

This expression assumes that the loci are not physically linked and that S and K are constant across loci. Our simulations used unlinked loci and constant sample sizes, and variation in the actual number of alleles per locus was relatively small.
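The degrees of freedom for the LD method and the resulting approximate CV can be computed directly. This sketch follows our reading of eqns (1) and (3); function names are ours.

```python
from math import sqrt

def n_ld(allele_counts):
    """Independent allelic combinations (df) for the LD method:
    sum over pairs of loci of (Ki - 1)(Kj - 1)."""
    K = allele_counts
    return sum((K[i] - 1) * (K[j] - 1)
               for i in range(len(K)) for j in range(i + 1, len(K)))

def cv_ld(Ne, S, allele_counts):
    """Approximate CV of the LD estimate: sqrt(2/n) * (1 + 3*Ne/S)."""
    return sqrt(2.0 / n_ld(allele_counts)) * (1.0 + 3.0 * Ne / S)
```

For 20 loci with 10 alleles each, n = 190 × 81 = 15 390 pairwise comparisons, which is why modest numbers of microsatellite loci can yield good precision for small Ne.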
Table 2. Percentage of estimates for the LD method that fell outside the indicated lower and upper bounds relative to nominal Ne = N.
[Table 2 body omitted; columns give, for each N, S, and Pcrit, the percentage of estimates below the lower bound and above the upper bound, with entries shown for L = 20 and S = 50.]
For comparative purposes, an analog to eqn (3) for the moment-based temporal method is (modified from Pollak 1983, Equation 29, to reflect current notation)

CV(N̂e(T)) ≈ √(2/n′) × (1 + 2Ne/(tS))    (4)

where the subscript T denotes the temporal method, lower case t is the number of generations between samples, and n′ is the number of independent alleles for the temporal method, which is given by

n′ = Σi (Ki − 1)    (5)
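A parallel sketch for the temporal method's degrees of freedom and approximate CV, under our reading of eqns (4) and (5); names are illustrative.

```python
from math import sqrt

def n_temporal(allele_counts):
    """Independent alleles (df) for the temporal method: sum of (Ki - 1)."""
    return sum(k - 1 for k in allele_counts)

def cv_temporal(Ne, S, t, allele_counts):
    """Approximate CV of the temporal estimate, assuming two samples of
    size S taken t generations apart: sqrt(2/n') * (1 + 2*Ne/(t*S))."""
    return sqrt(2.0 / n_temporal(allele_counts)) * (1.0 + 2.0 * Ne / (t * S))
```

Note that n′ grows only linearly with L and K, whereas n for the LD method grows with the square of both, which is the source of the precision advantage discussed below.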
It seems clear that previous efforts to estimate effective size in natural populations have not extracted as much information as possible from genetic data. Any application of the temporal method that collects multilocus genotypic data provides an opportunity to obtain at least two estimates of Ne from individual generations using the LD method or one of the other single-sample estimators, but relatively few have taken advantage of this opportunity.
The simulation program used here (EasyPop) differs in some important ways from the one used to generate data to develop the empirical bias correction for the LD method (Waples 2006). In particular, the original program had no mutation and considered only diallelic loci at moderate allele frequency, whereas EasyPop has an explicit mutation model and generates data with a wide range of allele frequencies and numbers of alleles per locus. The new simulated data thus represent an independent assessment of the bias-corrected LD estimator – and a more realistic assessment of performance with highly polymorphic markers currently in widespread use. In summarizing important results of our evaluations, we return to the specific questions posed in the Introduction before closing by discussing a few related issues.
Factors affecting precision and bias
The LD method benefits from the fact that the amount of information increases with the square of the numbers of loci and alleles, so efforts to capitalize on ready availability of highly variable markers can pay large dividends. Within the range of values of practical interest to most investigators, the same proportional increases in numbers of loci, alleles per locus, or individuals sampled should have roughly comparable effects on precision, and this result (along with the quantitative expression for CVLD in eqn 3) can be used to guide experimental design decisions. Although each SNP locus provides much less precision than a typical microsatellite, this can be overcome by brute force if enough new independent loci can be developed. Figure 1 indicates that about 180 SNP loci can be expected to provide precision comparable to that attained by about 10–20 typical microsatellite loci; this might seem like a lot, but techniques to develop thousands of SNP loci are rapidly advancing and declining in cost (Morin et al. 2004; Xu et al. 2009). As discussed below (Key assumptions), however, an application using a very large number of SNP loci should be accompanied by a careful analysis of assumptions of independence and neutrality.
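The SNP-versus-microsatellite comparison follows directly from the degrees of freedom in eqn (1), since precision scales roughly with the square root of n. A back-of-envelope check (function name is ours):

```python
def n_ld_uniform(L, K):
    """df for L loci, each with K alleles: [L(L-1)/2] * (K-1)^2 (eqn 1)."""
    return (L * (L - 1) // 2) * (K - 1) ** 2

# ~180 diallelic SNPs vs 20 microsatellite-like loci with 10 alleles each:
n_snp = n_ld_uniform(180, 2)    # 16110 independent comparisons
n_msat = n_ld_uniform(20, 10)   # 15390 independent comparisons
```

The two marker sets yield nearly the same number of independent pairwise comparisons, consistent with the rough equivalence suggested by Figure 1.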
Rare alleles tend to upwardly bias LD estimates of Ne, just as they do for the temporal method (Turner et al. 2001), but in many cases the effect is not too severe. This means that large numbers of alleles typically can be allowed into the analysis to boost precision without substantially increasing bias. For most applications, a good rule of thumb is to screen out any alleles at frequency <0.02, as well as any alleles that occur in only a single copy in the sample (see Nielsen and Signorovitch 2003, for discussion of effects on N̂e of using singletons from SNP data). Using this criterion, something close to maximum precision can be achieved while (in most cases) keeping bias to less than about 10% (Fig. 6). With large samples (S ∼ 100 or larger), alleles with frequency as low as 0.01 can probably be used.
All genetic methods for estimating contemporary Ne depend on a signal that is a function of 1/Ne, so these methods are most powerful with small populations (for which the signal is strong) and have difficulty distinguishing large populations from infinite ones (because the signal is so small). This effect is amply demonstrated for the LD method in Figs 1 and 3 and Table 2. With amounts of data commonly available today (samples of about 50 individuals; 10–20 microsatellite-like loci), quite good precision can be obtained for populations with relatively small effective sizes (about 100–200 or less). For very small populations (Ne less than about 50), small samples of only 25–30 individuals can still provide some useful information. These results are encouraging, as conservation concerns typically focus on populations that are (or might be) small, and modern molecular methods have facilitated an increasing interest in studying evolutionary processes in local populations in nature.
In contrast, estimating effective size with any precision in populations that are large (Ne ∼ 1000 or larger) is very challenging. In general, a small sample of individuals (or a moderate or large sample based on only a few gene loci) will not provide much useful information about Ne in large populations, and even with relatively large samples of individuals and loci it might not be possible to say much about the upper bound to N̂e. In theory, with arbitrarily large numbers of loci and alleles (as might routinely be achievable in the future), it should be possible to produce estimates that place tight bounds even on the upper limit to N̂e in large populations (cf. Fig. 1). However, because the drift signal is so small for large populations, researchers who want to estimate Ne in populations that are or might be large should pay careful attention to various sources of noise in the analysis (slight departures from random sampling; data errors; violation of underlying model assumptions) that can have a disproportionate effect on results. In this respect, estimating contemporary Ne in large populations using genetic markers is as challenging as, and suffers many of the same intrinsic limitations as, genetic estimates of dispersal in high gene flow species (Waples 1998; Fraser et al. 2007). Fortunately, because the LD signals for large and small populations are quite different (Fig. 3), estimates based on even moderate amounts of data should be able to provide a useful lower bound for Ne, and this can be important, particularly in conservation applications where a major concern is avoidance and/or early detection of population bottlenecks.
Based on extensive computer simulations, Russell and Fewster (2009) reached a rather pessimistic conclusion about the practical usefulness of the LD method. However, two factors make their results difficult to interpret in the present context. First, they presented quantitative results only for the original LD method (Hill 1981) which, when the ratio S/Ne is small, has been shown to produce an estimate that is more closely related to the sample size than to the true effective size (England et al. 2006; Waples 2006). Second, Russell and Fewster (2009) assessed bias by comparing the arithmetic mean N̂e to the true Ne. Because of the inverse relationship between r̂² and N̂e (eqn 2a), this has the unfortunate consequence that even if r̂² is a completely unbiased estimator of r², the arithmetic mean N̂e will be upwardly biased. Results in Table 2 and Figure 3 show how upwardly skewed the distribution of N̂e can be, in which case the arithmetic mean is not a useful indicator of central tendency. Here, we have followed the approach used by Nei and Tajima (1981), Pollak (1983), Waples (1989), Jorde and Ryman (2007), Nomura (2008), and Wang (2009), all of whom evaluated bias in terms of harmonic mean N̂e (or, equivalently, used the overall mean r̂² or temporal F̂ across replicates to compute an overall N̂e). Importantly, this approach can readily accommodate negative or infinite estimates in individual replicates (see next section).
Negative estimates and nonsignificant LD
Many software packages provide tests of statistical significance of LD for each pair of loci or across all loci. Although these tests vary in the way they assess significance and combine information across multiple alleles and loci, in general they are testing the hypothesis that the observed LD can be explained entirely by sampling error. A nonsignificant test for LD, therefore, indicates that the null hypothesis (H0: E(r̂²) ≤ 1/S) cannot be rejected, which implies that the upper bound of N̂e would include infinity. That is, a nonsignificant test provides no evidence for drift, which is not the same as saying no drift occurs (in fact, all finite populations have some contribution to r² from drift, and, assuming the test is valid, that drift component should become statistically significant if enough data are collected). So, for reasons discussed in the previous paragraph, even a dataset with a nonsignificant LD result can potentially provide useful information about effective population size.
Like other Ne estimators, the LD method assumes that of the four evolutionary forces (mutation, migration, selection and genetic drift), only drift is responsible for the signal in the data. Although mutation rate strongly affects estimates of long-term Ne, it probably is of little consequence for the LD method, apart from its role in producing genetic variation. Selection can cause nonrandom associations of genes at different gene loci, just as it can influence rates of allele frequency change, but it might be reasonable to assume that it has relatively little influence on LD measured in microsatellite loci. The neutrality assumption should be evaluated more rigorously, however, if large numbers of SNP loci are used. Vitalis and Couvet (2001) proposed a method to jointly estimate Ne and migration rate. Immigration of genetically differentiated individuals from other populations leads to mixture disequilibrium (Nei and Li 1973) that could downwardly bias LD estimates of local Ne; conversely, high migration rates among weakly differentiated populations could cause local samples to provide an estimate closer to the metapopulation Ne than the local Ne (because the sample is drawn from a larger pool of potential parents). Unpublished data (P. England, personal communication) indicate that under equilibrium migration models, the former effect is small and the latter effect is substantial only for migration rates that are high in genetic terms (∼10% or higher) – suggesting that under many natural conditions the LD method can provide a robust estimate of local (subpopulation) Ne. However, upward biases in N̂e might be more important in small subpopulations that are part of a metapopulation, as in that case even a few migrants per generation could represent a relatively high migration rate.
The LD method as implemented here assumes that loci are independent (probability of recombination = 0.5). This is probably a reasonable assumption in most current situations, given the numbers of markers typically used in studies of natural populations. However, some taxa (e.g., Drosophila) have only a few chromosomes and/or regions of the genome in which recombination is suppressed, and in the future LD estimates might be generated using thousands of SNP or other markers. In such cases, therefore, issues related to recombination rate would have to be re-evaluated. Linked markers actually provide more power, provided that the recombination rate is known (Hill 1981). The LD method provides information primarily about Ne in the parental generation, but residual disequilibrium from a recent bottleneck can affect the estimate for a few generations (Waples 2005, 2006). If loci are closely linked, estimates from the LD method will be more strongly influenced by Ne in the distant past (see Tenesa et al. 2007, for an application to human SNP data).
The theoretical relationship between r² and Ne assumes either random mating without selfing or random mate choice with lifetime monogamy (Weir and Hill 1980; Waples 2006). The populations do not have to be ideal; the method still performs well with highly skewed sex ratios and overdispersed variance in reproductive success (Waples 2006). However, strongly assortative mating or widespread selfing would be expected to lead to biases that have not been quantitatively evaluated. Genotyping errors can also affect estimates of LD (Akey et al. 2001). Russell and Fewster (2009) found an upward bias in N̂e for the standard LD method (Hill 1981) when 1% allelic dropout was modeled, and this topic bears further study.
Finally, the underlying model for the LD method assumes discrete generations, and this is the only situation where the resulting estimate can be interpreted as effective size for a generation (Ne). Most natural populations do not have discrete generations; when samples are taken from age-structured species, the resulting estimate from the LD method can be interpreted as an estimate of the effective number of breeders (Nb) that produced the cohort(s) from which the sample was taken. The relationship between and Ne in age-structured species has been evaluated for the temporal method (Waples and Yokota 2007), but comparable evaluations have not been made for any single-sample estimator. A reasonable conjecture is that if the number of cohorts represented in a sample is roughly equal to the generation length, the estimate from the LD method should roughly correspond to Ne for a generation, but this remains to be tested.
Comparison with other methods
As illustrated in Fig. 7, with samples of individuals, loci, and alleles routinely available today, the LD method should generally provide better precision than the temporal method, unless samples for the latter are spaced a large number of generations apart.
Several other one-sample estimators of Ne have been proposed, although direct comparisons of performance have generally not been made with the LD method. The heterozygote excess method is generally much less precise than other single-sample estimators (Nomura 2008; Wang 2009) and is best suited for analyzing small populations of species with Type III survivorship for which large samples of offspring are possible (Hedgecock et al. 2007; Pudovkin et al. 2009). A single-sample ABC estimator (OneSamp; Tallmon et al. 2008) appears to have considerable potential, but it has not been rigorously evaluated under a wide range of conditions and assumes a specific type of mutation model that makes it useful only for microsatellite data. Two new methods, based on the analysis of molecular coancestry (Nomura 2008) and identification of full and half sibs (Wang 2009), each included comparisons with some other Ne estimators. However, Nomura considered only populations with tiny Ne (<15) and compared his new method only to the heterozygote excess method, which was likewise the only single-sample estimator against which Wang (2009) evaluated his new method with simulated data.
Nevertheless, Wang did provide results for some analyses that are comparable enough to those conducted here that a quantitative comparison of the LD method and the sibship method is possible for a few parameter combinations. In Table 5 of his paper, Wang (2009) reported the root mean-squared error (RMSE) of the quantity 1/N̂e for simulations using random mating populations of constant size with equal sex ratio and 10–40 gene loci with eight alleles of initial equal frequency. That analysis involved a comparison with temporal samples taken in generations 3 and 5, so to get a single sibship-based estimate for each replicate, Wang computed an estimate for both generations and took the average. For the parameters N = 200, S = 50, L = 20, the RMSE for the sibship method was 0.0005. To allow a comparison, we simulated populations as described in Methods with N = 200, L = 20, A = 8, and drew samples after 10 generations – long enough for levels of LD to stabilize. The version of EasyPop we used does not allow sampling in two different generations, so we used the approach described above of taking a single sample of twice the size (i.e., we sampled 100 individuals once rather than 50 individuals twice). All else being equal, the two sampling schemes should provide roughly comparable precision. For our simulated datasets, we found that the RMSE of 1/N̂e was 0.0004, slightly lower than the value reported by Wang for his one-sample method and considerably less than the value he found (0.0015) for temporal samples separated by two generations. For the same set of simulated populations and sample size S = 100 (two samples of 100 for the sibship method, one sample of 200 for the LD method), we found an RMSE of 0.00025 compared to 0.0003 reported by Wang. It would be a mistake to place too much emphasis on these results, given that tabular values in Wang (2009) are rounded off and that the direct comparisons that are possible cover only a small fraction of potential parameter space.
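For replicated simulations, this comparison metric can be computed as follows (assuming the metric is the root mean-squared error of 1/N̂e relative to 1/Ne; infinite estimates contribute a drift signal of zero, consistent with working on the 1/Ne scale):

```python
from math import sqrt

def rmse_inverse_ne(estimates, true_ne):
    """RMSE of 1/Ne-hat relative to 1/Ne across replicate estimates.
    Infinite estimates are allowed: 1/inf contributes 0."""
    inv_true = 1.0 / true_ne
    errs = [(1.0 / e - inv_true) ** 2 for e in estimates]
    return sqrt(sum(errs) / len(errs))
```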
Nevertheless, these data suggest that LD and sibship one-sample methods might have roughly comparable levels of performance as measured by some common indicators. A comprehensive comparison of performance of the LD, coancestry, and sibship methods would be useful.
Combining estimates across methods
Researchers who have reported estimates of Ne from more than one method too often have not taken advantage of another opportunity to increase precision – combining the estimates into a single estimator. Because all the estimators respond to a signal that is inversely related to Ne, an appropriate way to combine estimates across methods would be to take a weighted harmonic mean (Waples 1991). Ideally, the weights would be reciprocals of variances, which can be obtained for the moment-based LD and temporal methods from eqns (3) and (4), respectively. Combining data for these two methods could be particularly useful for large populations, as the temporal method is somewhat less sensitive to large N. Appendix A provides a worked example of how effective size estimates can be combined, both within and across methods. Additional work would be needed to determine the most appropriate way to weight estimates from different single-sample estimators. However, Nomura (2008) showed that considerable improvements in performance can be obtained even by taking an unweighted harmonic mean of N̂e from the heterozygote excess and molecular coancestry methods.
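A weighted harmonic mean of this kind can be sketched as follows (illustrative only; ideal weights would come from the reciprocal variances implied by eqns 3 and 4):

```python
def combined_ne(estimates, weights=None):
    """Weighted harmonic mean of Ne estimates (cf. Waples 1991).
    Infinite individual estimates are accommodated naturally: they
    contribute a drift signal (1/Ne) of zero."""
    if weights is None:
        weights = [1.0] * len(estimates)   # unweighted harmonic mean
    inv = sum(w / e for w, e in zip(weights, estimates))
    return sum(weights) / inv
```

For example, an unweighted combination of estimates of 100 and 300 gives 150 (not the arithmetic mean 200), and combining a finite estimate with an infinite one simply halves the drift signal.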
Some cautions are important to keep in mind here. First, which time period(s) each estimate applies to needs careful consideration. Each of the single-sample estimators is most closely related to inbreeding Ne and provides an estimate of the effective number of breeders (Nb) that produced the sample (Waples 2005). Combining estimates from single-sample methods should therefore be straightforward, provided an appropriate weighting scheme can be developed. However, in general the single-sample and temporal methods do not provide estimates of Ne in exactly the same generations (Waples 2005). Each single-sample estimate relates to Ne in a single generation (or Nb for a particular time period), while a temporal estimate depends on the harmonic mean Ne in the entire interval spanned by the samples. If Ne does not vary too much over time and the primary interest is an overall estimate of effective size for the population, then it might be reasonable to simply combine the temporal and single-sample estimates with appropriate weights as discussed above. However, if the primary interest is Ne in specific generations, which might vary considerably, then careful consideration is needed to determine whether combining estimates is desirable.
Second, the benefits of combining estimates depend on the degree to which they provide independent information about effective size. Based on unpublished data cited by Waples (1991), the LD and temporal methods are essentially independent, but correlations among the other estimators have not been determined. Conducting these evaluations should be an important research priority.
Third, the different Ne estimators generally depend on similar, but not identical, suites of assumptions (as discussed above). It will generally be the case that not all of these assumptions are completely satisfied in any particular dataset, and the different estimators might behave in different ways in response to violation of these assumptions. Researchers should think carefully before combining estimates in cases for which good reasons exist to believe some key assumptions are strongly violated.