More precisely biased: increasing the number of markers is not a silver bullet in genetic bottleneck testing


Abstract

In response to our review of the use of genetic bottleneck tests in the conservation literature (Peery et al. 2012, Molecular Ecology, 21, 3403–3418), Hoban et al. (2013, Molecular Ecology, in press) conducted population genetic simulations to show that the statistical power of genetic bottleneck tests can be increased substantially by sampling large numbers of microsatellite loci, as they suggest is now possible in the age of genomics. While we agree with Hoban and co-workers in principle, sampling large numbers of microsatellite loci can dramatically increase the probability of committing type 1 errors (i.e. detecting a bottleneck in a stable population) when the mutation model is incorrectly assumed. Using conservative values for mutation model parameters can reduce the probability of committing type 1 errors, but doing so can result in significant losses in statistical power. Moreover, we believe that practical limitations associated with developing large numbers of high-quality microsatellite loci continue to constrain sample sizes, a belief supported by a literature review of recent studies using next-generation sequencing methods to develop microsatellite libraries. In conclusion, we maintain that researchers employing genetic bottleneck tests should proceed with caution and carefully assess both statistical power and type 1 error rates associated with their study design.

We thank Hoban et al. (2013) for their response to our review of the use of genetic bottleneck methods (M-ratio and heterozygosity-excess tests) based on microsatellite loci to test for declines in effective population size in conservation situations (Peery et al. 2012). Hoban and co-workers make three specific points: (i) increasing the number of loci typically yields greater improvements in the statistical power to detect population bottlenecks than increasing the number of genotyped individuals; (ii) increasing the number of microsatellite loci to several dozen, as Hoban and co-workers suggest is feasible in the age of genomics, can increase statistical power to acceptable levels; and (iii) statistical power to detect historic bottlenecks is reduced in recovered populations. We agree with Hoban and co-workers’ first point that, in principle, sampling more loci is a more efficient way of increasing statistical power than sampling more individuals (e.g. doubling the number of loci will typically yield greater gains in power than doubling the number of individuals). In practice, however, gains in statistical power will be constrained by the ability to develop more microsatellite loci, and in some cases, it will be more feasible to sample a greater number of individuals. Hoban and co-workers’ third point is also an important consideration when testing for historic population bottlenecks in recovered populations, although species that have recovered to their original effective population sizes will generally be of lower conservation concern than species that show no signs of recovery.

However, we question Hoban and co-workers’ second point that sampling dozens of loci is a viable means for achieving reliable inference from genetic bottleneck tests. While in principle, sampling up to 50 loci can result in reasonable statistical power to detect a bottleneck, our first concern relates to the possibility that sampling this many loci will increase the probability of committing type 1 errors (i.e. detecting a bottleneck in a stable population) when the mutation model is incorrectly assumed. Genetic bottleneck tests can be highly sensitive to incorrect assumptions about the frequency of multistep mutations (pg) in microsatellites, and this mutation model parameter value is typically unknown in nonmodel species (Peery et al. 2012). Thus, investigators must often guess at this parameter value, but pg appears to vary considerably among species and assumed values are often incorrect. In fact, the mean assumed frequency of multistep mutations in the bottleneck studies we reviewed was 0.10, whereas the mean frequency of multistep mutations in vertebrates was estimated to be 0.22 (Peery et al. 2012). Sampling large numbers of microsatellite loci will certainly improve precision in estimates of the M-ratio and heterozygosity-excess, but will not necessarily improve accuracy when assumptions about mutation models are violated and estimates may in fact be ‘more precisely biased’. Hoban and co-workers did not consider this point as all of the simulations they conducted to assess statistical power were predicated on the assumption that mutation models were assumed correctly.

Our second concern relates to the practicality of increasing the number of loci to several dozen in nonmodel species. Although next-generation-sequencing (NGS) methods with long read lengths appear promising with regard to identifying microsatellite loci in nonmodel species, the genotyping of microsatellite loci still needs to be conducted by ‘traditional’ PCR and gel capillary electrophoresis and therefore remains labour intensive. Hoban and co-workers acknowledge this limitation, but a closer examination of current trends in microsatellite discovery is warranted.

Below, we present the results of (i) additional simulation analyses which reveal that sampling large numbers of loci can dramatically increase type 1 error rates when mutation model parameter values are incorrectly assumed; and (ii) a review of published NGS studies that indicates that practical considerations still constitute a major limitation in the number of microsatellite loci that are typically genotyped in nonmodel species. Our results lead us to conclude that the reliability of inference from genetic bottleneck tests is unlikely to be improved by genotyping a large number of microsatellite loci.

Effect of incorrect mutation models on inferences from genetic bottleneck tests

In all of the simulations demonstrating that statistical power can be increased by genotyping a large number of microsatellite loci, Hoban and co-workers treated mutation model parameters such as pg as correctly specified, and thus did not evaluate the effect of incorrect assumptions about mutation models on type 1 error rates. As discussed above, pg is likely to be incorrectly assumed for most nonmodel species. Underestimating pg can increase the probability of committing a type 1 error dramatically for M-ratio tests, even when only eight microsatellite loci are genotyped (Peery et al. 2012). The pattern is opposite with heterozygosity-excess tests, where overestimating pg increases type 1 errors (although to a lesser degree) when eight microsatellite loci are genotyped.

Effect of increasing the number of microsatellite loci on type 1 error rates

To illustrate the effect of sampling a large number of microsatellite loci on type 1 error rates, we conducted genetic bottleneck tests on data from simulated populations in mutation–drift equilibrium. We compared type 1 error rates for both a small (= 8) and a large (= 50) number of microsatellite loci when pg was assumed incorrectly. The smaller sample size reflected the median number of loci used in the bottleneck studies reviewed by Peery et al. (2012), and the larger sample size reflected the upper limit in terms of the number of microsatellite loci Hoban and co-workers considered realistic (notwithstanding the practical limitations of sampling 50 loci discussed below). We simulated equilibrium populations using the same coalescent methods as Peery et al. (2012). For all simulations, we assumed prebottleneck θ = 5, the median estimate of θ in the bottleneck studies reviewed by Peery et al. (2012; θ = 4 Ne μ in diploid organisms and μ = the per-generation mutation rate). Type 1 error rates were estimated as the proportion of 200 simulated populations that yielded significant genetic bottleneck tests. We first estimated type 1 error rates when the true value of pg (i.e. the value used to simulate the data) = 0, 0.10, 0.20, 0.30, 0.40 and 0.50, a range which approximates values reported in nonmodel vertebrates (Peery et al. 2012). For each value of true pg, we estimated type 1 error rates assuming pg (i.e. the value used to test for bottlenecks) = 0.10 and 0.22 in separate sets of simulations. These two values for assumed pg represented the most frequent values employed in bottleneck studies and the mean observed frequency of multistep mutations in nonmodel vertebrates, respectively (Peery et al. 2012). We set both the true and the assumed mean size of multistep mutations to 3.1 repeats given that inferences from genetic bottleneck tests are reasonably robust to deviations between true and assumed values for this parameter (Peery et al. 2012).
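To make the two mutation model parameters concrete, the following is a minimal sketch of how a single microsatellite mutation might be drawn under a two-phase model. It is not the BNAssessor code: the function name `sample_tpm_step`, the symmetric direction of change and the geometric distribution of multistep sizes are all assumptions made for illustration. Only pg and the mean multistep size of 3.1 repeats come from the text.

```python
import math
import random


def sample_tpm_step(p_g, mean_multistep, rng):
    """Return the signed change in repeat number for one mutation.

    p_g            -- probability that a mutation involves more than one repeat
    mean_multistep -- mean absolute size of multistep mutations (must exceed 2)
    rng            -- a random.Random instance
    """
    if mean_multistep <= 2.0:
        raise ValueError("mean_multistep must be greater than 2")
    # Assumption: gains and losses of repeats are equally likely.
    sign = 1 if rng.random() < 0.5 else -1
    if rng.random() >= p_g:
        return sign  # single-step mutation
    # Assumption: multistep size = 1 + G, where G is geometric on {1, 2, ...}
    # with mean (mean_multistep - 1), so sizes start at 2 repeats.
    p = 1.0 / (mean_multistep - 1.0)
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    g = int(math.log(u) / math.log(1.0 - p)) + 1
    return sign * (1 + g)


if __name__ == "__main__":
    rng = random.Random(2013)
    steps = [sample_tpm_step(0.22, 3.1, rng) for _ in range(10)]
    print(steps)
```

In this sketch, "true pg" would be the value passed when simulating data and "assumed pg" the value passed when generating the null distribution of the test statistic; a type 1 error study compares the two when they differ.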

Our simulations indicated that sampling 50, as opposed to 8, microsatellite loci can significantly increase type 1 error rates when values of pg are assumed incorrectly in genetic bottleneck tests (Fig. 1). This finding has important implications for inferences drawn from genetic bottleneck tests because type 1 error rates due to mistaken assumptions about the mutation model can already be high even when only 8 microsatellite loci are sampled, particularly for M-ratio tests (Peery et al. 2012). For example, type 1 error rates in M-ratio tests increased from 0.19 to 0.78 when sampling was increased from 8 to 50 microsatellite loci and true and assumed pg equalled 0.20 and 0.10, respectively (Fig. 1a). A type 1 error rate of 0.78 is clearly unacceptable in hypothesis testing, and it is noteworthy that this error rate occurred with (i) only a modest departure between the true and assumed frequency of multistep mutations; and (ii) realistic values for the true frequency of multistep mutations (Peery et al. 2012). In principle, the likelihood of committing a type 1 error in M-ratio tests can be reduced by increasing the assumed frequency of multistep mutations. Nevertheless, type 1 error rates still increased substantially from 0.09 to 0.41 when sampling effort increased from 8 to 50 loci and true and assumed pg equalled 0.30 and 0.22, respectively (Fig. 1b). Such a departure from the true parameter value is modest and is certainly within the realm of possibility given the range (and uncertainty) of values estimated in nonmodel species (Peery et al. 2012).
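Why more loci can inflate type 1 error is easiest to see in a deliberately simplified toy model, which is not the coalescent simulation used for Fig. 1. Assume the test statistic is the mean of independent per-locus values with a common standard deviation, and that the assumed null mean is offset from the true equilibrium mean by a fixed `bias` (standing in for a misspecified pg). The function name and all parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rejection_rate(n_loci, bias, sd=1.0, alpha=0.05):
    """Probability that a one-sided z-test rejects a true equilibrium null.

    The assumed null mean is off by `bias` in the direction of rejection,
    so the standardized statistic is shifted by bias * sqrt(n_loci) / sd.
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    shift = bias * sqrt(n_loci) / sd
    return 1.0 - NormalDist().cdf(z_crit - shift)


if __name__ == "__main__":
    for n in (8, 50):
        # bias = 0 recovers the nominal alpha; any bias > 0 grows with n.
        print(n, round(rejection_rate(n, 0.0), 3), round(rejection_rate(n, 0.3), 3))
```

With no bias, the rejection rate stays at the nominal level regardless of the number of loci. With any fixed bias, the standard error shrinks as loci are added while the bias does not, so the rejection rate rises towards 1. The toy model reproduces only the qualitative pattern, not the specific rates reported here.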

Figure 1.

Estimated type 1 error rates for genetic bottleneck tests based on M-ratios and heterozygosity excess. Multilocus genotypes were simulated under mutation–drift equilibrium using coalescent methods assuming θ = 5. Type 1 error rates were estimated as the proportion of simulated populations that yielded significant genetic bottleneck tests. Assumed pg was the proportion of microsatellite mutations assumed to involve more than one repeat in bottleneck tests, and true pg was the proportion of microsatellite mutations that involved more than one repeat in the simulated data.

A similar, albeit somewhat weaker, pattern was observed when sampling effort was increased from 8 to 50 microsatellite loci in heterozygosity-excess tests. Overestimating the frequency of multistep mutations in heterozygosity-excess tests increased type 1 error rates from 0.13 to 0.39 with increased sampling when true pg = 0 (i.e. microsatellites mutate according to a stepwise-mutation model; SMM) and assumed pg = 0.10 (Fig. 1c). While many microsatellite loci likely do not evolve according to a strict SMM, type 1 error rates increased by a similar degree, from 0.09 to 0.36, when true pg = 0.10 and assumed pg = 0.22 (Fig. 1d).
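Because each error rate is a proportion of 200 simulated populations, its Monte Carlo uncertainty can be approximated with a binomial standard error. The sketch below assumes that the 8-locus and 50-locus sets were simulated independently and uses a normal approximation; both are our simplifying assumptions, and the function names are illustrative.

```python
from math import sqrt


def mc_standard_error(rate, n_sims=200):
    """Binomial standard error of a proportion estimated from n_sims runs."""
    return sqrt(rate * (1.0 - rate) / n_sims)


def difference_z(rate_a, rate_b, n_sims=200):
    """Approximate z-score for the difference between two independent rates."""
    se = sqrt(mc_standard_error(rate_a, n_sims) ** 2
              + mc_standard_error(rate_b, n_sims) ** 2)
    return (rate_b - rate_a) / se


if __name__ == "__main__":
    # Rates reported in the text for 8 vs 50 loci.
    for low, high in ((0.19, 0.78), (0.09, 0.41), (0.13, 0.39), (0.09, 0.36)):
        print(low, high, round(difference_z(low, high), 1))
```

Under these assumptions, the reported increases are many standard errors apart, so they are unlikely to be artefacts of the number of simulation replicates.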

We agree with Hoban and co-workers that, in principle, increasing the number of loci sampled can result in acceptable statistical power when the null hypothesis of mutation–drift equilibrium is in fact false (at least when the decline is large, e.g. a postbottleneck Ne = 100). However, investigators testing for population bottlenecks will not know whether the target species has actually experienced a bottleneck and must also weigh the consequences of rejecting a true null hypothesis if a bottleneck has in fact not occurred. Our results indicate that increasing sampling intensity to levels that yield adequate statistical power can substantially increase the probability of incorrectly inferring a bottleneck in equilibrium populations when assumptions regarding the frequency of multistep mutations are incorrect. Moreover, increases in type 1 error rates can be high, even for quite modest deviations between the assumed and true values of this parameter.

Power is reduced when the proportion of multirepeat mutations is overestimated in M-ratio tests and underestimated in heterozygosity-excess tests

In principle, elevated type 1 error rates associated with the combined effects of large sample sizes of loci and erroneous assumptions about mutation models could be mitigated by assuming large values for pg in M-ratio tests and small values for pg in heterozygosity-excess tests. However, reducing type 1 error rates by adopting conservative estimates of pg potentially comes at a cost to statistical power. To illustrate this issue, we simulated populations with prebottleneck θ = 5 (i.e. Ne = 2500, assuming μ = 5 × 10^−4) and postbottleneck Ne = 100. More drastic population declines will often be evident from nongenetic data and, thus, genetic bottleneck tests will often be unnecessary to inform management decisions in more extreme cases. We evaluated power to detect immediate bottlenecks of this magnitude 10 generations after the bottleneck by sampling 35 individuals and 50 microsatellite loci, the number of loci representing the upper limit considered feasible by Hoban and co-workers (again, notwithstanding the caveats described below). We set true pg to 0.10 and 0.22 and assumed pg to values from 0 to 0.50 (increments of 0.10) in separate simulations to reflect the assumptions an investigator might make in an attempt to reduce the likelihood of committing a type 1 error when pg is unknown. As above, we set both the true and the assumed mean size of multistep mutations to 3.1 repeats.
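For reference, the prebottleneck effective population size follows directly from the definition θ = 4 Ne μ given earlier. The 25-fold reduction is simply the ratio of the two Ne values stated in the text.

```latex
\theta = 4 N_e \mu
\quad\Longrightarrow\quad
N_e^{\text{pre}} = \frac{\theta}{4\mu} = \frac{5}{4 \times 5 \times 10^{-4}} = 2500,
\qquad
\frac{N_e^{\text{post}}}{N_e^{\text{pre}}} = \frac{100}{2500} = 0.04
```

The simulated bottleneck therefore corresponds to a 25-fold (96%) reduction in effective population size.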

For M-ratio tests, increasing assumed pg to 0.30 in an attempt to reduce the probability of committing a type 1 error resulted in a dramatic decline in statistical power even when 50 loci were sampled. Specifically, statistical power was <0.01 and 0.22 when true pg was 0.10 and 0.22, respectively, both estimates being far below what is necessary for making reliable inference (Fig. 2a). Statistical power to detect departures from equilibrium with M-ratio tests was much higher when assumed pg ≤ 0.20 (as suggested by Hoban and co-workers), but even modestly underestimating pg (assumed pg < true pg) can dramatically increase the probability of committing type 1 errors (Fig. 1a,b).

Figure 2.

Estimated statistical power for genetic bottleneck tests based on M-ratios and heterozygosity excess when 35 individuals and 50 microsatellite loci are sampled. Multilocus genotypes were simulated using coalescent methods assuming prebottleneck θ = 5, postbottleneck Ne = 100, and that the bottleneck occurred instantaneously 10 generations prior to sampling. Statistical power was estimated as the proportion of simulated populations that yielded significant genetic bottleneck tests. Assumed pg was the proportion of microsatellite mutations assumed to involve more than one repeat in bottleneck tests, and true pg was the proportion of microsatellite mutations that involved more than one repeat in the simulated data.

For heterozygosity-excess tests, assuming a relatively low value of pg at 0.10 in an attempt to reduce the likelihood of committing a type 1 error resulted in modest power (0.47–0.50) to detect departures from equilibrium (postbottleneck Ne = 100), even when 50 microsatellite loci were genotyped (Fig. 2b). Reducing assumed pg further to zero (i.e. assuming a strict SMM) resulted in low estimates of statistical power (≤0.23), as has been shown previously by Cornuet & Luikart (1996) and Luikart & Cornuet (1998). Even though statistical power was reduced to a lesser extent in heterozygosity-excess tests, statistical power of ~50% to detect a population bottleneck does not, in our opinion, justify the expense and effort necessary to genotype 50 microsatellite loci. Thus, our simulation results indicate that, even when it is possible to genotype a large number of microsatellite loci, elevated type 1 error rates will often require adopting conservative values of pg, which in turn can compromise the ability to detect bottlenecks.

Practicality of sampling up to 50 loci for genetic bottleneck testing

We agree with Hoban and co-workers that studies using NGS methods to develop microsatellite libraries in nonmodel species typically report many more DNA sequences containing putative microsatellite loci than studies using traditional, cloning-based discovery methods. However, we are less optimistic that this newfound ease in microsatellite discovery necessarily will translate into the routine sampling of sufficient loci to yield high statistical power in genetic bottleneck testing. Even when NGS methods are used for microsatellite discovery, individual samples still need to be genotyped using traditional PCR-based methods. For reasons described below, only a modest subset of putative microsatellite loci is typically genotyped, of which even fewer loci typically are sufficiently polymorphic and amplify in a consistent manner. We summarized the results from 21 recent papers involving 36 species in which microsatellite loci developed using NGS techniques were verified using PCR (Table 1). Reviewed papers included those cited in Hoban et al. (2013), top hits from a Google Scholar search conducted using the terms ‘next-generation sequencing microsatellite development’ on 12 March 2013, and references included within these studies. PCR was generally performed for only a small subset (median = 32) of the hundreds to thousands of loci detected, and an even smaller number (median = 12) of these microsatellite loci satisfied the conditions for population level genotyping. For example, of the 3840 microsatellite-containing sequences discovered by NGS in the coral snake Micrurus fulvius, 1871 were identified as ‘potentially amplifiable’, and of these, 1090 contained forward and reverse primers that were unique (Castoe et al. 2012). Of these 1090 candidate microsatellite loci, a much smaller subset (= 40) were screened using PCR, of which 20 were deemed consistent, and eventually only eight microsatellite loci proved polymorphic and were used in a population study. Further considerations such as deviation from Hardy–Weinberg equilibrium (two to three of the eight coral snake microsatellite loci displayed deviations from Hardy–Weinberg expectations in two of the populations surveyed) and imperfect repeat motifs (which are problematic for use with the M-ratio test) will likely serve to exclude additional microsatellite loci in genetic bottleneck tests. To date, we have identified only a few published studies in which NGS-discovered microsatellite loci have been applied to conservation genetics research, but no apparent increase in the number of microsatellite loci is evident (median number of loci used = 8; Table 2).
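The attrition in the coral snake example can be tabulated stage by stage. The counts below are those quoted above from Castoe et al. (2012); the stage labels and the `retention` helper are ours and purely illustrative.

```python
# Counts for Micrurus fulvius as quoted in the text (Castoe et al. 2012).
STAGES = [
    ("microsatellite-containing sequences", 3840),
    ("potentially amplifiable", 1871),
    ("unique forward and reverse primers", 1090),
    ("screened using PCR", 40),
    ("amplified consistently", 20),
    ("polymorphic and used in population study", 8),
]


def retention(stages):
    """Return (label, count, fraction of previous stage, fraction of first stage)."""
    first = stages[0][1]
    rows = []
    previous = first
    for label, count in stages:
        rows.append((label, count, count / previous, count / first))
        previous = count
    return rows


if __name__ == "__main__":
    for label, count, of_prev, of_first in retention(STAGES):
        print(f"{label:45s} {count:5d} {of_prev:7.3f} {of_first:8.4f}")
```

The largest single drop is the choice of which candidates to screen by PCR (40 of 1090), which reflects laboratory effort rather than any shortage of candidate loci.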

Table 1. Summary of loci characterized in studies utilizing next-generation sequencing technology to identify microsatellite sequences

| Species | No. loci tested | No. loci polymorphic | References |
|---|---|---|---|
| Euphydryas editha | 72 | 10 | Mikheyev et al. (2010) |
| Galeorhinus galeus | 32 | 13 | Chabot & Nigenda (2011) |
| Schizothorax biddulphi | 30 | 29 | Luo et al. (2012) |
| Mustelus henlei | 28 | 11 | Chabot (2012) |
| Hymenolaimus malacorhynchos | 24 | 13 | Abdelkrim et al. (2009) |
| Micrurus fulvius | 40 | 8 | Castoe et al. (2012) |
| Paeonia lactiflora | 384 | 12 | Gilmore et al. (2013) |
| Callitropsis nootkatensis | 96 | 8 | Jennings et al. (2011a) |
| Chamaecyparis lawsoniana | 144 | 11 | Jennings et al. (2011b) |
| Coturnicops noveboracensis | 30 | 24 | Jennings et al. (2011a) |
| Pyrrhura pfrimeri | 50 | 27 | Jennings et al. (2011b) |
| Euphydryas aurinia | 96 | 12 | Sinama et al. (2011) |
| Etheostoma okaloosae | 30 | 21 | Saarinen & Austin (2010) |
| Papaver rhoeas | 30 | 11 | Kati et al. (2013) |
| Typha minima | 30 | 17 | Csencsics et al. (2010) |
| Plectritis congesta | 48 | 11 | McEwen et al. (2011) |
| Raja pulchra | 32 | 14 | Kang et al. (2012) |
| Ceanothus roderickii | 48 | 10 | Burge et al. (2012) |
| Tetrao tetrix | 102 | 10 | Wang et al. (2012) |
| Lobaria pulmonaria | 16 | 14 | Schoebel et al. (2013) |
| Phytophthora multivora | 24 | 18 | Schoebel et al. (2013) |
| Phytophthora plurivora | 36 | 13 | Schoebel et al. (2013) |
| Phytophthora pini | 13 | 8 | Schoebel et al. (2013) |
| Pinus cembra | 37 | 11 | Schoebel et al. (2013) |
| Taxus wallichiana | 81 | 12 | Schoebel et al. (2013) |
| Baetis alpinus | 12 | 5 | Schoebel et al. (2013) |
| Stethophyma grossum | 50 | 10 | Schoebel et al. (2013) |
| Fredericella sultana | 30 | 16 | Schoebel et al. (2013) |
| Lagopus muta | 22 | 9 | Schoebel et al. (2013) |
| Pyrrhura atricapilla | 16 | 8 | Schoebel et al. (2013) |
| Sylvia atricapilla | 51 | 23 | Schoebel et al. (2013) |
| Eliomys quercinus | 12 | 7 | Schoebel et al. (2013) |
| Fusarium circinatum | 28 | 13 | Santana et al. (2009) |
| Arabis alpina | 34 | 19 | Buehler et al. (2011) |
| Paracirrhites arcatus | 80 | 38 | Whitney & Karl (2012) |
| Nasua narica | 24 | 18 | Garbiay et al. (2012) |

‘No. loci tested’ represents the number of microsatellites for which primers were generated and PCR was attempted in a given organism, while ‘No. loci polymorphic’ represents the number of microsatellites which were identified by the authors as being easily amplified and exhibiting >1 allele.

Table 2. Number of microsatellite loci used in conservation genetics studies identified to date that incorporate loci from next-generation discovery methods

| Species | Loci | Application | References |
|---|---|---|---|
| Coturnicops noveboracensis | 6 | Spatial genetic structure, bottleneck tests | Miller et al. (2012a) |
| Pyrrhura pfrimeri | 8 | Spatial genetic structure | Miller et al. (2012b) |
| Arabis alpina | 18 | Spatial genetic structure, mating system | Buehler et al. (2012) |
| Micrurus fulvius | 8 | Spatial genetic structure | Castoe et al. (2012) |
| Papaver rhoeas | 11 | Spatial genetic structure | Kati et al. (2013) |
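The medians quoted in the text can be checked against the tabulated values. The sketch below lists the counts in the row order of Tables 1 and 2; because the numeric columns are run together in this copy, separating each pair of counts is our reading (chosen so that the polymorphic count never exceeds the tested count), and the variable names are illustrative.

```python
from statistics import median

# Table 1, in row order: loci for which PCR was attempted.
TESTED = [72, 32, 30, 28, 24, 40, 384, 96, 144, 30, 50, 96,
          30, 30, 30, 48, 32, 48, 102, 16, 24, 36, 13, 37,
          81, 12, 50, 30, 22, 16, 51, 12, 28, 34, 80, 24]

# Table 1, in row order: loci reported as polymorphic.
POLYMORPHIC = [10, 13, 29, 11, 13, 8, 12, 8, 11, 24, 27, 12,
               21, 11, 17, 11, 14, 10, 10, 14, 18, 13, 8, 11,
               12, 5, 10, 16, 9, 8, 23, 7, 13, 19, 38, 18]

# Table 2, in row order: loci used in conservation genetics applications.
APPLIED = [6, 8, 18, 8, 11]

if __name__ == "__main__":
    print(len(TESTED), median(TESTED), median(POLYMORPHIC), median(APPLIED))
```

Under this reading, the 36 species give medians of 32 loci tested and 12 polymorphic, and the five applied studies give a median of 8 loci, matching the values stated in the text.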

In addition, Hoban and co-workers suggested that our estimate of the median number of microsatellite loci used in bottleneck studies conducted from 2001 to 2010 (8 to 9) was outdated and cited the mean number of loci used in population genetics studies published in Molecular Ecology in 2011 and 2012 (15–20; Rieseberg et al. 2012) as evidence that more recent studies were already genotyping significantly more microsatellite loci. However, a closer examination of the data presented in Rieseberg et al. (2012) reveals that the cited mean of ‘15–20’ microsatellite loci per study is upwardly biased due to two studies that genotyped >100 loci. Indeed, the median number of microsatellite loci in the studies reviewed by Rieseberg et al. (2012) was only 12, and most studies (>70%) sampled fewer than 16 microsatellite loci. It is beyond the scope of this study to summarize the median number of loci sampled in studies conducting bottleneck tests in 2011 and 2012, but we suspect it is likely lower than the overall median because microsatellite loci with irregular repeats cannot readily be employed in current genetic bottleneck tests and are often dropped from such tests. Thus, the number of microsatellite loci used in genetic bottleneck tests was likely only slightly elevated in 2011 and 2012 compared to the previous decade, and is not experiencing a rapid upward trajectory that will allow for the sampling of sufficient loci (e.g. 50) to reliably detect population bottlenecks in the near future (even if such were possible given the issues associated with assumptions about mutation models).

A reasonable question to pose is, “Why have population genetic studies not experienced a strong upward surge in the number of microsatellite loci, even in studies that have access to thousands of readily available candidate loci?” The answer lies with such seemingly trivial aspects as the practicality and feasibility of microsatellite primer design, amplification and genotyping. In fact, these aspects are nontrivial and currently constitute the main ‘bottleneck’ in terms of generating reliable microsatellite genotypes in nonmodel species. Typically, only a handful of candidate microsatellite loci will be polymorphic in the target population and generate high-quality genotypes. Many candidate microsatellite loci fail to amplify and those that amplify are often subject to stutter, null alleles and other technical difficulties. The end result is an inverse correlation between the number of loci and the average level of polymorphism and genotyping reliability. Perhaps most importantly, the level of effort needed to amplify and genotype up to 50 microsatellite loci using traditional PCR methods will preclude most investigators and laboratories from sampling such a large number of loci. Although multiplexing is routinely undertaken in model species, the amount of effort necessary to optimize and design PCR primers to multiplex more than four loci in a single PCR, in our experience, makes multiplexing unrealistic for the relatively small sample sizes often obtained from nonmodel species of conservation concern. Indeed, only one of the five studies in Table 2 that used microsatellite loci discovered with NGS methods employed multiplexing when conducting population level analyses. In contrast, genotyping more individuals with a set of optimized (and polymorphic) microsatellite loci often involves a comparatively modest increase in effort (e.g. Palsbøll et al. 1997).

Conclusions

The value of genetic bottleneck tests is greatest for species in which the occurrence of a bottleneck is unknown (otherwise, why conduct a bottleneck test?). Thus, empirical studies seeking reliable inference cannot simply aim towards reducing the probability of type 2 errors (i.e. achieve high statistical power) without considering the possibility of committing type 1 errors, particularly given the sensitivity of bottleneck tests to incorrect assumptions about the frequency of multistep mutations. Moreover, our simulations indicate the trade-off between type 1 and type 2 errors is exacerbated when large numbers of microsatellite loci are sampled. In fact, our simulations indicate that a silver bullet in terms of an optimal number of microsatellite loci that will yield both acceptable type 1 and type 2 error rates may be elusive when tests are considered individually. Thus, even if NGS-based microsatellite locus discovery greatly increases the number of microsatellite loci typically employed (which appears unlikely in the near future given the constraints noted above), it is by no means certain that such an increase will lead to a substantial improvement in the reliability of genetic bottleneck tests.

One possible solution to the above-described issues lies in the finding that type 1 error rates tend to be low for M-ratio tests when true pg is low, whereas type 1 error rates tend to be low for heterozygosity-excess tests when true pg is high (Fig. 1). Thus, a type 1 error will likely only occur for one of the two tests in stable populations across a range of possible values for true pg. It follows that inference of a population bottleneck may be robust to errors in assumed pg when both tests are statistically significant. However, power will be reduced if both tests need to be significant to infer a bottleneck. Joint statistical power appears to be maximized when assumed pg is approximately 0.20, as higher values for assumed pg result in low power for M-ratios, and lower values for assumed pg result in low power for heterozygosity-excess tests (Fig. 2). Nevertheless, it is important to note that estimates of statistical power presented in Fig. 2 are based on 50 microsatellite loci; power with typical numbers of microsatellite loci will be much lower (Peery et al. 2012). In conclusion, we suggest that future studies employing genetic bottleneck tests proceed with caution and carefully assess both statistical power and type 1 error rates across a range of values for the frequency of multistep mutations. We also advocate greater stringency on the part of editors and reviewers; studies that employ genetic bottleneck tests without careful assessments of type 1 and type 2 error rates should be viewed with skepticism.
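The effect of requiring both tests to be significant can be bounded without knowing how the two tests covary. For any two events, the probability that both occur lies between max(0, a + b − 1) and min(a, b), and equals a × b only if the tests are independent, which has not been established for M-ratio and heterozygosity-excess tests. In the sketch below, the function name and the heterozygosity-excess rate of 0.05 are hypothetical; 0.78 is the M-ratio type 1 error rate reported in Fig. 1a.

```python
def joint_rate_bounds(rate_a, rate_b):
    """Bounds on P(both tests significant) given each test's marginal rate.

    Returns (lower bound, value under independence, upper bound).
    The same arithmetic applies to joint type 1 error and to joint power.
    """
    lower = max(0.0, rate_a + rate_b - 1.0)
    upper = min(rate_a, rate_b)
    return lower, rate_a * rate_b, upper


if __name__ == "__main__":
    # 0.78: M-ratio type 1 error (Fig. 1a); 0.05: hypothetical rate for
    # the heterozygosity-excess test under the same true and assumed pg.
    print(joint_rate_bounds(0.78, 0.05))
```

The joint rate can never exceed the smaller of the two marginal rates, which is why requiring both tests caps type 1 error at the better-behaved test's rate but also caps power at the weaker test's power.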

All authors contributed to manuscript writing and editing. M.Z.P. conducted data analysis, B.N.R. conducted the literature review, and P.J.P. developed the simulation software.

Data accessibility

A compiled, executable version of the Windows computer program BNAssessor used to conduct the simulations and error assessments described herein is available at: http://conserver.iugo-cafe.org/user/pjpalsboll/BNassessor.
