
Effect of unsampled populations on the estimation of population sizes and migration rates between sampled populations


  • Peter Beerli

    Corresponding author
    1. Computer Science and Information Technology and Biological Sciences Department, Florida State University, Tallahassee FL 32306–4120 USA

P. Beerli. CSIT, Dirac Science Library, Florida State University, Tallahassee FL 32306–4120 USA. Fax: USA-(850) 644 0098; E-mail:


Current estimators of gene flow fall into two classes: those that assume the populations investigated are a small random sample of a large number of populations, and those that assume that all populations were sampled. Maximum likelihood or Bayesian approaches that estimate migration rates and population sizes directly using coalescent theory can easily accommodate datasets that contain a population with no data, a so-called ‘ghost’ population. This manipulation allows us to explore the effects of missing populations on the estimation of population sizes and migration rates between two specific populations. The biases of the inferred population parameters depend on the magnitude of the migration rate from the unknown populations. The effects on the population sizes are larger than the effects on the migration rates. The more immigrants arrive in the sampled populations from the unknown populations, the larger the estimated population sizes. Taking a ghost population into account improves, or at least does not harm, the estimation of population sizes. Estimates of the scaled migration rate M (migration rate per generation divided by the mutation rate per generation) are fairly robust as long as migration rates from the unknown populations are not very high. Including a ghost population does not improve the estimation of the migration rate M; when migration rates are instead estimated as the number of immigrants Nm, a ghost population improves the estimates because of its effect on population size estimation. It seems that for ‘real world’ analyses one should carefully choose which populations to sample, but there is no need to sample every population in the neighbourhood of a population of interest.
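
The dependence of Nm on the population size estimate can be illustrated with a minimal sketch. It assumes the common coalescent parameterization for diploid nuclear loci, in which the mutation-scaled population size is Θ = 4Nμ, so that Nm = ΘM/4. The symbol Θ, the factor 4, the function name, and the numbers used are illustrative assumptions and do not come from the abstract.

```python
def immigrants_per_generation(theta: float, m_scaled: float,
                              inheritance_scalar: float = 4.0) -> float:
    """Convert mutation-scaled estimates into immigrants per generation.

    Assumes theta = inheritance_scalar * N * mu and m_scaled = m / mu,
    so that N * m = theta * m_scaled / inheritance_scalar.
    The default scalar of 4 is an assumption (diploid, nuclear data).
    """
    if theta < 0 or m_scaled < 0 or inheritance_scalar <= 0:
        raise ValueError("parameters must be non-negative, scalar positive")
    return theta * m_scaled / inheritance_scalar


# Hypothetical values: M is estimated well in both cases, but theta is
# inflated when immigrants from an unsampled population are ignored.
nm_reference = immigrants_per_generation(theta=0.01, m_scaled=100.0)
nm_inflated = immigrants_per_generation(theta=0.02, m_scaled=100.0)
```

Because Nm is a product of both estimates, any upward bias in the population size carries over proportionally to Nm even when M itself is estimated well, which is consistent with the abstract's statement that a ghost population helps Nm through its effect on population size.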