Variation of Estimates of SNP and Haplotype Diversity and Linkage Disequilibrium in Samples from the Same Population Due to Experimental and Evolutionary Sample Size


  • P. M. Visscher

    1. Queensland Institute of Medical Research, Brisbane, Australia
    2. Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, UK

Corresponding author: Dr. Peter M. Visscher, Genetic Epidemiology, Queensland Institute of Medical Research, 300 Herston Road, Herston 4006, Australia. Tel. +61 7 3362 0166, fax: +61 7 3362 0101.


Studies of genetic polymorphism and diversity between and within human populations increasingly use a very large number of genetic markers but a relatively small number of individuals from whom DNA samples were taken. In this report we examine the limitations of a small experimental sample size relative to a large genomic sample size, and quantify the sampling variance of several measures of diversity and linkage disequilibrium. The relationship between sample size and observed levels of polymorphism and haplotype diversity at the level of a gene is investigated under a neutral model of sequence evolution, using coalescent simulations. The effect of evolutionary sampling, manifested as differences between samples (genes) in measures of diversity estimated using very large sample sizes, is shown to be substantial, with a coefficient of variation of the number of detected polymorphic SNPs or haplotypes of the order of 15%. The effect of experimental design (sample size) is also very large, and a number of ‘significant’ results reported in the literature can be explained by sampling alone. The expected correlation coefficient of measures of linkage disequilibrium across samples from the same population is quantified and found to be consistent with empirical estimates from the literature.
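
To make the idea of evolutionary sampling variance concrete, the following is a minimal sketch of a neutral coalescent simulation, not the simulation procedure used in the paper. It assumes a single non-recombining locus, a constant population size and the infinite-sites mutation model; the function names, the scaled mutation rate `theta` and the parameter values are illustrative assumptions. Because recombination is ignored, the coefficient of variation it produces should not be expected to match the figure of about 15% quoted above.

```python
import random
import statistics


def poisson_draw(rng, lam):
    """Draw a Poisson(lam) count by counting unit-rate arrivals before time lam."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def segregating_sites(n, theta, rng):
    """Simulate the number of polymorphic (segregating) sites in a sample of n
    sequences under the standard neutral coalescent without recombination
    (an assumption of this sketch). Time is measured in units of 2N generations
    and theta is the scaled mutation rate 4*N*mu for the whole locus."""
    total_length = 0.0
    for k in range(n, 1, -1):
        # While k lineages remain, the waiting time to the next coalescence
        # is exponential with rate k(k-1)/2.
        t_k = rng.expovariate(k * (k - 1) / 2)
        total_length += k * t_k  # k branches each grow by t_k
    # Mutations fall on the tree as a Poisson process with rate theta/2
    # per unit of branch length.
    return poisson_draw(rng, theta * total_length / 2)


def expected_segregating_sites(n, theta):
    """Watterson's expectation: theta times the sum of 1/i for i = 1..n-1."""
    return theta * sum(1 / i for i in range(1, n))


def cv_of_segregating_sites(n, theta, replicates, seed=1):
    """Return the mean and coefficient of variation of the number of
    segregating sites across independent simulated genealogies ("genes")."""
    rng = random.Random(seed)
    counts = [segregating_sites(n, theta, rng) for _ in range(replicates)]
    mean = statistics.fmean(counts)
    return mean, statistics.pstdev(counts) / mean
```

Calling `cv_of_segregating_sites(n, theta, replicates)` for increasing `n` illustrates the point made in the abstract: the variation between independent genealogies does not vanish as the experimental sample size grows, because each gene is itself a single draw from the evolutionary process.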