• haplotype diversity;
• linkage disequilibrium;
• coalescent;
• SNP;
• sample size


Studies of genetic polymorphism and diversity between and within human populations increasingly use a very large number of genetic markers typed in a relatively small number of individuals. In this report we examine the limitations of a small experimental sample size relative to a large genomic sample size, and quantify the sampling variance of several measures of diversity and linkage disequilibrium. The relationship between sample size and observed levels of polymorphism and haplotype diversity at the level of a gene is investigated under a neutral model of sequence evolution, using coalescent simulations. The effect of evolutionary sampling, manifested as differences between samples (genes) in measures of diversity estimated from very large sample sizes, is shown to be substantial, with a coefficient of variation of the number of detected polymorphic SNPs or haplotypes on the order of 15%. The effect of experimental design (sample size) is also very large, and a number of ‘significant’ results reported in the literature can be explained by sampling alone. The expected correlation coefficient of measures of linkage disequilibrium across samples from the same population has been quantified and found to be consistent with empirical estimates from the literature.
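To illustrate the kind of calculation summarised above, the following is a minimal sketch of a neutral coalescent simulation of the number of segregating sites (polymorphic SNPs) as a function of sample size. It is not the simulation procedure used in this report: it assumes no recombination, an infinite-sites mutation model and a constant population size, and the function names and the values of `theta` and the sample sizes are hypothetical choices made for illustration only. Because recombination is ignored, the coefficient of variation it produces should not be compared directly with the figure quoted above.

```python
import random
import statistics


def expected_segregating_sites(n, theta):
    """Watterson's expectation E[S] = theta * sum_{i=1}^{n-1} 1/i."""
    return theta * sum(1.0 / i for i in range(1, n))


def _poisson(mean, rng):
    """Draw a Poisson variate by counting unit-rate arrivals before `mean`."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(n, theta, rng):
    """One replicate (one 'gene') under the neutral coalescent.

    Assumptions (illustrative, not taken from the report): no recombination,
    infinite sites, time in units of 2N generations, theta = 4 * N * mu.
    """
    total_branch_length = 0.0
    for k in range(n, 1, -1):
        # Waiting time while k lineages remain: exponential, rate k(k-1)/2.
        waiting_time = rng.expovariate(k * (k - 1) / 2.0)
        total_branch_length += k * waiting_time
    # Mutations fall on the tree at rate theta/2 per unit branch length.
    return _poisson(theta / 2.0 * total_branch_length, rng)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    rng = random.Random(2024)          # fixed seed for repeatability
    theta = 5.0                        # hypothetical scaled mutation rate
    for n in (10, 50, 200):            # hypothetical numbers of chromosomes
        draws = [simulate_segregating_sites(n, theta, rng) for _ in range(2000)]
        print(n,
              round(expected_segregating_sites(n, theta), 2),
              round(statistics.mean(draws), 2),
              round(coefficient_of_variation(draws), 3))
```

Each replicate plays the role of one gene, so the spread of the simulated counts across replicates at a fixed `n` corresponds to the evolutionary sampling variance, while the change in the mean count with `n` reflects the effect of experimental sample size.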