Characterization and Correction of Error in Genome-Wide IBD Estimation for Samples with Population Structure
Version of Record online: 5 JUN 2013
© 2013 WILEY PERIODICALS, INC.
Volume 37, Issue 6, pages 635–641, September 2013
How to Cite
Morrison, J. (2013), Characterization and Correction of Error in Genome-Wide IBD Estimation for Samples with Population Structure. Genet. Epidemiol., 37: 635–641. doi: 10.1002/gepi.21737
- Issue online: 11 AUG 2013
- Manuscript Accepted: 17 APR 2013
- Manuscript Revised: 12 APR 2013
- Manuscript Received: 28 NOV 2012
- NIH. Grant Numbers: R01 GM 075091, T32 GM 81062
- identity by descent
- population structure
The proportion of the genome that is shared identical by descent (IBD) between pairs of individuals is often estimated in studies involving genome-wide SNP data. These estimates can be used to check pedigrees, estimate heritability, and adjust association analyses. We focus on the method-of-moments technique, implemented in PLINK [Purcell et al., 2007] and other software, which estimates the proportions of the genome at which two individuals share 0, 1, or 2 alleles IBD. This technique assumes that the study sample is drawn from a single, homogeneous, randomly mating population. The assumption is violated if pedigree founders are drawn from multiple populations or include admixed individuals. In the presence of population structure, the method-of-moments estimator has inflated variance and can be biased because it relies on sample-based allele frequency estimates. In the case of the PLINK estimator, which truncates genome-wide sharing estimates at zero and one to generate biologically interpretable results, the bias is most often toward overestimation of relatedness between ancestrally similar individuals. Using simulated pedigrees, we demonstrate and quantify the behavior of the PLINK method-of-moments estimator under different population structure conditions. We also propose a simple method based on SNP pruning for improving genome-wide IBD estimates when the assumption of a single, homogeneous population is violated.
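The dependence on sample-based allele frequencies can be illustrated with a minimal sketch. It is not PLINK's implementation: it works with expected values rather than genotype data, omits finite-sample corrections, and covers only the proportion of the genome sharing 0 alleles IBD. The function names and the allele frequencies are hypothetical, chosen for illustration. The one fact it relies on is that two unrelated individuals are opposite homozygotes at a biallelic SNP with allele frequency p with probability 2p²(1-p)².

```python
def expected_ibs0_given_ibd0(p):
    """P(IBS = 0 | IBD = 0) at a biallelic SNP with allele frequency p."""
    q = 1.0 - p
    return 2.0 * p * p * q * q


def z0_estimate(true_freqs, assumed_freqs):
    """Expected-value version of a moment estimate of P(IBD = 0).

    true_freqs:    frequencies in the population the unrelated pair comes from
    assumed_freqs: frequencies the estimator uses (e.g. pooled sample values)
    """
    # Expected IBS0 sharing actually shown by an unrelated pair.
    observed = sum(expected_ibs0_given_ibd0(p) for p in true_freqs)
    # IBS0 sharing the estimator expects from an unrelated pair.
    expected = sum(expected_ibs0_given_ibd0(p) for p in assumed_freqs)
    ratio = observed / expected
    # Truncate to the biologically interpretable range [0, 1].
    return min(1.0, max(0.0, ratio))


# Hypothetical allele frequencies at three SNPs in two subpopulations.
pop1 = [0.1, 0.2, 0.3]
pop2 = [0.9, 0.7, 0.6]
# Frequencies estimated from an equal mixture of both subpopulations.
pooled = [(a + b) / 2.0 for a, b in zip(pop1, pop2)]

z0_correct = z0_estimate(pop1, pop1)    # correct frequencies
z0_pooled = z0_estimate(pop1, pooled)   # pooled frequencies
print(z0_correct, z0_pooled)
```

With the correct frequencies the ratio is 1, as expected for an unrelated pair. With the pooled frequencies the unrelated pair from the first subpopulation shows far fewer opposite homozygotes than the estimator expects, so the estimate falls well below 1 and the pair appears related. Other frequency choices can push the ratio above 1, where truncation hides the error.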