Characterization and Correction of Error in Genome-Wide IBD Estimation for Samples with Population Structure

Authors

  • Jean Morrison

    Corresponding author
    1. Department of Biostatistics, University of Washington, Seattle, Washington
    • Correspondence to: Jean Morrison, Department of Biostatistics, University of Washington, Box 357232, Seattle, WA 98195-7232. E-mail: jeanm5@uw.edu


ABSTRACT

The proportion of the genome that is shared identical by descent (IBD) between pairs of individuals is often estimated in studies involving genome-wide SNP data. These estimates can be used to check pedigrees, estimate heritability, and adjust association analyses. We focus on the method of moments technique, as implemented in PLINK [Purcell et al., 2007] and other software, which estimates the proportions of the genome at which two individuals share 0, 1, or 2 alleles IBD. This technique assumes that the study sample is drawn from a single, homogeneous, randomly mating population. This assumption is violated if pedigree founders are drawn from multiple populations or include admixed individuals. In the presence of population structure, the method of moments estimator has an inflated variance and can be biased because it relies on sample-based allele frequency estimates. In the case of the PLINK estimator, which truncates genome-wide sharing estimates at zero and one to generate biologically interpretable results, the bias is most often toward overestimation of relatedness between ancestrally similar individuals. Using simulated pedigrees, we demonstrate and quantify the behavior of the PLINK method of moments estimator under different population structure conditions. We also propose a simple method based on SNP pruning for improving genome-wide IBD estimates when the assumption of a single, homogeneous population is violated.
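To make the estimator described above concrete, here is a minimal Python sketch of a method of moments IBD estimate for one pair of individuals. It is an illustration under stated assumptions, not PLINK's implementation. It uses the standard per-SNP probabilities of identity by state (IBS) given IBD for a biallelic SNP under Hardy-Weinberg equilibrium, solves the moment equations for the counts of IBS0 and IBS1 loci, and takes Z2 as the remainder. The function names (`ibs_given_ibd`, `ibs_counts`, `constrain`, `mom_ibd`, `pi_hat`) and the simulated data are hypothetical. `constrain` is a simplified clip-and-rescale stand-in for PLINK's truncation at zero and one, and the sketch omits any finite-sample correction PLINK may apply. Allele frequencies `p` are passed in directly; in practice they are estimated from the pooled study sample, which is where population structure enters.

```python
import numpy as np


def ibs_given_ibd(p):
    """Per-SNP probabilities P(IBS = i | IBD = j) needed by the moment equations.

    Assumes a biallelic SNP with allele frequency p and Hardy-Weinberg
    equilibrium in a single homogeneous population.
    """
    q = 1.0 - p
    ibs0_ibd0 = 2 * p**2 * q**2             # one pair member AA, the other aa
    ibs1_ibd0 = 4 * p * q * (p**2 + q**2)   # homozygote paired with heterozygote
    ibs1_ibd1 = 2 * p * q                   # the two non-shared alleles differ
    return ibs0_ibd0, ibs1_ibd0, ibs1_ibd1


def ibs_counts(g1, g2):
    """Number of SNPs at which two 0/1/2 genotype vectors share 0, 1 or 2 alleles IBS."""
    ibs = 2 - np.abs(g1 - g2)
    return np.array([np.sum(ibs == k) for k in range(3)])


def constrain(z):
    """Simplified stand-in for PLINK's truncation: clip each Z to [0, 1], rescale to sum to 1."""
    z = np.clip(z, 0.0, 1.0)
    return z / z.sum()


def mom_ibd(g1, g2, p):
    """Method of moments estimates (Z0, Z1, Z2) for one pair of individuals."""
    n0, n1, _ = ibs_counts(g1, g2)
    e00, e10, e11 = (x.sum() for x in ibs_given_ibd(p))
    z0 = n0 / e00                      # only IBD-0 loci can be IBS 0
    z1 = (n1 - z0 * e10) / e11         # IBS-1 loci not explained by IBD 0
    z2 = 1.0 - z0 - z1
    return constrain(np.array([z0, z1, z2]))


def pi_hat(z):
    """Overall proportion of alleles shared IBD, Z1/2 + Z2."""
    return z[1] / 2 + z[2]


# Hypothetical usage on genotypes simulated from one homogeneous population.
rng = np.random.default_rng(1)
m = 50_000
p = rng.uniform(0.1, 0.9, size=m)      # true frequencies here; in practice estimated from the sample


def allele():
    """Draw one allele per SNP (1 = counted allele) from frequencies p."""
    return (rng.random(m) < p).astype(int)


a1, a2 = allele(), allele()
parent = a1 + a2
child = np.where(rng.random(m) < 0.5, a1, a2) + allele()   # one allele IBD with parent
stranger = allele() + allele()
z_po = mom_ibd(parent, child, p)       # close to (0, 1, 0)
z_un = mom_ibd(parent, stranger, p)    # close to (1, 0, 0)
```

Because `p` here is the true frequency in a single population, the parent-offspring and unrelated pairs land near their expected values. If `p` were instead estimated from a sample pooling several ancestral populations, the expected IBS probabilities plugged into these equations would no longer match each pair's ancestry, which is the source of the inflated variance and bias discussed above.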
