MaCH-Admix: Genotype Imputation for Admixed Populations

Authors

  • Eric Yi Liu,

    1. Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
  • Mingyao Li,

    1. Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, Pennsylvania
  • Wei Wang,

    1. Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
    2. Department of Computer Science, University of California, Los Angeles, California
  • Yun Li

    Corresponding author
    1. Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
    2. Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina

Correspondence to: Yun Li, Department of Genetics, University of North Carolina, 120 Mason Farm Road, Chapel Hill, NC 27599. E-mail: yunli@med.unc.edu

Abstract

Imputation in admixed populations is an important but challenging problem because of the complex linkage disequilibrium (LD) pattern. The emergence of large reference panels, such as that from the 1000 Genomes Project, enables more accurate imputation in general, and in particular for admixed populations and for uncommon variants. To benefit efficiently from these large reference panels, one key issue in modern genotype imputation frameworks is the selection of effective reference panels. In this work, we consider several methods, each operating inside a hidden Markov model, for constructing an effective reference panel specific to each target individual. These methods fall into two categories: identity-by-state (IBS) based and ancestry-weighted approaches. We evaluated their performance on individuals from recently admixed populations. Our target samples include 8,421 African Americans and 3,587 Hispanic Americans from the Women's Health Initiative, which allow assessment of imputation quality for uncommon variants. Our experiments cover both large and small reference panels; large, medium, and small target samples; and genomic regions with varying levels of LD. We also include BEAGLE and IMPUTE2 for comparison. Experimental results with a large reference panel suggest that our novel piecewise IBS method yields consistently higher imputation quality than other methods and software. The advantage is particularly noteworthy among uncommon variants, where we observe up to 5.1% information gain, with the difference being highly significant (Wilcoxon signed rank test P-value < 0.0001). Our work is the first to consider various sensible approaches for imputation in admixed populations and to present a comprehensive comparison.
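
As a rough illustration of what IBS-based, per-individual reference selection could look like, the Python sketch below ranks reference haplotypes by how many typed markers are compatible with a target genotype, either over the whole region or separately within fixed-length pieces. It is not the MaCH-Admix implementation: the abstract does not specify the scoring rule or how pieces are defined, so the compatibility rule, the fixed `piece_len`, and all function names here are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence

# Assumed encoding: a target genotype is the number of alternate alleles
# (0, 1, or 2) at each typed marker, or None if missing; a reference
# haplotype carries one allele (0 or 1) per marker.
Genotype = Sequence[Optional[int]]
Haplotype = Sequence[int]


def ibs_score(genotype: Genotype, haplotype: Haplotype) -> int:
    """Count markers where the haplotype allele is present in the genotype.

    Assumed rule: a heterozygous genotype (1) is compatible with either
    allele; a homozygous genotype (0 or 2) only with the matching allele.
    """
    score = 0
    for g, h in zip(genotype, haplotype):
        if g is None:
            continue  # missing genotypes carry no information
        if g == 1 or g == 2 * h:
            score += 1
    return score


def select_by_ibs(genotype: Genotype, haplotypes: Sequence[Haplotype],
                  k: int) -> List[int]:
    """Return indices of the k reference haplotypes with the highest score.

    Ties are broken by the original haplotype order (sorted() is stable).
    """
    ranked = sorted(range(len(haplotypes)),
                    key=lambda i: -ibs_score(genotype, haplotypes[i]))
    return ranked[:k]


def select_piecewise(genotype: Genotype, haplotypes: Sequence[Haplotype],
                     k: int, piece_len: int) -> List[List[int]]:
    """Select the top-k haplotypes separately within each piece.

    Hypothetical piece definition: consecutive blocks of piece_len markers.
    """
    selections = []
    for start in range(0, len(genotype), piece_len):
        end = start + piece_len
        piece_haps = [h[start:end] for h in haplotypes]
        selections.append(select_by_ibs(genotype[start:end], piece_haps, k))
    return selections


if __name__ == "__main__":
    target = [0, 0, 2, 2]
    reference = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 1, 1]]
    print(select_by_ibs(target, reference, k=1))
    print(select_piecewise(target, reference, k=1, piece_len=2))
```

In this toy setting, scoring within pieces lets a haplotype that matches the target only locally be chosen for that piece even when it ranks poorly over the whole region, which is one intuition for why a piecewise scheme might suit admixed individuals whose ancestry changes along the chromosome.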
