Genotype Imputation for African Americans Using Data From HapMap Phase II Versus 1000 Genomes Projects
Article first published online: 29 MAY 2012
© 2012 Wiley Periodicals, Inc.
Volume 36, Issue 5, pages 508–516, July 2012
How to Cite
Sung, Y. J., Gu, C. C., Tiwari, H. K., Arnett, D. K., Broeckel, U. and Rao, D. C. (2012), Genotype Imputation for African Americans Using Data From HapMap Phase II Versus 1000 Genomes Projects. Genet. Epidemiol., 36: 508–516. doi: 10.1002/gepi.21647
- Issue published online: 12 JUN 2012
- Manuscript Accepted: 26 APR 2012
- Manuscript Revised: 6 APR 2012
- Manuscript Received: 4 SEP 2011
- NIH. Grant Numbers: HL55673, HL54473, HL72507, GM28719
Keywords
- African Americans
- 1000 Genomes Project
- genotype imputations
Genotype imputation infers genotypes at untyped single nucleotide polymorphisms (SNPs) that are present on a reference panel such as those from the HapMap Project. It is popular for increasing statistical power and for comparing results across studies that use different genotyping platforms. Imputation for African American populations is challenging because their linkage disequilibrium blocks are shorter and because, owing to admixture, no ideal reference panel is available. In this paper, we evaluated three imputation strategies for African Americans, each built on the CEU (Utah residents with Northern and Western European ancestry) and YRI (Yoruba in Ibadan, Nigeria) reference samples. The intersection strategy used a combined panel consisting of SNPs polymorphic in both CEU and YRI. The union strategy used a panel consisting of SNPs polymorphic in either CEU or YRI. The merge strategy merged results from two separate imputations, one using CEU and the other using YRI. Because investigators are increasingly using data from the 1000 Genomes (1KG) Project for genotype imputation, we evaluated both 1KG-based and HapMap-based imputations. We used 23,707 SNPs from chromosomes 21 and 22 on the Affymetrix SNP Array 6.0, genotyped for 1,075 HyperGEN African Americans. We found that 1KG-based imputations provided a substantially larger number of variants than HapMap-based imputations: about three times as many common variants and eight times as many rare and low-frequency variants. This higher yield is expected because the 1KG panel includes more SNPs. Accuracy rates using 1KG data were slightly lower than those using HapMap data before filtering, but slightly higher after filtering. The union strategy provided the highest imputation yield with the next-highest accuracy. The intersection strategy provided the lowest imputation yield but the highest accuracy. The merge strategy provided the lowest imputation accuracy. We observed that SNPs polymorphic only in CEU had much lower accuracy, which reduced the accuracy of the union strategy.
Our findings suggest that 1KG-based imputations can facilitate discovery of significant associations for SNPs across the whole minor allele frequency (MAF) spectrum. Because the 1KG Project is still under way, we expect that later versions will provide better imputation performance. Genet. Epidemiol. 36:508-516, 2012. © 2012 Wiley Periodicals, Inc.
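
The three strategies described in the abstract differ mainly in which SNPs enter the reference panel, which can be illustrated with plain set operations. The following Python sketch is illustrative only and is not the authors' pipeline: the function names, the toy allele frequencies, and the SNP IDs are hypothetical, "polymorphic" is assumed to mean a minor allele frequency above zero in that panel, and the tie-breaking rule in `merge_results` (keep the call with the higher quality score) is an assumption, since the abstract does not say how the two imputations were merged.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: SNP ID -> minor allele frequency in one panel.
MafTable = Dict[str, float]
# Hypothetical representation: SNP ID -> (imputed dosage, quality score).
Imputed = Dict[str, Tuple[float, float]]


def polymorphic(maf: MafTable) -> Set[str]:
    """Return the SNPs that vary in the panel (assumed here: MAF > 0)."""
    return {snp for snp, freq in maf.items() if freq > 0.0}


def intersection_panel(ceu: MafTable, yri: MafTable) -> Set[str]:
    """SNPs polymorphic in both CEU and YRI."""
    return polymorphic(ceu) & polymorphic(yri)


def union_panel(ceu: MafTable, yri: MafTable) -> Set[str]:
    """SNPs polymorphic in either CEU or YRI."""
    return polymorphic(ceu) | polymorphic(yri)


def merge_results(ceu_run: Imputed, yri_run: Imputed) -> Imputed:
    """Combine two separate imputation runs into one result set.

    Assumption: when both runs impute the same SNP, keep the call with
    the higher quality score. The source does not specify this rule.
    """
    merged = dict(ceu_run)
    for snp, (dosage, quality) in yri_run.items():
        if snp not in merged or quality > merged[snp][1]:
            merged[snp] = (dosage, quality)
    return merged


# Toy frequencies (made up for illustration).
ceu_maf = {"rs1": 0.20, "rs2": 0.00, "rs3": 0.05}
yri_maf = {"rs1": 0.10, "rs2": 0.30, "rs3": 0.00, "rs4": 0.15}

both = intersection_panel(ceu_maf, yri_maf)
either = union_panel(ceu_maf, yri_maf)
# SNPs polymorphic only in CEU: the group the abstract reports as least accurate.
ceu_only = polymorphic(ceu_maf) - polymorphic(yri_maf)

# Toy imputation output from two separate runs (made up for illustration).
ceu_run = {"rs1": (0.4, 0.90), "rs3": (0.1, 0.60)}
yri_run = {"rs1": (0.3, 0.95), "rs2": (0.7, 0.80)}
merged = merge_results(ceu_run, yri_run)

print(sorted(both), sorted(either), sorted(ceu_only))
print(merged)
```

In this toy setup the union panel is a superset of the intersection panel, which matches the yield ordering reported above, and `ceu_only` isolates the SNPs that the abstract identifies as lowering the union strategy's accuracy.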