UNIT 1.19 Quality Control Procedures for Genome-Wide Association Studies

  1. Stephen Turner (1),
  2. Loren L. Armstrong (2),
  3. Yuki Bradford (1),
  4. Christopher S. Carlson (3),
  5. Dana C. Crawford (1),
  6. Andrew T. Crenshaw (4),
  7. Mariza de Andrade (5),
  8. Kimberly F. Doheny (6),
  9. Jonathan L. Haines (1),
  10. Geoffrey Hayes (2),
  11. Gail Jarvik (7),
  12. Lan Jiang (1),
  13. Iftikhar J. Kullo (8),
  14. Rongling Li (9),
  15. Hua Ling (6),
  16. Teri A. Manolio (9),
  17. Martha Matsumoto (5),
  18. Catherine A. McCarty (10),
  19. Andrew N. McDavid (3),
  20. Daniel B. Mirel (4),
  21. Justin E. Paschall (11),
  22. Elizabeth W. Pugh (6),
  23. Luke V. Rasmussen (10),
  24. Russell A. Wilke (12),
  25. Rebecca L. Zuvich (1),
  26. Marylyn D. Ritchie (1)

Published Online: 1 JAN 2011

DOI: 10.1002/0471142905.hg0119s68

Current Protocols in Human Genetics

How to Cite

Turner, S., Armstrong, L. L., Bradford, Y., Carlson, C. S., Crawford, D. C., Crenshaw, A. T., de Andrade, M., Doheny, K. F., Haines, J. L., Hayes, G., Jarvik, G., Jiang, L., Kullo, I. J., Li, R., Ling, H., Manolio, T. A., Matsumoto, M., McCarty, C. A., McDavid, A. N., Mirel, D. B., Paschall, J. E., Pugh, E. W., Rasmussen, L. V., Wilke, R. A., Zuvich, R. L. and Ritchie, M. D. 2011. Quality Control Procedures for Genome-Wide Association Studies. Current Protocols in Human Genetics. 68:1.19:1.19.1–1.19.18.

Author Information

  1. Center for Human Genetics Research, Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, Tennessee
  2. Division of Endocrinology, Metabolism, and Molecular Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois
  3. Cancer Prevention, Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington
  4. Genetic Analysis Platform and Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts
  5. Division of Biostatistics and Informatics, Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota
  6. Center for Inherited Disease Research, Johns Hopkins University, Baltimore, Maryland
  7. Department of Genome Sciences, University of Washington, Seattle, Washington
  8. Division of Cardiovascular Diseases, Department of Medicine, Mayo Clinic, Rochester, Minnesota
  9. Office of Population Genomics, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
  10. Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, Wisconsin
  11. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
  12. Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University, Nashville, Tennessee

Publication History

  1. Published Online: 1 JAN 2011
  2. Published Print: JAN 2011


Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of complex disease. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the electronic MEdical Records and Genomics (eMERGE) network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. We discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research. Curr. Protoc. Hum. Genet. 68:1.19.1-1.19.18 © 2011 by John Wiley & Sons, Inc.


  • genome-wide association studies;
  • GWAS;
  • quality control;
  • QC;
  • biobanks;
  • electronic medical records;
  • eMERGE