Quality control issues and the identification of rare functional variants with next-generation sequencing data
Article first published online: 29 NOV 2011
© 2011 Wiley Periodicals, Inc.
Supplement: Genetic Analysis Workshop 17: Approaches to Analysis of Next-Generation Sequencing Data
Volume 35, Issue Supplement 1, pages S22–S28, 2011
How to Cite
Hemmelmann, C., Daw, E. W. and Wilson, A. F. (2011), Quality control issues and the identification of rare functional variants with next-generation sequencing data. Genet. Epidemiol., 35: S22–S28. doi: 10.1002/gepi.20645
- Issue published online: 29 NOV 2011
Keywords
- 1000 Genomes Project
- collection of rare variants
- family data
- next-generation sequencing
- quality control
Abstract
Next-generation sequencing of large numbers of individuals presents challenges in data preparation, quality control, and statistical analysis because of the rarity of the variants. The Genetic Analysis Workshop 17 (GAW17) data provide an opportunity to survey existing methods and compare them with novel ones. Specifically, the GAW17 Group 2 contributors investigated existing and newly proposed methods and study design strategies to identify rare variants, predict functional variants, or examine quality control. We introduce the eight Group 2 papers, summarize their approaches, and discuss their strengths and weaknesses. For these investigations, some groups used only the genotype data, whereas others also used the simulated phenotype data. Although the eight Group 2 contributions covered a wide variety of topics under the general idea of identifying rare variants, they can be grouped into three broad categories according to their common research interests: functionality of variants and quality control issues, family-based analyses, and association analyses of unrelated individuals. The aims of the first subgroup were quite different. These were population structure analyses that used rare variants to predict functionality and examine the accuracy of genotype calls. The aims of the family-based analyses were to select which families should be sequenced and to identify high-risk pedigrees; the aim of the association analyses was to identify variants or genes with regression-based methods. Power to detect associations was low in all three association studies. Nevertheless, this work shows opportunities for incorporating rare variants into the genetic and statistical analyses of common diseases.