These authors contributed equally to this work
Incorporating biological information into association studies of sequencing data
Article first published online: 29 NOV 2011
© 2011 Wiley Periodicals, Inc.
Supplement: Genetic Analysis Workshop 17: Approaches to Analysis of Next-Generation Sequencing Data
Volume 35, Issue Supplement 1, pages S29–S34, 2011
How to Cite
Chen, G., Wei, P. and DeStefano, A. L. (2011), Incorporating biological information into association studies of sequencing data. Genet. Epidemiol., 35: S29–S34. doi: 10.1002/gepi.20646
- Issue published online: 29 NOV 2011
- exome sequencing
- pathway analysis
- gene association
We summarize the methodological contributions from Group 3 of Genetic Analysis Workshop 17 (GAW17). The overarching goal of these methods was to evaluate and enhance state-of-the-art approaches for integrating biological knowledge into association studies of rare variants. The methods fell loosely into three major categories: (1) hypothesis testing of index scores that aggregate rare variants at the gene level, (2) variable selection techniques that incorporate biological prior information, and (3) novel approaches that integrate external (i.e., not provided by GAW17) prior information, such as pathway and single-nucleotide polymorphism (SNP) annotations. Findings common to these contributions were that gene-based analysis of rare variants has advantages over single-SNP analysis and that the minor allele frequency threshold used to define rare variants may influence power and therefore needs to be chosen carefully. Restricting the analyses to nonsynonymous SNPs also consistently increased power. Overall, no single method had an appreciable advantage over the others. However, methods that carried out sensitivity analyses comparing biologically informative with noninformative prior probabilities showed that integrating biological knowledge into the statistical analyses always yielded at least subtle improvements in performance, whichever statistical method was applied to these simulated data. Although these improvements reflect the simulation model assumed for GAW17, our hope is that the simulation models provide a reasonable representation of the underlying biology and that these methods will therefore also be useful for real data.
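To make the first category concrete, here is a minimal sketch of a generic gene-level collapsing ("burden") index score with a one-sided permutation test. It shows where the two design choices highlighted above enter: the minor allele frequency (MAF) cutoff and the restriction to nonsynonymous SNPs. This is an illustrative assumption, not the method of any specific GAW17 contribution. The function names, the toy genotypes, and the 1% default MAF cutoff are all hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical toy data: 8 individuals x 4 variants in one gene,
# genotypes coded as minor-allele counts (0, 1, 2).
GENOTYPES = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 1],
]
MAFS = [0.005, 0.02, 0.008, 0.30]     # minor allele frequency per variant
NONSYN = [True, True, False, True]    # annotation: nonsynonymous or not
PHENOTYPES = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = case, 0 = control


def burden_scores(genotypes: Sequence[Sequence[int]], mafs: Sequence[float],
                  nonsyn: Sequence[bool], maf_threshold: float = 0.01,
                  nonsyn_only: bool = True) -> List[int]:
    """Collapse qualifying rare variants into one score per individual."""
    # A variant qualifies if its MAF is below the threshold and, when
    # nonsyn_only is set, it is annotated as nonsynonymous.
    keep = [j for j, (f, ns) in enumerate(zip(mafs, nonsyn))
            if f < maf_threshold and (ns or not nonsyn_only)]
    # Score = total minor alleles carried at the qualifying variants.
    return [sum(row[j] for j in keep) for row in genotypes]


def mean_diff(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean score in cases minus mean score in controls."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_pvalue(scores: Sequence[float], labels: Sequence[int],
                       n_perm: int = 1000, seed: int = 0) -> float:
    """One-sided permutation p-value for a higher burden in cases."""
    rng = random.Random(seed)
    observed = mean_diff(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        if mean_diff(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the p-value estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    for thr, ns_only in [(0.01, True), (0.01, False), (0.05, True)]:
        s = burden_scores(GENOTYPES, MAFS, NONSYN, thr, ns_only)
        p = permutation_pvalue(s, PHENOTYPES, n_perm=2000, seed=1)
        print(f"MAF<{thr}, nonsyn_only={ns_only}: scores={s}, p={p:.3f}")
```

Running the three settings in the `__main__` loop shows that the MAF cutoff and the annotation filter change which variants are collapsed, and therefore change the score vector that is tested. Real analyses would also need covariate adjustment and many more samples; the permutation test is used here only because it makes no distributional assumptions about the score.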