Incorporating biological information into association studies of sequencing data


  • Gary Chen,

    1. Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA
    • These authors contributed equally to this work

  • Peng Wei,

    1. Division of Biostatistics and Human Genetics Center, University of Texas School of Public Health, Houston, TX
    • These authors contributed equally to this work

  • Anita L. DeStefano

    Corresponding author
    1. Department of Biostatistics, School of Public Health, Boston University, Boston, MA; and Department of Neurology, Boston University School of Medicine, Boston, MA
    • Boston University School of Public Health, 801 Massachusetts Avenue, 3rd Floor, Boston, MA 02118


We summarize the methodological contributions from Group 3 of Genetic Analysis Workshop 17 (GAW17). The overarching goal of these methods was to evaluate and enhance state-of-the-art approaches for integrating biological knowledge into association studies of rare variants. The methods fell loosely into three major categories: (1) hypothesis testing of index scores based on aggregating rare variants at the gene level, (2) variable selection techniques that incorporate biological prior information, and (3) novel approaches that integrate external (i.e., not provided by GAW17) prior information, such as pathway and single-nucleotide polymorphism (SNP) annotations. Common findings across these contributions are that gene-based analysis of rare variants is advantageous over single-SNP analysis and that the minor allele frequency threshold used to identify rare variants may influence power and thus needs to be carefully considered. Restricting the analyses to nonsynonymous SNPs also consistently increased power. Overall, we found that no single method had an appreciable advantage over the other methods. However, methods that carried out sensitivity analyses by comparing biologically informative to noninformative prior probabilities demonstrated that integrating biological knowledge into statistical analyses always yielded at least subtle improvements in the performance of any statistical method applied to these simulated data. Although these statistical improvements reflect the simulation model assumed for GAW17, our hope is that the simulation models provide a reasonable representation of the underlying biology and that these methods can thus be of utility in real data.

Genet. Epidemiol. 35:S29–S34, 2011. © 2011 Wiley Periodicals, Inc.
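To make the first category concrete, the following is a minimal sketch of a gene-level index score: each individual's score is the count of minor alleles carried at variants whose minor allele frequency (MAF) falls below a chosen threshold, optionally restricted to nonsynonymous SNPs. This is one simple collapsing scheme chosen for illustration and is not necessarily the scheme used by any Group 3 contribution. The function names, the 0/1/2 genotype coding, the dictionary layout, and the toy data are all assumptions made here.

```python
from typing import Dict, List, Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF of one variant, assuming genotypes are coded as 0/1/2
    copies of the minor allele (an assumed coding)."""
    return sum(genotypes) / (2 * len(genotypes))


def gene_index_scores(
    variants: List[Dict],
    maf_threshold: float = 0.01,
    nonsynonymous_only: bool = False,
) -> List[int]:
    """Per-individual count of rare minor alleles within one gene.

    Each entry of `variants` is a hypothetical record with keys
    'genotypes' (list of 0/1/2) and 'nonsynonymous' (bool).
    A variant is treated as rare when its MAF is below `maf_threshold`.
    """
    if not variants:
        return []
    n_individuals = len(variants[0]["genotypes"])
    scores = [0] * n_individuals
    for variant in variants:
        if nonsynonymous_only and not variant["nonsynonymous"]:
            continue  # skip synonymous variants when requested
        if minor_allele_frequency(variant["genotypes"]) >= maf_threshold:
            continue  # skip variants that are not rare at this threshold
        for i, g in enumerate(variant["genotypes"]):
            scores[i] += g
    return scores


# Toy gene with three variants genotyped in ten individuals (made-up data).
toy_gene = [
    {"genotypes": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], "nonsynonymous": True},
    {"genotypes": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0], "nonsynonymous": False},
    {"genotypes": [1, 1, 0, 1, 2, 0, 1, 0, 1, 1], "nonsynonymous": True},
]

rare_all = gene_index_scores(toy_gene, maf_threshold=0.1)
rare_nonsyn = gene_index_scores(toy_gene, maf_threshold=0.1, nonsynonymous_only=True)
loose = gene_index_scores(toy_gene, maf_threshold=0.5)
```

Changing `maf_threshold` changes which variants enter the score, which is why the choice of threshold can affect the power of any downstream test of association between the score and a phenotype.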