Gene, region and pathway level analyses in whole-genome studies

Authors

  • Omar De la Cruz,

    1. Department of Statistics, The University of Chicago, Chicago, Illinois
    Current affiliation:
    1. Department of Statistics, Stanford University, 390 Serra Mall, Stanford, CA 94305
    • Omar De la Cruz and Xiaoquan Wen equally contributed to this work.

  • Xiaoquan Wen,

    1. Department of Statistics, The University of Chicago, Chicago, Illinois

  • Baoguan Ke,

    1. Department of Statistics, The University of Chicago, Chicago, Illinois
  • Minsun Song,

    1. Department of Statistics, The University of Chicago, Chicago, Illinois
    Current affiliation:
    1. Lewis-Sigler Institute, Princeton University, Princeton, New Jersey 08544
  • Dan L. Nicolae

    Corresponding author
    1. Department of Statistics, The University of Chicago, Chicago, Illinois
    2. Department of Medicine, The University of Chicago, Chicago, Illinois
    • Departments of Statistics and Medicine, The University of Chicago, 5734 S. University Ave., Chicago, IL 60637

Abstract

In the setting of genome-wide association studies, we propose a method for assigning a measure of significance to pre-defined sets of markers in the genome. The sets can be genes, conserved regions, or groups of genes such as pathways. With the proposed methods and algorithms, evidence for association between a functional unit and disease status can come not only from a single strong signal at a SNP within the unit, but also from the combination of several weaker signals that are not strongly correlated with one another. This approach has several advantages. First, moderately strong signals from different SNPs are combined to obtain a much stronger signal for the set, thereby increasing power. Second, in combination with methods that provide information on untyped markers, it leads to results that can be readily combined across studies and platforms that might use different SNPs. Third, the results are easy to interpret, since they refer to functional sets of markers that are likely to behave as a unit in their phenotypic effect. Finally, the availability of gene-level P-values for association is the first step in developing methods that integrate information from pathways and networks with genome-wide association data, and these can lead to a better understanding of the genetic architecture of complex traits. The power of the approach is investigated in simulated and real datasets. Novel Crohn's disease associations are found using the WTCCC data. Genet. Epidemiol. 34: 222–231, 2010. © 2009 Wiley-Liss, Inc.
