Dealing with high dimensionality for the identification of common and rare variants as main effects and for gene-environment interaction

Authors

  • Heike Bickeböller,

    Corresponding author
    1. Department of Genetic Epidemiology, University Medical Center Göttingen, Göttingen, Germany
    • Department of Genetic Epidemiology, University Medical Center, Georg-August-University Göttingen, Humboldtallee 32, D-37073 Göttingen, Germany
    • These authors contributed equally to the work.

  • Jeanine J. Houwing-Duistermaat,

    1. Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
    • These authors contributed equally to the work.

  • Xuefeng Wang,

    1. Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH
  • Xiting Yan

    1. Department of Epidemiology and Genetics, Yale University, New Haven, CT

Abstract

Genome-wide association studies are now being joined by sequence data, increasing the need for more effective methods that deal with high dimensionality and identify variants beyond common-variant main effects. The contributors to Genetic Analysis Workshop 17 Group 4 applied novel and recently proposed methods for handling population structure, high dimensionality, and gene-environment interactions in the context of mini-exome sequence data. To collapse rare variants into gene summaries, some contributions used the computationally fast, straightforward summing of all or particular subsets of rare variants. Other methods were comparatively time-consuming and complex but offered a data-driven approach, such as using a U statistic to reduce the subset of rare variants considered, or semiparametric modeling of single-nucleotide polymorphism effects with kernel machines. Several approaches were applied using regression models, regularized regression, and kernels. Testing for gene-specific main effects and gene-environment interaction using least-squares kernel machines was more flexible, and was supervised, in contrast to a two-step approach that used a random effects model incorporating an empirical Bayes estimate. However, the random effects model was the only method capable of treating family data, at least with the methods in their present form. Genet. Epidemiol. 35:S35–S40, 2011. © 2011 Wiley Periodicals, Inc.
