
Use of support vector machines for disease risk prediction in genome-wide association studies: Concerns and opportunities

Authors

  • Florian Mittag,

    1. Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
  • Finja Büchel,

    1. Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
  • Mohamad Saad,

    1. Institut National de la Sante et de la Recherche Medicale, UMR 1043, Centre de Physiopathologie de Toulouse-Purpan, Toulouse, France
    2. Département des Sciences du Vivant, Paul Sabatier University, Toulouse, France
  • Andreas Jahn,

    1. Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
  • Claudia Schulte,

    1. Department for Neurodegenerative Diseases, Hertie Institute for Clinical Brain Research, University of Tübingen, and DZNE, German Centre for Neurodegenerative Diseases, Tübingen, Germany
  • Zoltan Bochdanovits,

    1. Department of Clinical Genetics, Section of Medical Genomics, VU University Medical Centre, Amsterdam, The Netherlands
  • Javier Simón-Sánchez,

    1. Department of Clinical Genetics, Section of Medical Genomics, VU University Medical Centre, Amsterdam, The Netherlands
  • Mike A. Nalls,

    1. Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland
  • Margaux Keller,

    1. Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland
    2. Department of Biological Anthropology, Temple University, Philadelphia, Pennsylvania
  • Dena G. Hernandez,

    1. Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland
    2. Department of Molecular Neuroscience, Institute of Neurology, University College London, London, UK
  • J. Raphael Gibbs,

    1. Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland
    2. Department of Molecular Neuroscience, Institute of Neurology, University College London, London, UK
  • Suzanne Lesage,

    1. Université Pierre et Marie Curie-Paris, Centre de Recherche de l'Institut du Cerveau et de la Moelle Epinière, UMR-S975, Paris, France
    2. Institut National de la Sante et de la Recherche Medicale, UMR_S975 CRicm, Paris, France
    3. Centre National de la Recherche Scientifique, UMR 7225, Paris, France
  • Alexis Brice,

    1. Université Pierre et Marie Curie-Paris, Centre de Recherche de l'Institut du Cerveau et de la Moelle Epinière, UMR-S975, Paris, France
    2. AP-HP, Hôpital de la Salpêtrière, Département de Génétique et Cytogénétique, Paris, France
    3. Institut National de la Sante et de la Recherche Medicale, UMR_S975 CRicm, Paris, France
    4. Centre National de la Recherche Scientifique, UMR 7225, Paris, France
  • Peter Heutink,

    1. Department of Clinical Genetics, Section of Medical Genomics, VU University Medical Centre, Amsterdam, The Netherlands
  • Maria Martinez,

    1. Institut National de la Sante et de la Recherche Medicale, UMR 1043, Centre de Physiopathologie de Toulouse-Purpan, Toulouse, France
    2. Département des Sciences du Vivant, Paul Sabatier University, Toulouse, France
  • Nicholas W Wood,

    1. Department of Molecular Neuroscience, Institute of Neurology, University College London, London, UK
  • John Hardy,

    1. Department of Molecular Neuroscience, Institute of Neurology, University College London, London, UK
  • Andrew B. Singleton,

    1. Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland
  • Andreas Zell,

    1. Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
  • Thomas Gasser,

    1. Department for Neurodegenerative Diseases, Hertie Institute for Clinical Brain Research, University of Tübingen, and DZNE, German Centre for Neurodegenerative Diseases, Tübingen, Germany
  • Manu Sharma

    for the International Parkinson's Disease Genomics Consortium (IPDGC), Corresponding author
    1. Department for Neurodegenerative Diseases, Hertie Institute for Clinical Brain Research, University of Tübingen, and DZNE, German Centre for Neurodegenerative Diseases, Tübingen, Germany
    • Hertie-Institute of Clinical Brain Research, Department of Neurology, University of Tuebingen, Hoppe-Seyler-Str. 3, 72076 Tübingen, Germany

  • Communicated by Christopher G. Mathew

Abstract

The success of genome-wide association studies (GWAS) in deciphering the genetic architecture of complex diseases has raised the expectation that individual disease risk can also be quantified from that architecture. So far, however, disease risk prediction based on top-validated single-nucleotide polymorphisms (SNPs) has shown little predictive value. Here, we applied a support vector machine (SVM) to Parkinson disease (PD) and type 1 diabetes (T1D) to show that, apart from the magnitude of the effect sizes of risk variants, the heritability of the disease also plays an important role in disease risk prediction. Furthermore, we performed a simulation study to show the role of uncommon (frequency 1–5%) as well as rare (frequency <1%) variants in the etiology of complex diseases. Using cross-validation, we achieved predictions with an area under the receiver operating characteristic curve (AUC) of ∼0.88 for T1D, reflecting its strong heritable component (∼90%). In contrast, we were unable to achieve a satisfactory prediction for PD (AUC ∼0.56; heritability ∼38%). Our simulations showed that the simultaneous inclusion of uncommon and rare variants in GWAS would eventually make disease risk prediction feasible for complex diseases such as PD. The software used is available at http://www.ra.cs.uni-tuebingen.de/software/MACLEAPS/. Hum Mutat 33:1708–1718, 2012. © 2012 Wiley Periodicals, Inc.
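
To make the AUC values quoted above easier to interpret, the following minimal sketch computes the AUC as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. It is an illustration only, not the authors' MACLEAPS implementation; the function name `auc` and the example scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise case/control comparison.

    scores: predicted risk scores (higher means more likely a case)
    labels: 1 for case, 0 for control
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0   # case correctly ranked above control
            elif case_score == control_score:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(cases) * len(controls))


# Hypothetical risk scores for three cases and three controls.
example_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = auc(example_scores, example_labels)  # 7 of 9 pairs ranked correctly
```

Under this reading, an AUC near 0.5 means the scores rank cases above controls hardly more often than chance, whereas an AUC near 1 means almost every case outranks every control.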
