Performance and Robustness of Penalized and Unpenalized Methods for Genetic Prediction of Complex Human Disease

Authors

  • Gad Abraham,

    1. Medical Systems Biology, Departments of Pathology and of Microbiology & Immunology, The University of Melbourne, Parkville, VIC, Australia
    2. NICTA Victoria Research Lab, Department of Computing & Information Systems, The University of Melbourne, Parkville, VIC, Australia
  • Adam Kowalczyk,

    1. NICTA Victoria Research Lab, Department of Computing & Information Systems, The University of Melbourne, Parkville, VIC, Australia
  • Justin Zobel,

    1. NICTA Victoria Research Lab, Department of Computing & Information Systems, The University of Melbourne, Parkville, VIC, Australia
  • Michael Inouye

    Corresponding author
    1. Medical Systems Biology, Departments of Pathology and of Microbiology & Immunology, The University of Melbourne, Parkville, VIC, Australia

Correspondence to: Michael Inouye, Medical Systems Biology, Departments of Pathology and of Microbiology & Immunology, The University of Melbourne, Parkville 3010, Victoria, Australia. E-mail: minouye@unimelb.edu.au

Abstract

A central goal of medical genetics is to accurately predict complex disease from genotypes. Here, we present a comprehensive analysis of simulated and real data using lasso and elastic-net penalized support-vector machine models, a mixed-effects linear model, a polygenic score, and unpenalized logistic regression. In simulation, the sparse penalized models achieved lower false-positive rates and higher precision than the other methods for detecting causal SNPs. The common practice of prefiltering SNP lists for subsequent penalized modeling was examined and shown to substantially reduce the ability to recover the causal SNPs. Using genome-wide SNP profiles across eight complex diseases within cross-validation, lasso and elastic-net models achieved substantially better predictive ability than the other methods in celiac disease, type 1 diabetes, and Crohn's disease, and equivalent predictive ability in the remaining diseases, with the results in celiac disease strongly replicating between independent datasets. We investigated the effect of linkage disequilibrium on the predictive models, showing that the penalized methods leverage this information to their advantage, compared with methods that assume SNP independence. Our findings show that sparse penalized approaches are robust across different disease architectures, producing phenotype predictions and estimates of variance explained that are as good as or better than those of the other methods. This has fundamental ramifications for the selection and future development of methods to genetically predict human disease.