
Genetic Prediction of Quantitative Lipid Traits: Comparing Shrinkage Models to Gene Scores

Authors

  • Helen Warren,

    Corresponding author
    1. Department of Non-Communicable Disease Epidemiology, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom
    • Correspondence to: Helen Warren, Centre for Clinical Pharmacology, William Harvey Research Institute, Barts and the London Medical School, John Vane Science Centre, Charterhouse Square, London EC1M 6BQ, UK. E-mail: h.r.warren@qmul.ac.uk

  • Juan-Pablo Casas,

    1. Department of Non-Communicable Disease Epidemiology, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom
    2. Department of Epidemiology and Public Health, University College London, London, United Kingdom
  • Aroon Hingorani,

    1. Department of Epidemiology and Public Health, University College London, London, United Kingdom
  • Frank Dudbridge,

    1. Department of Non-Communicable Disease Epidemiology, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom
  • John Whittaker

    1. Department of Non-Communicable Disease Epidemiology, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom
    2. Statistical Platforms & Technologies, QSci, GlaxoSmithKline, Stevenage, United Kingdom

ABSTRACT

Accurate genetic prediction of quantitative traits related to complex disease risk could have clinical impact, so it is important to investigate statistical methods that improve predictive performance. We compare a simple approach, polygenic scores built from top-ranking single nucleotide polymorphisms (SNPs), with a set of shrinkage models: Ridge Regression, Lasso and Hyper-Lasso. These penalised regression methods analyse all genotyped SNPs simultaneously, so the models can include much larger sets of SNPs, not only those with the smallest P values. We compare the accuracy of these models for predicting low-density lipoprotein (LDL) and high-density lipoprotein (HDL) cholesterol, two lipid traits of clinical relevance, in the Whitehall II and British Women's Health and Heart Study cohorts, using SNPs from the HumanCVD BeadChip. Among gene scores, the most accurate predictions came from multivariate weighted scores that included only a small number of SNPs, identified as top hits from the HumanCVD BeadChip. Furthermore, there was little benefit from including external results from published sets of SNPs. We found that shrinkage approaches rarely improved significantly on gene score results. Genetic predictive performance is trait-specific, depending on the heritability and genetic architecture of the trait, and is limited by the sample size of the training data. Our results for lipid traits suggest no current benefit of more complex methods over existing gene score methods. Instead, the most important choices for the prediction model are the number of SNPs to include and the selection of the most predictive ones. However, further comparisons in larger samples and for other phenotypes would still be of interest.
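The contrast between the two families of models can be illustrated with a small sketch. It is not the authors' pipeline. It assumes simulated genotypes (additive 0/1/2 coding) and a continuous trait, and all function names (`simulate`, `top_snps`, `gene_score_weights`, `ridge`) are hypothetical. The "multivariate weighted score" is approximated by ranking SNPs on marginal association in the training set and then fitting the top *k* jointly by least squares. Ridge Regression uses its closed form with an arbitrarily chosen penalty rather than one tuned by cross-validation. Lasso and Hyper-Lasso are not shown.

```python
import numpy as np

def simulate(n, p, n_causal, seed=0):
    """Hypothetical simulated data: additive genotypes and a trait with a few causal SNPs."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.05, 0.5, size=p)
    X = rng.binomial(2, maf, size=(n, p)).astype(float)  # 0/1/2 allele counts
    beta = np.zeros(p)
    causal = rng.choice(p, n_causal, replace=False)
    beta[causal] = rng.normal(0.0, 0.3, n_causal)
    y = X @ beta + rng.normal(0.0, 1.0, n)
    return X, y

def standardise(train, test):
    """Scale both sets using training-set means and SDs only."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against monomorphic SNPs
    return (train - mu) / sd, (test - mu) / sd

def top_snps(X, y, k):
    """Rank SNPs by |marginal association|; on standardised X this matches ranking by P value."""
    score = np.abs(X.T @ (y - y.mean()))
    return np.argsort(score)[::-1][:k]

def gene_score_weights(X, y, idx):
    """Multivariate weights: fit the selected SNPs jointly by least squares."""
    w, *_ = np.linalg.lstsq(X[:, idx], y - y.mean(), rcond=None)
    return w

def ridge(X, y, lam):
    """Closed-form ridge regression on all SNPs: (X'X + lam*I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ (y - y.mean()))

def r2(pred, y):
    """Predictive accuracy as the squared correlation between prediction and trait."""
    return np.corrcoef(pred, y)[0, 1] ** 2

X, y = simulate(n=1500, p=200, n_causal=10, seed=1)
Xtr, Xte = standardise(X[:1000], X[1000:])
ytr, yte = y[:1000], y[1000:]

idx = top_snps(Xtr, ytr, k=10)
w = gene_score_weights(Xtr, ytr, idx)
r2_score = r2(Xte[:, idx] @ w, yte)

b = ridge(Xtr, ytr, lam=100.0)
r2_ridge = r2(Xte @ b, yte)

print(f"gene score R^2 = {r2_score:.3f}, ridge R^2 = {r2_ridge:.3f}")
```

Both models are trained on one split and evaluated on held-out individuals, as any fair comparison of predictive accuracy requires. The gene score uses only *k* SNPs, while ridge regression shrinks all of them towards zero. Which performs better depends on the simulated architecture and the training sample size, consistent with the abstract's point that performance is trait-specific.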
