Deciphering the complex: Methodological overview of statistical models to derive OMICS-based biomarkers

Authors

  • Marc Chadeau-Hyam,

    Corresponding author
    1. Department of Epidemiology and Biostatistics, MRC-HPA Centre for Environment and Health, School of Public Health, Imperial College London, Norfolk Place, London, United Kingdom
    • Marc Chadeau-Hyam, Gianluca Campanella, Benoit Liquet, and Roel C.H. Vermeulen contributed equally to this work.

  • Gianluca Campanella,

    1. Department of Epidemiology and Biostatistics, MRC-HPA Centre for Environment and Health, School of Public Health, Imperial College London, Norfolk Place, London, United Kingdom

  • Thibaut Jombart,

    1. Department of Infectious Disease Epidemiology, MRC Centre for Outbreak Analysis and Modelling, Imperial College, London, United Kingdom
  • Leonardo Bottolo,

    1. Department of Mathematics, Imperial College, London, United Kingdom
  • Lutzen Portengen,

    1. Institute for Risk Assessment, Utrecht University, Utrecht, The Netherlands
  • Paolo Vineis,

    1. Department of Epidemiology and Biostatistics, MRC-HPA Centre for Environment and Health, School of Public Health, Imperial College London, Norfolk Place, London, United Kingdom
    2. HuGeF, Human Genetics Foundation, Torino, Italy
  • Benoit Liquet,

    1. MRC Biostatistics Unit, Institute of Public Health, Cambridge, United Kingdom

  • Roel C.H. Vermeulen

    1. Institute for Risk Assessment, Utrecht University, Utrecht, The Netherlands
    2. Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands


Abstract

Recent technological advances in molecular biology have given rise to numerous large-scale datasets whose analysis poses serious methodological challenges, mainly related to the size and complex structure of the data. Considerable experience in analyzing such data has been gained over the past decade, mainly in genetics, from the Genome-Wide Association Study era, and more recently in transcriptomics and metabolomics. Building upon the corresponding literature, we provide here a nontechnical overview of well-established methods used to analyze OMICS data within three main types of regression-based approaches: univariate models including multiple testing correction strategies, dimension reduction techniques, and variable selection models. Our methodological description focuses on methods for which ready-to-use implementations are available. We describe the main underlying assumptions, the main features, and the advantages and limitations of each model. This descriptive summary constitutes a useful tool for guiding methodological choices when analyzing OMICS data, especially in environmental epidemiology, where the emergence of the exposome concept clearly calls for unified methods for analyzing complex exposure and OMICS datasets both marginally and jointly. Environ. Mol. Mutagen. 54:542-557, 2013. © 2013 Wiley Periodicals, Inc.
