Biomarker Discovery: Introduction to Statistical Learning and Integrative Bioinformatics Approaches
Bioinformatics and Chemoinformatics
Published Online: 15 SEP 2011
Copyright © 2009 John Wiley & Sons, Ltd. All rights reserved.
General, Applied and Systems Toxicology
How to Cite
Repsilber, D. and Jacobsen, M. 2011. Biomarker Discovery: Introduction to Statistical Learning and Integrative Bioinformatics Approaches. General, Applied and Systems Toxicology.
In toxicology, biomarkers are needed for screenings and for time-series and dilution-series exposure studies in safety evaluation and risk assessment. They must be easily and reproducibly measurable, and are therefore sought among molecular features, using OMICs high-throughput technologies in assays of blood and other easily accessible tissues. This chapter presents methods for screening OMICs datasets for candidate biomarkers for classification. We begin by focusing on single-biomarker detection, surveying improvements to the t-test as well as multiplicity corrections for this objective. Biomarker panels (biosignatures) are patterns of several combined single features. We describe their detection using three different methods of statistical learning, with a special focus on avoiding overfitting through appropriate use of cross-validation. More sophisticated approaches using gene-set enrichment algorithms, and steps towards integrated bioinformatics analyses, are then explained. Making use of a priori knowledge about regulatory structures (gene groups, correlation structures) may further improve the classification efficiency of the detected biosignatures. As a common thread, we illustrate the analysis options using the well-known Golub gene expression dataset and the corresponding R scripts, enabling readers to reproduce every step on their own desktop.
- feature selection;
- multivariate signature;
- statistical learning;
- integrative bioinformatics
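
The multiplicity corrections mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which corrections the chapter covers, so the Benjamini-Hochberg false discovery rate adjustment is used here purely as an assumed, common example. The code is plain Python rather than the chapter's R scripts, the function name `benjamini_hochberg` is hypothetical, and the p-values are made-up illustration values, not results from the Golub dataset.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-feature p-values (for example from per-gene t-tests).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
example_adj = benjamini_hochberg(example_p)

# Features that remain candidates at an assumed 5% false discovery rate.
selected = [i for i, q in enumerate(example_adj) if q <= 0.05]
```

In this toy input, two features with raw p-values below 0.05 no longer pass the threshold after adjustment, which is the effect a multiplicity correction is meant to have when many features are screened at once.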