Use of pretransformation to cope with extreme values in important candidate features

Authors

  • Anne-Laure Boulesteix

    Corresponding author
    1. Department of Medical Informatics, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, 81377 Munich, Germany
    • Phone: +49-89-7095-7598, Fax: +49-89-7095-7491
  • Vincent Guillemot

    1. Department of Medical Informatics, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, 81377 Munich, Germany
    2. Supélec, Department of Signal and Electronic Systems, F-91192 Gif-sur-Yvette, France
  • Willi Sauerbrei

    1. Institute of Medical Biometry and Informatics, University Medical Center Freiburg, Stefan-Meier-Str. 26, 79104 Freiburg, Germany

Abstract

Extreme values in predictors often strongly affect the results of statistical analyses in high-dimensional settings. Although such values occur frequently with most high-throughput techniques, the problem is often ignored in the literature. We suggest using a very simple transformation, proposed earlier in a different context by Royston and Sauerbrei, as an intermediate step between array preprocessing and high-level statistical analysis. This straightforward univariate transformation identifies extreme values in continuous features and can thus also serve as a diagnostic tool for outliers. The use of the transformation and its effects are demonstrated for diverse univariate and multivariate statistical analyses using nine publicly available microarray data sets.
