An Omnibus Consistent Adaptive Percentile Modified Wilcoxon Rank Sum Test with Applications in Gene Expression Studies

Authors

  • Olivier Thas,

    Corresponding author
    1. Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, B-9000 Gent, Belgium
    2. Centre for Statistical and Survey Methodology, School of Mathematics and Applied Statistics, University of Wollongong, NSW 2522, Australia
      email: olivier.thas@ugent.be
  • Lieven Clement,

    1. Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, B-9000 Gent, Belgium
    2. I-Biostat, Katholieke Universiteit Leuven and Universiteit Hasselt, Kapucijnenvoer 35, B-3000 Leuven, Belgium
  • John C.W. Rayner,

    1. Centre for Statistical and Survey Methodology, School of Mathematics and Applied Statistics, University of Wollongong, NSW 2522, Australia
    2. School of Mathematical and Physical Sciences, University of Newcastle, NSW 2308, Australia
  • Beatriz Carvalho,

    1. Department of Pathology, VU University Medical Center, de Boelelaan 1117, 1081HV Amsterdam, The Netherlands
  • Wim Van Criekinge

    1. Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, B-9000 Gent, Belgium

Abstract

Summary. We present an adaptive percentile modified Wilcoxon rank sum test for the two-sample problem. The test is essentially a Wilcoxon rank sum test applied to a fraction of the sample observations, where the fraction is determined adaptively from the data. Most of the theory is developed under a location-shift model, but we demonstrate that the test is also meaningful for testing against more general alternatives. The test may be particularly useful for the analysis of massive datasets in which quasi-automatic hypothesis testing is required. We investigate the power characteristics of the new test in a simulation study, and we apply the test to a microarray experiment on colorectal cancer. These empirical studies demonstrate that the new test has good overall power and that it finds differentially expressed genes more successfully than other popular tests. We conclude that the new nonparametric test is widely applicable and that its power is comparable to that of the Baumgartner-Weiß-Schindler test.
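
To make the idea of "a rank sum computed on a fraction of the observations" concrete, the following is a minimal sketch and not the authors' procedure. It assumes, purely for illustration, that the retained fraction is the largest pooled observations, that the fraction is fixed by the user rather than chosen adaptively, and that significance is assessed by a label permutation. The function names `midranks`, `truncated_rank_sum`, and `permutation_pvalue` are hypothetical.

```python
import random


def midranks(values):
    """Return 1-based ranks of `values`, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def truncated_rank_sum(x, y, fraction):
    """Rank sum of sample x among the largest `fraction` of pooled values.

    Assumption for illustration: only the upper tail is retained.
    With fraction=1.0 this is the ordinary Wilcoxon rank sum of x.
    """
    pooled = [(v, 0) for v in x] + [(v, 1) for v in y]
    keep = max(1, int(round(fraction * len(pooled))))
    top = sorted(pooled, key=lambda t: t[0])[-keep:]
    ranks = midranks([v for v, _ in top])
    return sum(r for r, (_, group) in zip(ranks, top) if group == 0)


def permutation_pvalue(x, y, fraction, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the truncated rank sum."""
    rng = random.Random(seed)
    observed = truncated_rank_sum(x, y, fraction)
    pooled = list(x) + list(y)
    n_x = len(x)
    stats = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling under the null hypothesis
        stats.append(truncated_rank_sum(pooled[:n_x], pooled[n_x:], fraction))
    center = sum(stats) / len(stats)
    extreme = sum(abs(s - center) >= abs(observed - center) for s in stats)
    return (extreme + 1) / (n_perm + 1)
```

The data-driven choice of the fraction described in the summary is deliberately left out of this sketch. If a fraction were selected from the data, the selection step would presumably need to be repeated inside each permutation so that the p-value accounts for it.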
