The Wilcoxon–Mann–Whitney test under scrutiny



The Wilcoxon–Mann–Whitney (WMW) test is often used to compare the means or medians of two independent, possibly nonnormal distributions. For this problem, the true significance level of the large sample approximate version of the WMW test is known to be sensitive to differences in the shapes of the distributions. Based on a wide ranging simulation study, our paper shows that the problem of lack of robustness of this test is more serious than is thought to be the case. In particular, small differences in variances and moderate degrees of skewness can produce large deviations from the nominal type I error rate. This is further exacerbated when the two distributions have different degrees of skewness. Other rank-based methods like the Fligner–Policello (FP) test and the Brunner–Munzel (BM) test perform similarly, although the BM test is generally better. By considering the WMW test as a two-sample T test on ranks, we explain the results by noting some undesirable properties of the rank transformation. In practice, the ranked samples should be examined and found to sufficiently satisfy reasonable symmetry and variance homogeneity before the test results are interpreted. Copyright © 2009 John Wiley & Sons, Ltd.