Mann-Whitney U test and Kruskal-Wallis test should be used for comparisons of differences in medians, not means: Comment on the article by van der Helm-van Mil et al


To the Editor:

We read with great interest the recent article by van der Helm-van Mil et al (1). In their study, the authors evaluated the accuracy of a recently developed prediction rule in 3 independent cohorts of patients with undifferentiated arthritis. However, we would like to point out that the interpretation of the findings comparing the characteristics between the 3 cohorts was incompatible with the statistical methods used. Specifically, in the statistical analysis section, the authors stated that “Results are reported as the mean ± SD or, in situations in which the distributions were skewed, as the median and interquartile range. Differences in mean values between groups were analyzed with the Mann-Whitney U test for comparison of 2 groups and the Kruskal-Wallis test for comparison of 3 groups” (1).

The Mann-Whitney U test (2) and the Kruskal-Wallis test (3) are nonparametric methods designed to detect whether 2 or more samples come from the same distribution or, under the assumption that the underlying distributions have the same shape, to test whether the medians differ between comparison groups. Thus, these nonparametric tests are commonly used to determine whether medians, not means, differ between comparison groups. Although these tests are often used to compare means when the normality assumption is violated, strictly speaking, interpreting their results as comparisons of means is inaccurate. When the distribution of a variable is skewed (for example, as in the values for C-reactive protein that van der Helm-van Mil et al present in Table 2 of their article [1]), only assertions about whether medians, and not means, differed between groups should be made using nonparametric methods. While the conclusions regarding the differences in characteristics between the 3 cohorts may not change materially, the authors' wording might leave readers with the impression that these nonparametric tests can be used to test the difference in means rather than medians, the intended target, between comparison groups.
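To illustrate the point, the following minimal Python sketch computes the Mann-Whitney U statistic directly from pairwise comparisons. The two groups are made-up numbers chosen only for illustration (they are not data from the article under discussion), and the helper name `mann_whitney_u` is our own. Because the statistic depends only on the ordering of the observations, a single extreme value can raise a group's mean well above the other group's mean while leaving U unchanged.

```python
from statistics import mean, median


def mann_whitney_u(x, y):
    """Return the U statistic for sample x versus sample y.

    U counts the pairs (xi, yj) with xi > yj; ties count as one half.
    Only the ordering of the values matters, not their magnitudes.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical, skewed illustrative values (not taken from any study).
group_a = [1, 2, 3, 4, 100]   # one extreme value inflates the mean
group_b = [5, 6, 7, 8, 9]

u_skewed = mann_whitney_u(group_a, group_b)

# Shrink the extreme value: the mean of group A changes, the ranks do not.
group_a_capped = [1, 2, 3, 4, 10]
u_capped = mann_whitney_u(group_a_capped, group_b)

print("mean A, mean B:", mean(group_a), mean(group_b))
print("median A, median B:", median(group_a), median(group_b))
print("U (A vs B), with and without the extreme value:", u_skewed, u_capped)
```

In this sketch, group A has the larger mean but the smaller median, and A exceeds B in only 5 of the 25 possible pairs. The rank-based statistic therefore points in the direction of the medians, not the means, and it is identical whether the largest value in group A is 100 or 10.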

Acknowledgements

Supported by NIH grant P60-AR-47785.

Bin Zhang ScD*, Yuqing Zhang DSc*, * Boston University School of Medicine, Boston, MA.
