Summary. The Wilcoxon rank sum test is frequently used in statistical practice to compare measures of location when the underlying distributions are far from normal or not known in advance. The ordinary rank sum test assumes that individual sampling units are independent. In many ophthalmologic clinical trials, the Early Treatment for Diabetic Retinopathy Scale (ETDRS) is a principal endpoint used for measuring the level of diabetic retinopathy. This is an ordinal scale, and it is natural to consider the Wilcoxon rank sum test for comparing the level of diabetic retinopathy between treatment groups. However, in this design, unlike the setting of the usual Wilcoxon rank sum test, the subject is the unit of randomization but the eye is the unit of analysis. Furthermore, a person will tend to have different, but correlated, ETDRS scores for fellow eyes. We therefore propose a correction to the variance of the Wilcoxon rank sum statistic that accounts for clustering effects and that can be used for either balanced (same number of subunits per cluster) or unbalanced (different number of subunits per cluster) data, in the presence or absence of ties, with the p-value adjusted accordingly. In this article, we present large-sample theory and simulation results for this test procedure and apply it to diabetic retinopathy data from type I diabetics in the Sorbinil Retinopathy Trial.
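To make the idea of a clustering-adjusted variance concrete, the following is a minimal sketch for the balanced case only. It is not taken from the article and may differ from the estimator developed there. It assumes that treatment labels are permuted at the subject level, so that the rank sum is a sum of m subject-level rank totals drawn without replacement from N, and it uses the corresponding finite-population variance. The function names `midranks` and `clustered_rank_sum` and the argument layout are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def clustered_rank_sum(scores, subjects, groups):
    """Rank sum test with a subject-level permutation variance (balanced data).

    scores   : one score per eye (subunit)
    subjects : subject (cluster) identifier for each eye
    groups   : 1 or 0 for each eye, constant within a subject
    Returns (W, expected, variance, z, two_sided_p).
    """
    if len(set(Counter(subjects).values())) != 1:
        raise ValueError("this sketch assumes the same number of eyes per subject")

    ranks = midranks(scores)  # rank all eyes together, ignoring clustering
    rank_total, group_of = {}, {}
    for r, s, g in zip(ranks, subjects, groups):
        rank_total[s] = rank_total.get(s, 0.0) + r  # rank total per subject
        group_of[s] = g

    N = len(rank_total)                             # number of subjects
    m = sum(1 for s in group_of if group_of[s] == 1)
    n = N - m
    W = sum(rank_total[s] for s in rank_total if group_of[s] == 1)

    mean_total = sum(rank_total.values()) / N
    expected = m * mean_total
    # Variance of a sum of m totals sampled without replacement from N.
    spread = sum((t - mean_total) ** 2 for t in rank_total.values())
    variance = m * n / (N * (N - 1)) * spread

    z = (W - expected) / sqrt(variance)
    p = erfc(abs(z) / sqrt(2))  # two-sided normal-approximation p-value
    return W, expected, variance, z, p
```

With one eye per subject and no ties, this variance reduces to the familiar mn(N + 1)/12 of the ordinary rank sum test, which gives a quick check on the sketch. The unbalanced case is not handled here.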