Existing literature on inter-rater reliability focuses on quantifying the disagreement between raters. In this paper, we introduce a method to correct for inter-rater disagreement (or observer bias) when raters assign scores on a continuous scale. We propose a two-stage approach. In the first stage, we standardise the distributions of rater scores to account for each rater's subjective interpretation of the continuous scale. In the second stage, we correct for case-mix differences between raters by exploiting pairwise information from cases that two raters have both read. We illustrate the use of our procedure on clinicians’ visual assessments of breast density (a risk factor for breast cancer). After applying our procedure, 229 of the 1398 women originally classified as high density were re-classified as non-high density, and 382 of 12 348 women were re-classified from non-high to high density. A simulation study also demonstrates good performance of the proposed method over a range of scenarios. Copyright © 2013 John Wiley & Sons, Ltd.
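To make the first stage concrete, the sketch below shows one simple way a per-rater standardisation of continuous scores could look: each rater's scores are mapped to z-scores, removing rater-specific differences in location and spread. This is an illustrative assumption only; the paper's actual transformation, and the stage-two pairwise case-mix correction, are not reproduced here.

```python
import numpy as np

def standardise_rater_scores(scores_by_rater):
    """Hypothetical stage-one standardisation: convert each rater's
    continuous scores to z-scores so that differences in how raters
    use the scale (mean level and spread) are removed before any
    between-rater comparison. Not the paper's exact method."""
    standardised = {}
    for rater, scores in scores_by_rater.items():
        s = np.asarray(scores, dtype=float)
        # Centre and scale within each rater separately.
        standardised[rater] = (s - s.mean()) / s.std(ddof=1)
    return standardised
```

For example, a rater who systematically scores high (e.g. 55–70) and one who scores low (e.g. 10–40) would, after this transformation, have scores on a common standardised scale with mean 0 and unit variance within each rater.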