To evaluate the calibration of a disease risk prediction tool, the quantity E/O, i.e. the ratio of the expected to the observed number of events, is usually computed. However, because of censoring, or more precisely because some individuals drop out before the end of the study, this quantity is generally unavailable for the complete study population, and an alternative estimate has to be computed. In this paper, we present and compare four methods for doing so. We show that two of the most commonly used methods generally lead to biased estimates. Our arguments are first based on theoretical considerations. We then perform a simulation study to quantify the magnitude of the biases. As a concluding example, we evaluate the calibration of an existing predictive model for breast cancer on the E3N–EPIC cohort. Copyright © 2009 John Wiley & Sons, Ltd.
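As an illustration of why drop-out matters for E/O, the following minimal Python sketch contrasts a naive observed count with a Kaplan-Meier-based one on a toy data set. The data, the function names (`eo_ratio`, `km_event_probability`) and the choice of a Kaplan-Meier correction are assumptions made purely for illustration; they are not necessarily among the four methods compared in the paper.

```python
def eo_ratio(expected, observed):
    """Ratio of expected to observed number of events."""
    return expected / observed


def km_event_probability(times, events, horizon):
    """Kaplan-Meier estimate of the probability of an event by `horizon`.

    times  : follow-up time of each individual (event or censoring time)
    events : 1 if the follow-up ended with an event, 0 if censored
    """
    survival = 1.0
    # Distinct event times no later than the horizon, in increasing order.
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        n_events = sum(1 for ti, e in zip(times, events) if e and ti == t)
        survival *= 1 - n_events / at_risk
    return 1 - survival


# Hypothetical toy cohort: 10 individuals, 5-year horizon.
# Individuals with (time < 5, event = 0) dropped out before the end of the study.
horizon = 5
times = [1, 2, 2, 3, 4, 5, 5, 5, 5, 5]
events = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
predicted_risks = [0.3] * 10  # assumed 5-year risks from some prediction tool

expected = sum(predicted_risks)

# Naive approach: count observed events as if everyone were followed to the horizon.
observed_naive = sum(events)
eo_naive = eo_ratio(expected, observed_naive)

# Censoring-aware approach: scale the Kaplan-Meier event probability to the cohort size.
observed_km = len(times) * km_event_probability(times, events, horizon)
eo_km = eo_ratio(expected, observed_km)

print(f"naive E/O: {eo_naive:.3f}, Kaplan-Meier-based E/O: {eo_km:.3f}")
```

In this toy cohort the naive count misses events that the two drop-outs might have experienced after leaving the study, so the observed number is understated and the naive E/O comes out larger than the censoring-aware one.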