If something looks too good to be true, it probably is


In an analysis of data from the Kaiser Permanente system, Wallner et al. [1] claim that PSA velocity is a near-perfect predictor of prostate cancer, with areas under the curve (AUCs) > 0.95. The simple and obvious explanation for these findings is that the authors defined anyone not diagnosed with cancer as being cancer-free. Because PSA was used to select which men were biopsied, men with low PSA were rarely biopsied, and any cancers they had went undetected and were counted as negatives. This is a straightforward example of what statisticians term ‘verification bias’. As a trivial illustration, if we biopsied only men who became bald, then baseline hair density and rate of hair loss would be excellent predictors of prostate cancer risk.
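The mechanism can be seen in a small simulation. The sketch below is a hypothetical illustration, not a reanalysis of the Kaiser Permanente data: it assumes a weak marker (cancers shifted up by half a standard deviation, giving a true AUC of roughly 0.64), a 20% prevalence and biopsy only above an arbitrary cutoff. All parameter values and function names (`auc`, `simulate`) are invented for the example. As in the analysis criticized above, men who are never biopsied are recorded as cancer-free.

```python
import random


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC: P(score of a positive > score of a negative).

    Assumes continuous scores, so ties are not handled.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sum of the 1-based ranks of the positive cases
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i])
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(n=20000, prevalence=0.2, shift=0.5, biopsy_cutoff=1.5, seed=1):
    """Hypothetical cohort: returns (AUC vs true status, AUC vs recorded status)."""
    rng = random.Random(seed)
    marker, true_cancer, recorded_cancer = [], [], []
    for _ in range(n):
        has_cancer = rng.random() < prevalence
        # Weak marker: cancers are shifted up by `shift` standard deviations
        m = rng.gauss(shift if has_cancer else 0.0, 1.0)
        # Only men whose marker exceeds the cutoff are biopsied
        biopsied = m > biopsy_cutoff
        marker.append(m)
        true_cancer.append(int(has_cancer))
        # Unbiopsied men are counted as cancer-free, whatever their true status
        recorded_cancer.append(int(has_cancer and biopsied))
    return auc(marker, true_cancer), auc(marker, recorded_cancer)


true_auc, naive_auc = simulate()
print(f"AUC against true cancer status:     {true_auc:.2f}")
print(f"AUC against biopsy-detected cancer: {naive_auc:.2f}")
```

Under these assumptions the AUC computed against biopsy-detected cancer is far higher than the AUC against true status, simply because every recorded case has, by construction, a marker value above the biopsy cutoff.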

The authors do refer to verification bias in their discussion but, unusually, claim that this would lead to the study ‘conservatively estimating the overall accuracy of prostate cancer prediction’, when in a design such as theirs it inflates apparent accuracy. This is one of a number of aspects of their paper that fall outside the range of reasonable scientific debate. For example, in contrasting their encouraging findings on PSA velocity with previous, clearly negative, reports, the authors state that this may be due to the ‘statistical methods used to calculate [PSA velocity] … previous studies in this field rely on the absolute changes in PSA over time when the annual percent change in PSA may be … more accurate’. One of the ‘previous studies’ cited is my own analysis of the Prostate Cancer Prevention Trial (PCPT) [2], which explicitly tested ‘whether the percentage change in PSA might be of benefit.’ The authors also claim that a ‘majority of findings to date suggest a role for PSA velocity in prostate cancer screening’, despite failing to cite a single paper demonstrating that use of PSA velocity would improve clinical outcomes.
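The distinction at issue is between velocity measured as an absolute change and velocity measured as a proportional change. As a hypothetical sketch only, assuming two common conventions (absolute velocity as the least-squares slope of PSA on time, in ng/mL per year, and annual percent change derived from the slope of log PSA on time), the two can be computed as below. These are not necessarily the exact definitions used by Wallner et al. or in the PCPT analysis; the function names and the PSA series are invented.

```python
import math


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def psa_velocity_absolute(years, psa):
    """Hypothetical convention: absolute velocity in ng/mL per year."""
    return slope(years, psa)


def psa_velocity_percent(years, psa):
    """Hypothetical convention: annual percent change, from log(PSA) on time."""
    return (math.exp(slope(years, [math.log(p) for p in psa])) - 1) * 100


years = [0.0, 1.0, 2.0, 3.0]
psa = [2.0, 2.2, 2.42, 2.662]  # invented series rising 10% per year
print(psa_velocity_absolute(years, psa))  # ng/mL per year
print(psa_velocity_percent(years, psa))   # percent per year
```

Doubling every PSA value in the series doubles the absolute velocity but leaves the annual percent change unchanged, which is why the two measures can rank men differently.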

The key finding of the paper is that although PSA velocity adds only a marginal level of discrimination for prostate cancer detection overall, it is particularly beneficial for detecting high-grade disease. This is because the AUC of PSA alone decreases dramatically for high Gleason scores, from 0.94 to 0.72. This directly contradicts the literature, in which PSA is universally found to be a better predictor of high-grade than of low-grade disease. In the PCPT, for example, the AUC of PSA is nearly 0.15 higher for high-grade than for low-grade tumours. One possible explanation for the discrepancy between the current paper and the previous literature is that Wallner et al. have made a basic typographical error. In Fig. 1B, they present receiver-operating characteristic curves for high-grade cancer. The AUC for PSA shown there is clearly nowhere near 0.72; it appears to be above 0.90.

Reputable statisticians have analysed the data on PSA velocity from large randomized trials. The findings from the European Randomized Study of Screening for Prostate Cancer [3], the PCPT [4], the Scandinavian radical prostatectomy trial [5] and the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial [6] have been unequivocally negative for PSA velocity. Wallner et al. attempt to explain their very different findings by contrasting clinical trials with ‘real world settings’ and by the unsubstantiated claim that previous studies had ‘inflated numbers of men with other benign prostatic conditions and/or family histories of prostate cancer’. A more plausible explanation is that Wallner et al. have reported erroneous figures from an analysis beset by extreme verification bias.

Conflict of Interest

Andrew Vickers is named on a patent for a statistical method to detect prostate cancer.