In their article, Gur et al.1 caution against recommendations that set guidelines or targets for reducing high recall rates. Brem,2 in her editorial, states that the Gur et al. study is based on solid evidence and that the benefit of higher recall rates (specifically, improved detection of breast carcinoma) is worth the associated cost. We would like to call attention to several problems with the methods and analysis that make it difficult to draw definitive conclusions from the results reported by Gur and colleagues. We are also concerned about Brem's strong endorsement of these results, which we believe is out of proportion to the strength of the study design and results.
First, it is unfortunate that the patient population is not described. This information is important because screening recall and cancer detection rates vary with age, breast density, family history of breast carcinoma, symptoms, and other patient characteristics. None of these factors was controlled for in the analysis.
Second, the authors did not examine specificity, sensitivity, or positive predictive value (PPV). As Yankaskas et al.3 demonstrated, the cancer detection rate increases with the recall rate overall, but beyond a certain recall rate (5% in their study), further gains in sensitivity are slight, whereas PPV continues to decline. Increasing recall beyond that level may detect a very small number of additional cancers, but at the cost of a considerable number of extra workups and unnecessary biopsies. In the extreme, a mammographer could maximize his or her cancer detection rate by recalling every woman; even then, the detection rate would remain limited by the intrinsic sensitivity of mammography.
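To make this trade-off concrete, the short Python sketch below works through purely hypothetical numbers (invented for illustration; they are not data from Yankaskas et al. or Gur et al.). It assumes 10,000 screening examinations and 50 cancers actually present, and compares a 5% recall rate with a 10% recall rate. Doubling recall raises sensitivity only modestly, while the PPV of recall falls sharply, and the PPV of the additional recalls alone is lower still.

```python
from dataclasses import dataclass


@dataclass
class ScreeningOutcome:
    screens: int          # women screened
    recalls: int          # women recalled for additional workup
    detected: int         # screen-detected cancers
    cancers_present: int  # all cancers present (detected plus missed); hypothetical

    @property
    def recall_rate(self) -> float:
        return self.recalls / self.screens

    @property
    def detection_rate_per_1000(self) -> float:
        return 1000 * self.detected / self.screens

    @property
    def sensitivity(self) -> float:
        # fraction of all cancers present that screening detected
        return self.detected / self.cancers_present

    @property
    def ppv(self) -> float:
        # PPV of recall: fraction of recalled women found to have cancer
        return self.detected / self.recalls


def marginal_ppv(lower: ScreeningOutcome, higher: ScreeningOutcome) -> float:
    """Additional cancers found per additional recall when moving to the higher recall rate."""
    return (higher.detected - lower.detected) / (higher.recalls - lower.recalls)


# Hypothetical numbers for illustration only; not data from any cited study.
low = ScreeningOutcome(screens=10_000, recalls=500, detected=40, cancers_present=50)
high = ScreeningOutcome(screens=10_000, recalls=1_000, detected=44, cancers_present=50)

for label, outcome in (("5% recall", low), ("10% recall", high)):
    print(f"{label}: detection {outcome.detection_rate_per_1000:.1f}/1000, "
          f"sensitivity {outcome.sensitivity:.2f}, PPV {outcome.ppv:.3f}")
print(f"PPV of the extra recalls: {marginal_ppv(low, high):.3f}")
```

Under these assumed numbers, the extra 500 recalls yield 4 additional cancers, so fewer than 1 in 100 of the additional women recalled would have cancer.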
Third, the linear least-squares fit used to relate the recall rate to the cancer detection rate is a statistically crude analysis, and the claim by Gur et al. that the cancer detection rate increased over the full range of recall rates examined depends entirely on that linearity assumption. Their data could just as easily support a threshold effect beginning at a recall rate of approximately 12%. A more robust approach would have been to test the linearity assumption with a more flexible model.
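As one illustration of how the linearity assumption could be checked, the Python sketch below compares a straight-line least-squares fit with a simple threshold (segmented, or "hinge") model, y = a + b·max(0, x − c), in which the breakpoint c is chosen by grid search. This is only one possible flexible model (restricted cubic splines would be another), and the data are synthetic, constructed to be flat up to a 12% recall rate and rising afterwards; they are not the Gur et al. data. The comparison uses Akaike's information criterion (AIC) as an informal guide, since a formal test of a threshold against a line is complicated by the breakpoint being estimated. Note that the linear fit still reports a positive slope across the whole range even though, by construction, detection does not change below 12%.

```python
import math


def ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def fit_hinge(xs, ys, candidates):
    """Threshold model y = a + b*max(0, x - c): grid-search c, least squares for a and b."""
    best = None
    for c in candidates:
        h = [max(0.0, x - c) for x in xs]
        if len(set(h)) < 2:
            continue  # hinge feature is constant, so the slope is not identifiable
        a, b, rss = ols(h, ys)
        if best is None or rss < best[3]:
            best = (a, b, c, rss)
    return best


def aic(rss, n, k):
    """Gaussian-likelihood AIC up to an additive constant; k = fitted parameters."""
    return n * math.log(rss / n) + 2 * k


# Synthetic data for illustration only (NOT the Gur et al. data): detection rate
# flat at 3.0 per 1000 up to a 12% recall rate and rising afterwards, plus a small
# alternating perturbation standing in for noise.
xs = [float(x) for x in range(4, 21)]  # recall rate, %
ys = [3.0 + 0.3 * max(0.0, x - 12) + (0.1 if i % 2 == 0 else -0.1)
      for i, x in enumerate(xs)]       # cancers detected per 1000 screens

n = len(xs)
a_lin, b_lin, rss_lin = ols(xs, ys)
a_h, b_h, c_h, rss_h = fit_hinge(xs, ys, [6 + 0.5 * j for j in range(25)])

print(f"linear: slope {b_lin:.3f} per 1% recall, AIC {aic(rss_lin, n, 2):.1f}")
print(f"threshold: break at {c_h:.1f}% recall, AIC {aic(rss_h, n, 3):.1f}")
```

A lower AIC for the threshold model on real data would indicate that a single straight line is not an adequate summary, which is the point at issue here.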
For the reasons cited, we believe these results should be interpreted with caution. Nor do we agree with Dr. Brem's suggestion that this article justifies a recall rate of 17% when sensitivity and PPV have not been considered.