In this issue of Cancer, Havrilesky and colleagues argue that the biologic diversity of ovarian cancer means that it will be even harder than we expected to save lives through screening.1 Instead of the 15% to 20% reduction in ovarian cancer mortality that otherwise might be expected, those authors estimate that deaths would fall only by 11% from their current levels if postmenopausal women in the United States underwent annual screening with current technology. They envision a screening regimen about as sensitive (accurate in detecting cancer) and as specific (accurate in registering the absence of cancer) as that offered currently in the large, randomized United Kingdom trial and the Gynecologic Oncology Group (GOG) trial of high-risk women. That algorithm uses serial values of cancer antigen 125 (CA 125) to estimate risk, and the providers refer women with elevated risk for an ultrasound examination.
Havrilesky et al reach their sobering conclusion by reasoning that one major subset of ovarian cancers differs profoundly from the other in how quickly the cancers move from stage I to stage IV. They assume that the more indolent phenotype typically spends 24 months in stage I and 12 months in stage II, in stark contrast to the more aggressive phenotype, which they assume spends 8 months in stage I and 5 months in stage II. Once either phenotype reaches stage III, it is assumed to spend 6 months on average in stage III followed by 2 months in stage IV. Logically, the shorter transit times of the more aggressive cancers make them harder to identify by periodic screening and, thus, harder to intercept early enough to save lives. “Length bias” is the technical term for this general phenomenon in all screening programs. In ovarian cancer, if the mutation spectrum (tumor protein 53 [TP53] vs the v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog [KRAS], the B-Raf proto-oncogene serine/threonine-protein kinase [BRAF], phosphatase and tensin homolog [PTEN], and beta-catenin), the distribution of grades within stage, and other features stem from 2 major and distinctive phenotypes, as the authors suggest, then we should expect more modest gains from screening than would be expected otherwise.
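The effect of transit time on screening opportunity can be illustrated with a short calculation. The sketch below is not the authors' model. It uses only the stage durations quoted above and adds one simplifying assumption: that a tumor enters stage I at a random point relative to a fixed screening schedule, so the chance that at least one screen falls within a window is the window length divided by the screening interval, capped at 1. Test sensitivity is ignored, and the function names are hypothetical.

```python
# Stage durations in months, as quoted in the text from Havrilesky et al.
STAGE_MONTHS = {
    "indolent": {"I": 24, "II": 12, "III": 6, "IV": 2},
    "aggressive": {"I": 8, "II": 5, "III": 6, "IV": 2},
}


def window_months(phenotype, stages):
    """Total time (months) the phenotype spends in the listed stages."""
    return sum(STAGE_MONTHS[phenotype][stage] for stage in stages)


def chance_screen_in_window(window, interval_months=12):
    """Probability that at least one periodic screen falls inside the window.

    Assumption (illustrative only): the window starts at a time that is
    uniformly distributed relative to the screening schedule.
    """
    return min(1.0, window / interval_months)


for phenotype in STAGE_MONTHS:
    stage_i = window_months(phenotype, ["I"])
    total = window_months(phenotype, ["I", "II", "III", "IV"])
    p = chance_screen_in_window(stage_i)
    print(
        f"{phenotype}: stage I {stage_i} mo, stage I-IV {total} mo, "
        f"P(annual screen during stage I) = {p:.2f}"
    )
```

Under these assumptions, an annual screen always falls within the 24-month stage I window of the indolent phenotype but falls within the 8-month stage I window of the aggressive phenotype only about two-thirds of the time, before any allowance for imperfect test sensitivity.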
Are these estimates of potential benefit realistic with current technologies? What future developments could improve the potential for lives saved? The 11% estimate appears to be slightly pessimistic, because the supporting mathematical model underestimated the accuracy of the first wave of positive screens in the UK trial. The model also underestimated the fraction of detected cancers in early stages. In addition, it is impossible to avoid uncertainty in the estimation process, because it is almost certain that more than 2 ovarian cancer phenotypes exist. The maximum mortality benefit could be smaller or larger, depending on the proclivity of each phenotype to shed specific markers early, the appearance on ultrasound, and the response to therapy. In short, we cannot calculate the best currently achievable reduction in mortality, but this modeling effort suggests the realistic range of possible mortality benefits.
Setting 1 year as the appropriate interval between screenings warrants especially close evaluation. The authors observe that intervals of screening >12 months offer some advantage in a reduced opportunity for false-positive results. A 2-year interval would improve the predictive power of each positive screen and substantially reduce the total number of false-positive screen results a woman could expect over her decades in the screening program.
Conversely, screening less often than once per year comes at a price. Cancers would be missed, because CA 125 levels often rise detectably only in the year or so before clinical diagnosis. Anderson and colleagues recently observed that the time between the noticeable upturn in CA 125, human epididymis protein 4, mesothelin, or other promising markers and the clinical diagnosis of ovarian cancer seldom exceeded 1 year.2 That is, both harms and benefits would vary with intervals between screenings set at 6 months, 12 months, 2 years, 3 years, or other intervals. For now, for those situations in which providers recommend screening and women elect to be screened, both seem likely to judge that annual screening strikes a reasonable balance between false assurance and false alarm.
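The tradeoff between false alarm and false assurance across screening intervals can be sketched numerically. The only figure below taken from the text is the roughly 12-month lead between marker upturn and clinical diagnosis. The per-screen false-positive rate and the 20 years in the program are hypothetical placeholders chosen for illustration, and the false-positive rate is held constant across intervals, which ignores any change in the predictive power of each screen.

```python
MARKER_LEAD_MONTHS = 12  # from the text: upturn seldom precedes diagnosis by >1 year
HYPOTHETICAL_FALSE_POSITIVE_RATE = 0.02  # assumption, not a trial result
HYPOTHETICAL_YEARS_IN_PROGRAM = 20  # assumption


def screens_in_program(interval_months, years=HYPOTHETICAL_YEARS_IN_PROGRAM):
    """Number of screens a woman receives over the whole program."""
    return (years * 12) // interval_months


def expected_false_positives(
    interval_months,
    false_positive_rate=HYPOTHETICAL_FALSE_POSITIVE_RATE,
    years=HYPOTHETICAL_YEARS_IN_PROGRAM,
):
    """Expected count of false-positive screens for a woman without cancer."""
    return screens_in_program(interval_months, years) * false_positive_rate


def chance_screen_in_lead_window(interval_months, lead=MARKER_LEAD_MONTHS):
    """Probability that a screen falls within the marker lead window.

    Assumes the window starts at a random point in the screening cycle.
    """
    return min(1.0, lead / interval_months)


for interval in (6, 12, 24, 36):
    print(
        f"every {interval:>2} mo: "
        f"{screens_in_program(interval):>2} screens, "
        f"expected false positives {expected_false_positives(interval):.2f}, "
        f"P(screen in lead window) {chance_screen_in_lead_window(interval):.2f}"
    )
```

With these placeholder inputs, lengthening the interval from 1 year to 2 years halves the expected number of false-positive screens but also halves the chance that any screen falls within the year in which markers are rising.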
Empirical studies have demonstrated, and the new report highlights, that most CA 125 screening regimens generate many “positive” or suspicious blood test results. The proportion of screenings that test positive in a specific population depends in part on whether a single CA 125 cutoff value is used for all women or, alternatively, patient age and the trajectory of CA 125 levels over time are used (the algorithm considered here). The positive results, however determined, must be followed with ultrasonography. Ultrasound examinations, by themselves, are somewhat less sensitive than CA 125 for detecting ovarian cancer, but they are substantially less specific.3 Ultrasound examinations often cannot distinguish tumors from more common benign adnexal abnormalities. Therefore, many women must undergo surgery to rule out cancer. In fact, as many as 10 to 20 surgeries are performed for each cancer that is discovered.4 This unintended consequence poses the greatest harm from screening.
For women who are at higher risk for developing ovarian cancer because of known mutations in the breast cancer 1 (BRCA1) or BRCA2 genes or a significant family history, screening presents a more hopeful picture. In general, the background risk of the population to be screened (that is, the risk estimated in the absence of biologic assays or imaging studies) critically influences the performance of screening programs. A higher background risk means that a positive screen will be accurate more often, resulting in fewer ultrasound studies and surgeries that yield no cancer. Current practices and guidelines recognize the distinction between screening higher risk groups and the general population. Women with a significant family history, for example, are encouraged to consult their physicians, but routine screening is not recommended currently for the general population.
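The dependence of screening performance on background risk follows from Bayes' rule, and a brief sketch makes the relationship concrete. The sensitivity, specificity, and prevalence values below are hypothetical placeholders, not figures from any trial. Only the direction of the effect is the point.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a woman with a positive screen actually has cancer.

    Bayes' rule: true positives divided by all positives.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics, held fixed across both groups.
HYPOTHETICAL_SENSITIVITY = 0.85
HYPOTHETICAL_SPECIFICITY = 0.98

# Hypothetical background risks (probability of disease at screening).
groups = (("general population", 0.0005), ("higher risk", 0.005))

for label, prevalence in groups:
    ppv = positive_predictive_value(
        prevalence, HYPOTHETICAL_SENSITIVITY, HYPOTHETICAL_SPECIFICITY
    )
    print(f"{label}: prevalence {prevalence:.4f}, PPV {ppv:.3f}")
```

With the test held fixed, a tenfold increase in background risk raises the share of positive screens that reflect true cancers severalfold, which is why the same regimen yields fewer unnecessary ultrasound studies and surgeries per cancer found in a higher risk group.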
In effect, the risk algorithm used in the UK trial and GOG study generalizes the concept of risk-based screening decisions. Similar in principle to a simple categorization of women according to the presence of highly penetrant mutations, strong family history, or completion of menopause, the quantitative algorithmic approach differs because it incorporates other risk factors and provides a graded estimate of risk. It is easy to imagine using such an estimate to decide, for example, not to screen parous women aged <65 years who have no family history of ovarian cancer or early or multiple breast cancer. Targeted screening would improve the potential for mortality reduction in those who are screened. Of course, such a program would leave many women unscreened, some of whom subsequently would develop ovarian cancer. Whether such a middle path will emerge remains to be determined and will depend in part on findings from ongoing trials.
Empirical results from the current randomized clinical trials will provide the strongest evidence about the potential for reducing ovarian cancer death through screening. Of necessity, the lessons we can draw will be limited by the specific screening programs that were used and the particular settings. The trial results may well generate a new set of questions about how, when, and whom to screen. With their focus on the biologic heterogeneity of ovarian cancer, Havrilesky et al remind us that a fundamental breakthrough in understanding the early biology of ovarian cancer could radically change the opportunities to save lives through screening. Until then, efforts will continue on many fronts: biomarker discovery and vetting in longitudinal studies, etiologic research and risk algorithm development, and improvements in ultrasound and surgery.