Pathologic and genetic data suggest that epithelial ovarian cancer may consist of indolent and aggressive phenotypes. The objective of the current study was to estimate the impact of a 2-phenotype paradigm of epithelial ovarian cancer on the mortality reduction achievable using available screening technologies.
The authors modified a Markov model of ovarian cancer natural history (the 1-phenotype model) to incorporate aggressive and indolent phenotypes (the 2-phenotype model) based on histopathologic criteria. Stage distribution, incidence, and mortality were calibrated to data from the Surveillance, Epidemiology, and End Results Program of the US National Cancer Institute. For validation, a Monte Carlo microsimulation (1,000,000 events) of the United Kingdom Collaborative Trial of Ovarian Cancer Screening (UKCTOCS) multimodality prevalence screen was performed. Mortality reduction and positive predictive value (PPV) were estimated for annual screening.
In validation against UKCTOCS data, the model-predicted percentage of screen-detected cancers diagnosed at stage I and II was 41% compared with 47% (UKCTOCS data), and the model-predicted PPV of screening was 27% compared with 35% (UKCTOCS data). The model-estimated PPV of a strategy of annual population-based screening in the United States at ages 50 to 85 years was 14%. The mortality reduction using annual postmenopausal screening was 14.7% (1-phenotype model) and 10.9% (2-phenotype model). Mortality reduction was lower with the 2-phenotype model than with the 1-phenotype model regardless of screening frequency or test sensitivity; 68% of cancer deaths are accounted for by the aggressive phenotype.
Ovarian cancer usually is diagnosed at an advanced stage and has a high mortality rate.1 Screening and early detection have the potential to reduce mortality, but several obstacles exist, including the inaccessibility of the ovaries, lack of a well defined precursor lesion, and the relatively low prevalence of ovarian cancer. In recent randomized controlled trials, screening strategies incorporating cancer antigen 125 (CA 125) and transvaginal ultrasound imaging of the ovaries have demonstrated high specificities and acceptable positive predictive values (PPVs) but produced no measurable improvement in ovarian cancer-related mortality.2, 3 Two large trials recently reported the results from initial rounds of postmenopausal screening, but follow-up to assess the impact of screening on mortality will not be complete until 2014.4, 5
Another factor that may affect the utility of ovarian cancer screening is disease heterogeneity. High-grade papillary serous cancers and carcinosarcomas have generalized genomic instability and frequently contain tumor protein p53 (TP53) mutations.6-12 Conversely, many low-grade serous lesions as well as other histologic types of ovarian adenocarcinoma may arise in precursor lesions and often have mutations in the v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS), the B-Raf proto-oncogene serine/threonine-protein kinase (BRAF), the phosphatase and tensin homolog (PTEN), and beta-catenin.13-23 Kurman et al have proposed 2 distinct classifications of ovarian cancer: an aggressive phenotype that encompasses most serous cancers, undifferentiated carcinomas, and ovarian carcinosarcomas; and an indolent phenotype, which comprises well differentiated serous adenocarcinomas as well as all endometrioid, clear cell, and mucinous adenocarcinomas.24
The classification of epithelial ovarian cancers into aggressive versus indolent phenotypes has potential implications for the success of early detection strategies. It is likely that indolent cancers have a slower rate of progression from early to late stage and, thus, are more likely to be detected by screening. Conversely, aggressive cancers likely remain in the early stages for a much shorter time before metastasizing and are unlikely to be detected while they are localized and treated more easily. We hypothesize that early stage cancers detected by screening are more likely to be of the indolent phenotype and are not significant contributors to mortality. Consequently, estimates of the efficacy of screening in reducing mortality based on the cumulative accuracy of a screening modality may be overly optimistic. In the current study, we used a Markov mathematical model of the natural history of ovarian cancer to explore the implications of a 2-phenotype paradigm (indolent vs aggressive) for the achievable mortality reduction from annual screening strategies for ovarian cancer.
MATERIALS AND METHODS
Classification of Ovarian Cancer Subtypes
We derived stage distribution and outcomes from individual case data from the Surveillance, Epidemiology, and End Results (SEER) 17 Registries (1988-2003; based on the November 2008 submission) using SEER*Stat software version 6.5.2 (SEER Program public use data, 1973-2006; National Cancer Institute, Division of Cancer Control and Population Sciences, Surveillance Research Program, Cancer Statistics Branch, Bethesda, Md: http://seer.cancer.gov/data/; accessed November 11, 2009). We chose histologic types corresponding to those classified as indolent or aggressive by Kurman et al24 as follows: Phenotype A or “aggressive,” which included grade 2 or 3 serous carcinomas, all undifferentiated and poorly differentiated carcinomas, and all carcinosarcomas; and Phenotype B or “indolent,” which included grade 1 serous carcinomas and all endometrioid, mucinous, and clear cell carcinomas. For each phenotype, we estimated total cases, stage distribution, and overall survival for Years 1 through 15 using individual case data. Survival estimates were stratified by stage and age using the following age groups: <50 years, 50 to 69 years, and ≥70 years. Ovarian cancers of unknown stage in SEER*Stat (2.3% of cases) were excluded. The “1-phenotype model” was calibrated to aggregate stage distribution and survival for all patients. The “2-phenotype model” was stratified and calibrated to the individual stage distribution of each phenotype and to overall stage distribution, and survival was modeled individually for each phenotype.
We modified a previously developed Markov state transition model to represent the natural history of ovarian cancer,25, 26 using 1-phenotype and 2-phenotype models. The basic structure of the 1-phenotype model and basic clinical estimates have been described previously.25 Figure 1 depicts health states and allowed transitions in both models. For the 2-phenotype model, cancer states were expanded to include an aggressive phenotype (“A”) and an indolent phenotype (“B”). The health states were as follows: well; undetected ovarian cancer Phenotype A (stages I, II, III, and IV); undetected ovarian cancer Phenotype B (stages I, II, III, and IV); detected ovarian cancer Phenotype A (stages I, II, III, and IV); detected ovarian cancer Phenotype B (stages I, II, III, and IV); benign oophorectomy; ovarian cancer survivor; death from other causes; and death from ovarian cancer. In the base-case, women enter the model at age 40 years, and screening is conducted between ages 50 years and 85 years; the length of each Markov cycle is 1 month. Age-associated ovarian cancer incidence is modeled from SEER data (http://seer.cancer.gov accessed September 15, 2009).25 Because stage transition and the probability of detection are not observable, estimates for these probabilities were imputed by fitting clinically reasonable values until model-predicted cancer incidence, phenotype-specific and overall stage distribution, and mortality approximated values from the SEER database. Baseline estimates of the monthly probability of progression and detection by stage with the 1-phenotype model were as follows: stage I, probability (P) = .068; stage II, P = .12; and stage III, P = .075 for the probability of progression; and stage I, P = .022; stage II, P = .03; stage III, P = .12; and stage IV, P = .5 for the probability of detection.
For Phenotype A with the 2-phenotype model, the estimates were as follows: stage I, P = .12; stage II, P = .16; and stage III, P = .075 for the probability of progression; and stage I, P = .0122; stage II, P = .03; stage III, P = .12; and stage IV, P = .5 for the probability of detection. For Phenotype B with the 2-phenotype model, the estimates were as follows: stage I, P = .025; stage II, P = .06; and stage III, P = .07 for the probability of progression; and stage I, P = .024; stage II, P = .03; stage III, P = .12; and stage IV, P = .5 for the probability of detection. The probability of direct progression from stage I to stage III was P = .3 for the 1-phenotype model, P = .35 for Phenotype A with the 2-phenotype model, and P = .25 for Phenotype B with the 2-phenotype model. The average time (in months) spent in each stage with the 1-phenotype model was as follows: stage I, 12.8 months; stage II, 8.3 months; stage III, 5.6 months; and stage IV, 2.2 months. With the 2-phenotype model, the average time in each stage was as follows: stage I, 8.1 months; stage II, 4.8 months; stage III, 5.7 months; and stage IV, 2.1 months for Phenotype A; and stage I, 23.8 months; stage II, 11.7 months; stage III, 5.7 months; and stage IV, 2.1 months for Phenotype B. Within each health state, a yearly age-associated risk of death from causes other than ovarian cancer was derived from US life tables.25 Base-case input variable values and sources have been described previously.25 We made the following key assumptions: 1) death from ovarian cancer occurs only after or concurrent with detection; 2) direct progression from stage I to stage III is permitted; 3) individuals who remain alive 15 years after cancer diagnosis become “ovarian cancer survivors” and are considered cured of ovarian cancer but may die of other causes; and 4) all women die by age 100 years.
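To illustrate how the quoted monthly probabilities translate into stage at clinical detection, the following Python sketch computes, in closed form, the share of unscreened cancers detected in each stage for Phenotypes A and B. It is a simplified illustration, not the published model. It assumes that detection and progression are mutually exclusive events within a monthly cycle, that the stage I to stage III value is the fraction of stage I progressions that bypass stage II (the text does not spell out how this probability is applied), and that death from other causes can be ignored. The names `PARAMS` and `stage_at_detection` are hypothetical.

```python
# Monthly probabilities quoted in the text for the 2-phenotype model.
# ASSUMPTION: "skip" is the share of stage I progressions that go directly
# to stage III; the paper does not state exactly how this value is applied.
PARAMS = {
    "A": {"prog": {"I": 0.12, "II": 0.16, "III": 0.075},
          "det": {"I": 0.0122, "II": 0.03, "III": 0.12, "IV": 0.5},
          "skip": 0.35},
    "B": {"prog": {"I": 0.025, "II": 0.06, "III": 0.07},
          "det": {"I": 0.024, "II": 0.03, "III": 0.12, "IV": 0.5},
          "skip": 0.25},
}


def stage_at_detection(params):
    """Share of cancers clinically detected in each stage (no screening).

    ASSUMPTIONS: detection and progression compete within each monthly
    cycle, and death from other causes is ignored.
    """
    prog, det, skip = params["prog"], params["det"], params["skip"]

    def exit_split(stage):
        # With two competing per-cycle events, the chance of leaving the
        # stage by detection rather than progression is det / (det + prog).
        total = det[stage] + prog[stage]
        return det[stage] / total, prog[stage] / total

    det_1, leave_1 = exit_split("I")
    det_2, leave_2 = exit_split("II")
    det_3, leave_3 = exit_split("III")

    reach_2 = leave_1 * (1.0 - skip)
    reach_3 = leave_1 * skip + reach_2 * leave_2
    reach_4 = reach_3 * leave_3  # every stage IV cancer is eventually detected

    return {"I": det_1, "II": reach_2 * det_2,
            "III": reach_3 * det_3, "IV": reach_4}


dist_a = stage_at_detection(PARAMS["A"])
dist_b = stage_at_detection(PARAMS["B"])
for label, dist in (("A", dist_a), ("B", dist_b)):
    print(label, {stage: round(share, 3) for stage, share in dist.items()})
```

Under these assumptions, a much larger share of Phenotype B cancers than Phenotype A cancers is detected in stage I. The sketch omits age-specific incidence, competing mortality, and calibration, so its output should not be expected to reproduce the calibrated stage distributions.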
Because SEER ovarian cancer data are not adjusted to account for prior oophorectomy for benign disease, model calibration was performed using the assumption that all women are at a single age-associated risk of developing ovarian cancer. The model was calibrated to SEER in 3 categories: 1) the lifetime probability of developing ovarian cancer, 2) the stage distribution of detected cancers, and 3) the lifetime probability of death from ovarian cancer. In the 2-phenotype model, stage distribution was calibrated to 1) the individual phenotype's stage distribution and 2) stage distribution of all cancers (Table 1). Figure 2 illustrates the model-predicted and SEER probability of developing ovarian cancer (cumulative probability: 1-phenotype model, 1.4%; 2-phenotype model, 1.4%; SEER data, 1.4%) and the lifetime probability of death from ovarian cancer (cumulative probability: 1-phenotype model, 1.0%; 2-phenotype model, 0.99%; SEER data, 1.1%).
Table 1. Stage Distribution of Detected Cancers in the Absence of Screening: 1-Phenotype and 2-Phenotype Models Compared With Surveillance, Epidemiology, and End Results Data (table body not recovered; column groups: Aggressive Phenotype “A” and Indolent Phenotype “B”)
SEER indicates the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute.
After calibration, age-associated probabilities of oophorectomy for benign indications were incorporated into the model as described previously.25 We made the assumption that, after benign oophorectomy, women are not at risk of developing ovarian cancer but may die of other causes.
Simulation of Screening
We imposed a screening program on the natural history model using a hypothetical test (or combination of tests) with the ability to vary screen frequency, sensitivity, and specificity. The strategies that we evaluated included no screening and screening at intervals of 3 months to 36 months. Characteristics of the screening test for model validation and base-case analyses were based on published characteristics (sensitivity, 0.895; specificity, 0.998) of a multimodality screening strategy that is used in the ongoing United Kingdom Collaborative Trial of Ovarian Cancer Screening (UKCTOCS) study, an ovarian cancer screening trial that recently reported results from a prevalence round of screening approximately 101,000 women.4 For each modeled screening interval, the test results were classified as true-negative, false-negative, true-positive, or false-positive based on the underlying health state. After true-negative tests, individuals could develop ovarian cancer. After a false-negative test, an individual could experience disease progression, remain within the same disease stage, or have disease detected clinically. We estimated the total expected number of false-positive results over the screening period and calculated a PPV for a screened population by dividing the model-predicted number of true-positive tests by the sum of true-positive tests plus false-positive tests.
For validation, we simulated the prevalence screen of UKCTOCS to reproduce the stage distribution of detected cancers and the PPV of screening. A Monte Carlo microsimulation (n = 1,000,000) was performed on a population with age to begin screening distributed normally (mean age, 60 years; standard deviation, 5.5 years); this age distribution was an approximation of the age of the UKCTOCS population (mean age, 60.6 years; interquartile range, 56.1-66.2 years). In the UKCTOCS prevalence screen, the rate of ovarian cancer was 0.0007588 (76 per 100,000 women); using ages 60 to 61 years (the mean age in the UKCTOCS trial) for the simulation, the SEER rate was 0.00038 (38 per 100,000 women). For the validation exercise, the modeled relative risk (RR) of developing ovarian cancer was set at 2 to simulate the ovarian cancer prevalence observed in the trial (trial prevalence for screen-detected plus screen-undetected cancers, 0.0007588; model simulation prevalence at age 60 years using an RR of 2, 0.000776). Screening test sensitivity and specificity were set at 0.895 and 0.998, respectively. For validation against a prevalence screen, the model simulated screening beginning at the start age and ending 1 year later. The stage distribution of detected cancers and the PPV of screening were estimated.
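As a rough cross-check on the validation setup, the PPV of a single screen can be approximated directly from test sensitivity, test specificity, and disease prevalence. The Python sketch below applies the standard definition of PPV to the values quoted above (sensitivity, 0.895; specificity, 0.998; simulated prevalence, 0.000776). It is a back-of-the-envelope calculation rather than the microsimulation itself, and the function name `ppv` is hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value of a single screen: TP / (TP + FP)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Values quoted in the text for the simulated UKCTOCS prevalence screen.
ppv_single_screen = ppv(sensitivity=0.895, specificity=0.998,
                        prevalence=0.000776)
print(f"{ppv_single_screen:.1%}")  # prints 25.8%
```

This single-screen approximation ignores stage-specific dynamics and the age distribution of the simulated cohort, both of which the microsimulation tracks.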
Mortality Reduction Achievable With Screening
After validation, test characteristics were held at the values described above. We estimated the mortality reduction that might be achievable in the context of the UKCTOCS trial by simulating the age distribution and the RR of developing ovarian cancer, as described above, but with annual screening performed over a 10-year period and with a time horizon of 15 years (the maximum follow-up in the UKCTOCS trial). For the purposes of evaluating the achievable mortality reduction of applying a similar screening test to a general postmenopausal population in the United States, a base-case analysis consisting of screening all women between ages 50 years and 85 years with follow-up to age 100 years was evaluated using the 2-phenotype model. We estimated the impact of 1-phenotype versus 2-phenotype natural history models on the mortality reduction achievable by comparing screening at various intervals versus no screening. Screening frequency and test characteristics were varied for sensitivity analysis.
In a simulation of the UKCTOCS trial's prevalence screen, the model predicted the following stage distribution for screen-detected cancers: stage I, 29.7%; stage II, 11.2%; stage III, 38.9%; and stage IV, 20.2%. The predicted percentage of screen-detected early stage cancers was 40.9% (2-phenotype model) and 40.8% (1-phenotype model) compared with 47.1% (95% confidence interval, 29.8%-64.9%) of early stage screen-detected cancers in the UKCTOCS prevalence screen. The PPV of screening over a 1-year period for women with a mean age of 60 years, estimated using Monte Carlo microsimulation (n = 1,000,000), was 27.4% for the 2-phenotype model and 26.5% for the 1-phenotype model compared with 35.1% (95% confidence interval, 25.6%-45.4%) reported from the UKCTOCS prevalence screen.
In the base-case analysis, annual screening of an average-risk US population performed between ages 50 years and 85 years resulted in a PPV of 14% and an average of 0.05, or 5 lifetime false-positive tests per 100 women (1-phenotype and 2-phenotype models). In Figure 3, the 2-phenotype model demonstrates that more frequent screening resulted in a lower PPV and more lifetime false-positive tests, whereas less frequent screening resulted in a higher PPV and fewer lifetime false-positive tests.
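The relation between screening frequency and lifetime false-positive tests can be sketched with simple arithmetic: each screen of a cancer-free woman carries a false-positive probability of 1 minus specificity. The Python sketch below computes an upper bound on expected false-positive tests per woman, assuming that every woman attends every scheduled screen from age 50 years until age 85 years. Attrition from death, oophorectomy, or cancer diagnosis is deliberately ignored, which is likely one reason this bound exceeds the model estimate of 0.05; that interpretation is ours and is not stated in the model description. The function name and the exact count of screens are illustrative assumptions.

```python
def expected_false_positives(start_age, stop_age, interval_months, specificity):
    """Upper bound on expected false-positive tests per cancer-free woman.

    ASSUMPTION: every scheduled screen is attended and no one leaves the
    screened pool, so this overstates the figure from a full model.
    """
    n_screens = ((stop_age - start_age) * 12) // interval_months
    return n_screens * (1.0 - specificity)


annual_bound = expected_false_positives(50, 85, 12, 0.998)
print(f"annual screening: {annual_bound:.3f} false-positive tests per woman")

# Screening intervals spanning the range evaluated in the text.
for months in (3, 6, 12, 24, 36):
    bound = expected_false_positives(50, 85, months, 0.998)
    print(f"every {months:2d} months: {bound:.3f}")
```

Shorter intervals mean more screens and therefore more expected false-positive tests, which is the direction of the trend described for Figure 3.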
Survival and Mortality Reduction
Modeled survival for the 1-phenotype and 2-phenotype paradigms is depicted in Figure 4. The 5-year, 10-year, and 15-year survival percentages with the 1-phenotype model were 42.6%, 30.3%, and 24.7%, respectively; with the 2-phenotype model, the Phenotype A rates were 30.8%, 17.7%, and 12.8%, respectively, and the Phenotype B rates were 61.5%, 50.6%, and 43.7%, respectively. In the 2-phenotype model, Phenotype A accounted for 23% of stage I, 54% of stage II, 78% of stage III, and 79% of stage IV cancers. Phenotype A comprised 62% of ovarian cancers and 68% of ovarian cancer deaths. The achievable mortality reduction with screening differed between the 1-phenotype and 2-phenotype models. Annual screening at ages 50 years to 85 years resulted in a mortality reduction of 14.7% with the 1-phenotype model and 10.9% with the 2-phenotype model compared with no screening. Estimates of achievable mortality reduction were 3 to 6 percentage points lower with the 2-phenotype model than with the 1-phenotype model regardless of screening frequency (Fig. 5). A Monte Carlo simulation (n = 1,000,000) of UKCTOCS trial follow-up data (mean age to start screening, 60 years; screening period, 10 years; time horizon, 15 years from the start screening age) resulted in a reduction in ovarian cancer mortality of 10.1% with the 1-phenotype model and 6.4% with the 2-phenotype model compared with no screening.
The PPV of screening depended most on test specificity and disease prevalence. Under base-case conditions, specificity <99.8% was associated with a PPV <10% for annual screening, whereas specificities <99% were associated with a PPV <4% (Fig. 3). Screening a population with an RR of 2 for developing ovarian cancer resulted in a PPV of 24.5% compared with a PPV of 14% for screening a general postmenopausal population.
The achievable level of mortality reduction with screening was sensitive to screening frequency (Fig. 5) but not test sensitivity. Variation in test sensitivity from 75% to 99% resulted in an increased mortality reduction with screening from 9.7% to 10.9% in the 2-phenotype model.
Although there is a theoretical benefit to the early detection of ovarian cancer, this has not been demonstrated convincingly to date. Two large, randomized controlled trials have reported the results from the early rounds of screening, and the impact on disease mortality will be judged based on longer term follow-up to be reported in the next few years.4, 5 Mathematical modeling offers the ability to estimate the effects of screening strategies on clinical outcomes and to inform the design of future strategies. In the current study, we calibrated an ovarian cancer natural history model to SEER data, validated the model by approximating key results from the UKCTOCS prevalence screen within reported 95% confidence intervals, and simulated population-based screening. We estimate that the achievable mortality reduction for the UKCTOCS clinical trial is between 6% and 11% and that the longer term achievable mortality reduction for annual screening in the general population of the United States is between 10% and 15%.
The difficulty of reducing ovarian cancer mortality through screening likely is attributable not only to the stage distribution of cancer at diagnosis but also to the inherent level of aggressiveness of the cancer. Recent genetic and pathologic data from our group and others suggest that not all ovarian cancers grow and progress in a similar fashion and that these differences may be related to underlying patterns of gene expression and mutations.24 Furthermore, even within a specific histologic type, genomic profiles have been described that characterize differential stage and survival.27, 28 In the current model, we have evaluated ovarian cancer screening using 2 alternative models of the natural history of the disease. Although both models result in similar estimates of PPV, use of the 2-phenotype model results in a less optimistic estimate of mortality reduction through screening, and this difference persists at different screening frequencies.
Our findings support the commonly held clinical impression that many early stage ovarian cancers are destined to remain in the early stages for some time, whereas advanced stage cancers likely have spread rapidly. Two previous models of the natural history of ovarian cancer assumed that the time a cancer spent in each stage was distributed log normally, with a mean time in stage I of 9 months based on the opinion of several oncologists.29, 30 Those models did not differentiate time in stage based on histologic type or other classifications. In the current 2-phenotype model, the mean time in stage I was slightly shorter than in previous models (8.1 months) for the aggressive phenotype but was significantly longer (23.8 months) for the indolent phenotype. The time a cancer spends in each stage is inherently “unknowable” by direct observation. However, in a recent study, Brown and Palmer used the prevalence and incidence of ovarian cancers detected at the time of prophylactic oophorectomy in high-risk populations to estimate the duration of an “occult period” at 5.1 years for serous cancers; this period encompasses any preinvasive phase as well as invasive stages I and II.31 This estimate implies that, with improvement, early detection technologies may have the lead time needed to identify preinvasive or early invasive lesions before they metastasize. However, this estimate may not be applicable to nonserous cancers or to cancers that arise in women without evidence of high genetic risk, both of which are included in the current model. Because a preinvasive state has not been well established for most ovarian cancers, our model does not include one. Moreover, the current model also does not assume orderly progression between stages but allows for transition directly from stage I to III, which is more anatomically and biologically plausible.
The current model reinforces the favorable impact of screening less frequently and with higher test specificity on the false-positive rate and the PPV of screening (Fig. 3). In simulating the UKCTOCS prevalence screen, our model approximates (26%-27%) the observed PPV of screening (35%) within its reported 95% confidence intervals. Some of the variation between model estimates and the UKCTOCS results may be attributable to inexact replication of the trial's age distribution or to the fact that we modeled a 1-time test, whereas the UKCTOCS study reported results from a screening algorithm in which a substantial proportion of women received multiple tests over time.4 The estimated PPV of a similar screening test applied annually to the general population in the United States is lower (14%) because the prevalence of disease observed in the UKCTOCS trial was higher (approximately twice the SEER-based rate) than the ovarian cancer incidence in the United States, which was modeled as reported by SEER. Although a PPV >10% often is considered desirable,32 the costs associated with achieving this may be considerable when applying a multimodality algorithm.
There are several general limitations to our approach to modeling ovarian cancer. First, the model is limited by its reliance on SEER registry data, which may not be as accurate as data obtained in the context of a prospective trial and may not be strictly representative of national statistics. A further limitation of SEER registry data is the uncertainty of the histopathologic diagnoses available; no central review was possible and, thus, inaccuracies are unavoidable. Finally, the modeling of ovarian cancer using even 2 phenotypes probably is still overly simplistic, because overlap and clinical “outliers” undoubtedly exist. The current study is intended less as a strict interpretation of cancer progression than as an exercise to explore the possible effects of heterogeneity in tumor behavior on screening efforts.
Currently, routine screening for ovarian cancer is not recommended; our model suggests that screening will result in a modest mortality reduction. However, the results of ongoing prospective studies will clarify further the utility of postmenopausal screening. A model that accounts for indolent and aggressive cancer phenotypes reduces the predicted effectiveness of screening strategies, probably because of the preferential detection of biologically indolent cancers at an early stage versus the detection of biologically aggressive cancers at an advanced stage. We hope that this model may lend insight into the interpretation of the results from ongoing trials and may better inform the design of future strategies for reducing mortality from ovarian cancer.
CONFLICT OF INTEREST DISCLOSURES
Supported by a grant from the American Board of Obstetrics and Gynecology/American Association of Obstetricians and Gynecologists Foundation. Dr. Myers and Dr. Havrilesky have received research funding from Precision Therapeutics, Inc. Dr. Havrilesky has received research funding from B.D. Tripath Oncology. Dr. Kulasingam has acted as a consultant for Medtronic.