Recently, the Prostate, Lung, Colorectal and Ovarian (PLCO) Trial reported no mortality benefit for annual screening with CA125 and transvaginal ultrasound (TVU). Currently ongoing is the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS), which utilizes the risk of ovarian cancer algorithm (ROCA), a statistical tool that considers current and past CA125 values to determine ovarian cancer risk. In contrast, PLCO used a single cutoff for CA125, based on current levels alone. We investigated whether using ROCA in PLCO could have, under optimal assumptions, resulted in a significant mortality benefit by applying ROCA to PLCO CA125 screening values. A best-case scenario assumed that all cancers showing a positive screen result earlier with ROCA than under the PLCO protocol would have avoided mortality; under a stage-shift scenario, such women were assigned survival equivalent to Stage I/II screen-detected cases. Updated PLCO data show 132 intervention arm ovarian cancer deaths versus 119 in usual care (relative risk, RR = 1.11). Forty-three ovarian cancer cases, 25 fatal, would have been detected earlier with ROCA, with a median (minimum) advance time for fatal cases of 344 (147) days. Best-case and stage-shift scenarios gave 25 and 19 deaths prevented with ROCA, for RRs of 0.90 (95% CI: 0.69–1.17) and 0.95 (95% CI: 0.74–1.23), respectively. Having utilized ROCA in PLCO would not have led to a significant mortality benefit of screening. However, ROCA could still show a significant effect in other screening trials, including UKCTOCS.
Recently, the ovarian component of The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial reported its results, which showed no mortality benefit of annual screening with CA125 and transvaginal ultrasound (TVU) versus usual care.1 In addition to a mortality relative risk (RR) that was slightly, although not statistically significantly, elevated (RR = 1.18), the majority (69%) of the screen detected cancers presented in Stage III or IV.
Another large ovarian cancer screening trial is ongoing, the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS). UKCTOCS is a three-armed trial, with a usual care arm, a TVU arm and a multimodal arm.2 The latter utilizes as the first-line screen the risk of ovarian cancer algorithm (ROCA), a statistical tool that considers current and past CA125 values, as well as age, to assign an ovarian cancer risk probability, categorized as low, intermediate and elevated.2,3 The arm is denoted as multimodal because positive (i.e., intermediate or elevated) ROCA results may trigger subsequent follow-up with TVU.
In contrast to the screened arm of PLCO, which utilized a cut-off value of CA125 based on the current level only, the multimodal arm of UKCTOCS, and specifically ROCA, takes into account serial levels of CA125. The unfavorable stage distribution in PLCO of screen-detected cancers, and the lack of a mortality benefit, gives rise to the speculation that the CA125 cutoff of 35 U/ml is too high and catches cancers too late. A serial CA125 algorithm may be able to detect these cancers sooner, but without engendering too high a false positive rate, by considering the CA125 trajectory over time. A high false positive rate is problematic in ovarian cancer screening due to the frequent use of oophorectomy, and attendant complications, following false positive screens.1,4
The purpose of this manuscript is to ascertain whether the use of ROCA in the PLCO trial could have favorably affected the trial's outcome. Specifically, we utilize the observed PLCO CA125 values to calculate a ROCA score at each screening visit and then analyze how many women would have had their tumor detected earlier (or later) using ROCA than they did under the standard PLCO protocol (CA125 > 35 U/mL and/or positive TVU). Under a “best-case” scenario, any women dying of ovarian cancer who would have had their cancer detected earlier with ROCA could be considered “saved” under ROCA. If, under the best-case scenario, there was still no significant mortality reduction for the screened arm, then having used ROCA in the PLCO trial likely would not have resulted in a qualitatively different outcome. Note that we are not attempting here to simulate the results of UKCTOCS, since that trial had a different protocol, over and above the use of ROCA, than did PLCO. The differences include different lengths of the screening and post-screening periods and the use in UKCTOCS of a standardized diagnostic algorithm following a positive screen.2
The design of PLCO has been described in detail.5 Briefly, subjects aged 55–74 were randomized at 10 US centers into an intervention or usual care arm between 1993 and 2001. Two initial exclusion criteria—previous oophorectomy and current tamoxifen use—were dropped in 1996 and 1999, respectively. However, women who had undergone previous bilateral oophorectomy were not screened for ovarian cancer (they were screened for lung and colorectal cancer). The primary outcome paper excluded them and they are similarly excluded here. Other PLCO exclusion criteria included history of a PLCO cancer and cancer treatment within the past year.
At study entry, participants completed a self-administered baseline questionnaire, which inquired about demographics, general risk factors and screening and medical history. Women in the intervention arm received a CA125 blood test and TVU at baseline, an annual TVU for three additional years, and an annual CA125 for five additional years; women randomized before April 1995 received only three additional years of CA125 testing. CA125 assays were performed centrally at the Immunogenetics Laboratory at the University of California at Los Angeles; CA125 results > 35 U/mL were classified as abnormal. TVU was conducted by trained examiners using a 5–7.5 MHz transvaginal probe; ovary or cyst volume greater than 10 cubic cm, any solid area or papillary projection extending into the cavity of a cyst of any size and any mixed (solid/cystic) component within a cyst were considered positive results. Diagnostic follow-up was determined by participants' primary care physicians.
Incident cancers, and deaths, were ascertained primarily by means of a mailed Annual Study Update (ASU) questionnaire. Additionally, to obtain more complete mortality data, ASU follow-up was supplemented by periodic linkage to the National Death Index (NDI). Medical records pertaining to diagnosed cancers were obtained by the screening centers and certified tumor registrars abstracted data on stage, histology, grade and initial treatment. The underlying cause of death was determined in a uniform and unbiased manner from the death certificate and relevant medical records.5
The endpoint of the PLCO ovarian component was deaths from ovarian, peritoneal and fallopian tube cancers through 13 years of follow-up from randomization; these are denoted here for simplicity as “ovarian cancers.” As in the primary outcome paper, ovarian cancers of low malignant potential (LMP) are excluded. Included here are several more ovarian cancer cases, and deaths, determined by updating the PLCO data set.
Intervention arm cases were categorized by method of detection as either screen detected, interval, post-screening or never-screened. Screen detected cases were diagnosed within a year of a positive screen, interval cases were non-screen detected cases diagnosed within 3 years of a screen and post-screening cases were diagnosed more than 3 years from the last screen; never-screened cases had no PLCO screens.
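The categorization rules above can be sketched in code. This is only an illustration under stated assumptions, not the trial's software: the function name, the input layout (a list of screen dates with a positive/negative flag) and the use of 365-day years are all hypothetical choices made for the example.

```python
from datetime import date
from typing import List, Tuple


def detection_category(screens: List[Tuple[date, bool]], diagnosis: date) -> str:
    """Classify an intervention arm case by method of detection.

    screens: (screen_date, is_positive) pairs for one woman (assumed layout).
    """
    prior = [(d, pos) for d, pos in screens if d <= diagnosis]
    if not prior:
        return "never-screened"
    # Screen detected: diagnosed within a year of a positive screen.
    if any(pos and (diagnosis - d).days <= 365 for d, pos in prior):
        return "screen detected"
    # Otherwise classify by the time elapsed since the most recent screen.
    last_screen = max(d for d, _ in prior)
    if (diagnosis - last_screen).days <= 3 * 365:
        return "interval"
    return "post-screening"
```

For example, a woman whose only screen was negative and who was diagnosed 18 months later would be classified as an interval case.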
The ROCA utilizes current and past CA125 values, as well as age, to estimate absolute ovarian cancer risk.3 As used in UKCTOCS, it categorizes women at each screen into three risk categories: normal, intermediate and elevated risk.2 Initially in UKCTOCS, the cutoffs for these categories were set at 1/1,818 for intermediate and 1/500 for elevated risk; during the trial these were modified to 1/3,500 and 1/1,000, respectively, to try to attain the desired distribution of 85% low, 13% intermediate and 2% elevated risk.2 In this analysis, we used the modified cutoffs for our primary analysis. We also examined the raw ROCA results for all PLCO screens to derive cutoffs that gave the desired 85/13/2 percent distribution in PLCO women; for a sensitivity analysis, these derived cutoffs were also evaluated. In UKCTOCS, prescribed diagnostic follow-up algorithms were employed for intermediate and elevated risk ROCA scores; these involved further ROCA tests and TVUs, followed by biopsy referral if warranted.
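As an illustration of the categorization step only, the sketch below maps an already-computed ROCA risk to a category and derives empirical cutoffs for an 85/13/2 percent split. It does not compute the ROCA risk itself. The function names, the treatment of a risk exactly equal to a cutoff and the simple order-statistic rule for deriving cutoffs are assumptions, not details taken from UKCTOCS or PLCO.

```python
from typing import List, Tuple

# Modified UKCTOCS cutoffs quoted in the text.
INTERMEDIATE_CUTOFF = 1 / 3500
ELEVATED_CUTOFF = 1 / 1000


def risk_category(risk: float,
                  intermediate: float = INTERMEDIATE_CUTOFF,
                  elevated: float = ELEVATED_CUTOFF) -> str:
    """Map an estimated ovarian cancer risk to a category."""
    # Assumption: a risk exactly at a cutoff falls in the higher category.
    if risk >= elevated:
        return "elevated"
    if risk >= intermediate:
        return "intermediate"
    return "low"


def derive_cutoffs(risks: List[float]) -> Tuple[float, float]:
    """Empirical cutoffs giving roughly 85% low, 13% intermediate, 2% elevated."""
    ordered = sorted(risks)
    n = len(ordered)
    # Order statistics at the 85th and 98th percentiles (integer arithmetic).
    return ordered[n * 85 // 100], ordered[n * 98 // 100]
```

Passing the derived pair back into `risk_category` gives the sensitivity-analysis categorization described above.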
Figure 1 shows a flowchart of the analysis scheme. First, for each intervention arm woman with an ovarian cancer diagnosis and any PLCO screens, we examined CA125 levels at each screening year (SY), as well as age, in order to compute ROCA scores at each screen. The next step was to determine the earliest SY for which ROCA demonstrated a stable positive (i.e., intermediate or elevated risk) score, defined as a positive that was not followed by a negative (i.e., low risk) ROCA score at a later screen; this is denoted as the earliest stable positive ROCA SY. In the absence of any stable positive ROCA score, there was no earliest stable positive ROCA SY. The requirement that the positive not be followed by a negative was to guard against a spurious (false) positive ROCA score. For the final PLCO screen there was no subsequent exam with which to assess whether ROCA was subsequently negative; to guard against spurious positive ROCA scores in this instance, it was assumed that if the cancer was diagnosed more than 3 years from the last PLCO screen (and last ROCA score), any positive ROCA scores were spurious and not stable positives. The rationale for using the 3-year cutoff comes from an analysis of the PLCO data. Specifically, we examined all subjects whose first positive ROCA score was 3.0–4.9 years before ovarian cancer diagnosis and who had at least one subsequent ROCA determination. Of these, 17/18 (94%) had a subsequent negative ROCA score. Additionally, the screening results for women with subsequent cancer within 3.0–4.9 years were essentially identical to those for women without cancer; median CA125, percent with ROCA score intermediate or elevated and percent TVU positive were 11.0, 19.9% and 3.7% in the former compared to 10.0, 17.7% and 3.5% in the latter.
Note also that the design of UKCTOCS specifies that the last scheduled screen is 3 years before the end of follow-up, suggesting that screening with UKCTOCS modalities (including ROCA) is thought to have little potential impact on cancers diagnosed after this time interval.
We defined a similar earliest stable positive SY for the combined PLCO modality of CA125/TVU, determining the earliest positive screen, if any, that was not followed by a negative screen, and compared the two earliest stable positive SYs (PLCO vs. ROCA). Women whose earliest stable positive SY with ROCA occurred earlier than with PLCO screening (including instances where there was no earliest stable positive SY under PLCO) were defined as having earlier diagnosis under ROCA. Figure 2 displays six scenarios of CA125/TVU and ROCA outcomes over time to illustrate the process of determining whether diagnosis would have occurred earlier under ROCA.
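The stable-positive rule and the ROCA versus PLCO comparison can be sketched as follows. The function names and the input layout (a list of screening-year and positive-flag pairs) are hypothetical. Applying the 3-year rule through a single argument, for either modality, is an assumption made for brevity; the text states that rule explicitly only for ROCA scores.

```python
from typing import List, Optional, Tuple


def earliest_stable_positive(screens: List[Tuple[int, bool]],
                             years_last_screen_to_dx: float) -> Optional[int]:
    """Return the earliest screening year (SY) with a stable positive, or None.

    screens: (SY, is_positive) pairs for one woman (assumed layout).
    """
    # Positives are treated as spurious when diagnosis came more than
    # 3 years after the last screen.
    if not screens or years_last_screen_to_dx > 3:
        return None
    stable = None
    # Walk backwards from the last screen and stop at the first negative,
    # so every screen from `stable` onward is positive.
    for sy, positive in sorted(screens, reverse=True):
        if not positive:
            break
        stable = sy
    return stable


def earlier_with_roca(roca_sy: Optional[int], plco_sy: Optional[int]) -> bool:
    """True if ROCA has a stable positive before PLCO screening does (or PLCO has none)."""
    if roca_sy is None:
        return False
    return plco_sy is None or roca_sy < plco_sy
```

In this sketch, a positive at SY 1 followed by a negative at SY 2 is ignored, and the run of positives beginning at SY 3 defines the earliest stable positive SY.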
We hypothesized two scenarios for women with earlier diagnosis under ROCA. First, we evaluated a “best-case” scenario, where all subjects who died of ovarian cancer and who would have had their diagnosis moved up, i.e., earlier, with ROCA were presumed to have survived. Subjects not dying who had their diagnosis moved back (i.e., later) due to ROCA were presumed not to be affected in terms of ovarian cancer mortality, nor were subjects dying of ovarian cancer whose diagnosis was not moved up under ROCA. We also examined a more realistic scenario, which was less optimistic but still assumed a large benefit from earlier detection with ROCA. Under this “stage-shift” scenario, women with diagnosis moved up under ROCA were assumed to have the same ovarian cancer specific survival as PLCO screen-detected Stage I/II cases and a total follow-up time in PLCO equal to the trial average (11.4 years); 10 year survival for these cases (n = 23) was 64%. Prevented deaths were then calculated as N(1 - D) where N is the number with diagnosis moved up and D is the expected proportion of these dying of ovarian cancer during the trial based on the Stage I/II survival curves. Cases diagnosed later, or never, with ROCA were still, as with the best-case scenario, not counted against ROCA, since there was no evidence that the PLCO screening regimen affected mortality.
In order to estimate the amount of time that diagnosis would have been advanced in those women with earlier stable positive screens under ROCA, we assumed that such women would have had ovarian cancer diagnosis 74 days after the first stable positive ROCA screen, where 74 days was the median time from first positive screen to diagnosis in PLCO.
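A minimal sketch of this advance-time calculation is given below. The function and argument names are hypothetical; only the 74-day interval comes from the text.

```python
from datetime import date, timedelta

# Median days from first positive screen to diagnosis in PLCO (from the text).
SCREEN_TO_DIAGNOSIS_DAYS = 74


def advance_days(first_stable_positive_roca_screen: date,
                 observed_diagnosis: date) -> int:
    """Days by which diagnosis would have been advanced under ROCA."""
    # Modified diagnosis date: 74 days after the first stable positive ROCA screen.
    modified_diagnosis = first_stable_positive_roca_screen + timedelta(
        days=SCREEN_TO_DIAGNOSIS_DAYS)
    return (observed_diagnosis - modified_diagnosis).days
```

A positive return value means the modified diagnosis date precedes the observed one.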
As noted above, cases diagnosed more than three years from the last screen had little chance to be affected by screening with either CA125 (single cutoff or ROCA) or TVU. In PLCO, the a priori primary endpoint was all deaths from ovarian cancer, regardless of when the disease was diagnosed. Therefore, deaths from cancers diagnosed in this time period, generally eight or more years from study entry, are essentially “noise” and only serve to attenuate both the magnitude and statistical significance of the mortality RR estimate. In the screening trial literature, this phenomenon is known as the “dilution” effect.6 To take into account the dilution effect, we also estimated mortality RRs, under both the best-case and stage-shift scenarios, after subtracting out those deaths, in each arm, that arose from cancers diagnosed more than 3 years after the end of scheduled screening. This corresponds to the start of study year 8 for subjects enrolled from April 1995 on (about 85%) and the start of study year 6 for those enrolled before April 1995, who only had four annual screens instead of six.
Table 1 summarizes the ovarian cancer cases by arm in the trial. Among the analysis set of 34,253 intervention and 34,304 control arm women with at least one ovary at baseline, there were 243 and 218 ovarian cancers, respectively. About 4/5 of the cases were primary tumors of the ovary, 3/4 were Stage III or IV and 80–85% were either cystadenocarcinoma or adenocarcinoma not otherwise specified.
Table 2 shows the number of cases and deaths by method of detection. A total of 25% (n = 60) of intervention arm cases were screen detected by CA125 (with or without TVU positivity), 5% screen detected by TVU only and 33% post-screening. The distribution of the deaths by method of detection was generally similar to that of the cases. Among screen detected cases, ovarian cancer-specific survival varied significantly (p = 0.01, log-rank test) by method of detection, with CA125 positive-TVU negative women having the lowest 5 year survival (41.9%) and CA125 negative-TVU positive (“TVU-only”) women having the highest (84.6%). Survival for the post-screening (5-year survival 30%) and never-screened (5-year survival 27%) cases was the lowest of all the groups. The TVU-only cases had a much greater proportion of cancers being Stages I–II (77%) than all of the other groups (11–36%).
Table 1. Ovarian cancer cases in PLCO
Table 2. Ovarian cancers, deaths and survival by method of detection
Table 3 summarizes the estimated changes in diagnosis time by method of detection category. Among CA125 screen detected cases, ROCA moved up (earlier) the diagnosis in 13 of 40 (33%) fatal cases and 6 of 20 (30%) non-fatal cases. None of the TVU-only cases had diagnosis moved earlier. Of interval cases within one year, 10 of 19 fatal and 5 of 9 non-fatal cases were moved earlier; the corresponding figures were 2 of 18 (fatal) and 6 of 18 (non-fatal) for 1–3 year interval cancers. With respect to diagnoses delayed under ROCA, two were delayed for CA125 screen detected cases and 8 for TVU-only screen detected cases. Of the 25 (13+10+2) fatal cases with diagnosis moved earlier with ROCA, the median interval between original and modified diagnosis date was 344 days, with an inter-quartile range of 247–542 and a minimum of 147 days. Of the cases detected later (or never) with ROCA, 8 of 10 were Stage I/II (including 7 of 8 TVU-only detected cases).
Table 3. Potential changes in diagnosis time with ROCA
The above results were obtained using the UKCTOCS cutoffs for ROCA of 1/3,500 and 1/1,000. Over all screening rounds, these cutoffs gave proportions of 82.3%, 14.3% and 3.4% of the screened population in the low, intermediate and elevated risk categories, respectively. To achieve the desired breakdown of 85%, 13% and 2% in the PLCO screening population, the derived cutoffs were 0.00032 (1/3,125) and 0.00159 (1/629). We re-ran the above analyses using these cutoffs and the results were similar. There were four fewer women with earlier detection with ROCA using the derived cutoffs as compared to the UKCTOCS cutoffs, with one of these a fatal case. In addition, there were three additional cases with later (or never) detection with ROCA using the derived cutoffs, all in non-fatal cases.
Table 4 gives revised mortality RR estimates based on the best-case and other scenarios under ROCA. The observed RR, based on 119 control and 132 intervention arm deaths from ovarian cancer was 1.11 (95% CI 0.87–1.42). Under the best-case scenario there were 25 fewer intervention arm deaths, giving an RR of 0.90 (95% CI 0.69–1.17). The stage-shift scenario showed 18.8 expected fewer intervention arm deaths and an RR = 0.95 (95% CI: 0.74–1.23). With 24 prevented deaths using the derived cutoffs (and best-case scenario), the RR was 0.91 (95% CI: 0.70–1.18). For the analysis accounting for dilution, the RRs were 0.84 (95% CI: 0.62–1.15) and 0.91 (95% CI: 0.67–1.23) for the best-case and stage-shift scenarios, respectively, with 34 intervention and 32 control arm deaths excluded due to diagnosis more than 3 years after the end of scheduled screening.
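The arithmetic behind these estimates can be illustrated with a short sketch. The text does not state how the confidence intervals were computed; the code below assumes a Wald interval on the log scale, with the arm sizes of the analysis set (34,253 and 34,304 women) as denominators. Under that assumption it reproduces the reported observed, best-case and stage-shift estimates to two decimal places. The function name is hypothetical.

```python
import math
from typing import Tuple

N_INTERVENTION = 34253  # women in the intervention arm analysis set
N_CONTROL = 34304       # women in the usual care arm analysis set


def relative_risk(deaths_int: float, deaths_ctl: float,
                  n_int: int = N_INTERVENTION, n_ctl: int = N_CONTROL,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Mortality RR with an approximate 95% CI (log-scale Wald; an assumption)."""
    rr = (deaths_int / n_int) / (deaths_ctl / n_ctl)
    # Standard error of log(RR) for two independent proportions.
    se = math.sqrt(1 / deaths_int - 1 / n_int + 1 / deaths_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


scenarios = {
    "observed": relative_risk(132, 119),
    "best-case": relative_risk(132 - 25, 119),      # 25 deaths prevented
    "stage-shift": relative_risk(132 - 18.8, 119),  # 18.8 expected deaths prevented
}
for label, (rr, lo, hi) in scenarios.items():
    print(f"{label}: RR = {rr:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

The dilution analysis can be approximated in the same way by first subtracting the excluded deaths (34 intervention, 32 control) from each arm; small differences from the reported intervals are possible if a different interval method was used.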
Table 4. Potential modified relative risks for ovarian cancer death
This exercise in retrospectively applying ROCA to intervention arm subjects in PLCO shows that, even under the best-case scenario, using ROCA as the screening modality in PLCO would not have produced a statistically significant, or clinically substantial, mortality effect in the trial, with best-case and stage-shift RR estimates of 0.90 (95% CI: 0.69–1.17) and 0.95 (95% CI: 0.74–1.23), respectively. A limitation of this analysis, however, is the relatively small number of ovarian cancer deaths and the resulting lack of precision of our best-case (and the PLCO) mortality RR estimates, as evidenced by the rather wide 95% confidence limits. Therefore, chance may be playing some role in our findings. More specifically, the observed modest excess of deaths in the intervention (n = 132) compared to the control arm (n = 119) clearly makes it more difficult to achieve a substantial and/or statistically significant mortality reduction with ROCA, even under the best-case scenario. Although it is possible that this small excess represents a true elevated mortality risk with screening, most would probably agree that it is likely purely due to chance, in the face of a null or slightly favorable true mortality benefit for PLCO screening. Accounting for a probable dilution effect in PLCO, i.e., for deaths arising from cancers diagnosed at a time when screening was unlikely to affect them, led to a modest reduction in the mortality RRs, to 0.84 and 0.91 for the best-case and stage-shift scenarios, respectively, which still failed to reach statistical significance.
Another limitation of this analysis is that we assumed that follow-up of positive ROCA screens was the same as that for positive PLCO screens, and utilized the median interval in PLCO from first positive screen to diagnosis (74 days) as the time from first positive ROCA screen to diagnosis. However, as mentioned above, the targets with ROCA are 13% of women classified as intermediate and 2% classified as elevated risk; this total 15% rate of referral is substantially greater than the 5% rate observed in PLCO. A major problem in PLCO was the high rate of oophorectomy among positive screens without evidence of cancer. Therefore, for ROCA to be practical, the oophorectomy rate in intermediate risk women, over 99% of whom will not have ovarian cancer, must be very low. In UKCTOCS this rate is kept low by the diagnostic follow-up algorithm for intermediate risk, which mandates a repeat ROCA at 3 months followed by either another repeat ROCA at 3 months, for an intermediate ROCA score, or TVU for an elevated ROCA score.2 In part due to this algorithm, in the baseline screening round in UKCTOCS the median interval from initial screen to diagnosis for women with an intermediate initial ROCA result (n = 9) was 274 days; median time was 75 days for women (n = 42) with an elevated initial ROCA result.2 Thus women with intermediate ROCA risk will likely be followed up less aggressively than were women with a positive screen in PLCO. Of the 42 PLCO cases detected earlier with ROCA, 24 (57%) had an intermediate ROCA score, including 12 of 25 (48%) fatal cases. Thus our calculated median advance interval of 344 days for the above 25 fatal cases is likely an overestimate of the true median advance interval had ROCA screening taken place in PLCO with the UKCTOCS follow-up algorithm employed.
The survival patterns presented here, stratified by mode of detection, were intriguing and may shed light on the potential benefit of screening. Interval cases in PLCO had survival (5 year survival 49–57%) similar to that of cases screen detected with CA125 alone (42–46%) and significantly (p = 0.04) better survival than post-screening cases (30%). In general, due to length-biased sampling, which selects out on average the faster growing tumors as interval cases, interval cases tend to have worse survival than both screen detected cancers and cancers diagnosed in the absence of screening (post-screening cases here).7 Thus, it is puzzling why this particular pattern was observed in PLCO. Better observed survival in screen detected cancers, of course, does not itself imply a mortality benefit of screening, due to lead time and over-diagnosis bias.
However, among screen detected cancers, those detected with TVU alone (and negative CA125) had substantially and statistically significantly better survival (5 year survival 84%) than those detected solely by CA125. It is not clear whether this high level of survival in the TVU-only detected cancers resulted from early detection and intervention or whether these tumors are intrinsically of low risk. If the former is the case, then including TVU with ROCA would be critical, since ROCA only detected 7 of these 13 TVU-only tumors. Stratton et al., though, showed that larger ovarian tumors, which are more likely to be detected by TVU, had better survival than smaller tumors in a cohort of unscreened patients, suggesting that the latter possibility, that these are intrinsically lower risk, may at least partially explain the difference.8
In PLCO, under 25% of CA125 detected tumors were Stage I/II and survival for these cancers was poor, both of which suggest that CA125 with the cutoff of 35 U/ml is detecting ovarian cancer too late. Whether ROCA can be effective in reducing ovarian cancer mortality will depend on whether the algorithm can detect changes in CA125 levels reflective of ovarian cancer progression early enough while still maintaining a reasonable referral rate; additionally, the corresponding work-up process must result in a relatively rapid diagnosis of cancers while limiting the number of “unnecessary” oophorectomies.
In conclusion, having utilized ROCA in PLCO would not likely have led to a statistically significant or substantial mortality benefit of screening in that study. This result does not imply that utilizing ROCA in another setting, and specifically in the UKCTOCS trial, would give a similarly null result. The results of that trial, expected in 2014/2015, are eagerly awaited.9
Dr. Skates is a co-inventor of ROCA (patent no. 5800347).