The performance of screening tests for ovarian cancer: results of a systematic review

Correspondence: Dr R. Bell, Northumberland Health Authority, Morpeth, Northumberland NE61 5PD, UK.

Abstract

Objective To estimate the performance of currently available tests in detecting ovarian cancer in asymptomatic women.

Methods Systematic review of prospective screening studies.

Results Twenty-five studies were identified: sixteen studied women at average risk and nine studied women at higher risk. Most studies evaluated only one screening method, were small, detecting few cancers, and gave few follow up details. Sensitivity estimates are therefore imprecise. In a typical larger study, reported sensitivity of ultrasound screening at one year was around 100% (95% CI 54%–100%), while the sensitivity of CA125 measurement followed by ultrasound (multimodal screening) was about 80% (95% CI 49%–95%). False positive rates ranged between 1.2% and 2.5% for grey scale ultrasound, between 0.3% and 0.7% for ultrasound with colour Doppler and between 0.1% and 0.6% for multimodal screening. This implies that, in annual screening of a population with an incidence of 40 per 100,000, and if no cancers were missed, between 2.5 and 60 women would undergo surgery for every primary ovarian cancer detected.

Conclusions Ultrasound and multimodal screening can detect ovarian cancer in asymptomatic women, but there is currently no evidence on whether screening improves outcome for women in any risk group. On-going randomised controlled trials should establish the magnitude of any benefit of screening. The low prevalence of ovarian cancer in the population, and its rate of progression, may limit the potential cost-effectiveness of screening.

INTRODUCTION

The overall five year survival rate from ovarian cancer remains poor at about 30%, and there has been little improvement in survival over the past 20–30 years1–3. Survival is much better (around 75% at five years) for women who present when the disease is localised to the ovaries (FIGO Stage I), but currently about three quarters of women in the UK present at a later stage1. This has led to interest in population screening, in the hope that if ovarian cancer can be detected in asymptomatic women this will lead to earlier treatment and improved outcomes. However, it is not obvious that stage shifts at diagnosis can be achieved by screening, or that this would result in a sufficiently large survival advantage at an acceptable financial and human cost.

Tests investigated as potential screening methods include ultrasound scanning (transvaginal or transabdominal), and measurement of serum CA125. Ultrasound scanning detects ovarian enlargement and morphological abnormalities which may indicate the presence of a tumour; the transvaginal route is preferred because of the more detailed images obtained4. Recently, colour Doppler imaging (CDI), which provides images of ovarian vasculature and estimates of flow velocity, has been used as an adjunct to grey scale ultrasound, with the aim of identifying abnormal blood flow patterns suggestive of malignancy5. CA125 is an antigen produced by most primary ovarian malignancies, but raised levels may also be found in other malignancies and certain benign gynaecological conditions6. Measurement of serum CA125 has been used in screening, either in combination with ultrasound scanning, or as an initial test, with ultrasound carried out only in those women showing elevated levels of CA125 (‘multimodal screening’).

The effectiveness of screening has not been evaluated in a randomised controlled trial (RCT) although several trials are in progress7–9. Such trials are necessary to estimate the impact of screening on mortality and quality of life. In the absence of results from trials, an assessment of the potential benefits, harms and costs of screening may be made using indirect evidence. One important factor influencing the likely impact of screening is the performance of screening tests in distinguishing between women with and without ovarian cancer. The more sensitive the test, the greater the ability to detect cancer at an earlier stage, and the greater the potential benefit, if treatment at an earlier stage results in increased survival. The more specific the screening test, the lower the harms due to unnecessary investigations of women without ovarian cancer who are falsely screened positive.

This paper summarises the results of a systematic literature review commissioned by the NHS Health Technology Assessment Programme. It assesses the evidence on the ability of currently available tests to detect ovarian cancer in asymptomatic women, and examines the implications for population screening.

METHODS

A computerised search was performed on MEDLINE (1966–1996), EMBASE (1982–1996) and CANCERLIT (1966–1996). Keywords included ovarian neoplasms, screening and early detection or diagnosis. The references of review papers and primary studies were checked and experts in the field contacted. Full details will be available elsewhere10.

Retrieved titles and abstracts were independently assessed by two reviewers. Studies in any language evaluating tests to detect ovarian cancer were selected if the women included in the study were asymptomatic, and if women with abnormal test results were followed up with a definitive diagnostic investigation.

Studies in women scheduled for surgery because of clinical gynaecological problems or presenting with symptoms suggestive of ovarian cancer were excluded, because the estimates of sensitivity and specificity are unlikely to be applicable to screening. Results reported here exclude studies of clinical pelvic examination as a screening test.

Data were extracted on a standard proforma by one reviewer and checked by a second reviewer. Details were recorded relating to the screening protocol, the participating women, the method of recruitment, the number of women screened, the method and completeness of follow up of women screened negative, the procedures performed and the diagnosis in women screened positive and the completeness of result reporting. These were used to assess study validity11.

The reported crude data were used to calculate the following summary statistics for each included study:

  1. The test sensitivity (the proportion of primary ovarian cancers diagnosed in the study population which were screen-detected).
  2. The prevalence of screen-detected primary ovarian cancers and the proportion which were at Stage I.
  3. The recall rate (the proportion of screened women who were recalled for further assessment).
  4. The false positive rate determined at diagnostic surgery (the proportion of screened women who underwent diagnostic surgery but did not have primary ovarian cancer).

Exact confidence intervals were calculated where appropriate12. No quantitative pooling of sensitivity and specificity was carried out, as the thresholds used to define abnormal results and the methods of follow up varied between studies. Pooled estimates of the prevalence of screen-detected cancer, and the percentage of cancers detected at Stage I were calculated. These estimates do not depend on the method or length of follow up and were therefore considered to be more consistent across the studies than measures of test sensitivity.

RESULTS

Study details

Twenty-five separate studies were identified (Table 1)13–37. Four comparisons of screening methods have been published24,25,38,39, while the remaining studies evaluated a single screening protocol. Sixteen studies mainly recruited women who were at average risk of ovarian cancer for their age. Many of these studies recruited volunteers responding to publicity; two invited a random sample from a population register19,23, while a further two recruited women already attending for breast or cervical screening17,25. Each study set a minimum age limit, usually 45 or 50 years, and several studies excluded premenopausal women13,15,20,27,28,37.

Table 1.  Prospective screening studies. TAS = trans-abdominal sonography; TVS = trans-vaginal sonography; CDI = Colour Doppler imaging.

| Study | Ref. | Country | No. screened | Inclusion criteria | Screening method |
|---|---|---|---|---|---|
| Greyscale ultrasound (6 average risk, 1 higher risk) | | | | | |
| Goswamy 1983 | 13 | UK | 1084 | Aged 39–78 postmenopausal | TAS |
| Millo 1989 | 17 | Italy | 500 | Aged 45+ or postmenopausal (mean 54 years) | TVS |
| Campbell 1989 | 16 | UK | 5479 | Aged over 45 or with family history (4%) (mean 53 years) | TAS (3 screenings at 18 month intervals) |
| Demidov 1990 | 18 | Russia | 11,996 | Aged 18+ | US (not otherwise specified) |
| Tabor 1994 | 19 | Denmark | 435 | Aged 46–65 | TVS |
| Van Nagell 1995 | 20 | US | 8500 | Aged 50+ and postmenopausal, or 25+ with a family history | TVS (invited for rescreening after 1 year) |
| Andolf 1986 | 14 | Sweden | 805 | Aged 40–70 years attending gynae OPD (higher risk group) | TAS |
| Ultrasound with CDI (2 average risk, 1 higher risk) | | | | | |
| Kurjak 1994 | 21 | Croatia | 5013 | Aged 40–71 (mean 45 years) | TVS and CDI |
| Vuento 1995 | 23 | Finland | 1364 | Aged 56–61 (mean 59 years) | TVS and CDI |
| Weiner 1993 | 22 | Israel | 600 | Previous breast cancer (higher risk group) | TVS and CDI |
| Ultrasound followed by CDI (1 average risk, 1 higher risk) | | | | | |
| Parkes 1994 | 25 | UK | 2953 | Aged 50–64 | TVS then CDI |
| Bourne 1993 | 24 | UK | 1601 | Aged 17–79 (mean 47 years) and with a family history of ovarian cancer | TVS alone for 1000 women, TVS then CDI for 601 women |
| Ultrasound followed by other secondary tests (3 average risk) | | | | | |
| Sato 1992 | 26 | Japan | 15,282 | Aged 30+ years | TVS then CT, MRI and combination of tumour markers |
| Holbert 1994 | 28 | US | 478 | Aged 30–89 and postmenopausal | TVS then CA125 |
| Schincaglia 1994 | 27 | Italy | 3541 | Aged 50–69 | TAS then aspiration cytology or biopsy |
| Ultrasound with CA125 measurement (6 higher risk) | | | | | |
| Akulenko 1992 | 29 | Russia | 1003 | Aged over 18 and with positive family history | US and CA125, CA19–9, REA |
| Karlan 1993 | 30 | US | 597 | Aged 35+ and with positive family history | TVS with CDI and CA125 |
| Muto 1993 | 31 | US | 384 | Aged 25+ and with positive family history | TVS and CA125 |
| Belinson 1995 | 33 | US | 137 | Aged 23+ (mean 43 years) and with positive family history | TVS with CDI, and CA125 |
| Schwartz 1995 | 32 | US | 247 | Aged 30+ (median 42 years) and with positive family history | TVS with CDI, and CA125 |
| Dorum 1996 | 34 | Norway | 180 | Aged 18+ (mean 43 years) and with positive family history | TVS and CA125 |
| CA125 followed by ultrasound, multimodal screening (4 average risk) | | | | | |
| Jacobs 1988 | 15 | UK | 1010 | Aged 45+ (mean 54 years) and postmenopausal | CA125 then TAS |
| Jacobs 1993 | 37 | UK | 22,000 | Aged 45+ (median 56 years) and postmenopausal | CA125 then TAS |
| Grover 1995 | 36 | Australia | 2550 | Aged over 40 (median 51 years) or with family history (3%) | CA125 then TAS/TVS |
| Adonakis 1996 | 35 | Greece | 2000 | Aged over 45 (mean 58 years) | CA125 then TVS |

Nine studies recruited women at higher risk of developing ovarian cancer14,22,24,29–34. Seven of these investigated screening in women with a family history of ovarian cancer or certain other cancers24,29–34. The minimum age for entry into these studies was considerably lower, usually between 18 and 25 years.

The number of women screened ranged from 435 to 22,000 (median 2752) for studies of average risk women, and from 137 to 1601 for studies of higher risk women.

Definitions of abnormal results varied between studies even when the same test was used, and many studies did not report either their definition of an abnormal result or their full screening protocol. Details of those screening protocols which were reported are available elsewhere10.

Test sensitivity

In a prospective screening study, the number of cases ‘missed’ by screening can be estimated from the subsequent diagnosis of clinical ovarian cancer in women screened negative and followed up. The number of these ‘false negatives’ increases with the duration of follow up. It is not possible to determine whether these ‘missed’ cancers were present at the time of screening, or developed subsequently. This indirect estimate of sensitivity therefore indicates the proportion of cancers which would have been screen-detected, if the interval between screens was equivalent to the length of follow up. Only 6 of the 25 studies reported any details of methods used to record clinically diagnosed ovarian cancers occurring in women screened negative15,16,23,24,27,37. Three further studies reported information concerning such interval cancers, but gave no methodological details20,25,35.

No studies of ultrasound screening reported any cases of primary ovarian cancer diagnosed clinically within 12 months of a negative screen (Table 2). Two studies of ultrasound screening (with CDI) followed up women for more than one year, and both reported ovarian cancers diagnosed between 1 and 2 years after a negative screen23,24. One of these studies reported 60% sensitivity (95% CI 26%–88%) at four years24. This is the only study in higher risk women for which sensitivity can be estimated, and so it is not possible to assess whether test sensitivity varies with risk. The sensitivity of multimodal screening appears lower than that for ultrasound. One study of 22,000 women, the largest screening study reported to date, reported 79% (95% CI 49%-95%) sensitivity at one year and 58% (95% CI 34%-80%) at two years37.

Table 2.  Sensitivity of screening tests at 1 year follow up*. TAS = trans-abdominal sonography; TVS = trans-vaginal sonography; CDI = Colour Doppler imaging; FNA = fine needle aspiration cytology; NS = not stated.

| Study | Ref. | No. of women screened | Test | Cancers detected at screening (n) | Method of follow up | Cancers arising in women screened negative | Sensitivity after 1 year follow up, % (95% CI) |
|---|---|---|---|---|---|---|---|
| Campbell 1989 | 16 | 5479 | TAS | 5 (after 3 screening rounds) | 89% women contacted at 1 year | None | 100 (48–100) |
| Van Nagell 1995 | 20 | 8500 | TVS | 8 | NS | 1 at 1 year (discovered at surgery) | 89 (52–100) |
| Vuento 1995 | 23 | 1364 | TVS with CDI | 1 | Cancer registry | None at 1 year; 1 at 3.5 years | 100 (3–100) |
| Bourne 1993 | 24 | 1601 | TVS or TVS then CDI | 6 | 100% women contacted between 6 and 16 months following screening | None at 1 year; 4 at 4 years | 100 (54–100) |
| Parkes 1994 | 25 | 2953 | TVS then CDI | 1 | NS | None at 1 year; 1 at 19 months | 100 (3–100) |
| Schincaglia 1994 | 27 | 3541 | TAS then FNA | 2 | Cancer registry and annual questionnaire: 100% complete | None at 1 year | 100 (16–100) |
| Jacobs 1988 | 15 | 1010 | CA125 then US | 1 | Postal questionnaire: 100% response | None at 1 year | 100 (3–100) |
| Jacobs 1993 | 37 | 22,000 | CA125 then US | 11 | Postal questionnaire: 99% response at 1 year, 57% at 2 years | 3 at 1 year; 8 at 2 years | 79 (49–95) |
| Adonakis 1996 | 35 | 2000 | CA125 then US | 2 | NS | None at 1 year | 100 (16–100) |

*All studies recruited mainly women who were at average risk except Bourne 1993, which recruited women with a positive family history.
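
The sensitivity estimates and intervals in Table 2 follow from two counts per study: cancers detected at screening and cancers arising in women screened negative. The sketch below shows one way to reproduce that arithmetic. It assumes that the "exact" intervals are Clopper-Pearson binomial intervals, which is an assumption on our part (the Methods cite reference 12 without naming the method), and the function names are illustrative rather than taken from the review.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_increasing(g, iters: int = 100) -> float:
    """Bisection on [0, 1] for the root of a function that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson interval for x events out of n (assumed method)."""
    # Lower limit solves P(X >= x | p) = alpha/2; it is 0 when x == 0.
    lower = 0.0 if x == 0 else _root_of_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha/2; it is 1 when x == n.
    upper = 1.0 if x == n else _root_of_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


def sensitivity(screen_detected: int, interval_cancers: int):
    """Sensitivity = screen-detected / (screen-detected + interval cancers)."""
    total = screen_detected + interval_cancers
    return screen_detected / total, exact_ci(screen_detected, total)


# Jacobs 1993 at one year: 11 screen-detected, 3 interval cancers (Table 2).
est, (lo, hi) = sensitivity(11, 3)
print(f"{est:.0%} ({lo:.0%} to {hi:.0%})")
```

With so few cancers per study the intervals are very wide, which is why a reported sensitivity of 100% based on one or two cancers carries little information.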

Stage at diagnosis

If screening is to have the potential to improve outcome, a necessary (but not sufficient) condition is that it should result in an increase in the proportion of cancers diagnosed at an early stage. Tables 3 a & b show the reported proportion of screen-detected primary ovarian cancers diagnosed at Stage I in the first (prevalence) screening round. Studies where it was unclear that only prevalence screening was reported18,20,21, or where stage information was not reported26, are excluded.

Table 3 a.  Stage at diagnosis and prevalence of screen-detected primary ovarian cancer in prospective screening studies in the general population.

| Study | Ref. | No. screened | Screen detected primary ovarian cancers [of which borderline tumours] (n) | Prevalence of screen-detected cancer per 100,000 (95% CI) | Proportion diagnosed at Stage I (% and 95% CI) |
|---|---|---|---|---|---|
| Ultrasound screening* | | | | | |
| Goswamy 1983 | 13 | 1,084 | 1 [0] | 92 | 1/1 |
| Millo 1989 | 17 | 500 | 0 | 0 | |
| Campbell 1989 (first screen) | 16 | 5,479 | 2 [1] | 36 | 2/2 |
| Tabor 1994 | 19 | 435 | 0 | 0 | |
| Vuento 1995 | 23 | 1,364 | 1 [1] | 73 | 1/1 |
| Parkes 1994 | 25 | 2,953 | 1 [0] | 34 | 1/1 |
| Schincaglia 1994 | 27 | 3,541 | 2 [0] | 56 | 0/2 |
| Holbert 1994 | 28 | 478 | 1 [0] | 210 | 1/1 |
| All ultrasound studies | | 15,834 | 8 [2] | 51 (16–90) | 6/8 (75%; 95% CI 35–97) |
| All ultrasound studies excluding borderline tumours | | 15,834 | 6 | 38 (8–68) | 4/6 (67%; 95% CI 22–96) |
| Multimodal screening | | | | | |
| Jacobs 1988 | 15 | 1,010 | 1 [0] | 99 | 1/1 |
| Jacobs 1993 | 37 | 22,000 | 11 [0] | 50 | 4/11 |
| Grover 1995 | 36 | 2,550 | 0 [0] | 0 | |
| Adonakis 1996 | 35 | 2,000 | 2 [1] | 100 | 2/2 |
| All multimodal studies | | 27,560 | 14 [1] | 51 (24–78) | 7/14 (50%; 95% CI 23–77) |
| All multimodal studies excluding borderline tumours | | 27,560 | 13 | 47 (22–73) | 6/13 (46%; 95% CI 19–75) |

*Excludes studies where more than one screening round may have been reported together.

Table 3 b.  Stage at diagnosis and prevalence of screen-detected primary ovarian cancer in prospective screening studies in higher risk women.

| Study | Ref. | No. screened | Screen detected primary ovarian cancers [of which borderline tumours] (n) | Prevalence of screen-detected cancer per 100,000 (95% CI) | Proportion diagnosed at Stage I (% and 95% CI) |
|---|---|---|---|---|---|
| Ultrasound screening | | | | | |
| Andolf 1986 (outpatient attenders) | 14 | 805 | 3 [2] | 372 | 2/3 |
| Bourne 1993 (positive family history) | 24 | 1,601 | 6 [3] | 375 | 5/6 |
| Weiner 1993 (with breast cancer) | 22 | 600 | 3 | 500 | 1/3 |
| Ultrasound with CA125 | | | | | |
| Karlan 1993 (family history) | 30 | 597 | 1 [1] | 168 | 1/1 |
| Muto 1993 (family history) | 31 | 384 | 0 | 0 | |
| Schwartz 1995 (family history) | 32 | 247 | 0 | 0 | |
| Belinson 1995 (family history) | 33 | 137 | 1 | 730 | 0/1 |
| Dorum 1996 (family history) | 34 | 180 | 7 [3] | 3,889 | 3/7 |
| All higher risk ultrasound studies | | 4,551 | 21 [9] | 461 | 12/21 (57%; 95% CI 34–78) |
| All studies on women with a family history | | 3,146 | 15 [7] | 477 | 9/15 (60%; 95% CI 32–84) |
| All studies on women with a family history excluding borderline tumours | | 3,146 | 8 | 254 | 2/8 (25%; 95% CI 3–65) |

Average estimates based on the reported data were calculated separately for ultrasound and multimodal screening in average risk populations, and for all screening studies of higher risk women. For women at average risk, 75% (95% CI 35%–97%) of primary ovarian cancers were at Stage I when detected by ultrasound screening, and 50% (95% CI 23%–77%) when detected by multimodal screening. For either method, this appears to be a higher proportion than is found at routine presentation in the largely unscreened UK population, where around 22%–28% of all primary ovarian cancers are diagnosed at Stage I (personal communication, Information Department, Thames Cancer Registry, D. Hole, Scottish Cancer Therapy Network). In studies of women at higher risk, 60% (95% CI 32%–84%) of screen-detected tumours were at Stage I when borderline tumours are included, but only 25% (95% CI 3%–65%) if they are excluded.
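
The pooled figures for average risk women can be checked directly from the per-study counts in Table 3a. The following is a minimal sketch that assumes simple pooling, summing cancers and women screened across studies; the review does not spell out its pooling formula, so this is an assumption, and the variable and function names are illustrative.

```python
# (study, women screened, screen-detected cancers, Stage I cancers), from Table 3a
ULTRASOUND = [
    ("Goswamy 1983", 1084, 1, 1),
    ("Millo 1989", 500, 0, 0),
    ("Campbell 1989", 5479, 2, 2),
    ("Tabor 1994", 435, 0, 0),
    ("Vuento 1995", 1364, 1, 1),
    ("Parkes 1994", 2953, 1, 1),
    ("Schincaglia 1994", 3541, 2, 0),
    ("Holbert 1994", 478, 1, 1),
]
MULTIMODAL = [
    ("Jacobs 1988", 1010, 1, 1),
    ("Jacobs 1993", 22000, 11, 4),
    ("Grover 1995", 2550, 0, 0),
    ("Adonakis 1996", 2000, 2, 2),
]


def pool(rows):
    """Return (women screened, prevalence per 100,000, Stage I fraction)."""
    screened = sum(r[1] for r in rows)
    cancers = sum(r[2] for r in rows)
    stage_one = sum(r[3] for r in rows)
    return screened, 100_000 * cancers / screened, stage_one / cancers


for label, rows in (("ultrasound", ULTRASOUND), ("multimodal", MULTIMODAL)):
    screened, prevalence, stage_one = pool(rows)
    print(f"{label}: n={screened}, {prevalence:.0f} per 100,000, {stage_one:.0%} Stage I")
```

Simple pooling gives large studies more weight, so the multimodal figures are dominated by the single study of 22,000 women.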

The prevalence of screen-detected cancer can be used to estimate the lead time; that is, the average time by which screening brings forward diagnosis. The prevalence of screen-detected primary ovarian cancers in studies of average risk women is around 50 per 100,000 (95% CI 30–72 per 100,000) (Table 3a). Assuming an average annual incidence of 40 per 100,000 in unscreened women of a similar age41, then at prevalence screening, the number of cancers detected is equivalent to that which would normally be diagnosed over a period of around 15 months (95% CI 8–22 months). On average, then, screening brings forward diagnosis by half this length of time42, or around eight months.
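
The lead time argument is a two-step calculation: convert prevalence into the equivalent period of clinical incidence, then halve it. A minimal sketch, using the rounded figures quoted in the text (the function name is illustrative):

```python
def lead_time_months(prevalence_per_100k: float, annual_incidence_per_100k: float):
    """Return (equivalent months of incidence, average lead time in months)."""
    # Months of routine clinical diagnosis that would yield the same number of cancers.
    equivalent_months = 12 * prevalence_per_100k / annual_incidence_per_100k
    # On average a screen-detected cancer is found halfway through that window.
    return equivalent_months, equivalent_months / 2


window, lead = lead_time_months(prevalence_per_100k=50, annual_incidence_per_100k=40)
print(f"equivalent to {window:.0f} months of incidence; lead time about {lead:.1f} months")
```

The halving step assumes that the cancers found at the prevalence screen are spread evenly across the detectable period, which is the simplification the text relies on.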

False positive rates

The false positive rate is conventionally calculated as the proportion of all women without the disease who have a positive test result [false positives/(false positives + true negatives)]. For most studies, however, lack of follow up means that the number of true negatives is unknown. Ovarian cancer is relatively rare, and so the number of women without disease is very similar to the number screened. The false positive rate can, therefore, be approximated by the proportion of all women screened who were false positives, which can be estimated from the studies (Table 4).
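
To illustrate how close the approximation is when the disease is rare, the sketch below compares the conventional false positive rate with the proportion of all screened women who were false positives. The counts are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies.

```python
def false_positive_rates(screened: int, false_positives: int, cancers: int):
    """Return (conventional rate, approximation) as fractions."""
    # Women without disease = false positives + true negatives.
    without_disease = screened - cancers
    conventional = false_positives / without_disease
    approximation = false_positives / screened
    return conventional, approximation


# Hypothetical study: 10,000 women screened, 5 cancers, 130 false positives.
conventional, approximation = false_positive_rates(10_000, 130, 5)
print(f"conventional {conventional:.4%}, approximation {approximation:.4%}")
```

The two values differ only by the factor screened / (screened − cancers), which is very close to 1 at the prevalences reported in Table 3.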

Table 4.  False positive rates reported in prospective screening studies.

| Study | Ref. | Age and menopausal status of screened women (general population studies only) | False positive rate: % of all screened women (95% CI) |
|---|---|---|---|
| Grey-scale ultrasound screening | | | |
| General population | | | |
| Goswamy 1983 | 13 | 39–78 postmenopausal | 1.3 (0.7–2.1)† |
| Campbell 1989 | 16 | 45–78 | Screen 1: 3.5 (3.0–4.0) |
| | | | Screen 2: 1.8 (1.4–2.2) |
| | | | Screen 3: 1.2 (0.8–1.6) |
| Millo 1989 | 17 | 45+ or postmenopausal | 1.2 (0.5–2.6)* |
| Demidov 1990 | 18 | 18+ | 2.1 (1.8–2.4)* |
| Tabor 1994 | 19 | 46–65 | 2.1 (0.9–3.9) |
| Van Nagell 1995 | 20 | Mixed group: 50+ and postmenopausal, or 25+ with family history | 1.3 (1.1–1.6) |
| Higher risk | | | |
| Andolf 1986 | 14 | | 4.5 (3.2–6.1)*† |
| Bourne 1993 | 24 | | 4.9 (3.6–6.4) |
| Ultrasound with CDI | | | |
| General population | | | |
| Kurjak 1994 | 21 | 40–71 (mean 45) | 0.7 (0.4–0.9)*† |
| Vuento 1995 | 23 | 56–61 (mean 59) | 0.3 (0.1–0.8)*† |
| Higher risk | | | |
| Weiner 1993 | 22 | | 2.5 (1.4–4.1)† |
| Ultrasound followed by CDI | | | |
| General population | | | |
| Parkes 1994 | 25 | 50–64 | 0.5 (0.3–0.8) |
| Higher risk | | | |
| Bourne 1993 | 24 | | 1.0 (0.4–2.2) |
| Ultrasound followed by other secondary tests | | | |
| General population | | | |
| Sato 1992 | 26 | 30+ | 0.3 (0.2–0.4)* |
| Schincaglia 1994 | 27 | 50–69 | 0.5 (0.3–0.8) |
| Holbert 1994 | 28 | 30–89 postmenopausal | 1.9 (0.9–3.6) |
| Ultrasound with CA125 | | | |
| Higher risk | | | |
| Akulenko 1992 | 29 | | 1.3 (0.7–2.2)* |
| Karlan 1993 | 30 | | 1.5 (0.7–2.8)*† |
| Muto 1993 | 31 | | 3.9 (2.2–6.4) |
| Schwartz 1995 | 32 | | 0.4 (0.0–2.2)*† |
| Belinson 1995 | 33 | | 0.7 (0.2–4.0)*† |
| Dorum 1996 | 34 | | 3.9 (1.6–7.8) |
| Multimodal screening | | | |
| General population | | | |
| Jacobs 1988 | 15 | 45+ (mean 54) postmenopausal | 0.2 (0.02–0.7) |
| Jacobs 1993 | 37 | 45+ (median 56) postmenopausal | 0.1 (0.09–0.2) |
| Grover 1995 | 36 | 40+ (median 51) | 0.3 (0.1–0.6) |
| Adonakis 1996 | 35 | 45+ (mean 58) | 0.6 (0.3–1.0) |

*Criteria for positive screening result not fully reported.

† Incomplete follow up: significant numbers of women awaiting further assessment, or significant numbers of screen positive women did not undergo diagnostic intervention.

Studies in women at average risk which used grey-scale ultrasound alone reported higher false positive rates (range 1.2%-2.5%) than either those using ultrasound with CDI (range 0.3%–0.7%) or those using CA125 followed by ultrasound (range 0.1%-0.6%). Studies on higher risk populations tended to have higher false positive rates compared with studies using the same screening method on populations at average risk (Table 4).

The false positive rate is particularly important in ovarian cancer screening because definitive diagnosis can only be made at laparotomy or laparoscopy. The false positive rates calculated therefore give the proportion of screened women who underwent diagnostic surgery, but did not have primary ovarian cancer. In studies which reported details of diagnostic procedures, the majority of these women underwent laparotomy and oophorectomy10.

Applied to a population with an annual incidence of ovarian cancer of 40 per 100,000, such false positive rates imply that between 30 and 60 diagnostic surgical procedures would be carried out for every cancer detected at annual grey-scale ultrasound screening (if no cancers are missed), and between 7.5 and 17.5 procedures if CDI is used. For multimodal screening, between 2.5 and 15 procedures would be required for every cancer detected, again assuming 100% sensitivity.
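
These ratios come from dividing the false positive rate by the annual incidence. A minimal sketch of that arithmetic, assuming 100% sensitivity and one screen per year as the text does (the names are illustrative, and the text rounds some of the results):

```python
def procedures_per_cancer(false_positive_pct: float,
                          incidence_per_100k: float = 40.0) -> float:
    """False positive surgical procedures per cancer detected in one annual round."""
    false_positives_per_100k = false_positive_pct / 100 * 100_000
    # With 100% sensitivity, cancers detected per 100,000 equal the annual incidence.
    return false_positives_per_100k / incidence_per_100k


# False positive ranges (%) quoted for average risk women.
RANGES = {
    "grey-scale ultrasound": (1.2, 2.5),
    "ultrasound with CDI": (0.3, 0.7),
    "multimodal": (0.1, 0.6),
}
for method, (low, high) in RANGES.items():
    print(f"{method}: {procedures_per_cancer(low):.1f} to "
          f"{procedures_per_cancer(high):.1f} procedures per cancer detected")
```

If sensitivity is below 100%, fewer cancers are detected per round and the number of procedures per cancer detected rises accordingly.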

Screening can cause harm not only because of diagnostic surgery in false positives, but also because of the distress and anxiety caused in the many women who initially test positive43. Studies of grey scale ultrasonography as the initial screening test reported recall rates between 5% and 12%14,16,17,19,26–28, those using ultrasound with CDI as the initial test had recall rates between 8.5% and 17%21–23, and studies of multimodal screening reported recall rates between 0.9% and 4%15,35–37. The proportion of women initially recalled who are subsequently found to have ovarian cancer is very low.

Comparisons of screening tests

Comparisons of estimates of the accuracy of performance of screening tests obtained from different studies may be misleading because of the different study populations, and because of differences in how a positive result is defined. More reliable comparisons are obtained when different tests are carried out on the same cohort of women, with investigators blinded to the results of the different tests11.

Two of the studies identified have compared trans-vaginal sonography (TVS) with multimodal screening. One study of 1502 higher risk women found that the use of CA125 with a cutoff of 35 U/mL as an initial screening test reduced the false positive rate from 3.8% to 0.5%, but missed three of the six ovarian cancers38. However, the second, in 1291 women at average risk, found that the use of CA125 would have missed the single cancer and increased the false positive rate from 0.3% to 0.9%39. This study reported one of the lowest false positive rates for ultrasound screening. Two prospective studies which screened 1,000 and 2,953 women found that adding CDI to ultrasound screening resulted in lower false positive rates24,25.

DISCUSSION

Test performance

The findings suggest that these tests can detect ovarian cancer in asymptomatic women at average risk, perhaps at an earlier stage than would occur without screening. Estimates of the average proportion of Stage I ovarian cancers in screened women should be interpreted cautiously, because of the different screening methods and cutoff points, the wide confidence intervals and the exclusion of clinically diagnosed cancers. However, because the prevalence screen is likely to detect a higher proportion of advanced cancers than subsequent screening rounds, the apparent stage shift may be an underestimate of what would occur in routine population screening. RCTs with mortality as an endpoint are necessary to determine whether treating cancers detected earlier by screening does in fact improve overall outcome.

The estimated lead time for screening appears to be short, around eight months. Whether this is sufficient to result in a clinically important improvement in survival cannot be known until RCTs are completed, but it does imply that screening would need to be relatively frequent. Following ultrasound screening, interval cancers begin to appear by 18–24 months following a negative screen. The sensitivity of multimodal screening appears lower, with interval cancers appearing in the first year following screening. A longer screening interval may therefore be possible with ultrasound screening, for the same detection rate. However, only one study has reported the results of repeated screenings in the same women, and there is therefore very little information concerning the likely impact of different screening intervals.

False positive rates appear highest for grey-scale ultrasound and lowest for multimodal screening. However, due to the low prevalence of ovarian cancer in the general population, the positive predictive value of screening is low, resulting in many women undergoing surgery for each ovarian cancer diagnosed.

Evidence from the two direct comparisons confirms the findings from comparisons between studies that the use of CA125 as an initial test (multimodal screening) reduces the sensitivity of ultrasound screening. The false positive rates reported for multi-modal screening, however, were among the lowest. Recently, an algorithm incorporating rate of change of CA125 and epidemiological variables has been used to determine which women to recall for ultrasound scanning. This may result in increased sensitivity with little reduction in specificity compared with recall for ultrasound based on a single CA125 measurement44, but the full results from a prospective study using such an algorithm have not yet been published.

The studies indicated that CDI may increase the specificity of ultrasound screening. However, this has not been confirmed by one of the RCTs, which has now dropped the use of CDI as part of the screening protocol after finding that its use did not reduce the false positive rate8. The impact of CDI on the performance of ultrasound screening therefore requires further evaluation.

There is insufficient evidence to determine whether overall test performance is different in higher risk women, although false positive rates appear to be slightly higher. This may be due to lower thresholds for the definition of an abnormal result. It is unclear whether the higher proportion of borderline tumours reported in women with a family history is a genuine finding, as these tumours appear to have been inconsistently classified and reported across different studies. When borderline tumours are excluded, a lower proportion of screen-detected cancers in higher risk women were at an early stage compared with average risk women (although the numbers are small). This may reflect a faster progression of cancer and thus a shorter lead time in women at higher risk, reducing the potential benefit of screening.

Methodological quality of studies

The studies identified were generally small relative to the low prevalence of this cancer, detecting only a few ovarian cancers each and using a wide variety of screening methods. Sensitivity estimates are therefore imprecise. Most studies did not follow up women with a negative result, making estimates of the sensitivity of the test impossible. There was generally little evidence of a systematic approach to the identification of optimal screening strategies, for example by directly comparing screening methods or by investigating the effects of repeated screenings. The definition of abnormal results was sometimes unclear, and some studies did not report recruitment methods or the number of women lost to follow up. Even some of the well-conducted studies failed to report fully either the procedures undergone by women who had been screened positive, or the total number of tests performed on each woman.

Implications for population screening

Estimates of test performance can only indicate whether early ovarian cancer can be detected in asymptomatic women, and how many women will have a false positive result. Judging whether investment in a screening programme is worthwhile also requires assessment of any effect on ovarian cancer mortality, and of the adverse effects experienced by otherwise healthy women, including morbidity arising from unnecessary investigations and adverse psychological effects. The resources necessary to establish and maintain a quality-assured screening and treatment programme must also be considered. Arbitrary criteria for defining the adequacy of a screening test, such as the achievement of a positive predictive value of 10%45, should be avoided. Judging whether or not a test is ‘good enough’ depends on an overall assessment of the benefits, harms and costs arising from its use in a screening programme.

Applying currently observed stage-specific five year survival rates to screen-detected disease, an increase in cancers detected at Stage I from 25% to 50%–75%, as reported in the studies reviewed, might be expected to result in around a 20%–40% reduction in ovarian cancer mortality at five years (a relative reduction similar to that found with breast cancer screening). Ovarian cancer is, however, a relatively uncommon disease. Among women aged 50–69 years the incidence is around 44 per 100,00041, and mortality about 36 per 100,00046, which is about one third that of breast cancer. This means that population screening for ovarian cancer would result in many fewer lives saved than breast screening, even if it was shown to be as effective in reducing mortality.

Even if screening detected all cases of ovarian cancer and these could then be treated with 100% success, the absolute reduction in mortality would only be 1 in about 2500 screened women per year. This is much smaller than the likelihood of experiencing adverse effects from screening, such as unnecessary diagnostic surgery (between 0.1% and 2.5% of screened women, or between 3 and 63 women for every death avoided), or recall for further tests (between 1% and 18% of screened women). It is difficult to quantify the risk of morbidity in women falsely screened positive, because studies have generally not reported this information. Only one has reported any complication of surgery (a woman who suffered a bowel perforation)31. An indication of likely complication rates can be obtained from published case series of similar surgery; these suggest that perhaps 0.5%–1% of women undergoing laparoscopic or open oophorectomy may suffer a significant complication such as excessive bleeding, infection or bowel or bladder damage10. There is also a small but unquantified risk of mortality.
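
The comparison between benefit and harm here rests on expressing both per woman screened per year. The sketch below reproduces it under the same best-case assumptions as the text (every cancer detected and cured, annual incidence of 40 per 100,000); the function names are illustrative.

```python
def women_screened_per_death_avoided(incidence_per_100k: float = 40.0) -> float:
    """Best case: every incident cancer is detected and cured."""
    return 100_000 / incidence_per_100k


def events_per_death_avoided(event_pct: float,
                             incidence_per_100k: float = 40.0) -> float:
    """Adverse events (for example surgery or recall) per death avoided."""
    return event_pct / 100 * women_screened_per_death_avoided(incidence_per_100k)


print(f"women screened per death avoided: {women_screened_per_death_avoided():.0f}")
for pct in (0.1, 2.5):  # range of unnecessary diagnostic surgery quoted in the text
    print(f"surgery in {pct}% of women: "
          f"{events_per_death_avoided(pct):.1f} operations per death avoided")
```

Because real screening would neither detect nor cure every cancer, the true number of adverse events per death avoided would be higher than this best-case figure.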

There is little information concerning the likely cost of population screening for ovarian cancer. However, costs increase with increasing frequency of screening, and it appears that for ovarian cancer the interval between screenings may need to be as short as one year. A modelling study has suggested that annual multimodal screening may prove more cost-effective than annual ultrasound screening47, although if ultrasound screening can be undertaken less frequently this might increase its relative cost-effectiveness.

Implications for policy and practice

In the absence of results from RCTs, there is currently no reliable evidence that screening for ovarian cancer is effective in improving the length and quality of life of women with the disease. Screening should therefore not be undertaken outside of properly designed and conducted clinical trials. If screening is shown to be effective in reducing mortality when the trials report in 5–7 years' time, it will then be appropriate to judge whether investment in a national screening programme is worthwhile, taking into account not only the potential benefits but also the risks to healthy women and the costs of establishing and maintaining such a programme. Such a decision should be made as a matter of national policy, because the ad hoc and uncoordinated screening of individual women is likely to result in sub-optimal outcomes. The Chief Medical Officer has suggested a framework on which to base such policy decisions48.

The discussion above, however, suggests that the low prevalence of ovarian cancer, and the apparent need for frequent screening, mean that it is unlikely that general population screening would be considered cost-effective. The potential balance of benefits, harms and costs may be more favourable in groups at increased risk of developing ovarian cancer. In such groups, fewer women need to be screened for each case detected, and the ratio of false positives to true positives is lower. However, women with one close affected relative are at only moderately increased risk of the disease, perhaps 2–3 times the average risk49, and this may be unlikely to alter the balance in favour of screening for these women. Until adequate research in this subgroup has been carried out it should not be assumed that screening will be more beneficial or cost-effective than in the general population. A few women, perhaps 50,000 throughout the UK50, have a more extensive family history, with two or more affected close relatives, and these women appear to have a relative risk of around 10, or on average around a 15% lifetime risk of developing ovarian cancer. Screening is increasingly being offered to these women, in the absence of evidence of effectiveness or information on the risks of screening, reducing the possibility of a national policy based on research evidence.
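How the ratio of false positives to true positives changes with baseline risk can be illustrated with a short calculation. This is a sketch under strong assumptions that the review does not establish: that the false positive rates reported for average-risk women (0.1% to 2.5%) apply unchanged in higher-risk groups, that annual incidence scales directly with relative risk from a baseline of 40 per 100,000, and that no cancers are missed. The function name and parameters are hypothetical.

```python
# False positive operations per cancer detected at different relative risks.

BASELINE_INCIDENCE = 40 / 100_000  # assumption: annual incidence, average risk

def false_positives_per_cancer(false_positive_rate, relative_risk=1.0, sensitivity=1.0):
    """Women with a false positive result for each cancer detected."""
    cancers_detected = BASELINE_INCIDENCE * relative_risk * sensitivity
    return false_positive_rate / cancers_detected

for relative_risk in (1, 3, 10):
    low = false_positives_per_cancer(0.001, relative_risk)
    high = false_positives_per_cancer(0.025, relative_risk)
    print(f"Relative risk {relative_risk}: {low:.2f} to {high:.2f} false positives per cancer")
```

At average risk the range is 2.5 to 62.5 false positives per cancer detected; at a relative risk of 10 it falls to 0.25 to 6.25. The ratio improves in proportion to relative risk only if test performance is the same in higher-risk women, which is one of the points that research in these groups would need to establish.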

Implications for research

The current trials, if successfully completed, should establish whether or not screening is effective in reducing mortality from ovarian cancer. However, such trials should also provide sufficient information on which to base policy decisions. This requires assessment of the cost-effectiveness, and not simply the effectiveness, of screening, and also the full documentation and reporting of any adverse physical and psychological effects of screening. An ‘evidence-based’ epidemiological screening model is required, which would allow estimation of the cost-effectiveness of screening and of how this varies with changes in key parameters such as baseline risk, lead time, characteristics of the screening protocols, costs and harms. This would allow the assessment of cost-effectiveness under reasonable assumptions (and therefore assessment of whether RCTs using current technologies are likely to be worthwhile), as well as the likely impact of various improvements in screening and treatment. This model would thus allow a rational prioritisation of future research.

Improvements in screening methods to reduce the number of women undergoing invasive diagnostic tests, and to reduce the costs of screening, may improve the potential value of screening. Future evaluations of new screening methods should incorporate direct comparisons with the screening strategies currently being evaluated in RCTs, and should also consider their adverse effects and costs.

The potential harms of ovarian cancer screening have been emphasised. However, little is known about the natural history of asymptomatic benign tumours detected by screening, and the effect of their removal on the subsequent incidence of ovarian cancer is unknown. Research into the optimal management of screen-detected benign lesions could be incorporated into screening trials, by including randomised comparisons of active and conservative management of abnormalities where there is uncertainty as to the value of operative intervention. This might enable a reduction in the false positive rate for ultrasound, if noninvasive methods can be developed to distinguish between benign and malignant lesions.

Assessment of the effectiveness and cost-effectiveness of screening in higher risk groups is also required, but currently no RCTs are planned. Although the results from the existing RCTs in the general population could be extrapolated to a higher risk population, this would only apply to the screening methods used in the RCTs, and would only be valid if the natural history of ovarian cancer (i.e. the rate of progression) were the same as in the general population.

CONCLUSIONS

Rare diseases such as ovarian cancer pose special problems for advocates of screening. A large healthy population must be tested in order to detect a small number of people with the disease; any harms of screening are therefore experienced by many more people than any potential benefits, and the costs of detecting each case may be high.

It will be several years before we know from RCTs whether screening for ovarian cancer is effective. Even then, it will still remain to be decided whether it is worthwhile. Until this evidence becomes available no NHS screening programme should be considered for any risk group. The value of screening women at increased risk should be considered on the same basis as general population screening, with a full assessment of the effectiveness and cost-effectiveness of this approach, and the formulation of a national policy based on sound research evidence.

Acknowledgements

The authors would like to thank Ms O. Jones, Dr S. Luengo and Ms P. Press for help in the production of the review, and Professor L. Irwig for methodological advice.
