We compared test sensitivity (in terms of prevented cancers) and overdiagnosis (in terms of non-progressive pre-invasive lesions) between the human papillomavirus test (HPV test, Hybrid Capture 2) and the traditional Pap test in routine screening for cervical cancer. The design was a randomised (1:1) health services study in Finland with intake between 2003 and 2007. We estimated sensitivity by the incidence method within one screening round. Overdiagnosis was based on the rate of cervical intraepithelial Grade 3 (CIN3) lesions diagnosed at screen and during the following interval. Out of 203,788 randomised women 132,298 attended (65% in both study arms) and 600,753 person-years accumulated among attenders up to the end of 2010. In all attenders, 34 invasive cervical cancers and 288 CIN3 lesions were diagnosed at screen or during the following interval. The interval cancer incidence was 2.5 per 100,000 person-years (sensitivity 0.87) in the HPV arm and 1.4 per 100,000 person-years (sensitivity 0.93) in the Pap test arm. The rate of CIN3 lesions was 57.1 and 38.8 per 100,000 person-years, respectively. In conclusion, the sensitivity of HPV testing was similar to that of Pap testing, but HPV testing caused more overdiagnosis. Therefore, implementation of HPV testing needs to be reconsidered especially in countries with well organised programmes.
Recommendations concerning new screening tests for cervical cancer are mostly based on cross-sectional studies and on detection rates of pre-invasive lesions.1–5 The few follow-up studies have focused on pre-invasive lesions and on the length of reassurance that a negative test provides.6–10 Only a few studies have emphasised public health aspects.11,12 The purpose of screening is to prevent invasive cervical cancer, and the demonstration of this effect requires follow-up for incident invasive cancers after the screen. In fact, detection (at screen) of pre-invasive lesions results in both benefit and harm: benefit through preventing some of the lesions from progressing to invasive cancer, and harm by also detecting pre-invasive lesions that would never have progressed to invasive cancer. These two types of pre-invasive lesions cannot be separated and thus, both types have to be treated if diagnosed.
Sensitivity measures a screening test's ability to correctly identify unrecognised disease. It can be estimated in several ways. The most common way is called the detection method,4 which is based on screen detected lesions, both pre-invasive and invasive ones. However, this sensitivity estimate is usually biased upward (closer to one) because the detected pre-invasive lesions include overdiagnosed, non-progressive ones. Sensitivity can also be estimated by the incidence method,13 which is based on failures of screening.
The purpose of the present study was to compare the human papillomavirus (HPV) test to the traditional Pap test in cervical cancer screening in terms of sensitivity (detection of progressive lesions) and overdiagnosis (detection of non-progressive lesions).
Material and methods
In Finland, women between 30 and 60 years of age (sometimes also 25 or 65 or both) are invited for cervical cancer screening decided by the municipalities every five years. Screening with the HPV test (Hybrid Capture 2, Digene Corporation, Gaithersburg, MD) was started as a randomised public health policy within the national organised cervical cancer screening programme in 2003.14 Women of selected municipalities within two screening laboratories were individually randomised in a one-to-one ratio into screening with the HPV test or with the traditional Pap smear (Pap test). By the end of 2007, altogether 203,788 women were invited in municipalities taking part in the HPV study. Of these, 125 women were excluded from the current study (61 in the HPV arm and 64 in the Pap test arm) due to attending a second invitation outside the study area (non-attendance in the study area). The number of randomised women was around 14,000 in 2003, thereafter about 47,000 women entered the study each calendar year from 2004 to 2007. The final number of women in this study was 101,797 in the HPV arm and 101,866 in the traditional arm with Pap test (control arm). In these 203,663 women, 352 had double invitations due to migration, with a potential of randomisation both into the HPV and the Pap test arm. In case the woman did not attend at all or attended twice, the invitation leading to attendance or the latter of the two subsequent invitations was chosen.
The Mass Screening Registry of the Finnish Cancer Registry routinely receives reports on all screened women in the country. While information on screening is fairly complete, the data on interval lesions and cancers as well as the histology of test positives may remain incomplete. Therefore, the screening database and the cancer registry database are routinely linked. The linkage was updated in May 2011 and follow-up ended on December 31, 2010 due to some delay in cancer reporting.
Attendance was similar in the two arms; 66,457 women (65.3%) attended screening in the HPV arm and 65,841 women (64.6%) in the Pap test arm resulting in 301,774 and 298,979 person years of follow-up by the end of 2010, respectively (Fig. 1). The results on sensitivity are based on invasive cervical cancer incidence between the screens (not including cases detected at the first or second screen) in women whose screening test was negative at entry (first) screen. The results on overdiagnosis are based on the detection of cervical intraepithelial lesions of high grade (CIN3) at the entry screen and during the first 5 year interval (but not including cases of CIN3 at the second screen) in all women who attended the first screen. The term CIN3 includes the diagnoses of carcinoma in situ, dysplasia gravis and high-grade intraepithelial cervical dysplasia (both squamous-cell and adenomatous lesions).
In the HPV arm, a normal (negative) test result was one with a relative light unit (RLU) ratio of less than 1.00. Women with a positive HPV test result were triaged with the traditional Pap test. In the traditional Pap test arm, Pap Class I was considered to be a normal (negative) result. Women with a Pap test result of Classes III–V were referred for colposcopy (in both arms). Those with Pap Class II, or with a positive HPV test but triage Pap Class I or II, were invited for an additional screen within 1–2 years. However, the follow-up did not end at this additional screen, but was continued up to the next five-yearly (second) screen. We also analysed the material using referral for colposcopy as the threshold for a positive test result in the Pap test arm. Because the results did not change materially, they are not presented.
We studied sensitivity of screening by estimating the incidence of prevented cervical cancers between two successive screens. The incidence of prevented cancers was estimated as the expected incidence if no screening existed (denoted P0 in the following) minus the interval cancer rate (denoted P1). Interval cancers were those invasive cervical cancers that were diagnosed in test negative women between two successive screens. We also studied overdiagnosis of non-progressive lesions for all attenders. This was estimated using the rate of cervical intraepithelial grade 3 (CIN3) lesions diagnosed at screen and during the following interval (denoted P3) minus the rate of prevented cancers within the same screening round (denoted P0 – P1). The interval CIN3 lesions were detected in those with either additional screening, with opportunistic screening, or with testing due to symptoms.
We define test sensitivity (S) as

S = 1 − P1/P0,

where P1 is the interval cancer incidence in those with a negative test at the entry screen (observed between two successive screening tests five years apart but neither screen included) and P0 is the expected incidence of cervical cancer if no screening had existed. This method of sensitivity estimation is consistent with the incidence method.4,13
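The incidence-method definition can be illustrated with a short calculation. The following is a minimal sketch, assuming the form S = 1 − P1/P0 given in the Results and using the interval cancer counts and person-years reported there; the function names are illustrative and not part of the study.

```python
# Minimal sketch of the incidence-method sensitivity S = 1 - P1/P0.
# Function names are illustrative, not taken from the study.

def rate_per_100k(events: int, person_years: float) -> float:
    """Incidence rate per 100,000 person-years."""
    return events / person_years * 100_000


def sensitivity(p1: float, p0: float) -> float:
    """Share of expected cancers that did not surface as interval cancers."""
    return 1.0 - p1 / p0


P0 = 20.0  # expected incidence without screening, per 100,000 (study assumption)

# Interval cancers and person-years in test negative women, as reported in the Results
p1_hpv = rate_per_100k(7, 278_094)
p1_pap = rate_per_100k(4, 278_506)

s_hpv = sensitivity(p1_hpv, P0)
s_pap = sensitivity(p1_pap, P0)

print(f"HPV arm: P1 = {p1_hpv:.1f}, S = {s_hpv:.2f}")
print(f"Pap arm: P1 = {p1_pap:.1f}, S = {s_pap:.2f}")
```

Rounded to the precision used in the text, this gives interval cancer rates of 2.5 and 1.4 per 100,000 person-years and sensitivities of 0.87 and 0.93.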
The person-years at risk for interval cancer incidence were calculated in test negative women from the time of the entry test in 2003–2007 to the time of the next (second) five-year test, to diagnosis of cancer, to emigration to a foreign country, to death, or to the end of 2010, whichever occurred first. The mean follow-up time was 4.5 years because the women randomised in 2006–2007 were not yet followed up to the next five-yearly screen.
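The censoring rule described above (follow-up ends at whichever event comes first) can be sketched as follows. This is an illustration only: the function name, the argument names and the example dates are hypothetical, and the use of 365.25 days per year is an assumption, not a detail taken from the study.

```python
from datetime import date
from typing import Optional

STUDY_END = date(2010, 12, 31)  # end of follow-up stated in the text


def follow_up_years(
    entry: date,
    next_screen: Optional[date] = None,
    cancer: Optional[date] = None,
    emigration: Optional[date] = None,
    death: Optional[date] = None,
    study_end: date = STUDY_END,
) -> float:
    """Person-years from the entry test to the earliest censoring event."""
    candidates = [d for d in (next_screen, cancer, emigration, death, study_end)
                  if d is not None]
    exit_date = min(candidates)
    # Guard against an exit date recorded before entry
    days = max((exit_date - entry).days, 0)
    return days / 365.25  # assumed year length


# Hypothetical women, for illustration only
full_interval = follow_up_years(date(2004, 3, 1), next_screen=date(2009, 3, 1))
censored_2010 = follow_up_years(date(2007, 6, 15))  # no event before study end
with_cancer = follow_up_years(date(2005, 1, 10), next_screen=date(2010, 1, 10),
                              cancer=date(2007, 1, 10))

print(round(full_interval, 2), round(censored_2010, 2), round(with_cancer, 2))
```

A woman who entered in 2006–2007 cannot reach her second screen before the end of 2010, so her follow-up is cut short by the study end date, which is why the mean follow-up falls below five years.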
The expected cancer incidence assuming no screening, P0, was estimated as the incidence in entire Finland in 1958–1962 and corrected for the hypothetical time trend in the absence of screening and for the cervical cancer risk in non-attenders. The same correction factors were applied in both study arms. Based on the age-period-cohort analyses from 1953 to 1992 and selective attendance, we assumed the rate of the expected incidence to be 20 per 100,000 person-years in the early 2000s in the hypothetical absence of screening.4,15,16 The risk of cervical adenocarcinoma was less than 3 per 100,000 person-years in the 1960s, and it has stayed quite stable over time.17,18 Therefore, the expected incidence without screening (P0) for squamous cell carcinoma was estimated at 17 per 100,000 person-years.
The observed CIN3 lesions included both progressive and non-progressive lesions. We estimated overdiagnosis (occurrence of non-progressive lesions) as the difference between the detection rate of CIN3 lesions and the rate of prevented cancers within one screening round. With the previous notation the estimate of overdiagnosis is

H = P3 − (P0 − P1),

where H is the risk of CIN3 cases that would not have progressed to invasive disease by the next screen, P3 is the period prevalence rate of CIN3 lesions at screen and during the subsequent interval in all attenders of the entry screen. The difference, P0 − P1, indicates the risk of prevented cancers in an interval and we call this benefit (B). The person-years at risk for overdiagnosis were calculated in all attenders from the time of the entry test in 2003–2007 to the time of the next (second) five-year test, to diagnosis of cancer or CIN3, to emigration to a foreign country, to death, or to the end of 2010, whichever occurred first. The confidence limits of S and H were estimated with the assumption that the numerators of P1 and P3 follow the Poisson probability law.
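The overdiagnosis estimate and the Poisson-based confidence limits can be sketched as below. The text states only that the numerators of P1 and P3 were assumed to follow the Poisson law; the use of exact (Garwood-type) limits found by bisection is an assumption of this sketch, and all function names are illustrative. The counts and person-years are those reported in the Results.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _solve_mu(k: int, target: float) -> float:
    """Find mu with poisson_cdf(k, mu) == target by bisection (CDF falls as mu grows)."""
    lo, hi = 0.0, k + 10 * math.sqrt(k + 1) + 20
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_limits(k: int, alpha: float = 0.05) -> tuple[float, float]:
    """Two-sided exact limits for a Poisson mean given an observed count k."""
    lower = 0.0 if k == 0 else _solve_mu(k - 1, 1 - alpha / 2)
    upper = _solve_mu(k, alpha / 2)
    return lower, upper


def sensitivity_limits(events: int, person_years: float, p0: float) -> tuple[float, float]:
    """Limits for S = 1 - P1/P0; a higher interval cancer rate means lower S."""
    lo, hi = exact_poisson_limits(events)
    to_rate = 100_000 / person_years
    return 1 - hi * to_rate / p0, 1 - lo * to_rate / p0


def overdiagnosis(p3: float, p0: float, p1: float) -> float:
    """H = P3 - (P0 - P1), all rates per 100,000 person-years."""
    return p3 - (p0 - p1)


P0 = 20.0  # study assumption
s_pap_low, s_pap_high = sensitivity_limits(4, 278_506, P0)
p3_hpv = 172 / 301_185 * 100_000
p1_hpv = 7 / 278_094 * 100_000
h_hpv = overdiagnosis(p3_hpv, P0, p1_hpv)

print(f"Pap arm S limits: {s_pap_low:.2f} to {s_pap_high:.2f}")
print(f"HPV arm overdiagnosis H: {h_hpv:.1f}")
```

Under these assumptions the lower limit of sensitivity in the Pap test arm comes out at about 0.82, which agrees with the value quoted in the Discussion, but the sketch should not be read as the authors' actual computation.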
In all, 13 cervical cancers were diagnosed at entry screen in 132,298 attenders. In 61,208 test negative women of the HPV arm, a total of 278,094 person-years accumulated (Table 1). In the Pap test arm, the corresponding number of test negative women was 61,327 and that of person-years 278,506. During follow-up, only seven and four interval cancers were diagnosed in test negative women in the HPV and Pap test arms, respectively (Fig. 1). Six of the seven cancers in the HPV arm were adenocarcinomas, as were two of the four in the Pap test arm (Table 1). In addition, ten test positive women (seven in the HPV arm and three in the Pap test arm) had no cancer detected at the diagnostic confirmation but cancer was detected later during follow-up.
Table 1. Numbers and person years (P-yrs) in test negative women and interval cancers and their rate (per 100,000 person-years) by study arm and morphology
The number of CIN3 lesions diagnosed at the entry screen and during follow-up was 172 in the HPV arm and 116 in the traditional arm (Fig. 1, Table 2). The person years in all screened women were 301,185 in the HPV arm and 298,632 in the Pap test arm.
Table 2. Numbers and person years (P-yrs) in attendees and CIN3 detection rate (per 100,000 person years) during the first screening round in 2003–2010 by arm
Interval cancer incidence (after a negative test) was 2.5 per 100,000 person-years in the HPV arm and 1.4 per 100,000 in the Pap test arm (Table 1). Accordingly, the incidence of prevented cancers (P0 − P1) was 17.5 per 100,000 person-years in the HPV arm and 18.6 in the Pap test arm (Table 3). Therefore, test sensitivity (1 − P1/P0) was 87% (1 − 2.5/20) in the HPV arm and 93% (1 − 1.4/20) in the Pap test arm. If only squamous cell cancers were considered, the sensitivity was 98% (1 – 0.4/17) in the HPV arm and 96% (1 − 0.7/17) in the Pap test arm.
Table 3. Benefit from reduction of invasive cancer incidence and sensitivity of the HPV and the Pap test
The rate of CIN3 (P3) was 57.1 per 100,000 person-years in the HPV arm and 38.8 in the Pap test arm (Table 2). Overdiagnosis (P3 − (P0 − P1)) was 39.6 in the HPV arm and 20.2 in the Pap test arm (Table 4).
Table 4. Overdiagnosis of non-progressive CIN3 compared to the benefit from reducing cancer within one screening round by study arm
The relative sensitivity of the HPV test compared to the Pap test was 0.87/0.93 = 0.94 (1.02 for squamous-cell cancer) and the relative overdiagnosis was 39.6/20.3 = 2.0. The overdiagnosis–benefit ratio (Table 4) of the HPV screen was 39.6/17.5 = 2.3 and that of the Pap test 1.1 (2.4 and 1.3, respectively, for squamous-cell cancer).
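The benefit, overdiagnosis and overdiagnosis–benefit ratio for each arm follow directly from the reported rates. The sketch below recomputes them from the rounded rates given in the text (all per 100,000 person-years); the dictionary layout and function names are illustrative, and small differences in the last decimal can arise from rounding the inputs.

```python
P0 = 20.0  # expected incidence without screening (study assumption)

# Rounded rates as quoted in the text: interval cancer (p1) and CIN3 (p3)
ARMS = {
    "HPV": {"p1": 2.5, "p3": 57.1},
    "Pap": {"p1": 1.4, "p3": 38.8},
}


def benefit(p0: float, p1: float) -> float:
    """Rate of prevented cancers, B = P0 - P1."""
    return p0 - p1


def overdiagnosis(p3: float, p0: float, p1: float) -> float:
    """Rate of non-progressive CIN3, H = P3 - (P0 - P1)."""
    return p3 - benefit(p0, p1)


summary = {}
for arm, rates in ARMS.items():
    b = benefit(P0, rates["p1"])
    h = overdiagnosis(rates["p3"], P0, rates["p1"])
    summary[arm] = {"benefit": b, "overdiagnosis": h, "ratio": h / b}

relative_overdiagnosis = (
    summary["HPV"]["overdiagnosis"] / summary["Pap"]["overdiagnosis"]
)

for arm, row in summary.items():
    print(f"{arm}: B = {row['benefit']:.1f}, H = {row['overdiagnosis']:.1f}, "
          f"H/B = {row['ratio']:.1f}")
print(f"Relative overdiagnosis (HPV vs. Pap): {relative_overdiagnosis:.1f}")
```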
We showed that the HPV test is as sensitive as the traditional Pap test (at the threshold of invasive cervical cancer) but causes more overdiagnosis of pre-invasive lesions (at the threshold of CIN3) and results in more colposcopic examinations. The number of interval cancers remained very small and, therefore, involved large random variation. The 11 interval cancers (three of squamous-cell type) among test negative women in 556,600 person-years showed that failure of the test was rare and the true sensitivity was very high with both tests. In the absence of screening, more than 100 invasive cancers would have been expected. Ultimately, our results demonstrate that a high detection rate of pre-invasive lesions indicates more harm in terms of overdiagnosis than benefit in preventing the lesions from progressing to invasive cancer. The harm was greater in the HPV arm than in the traditional Pap test arm. Overdiagnosis, and thus overtreatment, consists of unnecessary interventions with both human and financial implications for society. The human implications are both psychological (increased anxiety and fear) and clinical (e.g., cervical stenosis, reduced fertility, problems in pregnancy and delivery).4,19,20
The objective of our study was to compare two tests, the HPV test to the traditional Pap test. The means of comparison were interval cancers, failures of the test to identify unrecognised disease. Therefore, we did not consider as a failure those interval cancer cases that were diagnosed in test positive but confirmation negative women; that is, no cancer was found at colposcopy following the entry screen (episode negative). In this case, the test was correctly positive. Many of the follow-up studies include these cases and such studies are, in fact, not considering the sensitivity of the test but the sensitivity of the total screening episode. For comparison, we also estimated the sensitivity of the screening episode. We had only ten interval cancers in test positive but episode negative women, and thus, also the sensitivity of the screening episode was high in both arms.
Sensitivity of the test13 in screening for a pre-invasive lesion of cancer cannot be measured by detection rates at screen. This is true independently of the threshold chosen to identify the lesion. Invasive cancers detected at screen are not informative since they do not imply failure of the test (because detected at screen by the test), nor do they indicate benefit, because the objective is to prevent such lesions. The few follow-up studies have combined the screen detected cases and interval cancers.12 This fails to separate the non-informative cancers from the failures of the test. Some follow-up studies have reported preinvasive lesions as benefit of screening.9,10 Detection rates of any pre-invasive lesion represent a mixture of success in prevention of invasive cancers and harm of overdiagnosis. Therefore, valid observations as to sensitivity are based on interval cancers, i.e., on failure of the screen, and not on lesions identified at screen. All invasive cancers diagnosed both at screen and during the interval in the population indicate lack of effect and they affect both the efficacy and the effectiveness. Of these, however, only the interval cancers affect sensitivity. Because we specifically compared the sensitivity of two tests it is justified to include in analysis the interval cancers but not the screen detected cancers.
Also interval cancers as such are misleading when two screening tests are compared. The true sensitivity can only be estimated by the number of invasive cancers that were prevented. In our study, the relative risk (RR) of interval cancer in the HPV arm compared to the Pap test arm was 1.8 (2.5/1.4) and the respective relative sensitivity was 0.94 (0.87/0.93). That is, the traditional test was not 80% (RR 1.8) but only 6% (1–0.94) better than the HPV test.
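The contrast between the two relative measures can be made concrete with the reported rates. This is a minimal sketch using the rounded interval cancer rates and the assumed P0 of 20 per 100,000 person-years; the variable names are illustrative.

```python
P0 = 20.0                   # expected incidence without screening (study assumption)
p1_hpv, p1_pap = 2.5, 1.4   # interval cancer rates per 100,000 person-years

# Relative risk of interval cancer compares the failures of the two tests
rr_interval = p1_hpv / p1_pap

# Relative sensitivity compares the shares of cancers that were prevented
relative_sensitivity = (1 - p1_hpv / P0) / (1 - p1_pap / P0)
pap_advantage = 1 - relative_sensitivity

print(f"RR of interval cancer (HPV vs. Pap): {rr_interval:.1f}")
print(f"Relative sensitivity (HPV vs. Pap): {relative_sensitivity:.2f}")
print(f"Pap test advantage in sensitivity: {pap_advantage:.0%}")
```

Because both interval cancer rates are small compared with P0, a large ratio between them translates into only a small difference in the proportion of cancers prevented.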
The coverage of the Finnish Cancer Registry is practically complete for invasive cervical cancers.21 However, reporting of pre-invasive lesions is not quite complete22 but underreporting is not likely to be differential between the study arms and would result in underestimates of the overdiagnosis–benefit ratio.
The screening interval was five years. Slightly more than half of the women were followed up to the next screen and the mean follow-up time per woman was 4.5 years. Thus, the lack of five full years of follow-up was not likely to explain the estimate of similar sensitivity in the Pap test arm and the HPV arm.
Cervical cancer screening, irrespective of test, is more effective in preventing invasive squamous cell cancers than adenocarcinomas.4,18 This was true also in our material: only one interval cancer of squamous type (sensitivity 98%) was diagnosed in the HPV arm. The type specific results do not change our conclusion but rather make it more credible: the sensitivity was high in both arms.
The estimate of sensitivity assumes that the expected incidence in the absence of screening is known. The expected incidence remains unknown when two tests are compared without non-screened controls. We arrived at the expected cervical cancer risk of 20 per 100,000 person-years without screening after correcting for the time since start of screening and for the low risk in attendees. Our sensitivity estimate is an underestimate if the expected risk estimate is an underestimate, and vice versa. On the other hand, the estimate of overdiagnosis becomes larger when the expected incidence of cancer gets smaller. The assumed value of 20/100,000 is likely to be on the low side of the credible ones. Therefore, the estimates of sensitivity may be even better than those estimated in our study and the estimate of relative overdiagnosis (HPV vs. Pap test) likewise.
In our study, test sensitivity was about 90% in both arms. Our estimates of sensitivity are substantially better than previously published ones for the Pap test, which range from 30% to 80%.4,23 The lower confidence limit of sensitivity in the Pap test arm, 0.82, indicates that random variation cannot account for the large difference between our study and previously reported sensitivity estimates in spite of the small numbers of invasive cancers in our study. Our result on the percentages of invasive cancer preventable by screening was similar to that of the WHO collaboration in the 1980s but somewhat larger than that of the UK audit.24,25 The difference in sensitivity estimates between our result and the literature is due to methodological differences in addition to differences in the quality of screening. The differences include using pre-invasive lesions in the estimation of sensitivity and deriving the results from cross-sectional detection rates.
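The dependence of the estimates on the assumed P0 can be explored with a small what-if calculation. In the sketch below only P0 = 20 comes from the study; the other grid values are hypothetical, chosen purely to show the direction of change, and the function name is illustrative. The rates are the rounded values from the Results.

```python
def scenario(p0: float, p1_hpv: float = 2.5, p1_pap: float = 1.4,
             p3_hpv: float = 57.1, p3_pap: float = 38.8) -> dict:
    """Sensitivity and overdiagnosis per arm for a given expected incidence P0."""
    h_hpv = p3_hpv - (p0 - p1_hpv)
    h_pap = p3_pap - (p0 - p1_pap)
    return {
        "p0": p0,
        "s_hpv": 1 - p1_hpv / p0,
        "s_pap": 1 - p1_pap / p0,
        "h_hpv": h_hpv,
        "h_pap": h_pap,
        "relative_overdiagnosis": h_hpv / h_pap,
    }


# Only 20.0 is the study's assumption; 15, 25 and 30 are hypothetical
grid = [15.0, 20.0, 25.0, 30.0]
results = [scenario(p0) for p0 in grid]

for r in results:
    print(f"P0 = {r['p0']:.0f}: S_HPV = {r['s_hpv']:.2f}, S_Pap = {r['s_pap']:.2f}, "
          f"H_HPV = {r['h_hpv']:.1f}, H_Pap = {r['h_pap']:.1f}, "
          f"H ratio = {r['relative_overdiagnosis']:.2f}")
```

As P0 increases across the grid, both sensitivity estimates rise, the absolute overdiagnosis estimates fall, and the HPV-to-Pap overdiagnosis ratio grows.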
There is evidence that the low risk both of invasive cervical cancer and of pre-invasive lesions persists longer in those who were test negatives by the HPV test than those by the Pap test.7,8,12 The negative predictive value of the test on invasive cancer should be distinguished from the negative predictive value on CIN3. The former is related to sensitivity, the latter to both sensitivity and overdiagnosis. We estimated the sensitivity by progression of the lesions during one screening interval only. Our result is, however, more general on invasive cancer, representing a long-term steady state. Our screening interval of five years was the longest ever proposed in a developed country.4 The interval cancers in our study represent cancers with a pre-invasive phase (sojourn time) shorter than five years. Whether the low risk of interval cancers will persist more than five years—i.e., beyond the subsequent screen—does not affect the ultimate sensitivity. All progressive pre-invasive lesions with a long sojourn time (i.e., more than five years) are still pre-invasive at the next (second) screen. Therefore, they will be detected at the second screen with similar high sensitivity and prevented from surfacing as invasive cancers during the subsequent (third and on) intervals. Thus, our results apply to screening programmes with repeated five year (or shorter) screening intervals but not to policies with one life time screening, a relevant method in low or medium resource countries.
The finding of only few interval CIN3 lesions in HPV negatives was similar to the low detection rate after the first screen found in the study by Ronco et al.12 This difference between the Pap test and the HPV test in the negative predictive value on CIN3 can be incorrectly interpreted as a failure of the Pap test. However, such an estimate does not take into account the total burden of CIN3, cases detected both at and after the screen, nor the potential of overdiagnosis. In our study, the total number, screen-detected and interval CIN3 lesions together in the first round was 172 in the HPV arm and 116 in the Pap test arm. It is unlikely that the excess of CIN3 lesions will get balanced during an average of three screening tests in the future life span of the woman in the Finnish programme.
The POBASCAM study with two screening rounds seems to deviate from our result.10 The CIN2+ lesions were only 8% more common in the HPV arm. However, both arms were HPV tested at the second screen. This balanced the detection rates between the arms and the study does not give evidence on the life time difference in risk of preinvasive lesions between the Pap test and the HPV screening policies.
Triage—i.e., testing with the Pap test after a positive HPV test—is one of the proposed means to reduce the detection of non-progressive pre-invasive lesions.4,26 Such an approach does not eliminate the difference between the HPV test and the Pap test, because a positive HPV test was confirmed by the Pap test in the same screening episode in our study.14 The diagnostic threshold of RLU 1.00 was based on data from clinical materials. For screening purposes, the cut-off level could be increased, as even a level of RLU 10.0 results only in a marginal loss in detection of pre-invasive lesions but improves the specificity of the HPV test substantially.27 To the best of our knowledge, there are no conclusive studies on the correlation of the RLU level and the aggressiveness of the pre-invasive lesion. If there is a positive correlation, increasing the cut-off level from RLU 1.00 would provide even more improvement for HPV testing in screening than indicated by our previous study.27
Another approach to increase the specificity of the HPV test is to use HPV testing only at ages with a low risk of HPV infection. The cut-off of 30 or 35 years was proposed earlier because the test positivity has a downward trend by age.28,29 Therefore, women younger than 35 years should be screened with the traditional Pap test. Concerning the outcome of invasive cancer, our material is too small to be analysed by age. However, the similar and high sensitivity with both of the tests overall suggests that also the recommendation on changes in the test by age may need reconsideration.
In Finland, an organised screening programme has been in place since the early 1960s. Even if the evaluation has been based on a non-experimental design, there is no doubt of the effectiveness of screening with the Pap test. Special features of the Finnish screening programme are the long screening interval of five years and the handling of Pap Class II: some cases are considered test negative and some result in referral (i.e., are test positive). In this study, however, we defined only Pap Class I as a negative test. We estimated the sensitivity and the harm also at the threshold of referral to colposcopy, actual test positivity, and the results were similar. All CIN lesions are histologically confirmed and thus, subjected to medical attention. At individual level, there is variation but until 2006 all CIN lesions were treated. Since 2006, the Finnish guidelines recommend treatment of CIN2 or higher grade lesions and active surveillance of CIN1 in women younger than 30 with treatment only if progression occurred. There is also considerable opportunistic testing parallel with organised screening. The long five-year screening interval is a strength of our study. Even if treatment of pre-invasive lesions varied, this would not be likely to correlate with the arms of our study. The opportunistic testing was shown to be of low effectiveness and most of the diagnoses in test positives but episode negatives were likely due to monitoring in the programme.30
Our results apply to high resource countries with a high standard of health services. However, the magnitude of the cervical cancer problem is large in low and medium resource countries. There is evidence that options other than Pap testing, including HPV testing, should be considered in countries with limited resources.11,31–33 In particular, the persistent low risk after a negative HPV test could provide benefit in low and medium resource countries with, for example, only one lifetime test. In a randomised screening study in India, the effectiveness was found to be greater in the HPV arm than in women subjected to other screening tests.11 It is not clear, however, whether the high technology involved in HPV testing can be successfully applied in large-scale routine screening in these countries. Screening programmes there face problems other than the sensitivity of the test, such as low participation, limited resources and logistical problems. Specifically, Rebolj and Lynge34 commented on the imbalance in incidence and efficacy of treatment within the cluster randomised study in India.11
Proper comparison of sensitivity between different screening tests for cervical cancer takes place only in terms of invasive disease detected as interval cancers. In our study, the sensitivity of both the HPV test and the Pap test was very high and any difference in sensitivity would affect only a marginal number of women. Instead, performing a colposcopy and overdiagnosis of non-progressive CIN3 lesions were substantially more common in the HPV arm than in the traditional Pap test arm. When the results are applied in screening at population level, roughly equal numbers benefit from Pap testing, benefit from HPV testing and experience harm of overdiagnosis (at the level of CIN3) from Pap testing. However, in the same population there are more women who will experience harm due to HPV testing.
It is commonly believed that routine screening with the HPV test works better than that with Pap testing. In a randomised health services study on cervical cancer screening, our results show similar sensitivity in the two arms and greater overdiagnosis of CIN3 lesions using HPV testing compared to Pap testing. Therefore, implementing HPV testing in routine screening programmes needs to be reconsidered especially in countries with organised programmes and a high quality infrastructure.
In this study, M.H. planned the design and method; N.M. analysed the data; M.L. and L.K.T. were responsible for data management; P.L. and J.T. were responsible for the analyses in the screening laboratory; N.M. and M.H. prepared the first version of the manuscript and all authors contributed to the text with critical comments. The study was approved by the National Authority of Medicolegal Affairs, health boards of the participating municipalities and the ethical committee of the Helsinki and Uusimaa hospital district. The study is registered as an International Standard Randomised Controlled Trial, number ISRCTN23885553 (URL http://isrctn.org). All authors declare no conflict of interest.