Conflict of Interest: C.J.L.M. Meijer is a member of the advisory board of Qiagen (formerly Digene) and received a lecture fee from GSK. E.L. Franco provided occasional consultation to Gen-Probe and Roche. G. Ronco provided occasional consultation to Gen-Probe. F.X. Bosch provided occasional consultation to Qiagen and Roche. J. Cuzick is a member of the advisory boards of Qiagen, Roche and Gen-Probe. P.J.F. Snijders provided occasional consultation to Roche and Gen-Probe. Qiagen, Gen-Probe and Roche are companies involved with HPV diagnostics.
Guidelines for human papillomavirus DNA test requirements for primary cervical cancer screening in women 30 years and older†
Article first published online: 19 SEP 2008
Copyright © 2008 Wiley-Liss, Inc.
International Journal of Cancer
Volume 124, Issue 3, pages 516–520, 1 February 2009
How to Cite
Meijer, C. J.L.M., Berkhof, J., Castle, P. E., Hesselink, A. T., Franco, E. L., Ronco, G., Arbyn, M., Bosch, F. X., Cuzick, J., Dillner, J., Heideman, D. A.M. and Snijders, P. J.F. (2009), Guidelines for human papillomavirus DNA test requirements for primary cervical cancer screening in women 30 years and older. Int. J. Cancer, 124: 516–520. doi: 10.1002/ijc.24010
- Issue published online: 18 NOV 2008
- Accepted manuscript online: 19 SEP 2008 12:00AM EST
- Manuscript Accepted: 20 AUG 2008
- Manuscript Received: 29 MAY 2008
- HPV-DNA testing;
- cervical screening;
- HPV test guidelines;
- HPV test requirements;
- HPV test statistics
Given the strong etiologic link between high-risk HPV infection and cervical cancer, high-risk HPV testing is now being considered as an alternative for cytology-based cervical cancer screening. Many test systems have been developed that can detect the broad spectrum of hrHPV types in one assay. However, for screening purposes, the detection of high-risk HPV is not inherently useful unless it is informative for the presence of high-grade cervical intraepithelial neoplasia (CIN 2/3) or cancer. Candidate high-risk HPV tests to be used for screening should reach an optimal balance between clinical sensitivity and specificity for detection of high-grade CIN and cervical cancer to minimize redundant or excessive follow-up procedures for high-risk HPV positive women without cervical lesions. Data from various large screening studies have shown that high-risk HPV testing by hybrid capture 2 and GP5+/6+-PCR yields considerably better results in the detection of CIN 2/3 than cytology. The data from these studies can be used to guide the translation of high-risk HPV testing into clinical practice by setting standards of test performance and characteristics. On the basis of these data, we have developed guidelines for high-risk HPV test requirements for primary cervical screening and validation guidelines for candidate HPV assays. © 2008 Wiley-Liss, Inc.
It is now well-established and widely, if not universally, accepted that virtually all cervical cancer and its immediate precancerous lesions arise from persisting cervical infections by ∼15 cancer-associated (high-risk or hr) human papillomavirus (hrHPV) genotypes.1, 2 The most important of these HPV genotypes are HPV16 and HPV18, which account for ∼70% of all invasive cervical cancers with minor variations in this percentage between continents.3 A new paradigm of cervical carcinogenesis replaces an older model of stepwise progression from low-grade to high-grade morphological changes and can now be summarized as four reliably measured stages: (i) HPV acquisition, (ii) HPV persistence (vs. clearance), (iii) progression of a persisting infection to cervical precancer (with incidental co-occurrence of both conditions) and (iv) invasion.4, 5
On the basis of this nearly absolute etiologic link between carcinogenic HPV and cervical cancer, testing for hrHPV is now being considered as an alternative for cytology-based cervical cancer screening. However, before cost-effective implementation of population-based hrHPV testing in cervical cancer screening and prevention can be envisaged, any candidate HPV testing technologies must offer an optimal balance between clinical sensitivity and specificity for detection of cervical intraepithelial neoplasia grade 2 or 3 and treatable cancer (≥CIN 2) to minimize redundant or excessive follow-up procedures. Reliable clinical performance needs to be established before any candidate screening test is widely disseminated and adopted into clinical practice or in organized screening programmes. Data from various studies can be used to guide the translation of hrHPV testing into clinical practice by setting standards of test performance and characteristics. On the basis of these data, guidelines for hrHPV DNA test requirements and use in primary cervical cancer screening can be developed, as outlined below.
The key issue for hrHPV DNA testing in cervical screening is to detect hrHPV infections that are associated with or develop into ≥CIN 2 and to differentiate them from transient hrHPV infections. This implies that there should be a balance between clinical sensitivity and specificity for detection of ≥CIN 2. Currently, two tests, i.e. the U.S. Food and Drug Administration-approved Hybrid Capture 2 (hc2; Qiagen, Gaithersburg, MD) and GP5+/6+-PCR enzyme immunoassay (GP5+/6+-PCR EIA) have repeatedly demonstrated clinical sensitivity of about 90–95% for the detection of ≥CIN 2 in large prospective cohorts or randomized controlled trials.6–11
We caution against misguided attempts to increase the clinical sensitivity for HPV assays, as the adverse effect of a small gain in sensitivity will be a dramatic increase in the number of false positives (i.e., hrHPV positives without ≥CIN 2).12 Given the low prevalence of ≥CIN 2 in the screened populations, even small reductions in clinical specificity will have dramatic effects on the number of unneeded follow-up procedures and associated costs. Changes in analytic sensitivity to improve clinical sensitivity require formal evaluation and validation, using receiver operating characteristic (ROC) curves or other analytic approaches that permit thoughtful consideration of the balance between true and false positives.13
For example, in a case–control format the hrHPV GP5+/6+-PCR EIA was compared with an ultra-sensitive commercially available PCR-based broad spectrum HPV assay in women with normal cytology over 29 years of age.14 The application of the latter did not lead to an increase in clinical sensitivity for ≥CIN 2, but instead resulted in a substantial decrease in clinical specificity compared to that of the GP5+/6+-PCR. The extra positivity scored by the ultra-sensitive assay mainly involved infections characterized by a very low viral load that were not associated with ≥CIN 2. Conversely, insufficient analytic sensitivity will translate to unacceptably low clinical sensitivity. One example is a commercially available DNA in situ hybridization (ISH) assay, which showed a substantially lower sensitivity for prevalent ≥CIN 2 than hc2.15 Since ISH positivity was only found in samples displaying relatively high viral loads, as deduced from the corresponding hc2 RLU/CO values that are semiquantitative measures of HPV viral load,16 it can be concluded that the ISH assay used suffered from a low sensitivity to detect HPV, resulting in a clinical sensitivity for ≥CIN2 of less than 77%.
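The trade-off cautioned against above can be made concrete with simple arithmetic. The sketch below uses purely hypothetical inputs (a screened population of 100,000 women, an assumed ≥CIN 2 prevalence of 1%, and two invented sensitivity/specificity pairs); none of these figures come from the cited studies, and the function name is ours.

```python
def screening_counts(n_women, prevalence, sensitivity, specificity):
    """Expected true and false positives in a screened population."""
    diseased = n_women * prevalence          # women with >=CIN 2
    healthy = n_women - diseased             # women without >=CIN 2
    true_pos = diseased * sensitivity
    false_pos = healthy * (1.0 - specificity)
    return true_pos, false_pos


# Hypothetical values, chosen only to illustrate the effect of low prevalence.
N_WOMEN = 100_000
PREVALENCE = 0.01

# Assay A: baseline; assay B: slightly more sensitive, slightly less specific.
tp_a, fp_a = screening_counts(N_WOMEN, PREVALENCE, sensitivity=0.95, specificity=0.94)
tp_b, fp_b = screening_counts(N_WOMEN, PREVALENCE, sensitivity=0.97, specificity=0.91)

print(f"extra true positives:  {tp_b - tp_a:.0f}")
print(f"extra false positives: {fp_b - fp_a:.0f}")
```

Under these assumed inputs, a two-point gain in sensitivity finds about 20 additional cases, while a three-point loss in specificity adds about 2,970 false positives, because nearly all screened women are disease-free.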
In particular, evidence has been collected that women testing negative for hrHPV by hc2 or GP5+/6+-PCR have a 3–5 year risk of ≥CIN 2 that is 40–50% lower than that of women testing cytologically negative.17, 18, 20 Consequently, the hc2 and GP5+/6+-PCR methods can be considered as clinically validated for use in cervical cancer screening. Both hc2 and GP5+/6+-PCR target 13 hrHPV types (i.e., HPV16, −18, −31, −33, −35, −39, −45, −51, −52, −56, −58, −59 and −68). GP5+/6+-PCR additionally targets HPV66, which hc2 detects as the result of coincidental cross-reactivity with types genetically related to targeted HPV genotypes.21 Comparison of the automated version of the hc2 assay and GP5+/6+ PCR-EIA in a cross-sectional study of women participating in a population-based cervical screening trial revealed that the assays had very similar sensitivities for high-grade CIN or cervical cancer.22 There is no consensus on the minimal number of hrHPV types that an HPV detection assay should be able to detect as there is still debate about the carcinogenicity of certain HPV types that have been rarely detected in carcinomas,1, 23, 24 although it is widely accepted that the detectable types should include the 14 listed above. However, as long as the HPV detection assay complies with the criteria listed below and performs well as a screening test, it is of minor importance to what extent uncommon hrHPV types are actually targeted.
Because both median and clinically relevant viral load levels in women of a screening population differ markedly across the various hrHPV types,25 clinical test requirements cannot easily be translated into analytical test requirements in terms of setting universal assay cut-off points that can be used by candidate test manufacturers. Therefore, clinical criteria should form the basis for guidelines for hrHPV test requirements for primary cervical screening. Here, a proposal for such guidelines in the European setting is presented as well as a clinical validation strategy in order to determine whether new tests fulfill these guidelines. Since in large prospective screening trials both hc2 and GP5+/6+-PCR have been shown to be superior to cytology, the hrHPV test requirements are deduced from data obtained with these assays.8, 10, 17, 18
The HPV test requirements are formulated relative to the performance of hc2 because, in contrast to GP5+/6+ PCR, hc2 is approved by the US Food and Drug Administration (FDA) and commercially available. In line with the FDA, we only consider HPV testing in women of 30 years and older. In younger women, the specificity of an HPV test is relatively low11 because transient HPV infections are very common.
In addition, indications for quality assurance26 of hrHPV testing by the laboratories over time are proposed. We believe that these guidelines would also be useful for non-European countries seeking to adopt HPV testing as part of their screening programmes.
Requirements of HPV tests in primary cervical screening
In a primary cervical screening setting an HPV detection assay should fulfill the following requirements:
1. The candidate test should have a clinical sensitivity for ≥CIN2 not less than 90% of the clinical sensitivity of hc2 in women of at least 30 years. This recommendation is based on recent meta-analyses that reported a pooled sensitivity for hc2 of 97.9% (95%CI: 95.9%–99.9%) in primary screening in Europe and North America6 and a pooled sensitivity of hc2 and GP5+/6+ PCR in European studies of 96.1% (95%CI: 94.2%–97.4%).11 In addition, in recent prospective screening trials, hc2 sensitivities were 97.3% (95%CI 90.7%–99.7%)8 and 94.6% (95%CI 84.2–100)10 and the GP5+/6+-PCR-EIA sensitivity was 94.1% (95%CI: 91.7%–95.9%).18 This high sensitivity translates into a very high negative predictive value (reassurance) of the HPV detection assay, allowing for extending screening intervals for test negative women, who are typically the majority of participants in a screening programme.
2. Acceptable standards for clinical specificity are more difficult to define because prevalences of the targeted HPV genotypes vary across populations. Notwithstanding that, we suggest a clinical specificity for ≥CIN2 of the candidate test not less than 98% of the clinical specificity of hc2 in women of at least 30 years of age. The rationale for this high lower bound on the clinical specificity is that a high clinical specificity will limit the number of test positives that would possibly trigger increased surveillance and unnecessarily stigmatize women as HPV positive. In North America and Europe, the pooled specificity of hc2 was 91.3% (95% CI: 89.5–93.1%; range: 85–95%).6 In European trials, the hc2 and GP5+/6+ PCR pooled clinical specificity was 93.3% (95%CI 92.9%–93.6%) for women 35–49 years of age and 90.7% (95%CI 90.4%–91.1%) for all women.11 In recent trials, the hc2 specificities were 93.2% (95%CI 92.8–93.6) for women 35–60 years of age8 and 94.1% (95%CI 93.4–94.8%) for women 30–69 years of age10 and the specificity of GP5+/6+-PCR was 96.1% (95%CI 96.0%–96.1%) for women 30–60 years of age.7 When the hc2 cut-off is increased to 2–3 RLU/CO the clinical specificity of hc2 increases and approaches that of GP5+/6+-PCR,8, 19, 21, 22, 27 partly due to the reduction in cross-reactivity of hc2 with low-risk HPV genotypes.21, 22
3. To ensure a robust and highly reliable performance of the test in clinical practice the candidate test should display intra-laboratory reproducibility (i.e., agreement in test result when the same specimens are tested more than once) and inter-laboratory agreement with a lower confidence bound not less than 87%. The hc2 and GP5+/6+-PCR revealed high inter-laboratory agreements of at least 92%.28–30
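To give a feel for what the relative bounds in requirements 1 and 2 imply in absolute terms, the following sketch multiplies the pooled hc2 estimates quoted above (97.9% sensitivity, 91.3% specificity) by the proposed relative bounds. This is illustrative arithmetic only: the requirements are defined relative to hc2 measured on the same samples, not against pooled literature values, and the function name is ours.

```python
def minimum_required(reference_value, relative_bound):
    """Absolute floor implied by a relative bound on a reference test's performance."""
    return reference_value * relative_bound


# Pooled hc2 estimates quoted in requirements 1 and 2 of the text.
HC2_POOLED_SENSITIVITY = 0.979
HC2_POOLED_SPECIFICITY = 0.913

min_sens = minimum_required(HC2_POOLED_SENSITIVITY, 0.90)  # relative sensitivity >= 90%
min_spec = minimum_required(HC2_POOLED_SPECIFICITY, 0.98)  # relative specificity >= 98%

print(f"implied minimum sensitivity: {min_sens:.1%}")
print(f"implied minimum specificity: {min_spec:.1%}")
```

With these inputs the implied floors are roughly 88% for sensitivity and 89.5% for specificity; the much tighter relative bound on specificity reflects the concern about false positives in a low-prevalence population.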
Validation guidelines for candidate HPV assays
It is obvious from the requirements outlined above that validation of a candidate assay for clinical application in cervical screening requires a comparative analysis with a clinically validated reference HPV test on samples that originate from a population-based screening cohort.
The following validation strategy is advised:
1. The sensitivity of the candidate test for ≥CIN2 should be at least 90% of the sensitivity of hc2 (i.e., a relative sensitivity of at least 90%) as assessed by a noninferiority score test.31 A description of the noninferiority test is given in the Appendix. Samples should be derived from a representative set of women in a population-based screening cohort who were tested by hc2 (with or without cytology) and by the candidate test, and who had a histologically confirmed ≥CIN2 detected through either of these tests. The noninferiority test has been shown to perform well if the number of samples is 50 or more. The power of the noninferiority test, obtained under the assumption that the candidate test and the reference test have equal sensitivity and that the agreement between the tests is moderate/good (κ value of 0.7), is presented in Figure 1. To achieve a power of 80%, 60 samples should be tested with both the new test and hc2. The power increases with the sample size and is greater than 99% when 100 samples are tested.
2. The specificity of the candidate test for ≥CIN 2 should be at least 98% of the specificity of hc2. This should be determined by applying the noninferiority test to a random sample of women of at least 30 years of age from a population-based screening cohort who were tested by hc2 (with or without cytology) and by the candidate test, and who did not have histologically confirmed ≥CIN 2. To achieve a power of 80% under the assumption that the new test and the reference test have equal specificity and that the agreement between the tests is moderate/good (κ value of 0.7), a sample size of 800 samples is required. When 2500 samples are tested, the power is greater than 99% (Fig. 1).
3. The intra-laboratory reproducibility in time and inter-laboratory agreement should be determined by evaluation of at least 500 samples, 30% of which tested positive in a reference laboratory using a clinically validated assay. This should result in a percentage of agreement with a lower confidence bound not less than 87% (κ value of at least 0.5 in this series of samples including 30% positives). The same intra-laboratory reproducibility should be reached after testing the same set of samples several weeks later.
All the above-mentioned samples can be obtained either through new studies where women are tested by hc2, with or without cytology, or by exploiting well-preserved archived material from previously conducted studies with the described features, as long as this material is qualitatively adequate for applying the candidate test.26
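The agreement criterion in item 3 can be checked along the following lines. This is a minimal sketch under stated assumptions: the guidelines do not say which confidence interval method to use, so the Wilson score interval (two-sided 95%) is our choice, and the 2×2 counts are invented for a hypothetical panel of 500 samples with 150 (30%) positive in the reference laboratory.

```python
import math


def agreement_stats(both_pos, ref_only, cand_only, both_neg, z=1.96):
    """Percent agreement, its Wilson lower bound, and Cohen's kappa for a 2x2 table."""
    n = both_pos + ref_only + cand_only + both_neg
    p_obs = (both_pos + both_neg) / n

    # Wilson score lower bound for the observed agreement (assumed CI method).
    centre = p_obs + z * z / (2 * n)
    margin = z * math.sqrt(p_obs * (1 - p_obs) / n + z * z / (4 * n * n))
    lower = (centre - margin) / (1 + z * z / n)

    # Cohen's kappa: agreement corrected for chance agreement.
    p_ref = (both_pos + ref_only) / n
    p_cand = (both_pos + cand_only) / n
    p_exp = p_ref * p_cand + (1 - p_ref) * (1 - p_cand)
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return p_obs, lower, kappa


# Hypothetical counts: 150 of 500 samples positive in the reference laboratory.
p_obs, lower, kappa = agreement_stats(both_pos=140, ref_only=10, cand_only=15, both_neg=335)

meets_criterion = lower >= 0.87 and kappa >= 0.5
print(f"agreement={p_obs:.3f}, lower bound={lower:.3f}, kappa={kappa:.3f}, ok={meets_criterion}")
```

A different interval method (for example an exact binomial bound) would give a slightly different lower limit, so the method should be fixed in the validation protocol before testing.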
Example: validation strategy applied to GP5+/6+-PCR EIA assay indicating non-inferiority of sensitivity and specificity
As indicated above, the GP5+/6+-PCR EIA method can be considered clinically validated on the basis of data collected in large prospective screening trials. As an example of applying the validation strategy, we here illustrate the noninferiority of the GP5+/6+-PCR EIA as compared to the hc2 reference test. To assess noninferiority of the sensitivity (i.e., relative sensitivity not lower than 90%), 75 cervical scrapes of women with ≥CIN2 were tested. The cervical samples were obtained from a population-based screening study where women were referred for colposcopy-guided biopsy on the basis of a positive hc2 and/or cytology result (VUSA-SCREEN study, the Netherlands). The data are presented in Table I (VUSA-SCREEN trial: women with ≥CIN2). The null hypothesis of inferiority is rejected (T = 2.68, p-value 0.0037) and hence the sensitivity of the GP5+/6+-PCR-EIA is not inferior to the sensitivity of the hc2. To assess noninferiority of the specificity of the GP5+/6+-PCR EIA (i.e., relative specificity not lower than 98%), 8,040 samples of women without ≥CIN2 were tested (Table I (POBASCAM II trial: women without ≥CIN2)). The samples were obtained from a population-based screening study where all women aged >35 years were tested by hc2 and GP5+/6+-PCR EIA (POBASCAM II study). The null hypothesis is rejected (T = 16.57, p-value <0.00001) and hence the specificity of the GP5+/6+-PCR-EIA (i.e., 96.0%) is judged not inferior to the specificity of the hc2 (i.e., 94.1%; Table I (POBASCAM II trial: women without ≥CIN2)).
| | hc2 + | hc2 − | Total |
| VUSA-SCREEN trial: women with ≥CIN2 | | | |
| GP5+/6+-PCR EIA + | 73 | 1 | 74 |
| GP5+/6+-PCR EIA − | 1 | 0 | 1 |
| POBASCAM II trial: women without ≥CIN2 | | | |
| GP5+/6+-PCR EIA + | 284 | 34 | 318 |
| GP5+/6+-PCR EIA − | 193 | 7529 | 7722 |
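The sensitivities and specificities quoted in the example follow directly from the margins of Table I. The short sketch below recomputes them from the cell counts; the helper name is ours, and the cell labels follow the table layout (rows: GP5+/6+-PCR EIA result, columns: hc2 result).

```python
def positivity_rates(both_pos, new_only, hc2_only, both_neg):
    """Fraction of samples positive by the new test and by hc2, respectively."""
    n = both_pos + new_only + hc2_only + both_neg
    return (both_pos + new_only) / n, (both_pos + hc2_only) / n


# Women with >=CIN2 (VUSA-SCREEN): the positivity rate is the sensitivity.
sens_gp, sens_hc2 = positivity_rates(73, 1, 1, 0)

# Women without >=CIN2 (POBASCAM II): specificity is one minus the positivity rate.
pos_gp, pos_hc2 = positivity_rates(284, 34, 193, 7529)
spec_gp, spec_hc2 = 1 - pos_gp, 1 - pos_hc2

print(f"sensitivity: GP5+/6+ {sens_gp:.1%}, hc2 {sens_hc2:.1%}")
print(f"specificity: GP5+/6+ {spec_gp:.1%}, hc2 {spec_hc2:.1%}")
print(f"relative specificity: {spec_gp / spec_hc2:.3f}")
```

Both assays detect 74 of the 75 women with ≥CIN2, so the observed relative sensitivity is 1; the specificities round to the 96.0% and 94.1% reported in the text.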
Laboratory guidelines for HPV testing
Laboratories performing HPV testing for clinical and screening purposes should comply with quality assurance (QA), including internal quality control (IQC), external quality assessment (EQA) and quality improvement (QI). To realize QA, at least the following items should be fulfilled:
1. The laboratory should have a specific infrastructure in case nucleic acid amplification technology is used. This includes separate laboratories for preparation of test reagents, sample identification/preparation and DNA extraction, and DNA amplification and detection.
2. The laboratory should have accreditation for clinical molecular testing and should comply with standard operating procedures (SOPs) and good laboratory practice (GLP) guidelines.
In practice, large volume labs have been associated with higher proficiency in molecular testing than small volume labs. However, the laboratory requirements set forward here are formulated independently of the size of the laboratory and can in principle also be fulfilled by laboratories that process a small number of smears.
In conclusion, within a cervical cancer screening setting hrHPV tests should meet specific requirements to assure high clinical sensitivity for detection of cervical precancer and cancer and at the same time high clinical specificity to limit unnecessary procedures and follow-up of HPV test-positive women. Most of the candidate hrHPV assays mainly differ in clinical specificity. These differences in clinical specificity can be largely attributed to differences in the detection rate of transient HPV infections characterized by low viral loads. Such infections do not cause malignancies, are therefore clinically irrelevant, and are potentially harmful in a screening setting because they trigger unnecessary follow-up of HPV positive women, with the attendant anxiety and costs. At present, hc2 and GP5+/6+-PCR-EIA fulfill the listed requirements; it is our hope that other HPV tests with proven, reliable clinical performance are forthcoming.
Although we focused on ≥CIN2 as an endpoint, we note that there is increasing recognition that histologic CIN2 is an equivocal diagnosis of cervical precancer, representing a mixture of CIN3 and low-grade lesions resulting from productive human papillomavirus (HPV) infections by both low-risk and high-risk HPV genotypes.4, 5 Consequently, European guidelines have consistently promoted the separate reporting of CIN2 and CIN3,26, 34 and the WHO also classifies these two histologic entities separately, supporting this distinction. There is universal agreement that CIN3 is the best surrogate marker for risk of progression to invasive cancer.4
It must be acknowledged that in the future, as molecular screening tests become more accurate for true precancerous lesions, some CIN2 will test negative because some are in fact not truly precancerous lesions. This will create a quandary of whether some test results for CIN2 are false or true negatives. In evaluating the performance of the next generation of tests, it will be important to have an a priori plan to adjudicate cases of test-negative CIN2 by pathology review, adjunctive molecular markers, and/or tissue-based testing for HPV genotype presence and E6/E7 oncoprotein expression.
It can be envisioned that future implementation of hrHPV testing in primary cervical cancer screening is accelerated once this proposal for guidelines for HPV tests has received international consensus. These guidelines should prove useful for national and supra-national regulatory bodies in the approval of new HPV tests for public health and clinical use in cervical cancer screening. This article has been written with the aim to achieve such international consensus by defining criteria that should be fulfilled by a new test before it can be used in primary cervical screening.
2. IARC Monographs on the evaluation of carcinogenic risks to humans. Human Papillomaviruses. International Agency for Research on Cancer. Vol. 90. Geneva, Switzerland: WHO press, 2007.
19. New Technologies for Cervical Cancer Screening Working Group. Results at recruitment from a randomized controlled trial comparing human papillomavirus testing alone with conventional cytology as the primary cervical cancer screening test. J Natl Cancer Inst 2008; 100: 492–501.
26. Arbyn M, Anttila A, Jordan J, Ronco G, Schenck U, Segnan N, Wiener H, Daniel J, von Karsa L, editors. European Guidelines for Quality Assurance in Cervical Cancer Screening, 2nd edn. Luxembourg: Office for Official Publications of the European Communities, 2008. pp. 1–291.
33. International quality assurance of human papillomavirus testing. Cent Eur J Public Health 2008; 16: S18–S20.
Suppose n samples have been tested with the new test and the hc2 reference test. The results are presented in the following Table AI.
| | hc2 + | hc2 − | Total |
| New test + | a | b | a+b |
| New test − | c | d | c+d |
Under the null hypothesis, the relative sensitivity (when comparing the new test to hc2) is δ0 and under the alternative hypothesis, the relative sensitivity is greater than δ0. According to the present guidelines, δ0 should be set to 0.90 for sensitivity and to 0.98 for specificity. The test statistic is defined as

$$T = \frac{(a+b) - (a+c)\,\delta_0}{\sqrt{n(1+\delta_0)\,\tilde{p} + (a+b+c)(\delta_0 - 1)}}, \qquad \tilde{p} = \frac{-B + \sqrt{B^2 - 4AC}}{2A},$$

with $A = n(1+\delta_0)$, $B = (a+c)\delta_0^2 - (a+b+2c)$ and $C = c(1-\delta_0)(a+b+c)/n$. The null hypothesis is rejected at nominal significance level α if T is equal to or greater than the 100 × (1 − α) percentile point of the standard normal distribution (T is interpreted as a z statistic).
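The noninferiority score test can be sketched in a few lines of code. The implementation below assumes the matched-pair score statistic T = [(a+b) − (a+c)δ0] / √[n(1+δ0)p̃ + (a+b+c)(δ0−1)], with p̃ = [−B + √(B² − 4AC)]/(2A) and B = (a+c)δ0² − (a+b+2c). This form is our reading of the score test cited as reference 31, not a verbatim transcription, but it reproduces the T values of 2.68 and 16.57 reported for Table I. For specificity we further assume that the roles of positive and negative results are swapped, so that "a" counts samples negative by both tests; the function name is ours.

```python
import math


def noninferiority_score_test(a, b, c, n, delta0):
    """One-sided score test of H0: relative rate (new test / hc2) <= delta0.

    a: both tests give the outcome of interest, b: new test only,
    c: hc2 only, n: total number of paired samples.
    """
    A = n * (1 + delta0)
    B = (a + c) * delta0 ** 2 - (a + b + 2 * c)
    C = c * (1 - delta0) * (a + b + c) / n
    p_tilde = (-B + math.sqrt(B * B - 4 * A * C)) / (2 * A)

    numerator = (a + b) - (a + c) * delta0
    variance = n * (1 + delta0) * p_tilde + (a + b + c) * (delta0 - 1)
    t = numerator / math.sqrt(variance)

    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(t / math.sqrt(2))
    return t, p_value


# Sensitivity, women with >=CIN2 (Table I, VUSA-SCREEN): outcome of interest = positive.
t_sens, p_sens = noninferiority_score_test(a=73, b=1, c=1, n=75, delta0=0.90)

# Specificity, women without >=CIN2 (Table I, POBASCAM II): outcome of interest = negative,
# so a = both negative, b = new test negative / hc2 positive, c = new test positive / hc2 negative.
t_spec, p_spec = noninferiority_score_test(a=7529, b=193, c=34, n=8040, delta0=0.98)

print(f"sensitivity: T = {t_sens:.2f}, p = {p_sens:.4f}")
print(f"specificity: T = {t_spec:.2f}, p = {p_spec:.2g}")
```

Note that the statistic is undefined when the variance term is zero or negative, which can occur in very small or degenerate tables; a production implementation would need to guard against that case.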