Comparison of the accuracy of Hybrid Capture II and polymerase chain reaction in detecting clinically important cervical dysplasia: a systematic review and meta-analysis

The effectiveness of screening programs for cervical cancer has benefited from the inclusion of human papillomavirus (HPV) DNA assays; which assay to choose, however, is not clear based on previous reviews. Our review addressed test accuracy of Hybrid Capture II (HCII) and polymerase chain reaction (PCR) assays based on studies with stronger designs and with more clinically relevant outcomes. We searched OvidMedline, PubMed, and the Cochrane Library for English language studies comparing both tests, published 1985–2012, with cervical dysplasia defined by the Bethesda classification. Meta-analysis provided pooled sensitivity, specificity, and 95% confidence intervals (CIs); meta-regression identified sources of heterogeneity. From 29 reports, we found that the pooled sensitivity and specificity to detect high-grade squamous intraepithelial lesion (HSIL) were higher for HCII than for PCR (0.89 [CI: 0.89–0.90] and 0.85 [CI: 0.84–0.86] vs. 0.73 [CI: 0.73–0.74] and 0.62 [CI: 0.62–0.64]). Both assays had higher accuracy to detect cervical dysplasia in Europe than in Asia-Pacific or North America (diagnostic odds ratio, dOR = 4.08 [CI: 1.39–11.91] and 4.56 [CI: 1.86–11.17] for HCII vs. 2.66 [CI: 1.16–6.53] and 3.78 [CI: 1.50–9.51] for PCR) and higher accuracy to detect HSIL than atypical squamous cells of undetermined significance (ASCUS)/low-grade squamous intraepithelial lesion (LSIL) (HCII-dOR = 9.04 [CI: 4.12–19.86] and PCR-dOR = 5.60 [CI: 2.87–10.94]). For HCII, using histology as the gold standard resulted in higher accuracy than using cytology (dOR = 2.87 [CI: 1.31–6.29]). Based on higher test accuracy, our results support the use of HCII in cervical cancer screening programs. The role of HPV type distribution should be explored to determine the worldwide comparability of HPV test accuracy.


Introduction
Cervical cancer is a significant cause of morbidity and mortality among women worldwide [1]. Human papillomavirus (HPV) infection is one of the most common sexually transmitted diseases in the world, and infection with high-risk oncogenic types of HPV has been recognized as a necessary cause of cervical cancer and its precursor lesion, cervical intraepithelial neoplasia (CIN) [2,3]. Fortunately, preventing cervical cancer is possible because of its distinct premalignant stage, and since the introduction of population-based screening programs, cervical cancer incidence and mortality have greatly decreased in developed countries [4][5][6]. Screening has largely relied on cytology-based tests; however, given their subjective nature as well as low sensitivity and specificity, adding HPV DNA testing to screening programs to improve their efficacy has been proposed [7][8][9]. Furthermore, the positive predictive value (PPV) of current screening tests is projected to decrease in populations vaccinated against HPV, but this drop in test performance could be mitigated by adding HPV DNA testing to the screening paradigm [10]. There are several ways in which HPV DNA testing might be implemented. First, the HPV DNA assay may be used, either in combination with cytology or alone, as the primary screening method. Studies have shown that HPV testing has a higher sensitivity than cytology, indicating that a longer interval between screenings is possible when including HPV DNA testing in a screening program [11][12][13][14]. Second, HPV DNA detection may be used to triage women with cytological abnormalities to determine whether referral for colposcopy is warranted [15,16]. Lastly, HPV DNA testing may be used as a follow-up to detect residual disease or predict recurrence among women who have been treated for high-grade CIN [17].
The two most common methods used for HPV DNA detection are the Hybrid Capture II (HCII, Qiagen Gaithersburg, Inc., MD) and polymerase chain reaction (PCR) assays. The HCII assay is commercially available and approved for clinical use, whereas several types of PCR assays have been used primarily in the research setting [18]. Both assays have shown high sensitivity to detect high-risk HPV infections but only moderate specificity [18]. Meijer et al. [19] have recommended that any HPV DNA test should have an optimal balance between clinical sensitivity and specificity. Stoler et al. [20] proposed a minimum sensitivity of 92% and a specificity of 85% for any new HPV DNA test. The selection of a screening test is important to detect clinically relevant cases of HPV infection while avoiding the unnecessary cost, stress, and cervical damage that patients incur when mild cytological abnormalities are overtreated. Therefore, the goal of this meta-analysis was to compare the clinical performance of HCII and PCR assays in both the screening and diagnostic settings.

Study screening and selection
Inclusion criteria for the meta-analysis were English language reports of studies comparing the sensitivity and specificity of PCR (i.e., MY/PGMY 09/11, GP5+/6+, or Amplicor) and HCII, using either cytologic or histologic results classified by the Bethesda system as the gold standard, in either a screening or follow-up/diagnostic setting. The three specific PCR tests mentioned above were chosen because they are currently the most-used tests. All citations were independently reviewed by two investigators (H. N. L. and K. R. D.). When necessary, authors of a selected article [21] were contacted to obtain further information. Normally, a standard threshold of 1 relative light unit (RLU) or 1 pg/mL is used to define a positive HCII result for HPV DNA. However, to maximize power in our study, we did not restrict inclusion by this cutoff.

Data abstraction and coding
All eligible studies were abstracted independently by two reviewers (H. N. L. and K. R. D.) using a coding system based on the Standards for Reporting of Diagnostic Accuracy (STARD) and the Meta-analysis Of Observational Studies in Epidemiology (MOOSE) guidelines [22,23]. Any discrepancies were resolved by discussion and consensus between the two investigators. Variables used to present our analysis were grouped into two components, as follows.
[...] predictive value [NPV], agreement and level of reproducibility [kappa, κ], and their respective 95% confidence intervals [CIs]); and blinding and/or quality control methods.

Clinical outcomes
Three clinical outcomes were examined: atypical squamous cells of undetermined significance (ASCUS); low-grade squamous intraepithelial lesion (LSIL), which includes HPV infection or mild dysplasia (CIN1); and high-grade squamous intraepithelial lesion (HSIL), which includes moderate (CIN2) and severe dysplasia (CIN3) [24]. Because there were only 10 studies on ASCUS, we merged ASCUS and LSIL to improve the power of the analysis.

Statistical analysis
In this meta-analysis, a study unit was defined as a study having complete information to compare the testing accuracy between PCR and HCII. Depending on the specific PCR test, setting, gold standard, age group, or sample collection method, one article could contribute more than one study unit. For example, an article by Riethmuller et al. [25] compared PCR MY09/11 with HCII in two clinical outcomes (i.e., LSIL and HSIL) and thus generated two study units. Likewise, an article by Stevens et al. [26] contributed eight study units for our analysis because the original analysis included both cytology and histology as gold standards with two clinical outcomes (LSIL vs. HSIL) and two types of PCR (PGMY09/11 and Amplicor).
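The way one article expands into several study units can be sketched as a simple cross-product of the dimensions it reports. The snippet below is only an illustration: the function name and the label strings are hypothetical, and it assumes every combination of the listed dimensions was actually reported, as described for the two example articles.

```python
from itertools import product

def enumerate_study_units(*dimensions):
    """Return one tuple per combination of the supplied dimensions.

    Hypothetical helper: each dimension is a list of labels
    (e.g., gold standards, clinical outcomes, PCR types).
    """
    return list(product(*dimensions))

# Dimensions the text attributes to Stevens et al. [26].
stevens_units = enumerate_study_units(
    ["cytology", "histology"],   # gold standards
    ["LSIL", "HSIL"],            # clinical outcomes
    ["PGMY09/11", "Amplicor"],   # PCR types compared with HCII
)

# Dimensions the text attributes to Riethmuller et al. [25].
riethmuller_units = enumerate_study_units(
    ["MY09/11"],                 # single PCR type
    ["LSIL", "HSIL"],            # clinical outcomes
)

print(len(stevens_units))      # number of study units from the first article
print(len(riethmuller_units))  # number of study units from the second article
```

Under these assumptions the counts match the eight and two study units stated above.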
Complete information from each study was extracted to construct two-by-two tables, which included true-positive, false-positive, true-negative, and false-negative values. Sensitivity was calculated as (true positive)/([true positive]+[false negative]) and specificity as (true negative)/([true negative]+[false positive]). Forest plots were generated to present, by type of clinical outcome, the individual and pooled sensitivity and specificity of each test and to show heterogeneity across studies [27,28]. Additionally, heterogeneity across studies was examined using Cochran's Q-test and the chi-square test [27,29]. To examine the threshold effect, that is, differences arising from the use of different cutoffs or thresholds, we computed the Spearman correlation coefficient between the logit of sensitivity and the logit of (1-specificity) across studies [30].
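As a minimal sketch of these calculations, the following computes sensitivity and specificity from two-by-two tables and then a Spearman rank correlation between logit(sensitivity) and logit(1-specificity) across studies. It assumes the threshold effect is assessed with that rank correlation, and it uses made-up counts rather than data from the included studies; all function names are illustrative, not taken from Meta-DiSc.

```python
import math

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

def logit(p):
    return math.log(p / (1 - p))

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up two-by-two tables: (tp, fp, fn, tn) per study unit.
tables = [(90, 40, 10, 60), (80, 30, 20, 70), (70, 20, 30, 80), (60, 10, 40, 90)]

logit_sens = [logit(sensitivity(tp, fn)) for tp, fp, fn, tn in tables]
logit_fpr = [logit(1 - specificity(tn, fp)) for tp, fp, fn, tn in tables]
rho = spearman(logit_sens, logit_fpr)
print(round(rho, 3))
```

In this toy data set sensitivity and the false-positive rate rise together across studies, so the correlation is strongly positive, which is the pattern usually read as a sign of a threshold effect. Tables containing zero cells would need a continuity correction before taking logits, which this sketch omits.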
Stratified meta-analysis and meta-regression were used to examine the influence of study characteristics and the magnitude of interstudy heterogeneity on sensitivity and specificity for both PCR and HCII. Stratified meta-analyses were performed for the two clinical outcomes (ASCUS/LSIL and HSIL) by setting (screening vs. follow-up/diagnostic) and by PCR testing technique (i.e., MY/PGMY 09/11, GP5+/6+, and Amplicor). Age is an important variable, particularly the age cutoff of 30 years; however, in our analysis, only two articles reported age using this cutoff (Luu, H. N., K. Adler-Storthz, L. M. Dillon, M. Follen, and M. E. Scheurer, submitted) [31]. We, therefore, decided not to include this variable in the analysis because of low power and the inability to generate pooled sensitivity and/or specificity or to perform meta-regression [30]. For meta-regression, a generalized linear model was fitted to the data and weighted by the inverse of the variance using the Moses and Littenberg method [32]. Additionally, a random effects model was used to account for variation between studies [33] in the current meta-regression model. For both PCR and HCII, four variables (setting, gold standard, type of lesion, and study location) were included in the meta-regression models. For PCR, an additional variable, type of PCR (i.e., MY/PGMY 09/11, GP5+/6+, Amplicor), was added to the meta-regression model. Diagnostic odds ratios (dOR) and their respective 95% CIs were calculated to determine diagnostic test performance as well as the influence of covariates on test accuracy. The dOR is a measure of the effectiveness of a diagnostic test. It is defined as the ratio of the odds of the test being positive if the subject has the disease to the odds of the test being positive if the subject does not have the disease. The dOR was calculated as (odds of sensitivity)/(odds of [1-specificity]) [34].
Meta-DiSc [30], a comprehensive software program to evaluate diagnostic and screening tests through meta-analysis, was used to perform the statistical analysis for this study. All statistical tests were two-sided and were considered significant at the level of P < 0.05.
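The dOR definition above can be illustrated with a short sketch for a single two-by-two table. The counts are made up, the function names are hypothetical, and the confidence interval uses a Wald-type interval on the log scale, which is an assumption here: the text does not state how the software derives its intervals.

```python
import math

def diagnostic_odds_ratio(tp, fp, fn, tn):
    """dOR = odds of sensitivity / odds of (1 - specificity)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return (sens / (1 - sens)) / ((1 - spec) / spec)

def dor_confidence_interval(tp, fp, fn, tn, z=1.96):
    """Wald-type interval on the log scale (assumed method, not from the paper)."""
    log_dor = math.log(diagnostic_odds_ratio(tp, fp, fn, tn))
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return math.exp(log_dor - z * se), math.exp(log_dor + z * se)

# Made-up counts for one study unit.
tp, fp, fn, tn = 90, 30, 10, 70

dor = diagnostic_odds_ratio(tp, fp, fn, tn)
low, high = dor_confidence_interval(tp, fp, fn, tn)
print(round(dor, 2), round(low, 2), round(high, 2))
```

Algebraically the ratio reduces to (tp x tn)/(fp x fn), so a dOR of 1 means the test does not discriminate at all, and larger values mean better discrimination. The sketch does not handle tables with zero cells.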

Search results
We identified 481 citations from search databases, of which 259 citations were duplicates (Fig. 1). By examining reference lists of those studies, we found an additional 48 citations. We then excluded 175 citations by applying the inclusion criteria to the titles and abstracts and retrieved 95 full-text articles for further review. The review process yielded 28 articles that met all inclusion criteria [14,21,25,26,31,. In addition, we added one manuscript from our own group (Luu et al., submitted), which is currently under review (Fig. 1).
The 29 included articles contained 82 PCR study units and 79 HCII study units (Table 1). The uneven number of study units between PCR and HCII resulted from two articles [26,31] that contained studies comparing more than one type of PCR with HCII.

Accuracy of the tests
In detecting ASCUS/LSIL, PCR was more sensitive (Fig. 2A and B) but less specific than HCII (Fig. 3A [...] Table 2).
[Figure 1, study selection flow diagram: articles meeting all inclusion criteria (k = 28); article from our lab (k = 1).]

[Figure: forest plots of individual-study and pooled specificity (95% CI).]

Discussion
[...] and follow-up/diagnostic settings, following the 2001 Bethesda Classification (i.e., ASCUS, LSIL, and HSIL). We identified 28 published articles and our own manuscript that compared HCII and PCR in the same report. We found that in detecting ASCUS/LSIL, HCII was less sensitive but more specific than PCR. We also found that HCII was both more sensitive and more specific than PCR in detecting HSIL (in both screening and diagnostic settings). Clinical outcome and study location were sources of heterogeneity for the accuracy of both PCR and HCII. Additionally, PCR type and gold standard were sources of interstudy variability in the accuracy of PCR and HCII, respectively. To our knowledge, this is the first meta-analysis that directly compares the accuracy of HCII and PCR in screening and diagnostic settings and across two clinical outcomes. In 2004, Arbyn and associates [9] conducted a meta-analysis and reported that for ASCUS detection, HCII alone was less sensitive but more specific [...] Arbyn's [9] is that they reported the test accuracy of HCII and a combination of HPV DNA tests (i.e., HCII, HCI, and PCR). Therefore, HCII and PCR were not compared directly. The other difference between our studies is that they [9] included test results from studies of a single test, whereas our review was restricted to studies that compared HCII and PCR. Because the accuracy estimates for both tests then come from the same population, this restriction minimizes a source of interstudy heterogeneity.
Furthermore, Arbyn et al. [9] included 17 articles from 1992 to 2002, whereas we identified 29 articles from 1999 to 2011. During the 1999-2011 period, HCII was used more widely than during the period covered by Arbyn, and our meta-analysis included only one article [35] that was also in Arbyn's meta-analysis [9]. Additionally, the meta-analysis by Arbyn et al. [9] was restricted to cross-sectional studies, while ours was expanded to other study designs (i.e., cohort and randomized controlled trial). As Sherman et al. [58] recommended, a longitudinal study design helps to detect lesions missed by repeated cytology before invasive cancer occurs. The article by Schiffman et al. [42] (the ASCUS-LSIL Triage Study, ALTS), included in our meta-analysis, reported that during 2 years of follow-up, in which study participants were asked to visit at 6-month intervals, HCII showed higher sensitivity than PCR and comparable specificity. Finally, while Arbyn et al. [9] restricted their analysis to ASCUS, we expanded ours to more important clinical outcomes (i.e., LSIL and HSIL). With test accuracy available for these two clinical outcomes, misdiagnosis or overtreatment due to screening test results can be reduced.
Type of clinical outcome was a source of interstudy heterogeneity in our meta-analysis. Accordingly, both tests appeared to have higher accuracy to detect HSIL than to detect ASCUS/LSIL. This is expected because of the cytological and histological differences between these lesions, which are driven by HPV. The other source of heterogeneity within HCII studies was the choice of the gold standard (i.e., cytology vs. histology). For example, an article by Stevens et al. [26] reported that when cytology is used as the gold standard, the sensitivity and specificity of HCII were 0.87 (95% CI: 0.86-0.89) and 0.47 (95% CI: 0.44-0.49) [...] higher HCII test accuracy if histology is used as the gold standard supports the findings of Sherman et al. [58] that a lead time bias occurs if repeat cytology is performed, particularly among women with ASCUS or LSIL. Consequently, one might only detect a smaller proportion of CIN3 lesions that do not have sufficient features associated with invasive cancer and miss a larger proportion of lesions usually associated with invasive cancer [58]. Study location is another important source of heterogeneity in our findings. We found higher accuracy of both PCR and HCII in European studies than in Asia-Pacific or North American studies. We thought this might be related to the HPV type distribution in the different study locations. This interpretation is supported by the meta-analysis by Smith and associates [59] showing that although types 16 and 18 are present in all regions, the specific HPV types differ by region: 16, 31, 33, and 18 in Europe; 16, 58, 18, [...] 16, 18, 31, and 35 in North America.
Results from several large randomized controlled trials in Europe [60][61][62], North America [61,63,64], and Asia-Pacific [65] supported the use of HPV DNA testing over cytology for cervical cancer screening. For example, the 5-year Population-Based Screening Study Amsterdam [62], which included approximately 45,000 women aged 26-45, reported that HPV cotesting is more sensitive than cytology alone to detect baseline CIN2 and 3 and to detect cervical cancer at the end of the trial. Another randomized trial [65] of approximately 132,000 women, aged 30-59, conducted over 7 years in rural India also reported substantially higher sensitivity of HCII over cytology. Our analysis showed that while PCR was more sensitive but less specific than HCII in detecting ASCUS/LSIL, HCII was more sensitive and more specific than PCR in detecting HSIL. Our findings, therefore, support the use of HCII because of its clinical relevance. The 2006 Consensus Guidelines for the Management of Women with CIN or Adenocarcinoma in situ [66] recommended that patients with CIN1 preceded by ASCUS or LSIL be followed up with either HPV DNA testing every 12 months or repeated cytology every 6-12 months. CIN1 is heterogeneous in that it may be ASCUS; however, it may also include LSIL, ASC-H, or even HSIL [67]. Both high-risk and low-risk HPV types may be present in CIN1 lesions [68,69]. Additionally, several studies show that in the absence of treatment there is a high rate of spontaneous regression of low-grade cervical lesions [70][71][72] and that CIN1 rarely progresses to CIN2 or CIN3 [62,73]. For example, a study by Moscicki et al. [71] reported that in more than 91% of adolescents and young women with LSIL, lesions cleared spontaneously within 36 months. These findings, together with ours, support the use of HCII in a cervical screening program.
This has clinical importance, as a recent report from the US Preventive Services Task Force (USPSTF) [74] concluded that there was insufficient evidence to recommend HPV testing for cervical cancer screening. We noticed that the estimated accuracy of HCII (HSIL: sensitivity = 0.82, specificity = 0.78; LSIL: sensitivity = 0.66, specificity = 0.91) from the USPSTF report came from the 1999 study by Cuzick et al. [75] of older women. As Castle [76] pointed out, the conclusion from the USPSTF was reached without the results of randomized, controlled trials in Europe and India [60][61][62]65], and HPV testing in the US was not evaluated. As more evidence accumulates for HPV testing, the results of our meta-analysis could be used as an additional tool for public health professionals as they decide the best test for their specific cervical screening programs.
A major strength of our meta-analysis is the use of both the STARD [22] and MOOSE [23] reporting guidelines for study selection, data analysis, and comparison of test accuracy, which led to the requirement that both tests be evaluated in the same article. This eligibility criterion enabled us to minimize a substantial source of interstudy heterogeneity. A second strength is that our search captured articles and studies from 1999 to the present, the period in which HCII has been most widely used. A third strength is our inclusion of other important clinical outcomes, which allows more conservative application of colposcopy.
The main limitation of our meta-analysis is that we did not include the technique/device for sample preparation (i.e., collection of cervical cells and DNA extraction methods) or the age of the study participants, which could be two potential sources of interstudy heterogeneity. The large variety of sample collection and preparation methods (Table 1) prevented us from establishing meaningful groups for a categorical analysis. We also could not include the age variable in our meta-analysis because only two studies [31] (Luu et al., submitted) provided the relevant information.
In summary, we found that while PCR is more sensitive but less specific than HCII in detecting ASCUS/LSIL, HCII has higher sensitivity and specificity than PCR in detecting HSIL, in both screening and diagnostic settings. Given the clinical relevance and importance of cervical cancer worldwide, our results support the use of HCII in cervical screening programs. The role of HPV type distribution should also be explored to determine the worldwide comparability of HPV test accuracy. While the cost of the test is a consideration for any screening program, it appears that the cost of both HCII and PCR has decreased over time. Further studies on the cost-effectiveness of HCII over PCR in a cervical screening program are, therefore, warranted.