Accuracy of colposcopy-directed punch biopsies: a systematic review and meta-analysis

Authors


CWE Redman, Department of Gynaecological Oncology, University Hospital of North Staffordshire, Newcastle Road, Stoke on Trent ST4 6QG, UK. Email charles.redman@uhns.nhs.uk

Abstract

Please cite this paper as: Underwood M, Arbyn M, Parry-Smith W, De Bellis-Ayres S, Todd R, Redman CWE, Moss EL. Accuracy of colposcopy-directed punch biopsies: a systematic review and meta-analysis. BJOG 2012;119:1293–1301.

Background  The colposcopy-directed punch biopsy is widely used in the management of women with abnormal cervical cytology; however, its accuracy compared with definitive histology from an excision biopsy is not well established.

Objectives  To assess the accuracy of the colposcopy-directed punch biopsy to diagnose high-grade cervical intraepithelial neoplasia (CIN) by performing a systematic review and meta-analysis.

Search strategy  A systematic search of MEDLINE, EMBASE and the Cochrane Library was performed.

Selection criteria  Articles that compared the colposcopically directed cervical punch biopsy with definitive histology from an excisional cervical biopsy or hysterectomy.

Data collection and analysis  Random effects and hierarchical summary receiver operating characteristic regression models were used to compute the pooled sensitivity and specificity applying different test cut-offs for outcomes of high-grade CIN.

Main results  Thirty-two papers comprising 7873 paired punch/definitive histology results were identified. The pooled sensitivity for a punch biopsy defined as test cut-off CIN1+ to diagnose CIN2+ disease was 91.3% (95% CI 85.3–94.9%) and the specificity was 24.6% (95% CI 16.0–35.9%). In most of the studies, the majority of enrolled women had positive punch biopsies. Pooling of the four studies where the excision biopsy was performed immediately after the punch biopsy, and where the rate of positive punch biopsies was considerably lower, yielded a sensitivity of 81.4% and specificity of 63.3%.

Author’s conclusion  The observed high sensitivity of the punch biopsy derived from all studies is probably the result of verification bias.

Introduction

Cervical intraepithelial neoplasia (CIN) has been shown to progress to invasive cervical cancer in a proportion of cases, with the risk of malignant progression increasing with the severity of the abnormality.1 The detection of cytological abnormalities through screening has resulted in a fall in the incidence of cervical cancer because of treatment of these pre-invasive lesions.2 Colposcopic examination of the cervix allows assessment of the abnormality before it is treated, either by excision or ablation.3–5 The colposcopically directed punch biopsy is a cornerstone of colposcopic practice: it allows a small piece of cervical tissue, typically <5 mm in diameter, to be taken to confirm the clinical impression, which matters because colposcopy alone is known to miss approximately one-third of high-grade CIN.6–8

In the management of CIN2+ the punch biopsy is primarily used to confirm the diagnosis of a high-grade abnormality, thereby reducing the number of unnecessary treatments and the associated morbidity.9,10 The punch biopsy also plays a role in the management of women undergoing ablative treatment for CIN because pretreatment biopsies are required to exclude invasive disease.11,12

Despite its widespread use, there is increasing concern over the accuracy of the colposcopically directed punch biopsy to diagnose the presence or absence of high-grade CIN. A Norwegian study has recently shown that of 520 women whose colposcopy-directed biopsies were reported as negative, 124 women (23.8%) were found to have CIN2+ in a follow-up biopsy.13 Over the past five decades, many studies have attempted to compare the histological diagnosis obtained from a punch biopsy with a reference standard diagnosis obtained from an excisional biopsy. Results were highly variable. Recent reports have revealed a lower sensitivity of colposcopy and colposcopy-based biopsies than was generally expected previously and have raised considerable concerns about the probability of missed CIN2+.14 Various reasons have been proposed to explain this low sensitivity, including insufficient experience of the colposcopist, inability to target the abnormal area with the biopsy forceps and the occurrence of lesions that are not visible on colposcopy. To quantify the ability of colposcopy-based punch biopsies to diagnose the presence or absence of cervical precancer, a systematic review and meta-analysis was conducted.

Methods

Outcome measures

The aim of the review was to assess the sensitivity and specificity of colposcopically targeted punch biopsies of the uterine cervix to detect CIN2, CIN3, adenocarcinoma in situ or cervical cancer using subsequent histological assessment of excision biopsies (large loop excision of the transformation zone, laser or cold knife conisation) or hysterectomy specimen as the reference standard.

Retrieval of studies and data extraction

Relevant references were searched from MEDLINE (1966–2011), EMBASE (1980–2011) and the Cochrane Library. The last search took place in April 2011. The key words used for the search were ‘Cervix’, ‘Cervical’, ‘Biopsy’, ‘Colposcopy’, ‘Punch Biopsy’. Literature retrieval was completed by hand searching of reference lists from selected articles and reviews, conference abstracts and contact with experts. Reports were selected if they provided the cross-tables of the histological results of a cervical punch biopsy and of the subsequent reference standard. The grades of CIN were used for the histological classification of epithelial lesions of the cervix.15

Data extraction from the selected reports was performed by two independent readers (MU and DA) who compiled the cross-tables with histological results of punch biopsy versus the reference standard. The QUADAS tool was used to assess the quality of each study included in the review.16

Statistical analysis

From the cross-tables, the numbers of true-positive, false-positive, true-negative and false-negative cases were calculated, considering two cut-offs of test positivity for the punch biopsy (CIN1+, CIN2+) and two thresholds of disease outcome assessed by the reference standard (CIN2+ and CIN3+). Adenocarcinoma in situ was included in the CIN3+ category. For each test cut-off/outcome combination, the sensitivity, specificity and predictive values were computed. Random-effects models were used for pooling of the sensitivity and specificity separately.17 Inter-study heterogeneity in accuracy parameters was assessed with Cochran’s Q test.18
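The collapse of a graded cross-table into true and false positives and negatives can be illustrated with a minimal Python sketch. The grade labels, the function names `collapse` and `accuracy`, and all counts in `example_table` are hypothetical and chosen only for illustration; they are not data from any study in the review.

```python
# Ordered histological grades; a higher index means a more severe result.
# Adenocarcinoma in situ is assumed to be counted under "CIN3+".
GRADES = ["normal", "CIN1", "CIN2", "CIN3+"]


def collapse(cross_table, test_cutoff, disease_threshold):
    """Collapse a punch-by-reference grade table into (TP, FP, FN, TN).

    cross_table[punch_grade][reference_grade] is a number of women.
    test_cutoff is the lowest punch grade counted as test positive,
    disease_threshold the lowest reference grade counted as diseased.
    """
    t = GRADES.index(test_cutoff)
    d = GRADES.index(disease_threshold)
    tp = fp = fn = tn = 0
    for punch, row in cross_table.items():
        for ref, n in row.items():
            test_pos = GRADES.index(punch) >= t
            diseased = GRADES.index(ref) >= d
            if test_pos and diseased:
                tp += n
            elif test_pos:
                fp += n
            elif diseased:
                fn += n
            else:
                tn += n
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity and predictive values from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts (250 women), for illustration only.
example_table = {
    "normal": {"normal": 30, "CIN1": 10, "CIN2": 6, "CIN3+": 4},
    "CIN1": {"normal": 20, "CIN1": 25, "CIN2": 10, "CIN3+": 5},
    "CIN2": {"normal": 5, "CIN1": 10, "CIN2": 30, "CIN3+": 15},
    "CIN3+": {"normal": 2, "CIN1": 3, "CIN2": 15, "CIN3+": 60},
}

for cutoff in ("CIN1", "CIN2"):
    counts = collapse(example_table, cutoff, "CIN2")
    print(cutoff, counts, accuracy(*counts))
```

In this made-up table, raising the punch cut-off from CIN1+ to CIN2+ lowers the sensitivity and raises the specificity, which is the trade-off the two cut-offs are meant to expose.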

Metandi (a procedure of Stata, version 10.1; Stata Corp., College Station, TX, USA) was used for the joint computation of the absolute sensitivity and specificity, which integrates both the bivariate model19 and the hierarchical summary receiver operating characteristic.20 This procedure incorporates the intrinsic negative correlation between the sensitivity and specificity and it allows for sparse data.20–22
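The separate random-effects pooling and Cochran's Q test described in this section can be sketched in Python as follows. This is not the bivariate or hierarchical model fitted by Metandi. It is a simpler univariate illustration that assumes a DerSimonian-Laird estimator on the logit scale with a 0.5 continuity correction; the text does not state which estimator or transformation was used, so both are assumptions. The function name `pool_random_effects` and the example counts are hypothetical.

```python
import math


def pool_random_effects(events, totals):
    """Pool proportions with a DerSimonian-Laird random-effects model.

    events[i] / totals[i] is one study's proportion (for sensitivity:
    true positives / all diseased). Returns the pooled proportion, its
    95% CI limits, Cochran's Q and the degrees of freedom.
    """
    y, v = [], []
    for e, n in zip(events, totals):
        # Assumed 0.5 continuity correction, applied to every study so
        # that the logit stays finite when e == 0 or e == n.
        pos, neg = e + 0.5, (n - e) + 0.5
        y.append(math.log(pos / neg))      # logit of the proportion
        v.append(1.0 / pos + 1.0 / neg)    # approximate variance of the logit
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se), q, df


# Hypothetical true-positive counts and diseased totals for three studies.
pooled, lower, upper, q, df = pool_random_effects([90, 30, 60], [100, 100, 100])
print(f"pooled={pooled:.3f} (95% CI {lower:.3f}-{upper:.3f}), Q={q:.1f}, df={df}")
```

Under the null hypothesis of homogeneity, Q is compared with a chi-squared distribution with `df` degrees of freedom to obtain the heterogeneity P value. Pooling sensitivity and specificity separately in this way ignores their correlation across studies, which is the limitation the joint model addresses.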

Results

Inclusion of studies

A total of 211 studies were identified, of which 141 were excluded at the initial screening because of inappropriate study design, or because the population, test or reference standard criteria were not met. The remaining 70 papers were collected and reviewed further. Of these, three were discarded because they were a review article or letter, three were discarded because they did not include histological outcomes, seven were discarded for lack of adequate description of the histology, 20 were discarded because of the inability to create cross-tables and five were discarded because they had no reference standard or index test. Figure S1 (see Supplementary material) shows the PRISMA flow chart detailing the process for inclusion and exclusion of reports. Finally, 32 papers,6,8,23–53 published between 1969 and 2011, could be included in the systematic review, comprising 10 598 women (see Supplementary material, Table S1).

Study characteristics

All papers commented on the reason for referral to colposcopy but only four stated that the women had not undergone previous treatment to the cervix.8,23,36,50 In three studies the study population was confined to either low-grade cytological abnormalities8,52 (borderline/mild dyskaryosis; atypical squamous cells of unknown significance/low-grade squamous intra-epithelial lesion) or high-grade cytological abnormalities50 (moderate/severe dyskaryosis; high-grade squamous intra-epithelial lesion).

The majority of papers were retrospective reviews of the outcome of biopsies taken in colposcopy clinics, although in many cases there was no clear description of the study design. In only four studies was the punch biopsy taken immediately before the definitive biopsy.8,35,50,53 The total number of women included in the studies was 10 598; however, there were only 7873 paired punch/definitive histology biopsy results reported. The majority of the papers that had equal numbers of women recruited and completing the study were retrospective reviews of data. Most of the studies automatically excluded women from receiving a punch biopsy if obvious cervical cancer was encountered at colposcopy, so that they could receive more definitive treatment quickly.

Accuracy of punch biopsies

Test cut-off of punch biopsies at CIN1+

The pooled sensitivity for a punch biopsy defined as test cut-off CIN1+ to diagnose CIN2+ disease was 91.3% (95% CI 85.3–94.9%) and the specificity was 24.6% (95% CI 16.0–35.9%) (Figure 1; Table 1). Tests for heterogeneity were significant, P = 0.001 for sensitivity and P = 0.041 for specificity. The reported sensitivities ranged from 55.9 to 100.0%. Studies that reported 100% sensitivity were retrospective in design.25,28,29 Three of the four prospective studies with immediate excision biopsies were at the upper end of the specificity spectrum (67.5–79.4%), but showed lower sensitivity (55.9–82.8%).8,35,53 Four studies reported specificities of <5% but sensitivities between 87.4 and 99.2%.34,36,39,40

Figure 1. Accuracy of punch biopsies (CIN1+) to detect CIN2+. n = 25 studies. (A) Sensitivity; (B) specificity.

Table 1.   Pooled sensitivity and specificity (%, 95% CI in brackets) of punch biopsies at cut-offs CIN1+ and CIN2+ for detection of underlying CIN2+ and CIN3+

Cut-off   Outcome CIN2+                          Outcome CIN3+
          Sensitivity        Specificity         Sensitivity        Specificity
CIN1+     91.3 (85.3–94.9)   24.6 (16.0–35.9)    91.1 (83.7–95.4)   18.2 (11.3–27.9)
CIN2+     80.1 (73.2–85.6)   63.4 (50.9–76.7)    83.6 (74.9–89.8)   44.5 (34.3–55.2)

When using the disease threshold of CIN3+, the sensitivity was similar (91.1%; 95% CI 83.7–95.4%) but a substantially lower specificity was noted (18.2%; 95% CI 11.3–27.9%) (Figure 2).

Figure 2. Accuracy of punch biopsies (CIN1+) to detect CIN3+. n = 22 studies. (A) Sensitivity; (B) specificity.

Test cut-off of punch biopsies at CIN2+

At a punch biopsy cut-off of CIN2+, the sensitivity fell to 80.1% (95% CI 73.2–85.6%) for the outcome CIN2+ and 83.6% (95% CI 74.9–89.8%) for the outcome CIN3+, whereas the specificity rose considerably, to 63.4% (95% CI 50.9–76.7%) and 44.5% (95% CI 34.3–55.2%), for disease thresholds of CIN2+ and CIN3+, respectively (Figure 3). The range of values for sensitivity and specificity remained wide for both CIN2+ and CIN3+ cut-offs and inter-study heterogeneity was significant (P < 0.001).

Figure 3. Summary receiver operating characteristic curves representing the sensitivity and specificity of punch biopsies for high-grade CIN. (A) Accuracy of punch biopsies (CIN1+) to detect CIN2+; (B) accuracy of punch biopsies (CIN1+) to detect CIN3+; (C) accuracy of punch biopsies (CIN2+) to detect CIN2+; (D) accuracy of punch biopsies (CIN2+) to detect CIN3+.

Sensitivity analysis

To investigate the accuracy of the punch biopsy according to the proportion of positive punch biopsies in each study, an analysis was performed to determine test accuracy (at cut-off CIN1+ to detect CIN2+) in studies where the rate of positive punch biopsies was <95%, <90%, <80% and <70% (Figure 4). The pooled sensitivity decreased when only studies with <70% of positive punch biopsies were included in the meta-analysis compared with all studies, from 94.6% (95% CI 90.5–97.0%) to 80.6% (95% CI 73.7–86.0%), whereas the specificities increased, from 25.1% (95% CI 16.4–36.5%) to 59.4% (95% CI 45.8–71.7%) (Table 2). Pooling of the four studies where the excision biopsy was performed immediately after the punch biopsy yielded a sensitivity of 81.4% (95% CI 77.6–85.1%, heterogeneity P = 0.38), and specificity of 63.3% (95% CI 49.2–77.4%, heterogeneity P = 0.004) (Figure 5). The average test positivity rate in these studies was 63.3% (95% CI 49.2–77.4%).

Figure 4. Accuracy of punch biopsies (at cut-off of CIN1+) to detect underlying CIN2+ including studies where the rate of positive punch biopsies is (A) <95%, (B) <90%, (C) <80% and (D) <70%.

Table 2.   Pooled sensitivity and specificity (%, 95% CI in brackets) of punch biopsies at cut-off CIN1+ to diagnose CIN2+, including studies according to the positivity rate of the punch biopsies

Inclusion restriction    No. of studies   Sensitivity        Specificity
All studies              25               94.6 (90.5–97.0)   25.1 (16.4–36.5)
<95% test positivity     17               88.0 (83.3–91.4)   40.1 (29.8–51.4)
<90% test positivity     15               85.0 (81.9–87.6)   43.9 (34.0–54.3)
<80% test positivity     11               82.9 (79.9–85.6)   50.4 (40.6–60.1)
<70% test positivity     5                80.6 (73.7–86.0)   59.4 (45.8–71.7)

Figure 5. Sensitivity and specificity of punch biopsies at CIN1+ to detect CIN2+ in four studies where punch and subsequent excision biopsy were performed in one operation.

Similar results were observed for the other test cut-off and disease outcome combinations (see Supplementary material, Appendix S1).
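The restriction step used in this sensitivity analysis, keeping only studies whose rate of positive punch biopsies falls below a given threshold, can be sketched in Python as follows. The study identifiers and counts are hypothetical, as are the helper names `positivity_rate` and `restrict`. The sketch shows only the selection of studies, not the pooling that would follow.

```python
def positivity_rate(study):
    """Share of women in a study whose punch biopsy was positive."""
    total = study["tp"] + study["fp"] + study["fn"] + study["tn"]
    return (study["tp"] + study["fp"]) / total


def restrict(studies, max_rate):
    """Keep studies with a test positivity rate strictly below max_rate."""
    return [s for s in studies if positivity_rate(s) < max_rate]


# Hypothetical studies of 100 women each, for illustration only.
studies = [
    {"id": "A", "tp": 95, "fp": 4, "fn": 1, "tn": 0},    # positivity 0.99
    {"id": "B", "tp": 70, "fp": 15, "fn": 5, "tn": 10},  # positivity 0.85
    {"id": "C", "tp": 40, "fp": 20, "fn": 10, "tn": 30}, # positivity 0.60
]

for threshold in (0.95, 0.90, 0.80, 0.70):
    kept = restrict(studies, threshold)
    print(f"<{threshold:.0%} test positivity:", [s["id"] for s in kept])
```

Each tighter threshold removes the studies in which nearly every verified woman had a positive punch biopsy, which are the studies most exposed to partial verification.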

Discussion

This is the first study to quantify the ability of colposcopy-based punch biopsies to diagnose the presence or absence of cervical pre-cancer by performing a systematic review and meta-analysis of the published medical literature. Attempting to quantify the ability of the punch biopsy to accurately identify the presence of high-grade CIN is an important task because it is its diagnostic ability that underpins its utility in clinical practice. Management decisions, whether to treat or monitor abnormal cervical cytology, are determined in many cases by a punch biopsy result and therefore an understanding and appreciation by the user of its error rate is essential in the aim of avoiding over-treatment or under-treatment. This is becoming increasingly important with the growing awareness of pregnancy-related morbidity associated with excisional biopsies.9,10,54

Limitations of the meta-analysis

The studies included in this meta-analysis span five decades, four continents and an evolution of colposcopic equipment and imaging. The referral pathway, number of punch biopsies taken and reference standard used varied, with the majority of studies being retrospective in nature comparing biopsies taken with a variable time interval. The comparison that will give the most valid picture between punch/definitive histology will be in those studies where the biopsies were performed contemporaneously because there is a theoretical risk that performing a punch biopsy will stimulate an immune response and therefore change the histology of the definitive biopsy.15 Only in four studies was the excision biopsy performed immediately after the punch biopsy.8,35,50,53 In the majority of the other studies the extent of the time delay between biopsies was not recorded; however, delays of over 12 weeks were noted in seven of the studies.6,24,31,32,41,45,51 It is therefore possible that any CIN present in the cervix following the punch biopsy may have started to regress spontaneously and the true extent of high-grade CIN has been under-reported resulting in a greater level of under call by the punch biopsy. The studies where the biopsies were taken contemporaneously reported a lower level of sensitivity compared with the studies with a time delay but a higher level of specificity for CIN1+ for a cut-off of CIN2+, sensitivity 66.7–82.8% versus 77.5–100.0% and specificity 41.2–79.4% versus 0.0–60.0%.

Another issue in the meta-analysis is the occurrence of verification bias caused by unbalanced partial verification with an over-representation of women with positive biopsies, which is a type of selection bias. Not offering an excisional biopsy to women with negative punch biopsies is considered good clinical practice; however, this results in an inflation of sensitivity and a reduction in specificity. Because of this selection, very few punch-negative cases are referred, thereby decreasing (by selection) the number of true negatives. Ultimately this results in very low observed specificity rates. The same type of selection avoids finding several false negatives because few punch-negative cases are referred for an excisional biopsy, thereby reducing the probability of finding false negatives. Because it is quite usual to apply excision to the majority of punch biopsies, the effect of partial verification is less prominent in sensitivity than in specificity. A more realistic estimate can be obtained by including only studies with a balanced verification, such as studies with test and reference standard applied in one operation or studies with a test positivity rate reflecting the true rate of positive punch biopsies in a given population. Another method to minimise the partial verification bias is to relax STARD criteria, proposing application of test and reference standard in a very short period of time, and to incorporate information derived from longer-term follow-up of women with negative punch biopsies.
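The direction of this bias can be shown with a small worked example in Python. The cohort below is hypothetical and is not data from the review. The sketch also assumes that the chance of being verified by excision depends only on the punch biopsy result, not on the true disease status, which is a simplification.

```python
def observed_accuracy(tp, fp, fn, tn, verify_pos=1.0, verify_neg=1.0):
    """Sensitivity and specificity seen among verified women only.

    verify_pos / verify_neg are the fractions of punch-positive and
    punch-negative women who go on to receive the reference standard.
    """
    tp_v, fp_v = tp * verify_pos, fp * verify_pos
    fn_v, tn_v = fn * verify_neg, tn * verify_neg
    sensitivity = tp_v / (tp_v + fn_v)
    specificity = tn_v / (tn_v + fp_v)
    return sensitivity, specificity


# Hypothetical cohort of 1000 women: true sensitivity 80%, specificity 60%.
tp, fp, fn, tn = 240, 280, 60, 420

# Every woman verified: the true accuracy is recovered.
full = observed_accuracy(tp, fp, fn, tn)

# All punch-positive women excised, but only 10% of punch-negative women.
partial = observed_accuracy(tp, fp, fn, tn, verify_pos=1.0, verify_neg=0.10)

print(f"full verification:    sens={full[0]:.3f} spec={full[1]:.3f}")
print(f"partial verification: sens={partial[0]:.3f} spec={partial[1]:.3f}")
```

With full verification the sketch returns the true values (0.800 and 0.600). With partial verification the observed sensitivity rises to 240/246 and the observed specificity falls to 42/322, because false negatives and true negatives are both removed from the verified sample while the positives are left intact.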

Potential factors influencing the accuracy of colposcopy-directed biopsies, for example the number of biopsies, cytological result before the biopsy, screening history, quality control of the interpretation of punch and excisional biopsies, the interval between punch and excision biopsy and study design issues (QUADAS items) were insufficiently reported in the majority of studies to allow for a comprehensive multivariate analysis.

The number of punch biopsies

Increasing the number of biopsies has been shown to increase the CIN3 detection rate,55 as does the taking of random biopsies from apparently normal cervical tissue.56,57 We have demonstrated that the pooled sensitivity for a single punch biopsy is 90%; if one or more punch biopsies are performed this increases to 93%, and if multiple biopsies were always performed the sensitivity would be in the order of 100% (intergroup heterogeneity P < 0.001). This supports the findings of Pretorius, who observed that colposcopy-targeted biopsies missed a proportion of high-grade CIN,57 but that some of the missed disease may be picked up by random biopsies. The TOMBOLA trial identified a false-negative rate associated with punch biopsy but concluded that it did not have an impact on the clinical outcome because missed cases would be picked up at the next round of screening.58 Missed disease might be an over-diagnosis of a small regressive focus of CIN2/3 and more research is needed to determine whether the high-grade CIN missed by directed biopsies but detected on random biopsies actually has a clinical impact on patient outcome. The answer to how many biopsies to perform will depend upon the level of sensitivity/specificity that the clinician and patient are prepared to accept, knowing the inherent failure rate of the punch biopsy and the limitations of colposcopy in diagnosing high-grade CIN.

Punch biopsy in low-grade abnormalities

Three studies included in this meta-analysis limited their populations to suspected high-grade or low-grade lesions, which will have altered their sensitivity/specificity rates because the prevalence of disease would not be the same as the unselected study populations. Two studies focused specifically on low-grade disease, one large study containing 492 women where the definitive biopsies were taken after a time interval and a smaller study containing 68 women where the biopsies were taken simultaneously. When using the punch biopsy to detect CIN3+ the sensitivity and specificity varied between the two studies, 87.1% versus 50% and 33.6% versus 96.7%. The negative predictive values for the detection of CIN3+ were 87.7% and 93.5%. Therefore we can conclude that in women with a suspected low-grade lesion on cytology and colposcopy the colposcopically directed punch biopsy has a moderate negative predictive value for CIN3 disease.

Colposcopy

When looking at the reason for missed high-grade CIN the question arises as to whether this is a function of colposcopy, either not identifying the abnormality or failing to sample an identified abnormal area. Developments in colposcopy and improvements in the images obtained are associated with improved CIN2+ detection rates,59 but apparently normal cervix, even under enhanced imaging, has been shown to contain CIN2+ disease in 25% of cases of women referred to colposcopy with any cytological abnormality or undergoing follow up for a CIN1 or CIN2 lesion.60 This implies that it is colposcopy rather than the punch biopsy itself that is the limiting factor for detecting CIN2+ because a targeted biopsy cannot be used to detect disease that is not visible.

Conclusion

In conclusion, the observed high sensitivity and low specificity of the colposcopy-directed punch biopsy for high-grade CIN might be a result of verification bias. The sensitivity looks high but is probably a spurious finding caused by the fact that most studies restricted excision mainly to women with a positive punch biopsy.

Disclosure of interests

None of the other authors have any interests to declare.

Contribution to authorship

CWE and RT conceived the idea for the study. MU, WP-A and DB-A performed the systematic review under the supervision of MA. Statistical analysis was performed by MA. EM, MU and MA wrote the manuscript and all authors approved the final version.

Details of ethics approval

None.

Funding

None.

Acknowledgements

MA received financial support from: (1) the European Commission (Directorate of SANCO, Luxembourg, Grand-Duchy of Luxembourg), through the ECCG project (European Cooperation on development and implementation of Cancer screening and prevention Guidelines, IARC, Lyon, France) and the 7th Framework Programme of DG Research through the PREHDICT project (grant No. 242061, coordinated by the Vrije Universiteit Amsterdam, the Netherlands); (2) the Belgian Foundation Against Cancer (Brussels, Belgium); and (3) FNRS (le Fonds national de la Recherche scientifique), through TELEVIE, Brussels, Belgium (ref 7.4.628.07.F).
