Use of human papillomavirus DNA testing to compare equivocal cervical cytologic interpretations in the United States, Scandinavia, and the United Kingdom
Human papillomavirus (HPV) DNA testing may be useful in clarifying equivocal cervical cytologic interpretations. One application might be to standardize the meaning of equivocal interpretations from laboratories in various regions. Because international differences may be particularly marked, international comparisons of emerging data will require clear translations of “equivocal” and similar terms.
To perform a three-country comparison, the authors selected a morphologically diverse set of 188 conventional Papanicolaou tests initially classified as “squamous atypia” from a study of more than 20,000 women in Portland, Oregon (1989–1990). Previously, five U.S. expert cytopathologists independently interpreted the slides with screening cytotechnologists' marks in place. For this comparison, one British and two Scandinavian reviewers involved in HPV research reviewed the slides after original marks had been removed. The authors compared all eight reviewers' classifications of negative, equivocal, or abnormal in a series of pairwise comparisons using the kappa statistic. They then compared cytologic interpretations with HPV DNA testing.
Oncogenic HPV DNA detection was significantly associated with increasingly abnormal interpretations for each reader. The British reader tended to rate tests as more abnormal than the American pathologists did, whereas the Scandinavians tended to rate tests as more normal. Reference to the HPV DNA standard clarified the tendency of readers to render systematically more or less severe interpretations. For example, the Scandinavian cytologists discounted subtle (often HPV-associated) changes in favor of cytologic certainty, making HPV triage of equivocal tests less applicable there.
International research on cytopathology, particularly on the possible uses of HPV DNA testing, will require calibration of local cytologic definitions. Cancer (Cancer Cytopathol) 2002;96:14–20. © 2002 American Cancer Society. DOI 10.1002/cncr.10317
Equivocal cervical cytologic interpretations represent an extremely common, costly, and labor intensive problem for many cervical carcinoma prevention programs. Consequently, large and intensive research efforts are underway in several countries to increase the accuracy of programs by incorporation of testing for the etiologic agent of cervical carcinoma, human papillomavirus (HPV).1–6 There is already substantial evidence that HPV testing can be used to manage equivocal cervical cytology.7–12
However, cytologic diagnostic practices are known to vary between regions and countries. Interpreting data on the efficacy of different management strategies for equivocal cytology, and building a coherent understanding of those data, requires a clear, shared sense of how the biologic and clinical associations of these interpretations vary between and within countries.10, 12, 13
Efforts to develop precise criteria for equivocal cytologic interpretations that would promote strong interobserver agreement have failed. In the U.S., the category atypical squamous cells of undetermined significance (ASCUS) is largely an interpretation of exclusion, applied when a cytopathologist is unable to classify a specimen either as completely normal or as a definite squamous intraepithelial lesion (SIL).14, 15 This assessment is highly subjective.16 In a formal interobserver reproducibility study of 200 slides initially classified as “squamous atypia,” there was not a single slide in which the interpretation of ASCUS was unanimously confirmed, demonstrating that this interpretation is highly subjective and irreproducible.17
Interobserver agreement between individual pathologists working in different countries is expected to vary even more than agreement among pathologists within a country, because inherent interpretative variation is compounded by differences in terminology and clinical practices. However, a critical but understudied question is whether there are systematic biases in interpretation between countries that could affect international comparisons of research results. For example, if thresholds for ASCUS and SIL are systematically higher or lower in different countries, the performance of ancillary triage methods in these different clinical environments will also likely vary. In the U.S., the ASCUS LSIL Triage Study found that HPV DNA testing represents a viable option for colposcopy triage of ASCUS if proven cost-effective.13 The implications for Europe might be quite heterogeneous depending on how the threshold for equivocal interpretations varies among European countries. Comparing cytologic interpretations from different, distant laboratories to an objective, reproducible HPV DNA standard might provide a useful means of assessing relative diagnostic practices.
The U.S., Scandinavia, and the U.K. are three areas where investigators are actively trying to improve cervical carcinoma prevention via the use of HPV DNA testing. The investigators conducted this interlaboratory project to investigate whether HPV DNA testing could help them translate their findings freely without concern about miscommunication based on differences in the utilization of cytologic terms.
MATERIALS AND METHODS
The study slides were collected in 1989–1990 in cervical cytology screening clinics at Portland Kaiser Permanente, as part of the enrollment phase of a large natural history study.17–19 Kaiser Permanente had not yet adopted the Bethesda System and still was using a modified dysplasia scale for cytologic interpretations. To cover the range of equivocal interpretations, we selected 200 conventional Papanicolaou (Pap) smears using stratified random sampling from computer records. The Pap smears represented minimal abnormalities: 113 originally were classified as “severe reactive atypia, possible dysplasia” (of 492 such interpretations among 22,564 total cohort enrollment slides), and the remaining 87 were chosen from the large (n = 4015), adjacent category of “mild reactive atypia.” We did not include more clearly negative or abnormal (e.g., CIN 1) smears. Twelve slides could not be located or were broken before a meaningful number of reviews could be completed, leaving a study set of 188 slides reviewed by at least four pathologists.
U.S. Cytopathology Review
All pathology reviewers in the U.S. and Europe were masked to each other's interpretations and the HPV DNA data. The Kaiser Permanente cytopathology team (headed by D.R.S.) rescreened and reinterpreted available slides, using prepared data collection forms. Each pathologist was asked only to distinguish among negative, ASCUS, and SIL (with no distinction retained between low-grade SIL and high-grade SIL). Marks placed by screening cytotechnologists to indicate the location of possibly abnormal cells were retained as the slide set was shipped to four additional experts (M.E.S., R.J.K., N.B.K., and M.H.S.). As previously reported,17 we observed only mediocre agreement between the American experts, with poor agreement particularly with regard to the interpretation of ASCUS. However, there was a strong relation between collective certainty of SIL and the detection of HPV DNA in concurrently collected specimens as performed by consensus primer polymerase chain reaction (PCR) (M.M.M.) or other sensitive HPV DNA assays.
Specifically, we created a SIL “certainty score” for each slide by adding the scores of each of the five pathologists. Each interpretation of definite SIL equaled 1.0, ASCUS equaled 0.5, and negative equaled 0. Oncogenic HPV DNA was detected in 10 (83.3%) of 12 women with slides that received unanimous interpretations of SIL (certainty scores of 5 in total). Conversely, slides unanimously interpreted as negative (certainty scores of 0 in total) were associated with a low prevalence of oncogenic HPV DNA (7.7% of 39) like the cytologically normal women in the larger Portland study cohort.18 To create a composite, three-category U.S. interpretation analogous to negative vs. ASCUS vs. SIL, we divided the certainty scores into three categories as follows: negative (certainty score of 0.0–1.0), equivocal (certainty score of 1.5–3.0), SIL (certainty score of 3.5–5.0). We were able to compute the trichotomous U.S. certainty scale for 186 tests that all 5 pathologists considered adequate for interpretation.
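The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the text; the function names and the label strings are assumptions introduced here, not part of the original analysis.

```python
# Per-reader scores as stated in the text: SIL = 1.0, ASCUS = 0.5, negative = 0.
SCORES = {"negative": 0.0, "ASCUS": 0.5, "SIL": 1.0}


def certainty_score(interpretations):
    """Sum the scores of the individual readers for one slide."""
    return sum(SCORES[label] for label in interpretations)


def us_category(score):
    """Map a five-reader certainty score onto the trichotomous U.S. scale.

    Cut points follow the text: 0.0-1.0 negative, 1.5-3.0 equivocal,
    3.5-5.0 SIL. Scores are multiples of 0.5, so no value falls between bands.
    """
    if score <= 1.0:
        return "negative"
    if score <= 3.0:
        return "equivocal"
    return "SIL"


# Hypothetical slide read by five pathologists.
example = ["SIL", "ASCUS", "ASCUS", "negative", "SIL"]
score = certainty_score(example)  # 1.0 + 0.5 + 0.5 + 0.0 + 1.0 = 3.0
print(score, us_category(score))
```

A slide called SIL by all five readers scores 5.0 and one called negative by all five scores 0.0, which matches the "unanimous" extremes described above.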
European Cytopathology Reviews
The European comparison was conducted on the same slides after an intermediate study involving removal of the cytotechnologists' screening marks.20 The slide set was shipped first to the U.K. for screening and review by a British reader (P.M.). In Scandinavia, independent reviews by B.H. and A.H. incorporated rescreening by a single cytotechnologist. The British review was performed using local “dyskaryosis” terms, which then were translated into the three-category scale of negative (negative, negative but inflamed), equivocal (borderline dyskaryosis, possible wart virus infection), and SIL (dyskaryosis with or without wart virus infection). The Scandinavian reviewers used the CIN cytologic diagnostic system21, 22 as currently used in Sweden. For comparability with the Bethesda system, the Scandinavian interpretations then were translated into normal (negative; inflammatory), equivocal (ASCUS; atypia), and SIL (all grades of CIN with or without koilocytosis). Cytologic evidence of HPV infection alone is considered a form of “normal” cytology in Scandinavia.
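The translation of local terms into the shared three-category scale amounts to a lookup table, sketched below. The exact label strings are assumptions made for illustration (the original data collection forms are not reproduced in the text), as is the reading of “borderline dyskaryosis” and “possible wart virus infection” as two separate British labels.

```python
# Hypothetical label strings; only the grouping into three categories
# comes from the text.
THREE_CATEGORY = {
    "UK": {
        "negative": "negative",
        "negative but inflamed": "negative",
        "borderline dyskaryosis": "equivocal",
        "possible wart virus infection": "equivocal",
        "dyskaryosis": "SIL",
        "dyskaryosis with wart virus infection": "SIL",
    },
    "Scandinavia": {
        "negative": "negative",
        "inflammatory": "negative",
        # HPV effect alone counts as normal cytology in Scandinavia.
        "hpv effect only": "negative",
        "ascus": "equivocal",
        "atypia": "equivocal",
        "cin": "SIL",
        "cin with koilocytosis": "SIL",
    },
}


def translate(system, term):
    """Return the three-category equivalent of a local cytologic term."""
    return THREE_CATEGORY[system][term.strip().lower()]


print(translate("UK", "Borderline dyskaryosis"))
print(translate("Scandinavia", "HPV effect only"))
```

Keeping the mapping explicit makes the key asymmetry visible: the same HPV-associated change can land in "equivocal" under the British terms and in "negative" under the Scandinavian ones.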
HPV DNA Testing
HPV DNA testing methods are described in earlier publications.17, 23 We used three different methods, all of which generated the same conclusions. Here, we present the consensus primer PCR test results for oncogenic types of HPV. At the time of the original study, only probes for HPV-16, -18, -31, -33, -35, -39, -45, -51, -52, and -58 were used, somewhat limiting the assay's clinical sensitivity. As shown in previous work, low-risk types of HPV are not related to cytologic certainty and are not useful for management of equivocal cytologic interpretations.17
Statistical Analysis
To assess international differences in cytopathology interpretations, we compared the two Scandinavian interpretations and the one British interpretation with the trichotomous U.S. certainty scale. In addition, we compared the European reviewers with each of the American pathologists individually. The comparisons were performed using the unweighted kappa statistic, which adjusts for chance agreement between reviewers. Interpretations of the kappa statistic vary, but roughly a kappa value of < 0.0 represents disagreement (like a negative correlation), a value of > 0–0.2 represents slight agreement, > 0.2–0.4 is fair agreement, > 0.4–0.6 indicates moderate agreement, > 0.6–0.8 shows substantial agreement, and > 0.8–1.0 is nearly perfect agreement.24, 25
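For readers unfamiliar with the statistic, the following is a minimal sketch of unweighted Cohen's kappa for two readers, together with the rough verbal bands quoted above. The function names and the example ratings are hypothetical; the study's own software is not described in the text.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Unweighted Cohen's kappa for two readers' labels on the same slides."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must rate the same non-empty slide set")
    n = len(reader_a)
    # Observed proportion of slides on which the two readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance, from each reader's marginal frequencies.
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both readers use one category")
    return (observed - expected) / (1.0 - expected)


def describe(kappa):
    """Verbal band for a kappa value, using the rough cut points in the text.

    The text leaves a value of exactly 0 unassigned; it is grouped with
    "slight" here as a simplifying assumption.
    """
    if kappa < 0:
        return "disagreement"
    for upper, label in [(0.2, "slight"), (0.4, "fair"),
                         (0.6, "moderate"), (0.8, "substantial")]:
        if kappa <= upper:
            return label
    return "nearly perfect"


# Hypothetical ratings of six slides by two readers.
a = ["negative", "negative", "equivocal", "SIL", "SIL", "negative"]
b = ["negative", "equivocal", "equivocal", "SIL", "negative", "negative"]
k = cohen_kappa(a, b)  # observed 4/6, expected 13/36, kappa = 11/23
print(round(k, 3), describe(k))
```

In this toy example the readers agree on four of six slides, but about a third of that agreement is expected by chance, so kappa is only moderate.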
We also computed the asymmetry chi-square for each interreader comparison, which indicates whether one reviewer reads systematically higher (more severe) or lower than the other. Finally, we compared the pathology results to the HPV test results as an independent point of comparison, using contingency table methods (i.e., independent proportions and related chi-square tests).24
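The text does not give the formula for the asymmetry chi-square. One common choice for a square agreement table is Bowker's test of symmetry (the multi-category extension of McNemar's test), sketched below on the assumption that something similar was used. The example counts are the denominators from panel C of Table 3 (U.S. consensus in rows, British reader in columns), which include only the women shown in that table; the output illustrates the calculation and is not a reproduction of Table 2.

```python
import math


def bowker_symmetry(table):
    """Bowker's symmetry statistic and degrees of freedom for a square table."""
    k = len(table)
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            if total > 0:  # empty off-diagonal pairs contribute nothing
                stat += (table[i][j] - table[j][i]) ** 2 / total
                df += 1
    return stat, df


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1, 2, 3 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("only df 1 to 3 are supported in this sketch")


# Rows: U.S. negative, equivocal, SIL. Columns: U.K. negative, equivocal, SIL.
us_vs_uk = [
    [57, 16, 12],
    [18, 12, 19],
    [3, 3, 41],
]
stat, df = bowker_symmetry(us_vs_uk)
p_value = chi2_sf(stat, df)
print(round(stat, 2), df, p_value)
```

Counts above the diagonal are slides the column reader called more severe than the row reader, so an excess there (here 12 vs. 3 and 19 vs. 3) points to the column reader reading systematically higher.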
RESULTS
The mean age of the 188 women in the study was 34 years (median, 32 years; range, 16–71 years). Seventy-eight of the tests had an original interpretation of benign reactive atypia, and 110 were called severe reactive atypia, possibly dysplasia.
Table 1 shows the trichotomous interpretation for each of the eight reviewers and the U.S. certainty score. Systematic international differences were noted. The Scandinavian reviewers tended toward more normal and fewer equivocal and SIL interpretations than the American or British readers. Among the American reviewers, pathologist 1 was most likely to call tests normal, whereas pathologist 5 was least likely. The interpretation from each reviewer was compared with HPV DNA test results. For all reviewers, the percentage of oncogenic HPV DNA detection clearly increased with severity of interpretation. However, the tendency of Scandinavian reviewers to render more normal interpretations was reflected in the corresponding HPV DNA positivity percentages; that is, many women with tests interpreted as normal were HPV DNA positive. The U.S. interpretations were most highly associated with oncogenic HPV DNA, as indicated by individual chi-square tests for trend (data not shown). Of peripheral interest, low-risk HPV types were found commonly in the cases of definite SIL without oncogenic types but also were commonly found with ASCUS or negative interpretations (data not shown).17
Table 1. Eight Independent Pathology Reviews and Associated Oncogenic HPV DNA Prevalence
|U.S. certainty scale||“Normal”||85||46.7||12.9|
Table 2 summarizes the pairwise comparisons of each reviewer with every other reviewer, giving the kappa statistic and the asymmetry chi-square P value, as well as the general conclusion from each statistical comparison. The U.S. consensus scale was composed of the individual pathology readings; thus, it was not compared with the individual U.S. readers except to indicate that U.S. pathologist 1 tended toward less severe interpretations whereas U.S. pathologist 5 tended toward more severe interpretations than the U.S. consensus.
Table 2. Comparisons of Interpretations between Reviewers, Including Kappa Statistics and Asymmetry Chi-Square Tests
The one British reviewer tended systematically toward more severe interpretations compared with the U.S. pathologists. Conversely, the Scandinavian reviewers both gave significantly less severe interpretations than either the U.S. or British readers. To summarize the overall order of severity of review, the ranking appeared to be U.K., U.S. 5, (U.S. 2, 3, 4 as a group), U.S. 1, Scandinavian 2, and Scandinavian 1.
As shown in Table 3, HPV test results were used to clarify the three comparisons between the trichotomous U.S. certainty scale and the European reviewers. These results showed that the Scandinavian reviewers' tendencies toward lower interpretations reflected a systematic shift relative to the HPV DNA tests. Specifically, many HPV-associated tests were called negative rather than equivocal or SIL. This would leave virtually no equivocal cases to triage using HPV DNA testing. SIL was called only rarely but was highly likely to be HPV DNA–associated, again ruling out a possible use of HPV DNA triage. Direct comparisons of the Scandinavian reviewers with each of the U.S. reviewers individually yielded the same conclusion.
Table 3. HPV DNA Test Results (Percentage Oncogenic DNA Positive), Stratified by U.S. Consensus Cytologic Interpretations and European Interpretations
|A. U.S. consensus vs. first Scandinavian pathologist|
|Scand. 1 negative (%)||Scand. 1 equivocal (%)||Scand. 1 SIL (%)|
|U.S. negative||7/73 (9.6)||3/7 (42.9)||1/1 (100.0)|
|U.S. equivocal||18/36 (50.0)||3/7 (42.9)||2/3 (66.7)|
|U.S. SIL||20/30 (66.7)||3/5 (60.0)||10/11 (90.9)|
|B. U.S. consensus vs. second Scandinavian pathologist|
|Scand. 2 negative (%)||Scand. 2 equivocal (%)||Scand. 2 SIL (%)|
|U.S. negative||8/73 (11.0)||0/4 (0.0)||3/6 (50.0)|
|U.S. equivocal||17/39 (43.6)||2/4 (50.0)||5/6 (83.3)|
|U.S. SIL||14/19 (73.7)||3/3 (100.0)||18/26 (69.2)|
|C. U.S. consensus vs. British reader|
|U.K. negative (%)||U.K. equivocal (%)||U.K. SIL (%)|
|U.S. negative||7/57 (12.3)||1/16 (6.3)||3/12 (25.0)|
|U.S. equivocal||7/18 (38.9)||6/12 (50.0)||11/19 (57.9)|
|U.S. SIL||3/3 (100.0)||2/3 (66.7)||29/41 (70.7)|
Although the difference was less striking, the British reader tended to give more severe interpretations than the U.S. consensus score, whether or not oncogenic HPV DNA was detected.
DISCUSSION
This analysis suggests that there may be systematic differences in the interpretation of equivocal cervical cytology between academic readers in the U.S., Scandinavia, and the U.K., relative to an independent HPV DNA test standard. The sharing of cytotechnologists' markings among the U.S. reviewers could partly explain their similarity but is unlikely to explain the U.S. group tendency toward intermediate interpretations, i.e., more severe than the Scandinavians and less severe than the U.K. reader. It would be wrong to generalize too much from the examples of a few readers from each country. We believe, however, that some noteworthy points can be made.
In Scandinavia, there is an emphasis on cytologic specificity and Scandinavian pathologists tend to limit abnormal interpretations to slides with definite findings while using the equivocal category sparingly. Our HPV DNA test data confirmed that Scandinavian pathologists tend to undercall subtle HPV-associated lesions compared with the U.S. and U.K. experts. Conversely, when the Scandinavian pathologists did interpret a slide to be abnormal, their opinion was commonly corroborated by the other readers and by HPV testing.
There are advantages and disadvantages to the Scandinavian approach of classifying tests with HPV effect but no unequivocal nuclear changes as normal. It addresses concerns regarding the economic and public health impact of overdiagnosis and builds on the Scandinavian success in achieving high rates of repeated screening. In the U.S., sensitivity is paramount because of the emphasis on individual safety as opposed to public health and the concern over litigation if a precancerous lesion is missed and/or a woman is lost to follow-up.26
The apparent emphasis on specificity in Scandinavia suggests strongly that use of HPV DNA testing for triaging mildly abnormal tests would be less cost-effective there because the specificity of diagnosis is already high (like LSIL in the U.S.27). This further suggests that introduction of secondary HPV screening will require estimation of cost efficiency for each country and setting, using relevant cytologic collections. As a note of caution, the current evaluation slide set from Portland, Oregon was useful in comparing the utilization of terminology but might not suffice in representing all aspects of the Scandinavian screening effort. For example, a population-based study of HPV DNA presence in the Swedish screening program indicated that HPV DNA testing might be useful in triaging particularly among women age > 40 years.28
The British reader participating in our study was much more like his American counterparts, although he tended to classify more tests as SIL. If representative of British readers as a whole, this finding would suggest that studies of HPV triage of equivocal cytology in the U.S. and the U.K. might have broadly similar implications for clinical practice in both countries.
Replication of results in multiple settings is central to good epidemiologic research and the development of sound public health policies. Hopefully, in some settings it may be possible to avoid costly clinical management studies of equivocal cytology when large-scale and costly randomized studies already have been performed elsewhere. It would be much less expensive and time-consuming for public health planners in different countries to perform good population-based studies establishing the comparability of different cytologic terms than to replicate large clinical trials.
In this regard, we observed that the prevalence of oncogenic HPV DNA types was elevated in equivocal cytologic diagnoses from all eight reviewers in the U.S., Scandinavia, and the U.K. On the basis of the data, however, the performance of HPV DNA testing would differ internationally. HPV DNA triage of equivocal cytologic interpretations might perform roughly the same in the U.S. and the U.K. In contrast, HPV DNA testing might be less cost-effective in Scandinavia, where pathologists use analogous cytologic terminology, but with an emphasis on specificity that decreases the need for a secondary triage test. It will be interesting to distribute this and other slide sets to additional readers in other countries.
The authors thank the collaborators at Kaiser Permanente in Portland, Oregon (particularly Andrew G. Glass, Brenda B. Rush, and Patti Lawler) for their exceptional long-term dedication to the project.