Measuring sensitivity in gynecologic cytology: A review

Authors

  • Andrew A. Renshaw M.D.

    Corresponding author
    1. Department of Pathology, Baptist Hospital of Miami, Miami, Florida
    • Department of Pathology, Baptist Hospital of Miami, 8900 N. Kendall Dr., Miami, FL 33176
    • Fax: (305) 598-5986


Abstract

BACKGROUND

The sensitivity of gynecologic cytology has been measured in several different ways. The current review summarizes the major sources of bias and the results of these efforts.

METHODS

In the current study, a review of the literature was conducted.

RESULTS

The major sources of bias in measuring the sensitivity of gynecologic cytology are a lack of reproducibility, bias in the review process, bias in case selection, selection and correction of the gold standard, and the value of surrogate markers. Despite these sources of variation, the sensitivity of the Papanicolaou (Pap) smear is relatively stable, ranging from 50–75% if a single consistent threshold is used, to up to 94% if either AutoPap-directed rescreening or thin-layer methods are used to diagnose high-grade squamous intraepithelial lesions using a threshold of atypical squamous cells of undetermined significance. Methods for the routine evaluation of sensitivity currently are not available and may be difficult to devise.

CONCLUSIONS

The sensitivity of the Pap smear in study situations is well known. Whether these results reflect performance in real life is not known, and methods to compare performance in real life are not available. Cancer (Cancer Cytopathol) 2002;96:000–000. © 2002 American Cancer Society.

Sensitivity in gynecologic cytology, one of the most important measures of this test, routinely is evaluated in several different ways. Laboratory directors evaluate the sensitivity of those individuals working in their laboratories, clinicians (or, more often, insurance companies) choose which laboratory to send their work to, and experts evaluate the sensitivity of laboratories and individuals in the courts on an all-too-frequent basis. However, sensitivity in gynecologic cytology is measured only rarely, that is, it is unusual to observe any but the most basic quantitative measures (measures that do not allow direct comparisons to be made) employed in these evaluations. There are several reasons for this. Accurate measurement of sensitivity is hard to achieve, requires significant time and effort, is not reimbursed, and is difficult to justify. In addition, at the current time there appears to be little desire to measure sensitivity accurately. The vast majority of laboratory directors believe that their own laboratories are performing satisfactorily, many laboratories already have significant reputations for sensitivity, and the majority of accurate measurements that have been made to date are not necessarily flattering. Nevertheless, the potential power of quantitative measures that can compare sensitivity accurately is tremendous. In particular, to determine whether new technologies actually are improving sensitivity, one needs to have quantitative measures of sensitivity that can be compared. Although technologies may be shown to perform well in study situations, this is no guarantee that these same tests will perform well in real-life laboratory situations, which are known to vary considerably. In addition, valid quantitative measures would restrict experts in the legal setting from expressing opinions that are contradictory to the known sensitivity of the test. Finally, quantitative measurements are a de facto requirement for cytopathology to be a science.

There are five major issues that most commonly cause disagreement when measuring sensitivity in gynecologic cytology: reproducibility, bias of the review, bias in case selection, selection of a gold standard and verification/correction of its accuracy, and the value of surrogate markers. The current review summarizes the literature regarding these five issues and presents what appear to be the best available measurements of sensitivity in gynecologic cytology. It is hoped that by presenting these data all together future efforts at measuring the sensitivity of the Papanicolaou (Pap) smear will be encouraged and some of the limitations of previously collected data will be avoided.

Reproducibility

The relatively poor reproducibility of the interpretation of Pap smears is well known. Using relatively small sets of cases, several studies have shown that an individual clinician is likely to agree exactly with their own previous diagnosis (intraobserver variability) approximately 78% of the time1 and to agree exactly with others (interobserver variability) only 28–72% of the time.1–7 Recent data from the ASCUS-LSIL Triage Study (ALTS) are similar, demonstrating that the interobserver agreement for cytology in a consecutive nonselected series of monolayer Pap smears still is only approximately 60% (kappa [κ] = 0.468). It is interesting to note that the ALTS trial also demonstrated that exact concordance regarding the interpretation of both cervical biopsies and loop electrosurgical excision procedure (LEEP) biopsies also was only approximately 60% (κ values = 0.46 and 0.49, respectively).
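Exact agreement and κ answer slightly different questions: κ discounts the agreement that two readers would reach by chance alone. The following is a minimal sketch of both calculations. The 3 × 3 cross-tabulation of paired readings is hypothetical and is not taken from ALTS or from any other study cited here.

```python
# Hypothetical cross-tabulation of two readers' diagnoses
# (rows: reader A, columns: reader B).
# Categories, in order: negative, ASCUS/LSIL, HSIL.
# These counts are illustrative only; they come from no cited study.
table = [
    [50, 15, 5],
    [12, 60, 8],
    [3, 7, 40],
]


def observed_agreement(t):
    """Fraction of cases on which both readers chose the same category."""
    total = sum(sum(row) for row in t)
    return sum(t[i][i] for i in range(len(t))) / total


def cohen_kappa(t):
    """Cohen's kappa: agreement beyond that expected by chance."""
    total = sum(sum(row) for row in t)
    p_obs = observed_agreement(t)
    row_totals = [sum(row) for row in t]
    col_totals = [sum(t[i][j] for i in range(len(t))) for j in range(len(t))]
    # Chance agreement: product of the marginal proportions, summed over categories.
    p_exp = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (p_obs - p_exp) / (1 - p_exp)


print(f"observed agreement = {observed_agreement(table):.2f}")
print(f"kappa = {cohen_kappa(table):.2f}")
```

Because κ subtracts chance agreement, it is always lower than the raw agreement whenever the expected chance agreement is greater than zero, which is why an exact concordance near 60% can correspond to a κ below 0.5.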

In general, this lack of reproducibility is believed to decrease the overall accuracy of the Pap smear. However, this may not be entirely correct. For example, it may be not only prudent but more accurate for those clinicians practicing in a relatively high-risk legal setting with a low incidence of disease and minimal clinical support/history to interpret Pap smears differently from those who work in a relatively protected, high-incidence environment with excellent clinical support. Indeed, what constitutes the “best” interpretation may vary considerably depending on the circumstances. Nevertheless, this lack of reproducibility appears to be a problem when trying to measure the sensitivity of a test such as the Pap smear. How can one have confidence in a result when neither the test nor the gold standard is reproducible? The answer to this question is that reproducibility is a measure of precision, not accuracy. One still can have accurate measurements without great precision. The best conclusion to this situation is that when measuring the sensitivity of a test with a relatively nonreproducible method, the answer should, if possible, be expressed in terms of a range of values generated using a variety of interpretations rather than as a single figure generated using only the “best” interpretation.

Bias in the Review Process

The subjectivity involved in the evaluation of the Pap smear is well known and allows the potential for bias in its interpretation. Indeed, one of the principal reasons for developing guidelines for the review of Pap smears in the context of litigation9 was to address and reduce the potential bias in the review process. The principle on which these guidelines were developed was that of blinded review, that is, the method that generates the most useful results is one in which the reviewer examines the Pap smear in a context that is most similar to the setting in which it originally would have been examined, without knowledge (i.e., blinded) of the clinical outcome or previous opinions concerning the case. Although this principle in the form of the guidelines has been supported by the vast majority of cytopathology societies, cytopathologists, and cytotechnologists, to my knowledge data that measure the effect of blinded review in the context of litigation currently are not available.

However, the principle of blinded review is not limited to the context of litigation. Indeed, blinded review also is the method of choice for reviewing Pap smears in the context of measuring sensitivity. For example, the well known method of reviewing negative smears alone to generate a measure of the sensitivity of Pap smear screening is an excellent example of a biased, nonblinded review method because the reviewer knows that the slides were reviewed already and were diagnosed as negative and this may reduce the sensitivity of the review process. Indeed, the sensitivity of this method for atypical squamous cells of undetermined significance (ASCUS) has been shown to be only 26%,10 and the sensitivity for low-grade squamous intraepithelial lesions (LSIL) was 0%. In contrast, when previously negative cases are reviewed using an AutoPap (TriPath Imaging, Inc., Burlington, NC)-directed technique (another biased though very different method), the sensitivity for ASCUS is 64% and that for LSIL is 41%.10 In both cases the sensitivity was only measurable because of the study setting; in real life the exact sensitivity of neither method is measurable. Nevertheless, in either case, the apparent sensitivity of Pap smear screening will be overestimated when the sensitivity (i.e., bias) in the review method is not accounted for, and the results obtained using these methods therefore are of limited utility.
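The size of this overestimate can be illustrated with a short calculation. The sketch below assumes that every abnormality found on review is a true miss and that the review detects a fixed fraction of all missed cases. The case counts are hypothetical. The review sensitivities of 26% and 64% are the ASCUS figures quoted above for negative-smear review and AutoPap-directed review, respectively.

```python
def apparent_sensitivity(detected, found_on_review):
    """Sensitivity if the review is (wrongly) assumed to find every miss."""
    return detected / (detected + found_on_review)


def corrected_sensitivity(detected, found_on_review, review_sensitivity):
    """Sensitivity after scaling up the misses the review itself overlooked.

    Assumption: the review detects a fixed fraction (review_sensitivity)
    of all cases that primary screening missed.
    """
    estimated_missed = found_on_review / review_sensitivity
    return detected / (detected + estimated_missed)


# Hypothetical counts, chosen only to illustrate the arithmetic.
detected = 900   # abnormal cases reported by primary screening
found = 26       # additional abnormal cases found on rescreening of negatives

print(f"apparent: {apparent_sensitivity(detected, found):.1%}")
for review_sens in (0.26, 0.64):
    corrected = corrected_sensitivity(detected, found, review_sens)
    print(f"corrected (review sensitivity {review_sens:.0%}): {corrected:.1%}")
```

The lower the sensitivity of the review, the larger the gap between the apparent and the corrected figures. In routine practice the review sensitivity is unknown, so this correction cannot actually be applied.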

In contrast, mandatory rescreening of previously negative cases after an initial diagnosis of high-grade squamous intraepithelial lesion (HSIL) or greater11–13 also is a well known and biased review that many believe is overly sensitive and thus underestimates the true sensitivity of Pap smear screening. Sensitivities of only 50% for the primary screening of HSIL and above have been reported with this technique.11 To my knowledge, the sensitivity of the review process itself is not known.

Finally, the majority of large clinical studies have used a consensus panel of experts to resolve discrepant opinions generated in the study. Although this method has been recommended by an intersociety working group,14 it has been shown by several authors to be biased and to overestimate the sensitivity of Pap smear screening.15–17 Other methods to reduce potential bias in the review process have been proposed, including directly accounting for differences in diagnostic thresholds17 or multiple repeated blinded rescreenings,16 but to my knowledge these have not yet been tested.

At the current time, the bottom line is that although it is well known that the review of Pap smears is subjective and easily biased, and that blinded review is well accepted as the preferred and most useful method with which to evaluate Pap smear sensitivity, there currently is no accepted method with which to perform blinded review in the context of measuring the sensitivity of interpreting Pap smears.

Bias in Case Selection

All Pap smears are not created equal. Even smears with the same diagnosis vary from those that are textbook examples to those that are extremely difficult to interpret. In addition, the incidence of disease and difficulty in the interpretation of the Pap smears a laboratory interprets can vary considerably.18, 19 As a result, two laboratories with the exact same sensitivity for screening Pap smears may appear to have very different sensitivities if the cases that one laboratory has to screen are significantly different from those in the other laboratory.
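A small worked example makes the point. In this sketch both laboratories are assumed to have identical sensitivity for "easy" and for "difficult" abnormal smears, and they differ only in the proportion of each that they receive. All of the figures are hypothetical.

```python
# Hypothetical per-stratum sensitivities, identical for both laboratories.
SENSITIVITY_BY_DIFFICULTY = {"easy": 0.98, "difficult": 0.50}

# Hypothetical case mix: fraction of abnormal smears in each stratum.
CASE_MIX = {
    "Laboratory A": {"easy": 0.80, "difficult": 0.20},
    "Laboratory B": {"easy": 0.30, "difficult": 0.70},
}


def overall_sensitivity(mix, sensitivity_by_difficulty):
    """Case-mix-weighted average of the per-stratum sensitivities."""
    return sum(mix[k] * sensitivity_by_difficulty[k] for k in mix)


for lab, mix in CASE_MIX.items():
    s = overall_sensitivity(mix, SENSITIVITY_BY_DIFFICULTY)
    print(f"{lab}: overall sensitivity = {s:.1%}")
```

The two laboratories screen equally well within each stratum, yet their overall measured sensitivities differ substantially because of case mix alone.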

To my knowledge there currently are no measures of difficulty that have been applied to Pap smears, although one can imagine factors such as the number of atypical cells, the degree of cytologic changes, and obscuring factors being quantified to create such a measure that could be verified by sensitivity in repeated blinded review. Nevertheless, subgroups of Pap smears with similar difficulties have been identified by a variety of means.

For example, cases that are highly reproducible and therefore relatively easy to identify have been defined. The College of American Pathologists' Interlaboratory Comparison Program in Gynecologic Cytopathology (PAP) contains validated slides identified by confirmation by at least 3 board-certified cytopathologists and validation by prior performance among participants to achieve a 90% concordance with the correct general diagnosis and a 50% concordance with the exact diagnosis.20 For these validated slides, pathologists achieve a sensitivity of > 96%, and cytotechnologists consistently achieve an even higher rate of nearly 99%.

This group of reproducibly identifiable cases may represent a small proportion of all abnormal cases. Theoretic evaluation of the available data involving repeated review of routine cases has suggested that the percentage of all abnormal cases that can always be identified by routine screening is small (≤ 6%).16

In contrast, false-negative cases, which often may be the subject of litigation, have been shown to be difficult to identify consistently.21–23 These would include cases identified on review for an initial diagnosis of HSIL or worse.11–13 The percentage of all abnormal cases that this type of case would comprise is not known.

Finally, the existence of Pap smears of differing difficulty is a hurdle that needs to be overcome if one wishes to use seeding of abnormal cases to determine the sensitivity of screening.24, 25 The measured sensitivity using this technique will depend heavily on the difficulty of the cases that are seeded into the routine material. Because to my knowledge no technique with which to measure the difficulty of cases currently exists, any seeding technique that does not use a random selection of abnormal cases (which would include a random selection of cases that are not identified by the initial screening technique) may be biased by the cases that are chosen to be seeded.

The variability in the difficulty of the Pap smears to be interpreted can affect the measured sensitivity of Pap smear interpretation. Although very easy and very difficult cases can be identified, and the sensitivity of these types of cases can be used to define the limits of sensitivity in general, better methods of measuring the difficulty of the slides reviewed in a laboratory or study are necessary to permit more appropriate comparisons of the results.

Selection of a Gold Standard and Correction/Verification of Its Accuracy

By definition, to measure sensitivity one requires a separate measurement that assesses truth (the gold standard). Unfortunately, there is no perfect gold standard for gynecologic cytology. Even the most rigorously clinically relevant gold standard, clinical follow-up with biopsy, is imperfect because of imperfect sampling26 and regression of disease. Indeed, this idea is strongly supported by the common clinical practice of recommending cone biopsy in those patients with a diagnosis of HSIL on Pap smear and negative colposcopy/biopsy. It also is supported by the fact that in a recent large clinical study that used a combination of results to define final case diagnosis, > 10% of all patients with a final diagnosis of HSIL or greater had that diagnosis made solely on the basis of the ThinPrep® smear (Cytyc Corporation, Boxborough, MA).27 Finally, a recent large study from China28 of 1997 patients using 6 separate testing modalities including 4-quadrant biopsy in women with negative results demonstrated a sensitivity for colposcopy-directed biopsy of only 81%.

The bottom line is that there are significant limitations to the use of even the best available gold standard. Thus, no matter which gold standard is employed, some effort must be made to account for errors. However, to my knowledge few studies published to date29 have employed such techniques.

Gold Standard: Rescreening of Conventional Smears

To my knowledge, three large, blinded, rescreening studies currently exist.10, 30, 31 The reported sensitivities at a single threshold (i.e., both the test screening and the gold standard using the same threshold) of ASCUS or LSIL are shown in Tables 1 and 2, respectively. The sensitivity both before and after the application of a consensus panel to resolve discrepancies is shown. The true sensitivity most likely lies between the estimates of these two different methods. It is important to note that the sensitivity is similar for the threshold of ASCUS and LSIL, and averages 54–76% depending on the method used to evaluate discrepancies, not the diagnostic threshold used.

Table 1. Best Estimates of the Sensitivity of Primary Human Screening from Large, Interlaboratory, Blinded Rescreening Trials Using ASCUS as the Threshold

Study                  No. of patients   Initial sensitivity (%)   Sensitivity after consensus review (%)
Yobs et al.30          19,472            52                        68
                                         41                        57
Renshaw et al.10       25,124            55                        78
                                         39                        80
Keenlyside et al.31    40,195            69 (a)                    75
                                         77 (a, b)                 83 (b)
Mean                                     54                        73

  • ASCUS: atypical squamous cells of undetermined significance.
  • Each study had two arms; each arm had its own sensitivity.
  • (a) Not reported. Estimated based on 27% of the review diagnoses being changed back to the original diagnosis on consensus.
  • (b) Not clearly blinded.

Table 2. Best Estimates of the Sensitivity of Primary Human Screening from Large Double-Screening, Blinded Trials Using LSIL as the Threshold

Study                  No. of patients   Initial sensitivity (%)   Sensitivity after consensus review (%)
Yobs et al.30          19,472            61                        83
                                         42                        81
Renshaw et al.10       25,124            47                        71
                                         48                        69
Keenlyside et al.31    40,195            62 (a)                    69
                                         81 (a, b)                 86 (b)
Mean                                     57                        76

  • LSIL: low-grade squamous intraepithelial lesion.
  • Each study had two arms; each arm had its own sensitivity.
  • (a) Not reported. Estimated based on 27% of the review diagnoses being changed back to the original diagnosis on consensus.
  • (b) Not clearly blinded.

Few studies of primary screening have enough cases of HSIL or higher with which to measure a statistically meaningful sensitivity for primary screening. One study with 70 cases and a consensus panel review found a sensitivity of 74%,10 which is similar to the rates found at ASCUS or LSIL. If a clinician employs more than one threshold (i.e., the sensitivity of Pap smears for a gold standard for HSIL using ASCUS as a threshold in the test screening), the sensitivity is higher, ranging from 83–99% (Table 3).
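The difference between a single-threshold and a two-threshold sensitivity can be sketched as follows. The example assumes the diagnostic categories form an ordered scale (negative < ASCUS < LSIL < HSIL) and uses hypothetical paired results. The function and variable names are illustrative only.

```python
# Ordinal diagnostic scale, lowest to highest (assumed ordering).
NEG, ASCUS, LSIL, HSIL = range(4)

# Hypothetical paired results as (test diagnosis, gold standard diagnosis).
pairs = (
    [(HSIL, HSIL)] * 14
    + [(LSIL, HSIL)] * 3
    + [(ASCUS, HSIL)] * 2
    + [(NEG, HSIL)] * 1
    + [(LSIL, LSIL)] * 20
    + [(NEG, LSIL)] * 10
    + [(NEG, NEG)] * 50
)


def sensitivity(pairs, test_threshold, gold_threshold):
    """Among gold-standard positives, the fraction the test calls positive."""
    test_results = [t for t, g in pairs if g >= gold_threshold]
    detected = sum(1 for t in test_results if t >= test_threshold)
    return detected / len(test_results)


# Single threshold: test and gold standard both use HSIL.
print(f"HSIL vs. HSIL:  {sensitivity(pairs, HSIL, HSIL):.0%}")
# Two thresholds: any test result of ASCUS or above counts as detecting HSIL.
print(f"ASCUS vs. HSIL: {sensitivity(pairs, ASCUS, HSIL):.0%}")
```

Lowering the test threshold while holding the gold standard threshold fixed can only add detected cases, so the two-threshold sensitivity is never lower than the corresponding single-threshold figure.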

Table 3. Best Estimates of the Sensitivity of Primary Human Screening from Large, Double-Screening, Blinded Trials Using ASCUS as a Threshold for HSIL in the Other Arm

Study                  No. of patients   Sensitivity after consensus review (%)
Yobs et al.30          19,472            95
                                         83
Wilbur et al.32        25,124            94 (a)
                                         NA
Keenlyside et al.31    40,195            99 (b)
                                         85
Mean                                     91

  • ASCUS: atypical squamous cells of undetermined significance; HSIL: high-grade squamous intraepithelial lesion; NA: not available.
  • Each study had two arms; each arm had its own sensitivity.
  • (a) Based on use of AutoPap system-directed rescreening.
  • (b) Not clearly blinded.

The sensitivity of AutoPap primary screening using directed rescreening at a single threshold of ASCUS was determined to be 86%.32 To my knowledge this is the highest sensitivity yet achieved at this single threshold in a blinded review of unselected cases.

The results of the ALTS are interesting.8 This study involves blinded rescreening of a highly selected group of patients, all with a previous abnormal smear, with an abnormal rate of > 50% and rescreening of thin-layer preparations. Thus, there is reason to believe that the results from this study may not reflect those reported in routine screening with a lower incidence of disease. Nevertheless, even though this study often is cited as an example of the poor reproducibility of Pap smear interpretation, the sensitivity of the two arms using the other as a gold standard was 80% and 87%, respectively, at the threshold of ASCUS; 96% and 97%, respectively, at the threshold of LSIL; and 97% and 97%, respectively, using ASCUS as a threshold for a gold standard of HSIL in the other arm. These results effectively demonstrate that reproducibility and accuracy are independent measures, and that poor reproducibility does not exclude accuracy.

Gold Standard: Second Thin-Layer Preparation

These results are summarized in Tables 4 and 5.

Table 4. Sensitivity of the Conventional Pap Smear Using the Thin-Layer Pap Test as a Gold Standard

Study                    No. of patients   Sensitivity at ASCUS (%)   Sensitivity at LSIL (%)   Sensitivity at HSIL (%)
Lee et al.33             7360              60                         81                        64
Hutchinson et al.27      8636              28                         45                        53
Roberts et al.34 (a)     35,560            62                         78                        75
Mean                                       50                         68                        64

  • Pap: Papanicolaou; ASCUS: atypical squamous cells of undetermined significance; LSIL: low-grade squamous intraepithelial lesion; HSIL: high-grade squamous intraepithelial lesion.
  • (a) Adapted from the Australian terminology.

Table 5. Sensitivity of the Thin-Layer Pap Test Using the Conventional Pap Smear as a Gold Standard

Study                    No. of patients   Sensitivity at ASCUS (%)   Sensitivity at LSIL (%)   Sensitivity at HSIL (%)
Lee et al.33             7360              65                         89                        62
Hutchinson et al.27      8636              52                         48                        52
Roberts et al.34 (a)     35,560            73                         85                        87
Mean                                       63                         74                        67

  • Pap: Papanicolaou; ASCUS: atypical squamous cells of undetermined significance; LSIL: low-grade squamous intraepithelial lesion; HSIL: high-grade squamous intraepithelial lesion.
  • (a) Adapted from the Australian terminology.

To my knowledge three large studies exist.27, 33, 34 In general, these studies rely on a direct comparison of the results of the conventional smear compared with those of thin-layer samples without the use of a consensus panel. Overall, the sensitivity of the conventional smear using the thin-layer smear as a gold standard was 50% at the level of ASCUS, 68% at the level of LSIL, and 64% at the level of HSIL. The sensitivity of the thin-layer technique using conventional smears as a gold standard was 63% at the level of ASCUS, 74% at the level of LSIL, and 67% at the level of HSIL.
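The overall figures quoted here match simple unweighted averages of the three study values in Tables 4 and 5, that is, averages that do not weight each study by its number of patients. The short sketch below reproduces them from the tabulated sensitivities. The variable and function names are illustrative only.

```python
# Sensitivities (%) at ASCUS, LSIL, and HSIL, as listed in Tables 4 and 5.
conventional_vs_thin_layer = {  # Table 4
    "Lee et al.": (60, 81, 64),
    "Hutchinson et al.": (28, 45, 53),
    "Roberts et al.": (62, 78, 75),
}
thin_layer_vs_conventional = {  # Table 5
    "Lee et al.": (65, 89, 62),
    "Hutchinson et al.": (52, 48, 52),
    "Roberts et al.": (73, 85, 87),
}


def column_means(rows):
    """Unweighted mean of each column (threshold) across studies."""
    return [sum(col) / len(col) for col in zip(*rows.values())]


for label, rows in (
    ("Conventional (Table 4)", conventional_vs_thin_layer),
    ("Thin-layer (Table 5)", thin_layer_vs_conventional),
):
    ascus, lsil, hsil = (round(m) for m in column_means(rows))
    print(f"{label}: ASCUS {ascus}%, LSIL {lsil}%, HSIL {hsil}%")
```

Because the studies differ considerably in size (7360 to 35,560 patients), a patient-weighted average would give somewhat different values.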

The sensitivity of conventional and thin-layer preparations for HSIL as the gold standard using a threshold of ASCUS is summarized in Table 6. With the exception of one study,27 both methods have a sensitivity at this threshold of ≥ 90%.

Table 6. Sensitivity of the Thin-Layer Pap Test and Conventional Pap Smear for HSIL at the Threshold of ASCUS Using the Other as the Gold Standard

Study                    No. of patients   Sensitivity of conventional Pap smear (%)   Sensitivity of thin-layer Pap (%)
Lee et al.33             7360              100                                         94
Hutchinson et al.27      8636              69                                          89
Roberts et al.34 (a)     35,560            96                                          100
Mean                                       88                                          94

  • Pap: Papanicolaou; HSIL: high-grade squamous intraepithelial lesion; ASCUS: atypical squamous cells of undetermined significance.
  • (a) Adapted from the Australian terminology.

Gold Standard: Repeated Pap Smear at a Subsequent Time

This gold standard often is overlooked. Nevertheless, studies using this method were developed as early as 1974 and appear to be statistically valid. The sensitivities of 60% for “dysplasia,” 55–80% for carcinoma in situ, and 76% for invasive carcinoma are in keeping with the results of studies using other gold standards.29

Gold Standard: Biopsy

A study from 1935 following 14,859 women with annual cervical examination using a Schiller test and biopsy of any abnormality suggested that the sensitivity of the Pap smear was approximately 78% for HSIL (carcinoma in situ) and higher. False-negative cases primarily were the result of sampling.35 To the best of my knowledge this remains the largest study with systematic biopsy of every identified abnormality, but does not include biopsies in any woman without abnormal findings.

The lack of biopsies in those patients without abnormal test results is a valid statistical criticism. Indeed, a recent study28 involving 1997 patients suggested that the sensitivity of colposcopy is only 81%. With this in mind, the influential review by the Agency for Health Care Policy and Research36 suggested that the overall sensitivity of the Pap smear, including all sources of error (sampling and screening) using biopsy as the gold standard (and including only studies of biopsies in patients without colposcopic abnormalities), is 51% and the specificity is 98%. However, this conclusion was based on only 3 studies with a total of 1812 patients, and in the largest study (containing 1539 of the 1812 patients) the two-slide Pap smears were performed by “residents and university physicians.”37

Fortunately, since the publication of this review, two additional large studies have been performed, each with at least some biopsies performed in women without abnormal test results. Indeed, in both these studies multiple tests were performed, including human papillomavirus testing, and all three studies concluded from their data that essentially all cases of disease were identified. These three studies as well as the two previous studies are summarized in Table 7. As with prior studies using different gold standards, the sensitivity for all SILs was 63%, and that for HSIL was 69%. Results using thin-layer techniques are shown in Table 8. These results demonstrate a higher sensitivity for the thin-layer technique for all SIL cases (88%) and for HSIL cases (94%). However, whether the differences between conventional and thin-layer techniques are statistically significant is not known because one study was relatively small and did not include conventional cytology,28 and the other study contained multiple methodologic uncertainties, including the exclusion of > 8% of the patients in the study with “equivocal” final diagnoses.27

Table 7. Sensitivity of the Conventional Pap Test at the Threshold of ASCUS Using Biopsy as the Gold Standard

Study                      No. of patients   Sensitivity for all SIL (%)   Sensitivity for HSIL (%)
Friedell35 (a)             14,859            NA                            78
Baldauf et al.37 (b)       1539              56                            65
Hutchinson et al.27 (c)    8636              69                            65
Mean                                         63                            69

  • Pap: Papanicolaou; ASCUS: atypical squamous cells of undetermined significance; SIL: squamous intraepithelial lesion; HSIL: high-grade squamous intraepithelial lesion; NA: not applicable.
  • (a) Visual inspection and Schiller test.
  • (b) Colposcopy plus cervicography, plus biopsy of 10% of negative women.
  • (c) Colposcopy, ThinPrep test, human papillomavirus testing, and biopsy of 2% of negative women; 8% of patients in the study had an “equivocal” final diagnosis.

Table 8. Sensitivity of the Thin-Layer Pap Test Using Biopsy as the Gold Standard

Study                      No. of patients   Sensitivity for all SIL (%)   Sensitivity for HSIL (%)
Hutchinson et al.27 (a)    8636              88                            94
Belinson et al.28 (b)      1997              NA                            94

  • Pap: Papanicolaou; SIL: squamous intraepithelial lesion; HSIL: high-grade squamous intraepithelial lesion; NA: not available.
  • (a) Colposcopy, ThinPrep test, human papillomavirus testing, and biopsy of 2% of negative women. The range was generated by assuming all 8% of women in the study had an “equivocal” final diagnosis.
  • (b) Colposcopy, human papillomavirus testing (×2), fluorescence spectroscopy, 100% 4-quadrant biopsy of all negative women.

Summary of Gold Standard Data

Despite the many different choices for a gold standard (rescreening the same Pap smear, comparing it with a simultaneously or subsequently obtained smear of a similar or different type, colposcopy, or biopsy), the most striking thing regarding these results is how similar they are. In general, if one uses a single diagnostic threshold, the sensitivity of the Pap smear ranges between 50–75%, regardless of what threshold one chooses. Similarly, if one uses a diagnostic threshold of ASCUS to detect an HSIL determined by another method, the sensitivity of a conventional smear is approximately 90% (except in those cases when biopsy is the gold standard, in which the sensitivity is only 70%) whereas that of the thin-layer technique is approximately 94%. Whether this difference is significant is not known at the current time.

Surrogate Markers

Recent studies have emphasized that for clinical purposes a classification of HSIL and above is the more meaningful gold standard rather than all SIL.38 Although this may be true, this creates a very real problem for the cytology laboratory in that the incidence of HSIL is so low that one essentially will never be able to generate meaningful data at this threshold for use in quality control within the laboratory. As a result, for quality assessment purposes, one is forced to examine surrogate markers.
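The effect of a low case count on the precision of a measured sensitivity can be illustrated with a confidence interval. The sketch below uses the Wilson score interval for a proportion, which is one standard choice and is not a method taken from the studies reviewed here. The case counts are hypothetical, and both examples have the same observed sensitivity of 70%.

```python
import math


def wilson_interval(detected, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    p = detected / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: (HSIL cases detected, HSIL cases in total).
for detected, total in ((7, 10), (700, 1000)):
    low, high = wilson_interval(detected, total)
    print(f"{detected}/{total}: 95% interval {low:.1%} to {high:.1%}")
```

With only a handful of HSIL cases the interval spans most of the plausible range of sensitivities, so differences between individuals or laboratories cannot be distinguished from chance.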

Unfortunately, to my knowledge no good surrogate marker has emerged to date. A review of the available studies demonstrates that the ASCUS/SIL ratio does not appear to correlate with the measured sensitivity of the test (unpublished data). In addition, the sensitivity of the test at one threshold (i.e., ASCUS), although similar to those at other thresholds (i.e., LSIL), is not similar enough to predict the sensitivity at HSIL reliably (unpublished data). In addition, current publishing trends that do not even report the results of large studies in regard to findings other than HSIL will make it more difficult to explore the value of these findings at other thresholds. Although other surrogate markers exist, such as sentinel cases (i.e., the percentage of all HSIL cases that are identified on the basis of relatively few cells), the value of these markers is not known. To my knowledge there currently is no method for determining or estimating the sensitivity of the Pap test in any laboratory on a routine basis.

Conclusions

When all these data are reviewed, several conclusions can be reached. First, despite all the potential problems associated with bias and the selection of a gold standard, in the majority of studies the sensitivity of conventional Pap smears and thin-layer techniques at any one threshold using any gold standard at the same threshold is consistently between 50–75%. If one adds AutoPap-directed rescreening to conventional smears, the sensitivity at the single threshold of ASCUS increases to 84%. If one uses a threshold of ASCUS for the detection of HSIL in any gold standard, the sensitivity of the conventional smear is approximately 90% for every gold standard except biopsy, for which it is only 70%. If one either adds AutoPap-directed rescreening or switches to thin-layer techniques, the sensitivity is 94%. The statistical significance of this difference is not known. Taken together, the approximate sensitivity of the Pap smear in study situations actually is fairly well defined. Whether these numbers reflect sensitivity in real-life studies is not known.

In addition, repetitive rescreening has identified small, variably well characterized subgroups of Pap smears: some that are identified very reproducibly (98%) and others that are identified very inconsistently (< 50%). The percentage of all abnormal Pap smears represented by these cases is not known.

Unfortunately, to my knowledge there currently are no available techniques with which to actually measure the sensitivity of Pap smear interpretation on a routine basis. Although methods for this have been proposed, they have not yet been tested to the best of my knowledge, and even a general agreement regarding the best approach to the problem is not clear. As a result, although evaluation and comparisons of individuals and laboratories continue to be made on a daily basis, these evaluations are not scientific assessments but rather subjective assessments made on the basis of reputation and data that do not reflect the actual sensitivity of the test. At the current time no meaningful comparisons can be made between the sensitivity of individuals or laboratories, and the standard of practice as applied to individuals and laboratories remains unmeasurable.
