  • diagnostic error;
  • pathology;
  • patient safety;
  • cancer;
  • interobserver agreement




To the authors' knowledge, the frequency and clinical impact of errors in the anatomic pathology diagnosis of cancer have been poorly characterized to date.


The authors examined errors in patients who underwent anatomic pathology tests to determine the presence or absence of cancer or precancerous lesions in four hospitals. They analyzed 1 year of retrospective errors detected through a standardized cytologic–histologic correlation process (in which patient same-site cytologic and histologic specimens were compared). Medical record reviews were performed to determine patient outcomes. The authors also measured the institutional frequency, cause (i.e., pathologist interpretation or sampling), and clinical impact of diagnostic cancer errors.


The frequency of errors in cancer diagnosis was found to be dependent on the institution (P < 0.001) and ranged from 1.79% to 9.42% and from 4.87% to 11.8% of all correlated gynecologic and nongynecologic cases, respectively. A statistically significant association was found between institution and error cause (P < 0.001); the proportion of errors attributed to pathologic misinterpretation ranged from 5.0% to 50.7% (the remainder were attributed to clinical sampling). A statistically significant association was found between institution and assignment of the clinical impact of error (P < 0.001); the aggregated data demonstrated that for gynecologic and nongynecologic errors, 45% and 39%, respectively, were associated with harm. The pairwise kappa statistic for interobserver agreement on cause of error ranged from 0.118 to 0.737.


Errors in cancer diagnosis are reported to occur in up to 11.8% of all reviewed cytologic-histologic specimen pairs. To the authors' knowledge, little agreement exists regarding whether pathology errors are secondary to misinterpretation or poor clinical sampling of tissues and whether pathology errors result in serious harm. Cancer 2005. © 2005 American Cancer Society.

The diagnosis of many disease processes depends to a large extent on the pathologic assessment of tissues. The majority of cancer diagnoses are made on the basis of histologic or cytologic evaluation. Consequently, diagnostic pathology errors may lead to incorrect patient management plans, including delays in treatment or the implementation of incorrect treatment regimens.1, 2 The reported frequency of anatomic pathologic errors ranges from 1% to 43% of all specimens, and the effect of these errors is unknown.1–14 Diagnostic pathology error frequency and effect are poorly characterized, partly because of the lack of uniform measurement processes, a lack of understanding of when an error has occurred, and fear of disclosure.

Anatomic pathology errors are detected by several methods.2 The most commonly used method is secondary review, in which a second pathologist reviews slides previously examined by a first pathologist.5 Pathologists employ different types of secondary review. For example, the Clinical Laboratory Improvement Amendments of 1988 (CLIA '88) require correlation of patient material in which same-site cytologic and surgical specimens are obtained (e.g., sputum cytology and lung biopsy specimens) and the two pathologic diagnoses are discrepant (e.g., sputum is suspicious for cancer and the lung biopsy tissue is benign).15 Nearly all correlations are performed to detect potential errors in cancer diagnosis. Errors detected through correlation review may be classified as interpretive (i.e., the disease process is misclassified) or sampling (i.e., the specimen does not contain the diagnostic tissue).1 In general, interpretive errors are related to errors made in the pathology laboratory, whereas sampling errors are made either in the pathology laboratory (in tissue processing) or during tissue procurement.5

To our knowledge, a detailed study of the effect of errors in cancer diagnosis, such as those detected by cytologic-histologic (CH) correlation, is lacking. Based on CH review of nongynecologic cases performed at a single institution, Clary et al. reported that 2.3% of cytologic specimens and 0.44% of surgical specimens contained an error and 23% of errors had a marked effect on patient care.1 The current study examines the frequency and cause of anatomic pathology error at four institutions, the interinstitutional variability in assigning the cause of correlation error, and the clinical impact of anatomic pathology error on patient care.



Background and Design

In 2002, the Agency for Healthcare Research and Quality (AHRQ) funded four institutions to: 1) share deidentified anatomic pathology diagnostic error data using a Web-based database, 2) determine baseline error frequencies detected by different methods, 3) collect patient outcome information to determine the clinical impact of diagnostic errors, 4) perform root cause analysis to derive error reduction strategies, and 5) assess the success of these error reduction strategies using both quantitative and qualitative measures.5

We have added a different error detection method in each year of the project. In 2002, we began collecting errors detected by the CH correlation process. In this study, we used the Year 2002 data to establish CH correlation error frequencies, causes, and outcomes. Each institution obtained Institutional Review Board approval for the performance of this project.

Participating Sites

The four institutions are located in either the mid-Atlantic or the Midwestern region of the U.S.

Standardization of CH Correlation Review Process

Because CLIA '88 does not mandate how the CH process is to be performed, laboratories perform CH correlation quite differently (Table 1), which leads to bias in error reporting.5 At the beginning of the project, we standardized the CH correlation process in the four institutions. On a monthly basis, a cytotechnologist used an existing laboratory information system program to identify all patients who had both cytology and surgical specimens from the same anatomic site that had been obtained within 6 months of each other prior to the date of review. A designated “review” pathologist selected cases in which the cytologic and surgical specimens were discrepant. The cytotechnologist then retrieved the patient slides and reports and generated a hardcopy review sheet. The review pathologist examined the material and determined the cause of error.

Table 1. Pre-study Methods of Cytologic–Histologic Correlation Error Case Detection and Review
Site | Method of case retrieval | Time interval used for search | Determination method for case review | Prescreening performed by cytotechnologist | Reviewer | Arbitration method
A | Computer search for all correlating surgical and cytology specimens followed by manual review for those anatomically correlating | 6-mo wide search using previous month's surgical pathology specimens to search for cytology cases. Search performed monthly. | Two-step discrepancy | No | Designated group of three pathologists | Cases shown to original pathologist for input. Final decision based on reviewer
B | Same | 4-mo wide search. Reviewed periodically as convenient | Any disagreement | Yes | One pathologist | None
C | Same | 12-mo search. Cases reviewed bimonthly | Two-step disagreement | Yes | Two pathologists | None
D | Previous cytology case for review identified only when triggered by positive surgical specimen | Reviewed daily | Any disagreement | No | Pathologist signing out surgical case | Case shown to original cytologist if disagreement

Definition of CH Error and Cause

We defined a discrepancy as a difference between the cytologic and histologic diagnoses.5 Because cytology and surgical diagnostic schema are somewhat different, we considered the diagnoses in a scaled categoric context to determine whether a discrepancy occurred. The categoric context differed depending on whether the specimens were gynecologic (e.g., Papanicolaou [Pap] test and cervical biopsy) or nongynecologic (e.g., lung brushing and biopsy) (Table 2). We defined a CH correlation error as at least a two-step discrepancy.5 We evaluated only two-step or greater CH correlation discrepancies because one-step discrepancies lack reproducibility and clinical import.1, 16, 17 For example, a diagnostic error occurred if a patient's bronchial brush specimen was diagnosed as benign and the patient's lung biopsy specimen was diagnosed as non-small cell carcinoma. This example falls within the scope of the Institute of Medicine's definition of error because in at least one specimen, the definitive pathologic diagnosis was not reached.18

Table 2. Diagnostic Steps for Gynecologic and Nongynecologic Specimens
Step | Gynecologic cytology diagnosis | Gynecologic surgical diagnosis | Non-gynecologic cytology diagnosis | Non-gynecologic surgical diagnosis
0 | No evidence of intraepithelial lesion or malignancy (NIL) | Benign | Benign | Benign
1 | Atypical squamous cells of undetermined significance (ASC-US) | No equivalent | Atypical |
2 | Low-grade squamous intraepithelial lesion (LSIL) | Cervical intraepithelial neoplasia of type 1 (CIN 1) | Suspicious |
3 | High-grade squamous intraepithelial lesion (HSIL) | Cervical intraepithelial neoplasia of type 2 or 3 (CIN 2 or CIN 3) | Malignant | Malignant
4 | Invasive carcinoma | Invasive carcinoma | |
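
The two-step rule can be illustrated with a short sketch. The step values below are transcribed from Table 2; the dictionary layout, the diagnosis labels used as keys, and the function names are illustrative assumptions and are not part of the study's data collection instrument.

```python
# Step values transcribed from Table 2. All names here are illustrative.
STEP_SCALES = {
    ("gynecologic", "cytology"): {
        "NIL": 0, "ASC-US": 1, "LSIL": 2, "HSIL": 3, "Invasive carcinoma": 4,
    },
    ("gynecologic", "surgical"): {
        "Benign": 0, "CIN 1": 2, "CIN 2": 3, "CIN 3": 3, "Invasive carcinoma": 4,
    },
    ("nongynecologic", "cytology"): {
        "Benign": 0, "Atypical": 1, "Suspicious": 2, "Malignant": 3,
    },
    ("nongynecologic", "surgical"): {"Benign": 0, "Malignant": 3},
}


def step_difference(specimen_type, cytology_dx, surgical_dx):
    """Return the surgical step minus the cytology step on the Table 2 scale."""
    cytology_step = STEP_SCALES[(specimen_type, "cytology")][cytology_dx]
    surgical_step = STEP_SCALES[(specimen_type, "surgical")][surgical_dx]
    return surgical_step - cytology_step


def is_ch_correlation_error(specimen_type, cytology_dx, surgical_dx):
    """A CH correlation error is a discrepancy of at least two steps."""
    return abs(step_difference(specimen_type, cytology_dx, surgical_dx)) >= 2


# Example from the text: benign bronchial brush, malignant lung biopsy.
print(is_ch_correlation_error("nongynecologic", "Benign", "Malignant"))  # True
# A one-step discrepancy (ASC-US Pap test, CIN 1 biopsy) is not counted.
print(is_ch_correlation_error("gynecologic", "ASC-US", "CIN 1"))  # False
```

The sign of the step difference is kept separate from the error test so that the direction of the discrepancy remains available if needed.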

The review pathologist microscopically examined all slides and determined whether the cytologic diagnosis, the surgical diagnosis, both diagnoses, or neither diagnosis was in error. The pathologist then assigned a “cause” of the error, using the categories of interpretation, sampling, or both.1 An interpretation error was an error in disease categorization, and this error was classified further as an overcall (if the review diagnosis was categorically lower than the original diagnosis) or an undercall (if the review diagnosis was higher than the original diagnosis). A sampling error was an error in which the diagnostic material was not present on the slide, even on review. Using the above example, if the review pathologist concurred with both the original lung biopsy and brushing diagnoses, a sampling error occurred in the brushing specimen, because material diagnostic of cancer was not present on the cytology slides.

CH Correlation Data Collection

We developed a two-part CH correlation data collection instrument. The first part contained pathology items, including the date of cytology and surgical specimen collection, specimen type, original and review diagnoses, original and review pathologists and cytotechnologists, limitations in specimen quality, and causes of error. The second part contained patient management and outcomes items, including additional tests ordered, unnecessary or additional treatment protocols initiated, morbidity or mortality related to additional tests or treatments, and delays in diagnosis. We performed a clinical record review on all nongynecologic errors, all gynecologic errors that had either original or review diagnoses of high-grade squamous intraepithelial lesion (HSIL)/cervical intraepithelial neoplasia (CIN) of type 2 or greater, and a random sample of 10% of all gynecologic errors in which the original or review diagnoses fell below the HSIL/CIN 2 threshold.5 Only a 10% sample of this subset was reviewed because of its lower likelihood of adverse outcomes.

A data collector reviewed the pathology CH correlation logs and pathology reports to complete the first part of the instrument. An honest broker reviewed the hospital electronic and hardcopy medical records to complete the second part of the instrument. An honest broker was a clinical outcomes data collector who was the only person exposed to clinical data linked to individual patient identifiers. Use of the honest broker satisfied the Health Insurance Portability and Accountability Act (HIPAA) requirements regarding the use of medical records data for research purposes. The data then were deidentified by the honest broker, and a pathologist assessed the clinical severity of the error, using the categories shown in Table 3. We devised this scheme based on error severity schema published in the medical literature,19–22 recognizing that error severity instruments have not been specifically designed for diagnostic pathology errors. The data collector entered the deidentified case data into the Web-based patient safety database.

Table 3. Categories of Error Clinical Severity
No harm: The clinician acted regardless of an erroneous diagnosis.
 Example: A patient had a lung mass and the clinician performed a bronchial washing and biopsy at the same time. The washing was diagnosed as malignant and the biopsy was diagnosed as benign (sampling error). The clinician acted on the malignant cytology diagnosis regardless of the surgical diagnosis.
Near miss: The clinician intervened before harm occurred or the clinician did not act on an erroneous diagnosis.
 Example: A patient had a lung mass and a bronchoalveolar lavage was obtained and diagnosed as benign (sampling error). The surgeon proceeded with a therapeutic surgical procedure because the radiological evidence supported the diagnosis of malignancy. The diagnosis on the surgical specimen was malignant.
Significant event:
 Minimal harm (Grade 1):
 a. Further unnecessary noninvasive diagnostic test(s) performed (e.g., blood test or non-invasive radiologic examination).
 b. Delay in diagnosis or therapy of ≤ 6 mos.
 c. Minor morbidity due to (otherwise) unnecessary further diagnostic effort(s) or therapy (e.g., bronchoscopy) predicated on the presence of (unjustified) diagnosis.
 Moderate harm (Grade 2):
 a. Unnecessary invasive further diagnostic test(s) (e.g., tissue biopsy, re-excision, angiogram, radionuclide study, or colonoscopy).
 b. Delay in diagnosis or therapy of > 6 mos.
 c. Major morbidity lasting ≤ 6 mos due to (otherwise) unnecessary further diagnostic efforts or therapy predicated on the presence of (unjustified) diagnosis.
 Severe harm (Grade 3): Loss of life, limb, or other body part, or long-lasting morbidity (lasting > 6 mos).

Data Analysis

We analyzed the Year 2002 CH correlation errors by stratifying the errors by institution, specimen type (gynecologic vs. nongynecologic), nongynecologic specimen anatomic site, cause of error, clinical management protocol, outcome, and clinical severity. A priori sample size calculations, assuming an error frequency difference of at least 2% (the smallest difference in an error frequency that we deemed clinically significant), a nondirectional alpha of 0.05, and a power of 0.80, showed that we needed a denominator of 1398 cases per institution to detect statistical significance.
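
As a rough illustration of this kind of a priori calculation, the sketch below applies the standard normal-approximation formula for comparing two independent proportions. The baseline error frequencies (3% vs. 5%) are hypothetical, because the text does not state the baseline or the exact formula used; the result therefore is not expected to reproduce the figure of 1398 cases.

```python
from math import ceil
from statistics import NormalDist


def cases_per_institution(p1, p2, alpha=0.05, power=0.80):
    """Cases needed per group to detect p1 vs. p2 (two-sided test).

    Uses the normal-approximation formula for two independent proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical baseline: error frequencies of 3% and 5% (a 2% difference).
n = cases_per_institution(0.03, 0.05)
print(n)
```

With a 2% difference held fixed, the required denominator rises as the assumed baseline frequency rises, which is why the baseline must be stated for such a calculation to be reproducible.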

Overall error frequencies were calculated for each institution using the number of CH correlation errors as a numerator. Error frequencies were calculated in two ways, using different denominators. First, we used the total number of discrepant and nondiscrepant correlating cytology and surgical specimen pairs as a denominator.1 This measure expressed error frequency in terms of the total number of cases secondarily reviewed. The percentage of correlated cases in relation to the total cytology and surgical workload varied considerably by institution. Second, we used the total number of institutional cytology cases as a denominator. This measure expressed error frequency in terms of total laboratory case workload, recognizing that the majority of cases were not reviewed.

We retrieved denominator data from the institutional laboratory information systems using query tools for the number of correlating cytology and surgical pairs and overall 2002 cytology workload. Aggregated error frequencies for the entire 2002 dataset were calculated using weighted means.
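
The two denominators and the weighted aggregation can be sketched as follows. The institution labels and counts are hypothetical and are not study data, and weighting each institutional frequency by its own denominator is an assumption, because the text does not specify the weights.

```python
# Hypothetical counts for two institutions; not taken from the study.
rows = [
    {"site": "W", "errors": 20, "correlating_pairs": 400, "cytology_workload": 10_000},
    {"site": "X", "errors": 30, "correlating_pairs": 1_000, "cytology_workload": 20_000},
]


def error_frequency(errors, denominator):
    """Error frequency as a percentage of the chosen denominator."""
    return 100.0 * errors / denominator


def weighted_mean_frequency(rows, denominator_key):
    """Mean of institutional frequencies, weighted by each denominator.

    With these weights the result equals total errors / total denominator.
    """
    total_weight = sum(row[denominator_key] for row in rows)
    weighted_sum = sum(
        error_frequency(row["errors"], row[denominator_key]) * row[denominator_key]
        for row in rows
    )
    return weighted_sum / total_weight


for row in rows:
    by_pairs = error_frequency(row["errors"], row["correlating_pairs"])
    by_workload = error_frequency(row["errors"], row["cytology_workload"])
    print(row["site"], round(by_pairs, 2), round(by_workload, 2))

print(round(weighted_mean_frequency(rows, "correlating_pairs"), 2))
print(round(weighted_mean_frequency(rows, "cytology_workload"), 2))
```

The same numerator yields a much smaller frequency against total workload than against correlating pairs, which is the contrast the two measures are meant to expose.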

We examined institutional differences in overall error frequencies, error frequencies by specimen type, cause of error (i.e., sampling or interpretation), and assessment of error severity using chi-square and Fisher exact tests. Statistical significance was assumed with a P value ≤ 0.05. All statistical analyses were performed using SPSS software (Version 11; SPSS Inc., Chicago, IL).
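
A chi-square test of association between institution and error cause can be sketched without statistical software. The counts below are the gynecologic rows of Table 6 with cytology and surgical causes summed per institution; because some institutions assigned more than one cause per case, this is an illustration of the technique and not a reproduction of the SPSS analysis. The function name and the hard-coded critical value are choices made for this sketch.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: interpretation, sampling. Columns: Institutions A, B, C, D.
gyn_cause_counts = [
    [7, 9, 218, 3],
    [133, 98, 240, 15],
]

statistic, df = chi_square_statistic(gyn_cause_counts)
CRITICAL_VALUE_DF3_P001 = 16.266  # chi-square critical value, df = 3, P = 0.001
print(df, statistic > CRITICAL_VALUE_DF3_P001)  # 3 True
```

Comparing the statistic with a tabulated critical value avoids the need for a chi-square distribution function; a statistics package would report the P value directly.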

Pathologist Agreement on the Cause of CH Correlation Error

A confounding factor in the comparison of CH correlation errors is the interobserver variability23–27 of the review pathologists' assessment of error cause. To our knowledge, no previous studies have measured the level of pathologist agreement with regard to CH correlation error cause.

We selected a sample of 10 CH correlation errors (5 gynecologic errors with conventional Pap tests and 5 nongynecologic pulmonary errors with either bronchial brushes or bronchial washes) from each institution for review. Slides from all 40 cases were deidentified and assessed independently by the CH correlation review pathologists. Each pathologist examined the slides, recorded cytology and surgical diagnoses, and determined the cause of error. Estimates of agreement between reviewers were calculated using an unweighted kappa statistic. The agreement between the review pathologists' assignment and the original assignment at the institution and the agreement between the review pathologists' current assignments were measured. Pathologists rereviewed their own institutional cases, and the kappa statistic was used as an intraobserver measure of agreement.
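
The unweighted kappa statistic for a pair of observers can be computed as in the sketch below. The ratings are invented for illustration and are not study data. The sketch returns None only in the degenerate case in which chance agreement equals 1; the study applied a broader rule, reporting kappa as not determinable whenever one observer of a pair used a single category.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen kappa for two observers rating the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both observers must rate the same nonempty case set")
    n = len(ratings_a)
    # Observed proportion of cases on which the two observers agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each observer's category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return None  # kappa is undefined (division by zero)
    return (observed - expected) / (1 - expected)


# Invented ratings for four cases.
reviewer_1 = ["sampling", "sampling", "interpretation", "interpretation"]
reviewer_2 = ["sampling", "sampling", "interpretation", "sampling"]
print(cohen_kappa(reviewer_1, reviewer_2))  # 0.5
```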



Institutional CH correlation error frequencies for the Year 2002 are shown in Table 4. Error frequencies for nongynecologic specimens were higher than for gynecologic specimens. For some institutions, more than 1 of every 10 patients who had a correlating CH specimen pair had an error in diagnosis. Contingency tables showed that gynecologic and nongynecologic error frequencies, regardless of the denominator used, were dependent on institution (P < 0.001). Compared with the other three institutions, the gynecologic error frequencies of Institution A were higher (P < 0.001). Compared with Institution C, the gynecologic and nongynecologic error frequencies of Institution B were lower (P < 0.001).

Table 4. Institutional and Aggregated CH Correlation Errors in Cancer Diagnoses
Institution | No. of gynecologic errors | No. of correlating cases | Error frequency using denominator of correlating cases (%) | Total cytology workload | Error frequency using denominator of workload (%)
Aggregated | 690 | | 4.00 | | 0.30
Institution | No. of non-gynecologic errors | No. of correlating cases | Error frequency using denominator of correlating cases (%) | Total cytology workload | Error frequency using denominator of workload (%)
Aggregated | 674 | | 10.8 | | 3.10
CH: cytologic-histologic.

Table 5 shows the number of institutional nongynecologic errors by anatomic site. All the institutions showed a relatively high number of nongynecologic errors associated with specimens obtained from the urinary tract and lung. A variable number of errors were detected for some specimen sites. For example, Institution C reported that 19% of correlating specimens from the pleura were associated with an error, and Institutions B and D reported no errors in pleural specimens.

Table 5. Distribution of Institutional Non-gynecologic Errors by Anatomic Site
Anatomic site | No. of errors (error percentage by total no. of correlating discrepant and non-discrepant cases specimens for Institutions A and C) by institution
  • GI: gastrointestinal.
  • a Includes ovary, pancreas, salivary glands, and pericardium.
  • For Institutions A and C, the percentage of errors is expressed as the number of errors from that anatomic site divided by the total number of discrepant and non-discrepant cytology-histology correlating pairs from that anatomic site examined during 2002. Institutions B and D were unable to obtain case-load values stratified by the non-gynecologic cytology anatomic site.

Biliary tract0 (0)67 (9)0
Bone/soft tissue0 (0)04 (8)1
 Brain0 (0)02 (4)0
 Breast3 (15)344 (13)0
 GI tract1 (4)26 (6)0
Urinary tract17 (11)2599 (25)3
 Liver0 (0)07 (16)1
 Lung48 (17)4680 (6)12
Lymph node1 (5)322 (16)3
 Pelvis0 (0)151 (13)0
Peritoneum0 (0)378 (16)1
 Pleura4 (13)034 (19)0
 Thyroid0 (0)126 (11)2
 Item missing0020

The institutional causes of error are shown in Table 6. Contingency tables showed a statistically significant association between institution and error cause for both gynecologic and nongynecologic errors (P < 0.001). Institutions A and B reported significantly fewer interpretation errors and Institution C reported significantly more interpretation errors than expected. The majority of errors were attributed to cytology, rather than surgical, sampling or interpretation; Institution D never attributed an error to surgical specimen interpretation or sampling.

Table 6. Distribution of Institutional Errors by Cause for Error
Specimen type | Cause | Error | A No. (%) | B No. (%) | C No. (%) | D No. (%) | Aggregated (%)
Gynecologic | Interpretation | Cytology | 4 (3) | 7 (7) | 195 (45) | 3 (17) | 40
 | | Surgical | 3 (2) | 2 (2) | 23 (5) | 0 (0) |
 | Sampling | Cytology | 110 (79) | 37 (36) | 114 (27) | 15 (83) | 60
 | | Surgical | 23 (17) | 61 (59) | 126 (29) | 0 (0) |
Non-gynecologic | Interpretation | Cytology | 3 (4) | 16 (16) | 163 (34) | 2 (9) | 29
 | | Surgical | 1 (1) | 0 | 37 (8) | 0 (0) |
 | Sampling | Cytology | 60 (81) | 76 (76) | 321 (67) | 21 (91) | 71
 | | Surgical | 13 (18) | 4 (4) | 36 (8) | 0 |
The reason that the institutional percentages for cause of error do not add up to 100% is that some institutions assigned more than 1 cause of error per case. The aggregate frequencies for error cause were calculated after forcing each case into only one cause category. Aggregated data were not calculated separately for cytology and surgical causes of error because of the low numbers in some categories.

The major clinical outcomes and error severity for gynecologic and nongynecologic errors are shown in Tables 7 and 8, respectively. Contingency tables showed that for both gynecologic and nongynecologic errors, a statistically significant association existed between institution and the assignment of clinical error severity (P < 0.001). For both gynecologic and nongynecologic errors, Institution A reported significantly more no-harm events and fewer harm events, Institution B reported significantly fewer no-harm events and more harm events, and Institution D reported significantly fewer harm events than expected (P < 0.001). For nongynecologic cases alone, Institution C reported significantly fewer no-harm events (P < 0.001). Institution D reported that errors in cancer diagnosis never resulted in patient harm.

Table 7. Major Clinical Outcomes and Severity of Gynecologic Errors
Outcome (percentage by institution) | A | B | C | D
Patient lost to follow-up | 26.1 | 7.0 | 9.4 | 12.5
Repeat Pap test | 39.8 | 52.2 | 50.0 | 12.5
Colposcopy with additional sampling | 31.8 | 22.1 | 25.0 | 75.0
Ancillary therapy for cancer | 0 | 0.9 | 7.0 | 0
Not answered | 2.3 | 0 | 1.7 | 0
Clinical severity assessment | A | B | C | D
No harm | 83.3 | 11.7 | 15.6 | 100
Near miss | 11.1 | 1.0 | 12.6 | 0
Harm, Grade 1 | 2.8 | 48.7 | 55.1 | 0
Harm, Grade 2 | 0 | 23.3 | 15.0 | 0
Harm, Grade 3 | 0 | 4.9 | 0.3 | 0
Not answered | 2.8 | 5.8 | 1.4 | 0
Pap: Papanicolaou.

Table 8. Major Clinical Outcomes and Severity of Non-gynecologic Errors
Outcome (percentage by institution) | A | B | C | D
Patient lost to follow-up | 14.9 | 0 | 8.7 | 6.7
No specific follow-up documented | 9.
Routine monitoring for malignancy | 0 | 32.0 | 60.0 | 17.4
Additional cytology specimen obtained | 28.4 | 19.0 | 32.3 | 4.4
Additional surgical specimen obtained | 40.5 | 37.0 | 34.2 | 17.4
Ancillary cancer therapy | 52.7 | 54.0 | 44.0 | 52.2
Antibiotics or other non-chemotherapy medications | 12.2 | 1.0 | 10.5 | 13.0
Not answered | 0 | 0 | 1.7 | 0
Clinical severity assessment | A | B | C | D
No harm | 74.3 | 32.0 | 55.1 | 56.5
Near miss | 24.
Harm, Grade 1 | 0 | 16.0 | 22.4 | 0
Harm, Grade 2 | 0 | 45.0 | 15.1 | 0
Harm, Grade 3 | 0 | 2.0 | 1.9 | 0
Not answered | 14.9 | 0 | 0 | 39.1

For the aggregated data, the frequency of error severity assignment for gynecologic errors was 46% for no-harm events, 8% for near-miss events, and 45% for harm events. The frequency of error severity assignment for nongynecologic errors was 55% for no-harm events, 5% for near-miss events, and 39% for harm events. If harm occurred, it generally was assessed as Grade 1 or 2.

Agreement between the causes of error assigned at review and those originally assigned is shown in Table 9 for interobserver (40 cases) and intraobserver (10 cases) comparisons. Institution B exhibited worse agreement when reviewing its own cases than when reviewing cases from Institutions A and D. Pairwise agreement between the review pathologists is shown in Table 10 and was highly variable.

Table 9. Agreement between Originally Assigned Cause of Error (Interpretation vs. Sampling) and Reasons Assigned at Review
 | Institution | Review reason for error
Original reason for error | A | 0.615 | 0.412 | 0.024 | 0.286
  • CNBD: could not be determined.
  • a Pair-wise kappa statistics could not be determined when one of the observer pair used only one reason category for all cases reviewed (i.e., Institution D originally assigned the cause of error in all cases as sampling).

Table 10. Agreement between Causes of Error (Interpretation vs. Sampling) for Each Institutional Slide Set (10 Slides per Set)
Institution | Slide set A
 Slide set B
 Slide set C
 Slide set D
  • CNBD: could not be determined.
  • a Pair-wise kappa statistics could not be determined when one of the observer pair used only one reason category for all cases reviewed (i.e., Institution D originally assigned the cause of error in all cases as sampling).



It is exceedingly difficult to measure the true frequency of errors in cancer diagnosis because of the variety of detection methods used, bias, and the inability of institutions to secondarily review large case volumes.5 As part of a multiinstitutional, national effort to improve practice, we are in the process of standardizing methods, decreasing bias by sharing cases and data among institutions, and establishing more accurate error frequencies by detecting errors using multiple methods.5

In the current study, we reported cancer diagnostic error frequencies based on the CH correlation method, with the understanding that these are minimum frequencies because the majority of patient specimens are not reviewed. Assuming that our aggregated error frequencies are representative of all American laboratories, the minimum number of patients per year who have a Pap test/gynecologic histologic diagnostic error is 150,000 (assuming 50 million annual Pap tests), and that for patients who have a nongynecologic diagnostic error (assuming 5 million nongynecologic specimens) is 155,000.28 If the frequency of error were based on the denominator of correlating CH case pairs, these numbers would be 2–10 times higher.

The effect of diagnostic cancer errors on patient outcome is largely unknown.5 Clinicians often do not know when a diagnostic error has occurred and pathologists often have no knowledge of the effect. The study of errors in cancer diagnosis has been limited by the lack of a taxonomy to classify error severity. We devised an error severity scale based on several factors such as morbidity, mortality, delay in diagnosis, and additional testing performed. As with other classification systems,19–22 we found that pathologists disagreed on the extent of harm caused by diagnostic errors.21, 29 Some institutions claimed that harm never occurred after an error in cancer diagnosis. Using aggregated clinical severity data and the number of errors calculated above, harm appears to occur in a minimum of 127,950 patients per year in the U.S. as a result of errors in the diagnosis of cancer in gynecologic and nongynecologic specimens.
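
The national estimates follow directly from the aggregated frequencies, as the arithmetic sketch below shows. All inputs come from the text and Table 4 (workload-based frequencies of 0.30% and 3.10%, and harm proportions of 45% and 39%); the variable names are illustrative, and integer arithmetic is used to avoid rounding artifacts.

```python
# Annual specimen volumes assumed in the text.
PAP_TESTS_PER_YEAR = 50_000_000
NONGYN_SPECIMENS_PER_YEAR = 5_000_000

# Aggregated workload-based error frequencies, in hundredths of a percent.
GYN_ERROR_FREQUENCY = 30       # 0.30%
NONGYN_ERROR_FREQUENCY = 310   # 3.10%

# Aggregated percentage of errors associated with harm.
GYN_HARM_PERCENT = 45
NONGYN_HARM_PERCENT = 39

gyn_errors = PAP_TESTS_PER_YEAR * GYN_ERROR_FREQUENCY // 10_000
nongyn_errors = NONGYN_SPECIMENS_PER_YEAR * NONGYN_ERROR_FREQUENCY // 10_000
harmed = (gyn_errors * GYN_HARM_PERCENT + nongyn_errors * NONGYN_HARM_PERCENT) // 100

print(gyn_errors)     # 150000
print(nongyn_errors)  # 155000
print(harmed)         # 127950
```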

Individual institutional error frequencies differed partly because of variable clinical and laboratory practices and partly because of biases that were difficult to control. Standardization of laboratory practices is haphazard. For example, our participant laboratories differed with regard to pathologist experience, subspecialty sign-out practice, training programs, methods of preparing specimens, and presign-out quality assurance methods, all of which potentially affected error frequency. We currently are measuring these processes to identify those that may be key to better outcomes. Several authors have reported regional variations in several medical practices such as organ-specific surgery and other treatment protocols.30–35 We believe that differences in test ordering practices contribute to clinical sampling error frequencies. For example, clinicians who bypass noninvasive cytologic diagnostic techniques for more invasive surgical techniques may have a higher rate of more accurate cancer diagnoses, but with associated higher costs, morbidity, and mortality.

The results of the current study demonstrate how false-positive and false-negative cancer diagnoses affect patient outcome. In the pathology literature, false-positive diagnoses usually are attributed to interpretation failures that may be avoided if pathologists learn potential pitfalls. The pathology culture is one of individual diagnostic responsibility and errors are not attributed to poorly designed systems that may be fixed.36 False-negative diagnoses often are attributed to inherent limitations in testing, with the solution being the adoption of better tests. For example, meta-analyses have demonstrated that the mean sensitivity of the conventional Pap test is 58%,37, 38 and the growth of some new Pap test technologies is aimed at improving sensitivity. Our goal is to use error data to maximize the sensitivity and specificity of tests for cancer diagnosis based on tissue sampling. For pathology laboratories, this entails reducing diagnostic variability and designing systems that decrease the probabilities of false-positive and false-negative results. For clinical systems, this means improving test sampling.

Variability in cancer diagnosis has been shown to exist for nearly every cancer type,23–27 although to our knowledge interventions that successfully decrease variability are rare. Page et al. showed that variability in the diagnosis of breast cancer was reduced after education and the adoption of standard histologic criteria.27, 39, 40 However, experts often do not agree on the standard criteria, and without a mechanism to force agreement and adherence, decreasing diagnostic variability is difficult. The diagnostic variability measured in the current study is an expression of institutional diagnostic differences related to all organ systems. We are attempting to use standardized criteria within and across laboratories and other processes, such as telepathology, to standardize diagnoses in CH correlation. Clearly, this is a first small step in decreasing national interobserver diagnostic variability, and the entire pathology community will need to play a role in this effort.

The standardization and uniform reporting of errors in cancer diagnosis is a first step in improving safety. Additional sites have been recruited to contribute error data to further nationalize this effort. In the second phase of this project, root cause analysis was used to devise error reduction plans that were implemented in all laboratories. Reports of the success and failure of these plans are forthcoming.


  • 1
    Clary JM, Silverman JF, Liu Y, et al. Cytohistologic discrepancies: a means to improve pathology practice and patient outcomes. Am J Clin Pathol. 2002; 117: 567573.
  • 2
    Raab SS, Nakhleh RE, Ruby SG. Patient safety in anatomic pathology: measuring discrepancy frequencies and causes. Arch Pathol Lab Med. 2005; 129: 459466.
  • 3
    Bruner JM, Inouye L, Fuller GN, Langford LA. Diagnostic discrepancies and their clinical impact in a neuropathology referral practice. Cancer. 1997; 79: 796803.
  • 4
    Furness PN, Lauder I. A questionnaire-based survey of errors in diagnostic histopathology throughout the United Kingdom. J Clin Pathol. 1997; 50: 457460.
  • 5
    Raab SS. Improving patient safety by examining pathology errors. Clin Lab Med. 2004; 24: 849863.
  • 6
    Whitehead ME, Fitzwater JF, Lindley SK, Kern SB, Ulirsch RC, Winecoff WF 3rd. Quality assurance of histopathologic diagnoses: a prospective audit of three thousand cases. Am J Clin Pathol. 1984; 81: 487491.
  • 7
    Zardawi IM, Bennett G, Jain S, Brown M. Internal quality assurance activities of a surgical pathology department in an Australian teaching hospital. J Clin Pathol. 1998; 51: 695–699.
  • 8
    Zuk JA, Kenyon WE, Myskow MW. Audit in histopathology: description of an internal quality assessment scheme with analysis of preliminary results. J Clin Pathol. 1991; 44: 10–15.
  • 9
    Wakely SL, Baxendine-Jones JA, Gallagher PJ, Mullee M, Pickering R. Aberrant diagnoses by individual surgical pathologists. Am J Surg Pathol. 1998; 22: 77–82.
  • 10
    Safrin RE, Bark CJ. Surgical pathology signout. Routine review of every case by a second pathologist. Am J Surg Pathol. 1993; 17: 1190–1192.
  • 11
    Renshaw AA. Measuring and reporting errors in surgical pathology. Lessons from gynecologic cytology. Am J Clin Pathol. 2001; 115: 338–341.
  • 12
    Ramsay AD, Gallagher PJ. Local audit of surgical pathology. 18 months' experience of peer-review-based quality assurance in an English teaching hospital. Am J Surg Pathol. 1992; 16: 476–482.
  • 13
    Jones BA, Novis DA. Cervical biopsy-cytology correlation: a College of American Pathologists Q-Probes study of 22,439 correlations in 348 laboratories. Arch Pathol Lab Med. 1996; 120: 523–531.
  • 14
    Zarbo RJ, Hoffman GG, Howanitz PJ. Interinstitutional comparison of frozen section consultation: a College of American Pathologists Q-Probes study of 79,647 consultations in 297 North American institutions. Arch Pathol Lab Med. 1991; 115: 1187–1194.
  • 15
    Department of Health and Human Services, Health Care Financing Administration. Clinical laboratory improvement amendments of 1988: final rule. Federal Register 57, no. 7146 (1992) (codified at 42 CFR §493).
  • 16
    Jones BA, Novis DA. Follow-up of abnormal gynecologic cytology: a College of American Pathologists Q-Probes study of 16,132 cases from 306 laboratories. Arch Pathol Lab Med. 2000; 124: 665–671.
  • 17
    Joste NE, Crum CP, Cibas ES. Cytologic/histologic correlation for quality control in cervicovaginal cytology. Experience with 1,582 paired cases. Am J Clin Pathol. 1995; 103: 32–34.
  • 18
    Kohn LT, Corrigan JM, Donaldson MS, editors. To err is human: building a safer health system. Washington, DC: National Academy Press, 1999.
  • 19
    Battles JB, Kaplan HS, Van der Schaaf TW, Shea CE. The attributes of medical event reporting systems: experience with a prototype medical event reporting system for transfusion medicine. Arch Pathol Lab Med. 1998; 122: 231–238.
  • 20
    Dovey SM, Meyers DS, Phillips RL Jr., et al. A preliminary taxonomy of medical errors in family practice. Qual Saf Health Care. 2002; 11: 233–238.
  • 21
    Rubin G, George A, Chinn DJ, Richardson C. Errors in general practice: development of an error classification and pilot study of a method for detecting errors. Qual Saf Health Care. 2003; 12: 443–447.
  • 22
    Fernald DH, Pace WD, Harris DM, West DR, Main DS, Westfall JM. Event reporting to a primary care patient safety reporting system: a report from the ASIPS collaborative. Ann Fam Med. 2004; 2: 327–332.
  • 23
    O'Sullivan JP. Observer variation in gynaecological cytopathology. Cytopathology. 1998; 9: 6–14.
  • 24
    Llewellyn H. Observer variation, dysplasia grading, and HPV typing: a review. Am J Clin Pathol. 2000; 114: S21–S35.
  • 25
    Schlemper RJ, Kato Y, Stolte M. Review of histological classifications of gastrointestinal epithelial neoplasia: differences in diagnosis of early carcinomas between Japanese and Western pathologists. Gastroenterology. 2001; 36: 445–456.
  • 26
    Carlson GD, Calvanese CB, Kahane H, Epstein JI. Accuracy of biopsy Gleason scores from a large uropathology laboratory: use of a diagnostic protocol to minimize observer variability. Urology. 1998; 51: 525–529.
  • 27
    Dalton LW, Page DL, Dupont WD. Histologic grading of breast carcinoma. A reproducibility study. Cancer. 1994; 73: 2765–2770.
  • 28
    Pap Test. Primary Care Consultants. Available at URL: [accessed February 3, 2005].
  • 29
    Mason DJ. Who says it's an error? Research highlights a disagreement among health care workers. Am J Nurs. 2004; 104: 7.
  • 30
    Birkmeyer JD, Sharp SM, Finlayson SR, Fisher ES, Wennberg JE. Variation profiles of common surgical procedures. Surgery. 1998; 124: 917–923.
  • 31
    Garg PP, Landrum MB, Normand SL, et al. Understanding individual and small area variation in the underuse of coronary angiography following acute myocardial infarction. Med Care. 2002; 40: 614–626.
  • 32
    Carlisle DM, Valdez RB, Shapiro MF, Brook RH. Geographic variation in rates of selected surgical procedures within Los Angeles County. Health Serv Res. 1995; 30: 27–42.
  • 33
    Wrobel JS, Mayfield JA, Reiber GE. Geographic variation of lower-extremity major amputation in individuals with and without diabetes in the Medicare population. Diabetes Care. 2001; 24: 860–864.
  • 34
    McPherson K, Wennberg JE, Hovind OB, Clifford P. Small-area variations in the use of common surgical procedures: an international comparison of New England, England, and Norway. N Engl J Med. 1982; 307: 1310–1314.
  • 35
    Fisher ES, Wennberg JE. Health care quality, geographic variations, and the challenge of supply-sensitive care. Perspect Biol Med. 2003; 46: 69–79.
  • 36
    Banja JD. Medical errors and medical narcissism. Boston: Jones & Bartlett Publishers, 2005.
  • 37
    Fahey MT, Irwig L, Macaskill P. Meta-analysis of Pap test accuracy. Am J Epidemiol. 1996; 143: 406–407.
  • 38
    Nanda K, McCrory DC, Myers ER, et al. Accuracy of the Papanicolaou test in screening for and follow-up of cervical cytologic abnormalities: a systematic review. Ann Intern Med. 2000; 132: 810–819.
  • 39
    Page DL, Dupont WD, Jensen RA, Simpson JF. When and to what end do pathologists agree? J Natl Cancer Inst. 1998; 90: 88–89.
  • 40
    Schnitt SJ, Connolly JL, Tavassoli FA, et al. Interobserver reproducibility in the diagnosis of ductal proliferative breast lesions using standardized criteria. Am J Surg Pathol. 1992; 16: 1133–1143.