Prospective multicenter assessment of interobserver agreement for radiologist interpretation of multidetector computerized tomographic angiography for pulmonary embolism


  • Presented as an oral abstract at the XXII Congress of the International Society on Thrombosis and Haemostasis, Tuesday 14 July 2009.

Jeffrey A. Kline, Department of Emergency Medicine, 1000 Blythe Boulevard, MEB 3rd floor, Room 306, Charlotte, NC 28203, USA.
Tel.: +1 704 355 3658; fax: +1 704 355 7047.


Summary. Background: Emergency physicians rely on the interpretation of radiologists to diagnose and exclude pulmonary embolism (PE) on the basis of computerized tomographic pulmonary angiography (CTPA). Few data exist regarding the interobserver reliability of this endpoint. Objective: To quantify the degree of agreement in CTPA interpretation between four academic hospitals and an independent reference reading (IRR) laboratory. Methods: Hospitalized and emergency department patients who had one predefined symptom and sign of PE and underwent 64-slice CTPA were enrolled from four academic hospitals. CTPA results as interpreted by board-certified radiologists from the hospitals were compared against those from the IRR laboratory. CTPAs were read as indeterminate, PE− or PE+, and percentage obstruction was computed by the IRR laboratory, using a published method. Agreement was calculated with weighted Cohen’s kappa. Results: We enrolled 492 subjects (63% female, age 54 ± 1 years, and 16.7% PE+ at the site hospitals). Overall agreement was 429/492 (87.2%; 95% confidence interval 83.9–90.0). We observed 13 cases (2.6%) of complete discordance, where one reading was PE+ and the other reading was PE−. Weighted agreement was 92.3%, with kappa = 0.75. The median percentage obstruction for all patients was 9% (interquartile range: 5–30%). For CTPAs interpreted at the site hospitals as PE− or indeterminate but read as PE+ by the IRR laboratory, the median percentage obstruction was 6% (4–7%). Conclusion: We found in this sample a good level of agreement, with a weighted kappa of 0.75, but with 2.6% of patients having total discordance. Overall, a large proportion of clots were distal or minimally occlusive.


The two most common symptoms associated with pulmonary embolism (PE), shortness of breath and chest pain, are responsible for over 10 million patient visits annually to US emergency departments (EDs) [1]. The mainstay of diagnostic confirmation of PE is computerized tomographic pulmonary angiography (CTPA). Extrapolating from data obtained in a large multicenter diagnostic study performed in 12 US EDs [2], it can be estimated that approximately 1–2% of all ED patients undergo CTPA for evaluation of PE. As a result, CTPA is one of the most commonly ordered computed tomography (CT) scans in the ED setting.

Part of the motivating force behind the high rate of CTPA ordering may be the perception that CTPA reliably produces a binary output: positive or negative. This is in contrast to the graded output from ventilation/perfusion scintigraphy [3]. Another advantage of CTPA has been its high sensitivity and perceived accuracy in not generating false-negative reports.

Determinants of overall diagnostic test performance include accuracy and precision. The accuracy of CTPA for PE has been widely reported, with results from clinical trials [4,5], as well as one meta-analysis [6], suggesting that when pretest probability is also taken into account, CTPA is unlikely to result in false-negative diagnosis.

However, the precision of CTPA has been less well examined. Prior studies have typically compared resident-fellow readings with attending readings [7,8], used older-generation scanners (< 16-slice) [9–11], or not quantified percentage obstruction in cases of disagreement. Quantifying the reliability of CTPA has clinical importance. A radiologic interpretation of a filling defect in the pulmonary arteries almost always establishes the diagnosis of PE, and commits a patient to the extended risks of anticoagulation and to the associated economic and psychological consequences of being labeled as having had PE.

The objectives of this prospective study were to: (i) quantify interobserver agreement for CTPA in acutely symptomatic patients; (ii) quantify the overall percentage obstruction seen on CTPA; and (iii) describe the clinical and radiographic characteristics, as well as degree of obstruction, for cases of disagreement.

Materials and methods

Study design

This was a secondary analysis of an Institutional Review Board-approved Food and Drug Administration-regulated pivotal trial performed to evaluate the safety and efficacy of a handheld device that analyzes exhaled breath to aid in the diagnosis of PE (BreathQuant BreathScreen PE, New York, NY, USA).

Study setting and population

Prospective enrollment occurred in four academic medical centers in the USA (Carolinas Medical Center, Charlotte, NC; Northwestern University, Feinberg School of Medicine, Northwestern Memorial Hospital, Chicago, IL, USA; Wake Forest University, Baptist Hospital, Winston-Salem, NC; and Baystate Hospital, Springfield, MA, USA). All patients underwent CTPA, the result of which was interpreted both at the site as standard care, and at a radiographic reference laboratory staffed by board-certified radiologists blinded to patient characteristics and outcome.

Inclusion criteria required that the enroller confirm from source documentation that patients were > 17 years of age and had at least one sign of PE and symptom of PE as defined on the data collection form (Table 1). All patients had to provide written informed consent. Patients were excluded if they were unlikely to provide follow-up (imprisonment, homelessness, no telephone, history of non-compliance) or if they were pregnant, hemodynamically unstable, intubated, unable to breathe through their mouth, had undergone fibrinolytic treatment within 48 h, had PE diagnosed within the last 6 months and were on anticoagulation, or had known active tuberculosis.

Table 1.   Inclusion criteria for signs, symptoms and risk factors for pulmonary embolism (PE)
Signs or symptoms of PE (at least one required for inclusion)
 New-onset dyspnea
 Dyspnea worse than baseline
 Pleuritic chest pain
 Upper abdominal pain
 Upper back pain
 Syncope
 Near syncope
 Pulse ≥ 90 beats min⁻¹
 Dizziness
 Confusion/altered mental status
 Respiratory rate > 20 breaths min⁻¹
 Cough
 Observation of unilateral limb swelling
 Any pulse oximetry reading < 95%
Risk factors for PE (at least one required for inclusion)
 Age > 49 years
 Surgery within previous 4 weeks requiring general endotracheal anesthesia
 Bed rest or hospitalization > 48 h
 Current hospitalization for > 11 h for trauma
 Trauma requiring hospitalization within previous 2 weeks
 Personal history of thrombophilia
 Active malignancy (currently under the care of an oncologist for treatment)
 Any exogenous estrogen use
 Postpartum status (within past 2 weeks)
 Immobilization of an ankle, knee, hip or shoulder for > 48 h within past 7 days
 Paralysis of one or more limbs from prior stroke or spinal cord injury
 Chronic neuromuscular disease with immobility
 Body mass index > 36 kg m⁻²
 Stroke, myocardial infarction or arterial embolism within previous 30 days
 Congestive heart failure
 Active intravenous recreational drug use
 Current deep vein thrombosis diagnosed within past 3 months without known PE
 Active connective tissue disease (lupus, mixed connective tissue disease, scleroderma)
 Focal infection requiring hospitalization or observation
 Indwelling deep vein catheter or port (excludes pacemaker wire)
 Hemodialysis-dependent renal failure (within past 2 weeks)

Study protocol

Patients were enrolled from ED, ward and intensive care settings to maintain a study-mandated ED/inpatient balance of 50 : 50. Study data collection had to be completed within 24 h of CTPA completion. General demographic information, clinical characteristics and past medical history elements were abstracted into a preformed, written, 17-page template.

CTPA images were obtained at each site as part of standard care on 64-slice multidetector equipment with ≤ 2.5-mm collimation. Intravenous contrast medium was given to all patients according to local protocol, using a computer-controlled mechanized timing injector in all cases. Images were obtained using energy, pitch and rotation settings as required for the patient’s body habitus. All patients had reconstructions that included transverse, coronal and sagittal views. Images were converted to a digital file in Digital Imaging and Communications in Medicine (DICOM) format, devoid of any annotations or protected health information, and transferred to the independent reference reading (IRR) laboratory (Medical Metrics Inc., Houston, TX, USA).


The IRR laboratory is a private, for-profit company; the IRR radiologists were compensated on a contract basis, and their interpretations were not used for clinical care. Each CTPA interpretation from the IRR laboratory was performed by one of three board-certified radiologists who had completed an Accreditation Council for Graduate Medical Education-accredited residency in radiology and a fellowship in cardiothoracic radiology. The IRR radiologist evaluated the complete image sets, which included reconstructions, and completed a case report form that included the interpretations ‘No PE’, ‘Positive for acute PE’, ‘Positive for chronic PE’ and ‘Positive for other finding’. All scans read as positive for PE were further evaluated for the location of the filling defect and the percentage obstruction of the vessel(s), using an explicit data collection format to facilitate computation of the percentage of total pulmonary vascular occlusion with the method of Mastora et al. [12]. The IRR radiologist could also deem a scan to be ‘Indeterminate’ because of degraded image quality. The reason for the degraded quality had to be reported. For example, it was decided a priori that reasons for indeterminate status would include poor opacification, defined as < 200 Hounsfield Units in the pulmonary trunk, or excessive motion artefact. For the pivotal trial report, the IRR laboratory interpretations will serve as the criterion standard. These interpretations were compared with the interpretations made at the enrolling site.

Follow-up occurred at 45 days after enrollment and included a telephone questionnaire and structured review of the medical record. Study duration was the index visit with follow-up through 45 days. Follow-up was targeted to determine any deaths, any adverse clinical events in general, and any imaging or diagnosis of new PE or deep vein thrombosis (DVT).

Data analysis

The aim of this study was to characterize agreement between the clinical site reading of the CTPA and the IRR laboratory reading of the same images. We expected that both the clinical site CT readings and the IRR laboratory readings would include some indeterminate scans. Prior to data analysis, we constructed 3 × 3 tables that were subsequently populated with outcome possibilities, which were positive, negative or indeterminate for acute PE. We weighted concordant readings as 1.0, disagreement between a positive and a negative reading as 0.0, and disagreement between a positive or negative reading and an indeterminate reading as 0.5, and computed a weighted kappa. Other statistics are presented as proportions with 95% confidence intervals (CIs), means with standard deviations for normally distributed data, and medians with interquartile range (IQR) for non-normally distributed data. Stata version 10.0 (College Station, TX, USA) was used for data analysis. The sample size of 500 was estimated to narrow CIs for selected diagnostic indexes for the test device (BreathScreen PE), which is not the subject of this report.
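As an illustration of the weighting scheme described above, the following Python sketch computes a weighted agreement and a weighted kappa from a 3 × 3 table. The cell counts are an assumption: they are reconstructed from counts reported in the Results text (for example, 400 site-negative scans, of which 6 were IRR-positive and 43 IRR-indeterminate), not read from the study database. The names `weighted_kappa`, `COUNTS` and `WEIGHTS` are illustrative only and do not reflect the Stata routine actually used.

```python
def weighted_kappa(table, weights):
    """Return (observed weighted agreement, expected weighted agreement, kappa)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]
    # Observed agreement: each cell contributes its count times its weight.
    p_obs = sum(weights[i][j] * table[i][j] for i, j in cells) / n
    # Chance-expected agreement from the row and column marginals.
    p_exp = sum(weights[i][j] * row_tot[i] * col_tot[j] for i, j in cells) / n**2
    return p_obs, p_exp, (p_obs - p_exp) / (1 - p_exp)

# Category order: negative, positive, indeterminate.
# Rows = clinical site reading, columns = IRR reading.
# ASSUMPTION: counts reconstructed from figures quoted in the Results text.
COUNTS = [
    [351, 6, 43],
    [7, 72, 3],
    [3, 1, 6],
]
# Weights as described in the text: 1.0 concordant, 0.0 positive vs. negative,
# 0.5 when exactly one of the two readings is indeterminate.
WEIGHTS = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]

p_obs, p_exp, kappa = weighted_kappa(COUNTS, WEIGHTS)
print(f"weighted agreement = {p_obs:.3f}, weighted kappa = {kappa:.3f}")
```

Under these assumed counts the sketch gives a weighted agreement of about 0.923 and a weighted kappa of about 0.754.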

Results

Four hundred and ninety-eight patients met the inclusion criteria for the study. Five patients withdrew from the study and did not have further information available for analysis, and one patient did not have a subsequent CT, leaving a total of 492 enrolled with CTPA performed and interpreted at the clinical site as well as at the central radiology reading site. General demographic characteristics and clinical characteristics of patients are shown in Table 2. Patients had acute symptoms; over 75% had either new dyspnea or worsened chronic dyspnea, over half were tachycardic, over one-third had hypoxemia, and approximately 10% had prominent risk factors such as recent surgery, active cancer or estrogen use. The sample included approximately half inpatient and half ED patients, and was diverse with respect to race. The incidence of acute PE as diagnosed by the clinical site was 16.7% (13.4–20.0).

Table 2.   Clinical characteristics of included patients

General characteristics
 Mean age (years)                                54.1
 Female (%)                                      63.0
 Hispanic or Latino ethnicity (%)                4.5
 White (%)                                       69.1
 Black or African American (%)                   29.3
 Native American or Alaskan native (%)           0.8
 Native Hawaiian or Pacific Islander (%)         0.4
 Asian (%)                                       0.4
 Enrolled in the emergency department (%)        48.4
 Enrolled as an inpatient (%)                    51.6
Signs and symptoms of PE (%)
 New-onset dyspnea                               58.9
 Pulse ≥ 90 beats min⁻¹                          58.5
 Substernal chest pain                           42.1
 Respiratory rate > 20 breaths min⁻¹             39.6
 Pulse oxygenation < 95%                         37.4
 Increased chronic dyspnea                       15.6
Risk factors for PE (%)
 Age > 49 years                                  67.7
 Body mass index > 36 kg m⁻²                     25.0
 Bed rest or hospitalization > 48 h              16.7
 Previous surgery within 4 weeks                 15.4
 Active malignancy                               13.8
 Estrogen use                                    10.2
 Indwelling deep venous catheter                 9.3
Patient history of PE or DVT (%)
 PE on current treatment                         1.4
 DVT on current treatment                        4.3
 PE or DVT not on current treatment              1.4

DVT, deep vein thrombosis; PE, pulmonary embolism.

The main findings are presented in Table 3, a 3 × 3 table reporting results of readings by both the clinical site and the IRR laboratory for acute PE. Overall agreement was 429/492 (87.2%; 95% CI 83.9–90.0). We observed 13 cases (2.6%) of complete discordance, where one reading was positive and the other reading was negative. However, significantly more disagreement was observed regarding the adequacy of image quality. At the enrolling site, radiologists deemed only 10 scans (2.1%) to be indeterminate. In contrast, the IRR laboratory deemed 52 scans (10.6%) to be indeterminate. The 95% CI for this difference of 8.5% was 6–11%, suggesting that the IRR laboratory was significantly more likely to interpret CTPA images as inadequate than were the site radiologists. Figure 1 is a representative image from a patient read as indeterminate by the IRR laboratory. The weighted agreement was 92.3%, which takes into account a reduced penalty when one interpretation is positive or negative and the other interpretation is indeterminate. The weighted kappa was 0.754, with a standard error of 0.041.
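The method used for the interval around the 8.5% difference is not stated. Because both readings were made on the same 492 scans, one plausible approach is a Wald interval for a difference between paired proportions, sketched below in Python. The discordant counts are assumptions inferred from counts quoted elsewhere in the Results (46 scans indeterminate only at the IRR laboratory, 4 indeterminate only at the site), and `paired_diff_ci` is a hypothetical helper name.

```python
from math import sqrt

def paired_diff_ci(b, c, n, z=1.959964):
    """Wald CI for a difference between two paired proportions.

    b: pairs indeterminate at the IRR laboratory only
    c: pairs indeterminate at the clinical site only
    n: total number of paired readings
    """
    diff = (b - c) / n
    se = sqrt(b + c - (b - c) ** 2 / n) / n
    return diff, diff - z * se, diff + z * se

# ASSUMPTION: 52 IRR-indeterminate and 10 site-indeterminate scans,
# of which 6 were indeterminate at both, giving b = 46 and c = 4.
diff, low, high = paired_diff_ci(b=46, c=4, n=492)
print(f"difference = {diff:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Under these assumptions the limits round to 6% and 11%.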

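Table 3.   3 × 3 table of interobserver agreement for computed tomography scan interpretation
Site reading       Independent reference reading
                   Negative   Positive   Indeterminate   Total
 Negative          351        6          43              400
 Positive          7          72         3               82
 Indeterminate     3          1          6               10
 Total             361        79         52              492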
Figure 1. Computerized tomographic pulmonary angiography scan read as negative at the clinical site, and indeterminate at the independent reference reading laboratory. Attenuation was calculated as 187.6 Hounsfield Units, indicating inadequate opacification of the pulmonary artery trunk.

Negative at the clinical site but positive or indeterminate at the IRR laboratory

Of the 400 scans read as negative at the clinical site, six were read as positive by the IRR laboratory (1.5%), and 43/400 were interpreted as indeterminate by the IRR laboratory (10.8%). At the time of discharge from the hospital, four of the six patients read as positive by the IRR laboratory and six of the 43 patients read as indeterminate by the IRR laboratory were prescribed anticoagulation. Of this total group of patients read as negative at the clinical site, but positive or indeterminate by the IRR laboratory, six were diagnosed with DVT and treated with anticoagulation in the subsequent period after the CT scan. Four of the total patients read as negative for PE at the clinical site died during the 45-day follow-up period (two from cancer, one from myocardial infarction, and one from liver failure). Figure 2 demonstrates imaging from a patient read as negative at the clinical site but positive by the IRR laboratory.

Figure 2. Representative slice of a computerized tomographic pulmonary angiography scan read as negative at the clinical site, and positive at the independent reference reading laboratory. The arrow marks the vessel with a possible filling defect that was interpreted differently between sites.

Positive at the clinical site but negative or indeterminate at the IRR laboratory

Seven of the 82 CTPA scans read as positive at the clinical sites were read as negative at the IRR laboratory (8.5%), and 3/82 were read as indeterminate (3.7%). In aggregate, 12.2% of the scans read as positive at the clinical site were subsequently read as negative or indeterminate upon second read by an independent reference radiologist (95% CI 6.0–21.3). All 10 of these patients were discharged on warfarin, six were still being treated at the time of follow-up, and none had reported bleeding events. Figure 3 demonstrates imaging from a patient read as positive at the clinical site but negative at the IRR laboratory.
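The interval method for this proportion is not stated in the text. The Python sketch below assumes an exact binomial (Clopper–Pearson) interval, found by bisection on the binomial tail probabilities; this is one candidate method, not necessarily the one used in Stata. The helper names `binom_cdf` and `exact_ci` are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n trials."""
    def bisect(f, target, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(60):  # 60 halvings is ample for double precision
            mid = (lo + hi) / 2
            if (f(mid) < target) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: the p at which P(X >= k) = alpha/2 (increasing in p).
    lower = 0.0 if k == 0 else bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper limit: the p at which P(X <= k) = alpha/2 (decreasing in p).
    upper = 1.0 if k == n else bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper

# 10 of the 82 site-positive scans were negative or indeterminate at the IRR.
low, high = exact_ci(10, 82)
print(f"10/82 = {10 / 82:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

The same function can be applied to the overall agreement of 429/492 and compared against the interval reported there.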

Figure 3. Representative slice of a computerized tomographic pulmonary angiography scan read as positive at the clinical site, and negative at the independent reference reading laboratory. The arrow marks the vessel with a possible filling defect that was interpreted differently between sites.

Percentage obstruction

We investigated the potential relationship between the size of the filling defect and the discordant readings by using the percentage pulmonary vascular occlusion reported by the IRR laboratory as a basis for the size. The clinical sites did not report percentage obstruction data. The median percentage obstruction for all patients was 9.0% (25th–75th percentile IQR: 5.0–30.0%) (Fig. 4). For the patients interpreted at the clinical sites as negative (n = 6) or indeterminate (n = 1) but read as positive for PE by the IRR laboratory, the median (range) of percentage obstruction was 5.6% (3.8–6.7%). Despite having CTPA scans at the site that were not diagnostic of acute PE, five of these seven patients were discharged on warfarin for either new DVT (n = 2) or standard indications such as atrial fibrillation (n = 3). Only one patient with discordance had > 5% obstruction (6.7%).

Figure 4. Histogram of percentage obstruction among pulmonary embolism (PE)-positive patients.

We then examined the PE incidence, PE size, and rate of indeterminate scans, stratified by patient location (inpatient vs. ED). The incidence of PE as diagnosed at the clinical site differed between inpatients (22.8%) and ED patients (10.1%) (95% CI for difference, 6.3–19.2). Inpatients had a larger but non-significantly different median percentage obstruction than ED patients (median 12.0% vs. 7.5%, P = 0.8 by Mann–Whitney test). There was no significant difference in the rate of indeterminate scans between inpatients (11.0%) and ED patients (10.1%).
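As a worked check of the inpatient versus ED comparison, the Python sketch below computes a Wald interval for the difference between two independent proportions. The group counts are assumptions back-calculated from the reported percentages (254 inpatients with 58 site-diagnosed PE, 238 ED patients with 24), and `diff_ci` is a hypothetical helper name; the text does not state which interval method was used.

```python
from math import sqrt

def diff_ci(x1, n1, x2, n2, z=1.959964):
    """Wald CI for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d, d - z * se, d + z * se

# ASSUMPTION: counts inferred from 51.6% inpatients and 48.4% ED patients
# of 492, with PE incidences of 22.8% and 10.1%, respectively.
d, low, high = diff_ci(58, 254, 24, 238)
print(f"difference = {d:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With these assumed counts the limits round to 6.3% and 19.2%.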

Discussion

We report, for the first time, a large, multicenter sample of CTPA scans obtained using 64-slice technology, interpreted by independent radiologists, with one group independent of clinical care. The observed weighted kappa was 0.75, which is typically considered to be a good level of agreement [13]. However, we found an overall 2.6% binary discordance rate. When disagreement occurred and the IRR laboratory detected acute PE contrary to the clinical site read, these were very small clots. One such case had a percentage obstruction of 6.7%; all others were < 5%. Discordances appeared to occur primarily over the decision to interpret apparently small filling defects as positive for PE, and over the adequacy of image quality.

The initial literature, in 1999–2001, reported higher interobserver agreement for single-slice CTPA, with kappa values consistently above 0.8 [9,10,14]. Subsequent work performed in 2003–2004 with images from four-slice CT resulted in lower kappa values, from 0.71 [11] to 0.80 [7]. Our work continues to demonstrate this trend, which is probably due to the rise in number of detectors in CT machines, with resultant increased image acquisition speed and reduction of motion artefact. With these advances, imaging beyond the segmental level has become possible, but perhaps not reliable. Work by Patel et al. [11] supported this idea by demonstrating a stepwise reduction in kappa values when segmental and subsegmental PE were compared. Our data suggest that almost one quarter of all PE cases diagnosed overall were limited to the subsegmental vessels, with 5% or less of vascular obstruction.

This work adds to prior work demonstrating a significant rate of indeterminate scans, and questions the notion that CTPA always produces a binary, positive or negative, diagnostic result. The PIOPED II study, published in 2006, found that of the patients who received an adjudicated diagnosis of presence or absence of PE, 51/824 (6%) had scans that were non-interpretable. We reported a higher percentage of patients with an indeterminate scan (11%), but also employed a prespecified protocol for determining inadequate opacification of the pulmonary vasculature, and blinded the IRR radiologists to clinical information. Both factors may have contributed to our finding of a higher proportion of indeterminate scans in the IRR interpretations than in the clinical site readings. This finding may be an important consideration in the design of future research using CTPA as the reference standard to determine the presence or absence of PE in clinical trials.

These findings have clinical significance. Had the results of the IRR laboratory been used for clinical care, between 2% and 3% of patients would have had different diagnostic and treatment decisions from those at the clinical site. If 1.5% of all 115 million ED patients undergo CTPA in the USA each year [1,2], it can be estimated that, annually in the USA alone, several thousand patients may be told that they have PE or do not have PE on the basis of specious evidence.

There is no consensus on the clinical significance of isolated subsegmental PE, but our work indicates that this is a common finding. We used a standardized measurement of vascular obstruction, and found a median percentage obstruction for all patients of only 9.0%, with three-quarters having obstruction < 30%. This skewed distribution, towards the low end of the percentage obstruction range, has not been previously demonstrated. This may, in part, be due to our exclusion of intubated patients or patients in shock, but it may also be a function of high CTPA resolution allowing visualization of a proportionally larger number of small clots.

Finally, this study, in which over 10% of scans read as positive at the clinical site were subsequently read as negative or indeterminate at the IRR laboratory, raises questions about the use of CT as a stand-alone diagnostic test in all situations. It suggests that situations of low pretest probability and positive CT findings may require further imaging with duplex Doppler or ventilation/perfusion, or re-examination of CT images prior to commitment to 3–6 months of anticoagulation. Also, patients with high pretest probability and negative CT findings could be false-negatives, and additional imaging may be helpful in these cases. Although this may not need to be performed in the ED, physicians should be aware of the diagnostic properties of CT in conjunction with pretest probability, and advocate that admitted patients have subsequent testing as indicated.

Limitations

There are several limitations to our work that must be considered. This was an observational study in which physicians were allowed to base treatment decisions on clinical site CT findings. Subsequent imaging and anticoagulation were not standardized per the protocol. This, in conjunction with the fact that there was no third-party reference standard, prevents knowledge of which reading was ‘right’. Furthermore, the study was not powered to detect events such as bleeds resulting from anticoagulation in patients with clinical site CT scans read as positive that were subsequently read as negative or indeterminate at the IRR laboratory. For this reason, the precise clinical consequences of the CT disagreement observed are uncertain. Also, the site radiologists were blinded to the study. If they were not, it is possible that they would have provided more conservative readings, particularly with respect to CT quality and increased resultant indeterminate scans. This would have perhaps resulted in agreement that was higher than that measured. However, because of the non-interventional nature of the study, we preserved the standard clinical process as much as possible, and in doing so also maximized the ability to generalize our results to the actual patient care setting outside of a research environment. Despite these limitations, this work is unique because of the inclusion of both inpatients and ED patients, a racially diverse sample, a rigorous methodology to determine the nature of CT interpretation, and the utilization of 64-slice CT technology.

Conclusion

As with any diagnostic modality, CTPA for PE is subject to some degree of imprecision. We found, in this sample of patients with acute symptoms requiring 64-slice CTPA, a good level of agreement, with a kappa of 0.75 but with 2.6% of patients having total discordance. A large proportion of clots that were visualized, overall as well as in cases of disagreement, were small. Further work is needed to determine the optimal accurate radiographic and clinical level of diagnosis, as well as the significance of isolated subsegmental PE.

Disclosure of Conflict of Interests

This work was supported by Grant K23HL077(-04 and -05) NHLBI, R42 HL086316-01 and BreathQuant Medical LLC.