A related commentary appears in the June issue.
Systematic Review of Emergency Physician–performed Ultrasonography for Lower-Extremity Deep Vein Thrombosis
Article first published online: 14 APR 2008
© 2008 by the Society for Academic Emergency Medicine
Academic Emergency Medicine
Volume 15, Issue 6, pages 493–498, June 2008
How to Cite
Burnside, P. R., Brown, M. D. and Kline, J. A. (2008), Systematic Review of Emergency Physician–performed Ultrasonography for Lower-Extremity Deep Vein Thrombosis. Academic Emergency Medicine, 15: 493–498. doi: 10.1111/j.1553-2712.2008.00101.x
- Issue published online: 14 APR 2008
- Received December 3, 2007; revision received January 18, 2008; accepted January 25, 2008.
- deep vein thrombosis;
- systematic review;
- emergency department
Objectives: The authors performed a systematic review to evaluate published literature on diagnostic performance of emergency physician–performed ultrasonography (EPPU) for the diagnosis and exclusion of deep venous thrombosis (DVT).
Methods: Structured search criteria were used to query MEDLINE and EMBASE, followed by a hand search of published bibliographies. Relevance and inclusion criteria required prospective investigation of emergency department (ED) outpatients with suspected DVT; diagnostic evaluations had to consist of EPPU followed by criterion standard (radiology-performed) imaging. Two authors independently extracted data from included studies; study quality was assessed utilizing a validated tool for quality assessment of diagnostic accuracy studies (QUADAS). Pooled data were analyzed using an unweighted summary receiver-operating-characteristic (SROC) curve; sensitivity and specificity were estimated using a random effects model.
Results: The initial search yielded 1,162 publications. Relevance screening and selection yielded six articles including 936 patients. Four of the six studies reported adequate blinding but a number of other methodologic flaws were identified. A random effects model yielded an overall sensitivity of 0.95 (95% confidence interval [CI] = 0.87 to 0.99) and specificity of 0.96 (95% CI = 0.87 to 0.99).
Conclusions: Systematic review of six studies suggests that EPPU may be accurate for the diagnosis of DVT compared with radiology-performed ultrasound (US). However, given the methodologic limitations identified among the primary studies, the estimates of diagnostic test performance may be overly optimistic. Further research into EPPU for suspected DVT is needed before it can be adopted into routine clinical practice.
Deep venous thrombosis (DVT) causes death and disability in ambulatory patient populations.1,2 Approximately 200,000 outpatients each year are diagnosed with DVT, and experts estimate that many more outpatients have DVT that is not diagnosed.2 At least one-third of patients with untreated DVT also experience clinically significant pulmonary embolism (PE), and the short-term mortality rate from untreated PE probably exceeds 20%.3 Emergency medicine (EM) practitioners have become accustomed to using the D-dimer assay as a method to exclude DVT; however, the D-dimer only allows exclusion of DVT in fewer than one-half of patients with suspected DVT, and the D-dimer cannot confirm the diagnosis of DVT.4,5 Venous ultrasonography remains the main comprehensive modality to evaluate for DVT, but at present, venous ultrasonography requires significant time, patient transport, and availability of a medical sonographer and radiologist to perform and interpret.
The majority of EM residencies in the United States train their residents to use ultrasound (US) in the emergency department (ED).6 Several original research reports have focused on emergency physician–performed US (EPPU) to evaluate for lower-extremity DVT.7 Hence, we pursued the following research question: What is the accuracy of EPPU for suspected lower-extremity DVT?
Two physicians (PRB, JAK) independently performed structured searches of MEDLINE, and a librarian searched EMBASE, including in-process and other nonindexed citations (January 1988 to December 2007). The Medical Subject Headings (MeSH) string utilized was (ultrasonography) AND (venous thrombosis) with both terms exploded followed by free text searches including the terms “ultrasound” and “venous thrombosis.” The search was limited to English language and human subjects.
We also pursued data from research studies that were not published as full-length articles. We conducted online bibliographic searches of abstract submissions to Academic Emergency Medicine and Annals of Emergency Medicine from January 1994 to July 2007. The query string used to search both publications was “ultrasound” AND “venous thrombosis.” Additionally, we read all bibliographies of studies that passed relevance screening and attempted personal communication with authors to further pursue unpublished studies.
Two independent reviewers read all abstracts for relevance. Criteria for relevance required all of the following: 1) original research reports of ED patients, 2) patient population with signs and symptoms suggestive of DVT, 3) venous US performed by nonradiology personnel, and 4) second venous US performed by a radiology department or vascular laboratory.
We obtained and reviewed full-length reprints of all studies that met relevance criteria for inclusion. For inclusion, studies had to report on a prospective sample of predominantly outpatients (>50%). Patients could self-refer or could have been referred from another medical facility. We specifically sought to exclude studies performed solely for DVT surveillance in high-risk populations; hence, studies had to specify that patients manifested clinical findings suggestive of DVT as the basis for enrollment. The “diagnostic test” had to be an US of one or both legs, performed by an emergency physician (EP). The criterion standard required a second US to be performed by a US technician with images interpreted by a radiologist or vascular physician sonographer.
Following the relevance search, two reviewers (JAK, PRB) compared exclusion logs for discordance, reaching consensus by conference. Systematic data extraction was completed via a predesigned data collection sheet (available as an online Data Supplement) for studies meeting relevance screen and inclusion criteria. Authors were individually contacted as needed for data and inclusion criteria clarification. If it was determined after full article review and clarification from the authors that a study was retrospective, the study was excluded. Two reviewers (JAK, MDB) independently confirmed numeric calculations.
We elected to include higher quality studies by using a few key quality measures as inclusion criteria; as described under “Inclusion Criteria,” studies had to include an appropriate patient spectrum and use an acceptable reference standard.8 We then graded each study based on adequate blinding. Grade A defined prospective studies in which the EP US performer was blinded to the criterion standard. Grade B defined studies wherein blinding measures were not explicitly stated or not performed. A recently validated tool for quality assessment of diagnostic accuracy studies (QUADAS)8,9 was subsequently applied by two independent observers. Final agreement was reached by consensus regarding potential study limitations.
Agreement between reviewers was assessed with Cohen’s kappa (κ). Study US performance was assessed using summary receiver-operating-characteristic (SROC) curve analysis, pooled diagnostic odds ratios (DORs), and pooled sensitivity and specificity values.10 The SROC curve analysis was based on an unweighted least-squares regression model, which has been fully described previously.10–14 A random effects model was used to pool estimates; a correction factor of 0.05 was added to each cell. The SROC curve analysis was performed using Meta-Test (Version 0.9, Tufts-New England Medical Center, Boston, MA) and dr-ROC (Version 2.0, Diagnostic Research Design & Reporting, Glenside, PA) software. Given the questionable validity of using funnel plots or statistical models to detect publication bias for diagnostic test meta-analysis, we did not formally test for the presence of publication bias.15,16
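As a rough illustration of the unweighted least-squares SROC approach named above, the following Python sketch regresses the log diagnostic odds ratio D on the threshold proxy S across studies. It is not the Meta-Test or dr-ROC implementation; the function names and the 2×2 counts are hypothetical, and the continuity correction is passed in as a parameter using the value stated in the text.

```python
from math import log, exp

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return log(p / (1 - p))

def moses_littenberg(tables, correction):
    """Unweighted least-squares fit of D = a + b*S.

    tables: list of (tp, fp, fn, tn) counts, one tuple per study.
    correction: value added to every cell before computing rates.
    Returns the intercept a and slope b.
    """
    D, S = [], []
    for tp, fp, fn, tn in tables:
        tp, fp, fn, tn = (c + correction for c in (tp, fp, fn, tn))
        tpr = tp / (tp + fn)              # sensitivity
        fpr = fp / (fp + tn)              # 1 - specificity
        D.append(logit(tpr) - logit(fpr)) # log diagnostic odds ratio
        S.append(logit(tpr) + logit(fpr)) # proxy for test threshold
    n = len(D)
    mean_s, mean_d = sum(S) / n, sum(D) / n
    sxx = sum((s - mean_s) ** 2 for s in S)
    b = sum((s - mean_s) * (d - mean_d) for s, d in zip(S, D)) / sxx
    a = mean_d - b * mean_s
    return a, b

def sroc_tpr(fpr, a, b):
    """Sensitivity on the fitted SROC curve at a given false-positive rate."""
    z = a / (1 - b) + (1 + b) / (1 - b) * logit(fpr)
    return 1 / (1 + exp(-z))

# Hypothetical 2x2 counts (tp, fp, fn, tn); NOT data from the included studies.
hypothetical_tables = [(30, 2, 1, 80), (16, 14, 2, 44), (23, 4, 0, 45)]
a, b = moses_littenberg(hypothetical_tables, correction=0.05)
```

A slope b near zero corresponds to a DOR that does not vary with threshold, which is how the slope is read when checking for a threshold effect.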
Figure 1 shows the results of the search and article selection. The MEDLINE search yielded 1,162 titles. After screening the abstracts for relevance criteria, 1,156 were excluded from further review. Most of these were publications of traditional US performed by radiology. Five studies had insufficient abstract information, namely, the specification of the training of the study sonographers. The listed corresponding authors of these five studies were contacted via e-mail and/or telephone, but none responded. Thirteen publications reported that the US was performed by EPs; 7 of these 13 were rejected for reasons stated in Table 1. Six articles remained eligible for SROC curve analysis (Figure 1). Search of EMBASE yielded the 6 identified articles, and no further publications were eligible by relevance criteria. Comparison of agreement between two independent reviewers for the relevance screen results yielded a Cohen’s κ = 0.65 (95% confidence interval [CI] = 0.50 to 0.78).
|Reason for exclusion||Number of publications|
|Unable to discern US study operator or no author response||5|
|Comments/letters to the editor||2|
|US use consisted of Doppler stethoscope||1|
|Clinical policy statements/practice guideline||2|
|Study sonographer was “radiology-based” sonographer||1,144|
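For readers less familiar with the agreement statistic reported for the relevance screen, here is a minimal Python sketch of Cohen’s κ for two reviewers making include/exclude decisions. The function name and the counts in the usage note are hypothetical; the reviewers' actual cross-tabulation is not given in the text.

```python
def cohens_kappa(both_include, r1_only, r2_only, both_exclude):
    """Cohen's kappa for two raters and a binary include/exclude decision.

    both_include: items both reviewers included
    r1_only:      items only reviewer 1 included
    r2_only:      items only reviewer 2 included
    both_exclude: items both reviewers excluded
    """
    n = both_include + r1_only + r2_only + both_exclude
    observed = (both_include + both_exclude) / n
    p1 = (both_include + r1_only) / n   # reviewer 1 inclusion rate
    p2 = (both_include + r2_only) / n   # reviewer 2 inclusion rate
    expected = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    if expected == 1:
        return 1.0  # both reviewers gave the same single answer throughout
    return (observed - expected) / (1 - expected)
```

With hypothetical counts, `cohens_kappa(20, 5, 10, 15)` has observed agreement 0.70 and chance agreement 0.50, giving κ = 0.40.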
Search of published non–full-length articles (i.e., abstracts) yielded eight potential studies. Two abstracts were part of the above full publication articles (eligible for full review), three abstracts met the relevance criteria, and three abstracts had insufficient information to ascertain study methods. These three authors did not respond to repeated e-mail inquiries regarding their methods. Personal communications with experts in the emergency US realm did not yield studies beyond those identified in the aforementioned strategies. After correspondence with the authors of the three abstracts that passed the relevance screen, quality assessment revealed retrospective designs; these abstracts were thus excluded.
Table 2 shows quality scoring for each of the six included studies17–22 and summaries of extracted data, with four of the studies reporting adequate blinding (Grade A). Also within Table 2 are potential limitations identified using the QUADAS tool. Given the presence of statistical heterogeneity and the imprecision of all these estimates, caution must be taken with interpretation.
|Study (year)||DVT+/sample size (%)||Sensitivity (95% CI)||Specificity (95% CI)||Quality Grade||Potential Limitations* (QUADAS)|
|Blaivas (2000)17||33/112 (30)||1.0 (0.89, 1.0)||0.99 (0.92, 1.0)||A||a, f, g|
|Frazee (2001)18||18/76 (24)||0.89 (0.64, 0.98)||0.76 (0.63, 0.86)||A||b|
|Jang (2004)19||23/72 (32)||1.0 (0.85, 1.0)||0.92 (0.80, 0.97)||B||c, d, e, f, h|
|Theodoro (2004)20||32/156 (21)||1.0 (0.89, 1.0)||0.98 (0.94, 1.0)||A||f|
|Jacoby (2007)21||9/121 (7)||0.89 (0.51, 0.99)||0.97 (0.92, 0.99)||A||a, b, i, j|
|Magazzini (2007)22||72/399 (18)||1.0 (0.95, 1.0)||0.98 (0.96, 0.99)||B||b, h|
|Pooled results†||132/936 (14)||0.95 (0.87, 0.99)||0.96 (0.87, 0.99)|
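The method used for the per-study confidence intervals in Table 2 is not stated here. As a hedged illustration, the Python sketch below computes exact (Clopper-Pearson) binomial intervals by bisection. The function names are illustrative and not taken from the authors' software. For studies reporting a sensitivity of 1.0, the usage note assumes every DVT-positive patient was a true positive (e.g., 33 of 33 for Blaivas), which is an inference from the table rather than a reported count.

```python
from math import comb

def _tail_ge(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x, n + 1))

def _cdf_le(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, x + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a proportion x/n."""
    # Lower bound: the p at which P(X >= x) equals alpha/2 (increasing in p).
    if x == 0:
        lower = 0.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if _tail_ge(x, n, mid) < alpha / 2:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2
    # Upper bound: the p at which P(X <= x) equals alpha/2 (decreasing in p).
    if x == n:
        upper = 1.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if _cdf_le(x, n, mid) > alpha / 2:
                lo = mid
            else:
                hi = mid
        upper = (lo + hi) / 2
    return lower, upper
```

Under these assumptions, the lower bounds for 33/33, 23/23, and 72/72 round to 0.89, 0.85, and 0.95, in line with the sensitivity intervals shown for those studies.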
The prevalence of DVT within the six studies ranged from 7% to 32%. Gender and age information was provided in only two trials. Authors defined a positive result in all studies as the inability to compress the common femoral vein or popliteal vein as demonstrated by gray scale B-mode ultrasonography. Of the six studies, two included routine color flow and one had discretionary use of color flow and augmentation techniques. Studies that included indeterminate and equivocal findings classified them as a positive test. Within each study, the mean number of EPs who performed test US was 5.3 (range 2–8).
The sensitivity and specificity of each included study were calculated with the 95% CI displayed (Figure 2). The pooled summary estimate using a random effects model produced a sensitivity of 0.95 (95% CI = 0.87 to 0.99) and a specificity of 0.96 (95% CI = 0.87 to 0.99). The β (slope) of 0.33 (95% CI = −2.5 to 3.1) indicated no significant threshold effect. Statistical testing for heterogeneity indicated substantial variability among the results (I² = 77%), but this variability is difficult to appreciate upon visual inspection of the SROC curve (Figure 3). Although the pooled DOR was 591 (95% CI = 70 to 4,940), caution must be taken with interpretation given the presence of statistical heterogeneity and lack of precision.
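The heterogeneity statistic can be illustrated with a short Python sketch that derives I² from Cochran's Q. This is a generic formulation, not the authors' computation: the text does not say which effect measure the statistic was calculated on, so the sketch assumes per-study effect estimates (for example, log DORs) with known variances, and the function name is illustrative.

```python
def i_squared(effects, variances):
    """I^2 (percent) from Cochran's Q using inverse-variance weights.

    effects:   per-study effect estimates (e.g., log DORs; an assumption here)
    variances: the corresponding within-study variances
    """
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q == 0:
        return 0.0  # identical estimates: no observed heterogeneity
    return max(0.0, (q - df) / q) * 100
```

I² expresses the share of total variation across studies that exceeds what chance alone would produce; with two hypothetical studies having effects 0 and 2 and unit variances, Q = 2 on 1 degree of freedom and I² = 50%.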
We believe that this report is the first systematic review of EPPU for DVT. Because most residencies in EM incorporate US into the required curriculum, the results are relevant to the academic EM community. Although our SROC curve analysis suggests high potential value for EPPU, the estimates are imprecise due to the small sample size. More importantly, we found a number of methodologic issues that raise caution regarding the validity and generalizability of these results. First, the low number of EP sonographers and their high level of expertise raise concern. The ultrasonography skills of EPs at academic medical centers where EM US research is conducted likely exceed the capabilities of most EPs in community practice.23 Second, the lack of details regarding patient enrollment methods and patient clinical characteristics makes it difficult to compare patients enrolled in these studies with other populations. Additionally, our six included studies provided little information about the anatomic location of the DVTs identified and did not address the issue of missed calf vein thrombosis. Current literature demonstrates a growing concern over the clinical significance of calf vein DVT that may not be detected by our criterion standard of radiology-performed US.24–26
In this report, we essentially quantify the discordance between EPPU and formal ultrasonography. We can speculate on the influences that may cause discordance between the results of EPPU for DVT versus radiology technician–performed, radiologist-interpreted venous ultrasonography. These influences might include US equipment, technique, patient location, experience, and the impact of preexisting knowledge of the overall clinical picture held by the EP, which may not be available to the sonographer and radiologist.
Our data show promise for EPPU as a method to evaluate for DVT. However, the limitations in the literature comprising this systematic review point toward the need for further research before widespread adoption of EPPU for DVT.
We performed a comprehensive search but did not include foreign language studies. Although we attempted to contact authors to further clarify and describe data and methods, the response rate was low, which limited our ability to adequately perform data extraction and quality assessment.
The criterion standard we used was a second US. Thus, to some degree, the SROC results represent agreement data, as opposed to a measurement of diagnostic test performance compared against clinical follow-up or an alternative imaging modality. Only two studies included patient follow-up, one at 30 days22 and the other at 1 year.19 This lack of clinical follow-up among the majority of included studies may have resulted in significant measurement bias. The number of false-negative studies may have been underestimated, since it is unknown how many patients with DVT may have been missed by both EPPU and the radiology department–based US. Estimates for test sensitivity may have been further biased given that only 25% of the included studies reported that radiology department–based sonographers either were registered diagnostic medical sonographer (RDMS) credentialed or had specifically defined expertise. However, few EP sonographers are RDMS-certified, and only a minority have sufficiently documented US examinations to meet the suggested training guidelines.23
The quality assessment of individual studies was limited since none of the six studies provided detailed descriptions of the study populations, and three studies provided no clinical information whatsoever. The limitations of using quality scores have been well described; therefore, the quality grade was only used to provide a summary of the blinding methodology reported in each included publication, rather than attempting to use scores in a weighted regression analysis or sensitivity analysis.1,3,27 Finally, given the low number of EP examiners (range 2–8) among the six studies, the external validity of this systematic review is limited.
We did not use funnel plots or statistical models to detect publication bias since there is a lack of empirical evidence validating the use of these methods for diagnostic test meta-analysis.15,16 Even if we were to use the recently recommended effective sample size funnel plot, the power to detect publication bias is low when there is significant heterogeneity among the DORs.28 If publication bias were present, our estimates for test sensitivity and specificity would likely be inflated.
Based on the results of six studies comprising 936 patients, the overall sensitivity and specificity of EPPU for DVT appear to be excellent. However, we identified a number of potential study biases that suggest the need for a properly designed study that includes a larger number of EP ultrasonographers with fully described methods of patient selection and clinical follow-up to assess clinically important outcomes.
Data Supplement S1. Data collection form.
|ACEM_101_sm_DataSupplementS1.pdf||21K||Supporting info item|