Understanding history, physical examination, and ultrasonography (US) to diagnose extremity fractures compared with radiography has potential benefits of decreasing radiation exposure, costs, and pain and improving emergency department (ED) resource management and triage time.
The authors performed two electronic searches using PubMed and EMBASE databases for studies published between 1965 and 2012 using a strategy based on the inclusion of any patient presenting with extremity injuries suspicious for fracture who had history and physical examination and a separate search for US performed by an emergency physician (EP) with subsequent radiography. The primary outcome was operating characteristics of ED history, physical examination, and US in diagnosing radiologically proven extremity fractures. The methodologic quality of the studies was assessed using the quality assessment of studies of diagnostic accuracy tool (QUADAS-2).
Nine studies met the inclusion criteria for history and physical examination, while eight studies met the inclusion criteria for US. There was significant heterogeneity in the studies that prevented data pooling. Data were organized into subgroups based on anatomic fracture locations, but heterogeneity within the subgroups also prevented data pooling. The prevalence of fracture varied among the studies from 22% to 70%. Upper extremity physical examination tests have positive likelihood ratios (LRs) ranging from 1.2 to infinity and negative LRs ranging from 0 to 0.8. US sensitivities varied between 85% and 100%, specificities varied between 73% and 100%, positive LRs varied between 3.2 and 56.1, and negative LRs varied between 0 and 0.2.
Compared with radiography, EP US is an accurate diagnostic test to rule in or rule out extremity fractures. The diagnostic accuracy of history and physical examination is inconclusive. Future research is needed to understand the accuracy of ED US when combined with history and physical examination for upper and lower extremity fractures.
Diagnostic Accuracy of History, Physical Examination, and Bedside Ultrasound for Diagnosis of Extremity Fractures in the Emergency Department: A Systematic Review
Extremity injuries are among the most frequent reasons for visiting the emergency department (ED). These injuries are significant, comprising over 42.4 million injury-related visits to the ED in 2006. The typical work-up of the injured patient generally involves a medical provider obtaining a history and physical examination, often followed by radiologic imaging. However, the radiologic imaging is often negative or inconclusive, which calls into question whether the imaging contributed to the management or outcome of the patient. Studies have shown that the imaging obtained is often unnecessary and results in radiation exposure to patients and increased ED wait times.
There is a low rate of positive radiography when assessing for fractures as evidenced by a retrospective review by Bentohami et al., in which only 50% of upper extremity x-rays showed fractures, and another study by Heyworth, which showed 15% of patients with ankle injuries had documented fractures on x-ray. In the study by Stiell et al., patients with ankle injuries had midfoot fracture rates of 4.3%, and 9.3% had malleolar fractures. Therefore, 50% to 95% of extremity x-rays can be avoided without missing fractures. The challenge is to identify which 50% to 95% of extremity injuries can be managed without x-rays. This has led to the search for clinical decision rules (CDRs) to safely reduce radiographic imaging for suspected extremity fractures.
According to the Society for Academic Emergency Medicine (SAEM) consensus workshop “Guideline Implementation and Clinical Pathways” in 2007, there is a knowledge translation gap that is created when the care patients receive differs from the evidence that is available for that disease process and pathology. According to McGinn et al., CDRs also inform our clinical judgment and change clinical behavior while maintaining quality of care, patient safety, and cost savings. Without CDRs, clinicians rely on their intuitive sense that develops through patient encounters. Examples of already well-accepted and utilized CDRs in extremity trauma evaluation include the Ottawa ankle and knee rules. These studies try to define which historical and physical examination findings are independently associated with fractures, in which case radiography is recommended.
There is also an economic incentive to utilize CDRs when ordering radiography for extremity fractures. Nichol et al. studied the economic effect of the implementation of the Ottawa knee rules. They showed that use of the rule was associated with meaningful reductions in health care costs of approximately 31 to 34 million dollars annually in the United States, without reducing the quality of care. The Ottawa ankle rules have also been shown to decrease the ED length of stay, reduce the number of radiographs ordered, and allow for faster diagnosis, without negatively affecting satisfaction with the care provided. Reduced time spent in the ED can also have a positive economic effect.
Emergency medicine (EM) has also evolved since the first description of ankle extremity trauma CDRs. Emergency physicians (EPs) are now routinely using bedside ultrasound (US) to complement the traditional physical examination of patients. “Point-of-care ultrasound” has been defined by Moore and Copel as “ultrasonography brought to the patient and performed by the provider in real time.” This concept has been described as a paradigm shift of the physical examination. As technology has evolved through smaller machines and increased portability, it is now possible to have an US machine on hand, available to aid in goal-oriented physical examinations. With this advancement there must be rigorous evaluation to understand the ability of EPs to accurately use this technology to assist in making diagnoses.
Bedside US has the potential benefits of reducing radiation exposure, costs, and pain, while potentially improving ED patient throughput and satisfaction. This reflects on the original purpose of developing CDRs for extremity fractures. Use of bedside US can help triage patients during a busy ED shift by quickly assessing for the presence of fracture as an adjunct to the normal history and physical examination. It can also aid nurses and physicians who may require more resources for reduction of a fracture. EPs have become more adept at fracture diagnosis through independent review of US and radiographic imaging, and many researchers have examined the ability of EPs to obtain US imaging and diagnose fracture.[12, 13] Additionally, bedside US has excellent diagnostic test characteristics when performed by EPs compared to radiologists in the diagnostic evaluation for soft tissue infections, cholecystitis, pneumothorax, or ruling out ectopic pregnancy.
Bedside US also provides clinically useful information in settings outside of the traditional hospital, including military combat zones, disaster areas, and developing countries. Although not traditional medical settings, it is important to have knowledge of how to use bedside US to assist in decision-making, such as evacuation protocols or prioritization of treatment. Specialists such as orthopedic surgeons or radiologists may not always be present; therefore, the treating physician must feel comfortable with the performance and interpretation of the available diagnostic testing. The objective of this systematic review is to assess the operating characteristics of EP history and physical examination findings in conjunction with bedside US in the clinical evaluation for extremity fractures.
The conception and layout of this systematic review abide by the recommendations from the Meta-analysis of Observational Studies in Epidemiology (MOOSE) statement and the Cochrane collaboration guidelines. In conjunction with a medical librarian, two search strategies were conducted to gather relevant studies for this systematic review. The electronic searches were conducted by two investigators (NJ, AL). The first strategy aimed to gather studies pertinent to historical and examination features suggestive of extremity fractures. The second search strategy identified relevant studies regarding the US diagnosis of extremity fractures (please see Data Supplement S1, available as supporting information in the online version of this paper, for detailed search strategies).
Two authors (NJ, AL) then reviewed the titles and abstracts to identify potentially relevant articles. Full manuscripts were retrieved and reviewed for inclusion criteria. We also searched the Cochrane Central Register of Controlled Trials and the Cochrane Collaboration review on fractures and US and checked the bibliographies of the included articles. We used the “PICO” format to assist in developing inclusion and exclusion criteria for this review: Patients—any adult or pediatric ED patient with an injury to an extremity suspicious for fracture; Intervention—historical features, examination findings, and ultrasonography (US) by an EP at presentation; Comparator—radiographic extremity imaging; Outcome—operating characteristics of ED history, physical examination, and US in diagnosing radiologically proven extremity fractures. Of note, we chose to focus on the operating characteristics of the upper extremities for the history and physical examination component of the review because of the already established Ottawa ankle and knee rules. However, for US operating characteristics, we included studies of both upper and lower extremities.
We included retrospective and prospective observational studies conducted in the ED, using consecutive or convenience patient sampling. We included studies with patients presenting to the ED for evaluation of injuries to the extremities suspicious for fracture. Inclusion required that all studies used a radiographic study (x-ray, computed tomography [CT], or magnetic resonance imaging [MRI]) as a reference test, and both US and the reference test were performed on the initial presentation. Only studies published in English and limited to human subjects were included. We excluded studies of patients with open fractures, neurovascular compromise, and other injuries requiring immediate resuscitative or operative management. Review articles and case studies were considered inappropriate study types and were excluded. Studies in which diagnosis was delayed for greater than 72 hours after injury or in which the physician performing the initial US was not an EP were excluded as non-ED based. For examination of the performance of bedside US, studies in which patients were examined by US alone and did not have a reference radiograph at the time of presentation were excluded because they lacked definitive criterion standards. If a search produced a relevant abstract, but a full-text study was not found, the study was excluded. Figures 1A and 1B account for all studies reviewed and ultimately analyzed.
Individual Evidence Quality Appraisal
Methodologic quality assessment of the included studies was performed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) guidelines. This tool was designed to assess studies for quality and validity by assessing for potential bias and applicability. In accordance with the 2011 updated guidelines for QUADAS-2, the authors agreed on signaling questions that were pertinent to the review. The signaling questions assessed bias and applicability in patient selection, index test, and reference standard. Two of the authors (NJ and NM) independently performed the methodologic assessment. Each author was blinded to the other's assessment. Inter-rater agreement for the QUADAS-2 assessments of each paper was evaluated via a kappa analysis using SPSS v18.0 (SPSS Inc., Chicago, IL). There were no discrepancies between the two authors performing methodologic assessment, and thus discussion was not necessary for reconciliation of discrepancies and consensus.
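The inter-rater agreement statistic mentioned above can be illustrated with a minimal sketch of Cohen's kappa for two raters. The review computed kappa in SPSS; the function name and the example ratings below are hypothetical and only show how complete agreement yields a kappa of 1.0.

```python
from collections import Counter
import math


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return math.nan
    return (observed - expected) / (1 - expected)


# Hypothetical QUADAS-2 style judgments, not the review's actual data.
reviewer_1 = ["low", "low", "high", "unclear", "low", "high"]
reviewer_2 = ["low", "low", "high", "unclear", "low", "high"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

With identical judgments from both reviewers, observed agreement is 1 and kappa evaluates to 1.0, whatever the chance-agreement term is.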
Methodologic questions thought to be particularly relevant to this review included assessing for inter- or intrarater reliability between historical features, physical examination findings, and US image acquisition and interpretation. Interobserver reliability of index test was recorded only in studies that used two independent operators to perform and review the historical features, physical examination findings, and US study. If a single operator performed both tasks, the question regarding interobserver reliability was answered “no.” In particular, we thought that the experience level of the physician performing the US imaging was pertinent to disclose. If a study did not report on the prior training of the US operator, the question regarding evidence of prior training was answered “no.” In evaluation of the reference standard, if a study reported any patient who was not categorized using a reference standard, then the question pertaining to the standardization of reference test for all patients was answered “no.” The QUADAS-2 results are reported separately for the studies reporting historical and examination features (see Data Supplements S2 and S3, available as supporting information in the online version of this paper) and the studies reporting US examination findings (see Data Supplements S4 and S5, available as supporting information in the online version of this paper).
Data from the included trials were reviewed in detail and abstracted for analysis. The information obtained included patient demographic information, inclusion and exclusion criteria, anatomy evaluated, history and physical findings, US evaluation methodology, reference standard, US operator training, fracture prevalence, and operating characteristics. For each of the included studies, we extracted the raw data regarding true and false positives and negatives and calculated sensitivities, specificities, and likelihood ratios (LRs). Since our operating characteristics were assessed compared to standard reference radiologic testing, we defined “true positive” as fracture identified by history, physical, or US and confirmed by reference radiographic test; “true negative” as negative history, physical, or US for fracture and reference radiograph negative for fracture; “false positive” as history, physical, or US interpretation suggestive of fracture in the setting of a negative reference radiograph; and “false negative” as history, physical, or US negative for fracture with reference radiograph positive for fracture. We calculated the operating characteristics of historical features, examination findings, and bedside US in the diagnosis of fracture compared with radiology.
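The calculation described above can be sketched from a single 2×2 table. This is a minimal illustration using the standard definitions of sensitivity, specificity, and likelihood ratios; the function name and the example counts are hypothetical and are not taken from any included study.

```python
import math


def operating_characteristics(tp, fp, fn, tn):
    """Prevalence, sensitivity, specificity, LR+ and LR- from 2x2 counts.

    Counts follow the definitions in the text: the index test (history,
    physical examination, or US) is judged against the reference radiograph.
    """
    fractures = tp + fn        # reference radiograph positive
    non_fractures = fp + tn    # reference radiograph negative
    if fractures == 0 or non_fractures == 0:
        raise ValueError("need at least one fracture and one non-fracture case")
    sensitivity = tp / fractures
    specificity = tn / non_fractures
    false_pos_rate = fp / non_fractures   # equals 1 - specificity
    false_neg_rate = fn / fractures       # equals 1 - sensitivity
    # LR+ = sensitivity / (1 - specificity); infinite with no false positives.
    if false_pos_rate == 0:
        lr_pos = math.inf if sensitivity > 0 else math.nan
    else:
        lr_pos = sensitivity / false_pos_rate
    # LR- = (1 - sensitivity) / specificity.
    if specificity == 0:
        lr_neg = math.inf if false_neg_rate > 0 else math.nan
    else:
        lr_neg = false_neg_rate / specificity
    return {
        "prevalence": fractures / (fractures + non_fractures),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Hypothetical counts chosen for easy arithmetic.
example = operating_characteristics(tp=45, fp=5, fn=5, tn=45)
```

A study with no false positives has a specificity of 100% and therefore an infinite positive LR, which is how the "infinity" upper bounds in the reported ranges arise.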
The data from the reviewed trials were combined to assess for heterogeneity by chi-square tests and the I² statistic using Meta-DiSC (Hospital Universitario Ramón y Cajal, Madrid, Spain). In the studies in which the US findings were assessed by two independent reviewers (one who operated the bedside US and interpreted the results and another who interpreted the US study from a tape), we report operating characteristics of both. In our results they are annotated as “primary reviewer” (EP who operated the US at bedside) and “secondary reviewer” (EP who interpreted the US footage). Our operating characteristics data are reported for all individual trials and, where relevant, for both primary and secondary US reviewers. Prevalence, sensitivities, specificities, and LRs are reported with 95% confidence intervals (CIs).
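As a rough illustration of the heterogeneity assessment, the sketch below computes Cochran's Q with inverse-variance weights and derives I² from it using the standard formula I² = max(0, (Q − df)/Q). The exact computation performed inside Meta-DiSC is not described in the text, so this is an assumption about the general technique; the function name and inputs are hypothetical.

```python
def cochran_q_and_i2(estimates, variances):
    """Cochran's Q and I-squared (percent) for per-study effect estimates.

    estimates: per-study estimates on a common scale (for example log LRs).
    variances: the corresponding within-study variances.
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need matching estimates and variances for >= 2 studies")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared: share of total variation attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical inputs chosen for easy arithmetic, not data from the review.
q_stat, i_squared = cochran_q_and_i2([0.0, 1.0, 2.0], [0.25, 0.25, 0.25])
```

High I² values indicate that study results differ by more than chance alone would explain, which is the situation the review describes as preventing data pooling.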
When contemplating the test–treatment thresholds, it is important to consider the clinical utility of point-of-care US, and specifically bedside US, in the diagnosis of fracture. We sought to obtain the operating characteristics of ED-performed bedside US compared with traditional radiographic methods for fracture identification. We do not speculate that bedside US will supplant the use of traditional radiologic methods that have been widely used. We do not envision bedside US for fractures to supplement a history and physical examination. We instead see bedside US as having utility in prioritizing which patients receive radiologic studies, not which patients receive treatment. Bedside US may have a specific niche in resource-poor areas and special needs populations. For example, bedside US could be used to guide fracture reduction in pediatric fractures when it is not wise to continually expose the patient to ionizing radiation. Based on this, test–treatment thresholds do not apply in this review. Rather, our goal is to provide the operating characteristics as the basis for future work.
Description of Included Studies: History and Physical
The PUBMED and EMBASE searches were combined and together identified 2,007 citations, as detailed in Figure 1A. Of these, 1,924 were eliminated based on relevancy, resulting in 83 studies. Seventy-six were eliminated based on reference standard, patient type, and study design, leaving seven studies. Two additional studies that fit the inclusion and exclusion criteria were identified after reviewing the references of the prior seven studies, resulting in nine studies (Appelboam et al., Baker and Borland, Cevik et al., Darracq et al., Docherty et al., Hawksworth and Freeland, Lennon et al., Pershad et al., and Webster et al.) that were included in the systematic review of history and physical examination findings suggestive of upper extremity fracture.
A full description of the reviewed studies, including patient demographics, inclusion and exclusion criteria, anatomical location studied, historical features, and physical examination findings identified, is presented in Data Supplement S2. The sample size ranged from 48 to 1,740 patients. Of the nine included trials, six[25, 26, 28-31] studied elbow injuries, and three[27, 32, 33] studied wrist injuries. Of the elbow studies, five[25, 28-31] included both adult and pediatric patients, one of which analyzed the adult and pediatric patients on separate arms. Two[32, 33] studies of wrist injuries included only pediatric patients and one study included only adult populations. The mean age for studies ranged from 7 to 43.5 years of age. All studies excluded patients with neurovascular compromise, distracting injuries, and prior injury or defect to the anatomy of interest. Some studies[27, 33] examined both history and physical examination findings, while other studies[25, 26, 28-30, 32] only examined physical examination findings.
The methodologic quality of the included trials is summarized in Data Supplement S4. The authors’ (NJ, NM) QUADAS-2 assessment showed no discordance and had a kappa of 1.0. The overall quality of the included trials was variable. Some studies used convenience[26-28, 32, 33] sampling and other studies used consecutive[25, 29-31] sampling. The inclusion and exclusion criteria were clearly defined by all studies, and no inappropriate exclusions were noted. The index test of history or physical examination taking was adequately described and performed to answer the review question in all included studies. Only Cevik et al. and Pershad et al. examined inter-rater reliability, and only when feasible. Some studies[26-28, 32, 33] also provided varied types of training in how to obtain history and physical examination data to reduce the variability that would occur with multiple examiners performing a test. Not all of the studies[25, 32, 33] used the same reference standard on all of their patients. Appelboam et al. used radiography on adults only if they had positive elbow extension tests and only on pediatric patients at the discretion of the treating physician. Those patients who did not receive radiographs had a follow-up telephone assessment in 7 to 10 days. Pershad et al. used orthopedic evaluation in clinic within 5 to 7 days as the reference standard for those with a high suspicion for Salter-Harris type I fractures. Webster et al. reported that of the patients who did not get radiographs, none had additional ED visits at that institution within 4 weeks. In all studies, the radiologist interpreting the reference standard was blinded to the index test.
Description of Included Studies: Bedside Ultrasound
Our PUBMED search identified 2,318 citations, and EMBASE yielded 3,552 citations, demonstrated in Figure 1B for bedside US examination of extremity fractures. Reviewing the bibliography of pertinent articles identified no additional studies. After exclusion based on title and abstract alone, 68 potentially relevant studies remained from EMBASE and 94 from PUBMED. During a review of these potentially relevant citations, articles were excluded based on several criteria as detailed in Figure 1B. After these exclusions, seven studies were identified by EMBASE, and seven studies by PUBMED. Of these, six were common to both searches (Chen et al., Chaar-Alvarez et al., Cross et al., Chien et al., Marshburn et al., and Tayal et al.), one additional reference was identified by EMBASE and not by PUBMED search (Sinha et al.), and one study was identified by PUBMED and not by EMBASE search (Patel et al.), for a total of eight studies included in our review.
A full description of the reviewed studies, including patient population, inclusion and exclusion criteria, anatomical location studied, US methodology, and physician US experience, is depicted in Data Supplement S3. The sample size ranged from 33 to 101 patients. Of those eight trials, six studied pediatric[34-37, 40, 41] patients and two studied adult[38, 39] patients. The mean age in the pediatric studies varied from 9.1 to 12.7 years. The mean ages of the two adult studies were 34 and 79 years. All the studies excluded patients with neurovascular compromise, open fracture, obvious deformity, and limb- or life-threatening injuries. To decrease heterogeneity and at the same time view the data in a clinically meaningful way, we organized the data into anatomical subgroups. Subgroups combined studies that analyzed forearm fractures[34, 35] alone, forearm and leg fractures,[40, 41] and clavicle fractures.[36, 37]
Chien et al., Sinha et al., Patel et al., Chen et al., and Marshburn et al. had operator/interpreters with minimal US experience who were given US training specific to fracture detection, which ranged from 15 minutes to 1 month. The operator/interpreters in the studies by Chaar-Alvarez, Cross, and Tayal were not specifically trained in fracture recognition, but had extensive prior experience with emergency US, which ranged from postresidency US experience with 135 CME hours in emergency US to 150 US scans.
The methodologic quality of the included trials is summarized in Data Supplement S5. The authors’ (NJ, NM) QUADAS-2 assessment showed no discordance and had a kappa of 1.0. The overall quality of the included trials was variable. All studies used convenience instead of consecutive sampling. The inclusion and exclusion criteria were clearly defined by all studies, and no inappropriate exclusions were noted. However, none of the trials reported data on dropout rates. The index test was adequately described and performed to answer the review question in all included studies. Chaar-Alvarez et al. and Cross et al. were the only studies that reported inter-rater reliability of ultrasound, with kappas of 0.57 and 0.74, respectively, for fracture diagnosis. An appropriate reference test was selected for use in all studies. All patients except for one in the study by Tayal et al. received the same reference test, and one patient from that study did not have a radiologic study, but instead underwent surgical intervention. None of the studies reported on inter-rater reliability of the reference standard interpretation. Only the study by Patel et al. failed to blind the interpretation of the index and reference tests with respect to each other.
The prevalence of fracture in all the included studies examining history, physical, and US findings is represented in Tables 1 and 2 and ranged from 22% to 70%. This variability in fracture prevalence was also mirrored in larger retrospective reviews of pediatric trauma. Alzen et al., in a review of 2,006 x-rays of 1,386 pediatric trauma patients, found a fracture prevalence of 17.2%, whereas Moritz et al. found a higher fracture prevalence of 41% in 653 pediatric trauma patients. The variability in fracture prevalence found among our reviewed studies and the studies of Alzen et al. and Moritz et al. is most likely explained by variations in anatomical locations. In the study by Alzen et al., the prevalence of fracture varied significantly with anatomical location, showing that although overall prevalence was 17.2%, the prevalence of forearm and humerus fracture in particular was as high as 54.9% and 47.7%, respectively. The study by Moritz et al. also reports the overall fracture prevalence (41%), but it unfortunately does not give a breakdown of fracture prevalence by anatomical subgroup.
Table 1. Operating Characteristics of History and Physical Reviewed Tests
Columns: Sample Size (N); Prevalence, % (95% CI); Sensitivity, % (95% CI); Specificity, % (95% CI); LR+ (95% CI); LR– (95% CI).
Row labels recovered: pain on active motion; pain on passive motion; pain on grip; pain on supination (cell values not recovered).
LR+ = likelihood ratio of a positive test; LR– = likelihood ratio of a negative test.
Table 2. Operating Characteristics of Ultrasound Reviewed Studies
Columns: Sample Size (N); remaining column headers not recovered.
Row labels recovered: Chaar-Alvarez 2011 (primary reviewer); Chaar-Alvarez 2011 (secondary reviewer); Cross 2010 (primary reviewer); Cross 2010 (secondary reviewer); Upper/lower ext. pediatric. One cell value, 0.13 (0.08–0.21), survives between the Chaar-Alvarez 2011 (secondary reviewer) and Cross 2010 (primary reviewer) rows without its column.
Primary reviewer = emergency physician who performed and interpreted US; secondary reviewer = emergency physician who interpreted US images captured on tape.
LR+ = likelihood ratio of a positive test; LR– = likelihood ratio of a negative test; US = ultrasound.
History and Physical Examination
For this systematic review, we chose to focus the literature search for history and physical examination findings on the upper extremity. We did so because decision rules for the lower extremity, the Ottawa ankle and knee rules, have already been derived and validated. These rules are combinations of history and physical examination findings for the lower extremity that help guide imaging decision-making. No rules have yet been validated to guide imaging decisions for the upper extremities.
To qualify for imaging under the Ottawa ankle rules, the patient must be unable to walk more than four steps or have tenderness over the malleoli, navicular bone, or base of the fifth metatarsal. In 2003, Bachmann et al. conducted a meta-analysis of 27 studies with 15,581 patients and found a pooled sensitivity of 97.6% for the Ottawa ankle rules in both the adult and the pediatric population. The specificity was too heterogeneous for pooling and ranged from 26.3% to 47.9%. The study found a pooled negative LR of 0.1 for the pediatric population only. The study was not able to report one for the adult population because of insufficient published data. Dowling et al. published a meta-analysis of the Ottawa ankle rule in pediatric patients including 12 studies with a fracture prevalence of 21.4%. The authors reported a pooled sensitivity of 98.5%. Specificity was too heterogeneous for data pooling and ranged from 7.9% to 50%. The pooled negative LR was 0.11. Positive LR was too heterogeneous for pooling.
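The ankle criteria as summarized in the paragraph above can be encoded as a simple any-criterion check. This is a minimal sketch for illustration only: it follows the wording of this text, the class and field names are hypothetical, and the published rule specifies more detail than is captured here, so it should not be read as a clinical implementation.

```python
from dataclasses import dataclass


@dataclass
class AnkleExam:
    # Field names are illustrative; they mirror the criteria listed in the text.
    unable_to_walk_four_steps: bool
    malleolar_tenderness: bool
    navicular_tenderness: bool
    fifth_metatarsal_base_tenderness: bool


def ankle_imaging_indicated(exam: AnkleExam) -> bool:
    """Imaging is indicated if any one criterion is present."""
    return any(
        (
            exam.unable_to_walk_four_steps,
            exam.malleolar_tenderness,
            exam.navicular_tenderness,
            exam.fifth_metatarsal_base_tenderness,
        )
    )


# Hypothetical patients.
no_findings = AnkleExam(False, False, False, False)
navicular_only = AnkleExam(False, False, True, False)
```

Because a single positive finding triggers imaging, rules of this shape favor sensitivity over specificity, which matches the pooled results quoted above.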
Ultimately, Bachmann concluded that although there is good sensitivity and negative LRs with the Ottawa ankle rule, differences in clinical examination skills, clinical experience, and interpretation of the test criteria can affect the sensitivity and the negative LR. Dowling concluded that the Ottawa ankle rule can be used in the pediatric population (>5 years) to rule out fractures.
Bachmann et al. also validated the Ottawa knee rule in 2004. The Ottawa knee rule consists of answering five questions asking whether the patient is above 55 years of age, has isolated tenderness of the patella, has tenderness over the head of the fibula, is unable to flex the knee to 90°, and is unable to bear weight for four steps. If any question is answered yes, imaging of the knee is recommended. To validate the decision rule, 11 studies were included in the analysis, of which five study results were pooled. Pooled sensitivity was 98.5%, pooled specificity was 48.6%, and pooled negative LR was 0.05. Bachmann concluded that the Ottawa knee rule is useful for excluding fracture; however, they did recommend further large-scale testing for validation because they prefer sensitivities above 98.5%.
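The five knee questions listed above follow the same any-criterion logic. The sketch below encodes them exactly as worded in this text (including the age criterion as "above 55 years"); the function and parameter names are hypothetical, and this is an illustration rather than a clinical tool.

```python
def knee_imaging_recommended(
    age_years: float,
    isolated_patellar_tenderness: bool,
    fibular_head_tenderness: bool,
    unable_to_flex_to_90_degrees: bool,
    unable_to_bear_weight_four_steps: bool,
) -> bool:
    """Return True if any of the five questions is answered yes."""
    answers = [
        age_years > 55,  # age threshold as stated in the text above
        isolated_patellar_tenderness,
        fibular_head_tenderness,
        unable_to_flex_to_90_degrees,
        unable_to_bear_weight_four_steps,
    ]
    return any(answers)


# Hypothetical patients.
young_no_findings = knee_imaging_recommended(30, False, False, False, False)
older_no_findings = knee_imaging_recommended(60, False, False, False, False)
```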
The studies of upper extremity fractures that were identified based on the inclusion criteria focused on elbow or wrist injuries. Figures 2-6 depict forest plots of the sensitivities, specificities, positive LR, and negative LR of history and physical examination in all of the included trials. We created subgroups examining elbow physical examination findings alone and wrist physical examination findings alone. We also created subgroups within these based on which type of test was performed that resulted in the following subgroups for elbow testing: elbow extension,[25, 26, 28-31] elbow flexion,[26, 28, 31] elbow pronation,[28, 31] and elbow range of motion.[26, 28, 31] The wrist subgroup was further divided into wrist clinical impression.[27, 32, 33] Cevik et al. further described wrist physical examination findings that were not studied elsewhere as described in Data Supplement S2. The data had too much heterogeneity for pooling purposes.
Data Supplement S4 shows the diagnostic accuracy results for history and physical examination. Data Supplement S2 provides a description of the tests performed in each study. Elbow extension test operating characteristics ranged in sensitivity from 48% to 100%, specificity from 48% to 100%, positive LR from 1.9 to infinity, and negative LR from 0 to 0.06. Elbow flexion test operating characteristics ranged in sensitivity from 56% to 89%, specificity from 45% to 100%, positive LR from 1.6 to infinity, and negative LR from 0.3 to 0.6. Elbow range-of-motion test operating characteristics ranged in sensitivity from 93% to 100%, specificity from 34% to 100%,[28, 31] positive LR from 1.4 to 30, and negative LR from 0 to 0.8. Wrist clinical impression operating characteristics ranged in sensitivity from 79% to 99%, specificity from 24% to 85%, positive LR from 1.3 to 6.5, and negative LR from 0.03 to 0.3.
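For readers less familiar with LRs, the standard definitions explain why the ranges above include the extreme values of infinity and 0: the positive LR is sensitivity / (1 - specificity), which is unbounded when specificity is 100%, and the negative LR is (1 - sensitivity) / specificity, which is 0 when sensitivity is 100%. The following is a minimal sketch of that arithmetic; the function name and the example values in the usage note are illustrative and are not taken from any included study.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (positive LR, negative LR) for proportions given on a 0-1 scale."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity <= 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1]")
    if specificity == 1.0:
        # No false positives: the positive LR is unbounded
        # (undefined if sensitivity is also 0).
        lr_positive = math.inf if sensitivity > 0.0 else math.nan
    else:
        lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative
```

For example, a hypothetical test with sensitivity 0.90 and specificity 0.80 has a positive LR of about 4.5 and a negative LR of about 0.125. Note that pooled LRs reported in meta-analyses are generally estimated directly and need not equal the values computed from pooled sensitivity and specificity.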
The main outcome is the operating characteristics of US in fracture diagnosis. Figure 7 depicts forest plots of sensitivities, specificities, positive LR, and negative LR of US in all the included trials. Subgroup analyses contained too few studies to report pooled statistics. Figures 8, 9, and 10 show forest plots of subgroup analysis for forearm, clavicle, and upper and lower extremity, respectively. Table 2 presents the operating characteristics of US for each reviewed study individually. For Chaar-Alvarez et al. and Cross et al., the primary bedside ultrasound operator and the secondary reviewer who reviewed the ultrasound studies from a tape have both been included.
The operating characteristics for US detection of extremity fractures ranged in sensitivity from 83.3% to 100%, in specificity from 73% to 100%, in positive LR from 3.2 to 56, and in negative LR from 0.0 to 0.2. Subgroup analysis based on anatomical location revealed the following operating characteristics. For the ultrasound detection of pediatric clavicle fractures, sensitivity ranged from 90% to 95%, specificity from 86% to 97%, positive LR from 6.6 to 27, and negative LR from 0.0 to 0.1. When upper and lower extremity pediatric fractures were studied together, sensitivity ranged from 90% to 96%, specificity from 83% to 100%, positive LR from 5.8 to 56, and negative LR from 0.0 to 0.2. For the US detection of forearm fractures, sensitivity ranged from 83% to 100%, specificity from 73% to 91%, positive LR from 3.2 to 9.1, and negative LR from 0.0 to 0.2. The operating characteristics of ultrasound in single studies of adult hand fractures by Tayal et al. and adult femur/humerus fractures by Marshburn et al. are also found in Table 2.
We sought to understand the best method of examining a patient who arrives in the ED with the complaint of extremity trauma to assess for fracture. This analysis is particularly useful, given the introduction and wide acceptance of point-of-care bedside US as an enhancement to the history and physical examination in the ED. After reviewing the history and physical examination findings for upper extremity fractures, we found a wide range in positive LR for the elbow extension, flexion, and range-of-motion tests, which makes it difficult to conclude how posttest probability would change. Similarly, the negative LR for these tests ranged from 0 to 0.8. In isolation, the diagnostic data on history and physical examination are heterogeneous and inconclusive. On the other hand, bedside US is consistently accurate, with the potential to aid in evaluation. Bedside US in the hands of an EP exhibits operating characteristics that could change the posttest probability of extremity fracture. We noted that a negative US for fracture, even in varying anatomical locations, was able to rule out a fracture (negative LR = 0.0 to 0.2). However, the LR of a positive examination ranged widely (positive LR = 3.2 to 56) across different anatomical structures, resulting in only a moderate change in posttest probabilities in some studies.
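The effect of these LRs on posttest probability follows from the odds form of Bayes' theorem: posttest odds = pretest odds × LR. The sketch below applies that formula; the function name is illustrative, and pairing a given prevalence with a given LR is a hypothetical exercise rather than a result from any single included study.

```python
import math


def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability and an LR into a posttest probability."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    if math.isinf(likelihood_ratio):
        return 1.0  # an unbounded LR drives the posttest probability to 1
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)
```

As an illustration, taking the lowest reported fracture prevalence (22%) as the pretest probability, a positive LR of 3.2 gives a posttest probability of roughly 47%, whereas a positive LR of 56 gives roughly 94%. A negative LR of 0.2 at the same pretest probability gives roughly 5%. This is why the lower end of the positive LR range produces only a moderate shift.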
The operating characteristics in this review need to be compared with those of similarly designed studies looking at traditional US operators such as radiologists or surgeons. Williamson et al. compared US findings by radiologists with radiographs for fracture diagnosis. This study was performed on 26 pediatric patients between 2 and 14 years of age with clinical findings suspicious for forearm fractures. They found 16 fractures by US that were all confirmed by radiographs for sensitivity and specificity of 100% each. Hubner et al. also compared US findings by pediatric surgeons with radiographs. They included 163 pediatric patients with suspicion of fracture in the upper and lower extremities with sensitivity of 98.3% and specificity of 69.3%. These are similar to the operating characteristics in our review. This shows that EPs are able to perform bedside US to diagnose extremity fracture without compromising accuracy compared with radiologists and pediatric surgeons.
Implications for Future Research
This systematic review has highlighted the accuracy of EP bedside US to diagnose fractures. The role of bedside US to complement existing CDRs such as the Ottawa ankle and knee rules remains undefined. We see an opportunity for future prospective studies looking at the possibility of decreasing x-ray use if US is incorporated into these CDRs.
Another area that can be addressed by future research is the paucity of evidence that assesses the diagnostic accuracy or reliability of elements of history and physical examination findings in the diagnosis of upper extremity fracture compared to ankle and knee injuries. Future research should focus upon appropriately designed trials using the STARD criteria. This diagnostic research should examine history and physical examination findings in conjunction with bedside US findings to better understand the accuracy of these tests in isolation and in combination. This systematic review helps future researchers to understand the complementary roles of history, physical examination, and bedside US. Ultimately, upper extremity fracture CDRs could reduce variability and improve diagnostic efficiency. Our results indicate that US should be one additional diagnostic test evaluated in the derivation of these future upper extremity fracture CDRs.
Future research projects should also examine which types of fractures may be better detected by ultrasound than by standard radiographic techniques. Recent studies have suggested the diagnostic superiority of US in radiographically occult fractures.[52, 53] Plain radiographs may be inadequate for buckle and Salter-Harris I fractures in the pediatric population, as demonstrated by Patel et al. and Chen et al. In the study by Patel et al., two patients had buckle fractures detected by US that were not seen on the initial x-ray and were confirmed by callus formation at a later date. The study by Chen et al. found three patients whose US showed an effusion suggestive of distal radial fracture despite normal x-rays; these were diagnosed as Salter-Harris I fractures on repeat clinical examination. These cases indicate that bedside ultrasound may be superior to radiographs when diagnosing buckle or Salter-Harris I fractures. In addition, US may be superior to traditional radiographic studies for scaphoid fractures.
Ultrasound may not be as accurate in the diagnosis of fractures near joints, related to the complicated nature of joint anatomy. Patel et al. specifically excluded patients with injuries in proximity to a joint. Sinha et al. found one case where a missed fracture by US was located close to the elbow joint and also speculated that the contours of the bones in a joint make it difficult to make the diagnosis. Marshburn et al. had two cases of missed fractures around the intertrochanteric region of the femur. They also had five cases of false-positive US, all of which occurred in the joint region. This implies the anatomical context of the injury affects the accuracy of the operating characteristics of US and, therefore, needs further investigation.
The question then arises if US should supplant radiographs as the criterion standard for fracture diagnosis. Glasziou et al. discuss a process to decide when a reference standard should change when evaluating diagnostic tests. An important point they make is to consider the clinical consequences when deciding whether to change the reference standard. Perhaps US should be explored as a reference standard in pediatric patients, where the clinical implication can be significant when missing a Salter-Harris I fracture, especially because it is suggested that radiographs do not perform as well as US. Alternatively, perhaps US should be considered as a surrogate criterion standard for evaluation of pregnant patients as another example of a special population, because of the possible clinical consequences of radiation exposure.
Future research will also have to examine closely the recommendations of Reitsma et al. when designing diagnostic accuracy studies of US and radiographs, given that radiographs are a flawed reference standard. A more robust criterion standard than x-rays would incorporate multiple diagnostic strategies (x-ray, CT, repeat imaging, short-term clinical follow-up), as many of the studies we reviewed described. This would ensure that false-negative x-rays are ultimately correctly recognized as fractures. Future research will have to directly address the issue that radiographs are not the perfect reference standard. Unless this is done, it will be difficult to fully understand the potential of bedside US. Using an imperfect criterion standard will bias the estimates of sensitivity upward and specificity downward.
Future research should also examine adult patient populations and use of history, physical examination, and bedside US in fracture diagnosis. The lack of data was evident in this systematic review, which makes it difficult to reach conclusions on the adult patient population. Future studies enrolling adults should also acknowledge geriatric patients as a potential separate group. According to Xu et al., geriatric patients are using ED resources at an increasing rate, which places a higher burden on EPs to administer proper care and treatment. The geriatric patient group may have to be considered as a separate group from adults based on comorbidities such as osteoporosis that could lead to higher suspicion of fracture based on history and physical examination findings than other groups. They may not be able to give a robust history if there are other comorbidities such as dementia. Additionally, geriatric patients may have different physical examination and US findings based on preexisting degenerative bone disease and arthritis.
The next question raised by this review is how much training is sufficient for the EP to be proficient in bedside US for the diagnosis of fracture. According to the American College of Emergency Physicians, US for the diagnosis of fracture is currently considered a noncore emergency US application. With the increased use of US in the ED, more physicians graduating from EM residency training will have greater US exposure. In the future, bedside US fracture diagnosis may become a core application, so proficiency and training are necessary for the accurate interpretation of bedside US. There was significant variability in training among the studies included in this review. The education ranged from as little as 15 minutes to significant prior experience[35, 36] with US. The studies included in the review did not give sufficient detail to understand clearly what methods of supervision were used during the training. In the future, US research should include direct supervision in real time and retrospective tape review of sonographic images until physician proficiency can be assured.
It is important to consider how this information carries implications into clinical practice. Due to its ready availability, traditional radiography remains the standard of care to diagnose fractures. However, the evidence shows that diagnosis of fractures by US is possible and in certain situations can be a suitable substitute for radiography. Shorter and Macias demonstrated how US assisted in patient management decisions in Haiti after the 2010 earthquake when the medical infrastructure had been destroyed. Evans and Harris discussed the utility of portable US in remote clinics such as mountainside areas, when evaluating skiing and snowboarding accidents. Nelson et al. reviewed the feasibility of clinical assessment with US in field medical operations during deployment. In these nontraditional medical settings, US can serve as a readily available, sufficiently accurate diagnostic tool for fracture diagnosis. Similarly, in patients in whom exposure to ionizing radiation carries greater than average risk, such as pregnant women or pediatric patients requiring imaging for multiple areas as in suspected child abuse,[64, 65] US can be used to prescreen and identify patients requiring further or serial radiographic imaging. These populations are of particular interest for further research.
Finally, an algorithm can be devised for management of suspected long bone fractures incorporating history and physical examination, radiologic studies, and bedside US. Hubner et al. proposed an algorithm for pediatric patients. According to the algorithm, all open fractures, unstable fractures, obvious compound fractures, or joint-adjacent injuries should undergo radiographic imaging, followed by appropriate therapeutic intervention. Other injuries suspicious for fracture can undergo US examination. Fractures identified by US could then undergo radiographic imaging followed by appropriate therapeutic intervention, whereas patients with negative US findings would not require further imaging. This would reduce radiation exposure and costs and could potentially expedite management. Such an algorithm can be applied in the conventional ED, as well as modified for resource poor environments. However, this algorithm first needs external validation. Future algorithms should also involve adult patients, incorporating pertinent history such as mechanism of injury and physical examination findings such as tenderness and swelling, in the decision tree.
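The decision flow attributed to Hubner et al. above can be summarized as two steps: an initial triage to radiography or US, followed by a decision based on the US result. The following is a minimal sketch of that flow as described in this section; the function, parameter, and enum names are hypothetical, and the sketch is not a validated clinical pathway.

```python
from enum import Enum


class NextStep(Enum):
    RADIOGRAPH = "radiographic imaging"
    ULTRASOUND = "bedside US examination"
    NO_FURTHER_IMAGING = "no further imaging"


def initial_imaging(open_fracture: bool,
                    unstable_fracture: bool,
                    obvious_compound_fracture: bool,
                    joint_adjacent_injury: bool) -> NextStep:
    """Step 1: injuries in any listed category go straight to radiography."""
    if any([open_fracture, unstable_fracture,
            obvious_compound_fracture, joint_adjacent_injury]):
        return NextStep.RADIOGRAPH
    # Other injuries suspicious for fracture are examined with US first.
    return NextStep.ULTRASOUND


def after_ultrasound(fracture_seen_on_us: bool) -> NextStep:
    """Step 2: a positive US leads to radiography; a negative US ends imaging."""
    if fracture_seen_on_us:
        return NextStep.RADIOGRAPH
    return NextStep.NO_FURTHER_IMAGING
```

In this sketch, radiographs are avoided only for patients who are routed to US and have a negative scan, so any reduction in radiation exposure depends on how many patients fall into that branch.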
Our study has several major limitations regarding methodologic features of the individual diagnostic studies included in the systematic review. These include lack of blinding of outcome assessors, lack of inter-rater reliability, and convenience sampling. In the context of diagnostic research, blinding refers to interpreting the diagnostic test without knowledge of the criterion standard. In this systematic review it refers to the radiologist having a purposeful lack of knowledge regarding the interpretation of the history, physical, and US result obtained by the EP and vice versa. Failure to blind these diagnostic interpreters can bias sensitivity and specificity upward.
All of the studies included in the review for history and physical examination findings of upper extremity fracture were blinded to the criterion standard. Among the US studies, only Patel et al. failed to provide a sufficiently clear methodologic description to be able to infer that the results of the US examination were interpreted blindly to the reference standard. The history and physical examination studies did not uniformly use the same reference standard. Appelboam et al. used either the orthopedic surgeon's final diagnosis or the 1-week follow-up by telephone as reference standards in addition to radiography. Webster et al. performed a chart review for those patients who did not receive radiography to assess the final diagnosis. This represents a realistic situation; however, for the purposes of evaluating diagnostic test accuracy, double criterion standard bias represents a common and significant source of error in diagnostic research that can yield falsely increased estimates of sensitivity and specificity for diseases that resolve spontaneously.
An important methodologic concern found during the review is the lack of inter- or intrarater reliability assessments for obtaining history and physical examination findings, US, or radiologic fracture detection in the majority of reviewed studies. Only the physical examination study by Cevik et al. attempted to assess inter-rater reliability, and only when it was feasible. Chaar-Alvarez et al. and Cross et al. were the only US studies that examined inter-rater reliability, but only of the US interpretation. Each of these two studies compared bedside US interpretation by the initial operator/interpreter with a second blinded interpreter who reviewed the scan at a later time. In both studies, the EP who reread the US scan had significantly more US experience than the initial operator/interpreter, resulting in better operating characteristics. This difference in US training between the first and second interpreter may explain why only a moderate agreement between the reviewers was noted in the study by Chaar-Alvarez et al. The unstudied effect of US experience on the operating characteristics of US in fracture detection limits the generalizability of our results.
The wide range of US experience among the operators/interpreters, coupled with the absence of inter- and intrarater reliability in the majority of our reviewed studies, exposes these studies to unmeasured risks of observer variability. This is especially important in US studies where the skill of the operator in obtaining high-quality images may not be synonymous with the ability to accurately interpret the scan. High observer variability in either the index or the reference tests has been shown to affect measures of diagnostic accuracy.[67-69] The unstudied effect of US experience on the operating characteristics of US in fracture detection limits the generalizability and thus the external validity of many of our reviewed studies. Hypothetically, operators who are highly experienced can falsely elevate the sensitivity and specificity when compared to those with less experience.
None of the reviewed trials discussed how uninterpretable scans were handled, which is another concern for bias. Medicine is an art, and often history and physical examination findings can be nuanced and not always easily discernible. The removal of uninterpretable scans from analysis disguises the difficulty in obtaining the images, and the misclassification of these images affects the veracity of the operating characteristics of the diagnostic test. Additionally, intermediate test results may also have diagnostic value, especially when considered in the context of history and physical examination findings. Removal of uninterpretable scans yields falsely elevated estimates of sensitivity.
Another potential source of bias is that the majority of studies used x-ray as the only reference standard test for comparison with US. Only the study by Marshburn et al. used CT where plain radiographs were equivocal. The study did not state how many patients ultimately required CT for diagnosis. No study used MRI as comparison. The lack of studies using CT and MRI as reference standard may reflect the general practice to use plain radiographs as the first-line diagnostic tool. Often CT or MRI is used as a second-line diagnostic tool for potentially complicated fracture, occult fractures, and stress fractures or to guide possible surgical interventions. Studies show that often MRI has a higher sensitivity for occult fracture diagnosis than x-ray or CT alone.[72, 73] In the reviewed trials that only used x-rays, it is possible that falsely negative radiographs were used as the reference standard. This imperfect criterion standard bias affects estimates of US operating characteristics by either decreasing the number of false negative US or increasing the number of false positives, which will falsely increase sensitivity or decrease specificity, respectively.
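The direction of the imperfect criterion standard bias described above can be illustrated with a small worked example. The cohort below is entirely hypothetical (the counts are invented for illustration and do not come from any reviewed study): 100 patients, 40 of whom truly have a fracture, with x-ray missing four occult fractures. The sketch computes US sensitivity and specificity twice, once against the true fracture status and once against x-ray as the reference.

```python
# Each row: (true_fracture, xray_positive, us_positive, number_of_patients).
# All counts are hypothetical and chosen only to illustrate the bias.
HYPOTHETICAL_COHORT = [
    (True, True, True, 34),     # fracture seen by both tests
    (True, True, False, 2),     # fracture missed by US only
    (True, False, True, 3),     # occult fracture found by US, missed by x-ray
    (True, False, False, 1),    # occult fracture missed by both tests
    (False, False, True, 3),    # no fracture, false-positive US
    (False, False, False, 57),  # no fracture, both tests negative
]

TRUE_STATUS = 0     # column index of the true fracture status
XRAY_REFERENCE = 1  # column index of the x-ray result


def us_accuracy(cohort, reference_column):
    """Return (sensitivity, specificity) of US against the chosen reference."""
    tp = fn = fp = tn = 0
    for row in cohort:
        reference_positive, us_positive, count = row[reference_column], row[2], row[3]
        if reference_positive and us_positive:
            tp += count
        elif reference_positive:
            fn += count
        elif us_positive:
            fp += count
        else:
            tn += count
    return tp / (tp + fn), tn / (tn + fp)


sens_true, spec_true = us_accuracy(HYPOTHETICAL_COHORT, TRUE_STATUS)
sens_xray, spec_xray = us_accuracy(HYPOTHETICAL_COHORT, XRAY_REFERENCE)
```

In this hypothetical cohort, US sensitivity is 37/40 against the true status but 34/36 against x-ray, and specificity is 57/60 against the true status but 58/64 against x-ray. The occult fracture missed by both tests is counted as a true negative, and the occult fractures that US detects are counted as false positives, which matches the direction of bias described above. Other distributions of errors could shift the estimates differently.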
Because none of the reviewed studies used a rigorous sampling methodology, the potential for context bias exists, as evidenced by the wide range of observed fracture prevalence (22% to 70.6%) among our reviewed studies. Context bias occurs in studies with a high prevalence of positive results, where the examiners expect and usually find positive studies, which can lead to falsely increased sensitivity and decreased specificity.
Finally, the literature search was limited to published manuscripts written in English. No abstract data were included in this review. This may have led to elimination of some high-quality studies that would also address the question at hand.
After reviewing the existing evidence, we conclude that ultrasound performed by emergency physicians is sufficiently accurate to rule in or rule out extremity fracture. We also conclude that the evidence regarding history and physical examination findings is inconsistent and inconclusive. Further research is required to understand the role of history and physical examination along with point-of-care ultrasound to diagnose extremity fractures. The question now is how these findings should be used to change EM practice. The evidence does not support replacing conventional radiography with ultrasound for fracture diagnosis, but ultrasound may serve as a suitable alternative under specific conditions. Those include situations where ultrasound would be either the preferred method of diagnosis or the most available method. In remote and resource-poor settings, where conventional radiography is not available, ultrasound can serve as a primary tool for fracture diagnosis. Similarly, in special populations in which ionizing radiation should be avoided, ultrasound can be used as a screening modality to identify those requiring further radiographic imaging. Ultrasound can also be helpful in situations where the history and physical examination findings are inconclusive. Finally, in patients with suspected radiographically occult fractures, ultrasound can potentially be used as an alternative to computed tomography or magnetic resonance imaging.
The authors thank Christopher Stewart for his assistance as medical librarian.