Supervising Editor: Alan Jones, MD.
The Learning Curve of Resident Physicians Using Emergency Ultrasonography for Cholelithiasis and Cholecystitis
Article first published online: 2 NOV 2010
© 2010 by the Society for Academic Emergency Medicine
Academic Emergency Medicine
Volume 17, Issue 11, pages 1247–1252, November 2010
How to Cite
Jang, T. B., Ruggeri, W., Dyne, P. and Kaji, A. H. (2010), The Learning Curve of Resident Physicians Using Emergency Ultrasonography for Cholelithiasis and Cholecystitis. Academic Emergency Medicine, 17: 1247–1252. doi: 10.1111/j.1553-2712.2010.00909.x
- Issue published online: 2 NOV 2010
- Received November 26, 2009; revision received April 15, 2010; accepted April 19, 2010.
- learning curve;
Background: Emergency department bedside ultrasonography (EUS) can expedite treatment for patients. However, it is unknown how much experience is required for competency in the sonographic diagnosis of cholelithiasis and cholecystitis.
Objectives: The objective was to assess the learning curve of physicians training in right upper quadrant (RUQ) EUS.
Methods: This was a prospective study at an urban, academic emergency department from August 1999 to July 2006. Patients with suspected biliary tract disease underwent RUQ EUS followed by abdominal ultrasonography (AUS) by the Department of Radiology. Results of EUS were compared to AUS using a predesigned, standardized data sheet.
Results: A total of 1,837 patients underwent EUS by 127 physicians. The overall sensitivity and specificity of EUS for cholelithiasis were 84% (95% confidence interval [CI] = 81% to 86%) and 86% (95% CI = 83% to 88%), respectively. The overall sensitivities of EUS for ductal dilation, gallbladder wall thickening, pericholecystic fluid, and sludge were each < 60%. When analyzing the EUS test characteristics, for every increase in 10 examinations up to 50 examinations, there was no significant improvement in the sensitivity or specificity for any of these sonographic findings. Moreover, on probit regression analysis, accounting for clustering or correlation among the examinations performed by each of the operators, there was no improvement for detecting any of the sonographic findings except for pericholecystic fluid for every 10 additional examinations performed.
Conclusions: When adjusting for operator dependence, performing up to 50 EUS examinations appears to have little effect on the accuracy of RUQ EUS. Rather than simply requiring an arbitrary number of examinations, another method of competency assessment may be necessary.
Cholelithiasis affects 10% of people in Western society,1 and is a common cause of acute abdominal pain in emergency department (ED) patients. Although usually benign, up to 31% of patients with cholelithiasis go on to develop acute cholecystitis,2 which can be rapidly fatal and requires emergent surgery in about 20% of patients.1 Therefore, ultrasonography (US) has become the first-line diagnostic test of choice for patients with suspected biliary disease.1 Unfortunately, removing patients from the clinical area to the radiology suite can be time-consuming and may not always be feasible or safe, especially in the case of unstable or actively vomiting patients.
Emergency department bedside ultrasonography (EUS) of the right upper quadrant (RUQ) was previously described as an alternative imaging modality for patients with suspected biliary disease3–6 that does not require any technician time and may be done quickly at the bedside without removing patients from the clinical area. However, while EUS can diagnose cholelithiasis, it may lack the appropriate sensitivity to rule out acute cholecystitis,3,4 which is important because acalculous cholecystitis accounts for up to 14% of all cholecystitis cases.1 Furthermore, there are various conflicting guidelines for training7–9 because it is not known how much training is required for competency, and none of the guidelines are based on prospective data, but rather on consensus opinion.7 Therefore, the purpose of this study was to prospectively assess the learning curve of physicians training in EUS for the sonographic signs of cholelithiasis and cholecystitis.
This was an institutional review board–approved prospective study of a convenience sample of patients seen between August 1, 1999, and July 31, 2006, when a resident physician was available to obtain consent and perform EUS prior to abdominal ultrasonography (AUS) by the Department of Radiology for the diagnosis of biliary disease. The study physicians performed EUS to detect sonographic signs of biliary disease, and results were recorded on a predesigned data sheet.
Study Setting and Population
This study was conducted at an urban, academic ED with 49,000 annual adult visits, a PGY-2 through PGY-4 emergency medicine residency program, and a PGY-1 through PGY-5 combined internal medicine/emergency medicine residency program.
All patients presenting to the ED with abdominal pain or nausea/vomiting were eligible for participation if their treating physicians were ordering an AUS for suspected biliary disease. Patients were excluded if they could not speak English, had known results of AUS done within 60 days, or were unable to give informed consent.
The participating physicians were resident physicians who had completed an introductory course on EUS during the first week of their PGY-2 year and had performed at least two prior EUS examinations on normal volunteers before enrolling patients. The introductory course involved 8 hours of didactics covering ultrasound physics, knobology, core examinations (aortic, biliary, cardiac, obstetric, renal, trauma, and vascular access), and case vignettes, with another 8 hours of hands-on practice on normal volunteers. All EUS examinations were reviewed within 1 week by one of two attending physicians who are board-certified by the American Board of Emergency Medicine and met the EUS training guidelines of both the American College of Emergency Physicians (ACEP) and the Society for Academic Emergency Medicine (SAEM). These reviews were used for educational purposes, but not to alter the completed data sheets, as the performance of EUS requires both technical and interpretive skills (i.e., these attending overreads were not used to alter the performance characteristics of the resident EUS examinations).
Participating physicians consented the patients and performed EUS to detect specific sonographic signs of biliary disease before AUS was obtained. The results were then recorded on a predesigned data sheet. Research assistants, trained in data abstraction and blinded to the results of the EUS examinations, then reviewed the results of AUS read by board-certified radiologists from the Division of Ultrasound and Mammography within the Department of Radiology for subsequent comparison. EUS examinations done with the assistance of a senior resident were tracked by the examination experience of the senior resident.
All EUS examinations performed under the supervision of an attending physician were excluded to avoid biasing the data with the skills of the attending physician. Likewise, EUS examinations done for training purposes after AUS was obtained were also excluded. These examinations were tracked only for the purpose of assessing the overall experience of the resident physician.
Emergency US examinations were performed using an Aloka SSD-1400 with a 3.5-MHz curved linear array probe (ALOKA CO, LTD., Wallingford, CT) to evaluate 1) the presence of cholelithiasis, 2) common bile duct (CBD) dilation > 5 mm (plus 1 mm per decade of life over 50 years of age), 3) gallbladder wall thickening (GBWT) > 4 mm, 4) the presence of pericholecystic free fluid (PCFF), and 5) the presence of sludge. The criterion standard for each of these measures was the final reading of AUS by board-certified radiologists blinded to the EUS results, which was performed using a Philips IU22 with a curved, linear array probe (Philips Healthcare, Andover, MA). A sonographic Murphy’s sign was not assessed due to concerns by the Department of Radiology that administration of narcotics would alter the performance characteristics of the sign.
Data were collected in an Excel database (Microsoft Corp., Redmond, WA) and translated into a native SAS format using DBMS/Copy (Dataflux Corp., Cary, NC). Analyses were conducted using SAS version 9.1 (SAS Institute, Cary, NC). Sensitivity and specificity were calculated using 95% confidence intervals (CIs) to assess both statistical significance and clinical effect.10 PASS 2008 version 08.0.8 (Kaysville, UT) was used to perform a post hoc power analysis.
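The text does not say which interval method was used for the 95% CIs, so the following is only a minimal sketch, assuming a Wilson score interval. The counts are also an assumption: 726 true positives among the 864 patients with cholelithiasis is back-calculated from the reported 84% sensitivity and is not a figure taken from the study data.

```python
import math


def proportion_with_wilson_ci(successes, n, z=1.96):
    """Return (estimate, lower, upper) for a binomial proportion.

    Uses the Wilson score interval; z = 1.96 gives an approximate 95% CI.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, center - half_width, center + half_width


# Hypothetical counts: 726 of 864 AUS-positive patients also positive on EUS.
sens, lower, upper = proportion_with_wilson_ci(726, 864)
print(f"sensitivity = {sens:.0%} (95% CI {lower:.0%} to {upper:.0%})")
```

Specificity can be handled the same way by passing the number of true negatives and the number of AUS-negative patients.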
It was predetermined to track the EUS examinations per resident by experience level in increments of 10 (i.e., examinations 1–10, 11–20, 21–30, etc.). Thus, the data were hierarchical such that the second block of 10 EUS examinations for every resident was in group 2 (“11–20”), the third block for every resident was in group 3 (“21–30”), etc. Consequently, the number of residents in each successive group was anticipated to be smaller than in the preceding group. Mantel-Haenszel chi-square testing was done to assess for changes in performance between each group. The SAS GENMOD procedure was used to perform a probit regression analysis to account for the fact that examinations performed by the same operator are typically more similar to one another than to those performed by other operators; thus, the data analyses took intraoperator cluster correlation into account rather than assuming independence among all observations.
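As an illustration of the trend test across the blocks of 10 examinations, the sketch below computes a Mantel-Haenszel chi-square for linear trend as (N − 1)r², where r is the correlation between the block score and a binary outcome. The function name, the integer block scores, and the counts in the example are all hypothetical; the study's per-block counts are not given in the text.

```python
def mantel_haenszel_trend_chi2(successes, totals, scores=None):
    """Mantel-Haenszel chi-square (1 df) for linear trend in proportions.

    Equals (N - 1) * r**2, where r is the Pearson correlation between the
    ordinal group score and the binary outcome across all N observations.
    """
    if scores is None:
        scores = list(range(1, len(totals) + 1))  # assumed scores 1, 2, 3, ...
    n = sum(totals)
    positives = sum(successes)
    mean_score = sum(s * t for s, t in zip(scores, totals)) / n
    mean_outcome = positives / n
    # Sums of squares and cross-products over all individual observations.
    sxx = sum(t * (s - mean_score) ** 2 for s, t in zip(scores, totals))
    syy = positives * (1 - mean_outcome)
    sxy = sum(
        (s - mean_score) * (k - t * mean_outcome)
        for s, k, t in zip(scores, successes, totals)
    )
    if sxx == 0 or syy == 0:
        raise ValueError("trend is undefined without variation in both variables")
    return (n - 1) * sxy**2 / (sxx * syy)


# Hypothetical counts of correct detections per block of examinations.
correct = [332, 158, 102, 57, 36]
performed = [400, 200, 120, 60, 40]
print(f"chi-square for trend = {mantel_haenszel_trend_chi2(correct, performed):.2f}")
```

A statistic above 3.84 would be significant at the 0.05 level for 1 degree of freedom. This sketch treats all examinations as independent, which is exactly the simplification the probit regression with clustering is meant to avoid.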
Adequacy of probit regression model fit was assessed by using the generalized score statistic criterion (p > 0.05) and quasi-likelihood information criterion (QIC), which is a modification of the Akaike information criterion (AIC). Additionally, the Hosmer-Lemeshow goodness-of-fit test was performed for every binary response criterion for true-positive and true-negative values for CBD dilation, gallstones, GBWT, PCFF, and sludge, with the independent variable being 10 additional examinations being performed.
A post hoc power analysis demonstrated that regression modeling with a sample size of 1,837 observations (of which 52% are in the group X = 0, i.e., no gallstones) achieves 90% power at a 0.01 significance level to detect a change in sensitivity of 80% to 90% (change corresponds to an odds ratio of 2.0) for each 10 incremental examinations. An additional power analysis to sequentially compare each group of 10 examinations was performed, although this is not an optimal power analysis because it does not account for clustering and intraoperator correlations between examinations. To detect a 10% difference in sensitivity from 80% to 90% between examinations 1–10 (n = 904) and examinations 11–20 (n = 458), our sample size achieved 99% power with a significance level of 0.05. When comparing examinations 11–20 (n = 458) to examinations 21–30 (n = 273), our sample size achieved 96% power; when comparing examinations 21–30 (n = 273) to examinations 31–40 (n = 120), our sample size achieved 70% power; and when comparing examinations 31–40 (n = 120) to examinations 41–75 (n = 82), there was 48% power.
A total of 2,080 patients underwent EUS examinations for sonographic signs of biliary disease by 127 resident-sonographers followed by AUS by the Department of Radiology. A total of 243 examinations were excluded because they were done under the supervision of an attending physician who met the training guidelines of ACEP and SAEM; all of these were performed by resident physicians during their first 20 examinations (i.e., groups 1 and 2). Therefore, 1,837 EUS examinations were included for analysis. A total of 864 patients (47%) had cholelithiasis, 156 patients (9%) had CBD dilation, 318 patients (17%) had GBWT, 104 patients (6%) had PCFF, and 189 patients (10%) had sludge.
The sensitivities and specificities of EUS for specific sonographic signs of biliary disease are shown in Table 1 by examination level. The Mantel-Haenszel chi-square test for trend demonstrated no significant difference across examination-level strata for either true-positive (χ2 = 0.14, p = 0.71) or true-negative (χ2 = 3.4, p = 0.07) rates for the detection of cholelithiasis. Moreover, when using a random effects model accounting for intraoperator correlation, there was no statistically significant increase in identifying either the presence (p = 0.41) or the absence (p = 0.40) of cholelithiasis for each additional examination performed. Likewise, the Mantel-Haenszel chi-square test for trend for each stratum of 10 examinations was performed for each of the sonographic signs of cholecystitis. The only signs that demonstrated increasing trends were for the true-negative rate of CBD dilation (Mantel-Haenszel χ2 = 8.82, p = 0.003), the true-positive rate of PCFF (Mantel-Haenszel χ2 = 4.9, p = 0.03), and the true-negative rate of PCFF (Mantel-Haenszel χ2 = 15.33, p < 0.0001). Using a random effects model to account for intraoperator correlation, there was no significant effect of training up to 40 examinations for the detection of CBD dilation, GBWT, or sludge; however, there remained a significant effect for PCFF (Table 2).
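The reported p-values can be related to their chi-square statistics with a short calculation. This is a minimal sketch, assuming each Mantel-Haenszel statistic is referred to a chi-square distribution with 1 degree of freedom; the function name is illustrative.

```python
import math


def chi2_1df_p_value(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Statistics as reported in the text, paired with the reported p-values.
reported = [
    ("cholelithiasis, true-positive", 0.14, "0.71"),
    ("cholelithiasis, true-negative", 3.4, "0.07"),
    ("CBD dilation, true-negative", 8.82, "0.003"),
    ("PCFF, true-positive", 4.9, "0.03"),
    ("PCFF, true-negative", 15.33, "< 0.0001"),
]
for label, chi2, reported_p in reported:
    print(f"{label}: computed p = {chi2_1df_p_value(chi2):.4f}, reported p = {reported_p}")
```

Under that assumption, each computed value rounds to the p-value given in the text.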
Table 1. Sensitivities and specificities of EUS for specific sonographic signs of biliary disease, by examination level

|Finding|Sensitivity, % (95% CI)|Specificity, % (95% CI)|LR+ (95% CI)|LR− (95% CI)|
|---|---|---|---|---|
|Cholelithiasis (n = 864)| | | | |
|All EUS examinations (1,837)|84 (81–86)|86 (83–88)|6.0 (5.0–7.1)|0.19 (0.17–0.22)|
|Examinations 1–10 (904)|83 (80–86)|83 (79–87)|5.0 (4.0–6.2)|0.2 (0.16–0.24)|
|Examinations 11–20 (458)|79 (74–84)|92 (88–95)|10.0 (6.8–14.7)|0.2 (0.18–0.29)|
|Examinations 21–30 (273)|85 (78–90)|86 (79–92)|6.3 (4.0–10.0)|0.2 (0.1–0.3)|
|Examinations 31–40 (120)|95 (85–99)|92 (81–97)|11.2 (4.8–26.0)|0.05 (0.02–0.16)|
|Examinations 40–75 (82)|91 (78–97)|92 (77–98)|11.2 (3.8–33.4)|0.1 (0.04–0.2)|
|CBD dilation (n = 156)| | | | |
|All EUS examinations (1,837)|40 (33–49)|94 (93–95)|6.9 (5.3–9.1)|0.63 (0.6–0.7)|
|Examinations 1–10 (904)|42 (31–55)|96 (95–98)|11.9 (7.6–18.6)|0.6 (0.5–0.7)|
|Examinations 11–20 (458)|39 (25–55)|94 (91–96)|6.2 (3.7–10.4)|0.6 (0.5–0.8)|
|Examinations 21–30 (273)|32 (17–52)|89 (85–93)|3.0 (1.6–5.8)|0.8 (0.6–1.0)|
|Examinations 31–40 (120)|40 (14–73)|86 (78–92)|2.9 (1.2–7.2)|0.7 (0.4–1.1)|
|Examinations 40–75 (82)|67 (24–99)|99 (92–99)|51.0 (6.7–385.0)|0.3 (0.1–1.0)|
|GBWT (n = 318)| | | | |
|All EUS examinations (1,837)|54 (48–60)|95 (93–96)|9.9 (7.8–12.5)|0.49 (0.4–0.5)|
|Examinations 1–10 (904)|53 (44–61)|95 (94–97)|11.6 (8.1–16.6)|0.5 (0.4–0.6)|
|Examinations 11–20 (458)|57 (45–67)|95 (92–97)|11.8 (7.2–19.2)|0.5 (0.4–0.6)|
|Examinations 21–30 (273)|47 (33–62)|89 (84–93)|4.4 (2.7–7.1)|0.6 (0.5–0.8)|
|Examinations 31–40 (120)|50 (26–74)|95 (89–98)|10.4 (3.9–27.9)|0.5 (0.3–0.9)|
|Examinations 40–75 (82)|86 (56–97)|97 (89–99)|29.1 (7.3–116.1)|0.1 (0.04–0.5)|
|PCFF (n = 104)| | | | |
|All EUS examinations (1,837)|41 (32–51)|97 (96–98)|13.8 (9.7–19.6)|0.6 (0.5–0.7)|
|Examinations 1–10 (904)|43 (26–62)|97 (96–98)|17.2 (9.6–30.8)|0.6 (0.4–0.8)|
|Examinations 11–20 (458)|59 (39–77)|97 (94–98)|18.2 (10.0–33.3)|0.4 (0.3–0.7)|
|Examinations 21–30 (273)|23 (10–42)|95 (92–98)|5.0 (2.1–11.9)|0.8 (0.7–1.0)|
|Examinations 31–40 (120)|25 (4–64)|96 (91–99)|7.0 (1.5–32.6)|0.8 (0.5–1.2)|
|Examinations 40–75 (82)|63 (26–90)|99 (92–99)|46.3 (6.1–348.4)|0.4 (0.2–0.9)|
|Sludge (n = 189)| | | | |
|All EUS examinations (1,837)|52 (45–59)|92 (90–93)|6.3 (5.1–7.8)|0.5 (0.4–0.6)|
|Examinations 1–10 (904)|50 (39–61)|91 (89–93)|5.7 (4.2–7.8)|0.5 (0.4–0.7)|
|Examinations 11–20 (458)|57 (42–70)|91 (88–94)|6.4 (4.3–9.5)|0.5 (0.3–0.6)|
|Examinations 21–30 (273)|48 (32–64)|91 (86–94)|5.0 (3.0–8.4)|0.6 (0.4–0.8)|
|Examinations 31–40 (120)|33 (9–69)|95 (88–98)|6.2 (1.8–20.6)|0.7 (0.4–1.1)|
|Examinations 40–75 (82)|78 (40–96)|100 (94–100)|NA|0.2 (0.07–0.75)|
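The likelihood ratio columns follow from sensitivity and specificity by the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The sketch below is illustrative only: the function name is hypothetical, and it works from the rounded percentages in the table rather than the underlying counts, so small rounding differences from the published values are to be expected.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-); an entry is None when its denominator is zero."""
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else None
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else None
    return lr_pos, lr_neg


# Overall cholelithiasis row: sensitivity 84%, specificity 86%.
lr_pos, lr_neg = likelihood_ratios(0.84, 0.86)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")

# A specificity of 100% leaves LR+ undefined (division by zero).
print(likelihood_ratios(0.78, 1.00))
```

An undefined LR+ at 100% specificity would be consistent with the NA entry in the last sludge row, although the text does not state the reason for that entry.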
Table 2. GEE parameter estimates for the effect of the number of examinations, accounting for operator

|Finding|GEE parameter estimate (95% CI)|When accounting for operator, does the number of examinations improve the ability to detect its absence or presence?|
|---|---|---|
|CBD dilation|0.03 (0.05–0.12)|No significant effect (p = 0.44)|
|GBWT|0.03 (0.04–0.1)|No significant effect (p = 0.36)|
|PCFF|0.19 (0.11–0.27)|Significant effect (p < 0.0001)|
|Sludge|0.06 (0.03–0.14)|No significant effect (p = 0.09)|
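Because the estimates in Table 2 come from a probit model, they sit on the standard normal (z) scale rather than the probability scale. The sketch below shows one way to read such a coefficient. It assumes the coefficient applies per 10 additional examinations, as the methods describe, and uses a baseline probability of 0.41 (the overall PCFF sensitivity in Table 1) purely as a hypothetical starting point; the function name is illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def shift_probability_on_probit_scale(baseline_probability, coefficient, steps=1):
    """Apply a probit coefficient to a baseline probability.

    The baseline is mapped to a z-score, shifted by coefficient * steps,
    and mapped back to a probability with the standard normal CDF.
    """
    z = _STD_NORMAL.inv_cdf(baseline_probability)
    return _STD_NORMAL.cdf(z + coefficient * steps)


# Hypothetical reading of the PCFF estimate (0.19) from a baseline of 0.41.
after_one_block = shift_probability_on_probit_scale(0.41, 0.19)
print(f"predicted probability after 10 more examinations: {after_one_block:.3f}")
```

The size of the change on the probability scale depends on the baseline, so this should be read as an interpretation aid and not as a result of the study.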
The QIC for generalized estimating equations and the generalized score statistic for the probit regression model demonstrated adequate fit statistics, with a p > 0.05 for each model, except for that assessing the presence or absence of pericholecystic fluid. Likewise, the Hosmer-Lemeshow goodness-of-fit test statistic for every binary response criterion had a p > 0.05, except for true-negative CBD dilation, true-negative PCFF, and true-negative sludge.
This prospective study involved 100% of the residents at our program, allowing for a large sample of residents training in EUS. Sensitivity of EUS for cholelithiasis has been reported as 86%4 to 96%6 with a specificity ranging from 66%3 to 88%.6 Our data are consistent with these prior studies. Likewise, our data are consistent with the only other prospective study of learning curves, which used review by emergency physician-sonographers as a criterion standard and found that technical and interpretive error rates overlapped between those who had performed < 25 EUS examinations for cholelithiasis and those who had performed ≥ 25 EUS examinations for cholelithiasis.7 In contrast, except for PCFF, the physician-sonographers in our sample could not reliably determine important findings such as CBD dilation, GBWT, or sludge, even after 40 EUS examinations, which is consistent with prior work suggesting that emergency physicians could not accurately diagnose cholecystitis without more extensive training.3,8
Hertzberg et al.11 previously demonstrated that radiology residents with poor US skills improved with 200 training examinations, but continued to have poor overall performance. However, those findings may not be applicable here, since EUS is focused in nature, rather than comprehensive, and ACEP recommends credentialing physicians by indication rather than en bloc for all US indications.9,12 Consequently, different guidelines have been proposed for training nonradiologists to perform US.13–16 The American Institute of Ultrasound in Medicine recommends at least 300 examinations for single-indication applications,17 but ACEP recommends a minimum of 25 examinations,9 consistent with similar suggestions related to obstetric sonography from the family medicine18,19 and obstetric literature.20
In our experience, trainees with limited experience (e.g., < 10 EUS examinations) work hard to acquire the basic skills required to perform EUS and then develop improving comfort with the examination. As their comfort improves, they become more confident in their skills, even though their exposure to “normal variants” (e.g., biliary polyps and Phrygian caps) and specific abnormalities (e.g., wall-echo-shadowing complex and sludge balls) may still be limited. Consequently, they develop a sense of “competence” and interpretive skill that may be overinflated or unjustified, resulting in a decrease in accuracy that occurs when performing the 10th to 40th EUS examinations. With ongoing feedback and teaching, these operators develop a “second learning curve” whereby they have a better appreciation for the subtleties of RUQ EUS and start to improve. The learning curve then levels out to an accuracy of approximately 90% for detecting cholelithiasis around 50 examinations, and for detecting sonographic signs of cholecystitis around 75 EUS examinations, with the variability attributable to the operator-dependent nature of sonography.11 Therefore, our institutions require more than 25 EUS examinations, a minimum of 10 “positive” examinations with cholelithiasis, four of which must also have cholecystitis, and a minimum accuracy rate of 90% for privileges to perform EUS for biliary disease. However, we determined our competency and credentialing requirements based on internal expert opinion and consensus rather than prospective data, because no clear minimum standard for competency exists in the literature.7,8 This should be explored further to determine the optimal training standards for ensuring the competency of emergency physicians performing EUS for biliary disease.
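The institutional privileging rule described above can be written as a simple check. This is a hypothetical sketch: the record fields and function name are invented for illustration, and accuracy is assumed to mean the fraction of a trainee's examinations judged accurate, which the text does not define.

```python
from dataclasses import dataclass


@dataclass
class TraineeLog:
    """Hypothetical summary of one trainee's biliary EUS examinations."""
    total_examinations: int
    positive_cholelithiasis: int      # examinations positive for cholelithiasis
    positive_with_cholecystitis: int  # subset of those also showing cholecystitis
    accurate_examinations: int        # examinations judged accurate


def meets_biliary_eus_requirements(log):
    """Check the stated thresholds: > 25 examinations, >= 10 positive,
    >= 4 of those with cholecystitis, and accuracy of at least 90%."""
    if log.total_examinations <= 25:
        return False
    accuracy = log.accurate_examinations / log.total_examinations
    return (
        log.positive_cholelithiasis >= 10
        and log.positive_with_cholecystitis >= 4
        and accuracy >= 0.90
    )


print(meets_biliary_eus_requirements(TraineeLog(30, 12, 4, 28)))
print(meets_biliary_eus_requirements(TraineeLog(25, 12, 4, 25)))
```

The second call fails only because 25 examinations is not more than 25, which highlights that the count threshold and the accuracy threshold are separate conditions.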
If one considers the studies done with the assistance of an attending physician to be “failures” or “false” studies, then the performance of operators performing their first through 20th examinations would be worse than reported in Table 1, which supports the requirement for more than 20 training examinations.9,12,15–17 While our data may be limited because only 37 physicians performed more than 25 EUS examinations for biliary disease, at the very least, our data suggest that some physicians may not be competent to perform EUS for cholelithiasis or cholecystitis after 25 examinations, and the Residency Review Committee Emergency Medicine (RRC-EM) requirement for documentation of competency21 may be more appropriate than simply requiring a minimum number of examinations.
On the other hand, it may be that 25 successfully completed examinations with feedback given for inadequate or inaccurate examinations along the way could be enough to develop competency in using EUS for the diagnosis of biliary disease. In our study, the examinations were tracked in order regardless of whether or not they were “successful” or adequately performed, meaning that operators were given credit for a “training examination” even if it was poorly done. The ACEP training guidelines9 do not specify how to handle training examinations that end up being “equivocal,” “inadequate,” or “misinterpreted,” and many centers allow for a certain number or percentage of these examinations to count toward the training total since the learning process inherently involves “mistakes.” If a training model specifically required 25 successfully completed examinations, excluding equivocal, inadequate, or misinterpreted examinations, then the total number of training examinations performed would likely be closer to 40–50, in which case we would expect much better performance, consistent with our group 5 (completed 40 to 75 examinations) operators.
First, although there were 127 total operators, only 37 resident physicians performed more than 25 EUS examinations for biliary disease, potentially representing an “US-interest” bias. These physicians were likely interested in performing the examination and also knew that their results were being studied, raising the potential for a Hawthorne effect. Our findings may not apply to other physicians since US is operator-dependent and poor skills may persist despite training up to 200 examinations.11 We suspect that this is one reason why the RRC-EM recently changed the EUS training requirement from a minimum number of examinations to a documentation of competency.21
Second, there was also a selection bias because the enrolling physicians chose which patients to enroll and may not have enrolled patients whom they thought would be difficult to image. However, it has previously been shown that operator confidence correlates with the accuracy of EUS examinations.22 This could be improved with a study enrolling consecutive patients seen in an ED with suspected biliary disease. On the other hand, since our findings represent a “best-case scenario” regarding enrollment, we believe that this further suggests that competency standards based solely on a minimum number of examinations may be inadequate.
Third, hands-on experience and number of examinations may not be the determining factors in developing competency. It may be that the number of examinations with positive findings is more important than the total number of examinations performed or that a dedicated US rotation is more important than a prolonged experience with the same number of examinations. Likewise, this was a single-center study with a particular training protocol. Residency programs with different resources (CD learning modules, simulators, etc.), personnel (RDMS-sonographers, fellowship-trained US directors, etc.), and training requirements (didactic hours, required number of US examinations for graduation, etc.) might find a different learning curve, especially since less than one-third of residency programs meet ACEP and SAEM training guidelines.23 This was not assessed in this study, but should be addressed in the future.
Finally, although the ACEP Emergency Ultrasound Imaging Criteria Compendium lists cholelithiasis, rather than cholecystitis, as the primary indication for EUS of the RUQ,12 many would consider cholecystitis the more important diagnosis since it is associated with greater morbidity and mortality.1,2 We chose to evaluate the performance of EUS for specific sonographic signs of cholecystitis rather than the final diagnosis of cholecystitis because there is no criterion standard constellation of US findings for the diagnosis of cholecystitis,7 and it is recognized that the diagnosis of cholecystitis short of surgical pathology requires a composite of clinical history, physical examination, laboratory abnormalities, US findings, and often evaluation by cholangiogram or nuclear medicine,1,2,7,8 most of which reflect clinical judgment more than US competency. Furthermore, some clinicians or departments may choose to limit EUS to the primary indication of cholelithiasis rather than ruling out cholecystitis, in which case competency requirements would differ.
When adjusting for operator dependence, performing up to 50 emergency ultrasound examinations appears to have little effect on accuracy of right upper quadrant emergency ultrasound. Rather than simply requiring an arbitrary number of examinations, another method of competency assessment may be necessary.
- 9. ACEP Board of Directors. ACEP Emergency Ultrasound Guidelines-2008. Available at: http://www.acep.org/WorkArea/DownloadAsset.aspx?id=32878. Accessed Aug 31, 2009.
- 12. ACEP Board of Directors. ACEP Policy Statement: Emergency Ultrasound Imaging Criteria Compendium. Available at: http://www.acep.org/WorkArea/DownloadAsset.aspx?id=32886. Accessed Oct 30, 2009.
- 14. AIUM and SAEM square off over ultrasound. Emerg Med News. 1993; 15:1.
- 17. American Institute of Ultrasound in Medicine. Training Guidelines for Physicians who Evaluate and Interpret Diagnostic Ultrasound Examinations. Available at: http://www.aium.org/publications/statements.aspx. Accessed Oct 30, 2009.
- 21. Accreditation Council for Graduate Medical Education. Emergency Medicine Guidelines. Available at: http://www.acgme.org/acWebsite/RRC_110/110_guidelines.asp#res. Accessed Aug 31, 2009.