The authors have no relevant financial information or potential conflicts of interest to disclose.
Outcome Measures for Emergency Medicine Residency Graduates: Do Measures of Academic and Clinical Performance During Residency Training Correlate With American Board of Emergency Medicine Test Performance?
Article first published online: 14 OCT 2011
© 2011 by the Society for Academic Emergency Medicine
Academic Emergency Medicine
Special Issue: CORD/CDEM Educational Advances Supplement
Volume 18, Issue Supplement s2, pages S59–S64, October 2011
How to Cite
Frederick, R. C., Hafner, J. W., Schaefer, T. J. and Aldag, J. C. (2011), Outcome Measures for Emergency Medicine Residency Graduates: Do Measures of Academic and Clinical Performance During Residency Training Correlate With American Board of Emergency Medicine Test Performance? Academic Emergency Medicine, 18: S59–S64. doi: 10.1111/j.1553-2712.2011.01116.x
Supervising Editor: John Burton, MD.
- Issue published online: 14 OCT 2011
- Received July 25, 2010; revision received January 14, 2011; accepted January 30, 2011.
Objectives: Emergency medicine (EM) residency programs are increasingly asked to have measurable outcomes of residents’ performance. Successful completion of the written and oral American Board of Emergency Medicine (ABEM) examinations is one key outcome. In the clinical practice of EM, emergency physicians (EPs) are often measured by their clinical productivity (patients per hour). This study explored the correlation between these measures of academic and clinical performance and hypothesized that clinical productivity would have a positive association with ABEM performance.
Methods: A prospective written survey was sent to all EPs completing training at an established Midwest 3-year EM residency program between 1994 and 2005 (53,000 annual visits in 1994 to 65,000 annual visits in 2005). Physicians self-reported their national ABEM written and oral board scores in a blinded fashion. Simulated oral board scores and senior written in-training examination scores were also recorded. Postgraduate Year 3 (PGY3) clinical productivity was calculated as annual patient encounters divided by hours worked. Correlations among these variables were assessed by Pearson’s correlation coefficient, with p < 0.05 being considered statistically significant. Multiple regression analysis was performed for ABEM oral and written examination scores.
Results: Fifty-six of 85 residents responded to the initial survey. There was no significant correlation between clinical productivity and ABEM scores, either written (r = −0.021, p = 0.881) or oral (r = −0.02, p = 0.879). There was also no significant correlation between productivity and simulated oral board scores (r = 0.065, p = 0.639) or PGY3 in-training scores (r = 0.078, p = 0.57). As previously reported, there were positive and significant correlations between PGY3 in-training scores and ABEM written examination scores (r = 0.60, p < 0.0001), as well as between ABEM oral and written examination scores (r = 0.51, p < 0.0001). Multiple regression analysis revealed that only the PGY3 in-training examination was a significant predictor of the ABEM oral and written scores (p < 0.001).
Conclusions: PGY3 resident clinical productivity, when measured as patients per hour, correlated poorly with academic performance when measured by written and oral ABEM scores. The PGY3 in-training examination was predictive of the ABEM written and oral examination scores.
The process of education has the goal of achieving certain outcomes for the learners involved. In postgraduate medical education those goals include reaching a certain level of cognitive knowledge and clinical skills. The Accreditation Council for Graduate Medical Education (ACGME) is increasingly emphasizing the use of useful, reliable, and valid assessment tools to determine whether residents and their training programs are achieving the expected outcomes.1,2 One of the primary measurements of cognitive knowledge in residency education is successfully passing specialty board certification examinations.
In emergency medicine (EM), board certification is conferred only after successful completion of EM residency training and by passing both written and oral board examinations. Prior studies have found a positive correlation between internal medicine in-training examinations and American Board of Internal Medicine (ABIM) scores,3–5 as well as EM in-training examination scores and the American Board of Emergency Medicine (ABEM) written board certification scores.6 This suggests that the written in-training examination score may be a useful measure for predicting success in passing the written portion of the ABEM examination. Most EM residency programs also offer simulated oral board examination testing for resident physicians, designed to help prepare the candidate for this unique testing experience. However, there is a paucity of information in the medical literature as to the correlation of simulated oral boards assessments and success with the oral portion of the certification examination.
In addition to academic performance, EM resident physicians require assessment of their clinical performance. One means of assessing EM resident clinical performance is to measure the individual physician’s clinical productivity. The number of patients evaluated per hour worked is a commonly used metric of this productivity and is often used by training programs and employers as a measure of clinical performance expectations.
Cognitive knowledge and clinical productivity are two very different resident physician outcomes, but both are important to residency graduates as they embark into a practice setting. Are these two outcomes related? Do resident physicians who excel academically also do well with clinical productivity? Conversely, do residents with higher clinical exposure have superior board examination scores? We evaluated the relationship between these measures among our EM residency program’s graduates and hypothesized that patient productivity and residency academic performance would have a positive association with ABEM examination scores and would be useful assessment tools to predict this outcome.
This was a retrospective cohort design study of Postgraduate Year 3 (PGY3) EM resident physicians. The study was approved as a minimal-risk expedited study by our institutional review board prior to initiation. Informed consent was obtained in the survey mailing sent to the EM graduates.
Study Setting and Population
We are an academic EM program with a PGY1–3 format and eight residents annually (during the study period), at a single-site emergency department (ED). The ED volume increased from 53,000 in 1994 to 65,000 in 2005. Our study population represented all graduating PGY3 EM residents from our Midwest training program between 1994 and 2005.
Individual PGY3 data collected included ABEM written in-training examination scores, simulated oral board examination scores, ABEM certification written and oral examination scores, the number of ED patient encounters, and the documented hours worked in the ED during the PGY3 year. Individual EM resident physician data for ABEM written and oral certification examination scores are not currently available to EM training programs. We attempted to obtain these scores directly from ABEM in a blinded fashion, but our requests were denied. To collect these scores, a written survey was conducted of all graduates from 1994 to 2005, asking them to self-report their ABEM written and oral scores in a blinded fashion. All former residents were sent a detailed letter and e-mail communication explaining the study. Graduates were assigned a random study number and were asked to send a postcard back with the study number and their scores. To ensure confidentiality of the scores and the participants, linkage of the study number and the participant’s identity was not available to the investigators and was maintained by a neutral third party. For those physicians who had forgotten or misplaced their scores, an addressed and postage-paid form letter to ABEM was enclosed requesting that the oral and written scores be sent to the individual physician. Follow-up letters and/or telephone communications were attempted twice for those who did not initially respond to the written survey.
All resident physicians in our program are required to participate in the annual ABEM in-training examination. The in-training examination is a single-session, written, standardized comprehensive examination, consisting of 225 reviewed and field-tested multiple-choice questions, requiring about 4.5 hours to complete. The examination is scored by ABEM at a centralized location, and results are mailed to individual resident physicians and EM residency program directors. For this study, individual resident physicians’ PGY3 in-training examination scores were used for analysis.
In our curriculum, EM resident physicians also participate in simulated oral board examinations conducted by EM faculty members serving as examiners. Simulated oral board cases (single case encounters and triple case encounters) are drawn from a compendium of over 100 faculty-developed oral board cases. PGY3 residents are required to participate in simulated oral board examinations quarterly during protected conference time. EM faculty score resident physician performance using the same ABEM standardized categories as they appear on national examinations (data acquisition, problem solving, patient management, resource use, health care provided, interpersonal relations, comprehension of pathophysiology, and overall clinical competence). Categories are graded on a 1 to 8 numeric scale (1 low and 8 high). In addition, four to six predefined critical actions are tracked during the case and scored using a “completed” or “not completed” status. If a resident does not complete a critical action, the associated performance category is then downgraded. Performance category scores are averaged to give an overall case score, with a score of 5.75 representing a passing score. The scoring categories, numeric scoring scales, critical actions, and minimal passing scores attempt to duplicate the evaluation methods used for ABEM oral board certification testing. The overall PGY3 simulated oral board case score represents an averaging of all simulated oral board examinations performed during the individual resident physician’s PGY3 year.
The majority of our residents also participate in an annual PGY3 and PGY4 EM resident statewide simulated oral board examination. The statewide examination is conducted at a single centralized location using a single testing session, and faculty from six EM residency programs participate in a structured simulated oral board experience involving five single-encounter cases and two triple-encounter cases. Cases are drawn from an independently maintained and vetted test bank of simulated oral board cases and scored as previously described to duplicate the ABEM oral examination scoring. Participating EM resident physicians are not allowed to discuss the cases with each other until after the examination has been completed. EM faculty serving as examiners receive a formal orientation and are instructed to act as national examiners would act, with no feedback provided to the resident physician during or after the examination. Individual EM resident physician and EM residency program specific examination scores are provided by mail to EM program directors.
Resident clinical productivity for this study was defined as patient encounters per hour. Clinical productivity was calculated as the total number of patients evaluated during all ED rotations during an individual resident physician’s PGY3 year, divided by the total scheduled hours for all ED rotations during that year. EM resident patient encounters were initially recorded by billing and coding staff and reported at a monthly faculty meeting. Beginning in March 2002, EM resident patient encounters were recorded and reported using patient tracking software and a database (EMSTAT; A4 Health Systems, Cary, NC). The reported clinical productivity did not account for patient acuity or bedside procedure performance, such as would be found in a relative value unit (RVU) system. No significant difference in the mean numbers was noted between collecting the data by hand and by computer.
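As an illustrative sketch (not part of the original analysis), the productivity definition above reduces to a simple ratio; the example below uses the cohort means reported in the Results:

```python
# Illustrative sketch of the productivity definition in the Methods:
# patients evaluated per hour = total ED encounters / total scheduled ED hours.
# The numbers used here are the cohort means reported in the Results.

def clinical_productivity(total_encounters: int, total_hours: float) -> float:
    """Return patients evaluated per hour worked for one PGY3 year."""
    if total_hours <= 0:
        raise ValueError("hours worked must be positive")
    return total_encounters / total_hours

# Cohort means: ~1,426 encounters over ~1,054.64 hours
mean_productivity = clinical_productivity(1426, 1054.64)
print(round(mean_productivity, 2))  # 1.35 patients per hour
```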
Our PGY3 residents evaluate a similar mix of patients compared to the more junior residents (PGY1–PGY2), as all resident physicians are tasked to assess the next patient waiting to be seen, with the obvious exceptions of critical patients. No specific patient complaint or patient acuity level is preferentially assigned to PGY3 residents compared to junior resident physicians. PGY3 resident physicians typically evaluate and treat more patients during shifts and rotations compared to more junior resident physicians.
All collected data were entered into a coded spreadsheet (Excel 2007, Microsoft Corp., Redmond, WA) and were exported into and analyzed using SPSS v.17.0.3 (SPSS Inc., Chicago, IL). Descriptive statistics were used to analyze and report population demographics with means and standard deviations (SDs) reported. Correlations among variables were assessed by Spearman’s rho and Pearson’s correlation coefficient; inspection of Pearson and Spearman correlations found them very similar, and therefore, only Pearson correlations are reported. For multiple regression analysis with the outcome variables ABEM oral examination scores and ABEM written examination scores, the predictor variables were chosen using a significance level of 0.10 or less for Pearson correlations with the respective outcome variable. The histograms of regression standardized residuals and normal p–p plot of regression standardized residuals were inspected for normality. Statistical significance was considered p < 0.05.
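The primary analysis above rests on Pearson’s product-moment correlation. As a hedged sketch (hypothetical scores, not the study dataset), the coefficient and its shared-variance interpretation (r2) can be computed as:

```python
# Illustrative sketch (hypothetical data, not the study dataset) of the
# Pearson correlation used to relate in-training and ABEM examination scores.
from math import sqrt

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient of paired samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical PGY3 in-training vs. ABEM written scores for five residents:
in_training = [74.0, 78.0, 81.0, 84.0, 88.0]
abem_written = [78.0, 80.0, 83.0, 85.0, 90.0]
r = pearson_r(in_training, abem_written)
print(round(r, 3))       # correlation coefficient
print(round(r ** 2, 3))  # shared variance (r2)
```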
During the study period, 85 EM residents completed training, and 56 (65.9%) responded to the survey and reported their written and oral ABEM scores. The majority of participants were male allopathic physicians who passed the ABEM examinations on their first attempt (Table 1). PGY3 residents averaged 1,426.48 patient encounters during an average of 1,054.64 hours worked in the ED, for an average clinical productivity of 1.35 patients per hour (Table 1). No differences were noted between the reported mean number of PGY3 patient encounters from billing and coding staff reporting and patient tracking computer software reporting (1,401.84 mean encounters [SD ±221.8] vs. 1,457.04 mean encounters [SD ±211.9]; p = 0.358). A moderate positive correlation was noted between ABEM written examination scores and ABEM oral examination scores; however, the two variables share only 26% of their variance in common (r2 = 0.26; p < 0.0001).
Table 1. Study participant characteristics and performance measures (mean ±SD)

| Identifier | % | ED Hours Worked | Total Patients Evaluated | Patients Evaluated/Hour Worked | Local Oral Board Examinations (n) | Local Oral Board Score | Statewide Oral Board Score | PGY3 In-training Score | ABEM Written Examination Score | ABEM Oral Examination Score | % Pass ABEM on First Attempt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Male | 79 | 1,059.78 (±96.11) | 1,439.81 (±179.11) | 1.36 (±0.16) | 9.30 (±4.23) | 6.12 (±0.42) | 6.10 (±0.51) | 81.0 (±4.5) | 83.27 (±4.5) | 5.95 (±0.26) | 98 |
| Female | 21 | 1,051.0 (±120.0) | 1,392.08 (±296.97) | 1.31 (±0.17) | 9.83 (±4.11) | 6.07 (±0.43) | 6.09 (±0.36) | 81.25 (±5.93) | 83.42 (±5.21) | 6.01 (±0.29) | 100 |
| MD | 79 | 1,048.95 (±108.9) | 1,396.76 (±210.23) | 1.34 (±0.17) | 8.87 (±3.8) | 6.11 (±0.45) | 6.05 (±0.47) | 81.26 (±4.88) | 83.76 (±4.88) | 5.97 (±0.26) | 98 |
| DO | 21 | 1,087.64 (±63.28) | 1,536.45 (±185.47) | 1.41 (±0.13) | 11.36 (±4.95) | 6.10 (±0.27) | 6.24 (±0.49) | 80.36 (±4.76) | 81.73 (±3.55) | 5.93 (±0.28) | 100 |
| Overall | | 1,054.64 (±100.54) | 1,426.48 (±221.19) | 1.35 (±0.16) | 9.28 (±4.13) | 6.12 (±0.42) | 6.08 (±0.49) | 80.75 (±5.01) | 83.3 (±4.66) | 5.99 (±0.33) | 98.2 |
Local EM residency simulated oral board scores demonstrated weak positive correlations with ABEM oral board scores and statewide oral board examination scores, but showed moderate positive correlations with PGY3 in-training examination scores and ABEM written examination scores (Table 2). Even with these correlations, however, only 14% and 16% of the respective variances could be attributed to local EM residency simulated oral boards. Poor correlations were also noted between statewide simulated oral examination scores and ABEM oral examination scores, ABEM written examination scores, and PGY3 in-training examination scores (Table 3).
Table 2. Correlation of local simulated oral board examination scores with other performance measures

| Variable | R value | 95% CI | R2 value | p-value |
|---|---|---|---|---|
| ABEM oral board examination scores | 0.26 | 0.02–0.49 | 0.068 | 0.055 |
| Statewide oral board examination scores | 0.252 | 0.03–0.50 | 0.064 | 0.082 |
| PGY3 in-training examination scores | 0.373 | 0.11–0.59 | 0.139 | 0.006 |
| ABEM written examination scores | 0.400 | 0.15–0.61 | 0.16 | 0.003 |
Table 3. Correlation of statewide simulated oral board examination scores with other performance measures

| Variable | R value | 95% CI | R2 value | p-value |
|---|---|---|---|---|
| ABEM oral board examination scores | 0.139 | −0.14 to 0.40 | 0.02 | 0.324 |
| PGY3 in-training examination scores | 0.062 | −0.22 to 0.34 | 0.004 | 0.664 |
| ABEM written examination scores | 0.190 | −0.09 to 0.45 | 0.036 | 0.182 |
PGY3 in-training examination scores ranged from 68% to 89%, with a mean score of 80.75% (SD±5.01%). A moderate positive correlation was noted between the PGY3 in-training examination score and the ABEM written examination scores (r2 = 0.36; p < 0.001), as well as ABEM oral examinations (r2 = 0.105; p = 0.02).
PGY3 resident clinical productivity was not significantly correlated to ABEM written examination scores, ABEM oral examination scores, local simulated oral board examination scores, statewide simulated oral board examination scores, or the PGY3 in-training examination (Table 4).
Table 4. Correlation of PGY3 clinical productivity (patients per hour) with examination performance measures

| Variable | R value | 95% CI | R2 value | p-value |
|---|---|---|---|---|
| ABEM oral board examination scores | −0.021 | −0.29 to 0.25 | 0.0004 | 0.881 |
| ABEM written examination scores | −0.021 | −0.29 to 0.25 | 0.0004 | 0.879 |
| Local simulated oral board examination scores | 0.065 | −0.21 to 0.33 | 0.004 | 0.639 |
| Statewide oral board examination scores | 0.108 | −0.17 to 0.38 | 0.012 | 0.450 |
| PGY3 in-training examination scores | 0.078 | −0.19 to 0.34 | 0.006 | 0.570 |
Both the PGY3 in-training examination scores and the local simulated oral board examination scores met the level of significance to be included in the multiple regression analysis as predictors. There were 53 residents with complete data on all variables. The multiple regressions for the two predictors were significant for the ABEM oral (r2 = 0.426; p ≤ 0.001) and ABEM written (r2 = 0.527; p < 0.001) examinations. In the final model, only the PGY3 in-training examination was a significant predictor of the ABEM oral and written scores (p < 0.001). Both the histogram of regression standardized residuals and the normal p–p plot of regression standardized residuals were inspected and found to be within acceptable limits of normality.
One of the goals of the ACGME’s Outcome Project was to ensure that training programs are using appropriate assessment tools and reaching desired outcomes. In residency education, perhaps the primary desired outcomes are for the resident physician to pass the certification examination and successfully practice his or her medical specialty. Outcome measures are also important to the training program to gauge the effectiveness of teaching during residency training. Gillen et al.7 found that a structured board review program for EM residents increased EM in-training scores; clinical performance was not assessed. Ledrick et al.8 examined multitasking in an ED using RVUs and found that RVUs correlated more with training level than with medical knowledge as measured by in-training scores. Cognitive knowledge as measured by test scores has long been a measurable outcome throughout medical education. Clinical productivity, measured as patients per hour, represents a “real-life” parameter used to measure clinical performance and, often, physician reimbursement.
Our study examined assessment tools that might be used to help predict outcomes. Like ours, other studies have demonstrated a positive correlation between the ABEM in-training examination score and passing the ABEM certification examinations. Our study is perhaps more notable for the correlations we expected but did not find. This too is useful information, as we tend to act on assumptions as we plan our curricula and goals. For example, resident physicians who had more clinical encounters did not score higher on written in-training examinations or ABEM certification examinations. Presumably a certain patient volume is needed for competency in EM, but we do not know what that volume is, nor to what extent it affects test taking. There may be an assumption that a resident physician who does well on a written or oral examination would do equally well in the clinical setting; our study did not show such a correlation. It is possible that some high-achieving academic performers lack some of the skills needed to function in the chaotic, busy environment of the ED, which requires multiple, rapid patient encounters. Likewise, the resident physician who demonstrates exceptional clinical and bedside skills in the ED may not necessarily perform in an outstanding manner on oral and written examinations. Cognitive knowledge and clinical experience are two separate and distinct skills necessary for the successful completion of graduate medical education. While it appears that cognitive knowledge can be assessed by the in-training examination, assessing clinical performance will require separate, robust, reliable, and validated tools.
Another pertinent negative identified by our study is the lack of correlation between simulated oral board scores and ABEM oral board scores. We suspect that the standards and pretest development for the ABEM oral board scenarios and examination process yield greater reproducibility among ABEM examiners than among our local or regional faculty. Nonetheless, we need to develop better-defined scoring guidelines to maximize feedback to our residents.
Another question arises as to the validity of our assessment tools in evaluating the desired outcomes (medical knowledge and clinical productivity). ABEM written and oral examinations could be considered “high-stakes” measures used to confer specialty status on the successful candidate.9 As such, these tools would be presumed to have been well validated in assessing medical knowledge. That same sort of rigorous validation process is not as evident in local and statewide examinations.
Benchmarks in clinical productivity, as measured by patients per hour, are not required for graduation from residency and certainly not for specialty certification. As such it could be considered more of a “low-stakes” measure.9 It is a straightforward measurement that should be very reproducible, but only as a measurement of physician efficiency, not clinical competence. However, future employers or partners may deem this a high-stakes measurement in clinical practice (physician reimbursement, RVUs, etc.).
Another area for future study is how monthly resident evaluations done by faculty correlate with other more objective measurements, such as test scores and productivity measures. Positive correlations have been noted in other programs when comparing in-training scores and evaluations of medical knowledge.10
This study has several limitations that need to be considered when interpreting our findings. Our study was conducted at a single academic EM residency site and has a relatively small sample of participants. Although inclusion of the nonresponders would give the study more external validity, we know of no reason why nonresponders would differ from responders. Our study relied upon participants to self-report their ABEM written and oral certification examination scores, as these were not otherwise available. While a mechanism to obtain lost scores was provided, participants may have reported erroneous or false scores. Finally, although correlations were noted among several variables, correlation does not imply causation; even though a variable seemed to be associated with ABEM board certification, establishing a causative relationship is beyond the scope of this work.
In our study, in-training examination scores for third-year EM residents were predictive of ABEM written and oral examination scores. No significant correlation was found between simulated oral board examinations and ABEM oral examinations. Third-year resident clinical productivity, when measured as patients per hour, did not significantly correlate with academic performance, as measured by written and oral ABEM scores. Higher patient volumes did not predict higher test scores.
The authors thank the following individuals for their assistance in data collection for this project: Lauren Thompson; Adam Anderson, MD; Troy Cutler, MD; Sarah Hamlin, MD; and Thomas Trent, DO.
- 1. Accreditation Council for Graduate Medical Education. ACGME Outcome Project Glossary. http://www.acgme.org/outcome/project/glossary2.asp#2. Accessed Mar 31, 2011.
- 2. Outcome assessment in emergency medicine – a beginning: results of the Council of Emergency Medicine Residency Directors (CORD) Emergency Medicine Consensus Workgroup on outcome assessment. Acad Emerg Med. 2008;15:267–77.
- 6. Correlation of emergency medicine residency training simulated oral board examination scores with National ABEM Oral Examination scores [abstract]. Acad Emerg Med. 2008;15(Suppl 1):s54.
- 10. Resident evaluation – subjective versus objective measures [abstract]. Acad Emerg Med. 2009;16(Suppl 1):s50.