Assessing Medical Knowledge of Emergency Medicine Residents


  • The list of breakout session participants can be found in the appendix of a related article on page 1486.

  • This paper reports on a workshop session of the 2012 Academic Emergency Medicine consensus conference, “Education Research in Emergency Medicine: Opportunities, Challenges, and Strategies for Success,” May 9, 2012, Chicago, IL.

  • The authors have no relevant financial information or potential conflicts of interest to disclose.

Address for correspondence and reprints: Nikhil Goyal, MD; e-mail:


The Accreditation Council for Graduate Medical Education (ACGME) requires that emergency medicine (EM) residency graduates be competent in the medical knowledge (MK) core competency. EM educators use a number of tools to measure a resident's progress toward this goal; it is not always clear whether these tools provide a valid assessment. A workshop was convened during the 2012 Academic Emergency Medicine consensus conference, “Education Research in Emergency Medicine: Opportunities, Challenges, and Strategies for Success,” at which assessment of each core competency was discussed in detail. This article describes the validity evidence behind current MK assessment tools used in EM and other specialties. Tools in widespread use are discussed, as well as emerging methods that may form valid assessments in the future. Finally, an agenda for future research is proposed to help address gaps in the current understanding of MK assessment.

In 2002, a Council of Emergency Medicine Residency Directors (CORD) consensus conference attempted to define the medical knowledge (MK) competency for emergency medicine (EM) and suggest methods for integration into residency curricula and resident assessment.[1] CORD revisited the progress of individual programs on implementing the core competencies in 2005 with an update on MK instruction and assessment.[2] The discussion continued at CORD in 2006, but recommendations on MK assessment were limited compared to other core competencies, as it was felt that the current tools were sufficient.[3]

In addition to curricula that address MK content objectives for each postgraduate year, the Accreditation Council for Graduate Medical Education (ACGME) requires descriptions of how these objectives will be assessed and remediated when necessary. As a result, EM residency programs are in need of tools that can feasibly produce scores that are valid and reliable measures of performance across each year of training.

As a response to a growing need for improved education research within the field of EM, Academic Emergency Medicine convened a consensus conference entitled “Education Research in Emergency Medicine: Opportunities, Challenges, and Strategies for Success” in May 2012. This paper reports the findings of the “medical knowledge” subsection of the “learner assessment” breakout session.


Components of MK include: 1) specialized, immediate recall of information for care of critical patients, 2) understanding of the use of medical resources for the immediate care of the patient, and 3) the ability to apply information to undifferentiated patient presentations.[1] The ACGME asks that residents demonstrate knowledge of established and evolving biomedical, clinical, epidemiologic, and social–behavioral sciences. Concurrently, residents are expected to apply this knowledge to the patient care setting. Examples of this application include generating differential diagnoses for undifferentiated patient complaints, identifying life-threatening conditions, sequencing critical management actions, and proper patient disposition.[4] Assessment of the MK domain for EM trainees thus requires consideration of both the core factual content of this specialty and environmental factors that affect the application of this knowledge in the unique practice setting of the emergency department (ED). MK roughly correlates to the knowledge component of the medical expert role in the Canadian CanMEDS Physician Competency Framework.[5]


Methods

The iterative consensus-building process for this breakout session began with a review of the literature. This review was supplemented and refined by expert thematic analysis as well as discussion with education experts at the conference.

Publications of interest were identified in PubMed using various combinations of the terms “graduate medical education,” “internship and residency,” “housestaff,” “competen*,” “medical knowledge,” and, for EM-specific content, “emergency medicine.” An Education Resources Information Center (ERIC) search was performed using the descriptor “graduate medical education” and combining the keywords “evaluation,” “assessment,” “knowledge,” and “emergency.” We also browsed the MedEdPORTAL of the Association of American Medical Colleges (AAMC) using the terms “emergency medicine” and “medical knowledge.” Subsequently, searches were also performed to locate test-specific data, such as USMLE Step 3 validity.

Given this broad search strategy, we reviewed publication titles and abstracts to exclude those that were not applicable to our topic of interest. Reference lists of pertinent articles were also examined to identify any publications that may have been missed. When inadequate data were found regarding a novel or widely used assessment tool, we contacted the authors regarding any other related published or unpublished data. We also contacted the American Board of Emergency Medicine (ABEM) regarding any data on the standardized national certifying examinations.

Due to the paucity of studies providing psychometric data, we chose to include publications expressing expert or consensus opinion or narrative accounts. Studies were excluded if their scope was to evaluate a single curriculum or a specific subset of the Model of the Clinical Practice of Emergency Medicine (for example, assessment of residents' knowledge of the care of patients with headache).[6]

Current Tools for MK Assessment

Medical knowledge has been noted to overlap significantly with the five other competency areas, making isolated assessment with current tools difficult.[7] MK has been described as a bridge between information acquisition (interpersonal and communication skills) and patient care; it includes understanding when to access other data banks (practice-based learning and improvement), and its successful application necessitates an understanding of the overall health care system (systems-based practice).[1]

Multiple-choice Questions (MCQ) Testing

Multiple-choice questions are extensively used for MK assessment at various levels of training in medicine and in other professions. Advantages of this format include broad content sampling, ease of administration, and robust psychometric properties.[7] MCQ and short-answer questions have been described as the most reliable, valid, and feasible methods of assessing knowledge content for the CanMEDS medical expert role.[8]
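The “robust psychometric properties” cited for well-constructed MCQ examinations are usually summarized by an internal-consistency reliability coefficient; for dichotomously scored (right/wrong) items this is KR-20, a special case of Cronbach's alpha. The following is an illustrative sketch of that computation, not drawn from the source; the function name and data are ours:

```python
def cronbach_alpha(scores):
    """Internal-consistency reliability (Cronbach's alpha).

    scores: one row per examinee, one column per item;
    with 0/1 item scores this reduces to KR-20.
    """
    k = len(scores[0])           # number of items
    def variance(xs):            # population variance
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 4-examinee, 3-item test
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(cronbach_alpha(responses), 2))  # → 0.75
```

Higher values indicate that items are measuring a common underlying construct; licensing-level MCQ examinations typically sample enough content to reach high reliability.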

American Board of Medical Specialties (ABMS) Examinations

Certification by ABMS member boards and participation in continuing certification have been shown to translate into improved patient outcomes in non-EM specialties.[9-16] The criterion standard for MK assessment in EM has traditionally been certification by ABEM, which consists of a qualifying examination delivered in MCQ format and an oral examination. However, we were unable to locate any data on the effect of ABEM certification on patient-centered outcomes (e.g., Kirkpatrick level 4 evaluations).[17]

ABEM In-training Examination (ITE)

The ABEM offers the ITE annually to all EM residency programs accredited by the ACGME or the Royal College of Physicians and Surgeons of Canada. The ITE is a standardized examination that targets the expected knowledge base of a third-year EM resident. While programs are not required to participate, 5,616 residents took the ITE in 2012.[18] There is a strong relationship between ITE and ABEM qualifying examination scores.[18] A correlation between performance on the ITE and the ABEM oral examination has also been described.[19]

United States Medical Licensing Examination (USMLE)

The first two steps of the USMLE are traditionally taken during medical school and therefore have limited utility for MK assessment during residency. However, lower scores on USMLE Steps 1 and 2 are correlated with lower scores on the ABEM ITE.[20, 21] Other specialties have similar data; for example, American Board of Internal Medicine certifying examination scores are strongly correlated with USMLE Step 2 scores.[22]

The “cutoff scores” used to make pass/fail decisions on USMLE Step 3 are generally considered appropriate (i.e., a candidate who fails Step 3 is thought to have inadequate MK), although few data inform the correlation with a criterion standard.[23] Scores on the three levels of the Comprehensive Osteopathic Medical Licensure Examination of the United States (COMLEX-USA) have been correlated with scores on ITEs created by specialty societies.[24]

CORD Question Bank

The first CORD-sponsored EM test bank was created in 1994 and has evolved significantly since then. Many residency programs are using these tests as learning tools or for remediation, rather than for summative MK assessment.[25] There are currently no published data comparing performance on these tests to any standard.

Interactive Case-based Examination Formats

Although traditional MCQ formats can reliably assess the factual knowledge aspect of the MK competency, these tools neglect much of the uncertainty and the weighted decisions inherent in clinical reasoning.[7] MCQs may thus not be a valid assessment of an individual's ability to apply knowledge to patient care. Several approaches, outlined below, have been used to address this domain. It has been suggested that performance assessment using these tools may not correlate with traditional fact-based assessments of MK recall, making the use of ABEM certification as a criterion standard problematic.[26]

Script Concordance Testing

Script concordance testing (SCT) was developed within the conceptual framework of illness script theory, which holds that clinical tasks are driven by experience-rich networks of knowledge organized around particular clinical problems.[26-28] This testing format begins with an intentionally ambiguous case, prompts subjects to consider several possible diagnoses, and then asks the learners to assess the degree to which the addition of new information increases or decreases the likelihood of the initial diagnoses. Scores are based on the similarity of a learner's management strategy to those of experts, thereby approximating the degree of concordance between examinees' and experts' scripts.[29] Past work with this tool suggests that it can produce reliable and valid measures of clinical decision-making.[30]
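The concordance scoring described above is commonly implemented as “aggregate” partial-credit scoring: each response earns credit in proportion to the number of panel experts who chose it, with the modal response earning full credit. The sketch below illustrates that convention under common SCT scoring assumptions; it is not ABEM's or any published instrument's implementation, and the function names are ours:

```python
from collections import Counter

def sct_item_credits(panel_responses):
    """Partial-credit map for one SCT item.

    panel_responses: the expert panel's answers on the item's Likert
    scale (e.g., -2 "much less likely" .. +2 "much more likely").
    Credit = (# panelists choosing a response) / (# choosing the modal
    response), so the modal answer earns 1.0 and unchosen answers earn 0.
    """
    counts = Counter(panel_responses)
    modal = max(counts.values())
    return {response: n / modal for response, n in counts.items()}

def sct_score(examinee_answers, panel_by_item):
    """Total score: sum of per-item credits (0 if no panelist chose the answer)."""
    return sum(
        sct_item_credits(panel).get(answer, 0.0)
        for answer, panel in zip(examinee_answers, panel_by_item)
    )

# Hypothetical item: a 10-expert panel splits 4/3/2/1 across +1/0/+2/-1
panel = [1, 1, 1, 1, 2, 2, 0, 0, 0, -1]
credits = sct_item_credits(panel)  # +1 (modal) → 1.0; 0 → 0.75; +2 → 0.5; -1 → 0.25
```

This design rewards agreement with the distribution of expert opinion rather than a single keyed answer, which is what lets the SCT capture the ambiguity that MCQs discard.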

The SCT reliably distinguishes between medical students, EM residents, and practicing physicians and has been shown to correlate with the ABEM qualifying examination score and the ABEM ITE.[31] Among practicing physicians referred for remediation of potential clinical competence difficulties, lower scores on the SCT were correlated with lower performance ratings by an expert review panel.[32]

Key Features Problems

Key features problems present a clinical case and then allow participants to select more than one correct answer from a list of possible diagnoses. These questions are built on a series of tasks that ask trainees to identify the problem at hand, develop diagnostic strategies, and make management decisions; each hinges on a small number of key decision-making steps.[33, 34] Past work with licensing examinations in several countries lends support to the validity of this method.[9]

Virtual Patients

Virtual patients are clinical scenarios that play out on a computer screen. Learners gather information by typing or selecting questions to collect focused historical, physical examination, or laboratory data. These instruments have been used in diverse ways to assess clinical reasoning and decision-making.[35] Many have used checklists to measure “key elements” of the learners' performance, as well as step-by-step grading of their decision paths through the history, physical examination, diagnostic testing, and final diagnosis.

Perhaps the best such approach to date is the computer-based case simulation (CCS) used in Step 3 of the USMLE, which uses an open-ended, unstructured format, similar to the SCT, to measure reasoning by scoring how closely subjects' actions mimic the path of experts.[36] Physicians more advanced in training obtain higher scores on the CCS than less advanced physicians, and moderate correlations have been described between the CCS and other measures of MK, such as MCQs.[37]

Direct Observation of Clinical Skills

Direct, real-time observation by physician faculty is one of the most commonly used methods of MK assessment in EM and other specialties. Faculty members evaluate MK on a periodic basis, either at the end of every major assignment (e.g., monthly) or at the end of every shift. This evaluation is often unstructured and may suffer from significant inter-rater variability, recall and rater bias, or the halo effect.[38] For example, internal medicine faculty did not accurately predict residents' ITE scores, and senior residents could not predict ITE performance for PGY-1 residents.[39] Compared to internal medicine, the availability of EM faculty in the ED may offer more frequent opportunities for direct clinical observation of trainees, although the unpredictable and decision-dense environment of the ED may limit extended periods of observation.

There has been significant interest in the development of more structured observation scenarios that may mitigate some of these concerns. A number of such tools exist with varying levels of supporting evidence.[38] The Mini-Clinical Evaluation Exercise (Mini-CEX) is a validated test that has been extensively used by internal medicine residency programs and includes an MK assessment.[38, 40] A CORD-developed Standardized Direct Observation Tool (SDOT) has been shown to have good inter-rater reliability for all competencies.[41] However, validation of its MK assessment was limited by a lack of data points reflecting poor resident performance in MK. The MK assessment of the SDOT has not yet been compared to a criterion standard, such as the ABEM qualifying examination or the ITE.

In these structured observation scenarios, development of strong behavioral anchors has further improved rater accuracy, although faculty education (such as rater training) has not been as successful.[42, 43] Past work suggests that, while direct observation tools can accurately measure global clinical competence, their reliability for discrete clinical skills may be low.[44] As such, alternative methods for assessing MK, in conjunction with the real-time measurement tools described here, may be necessary.

Objective Structured Clinical Examination (OSCE)

Interns of 10 different specialties (including EM) improved their scores on an OSCE for all core competencies after completing their PGY-1 year; the competencies that showed the greatest improvement were MK and systems-based practice.[45] The OSCE may therefore be a useful tool for MK assessment although further research is needed to validate it.


Self-assessment

Self-assessment has generally been shown to correlate poorly with actual performance. Poor performers tend to overrate their own performance, while high performers tend to underrate it; self-assessment is therefore likely a poor marker of the MK core competency.[46, 47] It has been argued that self-assessment is a complex activity that has not been adequately conceptualized and operationalized in the current literature, and that it provides a critical mechanism for ongoing monitoring rather than for identifying and redressing gaps.[48]

Oral Examinations

The ABEM oral examination has been used as the final hurdle before achieving board certification in EM. This examination tests the application of knowledge across a multitude of pathologies and simulates true clinical practice by asking examinees to manage multiple patients simultaneously. Past research suggests that these high-stakes summative assessments are reliable and valid measures of performance.[49, 50] Many residency programs conduct local or regional practice oral examinations to prepare their graduates for the ABEM oral examination. One study found that performance on the local practice oral examination moderately correlated with the ITE score and ABEM qualifying examination score, but not with the ABEM oral examination score.[19]


Portfolios

Portfolios have been described as a useful tool to assess all of the general competencies; however, they are best used to assess competencies, such as practice-based learning and improvement, professionalism, or systems-based practice, that can be difficult to assess in other ways.[51] A correlation between portfolio performance and ITE performance has been described for psychiatry residents.[52]

Research Agenda

Through our review of this literature, iterative evaluation of these data within the writing group, and discussion with education experts at the consensus conference, the following knowledge gaps were identified. We recommend research in the following domains:

  1. Compare performance on ABEM certification examinations and Maintenance of Certification participation with clinical outcomes.
    • It was felt that development of these data would help establish the credibility of ABEM certification as the criterion standard for MK assessment during residency.
  2. Compare SCT performance with current criterion standard tests such as ABEM certification (qualifying and oral examination).
    • This would require development of a national database of scripts relevant to EM. At least 20 EM experts should be recruited to develop a scoring metric for each script, to be used with the SCT.
    • The consensus group also felt that the SCT should be studied compared to other standards, such as patient-centered outcomes.
  3. Conduct multicenter data collection to compare performance on the CORD question bank to the ITE or the ABEM qualifying examination.
  4. Further investigate the role of the Mini-CEX in EM and the MK assessment value of the SDOT.


Limitations

The consensus proceedings reported above had a number of limitations. We did not include conference presentations, posters, or other material that was not accessible using the search strategy described earlier. As with all consensus proceedings, the data presented here were located through recommendations from content experts, review of reference lists, and focused searches for validity evidence on specific tools. This publication should not be viewed as a systematic review of all published data on MK assessment.


Conclusions

This consensus conference workshop group evaluated the validity evidence behind current medical knowledge assessment tools used in EM and other specialties, considering both tools already in widespread use and emerging methods that may form valid medical knowledge assessments in the future. An agenda for future research has been proposed to help address gaps in our current understanding and, we hope, to improve our ability to assess the medical knowledge of EM trainees.