Measuring the Competence of Residents as Teachers


  • Sondra Zabar MD,

    Corresponding author
      Address correspondence and requests for reprints to Dr. Zabar: 550 First Avenue, Old Bellevue, D401B, New York, NY 10016 (e-mail:
  • Kathleen Hanley MD,

  • David L. Stevens MD,

  • Adina Kalet MD,

  • Mark D. Schwartz MD,

  • Ellen Pearlman MD,

  • Judy Brenner MD,

  • Elizabeth K. Kachur PhD,

  • Mack Lipkin MD

  • Received from the Division of Primary Care Internal Medicine (SZ, KH, DLS, AK, MDS, EP, ML) and Department of Medicine (JB), New York University School of Medicine; and Medical Education Development (EKK), New York, NY.

  • Presented in part at the annual meeting of the Society of General Internal Medicine, May 2, 2003.


Medical residents, frontline clinical educators, must be competent teachers. Typically, resident teaching competence is assessed only by gleaning learners' comments. We developed, evaluated, and integrated into our annual objective structured clinical examination a resident teaching skills assessment using “standardized” students. Faculty observers rated residents using a customized 19-item rating instrument developed to assess teaching competencies that were identified and defined as part of our project. The assessment was feasible, acceptable, and valuable to all 65 residents, 8 students, and 16 faculty who participated. Teaching scenarios have potential as reliable, valid, and practical measures of resident teaching skills.

House staff are the frontline teachers for medical students as well as other house staff and spend up to 20% of their time teaching.1 Yet while teaching is an important component of the resident's job, residency programs struggle with the design and implementation of meaningful and practical evaluations of house staff teaching skills. The Accreditation Council for Graduate Medical Education (ACGME) Outcomes Project has challenged all training programs to assess teaching skills.2 Most programs currently do this based on nonstandardized, global impressions from attendings and medical students. Objective, reproducible, and performance-based evaluation of teaching skills is needed to provide summative reports to program directors and formative feedback to house staff.

Recently, program directors have introduced “resident-as-teacher” curricula and medical educators have explored the value of performance-driven assessment of teaching skills.3 First reported in the literature in the early 1990s, objective structured teaching exams (OSTEs) are modeled after objective structured clinical exams (OSCEs) but target teaching skills. Standardized students, like standardized patients in an OSCE, are trained to portray challenging learners.4–6 Every participant rotates through multiple stations where case portrayal and ratings are standardized. OSTEs have been used for teaching skills assessment,7 staff development,8–11 screening of faculty for teaching assignments,12 and program evaluation.13–15 The degree of standardization of the standardized students and faculty raters in these reports varies from stringent to experiential depending on the purpose of the program.

This paper reports on the development, implementation, and evaluation of teaching stations that were part of an annual 10-station OSCE for medical residents in 2001 and 2002. The goal was to create a high-quality, practical, formative assessment of residents’ teaching skills. Our interest in formalizing our assessment followed a successful 5-year experience with our Resident as Teacher curriculum.16


In 2001, we began an annual 10-station OSCE for our internal medicine residents that included two teaching stations. As in the clinical stations, residents had 10 minutes to perform teaching tasks with the standardized student in each scenario. Faculty observers, the standardized students, and the residents themselves all independently completed a distinct rating form immediately following each encounter. After the rating forms were completed, residents received 5 minutes of feedback from faculty and students. Teaching and clinical cases were interspersed. Following the 3-hour experience, residents and faculty debriefed all 10 scenarios. Residents received relevant readings to promote further self-study.

Teaching Station Development Process

Identifying Teaching Competencies.  We defined our 4 core teaching competencies through a literature review: establishing rapport with the learner, assessing the learner's needs, demonstrating instructional skills, and demonstrating fund of knowledge.17–19 Consultation with primary care and categorical medical residency directors and other experts helped us identify common teaching scenarios and skills expectations.

Developing Teaching Scenarios.  We developed scenarios that mirrored residents’ daily responsibilities, were realistic and reproducible, and could be observed in a limited time. The scenarios were: 1) giving feedback to a disorganized subintern on a team (2001), 2) teaching a medical student how to take blood pressure (2001), 3) helping an intern assess a cardiac murmur at the bedside (2002), and 4) teaching an overconfident intern about a difficult diabetic patient (2002). They were written by faculty and reviewed and practiced with chief residents for realism and timing. Each teaching competency was assessed in each teaching scenario.

Recruitment and Training of Standardized Students and Faculty Observers.  We recruited 8 senior medical students and trained each for 2 hours with role playing to standardize their case portrayal and rating. To enhance faculty rater reliability, faculty observers spent three 1-hour sessions together rating representative and difficult videotapes of pilot scenarios. Most faculty participated both years. Individual faculty raters saw 8 to 24 residents perform the same teaching scenario.

Protocol for Assessing Teaching Competencies.  We assessed the residents’ teaching performance from 3 perspectives: faculty, student, and resident. Lacking literature-derived consensus on the best instrument, we developed a 19-item rating form based on Skeff.20 The faculty used it to assess the 4 teaching competencies and 3 global assessments of teaching performance (overall teaching performance, communication skills, and fund of knowledge demonstrated). Standardized students rated global satisfaction with the encounter on a 4-point scale. Residents evaluated their own teaching performance on a 1-item question using the same 9-point scale as faculty (see Table 1 for details).

Table 1.  Three Evaluation Instruments Used and Mean Performance in All Four Teaching Stations (N = 65)
Evaluation Instrument | Mean (SD) | Range
A. Faculty Evaluation of Teaching Competence (19 items)* | |
Specific Teaching Competencies (4-point scales) | |
Rapport building | 3.3 (0.4) | 2.0 to 4.0
  1. Communicates nonjudgmental, respectful, and supportive attitude (e.g., acknowledges challenge) | |
  2. Exhibits appropriate nonverbal behavior | |
  3. Recognizes and names emotions (e.g., insecurity, stress) | |
  4. Responds to emotion with PEARLS or nonverbally | |
Needs assessment | 3.0 (0.5) | 1.8 to 4.0
  5. Assesses knowledge gaps (e.g., asks student what she knows/does not know) | |
  6. Assesses skills (e.g., asks student to demonstrate) | |
Instructional skills | 3.0 (0.5) | 1.8 to 4.0
  7. Asks learner to commit to diagnosis/plan | |
  8. Probes for supporting evidence/thought process | |
  9. Gives information/teaches skills in small chunks | |
 10. Provides feedback on specific knowledge | |
 11. Checks learner's understanding of what was taught | |
 12. Invites questions (to ask now or later) | |
Knowledge base demonstrated | 2.8 (0.5) | 1.6 to 3.8
 13–16. Four case-specific items | |
Global Performance (9-point scales) | |
 17. Overall teaching performance | 6.5 (1.2) | 3.5 to 8.5
 18. Communication skills | 6.8 (1.2) | 3.0 to 9.0
 19. Fund of medical knowledge demonstrated | 6.5 (1.5) | 3.0 to 9.0
B. Resident Self-assessment for Overall Case Performance (1 item; 1 = needs improvement, 9 = done excellently) | 5.7 (1.5) | 1.0 to 8.5
C. Student Evaluation of Resident's Teaching Competence (1 item) | 3.4 (0.4) | 2.0 to 4.0
  1 = “not satisfied; would do my best to avoid working with this resident again” | |
  4 = “very satisfied; one of the best teachers, would recommend to my colleagues” | |

* The faculty evaluation was done on a scale of 1 to 4, except for the 3 global assessments, which were on a scale of 1 to 9, with 1 to 3 = “needs improvement,” 4 to 6 = “done well,” and 7 to 9 = “done excellently.” The mean for each of the 4 competencies is the average of the specific items listed for that competency.

PEARLS, partnership, empathy, apology, respect, legitimization, and support.
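
The Table 1 footnote states that each competency score is the average of the specific items listed for that competency. A minimal Python sketch of that scoring step follows. The item groupings come from Table 1; the function name, the dictionary layout, the handling of blank items, and the example ratings are illustrative assumptions, not the authors' scoring procedure or study data.

```python
from statistics import mean

# Item groupings taken from Table 1 (items 1-16 use the 4-point scale).
COMPETENCY_ITEMS = {
    "rapport_building": range(1, 5),       # items 1-4
    "needs_assessment": range(5, 7),       # items 5-6
    "instructional_skills": range(7, 13),  # items 7-12
    "knowledge_base": range(13, 17),       # items 13-16 (case-specific)
}


def competency_means(ratings):
    """Average the 4-point item ratings within each competency.

    `ratings` maps item number -> rating (1 to 4). Assumption: items the
    rater left blank are skipped, and a competency with no rated items
    yields None.
    """
    scores = {}
    for name, items in COMPETENCY_ITEMS.items():
        rated = [ratings[i] for i in items if i in ratings]
        scores[name] = mean(rated) if rated else None
    return scores


# Hypothetical ratings for one resident in one station (not study data).
example = {1: 4, 2: 3, 3: 3, 4: 2, 5: 3, 6: 2,
           7: 2, 8: 3, 9: 4, 10: 3, 11: 3, 12: 3,
           13: 3, 14: 2, 15: 3, 16: 4}
print(competency_means(example))
```

Keeping the item-to-competency mapping in one place makes it straightforward to swap in the case-specific knowledge items for each scenario.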

Immediately following all the scenarios, residents, students, and faculty completed a debriefing questionnaire that addressed case difficulty and educational value of the experience.

Analytic Plan.  Construct validity was assessed by confirmatory factor analysis of the faculty-rating instrument.21 We assessed scale reliability by calculating Cronbach's α coefficients for the 4 specific teaching scores. We calculated Pearson correlations between the 4 specific teaching scores and the 3 global scores by faculty. Convergent validity was assessed by Pearson correlations of the global faculty scores with the student and resident scores.
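
For readers less familiar with the reliability statistic named above, the following Python sketch shows the standard Cronbach's α formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). It is an illustration only: the function name and the ratings are hypothetical, and the source does not describe the software actually used.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one row per resident),
    each row holding that resident's ratings on the k items of a scale."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least 2 items")
    columns = list(zip(*item_scores))                    # one tuple per item
    item_var = sum(variance(col) for col in columns)     # sum of item variances
    total_var = variance([sum(row) for row in item_scores])  # variance of totals
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical 4-item ratings for 5 residents (not study data).
hypothetical_ratings = [
    [3, 3, 4, 3],
    [2, 2, 3, 2],
    [4, 3, 4, 4],
    [3, 2, 3, 3],
    [2, 2, 2, 3],
]
print(round(cronbach_alpha(hypothetical_ratings), 3))
```

Values closer to 1 indicate that the items within a scale vary together across residents.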

We determined mean scores for the specific teaching competency scales by faculty and the global ratings by faculty, residents, and students. We compared mean global ratings by faculty and residents with a Student's t test. Global ratings were also dichotomized to reflect “done excellently” (mean > 6.5) or not. We compared global scores by resident training level (postgraduate year [PGY] 1, 2, and 3) with t tests and ANOVA. Standardized effect size was calculated for PGY-3 versus PGY-1 scores using the Cohen's d statistic.22 Dichotomized global scores were compared by postgraduate year and between raters with χ2 tests.
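
The two derived quantities described above, the dichotomized “done excellently” rating and the standardized effect size, can be sketched in Python as follows. This assumes the common pooled-standard-deviation form of Cohen's d; the source does not say which variant was used, and the function names and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation
    (assumed variant; the paper does not specify one)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def proportion_excellent(scores, cutoff=6.5):
    """Share of global ratings above the 'done excellently' cutoff (mean > 6.5)."""
    return sum(1 for s in scores if s > cutoff) / len(scores)


# Hypothetical global ratings on the 9-point scale (not study data).
pgy3_scores = [7.5, 6.0, 8.0, 7.0, 5.5]
pgy1_scores = [6.5, 5.0, 7.0, 6.0, 7.5]
print(cohens_d(pgy3_scores, pgy1_scores))
print(proportion_excellent(pgy3_scores), proportion_excellent(pgy1_scores))
```

Because d divides the mean difference by the spread of scores, the same raw difference yields a smaller d when ratings are more variable.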

Evaluation of the Teaching Scenarios

Subjects.  Sixty-five residents participated in the annual OSCE for the first time in either 2001 or 2002. There were 34 women and 31 men (28 PGY-1, 16 PGY-2, and 21 PGY-3). Sixteen faculty served as observers and 8 students served as standardized learners.

Scale Reliability and Validity.  Confirmatory factor analysis (principal components, varimax rotation) of the faculty-rating instrument supported clustering our 4 variables as it explained 79% of the variance. Cronbach's α coefficients for the 4 specific teaching competency scores ranged from 0.78 to 0.92. All 4 mean competency scores significantly correlated with the 3 global faculty ratings, most with r > .6, P < .001. Faculty global teaching performance scores correlated significantly with residents’ self-evaluation (r = .384; P = .002) and student score (r = .392; P = .001), supporting convergent validity.

Teaching Competency Scores. Table 1 shows the mean ratings by faculty, residents, and students for the 65 residents. Faculty global overall teaching scores ranged from a mean of 6.2 on the subintern feedback scenario to 6.8 on the precepting scenario. Faculty global teaching scores exceeded residents’, 6.5 versus 5.7 (difference, 0.77; 95% confidence interval [CI], 0.39 to 1.15). Faculty rated 66.2% of residents as excellent (mean > 6.5), whereas only 41.5% of residents thought they taught excellently (P = .1). In comparison, students rated 67.7% of residents ≥ 3.5 on the 4-point satisfaction scale, further supporting an association between faculty and student ratings.

Figure 1 shows that faculty rated third-year residents significantly higher than interns on the global measures of teaching performance, communication skills, and knowledge base. The ~1-point differences on the 9-point scales correspond to a medium effect size (Cohen's d of 0.3 to 0.4). The proportion of residents rated in the “done excellently” category (mean > 6.5) for global teaching performance rose from 50% (PGY-1) to 75% (PGY-2) and 81% (PGY-3); P = .05.

Figure 1.

Faculty-rated overall teaching assessment by trainee level. Faculty ratings increased across all 3 overall measures. P values refer to differences between PGY-1s and PGY-3s. 9 = done excellently; 1 = needs improvement.

The Teaching Stations Are Acceptable and Educational.  On the post-OSCE survey, 90% of residents reported that the teaching scenarios were educational, 80% felt the degree of difficulty of the teaching scenarios was just right, and 65% said the faculty provided valuable feedback. These responses paralleled their satisfaction with the clinical cases. All faculty reported it was time well spent because it was an opportunity to give feedback on skills rarely observed. All of the students described participation in the teaching scenarios as highly educational.


We developed and implemented a practical annual performance-based assessment of resident teaching skills that included individualized feedback. There were significant correlations and associations among ratings of teaching competencies from 3 perspectives—faculty, students, and the residents themselves. Ratings discriminated among training levels, showing improvement with experience. This assessment method has promise as a reliable, valid, and practical measure of resident teaching skills. Morrison et al.14 recently demonstrated that a time-intensive, “high stakes” teaching assessment similar to ours can reliably measure resident teaching skills and is responsive to curricular interventions. A briefer teaching assessment embedded in an overall primary care competency OSCE may provide a cost-efficient and effective approach accessible to most residencies.

Our study has some limitations. While resident scores increased with training level, we do not yet know whether this can be attributed purely to improved teaching skills or to nonspecific evolution of knowledge and professional maturity. Further study is needed to clarify the distinct development of teaching behaviors. Faculty were not blinded to the residents’ training level. While a weakness in study design, this proved to be an asset in providing individualized feedback following each case. We sought to limit the resulting rating bias through consensus building, calibration, and standardization of faculty rating using the checklist on videotaped examples of resident teaching.

Is the investment of faculty and resident time and money in teaching competence assessment worthwhile? The teaching stations allow for direct observation of learners, provide faculty valuable information about what residents can actually teach, and thereby inform teaching expectations and curricular refinements. Assessing teaching underscores the importance our program places on teaching. The scenarios and feedback vividly exemplify expectations of best teaching practices for faculty and residents. The ratings satisfy program directors’ need to assess objectively one key aspect of the ACGME competencies of Interpersonal and Communication Skills and Professionalism. Residency programs considering the financial investment must include the costs of the development phase, of the actual implementation, and of any analysis. In the development phase we invested in a medical education consultant. Other than this additional cost, the annual expense includes standardized learners/patients ($20/hour), faculty time, space (donated), and food (lunch and snacks). Year 1 of our 10-station OSCE cost us $454/resident, and year 2 cost $120/resident. These figures do not include the institutional contribution of faculty time and space. A 2-teaching station OSCE would cost significantly less. Our investment became more efficient each year as we expanded the number of trainees participating. For instance, since our first year, we have instituted an annual OSTE for all PGY-2 medicine residents and developed a multistation, performance-based teaching assessment for our general internal medicine fellows and for affiliated medical residencies.23

Because most day-to-day resident teaching occurs out of sight, the opportunity for faculty to observe directly and provide feedback is eye-opening. Other studies confirm that residents find teaching stations valuable.24 Our residents felt that the key ingredients were that cases reflected real-life challenges, faculty observers were respected teachers, and standardized students were realistic and gave valuable feedback. Faculty commented that observing multiple residents perform the same teaching task was valuable in calibrating normative resident skill expectations. Many said they gained insights into their own teaching as well. Our students reported that this experience helped them shape their own views and understanding of clinical teaching. Others have found that standardized learners benefit by improving their own communication and teaching skills.25

Further research is needed to establish inter- and intrarater reliability of these measures, to seek further validation against other criteria such as teaching ratings by actual learners during varied rotations and settings, and to expand the cases and competencies tested. Reliable and valid OSTE assessments can provide outcome measures with which to test the effectiveness of teaching curricula and teachers. Ultimately, linking teaching quality to longer-term learner and patient outcomes is essential.


We found that integrating teaching stations into an annual OSCE was practical, assessed ACGME competencies, and improved the objectivity of assessment of resident teaching skills in our program. The generalizability and reproducibility of our method and approach need to be confirmed in other programs because teaching is an essential element of residency role and education, patient care, and physician contribution.


Funded by HRSA, Bureau of Health Professions.