The Impact of the Stanford Faculty Development Program on Ambulatory Teaching Behavior
The authors have no conflict of interest to declare for this work. This work was presented at the plenary session of the annual Society of General Internal Medicine Meeting, May 2004, Chicago, IL.
Address correspondence and requests for reprints to Dr Berbano: Department of Medicine, MCHL-MG, Walter Reed Army Medical Center, 6900 Georgia Ave, NW, Washington, DC 20307 (e-mail: email@example.com).
CONTEXT: Faculty development has received considerable investment of resources from medical institutions, though the impact of these efforts has been infrequently studied.
OBJECTIVE: To measure the impact of the Stanford Faculty Development Program in Clinical Teaching on ambulatory teaching behavior.
SETTING AND PARTICIPANTS: Eight internal medicine faculty participating in local faculty development.
INTERVENTION: Participants received 7 2-hour sessions of faculty development. Each session included didactic, role-play, and videotaped performance evaluation.
MAIN OUTCOME MEASURE: Before and after the intervention, faculty were videotaped during a case presentation from a standardized learner, who had been trained to portray 3 levels of learners: a third-year medical student, an intern, and a senior medical resident. Teacher and learner utterances (i.e., phrases) were blindly and randomly coded, using the Teacher Learner Interaction Analysis System, into categories that capture both the nature and intent of the utterances. We measured change in teaching behavior as detected through analysis of the coded utterances.
RESULTS: Among the 48 videotaped encounters, there were a total of 7,119 utterances, with 3,203 (45%) by the teacher. Examining only the teacher, the total number of questions asked declined (426 vs 288, P=.02) with an increase in the proportion of higher-level, analytic questions (44% vs 55%, P<.0001). The quality of feedback also improved, with less “minimal” feedback (87% vs 76%, P<.0005) and more specific feedback (13% vs 22%) provided.
CONCLUSIONS: Teaching behaviors improved after participation in this faculty development program, specifically in the quality of questions asked and feedback provided.
Faculty development has received considerable investment of time, effort, and money from medical institutions in the last 2 decades.1–3 One recent survey of Internal Medicine programs found that 74% had ongoing or occasional faculty development,4 reflecting a belief that faculty development programs increase the effectiveness of teaching. Unfortunately, most studies of the effect of faculty development have relied on indirect measures, such as surveys of satisfaction of learners or self-assessment by teachers,3,5–11 rather than direct observation. Consequently, institutions that are putting resources into faculty development may have no objective way to decide which faculty development efforts are effective and worth the investment or how to measure the return on their investment in terms of improved effectiveness of teaching.
One influential faculty development initiative is the Stanford Faculty Development Program on Clinical Teaching (SFDP-CT). This program has been disseminated through workshops sponsored by the American College of Physicians, to clinical instructors in rural areas through the Area Health Education Center faculty development programs, and, especially, through a “teach-the-teacher” program. Thus far, 99 clinicians from 77 institutions have completed the 1-month training program, returning to their home institutions to lead local faculty development efforts.
While several studies suggest the SFDP-CT is effective, all but one are based on surveys of teachers and learners rather than direct observation. One study analyzed videotapes of inpatient ward rounds, in which blinded observers rated overall teacher performance on the 7 domains of education taught in the SFDP-CT sessions,12 and found that teacher performance significantly improved in 2 of the 7 domains: learning climate and control of session. This study was limited for 2 reasons. First, it examined only inpatient teaching; the outpatient clinic is an increasingly important arena for the clinical teaching of medical students and residents.13–17 Second, these behaviors were coded only with regard to the 7 domains of the SFDP-CT, so the full range of possible changes in teaching behavior was not assessed.
The purpose of our study was to objectively assess the impact of the SFDP-CT program on ambulatory teaching behavior, using an innovative method of coding directly observed teacher-learner interactions, the Teacher Learner Interaction Analysis System (TELIAS).18 We hypothesized that participation in the SFDP-CT would result in higher-quality teacher-learner interactions, with an increase in the proportion of analytical and open-ended questions resulting in higher-order thinking among learners, and an increase in specific feedback given.
Using a pre-post study design, we studied the effects of the SFDP-CT on ambulatory teaching behaviors. Eight faculty, participating in the SFDP-CT at our institution in a single year, were video-taped teaching a standardized learner, who had been trained to portray 3 levels of learners: a third-year medical student, an intern, and a senior medical resident. Standardized learners have been previously used in faculty development, though not to assess the effects of training.19,20 These 8 faculty members included 3 who participated as part of their training (an internal medicine chief resident, a general medicine fellow, and a rheumatology fellow) and 5 newly hired general medicine staff physicians who were asked by their service chief to participate as part of the service's commitment to education.
Faculty participants interacted with the standardized learner before and after completing the SFDP-CT. The SFDP-CT is given over 7 2-hour sessions, covering the domains of learning climate, control of session, communication of goals, promoting understanding and retention, evaluation, feedback, and self-directed learning. It combines didactic instruction, videotaped role-play, and feedback and was led by a facilitator (L.P.) with over 15 years' experience leading SFDP-CT seminars.
The standardized learner (R.B.), a Fellow in Pulmonary Medicine who was not known to any of the workshop participants, was trained to follow a script in which he portrayed 3 roles: a third-year medical student, a medicine intern, and a senior medicine resident, using identical scripts for both the pre- and postmeasurements. The script for the third-year medical student demonstrated lack of organization as well as inaccurate use of medical jargon and a poorly defined problem list. The intern script demonstrated good organization, with a focused, pertinent review of systems and good reasoning skills, but with an inaccurate assessment. The senior resident was portrayed as highly competent, nearing the end of training. The “memorized” scripts consisted only of the initial presentation of the history and physical examination. Within the limits of the predetermined proficiency level, the standardized learner was trained to interact with the attending after the initial presentation based on the questions and prompting of the attending. The standardized learner's training occurred over approximately 40 hours, including role-play and videotape review and feedback.
The post-SFDP-CT interactions with the standardized learner occurred within 1 month of training completion. The post-SFDP-CT roles portrayed the same levels of learner, using identical scripts for the presentation of the history and physical examination as used during the pre-SFDP-CT interactions.
The consistency with which the standardized learner portrayed the different roles was assessed using TELIAS. Faculty participants were aware that the learner was a standardized simulation, though neither the standardized learner nor the faculty was aware of our hypothesis.
Independent medical transcribers, blind to the nature or purpose of the encounters, transcribed audiotapes of these encounters. Identifying information was stripped from the transcripts, and 2 independent coders (E.B. and J.J.) coded the transcripts using TELIAS. The encounters were coded in random order, and coders were blind to the pre-post timing of the encounter as well as to the identity of the faculty member.
TELIAS has been more fully described elsewhere.18,21 In brief, TELIAS codes each teacher and learner utterance, defined as a complete thought, into 2 levels of coding, “concrete” and “abstract.” The concrete codes comprise a comprehensive framework of mutually exclusive categories (Table 2), and each utterance can receive only 1 concrete code. For example, an utterance by the teacher, such as: “What do you think is going on?” would be classified as an open-ended analytic question (Table 2). These coded utterances are noted to occur during presentation of the history and physical examination, or case discussion. In addition, some utterances receive a secondary “abstract” code, capturing the utterances' intent based upon their context. Most utterances are not given “abstract” codes, though some utterances can receive more than 1 “abstract” code. For example, the statement, “Rather than obtaining a hemoglobin A1C annually, check it every 3 months,” would be coded as “teaching a general rule” as well as “implicit negative feedback.” Thus, this statement would be labeled (noncontextually, independent of other phrases) as negative feedback (concrete code), but the intention of it would be to teach a general rule (abstract code). The 193-node coding tree is based on standard qualitative software (QSR NUDIST 4.0, Qualitative Solutions and Research Corp., Australia). This protocol was approved by our institutional review boards and informed consent was obtained from the faculty members.
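The two-level coding described above can be pictured as a simple record per utterance. The following is a minimal sketch only, not the TELIAS software or its 193-node coding tree: the class name `Utterance`, its field names, and the choice of the "discussion" phase for both examples are assumptions made for illustration, while the two example statements and their codes are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List

# Encounter phases and speaker roles named in the text.
PHASES = ("history", "physical", "discussion")
SPEAKERS = ("teacher", "learner")


@dataclass
class Utterance:
    """One coded utterance (a complete thought). Hypothetical structure."""
    speaker: str
    phase: str
    text: str
    concrete_code: str                                        # exactly one per utterance
    abstract_codes: List[str] = field(default_factory=list)  # zero, one, or several

    def __post_init__(self) -> None:
        if self.speaker not in SPEAKERS:
            raise ValueError(f"unknown speaker: {self.speaker!r}")
        if self.phase not in PHASES:
            raise ValueError(f"unknown phase: {self.phase!r}")
        if not self.concrete_code:
            raise ValueError("every utterance needs exactly one concrete code")


# Example from the text: a concrete code only, no abstract code.
question = Utterance(
    speaker="teacher",
    phase="discussion",  # assumed phase, for illustration
    text="What do you think is going on?",
    concrete_code="open-ended analytic question",
)

# Example from the text: concrete label plus an abstract (intent) code.
rule = Utterance(
    speaker="teacher",
    phase="discussion",  # assumed phase, for illustration
    text="Rather than obtaining a hemoglobin A1C annually, check it every 3 months",
    concrete_code="negative feedback",
    abstract_codes=["teaching a general rule"],
)
```

Keeping the concrete code as a required single field and the abstract codes as an optional list mirrors the rule that every utterance has exactly 1 concrete code while most have no abstract code.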
Table 2. Classification of 7,119 Concrete Teacher and Learner Utterances
|Teacher||Pre, n (%)||Post, n (%)|
| Question codes|
| Clarifying||160 (9.3)||68 (4.5)|
| Recall||55 (3.2)||36 (2.4)|
| Analytic||186 (10.8)||156 (10.5)|
| Rhetorical||25 (1.4)||28 (1.8)|
| Statement codes||120 (6.9)||115 (7.7)|
| Patient fact||45 (2.6)||34 (2.2)|
| Medical fact||131 (7.8)||122 (8.2)|
| Nonintegrative||416 (24.2)||345 (23.2)|
| Thinking out loud||15 (0.8)||37 (2.5)|
| Directive||71 (4.2)||44 (2.9)|
| Repeats learner||51 (2.9)||41 (2.8)|
| Transitional word||29 (1.7)||23 (1.5)|
| Back check||343 (19.9)||392 (26.4)|
|Total teacher utterances||1,717||1,486‡|
|Learner||Pre, n (%)||Post, n (%)|
| Question codes|
| Clarifying||2 (0.9)||0|
| Recall||4 (0.2)||0|
| Analytic||8 (0.4)||6 (0.2)|
| Rhetorical||7 (0.3)||7 (0.3)|
| Summative||273 (13.4)||271 (13.3)|
| Patient fact||1,144 (56.1)||1,014 (49.7)|
| Medical fact||103 (5.0)||62 (3.0)|
| Nonintegrative||351 (17.2)||339 (16.6)|
| Thinking out loud||21 (1.0)||35 (1.7)|
| Directive||2 (0.09)||1 (0.05)|
| Repeats teacher||7 (0.3)||22 (1.1)|
| Transitional word||7 (0.3)||9 (0.4)|
| Back check||112 (5.5)||100 (5.3)|
|Total learner utterances||2,041||1,875|
This was a pre-post design with 6 encounters for each teacher and multiple utterances within each encounter. The unit of analysis was each coded utterance, with coding done by 1 of the 2 coders (not both). Twenty-five percent of the transcripts were double-coded, and interrater reliability as well as consistency of standardized learner presentation was assessed with Spearman's ρ. When double coded, all utterances were analyzed preferentially using coder 1 (E.B.). The mean number of utterances before and after the intervention, and the proportion of utterances in the various categories were compared with ANOVA, adjusting for clustering at the level of the individual participant with the Huber-White sandwich method. In addition, we looked for differences in the impact of the intervention between the different participants in the faculty development program. STATA 8.0 statistical software was used (STATA Corp., College Station, TX) for analysis.
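As an illustration of the reliability statistic named above, Spearman's ρ is the Pearson correlation of the ranks of two paired lists, with tied values sharing their average rank. The sketch below is a plain-Python version, not the STATA analysis the authors ran. The text does not state which quantities were correlated, so the per-category counts for two coders are invented purely for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented counts per code category for one double-coded transcript.
coder_1 = [12, 4, 9, 2, 30, 7]
coder_2 = [11, 5, 9, 1, 28, 10]
rho = spearman_rho(coder_1, coder_2)
```

A value of ρ near 1 means the two coders ordered the categories almost identically. This sketch does not reproduce the clustering adjustment (Huber-White sandwich method) described in the text.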
Eight of the 9 faculty members participating in the SFDP-CT program in the fall of 2002 consented to participate. The number of years since residency and years as faculty encompassed a wide range (1 to 10 years, mean 5.25 years, Table 1). Three had participated in prior faculty development, though none within 3 years. There was no difference in effect or baseline utterance patterns between those who had and who had not previously participated in faculty development. There were a total of 48 encounters (24 before and 24 after completion of the SFDP). On average, there was a difference of 17 weeks between baseline and post-intervention encounters. Interrater reliability of TELIAS coding was 0.89 and the standardized learner consistency pre-post was high (Spearman's ρ: 0.80).
Table 1. Teaching and Faculty Development Experience of Participants
|Participant||1||2||3||4||5||6||7||8||Mean|
|Years since residency||7||4||1||2||6||10||8||5||5.4|
|Years as faculty||7||3||1||2||6||10||8||5||5.3|
|Number of previous faculty development programs*||0||2||0||0||0||0||1||2||0.63|
“Concrete” Behavior Codes
Among all 48 pre- and post-workshop encounters, there were a total of 7,119 utterances: 3,203 (45%) by the teacher and 3,916 (55%) by the learner. The majority (82%, n=2,637) of teacher utterances were made during the discussion of the case, 15% (n=477) during the history examination, and only 3% (n=89) during the physical examination. In contrast, learner utterances were more evenly distributed. Among the 3,916 learner utterances, 38% were made presenting the history examination, 15% presenting the physical examination, and 31% in the case discussion. Overall, 22% (n=714) of teacher utterances were questions, the majority (81%, n=578) occurring during the case discussion; 15% (n=111) occurred during the history and only 4% (n=25) during the presentation of the physical examination. The majority of questions during the history and physical examination were recall or clarifying questions (80%).
Effect of Intervention. There was a nonsignificant decline in the total number of utterances, with 397 fewer utterances made by teachers and learners after workshop participation (P=.08, Table 2). For teachers, the average number of utterances per encounter nonsignificantly declined from 71.5 to 61.9, with no change in the timing of utterances, with most teacher utterances made during the discussion (pre: 83.6% vs post: 81.8%). There was improvement in the quality of questions asked by teachers, with a greater percentage of higher-level, analytic questions (pre: 44% vs post: 55%, P<.0001) such as, “What would you like to do for this patient?” Concomitantly, clarifying or recall questions, such as, “What was the patient's age?” or “What organisms are usually responsible for urinary tract infections?” significantly declined. This difference was especially marked during the case discussion, where clarifying or recall questions declined from 80% to 59% and analytic questions increased from 10% to 34%. In addition, the total number of teacher questions decreased from 426 to 288 (P=.02) after SFDP-CT training (Table 2).
There was no change in the percentage of teacher questions that were open-ended, such as, “What do you think is going on?” after the intervention (35%, n=147 vs 39%, n=112, P=.26), and no change as to when in the encounter (history, physical, discussion) questions were asked (P=.22). Finally, there was no change in the type of statements made by teachers.
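The descriptive figures in this section can be recomputed from the counts reported in Table 2 and in the text. The sketch below is plain arithmetic under that assumption; the variable names are hypothetical, and it makes no attempt to reproduce the reported P values, which come from a clustered analysis. Recomputed percentages may differ from the published ones by a point because of rounding.

```python
# Teacher question counts from Table 2 as (pre, post).
teacher_questions = {
    "clarifying": (160, 68),
    "recall": (55, 36),
    "analytic": (186, 156),
    "rhetorical": (25, 28),
}
teacher_utterances = (1717, 1486)   # total teacher utterances, pre and post
open_ended_questions = (147, 112)   # counts reported in the text
ENCOUNTERS_PER_PERIOD = 24          # 24 encounters before, 24 after

# Total teacher questions in each period.
pre_questions = sum(pre for pre, _ in teacher_questions.values())
post_questions = sum(post for _, post in teacher_questions.values())

# Share of teacher questions that were analytic.
analytic_pre = teacher_questions["analytic"][0] / pre_questions
analytic_post = teacher_questions["analytic"][1] / post_questions

# Share of teacher questions that were open-ended.
open_pre = open_ended_questions[0] / pre_questions
open_post = open_ended_questions[1] / post_questions

# Mean teacher utterances per encounter.
per_encounter_pre = teacher_utterances[0] / ENCOUNTERS_PER_PERIOD
per_encounter_post = teacher_utterances[1] / ENCOUNTERS_PER_PERIOD

print(f"teacher questions: {pre_questions} pre, {post_questions} post")
print(f"analytic share: {analytic_pre:.1%} pre, {analytic_post:.1%} post")
print(f"open-ended share: {open_pre:.1%} pre, {open_post:.1%} post")
print(f"utterances per encounter: {per_encounter_pre:.1f} pre, "
      f"{per_encounter_post:.1f} post")
```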
Abstract Behavior Codes. Six hundred and seventy-four utterances by teachers (9% of all utterances) were some sort of feedback (Table 3). The majority were positive (90%, n=608) and minimal (83%, n=556) feedback statements, such as “good job” or “nice presentation.” While there was no change in the total number of feedback utterances (pre: n=363, mean/tape 15.1 vs post: n=311, mean/tape 13.0, P=.23) or the likelihood of receiving negative feedback (pre: 10% vs post: 9%), the type of feedback changed. The percentage of feedback that was minimal declined from 87% (n=317) to 76% (n=239) after participation in faculty development (P<.0005) (Table 3). This decline in low-quality, minimal feedback was accompanied by a corresponding increase in higher quality, “more than minimal” feedback, increasing from 13% of feedback statements to 23%. Most of this “more than minimal” feedback was specific: “You did a nice job eliciting the history of congestive heart failure and determining its impact on your patient's ADLs” or “I like the way you prioritized this patient's problems from most serious to least serious.” There were 46 specific feedback statements made before participation in the faculty development workshop (30 positive, 16 negative), increasing to 67 afterwards (54 positive, 13 negative, P<.001).
Table 3. Change in Type of Feedback Given by Teacher
|Feedback type||Pre, n (%)||Post, n (%)|
|Minimal||317 (87%)||239 (76%)|
| Specific||46 (13%)||67 (23%)*|
| Positive||325 (90%)||283 (91%)|
| Negative||38 (10%)||28 (9%)†|
|Total feedback statements||363||311|
Another change in “abstract” coded behavior was in “teaching general rules.” An example of teaching a general rule is “Diabetics should have their feet checked at each follow-up office visit.” While there was a nonsignificant decline in the number of general rules taught (from 4.7 to 3.8, P=.22), teachers used more utterances to reinforce those general rules (pre: 97 utterances vs post: 121 utterances, P=.03), for example, “In addition to checking for ulcers and the condition of the patient's toenails, you should check for neuropathy, preferably with a monofilament.”
Our results demonstrate that participation in the Stanford Faculty Development Program for clinical teaching resulted in a number of important changes in teaching behavior. First, there was a shift in the type of questions asked of learners from mostly clarifying and recall questions to higher-level analytic/synthetic questions. In addition, there was a decrease in “pimping” with fewer, narrow, fact-type questions asked. The encounters were shorter, and the attendings reduced the number of general rules taught while spending more time reinforcing those rules. Finally, there was a shift in the type of feedback learners received. After SFDP training, our faculty was nearly twice as likely to provide specific feedback on learner performance and much less likely to provide only minimal feedback. These changes in behavior are all in the direction of improvement in the quality of teaching and were consistent across participants; no one person accounted for the majority of these effects. In addition, we found trends toward improvement in a number of other domains as well: for example, teachers spent less time talking and more time listening after the intervention.
TELIAS has now been found to be reliable and sensitive to small changes in teaching behavior in a number of studies.18,21,23 TELIAS allows the objective quantification of the characteristics of teacher-learner interactions and could be a tool for a number of future investigations in medical education. Thus far, it has been used to assess the nature of encounters between third-year students and attendings, the impact of the 1-minute preceptor on teaching behaviors with third-year medical students, and in this study. Other uses could include assessing differences in teaching between different specialties or different levels of learners, evaluating teaching behaviors that optimize learner outcomes, examining if these teaching behavior changes are sustainable, or objectively characterizing teaching behavior for evaluation and promotion purposes, though this would require further study into characterizing “desirable” teaching behavior.
However, several important limitations to our study exist. First, we report the effect of the SFDP-CT on a small group of participants at a single institution. This will necessarily limit the generalizability of our findings. However, our findings are consistent with those seen during direct observation of videotapes of inpatient teaching after participation in the SFDP-CT. Second, we used a standardized learner rather than actual teaching encounters. This may have lent an element of artificiality to the encounters and could increase any Hawthorne effect. This decision was made by design. In a previous study, we investigated the effect of the “One-Minute Preceptor” faculty development program on encounters between faculty and third-year medical students and found considerable variance in students' proficiency during the encounters.19 Although we found improvement in teaching, the wide range of student skills made isolating the effect of the faculty development difficult. The use of a standardized learner allows clearer delineation of the effect of our intervention. Moreover, our method allowed us to model the effect of faculty development on a range of learner skills, rather than just 1 stratum. To minimize the Hawthorne effect, we kept participants blind to our hypothesis, although, by inference, they knew that we were studying the effect of the SFDP-CT on teaching. The SFDP-CT focuses on a broad range of teaching behaviors. Of the many possible changes in teaching, it is remarkable that the effects were limited to a specific few and were so consistent between teachers. Third, we used identical scripts before and after participation in the SFDP-CT. Although there may be concern that the faculty subjects may have learned the script, there was a gap of over 4 months between the pre-SFDP-CT interaction and post-SFDP-CT interaction. During those 4 months, our faculty remained busy seeing patients and precepting residents and medical students.
It seems unlikely that these brief interactions would be remembered in great detail after so many weeks. Moreover, only the initial presentation of the history and physical examination was scripted. Subsequently, the standardized learner improvised his responses, within the context of the role he was playing, based on the questions and statements made by the attending. A fourth and related limitation is that our subjects, while somewhat compelled to participate in this workshop, were selected for their positions, whether as fellows, chief residents, or staff physicians, partly for their interest in teaching. Consequently, they may be more motivated to improve their teaching than others. On the other hand, it is also possible that their baseline teaching was higher than average, so other teachers with less enthusiasm or skill might have an even greater change in their teaching. Without a control group it is hard to delineate the precise impact of the SFDP. An ideal study would randomize potential subjects to participate in the workshop, and both groups would be randomized to scripts in different orders: faculty member A would use script 1 pre and script 2 post and the reverse for faculty member B to remove the possibility of bias. This would require a much larger sample size as well as funding. Fifth, there was a short duration between completion of the SFDP-CT and the standardized encounters. Since previous work has suggested that improvements in teaching from faculty development programs may not be sustained,22 we do not know if these changes in teaching behaviors would persist over a longer period of follow-up. The fact that the effect of our intervention was equally strong among those with previous exposure to faculty development would suggest the potential for a waning effect on behavior. Sixth, there were several trends toward improvement in teaching behavior that were not statistically significant due to inadequate sample size.
Finally, while TELIAS is very sensitive, it remains uncertain how “clinically relevant” the effect is. Further research is needed to assess how teacher and learner satisfaction correlates with TELIAS' objective measures of change. It is likely that indirect measures of encounter quality, such as learner satisfaction surveys, and objective measures, such as TELIAS, are complementary.
In summary, we observed meaningful improvement in teaching behaviors after participation in the SFDP-CT. It is important to note that the novel assessment tool that we used to quantify these behavioral changes, TELIAS, can be applied to other faculty development programs and other institutions, and future research should do so. Other areas for research include assessing the sustainability of these behaviors and evaluating whether the learners themselves perceive or benefit from these changes in teaching behavior.
The opinions or assertions contained herein are the private views of the authors and are not to be construed as official or as reflecting the views of the Department of the Army or the Department of Defense.