Abstract


Objective

Traditional means of testing rheumatology fellows do not adequately assess some skills that are required to practice medicine well, such as humanistic qualities, communication skills, or professionalism. Institution of the New York City Rheumatology Objective Structured Clinical Examination (ROSCE) and our 5 consecutive years of experience have provided us with a unique opportunity to assess its usefulness and objectivity as a rheumatology assessment tool.

Methods

Prior to taking the examination, all of the fellows were rated by their program directors. Fellows from the participating institutions then underwent a multistation patient-interactive examination observed and rated by patient actors and faculty raters. Assessments were recorded by all of the participants using separate but overlapping sets of instruments testing the Accreditation Council for Graduate Medical Education (ACGME) core competencies of patient care, interpersonal and communication skills, professionalism, and overall medical knowledge.

Results

Although the program directors tended to rate their fellows more highly than the ROSCE raters, typically there was agreement between the program directors and the ROSCE faculty in distinguishing between the highest- and lowest-performing fellows. The ROSCE faculty and patient actor assessments of individual trainees were notable for a high degree of concordance, both quantitatively and qualitatively.

Conclusion

The ROSCE provides a unique opportunity to obtain a patient-centered assessment of fellows' ACGME-mandated competencies that traditional knowledge-based examinations, such as the rheumatology in-service examination, cannot measure. The ability of the ROSCE to provide a well-rounded and objective assessment suggests that it should be considered an important component of the rheumatology training director's toolbox.


INTRODUCTION


Yearly in-service examinations are widely used to test residents' knowledge, but do not assess the humanistic qualities, communication skills, or professionalism that are also required to practice medicine well. In contrast, an objective structured clinical examination (OSCE) is designed to test these and other skills in a structured framework. OSCEs employ direct observation during simulated clinical encounters to evaluate history-taking and physical examination skills, the ability to interpret data such as radiographs and microscopic images, and the ability to advise and communicate with patients. The OSCE is thus designed to sample the full set of skills that trainees need for excellence in clinical practice. Poor performance in OSCEs has been shown to correlate with poor performance in other subsequent clinical examinations (1).

Harden and Gleeson first described the OSCE as a tool for medical student assessment in 1979 (2). Subsequently, it has been adapted for use in multiple surgical and medical subspecialties, including rheumatology. In the UK in 2002, a 13-station OSCE was administered to 12 rheumatology trainees as part of an initiative to assess specialists' skills (3). Brasington and colleagues at the Washington University School of Medicine also adapted the OSCE to rheumatology trainees (coining the term ROSCE [Rheumatology Objective Structured Clinical Evaluation]) and presented their data at the American College of Rheumatology Annual Scientific Meeting that year (4). One strength of their study was its attempt to specifically assess Accreditation Council for Graduate Medical Education (ACGME)–mandated core competencies as they apply to basic skill sets in rheumatology. The ACGME has established that core competencies in medical practice must include skills in patient care, medical knowledge, practice-based learning, interpersonal and communication skills, professionalism, and systems-based practice.

In 2004, a group of New York City rheumatology training programs (The Hospital for Special Surgery, The State University of New York Downstate Medical School, and New York University/New York University Hospital for Joint Diseases) initiated the first interinstitutional New York City ROSCE (NYC-ROSCE) at the Hospital for Special Surgery. These programs have subsequently performed an interinstitutional ROSCE annually, with the additional participation of 2 other training programs (Albert Einstein College of Medicine and Columbia Presbyterian Hospital/The College of Physicians and Surgeons). Our cumulative 5 years of experience with the ROSCE have provided us with a unique opportunity to assess its utility and effectiveness as a rheumatology assessment tool. Specifically, we hypothesized that our experience in the ROSCE would permit us to 1) evaluate the usefulness and accuracy of the ROSCE with regard to trainee performance, 2) validate the ability of the ROSCE to document meaningful improvement for individual trainees and programs between years, 3) determine the objectivity of patient and physician ratings during the ROSCE by comparing ROSCE assessments with assessments of trainees by their program directors, and 4) assess the utility of the ROSCE for evaluating trainee competence in areas other than medical knowledge and patient care, specifically communication and professionalism. In the fifth year, professionalism was examined in depth from the perspective of all examination participants. Our data suggest that the ROSCE provides an accurate and objective assessment of fellow performance in a number of clinical areas, including professionalism, and that OSCE methodology can be meaningfully applied to assessment at the subspecialty level.

MATERIALS AND METHODS


Overview.

In the NYC-ROSCE, the majority of the stations are patient interactive. Each station constitutes a unique clinical scenario addressing an important rheumatic disease. Twelve minutes are allotted for the completion of each station. Every attempt is made to create scenarios that mimic typical clinical encounters with realistic patient interactions to improve examination validity (5). Prior to each station, trainees are provided with the opportunity to review a brief description of the patient and the issue to be discussed (see Supplementary Appendix A, available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home). In different years, either rheumatology patients or trained professional patient actors have been used to role play and rate the trainee. The patient actors are also given background information relevant to their role (see Supplementary Appendix B, available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home) to help them formulate responses to the trainees' questions.

For each station, a board-certified rheumatologist serves as a faculty rater. The faculty rater has prior access to both patient and trainee information sheets, and aids the patient in enacting their role when necessary. Three minutes are allotted after the clinical encounter for the faculty member and patient to complete a set of standardized assessment forms that include questions with rating scales as well as opportunities to enter written comments. Patients assess fellows' humanistic and communication skills and professionalism. The faculty similarly assess these areas, and in addition, assess clinical competence and medical knowledge. After the completion of each ROSCE, the data are collected and collated by a single member of the ROSCE faculty (JRB), selected by consensus. These individual assessments, along with composite trainee data, are later made available to training directors for the purpose of program assessment and fellow feedback. By deliberate design, no overt comparisons are made between programs, and the program directors are privy only to data regarding their own fellows.

ROSCE design and evolution.

The 2004 pilot ROSCE consisted of 7 stations duplicated to form 2 circuits (14 stations total), allowing all of the participants to complete the examination in a timely manner. The participants consisted of 14 first- and second-year rheumatology trainees and 14 faculty raters from the participating institutions. To the fullest extent possible, trainees and faculty raters from any single institution were separated to foster objectivity on the part of the raters. Eight real-life patients with rheumatic disease were recruited from the Hospital for Special Surgery patient population. All of the patients had rheumatologic diagnoses but, owing to logistic issues, it was not always possible to match the patients by their own diagnosis to a particular station. We also discovered that it was not always useful to have “real patients” play their own diagnoses, since they often employed peripheral knowledge of their conditions to diverge from the script and asked questions about their own illness, rather than playing the role of a model patient. The station topics for that first ROSCE and the task for the trainees at each station are listed in Table 1. Stations 1, 2, 4, and 5 involved patient actors. Since station 3 was a simulated phone call, a member of the faculty both played the role of the patient caller and served as the rater. For station 7, trainees were asked to evaluate radiographs and provide their interpretations on a written questionnaire.

Table 1. Rheumatology Objective Structured Clinical Examination station design*

Station 1: Rheumatoid arthritis
  Year 1 (2004): Discuss treatment options for a recently diagnosed patient.
  Year 5 (2008): Discuss treatment options for a recently diagnosed patient whose preference is for alternative therapies alone.

Station 2: Systemic lupus
  Year 1 (2004): Explain this new diagnosis to the patient and discuss beginning treatment with prednisone.
  Year 5 (2008): Explain this new diagnosis and treatment with DMARDs in a patient whose religious beliefs preclude contraception and who intends to become pregnant soon.

Station 3: Knee pain
  Year 1 (2004): A simulated phone call in which the fellow is asked to assess the situation and give advice to the patient “after-hours” about an acute, painful knee effusion.
  Year 5 (2008): The same simulated phone call, with the patient instructed to resist instructions to present to an emergency room.

Station 4: Lupus pregnancy
  Year 1 (2004): Address medication safety and likelihood of disease control during a lupus pregnancy.
  Year 5 (2008): Not done.

Station 5: Osteoporosis
  Year 1 (2004): Interpret DXA results for the patient and discuss possible therapies for bone loss.
  Year 5 (2008): Not done.

Station 6: Synovial fluid analysis
  Year 1 (2004): Use microscopy to evaluate trainee thought process and reasoning during evaluation of a fresh synovial fluid sample to diagnose crystal disease.
  Year 5 (2008): Not done.

Station 7: Bone radiography
  Year 1 (2004): Common findings in hand films for RA, OA, and hemochromatosis were presented on a view box and the fellows were asked to interpret them.
  Year 5 (2008): Systematic interpretation of common radiographic findings in rheumatic disease.

Station 8: Fibromyalgia
  Year 1 (2004): Not done.
  Year 5 (2008): Explain the condition to a difficult patient with a positive ANA who is convinced she has lupus and frustrated by the contradictions.

Station 9: Scleroderma
  Year 1 (2004): Not done.
  Year 5 (2008): Discuss the poor prognosis with a patient who has advanced disease.

* DMARDs = disease-modifying antirheumatic drugs; DXA = dual x-ray absorptiometry; RA = rheumatoid arthritis; OA = osteoarthritis; ANA = antinuclear antibody.

In 2005, the number of participating institutions increased to 4, including 15 fellows and 17 faculty. Twelve patient volunteers were a mix of real-life patients from the New York University Hospital for Joint Diseases rheumatology clinic population and trained nonpatient volunteers. Two lupus stations from the 2004 examination were consolidated, as were the stations concerning knee pain and synovial fluid analysis. A station addressing fibromyalgia was added. The stations on rheumatoid arthritis and osteoporosis were rewritten. The bone radiography station was modified to better evaluate thought process and pattern recognition; faculty raters prompted the trainees to systematically review their approach to reading the films, along with how the findings informed their diagnosis of the disease. To allow for simultaneous testing of all of the fellows, the number of circuits was increased from 2 to 3.

In 2006, the ROSCE was expanded to include 5 institutions, with 21 trainees and an equal number of participating faculty. Patient actors were recruited from the rheumatology clinic population, as well as the administrative staff of the New York Harbor Health Care System (Veterans Administration) Brooklyn campus. Three new stations (spinal stenosis, osteoarthritis, and scleroderma) were rotated in, for a total of 8 stations (6 patient-centered) in the ROSCE. Because of the increase in topics and participants, the number of circuits required was again 3.

Beginning in 2007, the patient raters were replaced with professional actors who were specifically trained to assess communication skills and professionalism. Prior to the ROSCE, these conservatory-trained actors were informed about their diagnoses and were given access to an educational Web site with extensive information on their individual conditions (www.hss.edu). The actors were also allowed regular, ongoing communication with a single ROSCE organizer to clarify the examination design and their diagnoses. Immediately prior to the ROSCE, they received additional instruction regarding the competency of professionalism in medicine, expectations for trainee performance, and the assessment questionnaire and rating scale used.

In 2008, the ROSCE included 5 patient-centered stations, as well as a radiology station, arranged in 4 circuits, with 20 trainees and 24 faculty members from the 5 institutions. By faculty consensus, the 2008 ROSCE emphasized assessment of the competency of professionalism. Patient scenarios were redesigned not only to test physician knowledge and judgment, but also to challenge the trainees with more complex ethical dilemmas and patient interactions (Table 1). Prior to the ROSCE, professionalism assessments of trainees by their program directors were obtained and compared with professionalism ratings by faculty assessors and patient actors obtained during the examination. For this most recent ROSCE, fellows also rated their own performances at each station and their overall professionalism following the examination.

Assessment instruments.

Assessments were recorded using separate but overlapping sets of instruments for the faculty and patient raters, as detailed below. Ratings were entered using Likert scales (range 1–9). For the years 2004–2006, the ranges were specified as: 1–3 = unsatisfactory, 4–6 = satisfactory, and 7–9 = superior. In 2007 and 2008, the ranges were reworded as: 1–3 = below average, 4–6 = typical, and 7–9 = superior. During those years, we explicitly stated that the expected median value was mid-range, a change that resulted in a more discriminating spread of scores. In addition to the Likert scales, the faculty were encouraged to provide written, qualitative comments on the trainees' performances, strengths, and/or weaknesses.
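As a purely illustrative aside (not part of the actual ROSCE instruments), the minimal Python sketch below shows one way ratings on the 1–9 Likert scale could be averaged into a station score and mapped back onto the 2007–2008 verbal anchors; the item names, cutoffs for averaged scores, and values are hypothetical.

    # Illustrative only: aggregate 1-9 Likert ratings from one rater at one
    # station and map the result onto the 2007-2008 verbal anchors
    # (1-3 = below average, 4-6 = typical, 7-9 = superior).
    from statistics import mean

    def descriptor(score):
        """Map a 1-9 score onto the verbal anchors (hypothetical cutoffs for averaged scores)."""
        if score < 3.5:
            return "below average"
        if score < 6.5:
            return "typical"
        return "superior"

    # Hypothetical ratings for one fellow, one value per competency item on the form.
    station_ratings = {
        "patient care": 6,
        "interpersonal and communication skills": 7,
        "professionalism": 5,
        "medical knowledge": 6,
    }

    station_score = mean(station_ratings.values())
    print(f"station score {station_score:.1f} -> {descriptor(station_score)}")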

Faculty assessment instruments.

Station-specific medical content.

These forms permitted assessment of the examinees' medical knowledge and quality of patient care. Checklists were used to confirm whether trainees obtained the relevant medical history and discussed suitable treatment options, as well as potential complications, with the patient (a typical example is provided in Supplementary Appendix C, available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home).

Core competencies.

Faculty members were asked to provide global numerical impressions of the trainee's skills, because checklists alone do not improve validity over a global rating scale (6). This questionnaire assessed the trainees' skill levels in 4 of the ACGME core competencies: patient care, interpersonal and communication skills, professionalism, and overall medical knowledge (see Supplementary Appendix D, available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home).

Subjective feedback.

Written comments in free text were solicited during each encounter. Designated areas of the assessment forms specifically asked for a list of the fellow's strengths and weaknesses. Additional space was provided for further comments.

Professionalism.

Initially, professionalism was assessed as a single item on the core competency evaluation instrument. In the 2007 and 2008 ROSCEs, we employed a more extensive instrument (see Supplementary Appendix E, available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home) adapted from similar tools, particularly those used in nursing assessments of professionalism. Precedent for this type of assessment can be found in prior studies linking noncognitive skills to specific behaviors associated with professionalism in medicine (7). Questions addressed whether the fellow being rated demonstrated respect, responsiveness, clarity in communication, and competency in their patient interactions. For the purposes of comparative analysis, an overall professionalism score was generated for each questionnaire by taking a fellow's average score across all categories.
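As a hypothetical illustration of the scoring just described, the short Python sketch below averages item-level ratings into an overall professionalism score for one questionnaire; the category names follow the qualities mentioned in the text, but the data and structure are invented and do not reproduce the actual instrument.

    # Hypothetical illustration: derive an overall professionalism score by
    # averaging across all rated categories of one questionnaire.
    from statistics import mean

    questionnaire = {
        "respect": 7,
        "responsiveness": 6,
        "clarity of communication": 8,
        "competency in patient interactions": 7,
    }

    overall_professionalism = mean(questionnaire.values())
    print(f"overall professionalism score: {overall_professionalism:.2f}")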

Patients' assessment instruments.

The core competencies instrument was identical to that used by the faculty. Specific medical knowledge was not assessed by the patients, although in one year (2008), the patient raters were asked to comment on whether the trainees appeared to be knowledgeable. Subjective feedback was identical to the comments solicited from the faculty, and the professionalism instrument was identical to that used by the faculty.

Post-ROSCE questionnaire.

In order to assess participants' opinions of the ROSCE, a post-ROSCE questionnaire was completed by the trainees, faculty, and patient participants. Questions addressed the strengths and weaknesses of the examination and what could be done to improve it in the future.

RESULTS


Concordance between ROSCE faculty raters across stations and between ROSCE faculty raters and program directors' assessments.

For individual fellows, we observed rough agreement between faculty raters for overall scores across the individual stations tested, although this agreement did not reach statistical significance. Similar trends were seen for ratings by patient raters (data not shown). Although these results appear to suggest a level of rater consistency, they may also reflect the differences in performance that were observed for individual fellows at specific ROSCE stations. In contrast, fellows performing at a specific level in a particular competency tended to perform at a similar level in all of the other competencies, suggesting that performance in individual areas reflects performance overall (Figure 1).


Figure 1. Individual trainees tended to be assigned similar ratings in multiple areas of competency. Twenty individual fellows participating in the 2008 Rheumatology Objective Structured Clinical Evaluation (ROSCE) were evaluated by the ROSCE faculty for overall competence in patient care, interpersonal and communication skills, professionalism, and medical knowledge. For each individual trainee, the ratings shown for each competency represent the average rating assigned by 5 faculty across all 5 patient-centered stations in the examination.


We also compared ROSCE faculty assessments of individual fellows with the evaluations of their program directors obtained prior to the ROSCE administration. Overall, program directors tended to rate their fellows more highly than the ROSCE raters, typically by a margin of 1–2 points (Figure 2A). Program directors' descriptive assessments of their own fellows also tended to be more generous than those of the ROSCE faculty (data not shown). However, among fellows from any given fellowship program, there was typically agreement between the program directors and the ROSCE faculty in distinguishing between the highest- and lowest-performing fellows. Indeed, overall scores provided by the program directors and the ROSCE faculty demonstrated a reasonably good correlation (Figure 2B). Additionally, the particular strengths and weaknesses of individual fellows, as identified qualitatively by the ROSCE faculty, were similar to those identified by the program directors in the narrative portion of their pre-ROSCE assessments (data not shown).
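For readers interested in the form of this comparison, the Python sketch below pairs each trainee's program director rating with the mean of that trainee's ROSCE faculty ratings and fits a linear regression, analogous in structure to Figure 2B; the ratings shown are invented, and the article does not specify the software or regression routine actually used.

    # Illustrative only: compare program director ratings with mean ROSCE
    # faculty ratings per trainee via linear regression (cf. Figure 2B).
    import numpy as np
    from scipy import stats

    pd_ratings = np.array([8, 7, 9, 6, 8, 7])   # program directors, one rating per trainee (invented)
    faculty_ratings = np.array([                 # rows = trainees, columns = faculty raters (invented)
        [6, 7, 6, 5, 7],
        [5, 6, 6, 6, 5],
        [8, 7, 8, 7, 8],
        [5, 5, 4, 6, 5],
        [7, 6, 7, 7, 6],
        [6, 6, 5, 6, 6],
    ])
    faculty_means = faculty_ratings.mean(axis=1)

    result = stats.linregress(pd_ratings, faculty_means)
    print(f"r = {result.rvalue:.2f}, p = {result.pvalue:.3f}")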


Figure 2. Comparison of trainee ratings between the program directors, Rheumatology Objective Structured Clinical Evaluation (ROSCE) faculty, patient actors, and real patients. A and B, Comparison of ratings between the program directors and ROSCE faculty. A, Mean rating of the trainees' overall ROSCE performance in 2006–2007 by program directors versus ROSCE faculty (n = 32 trainees rated). B, Regression analysis comparing program directors' ratings of individual trainees versus average ROSCE faculty ratings for the same trainees as in A. C and D, Comparison of ratings between the ROSCE faculty and patient actors. C, Mean rating of trainees' overall ROSCE performance for the years 2007–2008 by the ROSCE faculty versus patient actors (n = 31 trainees rated). D, Regression analysis comparing ROSCE faculty ratings of individual trainees versus patient actors' ratings of the same trainees as in C. For the regression analyses, each data point represents the overall rating of a single program director compared with the mean overall rating of 6 different ROSCE faculty for each trainee. E, Comparison of the distribution of overall scores by the professional patient actors from 2007 (n = 72) versus the scores issued by actual rheumatology patients from 2006 (n = 101 responses) recruited to play the role of patients in years 1–2 of the ROSCE.


Concordance between ROSCE faculty and patient raters.

Overall, ROSCE faculty raters tended to score trainees more generously than professional patient actors (Figure 2C). Nonetheless, ROSCE faculty and patient actor assessments of individual trainees were notable for a high degree of concordance, both quantitatively (Figure 2D) and qualitatively. In particular, we observed concordance between the ROSCE faculty and patient actor assessments of individual trainees' strengths and weaknesses. Areas of qualitative concordance included the identification of pitfalls such as the use of language that was overly technical or rushed. This observation contrasts with prior reports of poor agreement between ROSCE faculty examiners and simulated patients (8).

Comparison between ratings by actual patients versus professional patient actors.

When we compared the ratings provided by professional patient actors with those of the actual patients who had previously participated as patient actors in our first 2 ROSCEs, we observed that the actual patients appeared to be excessively liberal in their evaluations of the trainees (Figure 2E). These results were unlikely to be due to major intrinsic differences in trainee performance, since these comparisons were performed only on the cohort of trainees who were rated both by real patients participating in the ROSCE (in year 1 of the trainees' ROSCE experience) and by professional patient actors (in year 2 of the trainees' ROSCE experience). Moreover, the correlation between ROSCE ratings by the actual patients and patient actors was poor (data not shown). Although the ratings submitted by the professional patient actors approximated a Gaussian distribution, the ratings submitted by the actual patients did not.
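As a purely illustrative aside, one common way to examine whether a set of ratings is consistent with a Gaussian distribution is a Shapiro-Wilk test, sketched below in Python with invented ratings; the article does not state how the distributional comparison was actually performed.

    # Illustrative only: test each group of ratings for consistency with a
    # normal (Gaussian) distribution using a Shapiro-Wilk test.
    import numpy as np
    from scipy import stats

    actor_ratings = np.array([4, 5, 6, 5, 7, 6, 5, 4, 6, 7, 5, 6, 5, 6, 4, 7, 5, 6])
    real_patient_ratings = np.array([9, 9, 8, 9, 9, 8, 9, 9, 9, 8, 9, 9, 8, 9, 9, 9, 8, 9])

    for label, ratings in (("patient actors", actor_ratings),
                           ("actual patients", real_patient_ratings)):
        w_stat, p_value = stats.shapiro(ratings)
        print(f"{label}: W = {w_stat:.2f}, p = {p_value:.3f}")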

Ability to detect improvement in performance.

Between 2004 and 2008, data obtained from fellows who took the examination in successive years demonstrated the ability of the ROSCE to detect performance improvements between the first and second year of training (Figure 3). All of the repeat participants demonstrated improvement in at least 2 of the 6 stations tested from one year to the next. Eleven of the 14 repeating trainees for whom data could be analyzed showed overall improvement, which was statistically significant (Figure 3). The worst performers in a given year tended to demonstrate the most significant improvement in the following year.
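By way of illustration, a paired comparison of each repeating fellow's year 1 and year 2 overall scores could be carried out as in the Python sketch below; the scores are invented, and the article does not report which statistical test was applied (a paired t-test is shown as one reasonable choice).

    # Illustrative only: paired comparison of year 1 versus year 2 overall
    # scores for 14 hypothetical repeating fellows.
    import numpy as np
    from scipy import stats

    year1 = np.array([5.2, 4.8, 6.1, 5.5, 4.9, 5.8, 6.0, 5.1, 4.7, 5.6, 5.3, 6.2, 5.0, 4.6])
    year2 = np.array([6.0, 5.9, 6.3, 6.2, 5.8, 6.1, 6.4, 5.9, 5.8, 6.0, 5.5, 6.1, 6.2, 5.7])

    t_stat, p_value = stats.ttest_rel(year2, year1)
    print(f"mean improvement {np.mean(year2 - year1):+.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")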


Figure 3. The Rheumatology Objective Structured Clinical Evaluation (ROSCE) is able to detect the change in performance between successive years. The overall ratings for the 14 individual trainees who participated in the ROSCE as both first-year (year 1) and second-year (year 2) fellows between 2004 and 2008 are shown. A, Average overall examination performance for the 14 fellows shown for year 1 (open bar) and year 2 (solid bar). Ratings are calculated by taking the average of each rater's score in all categories, and then averaging all of the raters for an aggregate score for each trainee. B, Average overall examination performance for year 1 and year 2 for each repeating fellow, indicating that the majority showed significant improvement.


Assessment of professionalism.

Using a single global assessment of professionalism, program directors in our early ROSCEs tended to rate their own trainees' professionalism more highly than the ROSCE faculty raters, but this difference did not achieve statistical significance (data not shown). Moreover, we observed poor correlation between the program directors and ROSCE faculty raters (R = 0.1). These data suggested that a single global assessment of professionalism might not be capturing professionalism as rigorously as might be desirable. To improve the performance of the ROSCE with regard to professionalism, we instituted 2 interventions. First, we rewrote our station scenarios in order to provide trainees with more challenging ethical and interpersonal interactions (Table 1). Second, to more precisely define professionalism, we instituted an expanded professionalism questionnaire to allow us to capture data more specifically and rigorously for both feedback and analytic purposes.

Under these circumstances, program directors tended to rate fellows more highly than did faculty raters, a difference that achieved statistical significance (Figure 4A). Moreover, we now observed a trend toward correlation between program directors' and ROSCE faculty global evaluations of fellows' professionalism (Figure 4B). In contrast, ROSCE faculty raters rated trainees' professionalism more highly than the patient actors (Figure 4A), and the overall correlation was strong between these 2 groups (Figure 4C). Trainees tended to be relatively critical of their own professionalism (mean ± SD self-rating 5.91 ± 0.18) (Figure 4A), with 7 of 20 giving themselves the lowest ratings of all of the evaluators. However, the 5 trainees whose professionalism self-ratings were higher than those of their other raters were among the lowest rated overall. We observed little or no correlation between ROSCE faculty ratings of trainee professionalism and the trainees' ratings of themselves (Figure 4D), as well as between program directors' evaluations and trainees' self-evaluations (data not shown).


Figure 4. Ratings of professionalism by the program directors, Rheumatology Objective Structured Clinical Evaluation (ROSCE) faculty, patient actors, and trainees. A, Overall professionalism ratings of the entire group of trainees for ROSCE year 2008 (n = 20) by the program directors, ROSCE faculty, patient raters, and trainees themselves. B, Regression analysis comparing program directors' ratings of individual trainees with those of the ROSCE faculty (scores by the ROSCE faculty represent the mean score for each fellow across all 6 stations included in the ROSCE). C, Regression analysis comparing ROSCE faculty scores with patient-raters' scores for the individual ROSCE participants. D, Comparison of the ROSCE faculty evaluations of trainee professionalism with the self-evaluation of the trainees themselves.


ROSCE usefulness.

In each year of the ROSCE, trainees, faculty, and patients were invited to give their impressions of the usefulness of the ROSCE for the purposes of training, assessment, and feedback. In year 1, the majority of trainees (8 of 12) reported that the ROSCE was extremely (3 of 12) or very (5 of 12) useful. Fellow ratings of test usefulness were similar in subsequent years, including the most recent year (2008) for which assessments were obtained. The distribution of ratings compiled between the years 2004 and 2008 is shown in Figure 5. Among the limitations to usefulness identified by trainees, some found it difficult to discuss serious issues (e.g., prognosis, sexual history) in the absence of an established relationship with their “patients.” Others thought that although the patient actors were believable, they did not fully simulate the real-life practice of rheumatology. Some trainees who rated the exercise as somewhat useful or less noted their dissatisfaction with the lack of immediate feedback on their performance. We note that other OSCEs do indeed give immediate feedback after each station; in our experience, however, the tight time structure of the OSCE makes this difficult and alters fellow performance across stations, making a consistent assessment harder to achieve. Nonetheless, we are responding to these fellow comments by piloting a restructuring for the 2009 NYC-ROSCE, in which some direct fellow feedback will be provided in real time.


Figure 5. Ratings of Rheumatology Objective Structured Clinical Evaluation (ROSCE) usefulness by the ROSCE faculty, trainees, and patient actors. Participants were asked to rate the utility of the ROSCE as extremely, very, somewhat, not very, or not at all useful. For each group of raters, data are shown as the percentage of respondents responding under each category for the years 2004–2008 (n = 59 ROSCE faculty, 78 fellows, and 47 patient raters).


In contrast to the moderately positive impressions of the trainees, faculty participants were overwhelmingly positive about the usefulness of the exercise, with the majority annually rating it as extremely or very useful. The majority of the faculty supported the ROSCE as an extremely appropriate tool for evaluating the fellows, noting in particular that it “demonstrates clinical reasoning skills and translational thought process” and “provides insight into interpersonal communication,” which are skills untested by other means. Patient evaluators were the most enthusiastic group in support of the exercise. Indeed, when asked to “rate the usefulness of this exercise in helping the doctor to learn,” 80% or more each year stated that it was extremely useful.

DISCUSSION


We studied the performance parameters of a ROSCE, based on 5 years of annual experience in an interinstitutional setting. Our data indicate that the ROSCE provides good overall concordance between the fellow ratings by program directors, faculty raters, and patient raters, although the program directors tend to rate fellows most highly and the patient raters least highly. In contrast, fellows appear to be poor self-raters, particularly in the area of professionalism. The ROSCE was able to measure improvement from the first to the second year of fellowship and was considered useful by both fellows and faculty, particularly in testing skills that are untested by other methods.

The ROSCE provides a unique opportunity to obtain a patient-centered assessment of fellows' clinical skills, professionalism, and humanistic qualities. It has the potential to directly assess ACGME-mandated competencies that traditional knowledge-based examinations, such as the rheumatology in-service examination, cannot. Although our quantitative data suggest that the ROSCE is a useful tool for measuring improvements in performance between successive years of training, these data do not rigorously distinguish between genuine skills acquisition and improvements in ROSCE test taking. However, a close correlation between our quantitative data for individual fellows and the corresponding qualitative comments from their evaluators (data not shown) leads us to infer that the observed improvement was due to genuine skills acquisition. Moreover, because fellows understand that they are expected to use the ROSCE opportunity to “put their best foot forward,” at the very least, the ROSCE can be assumed to capture fellows' attempts to perform a clinical interaction to the best of their ability.

Importantly, the ROSCE can identify specific areas in need of further improvement for both individual fellows and training programs. In particular, we believe that the ROSCE can serve as an important tool in evaluating professionalism and humanistic qualities. Combined with a 360° evaluation, the ROSCE can provide a significant portion of the feedback required in these competencies by the ACGME. Other benefits of the ROSCE include its flexibility: ROSCE topics and formats can be altered over time to accommodate new knowledge and/or competency concerns.

One potential weakness of the ROSCE is its reliance on subjective and/or semiquantitative methods of assessment. However, our experience confirms that the ROSCE can provide numerically reproducible data that will generally provide an objective correlate to program directors' own impressions and, in some cases, highlight areas in which program directors may have incorrectly assessed fellow performance. Moreover, the use of faculty from other programs to evaluate each fellow provides a level of objectivity not achievable within the programs themselves. On the other hand, it is worth noting that subjectivity may have its value, particularly in the form of free-text comments. For example, one fellow was consistently noted to be “too scary” when talking to patients, information regarding her performance that could not have been obtained in another type of examination. Another fellow whose overall performance was rated highly was nonetheless noted by several patient raters to be sitting too close during their interactions, an important and easily correctible problem.

Our most recent use of the ROSCE to rate professionalism was unique in its attempt to expand assessment of this ACGME-mandated competency. Prior OSCEs have tried to gauge professional behaviors by asking participants to provide written followup to an ethics station (9), or by using a professional decision inventory (10) in which trainees had to explain their choices using values. However, in our examination, professionalism was assessed at multiple levels through an extensive questionnaire that all evaluators taking part in the examination were asked to complete. In addition to providing a more precise view of the individual trainee's strengths and weaknesses, the global integration of the individual professionalism axes, scored by both faculty and patient actors, provided a more statistically reliable overall professionalism assessment than that of a single professionalism Likert scale. The significantly lower scores registered by the patient actors versus the ROSCE faculty suggest that the faculty may not fully empathize with the patient perspective. We speculate that physician and patient perceptions of what constitutes professionalism may be similar, but that patients may be more critical due to their inherent vulnerability in the relationship. Additionally, the finding that fellows tended to over- or underrate their own professionalism suggests that trainees may not be accurate assessors of their own professionalism. ROSCE feedback may further provide an opportunity for discussion between the fellowship director and the trainee on otherwise difficult-to-approach issues of professionalism.

Finally, our data suggest that professionally trained actors are better suited than actual patients to serve as patients in the ROSCE setting. Ratings from professional patient actors tended to show a normal distribution, whereas ratings by the real patients clustered at the top. Qualitative feedback from our faculty observers suggested that the professional patient actors were more poised, better prepared, and more responsive to trainees during the ROSCE encounters. These observations extend what is otherwise very limited literature on the relative value of actual patients versus patient actors in OSCEs. For example, Wass et al observed a relative equivalence in the use of actual patients versus patient actors, but compared these groups across different types of exercises (11).

To our knowledge, the NYC-ROSCE is the largest and most enduring examination of its kind to be undertaken with rheumatology trainees. Our stepwise expansion of this project through duplication of simultaneous circuits demonstrates the capacity of the ROSCE to service an increasing number of programs, limited only by the physical constraints of space and geography. The ability of the ROSCE to provide a well-rounded, objective, and patient-centered (as well as faculty-centered) assessment suggests that the ROSCE should be considered an important component of the rheumatology training director's toolbox.

AUTHOR CONTRIBUTIONS


All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Berman had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Berman, Lazaro, Fields, Bass, Putterman, Paget, Pillinger.

Acquisition of data. Berman, Lazaro, Fields, Bass, Weinstein, Putterman, Dwyer, Krasnokutsky, Paget, Pillinger.

Analysis and interpretation of data. Berman, Lazaro, Fields, Bass, Putterman, Krasnokutsky, Paget, Pillinger.

Acknowledgements


We would like to thank the fellows, faculty, and actors who participate in the OSCE yearly and make this possible. We thank Ms Cookie Reyes, Ms Maricel Galindez, Ms Amy Reyes, Ms Rochelle Yates, and Ms Sunaina Kumar for their logistic assistance. We especially thank Dr. Rick Brasington for laying the important foundation that made this further expansion of the ROSCE possible, and for his generous advice and inspiration.

REFERENCES


Supporting Information


Additional Supporting Information may be found in the online version of this article.

ART_24738_sm_appendix.doc (60 KB): Supplementary Appendix.

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.