Real-Time Inter-Rater Reliability of the Council of Emergency Medicine Residency Directors Standardized Direct Observation Assessment Tool

Authors

  • Joseph LaMantia MD
  • Bryan Kane MD
  • Lalena Yarris MD
  • Anthony Tadros BA
  • Mary Frances Ward RN, ANP
  • Martin Lesser PhD
  • Philip Shayne MD
  • The SDOT Study Group II

From the Department of Emergency Medicine, North Shore University Hospital (JL, AT, MWF), Manhasset, NY; the Department of Emergency Medicine, Lehigh Valley Hospital-Muhlenberg (BK), Bethlehem, PA; the Department of Emergency Medicine, Oregon Health and Science University (LY), Portland, OR; the Department of Biostatistics, Feinstein Institute for Medical Research (ML), Manhasset, NY; and the Department of Emergency Medicine, Emory University School of Medicine (PS), Atlanta, GA.

  • Funding Sources: None.

  • CoI: The authors have no financial conflicts to report.

  • Presented: 2008 National SAEM Annual Meeting, Washington, DC, May 29, 2008; 2008 Southeastern Regional SAEM Meeting, Louisville, KY, March 14–15, 2008; 2008 Western Regional SAEM Meeting, Costa Mesa, CA, March 28–29, 2008; 2008 New York Regional SAEM Conference, New York, NY, April 30, 2008; 2008 New England Regional SAEM Meeting, Shrewsbury, MA, April 30, 2008.

  • We would like to thank the attending physicians at each site who participated in this study.

Address for correspondence and reprints: Joseph LaMantia, MD; e-mail: jlamanti@nshs.edu.

Abstract

Objectives:  Developed by the Council of Emergency Medicine Residency Directors (CORD), the standardized direct observation assessment tool (SDOT) is an evaluation instrument used to assess residents’ clinical skills in the emergency department (ED). In a previous study examining the inter-rater agreement of the tool, faculty scored simulated resident–patient encounters. The objective of the present study was to evaluate the inter-rater agreement of the SDOT in real-time evaluations of residents in the ED.

Methods:  This was a multi-center, prospective, observational study in which faculty raters were paired to simultaneously observe and independently evaluate a resident’s clinical performance using the SDOT. Data collected from eight emergency medicine (EM) residency programs comprised 99 unique resident–patient encounters, each rated on 26 individual behaviors related to specific core competencies, global evaluation scores for each core competency, and an overall clinical competency score. Inter-rater agreement was assessed using percentage agreement analyses with three constructs: exact agreement, liberal agreement, and binary (pass/fail) agreement.

Results:  Inter-rater agreement between faculty raters varied according to category of measure used. Exact agreement ranged from poor to good, depending on the measure: the overall competency score (good), the competency score for each of the six core competencies (poor to good), and the individual item scores (fair to very good). Liberal agreement and binary agreement were excellent for the overall competency score and the competency score for each of the six core competencies and very good to excellent for the individual item scores.

Conclusions:  The SDOT demonstrated excellent inter-rater agreement when analyzed with liberal agreement and when dichotomized as a pass/fail measure and fair to good agreement for most measures with exact agreement. The SDOT can be useful and reliable when evaluating residents’ clinical skills in the ED, particularly as it relates to marginal performance.

Introduction

The Accreditation Council for Graduate Medical Education (ACGME) requires residency programs to provide documentation, through a variety of outcome assessment measures, that residents have successfully achieved proficiency in the six core competencies. Developed in 2001, the six core competencies include medical knowledge, patient care, professionalism, interpersonal and communication skills, practice-based learning and improvement, and systems-based practice. 1 The ACGME has suggested that direct observation can be useful for outcomes assessment of these competencies and should be included as part of a program’s core assessment system. 2 In addition, at the 2002 Academic Emergency Medicine consensus conference on the core competencies in emergency medicine (EM), an expert panel of EM educators suggested that direct observation of residents providing care in the emergency department (ED) was a particularly valuable opportunity for assessment that took advantage of constant faculty presence in the ED. 3 In response to this, the Council of Emergency Medicine Residency Directors (CORD) developed the standardized direct observation assessment tool (SDOT). The SDOT provides a comprehensive method through direct observation for assessing residents providing clinical care in the ED.

The SDOT lists 26 expected behaviors in a physician–patient encounter and asks the observer to rate the resident on each item as needs improvement, at expected, or above expected performance. Each item is linked to one or more of five core competencies; the SDOT developers believed that a sixth core competency, practice-based learning and improvement, was less well determined than the others during the clinical encounter in the ED. The SDOT includes behavioral anchors for each item to guide raters in completing the form. In addition to the 26 individual items, the version of the SDOT used in our study also contains a global assessment score for each of the six core competencies (a later version of the SDOT did not include an assessment of practice-based learning and improvement, for the reason noted above), as well as a global score for overall competency.

To date, the inter-rater agreement of the SDOT has been assessed only by faculty observing scripted video resident–patient encounters; 4 the inter-rater reliability of the tool has not been evaluated in the clinical setting. The purpose of this study was to evaluate the inter-rater agreement of the SDOT in real-time evaluations of residents in the ED.

Methods

Study Design

This was a multi-center, prospective, observational study in which two EM faculty raters from each of eight EM residency program sites were paired to simultaneously observe and independently evaluate the clinical performance of EM residents using the SDOT. This study was approved by the Institutional Review Board at each participating site.

Study Setting and Population

Eight EM residency programs participated in this study. Programs were located in tertiary care academic medical centers throughout the country. The distribution of program type and length was allopathic EM1 to EM3 (n = 6), allopathic EM1 to EM4 (n = 1), and osteopathic EM1 to EM4 (n = 1).

Study Protocol

Members of the CORD Standardized Evaluation Methods Committee developed and coordinated this study. A CORD SDOT Standardized Evaluation Methods Committee member at each participating site served as the site principal investigator (PI) and was responsible for all aspects of implementation and regulatory compliance. Full-time EM faculty physicians served as raters on a voluntary basis after providing verbal consent to participate in the study. Residents in training and fellows were not eligible to serve as raters, nor were faculty members who served on the SDOT Standardized Evaluation Methods Committee.

In preparation for the observation sessions, faculty raters took part in a 15- to 30-minute orientation to the SDOT conducted by the site PI encompassing a description of the tool and the behavioral anchors. An additional 15 to 30 minutes was spent viewing and scoring videotaped clinical encounters to demonstrate proper use of the tool. A questionnaire was sent to participating sites to ensure that adequate briefing had occurred and to gather information regarding rater demographics and training.

After completing training with the site PI, each member of the faculty pair simultaneously observed individual residents providing clinical care to patients in the ED on a convenience basis and used the SDOT to evaluate those residents. During the assigned observational session, faculty did not have resident supervisory or direct patient care responsibility. Only EM residents were observed in the study; rotating non-EM residents were excluded. Throughout the study, a faculty pair did not observe any resident more than once. Evaluators were also asked to distribute the observations as evenly as possible across the years of training. With all scales, evaluators were instructed to score residents based on the performance expected at their given level of training.

For the purpose of this study, a resident–patient encounter was defined as a period of observation that began when the EM resident first encountered the patient. The observational period ended when both raters independently determined that sufficient information had been gathered to adequately complete the SDOT. This sometimes resulted in one rater completing the observation before the other rater in the pair. The two faculty were not permitted to compare their scores or observations at any time during the study. Completed SDOT forms were given to the site PI and were stored in a secure research file with limited access. This process was repeated at each site until site enrollment goals were met. The number of encounters scored was roughly equivalent between the sites, ranging from 10 to 17.

Upon collection of forms, the PI ensured that the confidentiality of patients, residents, and faculty had been preserved. Hard copies were then mailed to the coordinating center. De-identified data were entered into a Microsoft Access (Microsoft Corp., Redmond, WA) database, and hard copies were stored in a secure file with limited access to investigators.

Measures

The primary outcome variables were the two faculty ratings for each of the 26 individual item scores, the global assessment scores for each of the six core competencies, and the overall clinical competency score. A 3-point Likert scale was used to rate the 26 individual items (1 = needs improvement, 2 = meets expectations, 3 = above expectations, with a fourth option of 4 = not assessed). A 3-point Likert scale was also used to rate the overall clinical competency (1 = needs improvement, 2 = meets expectation, 3 = above expectation).

The ratings for the global assessments of six core competencies were made on a 5-point Likert scale, with the descriptors needs improvement, meets expectations, and above expectations listed above the numbers 1, 3, and 5, respectively.

The primary statistic to measure agreement was the proportion of raters who were in agreement. Three different criteria for agreement were applied to the data: exact agreement, liberal agreement, and binary (pass/fail) agreement. Two raters were said to be in exact agreement if both assessments of the resident were identical; otherwise, it was considered no agreement. Two raters were said to be in liberal agreement if Rater A differed from Rater B by at most one Likert point; otherwise, it was considered no agreement.

Regarding binary (pass/fail) agreement, each of the three assessment scores was dichotomized into two categories: needs improvement and meets or above expectation. For the 26 individual item scores and the overall competency score, the 3-point Likert scale was defined as category 1 (#1) = needs improvement and category 2 (#2 and #3) = meets or above expectation. For the six global scores for each of the core competencies, the 5-point Likert scale was defined as category 1 (#1 and #2) = needs improvement and category 2 (#3, #4, and #5) = meets or above expectation. Two raters were said to be in binary agreement if both assessments of the resident were of identical categories; otherwise, it was considered no agreement.

In all three analyses, if Rater A or Rater B did not provide an assessment of a given resident, then that rater pair did not contribute to the calculation of the agreement statistic. For the individual item scores, responses of N/A were included and treated as an additional response category.
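To make the three agreement constructs concrete, the sketch below shows one way the pairwise comparisons could be computed. This is an illustrative Python implementation, not the study’s analysis code; the function and variable names are ours, and the dichotomization cutoffs follow the category definitions given above.

```python
# Illustrative sketch (not the study's analysis code) of the three
# agreement constructs applied to one pair of ratings.

def exact_agreement(a, b):
    """Exact agreement: both raters gave identical scores."""
    return a == b

def liberal_agreement(a, b):
    """Liberal agreement: scores differ by at most one Likert point."""
    return abs(a - b) <= 1

def binary_agreement(a, b, scale_points):
    """Binary (pass/fail) agreement after dichotomizing the scale.

    For the 3-point items and the overall competency score, category 1
    is a rating of 1 (needs improvement) and category 2 is a rating of
    2 or 3. For the 5-point global competency scores, category 1 is
    1-2 and category 2 is 3-5.
    """
    cutoff = 1 if scale_points == 3 else 2  # highest "needs improvement" rating
    return (a <= cutoff) == (b <= cutoff)

# Example: two raters score a 5-point global competency item as 3 and 5.
a, b = 3, 5
print(exact_agreement(a, b))                    # False
print(liberal_agreement(a, b))                  # False (differ by 2 points)
print(binary_agreement(a, b, scale_points=5))   # True (both "meets or above")
```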

Data Analysis

The enrollment goals were based on our initial power analysis, which employed kappa as the main statistical test and required an even distribution of encounters at each site. Upon analysis of the final data set, kappa was deemed unstable, and percentage agreement was chosen for the final analysis. Furthermore, despite requesting a specific number of subjects for examination at each site, there were variations in the number of subjects enrolled at each site.

The statistical objective was to compute item- and competency-specific measures of agreement pooled across institutions. For each item and each competency and within each institution, the proportion of raters in agreement was computed as the number of rater pairs in agreement divided by the total number of rater pairs providing assessments. The institution-specific proportions were then pooled by computing a weighted mean of proportions appropriate for stratified variables, with institution as the stratification variable. 5 Likewise, a weighted variance with standard deviation (SD) was computed. A 95% confidence interval (CI) around the pooled proportion was calculated for each item and competency score.
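The exact weighting scheme comes from the cited statistics reference and is not reproduced in the paper; as a rough sketch only, assuming weights proportional to the number of rater pairs contributing at each site and a within-stratum binomial variance, the pooling might look like the following. All function and variable names are ours.

```python
import math

def pooled_agreement(site_counts):
    """Pool site-specific agreement proportions for one SDOT item.

    site_counts: list of (n_agree, n_pairs) tuples, one per institution.
    Sketch only: weights each site by its share of rater pairs and uses
    a within-stratum binomial variance to form a 95% CI; the study
    followed the stratified-estimation approach of its cited reference.
    """
    total_pairs = sum(n for _, n in site_counts)
    # Weighted mean of site proportions with weights w_i = n_i / N.
    p_pooled = sum(agree for agree, _ in site_counts) / total_pairs
    # Stratified variance: sum of w_i^2 * p_i * (1 - p_i) / n_i.
    var = 0.0
    for agree, n in site_counts:
        p_i = agree / n
        w_i = n / total_pairs
        var += (w_i ** 2) * p_i * (1 - p_i) / n
    se = math.sqrt(var)
    ci = (100 * (p_pooled - 1.96 * se), 100 * (p_pooled + 1.96 * se))
    return 100 * p_pooled, ci

# Hypothetical data: (pairs in agreement, total pairs) at three sites.
print(pooled_agreement([(12, 14), (10, 12), (15, 16)]))
```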

As previously mentioned, the more commonly used kappa statistic, which measures chance-corrected agreement between raters, was not used in this analysis because of unbalanced marginal totals, which can cause kappa to take on counterintuitive or misleading values. In such cases, the use of uncorrected agreement has been recommended by Feinstein and Cicchetti. 6,7
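The effect of unbalanced marginal totals on kappa can be seen with a small hypothetical example (the numbers below are invented for illustration and are not drawn from the study data): when nearly all ratings fall in one category, raw agreement can be high while kappa remains modest.

```python
# Hypothetical 2x2 table of paired pass/fail ratings (not study data):
# rows = Rater A, columns = Rater B; most ratings fall in one category.
table = [[80, 10],
         [5,   5]]

n = sum(sum(row) for row in table)
observed = (table[0][0] + table[1][1]) / n            # proportion in agreement

# Expected chance agreement computed from the marginal totals.
row_margins = [sum(row) / n for row in table]
col_margins = [(table[0][j] + table[1][j]) / n for j in range(2)]
expected = sum(r * c for r, c in zip(row_margins, col_margins))

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement = {observed:.2f}")   # 0.85
print(f"kappa = {kappa:.2f}")                   # ~0.32 despite 85% raw agreement
```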

Results

Ninety-nine unique resident–patient encounters were observed and evaluated. Table 1 and Figure 1 show the percent agreement for the overall competency score. The global score for each of the six core competencies is shown in Table 1 and Figure 2, and the 26 individual item scores, as calculated with liberal agreement, binary (pass/fail) agreement, and exact agreement, are shown in Table 2. The 95% CIs for each calculation are displayed as error bars on the graphs. The number of interactions (n) used in the calculation of the percentage agreement is listed below each score or item in the graphs. For the individual item scores, the results are displayed on four graphs (Figure 3 a–d), grouped according to the categories of clinical behaviors as listed in the SDOT (data gathering, data synthesis and differential diagnosis, patient management, and patient disposition).

Table 1. Agreement Between Physician Evaluators Regarding Resident Performance Overall and in Each of the Six Core Competencies

Global Competency Item | Interactions, n | Liberal Agreement, % (95% CI) | Binary Agreement, % (95% CI) | Exact Agreement, % (95% CI)
Overall agreement | 82 | 100.0 (100.0–100.0) | 94.7 (90.9–98.5) | 68.9 (59.6–78.6)
Patient care | 97 | 92.5 (87.5–97.5) | 94.0 (89.5–98.5) | 46.9 (37.3–56.4)
Medical knowledge | 97 | 95.0 (90.7–99.2) | 93.7 (89.1–98.3) | 64.5 (55.6–73.4)
Practice-based learning and improvement | 71 | 93.8 (88.4–99.1) | 94.8 (90.0–99.7) | 62.3 (52.8–71.9)
Interpersonal and communication skills | 96 | 93.4 (88.7–98.0) | 94.8 (90.6–99.0) | 44.3 (34.8–53.8)
Professionalism | 96 | 95.6 (91.8–99.4) | 98.8 (96.8–100.0) | 44.0 (34.7–53.3)
Systems-based practice | 92 | 94.6 (90.3–98.9) | 94.6 (90.9–98.4) | 58.1 (48.8–67.4)
Figure 1.

 Overall clinical competency agreement. Agreement between physician evaluators regarding residents’ overall clinical competency. The 95% confidence intervals are displayed as error bars.

Figure 2.

 Agreement between physician evaluators regarding resident performance in each of the six core competencies. Individual items were classified into six different categories to develop a global assessment score. The 95% confidence intervals are displayed as error bars.

Table 2. Agreement Between Physician Evaluators Regarding Resident Performance on Individual Standardized Direct Observation Assessment Tool Items

Question | Interactions, n | Liberal Agreement, % (95% CI) | Binary Agreement, % (95% CI) | Exact Agreement, % (95% CI)
1. Privacy or confidentiality | 96 | 99.2 (97.2–101.2) | 92.0 (87.0–96.9) | 66.3 (57.4–75.1)
2. Professional or communicates well | 96 | 99.0 (97.0–100.0) | 95.4 (91.2–99.6) | 69.4 (60.6–78.3)
3. Language translation | 95 | 80.2 (72.7–87.7) | 74.1 (66.0–82.2) | 73.0 (64.7–81.2)
4. Accurate or essential information gathering | 91 | 95.5 (91.5–99.5) | 84.8 (78.2–91.4) | 61.5 (51.7–71.3)
5. Appropriate physical examination | 94 | 98.1 (95.3–101) | 89.2 (83.5–95.0) | 69.0 (59.8–78.1)
6. Explains pathology | 76 | 87.3 (81.1–93.4) | 85.1 (78.1–92.1) | 71.5 (62.2–80.8)
7. Case presentation | 95 | 100.0 (100.0–100.0) | 96.7 (93.3–100.0) | 67.3 (58.6–76.0)
8. Differential diagnosis, treatment, disposition plan | 96 | 97.8 (95.1–100.0) | 87.6 (81.5–93.6) | 68.4 (59.7–77.1)
9. Benefits, risks, indicators for procedure | 90 | 63.2 (54.8–71.7) | 59.2 (50.2–68.2) | 53.1 (43.7–62.5)
10. Sequences critical actions | 92 | 87.8 (82.5–93.2) | 83.3 (77.4–89.2) | 69.2 (61.1–77.2)
11. Competently performs procedure | 92 | 76.9 (68.5–85.2) | 70.1 (61.0–79.2) | 68.0 (58.7–77.3)
12. Communicates with colleagues and ancillary staff | 95 | 93.2 (89.5–96.8) | 92.1 (87.9–96.2) | 69.6 (61.4–77.8)
13. Resolves conflicts | 96 | 65.0 (56.3–73.8) | 61.3 (52.1–70.4) | 51.3 (41.5–61.1)
14. Communicates care plan | 88 | 81.1 (74.2–88.1) | 80.1 (72.7–87.6) | 71.5 (62.5–80.4)
15. Charting | 81 | 82.5 (75.8–89.2) | 80.0 (72.8–87.3) | 66.4 (56.7–76.1)
16. Prioritizes patients | 95 | 76.2 (68.7–83.8) | 74.6 (66.8–82.4) | 58.7 (49.8–67.6)
17. Plans appropriate for system resources | 95 | 78.1 (72.6–83.5) | 72.4 (65.4–79.5) | 57.6 (49.0–66.1)
18. Plans appropriate for patient services | 96 | 79.6 (73.2–86.1) | 75.9 (68.5–83.3) | 65.3 (56.7–73.9)
19. Remains focused | 95 | 95.5 (91.6–99.3) | 90.4 (85.1–95.6) | 67.3 (58.9–75.8)
20. Uses appropriate info for making decisions | 95 | 98.0 (95.2–100.0) | 92.5 (87.2–97.9) | 79.0 (71.1–86.8)
21. Reevaluates patients | 93 | 74.7 (66.6–82.9) | 69.0 (60.4–77.6) | 54.5 (44.6–64.4)
22. Documents reassessed | 87 | 69.4 (60.5–78.2) | 66.7 (57.7–75.6) | 56.5 (47.0–66.1)
23. Uses social work resources | 94 | 76.9 (68.5–85.2) | 72.5 (63.9–81.0) | 68.7 (59.9–77.5)
24. Communicates discharge plan with patients | 88 | 82.5 (75.0–89.9) | 82.9 (75.6–90.2) | 78.4 (70.2–86.6)
25. Carries out appropriate discharge plan | 92 | 85.9 (80.0–91.8) | 84.5 (78.0–91.0) | 66.1 (57.6–74.6)
26. Appropriate patient follow-up | 91 | 74.6 (66.2–82.9) | 75.0 (66.8–83.2) | 68.1 (59.0–77.1)
Figure 3.

 Agreement between physician evaluators regarding resident performance on individual standardized direct observation assessment tool (SDOT) items. (a) Individual items: data gathering. (b) Individual items: data synthesis and differential diagnosis. (c) Individual items: patient management. (d) Individual items: patient disposition. Individual items are divided into separate graphs based upon the order of the SDOT. The 95% confidence intervals are displayed as error bars.

Discussion

In shifting its focus from process considerations to outcome assessment of resident core competencies, the ACGME has placed an increasing demand on EM residency programs to develop tools that are valid, reliable, and feasible for use. In a prior study examining the inter-rater reliability of the SDOT, the tool was found to have good reliability when used to assess simulated resident–patient encounters. 4 Although the assessments were provided by a large number of faculty participants (n = 82), the study was limited by the small number of encounters assessed (n = 2) and the artificial nature of the encounters (scripted scenarios with physician volunteers as actors). In this study, we investigated the inter-rater reliability of the SDOT in the clinical setting, involving a large number of clinical encounters (n = 99) and real patients and physicians in real time.

We analyzed the three components of the SDOT (overall competency, global performance for each core competency, and performance on specific behaviors) with three different constructs: exact agreement, liberal agreement, and binary (pass/fail) agreement. For all three components, percentage agreement for the exact agreement construct was only fair to good and, in certain instances (e.g., global core competency scores for patient care, interpersonal and communication skills, and professionalism), appeared to be no better than chance. Percentage agreement with the exact agreement construct was in general better for the overall competency score (68.9%) and for the individual item scores (51.3% to 79.0%).

Regarding the percentage agreement for the exact agreement construct, it is not surprising that exact agreement was seen to be variable and, at times, poor. Studies point to the significant variability in faculty observation and evaluation of EM resident clinical behavior. 8 What is surprising is that percentage agreement was poor for the core competencies of patient care, interpersonal and communication skills, and professionalism. One would expect direct observation of residents in the clinical setting to provide particularly robust information about these competencies that might not otherwise be apparent using other, less-direct means of evaluation. It is not clear, then, why percentage agreement was lower in regard to these competencies.

Percentage agreement, in contrast, was very good to excellent for the liberal agreement construct. Although one might expect this for the scoring of the individual behavior items and the overall competency score, because these are rated on only 3-point scales, agreement was also excellent for the global scores for each of the six core competencies, which were rated on 5-point scales; for these, agreement ranged from 92.5% to 95.6%.

The most useful findings in our study relate to the very good to excellent percentage agreement across all categories of performance when the data are examined using the binary (pass/fail) construct. With this construct, we divided the scoring categories into marginal to unacceptable performance (needs improvement) versus acceptable performance (meets or above expectations). Percentage agreement for the global scores related to the core competencies and for the overall competency score ranged from 93.7% to 98.8%. Although there was variability in the range of percentage agreement for individual behaviors observed, similar to that observed with the exact agreement and liberal agreement constructs, percentage agreement overall was very good to excellent, ranging from 59.2% to 96.7%.

Across all three constructs, there were a number of observed behaviors for which the percentage agreement tended to be lower than for other behaviors. These included #9 (explains benefits and risks of procedure), #13 (resolves conflicts), #16 (prioritizes patients), #17 (makes plans appropriate for system resources), #21 (reevaluates patients), and #22 (documents reassessment). It is not clear why these items show lower agreement, but it may relate to the complexity of the nature or description of the behavior, the difficulty of observing the behavior in the clinical setting, insufficient rater training to properly observe and assess these behaviors, or other factors.

Of note is the number of interactions scored in the percentage agreement calculation for the practice-based learning and improvement competency. The number scored was 71, well below the numbers for the other five competencies, which ranged from 92 to 97 interactions. This indicates that a significant number of evaluators were unable to score the residents’ practice-based learning and improvement competency based on their observation of ED clinical care, supporting the contention of the SDOT developers that this competency is not well determined by direct observation in the ED. Also, regarding the number of interactions recorded for the listed behaviors, several, including #6 (explains pathology) and #15 (charting), were notably less often completed. This may relate to a relative lack of opportunity to observe a behavior that is more rarely demonstrated (e.g., #6, explains pathology) or to a lack of attention to less frequently assessed behaviors (e.g., #15, charting). These findings perhaps point to the need to observe a larger number of interactions to ensure observation of a broad range of clinical behaviors, or to a need for faculty to focus on less obvious but equally important components of clinical care, namely charting.

Limitations

The main limitation of the study is the unknown effect of the faculty evaluators’ prior exposure to the residents’ performance and what bias this may have introduced into the rating of the residents across the different types of measurements. One might expect this bias to be particularly evident in the overall competency score and the global scores for each of the core competencies, given that these are general assessments that faculty might score based partly upon prior knowledge of the residents’ performance. The data show that there was greater variability in percentage agreement when performance with specific behaviors was observed. Perhaps this limitation could be overcome with a study that involves EM residents at the beginning of their training, when faculty have had no prior exposure to the residents in the training program.

Another limitation of the study is the unknown effect of rater training and instrument design on the aspects of reliability studied. Future reliability studies can address these important issues with study designs that use more formal and comprehensive rater training programs and that direct attention to possible revision of individual SDOT items, informed by our analysis and that of the prior study 4 examining the reliability of the SDOT. Also, lessons may be learned in this regard from the American Board of Emergency Medicine oral board certification process, an examination process that assesses clinical skills and has been shown to demonstrate high inter-examiner agreement. 9

Conclusions

High levels of agreement were noted for the overall competency score and the global scores for each of the core competencies when analyzed using a binary (pass/fail) construct. In addition, with this construct, very good levels of agreement were noted overall when individual resident behaviors were observed and evaluated. This favorable level of agreement across all types of evaluation during direct observation of residents in the ED points to the value of the SDOT as a potential tool for remediation and direct feedback, particularly with regard to marginal clinical performance. In addition, because of its high level of agreement when evaluating the core competencies of residents providing clinical care in the ED, it appears to be a useful tool that offers a dependable measure for residency educators interested in advancing their assessment programs according to the mandates of the ACGME.

Appendix

The SDOT Study Group II

Joseph LaMantia, MD, North Shore University Hospital, Department of Emergency Medicine, Manhasset, NY; Bryan Kane, MD, Lehigh Valley Hospital-Muhlenberg, Emergency Medicine, Bethlehem, PA; Lalena Yarris, MD, Oregon Health & Science University, Department of Emergency Medicine, Portland, OR; Anthony Tadros, North Shore University Hospital, Department of Emergency Medicine, Manhasset, NY; Mary Francis Ward, RN, ANP, North Shore University Hospital, Department of Emergency Medicine, Manhasset, NY; Martin Lesser, PhD, Feinstein Institute for Medical Research, Department of Biostatistics, Manhasset, NY; Philip Shayne, MD, Emory University School of Medicine, Department of Emergency Medicine, Atlanta, GA; Patrick Brunett, MD, Oregon Health and Science University, Emergency Medicine, Portland, OR; Chris Kyriakedes, DO, Akron General Medical Center, Emergency Medicine, Akron, OH; Stephen Rinnert, MD, SUNY Downstate Medical Center, Emergency Medicine, Brooklyn, NY; Joseph Schimdt, MD, Baystate Health, Emergency Medicine, Springfield, MA; David Wald, DO, Temple University, Emergency Medicine, Philadelphia, PA; Meredith Akerman, MS, Feinstein Institute for Medical Research, Department of Biostatistics, Manhasset, NY; Elayne Livote, MS, Feinstein Institute for Medical Research, Department of Biostatistics, Manhasset, NY; David Soohoo, North Shore University Hospital, Department of Emergency Medicine, Manhasset, NY; Jonathan Gong, North Shore University Hospital, Department of Emergency Medicine, Manhasset, NY.
