Synchronous Collection of Multisource Feedback Evaluations Does Not Increase Inter-rater Reliability


  • Presented at the 2011 Society for Academic Emergency Medicine annual meeting, Boston, MA, June 2011.

  • The authors have no relevant financial information or potential conflicts of interest to disclose.

  • Supervising Editor: Terry Kowalenko, MD.

Address for correspondence and reprints: Gregory Garra, DO; e-mail:


ACADEMIC EMERGENCY MEDICINE 2011; 18:S65–S70 © 2011 by the Society for Academic Emergency Medicine


Objectives:  Most multisource feedback (MSF) evaluations are performed asynchronously, with raters reflecting on the subject’s behavior. Numerous studies have demonstrated poor inter-rater reliability of MSF. This may be due to cognitive biases that are inherent in such a process. We sought to determine whether within- and between-rater group reliability increases when evaluations are gathered synchronously and relate to a specific patient interaction.

Methods:  This was a survey study of 30 emergency medicine (EM) residents at a university emergency department (ED). ED nurses and faculty anonymously participated in asynchronous MSF assessment of resident performance from February to April 2010 using a Web-based survey, the Emergency Medicine Humanism Scale (EM-HS). In May 2010, a second round of MSF collection was conducted in the ED. At the conclusion of patient encounters, the EM-HS was synchronously obtained from ED nurses and faculty. Evaluators were instructed to assess the resident based on the patient encounter, placing aside any preconceptions of resident performance, attitude, or behavior. Evaluators rated resident performance using a 1–9 scale (“needs improvement” to “outstanding”). The mean rating for each question and the total score provided by each evaluator class were calculated for each EM resident. Asynchronous and synchronous ratings were compared with t-tests. Pearson correlations were used to measure agreement in scores within and between nurse and faculty rater groups. Correlations > 0.70 were deemed acceptable and are reported with 95% confidence intervals (CIs).
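
The correlation analysis described in the Methods can be illustrated with a minimal sketch. The abstract does not state how the 95% CIs were computed; the Fisher z-transformation used here is an assumption, and the function names (pearson_r, fisher_ci, is_acceptable) are hypothetical helpers written for illustration, not the authors' code. Inputs would be one mean total score per resident from each collection method or rater group.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of per-resident scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for r via the Fisher z-transformation (assumed method).

    Requires n > 3 and |r| < 1.
    """
    if n <= 3:
        raise ValueError("Fisher interval needs more than 3 paired observations")
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def is_acceptable(r, threshold=0.70):
    """Apply the abstract's stated cutoff: correlations > 0.70 are acceptable."""
    return r > threshold
```

For example, fisher_ci(0.63, 21) returns an interval of roughly 0.27 to 0.83, and is_acceptable(0.63) is False, so a correlation of that size across 21 residents would fall short of the 0.70 cutoff.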

Results:  Twenty-one of 30 residents had assessments collected by both asynchronous and synchronous methods. A total of 699 Web-based (asynchronous) assessments were completed by nurses and 149 by faculty. Synchronous nurse and faculty assessments were obtained in 105 resident–patient encounters. There was no difference in faculty ratings between the MSF collection methods. Nurses assigned slightly (but significantly) higher ratings during synchronous collection. Correlation of the total MSF score between asynchronous and synchronous feedback collection methods within the faculty rater group was poor (0.18, 95% CI = −0.22 to 0.60). Correlation of the total MSF score between asynchronous and synchronous feedback collection methods within the nurse rater group was moderate (0.63, 95% CI = 0.27 to 0.83). Correlations between the faculty and nurse rater groups for the total MSF score collected asynchronously and synchronously were moderate (0.39, 95% CI = −0.05 to 0.70; and 0.44, 95% CI = 0.01 to 0.73, respectively).

Conclusions:  Synchronous collection of MSF did not provide clinically different EM-HS scores within rater groups and did not result in improved correlations. Our small, single-center study supports asynchronous collection of MSF.