Inaccuracy of the Global Assessment Score in the Emergency Medicine Standard Letter of Recommendation
Supervising Editor: Nicole DeIorio, MD.
Address for correspondence and reprints: Leslie C. Oyama, MD; e-mail: email@example.com.
Objectives: The standard letter of recommendation (SLOR) is used by most emergency medicine (EM) faculty to submit evaluations for medical students applying for EM residency programs. In the global assessment score (GAS) section, there is a crucial summative question that asks letter writers to estimate the applicant’s rank order list (ROL) position in their own program. The primary aim of the study was to determine if these estimated global assessment tiers agreed with the actual ROL, using the criteria recommended by the Council of Emergency Medicine Residency Directors (CORD).
Methods: Data from SLORs written by EM faculty from five California institutions were retrospectively collected from the 2008–2009 residency application year. Descriptive and comparative statistical analyses were performed using the documented GAS tiers and actual ROL positions.
Results: A total of 105 SLORs were reviewed from the five participating institutions. Three SLORs were excluded and 102 were analyzed. Only 27 (26%) SLORs documented a GAS tier that accurately predicted the applicant’s actual ROL position. The GAS tier overestimated the applicant’s position on the ROL in 67 (66%) SLORs, whereas it underestimated the position in eight (8%) SLORs. Accuracy was poor regardless of the number of letter writers on the SLOR (p = 0.890), the writer’s administrative title (p = 0.326), whether the student was a home or visiting student (p = 0.801), or if the student had prior EM rotation experience (p = 0.793).
Conclusions: Standard letter of recommendation writers are inaccurate in estimating the ROL position of the applicant using the GAS tier criteria. The GAS tiers were accurate only 26% of the time. Because of the valuable role that the SLOR plays in determining an applicant’s competitiveness in the National Resident Matching Program (NRMP) in EM, future discussion should focus on improving the consistency and accuracy of the GAS section. Furthermore, there needs to be a national dialogue to reassess the utility of the criterion-based GAS within the SLOR.
ACADEMIC EMERGENCY MEDICINE 2010; 17:S38–S41 © 2010 by the Society for Academic Emergency Medicine
Senior medical students annually apply for admission into residency training programs through the National Resident Matching Program (NRMP). Many important evaluative components factor into the residency match selection process, including the applicant’s United States Medical Licensing Examination (USMLE) scores, the medical student performance evaluation from the dean of the medical school, clinical rotation grades, extracurricular activities, and medical school reputation.1–3
Letters of recommendation also play a critical role in a medical student’s application. For emergency medicine (EM) faculty writing letters for EM-bound applicants, the Council of Emergency Medicine Residency Directors (CORD) encourages the use of a standard letter of recommendation (SLOR), which CORD developed in 1996 to replace the traditional narrative letter of recommendation (NLOR).4 The purposes of creating the SLOR were to standardize the list of observed personal and clinical characteristics assessed among applicants, increase interrater reliability among letter writers, and reduce the time residency faculty spend reading and evaluating applications.5,6
The intention of the SLOR is to serve as a tool for residency programs to gain an accurate assessment of the candidate. It is composed of four sections: background, qualifications for EM, global assessment, and written comments. SLOR readers focus especially on the global assessment section, because it provides a summary evaluation of the applicant and his/her true competitiveness.7
In the global assessment section, the letter writer is asked to rank the applicant as outstanding (top 10%), excellent (top one-third), very good (middle one-third), or good (lower one-third). Additionally in this section, the letter writer answers the question “How highly would you estimate the candidate will reside on your match list?” The applicant is then categorized into one of four predefined global assessment score (GAS) tiers: very competitive, competitive, possible match, and unlikely match. These categories are defined based on multiples of that residency program’s total number of postgraduate year (PGY)-1 positions available in the NRMP, according to SLOR instructions on the CORD website (Table 1).4
Table 1. Definition of GAS by ROL

| GAS Tier | Criterion (x = Number of PGY-1 Positions) | Example (Program With 10 PGY-1 Positions) |
| --- | --- | --- |
| Very competitive | Within 2x on ROL | ≤20 on ROL |
| Competitive | Between 2x and 4x on ROL | 21–40 on ROL |
| Possible match | Between 4x and 6x on ROL | 41–60 on ROL |
| Unlikely match | Greater than 6x on ROL | >60 on ROL |
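The tier criteria above amount to a simple threshold rule on multiples of a program’s PGY-1 class size. A minimal sketch follows; the function name is ours, and treating each cutoff as inclusive is an assumption, since the CORD instructions state the multiples but not the edge cases.

```python
def gas_tier(rol_position, pgy1_positions):
    """Classify an actual ROL position into a CORD GAS tier.

    Cutoffs are multiples of the program's PGY-1 class size. Treating
    each cutoff as inclusive (<=) is an assumption; the CORD
    instructions state the multiples but not the edge cases.
    """
    if rol_position <= 2 * pgy1_positions:
        return "very competitive"
    if rol_position <= 4 * pgy1_positions:
        return "competitive"
    if rol_position <= 6 * pgy1_positions:
        return "possible match"
    return "unlikely match"

# For a program with 10 PGY-1 positions, as in the example column above:
# positions 1-20 -> very competitive, 21-40 -> competitive,
# 41-60 -> possible match, >60 -> unlikely match
```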
In this study, we assessed the agreement between the documented GAS tiers on the SLOR and the applicants’ actual rank order list (ROL) positions in the letter writer’s residency program during a single application cycle at five California residency programs. We hypothesized that the GAS tiers poorly estimated the ROL positions and that the GAS in SLORs written by the program director (PD), associate or assistant program director (APD), and clerkship director (CD) were more accurate than those written by other faculty. We secondarily hypothesized that there would be no difference in GAS and ROL tier agreement by visiting student status, the number of EM rotations, or the number of authors who wrote the SLOR. To our knowledge, this type of systematic evaluation of the SLOR GAS tiers and ROL positions has not been previously published.
Methods

This was a multicenter, retrospective study assessing the accuracy of the SLOR letter writers’ GAS in estimating the applicants’ actual ROL position from five California EM residency programs during the 2008–2009 residency application cycle. Institutional review board approval was granted by each of the five participating institutions: the University of California–Los Angeles (UCLA), the University of California–San Diego (UCSD), the University of California–San Francisco (UCSF), the University of California–San Francisco at Fresno (UCSF-Fresno), and the University of Southern California (USC).
Study Setting and Population
The five institutions in this study were selected as they represent a wide range of program sizes and formats. These programs matched 15 (UCLA-Harbor), eight (UCSD), 12 (UCSF-General), 10 (UCSF-Fresno), and 17 (USC) PGY-1 residents in 2009. The population studied included the faculty SLOR letter writers at these five institutions in the 2008–2009 residency match process.
Each of the five residency programs in the study identified SLORs written by the program’s faculty during the 2008–2009 residency application cycle. Each site coordinator completed a data collection instrument, designed on a Microsoft Excel (Microsoft Inc., Redmond, WA) spreadsheet. Data collected included the GAS tier from each SLOR (very competitive, competitive, possible match, unlikely match); the actual corresponding ROL position (≤2x, between 2x and 4x, between 4x and 6x, >6x, or did not rank; see Table 1); the administrative title of the SLOR writer(s); whether the applicant was an internal or external candidate; and whether this was the applicant’s first, second, or third EM rotation. The names and identifying traits of students were removed from the spreadsheets prior to submission to the primary investigator (LO), who merged and collated the data sheets.
Data Analysis

We did not include a sample size calculation for this descriptive study. Frequencies and percentages of GAS tiers, ROL positions, student characteristics, and letter writer characteristics were analyzed. SLORs were also analyzed by number of writers (individual and multiple) and administrative role of the writers (PD, APD, CD, and multiple/other). Accuracy of GAS tiers against actual ROL positions was determined. A chi-square test was performed to measure the association between writer characteristics and accuracy. p-values < 0.05 were considered statistically significant. Analysis of the data was conducted using SPSS version 17.0 (SPSS Inc., Chicago, IL).
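The analysis itself was run in SPSS, but the Pearson chi-square test for a 2 × 2 accuracy-by-characteristic table is easy to reproduce. The sketch below is ours, not the study code, and uses only the Python standard library; applied without continuity correction to the home/visiting counts reported in Table 4 (14/41 accurate/inaccurate among “Yes,” 13/34 among “No”), it reproduces the reported p = 0.801.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2 x 2 table (df = 1)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = [a + b, c + d], [a + c, b + d]
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Accurate vs. inaccurate GAS tier by one student characteristic (counts from Table 4):
chi2, p = chi_square_2x2([[14, 41], [13, 34]])
# p is approximately 0.801, matching the reported value
```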
Results

A total of 105 SLORs were reviewed from the five participating institutions, representing all SLORs written at the participating sites. One letter writer marked two GAS tiers, “very competitive” and “competitive,” on each of two SLORs; these two SLORs were excluded because of the dual rankings. Another applicant withdrew prior to the match and was also excluded. The remaining 102 SLORs were analyzed. General characteristics of the students and letter writers from the SLORs are presented in Table 2.
Table 2. Descriptive Analysis of Applicants and Letter Writers From the SLORs (n = 102)

| Characteristic | n (%) |
| --- | --- |
| Visiting student | |
| Yes | 55 (53.9) |
| No | 47 (46.1) |
| EM rotation experience | |
| First EM rotation | 46 (45.1) |
| Second EM rotation | 55 (53.9) |
| Third EM rotation | 1 (1.0) |
| Role of SLOR writer(s) | |
| PD only | 7 (6.9) |
| APD only | 10 (9.8) |
| CD only | 59 (57.8) |
| PD and APD | 4 (3.9) |
| PD and CD | 9 (8.8) |
| PD, APD, and CD | 5 (4.9) |
| Other | 8 (7.8) |
The number of SLORs written by each institution ranged from nine to 44. Because the programs writing more SLORs may have skewed the results, post-hoc subanalyses were performed with each institution evaluated separately. Individually, each institution’s data showed similarly poor accuracy between the GAS and the ROL position.
The frequencies of SLORs as categorized by GAS and the actual ROL position are presented in Table 3. Of the 102 SLORs analyzed, only 27 (26%) correctly matched the applicant’s actual ROL position. In contrast, eight SLORs (8%) documented a GAS that underestimated the applicant’s actual ROL position, and 67 (66%) documented a GAS that overestimated the applicant’s actual ROL position. The differences between GAS and ROL tier accuracy by student and SLOR writer characteristics are presented in Table 4.
Table 3. Frequency of GAS Tiers and the Actual ROL Positions (n = 102)

| Actual ROL Position | Very Competitive (n = 38) | Competitive (n = 50) | Possible Match (n = 14) | Unlikely Match (n = 0) | Total |
| --- | --- | --- | --- | --- | --- |
| ≤2x | 18 (47.4)* | 7 (14.0) | 0 (0.0) | 0 (0.0) | 25 |
| Between 2x and 4x | 8 (21.1) | 8 (16.0)* | 1 (7.1) | 0 (0.0) | 17 |
| Between 4x and 6x | 9 (23.7) | 10 (20.0) | 1 (7.1)* | 0 (0.0) | 20 |
| >6x | 2 (5.3) | 21 (42.0) | 4 (28.6) | 0 (0.0)* | 27 |
| Did not rank | 1 (2.6) | 4 (8.0) | 8 (57.1) | 0 (0.0) | 13 |

Values are n (column %). *GAS tier in agreement with the actual ROL position.
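The summary figures in the text follow directly from this cross-tabulation. A short check (the counts are transcribed from Table 3; ordering rows and columns from most to least favorable is our convention):

```python
# Rows: actual ROL position, best to worst: <=2x, 2x-4x, 4x-6x, >6x, did not rank.
# Columns: documented GAS tier, most to least favorable:
# very competitive, competitive, possible match, unlikely match.
table3 = [
    [18, 7, 0, 0],
    [8, 8, 1, 0],
    [9, 10, 1, 0],
    [2, 21, 4, 0],
    [1, 4, 8, 0],  # applicants the program did not rank at all
]

total = sum(sum(row) for row in table3)
# Diagonal cells: GAS tier agrees with the actual ROL tier.
accurate = sum(table3[i][i] for i in range(4))
# Below the diagonal: the GAS tier was more favorable than the actual position.
overestimated = sum(table3[i][j] for i in range(5) for j in range(4) if j < i)
# Above the diagonal: the GAS tier was less favorable than the actual position.
underestimated = sum(table3[i][j] for i in range(5) for j in range(4) if j > i)

print(total, accurate, overestimated, underestimated)  # 102 27 67 8
```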
Table 4. Correlation of GAS and ROL Tier Agreement Based on Student and SLOR Writer Characteristics

| Characteristic | Accurate GAS (n = 27) | Inaccurate GAS (n = 75) | p-value |
| --- | --- | --- | --- |
| Visiting student | | | 0.801 |
| Yes | 14 (51.9) | 41 (54.7) | |
| No | 13 (48.1) | 34 (45.3) | |
| EM rotation experience | | | 0.793 |
| First EM rotation | 13 (48.1) | 33 (44.0) | |
| Second EM rotation | 14 (51.9) | 41 (54.7) | |
| Third EM rotation | 0 (0.0) | 1 (1.3) | |
| Number of SLOR writers | | | 0.890 |
| Individual | 5 (18.5) | 13 (17.3) | |
| Multiple | 22 (81.5) | 62 (82.7) | |
| Role of SLOR writers | | | 0.326 |
| PD or APD only | 3 (11.1) | 14 (18.7) | |
| CD only | 16 (59.3) | 43 (57.3) | |
| Combination/other | 8 (29.6) | 18 (24.0) | |

Values are n (column %).
Discussion

The SLOR was developed by CORD and offers better interrater reliability than traditional NLORs.6 The global assessment section summarizes the letter writer’s impression of the applicant’s skills and qualifications. By categorizing the applicant in one of the four GAS tiers, the letter writer attempts to predict the applicant’s anticipated ROL position in his or her program. With accurate GAS documentation, residency programs reading the SLORs can more easily determine the applicant’s competitiveness. Our study, however, shows poor GAS accuracy and thus calls into question the practical utility of the GAS question on the SLOR. Three reasons may account for the poor accuracy of the GAS–ROL pairing.
1. Inexperienced SLOR letter writers may be unaware of the CORD instructions about the predefined criteria for the GAS tiers. Further education, either on the CORD website or a clarification statement on the actual SLOR template itself, may improve accuracy.
2. There are many components that determine an applicant’s position on the ROL. SLOR writers may not have access to these components, such as extracurricular accomplishments, grades, and test scores, which all influence the ROL. Thus, it may be worthwhile to encourage letter writers to view the student’s entire file prior to composing the SLOR. Additionally, letter writers should consider writing SLORs just prior to the SLOR deadline, so that each applicant can be compared to the larger applicant pool.
Despite this approach of collecting more supporting information about the applicant, the GAS may still be inaccurate because the letter writer does not have access to the dean’s medical student performance evaluation, which would have comments about disciplinary actions, failed exam results, and other “red flag” occurrences. Furthermore, an applicant’s ROL position is also partially determined by other SLORs and interview performance, which are both unavailable to the SLOR writer.
3. SLOR writers may be subject to grade inflation. In 66% of the SLORs, the letter writer documented a higher GAS tier than the actual ROL position. A letter writer may have felt pressured to advocate on behalf of the applicant and overstated the student’s overall competitiveness in the GAS section, especially if it seemed that other writers were overestimating their applicants’ GAS tiers.
In the global assessment section of the SLOR, there are two questions. One asks the letter writer to rank the applicant as outstanding, excellent, very good, or good. The other asks the letter writer to score the applicant in one of four GAS tiers based on ROL position. Both provide a summative assessment drawn from the letter writer’s personal impression of the applicant and the information available. Given the inaccuracy of the GAS–ROL pairing, the question arises whether the GAS tier item should be removed from the global assessment section because of its redundancy and predictive inaccuracy.
Limitations

There are inherent flaws to retrospective data abstraction studies, including bias. Because the study required only objective measurements (SLOR data and ROL positions), recall and observer biases were unlikely. Had this study of letter writer accuracy been conducted prospectively, however, the Hawthorne effect might have biased the results.
There was only one data abstracter per site. The accuracy of data collection was not double-checked and may have led to miscoding or overlooked SLORs. If these errors occurred, however, they were likely to have occurred at random at each site and would not be expected to bias the results in one direction or the other.
This study was conducted only within California residency programs over a single residency application cycle and may not be externally valid and applicable to other U.S. residency programs for future years. The five participating institutions in this study, however, were purposely selected because they represented a wide range of program sizes and types.
As with many educational research studies, sample size is a limitation. We attempted to increase our sample size by conducting the study at multiple residency programs. Despite this, we were still unable to identify potential associations between writer characteristics and accuracy. Future investigations in non-California programs with larger samples should be conducted.
Conclusions

The standard letter of recommendation was developed to decrease reading time; increase interrater reliability; and create standard, objective data for evaluating applicants. Despite that directive, the global assessment score showed poor accuracy in predicting the applicant’s actual rank order list position: in our study, it was accurate only 26% of the time. There needs to be a national dialogue to discuss the evaluative utility of the global assessment score question on the standard letter of recommendation.
Acknowledgments

The authors thank David Burbulys, MD, James Comes, MD, Binh Ly, MD, Susan Promes, MD, Brandy Snowden, MPH, CCRP, and Stuart Swadron, MD, for their assistance with data collection. Special thanks to Esther K. Choo, MD, for her assistance with methods design.