The impact of United States Medical Licensing Exam (USMLE) step 1 cutoff scores on recruitment of underrepresented minorities in medicine: A retrospective cross‐sectional study

Abstract
Background and Aims: United States Medical Licensing Exam (USMLE) scores are the single most objective criterion for admission into residency programs in the country. Underrepresented minorities in medicine (URiM) have been found to have lower USMLE scores than their White counterparts. The objective of this study was to examine how USMLE step 1 cutoff scores may exclude self-reported URiM from the residency interview process across various specialties.
Methods: This was a retrospective cross-sectional study of 10 541 applicants to different residency programs at Zucker School of Medicine at Hofstra/Northwell Health between May 2014 and May 2015. We identified Black and Hispanic applicants as URiM. The primary outcome was the percentage of applicants with a USMLE step 1 score above different cutoff scores, from 205 to 235 in five-point increments, by race/ethnicity and by URiM status. The secondary outcome was the percentage of URiM vs non-URiM applicants above and below the mean USMLE step 1 score in different specialties (internal medicine, obstetrics/gynecology, pediatrics, and psychiatry).
Results: The study sample included 2707 White, 722 Black, 805 Hispanic, 5006 Asian, and 562 Other Race/Ethnicity applicants. Overall, 50.2% were male, 21.3% were URiM, 7.4% had limited English proficiency, 67.6% attended international medical schools, and 2.4% were Alpha Omega Alpha Honor Medical Society (AOA) members. The mean (±SD) USMLE step 1 score was significantly greater among non-URiM applicants than among URiM applicants (223.7 ± 19.4 vs 216.1 ± 18.4, P < .01, two-sample t-test). Non-URiM applicants were younger, and the percentages of male and AOA applicants were greater among non-URiM applicants than among URiM applicants (50.5% vs 47.7%, P = .02, Chi-Square test; 2.9% vs 1.2%, P < .01, Chi-Square test, respectively).
Conclusion: Using a USMLE step 1 cutoff score as an initial filter for applicant recruitment and selection could jeopardize the benefits of a diverse residency program.
Practical implications are discussed.


| INTRODUCTION
Research has consistently shown that a large segment of ethnic and racial minorities in the United States (US) population face inequities in both health care quality and access. [1][2][3] A growing body of empirical evidence has shown that increasing the diversity of the physician workforce may assist in eliminating health disparities in the US. 1,2,[4][5][6][7] For example, previous research indicates that minority physicians are more likely to care for minority patients and work in underserved communities. 4,7 Patient-provider concordance can produce better communication, trust, satisfaction, adherence to medication, and health outcomes, supporting the case for diversity. 1,[8][9][10] Despite the many benefits of a diverse physician workforce, there are various factors that limit it. [11][12][13]
At the Graduate Medical Education (GME) level, one potential limiting factor of a diverse physician workforce is the overemphasis on standardized test scores such as the United States Medical Licensing Examination (USMLE). 1,11,[14][15][16] Administered by the National Board of Medical Examiners (NBME), the USMLE is a series of three tests (steps 1-3) for the purpose of granting medical licenses to physicians who want to practice in the US. 11,16,17 The USMLE step 1 is intended to assess medical students' understanding and application of basic science concepts to medical practice, with a special emphasis on principles underlying modes of disease, therapy, and health. [15][16][17] All residency program applicants graduating from allopathic schools are required to take USMLE step 1, with many program directors using USMLE and the Comprehensive Osteopathic Medical Licensing Examination (COMLEX-USA; for osteopathic schools) scores in making their decisions to grant interviews. 11,15,18,19 The NBME recognizes the use of step 1 scores as "a major factor in residency screening and selection," which may be useful to some key stakeholders but viewed as a negative consequence for others, such as underrepresented minorities in medicine (URiM). 15,20
The Association of American Medical Colleges (AAMC) defines URiM as "racial and ethnic populations that are underrepresented in the medical profession relative to their numbers in the general population." 21 URiM include individuals from African American (AA), Hispanic/Latino (H/L), and Native American racial and ethnic groups. 21 According to the 2016 U.S. Census data, racial and ethnic minorities comprised at least 38.7% of the U.S. population, 17.8% of which were H/L, 13.3% Black/AA, and 2% Native American. 22,23 Yet, between 1997 and 2017, AA, H/L, and Native Americans made up only 4%, 4%, and <0.04% of medical doctors, respectively. 1,2,24 In 2014, AA and H/L residents made up 53% of the population of New York City, one of the most ethnically diverse cities in the US, yet only 12% of practicing physicians were URiM. 2,24
Since the implementation of step 1 in the 1990s, exam results have been reported as pass/fail, with a three-digit numeric score added over time. 11,14 Typically, URiM students score lower on standardized tests than do White students. 11,25 One study found that based on USMLE cutoff scores, AA were three to six times less likely to be offered an interview compared to non-AA; White students had a mean score of 210, while Black students had a mean score of 187.9. 11 A 2019 study by the NBME showed that compared to White males, female students scored 5.9 points lower, while Asian, H/L, and Black test-takers scored 4.5, 12.1, and 16.6 points lower, respectively, on the USMLE step 1. 26
In this study, we explore how using cutoffs to screen applicants may affect the number of URiM participating in the interview process for residency programs at a single institution.
We focus on the potential bias in USMLE step 1 cutoff scores across racial/ethnic groups and medical specialties. Despite the importance of racial/ethnic biases in the USMLE, only one study has previously examined the racial/ethnic biases of USMLE step 1 scores. 11 That study was limited by its use of socially assigned race: photographs of the applicants were examined to determine whether they were African American or non-African American. 11 The standard in the US is for individuals to self-report their racial/ethnic identity. We built on the previous study by: (a) using the Electronic Residency Application Service (ERAS) 27

| METHODS
In this retrospective cross-sectional study, we extracted data from ERAS for 10 541 residency applicants applying to five residency programs at Zucker School of Medicine at Hofstra/Northwell Health between May 2014 and May 2015. ERAS is the centralized online application service that medical students around the world use to deliver their applications, along with supporting documents, to residency programs in the US. 27 This study was conducted as part of a larger institution-wide diversity and equity strategy. The programs were selected because they are the largest programs at our academic institution. Residency programs studied included internal medicine (two sites), pediatrics, obstetrics-gynecology, and psychiatry residencies at North Shore University Hospital, Long Island Jewish Medical Center, and Forest Hills Hospital. Northwell Health is located in the New York metropolitan area and is the third largest health care system nationally. The program has more than 1800 residents and fellows, serving patients at 23 hospitals.
The primary outcome was USMLE step 1 score. For applicants with more than one USMLE step 1 score, we used their highest score. Self-reported race/ethnicity was categorized into five groups: non-Hispanic White, non-Hispanic Black, Hispanic, Asian, and Other. This categorization was based on the ERAS predetermined variables, which are "self-identified" by applicants. Using the AAMC definition, 21 URiM were defined as those in the non-Hispanic Black, Hispanic, and Other categories. Non-URiM were defined as those in the Asian or White categories.
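The grouping rule above can be written down as a small lookup. The following is only an illustrative sketch, not the authors' SAS code: the names `URIM_CATEGORIES`, `urim_status`, and `urim_share` are hypothetical, and the counts are the race/ethnicity counts reported in the abstract. It assumes that the URiM percentage is computed over applicants with a reported race/ethnicity category.

```python
# Illustrative sketch only; names are hypothetical and this is not the study's code.
URIM_CATEGORIES = {"Non-Hispanic Black", "Hispanic", "Other"}
NON_URIM_CATEGORIES = {"Non-Hispanic White", "Asian"}


def urim_status(category):
    """Map a self-reported race/ethnicity category to URiM status."""
    if category in URIM_CATEGORIES:
        return "URiM"
    if category in NON_URIM_CATEGORIES:
        return "non-URiM"
    return None  # missing or unrecognized category


def urim_share(counts):
    """Fraction of applicants with a known category who are URiM."""
    known = {cat: n for cat, n in counts.items() if urim_status(cat) is not None}
    total = sum(known.values())
    urim = sum(n for cat, n in known.items() if urim_status(cat) == "URiM")
    return urim / total


# Counts by race/ethnicity as reported in the abstract.
counts = {
    "Non-Hispanic White": 2707,
    "Non-Hispanic Black": 722,
    "Hispanic": 805,
    "Asian": 5006,
    "Other": 562,
}

share = urim_share(counts)
print(f"URiM share: {share:.1%}")  # prints "URiM share: 21.3%"
```

Under this assumption, the three URiM categories together account for 2089 of 9802 applicants, which agrees with the 21.3% reported in the abstract.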
Based on previous research, 11 we also identified key sociodemographic characteristics associated with USMLE step 1 scores. They include age, sex/gender, limited English proficiency status, international medical school attendance, and Alpha Omega Alpha (AOA) status. We calculated age as of 1 September 2014 using applicants' date of birth. Sex was categorized as male, female, or missing; while ERAS technically records "gender," for the purposes of this study, since we are reporting male/female, we used the term "sex." English proficiency status was defined as English native/functionally native, or not English native/functionally native; international medical school attendance was categorized as trained in the US (any medical school training in the US) or not trained in the US (no medical school training in the US).
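As a minimal sketch of the age calculation, assuming age is counted in completed years on the reference date (the text does not state the unit or rounding), the computation could look like this. The function name `age_on` is hypothetical.

```python
from datetime import date

# Reference date stated in the Methods.
REFERENCE_DATE = date(2014, 9, 1)


def age_on(date_of_birth, reference=REFERENCE_DATE):
    """Age in completed years on the reference date (an assumed convention)."""
    years = reference.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred by the reference date.
    if (reference.month, reference.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


print(age_on(date(1988, 9, 1)))  # birthday falls on the reference date
print(age_on(date(1988, 9, 2)))  # birthday is one day after the reference date
```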
We performed descriptive analyses (means and standard deviations or medians and interquartile ranges) of residency applicants and then subgroup analyses by race/ethnicity category and URiM status.
USMLE step 1 scores are normally distributed; therefore, one-way analysis of variance (ANOVA) was used to compare the mean USMLE step 1 scores across race/ethnicity groups, and a two-sample t-test was used to compare mean USMLE step 1 scores by URiM status. Next, we examined USMLE step 1 distributions by medical specialty. To examine how USMLE step 1 cutoff scores would affect the number of applicants qualifying for a potential interview, we calculated the percentage of applicants with a USMLE step 1 score above each of a range of cutoff scores: 205, 210, 215, 220, 225, 230, and 235; these values were chosen for convenience of analysis.
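The cutoff analysis can be sketched as follows. This is an illustration under stated assumptions, not the study's SAS code: the function names and the example scores are hypothetical, the cutoffs run from 205 to 235 in five-point increments (the range stated in the abstract), and a score equal to the cutoff is assumed to pass the screen, which the text does not specify.

```python
# Cutoffs from 205 to 235 in five-point increments.
CUTOFFS = range(205, 240, 5)


def percent_at_or_above(scores, cutoff):
    """Percentage of scores that would pass a screen at the given cutoff.

    Assumption: a score equal to the cutoff passes.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return 100.0 * sum(s >= cutoff for s in scores) / len(scores)


def cutoff_table(scores_by_group, cutoffs=CUTOFFS):
    """For each group, the percentage of applicants passing each cutoff."""
    return {
        group: {c: percent_at_or_above(scores, c) for c in cutoffs}
        for group, scores in scores_by_group.items()
    }


# Hypothetical scores for illustration only; these are not study data.
example = {
    "URiM": [198, 207, 214, 221, 236],
    "non-URiM": [204, 212, 226, 231, 240],
}

table = cutoff_table(example)
for group, row in table.items():
    print(group, row)
```

Tabulating the same percentages by race/ethnicity category instead of URiM status only requires changing the keys of the input dictionary.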
Lastly, we calculated the mean USMLE step 1 score for different medical specialties and compared the percentage of applicants above the mean USMLE step 1 score by URiM status. The Chi-Square test was used to compare percentages by group. The ANOVA test was used to compare the means of continuous variables by group. For nonnormally distributed data, the nonparametric Wilcoxon rank-sum test or Kruskal-Wallis test was used to compare the distributions of continuous variables by group. All statistical tests were performed at the 5% significance level. All analyses were conducted using SAS, version 9.4 (SAS Institute Inc., Cary, North Carolina). The IRB granted a waiver of consent because it was not practical to obtain informed consent from participants; the data were archival. All data were de-identified and were provided by the Office of Academic Affairs within the Zucker School of Medicine at Hofstra/Northwell Health.

| RESULTS
[...] Black applicants (P < .01, ANOVA). We also observed significant differences in sex, international medical school attendance, and AOA status by race/ethnicity group. We also observed a significant difference in the distribution of age by race/ethnicity group (Table 1).
We also examined these characteristics by URiM group (Table 2).
We then examined whether there were differences in USMLE step 1 score by specialties, and we observed significant differences in the means of USMLE step 1 score by race/ethnicity group within each specialty (Table 3).
Next, we examined the percentage of applicants with USMLE step 1 scores above the cutoff score in five-point increments, from 205 to 235 [...]. Lastly, for each specialty, the overall mean USMLE step 1 score was calculated, along with the percentages of URiM and non-URiM applicants who scored above that mean. We observed significant differences in the percentage of applicants above the mean between [...] (Table 5).
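A comparison of the percentage above the mean between two groups amounts to a Chi-Square test on a 2 x 2 table. The sketch below is illustrative only and is not the authors' SAS analysis: it computes the Pearson statistic without a continuity correction (an assumption), the function name `chi_square_2x2` is hypothetical, and the counts are invented for the example.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson Chi-Square test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns the statistic and its P value on 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for illustration only; these are not study data.
# Rows: URiM, non-URiM. Columns: above the specialty mean, at or below it.
chi2, p_value = chi_square_2x2(30, 70, 55, 45)
print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}")
```

With these invented counts, 30% of one group and 55% of the other score above the mean, and the difference is significant at the 5% level used in the Methods.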

| DISCUSSION
By 2045, the US will be a majority-minority nation. 28 [...] specifically, in the "recruitment and retention of minorities underrepresented in medicine." 31 The LCME states that "each medical school must have policies and practices to achieve appropriate diversity among its students, faculty, staff, and other members of its academic community." 32,33 Despite many recommendations and efforts, little progress has been made in creating a truly diverse physician workforce. 1,5,6,14 The use of cutoffs in USMLE step 1 scores for granting residency interviews may be one reason why. 1,11,15,16 Recently, the USMLE's parent organizations (the Federation of State Medical Boards and NBME) and AAMC met with "key stakeholders" to discuss reporting the results of the USMLE step 1 as pass/fail instead of a numeric score. [15][16][17]20 Reporting the USMLE step 1 scores as pass/fail would meet the primary purpose of the exam, which is licensing, and would reduce its weight in resident selection, thereby providing an opportunity to increase the diversity of the physician workforce. [14][15][16]
Our study aimed to explore associations between USMLE step [...] As hypothesized, our study revealed significant differences in the means of USMLE step 1 scores by race/ethnicity categories and URiM groups. In our cohort, URiM scored lower than their White counterparts on the USMLE step 1 exam, which is consistent with previous studies showing that White students scored higher on USMLE step 1 than racial/ethnic minority students. 34,39 This trend was consistent across applicant specialties. The data from our sample are generalizable to national trends, given that the sample represents applicants from the majority of US medical schools and from all 50 states. When using the 2014 mean USMLE step 1 score (X̄ = 230) for matched applicants as a cutoff, 84.3% of Black applicants and 74.8% of Hispanic applicants would be eliminated from the pool. 40
The use of USMLE step 1 cutoff scores as a recruitment and selection criterion contributes to the "leakiness of the pipeline," that is, the departure of students, particularly minorities, from a medical career path. 12,13 Though multifactorial, the "leaky pipeline" posits that minority students who have an interest in science, technology, engineering, and mathematics (STEM) careers change their minds when applying to college or leave the "pipeline" after graduating with a STEM degree due to negative experiences such as microaggressions and discrimination. 22,[41][42][43][44] Lower scores on standardized tests such as USMLE may also be a contributing factor to the leaky pipeline. One consequence of that leakage is a lack of physicians from racial/ethnic minority backgrounds who speak languages other than English and are representative of the U.S. patient population they serve. [12][13][14]42,45
This study has several limitations. Although our applicants are diverse and from all parts of the US, it is important to note that we only examined individuals applying to one (albeit large) institution. The applicant pool might be different for other institutions, and study findings may not be generalizable. In addition, our data included [...]
Percentage of applicants with USMLE step 1 score above the cutoff score

| PRACTICAL IMPLICATIONS
To better understand physician diversity, academic researchers and program directors should seek a more in-depth perspective on the recruitment and selection process experienced by URiM. [14][15][16]45 During the recruitment and selection process, structural barriers, microaggressions, biases, and cutoffs on standardized tests such as USMLE step 1 might hinder the acceptance and progress of URiM medical students throughout their medical careers. 2,12,13 According to the leaky pipeline theory, a consequence of the leakage of the diverse talent pool is the underrepresentation of minorities in the medical field. 12 For program directors, this research encourages a closer look at the recruitment and selection process, which would help increase diversity in the physician workforce. 1,7,11,15,16 As reported earlier, there is growing evidence that minority patients report better communication, satisfaction, and adherence to medication when they are treated by physicians who are similar to them. 1,2,5,7 Such gains in reducing health disparities are of growing importance to payers, providers, and health plans. 1,2,29
Medical schools can increase the diversity of the physician workforce to treat a racially/ethnically diverse population by redoubling efforts to recruit URiM. To do so, however, program directors and medical schools may need to change the way they evaluate medical school applicants. 1 The recent announcement that USMLE step 1 results will be reported as pass/fail rather than as three-digit numeric scores is a vital step. 17,20 Maintaining the status quo of numeric scores works against initiatives to increase physician diversity. As shown in our own and previous research, compared to Asian and White students, URiM score lower on step 1. 11,26 Very often, step 1 scores are used as a proxy for one's work ethic, time management skills, and determination, despite a lack of empirical evidence to explain the systematic differences. 15 Additionally, recent empirical evidence shows that students with higher USMLE step 1 scores often self-select into more competitive specialties 16; as a result, diverse candidates are likely being screened out. 15,46 Medical schools that have implemented pass/fail reporting of curriculum results have reported an increase in student well-being without any effect on academic achievement. 14,47
In an attempt to diversify the physician workforce, program directors and residency programs can instead engage in a more holistic application review. With a holistic application process, program directors can create a rubric for ranking applicants that is tailored more specifically to the mission and values of the program and institution, instead of depending so heavily on purely numerical scores.

| CONCLUSION
This study demonstrated that using a USMLE step 1 score as an initial filter for applicant recruitment and selection could jeopardize the benefits of a diverse residency program. As indicated by our results, using a fixed cutoff score will significantly reduce the percentage of URiM who are selected into residency programs. Including URiM in the workforce/learning environment has been shown to improve learning for all, 38 assist with addressing the shortage of physicians in underserved areas, 7 and increase patient satisfaction and cultural competence. Given our results and other empirically derived data about the unintended consequences of the USMLE step 1 score, residency programs should carefully consider how the USMLE step 1 score should be used in the application process. USMLE step 1 was "not designed to, nor does it predict, the success of a physician." 48 When we use the USMLE score or any other one-dimensional score as a cutoff, we are losing out on qualified URiM who will help both reduce health disparities and improve diversity across medical institutions. Most importantly, we limit the opportunity to evaluate applicants on other criteria which, in the end, may represent more important characteristics of an exceptional healer.

CONFLICT OF INTEREST
The authors declare that there is no conflict of interest regarding the publication of this paper.

TRANSPARENCY STATEMENT
Myia Williams affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.