Collecting Patient Race/Ethnicity and Primary Language Data in Ambulatory Care Settings: A Case Study in Methodology


  • Latha P. Palaniappan,

    1. Palo Alto Medical Foundation Research Institute (PAMFRI), 795 El Camino Real, Ames Building, Palo Alto, CA 94301
    • Address correspondence to Latha P. Palaniappan, M.D., M.S., Palo Alto Medical Foundation Research Institute (PAMFRI), 795 El Camino Real, Ames Building, Palo Alto, CA 94301. Eric Wong, M.S., and Jessica Shin, B.A., are at Palo Alto Medical Foundation Research Institute (PAMFRI), Palo Alto, CA. Maria R. Moreno, M.P.H., is with Sutter Health Institute for Research and Education (SHIRE), San Francisco, CA. Regina Otero-Sabogal, Ph.D., is with University of California, San Francisco, Institute for Health and Aging, Department of Social and Behavioral Sciences School of Nursing, San Francisco, CA.

  • Eric C. Wong,

    1. Palo Alto Medical Foundation Research Institute (PAMFRI), Palo Alto, CA
  • Jessica J. Shin,

    1. Palo Alto Medical Foundation Research Institute (PAMFRI), Palo Alto, CA
  • Maria R. Moreno,

    1. Sutter Health Institute for Research and Education (SHIRE), San Francisco, CA
  • Regina Otero-Sabogal

    1. University of California, San Francisco, Institute for Health and Aging, Department of Social and Behavioral Sciences School of Nursing, San Francisco, CA


Objective. To collect patient race/ethnicity and language (r/e/l) in an ambulatory care setting.

Data Sources/Study Setting. The Palo Alto Medical Foundation (PAMF), December 2006–May 2008.

Study Design. Three pilot studies: (1) comparing mail versus telephone versus clinic visit questionnaire distribution; (2) comparing the front desk method (FDM) versus exam room method (ERM) in the clinic visit; and (3) determining resource allocation necessary for data entry.

Data Collection/Extraction Methods. Studies were planned and executed by PAMF's Quality and Planning division.

Principal Findings. Collecting r/e/l data during clinic visits elicited the highest response rate. The FDM yielded a higher response rate than the ERM. One full-time equivalent is initially necessary for data entry.

Conclusions. Conducting sequential studies can help guide r/e/l collection in a short time frame.


Actively assessing and addressing health disparities requires accurate collection of data on race/ethnicity and language (r/e/l) of patients. The Institute of Medicine reports that “data on patient race, ethnicity, and primary language would … help [health care] plans monitor performance, ensure accountability to enrolled members and payers, improve patient choice, allow for evaluation and intervention programs, and help identify discriminatory practices” (Smedley, Stith, and Nelson 2003). While there is some information on r/e/l data collection in hospitals (Blustein 1994; Kressin et al. 2003; Hasnain-Wynia, Pierce, and Pittman 2004) and in community populations (McHorney, Kosinski, and Ware 1994; Sullivan, Karlsson, and Ware 1995; Ngo-Metzger et al. 2004), little is known about r/e/l collection in ambulatory care systems.

The Palo Alto Medical Foundation (PAMF) is an ambulatory care system and a Sutter Health affiliate. One of PAMF's three health care divisions, the Palo Alto Region (PAMF/PAR), has clinics staffed by 380 physicians in 40 specialties. It currently provides coverage in three California counties with 11 clinics and centers, and receives 750,000 patient visits/year with over 240,000 active patients. The EPIC electronic health record (EHR) system has been in use at PAMF/PAR since 2000. According to census data for the PAMF/PAR service areas, 47 percent of respondents reported at least one race/ethnicity other than white.

Before December 2006 at PAMF, race/ethnicity information was collected for patients entering the ambulatory surgery center and for cancer patients in a de-identified manner on separate systems (other than the clinical record) and reported in aggregate to external agencies (such as OSHPD). These values were not linked to individual patient records, and thus were not usable for analyses of disparities based on r/e/l. The known r/e/l-based health disparities in the PAMF catchment area (Iribarren et al. 2005; Cresswell et al. 2008) along with well-established national health disparities (Agency for Healthcare Research and Quality) made a compelling case for the collection of r/e/l data. As a result, the Quality Improvement Steering Committee (QISC) of PAMF/PAR approved r/e/l data collection for all clinic patients in December 2006.

A PAMF/PAR taskforce of 14 members from the operations, information technology, and research divisions designed the patient r/e/l questionnaire. The final form consisted of five questions about race, Spanish origin, ancestry, spoken language, and interpreter services (Figure 1). In question one, a patient may select up to two races from predefined categories. The second question asks about Spanish origin. The two-question format asking race first was designed to satisfy California State regulations for Ambulatory Surgery Units (OSHPD 2005, 2008). The third question is a free text response where respondents can identify up to two ancestries, because the taskforce felt it was important for patients to have an opportunity to self-identify their ancestry or family lineage. The free text responses are matched to 1,035 possible ancestries derived from the Census Ancestry List and the Surveillance, Epidemiology, and End Results (SEER) Program (U.S. Census Bureau 2000c; Johnson and Adamo 2007). The first three questions are based on the U.S. Census 2000 question format (U.S. Census Bureau 2000a). The last two questions ask a patient to self-identify his/her primary spoken language and the need for an interpreter. The language question (#4) is a free text response that is matched to a back-end table of the 64 most common languages in the PAMF catchment area.
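The text does not describe how free text responses are matched to the back-end tables. As a minimal sketch of one possible approach, the Python below normalizes a response (case, accents, whitespace) and looks it up in a table. The table contents, the alias "espanol", and the function names are illustrative assumptions, not the PAMF implementation.

```python
import unicodedata
from typing import Optional

# Hypothetical excerpt of a back-end language table. The source says the real
# table holds 64 languages but does not list them; these rows are assumptions.
LANGUAGE_TABLE = {
    "english": "English",
    "spanish": "Spanish",
    "espanol": "Spanish",  # assumed alias for a native-language spelling
    "vietnamese": "Vietnamese",
    "tagalog": "Tagalog",
}


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def match_language(response: str) -> Optional[str]:
    """Return the table entry for a free text response, or None if unmatched."""
    return LANGUAGE_TABLE.get(normalize(response))
```

Unmatched responses return None in this sketch, so they could be set aside for manual review rather than silently dropped.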

Figure 1.

 Patient Demographics Questionnaire

The studies described below were not designed a priori as a research project. Rather, this is a collection of small studies that were initiated to determine the best r/e/l collection methodology for quality improvement purposes. These studies were considered exempt from review by the Palo Alto Medical Foundation Institutional Review Board. In retrospect, these studies formed and supported the path of implementation at our organization, which we hope will be helpful to other ambulatory health care delivery systems with similar goals for r/e/l collection. The pilot studies were as follows:

  1. A comparison of three methods of asking patients to identify their own r/e/l: mail, telephone, and in-person clinic visit questionnaires;
  2. A comparison of two ways of distributing and collecting the in-person clinic visit r/e/l questionnaires (substudy of #1): a front desk method (FDM), with questionnaires distributed by patient services representatives (PSR), and an exam room method (ERM), with questionnaires distributed by medical assistants (MA);
  3. A data-entry time study, to estimate additional resource allocation needed for timely recording of r/e/l information into the electronic health record (EHR).


Pilot Study 1: Comparing Three Methods of Data Collection: Mail versus Telephone versus Clinic Visit

Objective. To determine the most effective method to collect r/e/l data from patients.

Methods. Three methods of surveying patients about r/e/l were compared: mail versus telephone versus clinic visit. The main outcome measures for the pilot studies were response rate and cost per response.

Mailed Questionnaire: Participants meeting the following criteria were identified from the PAMF/PAR EHR: men and women aged 35 years and older whose race/ethnicity was inferred using surname analysis. Asian lists and Spanish lists were used to identify persons belonging to one of the following likely descents: Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese, Hispanic/Latino, and Non-Hispanic White (Word and Perkins 1996; Lauderdale and Kestenbaum 2000; Palaniappan et al. 2008). Using a random number generator, a sample of 3,200 patients (blocked by surname-inferred racial/ethnic group, 400 in each group) was selected for a mailed questionnaire. In August 2007, individuals randomized to receive a mailed questionnaire were mailed a multiple choice questionnaire (predecessor to the final questionnaire) inquiring about race/ethnicity. Results were tabulated at the end of 4 weeks.
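The blocked selection described above (eight surname-inferred groups, 400 patients each, 3,200 in total) can be sketched in Python as follows. The study itself used SAS and S-Plus for selection; the roster of patient IDs, the seed, and the function name here are made up for illustration.

```python
import random

# Surname-inferred groups named in the text.
GROUPS = [
    "Asian Indian", "Chinese", "Filipino", "Japanese",
    "Korean", "Vietnamese", "Hispanic/Latino", "Non-Hispanic White",
]


def blocked_sample(ids_by_group, per_group, seed=None):
    """Draw `per_group` IDs without replacement from each group."""
    rng = random.Random(seed)
    return {
        group: rng.sample(ids, per_group)
        for group, ids in ids_by_group.items()
    }


# Synthetic roster: 1,000 made-up patient IDs per group (illustrative only).
roster = {
    group: [f"G{index}-{n:04d}" for n in range(1000)]
    for index, group in enumerate(GROUPS)
}

sample = blocked_sample(roster, per_group=400, seed=2007)
total_selected = sum(len(ids) for ids in sample.values())
```

Sampling within each group separately gives every group the same number of invitations regardless of its share of the patient population.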

Telephone Questionnaire: A separate random sample of 3,200 patients (also blocked by surname-inferred racial/ethnic group, 400 per group, as above) was selected for a telephone survey. In August 2007, individuals randomized to telephone surveys were asked the same questions, simplified by a branching order, by a live interviewer. The telephone survey was conducted by a third-party vendor. Three attempts were made to reach each telephone number over 2 weeks before it was considered unreachable.

In-Person Clinic Visit Questionnaire: In January 2008, questionnaires (Figure 1) were distributed by the front desk staff or by exam room staff to a convenience sample of all clinic patients visiting two departments (Internal Medicine, Dermatology) over 1 business week (n=1,995). These departments were selected because of their high patient traffic. To avoid the appearance of discrimination, patients were neither blocked by racial/ethnic group nor randomized, because otherwise some patients and not others in the same waiting area would have been given the questionnaire. As an additional substudy within this method of data collection, we examined two methods of distributing and collecting the forms, an FDM and an ERM (described below, Pilot Study 2).

Results. The mailed questionnaire had a 6 percent response rate (range across racial/ethnic groups: 3.3–9.8 percent). The telephone questionnaire had a 30 percent response rate (range across racial/ethnic groups 28–35 percent). There were no apparent differences in response rate by racial/ethnic group for the mailed and telephone questionnaires. The clinic visit questionnaire had an 87 percent response rate (Figure 2). Of the approximately 3,000 responses across all three methods, 58 percent answered both race and Spanish origin, 38 percent answered race only, 1 percent answered Spanish origin only, and 3 percent answered neither race nor Spanish origin. Among Hispanics/Latinos, 90 percent answered the race question across all three methods of distribution. Clinic visits not only had the highest response rate but were significantly more cost-effective per response ($0.21) as compared with phone ($5.46) and mail ($9.50) (Palaniappan et al. 2008). These preliminary results informed the decision by the PAMF QISC to distribute questionnaires during clinic visits instead of via mail or telephone surveys.
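As a rough consistency check on the figures above, the Python sketch below multiplies each sample size by its reported response rate and compares the reported costs per response. Because the published rates are rounded, the counts are approximations rather than the study's actual tallies.

```python
# (sample size, reported response rate) for each method, from the text.
methods = {
    "mail": (3200, 0.06),
    "telephone": (3200, 0.30),
    "clinic visit": (1995, 0.87),
}

# Approximate responses per method, reconstructed from rounded rates.
approx_responses = {
    name: round(n * rate) for name, (n, rate) in methods.items()
}
total_responses = sum(approx_responses.values())

# Reported cost per response in dollars, from the text.
cost_per_response = {"mail": 9.50, "telephone": 5.46, "clinic visit": 0.21}

# How many times more expensive each method is than the clinic visit.
cost_ratio_vs_clinic = {
    name: cost / cost_per_response["clinic visit"]
    for name, cost in cost_per_response.items()
}
```

The reconstructed total is about 2,900, in line with the "approximately 3,000 responses" reported, and the per-response cost of mail and telephone comes out at roughly 45 and 26 times that of the clinic visit.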

Figure 2.

 Pilot Studies 1 and 2

Pilot Study 2 (Substudy of Pilot Study 1.3). Assessing Two Workflow Methods

Objective. To examine operational workflow to maximize completion rates.

Methods. Two methods of distributing and collecting the forms, an FDM and an ERM, were compared by completion rate. Questionnaires were distributed and collected in two departments. Dermatology, which used the FDM, distributed and collected 885 questionnaires. Internal Medicine, which used the ERM, distributed and collected 1,110 questionnaires.

FDM: PSRs distributed forms to patients at check-in. They instructed patients to return the forms face down in a box at the front desk after completion. We chose this method because patients are accustomed to filling out other forms at check-in.

ERM: MAs distributed forms to the patients in the exam room, generally after obtaining vital signs. The MA collected the form after completion and handed it to the PSR at the front desk for data entry. We hypothesized that this method—with increased privacy in the exam room and a greater level of trust with the MA (compared with the PSR)—would yield higher completion rates than the FDM.

Results. The FDM had a higher response rate than the ERM (88 versus 80 percent, p<.000001). While fewer patients declined in the ERM (1 versus 7 percent), there were fewer missing forms in the FDM (3 versus 18 percent, p<.000001), so the overall completion rate was higher for the FDM (Figure 2).

Pilot Study 3: Estimation of Resource Allocation

Objective. To determine the additional resources necessary for entry of r/e/l data in a timely manner into the EHR.

Methods. Resource allocation necessary for data entry was estimated using new and return patient visit volume data from 2007. Using an average data entry time by 10 experienced patient services representatives in four departments (Internal Medicine, Dermatology, Family Practice, and Podiatry) for 100 forms, we projected the number of hours per week required for data entry.

Results. The average data entry time was 24 seconds (range 11–51, standard deviation 7 seconds). We projected the number of hours per week required for data entry (Figure 3). One full-time equivalent (FTE) would be required in the first quarter of implementation, for the approximately 75,000 visiting patients who would be receiving the form for the first time. In the second and third quarters this drops to 0.5 FTE, and in the fourth quarter to 0.25 FTE. Resources were allocated by the PAMF QISC to cover the temporary increase in workload in the first quarter and the cost of IT reprogramming of the EHR to capture these data (approximately 70 programming hours, or $5,000). Initially, front desk staff collected the surveys, and temporary staff entered survey data into the EHR. At the 3-month mark, existing front desk staff were transitioned to data entry. Routine audits will be conducted to ensure accuracy and completeness.
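The first-quarter estimate can be checked with simple arithmetic. The Python sketch below assumes a 13-week quarter and a 40-hour FTE week, neither of which is stated in the text. The actual projection was built from 2007 visit volume data, so this is only a back-of-envelope check.

```python
SECONDS_PER_FORM = 24          # average data entry time reported in the text
FIRST_QUARTER_FORMS = 75_000   # approximate first-time forms in quarter one
WEEKS_PER_QUARTER = 13         # assumption, not stated in the text
FTE_HOURS_PER_WEEK = 40        # assumption, not stated in the text


def weekly_entry_hours(forms, seconds_per_form, weeks):
    """Hours of data entry per week for a given form volume."""
    total_hours = forms * seconds_per_form / 3600
    return total_hours / weeks


hours_per_week = weekly_entry_hours(
    FIRST_QUARTER_FORMS, SECONDS_PER_FORM, WEEKS_PER_QUARTER
)
fte_needed = hours_per_week / FTE_HOURS_PER_WEEK
```

Under these assumptions, 75,000 forms at 24 seconds each come to 500 hours, or about 38.5 hours per week, which is close to one FTE and consistent with the first-quarter figure reported above.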

Figure 3.

 Resource Allocation Estimate: Estimated Hours/Week Required for Data Entry

Statistical Methods (Pilot Studies 1–3)

Selection using random number generation and analyses were performed using SAS 9.1.3 (Cary, NC) and S-Plus 8.0 (Seattle, WA).

Proportions and response rates were compared using a binomial test of proportions.
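The text does not say which form of the test of proportions was used. One common choice is the pooled two-proportion z-test, sketched below in Python for the FDM versus ERM comparison. The completed-form counts are reconstructed from the rounded percentages (88 percent of 885 and 80 percent of 1,110), so the resulting p-value is very small but need not match the reported threshold exactly.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # Two-sided tail probability under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Sample sizes from the text; completed counts are approximations
# reconstructed from rounded response rates.
fdm_n, erm_n = 885, 1110
fdm_completed = round(0.88 * fdm_n)
erm_completed = round(0.80 * erm_n)

z, p_value = two_proportion_z_test(fdm_completed, fdm_n, erm_completed, erm_n)
```

With the exact counts from the study in place of the reconstructed ones, the same function would apply unchanged.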


Collecting information on patient r/e/l in health care organizations is critical for monitoring quality of services and testing initiatives to reduce disparities. These three pilot studies are a successful attempt at providing the necessary data and experience to institute and standardize efforts to collect r/e/l and an opportunity to derive lessons for other ambulatory care units. Preliminary data provide support for using in-person surveys during the clinic visit and for designating patient service representatives to distribute the survey to patients at check-in (FDM) rather than in the exam room (ERM). The FDM was the most cost-effective method, had a high response rate, and involved minimal negative patient interactions. Nevertheless, due to the limitations of our study design, we are cautious about ruling out the effectiveness of the mail and telephone surveys and the ERM without further exploration.

Before this study, to our knowledge, head-to-head data on optimal methods for r/e/l data collection and work flow in an ambulatory care setting were unavailable. We used quantifiable measures (e.g., cost, response rate) in comparing methods to standardize r/e/l collection at PAMF/PAR. The generalizability of our studies may be limited to other large ambulatory health care organizations that have electronic health record capabilities, as costs may be prohibitive for smaller organizations. Hasnain-Wynia, Pierce, and Pittman (2004) describe the current state of r/e/l collection in hospitals as having “a great deal of both intra-organizational and inter-organizational inconsistency… operationally there are not consistent policies and practices to make [r/e/l information collection] happen.” There is even less information available on ambulatory care settings, and we can only assume that the challenges are similar and magnified.

Some health care organizations fear patient sensitivity in initiating r/e/l collection, but real-life experiences confirm that patients are not aggravated by this type of data collection (Gomez et al. 2003; Shelley 2003; Hassett 2005; Baker et al. 2006). We found this to be true at PAMF/PAR as well. Overall, patients were accommodating in providing this information, as evidenced by our high response rates in the clinical setting. We asked over 8,000 patients for r/e/l information over the course of these pilot studies. Only one patient contacted Public Affairs to clarify the need for the Spanish origin question. This issue was quickly resolved with an explanation that the separate question is designed to satisfy state government reporting requirements (OSHPD 2008). Our experience reinforces the notion that when r/e/l data are collected with the same sensitivity with which we deliver patient care, patients are more than willing to provide this information.

Our pilot studies have limitations. Pilot Study 1 (mail versus telephone) did not include information on individual patient demographics. In hindsight, this information might have been useful to identify demographic characteristics (such as age and gender) that predicted better response. However, overall response rates were quite low (6–30 percent), and we concluded that mail and telephone methods would be ineffective as an r/e/l data collection method. In Pilot Study 2 (clinic questionnaires), there is confounding between department and delivery method, so caution should be used in interpreting these results. We encourage other organizations to design similar studies to address best work flow models in their own settings. Finally, all of the questionnaires in the pilot studies described here were delivered in English, which is less than ideal given the stated diversity of our patient population. Subsequent to these studies, the questionnaires were translated into Spanish, Chinese (traditional and simplified), Tagalog, Vietnamese, Hindi, and Russian, which are the most frequently spoken non-English languages in our catchment areas (U.S. Census Bureau 2000b). These limitations need to be taken into account when interpreting our results.

While limitations exist with our choice of sites in terms of comparability between samples and generalizability, we have taken important steps in fulfilling the organizational commitment to collecting r/e/l data in a standardized, methodical way for all Sutter Health affiliates. Our data and experience outline the process in our ambulatory care organization to collect r/e/l information using a series of pilot projects. These pilot studies formed and supported the path of implementation at our organization, and they may aid other ambulatory health care delivery systems with similar goals for r/e/l collection.


Joint Acknowledgment/Disclosure Statement: The authors wish to thank the Palo Alto Medical Foundation (PAMF) Quality Improvement Steering Committee (Drs. Susan Smith, Laurel Trujillo, and Barry Eisenberg) and the Quality and Planning (Mr. Tomas Moran, Ms. Melanie Okawachi, Ms. Monica Rodrigues, Ms. Thuy Trinh) and Operations Departments (Ms. Jenny Buchanan, Ms. Theresa Manley, Ms. Kathy Bratcher, Mr. Michael Fagan, Mr. Ashraf Morrar, Mr. David Curtis, Ms. Catherine Knipe, Ms. Emma Chavarria, Ms. Sarah Kirby) for conducting these studies and implementing race/ethnicity collection at PAMF.

Disclaimers: None.

Disclosures: None.