Methodology of the Discrimination in the United States survey

Abstract

Objective: To describe survey methods used to examine reported experiences of discrimination against African Americans, Latinos, Asian Americans, Native Americans, women, and LGBTQ (lesbian, gay, bisexual, transgender, and queer) adults.

Data Source and Study Design: Data came from a nationally representative, probability-based telephone survey of 3453 US adults, conducted January-April 2017.

Methods: We examined the survey instrument, sampling design, and weighting of the survey, and present selected survey findings.

Principal Findings: Examining reported discrimination experienced by multiple groups in a telephone survey requires attention to details of sampling and weighting. In health care settings, 32 percent of African Americans reported discrimination, as did 23 percent of Native Americans, 20 percent of Latinos, 18 percent of women, 16 percent of LGBTQ adults, and 13 percent of Asian Americans. Also, 51 percent of LGBTQ adults, 42 percent of African Americans, and 38 percent of Native Americans reported identity-based violence against themselves or family members; 57 percent of African Americans and 41 percent of women reported discrimination in pay or promotions; and 50 percent of African Americans, 29 percent of Native Americans, and 27 percent of Latinos reported being discriminated against in interactions with police.

Conclusions: Even the small selection of results presented in this article as examples of survey measures shows a pattern of substantial reported discrimination against all six groups studied.

Because Harvard researchers were not directly involved in data collection and de-identified datasets were used for analysis, the study was determined to be "not human subjects research" by the Harvard TH Chan School of Public Health Office of Human Research Administration.

| Survey instrument
Survey questions were developed after a review of available questions on discrimination. The questionnaire was designed to ask the same series of questions on institutional and interpersonal discrimination across several separate groups, including African Americans, Latinos, Asian Americans, Native Americans, women, and LGBTQ adults. This posed a methodological challenge, requiring original question wording so that question stems and response categories would work for all groups. The questionnaire was reviewed by external experts for bias, balance, and comprehension, and it was pretested in the field before being fielded with the full sample.
The complete survey instrument is shown in Appendix S1.
Discrimination was conceptualized as differential or unfair treatment of individuals based on self-identified race/ethnicity, gender, or LGBTQ identity, whether that treatment is enacted by individuals (based on beliefs, words, and behavior) or social institutions (based on laws, policies, institutions, and related behavior of individuals who work in or control these laws, policies, or institutions). [1][2][3] The other articles in this issue analyze questions about personal experiences, covering six institutional and six interpersonal areas of discrimination. Institutional areas included were employment, education, health care, housing, political participation, and interactions with police and courts. Interpersonal areas included were racial/ethnic, gender, or anti-LGBTQ slurs; microaggressions; other people's fear; sexual harassment; being threatened or nonsexually harassed; and experiencing violence. Also analyzed were two areas in which concerns about discrimination might prevent or deter adults from taking potentially needed action: seeking health and police services. We examined discrimination in domains previously demonstrated to be associated with health (eg, health care interactions), 4,5 as well as domains generally outside health services research (eg, police interactions), to capture a wide range of possible discriminatory experiences across respondents' lives.
Questions about experiences were asked of only a random half-sample of respondents, to maximize the number of questions while limiting respondent burden. Questions were asked only of relevant subgroups (eg, college questions were asked only among adults who had ever applied to or attended college). Because of the sensitive nature of the topics, questions on harassment, violence, and avoiding institutions for fear of discrimination asked about experiences of the respondent or their family members. 6 Prior literature has demonstrated the validity of asking questions this way to measure experiences on sensitive topics, as vicarious experiences of stress (eg, through discrimination or harassment experienced by family members) can adversely affect the health of individuals, even without respondents directly experiencing the events themselves. 7 Screening questions regarding racial and ethnic identities were asked at the beginning of the survey. This method of screening also allowed interviewers to use the appropriate language in survey questions to describe or refer to the respondent's own identity. For example, this allowed questions to be read as "Did you experience [form of discrimination] because you are Latino?" rather than "because of your race or ethnicity?" This makes it possible to ask otherwise-identical questions of respondents of each group while still specifying their own group identity. In turn, this enables researchers to see results for each group being asked the same questions during exactly the same time period.

| Sample design
Phone numbers used for this study were randomly generated from cell phone and landline telephone sample frames, with an overlapping frame design, meaning that a respondent could theoretically be reached through either the cell phone or the landline frame. Estimates of the population counts, by group, in each rate center or exchange were generated from Marketing Systems Group's (MSG) Genesys database. Table 1 specifies the criteria by which each stratum was defined for the cell phone and landline strata.
Adults of all racial/ethnic groups were interviewed in each of the high-density areas.
If, in the process of screening for any racial/ethnic group, respondents reported being LGBTQ, they were included in the LGBTQ oversample. In addition, the LGBTQ oversample included adults with telephone numbers where the respondent on the omnibus polls had reported that they were gay, lesbian, or bisexual (or volunteered they were transgender), which is a standard demographic question on that series of polls and slightly different from the questions asked on the Discrimination in the United States survey. Immediately prior to the survey's field period, a question asking whether respondents identified as transgender, genderqueer, or gender nonconforming was added to the omnibus polls and used for screening purposes. All respondents were screened about LGBTQ status, regardless of whether they were prescreened from the omnibus polls. No data from the omnibus polls were included in the Discrimination in the United States survey. Screening from the omnibus polls was used only to increase the likelihood of reaching an LGBTQ respondent.
The questionnaire was translated into Spanish and Chinese, so respondents could choose to be interviewed in either of these languages, or switch between the languages according to their comfort level. Those who preferred being interviewed in Spanish (n = 255) or Chinese (n = 33) were interviewed by bilingual interviewers.

| Pretesting
A live pretest of the survey instrument was conducted prior to the field period with respondents from both listed-landline and pre-

| Survey administration
The field period for this study was January 26 through April 9, 2017. All interviews were completed using a Computer-Assisted Telephone Interview (CATI) system, which ensured that questions followed logical skip patterns and that complete dispositions of all call attempts were recorded.

| Screening
The screening process for the survey involved the following procedure.
Cell phone respondents were interviewed once they confirmed they were 18 or older. In households reached via landline, the procedure varied by the number of adults in the household. In single-adult households, the respondent answering the phone was interviewed, once they confirmed they were 18 or older. In two-adult households, the CATI program randomly selected whether the adult on the phone or the other adult in the household would be interviewed. In households with three or more adults (as well as those where the person answering the phone refused to disclose the number of adults in the household), the interviewer asked to speak with the adult male or female (randomly selected) who had had the most recent birthday. If the selected person was unavailable, another adult of the same gender was selected.

| Multirace and Latino/Hispanic respondent self-identification
In some cases, respondents identified as being multiracial. When that happened, interviewers asked respondents with which race they identified most, and any following questions about racial/ethnic discrimination or experiences were based on this self-identification.
The US Census asks a question about Latino or Hispanic heritage separately from the question about race. Latino/Hispanic is not considered a race, and a Latino can be of any race. Researchers often use Latino/Hispanic as one group and then define all or most other races as excluding those who call themselves Latino or Hispanic.
For instance, reports by the National Health Interview Survey often use this approach. 8 Our survey generally follows this course.
Respondents who said they were Latino/Hispanic were asked questions about discrimination experienced because they were Latino.
One exception was made for the survey. Respondents who identified as Latino or Hispanic and as AI/AN were asked with which group they identified more, and ensuing questions were determined by their response.

| Efforts to maximize survey response
In order to maximize survey response, up to seven follow-up attempts were made to contact nonresponsive numbers (eg, no answer, busy, answering machine); each nonresponsive number was contacted multiple times, varying the times of day and the days of the week that callbacks were placed using a programmed differential call rule; respondents were offered the option of scheduling a callback at their convenience; specially trained interviewers contacted households where the initial call resulted in respondents hanging up the phone; and respondents reached by cell phone were offered $5 if they requested compensation for their time. A total of 113 respondents received incentives.

For the purposes of weighting only, respondents who identified as multiracial were not grouped based on which group respondents said they most identified with, but rather were considered separately as multirace, consistent with Census approaches. Similarly, respondents who were both AI/AN and Latino/Hispanic were considered AI/AN, and thus, Latinos/Hispanics were matched to non-AI/AN Hispanic Census distributions.

| Weighting procedures
Each race-defined group was weighted using the following steps.
1. Probability of selection (total). A phone number's probability of selection depends on the number of phone numbers selected out of the total sample frame. For each respondent whose household has a cell phone number, based on self-report, this is calculated as total cell phone numbers dialed divided by total numbers in the cell phone frame. For respondents answering at least one landline number, this is calculated as total landline numbers dialed divided by total numbers in the landline frame.
The probability of respondent selection within households is also taken into account. In households reached by landline, a single respondent is selected. Thus, the probability of selection within a household is inversely related to the number of adults in the household.
Total probability of selection is calculated as the phone number's probability of selection (by frame), and for landlines, this is divided by the number of adults in the household. To avoid extremely large or small weights, the maximum number of adults was capped at 3.
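This step can be sketched in code. The frame sizes, dial counts, and household composition below are hypothetical illustrations, not the survey's actual figures; only the structure of the calculation (phone-number probability, divided by household adults for landlines, capped at 3, then inverted) follows the description above.

```python
def base_weight(numbers_dialed, frame_size, n_adults=1, is_landline=False):
    """Inverse of the combined probability of selection for one respondent."""
    p_phone = numbers_dialed / frame_size      # phone-number selection probability
    if is_landline:
        # One adult is selected per landline household; cap adults at 3
        # to avoid extremely large weights.
        p_total = p_phone * (1 / min(n_adults, 3))
    else:
        p_total = p_phone
    return 1 / p_total

# Hypothetical example: landline household with 4 adults (capped at 3)
w_landline = base_weight(numbers_dialed=50_000, frame_size=5_000_000,
                         n_adults=4, is_landline=True)
```

With these illustrative inputs, the landline respondent's base weight is three times that of a cell-only respondent drawn with the same phone-number probability, reflecting the within-household selection.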
The sample weights derived at this stage are calculated as the inverse of the combined probability of selection.

6. Weight truncation ("trimming"). The raking methodology used in sample calibration for each race group had the weights converging to match the population benchmarks. As is often the case with weighting, the resulting weights inflated the variance in the data, as measured by the ratio of the highest weight to the lowest weight and by the design effect due to weighting. 12 This is commonly addressed by weight trimming, 13 a method in which extreme weights, on the low and high ends, 14 are identified and the weight distribution is truncated to reduce the error in the estimates stemming from the increased variance.
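The raking (iterative proportional fitting) used in calibration can be sketched as follows. The two dimensions (sex, age group), the category codes, and the target shares are hypothetical; the survey's actual raking used Census benchmarks across more dimensions.

```python
import numpy as np

def rake(weights, categories, targets, n_iter=50):
    """Iteratively adjust weights so weighted margins match target shares.
    categories: dict mapping dimension name -> per-respondent category codes;
    targets: dict mapping dimension name -> {code: target population share}."""
    w = np.asarray(weights, dtype=float).copy()
    for _ in range(n_iter):
        for dim, codes in categories.items():
            total = w.sum()  # snapshot before adjusting this dimension
            for code, share in targets[dim].items():
                mask = codes == code
                current = w[mask].sum()
                if current > 0:
                    w[mask] *= (share * total) / current
    return w

# Illustrative two-dimension example, not survey data
sex = np.array([0, 0, 1, 1, 1])
age = np.array([0, 1, 0, 1, 1])
w = rake(np.ones(5), {"sex": sex, "age": age},
         {"sex": {0: 0.5, 1: 0.5}, "age": {0: 0.4, 1: 0.6}})
```

Each pass matches one dimension's margins exactly while slightly perturbing the others; iterating across dimensions converges toward weights that satisfy all benchmarks simultaneously.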
The trimming method used for this study was based on the weight distribution: a "prespecified probability of occurrence" marked extreme points in the distribution, which were then selected as the cutoff points for truncation. 15 As Henry and Valliant note, this is a pragmatic approach, typical of studies such as this that are constrained by deadlines, which generally works well for the survey as a whole but can have more meaningful effects on some domain estimates. 16 For most race groups, the distribution cutoff points were the 5 percent upper and lower bounds of the weights; the exception was Hispanics, for whom the 2.5 percent bounds were used. The decision was made considering the impact of trimming on representativeness as established by the population benchmarks.
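Percentile-based truncation of this kind can be sketched as below. The simulated lognormal weights are purely illustrative of the skewed weight distributions that raking can produce; the cutoffs mirror the 5 percent bounds used for most groups.

```python
import numpy as np

def trim_weights(weights, lower_pct=5.0, upper_pct=95.0):
    """Truncate weights at prespecified lower/upper percentile cutoffs."""
    lo, hi = np.percentile(weights, [lower_pct, upper_pct])
    return np.clip(weights, lo, hi)

rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # skewed, like raked weights
trimmed = trim_weights(raw)
```

Trimming reduces the ratio of the highest to the lowest weight, and with it the variance inflation, at the cost of letting the weighted margins drift slightly from the benchmarks.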

| RESULTS
In this section, we consider seven main measures of survey performance and outcome.

Table 3 shows the sample sizes and margins of sampling error, accounting for design effect, for the survey overall (±3.2 percentage points at the 95% confidence level) and for each of the groups analyzed.

| Effect of weighting/design effect
Weighting procedures increase the variance in the data, with larger weights causing greater variance. Complex survey designs and postdata collection statistical adjustments increase variance estimates and, as a result, the error terms applied in statistical testing. The design effect for each group is shown in Table 3.
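The design effect due to unequal weighting is commonly approximated by Kish's formula, deff = n·Σw² / (Σw)², and the margin of sampling error for a proportion is inflated by √deff. A minimal sketch, with illustrative (not survey) weights:

```python
import numpy as np

def kish_deff(weights):
    """Kish approximation of the design effect due to unequal weighting:
    deff = n * sum(w^2) / (sum w)^2; equals 1.0 for equal weights."""
    w = np.asarray(weights, dtype=float)
    return len(w) * (w ** 2).sum() / w.sum() ** 2

def margin_of_error(p, n, deff=1.0, z=1.96):
    """95% margin of error for a proportion, inflated by sqrt(deff)."""
    return z * np.sqrt(deff * p * (1 - p) / n)
```

For example, equal weights give deff = 1.0, while weights of 1 and 3 on two cases give deff = 1.25, widening the margin of error by about 12 percent.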

| Effect of weight trimming
Whereas trimming the weights typically serves to reduce variance and to increase the effective N, which reduces sampling error calculated for the survey's point estimates, it may also introduce bias in the data, as the weighted data no longer converge to the weighting benchmarks. For each of the racial/ethnic groups studied, design effect was reduced along with sampling error (see Table 3). Thus, on average, the estimates for any of these groups have a smaller confidence interval, though this is not guaranteed to be the case for each specific point estimate. As for bias, comparing the demographic makeup and responses to key survey questions as observed with trimmed and untrimmed weights, the overwhelming majority of trimmed demographics were within 1 percent of the untrimmed demographics. Accordingly, nearly all substantive results to key questions, using the trimmed weights, were within 1 percent of the untrimmed ones, and none exceeded a 3 percent difference (Appendix S2). Thus, on a substantive level, trimming did not meaningfully affect the results on these measures.

Table 4 shows the completion and response rates for the overall sample, the racial/ethnic groups, and LGBTQ adults. Among respondents who answered initial demographic screening questions, the overall completion rate was 74 percent. The overall response rate for this survey was 10 percent, calculated based on the American Association for Public Opinion Research's RR3 formula. 17 This calculation takes into account that for the prescreened part of the sample, the total response rate is the product of the response rate for recontacts multiplied by the response rate of the original omnibus poll.
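As a sketch of the AAPOR RR3 calculation referenced above: RR3 counts complete interviews over all known-eligible cases plus an estimated eligible share e of the unknown-eligibility cases. The disposition counts below are hypothetical, chosen only to illustrate the formula's structure.

```python
def rr3(I, P, R, NC, O, UH, UO, e):
    """AAPOR Response Rate 3.
    I=complete interviews, P=partials, R=refusals/break-offs, NC=non-contacts,
    O=other eligible non-interviews, UH=unknown if household, UO=unknown other,
    e=estimated share of unknown-eligibility cases that are eligible."""
    return I / ((I + P) + (R + NC + O) + e * (UH + UO))

# Hypothetical dispositions, not the survey's actual counts
rate = rr3(I=100, P=10, R=200, NC=300, O=40, UH=500, UO=100, e=0.5)
```

For the prescreened portion of the sample, this rate would then be multiplied by the original omnibus poll's response rate, as described above.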

| Telephone coverage
Interviewing was conducted by both cell phone (68 percent) and landline (32 percent) in order to ensure coverage of adults who use only one type of telephone.

| Survey results for selected key outcome measures
The survey was designed to look at self-reported experiences across a wide range of domains and among several groups simultaneously, using parallel question wordings. Other articles in this issue detail the widespread prevalence of reported discrimination. Table 5 presents results for selected key outcome measures.

| DISCUSSION
In order to assess the Discrimination in the United States survey's methodology, we refer here to a summary chapter by Graham Kalton on how to survey hard-to-sample populations, such as those that are the focus of our survey. 18 Kalton places an emphasis on probability sampling methods, which he states are "necessary to provide the security of valid statistical inference." Whereas many surveys of hard-to-sample groups are conducted by nonprobability methods, ours used random-digit-dial (RDD) telephone probability sampling. There are several other types of probability approaches, which we did not use because of cost (in-person) or time constraints (mail). Other methods, such as a sequential multimode format, often yield higher response rates than our telephone RDD approach.
Conditions for design-based inference include known selection probabilities, high coverage of the target population, high response rates, weighting, and operational feasibility. Kalton states that it is often not possible to satisfy all of these criteria and that some compromises are often needed. Our survey generally meets four of these criteria; the main drawback is the low response rate (see limitations below).
Some of the techniques for sampling hard-to-reach populations include large-scale screening, use of a large host survey for screening, disproportionate stratification, and multiple frames. Each of these approaches was utilized in our survey.
Kalton notes that the efficiency of screening is increased by concentrating the sample in areas where the population is more prevalent.
In our survey, oversamples included high-density African American, Latino/Hispanic, Asian, and AI/AN areas (Table 1). In addition, we were able to take advantage of a large series of weekly RDD omnibus polls to pool telephone numbers for adults with specific characteristics.
The data were weighted to compensate for the effects of oversampling. In addition, the data, overall and for each specific racial/ ethnic group, were weighted using benchmarks from the US Census.
The selected survey results in Table 5 show a pattern of substantial reported discrimination across the groups studied.

Fourth, weight trimming/truncating, which was used in this survey, often introduces bias and may worsen margins of sampling error.
In general, trimming/truncating is advisable only when the need for large weights cannot be addressed via sampling design and when it improves margins of sampling error, as was the case for this survey.
Fifth, we did not examine respondents' experiences being multiracial. If the respondent gave more than one race/ethnicity, they were asked which one they identified with most. This decision was made for practical reasons, chiefly that we could only examine so many groups and that being multiracial can involve a variety of racial/ethnic combinations that yield different experiences. Additional research is needed to explore experiences unique to multiracial adults.

| CONCLUSIONS
An article by David Williams in this issue, as well as other prior research, has shown that major patterns of racism, sexism, and other discrimination can significantly harm the health and well-being of impacted populations and that self-reported discrimination is associated with worse health outcomes. [1][2][3][19][20][21][22][23][24][29] The Discrimination in the United States survey and the articles based on it extend prior work in this area by focusing on people's reports of their own and their family members' direct life experiences, rather than general perceptions of discrimination in the country, and by bringing together these reported experiences simultaneously across six groups, most of them underrepresented in much of public opinion research due to their low incidence in the population.
Surveying these hard-to-sample populations involves challenges, but with attention to the many particular aspects of sampling (especially oversampling and screening), coverage, and weighting, and taking into account limitations, it is feasible to conduct such surveys using probability-based RDD telephone sampling.

Abbreviation: LGBTQ, lesbian, gay, bisexual, transgender, and queer.
a All questions asked among a randomized half-sample of respondents within each category. Don't know/refused responses included in the total.
b Questions about "you" are personal experiences only; questions about "you or family member" ask if items have happened to you or a family member because you or they are [respondent's own racial/ethnic/gender/LGBTQ identity].
c Pay/promotion question only asked among respondents who have ever been employed.