Can We Measure Immigrants' Legal Status? Lessons from Two U.S. Surveys



This research note examines response and allocation rates for legal status questions asked in publicly available U.S. surveys to address worries that the legal status of immigrants cannot be reliably measured. Contrary to such notions, we find that immigrants' non-response rates to questions about legal status are typically not higher than non-response rates to other immigration-related questions, such as country of birth and year of immigration. Further exploration of two particular surveys – the Los Angeles Family and Neighborhood Survey (LAFANS) and the Survey of Income and Program Participation (SIPP) – reveals that these data sources produce profiles of the unauthorized immigrant population that compare favorably to independently estimated profiles. We also find in the case of the SIPP that the introduction of legal status questions does not appear to have an appreciable “chilling effect” on the subsequent survey participation of unauthorized immigrant respondents. Based on the results, we conclude that future data collection efforts should include questions about legal status to (1) improve models of immigrant incorporation; and (2) better position assimilation research to inform policy discussions.


The size of the unauthorized population in the United States, estimated at about 11.1 million in 2011 (Passel and Cohn, 2012), coupled with an increasingly exclusionary stance vis-à-vis immigrants at the federal, state, and local levels, has led researchers to raise important policy-relevant questions about the salience of legal status for the incorporation and well-being of immigrant families (Bean and Stevens, 2003; Massey and Bartley, 2005; Glick, 2010; Donato and Armenta, 2011). A small body of ethnographic research suggests that unauthorized status severely diminishes the well-being and life chances of the children of immigrant parents (Abrego, 2011; Gonzales, 2011; Hagan, Rodriguez, and Castro, 2011; Yoshikawa, 2011; Gleeson and Gonzales, 2012; Gonzales and Chavez, 2012; Gonzales, Suarez-Orozco, and Dedios-Sanguineti, 2013). This conclusion is consistent with the findings of an even smaller number of survey-based studies. For example, using data from a survey of second-generation adults in Los Angeles, Bean et al. (2011) find significantly lower educational attainment among the adult children of unauthorized Mexican immigrant mothers who never obtained legal resident status than among their counterparts whose mothers either arrived as legal permanent residents or were able to adjust to legal status at some point. Using data from two panels of the Survey of Income and Program Participation (SIPP), Hall, Greenman, and Farkas (2010) report that unauthorized Mexican immigrants experience significant adjusted wage gaps, smaller returns to their human capital, and slower wage growth relative to legally resident Mexican immigrants. On the basis of analyses of four panels of SIPP data showing that unauthorized Mexican and Central American youth are less likely to complete high school and enroll in college, Greenman and Hall (2013) conclude that “legal status is a critical axis of stratification for Latinos.” And in their analysis of data from the Los Angeles Family and Neighborhood Survey (LAFANS), Prentice, Pebley, and Sastry (2005) find that unauthorized immigrants are substantially less likely to have health insurance coverage than legal immigrants and U.S. citizens.

Taken together, the small body of literature on unauthorized status and immigrant incorporation suggests that legal status is an important variable shaping patterns of contemporary assimilation. However, because legal status is not measured in the surveys most commonly used in social science research on assimilation, it tends to be underemphasized in the large body of incorporation research that has emerged over the last twenty-five years. This lack of emphasis, which stems largely from a lack of widely used data (Clark and King, 2008), carries at least two crucial implications. First, it puts contemporary social science theory and research aimed at explaining patterns of assimilation at risk of omitted variable bias (Massey and Bartley, 2005): the myriad structural and/or cultural mechanisms purported to slow the incorporation of some groups, especially Mexicans, may be spuriously associated with integration outcomes because legal status is omitted from assimilation models. Second, the lack of survey data on unauthorized immigrants, including their incorporation experiences, severely limits the extent to which social science research can adequately inform public policy debates (National Research Council, 2013).
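The omitted variable bias argument can be illustrated with a small simulation. The sketch below uses purely hypothetical data: it assumes a group indicator G that raises the probability of unauthorized status L, and an outcome Y that depends on L but not on G. Comparing group means while ignoring L produces a sizable "group effect"; comparing the groups within each level of L makes it essentially disappear.

```python
import random

def simulate(n=100_000, seed=1):
    """Hypothetical data: membership in group G raises the probability of
    unauthorized status L, and only L (not G) lowers the outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = rng.random() < 0.5
        l = rng.random() < (0.5 if g else 0.1)
        y = -1.0 * l + rng.gauss(0.0, 1.0)  # no direct effect of G on Y
        rows.append((g, l, y))
    return rows

def mean(values):
    return sum(values) / len(values)

def group_gap(rows, status=None):
    """Mean Y for G=1 minus mean Y for G=0, optionally within one value of L."""
    subset = [r for r in rows if status is None or r[1] == status]
    return (mean([y for g, _, y in subset if g])
            - mean([y for g, _, y in subset if not g]))

rows = simulate()
naive = group_gap(rows)  # L omitted: roughly -1.0 * (0.5 - 0.1) = -0.4
within = [group_gap(rows, status) for status in (False, True)]  # L held fixed: near 0
print(round(naive, 2), [round(w, 2) for w in within])
```

In regression terms, the naive gap corresponds to a model of Y on G alone; holding L fixed corresponds to adding legal status as a control.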

As discussed in more detail below, the lack of population survey data that includes measures of immigrant legal status appears to derive in part from assumptions by some researchers and policy-makers that information on legal status cannot be reliably collected, or should not be (Carter-Pokras and Zambrana, 2006; U.S. Government Accountability Office, 2006). Concerns about the collection of data on legal status tend to revolve around doubts about the accuracy of such data (i.e., whether immigrants would answer such questions or answer them honestly) and the extent to which the introduction of questions about immigrants' legal status may produce a “chilling effect” whereby respondents refuse to answer subsequent questions or participate in subsequent data collection efforts.

In this research note, we examine these concerns empirically by investigating non-response/allocation rates of legal status measures in LAFANS and SIPP. We focus on: (1) the frequency with which immigrants answer questions about their legal/citizenship status; (2) in the case of SIPP, whether the introduction of questions about the legal status of immigrants is associated with relatively high rates of non-response to subsequent survey items or attrition from the panel altogether; and (3) whether the surveys produce estimates of the characteristics of the unauthorized population that are comparable to independent estimates based on the residual method, the technique most often used to compute aggregate estimates of unauthorized migration (Van Hook and Bean, 1998). The two surveys are valuable sources of data for researchers interested in adding legal status variables to models of immigrant incorporation and the well-being of immigrants and their children, but few researchers have employed the data for these purposes. Of course, these data sources are useful only to the extent that immigrants' legal status is accurately measured and that they provide unbiased profiles of the unauthorized population. To the best of our knowledge, these questions have not been addressed in the research literature.


Perspectives on Measuring Immigrants' Legal Status

A common assumption about the measurement of immigrants' legal status holds that such questions are sensitive in nature, akin to questions about illicit drug use. This assumption underlies a 2006 report by the U.S. Government Accountability Office (GAO), which takes as its starting point that standard questionnaires with direct questions about legal status are unlikely to yield valid data. As a possible remedy, the report considers implementing non-standard “grouped answers,” or a “three-card” method, in a national survey, which would enable estimation of the size and characteristics of the unauthorized immigrant population without respondents having to report their specific immigration status (U.S. Government Accountability Office, 2006). Because legal status is viewed as a sensitive survey item, researchers and immigrant advocates have also expressed concern that introducing such questions in surveys or administrative data collection could have a “chilling effect,” leading unauthorized immigrants to refuse to participate in surveys and making this population more difficult to locate and/or serve (Carter-Pokras and Zambrana, 2006).
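To make the logic of a grouped-answers design concrete, the sketch below computes a difference-based estimate for a simplified two-subsample version. It is an illustration under stated assumptions, not the GAO protocol: we assume one random subsample sees a card on which "unauthorized" is grouped in Box B with a set S of legal statuses, while a second random subsample sees a card on which the statuses in S form a box of their own. No respondent ever reports unauthorized status individually, yet the difference in proportions estimates its prevalence. The function name is hypothetical.

```python
import math

def grouped_answer_estimate(n1, box_b1, n2, s_count2):
    """Simplified two-card estimate of the unauthorized share (hypothetical design).

    n1, box_b1: subsample 1 size and number choosing Box B, where Box B
        groups "unauthorized" with a set S of legal statuses.
    n2, s_count2: subsample 2 size and number choosing the box that holds
        exactly the statuses in S.
    Both subsamples are assumed to be independent random draws from the
    same population, so P(Box B, card 1) - P(S, card 2) estimates P(unauthorized).
    """
    p_b1 = box_b1 / n1
    p_s2 = s_count2 / n2
    estimate = p_b1 - p_s2
    # Independent subsamples: the sampling variances of the two proportions add.
    se = math.sqrt(p_b1 * (1 - p_b1) / n1 + p_s2 * (1 - p_s2) / n2)
    return estimate, se

est, se = grouped_answer_estimate(n1=1000, box_b1=300, n2=1000, s_count2=200)
print(round(est, 3), round(se, 4))  # 0.1 0.0192
```

The price of privacy is precision: the standard error combines the variance of two proportions, so grouped-answer designs need larger samples than a direct question would.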

The assumption that questions about immigrants' legal status are sensitive, however, is not necessarily supported by the empirical evidence available to test it. For example, the aforementioned GAO report (U.S. Government Accountability Office, 2006) acknowledged that two government surveys that include relatively direct questions about legal status (see note 1) had reported no problems in the field, even though the report itself adopted the assumption that such questions could not be asked directly. Similarly, researchers from the RAND Corporation reported from a pilot survey of immigrants in Los Angeles in the early 1990s: “Surprisingly, all respondents answered what we thought was the single most sensitive question – their current immigration status” (DaVanzo et al., 1994). The degree to which immigrants perceive questions about their immigration status as sensitive likely depends on who is asking. Among the recommendations made by researchers and immigrant advocates consulted for the GAO report, one opinion was universal: a national survey of immigrants including questions about legal status should be administered by a university or private research organization rather than by a government agency such as the U.S. Census Bureau (U.S. Government Accountability Office, 2006). This leads us to hypothesize that, of the two data sets examined here, the LAFANS, which was designed and administered by the RAND Corporation, will yield better response rates to legal status questions than the SIPP, which is administered by the U.S. Census Bureau.

Data and Measurement

The Los Angeles Family and Neighborhood Survey (LAFANS)

LAFANS is a longitudinal survey of families in Los Angeles County focusing in large part on the extent to which child development outcomes are related to neighborhood, family, and peer dynamics. Our focus is on the first wave of data, collected between April 2000 and January 2002. The survey was designed by RAND, and fieldwork was conducted in English and Spanish by RAND's Survey Research Group and the Research Triangle Institute via computer-assisted personal interviews (CAPI). During the informed consent process, in addition to assurances that information from the survey would be kept confidential and used for research purposes only, respondents were also given a Certificate of Confidentiality issued by the U.S. Department of Health and Human Services (USDHHS), guaranteeing that respondents' identities could not and would not be revealed, even under court order or subpoena (Pebley and Sastry, 2003, Chapter 7).

Questions about legal status were asked of all foreign-born adults in the LAFANS. The wording and order of the questions are shown in Figure 1. First, all immigrants were asked whether they were naturalized citizens. Non-citizens were then asked whether they had a “green card” or legal permanent resident (LPR) visa. Next, those without green cards were asked whether they had been granted refugee, asylee, or temporary protected status (TPS). Finally, those responding “no” to each of these questions were asked whether they possessed a valid visa granting temporary residence in the United States. Those not indicating one of these four legal statuses (naturalized citizen, non-citizen LPR, refugee/asylee, or legal temporary resident) are assumed to be unauthorized immigrants.
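The LAFANS question sequence amounts to a simple decision rule. The function below is a hypothetical sketch of that rule (the function and argument names are ours, not LAFANS variable names). It assumes, as the "Temporary Visa Still Valid?" item in Table 1 suggests, that a temporary visa establishes legal status only if it is still valid, and that a missing answer (None) counts as not affirming a legal status.

```python
from typing import Optional

def classify_lafans(naturalized: Optional[bool],
                    green_card: Optional[bool] = None,
                    refugee_asylee_tps: Optional[bool] = None,
                    temp_visa: Optional[bool] = None,
                    visa_still_valid: Optional[bool] = None) -> str:
    """Apply the LAFANS question sequence; None marks a missing answer.

    Only an explicit "yes" establishes a legal status, so respondents with
    missing answers fall through to "unauthorized", as in the text.
    """
    if naturalized is True:
        return "naturalized citizen"
    if green_card is True:
        return "LPR"
    if refugee_asylee_tps is True:
        return "refugee/asylee/TPS"
    # Assumption: a temporary visa counts only if it is still valid.
    if temp_visa is True and visa_still_valid is True:
        return "legal temporary resident"
    return "unauthorized"

print(classify_lafans(False, False, False, True, True))  # legal temporary resident
```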

Figure 1. Wording of Survey Questions Used to Determine Immigrants' Citizenship and Legal Status in LAFANS and SIPP

Results presented later pertaining to response rates are based on unweighted analyses, while the comparative profiles of the unauthorized population are based on weighted estimates, using the adult person weight provided with the data. Sample weights provided by LAFANS adjust for the oversampling of households with children as well as for household non-response (see Peterson et al., 2004:41).

The Survey of Income and Program Participation (SIPP)

SIPP is a representative panel survey of households in the United States designed to collect information about factors associated with income and program participation. It includes a core set of questions asked at every wave of data collection, as well as topical modules that vary from wave to wave. We use the 2004 panel. Questions about immigrants' legal status were included in the wave 2 topical module, collected via in-person interviews between June and September 2004. Foreign-born respondents in SIPP were first asked whether they were citizens (and how they became citizens). Then, all immigrants (i.e., those who are not citizens by birth or adoption) were asked about their status upon arrival. As shown in Figure 1, immigrants could select from six arrival statuses: three types of legal permanent residency, refugee/asylee status, non-immigrant status (e.g., student or tourist visas), and “other.” The publicly available data that we use here, however, do not allow users to distinguish between different types of non-LPRs, presumably to protect respondents' confidentiality. Thus, in the public-use data, non-LPR arrivals include those who arrived as unauthorized immigrants as well as those who arrived legally as refugees/asylees or legal temporary migrants. Finally, non-citizens who arrived without LPR status were asked whether their status had changed to legal permanent resident. We assume that the overwhelming majority of non-LPR arrivals who have not adjusted to LPR status are unauthorized immigrants, although, as noted, some non-adjustees are likely to be legal non-immigrants or refugees/asylees.

In comparison with LAFANS, the public-use SIPP data provide users with a less fine-grained profile of the foreign-born population. The series of LAFANS questions enables users to distinguish citizens from LPRs, refugees, legal non-immigrants, and the unauthorized. One advantage of the SIPP questions over those in LAFANS, however, is that the former capture the legal and citizenship trajectories of immigrants, while the latter measure only immigrants' current status.

The SIPP, but not the LAFANS, imputes values for missing responses to the legal status questions. Specifically, the Census Bureau allocates missing responses in SIPP using statistical techniques (e.g., hot-deck imputation) that predict missing values from the valid responses of persons who are similar with respect to a number of characteristics (e.g., age, sex, race, and geography). One concern with the Census Bureau's allocation procedure stems from the variables used to predict responses to the legal status questions: it is not clear that they include country of birth or year of arrival (the Census Bureau has not published the predictors). If values are not allocated using these two key variables, which are strongly related to immigrants' legal status, the allocated values are unlikely to be very predictive of actual legal status. In addition, the imputation procedure assumes that the mechanism driving non-response to a given survey item is unrelated to the value of the item being imputed (U.S. Census Bureau, 2001; Allison, 2002). This assumption would be violated if data on the legal status questions are largely missing because unauthorized immigrants do not wish to reveal to government employees that they are in the country without authorization. Because of these doubts about the accuracy of the Census Bureau's allocation procedure, we employ three methods for assigning legal status to foreign-born respondents in SIPP. The first simply accepts the Census Bureau's allocated values and assumes that all non-citizen, non-LPR arrivals who have not adjusted to LPR status are unauthorized immigrants.
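For readers unfamiliar with hot-deck allocation, the following is a minimal sketch of one common variant, random hot-deck imputation within cells. It is not the Census Bureau's SIPP procedure, whose predictors are unpublished; the cell variables and field names in the example are hypothetical. It shows why the choice of cell variables matters: a missing legal status value is copied from a donor who matches only on the cell variables, so if country of birth and year of arrival are not among them, donors may have very different migration histories.

```python
import random
from collections import defaultdict

def hot_deck(records, item, cell_vars, seed=0):
    """Random hot-deck imputation of `item` within imputation cells.

    records: list of dicts; a missing value of `item` is None.
    cell_vars: keys defining the cells (illustrative choices only).
    Returns new dicts, each with an extra boolean key '<item>_allocated'.
    """
    rng = random.Random(seed)
    donors = defaultdict(list)
    for r in records:
        if r[item] is not None:
            donors[tuple(r[v] for v in cell_vars)].append(r[item])
    out = []
    for r in records:
        new = dict(r, **{item + "_allocated": False})
        if new[item] is None:
            pool = donors.get(tuple(r[v] for v in cell_vars))
            if pool:  # values stay missing if the cell has no donors
                new[item] = rng.choice(pool)
                new[item + "_allocated"] = True
        out.append(new)
    return out

# Hypothetical example: cells defined by age group and sex only,
# with no country of birth or year of arrival.
people = [
    {"age": "25-34", "sex": "F", "lpr_arrival": True},
    {"age": "25-34", "sex": "F", "lpr_arrival": None},
    {"age": "35-44", "sex": "M", "lpr_arrival": False},
    {"age": "35-44", "sex": "M", "lpr_arrival": None},
]
imputed = hot_deck(people, "lpr_arrival", ["age", "sex"])
for row in imputed:
    print(row)
```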

We also use a second method, which we refer to as the “multiple imputation” method. For this approach, we recode all of the legal status variables allocated by the Census Bureau as missing and then use multiple imputation (Allison, 2002) to reassign the values. Missing naturalization, arrival status, and adjustment status information is imputed using several variables that model-building exercises indicated are predictive of these variables (see note 2). As the name implies, missing data are imputed multiple times, so the imputed values for a given observation vary across imputations. We created ten imputed data sets; results presented later for this method are averaged across them.
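Averaging across the imputed data sets gives the point estimate. The usual way to obtain a standard error that also reflects imputation uncertainty is Rubin's rules, sketched below. The text does not state how standard errors were combined, so treat this as the standard technique rather than a description of the authors' computations; the function name is hypothetical.

```python
import math
from statistics import mean, variance

def combine_imputations(estimates, variances):
    """Combine results from m >= 2 imputed data sets with Rubin's rules.

    estimates: the statistic of interest from each imputed data set
    variances: the squared standard error of each estimate
    Returns (combined estimate, combined standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # average across imputations
    within = mean(variances)         # average sampling variance
    between = variance(estimates)    # spread across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)

est, se = combine_imputations([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, round(se, 3))
```

The between-imputation term is what distinguishes this from simply averaging standard errors: the more the ten imputed data sets disagree, the wider the combined interval.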

The third method, which we term the “logic-based reallocation” method, involves two steps. First, we set all of the allocated responses to missing. Then, all respondents with missing data, along with those whose valid responses placed them in the unauthorized category, were placed in a pool of “potentially unauthorized” persons. Thus, the pool consists of all persons who cannot be identified as either naturalized citizens or legal permanent residents on the basis of valid responses to the three questions listed in Figure 1. The second step follows Passel, Van Hook, and Bean (2006) and Hall, Greenman, and Farkas (2010) and removes from the pool respondents whose characteristics suggest a low probability of being unauthorized. For example, we assume that all foreign-born students enrolled in post-secondary educational institutions are in the country on student visas. We also remove persons in a number of specialty occupations unlikely to be held by unauthorized immigrants (e.g., diplomats and government employees), as listed in Passel, Van Hook, and Bean (2006), as well as veterans, active-duty service members, and persons who have received public assistance. Finally, we use data from the Department of Homeland Security's (DHS) Office of Immigration Statistics (OIS) to remove probable refugees from the pool. We assign each immigrant a probability of being a refugee arrival based on country of birth and year of arrival. This probability is simply the proportion of immigrants from a given country admitted in a given year who were admitted as refugees or asylees, as reported by DHS. For each immigrant, we take a random draw from a uniform distribution and compare it with the individual's refugee probability. Individuals whose refugee probability exceeds the random draw are coded as refugee arrivals and removed from the pool. All persons remaining in the pool are assumed to be unauthorized immigrants.
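The two-step logic-based reallocation can be expressed as a per-person rule. In the sketch below, the dictionary keys and the `refugee_share` lookup table are hypothetical stand-ins for SIPP variables and DHS/OIS admissions data; only the sequence of exclusions follows the text. Allocated responses are assumed to have already been reset to None.

```python
import random

def assign_status(person, refugee_share, rng):
    """Return a status label for one SIPP respondent (hypothetical field names).

    Answers are True/False, or None when missing (Census allocations reset).
    """
    # Step 1: valid answers identify naturalized citizens and LPRs;
    # everyone else starts in the "potentially unauthorized" pool.
    if person["naturalized"] is True:
        return "naturalized"
    if person["lpr_arrival"] is True or person["adjusted"] is True:
        return "LPR"
    # Step 2: remove people whose traits imply a low probability of being unauthorized.
    if person["postsecondary_student"]:
        return "student visa"
    if person["specialty_occupation"] or person["veteran"] or person["public_assistance"]:
        return "other legal"
    # Probable refugees: compare a uniform draw with the share of that
    # country-year's admissions who were refugees or asylees.
    p_refugee = refugee_share.get((person["country"], person["arrival_year"]), 0.0)
    if p_refugee > rng.random():
        return "refugee"
    return "unauthorized"

def logic_based_reallocation(people, refugee_share, seed=0):
    rng = random.Random(seed)  # fixed seed makes the random draws reproducible
    return [assign_status(p, refugee_share, rng) for p in people]

base = {"naturalized": False, "lpr_arrival": False, "adjusted": None,
        "postsecondary_student": False, "specialty_occupation": False,
        "veteran": False, "public_assistance": False,
        "country": "X", "arrival_year": 1999}
people = [dict(base, naturalized=True), dict(base, adjusted=True),
          dict(base, postsecondary_student=True), dict(base, veteran=True),
          dict(base, country="R"), dict(base)]
labels = logic_based_reallocation(people, {("R", 1999): 1.0})
print(labels)
```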

As with the LAFANS results, SIPP-based analyses of non-response are unweighted. Comparative profiles, however, are based on weighted estimates using the wave 2 person weight provided by the Census Bureau, which adjusts for household non-response and panel attrition.


Do Immigrants Answer Questions about Their Legal Status?

Our first research question is whether high rates of non-response emerge when people are asked about their migration status. As hypothesized, this does not appear to be the case in the LAFANS data. Table 1 reports the prevalence of missing data for the series of questions asked of immigrants in the LAFANS. Of the 1,949 immigrant respondents who were asked the naturalization question, 69 (3.5 percent) had missing data. Expressed relative to valid responses, as in Table 1, the percentage missing rises from 3.7 for the naturalization question to 5.7, 10.5, and 12.4 for the green card, refugee, and temporary visa questions, respectively. The bulk of this increase arises because the same 69 respondents with missing data on the citizenship question also had missing data on all of the subsequent questions. Of all foreign-born respondents in LAFANS, only 4.3 percent (N = 84) have an ambiguous immigration status due to non-response to the series of questions that determine legal status. We assume that these ambiguous cases are unauthorized, but the profile of the unauthorized population (presented below) does not change when we delete these cases from the sample.
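The percentages in Table 1 can be reproduced from its counts. The arithmetic below suggests that the % Missing column is computed relative to valid yes/no answers, while shares of all respondents asked each question are somewhat lower. The helper name is ours; the counts are those printed in Table 1.

```python
def pct_missing(yes, no, missing):
    """Return (% missing relative to valid yes/no answers, % of all respondents asked)."""
    return 100 * missing / (yes + no), 100 * missing / (yes + no + missing)

# Counts from Table 1: (yes, no, missing) for each LAFANS item
table1 = {
    "Naturalized Citizen?": (522, 1358, 69),
    "LPR/Green Card?": (588, 762, 77),
    "Temporary Visa?": (117, 563, 84),
}

for question, counts in table1.items():
    of_valid, of_asked = pct_missing(*counts)
    print(f"{question:22s} {of_valid:5.1f} {of_asked:5.1f}")

# Share of all 1,949 immigrant respondents with ambiguous status
print(round(100 * 84 / 1949, 1))  # 4.3
```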

Table 1. Missing Data on Citizenship and Immigration Status Questions among Adult Immigrants in Los Angeles, 2001

Question | Yes | No | Missing | % Missing
Naturalized Citizen? | 522 | 1,358 | 69 | 3.7
LPR/Green Card? (a) | 588 | 762 | 77 | 5.7
Temporary Visa? (c) | 117 | 563 | 84 | 12.4
Temporary Visa Still Valid? (d) | 94 | 23 | 0 | 0.0

Source: LAFANS, Wave 1 (2001), Adult File.
Note: Cases in shaded cells are assumed to be unauthorized.
(a) Asked of all non-citizens.
(b) Asked of non-citizen, non-LPRs.
(c) Asked of non-citizen, non-LPRs and non-refugees.
(d) Asked of persons with a temporary visa.

Also as hypothesized, non-response to the legal status items is much higher in SIPP than in LAFANS (Table 2). Of the 9,178 foreign-born persons asked the question about arrival status, 27.2 percent had responses allocated by the Census Bureau, and of the 2,445 respondents asked the question about adjusting to LPR status, 17.8 percent had allocated responses. Both percentages exceed the 15 percent threshold beyond which Allison (2002) suggests that multiple imputation techniques be used to handle missing data. However, legal status in SIPP is indeterminable only for non-naturalized (or missing) respondents with non-LPR (or missing) arrival status who also have missing data on the status adjustment question. Of the 2,494 immigrants with missing arrival status data, 1,552 (62 percent) were reported as naturalized citizens, so these respondents must have either arrived as LPRs or arrived in another status (including unauthorized status) and subsequently adjusted to permanent status before naturalizing. Among the 2,973 non-citizen immigrants with unknown or valid non-LPR arrival status, 1,168 did not provide a valid response to the question about adjustment of status. Thus, in total, 1,168 of 9,178 immigrant adults in SIPP (12.7 percent) had an ambiguous legal status when the three questions are considered jointly.
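The rule for when SIPP legal status is indeterminate, and the shares reported above, can be checked with a few lines of arithmetic. The function name is ours; None stands for a missing answer, and the counts are those given in the text.

```python
def sipp_status_ambiguous(naturalized, lpr_arrival, adjusted):
    """True when the three SIPP items jointly leave legal status undetermined.

    Each argument is True, False, or None (missing response).
    """
    return (naturalized is not True      # non-naturalized or missing
            and lpr_arrival is not True  # non-LPR arrival or missing
            and adjusted is None)        # no valid answer on adjustment to LPR

# Counts reported in the text
asked = 9_178
missing_arrival = 2_494
naturalized_among_missing = 1_552
ambiguous = 1_168

print(round(100 * missing_arrival / asked, 1))                   # 27.2
print(round(100 * naturalized_among_missing / missing_arrival))  # 62
print(round(100 * ambiguous / asked, 1))                         # 12.7
```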

Table 2. Allocation Rates for Citizenship and Immigration Status Questions among Foreign-Born Respondents Aged 15 and Older, SIPP, Wave 2, 2004

Row | Naturalized Citizen? (a) | LPR Arrival? (a) | LPR Adjustee? (b)
Unallocated Responses | | |
Not in Universe | 0 | 0 | 6,733
Allocated Responses | | |
Total N in Universe | 9,178 | 9,178 | 2,445
% Allocated | 0.8 | 27.2 | 17.8

Source: Survey of Income and Program Participation (SIPP), 2004 panel, Wave 2; data were collected between February and August 2004.
(a) Asked of all foreign-born respondents.
(b) Asked of non-naturalized, non-LPR arrivals.

Additional evidence suggests that the relatively high item-allocation rates for the legal status questions in SIPP may derive from survey design or field operations rather than from the content of the questions per se. First, the percentage of missing values on the question about year of entry is also very high (25.3 percent). Among those who answered the year-of-entry question, only 8.5 percent failed to answer the questions on legal status at entry. Thus, non-response to the legal status questions is concentrated among a relatively small share of immigrants who also did not answer other questions about their migration experience. Second, as Tables 3 and 4 illustrate, SIPP stands out among surveys that ask about legal status. In Table 3 we compare the range of non-response/allocation for legal status items in LAFANS and SIPP with the range in five other publicly available surveys that have also asked legal status questions (more information about these data sources can be found in the Appendix). Relative to the other six surveys listed in Table 3, SIPP has by far the highest rate of non-response to the legal status indicators, consistent with the notion that something particular to SIPP data collection, rather than the sensitive nature of the legal status questions, is responsible for the high allocation rate.

Table 3. Range of Non-Response to Survey Questions on Immigrants' Citizenship and Legal Status in Selected, Publicly Available U.S. Surveys

Survey | Data Collection Method | Private or Public Sponsor | Year(s) | # Citizenship/Legal Status Questions | Range of Non-Response (%)
Survey of Income and Program Participation (SIPP) | In-Person | Public | 1996–2008 | 4 | 4.6–25.3
The Los Angeles Family and Neighborhood Survey (LAFANS) | In-Person | Private | 2001 and 2008 | 6 | 1.4–6.8
National Agricultural Workers Survey (NAWS) | In-Person | Public | 1988–2009 | 1 | 0.5–5.5
Immigration and Intergenerational Mobility in Metropolitan Los Angeles (IIMMLA) | Telephone | Private | 2004–2005 | 7 | 0.3–13.1
The Immigrant Second Generation in Metropolitan New York (ISGMNY) | Telephone | Private | 1999–2000 | 3 | 0.1–1.6
National Asian American Survey (NAAS) | Telephone | Private | 2008 | 2 | 0.2–5.2
Multi-City Study of Urban Inequality (MCSUI) | In-Person | Private | 1992–1994 | 3 | 0.1–0.5

Table 4. Missing or Allocated Data (Percentages) for Immigration-Related Questions in Selected U.S. Surveys

Survey | Country of Birth | Year of Immigration | Citizenship | Legal Status (a)
Los Angeles Family and Neighborhood Survey (LAFANS), 2001 | 0.23 | 3.95 | 3.54 | 4.31
Survey of Income and Program Participation (SIPP), 2004 | 0.08 | 25.35 | 0.84 | 13.62
National Agricultural Workers Survey (NAWS), 2004–2005 | 0.25 | 0.13 | (b) | 0.41
National Asian American Survey (NAAS), 2008 | 0.31 | 6.61 | (b) | 6.23
Immigrant Second Generation in Metropolitan New York (ISGMNY), 1999–2000 | 0.00 | 0.62 | 0.09 | 1.59
Immigration and Intergenerational Mobility in Metropolitan Los Angeles (IIMMLA), 2004–2005 | 0.61 | 0.73 | 0.30 | 6.90
Multi-City Study of Urban Inequality (MCSUI), 1992–1994 | 1.47 | 2.16 | 4.84 | 6.99
American Community Survey (ACS), 2005 | 2.69 | 6.15 | 3.35 | (d)
Current Population Survey (CPS), March 2004 | 0.44 | 10.47 | (c) | (d)
National Health Interview Survey (NHIS), 2004 | 0.20 | 2.59 | 2.59 | (d)

(a) Percentage refers to the share of the sample whose legal status cannot be determined.
(b) Citizenship included as a category on a single legal status question.
(c) Data quality flags for the CPS citizenship question not provided by IPUMS-CPS.
(d) Legal status not measured.

In Table 4 we present allocation rates for other immigration-related questions in the seven surveys with legal status indicators as well as three large government-sponsored surveys that are often used to study the immigrant population but do not measure legal status. Relative to other immigration-related variables, especially year of immigration, non-response/allocation for legal status is not appreciably higher in most surveys that measure it. To put these allocation rates into perspective, the average allocation rate of person-level variables in the 2005 American Community Survey was 3.7 percent. Thus, immigration-related variables tend to have higher than average rates of missing data, although the results here suggest that legal status variables are not more prone to non-response than other immigration-related variables such as year of immigration. Finally, it is also worth noting that other variables commonly used in social science research, such as income, are also allocated at relatively high rates. For example, the Census Bureau reports that 18 percent of the persons asked income questions in the 2005 ACS had their responses allocated.

Do Questions about Legal Status Produce a Chilling Effect?

Our second research question is whether asking about immigrants' legal status has a “chilling effect” that leads to high item non-response on subsequent questions or to refusal to participate in the survey altogether. We examine this question by comparing two outcomes: (1) non-response to the survey item immediately following the SIPP questions on legal status and (2) attrition from the SIPP panel between wave 2, when the immigration status questions were asked, and wave 3. If legal status questions exerted a chilling effect, we would expect higher rates of subsequent non-response and panel attrition among the unauthorized than among legal non-citizens and naturalized citizens. The results, shown in Table 5, use each of the three status assignment methods described earlier. Because unauthorized status is positively correlated with other characteristics that have been found to be associated with non-response (Groves, Cialdini, and Couper, 1992), Table 5 presents, in addition to zero-order comparisons, adjusted probabilities based on logit models controlling for age, sex, marital status, education, home ownership, English-language proficiency, and duration of U.S. residence. The logit models are weighted using the SIPP person weights; thus, the predicted probabilities of attrition can be interpreted as the percentage of the wave 2 population not represented in the third wave.
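For the unadjusted comparisons, a logit model containing only legal status dummies reproduces the weighted share of the outcome within each status group, so its predicted probabilities and pseudo-R-squared can be computed directly. The sketch below assumes McFadden's pseudo-R-squared (the text does not say which variant is reported) and uses made-up data and hypothetical names. The adjusted models in Table 5 require fitting a full logit with covariates, which is not shown. A ratio such as `p["U"] / p["L"]` gives the kind of relative likelihood discussed for Panel A.

```python
import math
from collections import defaultdict

def unadjusted_logit_by_group(outcomes, groups, weights):
    """Weighted logit of a 0/1 outcome on legal status dummies only.

    With one parameter per group, the maximum-likelihood predicted
    probability for each group equals that group's weighted share of 1s.
    Returns those probabilities and McFadden's pseudo-R-squared
    (1 - LL_model / LL_intercept_only). Assumes every group share and the
    overall share lie strictly between 0 and 1.
    """
    w_total = defaultdict(float)
    w_ones = defaultdict(float)
    for y, g, w in zip(outcomes, groups, weights):
        w_total[g] += w
        w_ones[g] += w * y
    p_group = {g: w_ones[g] / w_total[g] for g in w_total}
    p_overall = sum(w_ones.values()) / sum(w_total.values())

    def log_likelihood(prob_of):
        return sum(w * (y * math.log(prob_of(g)) + (1 - y) * math.log(1 - prob_of(g)))
                   for y, g, w in zip(outcomes, groups, weights))

    ll_model = log_likelihood(lambda g: p_group[g])
    ll_null = log_likelihood(lambda g: p_overall)
    return p_group, 1 - ll_model / ll_null

# Made-up example: U = unauthorized, L = legal non-citizen
outcome = [1, 0, 0, 0, 1, 1, 0, 0]
status = ["U"] * 4 + ["L"] * 4
p, pseudo_r2 = unadjusted_logit_by_group(outcome, status, [1.0] * 8)
print(p, round(pseudo_r2, 3), p["U"] / p["L"])
```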

Table 5. Predicted Probabilities (Percentages) from Logit Models Predicting Non-Response to the Survey Question Immediately Following Legal Status Questions (A) and Panel Attrition Between Waves 2 and 3 (B), Foreign-Born Respondents Ages 15 and Older, SIPP, 2004

Method | Unauthorized | Legal Non-Citizens | Naturalized | Pseudo-R2
A. Subsequent Non-Response
Hot Deck
Multiple Imputation
Logical Imputation
B. Attrition, Wave 2–3
Hot Deck
Multiple Imputation
Logical Imputation

(a) Difference from unauthorized immigrants significant at p<.001.
(b) Difference from unauthorized immigrants significant at p<.01.
(c) Difference from unauthorized immigrants significant at p<.10.
(d) Adjusted models include controls for age, sex, educational attainment, employment status, home-ownership, and English language proficiency.

We begin with comparative probabilities of non-response to the question immediately following the SIPP legal status questions, which is the first in a series of questions about relationships to other household members (Panel A). Overall, non-response to this question is relatively low, although in the unadjusted models it is significantly higher among the unauthorized. For example, when legal status is assigned using either the hot deck or the logical imputation method, the unauthorized are about twice as likely as their legal non-citizen counterparts not to respond to the subsequent survey item, and about 33 percent more likely when legal status is assigned using the multiple imputation method. Although the unauthorized are significantly more likely not to respond to the item following the legal status questions, their overall rate of non-response (7.2 percent, 7.2 percent, and 5.7 percent for the hot deck, logical, and multiple imputation methods, respectively) is not high enough to suggest that it derives from a chilling effect of the legal status questions. If it did, one would expect substantially higher rates of non-response among the unauthorized, and legal status alone would be a stronger determinant of non-response than the very small pseudo-R-squared statistics for the unadjusted models in Panel A of Table 5 indicate. Rather, non-response is determined to a far greater extent by other characteristics associated with legal status, as three findings show: predicted probabilities of non-response among the unauthorized are lower in the adjusted models, the gap in non-response between unauthorized and legal immigrants narrows when controls are introduced, and the pseudo-R-squared statistic is roughly three times larger in the adjusted than in the unadjusted model.

In Panel B of Table 5, we test for a second type of chilling effect: attrition from the SIPP panel between waves 2 and 3. As would be expected, the overall rate of attrition is higher than the rate of non-response to subsequent survey items. Regardless of the method used to assign legal status, unauthorized migrants are significantly less likely than legal migrants to be observed in the subsequent wave of the survey, but the difference is not large enough to indicate that unauthorized immigrants are disproportionately dissuaded from further participation by being asked about their legal status. Moreover, the gap in attrition across legal statuses narrows somewhat with the introduction of the control variables, while the low pseudo-R-squared statistics suggest that attrition is driven largely by unobserved variables. Although we cannot determine with certainty why panel attrition is relatively higher among the unauthorized, somewhat higher attrition is not surprising given that more recently arrived immigrants live in more complex household arrangements and experience more turnover (Van Hook and Glick, 2007). In any case, the magnitude of the difference in attrition between the unauthorized and other groups does not appear to reflect a chilling effect.

Comparative Profiles of the Unauthorized Foreign-Born Population

One reasonable concern about measuring immigrants' legal status in surveys is that unauthorized immigrants may see no reason to report their status honestly, especially to representatives collecting data on behalf of the U.S. government. If this concern were warranted, we would expect profiles of the unauthorized population based on self-reported data to deviate from those based on independent estimates derived largely from administrative and other non-self-reported data. We therefore turn now to comparisons of the unauthorized populations estimated in LAFANS and SIPP with independent estimates derived from residual methods.

Residual methods estimate the size and demographic characteristics of the unauthorized population by subtracting a demographic estimate of the legally resident foreign-born population from a survey-based (e.g., CPS or ACS) estimate or census count of the total foreign-born population. In short, the estimated number of unauthorized immigrants is the survey-based estimate of all foreign-born persons minus the demographic estimate of legal immigrants. The two most commonly cited estimates of the characteristics of the unauthorized population are those of the Pew Hispanic Center (Pew) and the Department of Homeland Security (DHS), each of which uses a variant of the residual method. The DHS estimates are based on the ACS and use administrative records on legal admissions to remove the legally resident population from the ACS foreign-born population by age, sex, state of residence, and country of birth. The Pew estimates are based on the Current Population Survey but use a methodology very similar to that of DHS to produce residual-based estimates of the unauthorized population. The Pew Hispanic Center also produces estimates of detailed characteristics of the unauthorized population based on imputed measures of legal status in the Current Population Survey. This methodology uses an assignment algorithm developed by Passel (Passel and Clark, 1998; Passel, Van Hook, and Bean, 2006), which identifies all foreign-born individuals in the Current Population Survey who have a very low probability of being unauthorized (e.g., persons arriving before 1980, persons from major refugee-sending countries, persons reporting as naturalized, veterans, public assistance recipients, and persons in specialty occupations). Among the remaining potentially unauthorized individuals, the method assigns unauthorized status probabilistically, based on previously established percentages unauthorized within occupation, state, and sex cells.
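The two steps just described (the residual subtraction, and the exclusion-then-probabilistic assignment used to impute status in the CPS) can be sketched roughly as follows. This is a simplified illustration, not the Pew or Passel specification: the field names, the exact exclusion flags, and the cell rates are hypothetical stand-ins, and the real algorithm uses more detailed rules and calibrates cell rates to previously established estimates.

```python
import random
from dataclasses import dataclass


def residual_estimate(total_foreign_born: float, legal_foreign_born: float) -> float:
    """Unauthorized = survey/census total foreign-born minus demographic estimate of legal residents."""
    return total_foreign_born - legal_foreign_born


@dataclass
class ForeignBornPerson:
    # Hypothetical fields standing in for the CPS items used by the exclusion rules.
    arrival_year: int
    naturalized: bool
    refugee_country: bool
    veteran: bool
    public_assistance: bool
    specialty_occupation: bool
    occupation: str
    state: str
    sex: str


def very_unlikely_unauthorized(p: ForeignBornPerson) -> bool:
    """Illustrative 'logical' exclusions, paraphrasing the examples in the text."""
    return (p.arrival_year < 1980 or p.naturalized or p.refugee_country
            or p.veteran or p.public_assistance or p.specialty_occupation)


def assign_unauthorized(people, cell_rates, rng):
    """Return one boolean per person (True = assigned unauthorized).

    cell_rates maps (occupation, state, sex) to an assumed probability of being
    unauthorized among the remaining potentially unauthorized in that cell;
    cells missing from the map default to 0.
    """
    flags = []
    for p in people:
        if very_unlikely_unauthorized(p):
            flags.append(False)  # logically excluded: never assigned unauthorized
        else:
            rate = cell_rates.get((p.occupation, p.state, p.sex), 0.0)
            flags.append(rng.random() < rate)  # Bernoulli draw at the cell rate
    return flags
```

Passing a seeded `random.Random` makes the probabilistic assignment reproducible; repeating it with different seeds would give multiple plausible assignments of status.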

It is important to note that we are not evaluating here the accuracy of one set of estimates versus another. Rather, our goal is to look for evidence that might give researchers pause in using LAFANS and SIPP to study the unauthorized population. We propose that such evidence would be present if the survey-based estimates varied dramatically from the Pew and DHS estimates, especially in instances where the latter two estimates are similar to each other.
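The decision rule just proposed can be expressed as a small classification function. The note does not quantify "dramatically," so the 5-point tolerance and the 3-point agreement threshold below are hypothetical choices, and `concern_level` is an illustrative name; the sketch only shows how the "especially when the residual estimates agree" condition raises the level of concern.

```python
from typing import Optional


def concern_level(survey: float, pew: Optional[float], dhs: Optional[float],
                  tol: float = 5.0, agree_tol: float = 3.0) -> str:
    """Classify a survey-vs-residual comparison (all values in percent).

    'none'   - survey is within `tol` points of at least one residual estimate
    'strong' - survey is far from both, and Pew and DHS agree within `agree_tol`
    'weak'   - survey is far from the available residual estimate(s), but the
               residual estimates disagree with each other or only one exists
    Thresholds are hypothetical; the note does not define them.
    """
    refs = [r for r in (pew, dhs) if r is not None]
    if not refs:
        raise ValueError("at least one residual-based estimate is required")
    if any(abs(survey - r) <= tol for r in refs):
        return "none"
    if len(refs) == 2 and abs(refs[0] - refs[1]) <= agree_tol:
        return "strong"
    return "weak"


# % male among the unauthorized, SIPP logical allocation vs Pew and DHS (Table 7).
print(concern_level(54.7, 56.0, 56.6))  # -> none
```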

Table 6 compares LAFANS-based estimates of the unauthorized foreign-born population in Los Angeles (weighted using the adult person weights provided in the data) with estimates derived using the Passel algorithm, as published in an Urban Institute report by Fortuny, Capps, and Passel (2007). Comparisons are made with respect to the percentage of the foreign-born population estimated to be unauthorized and the following characteristics of the unauthorized population: duration of U.S. residence (ten or fewer years), country of birth, and sex and age composition. It should be noted that all of the LAFANS-based estimates are for the adult unauthorized population, whereas the residual-based estimates are for the entire unauthorized population, with the exception of the sex and age distribution estimates, which are limited to the adult unauthorized population.3 About 12% of unauthorized persons are children younger than 18 (Hoefer, Rytina, and Baker, 2008). In addition, the LAFANS estimates are for the year 2001, while the residual estimates are based on 2000 Census and 2004 March CPS data.

Table 6. Comparative Profiles of the Adult Unauthorized Foreign-Born Population in Los Angeles County, by Estimation Source

| Characteristic | LAFANS, 2001 (a) | Residual Method |
| --- | --- | --- |
| % Unauthorized (of foreign-born) | 26.3 | 26.2 (b) |
| % in U.S. < 10 Years, Unauthorized | 52.1 | 51.0 (c) |
| Birthplace, Unauthorized (%) | | |
| Other Latin America | 30.1 | 28.0 (c) |
| % Male, Unauthorized | 52.9 | 53.9 (d) |
| Age Distribution, Unauthorized (%) | | |

a. LAFANS estimates are weighted percentages of adults, aged 18+.
b. Fortuny, Capps, and Passel (2007), Table 7, based on 2000 Census data.
c. Fortuny, Capps, and Passel (2007), Table 8, based on 2004 March CPS data, unauthorized population, all ages.
d. Fortuny, Capps, and Passel (2007), Table 14, based on 2004 March CPS data, adults aged 18–64.
e. Fortuny, Capps, and Passel (2007), Table 9, based on 2004 March CPS data, adults aged 18+.

The two estimates are similar with respect to the share of the foreign-born population that is unauthorized and the percentage of recent arrivals among the unauthorized. Both the LAFANS and the Fortuny et al. estimates indicate that unauthorized immigrants make up about 26 percent of the foreign-born population and that 51–52 percent of the unauthorized arrived in the United States within the previous 10 years. The estimates diverge, however, with respect to the distribution of the unauthorized by country or region of birth. In particular, LAFANS estimates that 65 percent of the Los Angeles County unauthorized population is Mexican-born, compared to 57 percent estimated by Fortuny, Capps, and Passel (2007). Likewise, the LAFANS estimate of the share of the unauthorized population that is Asian-born (4%) is substantially lower than the residual-based estimate (12%). We can only speculate about the reason(s) for this discrepancy, but one possibility is that because the LAFANS was administered only in English and Spanish, the Asian-born unauthorized population may have been undercovered.

Turning finally to the sex and age distributions of the unauthorized population over age 17, we find that LAFANS and the residual estimate are comparable in terms of the percentage of unauthorized adults estimated to be male (53% in LAFANS and 54% in Fortuny et al.), but LAFANS estimates a younger adult unauthorized population than the residual-based estimate. Again, the source of the age discrepancy is unclear. One possibility is that the LAFANS focus on developmental outcomes among young children may have produced a sample skewed toward young adult parents in a way that is not fully accounted for by the person weights.

Table 7 compares characteristics of the unauthorized population estimated from the 2004 SIPP (weighted using the SIPP wave 2 person weights) with estimates published by Pew and DHS. Table 7 reports three SIPP-based estimates, one for each of the three legal status assignment methods described earlier. The SIPP estimates are for the adult (age 18+) population; unless otherwise noted, the Pew and DHS estimates are for the entire unauthorized population. In addition, as noted in Table 7, while most of the Pew and DHS estimates are based on 2004 CPS and 2005 ACS data, estimates for some characteristics were published more recently and are therefore based on later years of data.

Table 7. Comparative Profiles of the Adult Unauthorized Foreign-Born Population in the United States, by Estimation Method/Source

| Characteristic | SIPP 2004, Hot-Deck Allocation | SIPP 2004, Multiple Imputation | SIPP 2004, Logical Allocation | Residual Estimates, Pew | Residual Estimates, DHS |
| --- | --- | --- | --- | --- | --- |
| Legal Status of Foreign-Born (%) | | | | | |
| Legal Non-Citizen | 36.5 | 41.2 | 35.3 | 39.0 (a) | n/a |
| Unauthorized, Years in the U.S. (%) | | | | | |
| Unauthorized, Country of Birth (%) | | | | | |
| Other Latin America | 20.3 | 19.5 | 20.1 | 24.0 (a) | n/a |
| Europe & Canada | 6. | | | | |
| Africa & Other | 4. | | | | |
| Unauthorized, State of Residence (%) | | | | | |
| New York | 6.0 | 7.4 | 10.2 | 7.0 (a) | 5.0 (c) |
| Unauthorized, Male (%) | 53.6 | 53.8 | 54.7 | 56.0 (a) | 56.6 (d) |
| Unauthorized, Age Distribution (%) | | | | | |

Note: SIPP estimates are for the unauthorized population aged 15 and older; Pew and DHS estimates are for the total unauthorized population (all ages); SIPP estimates are weighted.
a. Passel (2005), based on 2004 March Current Population Survey data.
b. Authors' calculations based on figure 7, Passel and Cohn (2009), based on 2008 March Current Population Survey data.
c. Hoefer, Rytina, and Campbell (2006), residual estimates using the 2005 American Community Survey.
d. Hoefer, Rytina, and Baker (2008), residual estimates using the 2007 American Community Survey.

With respect to the share of the foreign-born population that is unauthorized, all three SIPP estimates are lower than the Pew estimate, although the size of the gap varies across the three methods. When legal status is assigned using the Census Bureau's hot deck allocation method, just 18 percent of the foreign-born population in the SIPP is estimated to be unauthorized, compared to 29 percent estimated by Pew. Correspondingly, when legal status is assigned by the hot deck method, nearly 46 percent of the immigrant population is estimated to consist of naturalized citizens, about 14 percentage points higher than the Pew estimate of 32 percent. The multiple imputation and logical allocation methods (especially the latter) yield estimated legal status distributions that align more closely with the Pew estimate than does the hot deck allocation method. For example, based on the logical allocation of unknown legal statuses in the SIPP, about 26 percent of the foreign-born population is estimated to be unauthorized, compared to 29 percent by Pew; 35 percent are estimated to be legal non-citizens, versus 39 percent by Pew; and 39 percent are estimated to be naturalized citizens, compared to 32 percent in the Pew estimate.
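Because the three legal-status categories sum to 100 percent of the foreign-born, the naturalized shares discussed above can be cross-checked against the legal non-citizen shares in Table 7. A worked check, taking the rounded unauthorized shares quoted in the text (18 and 26 percent) at face value and comparing with the Pew naturalized share of 32 percent:

$$
\begin{aligned}
\text{Naturalized}_{\text{hot deck}} &\approx 100 - 18 - 36.5 = 45.5\%, \qquad 45.5 - 32 = 13.5 \text{ points above Pew},\\
\text{Naturalized}_{\text{logical}} &\approx 100 - 26 - 35.3 = 38.7\%, \qquad 38.7 - 32 = 6.7 \text{ points above Pew}.
\end{aligned}
$$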

Thus, regardless of the method used in the SIPP, the foreign-born population is estimated to include a larger share of naturalized citizens than in the Pew estimate. We suspect that this may reflect the tendency of some immigrants, especially those from Mexico, to misreport themselves as naturalized citizens in Census surveys (Van Hook and Bachmeier, 2013). In the residual-based Pew estimates, adjustments are made in an attempt to account for this tendency (Passel, Van Hook, and Bean, 2006), whereas the only such adjustment we make in the SIPP data is to recode as non-citizens those persons reporting as naturalized citizens who have not been in the United States for at least 5 years (unless they have a U.S. citizen spouse). This methodological difference in the handling of naturalization reports may thus partly explain differences in the estimated legal status distribution of the foreign-born population.
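The naturalization consistency edit described above can be written as a short rule. The function and argument names are hypothetical; the sketch assumes that duration of residence is available for each respondent (in the 2004 SIPP it was partly imputed by the authors) and encodes only the single adjustment the text describes, not the more extensive corrections applied in the residual-based estimates.

```python
def recode_citizenship(reports_naturalized: bool, years_in_us: float,
                       has_us_citizen_spouse: bool) -> str:
    """Apply the naturalization consistency edit described in the text.

    Persons reporting as naturalized who have been in the U.S. for fewer than
    5 years and have no U.S.-citizen spouse are recoded as non-citizens.
    """
    if not reports_naturalized:
        return "non-citizen"
    if years_in_us < 5 and not has_us_citizen_spouse:
        return "non-citizen"  # implausibly short residence for naturalization
    return "naturalized"
```

The spouse exception keeps respondents married to U.S. citizens, who can naturalize after a shorter period of residence, from being recoded.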

Variation also exists with respect to the duration of residence of the estimated unauthorized population across the three assignment methods used in the SIPP.4 Using the hot deck and multiple imputation methods, 73 percent and 75 percent, respectively, of the unauthorized foreign-born population are estimated to have lived in the United States ten years or less. Using the logical allocation method, the estimate of 64.7 percent closely aligns with the Pew estimate of 65 percent; both are higher than the DHS estimate of 59 percent.

Turning to the country or region of birth of the unauthorized population, Mexicans predominate in all of the estimates presented in Table 7, although some variation exists across the SIPP estimates. Mexicans make up a smaller share when legal status is assigned using the hot deck allocation method (51.6% Mexican) than under the multiple imputation method (56.4%) and the logical allocation method (54.8%), both of which are more in line with the identical Pew and DHS estimates of 57 percent. Conversely, the hot deck estimate of the share of the unauthorized population born in Asia (17.7%) is significantly higher than the other four estimates. With respect to the state of residence of the unauthorized foreign-born population, there is general agreement across the five estimates reported in Table 7, which are listed for the five states with the largest concentrations of unauthorized residents.

There are modest differences between the three SIPP estimates and the Pew and DHS estimates with respect to the gender composition of the unauthorized population, with the estimated SIPP unauthorized population being slightly less male. Greater variation exists across the five estimates with respect to the age distribution of the unauthorized population. Sixty-one, 63, and 65 percent of the adult unauthorized population in the hot deck, multiple imputation, and logical allocation SIPP estimates, respectively, are between the ages of 18 and 34. These estimates are substantially higher than the Pew estimate of 51.6 percent but comparable to the DHS estimate of 60 percent.

In summary, SIPP-based estimates of the characteristics of the unauthorized population compare favorably to estimates derived from other data sources and methods. Where SIPP-based estimates diverge from residual-based estimates, the two residual-based estimates also tend to disagree with each other (e.g., duration of U.S. residence and the age distribution of the unauthorized population). Most importantly, we find little in Table 7 to suggest that misreporting of legal status in the SIPP is so widespread as to substantially bias estimates of the unauthorized immigrant population. The strength of this conclusion varies somewhat with the method used to handle missing data on the legal status measures: the multiple imputation and logical allocation methods tend to produce profiles of the unauthorized population that are more in line with those published by the Pew Hispanic Center and the Department of Homeland Security.


Over the past two decades, the foreign-born population in the United States has grown increasingly heterogeneous with respect to legal status (Bean et al., 2013), all within a policy context in which legal status is a particularly salient characteristic that structures the opportunities available to immigrants and their children (Bean et al., 2011; Donato and Armenta, 2011; Kasinitz, 2012). Given the sheer size of the unauthorized population and the number of U.S.-born children whose development and incorporation outcomes are likely to be affected by the legal status of their parents, there is a growing need for empirical research examining this association. Unfortunately, the nationally representative data sources most often used to study the incorporation of immigrant populations, such as the American Community Survey (ACS) and the Current Population Survey (CPS), do not include questions about immigrants' legal status (Clark and King, 2008). In this research note, we have examined data from two large-scale surveys (one of them national in scope) that do include measures of legal status and thus serve as valuable sources of data for researchers beginning to examine the consequences of unauthorized status for key incorporation outcomes.

Not only have the LAFANS and SIPP been underutilized in research on immigrant incorporation, but the quality of their legal status measures, the very feature that makes these surveys distinctive, has also not been assessed, leaving open the question of whether immigrants would even answer questions about their legal status. Here we address this question and find that non-response to questions about legal status is largely not a concern in the LAFANS. In the SIPP, however, there is reason for concern due to that survey's relatively high allocation rates. The difference in the rate of missing data between the two surveys appears consistent with the views of a panel of immigration researchers and immigrant advocates consulted for the 2006 GAO report, who held that one way to ensure the highest possible response rates to survey questions about immigrants' legal status would be to have surveys administered by a private research firm or a university rather than by a government agency or department, as is the case with the SIPP (U.S. Government Accountability Office, 2006). It is also possible that the increased privacy protections implemented in the LAFANS helped to keep rates of item non-response relatively low. More research is necessary, however, to assess whether the higher non-response in the SIPP is due to these factors or results from other problems with the SIPP's immigration module. The SIPP stands out among several governmental surveys as having the highest non-response on immigration items, including questions unrelated to the sensitive topic of legal status. Additionally, despite the relatively high allocation rates in the SIPP, we find little or no evidence that the introduction of questions about immigrants' legal status during wave 2 of the survey leads to disproportionately high rates of subsequent item non-response or of attrition in wave 3 among those most likely to be unauthorized.

Moreover, profiles of the unauthorized population estimated from LAFANS and especially the SIPP compare favorably to independent estimates based on the residual method, suggesting that the two survey samples yield reasonably similar snapshots of the unauthorized populations in Los Angeles and the United States, respectively. Taken together, the findings presented here thus suggest that the assumption that questions about immigrant legal status are too sensitive to be asked seems unwarranted. The collection of such data is feasible and should be considered in future survey efforts. The results reported here also confirm that LAFANS and SIPP are valuable, although underutilized, sources of data that can be employed to study variation in multiple dimensions of immigrant incorporation across legal and citizenship statuses. A wide range of outcomes can be analyzed using these surveys, including educational attainment and transitions, health and well-being of both adults and children, employment, marriage, and fertility, to name just a few.

It is worth noting that the data analyzed here were collected between 2001 and 2004, prior to the widespread emergence of immigration enforcement measures at federal, state, and local levels. It is therefore possible that the conclusions drawn here about the collection and use of survey data on legal status may not apply in the more recent enforcement context. The extent to which such contexts produce a chilling effect on immigrant respondents in all surveys, not just those that include questions on legal status, merits future research attention. That said, supplementary analyses (available upon request) of the 2008 SIPP panel do not suggest that increases in immigration enforcement have had appreciable impacts on response rates to the legal status questions.

Also, we stress that this research note is concerned with assessing the feasibility of collecting unbiased data on immigrants' legal status. As with any effort to collect data of a sensitive nature, researchers designing surveys to collect legal status information need to take extraordinary precautions to protect respondents' confidentiality. These issues have already been given careful consideration by the LAFANS research team at RAND (Prentice, Pebley, and Sastry, 2006), whose work has established a precedent for collecting legal status information while minimizing the risk of disclosing the identities of immigrant respondents.

Finally, it is important to note that the conclusions drawn here about the measurement of immigrants' legal status may well vary across U.S. states. Just as state- and local-level policy contexts can shape immigrants' incorporation outcomes, they may also influence response rates to questions about nativity and/or legal status, perhaps affecting state-level estimates of the unauthorized foreign-born population such as those recently published by Warren and Warren (2013). Future research should examine the extent to which direct measures of immigrants' legal status are more effective in states with less restrictive policies toward immigrants. Internationally, the same reasoning suggests that the measurement of legal status may be more successful in nations whose policies are relatively more likely to promote and facilitate the incorporation of immigrant populations. Recent comparative research suggests that the children of immigrants in major immigrant destinations with relatively more inclusive policy regimes experience more favorable integration outcomes (Bean et al., 2012), which likewise implies that efforts to measure immigrants' legal status may meet with more success in nations with more inclusionary incorporation policy regimes.


  1. 1

    The two surveys discussed in the GAO report are the SIPP and the National Agricultural Workers Survey (NAWS).

  2. 2

    The imputation model includes the following variables: citizenship, arrival status, adjustee status, years of U.S. residence, earliest known U.S. migration, country of birth, English language proficiency, age, age-squared, sex, family size, family structure, number of minor children in the family, number of social security recipients in the family, poverty level, home ownership, and health insurance coverage status. The imputations are performed using the Imputation by Chained Equations (ice) user-written package for Stata 10.2 (Royston and White, 2011). Further information about the imputation model is available upon request.

  3. 3

    Given that the survey-based estimates are for the adult unauthorized population while most of the residual-based characteristics are for the entire unauthorized population, this age difference in the populations being compared may account for some observed differences in the profiles, though it is not possible for us to estimate how much. The influence of this dissimilarity is likely to be strongest when the characteristic in question varies across age groups.

  4. 4

    We remind readers that 25 percent of the foreign-born sample in the 2004 SIPP panel had missing year-of-arrival information and that duration of residence has therefore been imputed by the authors (the Census Bureau did not impute this variable) using the multiple imputation model described in the data and methodological section.


Surveys That Include Legal Status Measures

In this appendix we briefly describe the data sources that are not the focus of this paper, but are listed in Tables 3 and 4.

Immigration and Intergenerational Mobility in Metropolitan Los Angeles (IIMMLA)

IIMMLA is a phone survey of the adult children of immigrants, aged 20–40, living in the Los Angeles metropolitan area. The survey was carried out in 2004 and 2005, was funded by the Russell Sage Foundation, and was coordinated by researchers at the University of California, Irvine, and UCLA. More information about the survey is available at

Immigrant Second Generation in Metropolitan New York (ISGMNY)

ISGMNY is a phone survey of the adult children of immigrants, aged 18–32, living in the greater New York metropolitan area. The survey was carried out in November and December 1999 and was funded by the Russell Sage Foundation, the Ford Foundation, the Andrew W. Mellon Foundation, and the Rockefeller Foundation. More information can be obtained at

Multi-City Study of Urban Inequality (MCSUI)

MCSUI is an in-person survey of adults and employers. Data were collected in four metropolitan areas: Atlanta (April 1992–September 1992), Boston (May 1993–November 1994), Detroit (April–September 1992), and Los Angeles (September 1993–August 1994). The study was funded by the Russell Sage Foundation and the Ford Foundation. More information is available at

National Agricultural Workers Survey (NAWS)

NAWS is a nationally representative face-to-face survey of the U.S. agricultural workforce, conducted annually since 1988 by the U.S. Department of Labor. More information is available at

National Asian American Survey (NAAS)

NAAS is a phone survey of Asian-American residents in the United States. The data were collected between August and October 2008 with funding from the James Irvine Foundation, Rutgers University, the Carnegie Corporation, and the Russell Sage Foundation. More information is available at and at

Surveys That Do Not Include Legal Status Measures

American Community Survey (ACS)

The ACS is an annual survey that samples approximately one percent of the entire U.S. population and has been conducted since 2001 by the U.S. Census Bureau. More information about the ACS is available at

Current Population Survey (CPS)

The CPS is a monthly survey of about 60,000 American households jointly sponsored by the U.S. Census Bureau and the U.S. Bureau of Labor Statistics. Data are collected through a combination of in-person and telephone interviews. More information is available at

National Health Interview Survey (NHIS)

The NHIS is an annual health survey of the U.S. civilian non-institutionalized population. The survey is sponsored by the Centers for Disease Control and Prevention and is administered via in-person interviews by the U.S. Census Bureau. More information about the NHIS is available at