Keywords:

  • emigration and immigration;
  • Surveillance, Epidemiology, and End Results (SEER) program;
  • multiple imputation;
  • Hispanic Americans;
  • health status disparities

Abstract

  1. Top of page
  2. Abstract
  3. INTRODUCTION
  4. MATERIALS AND METHODS
  5. RESULTS
  6. DISCUSSION
  7. FUNDING SUPPORT
  8. CONFLICT OF INTEREST DISCLOSURES
  9. REFERENCES
  10. Supporting Information

BACKGROUND

Although birthplace data are routinely collected in the participating Surveillance, Epidemiology, and End Results (SEER) registries, such data are missing in a nonrandom manner for a large percentage of cases. This hinders analysis of nativity-related cancer disparities. In the current study, the authors evaluated multiple imputation of nativity status among Hispanic patients diagnosed with cervical, prostate, and colorectal cancer and demonstrated the effect of multiple imputation on apparent nativity disparities in survival.

METHODS

Multiple imputation by logistic regression was used to generate nativity values (US-born vs foreign-born) using a priori-defined variables. The accuracy of the method was evaluated among a subset of cases. Kaplan-Meier curves were used to illustrate the effect of imputation by comparing survival among US-born and foreign-born Hispanics, with and without imputation of nativity.

RESULTS

Birthplace was missing for 31%, 49%, and 39%, respectively, of cases of cervical, prostate, and colorectal cancer. The sensitivity of the imputation strategy for detecting foreign-born status was ≥ 90% and the specificity was ≥ 86%. The agreement between the true and imputed values was ≥ 0.80 and the misclassification error was ≤ 10%. Kaplan-Meier survival curves indicated different associations between nativity and survival when nativity was imputed versus when cases with missing birthplace were omitted from the analysis.

CONCLUSIONS

Multiple imputation using variables available in the SEER data file can be used to accurately detect foreign-born status. This simple strategy may help researchers to disaggregate analyses by nativity and uncover important nativity disparities in regard to cancer diagnosis, treatment, and survival. Cancer 2014;120:1203–1211. © 2014 American Cancer Society.


INTRODUCTION

The study of the effects of immigration on cancer patterns has become increasingly important for health disparities research. Immigrants currently comprise 13% (nearly 40 million persons) of the United States population.[1] Although immigrants generally have lower overall cancer mortality than their US-born counterparts,[2] they have higher mortality from infection-associated cancers (eg, gastric, liver, and cervical cancer) and screenable cancers (eg, cervical and colorectal cancer). For screenable cancers, disparities in access to and use of early detection services may lead to disparities in incidence, stage at diagnosis, and cancer-specific survival.[3-6]

The Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute provides population-based data on cancer incidence and survival in the US[7] and has been used extensively to evaluate cancer health disparities.[8] However, analyses by immigrant status are lacking due to inadequate denominator data as well as incomplete reporting of immigrant status. Although data regarding birthplace are routinely collected in participating SEER registries, such data are missing for a large percentage of cases. Furthermore, the distribution of missing data is nonrandom and is related to variables including nativity, vital status, age, sex, ethnicity, and certain hospital characteristics.[9-13] Of particular concern for nativity disparities research is the observation that missing birthplace data are more common among US-born versus foreign-born cases[12] and among cancer survivors versus those who died, given that birthplace is often ascertained from the death certificate when it is not available from other sources.[9, 10] Due to the nonrandom distribution of missing birthplace data, strategies such as listwise deletion and allocation proportional to the distribution of birthplace in the population may cause significant bias in estimates. Nonetheless, these strategies have been commonly used in nativity disparities research.[14-26]

Multiple imputation is a strategy whereby missing values are replaced with ≥ 2 values representing a distribution of probabilities.[27, 28] It has been used extensively to deal with missing data in complex health data sets.[29] However, to our knowledge, there have not been any studies to evaluate the accuracy of multiple imputation of nativity in the SEER data file. Although Choe et al used multiple imputation of nativity, ethnicity, and stage at diagnosis,[9] the purpose of their analysis was to compare cancer-specific survival among US-born and foreign-born Asian and Pacific Islander patients with colorectal cancer. In the current study, we specifically evaluated the accuracy of multiple imputation for detecting foreign-born status and demonstrated how multiple imputation of nativity may overcome significant biases that occur in cancer survival analyses when cases with missing birthplace information are simply omitted from the database and ignored during analysis. We conducted our analysis among Hispanic patients diagnosed with invasive cervical, prostate, and colorectal cancer, the primary screenable cancers for females, males, and both sexes, respectively. We focused on Hispanics because they are the largest and fastest-growing minority group in the US with a large percentage of foreign-born individuals (> 40%).[30]

MATERIALS AND METHODS

Data Source

Data were obtained from the SEER program (June 2012 release). SEER registries cover 18 geographic areas (Alaska, Arizona, Connecticut, Hawaii, Iowa, Louisiana, Kentucky, New Jersey, New Mexico, Utah, San Jose-Monterey, Los Angeles, San Francisco-Oakland, Greater California, Rural Georgia, Atlanta, Detroit, and Seattle-Puget Sound), which together represent approximately 28% of the US population and 41% of the US Hispanic population.[7]

Cases were Hispanic women and men living in a SEER catchment area who were diagnosed with microscopically confirmed, primary invasive cervical (International Classification of Diseases for Oncology, 3rd Edition [ICD-O-3] codes C53.0-53.9), prostate (ICD-O-3 code C619), or colorectal (ICD-O-3 codes C180, C182-C189, C199, C209, and C260)[31] cancer between January 1, 1988 and December 31, 2009. The year 1988 was chosen as the lower time limit because it was the first year in which Hispanic ethnicity was collected for all SEER cases.[32] We included cases with known Hispanic ethnicity as well as those with evidence of Hispanic ethnicity based on surname.[33] The numbers of patients of Hispanic origin with a confirmed diagnosis of cervical, prostate, or colorectal cancer were 10,399, 52,346, and 32,880, respectively. Of these, we excluded the ≤ 2% of cases with unknown age at the time of diagnosis (2 cases, 447 cases, and 4 cases, respectively, in the cervical, prostate, and colorectal cancer databases), unknown receipt of cancer-directed surgery (cervical: 18 cases; prostate: 145 cases; and colorectal: 48 cases), or unknown receipt of radiation therapy (cervical: 164 cases; prostate: 785 cases; and colorectal: 348 cases).

Imputation of Nativity

After the exclusion of cases based on the criteria described above, nativity was the only variable with missing values (monotone missing pattern). As discussed earlier, missing nativity follows a nonrandom pattern. We used the SAS Multiple Imputation procedure to generate nativity values by the logistic regression imputation method (PROC MI with LOGISTIC in the MONOTONE statement [SAS version 9.2; SAS Institute, Cary, NC]).[34, 35] Specifically, among those not missing nativity data, a logistic regression model was fitted for nativity (the dependent variable) by the maximum likelihood method using a group of independent variables selected a priori for each cancer separately. Candidate independent variables were those known to be associated with missing nativity status,[11-13] those significantly associated with nativity in our data set, and others of clinical relevance. They included age at diagnosis, stage at diagnosis, receipt of cancer-directed surgery, receipt of radiation therapy, SEER site, Hispanic origin, reporting source, sex (for colorectal cancer only), and anatomical subsite (for colorectal cancer only). Independent variables were omitted from the model if they did not increase the model's accuracy according to the global F-test. The area under the receiver operating characteristic (ROC) curve and R-squared values were used to describe how well the models fit the data.

Age at diagnosis was treated as a continuous variable in the model. Tumor stage at diagnosis was defined using the SEER historic staging scheme, which classifies cervical and colorectal tumors as local, regional, or distant, and prostate tumors as local/regional or distant.[36] Cases for which the stage at diagnosis was missing (4.52%, 15.45%, and 4.43%, respectively, of cases of cervical, prostate, and colorectal cancer) were kept in the data set and categorized as unknown. The receipt of cancer-directed surgery and radiation therapy variables were categorized dichotomously (yes/no). Hispanic origin was based on the SEER recoded variable and categorized as specified Spanish/Hispanic origin, not otherwise specified Spanish/Hispanic origin, and surname match only. Specified Spanish/Hispanic origin included those of Mexican, Puerto Rican, Cuban, Dominican Republic, and South or Central American (excluding Brazilian) origin and those of other specified Spanish/Hispanic origins (including European). Reporting source was categorized as hospital inpatient, physician's office, or other. Anatomical subsite was used for colorectal cancer only and was categorized as proximal, distal, rectum, or other.

To impute the missing values for nativity, we randomly divided cases with known nativity into a test group (80% of cases) and a validation group (20%). In the test group, a new regression model was simulated over 20 iterations for each cancer using the posterior predictive distribution[27] of parameters based on the fitted regression coefficients. For each iteration, missing nativity was imputed as either 1 (foreign-born) or 0 (US-born). The imputed values were then averaged across all iterations. Missing nativity values in the final data set for each cancer were recoded as 1 (foreign-born) if the mean imputed value across the 20 data sets was > 0.5 or 0 (US-born) if the mean imputed value was ≤ 0.5. The imputation strategy was then used in the validation group to calculate the sensitivity and specificity for detecting foreign-born cases, the percentage of misclassified cases, and kappa statistics to measure the agreement between true and imputed values. Kappa values > 0.8 indicate excellent agreement, whereas values from 0.61 to 0.8 and 0.41 to 0.60 indicate substantial and moderate agreement, respectively.[37] Finally, we used the full data set with known nativity (test and validation groups) to impute nativity for those with missing birthplace.
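The imputation steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SAS PROC MI procedure used in the study: the data are synthetic, the function names (`fit_logistic`, `impute_nativity`) and covariates are hypothetical, and drawing coefficients from a normal approximation to their posterior is an assumption made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    Returns the coefficients and their asymptotic covariance matrix."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    p = sigmoid(X @ beta)
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, cov

def impute_nativity(X_known, y_known, X_missing, n_imputations=20, seed=0):
    """Impute a binary value (1 = foreign-born, 0 = US-born) n_imputations
    times, average the draws, and recode as 1 when the mean exceeds 0.5."""
    rng = np.random.default_rng(seed)
    beta_hat, cov = fit_logistic(X_known, y_known)
    draws = np.empty((n_imputations, X_missing.shape[0]))
    for k in range(n_imputations):
        # Assumption: coefficients drawn from a normal approximation
        # centered on the fitted values.
        beta_star = rng.multivariate_normal(beta_hat, cov)
        p_star = sigmoid(X_missing @ beta_star)
        draws[k] = rng.random(X_missing.shape[0]) < p_star
    mean_imputed = draws.mean(axis=0)
    return (mean_imputed > 0.5).astype(int), mean_imputed

# Synthetic stand-in for cases with known nativity (hypothetical covariates).
rng = np.random.default_rng(42)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_beta = np.array([-0.5, 2.0, -1.5])
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

# 80% test group to fit the model, 20% validation group to check it.
order = rng.permutation(n)
test_idx, valid_idx = order[:1600], order[1600:]
imputed, mean_imputed = impute_nativity(X[test_idx], y[test_idx], X[valid_idx])
accuracy = float(np.mean(imputed == y[valid_idx]))
print(f"validation agreement with true values: {accuracy:.3f}")
```

Averaging the 20 draws and thresholding at 0.5 yields a single nativity value per case, which is what allows cases to be grouped as US-born or foreign-born in later analyses.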

To elucidate the effect of multiple imputation on nativity differences in cause-specific survival, we constructed Kaplan-Meier curves comparing survival among US-born and foreign-born Hispanics, with and without imputation of nativity. For the analyses without imputation, cases with missing birthplace data were omitted from the data set (listwise deletion). Survival was defined as the number of months from the date of diagnosis to the date of death or last follow-up (December 31, 2009). Deaths were defined as cervical, prostate, or colorectal cancer-specific mortality; individuals who died of other causes and those alive at the date of last follow-up were censored. We used the log-rank test to assess the statistical significance of the observed differences between the cancer-specific survival curves by nativity.
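For readers unfamiliar with the Kaplan-Meier method, the following minimal Python sketch shows the product-limit calculation with censoring as defined above (cancer-specific death is the event; deaths from other causes and cases alive at last follow-up are censored). The follow-up times are hypothetical, and the function name `kaplan_meier` is introduced here for illustration; it is not the software used in the study.

```python
import numpy as np

def kaplan_meier(months, died):
    """Kaplan-Meier product-limit estimator.
    months: follow-up time from diagnosis to death or last follow-up.
    died:   1 = cancer-specific death, 0 = censored.
    Returns the distinct event times and the survival estimate at each."""
    months = np.asarray(months, dtype=float)
    died = np.asarray(died, dtype=int)
    event_times = np.unique(months[died == 1])
    survival, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(months >= t)                    # still under observation at t
        deaths = np.sum((months == t) & (died == 1))     # cancer-specific deaths at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Hypothetical follow-up data for eight cases.
months = [6, 12, 12, 20, 30, 30, 45, 60]
died = [1, 1, 0, 1, 0, 1, 0, 0]
times, surv = kaplan_meier(months, died)
for t, s in zip(times, surv):
    print(f"month {t:.0f}: S(t) = {s:.3f}")
```

Running the same function separately on US-born and foreign-born cases, once with listwise deletion and once with imputed nativity, would give the paired curves that the comparison relies on.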

RESULTS

Between 1988 and 2009, there were 10,215 cervical, 51,400 prostate, and 32,480 colorectal cancer cases among Hispanics reported in SEER that met our inclusion criteria. Of these, birthplace data were missing for 3191 cervical cancer cases (31.24%), 24,998 prostate cancer cases (48.63%), and 12,575 colorectal cancer cases (38.72%). There were significant differences between those with and without birthplace data (Table 1). Notably, cases with an unknown birthplace were significantly more likely to be diagnosed at a localized or regional stage, to be reported by a physician's office, and to have nonspecified Hispanic origin or to be classified as Hispanic based on surname match only. For example, 68.44% of cervical cancer cases with missing birthplace data were of nonspecified Hispanic origin, compared with 17.77% of cases with known birthplace. There were also significant differences in receipt of cancer-directed surgery and radiation therapy between those with and without birthplace information.
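As a quick arithmetic check, the missing-birthplace percentages follow directly from the case counts reported above. The short Python snippet below reproduces them; the variable names are illustrative only.

```python
# (cases with missing birthplace, total cases meeting inclusion criteria)
counts = {
    "cervical": (3191, 10215),
    "prostate": (24998, 51400),
    "colorectal": (12575, 32480),
}

missing_pct = {
    site: round(100 * missing / total, 2)
    for site, (missing, total) in counts.items()
}
print(missing_pct)
```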

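Table 1. Demographic, Tumor, and Treatment Characteristics of Cases of Cervical, Prostate, and Colorectal Cancer With and Without Available Birthplace Information

Values are percentages of cases in each column.

| Characteristic | Cervical: Place of Birth Available (n=7024; 68.76%) | Cervical: Place of Birth Missing (n=3191; 31.24%) | Cervical: P | Prostate: Place of Birth Available (n=26,402; 51.37%) | Prostate: Place of Birth Missing (n=24,998; 48.63%) | Prostate: P | Colorectal: Place of Birth Available (n=19,905; 61.28%) | Colorectal: Place of Birth Missing (n=12,575; 38.72%) | Colorectal: P |
|---|---|---|---|---|---|---|---|---|---|
| Age at diagnosis, y | | | <.0001 | | | <.0001 | | | <.0001 |
| ≤29 | 6.38 | 8.90 | | 0.02 | 0.02 | | 1.42 | 1.18 | |
| 30-39 | 24.67 | 29.87 | | 0.04 | 0.10 | | 4.55 | 4.42 | |
| 40-49 | 29.06 | 28.52 | | 2.53 | 2.98 | | 10.66 | 11.38 | |
| 50-59 | 18.34 | 15.39 | | 17.06 | 18.44 | | 20.34 | 22.35 | |
| ≥60 | 21.55 | 17.33 | | 80.35 | 78.47 | | 63.04 | 60.68 | |
| Stage at diagnosis | | | <.0001 | | | <.0001 | | | <.0001 |
| Localized | 46.07 | 60.70 | | 75.18 (a) | 84.31 (a) | | 33.43 | 44.63 | |
| Regional | 39.51 | 29.61 | | | | | 37.95 | 35.17 | |
| Distant | 9.85 | 5.26 | | 6.23 | 3.55 | | 24.02 | 16.05 | |
| Missing | 4.57 | 4.42 | | 18.59 | 12.13 | | 4.60 | 4.15 | |
| Reporting source | | | <.0001 | | | <.0001 | | | <.0001 |
| Hospital inpatient | 98.93 | 95.89 | | 95.06 | 79.26 | | 98.60 | 96.35 | |
| Physician's office | 0.46 | 2.38 | | 3.08 | 16.52 | | 0.84 | 2.47 | |
| Other | 0.61 | 1.72 | | 1.86 | 4.22 | | 0.56 | 1.18 | |
| Hispanic origin | | | <.0001 | | | <.0001 | | | <.0001 |
| Specified | 77.25 | 12.10 | | 63.54 | 23.99 | | 58.19 | 9.85 | |
| Unspecified | 17.77 | 68.44 | | 27.96 | 66.46 | | 31.75 | 69.47 | |
| Surname match only | 4.98 | 19.46 | | 8.50 | 9.55 | | 10.06 | 20.68 | |
| Received surgery | 57.46 | 68.13 | <.0001 | 45.66 | 39.63 | <.0001 | 84.58 | 87.60 | <.0001 |
| Received radiation | 57.43 | 40.08 | <.0001 | 32.89 | 29.35 | <.0001 | 17.45 | 11.91 | <.0001 |
| Registry site | | | <.0001 | | | <.0001 | | | <.0001 |
| San Francisco-Oakland SMSA | 5.27 | 7.08 | | 6.42 | 7.62 | | 6.83 | 10.00 | |
| Connecticut | 3.45 | 3.45 | | 3.70 | 2.63 | | 4.71 | 2.95 | |
| Metropolitan Detroit | 0.58 | 1.54 | | 0.96 | 1.28 | | 1.15 | 1.11 | |
| Hawaii | 0.27 | 0.16 | | 0.71 | 0.17 | | 0.79 | 0.13 | |
| Iowa | 0.24 | 0.97 | | 0.30 | 0.44 | | 0.35 | 0.56 | |
| New Mexico | 4.97 | 10.37 | | 11.77 | 11.80 | | 12.16 | 11.93 | |
| Seattle | 0.90 | 1.50 | | 1.05 | 1.15 | | 1.22 | 0.97 | |
| Utah | 0.93 | 3.67 | | 1.23 | 1.70 | | 1.32 | 1.92 | |
| Metropolitan Atlanta | 0.93 | 2.29 | | 0.78 | 0.91 | | 1.04 | 1.10 | |
| Alaska | | | | | | | 0.00 | 0.01 | |
| San Jose-Monterey | 4.98 | 5.23 | | 3.86 | 7.05 | | 3.73 | 7.36 | |
| Los Angeles | 51.37 | 23.00 | | 39.28 | 26.41 | | 36.16 | 22.32 | |
| Rural Georgia | 0.03 | 0.00 | | | | | 0.01 | 0.02 | |
| Greater California (excluding San Francisco, Los Angeles, and San Jose) | 20.19 | 29.52 | | 21.87 | 28.83 | | 22.04 | 31.85 | |
| Kentucky | 0.09 | 0.47 | | 0.11 | 0.18 | | 0.17 | 0.33 | |
| New Jersey | 5.00 | 9.24 | | 7.45 | 9.00 | | 7.65 | 6.59 | |
| Greater Georgia (excluding Atlanta and rural Georgia) | 0.83 | 1.50 | | 0.52 | 0.82 | | 0.68 | 0.87 | |
| Male sex (b) | | | | | | | 54.28 | 54.60 | .5696 |
| Anatomical subsite (b) | | | | | | | | | <.0001 |
| Proximal | | | | | | | 35.69 | 36.05 | |
| Distal | | | | | | | 26.71 | 28.50 | |
| Rectum | | | | | | | 34.21 | 32.83 | |
| Other | | | | | | | 3.38 | 2.62 | |

Abbreviation: SMSA, Standard Metropolitan Statistical Area.
(a) For patients with prostate cancer, stage at diagnosis is classified as localized/regional versus distant.
(b) Sex and anatomical subsite were only used to impute nativity for patients with colorectal cancer.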

Cervical Cancer

Covariates used to impute nativity status for cervical cancer were age at diagnosis, stage at diagnosis, receipt of cancer-directed surgery, receipt of radiation therapy, reporting source, Hispanic origin, and SEER site. The area under the ROC curve was 0.942, indicating excellent agreement between the model and the data, whereas the R-squared value was 0.7534 (see online supporting information). After imputation, 2816 cases with unknown birthplace (88.25%) were classified as US-born and 375 (11.75%) were classified as foreign-born (Table 2). In the validation group, the agreement (kappa) between the true and imputed nativity values was 0.82, and 6.83% of cases were misclassified (Table 3). Misclassification of nativity was 12.43% and 4.83%, respectively, among US-born and foreign-born cases. The sensitivity for detecting foreign-born status was 95.17%, and the specificity was 86.81%.

Table 2. Comparison of Nativity Distribution Before and After Imputation for Cervical, Prostate, and Colorectal Cancer

| Cancer | Before Imputation: Known Nativity, N | Before Imputation: Foreign-Born, No. (%) | Before Imputation: US-Born, No. (%) | Place of Birth Missing, N | After Imputation: All Cases, N | After Imputation: Foreign-Born, No. (%) | After Imputation: US-Born, No. (%) | % Missing Allocated to US-Born |
|---|---|---|---|---|---|---|---|---|
| Cervical cancer | 7024 | 5229 (74.44) | 1795 (25.56) | 3191 | 10,215 | 5604 (54.86) | 4611 (45.14) | 88.25 |
| Prostate cancer | 26,402 | 14,884 (56.37) | 11,518 (43.63) | 24,998 | 51,400 | 18,313 (35.63) | 33,087 (64.37) | 86.28 |
| Colorectal cancer | 19,905 | 10,032 (50.40) | 9873 (49.60) | 12,575 | 32,480 | 11,338 (34.91) | 21,142 (65.09) | 89.61 |

Table 3. Cross-Validation of Imputation Method for Cervical, Prostate, and Colorectal Cancer

| Cancer | Real Value | Imputed Foreign-Born | Imputed US-Born | Total |
|---|---|---|---|---|
| Cervical cancer | Foreign-born | 986 (70.13%) | 50 (3.56%) | 1026 (73.02%) |
| Cervical cancer | US-born | 46 (3.27%) | 324 (23.04%) | 370 (26.32%) |
| Cervical cancer | Total | 1032 (73.40%) | 374 (26.60%) | 1406 (100.0%) |
| Prostate cancer | Foreign-born | 2839 (53.79%) | 183 (3.47%) | 3022 (57.22%) |
| Prostate cancer | US-born | 234 (4.47%) | 2025 (38.35%) | 2259 (42.78%) |
| Prostate cancer | Total | 3073 (58.19%) | 2208 (41.81%) | 5281 (100.0%) |
| Colorectal cancer | Foreign-born | 1818 (45.67%) | 186 (4.67%) | 2004 (50.34%) |
| Colorectal cancer | US-born | 199 (5.00%) | 1778 (44.66%) | 1977 (49.66%) |
| Colorectal cancer | Total | 2017 (50.67%) | 1964 (49.33%) | 4382 (100.0%) |

| Cancer | % Misclassified | Kappa | Sensitivity | Specificity |
|---|---|---|---|---|
| Cervical cancer | 6.83% | 0.8245 | 95.17% | 86.81% |
| Prostate cancer | 7.90% | 0.8382 | 93.94% | 89.64% |
| Colorectal cancer | 9.67% | 0.8066 | 90.72% | 89.93% |

Prostate Cancer

Covariates used to impute nativity status for prostate cancer were age at diagnosis, stage at diagnosis, receipt of cancer-directed surgery, receipt of radiation therapy, reporting source, Hispanic origin, and SEER site. There was excellent agreement between the model and the data (area under the ROC curve of 0.947) and the R-squared value was 0.7697 (see online supporting information). Cases with unknown birthplace were predominantly classified as US-born (86.28%; Table 2). In the validation group, the agreement (kappa) between the true and imputed nativity values was 0.84, and 7.90% of cases were misclassified (Table 3). Misclassification of nativity was 10.36% and 6.06%, respectively, among US-born and foreign-born cases. The sensitivity for detecting foreign-born status was 93.94%, and the specificity was 89.64%.
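The validation statistics can be recomputed from the 2 × 2 cross-classification of true versus imputed nativity. The Python sketch below uses the prostate cancer cell counts from Table 3 (2839 and 183 among truly foreign-born cases; 234 and 2025 among truly US-born cases) and treats foreign-born as the "positive" class; the function name `validation_metrics` is introduced here for illustration and uses the standard formula for Cohen's kappa.

```python
def validation_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, misclassification, and Cohen's kappa
    for a 2x2 table in which foreign-born is the positive class.
    tp: foreign-born imputed foreign-born; fn: foreign-born imputed US-born;
    fp: US-born imputed foreign-born;      tn: US-born imputed US-born."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    misclassified = (fn + fp) / n
    observed = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, misclassified, kappa

sens, spec, misc, kappa = validation_metrics(tp=2839, fn=183, fp=234, tn=2025)
print(f"sensitivity {sens:.2%}, specificity {spec:.2%}, "
      f"misclassified {misc:.2%}, kappa {kappa:.4f}")
```

These values agree, after rounding, with the sensitivity, specificity, percentage misclassified, and kappa reported for prostate cancer.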

Colorectal Cancer

Covariates used to impute nativity status for colorectal cancer were age at diagnosis, stage at diagnosis, receipt of cancer-directed surgery, receipt of radiation therapy, reporting source, Hispanic origin, SEER site, sex, and anatomical subsite. There was excellent agreement between the model and the data (area under the ROC curve, 0.939) and the R-squared value was 0.7383 (see online supporting information). After imputation, 89.61% of cases with unknown birthplace were classified as US-born (Table 2). In the validation group, the agreement (kappa) between the true and imputed nativity values was 0.81, and 9.67% of cases were misclassified (Table 3). Misclassification of nativity was 10.07% and 9.28%, respectively, among US-born and foreign-born cases. The sensitivity for detecting foreign-born status was 90.72%, and the specificity was 89.93%.

Effect of Imputation on Nativity Differences in Cancer-Specific Survival

For cervical cancer (Fig. 1A), the preimputation Kaplan-Meier curves (using listwise deletion of cases with missing nativity) indicated that cervical cancer-specific survival was significantly poorer among US-born versus foreign-born cases (log-rank P value < .0001). After imputation, however, the mean length of survival among US-born cases increased while remaining largely unchanged for foreign-born cases. The new Kaplan-Meier curves indicated an opposite association between nativity and survival, with improved (but not statistically significant) cancer-specific survival among US-born cases (log-rank P value of .0771). For prostate cancer (Fig. 1B), the preimputation Kaplan-Meier curves also indicated significantly poorer cancer-specific survival among US-born versus foreign-born cases (log-rank P value < .0001), whereas after imputation, there was a significant survival advantage among US-born cases (log-rank P value < .0001). Finally, for colorectal cancer (Fig. 1C), the apparent survival advantage among foreign-born cases (log-rank P value < .0001) became null after imputation of nativity (log-rank P value of .4182).

Figure 1. Comparison of log-rank test and Kaplan-Meier (KM) curves of cancer-specific mortality using (Left panels) listwise deletion versus (Right panels) multiple imputation of missing nativity for patients with (A) cervical, (B) prostate, and (C) colorectal cancer.

DISCUSSION

Multiple imputation by logistic regression performed well for imputing nativity status for cases of cervical, prostate, and colorectal cancer, with a sensitivity of ≥ 90% and a specificity of ≥ 86% for detecting foreign-born status, with slightly higher sensitivity noted among cervical and prostate cancer cases (≥ 93%). Using the subset of cases with known nativity status, the agreement between the true and imputed values was excellent (kappa, ≥ 0.8) and the misclassification error was ≤ 10% for all 3 cancers.

Using California Cancer Registry data, Gomez et al developed an algorithm to impute nativity based on age at receipt of a social security number[5] that is highly sensitive and specific for detecting foreign-born status (sensitivity of 84% and specificity of 80% among Asian breast cancer cases[5]; sensitivity of 81% and specificity of 80% among Hispanic gastric cancer cases[4]). Although we did not evaluate the same populations, the current study data suggest that multiple imputation by logistic regression may more accurately impute nativity status than the imputation algorithm based on date of receipt of a social security number. Perhaps more importantly, multiple imputation uses variables available in the SEER data file, making it less labor-intensive and more accessible to researchers who do not have administrative rights over the data. In addition, multiple imputation allows for analyses across larger geographic areas spanning multiple cancer registries. Even for analyses limited to individual registries for which the researcher may obtain social security number data, multiple imputation may be more feasible for imputing nativity in ethnic groups and geographic areas in which a large percentage of the foreign-born population is undocumented and thus lacking a social security number. In 2005, 30% of the foreign-born population in the US (primarily Mexicans and other Latin Americans) was undocumented[38] and in new settlement states, such as those in the Southeast, the rapid growth in the foreign-born population is primarily driven by undocumented immigration.[39]

The current study data indicate that the majority of Hispanic cancer cases lacking birthplace information are among US-born individuals, and are more commonly diagnosed at an early stage. This nonrandom distribution of missing data makes common analytic strategies, such as listwise deletion and allocation proportional to the distribution of nativity in the population, extremely prone to bias. Survival analyses are particularly biased given that missing birthplace is significantly more prevalent among cancer survivors.[9, 10] For example, among cervical cancer cases, birthplace data were missing for 36.77% of living cases versus 17.04% of the deceased. For this reason, our Kaplan-Meier survival curves indicate drastically different associations between nativity and survival when nativity is imputed by logistic regression versus when cases with missing nativity data are dropped from the data set. For colorectal cancer, listwise deletion of cases with missing birthplace data resulted in Kaplan-Meier curves suggesting a survival advantage for foreign-born versus US-born Hispanics. However, these survival differences became null after imputing nativity status for those with missing data. For prostate and cervical cancer, the apparent survival advantage of foreign-born men and women was reversed or made null when cases with missing birthplace data were included in the analysis and assigned nativity through multiple imputation.

The current study is subject to a few potential limitations. First, although the sensitivity and specificity of classification are higher than those of prior methods, these data suggest that imputation by logistic regression misclassifies between 7% and 10% of cases with missing data. The misclassification appears to be differential, affecting US-born cases more frequently than foreign-born cases, and may slightly bias the results of the survival analyses. However, as the results of the current study indicate, the biases introduced by multiple imputation are substantially smaller than those introduced when cases with missing birthplace information are omitted from the database. Second, certain variables used to impute nativity status, specifically Hispanic origin, which is determined based on medical record review and surname, are also subject to misclassification that varies by subgroup.[40] The Hispanic origin variable weighs heavily in the multiple imputation procedure and significantly influences its sensitivity, specificity, and percentage misclassification. Third, the Hispanic immigrant population is heterogeneous, with evidence of disparities in cancer incidence and survival among subgroups.[41, 42] However, further disaggregation by country/region of origin is not possible given that multiple imputation by logistic regression can only be used to impute a binary variable and thus allocate missing birthplace cases to either US-born or foreign-born status.

Multiple imputation by logistic regression can be used to impute missing nativity data for the large number of cases that are missing birthplace information using variables readily available in the SEER data file. Although we do not prescribe a set group of candidate variables to be used for imputation, the proposed procedure allows for customizable variable selection depending on factors that may be clinically relevant to any particular cancer (eg, anatomical subsite for colorectal cancer). We propose this multiple imputation strategy as a tool that will allow researchers to disaggregate analyses by nativity and uncover important nativity disparities in regard to cancer diagnosis, treatment, and survival. As the foreign-born population continues to grow, such disaggregation is imperative to cancer disparities research.

FUNDING SUPPORT

This research is partly supported by a grant from the National Institutes of Health (P01CA082710; Principal Investigator: M. Follen).

CONFLICT OF INTEREST DISCLOSURES

Dr. Montealegre and Ms. Zhou are supported by an Innovation for Cancer Prevention Research Postdoctoral (Dr. Montealegre)/Predoctoral (Ms. Zhou) Fellowship (The University of Texas School of Public Health-Cancer Prevention and Research Institute of Texas grant RP101503).

REFERENCES

  • 1
    Grieco ME, Acosta YD, de la Cruz GP, et al. The Foreign-Born Population in the United States: 2010. American Community Survey Reports ACS-19. Washington, DC: US Census Bureau; 2012. census.gov/prod/2012pubs/acs-19.pdf. Accessed December 5, 2012.
  • 2
    Singh GK, Hiatt RA. Trends and disparities in socioeconomic and behavioural characteristics, life expectancy, and cause-specific mortality of native-born and foreign-born populations in the United States, 1979-2003. Int J Epidemiol. 2006;35:903-919.
  • 3
    Seeff LC, McKenna MT. Cervical cancer mortality among foreign-born women living in the United States, 1985 to 1996. Cancer Detect Prev. 2003;27:203-208.
  • 4
    Chang ET, Gomez SL, Fish K, et al. Gastric cancer incidence among Hispanics in California: patterns by time, nativity, and neighborhood characteristics. Cancer Epidemiol Biomarkers Prev. 2012;21:709-719.
  • 5
    Gomez SL, Quach T, Horn-Ross PL, et al. Hidden breast cancer disparities in Asian women: disaggregating incidence rates by ethnicity and migrant status. Am J Public Health. 2010;100(suppl 1):S125-S131.
  • 6
    Nielsen SS, He Y, Ayanian JZ, et al. Quality of cancer care among foreign-born and US-born patients with lung or colorectal cancer. Cancer. 2010;116:5497-5506.
  • 7
    National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) Program. About SEER. seer.cancer.gov/about/. Accessed April 20, 2012.
  • 8
    Clegg LX, Reichman ME, Hankey BF, et al. Quality of race, Hispanic ethnicity, and immigrant status in population-based cancer registry data: implications for health disparity studies. Cancer Causes Control. 2007;18:177-187.
  • 9
    Choe JH, Koepsell TD, Heagerty PJ, Taylor VM. Colorectal cancer among Asians and Pacific Islanders in the U.S.: survival disadvantage for the foreign-born. Cancer Detect Prev. 2005;29:361-368.
  • 10
    Lin SS, O'Malley CD, Lui SW. Factors associated with missing birthplace information in a population-based cancer registry. Ethn Dis. 2001;11:598-605.
  • 11
    Lin SS, Clarke CA, O'Malley CD, Le GM. Studying cancer incidence and outcomes in immigrants: methodological concerns. Am J Public Health. 2002;92:1757-1759.
  • 12
    Gomez SL, Glaser SL, Kelsey JL, Lee MM. Bias in completeness of birthplace data for Asian groups in a population-based cancer registry (United States). Cancer Causes Control. 2004;15:243-253.
  • 13
    Gomez SL, Glaser SL. Quality of cancer registry birthplace data for Hispanics living in the United States. Cancer Causes Control. 2005;16:713-723.
  • 14
    Hedeen AN, White E. Breast cancer size and stage in Hispanic American women, by birthplace: 1992-1995. Am J Public Health. 2001;91:122-125.
  • 15
    Hedeen AN, White E, Taylor V. Ethnicity and birthplace in relation to tumor size and stage in Asian American women with breast cancer. Am J Public Health. 1999;89:1248-1252.
  • 16
    Pineda MD, White E, Kristal AR, Taylor V. Asian breast cancer survival in the US: a comparison between Asian immigrants, US-born Asian Americans and Caucasians. Int J Epidemiol. 2001;30:976-982.
  • 17
    Cook LS, Goldoft M, Schwartz SM, Weiss NS. Incidence of adenocarcinoma of the prostate in Asian immigrants to the United States and their descendants. J Urol. 1999;161:152-155.
  • 18
    Flood DM, Weiss NS, Cook LS, Emerson JC, Schwartz SM, Potter JD. Colorectal cancer incidence in Asian migrants to the United States and their descendants. Cancer Causes Control. 2000;11:403-411.
  • 19
    Kouri EM, He Y, Winer EP, Keating NL. Influence of birthplace on breast cancer diagnosis and treatment for Hispanic women. Breast Cancer Res Treat. 2010;121:743-751.
  • 20
    Herrinton LJ, Goldoft M, Schwartz SM, Weiss NS. The incidence of non-Hodgkin's lymphoma and its histologic subtypes in Asian migrants to the United States and their descendants. Cancer Causes Control. 1996;7:224-230.
  • 21
    Stanford JL, Herrinton LJ, Schwartz SM, Weiss NS. Breast cancer incidence in Asian migrants to the United States and their descendants. Epidemiology. 1995;6:181-183.
  • 22
    Herrinton LJ, Stanford JL, Schwartz SM, Weiss NS. Ovarian cancer incidence among Asian migrants to the United States and their descendants. J Natl Cancer Inst. 1994;86:1336-1339.
  • 23
    Rosenblatt KA, Weiss NS, Schwartz SM. Liver cancer in Asian migrants to the United States and their descendants. Cancer Causes Control. 1996;7:345-350.
  • 24
    Liao CK, Rosenblatt KA, Schwartz SM, Weiss NS. Endometrial cancer in Asian migrants to the United States and their descendants. Cancer Causes Control. 2003;14:357-360.
  • 25
    Rossing MA, Schwartz SM, Weiss NS. Thyroid cancer incidence in Asian migrants to the United States and their descendants. Cancer Causes Control. 1995;6:439-444.
  • 26
    Kamineni A, Williams MA, Schwartz SM, Cook LS, Weiss NS. The incidence of gastric carcinoma in Asian migrants to the United States and their descendants. Cancer Causes Control. 1999;10:77-83.
  • 27
    Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons, Inc; 1987.
  • 28
    Rubin DB, Schenker N. Multiple imputation in health-care databases: an overview and some applications. Stat Med. 1991;10:585-598.
  • 29
    Barnard J, Meng XL. Applications of multiple imputation in medical studies: from AIDS to NHANES. Stat Methods Med Res. 1999;8:17-36.
  • 30
    Pew Hispanic Center. Statistical Portrait of Hispanics in the United States, 2011. Washington, DC: Pew Hispanic Center; 2013. http://www.pewhispanic.org/files/2013/02/Statistical-Portrait-of-Hispanics-in-the-United-States-2011_FINAL.pdf. Accessed October 30, 2013.
  • 31
    Fritz A, Percy C, Jack A, et al, eds. International Classification of Diseases for Oncology. 3rd ed. Geneva, Switzerland: World Health Organization; 2000.
  • 32
    Surveillance, Epidemiology, and End Results (SEER) Program. Research Data (1973-2008). Bethesda, MD: National Cancer Institute, Division of Cancer Control and Population Sciences, Surveillance Research Program, Surveillance Systems Branch; 2012.
  • 33
    Patel DA, Barnholtz-Sloan JS, Patel MK, Malone JM, Chuba PJ, Schwartz K. A population-based study of racial and ethnic differences in survival among women with invasive cervical cancer: analysis of Surveillance, Epidemiology, and End Results data. Gynecol Oncol. 2005;97:550-558.
  • 34
    SAS Institute Inc. Chapter 9: The MI Procedure. In: SAS OnlineDoc, Version 8. Cary, NC: SAS Institute Inc; 2000:129-200.
  • 35
    Yuan YC. Multiple Imputation for Missing Data: Concepts and New Development. Version 9.0. Cary, NC: SAS Institute; 2000. support.sas.com/rnd/app/stat/papers/multipleimputation.pdf. Accessed October 10, 2012.
  • 36
    Howlader N, Noone AM, Krapcho M, et al, eds. SEER Cancer Statistics Review, 1975-2008 (Vintage 2008 Populations). Bethesda, MD: National Cancer Institute; 2011.
  • 37
    Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159-174.
  • 38
    Passel JS. Size and Characteristics of the Unauthorized Migrant Population in the US. Washington, DC: Pew Hispanic Center; 2006. pewhispanic.org/2006/03/07/size-and-characteristics-of-the-unauthorized-migrant-population-in-the-us/. Accessed November 29, 2012.
  • 39
    Passel JS. Estimates of the Size and Characteristics of the Undocumented Population. Washington, DC: Pew Hispanic Center; 2005. pewhispanic.org/files/reports/44.pdf. Accessed December 10, 2012.
  • 40
    Swallen KC, West DW, Stewart SL, Glaser SL, Horn-Ross PL. Predictors of misclassification of Hispanic ethnicity in a population-based cancer registry. Ann Epidemiol. 1997;7:200-206.
  • 41
    Pinheiro PS, Williams M, Miller EA, Easterday S, Moonie S, Trapido EJ. Cancer survival among Latinos and the Hispanic Paradox. Cancer Causes Control. 2011;22:553-561.
  • 42
    Martinez-Tyson D, Pathak EB, Soler-Vila H, Flores AM. Looking under the Hispanic umbrella: cancer mortality among Cubans, Mexicans, Puerto Ricans and other Hispanics in Florida. J Immigr Minor Health. 2009;11:249-257.

Supporting Information

Additional Supporting Information may be found in the online version of this article.

Filename | Format | Size | Description
cncr28533-sup-0001-suppinfo.docx | | 26K | Supplementary Information

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.