Gastric adenocarcinoma burden and late‐stage diagnosis in Latino and non‐Latino populations in the United States and Texas, during 2004–2016: A multilevel analysis

Abstract

Background: Gastric cancer disproportionately affects Latinos, but little is known about regional effects and risk factors. We compared primary incidence, late‐stage diagnosis, and risk factors for gastric adenocarcinoma (GCA) from 2004 to 2016 in Latinos and non‐Latinos in the United States, Texas (TX), and South Texas (STX).

Methods: We collected case data from Surveillance, Epidemiology, and End Results (SEER) and the Texas Cancer Registry. We generated average annual age‐adjusted incidence rates, rate ratios (RRs), and 95% confidence intervals (CIs) using SEER*Stat software and analyzed the cases by anatomic site, demographics, and county‐level risk factors using SAS 9.4. We constructed multilevel logistic regression models for late‐stage GCA, adjusting for patient‐ and county‐level characteristics.

Results: Latinos had higher overall GCA incidence rates in all regions, with the greatest disparities in overlap GCA in STX males (RR 4.39; 95% CI: 2.85, 6.93). There were no differences in cardia GCA rates between non‐Hispanic White (NHW) and Latino women in any region. Younger patients, patients with overlapping or not otherwise specified (NOS) lesions, and patients diagnosed during 2012–2016 had higher odds of late‐stage GCA. The stratification by location showed no differences in late‐stage disease between NHWs and Latinos. The stratification by anatomic site showed Latinos with cardia GCA were more likely to have late‐stage GCA than NHWs (OR: 1.13, p = 0.008). At the county level, higher odds of late‐stage GCA were associated with medium and high social deprivation levels in TX without STX (OR: 1.25 and 1.20, p = 0.007 and 0.028, respectively), and medium social deprivation index (SDI) in patients with NOS GCA (OR: 1.21, p = 0.01).

Conclusions: STX Latinos experience greater GCA disparities than those in TX and the United States. Younger age and social deprivation increase the risk for late‐stage GCA, while Latinos and women are at higher risk specifically for late‐stage cardia GCA. There is a need for population‐specific, culturally responsive intervention and prevention measures, and additional research to elucidate contributing risk factors.


| INTRODUCTION
Cancer is the leading cause of death among Latinos, surpassing heart disease, the primary cause of mortality among non-Hispanic Whites (NHWs). 1,2 Latinos comprise 18.5% of the US population 3 and are projected to reach 28.6% by 2060. 4 Latinos comprise 39.7% of all Texans, and Texas Latinos account for about 17% of the entire US Latino population. South Texas (STX), a 38-county region encompassing San Antonio south to the Lower Rio Grande Valley along the Texas-Mexico border, is nearly 70% Latino. 5
Compared to NHWs, Latinos have a higher risk of developing multiple cancers, including gastric cancer (GC). 6 GC is the third leading cause of cancer death worldwide 7 and was the sixth and eighth leading cause of cancer death for Texas men and women, respectively, in 2012-2016. 1 Mortality rates from 2012 to 2016 were twice as high for Latino as for NHW men and 2.4 times higher for Latino than for NHW women. Gastric adenocarcinomas (GCAs) account for approximately 90% of all GCs and are the focus of this study. GC incidence has been increasing in persons aged <50 years and Latino women. 7,8 In 2018, the probability of developing invasive GC at any age was twice as high in Latino versus NHW men (1.6 vs. 0.8), and three times higher in Latino versus NHW women (1.2 vs. 0.4). 1 Moreover, Latinos are diagnosed at younger ages and more advanced stages compared to other ethnic groups. 1,9,10
Latinos are disproportionately vulnerable to cancer due to increased poverty, decreased education, low or no insurance, and high prevalence of risk factors like obesity 1 and heavy alcohol consumption. 6 About 22% of STX residents on average had an income below 150% of the federal poverty line in 2018 (range 7.3%-35%), compared with the US national average of 12%. 11 Our previously published review of health conditions in STX versus Texas and the US examined data on health insurance, obesity, alcohol use, and smoking. 12 For the period 2007-2010, an estimated 30% of STX residents were uninsured compared to 23% in the rest of Texas and 15% nationwide. Latinos had the highest uninsured rate (40%). STX also had the highest percentage of obese adults (32.7% vs. 29.1% in the rest of Texas and 27% in the United States). Obesity was more prevalent among STX Latinos (37.9%) than NHWs (24.6%). STX diabetes rates were likewise higher than the rest of Texas and national estimates (11.6% vs. 9.3% and 8.9%, respectively), with Latinos again accounting for larger proportions of diabetics. Although alcohol use and smoking were similar in all three regions, binge drinking rates were higher in STX (17.4%) than the rest of Texas (14.5%) and the nation (15.1%).
The social deprivation index (SDI) is a composite measure of county-level deprivation based on seven American Community Survey (ACS) measures for income, education, employment, housing, household characteristics, and transportation. 13 The SDI has been previously used to quantify the socioeconomic variation in health outcomes, including diabetes prevalence, odds of receiving colon and cervical cancer screening, and surgical treatment of head and neck cancers. 14,15
Previous studies have reported mixed results on ethnic disparities in GCA trends. Two studies using Surveillance, Epidemiology, and End Results (SEER) and National Program of Cancer Registries data from 1992 to 1998 and 1976 to 2007 found that Caucasians had a 1.7 times higher GCA incidence rate and increased gastric body GCs, respectively, compared to White Latinos. 16,17 In contrast, studies in Southern California and Texas (using Texas Cancer Registry data) found increased odds of GC in racial/ethnic minorities and a higher GCA incidence in Latinos, respectively (with a 29% excess risk in El Paso County), compared to the NHW population. 18,19 Short time frame, lack of GCA anatomic site data, and failure to compare Texas data with national trends were limitations of these studies. Finally, Balakrishnan and colleagues' retrospective cohort study of 299 non-cardia GC cases between 2005 and 2015 in the Harris County, Texas public medical system found increasing incidence among Hispanics/Latinos but not NHWs or Blacks. 20
The current study aimed to compare the rates and late-stage diagnosis for all primary GCA in Latinos and NHWs from 2004 to 2016 in the United States, Texas, and STX using multilevel modeling. The goal was to elucidate existing differences and the influence of specific risk factors in order to facilitate novel interventions aimed at reducing gastric adenocarcinoma incidence in specific high-risk, diverse populations.

| Ethical statement
The study was exempted from review by the Institutional Review Board at UT Health Science Center at San Antonio.

| Gastric cancer data
We obtained de-identified data via Limited-Use Data Agreements from the US SEER Program Registries 21 and the Texas Cancer Registry (TCR) at the Texas Department of State Health Services. 22 SEER is a population-based cancer registry system that has collected individual-level data provided by participating registries in select states since 1971. This study used the SEER 21-registry grouping excluding Alaska due to lack of county-level data.
While not included in SEER, the TCR is an identically organized population-based registry of all 254 Texas counties following all SEER standards and coding criteria. The registry has earned the North American Association of Central Cancer Registries (NAACCR) Gold Certification for data quality and completeness. 23

| Inclusion and exclusion
We coded primary GCAs according to the International Classification of Diseases for Oncology (ICD-O) GCA morphology codes (81403, 84903, 81453, 81443, 82113, 82102, 84803, 80103, and 85603). 24 We subdivided all cases by anatomic location according to the ICD 10th Edition Clinical Modification into cardia/gastroesophageal junction (C16.0), non-cardia (C16.1-C16.6), overlapping lesion (C16.8), and not otherwise specified (NOS, C16.9). 25 The coding followed the example of prior studies. 20 We limited our analysis to patients aged 20+ diagnosed in 2004-2016, for whom GCA was their first cancer. We further excluded cases without valid county codes, cases reported based on an autopsy/death certificate, those with unknown stage, and those who had missing county-level data. The resulting datasets included 142,068 GCA cases for rate analyses and 75,761 GCA cases for late-stage diagnosis.

| County-level data
We used county-level data from the 2018 County Health Rankings (CHR) for health behaviors and risk factors. 26 CHR combines several data sources, including the American Community Survey (ACS), the Behavioral Risk Factor Surveillance System, and the USDA Food Environment Atlas. We added the 2015 SDI to estimate county-level deprivation and to act as a proxy for socioeconomic status (SES).

| Rates
We obtained population denominators used for all rate calculations from SEER. 21 We defined ethnicity using the NAACCR Hispanic/Latino Identification Algorithm, version 2.2.1. 27 We selected GCA incident cases from 2004 to 2016 from the 21 SEER registries (cumulative population at risk = 194 million Latino person-years, 655 million NHW person-years); and TCR for all of Texas (76 million Latino, 116 million NHW person-years) and the 38-county STX region (25 million Latino, 12 million NHW person-years). We generated average annual age-specific, age-adjusted GCA incidence rates, rate ratios (RRs), and their corresponding 95% confidence intervals (CIs) for NHWs, Non-Hispanic Blacks, Latinos, and Non-Hispanic Others in the SEER and TCR datasets using SEER*Stat software v8.3.8 (2020, National Cancer Institute, Bethesda, MD). We used standard age groups for rates estimation.
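As an illustration of the quantities described above, the following minimal Python sketch computes a directly age-standardized rate and a rate ratio. It is not the SEER*Stat procedure: the function names, the three age bands, the standard weights, and all counts are hypothetical, and the confidence interval uses a simple log-scale normal approximation under a Poisson assumption rather than the interval methods that SEER*Stat applies.

```python
import math

def age_adjusted_rate(cases, person_years, std_weights, per=100_000):
    """Return the directly standardized rate and its variance, per `per` person-years."""
    total_weight = sum(std_weights)
    rate = 0.0
    variance = 0.0
    for d, n, w in zip(cases, person_years, std_weights):
        share = w / total_weight              # standard-population share of this age group
        rate += share * d / n                 # weighted age-specific rate
        variance += share ** 2 * d / n ** 2   # Poisson variance of the weighted rate
    return rate * per, variance * per ** 2

def rate_ratio(rate_a, var_a, rate_b, var_b, z=1.96):
    """Rate ratio with an approximate CI computed on the log scale (assumption)."""
    rr = rate_a / rate_b
    se_log_rr = math.sqrt(var_a / rate_a ** 2 + var_b / rate_b ** 2)
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)

# Hypothetical counts for three age bands (illustrative only, not study data)
weights = [0.5, 0.3, 0.2]
person_years = [100_000, 80_000, 50_000]
rate_a, var_a = age_adjusted_rate([10, 40, 90], person_years, weights)
rate_b, var_b = age_adjusted_rate([5, 20, 45], person_years, weights)
rr, lower, upper = rate_ratio(rate_a, var_a, rate_b, var_b)
print(f"Rate A: {rate_a:.1f}, Rate B: {rate_b:.1f}, RR: {rr:.2f} ({lower:.2f}, {upper:.2f})")
```

Because both groups are weighted to the same standard population, the resulting ratio compares the groups as if they shared one age structure, which is the purpose of age adjustment.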

| Logistic regression analysis outcome
We defined the late-stage GCA diagnosis as known metastatic disease (i.e., distant site(s)/node(s) involved) using the combined Summary Stage 2000 variable in the SEER and TCR data versus non-late-stage (including localized and regional but not unknown).

| Patient characteristics
We included age at diagnosis (20-39, 40- Categories for year of diagnosis were chosen to achieve as equal a distribution of patients over time as possible. We also included reporting source as previous research has shown this may impact the unknown stage. 28

| County-level predictors
From the CHR, we included percent current smokers, percent obese, percent reporting excessive alcohol consumption (i.e., binge/heavy drinking), and the food environment index, an indicator of environmental access to healthy food options. All CHR measures were z-scored prior to analysis. We included the SDI (21-79, 80-100 [most deprived], vs. 0-20 [least deprived]) as a measure of socioeconomic deprivation.
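The two transformations described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and example values are hypothetical, the source does not say whether a population or sample standard deviation was used for z-scoring (the population form is assumed here), and the category labels "low", "medium", and "high" are shorthand for the 0-20, 21-79, and 80-100 SDI ranges.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to mean 0 and SD 1 (population SD assumed)."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

def sdi_category(sdi):
    """Map an SDI score (0-100) to the three groups used in the models."""
    if not 0 <= sdi <= 100:
        raise ValueError("SDI score must be between 0 and 100")
    if sdi <= 20:
        return "low"      # least deprived, reference group
    if sdi <= 79:
        return "medium"
    return "high"         # most deprived

# Hypothetical county smoking percentages (illustrative only, not study data)
smoking_z = z_scores([10.0, 20.0, 30.0])
categories = [sdi_category(s) for s in (20, 21, 80)]
```

After z-scoring, a one-unit change in a county-level predictor corresponds to one standard deviation of that measure across counties, which is how the odds ratios for CHR measures are read.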

| Analytical approach
We report the descriptive univariate and bivariate statistics for late-stage GCA diagnosis, patient characteristics, and county-level predictors. To examine the effect of patient- and county-level predictors on the diagnosis of late-stage GCA, we constructed multilevel logistic regression models adjusting for county-level clustering. We used a nested approach, first only including location (i.e., SEER, TX, STX, Model 1; Table 1) and reporting source, then adding patient-level predictors (Model 2), followed by county-level predictors (Model 3). We stratified the final model by location (SEER, TX, STX; Table 2) and anatomic site (cardia, non-cardia, overlapping, and NOS; Table 3).
Because the combined Summary Stage 2000 variable in the SEER and TCR used to define our late-stage outcome contained cases with unknown stage (10.19% of sample excluded from the analysis), we conducted sensitivity analyses including patients with unknown stage at diagnosis as part of the reference group (i.e., non-late-stage disease). Since SDI data are based on ACS data collected during 2011-2015, we also conducted a sensitivity analysis using cases limited to that year range. We conducted all descriptive statistics, multilevel modeling, and sensitivity analyses for late-stage GCA diagnosis using SAS 9.4. We considered results statistically significant at p ≤ 0.05.
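The outcome coding for the main and sensitivity analyses can be sketched as below. This is a hedged illustration rather than the authors' SAS code: the function name and the stage labels ("localized", "regional", "distant", "unknown") are hypothetical stand-ins for the Summary Stage 2000 values.

```python
def late_stage_outcome(stage, unknown_as_reference=False):
    """Return 1 for late-stage, 0 for non-late-stage, None if the case is excluded.

    Stage labels are hypothetical stand-ins for Summary Stage 2000 values.
    """
    stage = stage.strip().lower()
    if stage == "distant":
        return 1                      # metastatic disease = late stage
    if stage in ("localized", "regional"):
        return 0                      # reference group
    if stage == "unknown":
        # Main analysis excludes unknown stage; the sensitivity analysis
        # folds it into the reference group.
        return 0 if unknown_as_reference else None
    raise ValueError(f"unrecognized stage: {stage!r}")

stages = ["Localized", "Distant", "Unknown", "Regional"]
main = [late_stage_outcome(s) for s in stages]
sensitivity = [late_stage_outcome(s, unknown_as_reference=True) for s in stages]
```

Keeping the exclusion rule behind a single flag makes it easy to rerun the same models on both analytic samples and compare the estimates.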

| Frequencies and incidence
For incidence analyses, we had over 117,400 cases of primary invasive GCA diagnosed from 2004 to 2016 in the United States; 20,418 GCA cases registered in TX; and 4192 in STX. Overall GCA incidence rates were significantly higher in Latinos than NHWs in all three regions (Table S1). GCA was more common in males than females in all regions and both race/ethnicity groups. Latinos had higher rates of non-cardia, overlap, and NOS GCA in all three regions, with the greatest ethnic disparity seen in STX Latino versus NHW men for overlap GCA (RR 4.39; 95% CI: 2.85, 6.93). NHWs had significantly higher rates of cardia GCA than Latinos in all three regions, except among women, for whom there was no ethnic difference in any region.

| Descriptive statistics of late-stage gastric cancer diagnosis
For the more granular analyses of late-stage disease, our sample consisted of over 75,700 cases of primary invasive GCA diagnosed from 2004 to 2016 in the United States; 8663 GCA cases registered in TX, not counting STX; and 2340 in STX (Table 1). Approximately 85% of patients came from SEER and 15% from TCR. Approximately 3% of patients lived in STX at the time of diagnosis. The sample was majority male (64%), age 65 or older (55.6%), and NHW (51.6%). About 40% of patients were diagnosed between 2012 and 2016. Most cases were reported by inpatient/outpatient hospitals or clinics (94.8%; data not shown). Approximately 44% of patients were diagnosed with late-stage GCA.

| Multilevel logistic regression
In Model 1, adjusted for reporting source, patients from TX had higher and patients from STX had lower odds of late-stage GCA compared to SEER patients, but these differences were not significant (OR: 1.04 and 0.90, p = 0.18 and 0.07) (Table 1). Adjusting for patient characteristics reduced the odds of late-stage GCA for TX and STX patients, with STX being significantly lower (OR: 0.81, p = 0.001, Model 2) compared to SEER. Furthermore, younger patients, those with overlapping and NOS lesions, and those diagnosed during 2012-2016 had significantly higher odds of late-stage diagnosis than patients aged 65+, those with cardia GCA, and those diagnosed during 2004-2007, respectively. NH Others had lower odds of late-stage disease than NHWs (OR: 0.68, p < 0.001). After adjusting for county-level variables, patients from TX and STX had significantly lower odds of being diagnosed with late-stage GCA compared to SEER patients (OR: 0.91 and 0.81, p = 0.02 and 0.001, Model 3). Patients living in counties with higher proportions of smokers at the time of diagnosis also had lower odds of late-stage diagnosis (OR: 0.94, p = 0.0002), meaning that for every standard deviation increase in county-level smoking prevalence, the odds of being diagnosed with late-stage disease decreased by 6%. County-level SDI did not significantly impact patients' odds of late-stage GCA.
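The reading of an odds ratio for a z-scored predictor as a percent change in odds can be checked with a short Python sketch. The function name is hypothetical, and extending the estimate to more than one standard deviation assumes the log-odds relationship is linear in the predictor, as a logistic model specifies.

```python
def percent_change_in_odds(odds_ratio, sd_units=1.0):
    """Percent change in odds for an increase of `sd_units` standard deviations."""
    return (odds_ratio ** sd_units - 1.0) * 100.0

# OR of 0.94 per SD of county-level smoking prevalence, as reported for Model 3
one_sd = percent_change_in_odds(0.94)        # about -6%
two_sd = percent_change_in_odds(0.94, 2.0)   # about -11.6%, since 0.94 ** 2 = 0.8836
print(f"{one_sd:.1f}% per SD, {two_sd:.1f}% per 2 SD")
```

Note that the change applies to the odds, not the probability, of late-stage diagnosis, and that effects compound multiplicatively rather than adding.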
We saw some different effects in models stratified by location (Table 2). Latinos were slightly more likely to have late-stage disease compared to NHWs only in TX without STX, although this difference was not significant (OR: 1.09, p = 0.13). Overlap and NOS lesions had higher odds of late-stage diagnosis compared to cardia in all locations, except overlap in STX (OR: 1.21, p = 0.29). In TX without STX, living in a county with medium or high SDI was associated with increased odds of late-stage GCA compared to counties with the lowest SDI (i.e., least deprived counties) (OR: 1.25 and 1.20, p = 0.007 and 0.028). In STX, patients from counties with higher smoking prevalence had greater odds of late-stage GCA.
In models stratified by anatomic site, Latinos and NH Blacks with cardia GCA were more likely to have late-stage GCA than NHWs (OR: 1.13 and 1.29, p = 0.008 and <0.001) (Table 3). Patients living in counties with higher smoking prevalence had lower odds of late-stage cardia (OR: 0.94, p = 0.019), overlapping (OR: 0.89, p = 0.03), and NOS (OR: 0.86, p = 0.001) lesions. In STX, patients with NOS GCA who lived in medium SDI counties had higher odds of being diagnosed with late-stage GCA (OR: 1.21, p = 0.013).
Sensitivity analyses including patients with unknown stage as part of the reference group (10.19% of the sample) showed results similar to those of the main analysis (Tables S2-S4). However, there were additional findings: males had significantly higher odds of having late-stage disease in the full model (Table S2), in the stratified analysis for SEER (Table S3), and in the stratified analyses for patients with cardia and non-cardia GCA (Table S4).

| DISCUSSION
Our study provides an overview of GCA rates and late-stage diagnosis in the United States, Texas, and South Texas (STX) from 2004 to 2016 among Latinos and non-Latinos. Texas's exclusion from SEER data has limited previous comparative assessments of GCA incidence rates. Although Camargo and colleagues' study reported robust data including Texas, only 7.9% of SEER-9 participants were Latinos. 16 Previous studies were limited by region (El Paso County vs. Texas) 19 or population examined (predominantly male veterans). 29 The present study used both SEER-21 and Texas Cancer Registry data, ensuring adequate representation of the standard United States, TX, and STX populations. Consistent with prior studies, 7,17,20 we found that overall GCA incidence rates in Texas and STX were higher in Latinos than in NHWs, despite lower frequencies in the state and STX region compared to the United States. Cardia GCA was more common in NHWs, whereas non-cardia, overlap, and not otherwise specified (NOS) GCA were more common in Latinos. This was also consistent with prior studies. 19,30,31 An exception we found was in cardia GCA among women, which had similar rates among NHWs and Latinos. Although GC overall is a predominantly male disease, our findings indicate that additional measures are needed to target women who may be at risk for cardia GCA. Further research should identify such measures, as well as specific populations of women at higher risk.
TABLE 3 Logistic regression models for late-stage GCA diagnosis by anatomic site, adults 2004
Non-cardia, overlap, and NOS GCA were disproportionately represented among Latinos in all regions compared to non-Latinos. The high H. pylori infection rate in Latinos, 32,33 coupled with H. pylori being the best-known risk factor for more distal stomach cancers, 34 may account for this disparity.
In contrast to overall GCA, late-stage diagnosis odds were significantly higher in the United States than both Texas (not including STX) and STX. Texas providers may screen earlier due to the higher overall GCA prevalence in Texas and STX. This possibility can be explored in future studies. Furthermore, younger patients, those with overlapping and NOS lesions, and those diagnosed during 2012-2016 had significantly higher odds of late-stage diagnosis than patients aged 65+, those with cardia GCA, and those diagnosed during 2004-2007, respectively. The increase in younger patients with GCA partially reflects previous studies, though these did not focus on late-stage disease. 8-10 To the best of our knowledge, we are the first to report this finding. This, along with the increased late-stage diagnoses of overlap and NOS lesions, is a disturbing indication of the lack of standardized screening practices for GC, unlike those for colorectal, breast, and ovarian cancers. The guidelines for surveillance of precursor conditions like intestinal metaplasia are based on predominantly low-quality evidence, 35 and only symptomatic patients are recommended to be tested for H. pylori in lower prevalence countries like the United States. 34 Our results support the need for development of effective methodologies of early GC screening in the United States. 36
None of the county-level risk factors contributed to higher odds of late-stage disease in the main analysis.
Higher county-level smoking prevalence was associated with decreased odds of late-stage disease in the main and anatomic site analyses. While counterintuitive, this is consistent with previous studies, 1,6 including a recent population-based cohort study of SEER data from 2007 to 2015 which found that Latinos with non-cardia GC were less likely to reside in counties with high smoking prevalence. 9 As stated by Gnaldi and colleagues, in the case of disagreement between results from studies conducted at different levels, additional research is needed before it can be concluded that estimates including ecological variables are inappropriate. 37 It is also possible that smokers are screened earlier and thus their disease is caught earlier. However, confirming this is beyond the scope of this paper. In contrast, in the model stratified by location, STX patients from counties with higher smoking prevalence had significantly increased odds of late-stage GCA. This is not explained by the current smoking prevalence rates in STX and TX counties, which are significantly lower than the national rate (15.7% vs. 17.7%, p < 0.001; data not shown).
We explored the connection between socioeconomic status and late-stage GCA using the SDI, a more complex measure than poverty alone. For the full model, SDI was not significantly associated with late-stage disease. In the model stratified by location, living in a county with medium or high SDI was associated with increased odds of late-stage GCA, compared to the lowest SDI, only among patients in TX without STX. This was surprising, given the high poverty rate in STX (~22% below 150% of the federal poverty line in 2018 vs. 12% in the United States). In contrast, in the model stratified by anatomic site, living in a county with medium SDI was associated with increased odds of late-stage NOS GCA in STX only. Additional studies, preferably using individual-level data, are required to adequately characterize the interplay of SDI with H. pylori infection, which has been linked to lower socioeconomic class, 38 in TX and STX.
The lack of significant county-level behavioral results for late-stage disease, except for smoking, in STX is surprising, considering the high proportion of Latino residents and high prevalence of relevant risk factors. This may be due to the smaller sample size. Underdiagnosis is also a possibility, which must be further pursued.
The stratification of our data by anatomic site (e.g., cardia, non-cardia) is consistent with findings from The Cancer Genome Atlas molecular classification of GCAs. 39 The inclusion of overlapping and NOS GCA lesions as separate categories is justified, given the significant disparities seen across all regions, and to the best of our knowledge is the first analysis of these anatomic sites.
One strength of this study lies in extracting data from both SEER and TCR, enabling us to obtain precise rates and risk estimations in all geographic regions analyzed with the same methodology, making relevant comparisons between populations possible. Another strength is our multilevel approach using county-level GCA risk factors, including smoking, obesity, excessive alcohol consumption, food environment, and social deprivation. This allowed for a richer comparison between groups. One limitation is that both registries consider Latinos as a single group, which prevented us from identifying differences by ethnic subgroups. However, 88% of Texas Latinos are of Mexican origin, so data from Texas and STX may well represent this specific Latino subgroup. 11 Additionally, SEER and NAACCR do not have identical completeness requirements and thus have differences in organization. This may account for histology differences observed. County-level indicators (SDI, CHR variables) may not accurately describe individual-level living conditions and risk factors and warrant careful interpretation as they can lead to the ecological fallacy. While previous research has shown that smaller area estimates are better proxies for individual-level characteristics, 40,41 using county-level indicators is a first step in identifying additional factors not typically included in registry data. Further examination of our findings based on county-level variables is warranted, ideally with individual-level data. Additionally, county-level variables were measured at one time point only, which may not overlap with patients' year of diagnosis. To address this limitation, we conducted sensitivity analyses restricted to cases diagnosed during 2011-2015, the range for which the SDI is available. Our findings regarding smoking prevalence for patients in STX were confirmed in this substantially smaller sample.
Our sample contained approximately 10% of patients with unknown stage, whom we excluded from our main analysis. As part of our sensitivity analysis, we included those with unknown stage as part of the reference group in our late-stage outcome variable. TCR had higher rates of unknown stage than SEER (14.1% vs. 9.5%), and in a previous study the type of the reporting source predicted the unknown stage. 28 We included reporting source in all analyses to account for reporting differences. Our sensitivity analyses including unknown stage in the analytic sample produced similar results.
As the US Latino population continues to grow, the higher GCA incidence in Latinos, particularly of late-stage disease, which is less responsive to therapy, is an increasingly important public health concern. Future studies should include genetic factors, which were beyond the scope of this paper, and interactions between alcohol consumption and social deprivation as contributors to late-stage GCA.