Community predictors of COVID‐19 cases and deaths in Massachusetts: Evaluating changes over time using geospatially refined data

Abstract
Background: The COVID‐19 pandemic has highlighted the need for targeted local interventions given substantial heterogeneity within cities and counties. Publicly available case data are typically aggregated to the city or county level to protect patient privacy, but more granular data are necessary to identify and act upon community‐level risk factors that can change over time.
Methods: Individual COVID‐19 case and mortality data from Massachusetts were geocoded to residential addresses and aggregated into two time periods: “Phase 1” (March–June 2020) and “Phase 2” (September 2020 to February 2021). Institutional cases associated with long‐term care facilities, prisons, or homeless shelters were identified using address data and modeled separately. Census tract sociodemographic and occupational predictors were drawn from the 2015–2019 American Community Survey. We used mixed‐effects negative binomial regression to estimate incidence rate ratios (IRRs), accounting for town‐level spatial autocorrelation.
Results: Case incidence was elevated in census tracts with higher proportions of Black and Latinx residents, with larger associations in Phase 1 than Phase 2. Case incidence associated with proportion of essential workers was similarly elevated in both Phases. Mortality IRRs had differing patterns from case IRRs, decreasing less substantially between Phases for Black and Latinx populations and increasing between Phases for proportion of essential workers. Mortality models excluding institutional cases yielded stronger associations for age, race/ethnicity, and essential worker status.
Conclusions: Geocoded home address data can allow for nuanced analyses of community disease patterns, identification of high‐risk subgroups, and exclusion of institutional cases to comprehensively reflect community risk.


| INTRODUCTION
The COVID-19 pandemic has exacerbated existing racial and ethnic health and socioeconomic disparities in the United States. Notably, Black/African American, Latinx, and Indigenous populations have suffered disproportionate morbidity and mortality, [1][2][3][4][5] as well as financial loss from subsequent economic disruption. 6 These disparities are substantially due to systemic racism and its consequences that affect infectious disease transmission and recovery, including unequal access to medical care, 7,8 suboptimal housing characteristics, 9,10 and employment in essential services with minimal physical distancing. 11 While the literature highlights the significant burden on communities of color in the United States as a result of the pandemic, there are few analyses to date that evaluate these disparities at higher resolution than the county level, and even fewer that disentangle cases originating in institutional congregate settings. Using stratified, higher-resolution data, we may be able to identify important community-level conditions that contribute to the clear and persistent disparities induced by the pandemic.
In particular, it has been widely recognized that older Americans, especially individuals living in nursing homes or assisted-living facilities, have faced significantly elevated risk of COVID-19 morbidity and mortality, especially early in the pandemic. [12][13][14] Similarly, high case burdens have been observed in other institutional residential settings, such as prisons and homeless shelters. 15,16 Models using total case and mortality rates without removing or controlling for these institutional settings may obfuscate trends or risk factors in community transmission. Since the racial and ethnic composition of institutional and non-institutional settings may differ, disparities may be better characterized using higher-resolution data. In addition, characteristics and risk factors associated with disease transmission and severity within institutional settings may not coincide with those driving COVID-19 transmission within non-institutional community settings. 17 Efforts to disaggregate institutional and community outcomes would inform a more comprehensive understanding of health disparities in both institutional and non-institutional settings, in turn directing testing efforts and informing mitigation activities.
Analyses using highly resolved geospatial data provide the tools to identify specific, local factors that contribute to disease outcomes and hone targeted efforts to intervene and support communities.
Such data are particularly useful for local public health departments, for whom local data from their community is more valuable than aggregated larger-scale trends. Local case data coupled with community-level sociodemographic data at the census-tract level can provide public health leaders with actionable and highly relevant local information and support pandemic response. [18][19][20][21] State public health departments have served a critical role during the pandemic in collecting and aggregating individual patient information across municipalities within the state and communicating relevant information back to local leaders. Due to patient privacy regulations, 22 data on residential addresses associated with COVID-19 that would support generating census tract resolution case estimates and distinguishing institutional from non-institutional cases are not publicly available. Cross-sectoral partnerships and data sharing agreements between state public health agencies and academic researchers can support analyses that integrate protected public health data with community-level characteristics. This study reflects one such partnership between the Massachusetts Department of Public Health (MDPH) and academic researchers at Boston University School of Public Health to inform state and local interventions to mitigate COVID-19 risk and associated health disparities.
In this study, we geocoded residential addresses (street, city/town, and zip code) of individual COVID-19 cases confirmed via nucleic acid amplification tests (NAAT) from the first year of the COVID-19 pandemic in Massachusetts (MA), as provided by MDPH. We then analyzed community-level sociodemographic and occupational predictors of outcomes at the census-tract level. The goals of this project were to (1) estimate associations between community-level risk factors and COVID-19 cases and deaths by census tract in the state; (2) evaluate the sensitivity of these associations to the exclusion of institutional cases and deaths, given the relevance of institutional settings to larger-scale disease patterns; and (3) assess changes over time in these associations during the initial two periods of high case burdens in the state.

| Identification of institutional outcomes
Using geocoded addresses, we identified institutional cases and deaths among individuals residing in a long-term care facility (LTCF), prison, or homeless shelter in MA. Locations and capacities of licensed nursing homes, rest homes, and assisted-living facilities (collectively aggregated to total number of LTCF beds) were obtained from MassGIS. 24 Likewise, locations of all state, county, and federal correctional facilities in MA were obtained from MassGIS. 25 Prison data excluded temporary processing or treatment facilities without residential inmates. Locations of homeless shelters in MA were provided by the MDPH Office of Population Health upon request.

| Community-level predictors
We obtained total population counts, as well as select social, occupational, housing, and demographic data at census tract resolution from the most recent five-year (2015-2019) American Community Survey (ACS). 26 We evaluated potential predictors hypothesized to be associated with increased risk of disease transmission, disease severity, and/or health disparities, including: population proportions of those who identify as Black or African American, Hispanic or Latino (Latinx), or American Indian or Alaska Native (AIAN); share of population with ages greater than 80 years and share with ages under 20 years; percent of population enrolled as undergraduate students or employed in essential services; the number of LTCF beds per capita; the percent of households with more than 1.5 persons per room; and housing unit density (number of housing units per square mile). We defined "essential services" following the approach of the American Civil Liberties Union (ACLU) Massachusetts. 27 These variables are informed by and consistent with our previous modeling work at the town level, in which we used backwards model selection to select non-correlated covariates (confirming that none of the independent variables had a correlation of |r| > 0.60). 28
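The correlation screen described above (no pair of covariates with |r| > 0.60) can be illustrated with a minimal Python sketch. This is not the authors' code, which was written in R; the function names, variable names, and tract values below are hypothetical and chosen only to show the check.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_correlated_pairs(covariates, threshold=0.60):
    """Return (name_a, name_b, r) for every covariate pair with |r| > threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(covariates.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical tract-level values (illustration only, not study data).
tract_covariates = {
    "pct_essential_workers": [10.0, 20.0, 30.0, 40.0, 50.0],
    "pct_crowded_households": [1.0, 2.1, 2.9, 4.2, 5.0],
    "pct_age_over_80": [5.0, 3.0, 6.0, 2.0, 4.0],
}

flagged_pairs = flag_correlated_pairs(tract_covariates)
for name_a, name_b, r in flagged_pairs:
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

In this toy input only the first two covariates are strongly correlated, so a screen of this kind would prompt dropping or combining one of them before model fitting.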

| Statistical analysis
We used mixed-effects negative binomial regression models to generate incidence rate ratios (IRRs) and 95% confidence intervals for each predictor in the model. We modeled case and death outcomes separately for each Phase, and we fit case and death models inclusive and exclusive of cases/deaths identified as institutional (yielding eight models in total). A random effect of town (351 towns in MA) was included to address within-town spatial autocorrelation of residuals for nearby tracts. We used counts of cases or deaths at each census tract as the outcome variable, with census tract population used as an offset term to reflect consistent rates. Predictors that affected modeling estimates significantly or that demonstrated changes between the Phases were retained in the models, as were predictors of a priori interest to health disparities or specific COVID-19 risk factors regardless of statistical significance (e.g., housing unit density and proportion of AIAN residents). All statistical analyses were conducted in R (version 4.0.3) 29 using the "glmmTMB" function from the glmmTMB package.
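For readers who prefer a formula, one way to write a model of this kind is sketched below. The notation is ours rather than the authors': $Y_{ij}$ is the count of cases or deaths in tract $i$ of town $j$, $N_{ij}$ is the tract population (the offset), $x_{ijk}$ are the tract-level predictors, and $u_j$ is the town random intercept. The quadratic variance form is an assumption, since the text does not state which negative binomial parameterization was used.

```latex
\begin{align}
Y_{ij} &\sim \text{NegBin}(\mu_{ij}, \theta),
  \qquad \operatorname{Var}(Y_{ij}) = \mu_{ij} + \mu_{ij}^{2}/\theta
  \quad \text{(assumed parameterization)} \\
\log \mu_{ij} &= \log N_{ij} + \beta_0 + \sum_{k} \beta_k x_{ijk} + u_j,
  \qquad u_j \sim \mathcal{N}(0, \sigma_u^{2}) \\
\text{IRR}_k &= \exp(\beta_k),
  \qquad 95\%\ \text{CI} = \exp\!\left(\beta_k \pm 1.96\,\text{SE}(\beta_k)\right)
\end{align}
```

Because $\log N_{ij}$ enters with a fixed coefficient of 1, each $\exp(\beta_k)$ is read as the multiplicative change in the per-capita rate for a one-unit increase in predictor $k$, holding the other predictors and the town effect fixed. The Wald interval shown is one common choice and is not confirmed by the text.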

| Total cases and deaths
Our first models assessed the relationship between the total number of COVID-19 cases and deaths in each census tract in Phase 1 and Phase 2 (four models in total). These models included all cases and deaths, including those in institutional settings.
3.1.1 | Models predicting total case incidence
Models of key predictors of COVID-19 cases by census tract in each of the two phases are presented in Figure 1 (left points, in lighter blue

| Models predicting non-institutional case incidence
Variables associated with non-institutional cases are shown in

| Models predicting non-institutional death incidence
Variables associated with non-institutional deaths are shown in

| Comparisons between total and non-institutional cases and deaths
We observed some similarities and differences in predictors between the models for total cases and deaths and for non-institutional cases and deaths (Table 2). Most notably, the strongly positive associations between LTCF beds per capita and total cases and deaths were sharply reduced to statistical non-significance in the non-institutional models, per our original hypotheses. All other variables had confidence intervals that overlapped between the total and non-institutional models, and estimates were similar across all remaining variables in both phases between total and non-institutional cases.
However, a few substantive differences emerged in the IRR point estimates from models with total versus non-institutional deaths,
Note: Bold values indicate non-overlapping confidence intervals between total and non-institutional outcomes (rows); carets (^) indicate non-overlapping confidence intervals between cases and deaths within the same type of outcome (columns).
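The non-overlap criterion used in the table note can be stated precisely with a short Python sketch. The interval values below are hypothetical placeholders, not results from Table 2, and the function name is ours.

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two closed intervals (low, high) share at least one point."""
    low_a, high_a = ci_a
    low_b, high_b = ci_b
    return low_a <= high_b and low_b <= high_a


# Hypothetical 95% confidence intervals for one predictor's IRR
# (illustration only, not values reported in the study).
total_ci = (1.20, 1.45)
non_institutional_ci = (0.95, 1.10)

if not intervals_overlap(total_ci, non_institutional_ci):
    print("Non-overlapping intervals: would be marked in the table")
```

Non-overlapping 95% intervals are a conservative marker of difference: two estimates can differ significantly even when their intervals overlap slightly.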
these outcomes at high spatial resolution and distinguish institutional outcomes from those in community models of disease over time.
Removing institutional cases from our models, especially in the context of mortality endpoints where institutional facilities contributed an appreciable percentage of total deaths, allowed for a more nuanced understanding of local risk and disease drivers. Additionally, assessing trends over time across both case and mortality outcomes shed light on differential case fatality by subpopulation over time. Overall, our efforts highlight the value of collaboration between state public health departments and academic researchers to access, analyze, and interpret COVID-19 data to maximize its effective use in public health practice.
We observed key disparities in models of both cases and death
The finding that the association between disease incidence and % Black residents was smaller during Phase 2 than Phase 1 may indicate that public health policies and other measures enacted by Fall 2020 among communities with higher proportions of Black residents successfully reduced risk relative to other communities. This could include greater availability of testing compared to availability during the early months of the pandemic. Notably, we did not see as substantial a reduction for communities with higher proportions of Latinx residents, which points toward the need for a closer look at testing as well as structural barriers to implementing risk-reduction methods (such as ability to self-isolate, work from home, or physically distance) across MA communities.
can serve as a guide for understanding differential risk by population subgroups but not to identify specific tracts to target with public health interventions, as would be possible with spatial methods.
Additionally, as mentioned previously, limited availability of testing during Phase 1 resulted in testing and diagnosis of only symptomatic cases early in the pandemic, while testing was widely available in Phase 2, both for symptomatic cases and asymptomatic surveillance.
It is difficult to concretely assess the directionality of these biases, but these trends may indicate that our data from Phase 1 reflect underestimates of true associations. In addition, the variables we included might not be all, or the strongest, predictors of cases or deaths. Another limitation is that patient address information likely contains errors, which cascade into the geocoding process, resulting in misclassification of LTCF residents and tract of residence; however, this is likely non-differential with respect to outcome and, moreover, there is no feasible way to ameliorate this type of error. Our age-related covariates may imprecisely classify risk associated with age. Finally, ACS data were derived pre-pandemic and may not fully reflect conditions during the pandemic, especially with respect to employment and housing.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/irv.12926.

DATA AVAILABILITY STATEMENT
The community-level covariate data from the American Community Survey and other public sources are available from the corresponding author upon reasonable request. The COVID-19 outcome data were made available to the authors by the Massachusetts Department of Public Health under a data-use agreement and cannot be shared by the authors.