Differential COVID‐19 case positivity in New York City neighborhoods: Socioeconomic factors and mobility

Abstract

Background: New York City (NYC) has been one of the hotspots of the COVID‐19 pandemic in the United States. By the end of April 2020, close to 165 000 cases and 13 000 deaths were reported in the city with considerable variability across the city's ZIP codes.

Objectives: In this study, we examine: (a) the extent to which the variability in ZIP code‐level case positivity can be explained by aggregate markers of socioeconomic status (SES) and daily change in mobility; and (b) the extent to which daily change in mobility independently predicts case positivity.

Methods: COVID‐19 case positivity by ZIP code was modeled using multivariable linear regression with generalized estimating equations to account for within‐ZIP clustering. Daily case positivity was obtained from NYC Department of Health and Mental Hygiene and measures of SES were based on data from the American Community Survey. Changes in human mobility were estimated using anonymized aggregated mobile phone location systems.

Results: Our analysis indicates that the socioeconomic markers considered together explained 56% of the variability in case positivity through April 1 and their explanatory power decreased to 18% by April 30. Changes in mobility during this time period are not likely to be acting as a mediator of the relationship between ZIP‐level SES and case positivity. During the middle of April, increases in mobility were independently associated with decreased case positivity.

Conclusions: Together, these findings present evidence that heterogeneity in COVID‐19 case positivity during NYC’s spring outbreak was largely driven by residents’ SES.

across the city's neighborhoods. 3 There is also considerable variability among different ZIP codes within the boroughs. 4 Potentially opposing mechanisms may help understand differentials in the case positivity proportion between wealthier and less wealthy ZIP codes: access to the COVID-19 diagnostic tests themselves and the underlying true (but imperfectly measured) COVID-19 prevalence by ZIP code. First, individuals living in wealthier ZIP codes may have found it easier to circumvent the restrictive initial testing guidelines on eligibility for a COVID-19 diagnostic test, resulting in a lower proportion receiving the test actually being COVID-19 positive.
Conversely, individuals living in less wealthy ZIP codes may have been less able to receive tests unless clinically sick, because a lower proportion have a primary care physician and are therefore reliant on emergency care for clinical consultation. This suggests that individuals living in poorer neighborhoods who eventually receive tests are more likely to be COVID-19 positive. Second, evidence strongly suggests that the actual prevalence of COVID-19 is substantially higher among Black individuals and those of lower socioeconomic status (SES). 5,6 There are many potential explanations for these disparities. Individuals living in wealthier neighborhoods may have greater ability to reduce personal exposure by abiding by social distancing guidelines per the New York State on PAUSE directive, 7 may be more able to transition to work-at-home, may be more able to limit visits to stores by having essentials delivered, may be less reliant on public transportation, and may have the ability to shelter outside of NYC.
This analysis utilizes routinely reported public data from the NYC Department of Health and Mental Hygiene (NYC DOHMH), coupled with anonymized cell phone data assessing frequency of visits to businesses and US Census data, to examine (1) the extent to which heterogeneity in the COVID-19 daily case positivity proportion by ZIP code can be explained with aggregate markers of SES along with markers of daily change in mobility; and (2) the extent to which daily change in mobility independently predicts COVID-19 daily case positivity proportions.

| METHODS
In this ecologic study, our study population consists of aggregate data collected among residents in 177 ZIP codes of NYC covering all five boroughs. Our primary outcome of interest is the proportion of COVID-19 tests found positive in each ZIP code. These data were extracted from a versioned public repository updated daily by the NYC DOHMH 8 with the first release dating back to April 1, 2020. The positivity on a given day is calculated as the fraction of new tests during the last 3 days that were found to be positive.
The use of a moving window rather than raw counts on a single day smooths out fluctuations due to reporting constraints, especially around weekends when fewer tests are sought, administered and/or reported than on weekdays.
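The windowed positivity calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example counts are hypothetical, and it assumes (as the study does) that each increment in the cumulative counts is a new case or test rather than a revision of earlier days.

```python
def daily_increments(cumulative):
    """Turn a list of cumulative counts into daily new counts.

    Assumes every increment is a new case/test, not a revision of an
    earlier day. The first day has no predecessor, so it yields no value.
    """
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def windowed_positivity(cum_positive, cum_tests, window=3):
    """Positivity per day: new positives / new tests over the last `window` days.

    Returns one value per day, starting with the first day that has a full
    window. None marks a window in which no new tests were reported.
    """
    new_pos = daily_increments(cum_positive)
    new_tests = daily_increments(cum_tests)
    out = []
    for end in range(window, len(new_tests) + 1):
        pos = sum(new_pos[end - window:end])
        tests = sum(new_tests[end - window:end])
        out.append(pos / tests if tests > 0 else None)
    return out


# Hypothetical cumulative counts for one ZIP code on five consecutive days.
cum_positive = [100, 110, 125, 130, 150]
cum_tests = [200, 230, 260, 270, 320]
positivity = windowed_positivity(cum_positive, cum_tests)
```

With five days of cumulative counts there are four daily increments and therefore two complete 3-day windows, so `positivity` holds two values.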
ZIP code-level characteristics used as explanatory variables in this study were extracted from the US Census and the American Community Survey 2016 (ACS; codes in parentheses) and include:
• Proportion of the 18- to 64-year-old population that is uninsured (B27010)
• Median household income (in 2016 dollars, B19013)
• Proportion of population that self-identified their race as white (B02001)
• Proportion of population living in households with more than three inhabitants (B11016)
• Proportion of population using public transportation to commute to work that includes bus travel (B08301)
• Proportion of population that is elderly (65+ years of age) (B01001)

Anonymized location data from cell phone visits to businesses within a ZIP code were obtained via SafeGraph. 9,10 Across NYC, SafeGraph provides the location (to the resolution of US Census block group) of each of approximately 75 000 points of interest (POIs) and the number of daily visits to each POI as tracked by mobile phones. POIs are defined by SafeGraph as "a specific physical location which someone might find interesting" and include businesses, workplaces, educational institutions, and transit centers. These data were chosen in part to provide a comprehensive assessment of visitation patterns in a given ZIP code and for comparability with mobility estimates released by CDC. 11 For this analysis, we aggregated the total number of visits to all POIs in a ZIP code on a given day and divided by the number of POIs in that ZIP code to estimate total visits per business per day ($v_z^d$, visits per business in ZIP code $z$ during day $d$). As a measure of baseline or reference mobility ($\hat{v}_z$), we calculated median daily visits per business during the pre-pandemic period, defined as a 6-month period from September 2019 to February 2020.
Our exposure of interest, referred to henceforth as mobility, is the proportional change from pre-pandemic mobility for each ZIP code, operationalized as the proportion of baseline mobility experienced in a given day of post-pandemic response, $v_z^d / \hat{v}_z$. To account for the estimated incubation period of COVID-19, 12 we lagged the mobility variable by 7 days. Our second goal was to assess whether changes in mobility were independently related to neighborhood percent positivity.
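A minimal sketch of the mobility measure described above is given below. The function names and visit counts are hypothetical and are not drawn from SafeGraph data. It assumes that "proportion of baseline mobility" means daily visits per POI divided by the pre-pandemic median, and that the 7-day lag pairs each outcome day with the mobility value from 7 days earlier.

```python
from statistics import median


def visits_per_poi(total_visits, n_pois):
    """Total visits to all POIs in a ZIP code on one day, per POI."""
    return total_visits / n_pois


def relative_mobility(daily_visits_per_poi, baseline_visits_per_poi):
    """Each day's visits per POI as a proportion of the baseline median."""
    baseline = median(baseline_visits_per_poi)
    return [v / baseline for v in daily_visits_per_poi]


def lagged_value(series, day_index, lag=7):
    """Value observed `lag` days before `day_index`; None if unavailable."""
    j = day_index - lag
    return series[j] if j >= 0 else None


# Hypothetical ZIP code with 100 POIs.
n_pois = 100
baseline_totals = [1000, 1200, 800, 1000, 1100]  # pre-pandemic days
pandemic_totals = [500, 400, 300, 300, 250, 200, 200, 250, 300]

baseline = [visits_per_poi(t, n_pois) for t in baseline_totals]
daily = [visits_per_poi(t, n_pois) for t in pandemic_totals]
mobility = relative_mobility(daily, baseline)

# Exposure paired with the outcome on day index 8 (mobility 7 days earlier).
exposure_day8 = lagged_value(mobility, 8)
```

A value of 1.0 would indicate mobility equal to the pre-pandemic median, and values below 1.0 indicate a reduction. Changing `lag` to 5 or 10 corresponds to the sensitivity check on lag time reported in the Discussion.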

| Statistical analyses
Specifically, we hypothesized that one mechanism through which neighborhood SES may influence neighborhood percent positivity is through reduced ability to decrease mobility during the pandemic.
To assess this, we first fit a "total effect" model with proportion uninsured as the explanatory variable of interest, adjusting for the other SES variables as sources of confounding. Next, we added change in mobility to this model and (a) compared the magnitude of the regression parameter estimate for uninsured proportion between the two models; and (b) assessed the magnitude and precision of the regression parameter estimate for change in mobility, adjusting for the SES variables. All models used linear regression with generalized estimating equations to account for within-ZIP correlation.
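The two-model comparison can be illustrated with the sketch below. It swaps in ordinary least squares for the generalized estimating equations used in the study: with an identity link and an independence working correlation the GEE point estimates coincide with OLS, but the working correlation used in the study is not stated here, and the robust standard errors needed to judge precision are not computed. All data, variable names, and coefficients are hypothetical, and the outcome is generated without noise so that the adjusted fit returns the generating coefficients up to rounding.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    X is a list of rows, each starting with 1.0 for the intercept.
    Solved by Gauss-Jordan elimination with partial pivoting.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical ZIP-level data (six ZIP codes, one time point).
uninsured = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
crowding = [0.20, 0.35, 0.25, 0.40, 0.30, 0.45]   # stand-in for other SES terms
mobility = [0.30, 0.35, 0.45, 0.40, 0.55, 0.50]   # proportion of baseline
positivity = [0.1 + 0.8 * u + 0.3 * c - 0.2 * m
              for u, c, m in zip(uninsured, crowding, mobility)]

# "Total effect" model: uninsured + other SES, without mobility.
X_total = [[1.0, u, c] for u, c in zip(uninsured, crowding)]
# Adjusted model: the same terms plus lagged change in mobility.
X_adj = [[1.0, u, c, m] for u, c, m in zip(uninsured, crowding, mobility)]

total = ols(X_total, positivity)
adjusted = ols(X_adj, positivity)

# (a) change in the uninsured coefficient once mobility is added
change_in_uninsured = adjusted[1] - total[1]
# (b) coefficient on mobility, adjusting for the SES terms
mobility_coef = adjusted[3]
```

In this constructed example the uninsured coefficient shifts when mobility is added, because mobility was made to correlate with proportion uninsured. A negligible shift would instead be consistent with mobility not acting as a mediator.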
We report the results of the analyses at four time points during the study period, April 1, 10, 20, and 30, using data available through the given day. For the April 1 time point, as daily data are not publicly available before this date, the outcome was calculated from cumulative cases and tests for the month of March.

9.7% (6.5%-15%) of those who relied on public transportation to commute to work used buses for part of their commute, and 11.8% (9.8%-14.4%) of the population was found to be elderly. Figure 1 shows the distribution of these neighborhood SES characteristics by ZIP code.

| DISCUSSION
Our findings indicate that the heterogeneous distribution in neighborhood-level case positivity during the first 2 months (of the presumably first wave) of the COVID-19 pandemic in NYC largely followed underlying SES markers. We also found that spatial differences in change in mobility were independently related to case positivity during the middle of April (with a smaller reduction in mobility independently associated with reduced positivity rates), but not at the beginning or the end of the month. A likely explanation for this is increased COVID testing during April. At the beginning of April, routine testing for COVID was not widely available in NYC and likely differed by ZIP code in its availability. At the same time, change in mobility was the most dramatic during the early parts of April.
However, as testing became more widely performed, case positivity decreased overall, leading to an overall inverse association between change in mobility and reduction in COVID positivity. Alternatively, this could be due to differences in how quickly ZIP codes adapted to

Similarly, the importance of household density in these models underscores what is known from studies on transmissibility of other respiratory infections, including influenza: most infections occur among members of a household (or in workplaces) due to prolonged exposure to infected individuals. Dense housing also limits the ability of those known to be infected to self-isolate; providing infected individuals the option to isolate at a location different from home could prove beneficial in these cases.

FIGURE 1 Maps of the six explanatory variables used in this study as measures of SES characteristics of ZIP codes
Our analysis showed that mobility reduced quite rapidly across all ZIP codes, beginning before mandated restrictions from NYS on PAUSE. This strongly suggests that many city residents dramatically curtailed their activities. The rate and magnitude of reduction varied by neighborhood, but higher reductions in mobility were actually in-

Our study has a number of limitations. First, this is an ecologic analysis and thus inference is limited to the ZIP code level, not the individual level. ZIP code is not a perfect measure of neighborhood and can mask some important heterogeneity in both exposure and outcome measures in this study. Second, we are unable to adjust for differential changes in population density by ZIP code, as individuals with the means to leave NYC during this time period were likely to be disproportionately those of higher SES. 16 As such, our measure of mobility cannot distinguish a ZIP code with fewer visitors who individually have not reduced their visitation frequency from a ZIP code with a constant volume of visitors who on average reduced their visit frequency.
Third, limitations in the availability of COVID-19 positivity by ZIP code at the beginning of the pandemic limit our ability to fully understand the relationship between ZIP code-level mobility, SES, and positivity. We observed that the majority of the reduction in mobility occurred before public positivity data were available. Availability of ZIP-level COVID-19 positivity in March 2020 would greatly strengthen our understanding.
Finally, COVID-19 positivity is an imprecise outcome measure as it is heavily influenced both by the overall COVID-19 prevalence in a given ZIP code and by access to diagnostic tests. Furthermore, the daily case and test counts were calculated as the difference between two successive cumulative case and test counts, and hence, it is assumed that the increments in cumulative counts are new cases/tests and not revisions to counts from previous days.
Our study also has several important strengths. First, it uses a novel approach to measuring mobility that does not focus on distance travelled or average distance from presumed home location of a cell phone, but instead focuses on physical check-ins at POIs within an individual's presumed neighborhood. As such, this metric is a conceptually different measure of "mobility" than other tracking measures, and one now being used by the CDC. 11 Second, our simple model effectively explains a high amount of the total variability in COVID-19 positivity: on April 1 a model including only three variables (proportion living in a household with 4 or more individuals, proportion of 18- to 64-year-olds without health insurance, proportion identifying as white) explained 56% of the total variability in COVID-19 positivity. While the explanatory power of the model decreased over the time span of the first wave of this pandemic, it was still able to explain substantial variability. Third, our model was robust to changes in the assumed lag time between exposure and outcome: models changing the lag from 7 days to 5 and 10 days yielded no substantial differences. Finally, the finding that change in mobility was independently associated with case positivity during the middle section of the epidemic response, but not at the beginning or the end of the time period of interest, suggests that more structural issues dominate the heterogeneous impact of COVID-19 in NYC. Further analyses as NYC begins reopening will be useful to assess the degree to which returns to pre-COVID mobility impact the distribution of future waves of this pandemic.

| CONCLUSIONS
Dramatic reductions in mobility following ramp-up of governmental interventions strongly suggest that NYS on PAUSE worked as intended, although more recent data indicating increases in mobility suggest that there may be a point at which individuals begin relaxing their adherence to these interventions. Evidence from the first wave at the epicenter of the US COVID-19 pandemic strongly suggests that COVID-19 incidence is unequally distributed, largely by SES. Intervention efforts should target communities most in need.

ACKNOWLEDGEMENTS
This work was supported by funding from the National Institutes of Health (GM110748 to JLS and SK) and the National Science Foundation (DMS-2027369 to JLS and SK), as well as a gift from the Morris-Singer Foundation (JLS and SK). The funders had no role in the design, data collection and analysis, decision to publish, or preparation of the manuscript.
We thank SafeGraph for sharing mobility information. SafeGraph is a data company that aggregates anonymized location data from numerous applications in order to provide insights about physical places. To enhance privacy, it excludes census block group information if fewer than five devices visited an establishment in a month from a given census block group.