Data‐Driven Placement of PM2.5 Air Quality Sensors in the United States: An Approach to Target Urban Environmental Injustice

Abstract In the United States, citizens and policymakers rely heavily upon Environmental Protection Agency mandated regulatory networks to monitor air pollution; increasingly, they also depend on low‐cost sensor networks to fill spatial gaps in regulatory monitoring network coverage. Although these regulatory and low‐cost networks in tandem provide enhanced spatiotemporal coverage in urban areas, low‐cost sensors are often located in higher income, predominantly White areas. Such disparity in coverage may exacerbate existing inequalities and impact the ability of different communities to respond to the threat of air pollution. Here we present a study using cost‐constrained multiresolution dynamic mode decomposition (mrDMDcc) to identify the optimal and equitable placement of fine particulate matter (PM2.5) sensors in four U.S. cities with histories of racial or income segregation: St. Louis, Houston, Boston, and Buffalo. This novel approach incorporates the variation of PM2.5 on timescales ranging from 1 day to over a decade to capture air pollution variability. We also introduce a cost function into the sensor placement optimization that represents the balance between our objectives of capturing PM2.5 extremes and increasing pollution monitoring in low‐income and nonwhite areas. We find that the mrDMDcc algorithm places a greater number of sensors in historically low‐income and nonwhite neighborhoods with known environmental pollution problems compared to networks using PM2.5 information alone. Our work provides a roadmap for the creation of equitable sensor networks in U.S. cities and offers a guide for democratizing air pollution data through increasing spatial coverage of low‐cost sensors in less privileged communities.


Figure S3. Effect of higher temporal resolution modes on sensor placement distributions in St. Louis. The specified decomposition level and time window correspond to the modal information depicted in Figure S5. When fewer decomposition levels are incorporated into the mrDMD optimization, fewer sensors are required to capture pollution episodes, as there are generally fewer long-lived PM2.5 modes. Conversely, including more decomposition levels introduces additional pollution episodes, necessitating a greater number of sensors to capture the complete spatiotemporal dynamics of PM2.5. Incorporating weekly timescales (Lev = 10) would inundate the proposed sensor network. Instead, we select the 250 most variable sensors for our analysis. This choice enables the inclusion of weekly temporal dynamics while still maintaining a reasonable number of sensors for a major urban area in the United States.

Figure S1. Maps of normalized annual median income for St. Louis, Houston, Buffalo, and Boston metropolitan areas. The median annual household income is obtained from the 2020 American Community Survey and interpolated onto the centroids of the Di et al. (2021) PM2.5 dataset. The income landscape is normalized to the maximum median income across grid cells of an urban area ($250,000 for St. Louis, Houston, and Boston; $150,000 for Buffalo), then inverted such that 1.0 represents the lowest normalized median income and 0.0 represents the highest.
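The normalize-then-invert step described in the Figure S1 caption can be sketched in a few lines of Python. This is a minimal illustration only: the function name `inverted_income_index`, the optional `cap` argument, and the three example incomes are hypothetical and are not taken from the study's code or data.

```python
import numpy as np

def inverted_income_index(median_income, cap=None):
    """Scale incomes by the maximum (or a supplied cap), then invert.

    Values near 1.0 mark the lowest incomes; 0.0 marks the highest.
    """
    income = np.asarray(median_income, dtype=float)
    # Use the largest grid-cell income unless a cap is given explicitly.
    top = np.nanmax(income) if cap is None else float(cap)
    normalized = np.clip(income / top, 0.0, 1.0)
    return 1.0 - normalized

# Hypothetical grid-cell incomes in USD, for illustration only.
index = inverted_income_index([25_000, 125_000, 250_000])
print(index)
```

Under this assumed form, a grid cell at the maximum income maps to 0.0, and the index approaches 1.0 as income approaches zero.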

Figure S2. Effect of γ on sensor placement in St. Louis. The number of sensors is fixed at 250 for each value of γ. The γ value represents the forcing term that prioritizes the cost function (here being nonwhite grid cells) more than information from the PM2.5 air pollution modes. All sensor locations are gridded onto the same 1 km × 1 km Di et al. (2021) grid.
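To illustrate how a weight such as γ can shift placements from mode information toward a demographic map, the following Python sketch runs a greedy, pivoted-QR-style selection with an additive priority term. It is not the mrDMDcc implementation: the caption does not give the functional form of the cost term, so the additive score, the function name `weighted_greedy_placement`, and the toy inputs are all assumptions made for illustration.

```python
import numpy as np

def weighted_greedy_placement(modes, priority, gamma, n_sensors):
    """Pick sensor locations greedily from spatial modes plus a priority map.

    modes     : (n_locations, n_modes) array of spatial modes (hypothetical input)
    priority  : (n_locations,) weights, e.g. proportion nonwhite per grid cell
    gamma     : weight on the priority term; 0 uses mode information only
    n_sensors : number of locations to select (at most n_locations)
    """
    residual = np.array(modes, dtype=float).T  # columns are candidate locations
    priority = np.asarray(priority, dtype=float)
    available = np.ones(residual.shape[1], dtype=bool)
    chosen = []
    for _ in range(n_sensors):
        # Assumed score: remaining modal energy plus gamma times the priority.
        score = np.linalg.norm(residual, axis=0) + gamma * priority
        score[~available] = -np.inf  # never pick a location twice
        pick = int(np.argmax(score))
        chosen.append(pick)
        available[pick] = False
        pivot = residual[:, pick]
        norm = np.linalg.norm(pivot)
        if norm > 1e-12:
            # Remove the chosen direction so later picks add new information.
            q = pivot / norm
            residual -= np.outer(q, q @ residual)
    return chosen

# Toy example: four locations, two modes; only location 3 has high priority.
toy_modes = np.array([[3.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.1, 0.0]])
toy_priority = np.array([0.0, 0.0, 0.0, 1.0])
print(weighted_greedy_placement(toy_modes, toy_priority, gamma=0.0, n_sensors=2))
print(weighted_greedy_placement(toy_modes, toy_priority, gamma=10.0, n_sensors=2))
```

In this toy case, γ = 0 selects the two locations with the strongest independent modal signal, while a large γ pulls the first sensor to the high-priority location even though its modal signal is weak.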

Figure S4. Varying the number of sensors in St. Louis. Distribution of sensor locations identified as optimal by the mrDMD algorithm. All sensor locations are gridded onto the same 1 km × 1 km Di et al. (2021) grid. Each panel shows the stated number of top-ranked sensors for a mrDMD decomposition using weekly time windows. Generally, networks of fewer than 100 sensors fail to capture many socioeconomic communities, while networks of more than 300 sensors begin to show overlapping spatial clusters, which may be redundant.

Figure S5. Maps of St. Louis mrDMD PM2.5 modes. The mrDMD time window is for September 2005 to December 2016. The left axis expresses the decomposition level and its related time frequency, such that the bottom row corresponds to the average background mode of PM2.5 over 4096 d (∼136.5 months), with each successively higher row corresponding to pollution episodes lasting half the number of days of the row below. Colored boxes indicate a mrDMD mode that exhibits significant variability above the background mode; otherwise, the boxes are left blank. At each level of decomposition, the separation of the pollution episode from the background mode is deemed significant if its eigenvalues exceed a specified tolerance. Here, we set this tolerance to a standard value of 1 × 10⁻². Darker shades of the colored boxes represent a relatively larger modal signal than the background mode. Modal maps of the background averages and examples of significant pollution episodes are shown in the margins. Arrows point to the time periods of the corresponding mrDMD modal maps. All mrDMD modes are on the same amplitude scale shown to the right.
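The halving of window length from one decomposition level to the next can be worked through with a short Python sketch. It assumes that level 1 is the 4096 d background row, a numbering that is inferred rather than stated in the caption; the function name `window_days` is hypothetical.

```python
BACKGROUND_WINDOW_DAYS = 4096  # background window length given in the caption

def window_days(level, base_days=BACKGROUND_WINDOW_DAYS):
    """Window length in days, assuming level 1 is the background row
    and each higher level halves the window of the level below."""
    if level < 1:
        raise ValueError("level must be 1 or greater")
    return base_days / 2 ** (level - 1)

for level in range(1, 14):
    print(f"level {level:2d}: {window_days(level):g} d")
```

Under this assumed numbering, level 10 spans 8 d, which is consistent with the weekly timescale attributed to Lev = 10 in the Figure S3 caption, and level 13 spans 1 d, the shortest timescale named in the abstract.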

Figure S6. Maps of Houston mrDMD PM2.5 modes. The mrDMD time window is for September 2005 to December 2016. The left axis expresses the decomposition level and its related time frequency, such that the bottom row corresponds to the average background mode of PM2.5 over 4096 d (∼136.5 months), with each successively higher row corresponding to pollution episodes lasting half the number of days of the row below. Colored boxes indicate a mrDMD mode that exhibits significant variability above the background mode; otherwise, the boxes are left blank. At each level of decomposition, the separation of the pollution episode from the background mode is deemed significant if its eigenvalues exceed a specified tolerance. Here, we set this tolerance to a standard value of 1 × 10⁻². Darker shades of the colored boxes represent a relatively larger modal signal than the background mode. Modal maps of the background averages and examples of significant pollution episodes are shown in the margins. Arrows point to the time periods of the corresponding mrDMD modal maps. All mrDMD modes are on the same amplitude scale shown to the right.

Figure S7. PM2.5 sensor locations for Buffalo showing 250 sensors. The figure shows the distribution of sensor locations identified as optimal by the mrDMD algorithm, and those identified as optimal and equitable by the mrDMDcc using race and income metrics. All sensor locations are gridded onto the same 1 km × 1 km Di et al. (2021) grid. White areas of the sensor location maps represent the built environment, while the shades of green represent the natural vegetation colors of the area.

Figure S8. Maps of Buffalo mrDMD PM2.5 modes. The mrDMD time window is for September 2005 to December 2016. The left axis expresses the decomposition level and its related time frequency, such that the bottom row corresponds to the average background mode of PM2.5 over 4096 d (∼136.5 months), with each successively higher row corresponding to pollution episodes lasting half the number of days of the row below. Colored boxes indicate a mrDMD mode that exhibits significant variability above the background mode; otherwise, the boxes are left blank. At each level of decomposition, the separation of the pollution episode from the background mode is deemed significant if its eigenvalues exceed a specified tolerance. Here, we set this tolerance to a standard value of 1 × 10⁻². Darker shades of the colored boxes represent a relatively larger modal signal than the background mode. Modal maps of the background averages and examples of significant pollution episodes are shown in the margins. Arrows point to the time periods of the corresponding mrDMD modal maps. All mrDMD modes are on the same amplitude scale shown to the right.

Figure S9. Maps of Boston mrDMD PM2.5 modes. The mrDMD time window is for September 2005 to December 2016. The left axis expresses the decomposition level and its related time frequency, such that the bottom row corresponds to the average background mode of PM2.5 over 4096 d (∼136.5 months), with each successively higher row corresponding to pollution episodes lasting half the number of days of the row below. Colored boxes indicate a mrDMD mode that exhibits significant variability above the background mode; otherwise, the boxes are left blank. At each level of decomposition, the separation of the pollution episode from the background mode is deemed significant if its eigenvalues exceed a specified tolerance. Here, we set this tolerance to a standard value of 1 × 10⁻². Darker shades of the colored boxes represent a relatively larger modal signal than the background mode. Modal maps of the background averages and examples of significant pollution episodes are shown in the margins. Arrows point to the time periods of the corresponding mrDMD modal maps. All mrDMD modes are on the same amplitude scale shown to the right.

Figure S10. Cumulative frequency distributions for proportion of nonwhite locations and median annual income for the three different sensor network optimizations for Houston. Each point represents one sensor location out of the 250 designed for Houston. An additional set of points in each plot represents the distribution across racial composition (light blue) and income (orange) for a high-density, uniformly distributed sensor network across all 1 km² grid cells within the city bounds. The y-axis for median annual income has been reversed to make this panel consistent with the other panels, with the neighborhoods of greatest interest plotted at the high end of the distributions.
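A cumulative frequency distribution of the kind plotted in these figures can be computed from per-sensor values as sketched below in Python. The function name `cumulative_frequency` and the sample values are hypothetical; the sketch shows only the general calculation, not the study's plotting code.

```python
import numpy as np

def cumulative_frequency(values):
    """Return sorted values and the fraction of points at or below each one."""
    x = np.sort(np.asarray(values, dtype=float))
    freq = np.arange(1, x.size + 1) / x.size
    return x, freq

# Hypothetical proportion-nonwhite values at four sensor locations.
x, freq = cumulative_frequency([0.8, 0.2, 0.5, 0.5])
print(x)
print(freq)
```

Comparing such curves for each optimized network against the curve for a uniform network over all grid cells shows whether a design over- or under-samples a given part of the demographic distribution.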

Figure S11. Cumulative frequency distributions for proportion of nonwhite locations and median annual income for the three different sensor network optimizations for Buffalo. Each point represents one sensor location out of the 150 designed for Buffalo. An additional set of points in each plot represents the distribution across racial composition (light blue) and income (orange) for a high-density, uniformly distributed sensor network across all 1 km² grid cells within the city bounds. The y-axis for median annual income has been reversed to make this panel consistent with the other panels, with the neighborhoods of greatest interest plotted at the high end of the distributions.

Figure S12. Cumulative frequency distributions for proportion of nonwhite locations and median annual income for the three different sensor network optimizations for Boston. Each point represents one sensor location out of the 250 designed for Boston. An additional set of points in each plot represents the distribution across racial composition (light blue) and income (orange) for a high-density, uniformly distributed sensor network across all 1 km² grid cells within the city bounds. The y-axis for median annual income has been reversed to make this panel consistent with the other panels, with the neighborhoods of greatest interest plotted at the high end of the distributions.

Figure S13. Scatterplot of annual average income ($USD) vs. proportion nonwhite for all sensor locations designed with the three sensor network optimizations for St. Louis, Houston, Buffalo, and Boston.