Sensitivity of Air Pollution Exposure and Disease Burden to Emission Changes in China Using Machine Learning Emulation

Abstract Machine learning models can emulate chemical transport models, reducing computational costs and enabling more experimentation. We developed emulators to predict annual−mean fine particulate matter (PM2.5) and ozone (O3) concentrations and their associated chronic health impacts from changes in five major emission sectors (residential, industrial, land transport, agriculture, and power generation) in China. The emulators predicted 99.9% of the variance in PM2.5 and O3 concentrations. We used these emulators to estimate how emission reductions can attain air quality targets. In 2015, we estimate that PM2.5 exposure was 47.4 μg m−3 and O3 exposure was 43.8 ppb, associated with 2,189,700 (95% uncertainty interval, 95UI: 1,948,000–2,427,300) premature deaths per year, primarily from PM2.5 exposure (98%). PM2.5 exposure and the associated disease burden were most sensitive to industry and residential emissions. We explore the sensitivity of exposure and health to different combinations of emission reductions. The National Air Quality Target (35 μg m−3) for PM2.5 concentrations can be attained nationally with emission reductions of 72% in industrial, 57% in residential, 36% in land transport, 35% in agricultural, and 33% in power generation emissions. We show that complete removal of emissions from these five sectors does not enable the attainment of the WHO Annual Guideline (5 μg m−3) due to remaining air pollution from other sources. Our work provides the first assessment of how air pollution exposure and disease burden in China varies as emissions change across these five sectors and highlights the value of emulators in air quality research.

these complex CTMs have high computational costs. To reduce costs, some approaches are to reduce the model complexity, reduce the model resolution or precision (Palmer, 2015), reduce the number of experiments, or to use simplified models. To reduce computational demand, a wide range of reduced−complexity or reduced−form air quality models have been developed to simplify these complex processes (Buonocore et al., 2014;Carnevale et al., 2009;Foley et al., 2014;Heo, Adams & Gao, 2016a, 2016bHenze et al., 2007;Seinfeld & Pandis, 2016;Tessum et al., 2017). For example, InMAP (Intervention Model for Air Pollution) is a reduced−form air quality model that decreases computational costs via simplified representations of atmospheric processes (Tessum et al., 2017). In contrast, the emulators are statistical machine learning models that decrease computational costs via mapping specific associations. Emulators learn these specific input−output associations from full CTM simulations. These emulators are often designed using Gaussian process regressors (O'Hagan, 2006;Rasmussen & Williams, 2006), due to their flexibility, accuracy, and skill with smaller data sets. Emulators are computationally expensive to build as their training data requires many CTM runs, though they are substantially cheaper to run once built enabling many more experiments to be explored.
In our previous work, we developed emulators to predict winter (January 2015) ambient fine particulate matter (PM 2.5 ) concentrations from emission changes across China (Conibear, Reddington, Silver, Chen, et al., 2021). Here, we further develop these emulators for annual exposure (2015) to multiple air pollutants (PM 2.5 and O 3 ) and to assess the chronic health impacts. To our knowledge, this is the first study using emulators to predict long− term (annual) air quality and the public health benefits attributed to different emission control strategies in China.

Simulator
Simulations were conducted using WRFChem (Weather Research and Forecasting model online-coupled with Chemistry) version 3.7.1 (Grell et al., 2005;Skamarock et al., 2008). Each simulation was for the whole of 2015 with one-month spin-up over China at 30 km (0.3°) horizontal resolution. There were 50 simulations for the training data and five additional simulations for the test data. The simulations differed only in the scaling of the anthropogenic emissions over China, determined from separate maxi−min Latin hypercube space-filling designs (Tables S1 and S2 in Supporting Information S1). The version of WRFChem used here was described and evaluated in our previous work (Conibear, Reddington, Silver, Chen, et al., 2021;Reddington et al., 2019;Silver, Conibear, et al., 2020).
Sectoral and speciated anthropogenic emissions inside China were from the MEIC (Multi-resolution Emission Inventory for China) emission inventory for 2015 at 0.25° × 0.25° horizontal resolution Li, Zhang, et al., 2017;MEIC Research Group & Tsinghua University, 2019;Zheng et al., 2018 and South China, as well as across South Asia (Figures S1−S4 in Supporting Information S1). A diurnal cycle was applied to the anthropogenic emissions (Qi et al., 2017;Zheng et al., 2017).
Gas phase chemistry was simulated using the extended MOZART (Model for Ozone and Related Chemical Tracers) scheme (Emmons et al., 2010;Hodzic & Jimenez, 2011;Knote et al., 2014). Aerosol physics and chemistry was simulated using the updated MOSAIC (Model for Simulating Aerosol Interactions and Chemistry) scheme with aqueous chemistry (Alma Zaveri et al., 2008). Secondary organic aerosol (SOA) formation was based on an updated volatility basis set mechanism (Knote et al., 2015).
We evaluated the simulator (WRFChem) against PM 2.5 and O 3 measurements from 1,633 sites (Jin et al., 2020;Silver et al., 2018). The normalized mean bias factor (NMBF) and the normalized mean absolute error factor (NMAEF) were used to evaluate the simulator (Yu et al., 2006). For example, a NMBF of −0.5 means that the simulator underestimated observations by 50% on average, and a NMAEF of 0.5 means that the simulator had an absolute gross error of 0.5 times the mean observation. Here, the simulator underestimated annual−mean PM 2.5 concentrations (NMBF = −0.05 and NMAEF = 0.18) and overestimated maximum 6−monthly−mean daily− maximum 8−hour (6mDM8h) O 3 concentrations (NMBF = 0.39 and NMAEF = 0.40) across China ( Figure S5 in Supporting Information S1). To provide the closest match with observations, we tuned the PM 2.5 and O 3 concentrations. Tuning was completed by scaling the model to match observations by prefecture if measurements were available, otherwise scalings were applied by province (administrative division). The tuned model had reduced bias and error for both annual−mean PM 2.5 concentrations (NMBF = 0.02 and NMAEF = 0.10) and 6mDM8h O 3 concentrations (NMBF = 0.03 and NMAEF = 0.11) across China. The scaling allowed us to accurately predict the spatial pattern and magnitude of PM 2.5 and O 3 concentrations across China.

Emulator
We developed emulators to make computationally efficient predictions of the WRFChem model, as described in our previous work (Conibear, Reddington, Silver, Chen, et al., 2021). Here, the emulator approach from our previous work was extended from January 2015 to the year of 2015, for O 3 concentrations, and the air pollution disease burden. The emulator workflow is summarized in Figure S6 in Supporting Information S1.
The emulators predict air quality across China from fractional changes in anthropogenic emissions. The emulator inputs were anthropogenic emissions from the residential (RES), industrial (IND), land transport (TRA), agricultural (AGR), and power generation (ENE) sectors. For the emulator inputs, all species from each sector were scaled between 0%-150%. These emulator inputs were from simulator data of 50 training runs and 5 test runs, based on separate maxi−min Latin hypercube space-filling designs. The training data was designed to cover all of the input distributions. The emulators were trained on the raw simulation data, before the control scaling factors were applied for tuning.
Individual emulators were developed for each output of annual−mean PM 2.5 concentrations and 6mDM8h O 3 concentrations, and each WRFChem grid cell across China (15,278 grid cells in total) to capture the spatial distribution of each pollutant. These outputs were chosen as they are the metrics used in the health impact assessment. This meant we developed a total of 30,556 separate emulators. The emulators are based on annual average values and do not have information of finer time scales.
The optimized emulator designs included an input preprocessor (Yeo & Johnson, 2000) and a Gaussian process regressor with a Matern 5/2 kernel (Conibear, 2021). Gaussian process regressors update a prior function over the inputs to a posterior function including observations (i.e., the training data) (Rasmussen & Williams, 2006). Bayesian inference is then used to sample from this posterior function to produce the Gaussian output. Gaussian process regressors notice trends well when similar inputs have similar outputs, and are flexible and accurate with smaller data sets. Our emulators were based on Gaussian process regressors as they are accurate with only a few training samples. This was a key limitation as each training sample is an annual CTM simulation. Other emulator design options, such as deep neural networks, often require much larger training datasets to avoid overfitting the limited data (Watson−Parris, 2021). Recent developments in machine learning architectures may overcome this limitation to improve emulator accuracy and scope, for example, with deep neural architecture search (Kasim et al., 2022).

of 17
The creation of the simulation data (i.e., 55 annual WRFChem runs) was computationally expensive, using 320 CPUs (Central Processing Units, Intel Xeon Gold 6,138) for approximately 1 year of wall time. The training of the emulators used 150 CPUs for approximately 1 hr. Using the emulators per prediction run on the order of seconds on 1 CPU. Hence, the key bottleneck is simulating the atmosphere using complex numerical models. Reducing the computational burden of this step is an important area of future research.
The emulators were specific to the data they were trained on. The emulators make predictions based on associational knowledge, rather than explanatory knowledge (Deutsch, 2012;Pearl, 2019). The emulators were used to predict air quality concentrations for all emission configurations within a 0%-150% matrix of emission scaling factors at 20% increments (32,768 emission configurations). Figure 1 shows the evaluation of the emulators on the unseen test data. For PM 2.5 concentrations, the emulators have a coefficient of determination (R 2 ) value of 0.9995 and a root mean squared error (RMSE) value of 0.5094 μg m −3 . For O 3 concentrations, the emulators have an R 2 value of 0.9992 and a RMSE value of 0.1667 ppb. This means that the emulators can accurately predict 99.9% of the variance in both PM 2.5 and O 3 concentrations for any similar emission configuration.

Health Impact Assessment
The health impact assessment estimated the disease burden attributable to PM 2.5 and O 3 exposure using population attributable fractions (PAF) of relative risk (RR). Exposure variations were used to predict associated outcome variations.
The exposure to annual−mean PM 2.5 (z) per grid cell was relative to the counterfactual exposure level of 2.4 μg m −3 (cf.) where no excess risk was assumed (Equation 1). The RR for a specific exposure and population age group was estimated through the GEMM (Global Exposure Mortality Model) (Burnett et al., 2018). The RR was a function of the parameters θ, α, μ, and ν (Equation 2) as defined in Supplementary Table 2 of . We used the GEMM for non−accidental mortality (non−communicable disease, NCD, plus lower respiratory infections, LRI), using parameters that included the China cohort, with age−specific modifiers for adults over 25 years of age in five−year intervals. The GEMM functions have mean, lower, and upper uncertainty intervals. The PAF was estimated as a function of the RR and the population count (P, Equation 3).
(1) Figure 1. Evaluation of emulators on the unseen test data for concentrations of (a) fine particulate matter (PM 2.5 , annual− mean) and (b) ozone (O 3 , maximum 6−monthly−mean daily−maximum 8−hour, 6mDM8h). Evaluation metrics used were the coefficient of determination (R 2 ) and the root mean squared error (RMSE).
10.1029/2021GH000570 5 of 17 The health impact assessment for O 3 exposure followed the methodology of the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) for 2017 (GBD, 2017Risk Factor Collaborators, 2018. The exposure to O 3 (z) per grid cell was calculated as the change in 6mDM8h O 3 concentrations relative to the counterfactual exposure level of 35.7 ppb (cf.) where no excess risk was assumed (Equation 4) (Turner et al., 2016). The 6mDM8h metric was calculated by quantifying 24 separate 8−hour rolling mean O 3 concentrations, finding the maximum of these each day, creating 12 separate 6−monthly means to account for seasonal variations, and finding the maximum of these over the year. The PAF was a function of the hazard ratio (HR), which was 1.06 (95UI: 1.02-1.10) for chronic obstructive pulmonary disease (COPD), based on data from five epidemiological cohorts (Equation 5) (GBD, 2017Risk Factor Collaborators, 2018. Premature mortality (MORT), years of life lost (YLL), and years lived with disability (YLD) per exposure, health outcome, age bracket, and grid cell were estimated as a function of the PAF and the corresponding baseline mortality and morbidity rate (I MORT , I YLL , and I YLD ) following Equations 6-8, respectively. Disability−adjusted life years (DALYs) were estimated as the total of YLL and YLD (Equation 9). The rates of MORT, YLL, YLD, and DALYs were calculated per 100,000 people.
The United Nations adjusted population count data set for 2015 at 0.25° × 0.25° resolution was obtained from the Gridded Population of the World, Version 4 (Center for International Earth Science Information Network & NASA Socioeconomic Data and Applications Center, 2016). Population age composition was taken from the GBD2017 for 2015 for adults of 25-80 years of age in 5−year intervals, and for 80 years plus (Global Burden of Disease Study, 2017, 2018). Cause−specific (NCD, LRI, and COPD) baseline mortality and morbidity rates were taken from the GBD2017 for 2015 for MORT, YLL, and YLD for each age bracket (Institute for Health Metrics and Evaluation, 2020).
Shapefiles were used to aggregate results at the country, province, and prefecture level (Hijmans et al., 2020 Uncertainty intervals at the 95% confidence level were estimated using the derived uncertainty intervals from the exposure−outcome associations, baseline mortality and morbidity rates, and population age fractions. Health impact assessments of the disease burden associated with air pollution exposure have many uncertainties (Nethery & Dominici, 2019).

Results and Discussion
In the results and discussion, PM 2.5 concentrations are ambient annual−means and O 3 concentrations are ambient 6mDM8h. Exposures are population−weighted concentrations. Air quality standards for O 3 concentrations in units of μg m −3 are converted to ppb using the conversion factor of one ppb being approximately equal to 2 μg m −3 (Fleming et al., 2018).

Emulated Baseline PM 2.5 and O 3 Exposure and Disease Burden for 2015
Baseline (i.e., for 2015 with all emission sectors at 100%) PM 2.5 exposure in China is 47.4 μg m −3 , with higher exposure over North, East, and South Central China, and lower exposure in the GBA, South West, and North West China (Table 1 and Figure S5 in Supporting Information S1 Note. The disease burden is given by the annual number of premature mortalities (MORT) and the annual rate of disability−adjusted life years (DALYs) per 100,000 people.

Impact of Changes in Individual Emission Sectors on PM 2.5 and O 3 Exposure and Disease Burden
PM 2.5 exposure decreases approximately linearly from emission reductions in a single sector (Figure 2a). Completely removing IND emissions decreases national PM 2.5 exposure by 28% and regional PM 2.5 exposure by 23%-31%. Under this scenario, the National Air Quality Target (35 μg m −3 ) is achieved in all regions except North China (Figures S7−S13 in Supporting Information S1). Completely removing RES emissions decreases national PM 2.5 exposure by 21% and regional PM 2.5 exposure by 8%-28%. Removing IND or RES emissions results in similar reductions in regional PM 2.5 exposure, except in the GBA where reducing IND emissions provides larger reductions in PM 2.5 exposure. The impacts on PM 2.5 exposure from individual emission sector changes are then of decreasing size from TRA, AGR, and ENE emissions.
The sectoral contributions to PM 2.5 concentrations of 28% from industry, 21% from RES, and 4% from TRA emissions are similar to those from a multi−model study, which found contributions of 30% from industry, 26% from RES, and 7% from TRA emissions (Reddington et al., 2019). We find smaller contributions from AGR (3%) and ENE (2%) emissions compared to the multi−model contribution estimates of 16% from AGR and 14% from ENE emissions (Reddington et al., 2019). In our study, there are larger contributions from other sources (42%, compared to 7% in the multi−model estimates) (Reddington et al., 2019), such as from other anthropogenic emissions inside China, anthropogenic emissions outside China, and natural emissions.
The fractional reductions in PM 2.5 disease burden (Figure 2b) are smaller than the fractional reductions in PM 2.5 exposure (Figure 2a), due to the non−linear exposure−outcome association. Completely removing IND emissions decreases the national number of premature mortalities from PM 2.5 exposure by 17%, avoiding 369,100 (95UI: 334,300-404,800) deaths, and decreases the rate of DALYs by 11%. Completely removing RES emissions decreases the national number of premature mortalities from PM 2.5 exposure by 12%, and decreases the rate of DALYs by 8%. Removing either IND or RES emissions can decrease the regional number of premature mortalities by 5%-20% and the rate of DALYs by 6%-19%.
O 3 exposure changes non−linearly with changes in emissions from a single sector (Figure 2d). This non−linear response is stronger for some sectors (e.g., TRA) and in some regions (e.g., North China, Figures S7−S13 in Supporting Information S1

Impact of Changes in Multiple Emission Sectors on PM 2.5 and O 3 Exposure and Disease Burden
Removing RES and IND emissions together decreases national PM 2.5 exposure by 48% ( Figure 3) and regional PM 2.5 exposure by 37%-53% (Figures S14−S20 in Supporting Information S1). These emission reductions attain the WHO Interim Target  Removing emissions from all five sectors together decreases national PM 2.5 exposure by 57% and regional PM 2.5 exposure by 52%-61%. These emission reductions attain the WHO Interim Target 2 (25 μg m −3 ) in all regions except North China, but only attain the WHO Interim Target 3 (15 μg m −3 ) in the GBA. The WHO Annual Guideline (5 μg m −3 ) is not achieved in any region. Under this scenario, PM 2.5 concentrations result from other anthropogenic emissions inside China including shipping, aviation, and agricultural fires, from anthropogenic emissions outside China, and from natural emission sources such as vegetation fires, dust, sea spray, and secondary organic aerosols from biogenic volatile organic compounds (VOC). Previous studies have estimated the contributions to PM 2.5 concentrations in China from dust were 2%-10% and were higher in North West China (Shi et al., 2017;Yang et al., 2011). Open biomass burning was estimated to contribute 1%-8% of PM 2.5 concentrations across China (Reddington et al., 2019), with higher contributions of up to 29% in South Central China (Reddington et al., 2019;Shi et al., 2017). Biogenic SOA has been estimated to contribute 2%-8% of PM 2.5 concentrations (Hu et al., 2017;Shi et al., 2017). Long−range transport of PM 2.5 concentrations from anthropogenic emissions outside China was estimated to contribute up to 3% of PM 2.5 concentrations inside China (Liu et al., 2020). Anthropogenic emissions from shipping were estimated to contribute up to 3% (Chen et al., 2019;Dasadhikari et al., 2019;Reddington et al., 2019) and aviation 1% (Dasadhikari et al., 2019;Zhang et al., 2017). Emissions from sea salt have been estimated to contribute 1% of PM 2.5 concentrations (Shi et al., 2017).
Removing RES and IND emissions together decreases the national number of premature mortalities from PM 2.5 exposure by 32%, avoiding 691,800 (95UI: 625,100-760,400) deaths, and decreases the rate of DALYs by 21% (Figure 4 and S22 in Supporting Information S1). Removing TRA, AGR, and ENE emissions together decreases the national number of premature mortalities from PM 2.5 exposure by 5%. Removing emissions from all five sectors together decreases the national number of premature mortalities from PM 2.5 exposure by 40%, avoiding 858,800 (95UI: 774,900-945,400) deaths, and decreases the rate of DALYs by 27%.
These findings highlight that substantial public health benefits can be achieved by emission reductions, and the majority of these benefits due to reductions in IND and RES emissions. However, even after removing emissions from these five sectors, approximately half of the disease burden associated with PM 2.5 exposure remains, due to other sources of air pollution.
The largest reductions in O 3 exposure occur when removing IND emissions with either TRA or RES emissions ( Figure S22 in Supporting Information S1). Removing IND and TRA emissions decreases national O 3 exposure by 14% and regional O 3 exposure by 8%-22% (Figures S23−S29 in Supporting Information S1), reducing the disease burden by 46% and the rate of DALYs by 41% ( Figures S30 and S31 in Supporting Information S1). Removing IND and RES emissions decreases national O 3 exposure by 10% and regional O 3 exposure by 5%-16% reducing the annual number of premature mortalities by 36%. However, some combinations of reductions in ENE, AGR, and TRA emissions can increase O 3 exposure and the associated disease burden in some regions.
These findings highlight the complex dependencies between the chemical production of O 3 and precursors of O 3 , especially nitrogen oxides (NO X ) and VOC emissions. The ENE and TRA sectors have large NO X emissions but smaller VOC emissions, while the IND sector has large emissions of both NO X and VOC . As some urban areas in East China are VOC−limited (Jin & Holloway, 2015;Wang et al., 2017), reducing NO X emissions from the ENE and TRA sectors can increase O 3 exposure (Li et al., 2021).
The largest public health benefits come from reductions in PM 2.5 exposure, as the PM 2.5 disease burden is two orders of magnitude larger the O 3 disease burden. The largest public health benefits also come from reductions in IND and RES emissions, as these sectors dominate PM 2.5 exposure and are the only sectors which consistently decrease O 3 exposure.

Emission Configurations That Meet Air Quality Targets
There are 11,192 emission configurations that meet the National Air Quality Target (35 μg m −3 ) nationally for PM 2.5 concentrations, requiring mean emission reductions of 72% in IND, 57% in RES, 36% in TRA, 35% in AGR, and 33% in ENE emissions ( Figure 5) Figure S32 in Supporting Information S1). The WHO Air Quality Guideline (5 μg m −3 ) cannot be attained in any region from reductions in these five emission sectors alone.   Figure S32 in Supporting Information S1).

Conclusion
We developed emulators to predict annual mean PM 2.5 and O 3 concentrations and associated public health impacts in China. Our analysis provides a first estimate of how air pollution exposure and associated disease burden in Emulators have broad potential in air quality research. Future work could study other regions, countries, emission sources, and could further split emission inputs by species, sub−sectors, and time−of−day. Here, we chose to apply the same emission changes over all species within each sector due to computational constraints. For five inputs, 55 annual high-resolution WRFChem simulations were required for the training and testing data (Loeppky et al., 2009). If the emulators split the emissions by pollutant, then the computational burden would have increased by up to a factor of 10. Our work was conducted for one meteorological year (2015). Previous work found that the air quality impacts from meteorological changes were smaller than those from emission changes (Ding et al., 2019;Silver, Conibear, et al., 2020;Silver, He et al., 2020). However, future work should account for and explore variations in meteorology, including seasonal and inter−annual variations. To further understand how China can achieve the WHO Annual Guideline for PM 2.5 exposure (5 μg m −3 ), future work is needed exploring the relative contributions of other anthropogenic emission sources, emissions outside China and natural emissions, which may increase under climate change (Carslaw et al., 2010). Our work highlights the challenges facing China as it attempts to further reduce PM 2.5 exposure and improve public health. Research Council (NE/S006680/1). This work was undertaken on ARC4, part of the High−Performance Computing facilities at the University of Leeds, UK. This work used WRFotron version 2.0, a tool to automatise WRFChem runs with re-initialised meteorology . We acknowledge the use of WRFChem preprocessor tools mozbc, fire_emiss, anthro_emiss, and bio_emiss provided by Atmospheric Chemistry Observations and Modeling (ACOM) of the National Center for Atmospheric Research (NCAR). We acknowledge the use of Model for Ozone and Related Chemical Tracers (MOZART) global model output, available at acom.ucar. edu/wrf−chem/mozart.shtml. We acknowledge the use of the emission pre-processor, available at github.com/ douglowe/WRF_UoM_EMIT. We thank Qiang Zhang and Meng Li for providing MEIC data. We acknowledge the Python Software Foundation, Python Language Reference, available at python.org. We are particularly grateful to the Python libraries NumPy (Harris et al., 2020), Pandas (McKinney, 2010), Matplotlib (Hunter, 2007), SciPy , xarray (Hoyer & Hamman, 2017), Cartopy (Met Office, 2015), GeoPandas (Jordahl et al., 2020), Salem (Maussion, TimoRoth, Tbridel, Dusch, & Landmann, 2019), Jupyter (Kluyver et al., 2016), Scikit−learn (Pedregosa et al., 2011), TPOT (Olson et al., 2016), SALib (Herman & Usher, 2017), pyDOE, Seaborn (Waskom et al., 2020), Rasterio (Gillies, 2013), Affine, xESMF (Zhuang et al., 2020), Dask (Rocklin, 2015), and the Pangeo project (Abernathey et al., 2017). The boundaries shown on any maps in this work do not imply any judgement concerning the legal status of any territory or the endorsement or acceptance of such boundaries.