Air pollution estimation under air stagnation—A case study of Beijing

Air pollution continues to be a major environmental concern in China. The wind‐driven transmission poses difficulties in understanding the air pollution patterns at the local level. The main objective of this study is to offer a straightforward approach for investigating the temporal trends and meteorological effects on the air pollutant concentrations during the generation process without being confounded by the complex wind‐driven transmission effect. We focus on the hourly data of the three most common air pollutants: PM2.5, NO 2$$ {}_2 $$ , and CO under air stagnation in Beijing, China, during 2014–2017. We find that the local pollution levels under air stagnation in Beijing have decreased over the years; winter is the severest month of the year; Sunday is the clearest day of the week. Our model also interpolates the air pollutant concentrations at sites without monitoring stations and provides a map of air pollution concentrations under air stagnation. The results could be used to identify locations where air pollutants easily accumulate.

the lungs and penetrate deep into the circulatory system (Xing et al., 2016). The long-term exposure to PM2.5 contributed to an estimated 1.6-2.2 million premature deaths annually during 2013-2015 in China . Besides, nitrogen dioxide (NO 2 ) and carbon monoxide (CO) are also of great concern, and they are pervasive and primarily human-caused through fossil fuel consumption.
Estimating and predicting air pollution levels is challenging due to complex atmospheric processes, various emission sources, data availability and quality, rapid urbanization, and industrialization (U.S. Environmental Protection Agency, 2021). Despite these challenges, advancements in monitoring technologies and modeling techniques continue to improve our understanding of air pollution dynamics (Lelieveld et al., 2015). The recently developed modeling approaches include non-linear prediction with neural networks (Wang & Cao, 2023), machine learning method for air pollution prediction (Krylova & Okhrin, 2022), Bayesian hierarchical models (Szpiro et al., 2010), geographically weighted regression (Hu et al., 2013), non-parametric regression (Liang et al., 2015), the kriging models (Sampson et al., 2013), and the spatio-temporal models (Fioravanti et al., 2022;Wan et al., 2021). Covariates commonly used in the past air pollution studies include meteorological variables (Wan et al., 2021;Zheng et al., 2021), land-use related (Henderson et al., 2007;Hoek et al., 2008) and road-traffic related variables (Henderson et al., 2007). More recently, satellite data became popular information to model and estimate air pollution concentrations (Ma et al., 2016;Procházka et al., 2000). Procházka et al. (2000) provided general methods to process satellite images for detecting and predicting air pollution. Ma et al. (2016) utilized remote sensing satellite retrieved aerosol optical depth data to predict air pollution concentrations. This paper includes meteorological variables and nighttime light intensity from satellite images as covariates to help estimate air pollution concentrations.
However, wind effects inevitably dominate air pollution levels due to their strong transmission effects, which vary across land cover types. Many analysis results show that a higher wind speed can help disperse and dilute air pollutants and decrease their concentration (Li et al., 2014). However, wind under different meteorological conditions could have different effects. For instance, a strong wind can blow surface pollutants into the air and thus increase air pollution levels (Kukkonen et al., 2005;Vardoulakis & Kassomenos, 2008). For Beijing, China, in particular, a study has shown that the south wind brings pollutants while the north wind guarantees clean air (Liang et al., 2015). This paper aims to study the air pollutants' levels under air stagnation to eliminate the noises caused by the wind effects and better estimate the temporal and meteorological effects. Our model considers the spatial heterogeneity and dependence at the grid level and the temporal pattern at different scales such as year, month, weekday, and hour. We also investigate the relationship between air pollutants and meteorological factors, including temperature, boundary layer height, pressure, dew point, evaporation, rain, and nighttime light intensity from satellite images. On the northern edge of the severest air-polluted region in China, we take Beijing as an illustrative example with the hourly PM2.5, NO 2 , and CO data from 36 monitoring stations between February 1, 2014, and May 31, 2017.
The rest of the paper is organized as follows. Section 2 describes air pollution data, the explanatory variables, and the air stagnation condition. Section 3 presents the spatio-temporal models for kriging and propose two evaluation metrics for their kriging performances. We also discuss the methods for inference of associations between air pollutants and other variables including meteorological variables and night-time light intensity. Section 4 provides the kriging maps and results of temporal and meteorological associations for three pollutants. Section 5 concludes with discussions on future works.

DATA DESCRIPTION
The hourly air pollution data in Beijing from February 1, 2014, to May 31, 2017, for this study were obtained from 36 monitoring stations, where 35 were from the Beijing Municipal Environmental Monitoring Center (BMEMC) and one from the US Embassy site. Figure 1 shows the locations of the 36 monitoring stations in Beijing used in the study. The red points represent the monitoring sites, and the grids with the dashed line are the spatial resolution of the meteorological variables from ECMWF ERA5 hourly data.

Variables description
We focus on three pollutants: PM2.5, NO 2 , and CO. Table 1 summarizes the continuous variables used in our analysis. The meteorological variables in Table 1 are mainly based on the past studies Holzworth, 1967;Liang et al., 2015;Tan et al., 2018;Wan et al., 2021;Zhang et al., 2017;Zheng et al., 2021). For example, the air temperatures at 2 m, 250 hPa, and 850 hPa are correlated with the air pollution level in Holzworth, 1967). The pressure (PRES) is associated with air pollution in (Liang et al., 2015;Wan et al., 2021;Zhang et al., 2017;Zheng et al., 2021). We also added a night-time light intensity as a variable, which reflected the population density across different locations (Tan et al., 2018). Besides the instant measurements of wind and precipitation, their cumulative amounts are also expected to affect the air pollution level (Liang et al., 2015). Therefore, we create the integrated (cumulative) version of those variables, IHour, IRAIN, IWS10m: IHour (Integrated Stagnant Hour) stands for the percentage of hours under stagnant air status over the past 24 h, where the air stagnation is defined in subsection 2.2; IRAIN stands for the average precipitation over the past 24 h; IWS10m stands for the integrated wind speed at 10 m above the surface, where the wind speed is accumulated if the wind direction remains the same as the wind direction in the previous hour.
The spatial resolutions of monitoring stations, meteorological variables, and night-time light intensity differ, and it is known as the change of support problem (Gelfand et al., 2001). Since most of the variables are meteorological, we decide to use the centroids of their spatial grid as the standards for the location in modeling and inference. We set each grid's hourly air pollution levels as the average air pollution levels at the monitoring stations within that grid. After the aggregation of monitoring stations, there are 23 grid cells with observed air pollutant concentrations. Similarly, averaging night-time light intensity is taken in an 8-km buffer zone within each grid to match the night-time light intensity with the spatial resolution of the meteorological variables.
In addition to variables described in Table 1, we include the categorical variables such as year, month, weekday, and hour of the day and a spatial indicator for being at the center of Beijing or not. Here we define the center of Beijing as the region within Beijing's Fifth Ring Road.
Lastly, we also derive the probability of air stagnation using the ensemble wind forecast from the European Centre for Medium-Range Weather Forecasts (ECMWF) TIGGE Data (https://apps.ecmwf.int/datasets/data/tigge/) to study and control for the dependence between pollutant concentrations and air stagnation. The probability of air stagnation is calculated as the proportion of instant wind forecast ensemble members that satisfy the air stagnation definition provided in the next subsection.

Air stagnation data
The air stagnation days were firstly defined by the National Oceanic and Atmospheric Administration (NOAA) in the United States (Wang & Angell, 1999). The criteria of air stagnation days are determined by three factors: lower-air (10 m) wind, upper-air (500 hPa) wind, and precipitation. An air stagnation day is when 10 m wind speed (lower-air wind) is less than 3.2 m per second (m⋅ s −1 ), the 500 hPa wind (upper-air wind) is less than 13 m⋅s −1 , and it is precipitation-free. The National Oceanic and Atmospheric Administration (NOAA) in the United States (Wang & Angell, 1999) defined the air stagnation days with three factors: lower-air (10 m) wind, upper-air (500 hPa) wind, and precipitation. An air stagnation day is when 10 m wind speed (lower-air wind) is less than 3.2 m per second (m⋅ s −1 ), the 500 hPa wind (upper-air wind) is less than 13 m⋅s −1 , and it is precipitation-free. To accommodate for hourly data, we similarly defined the air stagnation hours: the integrated 10 m wind speed and daily 10 m (lower-air) wind speed are less than 3.2 m⋅s −1 , the current and daily 500 hPa (upper-air) wind speed is less than 13 m⋅s −1 . Since we are also interested in the effects of precipitation on PM2.5, NO 2 ,, and CO under air stagnation, precipitation is not used as the data exclusion criteria.
Our air stagnation data range from February 1, 2014, to May 31, 2017. We include 87,261 time-location combinations that satisfy the air stagnation conditions in our study. The proportion of air stagnation data takes up to 14% of the total hourly data. Although it is often preferable to incorporate all existing data in a model, we believe the air stagnation data themselves are more suitable for revealing the spatial and temporal patterns of air pollution concentrations and for investigating the associations between three air pollutants and auxiliary variables, including temporal variables, meteorological variables, and nightlight intensity. The large number of data points satisfying the air stagnation condition is also sufficient for deriving reliable estimates. The detailed patterns for the percentages of hours not under air stagnation across locations and months are shown in Appendix S1.

MODELING OF POLLUTION CONCENTRATION
We take the grid-level air stagnation data as point-referenced data for modeling, using their centroids as the location of each grid cell to calculate the distance between two grid cells. We first establish a Gaussian spatio-temporal process model for the air pollution concentrations in subsection 3.1. We then introduce two parametric spatio-temporal models and the variogram approach for estimating the covariance parameters in subsection 3.2. The model parameters are estimated by using only air stagnation data. We resolve the issue of selection bias with the covariate adjustment using propensity score in subsection 3.3 and discuss the association studies in subsection 3.4. In subsection 3.5, the kriging model is used to interpolate pollution levels at the unobserved locations and times. In subsection 3.6, we present our strategy for model selection and the metrics for evaluating the prediction accuracy.

Gaussian spatio-temporal process model
The explanatory variables being considered are the temporal variables (year, month, weekday, and hour of the day, all of which are modeled as dummy variables), the meteorological variables (rain, temperatures at 2 m, 250 hPa, and 850 hPa, surface pressure, dew point temperature, boundary layer height, integrated precipitation, integrated air stagnant hour up to time t, solar radiation, geopotential height, and evaporation), nightlight and spatial location (latitude, longitude, and the indicator for whether being at the center of Beijing). In addition to those main effects, we consider two-way interactions between significant meteorological effects and the temporal dummy variables for possible temporally varying meteorological effects. Let s ∈  = {1, … , 90} denote a grid cell for all 90 grid cells of interest, t ∈  = {1, … , T} denote a time point during our study period where T = 29,184 is the total number of hours, and n = 90 × T is the number of all (s, t) pairs. The matrix X represents the covariates selected from the candidate exploratory variables as mentioned above and their two-way interactions. It is a n × p design matrix where p is the number of covariates. One row of X, X(s, t), corresponds to the covariates measured at grid cell location s and time t. The n−dimensional vector, Y, represents the cubic-root transformed air pollution levels under the air stagnation for all (s, t) pairs. The cubic-root transformation provides better normality of the residuals. Note that Y are not fully observed for all the locations and time points. 86% of elements in Y are missing due to the absence of monitoring stations or the unmet air stagnation condition. The rows of X and the elements of Y are sorted by location first and then by timestamp.
Our modeling framework can be written as two parts: an unobservable process model and an observable data model (Cressie & Wikle, 2015). The underlying process model consists of the stationary fixed effect explained by the covariates, x(s, t) and the spatio-temporal random field, z (s, t), that capture the spatio-temporal dependence. The process model is where Z(s, t) represents the underlying pollution level at location s and time t, the p−dimensional vector represents the fixed effect coefficients for the covariates X(s, t), and z , z (s, t) ∈ z . The n−dimensional vector, z , are the Gaussian spatio-temporal random effects with mean 0 and covariance which will be specified in subsection 3.2. The data model is where Y (s, t) is the cubic-root transformed air pollution level measured at the time point t ∈  and the grid cell s ∈ .
y (s, t) ∈ y is the associated measurement error that is independent of z (s; t). The n−dimensional vector of measurement errors, y , follows multivariate Gaussian distribution with mean 0 and covariance 2 nugget I n×n , where the variance 2 nugget is also known as the nugget variance. y is independent of z . We combined the two processes to have where (s, t) ∈ and the n−dimensional vector ∼ N(0, + 2 nugget I n×n ). Let Y and X be the overall observation vector of pollutant and the covariate matrix pooling all available sites and observational times together, which gives rise to the summarizing model equation: We use a two-stage estimation procedure to expedite the computing time of parameter estimation. We first estimate the fixed effects using the ordinary least square (OLS) estimator aŝ Then, we obtain the ordinary least square residualŝ= Y − X̂O LS and use the residual to estimate the parameters in covariance models with the details in subsection 3.2. Lastly, using the estimated covariance models to obtain the generalized least square (GLS) estimator that we will describe in subsection 3.4.

The covariance models
We assume the spatio-temporal covariance, , to follow either the Gneiting model (Gneiting, 2002) or the Product-Sum model (De Iaco et al., 2001). These two parametric spatio-temporal covariance models allow for the interactions between space and time. They have been proven to be flexible in describing a wide range of covariance surfaces in various studies.
In spatio-temporal covariance models, the dependence structure is often characterized by the spatio-temporal variogram. The variogram between a pair of i th residuals i and j th residuals j is defined as and i,j is known as the semivariogram (Cressie & Wikle, 2015). The Product-Sum model assumes a spatio-temporal covariance structure: where Cov s (h) and Cov t (k) are valid temporal and spatial covariance structures, respectively. It can also be written in terms of semivariograms: where s,t (h, 0) and s,t (0, k) are the spatial and temporal semivariograms, respectively, and m is a parameter that controls for the degree of space-time interaction. We use spherical c sill,1 sph( h r ) for s,t (h, 0), where parameter r is the range parameter, and exponential semivariograms c sill,2 exp( h scale ) for s,t (0, k). The Gneiting model assumes an alternative spatio-temporal correlation structure: where a and b are non-negative scaling parameters of time and space respectively, is a smoothness parameter, the exponent is the parameter for temporal correlation, is a space-time interaction parameter, 2 sill is the variance of residuals (or global sill variance), and 2 sill (1 − p 0 ) is the nugget effect. The corresponding semivariogram structure is To estimate 2 nugget and , we first obtain the empirical variogram by using the ordinary least squares residualŝ= Y − X̂O LS . Under the isotropic assumption, that is, that the semivariogram i,j depends on neither the location nor time, the spatio-temporal semivariogram can be written as a function of distance h ij and time lag k ij between two points, where h ij = Distance(s i , s j ) using the longitude and latitude of site s i and s j , and wherêis the estimated ordinary least square residuals, A h,k is the set of pairs of data points that have distance (h − 0.5, h + 0.5] kilometers and time lag k hours respectively, and |A h,k | is the cardinality of A h,k . In practice, we only compute the empirical semivariogram at different integer values of time lag k and distance h. We refer to the upper bound of the semivariogram as the global sill variance 2 sill , which is defined as 2 sill = Var( i ) = 2 nugget + Var( z (s i , t i )) for any i. The spatial semivariogram is the spatio-temporal semivariogram when the time lag k = 0, which is s,t (h, 0), and the temporal semivariogram is the spatio-temporal semivariogram when the distance h = 0, which is s,t (0, k). We separately fit the spatial and temporal semivariograms to expedite the optimization process. Given the estimated parameters for spatial and temporal semivariograms, we can then estimate space and time interaction parameters (m in the Product-Sum model and in Gneiting model) by the weighted least squares with empirical spatio-temporal semivariograms.

3.3
Covariate adjustment using the propensity score We aim to investigate the temporal trends and meteorological effects on air pollutant concentrations during air stagnation hours and recover the air pollution maps under air stagnation. However, using only air stagnation data could introduce selection bias if a dependence exists between the pollutant concentration and the air stagnation status. To resolve this issue, we borrow the concepts from causal inference. For the location, s and time t, Y (s, t) is the air pollution level after cubic-root transformed as we defined before, and let T s,t be a binary air stagnation indicator (one corresponds to air stagnant), X(s, t) be the covariates to forecast wind speed. The outcome of interest is air pollution under air stagnation, that is, Here Y(1) means the "potential outcome", a term from causal inference, that is, air pollutant levels if it is under air stagnation. Conceptually, if we could have "forced" the wind to be stagnant every hour, we would have observed the "complete data" of Y(1). Since we cannot force the wind to be stagnant, we shall work on a subset of the complete data, that is, the observed data of Y s,t (1) under the air stagnation condition-when T s,t = 1. Fitting the Gaussian spatio-temporal process model to the observed data, we found that the residuals correlate with the air stagnant probability. It indicates that the observed data of Y s,t (1) is not a random sample of the complete data. So, the potential outcome Y s,t (1) is not independent of T s,t , that is, some covariates affect both Y s,t (1) and T s,t .
We consider the covariate adjustment using the propensity score, P(T s,t = 1), to recover the map air pollution levels during stagnant air hours (Rosenbaum & Rubin, 1983). Since the wind speed has been well studied in weather forecasting, instead of modeling the propensity score on our own, we calculate P(T s,t = 1) from the probabilistic estimates of wind speed from the European Centre for Medium-Range Weather Forecasts (ECMWF) TIGGE Data (https://apps.ecmwf.int/ datasets/data/tigge/). So the propensity scores are treated as known in our study. In addition, since air pollution has rarely been considered an important predictor in wind speed modeling, we believe the "strongly ignorable" assumption is valid, that is, Y s,t (1) and T s,t are independent conditional on the propensity score P(T s,t = 1).
The inference for association study in subsection 3.4 is conditioning on the same value of the propensity score so that Y s,t (1) and T s,t are conditionally independent under the "strongly ignorable" assumption. After controlling the probability of being air stagnant, we create the spatial maps of air pollution levels in four seasons for three pollutants in subsection 3.5.

Inference for associations
Our primary goal is to estimate each pollutant's yearly, monthly, weekly, and hourly trends. In addition, we want to study each meteorological variable's effect on pollutant concentrations. With the association inference results, we can enhance our understanding of the pattern of air pollution concentrations under varying weather conditions. The parametric structure of the covariance + 2 nugget I n×n for the residuals in the model (3) is chosen from the Gneiting model (Gneiting, 2002) or the Product-Sum model (De Iaco et al., 2001) by their corresponding cross-validation performance, with the details given in subsection 3.6. To estimate the regression coefficients of temporal and other meteorological effects in , one could use the Generalized Least Squares estimator (GLS), accounting for the spatio-temporal dependence:̂= whose variance is The presence of strong multicollinearity among meteorological variables makes the association results hard to interpret. For instance, the boundary layer height (blh) and the pressure are negatively correlated, the former is negatively correlated with air pollution, and the latter is positively correlated with air pollution. However, including both blh and pressure in the model could change the direction of one coefficient due to the multicollinearity. Although the coefficient is interpreted as the expected change of air pollution conditional on all other covariates in the model, such a coefficient can still be confusing and not very useful in practice. On the other hand, the year, month, weekday, and hour effects are almost perpendicular to each other, and it is more intuitive to ask how the air pollutant level would change with a meteorological factor, conditional on a certain year, month, day, hour, and at a given location. Therefore, we consider the model that includes all spatial and temporal effects and only one additional meteorological variable at a time. It will avoid the multicollinearity among meteorological variables.

Kriging and linear coregionalization model
Kriging was first proposed by Matheron (1963) and has been used to interpolate unsampled locations in geostatistics based on the Gaussian process. The interpolation of air pollution concentration levels at location s ′ and time t ′ requires estimating two components, the global mean, x T (s ′ , t ′ ) , and the Gaussian spatio-temporal random field, (s ′ , t ′ ). The estimated regression coefficientŝare presented in subsection 3.4, and the distribution of (s ′ , t ′ ) given , the residuals of observed data points, is The estimated air pollution level at location s ′ and time t ′ given, whose variance is We refer to the previously defined model as the univariate kriging model. Next, we develop a multivariate kriging model that allows correlations across three types of air pollutants, and we examine whether the multivariate model can improve the kriging accuracy. The univariate Product-Sum spatio-temporal variogram can be easily extended to the multivariate joint model using Linear Coregionalization Model (LCM) (De Iaco et al., 2012). Let E(s, t) = [ 1 (s, t), 2 (s, t), 3 (s, t)] T be the residual vector of three pollutants (PM2.5, NO 2 , and CO) at site s and time t. We model the residuals of three pollutants jointly and assume that

E(s, t) = DM(s, t),
where M(s, t) = [M 1 (s, t), … , M P (s, t)] T is a vector of P uncorrelated Gaussian process, and D is a 3 × P matrix. We use the following auto and cross-variogram of the residual pairs ( l , s ).
We can then model the matrix auto and cross-variogram as an LCM: where G(h, k) is a 3 × 3 matrix, whose entries are g lq (h, k), and the B j 's are positive definite P × P matrices. The basic variograms j (h, k)'s are modeled as a Product-Sum (De Iaco et al., 2001). Here, we set P = 3, the spatial variograms to be sph( h r ), and the temporal variograms to be c sill exp( k scale ). We follow the fitting procedure by De Iaco et al. (2012) to estimate the parameters in B j 's, spherical and exponential variograms, and the spatio-temporal interaction terms m in each Product-Sum model. To identify the covariance structures of M 1 (s, t), … , M P (s, t), we firstly compute the sample auto and cross variograms:̂l where A h,k is the set of pairs of data points that have distance h kilometers and time lag k hours, respectively, and |A h,k | is the size of A h,k . We can fit the basic structures of M 1 (s, t), … , M P (s, t) by simultaneous diagonalization (De Iaco et al., 2012). Lastly, we compute the coregionalization matrices B l given the fitted basic structures of M 1 (s, t), … , M P (s, t).
With the modeled variograms, we deploy the same kriging method and cross-validation strategies for the interpolations and evaluations. We compare the performance of the joint model with separate models in our result.

Model selection and evaluation
We assess the prediction accuracy of different proposed models via cross-validation. The prediction accuracy is stratified by grid cells.
1. At grid cells with monitoring stations, we impute the PM2.5, NO 2 , and CO that are missing due to either not being collected or not qualifying the stagnant air condition. The time-series data at a grid cell are randomly divided into P equal-sized subsamples, and we set P = 5 in this paper. We evaluate the prediction accuracy by repeatedly withholding one subsample at a grid cell as the test data. We refer to this evaluation approach as the Leave-Partial-Time-Out (LPTO) cross-validation. It assesses the interpolation accuracy at a location with historical air pollutant data. 2. To evaluate the prediction accuracy for grid cells without any monitoring stations, we take one observed grid cell as the test data repeatedly for all 23 observed grid cells. This evaluation approach will be referred to as the Leave-One-Location-Out cross-validation (LOLO). Such a target-oriented cross-validation design was recommended by Meyer et al. (2018) for spatio-temporal data to prevent over-fitting-the model very well predicts subsets of the time series of the locations used for training but fails in the extrapolation of unknown locations.
The LOLO cross-validation selects more variables that capture the spatial variation, while the LPTO cross-validation favors variables that capture temporal variation.
for each removed location s and partial p th time set, {y st } are the observed response from test data, andŷ st are the predicted value of test data with training data excluding the partial p th time set at location s.  sp is the partial p th time set under air stagnation at location s with observed air pollutant levels {y st }. And the MAE is averaged over the entire spatial location s and time p. Other metrics (MAE, RMSE, R 2 ) are calculated in a similar way. We calculate the mean absolute value (MAE) and the square root of the mean square error (RMSE), with the data points being equally weighted for each approach. LPTO RMSE and LOLO RMSE are both aggregated across all locations and time points. We provide these two types of cross-validation metrics to show our kriging performance for locations with partial observations and new locations. We selected the predictors and covariance structures based on the mean absolute error (MAE) by combining these two cross-validation strategies.
The covariance structures of the final models were selected first based on the average of LPTO MAE and LOLO MAE for each pollutant. After we fixed the covariance structure, we obtained the variables in the final models by combining the predictors selected by both LOLO and LPTO under selected covariance structures. We also considered the interactions between selected main effects, but no interaction term has been selected in the forward selection process.
The candidate predictors are first screened by the Bayesian Information Criterion (BIC) (Schwarz, 1978) to avoid the high computational cost and the forward selection procedure without taking the spatio-temporal effect into account. Within the candidate predictors for each pollutant, we use the forward selection cross-validation to determine the best set of predictors, including both main effects and two-way interactions of candidate predictors. These cross-validation variable selection procedures are conducted for each proposed covariance structure: Product-Sum and Gneiting.

RESULTS
In this section, we report the results of empirical analysis using the models and inference methods as reported in Section 3.
We first present the model selection results in subsection 4.1 and then analyze the temporal trends of three pollutants under air stagnation in subsection 4.2 and the associations with meteorological and nightlight variables in subsection 4.3. Lastly, in subsection 4.4, we present the spatial distributions of three air pollutants in four seasons after interpolations using the model (3) with the selected covariance and variables by cross-validation methods described in subsection 3.6.

Model selection and prediction accuracy
We select the predictors and covariance structures based on the mean absolute error (MAE) under overall cross-validation performances. According to the LOLO MAE and LPTO MAE average, the product-sum model is used for PM2.5, NO 2 , and CO. The intermediate results of model selection are provided in Appendix S2. The covariance parameters estimates are shown in Appendix S3. Table 2 summarizes the cross-validation performance of our final models for three pollutants in MAE, RMSE, and R 2 at the transformed scale (cubic root). The cross-validation R 2 is defined as where {y i } n i=1 are the observed response from test data and {ŷ i } n i=1 are the predicted value of test data. y test is the sample mean of the test data. The cross-validation R 2 does not depend on the scale of response, and thus it is more comparable among different air pollutants than MAE and RMSE.
The Leave-Partial-Time-Out (LPTO) cross-validation R 2 's are above 0.8 for all three pollutants. With the product-sum covariance, LPTO R 2 's are 0.938 for PM2.5, 0.862 for NO 2 , and 0.811 for CO. It indicates that our proposed model could accurately impute the missing values at the monitoring stations. The Leave-One-Location-Out (LOLO) cross-validation R 2 's are lower than LPTO R 2 's due to the lack of historical air pollution data. The LOLO cross-validation R 2 's are 0.873 TA B L E 2 Evaluation metrics under two types of cross-validation methods for the final models: Leave-one-location-out (LOLO) and leave-partial-time-out (LPTO) for univariate models and multivariate models.

Pollutant
Selected variables Type of CV for PM2.5, 0.526 for NO 2 , and 0.726 for CO. NO 2 is relatively harder to predict because it is more reactive than PM2.5 and CO in the atmosphere and hence has a weaker spatio-temporal dependence. We also present cross-validation results of multivariate kriging models in Table 2. The multivariate performs better for LPTO cross-validations but worse for LOLO cross-validations than the univariate models. In all cases, except the CO model with LPTO cross-validation strategy, the difference of R 2 between univariate and multivariate models is less than 0.06. Regarding the variables selected for kriging models, month and geopotential height are selected for all three pollutants. Using nightlight intensity and temperature at different levels seems to improve the prediction accuracy for NO 2 and CO but not PM2.5. We use the empirical variograms and residual diagnostic plots to check the validity of model assumptions. We first look at the empirical variogram based on five spatial lags up to half the maximum possible distance (60 km) and 24-time lags (from 0-to 23-h). The temporal variogram has more variations for a smaller spatial lag than for a larger spatial lag, suggesting a nonseparable feature for the spatio-temporal covariance. We then define a local space-time 'cylinder' Haas (1995) that consists of air-stagnant data from the nearest ten monitoring locations in space to each grid location of interest and within a specific month of the observing period. For each space-time 'cylinder,' we fit the Produce-Sum and Gneiting models to the empirical variogram and examine the heterogeneity of estimated parameters across cylinders. We present the estimates' maximum, minimum, mean, and standard deviation across cylinders in Appendix Table 3. The parameter estimates were fairly close. So, assuming spatio-temporal stationarity of the air pollution level under air stagnation seems reasonable. Residuals of three pollutants models under air stagnation are all tested trend-stationary using Dickey-Fuller test (Dickey & Fuller, 1981). Finally, we plot the residuals versus fitted values to verify the homoscedasticity assumption for error terms. It also shows that the fitted values capture most trends, and our variables have little non-linearity. The normal Q-Q plots of the three models show that the normality assumptions of our models are satisfied. See details of those diagnoses in Section 4 of the Appendix. Figure 2 shows the year effects (2014 as the baseline) and the month effects (January as the baseline) of PM2.5 1∕3 , NO 1∕3 2 and CO 1∕3 . The mean estimates of all three pollutants decreased over the years in general, except for a mild bump in 2016. The monthly trends suggest the severest air pollution is in winter. One of the reasons is the heating effect during winter, given that the central heating in Beijing usually turns on from November 15 to March 15. Comparing the monthly trends across three pollutants, CO stays at a low concentration level for much longer, from April to September. In contrast, PM2.5 remains at a low concentration level only from August to September. For all three pollutants, Sunday is the cleanest day in Beijing, as more people stay home and fewer human activities occur during that day. The weekly peaks are different for the three pollutants, but they are all concentrated from Wednesday to Friday as it approaches the end of the weekday. PM2.5 remains higher on Thursday and Friday, while it can stay lower from Sunday to Wednesday, and NO 2 remains low from Friday to Sunday. The different weekly patterns could be attributed to the different characteristics of the three

Temporal trends of air pollutants
F I G U R E 2 Temporal trends (regression coefficients) of PM2.5, NO 2 , CO in Beijing under air stagnation from February 1, 2014 to May 31, 2017, with 95% confidence intervals. The gray bands are the 95% confidence interval using GLS inference. The concentration levels are under the cubic root transformation. The first row presents the annual trends of (a) PM2.5, (b) NO 2 , and (c) CO; the second row presents the monthly trends of (d) PM2.5, (e) NO 2 , and (f) CO; the third row shows the weekly patterns of (g) PM2.5, (h) NO 2 , and (i) CO; the last row shows the hourly patterns of (j) PM2.5, (k) NO 2 , and (l) CO.
pollutants. PM2.5 and CO are more stable and could accumulate over days, while NO 2 is not. The hourly trend of the three pollutants all show a dip around 3 p.m.-5 p.m. during the afternoon, which could be attributed to the temperatures, sunlight, and human activities. The boundary layer height reaches its maximum in the afternoon because of the high temperature and the sunshine amount; thus, the air pollution is diluted near the ground. Most people in Beijing end their office hours at 5 p.m. or later, and a typical commute in Beijing can take more than an hour. So the evening rush hour starting at 5 p.m. contributes to the increasing the air pollution level between 5 p.m. and 8 p.m.

Associations of meteorological and nightlight effects
We control for the temporal effects but investigate the meteorological variables and night-time light one at a time to resolve the issue of multicollinearity as described in subsection 3.4. Here, we summarize the regression coefficients of meteorological variables and night-time light intensity. We also compare the results with the ones controlling for the probability of air stagnation as described in subsection 3.3 in which the air stagnation status does not confound the results. Table 3 presents the regression coefficients for PM2.5, NO 2 , and CO models, respectively. It summarizes results from three approaches: generalized least squares (GLS), generalized least squares with propensity score adjustment (GLS-PS), and ordinary least squares with propensity score adjustment (OLS). The GLS point estimates and standard errors are provided in Equations (8) and (9), where the Product-Sum (PS) model is used for the covariance. The OLS estimates are provided in Equation (4). The point estimates among the three methods always have the same sign, and the relative differences are less than 25% for most coefficients. The standard errors between GLS and GLS-PS are similar and less than the OLS standard errors. Given the similarity of estimates obtained from different methods, we next focus on the GLS estimates and sort the coefficients by their GLS-PS absolute t-values.
Boundary layer height (blh) showed the most significant associations with NO 2 and CO and the second most significant association with PM2.5. We quantify that with a 100-meter increase of blh, PM2.5 1∕3 will decrease 0.256, NO 1∕3 2 will decrease 0.295, and CO 1∕3 will decrease 0.652 for a particular year, month, weekday, and time.
The concentration levels of all three air pollutants increase with the temperature at 2 m, 850 hPa (about 1457 m), and 250 hPa (about 10,363 m). Among the three temperature measures, the temperature at 2 m has the most significant positive association with PM2.5 and NO 2 , and the temperature in the low-pressure area (250 hPa) has the most significant positive association with CO.
Evaporation is among the top six most significant associations for all three pollutants. The higher the evaporation rate is, the more severe air pollution is. A related meteorological indicator is the humidity level which is also positively associated with all three pollutants, but the associations are less significant than the evaporation rate.
The night-time light intensity reflects the population density at different locations (Tan et al., 2018;Zhuo et al., 2009) and thus the spatial heterogeneity of human activities. The night-time light intensity is the second most significant factor for both NO 2 and CO; it is the tenth most significant factor for PM2.5. Air pollution is more severe in areas with more human activities: a one-hundred unit increase of the night-time light intensity attributes to a 2.8 increase of NO 1∕3 2 , a 2.8 increase of CO 1∕3 , and a 0.4 increase of PM2.5 1∕3 .
For IRAIN (integrated rain), the PM2.5 and NO 2 concentration levels significantly decrease after heavy rain because PM2.5 and NO 2 can be dissolved in water while CO is not dissolvable. IHour indicates how long the air has been stagnant, and it is significantly associated with PM2.5 and CO and marginally associated with NO 2 . Among all covariates investigated, the dew point at 2 m, the geopotential height, and the surface net solar radiation show no significant associations with any air pollutant after controlling for the temporal effects.

Spatial and seasonal distributions of air pollutants
The resulting maps were rendered using ggmap package in R (Kahle & Wickham, 2013). Geo-location variables such as Latitude, Longitude, and Geopotential frequently appear in the PM2.5 and N0 2 kriging models but not in the CO model. It is because CO is stable and quickly travels for a long time and distance. As a result, its spatial heterogeneity is not as substantial as PM2.5 and N0 2 . PM2.5 and N0 2 concentration levels decrease from south to north and west to east. Several cities with heavy industries in the south of Beijing, like Baoding and Langfang in Hebei province, explained why air pollution was severe in the south. The central area of Beijing lies slightly west within our studying range. Hence, the negative coefficient of longitude in the NO 2 model captures Beijing's central effects, as the central area of Beijing showed higher air pollution levels than the rural area. Figure 3 presented the spatial distributions of PM2.5, NO 2 , and CO concentration levels under air stagnation at their original scales and averaged by season. Darker colors indicate higher concentration levels. All three pollutants have the lowest concentration levels in the summer and are severest in the winter under stagnant air. Besides, the pollution level of NO 2 is worse in the central area of Beijing City than in rural areas. It is likely attributed to the traffic emission on the roads of intensive transportation in the urban area. Figure 4 shows the standard errors of cubic root transformed PM2.5, NO 2 and CO concentration levels at all grid cells (a darker color indicates a higher standard error), and the Leave-One-Location-Out RMSE at the monitoring stations (a TA B L E 3 Coefficients of meteorological and nightlight variables for PM2.5, NO 2 , and CO models. larger dot indicates a larger RMSE). The central Beijing area has relatively lower standard errors and cross-validation RMSE because of the excellent coverage of monitoring stations. The standard errors are more prominent in spring and winter since there are fewer air-stagnant hours, or in other words, more windy days in spring and winter. The large cross-validation RMSEs often appear on the map's boundary, and they are likely to be improved by enlarging the study area to include more monitoring stations.

DISCUSSION
The behavior of air pollutants under air stagnation conditions has been studied as an important index for air pollution severity. Wang and Angell (1999) showed that air pollution is significantly severer during air stagnation days in the United F I G U R E 4 Standard errors of estimated pollutant hourly concentrations for PM2.5 (first row), NO 2 (second row), and CO (third row) in Beijing from spring (left) to winter (right) from February 1, 2014, to May 31, 2017. The standard errors at each grid are averaged over each season and shown using color gradients from white (low value) to purple (high value). Monitoring stations are located in grids with black dots. The size of the dots represents the average leave-one-location-out RMSE. The resulting maps were rendered using the ggmap package in R (Kahle & Wickham, 2013).
States. Cai et al. (2017) and Feng et al. (2018) focused on creating the air stagnant index related to the heavily polluted period in Northern China and Beijing, China, respectively. Huang et al. (2017) further interpolated the mean air stagnant days in China. However, all the previous studies have yet to model or quantify the spatial and temporal trend of air pollutants under air stagnation status. We fill this gap by developing a spatio-temporal model for three air pollutants during the stagnant air period. It is an initial attempt to separate the meteorological and temporal effects from the wind-driven transmission effects for the air pollution problem. When many stagnant air hours are available (over 80,000 data points in our case), the air stagnation analysis leads to reliable and interpretable estimates of the associations between air pollution levels and auxiliary variables of interest. We found a dependence between air pollution level and the probability of air stagnant, which suggests missing-not-at-random (MNAR). We consider the covariate adjustment using the propensity score, the probability of being air stagnant, to mitigate the potential bias due to MNAR. The investigation of the empirical variograms suggests non-separable spatio-temporal effects. So, we considered two parametric spatio-temporal covariance models. We also found dependence among air pollutants and developed multivariate non-separable stationary spatio-temporal models. To our best knowledge, the multivariate non-separable stationary spatio-temporal model has yet to be developed in the literature.
Our results indicate that Beijing's pollution levels have decreased over the years, which is consistent with China's emission control efforts in recent years. The monthly trends suggest that winter's severe air pollution is likely due to the need for heating. Sunday is the cleanest day as more people stay home and fewer human activities occur. The hourly trend of the three pollutants all show a dip around 3 p.m.-5 p.m., which could be attributed to the temperatures, sun lights, and human activities during that time. We also compare our spatial mapping with the maps, including windy days in Qi et al. (2017). Our maps show the same spatial pattern as Qi et al. (2017) that the south part of Beijing has worse air quality than the north. In addition, the spatio-temporal models uncover the relationships between the meteorological effects and air pollution levels under air stagnation. These association estimates are considered stable and reliable after eliminating the noises caused by winds.
Our results have the following limitations. The variable selection procedure is based on cross-validation error, which cannot imply any causal effects between the weather condition and the polluted levels. A future research direction is to combine our air stagnation models with other wind-driven transmission models, which will provide estimates for both calm and windy days. The wind-driven transmission can be better estimated if we know the source of emissions in the surrounding areas or have monitoring data in a larger geographical area. We also acknowledge the change of support problem (Gelfand et al., 2001) in the data. We use a simple average to solve this problem. More advanced approaches (Stein, 2012) may be utilized to solve this problem.

ACKNOWLEDGMENTS
Ying Zhang and Le Bao were supported by NIH/NIAID R01AI136664 and R01AI170249. Song Xi Chen was supported by the National Natural Science Foundation of China grants 92046021, 12071013, and 12026607, and LMEQF at Peking University. The authors are grateful to Xun Cao for discussing the sources of covariates, Lei Chen for advising the collection of meteorological data, and Kaiyi Wu and Jinjun Gao for processing the nightlight data. The authors thank the associate editor and reviewers for their constructive suggestions and comments.