Spatio‐Temporal Hourly and Daily Ozone Forecasting in China Using a Hybrid Machine Learning Model: Autoencoder and Generative Adversarial Networks

Efficient and accurate real‐time forecasting of national spatial ozone distribution is critical to the provision of effective early warning. Traditional numerical air quality models require a high computational cost associated with running large‐scale numerical simulations. In this work, we introduce a hybrid model (VAE‐GAN) combining a generative adversarial network (GAN) with a variational autoencoder (VAE) to learn the dynamic ozone distributions in spatial and temporal spaces. The VAE‐GAN model can not only decipher the complex nonlinear relationship between the inputs (the past states/ozone and meteorological factors) and outputs (ozone), but also provide ozone forecasts for a long lead‐time beyond the training period. The performance of VAE‐GAN is demonstrated in hourly and daily spatio‐temporal ozone forecasts over China. The training datasets from 2013 to 2017 and validation datasets from 2018 to 2019 are the collection of data from the air quality reanalysis datasets. With the use of VAE, large dataset sizes are decreased by three orders of magnitude, enabling hourly and daily forecasts to be computed in seconds. Results show that the VAE‐GAN achieves a reasonable accuracy in the prediction of both the spatial and temporal evolution patterns of hourly and daily ozone fields, as compared to the Nested Air Quality Prediction Modeling System (commonly used in China), the reanalysis data and observations during the validation period. Thus, the VAE‐GAN is a cost‐effective tool for large data‐driven predictions, which can potentially reinforce air pollution prediction efforts in providing risk assessment and management in a timely manner.

These challenges have motivated researchers to exploit hybrid efficient and accurate models, for example, a variety of machine learning algorithms (Chattopadhyay et al., 2020;Weyn et al., 2020). Machine learning models are capable of efficiently handling big data and easily identifying trends and patterns (Hoshyaripour et al., 2016;Jia et al., 2020;Kumar et al., 2017). Several previous works have utilized different machine learning models to predict ambient ozone concentrations directly from observations and post-processing dynamical modeling outputs (Watson, 2019). For example, the MultiLayer Perceptron (MLP) was utilized by Dutot et al. (2007) to carry out hourly maximum ozone forecast in the next day in the center of France. Similarly, Eslami et al. (2019) employed a deep convolutional neural network (CNN) to predict the hourly ozone concentration by using atmospheric data at the previous day along with in-situ ozone and NO 2 concentrations as inputs. In the work of AlOmar et al. (2020), the hybrid model (W-ANN) combining an artificial neural network (ANN) with a wavelet transform (WT) approach, could provide ozone concentration forecast at multiple time scales (2-, 3-, 4 and 5-ahead hours). While these models can predict the future behavior of ozone concentration given knowledge of its past and present states at meteorological stations, they rarely explore spatio-temporal dependencies exhaustively. It was also found that the model performance was more severely degraded with a long lead-time (Weyn et al., 2019). Recently, H.  presented the spatial and temporal (monthly) correlation in the monitoring network for regional ozone prediction. Furthermore, Zhan et al. (2018) used the random forest (RF) and Kriging approaches to interpolate the monitoring data on uniform grids and then investigated the spatial and temporal (daily) relationship of ozone concentrations. These models presented in (H. Zhan et al., 2018) are strongly dependent on the availability (locations and period) of monitoring data. The predictive accuracy is usually restricted due to a lack of sufficiently dense observations.
In this study, a hybrid physical-informed deep learning model is proposed for real-time ozone forecasting with a high spatial resolution. In hybrid deep learning modelling, the training and validation datasets originate from the reanalysis data which optimally combines the numerical solutions from the physical air quality model (here, the Nested Air Quality Prediction Modelling System -NAQPMS) and observed data in monitoring stations. Physical modelling results from the governing equations can be used to improve understanding of complex ozone transport processes and provide the details of the spatial distribution of ozone concentrations, while monitoring data can be used for correcting physical modelling results, thus improving predictive accuracy.
A newly developed generative adversarial network (GAN) model (Cheng et al., 2020) is introduced for spatial-temporal (hourly/daily) ozone forecasting in this work. GANs proposed by Goodfellow et al. (2014), have been attractive for producing high-resolution samples (e.g., images). Various versions of GANs have been applied for generating different types of high-dimensional data, such as geostatistical maps (Laloy et al., 2018), seismic waves (Li et al., 2018), weather maps (Gagne et al., 2020), total electron content map (Chen et al., 2019). GANs have the capability of learning hierarchical feature representations in various image analysis problems (Zhong et al., 2019). Recently, Cheng et al. (2020) first introduced a variational autoencoder (VAE) into GAN to explore spatial and temporal flow dynamics with a reduction of the computational cost. The original GAN consists of a generative model (generator) and a discriminative model (discriminator). The generator and discriminator contest through an adversarial process to optimize the generative model. Here, the VAE-GAN is a game theoretic framework that involves three modules, an encoder, a decoder and a discriminator (Makhzani et al., 2015). The role of the encoder is to transform real and fake samples in a high-dimensional space into a low-dimensional latent space. The low-dimensional representations, several orders of magnitude smaller than the dimensional size of the original datasets, are then applied to the adversarial network for representation learning and parameter optimization, thus alleviating computational burdens (Cheng et al., 2020). After that, the decoder continues to transfer the 3 of 26 latent states to high-dimensional samples. In essence, the VAE-GAN model includes a self-training generative and discriminative mechanism, providing an attractive way for learning the real sample data distribution in the absence of any prior physical or extensive statistical knowledge.
In this work, we have developed a spatio-temporal VAE-GAN for hourly and daily ozone forecasting. We use convolutional networks in the VAE-GAN model to improve its scalability for large data-driven modelling and implement a recursively forecasting strategy in the model architecture for achieving the long lead-time forecast. Overall, the VAE-GAN model developed in this study can: (a) learn and classify the dynamic ozone features using high-dimensional datasets; (b) decipher the complex nonlinear relationship between the present states and future events among all training samples; (c) provide accurate hourly and daily ozone forecast in a long lead-time beyond the training period. By using high-resolution reanalysis data, we show that the VAE-GAN can represent both the spatial and temporal variability of ozone concentration. To our best knowledge, this is the first proposal of the VAE-GAN for hourly and daily ozone forecasting using a combination of dynamic learning and real-time forecasting strategy.
In this work, the VAE-GAN model has been applied to hourly and daily forecasting of ozone concentrations with a high spatial resolution (15 × 15 km) across China. The datasets are collected from the air quality reanalysis datasets during 2013-2019. The accuracy of the VAE-GAN model has been evaluated by comparing the results with those obtained from both the physical air quality model (NAQPMS) and observations at monitoring stations. The remainder of this paper is organized as follows. Section 2 provides the details of datasets and methods used in this study, the VAE-GAN architecture and forecasting workflow. This is followed by Section 3 with model performance analysis of ozone forecasting capabilities in spatial and temporal spaces. Finally in Section 4, conclusions are presented.

Data and Methods
Ozone forecasting involves nonlinear, strong coupling and heterogeneous problems (Su et al., 2020). As investigated by Zhan et al. (2018), the key ozone-related meteorological variables are the temperature, relative humidity, wind speed, evaporation and sunshine duration. Among these meteorological conditions, the daytime surface temperature is an important driver of ozone episodes. A high temperature can accelerate the chemical kinetic reaction and emissions of the natural components of ozone increase (Bloomer et al., 2009;Gu et al., 2020;Yin et al., 2019). This work thus focuses on developing an ozone forecasting model depending on temperature. The impact of other meteorological factors (such as humidity and wind speed) is also explored in Section 3.4.
Given a series of ozone observations {O t−p , …, O t } (where p is the length of historical time steps) and the temperature data T t+1 at the predictive time level (t + 1), an one-step-ahead ozone forecast can be expressed mathematically as:̂+ where , M k is the number of points in the study area) represents the predictive zone concentration at time level (t + 1) and  is a forecasting model. In this work, the VAE-GANbased ozone forecasting model is selected as the forecasting model  . A schematic diagram of the proposed VAE-GAN framework is presented in Figure 1.

Reanalysis Data
Ground-level ozone has become a severe pollutant in major urban areas of China . Our objective is to develop an efficient and accurate machine learning tool for providing an interactive ozone map over China, which shows real-time (hourly/daily) ozone concentrations with a high spatial resolution across China. However, there are only 1436 monitoring stations over China, which are sparse and heterogeneous. It is obvious that the monitoring data is not enough to represent the spatial distribution of ozone on a high spatial scale. Therefore, we select the 7-year Chinese air quality reanalysis (CAQRA) data set for our training and validation purposes (Kong et al., 2021). The CAQRA dataset from 2013 to 2019 was produced by assimilating surface observations at 1436 monitoring stations into the Nested Air Quality Prediction Modeling System (NAQPMS).

of 26
Such combination could be the optimal way by which physical modeling results from the NAQPMS can provide dynamic understanding from the governing equations, while monitoring data may find unexpected patterns not provided by physical modeling.
The CAQRA dataset contains six pollutants (PM 2.5 , PM 10 , SO 2 , NO 2 , CO and ozone) and meteorological factors (temperature, humidity and wind speed) at the high spatial (15 × 15 km) and temporal (hourly) resolution, which are publicly accessible at Science Data Bank (ScienceDB). Each ozone map or meteorological map across China has a high spatial resolution with a total node number of 47,232. The ozone concentration over China varies in both spatial and temporal spaces. For example, as shown in Figure 2, it can be noted that the distributions of average hourly ozone concentration in June and December of 2018 are heterogeneous, exhibiting significant seasonal variations. This can be explained by the fact that ozone production is tightly correlated with meteorological Figure 1. Schematic of real-time ozone forecasting. The architecture consists of two processes: Dynamic learning and realtime forecasting. In real-time forecasting process, one-step ozone forecast can be obtained by a direct prediction for one-step (hourly/daily) lead-time. To get forecasts for a long lead-time, a recursive strategy is adopted, which uses the last one-step ozone forecast as a new input to predict the next-step ozone concentration.

of 26
conditions, such as ambient temperature, sunshine radiation and humidity, causing the variation of emitted intensity of atmospheric pollutants in different seasons (Bloomer et al., 2010). The high ozone concentration in June is closely related to the enhanced photochemistry reaction due to high temperature and strong solar radiance in Summer (Kong et al., 2021). To demonstrate the performance of our proposed model with seasonal variations, we choose the months of June and December as representing Summer and Winter while exhibiting hourly spatio-temporal ozone forecasts in the following Section 3.1.

Monitoring and Physical Modeling Data
The objective of this study is to develop a machine learning-based ozone forecasting model, which will be accurate and more efficient than existing air quality models. For comparison purposes, the NAQPMS (commonly used for real-time pollution forecasting in China, here referred to as the physical model) is chosen. The NAQPMS model is driven by the hourly meteorological fields produced by the Weather Research and Forecasting (WRF) model, and the aqueous-phase chemistry and wet deposition are simulated based on the Regional Acid Deposition Model (RADM) mechanism in the Community Multi-scale Air Quality (CMAQ) model. The emissions of air pollutants considered in the NAQPMS model include the monthly anthropogenic, biomass burning, biogenic volatile organic compound (BVOC), marine VOC, soil NO x and lightning NO x emissions. More details of the NAQPMS configuration are introduced in Kong et al. (2021). The hourly physical modeling dataset at the high spatial (15 × 15 km) and temporal (hourly) resolutions is generated from the NAQPMS in June of 2018, which has the same spatial and temporal resolutions as the reanalysis data (CAQRA data set). We compare the VAE-GAN results with those from the NAQPMS, where the monitoring measurements are used as the reference data. Hourly monitoring of ozone data is published by the China National Environmental Monitoring Centre (CNEMC). The hourly ozone observations are collected from 1436 stations in June of 2018 (Kong et al., 2021).

Data Processing
During the training and validation processes, the input and output datasets are pre-processed by a Standard Scaler normalization method (Buitinck et al., 2013). The ozone concentration and meteorological factors are scaled to a 6 of 26 specific interval prior to training the VAE-GAN model. In this study, the description of the training and validation datasets in hourly and daily ozone forecasting is provided in Table 1.
As shown in Table 1, we select the reanalysis datasets in 2017 for model training and 2018 for validation in hourly ozone forecasting (in Section 3.1 and Section 3.2). During the validation period, the reanalysis data is used for initializing ozone forecasting. We compare the forecasting results with the reanalysis data, monitoring observations and results from the physical model-NAQPMS in the six regions of whole China (North, Northeast, Southwest, Southeast, Northwest, and Central) as well as highly populated regions (Jing-Jin-Ji (JJJ) Yangtze River Delta (YRD) and Pearl River Delta; in Section 3.1 and Section 3.2). In daily ozone forecasting, we use the reanalysis data from the year 2013-2017 for model training and the reanalysis data from the year 2018-2019 for model validation (in Section 3.3 and Section 3.4).

Dynamic Learning Algorithm: VAE-GAN
Given a large historical dataset = { − , … , } ∈ ℜ ( +1)× (where p + 1 is the historical time length and M k is the number of nodes) over the whole China area, it is challenging to efficiently predict the ozone concentration in spatial and temporal spaces. Here the VAE machine learning technique, used in a wide variety of applications (Gonzalez & Balajewicz, 2018), is introduced with the scope of dimensionality reduction. The architecture of VAE (including an encoder  and a decoder  ) in the VAE-GAN model is used to identify low-dimensional representations of ozone concentration maps as they evolve in time. In comparison to traditional reduced order models (ROMs) using proper orthogonal decomposition (POD; Xiao et al., 2019), the VAE has a nonlinear activation function for its nonlinearly weighted input, thus better representing the nonlinear physical processes (Cheng et al., 2020).
A GAN consists of two modules: a generator  and a discriminator  . The generator is designed to produce solutions from a random dataset while the discriminator is tasked to discriminate the real samples from the generated solutions. The VAE-GAN is a probabilistic autoencoder that uses the GAN framework as a variational inference algorithm (Makhzani et al., 2015). In the hybrid VAE-GAN (as shown in the middle column of Figure 1), the inputs μ and the real solutions O d (the targeted outputs) are fed into the encoder  , which transforms the high-dimensional datasets into low-dimensional representations ζ and ζ d respectively. The discriminator  then discriminates the low-dimensional representations ζ and ζ d in an efficient way. The latent states ζ d and ζ are then transformed into the reconstructed samples ̂ (the reconstructed outputs) and real solutions ̂ (the predicted outputs) respectively in the decoder  .
For the encoder, we use a deep CNN where the model inputs are the meteorological conditions (temperature T t+1 at the predictive time level) and historical ozone dataset O while the targeted outputs are the ozone concentrations O t+1 at the predictive time level. The CNNs exhibit superior performance in extracting the high nonlinearity and chaotic nature from the inputs with a two-dimensional structure (Bolton & Zanna, 2019). The encoder has four convolutional layers followed by one densely connected layer. Each hidden layer is followed by a Leaky rectified linear unit (Leaky ReLU) activation function to extract feature maps from inputs. To avoid dimensionality loss of the edges of images, the padding operation is applied in convolutional layers. After the hidden layers, the fully connected layer is applied to further reduce the dimension of features from its previous layer and find the most task-relevant features for inference. This hierarchical convolutional feature learning acts as nonlinear model reduction, which finally transfers the inputs μ and targeted output O d into the low dimensional latent states ζ  and ζ d respectively. It is shown in Table 2 that the size of datasets on ozone spatial distributions is reduced from 47,232 to 264 in the latent space, that is, a decrease of three orders of magnitude.
The discriminator is then used for distinguishing the real latent state ζ d from the generated latent state ζ at the reduced space. The latent states ζ and ζ d are processed using five hidden layers, each using a densely connected layer with a Leaky ReLU activation function. The output layer is densely connected to the final hidden layer with a sigmoid activation function (Lee & You, 2019;Mosser et al., 2017), to yield a number between 0 and 1 representing the probability that the ζ is a fake sample created from inputs μ (as opposed to a targeted output O d ).
For the decoder, it is a reverse process for reconstructing the latent states ζ and ζ d to the high-dimensional states ̂ and ̂ . The decoder has four hidden layers including one densely connected layer and three convolutional layers, each layer followed by a Leaky ReLU activation function. The final layer uses a tanh activation (Lee & You, 2019;Mosser et al., 2017) with outputs ̂ and ̂ shown in Table 2.
Now the question is how can we train a forecasting model? The VAE-GAN is intuitively similar to VAEs, with the key difference that the VAE-GAN replaces the Kullback-Leibler divergence penalty of VAEs with the adversarial loss described below (Makhzani et al., 2015). The training process involves the minimization of reconstruction loss between the reconstructed outputs ̂ and targeted ones O d , as well as the maximization of adversarial loss in distinguishing between the latent states ζ and ζ d . The reconstruction loss jointing the encoder and the decoder is expressed as: and the adversarial loss jointing the encoder, decoder and discriminator is calculated as: where θ, ψ, ϕ represent the parameters in the encoder  , discriminator  and decoder  , respectively.
Typically, the VAE-GAN uses the hybrid loss ( +  ) as the objective minimization-maximization function as follows: the model can be minimized and maximized by stochastic gradient descent with backpropagation. This means that the loss function derivation is propagated throughout the network using the chain rule to update the model parameters θ, ϕ, ψ. In this work, we choose the adaptive moment estimation (Adam) as an adaptive stochastic gradient descent (SGD) algorithm, which has been proven to be efficient for different types of deep networks (Kingma & Ba, 2014). Once the training process is completed, the model parameters (including weights and bias) are saved for real-time forecasting.

Real-Time Forecasting
The multi-step-ahead time series forecasting can be described as an estimation of future time series {O t+1 , …, O t + H }. Referring to the work of Cheng et al. (2021), the multi-step-ahead ozone forecasting process can be summarized as consisting of the following algorithmic steps: 1. The forecasting model of VAE-GAN is trained to perform a new one-step-ahead forecast ̂+ 1 during the predictive period (t + 1 ∈ (t f , t N )).

Full connected, Sigmoid
Output (1) 3. Using the one-step predicted ̂+ 1 and temperature T t+2 as new inputs, to forecast the next-step-ahead ozone concentration at predictive time level (t + 2) by the trained model. 4. Repeating steps (1-3) for H times at the multi-step forecasting strategy as shown in Figure 1. 5. Finally, the length-H timesteps of forecasts at point k (k ∈ [0, M k ]) are obtained as: .
we use the Python library Keras (Gulli & Pal, 2017) with the TensorFlow 1.1.0 (Abadi, 2016) backend for all experiments of the proposed model. The performance of the VAE-GAN model developed in this study is assessed using the root mean squared error (RMSE), coefficient of correlation R, normalized mean bias (NMB) and normalized mean error (NME) (Emery et al., 2017) which are defined as follows: where the subscript j represents the pairing of N targeted ozone concentration O d and predictions ̂ by points ([0, M k ]) and time ([t f , t N ]), and the overbars signify means over points and/or time.

Hourly Ozone Forecasting Analysis Over China in Summer and Winter
In this section, we demonstrate the trained VAE-GAN model has the capability of capturing the details of the spatial distribution of ozone concentrations over China. For this purpose, the post-processed reanalysis dataset with a high spatial resolution (15 × 15 km) is thus chosen as the 'true' reference during the validation period (see Table 1). The trained VAE-GAN is used for hourly ozone forecasting in June and December 2018. A comparison between the predicted ozone concentrations and reanalysis data is provided in Section 3.1.1, as well as the corresponding error estimation is carried out in Section 3.1.2.

Spatio-Temporal Hourly Forecasting of Ozone Concentration
The spatial distribution of hourly ozone concentrations in June and December in 2018 is shown in Figure 3. It can be seen that the VAE-GAN model captures a large part of the spatial distribution of ozone concentration at leadtimes t = 30, 60 hr in June and December, respectively. It is noted that the ozone concentration in June is higher than that in December. This can be explained by the fact that a high temperature leads to increased convection, 9 of 26 unstable atmosphere stratification, and increased precipitation, which are beneficial to the diffusion and deposition of ozone (Yan et al., 2021). By contrast, the low temperature in December causes a surface inversion, and the height of the atmospheric mixed layer is low, which is not conductive to vertical convection. Compared to the reanalysis data, in June, the predicted ozone concentration is somewhat underestimated over the central of China at different lead-times, while slightly overestimated over the North-east and South-east area close to the North Pacific Ocean.
The last column of Figure 3 shows the residual maps between the ozone distributions predicted by the VAE-GAN and reanalysis data. Table 3 indicates that the mean residual varies from −12.4 μg/m 3 (June) and −6.91 μg/m 3 (December) at the lead-time of t = 30 hr to −5.88 μg/m 3 (June) and −1.34 μg/m 3 (December) at the lead-time of t = 60 hr respectively. It is found that 10% and 90% residuals in June are higher than those in December, that is, the predicted results using the VAE-GAN have lower accuracy in June in comparison to that in December. This may be caused by the complex and high ozone heterogeneity in the spatial distribution in June. Overall, 10 of 26 the trained VAE-GAN has captured the spatial characteristics of ozone concentrations for given new inputs at a lead-time beyond the training period. Although a discrepancy between the predicted ozone concentrations and reanalysis data, the VAE-GAN is able to provide the same level of accuracy as that of the physical model-NAQPMS (see Section 3.2). To improve the predicted accuracy during the high ozone episode (≥160 μg/m 3 ), the convolutional long short-term memory (ConvLSTM) approach is introduced to the VAE-GAN model. For details, see Appendix A.

Error Analysis of Hourly Ozone Forecasting
Further error analysis of hourly ozone forecasting is carried out through the RMSE and correlation coefficient. The temporal variations of spatial-averaged RMSEs and correlation coefficient of hourly predictive results from the VAE-GAN are shown in Figure 4. It can be noticed that the spatial-averaged RMSE values in June and December in 2018 are generally around 22 μg/m 3 and 13 μg/m 3 during the entire forecasting period. The corresponding correlation coefficient values are shown in Figure 4b. We can see that the ozone concentration predicted by the VAE-GAN model shows a good correlation with the reanalysis data over a long lead-time. Most of correlation  coefficients are around or beyond 0.6 during the forecasting period in June. In addition, it is worth noting that the magnitude of correlation coefficients in December is close to 0.9. This indicates that the VAE-GAN performs better in hourly ozone forecasting in December. Figure 5 illustrates the spatial distribution of temporal-averaged RMSEs. It can be noted that most of the temporal-averaged RMSEs are below 20 μg/m 3 in June and 16 μg/m 3 in December, except for the North-east areas where the largest errors exist due to complex factors including human activities, meteorology and industries influencing the ozone concentration. From the cumulative distribution of temporal-averaged RMSEs in both months (Figure 5c), it is shown that the temporal-averaged RMSEs of 80% points are smaller than 25 μg/m 3 . In addition, it is clear that the mean values of temporal-averaged RMSEs are 17.23 μg/m 3 and 9.88 μg/m 3 for June and December respectively.

Comparison of Hourly Ozone Forecasting Between VAE-GAN and the Physical Model-NAQPMS
In this section, to further demonstrate the forecasting performance of the VAE-GAN, the predictive accuracy of the VAE-GAN is evaluated with an existing air quality physical model-NAQPMS. The comparison of ozone results obtained from the VAE-GAN and NAQPMS has been undertaken through (a) the spatial distribution and temporal variation of ozone concentrations over China (Section 3.2.1), where the reanalysis data is considered as the 'true' reference; and (b) ozone concentration at monitoring stations (Section 3.2.2), where the monitoring measurement is used as the 'true' reference.

Spatio-Temporal Distribution of Ozone Forecasting Compared to Physical Modeling
The spatial distribution of hourly ozone forecasting results obtained from the VAE-GAN and physical model-NAQPMS at lead-times t = 40, 60 hr is plotted in Figure 6. We can see that the VAE-GAN and physical model-NAQPMS exhibit almost the same level of acceptable performance for hourly ozone forecasting. Compared with the physical model-NAQPMS, the VAE-GAN can better capture the spatial characteristics of high ozone concentrations in densely populated areas in China.
From Figure 6, we note that the spatio-temporal distributions of ozone concentrations have a high spatial heterogeneity across China. For further analysis of temporal variation, the whole country is thus divided into six  Figure 7). Figure 7 shows the spatial-averaged RMSEs of predicted results in six regions during the predictive period of June in 2018. It is noted that the predicted results from both the physical model and VAE-GAN have a low value of RMSE (below 30 μg/m 3 ) in the NW region. The highest RMSE of predicted ozone concentration occurs in the NCP region (Figure 7c), which is consistent with the findings in Zhan et al. (2018). The spatial heterogeneity of ozone concentrations in summer is associated with the spatial pattern of humidity conditions. The drier condition in the NCP region leads to a high-level accumulation of ambient ozone. The predicted ozone concentrations from both the physical model-NAQPMS and VAE-GAN are underestimated in the NCP, thus resulting in the highest RMSE. However, the VAE-GAN model generally performs better than the physical model-NAQPMS in the NCP region, which can be seen in Figure 6. In other regions (including NE, Central, SE and SW regions), the performance of the VAE-GAN model is comparable with the physical model-NAQPMS.

VAE-GAN-Based Ozone Forecasting Compared to Physical Modeling and Observations at Monitoring Stations
The performance of the VAE-GAN in spatio-temporal forecasting is further evaluated by comparing the predicted ozone concentrations with those from the physical model-NAQPMS and the observations at monitoring stations.
Here we choose three study regions in China: JJJ, YRD and Pearl River Delta region (PRD; shown in Figure 8), which are densely populated regions in China. Figure 8 shows the time series of hourly predicted and monitoring ozone concentrations during 22-30 June of 2018 in the major cities located in the three regions. It is noted that the ozone concentration is gradually increasing in the morning and reaches a peak at noon, and has a low value during the night time. This can be explained as follows. The formation of ozone is involved in the processes of photochemical reactions, advection, entrainment, and deposition. The entrainment of ozone into the atmospheric boundary layer greatly contributes to the increase of ozone concentration at day time (Freire et al., 2017), while We can see that the predicted ozone by the VAE-GAN model is largely in good agreement with observations during the predictive period (22-30 June in 2018). It is noticed the VAE-GAN performs better than the physical model-NAQPMS in Shanghai, Nanjing, Hangzhou, Hefei in the YRD region. The discrepancy between the VAE-GAN predicted results and observations occurs when the peak values of ozone drastically increase at day time due to large uncertainties in traffic, emission inventory and meteorological conditions. To better capture the peak, the VAE-GAN-LSTM model is introduced in Appendix A. In Figure A1, the predicted ozone shows a reduction of underestimation in the ozone peaks compared to that using the VAE-GAN model, and exhibits a good match with the observations in cities with high ozone concentrations, such as Beijing, Tianjin, Shijiazhuang, Nanjing and Hefei.
The performance of the machine learning-based modeling and the physical model has further been evaluated using the statistical error indicators in Equation 5 (Emery et al., 2017) and the corresponding results are shown in Table 4. The ozone concentrations predicted by the VAE-based models mostly have a smaller value of RMSE,  Kong et al. (2021)). The reanalysis data is used as 'true' reference.
14 of 26 NMB and NME than that from the physical model-NAQPMS in most of regions in China. The NMB and NME values of predicted results in the JJJ and YRD regions largely lie within ±22% and 25% respectively for the VAE-GAN and VAE-GAN-LSTM models, while those are beyond ±35% (NMB) and 35% (NME) for the physical model-NAQPMS. The performance of the VAE-GAN model is thus comparable or better than that of the physical model-NAQPMS in the JJJ and YRD regions. In the PRD region, the zone concentration mostly has a low value of <100 μg/m 3 compared to that in the JJJ and YRD regions. The variation of ozone in the PRD region is linked to summer monsoon (Shao et al., 2009) which has not currently been considered in the machine learning-based models, but in physical modeling. This can explain why the performance of the physical model is better than the VAN-based models in the PRD region.
Generally speaking, the VAE-GAN model developed in this study is relatively reliable when used for ozone forecasting during a long lead-time. Our results suggest that the well-trained VAE-GAN has captured the spatio-temporal characteristics of the ozone concentration variation. In comparison to the physical model-NAQPMS, the VAE-GAN model can provide a high efficient ozone forecast while the same level of accuracy is achieved. This highlights the potential of the VAE-GAN model to be applied in conjunction with many real-time applications, for example, early warning for air pollution emergencies.

Daily Ozone Forecasting Analysis Over China
In    Figure 9 shows the temporal variation of spatial-average daily ozone concentrations over the JJJ, YRD, and PRD regions during the 2018 predictive period. As shown in Figure 9, it can be noticed that the predicted ozone concentration exhibits good agreement with the reanalysis data in the JJJ and YPR regions. In the PRD region, the VAE-GAN model captures the daily variability, but overestimates the ozone in the summer. This could be explained by the responses between the meteorological conditions and ozone. In the regions of JJJ and YRD, the daily variation of ozone concentration shows the seasonal variation (an inverse U-shaped trend, which is highly related to the temperature; Yang et al., 2020), where the highest and lowest ozone concentrations occur in Summer and Winter respectively. However, the trend of the ozone seasonal variation is weak in South China, including the PRD region. This is due to the warmer temperatures throughout the year and the Asian Summer monsoon, which brings cloudy and raining weather, marine air and strong deep convection and other unfavorable factors for ozone formation and accumulation (Lu et al., 2018;Ma & Yin, 2021;Yang et al., 2020). It is a challenge in existing air quality models when taking into account these complex meteorological factors. This can be seen in Table 4 that a poor correlation-prevails (average 40%) between the predicted results and monitoring data in the PRD exists for both the proposed VAE-GAN and physical model-NAQPMS models. Coupling of the complex meteorological conditions (e.g., monsoon) into the VAE-GAN training process will be explored to improve predictive accuracy in future studies. Figure 10 shows the spatial distribution of temporal-average daily ozone concentrations over the predictive period (the whole year of 2018). The predicted results from the VAE-GAN are generally comparable to the reanalysis data in JJJ, YRD and PRD regions. It can be seen that the VAE-GAN model captures a large part of the spatial distribution of ozone concentrations. To further evaluate the VAE-GAN results, the relative errors of the predicted results over the three regions are presented in Figure 11. Reasonable predictive accuracy is largely achieved with a low relative error of around ±10%. Overall, the VAE-GAN model skilfully captures the spatial and temporal variation of ozone concentration during the predictive period.

Impact of Meteorological Factors
In our study cases described above, the temperature is considered as one of model inputs since it is the top ozone-related meteorological condition. As known, other meteorological factors (relative humidity and wind speed) are also important in ozone formation and accumulation. For example, low humidity causes a reduction of cloud cover which contributes to photochemical reaction . A high wind speed has a negative impact on ozone concentration by transporting and cleaning air pollutants out of the study areas (Z. B. . In this section, two deep learning models (

of 26
The spatial-averaged RMSEs of predicted ozone concentrations are plotted in Figure 12a while the predicted daily ozone concentrations are shown in Figure 12b. It is shown in Figure 12b that without the introduction of the meteorological factors, both of the VAE-GAN and DCGAN models fail to capture the temporal variation of daily ozone concentrations from the year 2018 and 2019.
In DCGAN modeling, it is clearly seen in Figure 12a that the more meteorological factors are included in the model inputs, the smaller the spatial-averaged RMSEs of daily ozone results. The DCGAN reaches its best performance when the impact of temperature, relative humidity and wind speed is considered.
In VAE-GAN modeling, the introduction of temperature plays a crucial role in improving the accuracy of ozone forecasting. The VAE-GAN with the consideration of temperature performs best among all the study cases. The VAE-GAN achieves the lowest RMSE value of 23 μg/m 3 in daily ozone concentration prediction in the JJJ region from the year 2018-2019. In Figure 12b, it can be noted that by comparing with the reanalysis data, the VAE-GAN model can better represent the peaks of ozone concentrations than the DCGAN model. However, it is also noticed that in contrast to the DCGAN model, the RMSE of predicted results is decreased when more meteorological factors (relative humidity and wind speed) are considered in the model inputs. This is due to the limitation of the VAE-GAN model. For multiple variable inputs, the quality of generated outputs suffers from the generator (decoder  ) attempting to cover all data manifolds in the data space, thus leading to a poor model performance (Pandeva & Schubert, 2019). Coupling of multiple meteorological inputs with the VAE-GAN will be investigated in our future work.
Overall, the temperature or other meteorological factors play a dominant role in improving the accuracy of ozone forecasting, for example, reducing uncertainties and accumulative error with every step of ozone forecasting.

Conclusions
In this study, a machine learning-based forecasting framework is implemented for efficient and accurate real-time (hourly and daily) ozone forecasting in China. A hybrid deep learning VAE-GAN approach is proposed to better explore the spatial and temporal features of ozone concentrations, where GAN has the capability of learning hierarchical feature representations of ozone concentrations while VAE is used for dimensionality reduction and avoiding model collapse. The hybrid VAE-GAN model consists of both adversarial training and real-time forecasting processes, which narrow the gaps between the generative and targeted ozone fields from the past to future states and provide accurate ozone forecasts for a long lead-time beyond the training period in a recursive way.
In our hybrid VAE-GAN ozone modeling, the training and validation datasets consist of hourly reanalysis datasets at a high spatial resolution of 15 × 15 km across China from 2013 to 2019. By using VAE, the high-dimensional size of input datasets on ozone and temperature spatial distributions is reduced by three orders of magnitude. It is evident that the VAE-GAN is an efficient tool for large data-driven predictions. Our results illustrate that the VAE-GAN model has captured the spatial and temporal evolution patterns of ozone concentrations during the predictive period of 2018 and 2019 in comparison to the reanalysis data, observations and physical model-NAQPMS. The VAE-GAN model can provide a high efficient ozone forecast while the same level of accuracy is achieved compared to the physical model-NAQPMS. However, it is also noticed that a low predictive accuracy of ozone concentrations occurs in some local regions where the daily variation of ozone concentrations exhibits volatility. In future studies, it will be necessary to develop a more comprehensive forecasting scheme to improve the predictive accuracy by combining the VAE-GAN with data assimilation techniques for uncertainty reduction.

Appendix A: Model Comparison
In the section, the performance of our VAE-GAN model is evaluated against a VAE-GAN-LSTM model, which introduces the convolutional long short-term memory (ConvLSTM) layer (Xingjian et al., 2015) into the encoder and decoder of the VAE-GAN. The comparative plots of hourly ozone forecasting results obtained from the VAE-GAN and VAE-GAN-LSTM models during the predictive period of June in 2018 are shown in Figure A1. It can be observed in Figure A1(a) that a relatively good agreement between the observed and forecasting hourly ozone concentration is achieved, while the arrival time of forecasting ozone peaks is slightly delayed. We can see that during the validation period of days 22-26 in June 2018, the VAE-GAN-LSTM model can better predict the ozone peaks than the VAE-GAN model, where the reanalysis data is considered as the reference. However, the VAE-GAN-LSTM model fails to capture the low ozone concentration at night, thus having higher the RMSEs than that of the VAE-GAN model in Figure A1(b). This situation also reflects on the spatial distribution of ozone forecasts as shown in Figure A2. Generally speaking, the use of LSTM in the VAE-GAN can well capture the high ozone concentration although the RMSE of results from the VAE-GAN-LSTM is slightly higher than that of the VAE-GAN. It is thus suggested that the VAE-GAN-LSTM model has the potential for hourly high ozone forecasting in China. NO 2 is the primary element involved in the process of forming ozone. When sunlight interacts with NO 2 , the gas molecules break down and release monoatomic oxygen (O) atoms which then react to diatomic oxygen (O 2 ) molecules and form the trioxygen molecule known as ozone (Ezimand & Kakroodi, 2019). Here the impacts of the precursors (here, NO 2 ) on ozone concentrations have been explored using the proposed VAE-GAN model.

B1. Hourly Ozone Forecasting Using NO 2 From Reanalysis Datasets
In real-time ozone forecasting in realistic scenarios, both NO 2 and ozone are the unknown variables (outputs) to be predicted/solved using either the air quality model or the AI-based model (here VAE-GAN). Thus it is not suitable to consider NO 2 as the model input in real-time ozone forecasting. Even so, it is worth exploring the relationship between NO 2 and ozone using the VAE-GAN model based on history ozone and NO 2 datasets, where the NOx concentration is the model input while the ozone concentration is the model output.
As introduced in Section 2.1.1, the hourly NO 2 and ozone datasets are collected from the reanalysis CAQRA datasets. We used the hourly datasets in 2017 for training, and 2018 for validation. The spatial performance of the VAE-GAN model for hourly ozone forecasting in June and December of 2018 is shown in Figure B1. It can be seen that the VAE-GAN model captures a large part of the spatial distribution of ozone concentration at lead-times t = 30, 60hr in December. The last row of Figure B1 shows the residual maps between ozone maps predicted by the VAE-GAN and reanalysis data. In June, the predicted ozone concentration is slightly overestimated over the south of China at different lead-times, while slightly underestimated over the NCP region.
The RMSE and correlation coefficient of predictive results from the VAE-GAN are shown in Figure B2. It can be noticed that the RMSE values in June and December in 2018 are generally below 17 μg/m 3 during the whole  Figure B2(b). We note that in particular, models which accurately capture the regime behavior will also show good correlation statistics when averaged over a long lead-time. Most of the correlation coefficients are around or beyond 0.8 during the forecasting period. It is worth noting that the magnitude of the correlation coefficient in December is close to 0.9. This means the VAE-GAN performs better in hourly ozone forecasting in December.

B2. Impact of NO 2 Emissions on Ozone Forecasting
In realistic scenarios, the local NOx emission from massive anthropogenic emissions, industries, or transportation is another factor impacting ozone forecasting. In this work, the NOx and VOC emissions were considered as inputs, but removed during the selection of driving variables. There are two reasons for this. First, the available NOx emission data (source) are averaged monthly or daily. The daily variation of NOx emission remains the same during the same month. The NOx emissions are constant at a daily temporal scale. Thus, the hourly temporal variation of ozone concentrations will not be affected by a constant NOx emission during the training process. However, the emission of NOx varies over space, for example, a high emission in densely populated cities. The impact of the spatial variation of NOx emission is implicitly considered during the training process. The reason for this is that the training datasets (the reanalysis datasets) combine the physical modeling results, where NOx and VOC emissions have been considered in NAQPMS model (Kong et al., 2021) and observations. We speculate the predictive results can reflect the contribution of other driven factors (including NOx and VOC) on ozone concentration. To validate this hypothesis, we have undertaken one modeling experiment, where the average temperature (27°C ± 3°C) in June is used as inputs everywhere over China. Can we expect the VAE-GAN models are able to predict the spatial distribution of ozone concentrations over China? The right column in Figure B3 shows the ozone concentration variation where the input temperature is around 300K (27°C ± 3°C). We can see that the predictive ozone varies in space with the same temperature as model inputs. It is noted that the ozone variation over the regions can match well with the NO 2 emission (right column in Figure B3) in June, a high ozone concentration in the densely populated cities, mainly in the east of China. Therefore, the predictive ozone Second, in ozone forecasting, the importance of NOx and NMVOC emissions is lower than expected although the ozone formation is associated with NOx emissions. Zhan et al. (2018) pointed out that this may be due to regional transport or titration effects. NOx and NMVOC may be transported from urban to rural areas via advection, causing the location of ozone formation to occur some distance away from the emission source. NOx titration tends to counteract the importance of NOx emissions to ozone levels. In the work of , they also found that ozone precursors such as NO 2 and satellite-based formaldehyde (HCHO) observations were found to play relatively weak roles in modulating ozone variations, though these factors were crucial to the ozone formation. In addition, Zhan et al. (2018) found that the inclusion of NOx and NMVOC into the machine learning model can introduce a noise which tends to be higher than the gained information, which is consistent with what occurred in our study.
Finally, we would like to further discuss the importance of the choice of driving variables in ozone forecasting. For machine learning-based ozone forecasting at stations, the model performance may be improved with increasing inputs (Su et al., 2020;Yafouz et al., 2021). In contrast, for spatio-temporal ozone forecasting, with the increase of the number of driving factors, the number of modeling parameters to be optimially estimated will significantly increase, especially on a large spatial scale. It would thus compromise all the drivers during the training process, thus weakening the relationship between the most important driver (e.g., temperature) and ozone concentrations. It is therefore suggested that only key-driven variables should be chosen for model prediction.