Enhancing Quantitative Precipitation Estimation of NWP Model With Fundamental Meteorological Variables and Transformer Based Deep Learning Model

Quantitative precipitation forecasting in numerical weather prediction (NWP) models is contingent upon physical parameterization schemes. However, uncertainties abound due to limited knowledge of the precipitating processes, leading to degraded forecasting skills. In light of this, our study explores the application of a Swin‐Transformer based deep learning (DL) model as a supplementary tool for enhancing the mapping trajectory between the NWP fundamental variables and the most downstream variable, precipitation. Constrained by the observational satellite precipitation product from the NOAA CPC Morphing Technique (CMORPH), the DL model serves as a post‐processing tool that resolves precipitation patterns better than the NWP estimation alone. Compared to the baseline Weather Research and Forecasting simulation, the DL post‐processing effectively extracts features from meteorological variables, leading to improved precipitation skill scores of 21.7%, 60.5%, and 45.5% for light rain, moderate rain, and heavy rain, respectively, on an hourly basis. We also evaluate two case studies under different synoptic driving conditions and show promising results in estimating heavy precipitation during strong convective precipitation events. Overall, the proposed DL model can provide a vital reference for capturing precipitation‐triggering mechanisms and enhancing precipitation forecasting skills. Additionally, we discuss the sensitivities of the fundamental meteorological variables used in this study, training strategies, and performance limitations.


Introduction
Accurate quantitative precipitation forecasting is essential for planning and helps minimize loss of life and property damage ahead of extreme events, especially under the current rapidly changing climate. Numerical weather prediction (NWP) models have been playing an increasingly important role in operational centers and academia for understanding the Earth system. They rely on discretizing a full set of governing equations, including the Navier-Stokes equations, the ideal gas law, and thermodynamics, and solving them numerically (Kalnay, 2003). By computing these prognostic equations over physical grids across different scales, the spatiotemporal evolution of meteorological variables such as temperature, wind speed and direction, air pressure, and density is represented in rotating Earth coordinates. Building upon growing scientific knowledge of fundamental physics, and accelerated by advances in technology such as computational power and numerous sources of observational data, NWP has undergone something of a revolution over the past decades (Bauer et al., 2015).
As a result, forecasts of fundamental meteorological variables at the so-called resolved scales of motion are readily available and increasingly reliable. However, many processes at the unresolved scales of motion also enter the equations, such as moist processes involving condensation and evaporation, turbulence, convective activity, and cloud microphysics. These processes are tightly related to precipitation formation and need to be parameterized to describe their relations with the states at resolved scales (Bauer et al., 2015). Parameterization schemes are generally based on simplifications and approximations of the physical laws to facilitate numerical solutions, hence the choice of schemes and the associated sensitivity tests considerably affect precipitation forecasting skill. Moreover, given an insufficient understanding of the underlying physics and some inherent uncertainties, parameterization schemes intrinsically bottleneck further improvements in quantitative precipitation estimation (Zhou et al., 2022).
With the blossoming of artificial intelligence, many researchers have demonstrated the great ability of deep learning (DL) models in handling geoscience and remote sensing tasks, including precipitation estimation and forecasting. Shi et al. proposed and evaluated a series of spatial-temporal models for precipitation nowcasting by extrapolating radar echoes and achieved better performance than the traditional optical flow method (Shi et al., 2015, 2017). Ravuri et al. (2021) proposed a generative model with a stochastic method to extend the nowcasting lead time without resorting to blurring. Sønderby et al. (2020) constructed a DL predictive model that uses satellite, radar, and precipitation data, achieves a forecast lead time of 8 hr at high spatiotemporal resolution, and outperforms the High-Resolution Rapid Refresh in terms of accuracy. Beyond precipitation nowcasting, machine learning tools are also commonly applied to satellite images for precipitation estimation. Tao et al. (2017) proposed a DL model to extract features from bispectral satellite infrared (IR) and water vapor (WV) channels for detecting rain areas. Chen et al. (2019) proposed a two-stage hybrid neural network to estimate precipitation using ground-based radar and satellite observations. Wang et al. (2021) proposed a transfer learning based method, which pre-trains the model on the data-rich Continental US (CONUS) IR data set from the Geostationary Operational Environmental Satellite and then transfers it to China through re-training with multi-band IR signals from the Chinese Fengyun (FY) satellite. Gao et al. (2022) used a U-Net model combined with the attention mechanism to directly retrieve precipitation maps from multi-band FY satellite images at a near real-time scale.
Many recent studies attempt to use data-driven models to directly perform NWP tasks, motivated by their computational efficiency compared to state-of-the-art NWP models. These data-driven models are generally trained on climate model outputs, general circulation models (Chattopadhyay et al., 2020; Scher & Messori, 2019), or reanalysis products such as the ECMWF Reanalysis v5 (ERA5) data set (Rasp & Thuerey, 2021; Rasp et al., 2020). Dueben and Bauer (2018) presented a "toy model" to identify challenges and fundamental design choices for DL based forecasting systems. Arcomano et al. (2020)
Another prominent application of machine learning and DL techniques for NWP tasks is post-processing and bias correction. Grönquist et al. (2021) applied a convolutional neural network for bias correction of the ensemble NWP predicted temperature field at various pressure levels, and achieved a 14% improvement in ensemble forecast skill (CRPS) with a considerable reduction of computational cost owing to the reduced number of trajectories. Taillardat and Mestre (2020), Li et al. (2022), and Hess and Boers (2022) used machine learning and DL frameworks for post-processing quantitative precipitation forecasts from ensemble NWP models and achieved promising results in estimating heavy rainfall events located in the long tail of the distribution.
Although many machine learning methods have achieved remarkable results for the nowcasting tasks and the forecasting of basic meteorological variables mentioned above, directly mapping basic meteorological variables to precipitation amounts using a machine learning model instead of parameterization schemes has rarely been explored. Therefore, in this study, we aim to develop a DL method for extracting rainfall features from basic meteorological variables, including temperature, WV, and atmospheric motion, simulated by the Weather Research and Forecasting (WRF) model at 27-km resolution. The basic variables were fed to an attention mechanism based Shifted Window Vision Transformer (Swin Transformer) neural network (Dosovitskiy et al., 2021; Z. Liu et al., 2021), with the target of reproducing the high-resolution satellite rainfall product, the Climate Prediction Center morphing method (CMORPH) data (Xie et al., 2019). This DL method circumvents the uncertainties of physical parameterization schemes that stem from incompletely understood physical processes, and captures the nonlinear relationship between the predictors and labels.

Methodology
We consider the task of quantitative precipitation estimation using a DL model as an optimization problem, which can be formulated as follows:

$$\Theta = \arg\min_{\theta} L\big(\Psi(X; \theta),\, y\big) \tag{1}$$

In this formulation, the model takes pairs of input consisting of basic atmospheric variables X and precipitation observational data y. The mapping function Ψ is used to relate these inputs, and it has trainable parameters θ. The goal is to find the optimal parameters Θ by minimizing a set of loss functions L using optimization algorithms.

Training Predictors
To generate a data set for high-resolution precipitation maps, we conducted a long-term dynamical simulation of 5 years (2017-2021) over the wettest season (from the start of June to the end of September of each year) of southeast China, as shown in the left panel of Figure 1. The first 3 years, 2017 to 2019, are used for training the DL model, while the last 2 years, 2020 and 2021, are used for validating and testing the model performance. The simulation was performed using the WRF model driven by the National Centers for Environmental Prediction (NCEP) Final Operational Global Analysis data (FNL) (National Centers for Environmental Prediction, National Weather Service, NOAA, U.S. Department of Commerce, 2000). The raw resolution of the NCEP FNL data used to drive the WRF simulation is 0.25° × 0.25°, and we used the meteorological fields from the WRF domain 1 simulation with 27 km resolution as predictors. A cycling run strategy is adopted to perform quasi-real-time forecasting in our operational system. The actual lead time selected for this study is from 24 to 96 hr. We state the lead time as 0-72 hr for simplicity, as the first 24 hr are used for model spin-up and are discarded by default. For cumulus parameterization, we adopted the scale-aware Grell-Freitas scheme (Grell & Freitas, 2014). This choice was informed by the fact that our study domain remained relatively large, making it cost-prohibitive to perform high-resolution (>5 km) NWP simulations. The Grell-Freitas scheme has demonstrated skill in simulating large-scale precipitation and provides valuable background fields. Furthermore, we selected the WSM-3 microphysics parameterization due to its simple representation of hydrometeors. This choice aligns well with the requirements of our data-driven study, as WSM-3 strikes a balance between resource constraints and reasonable accuracy. More details on the WRF simulation configuration can be found in Table S1 of Supporting Information S1.
For the numerical simulation, we used a vertical grid with 38 levels to accurately represent the atmospheric system. We selected four model layers of three-dimensional (3D) basic variables, including wind velocity (U, V, W), pressure (P), temperature (T), geopotential height (z), and humidity (Q), and stacked them with the corresponding variable at the surface level. We also included a two-dimensional (2D) diagnostic variable, total precipitable water, in our analysis. The vertical wind speed (W) at a height of 10 m was not available in our model output, so the number of layers for this variable is limited to four. These variables were combined to form a 34-layer feature map, which we will refer to as 34 channels, as shown in the right panel of Figure 1.
In atmospheric modeling, it is common practice to use pressure levels due to the decrease in pressure with height in the atmosphere. However, interpolating values from the model layers to the pressure levels can sometimes result in missing values of the basic variables due to varying terrain heights. To avoid this issue and minimize the memory requirements for training a DL model, we chose to directly use the values from the model layers rather than interpolating to pressure levels. This allowed us to accurately represent the atmospheric system while minimizing the computational resources needed for the simulation.

Observational Precipitation Data
To obtain observational precipitation data as the reference for ground truth, we used CMORPH (NOAA CPC Morphing Technique), a high-resolution global satellite precipitation product. The data is created by combining passive microwave and infrared radiance measurements from multiple satellite instruments and adjusted using daily rain gauge analysis. The full-resolution CMORPH data used in this study has a high spatial resolution of 8 km and a frequency of 30 min. To align it with the meteorological data from the model used in this study, the CMORPH data was resampled to an hourly frequency, and its pixel values were matched to the corresponding grid points in the model.

Network Architecture
The architecture of the proposed model, shown in Figure 2a, is based on the classical encoder-decoder framework, which has been successfully applied to many semantic segmentation tasks in the computer vision field. The model is inspired by the original UNet model (Ronneberger et al., 2015). The gridded meteorological data generated by the WRF model are first divided into 4 × 4 non-overlapping patches by using a 2D convolutional layer with a stride and kernel size equal to the patch size. The patches are then transformed into sequence embeddings and fed into the encoder.
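The patch partition step can be illustrated with a minimal sketch in plain Python. A convolution whose stride and kernel size both equal the patch size is equivalent to flattening each non-overlapping patch and applying one linear projection to it. The function names, the toy 2-channel 8 × 8 grid, the constant weights, and the 3-dimensional embedding are illustrative assumptions, not the configuration used in the study.

```python
def partition_patches(field, patch=4):
    """Split a [channels][height][width] nested list into non-overlapping patches.

    Returns tokens in row-major patch order; each token is the flattened
    patch of length channels * patch * patch.
    """
    channels = len(field)
    height, width = len(field[0]), len(field[0][0])
    if height % patch != 0 or width % patch != 0:
        raise ValueError("grid must tile evenly into patches")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [field[c][top + i][left + j]
                     for c in range(channels)
                     for i in range(patch)
                     for j in range(patch)]
            tokens.append(token)
    return tokens


def project(tokens, weights):
    """Linear projection of each flattened patch (one output per weight row)."""
    return [[sum(w * x for w, x in zip(row, token)) for row in weights]
            for token in tokens]


# Toy input: 2 channels on an 8 x 8 grid (hypothetical; the paper stacks 34 channels).
toy = [[[c * 100 + r * 8 + col for col in range(8)] for r in range(8)]
       for c in range(2)]
tokens = partition_patches(toy)

# Hypothetical constant weights projecting each patch to a 3-dimensional embedding.
weights = [[0.01] * len(tokens[0]) for _ in range(3)]
embeddings = project(tokens, weights)
```

In a DL framework the same operation would normally be a single strided 2D convolution with learned weights; the explicit loops here only make the patch geometry visible.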
In the encoder, we replace the CNN backbone network in the original UNet model with the Swin-transformer (Z. Liu et al., 2021). Each encoder block consists of a patch merging layer and four Swin-transformer blocks. The patch merging layer performs downsampling, similar to the pooling operation in CNN-based models, while the Swin-transformer block extracts features, similar to a convolutional operation. The input data passed through the patch partition layer has a size of H/4 × W/4 with C embedding channels, where H and W are the height and width of the input data. With each pass through an encoder block, the height and width are halved and the number of channels doubles.
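The shape bookkeeping described above can be traced with a short sketch. The input size of 256 × 256, the embedding dimension of 96, and the three encoder stages are hypothetical values chosen for illustration; the actual settings are given in the Supporting Information.

```python
def encoder_shapes(height, width, embed_dim, num_stages, patch_size=4):
    """Return the (H, W, C) feature-map shape after patch partition and each encoder stage."""
    h, w, c = height // patch_size, width // patch_size, embed_dim
    shapes = [(h, w, c)]
    for _ in range(num_stages):
        # Patch merging halves the spatial size and doubles the channel count.
        h, w, c = h // 2, w // 2, c * 2
        shapes.append((h, w, c))
    return shapes


# Hypothetical configuration, for illustration only.
shapes = encoder_shapes(height=256, width=256, embed_dim=96, num_stages=3)
for stage, shape in enumerate(shapes):
    print(stage, shape)
```

With these assumed values the feature map goes from 64 × 64 × 96 after patch partition to 8 × 8 × 768 after the third stage.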
In the decoder block, the patch-expanding layer performs upsampling using bilinear interpolation to restore the feature map toward its original resolution, doubling the size of the feature map and halving the feature dimension. To recover the information lost during downsampling in the encoder, the expanded feature maps are fused with the downsampled features through a skip connection structure. This allows the Swin-transformer blocks in the decoder to receive inputs with the same size as the corresponding level of the encoder but with features crossing multiple dimensions.
The bottom levels of the encoder and decoder are connected by a bottleneck, which has the same structure as the encoder but with only two Swin-transformer blocks. This hierarchical architecture design is rooted in the principle of enhancing the network's ability to learn features at multiple scales, which is particularly important for meteorological data due to its multi-scale nature. The computation route of the Swin-transformer block, as depicted in Figure 2b, is designed to reduce the computational cost compared to traditional multi-head self-attention (MSA) modules through a shifted window mechanism. The Swin-transformer structure (Z. Liu et al., 2021) accomplishes this reduction by composing a Swin-transformer block of two consecutive attention modules. Each attention module is composed of two LayerNorm (LN) layers, an MSA module, a residual connection shortcut, and a 2-layer multilayer perceptron (MLP) with Gaussian Error Linear Unit nonlinearity. More detailed parameter settings of the Swin-Transformer block can be found in Table S2 of Supporting Information S1.

Earth and Space Science
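Before the MSA module and the MLP module, a LayerNorm (LN) is applied, and the two attention modules differ in the type of MSA employed. The first MSA layer uses a regular Window-Based MSA (W-MSA), while the second MSA layer adopts a Shifted Window-Based MSA (SW-MSA) module. The calculation of the Swin-transformer block is described by the following equations:

$$\hat{z}^{l} = \text{W-MSA}\big(\text{LN}(z^{l-1})\big) + z^{l-1} \tag{2}$$

$$z^{l} = \text{MLP}\big(\text{LN}(\hat{z}^{l})\big) + \hat{z}^{l} \tag{3}$$

$$\hat{z}^{l+1} = \text{SW-MSA}\big(\text{LN}(z^{l})\big) + z^{l} \tag{4}$$

$$z^{l+1} = \text{MLP}\big(\text{LN}(\hat{z}^{l+1})\big) + \hat{z}^{l+1} \tag{5}$$

where $\hat{z}^{l}$ is the output features of the (S)W-MSA module, $z^{l}$ is the output features of the MLP module, and $l$ is the index of the block.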
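To make the difference between the regular and shifted window partitions concrete, the sketch below assigns each grid cell to a window, with and without a cyclic shift of half the window size. The 8 × 8 map and the window size of 4 are hypothetical, and the sketch leaves out the attention itself as well as the attention masks that the Swin Transformer applies to tokens wrapped around by the cyclic shift.

```python
def window_index(row, col, height, width, window, shift=0):
    """Window that grid cell (row, col) falls into after a cyclic shift.

    shift=0 gives the regular partition (W-MSA); shift=window // 2 gives the
    shifted partition (SW-MSA), modeled here as a cyclic roll of the map.
    """
    r = (row - shift) % height
    c = (col - shift) % width
    return (r // window, c // window)


H = W = 8  # hypothetical feature-map size
M = 4      # hypothetical window size

regular = {(i, j): window_index(i, j, H, W, M)
           for i in range(H) for j in range(W)}
shifted = {(i, j): window_index(i, j, H, W, M, shift=M // 2)
           for i in range(H) for j in range(W)}

# Cells (3, 3) and (4, 4) sit on opposite sides of a regular window boundary...
separated = regular[(3, 3)] != regular[(4, 4)]
# ...but share a window once the partition is shifted by M // 2.
joined = shifted[(3, 3)] == shifted[(4, 4)]
```

Alternating the two partitions is what lets information cross window boundaries while attention is still computed only within local windows.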

Loss Functions
In meteorology, most variables, such as temperature and wind speed, are represented as continuous values over the model grids. These variables are usually smooth and evenly distributed unless there are significant changes in terrain or extreme weather conditions. Precipitation, however, is often underestimated during extreme events due to its imbalanced distribution, making accurate prediction a challenging problem from an engineering perspective. To address this issue, we propose the use of the Tversky loss function (Salehi et al., 2017) in our model. The Tversky loss function is defined as:

$$L_{\text{Tversky}} = 1 - \frac{\sum_i p_i g_i}{\sum_i p_i g_i + \alpha \sum_i p_i (1 - g_i) + \beta \sum_i (1 - p_i)\, g_i} \tag{6}$$

where $p_i$ and $g_i$ are the predicted and ground truth values at pixel $i$. They are first treated with a log-sigmoid operation to scale the range to [0, 1], so that the True Positive (TP), False Positive (FP), and False Negative (FN) terms, which correspond to the three sums in the denominator, can be calculated respectively. The tunable parameters α and β are the weighting factors for the FP and FN terms. The loss can be viewed as a generalization of the Dice similarity coefficient, which is widely used in image segmentation tasks. The advantage of the Tversky loss function is that it provides better control over the tradeoff between precision and recall by allowing the FP and FN weighting factors to be adjusted. This is particularly useful in our situation, where the precipitation data is extremely imbalanced, with the background pixels represented by 0 indicating no rainfall. To encourage the model to better predict extreme precipitation, the Tversky loss function is adjusted to assign higher penalties to FN predictions. This is achieved by setting β to a higher value than α in our experiments, making the model more inclined to accurately predict extreme precipitation values. In addition to the Tversky loss function, we also use the mean squared error (MSE) loss function, which is commonly used for regression problems, as our precipitation prediction task can also be viewed as a pixel-wise regression problem. The combined loss function can be expressed as:

$$L(y, \hat{y}) = 0.5\, L_{\text{Tversky}}(y, \hat{y}) + 0.5\, L_{\text{MSE}}(y, \hat{y}) \tag{7}$$
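The effect of choosing β larger than α can be checked with a small numerical sketch. It assumes the common form of the Tversky loss (one minus the Tversky index) on values already scaled to [0, 1]; the weights α = 0.3 and β = 0.7 and the small stabilizing constant are hypothetical, since the text only states that β is set higher than α.

```python
def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-7):
    """One minus the Tversky index for soft masks with values in [0, 1]."""
    tp = sum(p * g for p, g in zip(pred, target))
    fp = sum(p * (1.0 - g) for p, g in zip(pred, target))
    fn = sum((1.0 - p) * g for p, g in zip(pred, target))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def mse_loss(pred, target):
    """Mean squared error over all pixels."""
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(pred)


def combined_loss(pred, target, alpha=0.3, beta=0.7):
    """Equal-weight sum of the Tversky and MSE terms, as in Equation 7."""
    return (0.5 * tversky_loss(pred, target, alpha, beta)
            + 0.5 * mse_loss(pred, target))


# Two hits plus one miss, versus two hits plus one false alarm.
miss_loss = tversky_loss(pred=[1, 1, 0, 0], target=[1, 1, 1, 0])
false_alarm_loss = tversky_loss(pred=[1, 1, 1, 0], target=[1, 1, 0, 0])

# With alpha == beta the loss treats both error types identically (Dice-like).
miss_sym = tversky_loss([1, 1, 0, 0], [1, 1, 1, 0], alpha=0.5, beta=0.5)
false_alarm_sym = tversky_loss([1, 1, 1, 0], [1, 1, 0, 0], alpha=0.5, beta=0.5)
```

Under the assumed weights a missed rain pixel costs more than a false alarm, which is the intended bias toward detecting heavy precipitation.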

Data Transform
To enhance the network's convergence, we introduce data transformations to the original input data. Initially, a center-cropping operation is executed on the meteorological data to mitigate boundary errors. This step is crucial as the WRF model, being a regional weather forecasting model, necessitates input from the global background field from FNL. This requirement can potentially introduce errors at the lateral boundaries of the study domain.
Next, we apply mean-std normalization to the input data to scale each predictor. This helps to bring the data into a similar range and prevents one predictor from dominating the others. Finally, we take a log transformation of the observational label data, the precipitation map, to reduce the skewness of its distribution. With the log transformation, the distribution of the data is better represented, which improves both convergence during training and accuracy at inference.
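The two transformations can be sketched as follows. The use of log(1 + x), which keeps zero rainfall at zero, is an assumption; the text only states that a log transformation is applied. In practice the mean and standard deviation would be computed per channel from the training set and reused at inference.

```python
import math


def standardize(values):
    """Mean-std normalization of one predictor; returns the scaled values and the statistics."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values], mean, std


def log_transform(rain):
    """Compress the skewed rainfall distribution (assumed form: log(1 + x))."""
    return [math.log1p(r) for r in rain]


def inverse_log_transform(transformed):
    """Map network outputs back to rainfall amounts."""
    return [math.expm1(t) for t in transformed]


# Hypothetical temperature samples (K) and hourly rainfall samples (mm).
temperature = [280.0, 285.0, 290.0, 295.0, 300.0]
scaled, t_mean, t_std = standardize(temperature)

rain = [0.0, 0.5, 12.0, 80.0]
labels = log_transform(rain)
restored = inverse_log_transform(labels)
```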

Evaluation Metrics
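To evaluate the performance of the model, various metrics are calculated including the probability of detection (POD), threat score (TS), equitable TS (ETS), false alarm ratio (FAR), and BIAS. These metrics are defined as follows:

$$\text{POD} = \frac{\text{hits}}{\text{hits} + \text{misses}}$$

$$\text{TS} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{false alarms}}$$

$$\text{ETS} = \frac{\text{hits} - \text{hits}_{\text{random}}}{\text{hits} + \text{misses} + \text{false alarms} - \text{hits}_{\text{random}}}$$

$$\text{FAR} = \frac{\text{false alarms}}{\text{hits} + \text{false alarms}}$$

$$\text{BIAS} = \frac{\text{hits} + \text{false alarms}}{\text{hits} + \text{misses}}$$

The POD primarily focuses on the number of hits, while the TS, FAR, and BIAS evaluate the combined impact of hits and false alarms. The ETS takes into account the possibility of a hit by chance by calculating

$$\text{hits}_{\text{random}} = \frac{(\text{hits} + \text{misses})(\text{hits} + \text{false alarms})}{\text{hits} + \text{misses} + \text{false alarms} + cn}$$

where cn is the number of correct negatives. A higher POD, TS, and ETS, a lower FAR, and a BIAS closer to 1 indicate a more accurate prediction.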
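A minimal sketch of these scores, computed from a contingency table at a given rainfall threshold, is shown below. The standard definitions are assumed, as is the convention that a pixel counts as rainy when its value is at or above the threshold; the sample counts are invented for illustration, and empty categories (zero denominators) are not handled.

```python
def contingency(pred, obs, threshold):
    """Count hits, misses, false alarms, and correct negatives at a threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for p, o in zip(pred, obs):
        p_yes, o_yes = p >= threshold, o >= threshold
        if p_yes and o_yes:
            hits += 1
        elif o_yes:
            misses += 1
        elif p_yes:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def scores(hits, misses, false_alarms, correct_negatives):
    """Categorical verification scores from the contingency counts."""
    total = hits + misses + false_alarms + correct_negatives
    # Number of hits expected from a random forecast with the same marginals.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    return {
        "POD": hits / (hits + misses),
        "FAR": false_alarms / (hits + false_alarms),
        "TS": hits / (hits + misses + false_alarms),
        "ETS": (hits - hits_random)
               / (hits + misses + false_alarms - hits_random),
        "BIAS": (hits + false_alarms) / (hits + misses),
    }


# Invented counts, for illustration only.
example = scores(hits=30, misses=20, false_alarms=10, correct_negatives=140)
table = contingency(pred=[0.0, 2.0, 5.0, 0.0], obs=[0.0, 3.0, 0.0, 4.0], threshold=1.0)
```

For the invented counts, the ETS (0.4) is lower than the TS (0.5) because 10 of the 30 hits would be expected by chance.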

Overall Performance
In our study, we utilized a neural network that was trained using 3 years of WRF simulation data from 2017 to 2019. The WRF simulation was driven by the FNL analysis data, which is the foundation of our deep-learning model. Therefore, the performance of our deep-learning model is ultimately dependent on the original FNL background and our WRF simulation. Despite the advancements in weather forecasting, it remains challenging to reproduce the heavy precipitation events that are generated by intense convective systems. By utilizing the Swin-transformer Unet to process fundamental meteorological variables, we observed significant improvements in the prediction of heavy precipitation events in the tail of the distribution, both in terms of rainfall quantities and the location of the rainfall. The overall evaluation scores calculated from June to September 2021, with all lead times from 0 to 72 hr, are listed in Table 1.
1. Our DL model for the prediction of drizzles with a rainfall intensity greater than 1 mm hr⁻¹ has been shown to enhance POD, TS, and ETS, while simultaneously reducing the FAR. Additionally, the FBIAS is closer to 1 when compared with the results obtained from a pure WRF simulation. The improvement in the TS score is particularly significant, reaching as high as 21.7%.
2. In the prediction of moderate rainfall with an hourly intensity of 3 and 5 mm hr⁻¹, the POD has increased from 0.145 to 0.218 and from 0.088 to 0.161 respectively, while the FAR has slightly decreased. As a result, the TS and ETS scores have also seen considerable increases, with the TS score increasing from 0.117 to 0.164 for the 3 mm hr⁻¹ threshold and from 0.076 to 0.122 for the 5 mm hr⁻¹ threshold. The ETS score has increased from 0.106 to 0.151 for the 3 mm hr⁻¹ threshold and from 0.071 to 0.114 for the 5 mm hr⁻¹ threshold. The relative improvement ratio for both the TS and ETS scores is as high as 60.5% for the 5 mm hr⁻¹ threshold.
3. For the heavy rainfall events with an hourly precipitation intensity exceeding 10 mm, our model has demonstrated the ability to significantly enhance the POD, TS, and ETS scores. However, in detecting heavy rainfall, there is a trade-off of introducing higher FAR. This indicates that the meteorological background data may not fully match the observational precipitation data, causing some intensive weather systems simulated by the WRF model with possibilities of heavy precipitation to be mistakenly placed. The improvement in POD for heavy rainfall, with the addition of both 10 and 20 mm intensities, is around 50%, increasing from 0.06 to 0.12. Similarly, the TS and ETS scores improve from 0.055 to 0.08, an improvement rate of 45.5%. The pure WRF simulation has a BIAS of around 0.1 for heavy rainfall, which suggests that the detected rainfall area is sub-optimally small. Our deep-learning model improved this score to nearly 0.4, demonstrating a better ability to detect heavy rainfall. Improving the accuracy of heavy rainfall detection and reducing the FAR is an important area for further research.
4. In terms of spatial distribution, our DL framework exhibits a substantial enhancement in forecast skill, as quantified by the cumulative TS score across all precipitation intensity thresholds shown in Figure 3. Notably, this improvement is observed in areas prone to heavy precipitation events, such as potential tropical cyclone pathways and associated rainbands, in addition to Mei-Yu frontal systems during monsoon seasons. The relative advancements in these regions amount to several-fold increases, while minor degradation is detected in an insignificant fraction of pixels, ensuring overall model performance remains robust for mesoscale applications.
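The relative improvement ratios quoted in the list above follow from the baseline and post-processed scores. The short sketch below reproduces the arithmetic for the values stated in the text; the helper name is illustrative.

```python
def relative_improvement(baseline, improved):
    """Relative change of a skill score with respect to the baseline, in percent."""
    return (improved - baseline) / baseline * 100.0


ts_5mm = relative_improvement(0.076, 0.122)    # TS at the 5 mm/hr threshold
ets_5mm = relative_improvement(0.071, 0.114)   # ETS at the 5 mm/hr threshold
ts_3mm = relative_improvement(0.117, 0.164)    # TS at the 3 mm/hr threshold
ts_heavy = relative_improvement(0.055, 0.08)   # TS/ETS for heavy rainfall

for name, value in [("TS 5 mm", ts_5mm), ("ETS 5 mm", ets_5mm),
                    ("TS 3 mm", ts_3mm), ("TS heavy", ts_heavy)]:
    print(f"{name}: {value:.1f}%")
```

The 5 mm hr⁻¹ scores give roughly 60.5% for TS and 60.6% for ETS, and the heavy rainfall scores give roughly 45.5%, matching the figures quoted in the list.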

Quantile Distribution
Quantile distribution stands as a pivotal measure for evaluating the predictive proficiency of heavy rainfall events. Figure 4 demonstrates a monotonic decline in model performance across varying quantile thresholds, signifying the escalating challenge of accurately predicting more extreme events. In our evaluations, the Swin-Transformer Unet model trained with both MSE loss and Tversky loss outperformed the others. It was closely followed by the CNN UNet model, which adopted the same combination of loss functions. Conversely, models that relied solely on the MSE loss demonstrated diminished performance, especially at higher quantiles that signify extreme precipitation events.
When examining lower percentiles, our baseline WRF simulation, which employs the Grell-Freitas (GF) cumulus parameterization scheme (Grell & Freitas, 2014), displays performance characteristics akin to our DL framework in terms of its TS, albeit with a slight overestimation of light rainfall between the 70th and 90th percentiles, where the hourly rainfall intensity is less than 1 mm. We acknowledge that light precipitation events are less intricate to predict and that the GF scheme is well suited to large-scale precipitation, as it is a scale-aware scheme and insensitive to the model resolution. Comprehensive sensitivity studies by Gao et al. (2017) showed it to be more skillful in estimating precipitation than other, non-scale-aware parameterization schemes. Nonetheless, our DL post-processing framework, which is constrained by gridded observation data, contributes to a statistically significant reduction in the wet bias within this light precipitation range.
When we examine more intense rainfall events that surpass the 95th percentile threshold, our proposed DL framework begins to exhibit superiority in its capacity to replicate these extreme events and align the distribution more closely with the observed ground truth. Within this percentile range, the TS score for the DL framework is approximately double that of the baseline WRF model, and the estimated rainfall intensity closely tracks the CMORPH observation until the 99.5th extreme percentile is reached. At this point, the largest bias is about 20%, whereas the baseline WRF model captures less than 50% of the intensity. This result underscores the enhanced performance of this DL framework in the accurate prediction and representation of intense rainfall events.
In addition to the above, we present the meridional and zonal averages of hourly precipitation intensity at the 95th percentile in Figure 5, as well as its spatial distribution in Figure 6. The application of our DL framework enables us to refine the estimated rainfall intensity at the 95th percentile, resulting in a significantly improved alignment with the CMORPH observation, both spatially and in terms of its zonal and meridional mean. The spread property, represented by the shaded area, also aligns more closely with the observations.
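One way such diagnostics could be computed is sketched below: a per-grid-point 95th percentile over time, followed by averages along longitude (zonal mean) and latitude (meridional mean). The nearest-rank percentile definition, the function names, and the toy data are assumptions for illustration; the text does not state which percentile estimator was used.

```python
import math


def percentile_nearest_rank(values, q):
    """q-th percentile (q in percent) using the nearest-rank definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]


def percentile_map(series, q=95):
    """series[time][lat][lon] -> map[lat][lon] of the q-th percentile over time."""
    n_lat, n_lon = len(series[0]), len(series[0][0])
    return [[percentile_nearest_rank([frame[i][j] for frame in series], q)
             for j in range(n_lon)]
            for i in range(n_lat)]


def zonal_mean(field):
    """Average over longitude: one value per latitude row."""
    return [sum(row) / len(row) for row in field]


def meridional_mean(field):
    """Average over latitude: one value per longitude column."""
    return [sum(col) / len(col) for col in zip(*field)]


# Toy series: 20 time steps on a 2 x 2 grid (invented values).
series = [[[t * (i + 1) + j for j in range(2)] for i in range(2)]
          for t in range(20)]
p95 = percentile_map(series, q=95)
zonal = zonal_mean(p95)
meridional = meridional_mean(p95)
```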
The highest relative improvements are evident in the eastern part and southwest quadrant of our experimental domain, with the maximum relative improvement exceeding 100%. Some minor fluctuations and degradation are noticeable around 100-110°E longitude and approximately 25°N latitude, the mountainous southwest part of China. This suggests that our DL framework may demonstrate lower confidence when dealing with orographic precipitation. Similar observations can be made from the spatial distribution plot in Figure 6. The largest improvements are seen over potential tropical cyclone initiation areas in the western Pacific Ocean, the eastern China region influenced by the Mei-Yu frontal rain belts around the middle and lower reaches of the Yangtze River, the southern part of Japan and the Korean Peninsula, as well as the Southeast Asia region around the Bay of Bengal. Other regions also show varying degrees of improvement in terms of spatial distribution, indicating that our DL framework more accurately represents extreme precipitation patterns compared to the baseline WRF simulation, which exhibits limited predictability over these heavily precipitating areas during the wet season.

Case Study
In this section, we present two case studies of heavy precipitation recorded in the study domain, as shown in Figures 8 and 10. The precipitation maps show 6-hr accumulations and cover a period of one day for each case. The first case occurred on 3 June 2021, and the second on 20 August 2021.
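The 6-hr accumulation used for the case-study maps can be sketched as a block sum over consecutive hourly fields. The sketch assumes hourly values in mm per hour, so that summing six of them yields millimetres over the interval; which hours are assigned to each labeled interval is not specified in the text and is left out.

```python
def accumulate(hourly_maps, hours=6):
    """Sum consecutive hourly maps [time][row][col] into totals over `hours` steps."""
    if len(hourly_maps) % hours != 0:
        raise ValueError("number of hourly maps must be a multiple of `hours`")
    totals = []
    for start in range(0, len(hourly_maps), hours):
        block = hourly_maps[start:start + hours]
        n_rows, n_cols = len(block[0]), len(block[0][0])
        totals.append([[sum(frame[i][j] for frame in block)
                        for j in range(n_cols)]
                       for i in range(n_rows)])
    return totals


# Toy day: 24 hourly maps on a 1 x 2 grid (invented values).
day = [[[float(t), 1.0]] for t in range(24)]
six_hourly = accumulate(day)
```

One day of hourly fields yields four accumulated maps, matching the four intervals shown for each case.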

3 June 2021
On 3 June 2021, a frontal rain band was observed to be moving southeastward. In our default WRF simulation, results generally indicate weak signals for several precipitation hot spots over both the continent and the ocean side.
To improve the accuracy of our predictions, we utilized the Swin-Transformer-Unet model in conjunction with the basic meteorological variables predicted by WRF. As a result, the overall precipitation patterns are more consistent with the CMORPH observational data set for all four intervals in this case.
The frontal rain band was initialized around the center of the study domain (27°N, 110°E) before 06 UTC (Figure 8) and started moving southeastward, driven by the low-pressure center located in the northeast corner of the study area. The movement and structure of the rain band are well preserved in the prediction results at 12 UTC by our DL model. In the following time intervals, this frontal rain band further extended in length and covered almost the whole south and southeast part of China while approaching the coast.
Based on the quantitative evaluation shown in Figure 7, the performance of the default WRF model and the DL model were compared in terms of predicting drizzle and light rainfall with thresholds less than 10 mm and heavier rainfall areas with thresholds exceeding 20 mm or even 50 mm.
For predicting drizzle and light rainfall with thresholds less than 10 mm within a 6 hr interval, the ETS scores of both models were relatively close, except for the first time interval 06 UTC at 0.1 mm thresholds, where the ETS score increased by nearly 40% from 0.34 to 0.5 in the DL model.
On the other hand, for heavier rainfall areas with thresholds exceeding 20 mm or even 50 mm, the DL model outperformed the default WRF model by doubling or even trebling the ETS score, as indicated by the results at 06 UTC and 12 UTC. Moreover, the decrease of ETS for the DL enhanced prediction was less steep, which indicates a more stable performance in estimating precipitation for all ranges compared to the default WRF model.
Despite the scale-aware GF cumulus parameterization scheme being adopted in our study, it is essential to acknowledge that the baseline WRF simulation faces inherent challenges in resolving intense heavy precipitation at the 27 km grid resolution. A further comparison with a 4 km WRF simulation for the same two cases can be found in Figures S3 and S4 in Supporting Information S1. With increased grid resolution, the baseline WRF simulation is capable of better resolving the detailed precipitation patterns with more intense convective structure, while the DL model still outperforms the WRF simulation, with the mapping trajectory constrained by the observation. We conclude that these challenges for the baseline WRF simulation stem from both the coarse model resolution and certain limitations inherent to parameterizations, as well as the large-scale background fields. However, the notable performance improvement shown by the DL model, especially in regions experiencing intense rainfall, carries promising implications.

20 August 2021
As shown in the precipitation map in Figure 10, our baseline WRF simulation captures only a limited signal of the strong convective rainfall due to the coarse domain grid size and the limitations of the parameterization schemes for cumulus and cloud microphysics. However, by enhancing the estimation of precipitation with our DL model, we were able to bridge the gap between the WRF simulation and the observations caused by intrinsic limitations rooted in the parameterization schemes. This considerably reduces the negative bias and facilitates the estimation of extreme precipitation, both in its maximum precipitation amount and in reducing errors in its spatial distribution.
Figure 9 presents statistical evidence demonstrating the effectiveness of our DL framework. The results indicate that the most significant improvement over the baseline WRF simulation occurs at 06 UTC on the 20th and 00 UTC on the 21st, where the ETS score increases by an average of 30% for events with light to moderate rainfall (precipitation amount less than 10 mm). Additionally, the baseline WRF simulation's performance degrades rapidly as the precipitation threshold increases, and it fails to detect precipitation exceeding 50 mm in a 6-hr interval. In contrast, our DL model enhances the estimation results, maintaining relatively good performance with an ETS score above 0.25 for all thresholds at 06 UTC and 00 UTC, while the baseline WRF model's score drops to less than 0.1. These results indicate that our DL framework provides a significant improvement over the baseline WRF model, even though the grid size and parameterization schemes are not ideally suited for capturing strong convective precipitation during the monsoon season.
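The threshold-based scores discussed above can be illustrated with a minimal sketch. It assumes the standard contingency-table definitions of the threat score (TS), equitable threat score (ETS), and false alarm ratio (FAR), and it assumes that an event is counted when the amount is greater than or equal to the threshold. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def contingency(forecast, observed, threshold):
    """Count hits, misses, false alarms, and correct negatives at one threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        f_yes = f >= threshold  # assumption: an event means amount >= threshold
        o_yes = o >= threshold
        if f_yes and o_yes:
            hits += 1
        elif o_yes:
            misses += 1
        elif f_yes:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def skill_scores(forecast, observed, threshold):
    """Return TS, ETS, and FAR (false alarm ratio) for one precipitation threshold."""
    h, m, fa, cn = contingency(forecast, observed, threshold)
    n = h + m + fa + cn
    # Number of hits expected from a random forecast with the same event frequencies
    hits_random = (h + m) * (h + fa) / n if n else 0.0
    ts_den = h + m + fa
    ets_den = ts_den - hits_random
    return {
        "TS": h / ts_den if ts_den else math.nan,
        "ETS": (h - hits_random) / ets_den if ets_den else math.nan,
        "FAR": fa / (h + fa) if (h + fa) else math.nan,
    }


# Hypothetical 6-hr accumulated precipitation (mm) at eight grid points
forecast = [0.0, 12.0, 25.0, 3.0, 60.0, 0.0, 30.0, 1.0]
observed = [0.0, 15.0, 5.0, 0.0, 55.0, 22.0, 35.0, 0.0]
scores = skill_scores(forecast, observed, threshold=10.0)
print(scores)
```

Because the ETS subtracts the hits expected by chance, it is lower than the TS for the same contingency table whenever that random-hit term is positive, which is one reason it is often preferred when comparing thresholds with very different event frequencies.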

Conclusion and Discussion
Regional NWP models like the WRF model are known to be sensitive to domain grid size (Jee & Kim, 2017) and parameterization schemes (Hasan & Islam, 2018), particularly when it comes to predicting precipitation. To address this challenge, a DL model for semantic segmentation using a Swin-Transformer backbone and a hierarchical Unet structure is proposed in this study. This model leverages basic meteorological variables such as air temperature, pressure, wind speed, and humidity to significantly improve the performance of the baseline WRF model in simulating precipitation, particularly for extreme events induced by strong convection. The overall effectiveness of this DL post-processing framework is demonstrated through a comprehensive performance evaluation, including an analysis of its spatial and quantile distributions, and a detailed discussion of two case studies.
To evaluate the model's performance, we assessed hourly precipitation amounts across intensity thresholds ranging from 0.1 to 20 mm during the period of June 2021 to September 2021. The results demonstrated that our DL model outperformed the baseline WRF simulation for all precipitation intensities. Specifically, the model improved on the baseline WRF simulation by 21.7% for light rainfall and drizzle (precipitation amount less than 1 mm hr⁻¹), and by 60% for moderate rainfall events with precipitation thresholds of 3 and 5 mm hr⁻¹. For heavy rainfall events with hourly precipitation intensity exceeding 10 mm, the improvements reflected by the TS and ETS scores reached as high as 50% compared to the baseline WRF. The overall quantile distributions of the baseline WRF and the proposed DL framework were also compared, with results showing that the prediction of rainfall intensity improved to varying degrees across all quantiles. Additionally, the spatial distribution of the 95th percentile rainfall intensity and its zonal and meridional averages reveal a significantly better alignment with observational data. However, minor challenges were noted in regions with possible orographic precipitation trigger mechanisms, particularly in the mountainous southwest part of China. Future exploratory efforts could be directed toward improving the model's ability to recognize and integrate finer-scale terrain and land surface effects. Such advancements could potentially raise the forecast skill in these currently less confident areas.
In addition to the overall evaluation, we presented two case studies of precipitation events triggered by different synoptic conditions to demonstrate the model's ability to capture complex weather phenomena. For both events, we investigated the 6-hourly accumulated rainfall over four intervals and showed how our DL model can provide more accurate precipitation forecasts by learning from meteorological data sets and extracting relevant features.
The first precipitation event was caused by the large-scale movement of frontal rain bands, while the second event was induced by strong convection during the monsoon season. In both cases, we observed rapid degradation of the baseline WRF model's performance as precipitation thresholds increased. In contrast, our DL model was able to compensate for the insufficient predictability of the baseline WRF simulation and achieve improved ETS scores over each temporal interval at various precipitation thresholds. Notably, our model demonstrated particular success in capturing extreme precipitation amounts exceeding 30 mm or 50 mm, which are often difficult to predict using traditional modeling approaches. These findings demonstrate the effectiveness of our DL model in capturing precipitation characteristics from basic meteorological variables and in quantitatively estimating precipitation based on the extracted features.
As we are feeding the WRF-simulated meteorological fields into our DL model, the quality of the precipitation estimation results ultimately depends on the quality of our baseline WRF simulation and, correspondingly, on the initial forcing data used to drive the WRF simulation. Therefore, the results presented in this study show substantial relative improvements against the baseline WRF simulation, demonstrating the ability of this neural network to capture triggering processes which are not currently described in the existing precipitation parameterizations of the WRF model. Moreover, due to the model abstraction, the model grid states may not fully match the CMORPH precipitation observations used as labels for training, making it challenging to accurately estimate precipitation amounts in terms of intensity and location. As a consequence, this spatial and temporal inconsistency between prediction and observation was reflected in the increased false alarm rate (FAR) at higher thresholds. Similar results were also noted by Hess and Boers (2022), who attributed this issue to the localized, intermittent nature of heavy rainfall events. We believe that this limitation can be mitigated by accumulating hourly prediction results over several hours, or by adjusting the hyperparameters α and β in the Tversky loss, which control the trade-off between precision and recall. Rather than solely optimizing the model with a global MSE loss, several studies have attempted to manage the extremely skewed precipitation data by performing log-transformation scaling (Pathak et al., 2022; Shi et al., 2017), by modifying traditional regression loss functions such as MAE loss and MSE loss through binning precipitation data and assigning different weights to each category (Franch et al., 2020; Shi et al., 2017), or by incorporating a structural similarity measure (SSIM) loss function during optimization (Hess & Boers, 2022; Tran & Song, 2019). In our study, the Tversky loss function also demonstrated a superior ability to deal with strongly imbalanced precipitation data, as indicated by Figure 4 and Figure S1 in Supporting Information S1. The current optimization process involves pixel-wise optimization of the Tversky loss function, followed by MSE loss for global refinement. Exploring custom loss functions that can focus on both local features and the overall distribution might be beneficial for further improving the temporal and spatial accuracy of the DL model. Beyond the loss function, the effectiveness of the log transformation is shown in Figure S2 of Supporting Information S1. We believe that shifting the skewed precipitation distribution considerably reduces the impact of noise and outliers and is thus paramount for training stability.
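To make the role of α and β more concrete, the following is a minimal sketch of a soft Tversky loss together with a log transformation of precipitation amounts. It is written under stated assumptions and is not the training code of this study: α is assumed to weight false positives and β false negatives (the commonly used convention), the log transformation is assumed to be log(1 + x), and the function names, the smoothing constant, and the sample values are hypothetical.

```python
import math


def tversky_loss(probs, labels, alpha=0.5, beta=0.5, smooth=1e-6):
    """Soft Tversky loss for predicted rain probabilities and binary rain labels.

    Assumption: alpha weights false positives, beta weights false negatives.
    """
    tp = sum(p * g for p, g in zip(probs, labels))
    fp = sum(p * (1 - g) for p, g in zip(probs, labels))
    fn = sum((1 - p) * g for p, g in zip(probs, labels))
    index = (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)
    return 1.0 - index


def log_transform(precip_mm):
    """Compress the skewed precipitation range (assumed form: log(1 + x))."""
    return math.log1p(precip_mm)


def inverse_log_transform(value):
    """Map a transformed value back to precipitation in mm."""
    return math.expm1(value)


# Hypothetical pixel-wise probabilities and labels for one threshold class
probs = [0.9, 0.6, 0.2, 0.1]
labels = [1, 0, 1, 0]
print(tversky_loss(probs, labels, alpha=0.5, beta=0.5))
print(tversky_loss(probs, labels, alpha=0.3, beta=0.7))  # misses penalized more
print(log_transform(50.0), inverse_log_transform(log_transform(50.0)))
```

Under this convention, raising β relative to α penalizes missed rain pixels more strongly and therefore favors recall, whereas raising α penalizes false alarms and favors precision. With α = β = 0.5 the index reduces to the Dice coefficient.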
Additionally, data augmentation techniques are worth exploring. The existing permutation study by Li et al. (2022) has shown that the moisture-related predictor dominates the precipitation estimation in this DL framework. By using feature patch masking, mixing, and shuffling techniques, it may be possible to further improve the model's generalization ability by increasing the difficulty of the original task and enriching the diversity of the data set. This approach can help reduce the model's reliance on specific predictors, leading to more robust and accurate predictions. Furthermore, we posit that this framework can be generalized to other NWP models or to different regions and domains. Given that predictors undergo full normalization during training, it is worthwhile to explore the framework's adaptability through techniques like transfer learning and fine-tuning. This approach can facilitate the development of more resilient applications tailored to various geographical regions and domains.
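As one illustration of the patch masking idea mentioned above, the following minimal sketch masks a random subset of non-overlapping patches in a single normalized predictor field. It is a hypothetical example rather than a method evaluated in this study: the function name, the patch size, the masking ratio, and the fill value of 0.0 (assumed to correspond to the mean of a normalized predictor) are all assumptions.

```python
import random


def mask_patches(field, patch, ratio, fill=0.0, seed=None):
    """Return a copy of a 2D field with a random subset of patches set to `fill`."""
    rng = random.Random(seed)
    rows, cols = len(field), len(field[0])
    # Top-left corners of all non-overlapping patches covering the field
    corners = [(r, c) for r in range(0, rows, patch) for c in range(0, cols, patch)]
    n_mask = round(ratio * len(corners))
    chosen = rng.sample(corners, n_mask)
    out = [row[:] for row in field]  # copy so the input field is left unchanged
    for r0, c0 in chosen:
        for r in range(r0, min(r0 + patch, rows)):
            for c in range(c0, min(c0 + patch, cols)):
                out[r][c] = fill
    return out


# Hypothetical 8 x 8 normalized humidity field, masking half of the 4 x 4 patches
field = [[1.0] * 8 for _ in range(8)]
masked = mask_patches(field, patch=4, ratio=0.5, seed=0)
n_masked = sum(value == 0.0 for row in masked for value in row)
print(n_masked)
```

Applying such masking preferentially to the dominant moisture-related predictor is one conceivable way to encourage the network to draw on the remaining variables, although whether this helps in practice would need to be tested.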
designed a DL model and performed a 20-day global forecast. Evaluation results indicate that the DL model outperforms the NWP models for those state variables most affected by parameterization processes. Beyond convolution-based DL models, Pathak et al. (2022) built a Fourier operator based transformer network to perform weather forecasting at a global 0.25° resolution and achieved accuracy matching that of the state-of-the-art NWP system, the ECMWF Integrated Forecasting System.

Figure 1. Left: Study domain and terrain height for input Weather Research and Forecasting (WRF) meteorological data. Right: Input meteorological data simulated by WRF model.

Figure 2. (a) Architecture of the Swin-Transformer-Unet model and (b) the basic computational principle for a Swin Transformer block.

Figure 3. Spatial distribution of threat score for the baseline Weather Research and Forecasting (WRF), WRF + AI framework, and its relative improvements over the evaluation data set.

Figure 4. Threat score for precipitation events above the percentile thresholds for Weather Research and Forecasting (WRF) and WRF + AI framework (left), and hourly precipitation intensity for WRF, WRF + AI, and CMORPH observation at each corresponding percentile (right).
On 20 August 2021, Central and Northeast China experienced extreme precipitation, accompanied by thunderstorms and strong convective weather. Heavy rainfall of over 100 mm was initially observed before 12:00 UTC in the Central China region (located at latitude 33°N and longitude 115°E). Later in the day, extreme precipitation was observed in the Yellow Sea, Northeast China, and the Korean Peninsula.