Deep Learning for Seasonal Prediction of Summer Precipitation Levels in Eastern China

Skillful seasonal forecasting can effectively reduce the economic losses caused by droughts and floods. Owing to the powerful data-mining capability of deep learning networks, they are increasingly applied in studies of seasonal rainfall prediction. However, two prominent issues remain in the modeling process: the lack of sufficient training samples and the effect of a small number of extreme values on model optimization. To tackle these deficiencies, we combine strategies such as principal component analysis, a reduction in the number of hidden layers, and early stopping with Attention U-Net to construct a rainfall classification forecasting model. These steps reduce model overfitting and improve model generalization. The results show that the prediction accuracy of this network at leads of 1–3 months is clearly better than that of the numerical models. Further analysis also shows that the spatial features of precipitation predicted by the network are very close to the observations.

Supporting Information may be found in the online version of this article.
However, an inevitable problem of using DL for seasonal forecasting is that the available observational record is relatively short for DL training (Jin et al., 2022), which easily causes model over-fitting. In China, we only have about 70 years of observations, and considering the strong seasonal features of precipitation, mixing all monthly data for training brings limited improvement in model performance, especially for the summer rainy season (Hu et al., 2003; Zhai et al., 2005). Furthermore, when traditional regression tasks are adopted for precipitation prediction, another tough problem is highlighted: the effect of extreme values (unbalanced samples). On the one hand, the extreme data may be under-fitted because of their small proportion in the training samples, which would result in weak forecast skill for them (Chen & Wang, 2022). On the other hand, these extreme data can greatly increase the training loss (Bannai et al., 2023) and thus mislead the direction of parameter optimization.
To solve the above problems, our study provides a seasonal precipitation level prediction model based on Attention U-Net. To be precise, it is a multi-class learning model with well-proportioned pixels. We divided the original data into eight groups. Except for the null values, the number of pixels in each group is around 150,000, with a difference of no more than 20,000. This effectively removes the influence of the extreme values and guarantees sample balance. Attention U-Net (Oktay et al., 2018) is suitable for small-sample tasks and has a structure with Attention Gates (AGs) embedded into U-Net, which makes the model focus more on the relevant regions. In addition, we reduced overfitting and improved model generalization by decreasing the number of model parameters and applying principal component analysis (PCA) and an early-stopping strategy (Prechelt, 2002).
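The balanced eight-level grouping described above can be obtained with simple quantile binning. The sketch below is a minimal NumPy illustration of that idea; the actual thresholds and pixel counts of the paper are not reproduced, and the synthetic data, function name, and level count are assumptions for illustration.

```python
import numpy as np

def to_balanced_levels(pp, n_levels=8):
    """Bin precipitation-anomaly percentages into n_levels classes of
    (roughly) equal pixel counts using empirical quantiles.
    NaNs (null pixels) are excluded from the binning and returned as -1."""
    valid = ~np.isnan(pp)
    # Quantile edges computed over valid pixels only -> near-equal class sizes.
    edges = np.quantile(pp[valid], np.linspace(0, 1, n_levels + 1))
    levels = np.full(pp.shape, -1, dtype=int)
    # np.digitize maps each value to its bin; clip handles the global maximum.
    levels[valid] = np.clip(np.digitize(pp[valid], edges[1:-1]), 0, n_levels - 1)
    return levels

# Example: 1000 synthetic anomaly percentages with a few null pixels.
rng = np.random.default_rng(0)
pp = rng.normal(0, 50, 1000)
pp[:10] = np.nan
levels = to_balanced_levels(pp)
counts = np.bincount(levels[levels >= 0])
```

Because the edges are empirical quantiles of the data itself, each class receives an almost identical pixel count, which is the sample-balance property used here.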
The rest of this paper is organized as follows: the second part describes the data and methods, focusing on data pre-processing and model design. The third section provides the model forecasting skills and a comparison with two advanced numerical models. The fourth part gives the conclusions.
The pressure-level data: geopotential height (gh). 3. Monthly sea surface salinity (sss) at 5 m depth from the EN4.2.2 data set provided by the UK Met Office Hadley Centre (Good et al., 2013). 4. The hindcast and forecast precipitation rates from the National Centers for Environmental Prediction (NCEP) Climate Forecast System version 2 (CFSv2) (Saha et al., 2014); this paper used the 1–3 month lead predictions in the 9-month products for 2009–2021. 5. The hindcast and forecast precipitation rates from the ECMWF fifth-generation seasonal forecast system (SEAS5) (Johnson et al., 2019); since the prediction of SEAS5 consisted of a 25-member ensemble only until 2016, we use the ensemble mean of the hindcasts (forecasts) of these members. The time range and lead times of this data set are the same as for CFSv2.

Variable Selection
Various predictors that are statistically correlated with rainfall in China are selected as model inputs (Table 1).
For instance, soil moisture has a lagged impact on precipitation. This association spans regions and operates on monthly or even seasonal scales (Dong et al., 2022; Meng et al., 2014; Zuo & Zhang, 2007). The signals of prior temperature and prior precipitation can be stored in the soil temperature and memorized on sub-seasonal to seasonal scales (Song et al., 2022). The heat flux of the soil then affects the water cycle. The snow on the Tibetan Plateau in the preceding winter and spring also influences precipitation through land–air interactions (X. Li et al., 2018). Sea surface salinity is a strong indicator of the global water cycle (Durack et al., 2012) and can likewise be used to predict precipitation (L. Li et al., 2016). In addition, large-scale circulation and sea surface temperature affect precipitation through air–sea coupling processes, such as the El Niño–Southern Oscillation (Yim et al., 2014), the Antarctic and Arctic Oscillations (Gong & Ho, 2003), and other teleconnection patterns over the extratropical Northern Hemisphere (Ding & Wang, 2005; Zhao et al., 2010).

Data Pre-Processing
1. Model targets: The targets are the precipitation anomaly percentage levels from June to August. We chose the 30-year average from 1991 to 2020 as the climatological state and processed the CN05.1 data set as the model targets. First, the precipitation anomaly percentage (pp, unit: %) is calculated as pp = (pre − m) / m × 100%, where pre denotes the precipitation (unit: mm/day) and m is the climatological mean.

2. Model inputs:
The input variables are shown in Table 1. They were cropped to the same size as the target map and normalized with Z-score normalization, respectively. Then we stacked the multiple variables into 45 channels. 3. Data augmentation approaches such as image rotation and mirroring were adopted in the training process, which increased the number of input and target samples. 4. CFSv2 data were converted from a Gaussian grid to a uniform grid and then interpolated to 0.25° × 0.25°.
After that, the anomaly percentages of CFSv2 and SEAS5 were also converted into pp levels.
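As a rough illustration of the normalization, stacking, and augmentation steps above, the sketch below Z-score normalizes a set of predictor fields, stacks them into 45 channels, and applies rotation/mirror augmentation. The grid size, random data, and helper names are hypothetical.

```python
import numpy as np

def zscore(x):
    """Z-score normalize a 2-D field (per variable)."""
    return (x - x.mean()) / x.std()

def augment(x):
    """Simple augmentation: 90/180/270-degree rotations and mirror flips.
    x has shape (C, H, W); returns a list of augmented copies plus the original."""
    out = [x]
    for k in (1, 2, 3):                       # three rotations in the H-W plane
        out.append(np.rot90(x, k, axes=(1, 2)))
    out.append(x[:, :, ::-1])                 # left-right mirror
    out.append(x[:, ::-1, :])                 # up-down mirror
    return out

# Hypothetical example: 45 predictor fields on a 64x64 grid.
rng = np.random.default_rng(0)
fields = [zscore(rng.normal(size=(64, 64))) for _ in range(45)]
x = np.stack(fields)                          # -> (45, 64, 64) channel stack
samples = augment(x)
```

Each augmentation multiplies the number of input–target pairs, which partly compensates for the short observational record.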

Model Design
The model consists of two parts: the PCA, an unsupervised method applied to the predictors, and the Attention U-Net (Att U-Net), a supervised model aimed at predicting rainfall levels.
As seen above, numerous variables are chosen as inputs, and these variables are not completely independent. Therefore, PCA is applied to transform the raw data into new, uncorrelated variables. When the leading principal components (PCs) of the new variables are retained, the main features are extracted and redundant information is removed.
The lower-dimensional inputs form a simpler relationship with the targets, helping to reduce the complexity of the network. In this paper, the first two PCs are chosen as inputs for supervised learning, which ensures that 90% of the information is kept.
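A minimal SVD-based sketch of this PCA step, assuming yearly samples in rows and predictors in columns; the sample layout and dimensions are illustrative, not the paper's actual configuration.

```python
import numpy as np

def pca_fit_transform(X, n_components=2):
    """Project samples (rows of X) onto the leading principal components
    via SVD of the centered data matrix; also return the fraction of
    variance each retained component explains."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # explained-variance ratios
    pcs = Xc @ Vt[:n_components].T           # scores on the leading PCs
    return pcs, explained[:n_components]

# Hypothetical example: 70 yearly samples of 45 correlated predictors
# generated as a noisy mixture of two underlying signals.
rng = np.random.default_rng(0)
base = rng.normal(size=(70, 2))
mix = rng.normal(size=(2, 45))
X = base @ mix + 0.05 * rng.normal(size=(70, 45))
pcs, ratio = pca_fit_transform(X, n_components=2)
```

When the predictors are dominated by a few shared signals, as here, the first two PCs retain most of the total variance, mirroring the 90% criterion in the text.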
The Att U-Net is an enhanced U-Net, with Attention Gates (AGs) integrated into the original U-Net. As shown in Figure 1, it has a symmetric encoder–decoder structure. In terms of information propagation, each downsampling step reaches a deeper layer of the DNN. The "skip connection" links the shallow (relatively large-scale, comprehensive) and the deep (relatively small-scale, local) features. We reduce model over-fitting by cutting down model parameters (our model contains only three downsampling and three upsampling layers; see Table S2 in Supporting Information S1). During the skip connection, AGs can highlight useful features in certain areas while suppressing irrelevant regions of the input image. Furthermore, the addition of AGs does not change the original fully convolutional structure of the U-Net, so Att U-Net can still be used with inputs and outputs of any size.
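The gating operation of an AG can be sketched as additive attention: the skip-connection features and the coarser decoder features are mixed, passed through a ReLU and a sigmoid, and the resulting (0, 1) map rescales the skip features. The NumPy sketch below uses channel-mixing matrices in place of the 1×1 convolutions of Oktay et al. (2018) and assumes the gating signal has already been upsampled to the skip resolution; it is an illustration, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, Wx, Wg, psi):
    """Additive attention gate in the spirit of Oktay et al. (2018).
    x:   skip-connection features, shape (C, H, W)
    g:   gating features from the coarser decoder level, shape (C, H, W)
    Wx, Wg: (C, C) channel mixes (stand-ins for 1x1 convolutions)
    psi: (C,) projection producing a scalar attention score per pixel."""
    q = np.einsum('oc,chw->ohw', Wx, x) + np.einsum('oc,chw->ohw', Wg, g)
    q = np.maximum(q, 0.0)                           # ReLU
    alpha = sigmoid(np.einsum('c,chw->hw', psi, q))  # attention map in (0, 1)
    return x * alpha                                 # suppress irrelevant regions

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
x = rng.normal(size=(C, H, W))
g = rng.normal(size=(C, H, W))
out = attention_gate(x, g, rng.normal(size=(C, C)),
                     rng.normal(size=(C, C)), rng.normal(size=C))
```

Since the attention map lies in (0, 1), the gate can only attenuate skip features, never amplify them, which is how irrelevant regions are inhibited.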
The DNN can perform both regression and classification tasks, as long as the loss function and the target values are adjusted accordingly. In this paper, we apply the Att U-Net to a classification task, so the corresponding loss is the cross-entropy function: loss = −(1/(N·H·W)) Σ_{n,h,w} l_{n,c*,h,w} (3), l = log(Softmax(y)) (4), where c* denotes the observed class at each pixel. In Equations 3 and 4, the four dimensions of the variable y are batch size N, channel C, height H, and width W; C is the number of classes, which means the matrix is normalized along the channel dimension.
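Equation 3 amounts to averaging the negative log-probability of the observed class at every pixel. A hedged NumPy sketch, with illustrative tensor sizes:

```python
import numpy as np

def softmax_cross_entropy(logits, target):
    """Pixel-wise cross-entropy for an (N, C, H, W) score tensor.
    target holds integer class indices with shape (N, H, W)."""
    # Numerically stable log-softmax, normalized over the channel (class) axis.
    z = logits - logits.max(axis=1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    n, _, h, w = logits.shape
    ni, hi, wi = np.meshgrid(np.arange(n), np.arange(h), np.arange(w),
                             indexing='ij')
    # Mean negative log-probability of the true class at every pixel.
    return -log_softmax[ni, target, hi, wi].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 8, 4, 4))       # N=2, C=8 classes, 4x4 grid
target = rng.integers(0, 8, size=(2, 4, 4))
loss = softmax_cross_entropy(logits, target)
```

With random logits the loss sits near log C; it approaches zero only when the score of the observed class dominates at every pixel.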

Training Process
The available observed data are relatively scarce for DNN training. Therefore, besides reducing model complexity, we try to avoid over-fitting by monitoring the loss function over the validation period. In general, the training loss gradually converges and then model training is done. However, a typical over-fitting case shows that the validation loss may rise again after a decreasing stage. During the decreasing stage, the model learns and generalizes well on the data set, while during the rising stage the model learns too much, including "unique" features of the training set. The generalizability of the model is thus reduced, and it may perform poorly on the validation set.
Therefore, we stop the learning process when the validation loss reaches its minimum. This balances accuracy on the training set against generalizability on the validation set, a technique known as early stopping (Figure S1 in Supporting Information S1).
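Early stopping as described here can be sketched as a generic loop that tracks the best validation loss and halts after a fixed patience; the patience value and the toy validation curve below are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def train_with_early_stopping(step, max_epochs=200, patience=10):
    """Generic early-stopping loop. `step(epoch)` runs one training epoch
    and returns the validation loss; training stops when the loss has not
    improved for `patience` consecutive epochs, and the best epoch is kept."""
    best_loss, best_epoch, wait = np.inf, -1, 0
    for epoch in range(max_epochs):
        val_loss = step(epoch)
        if val_loss < best_loss:
            best_loss, best_epoch, wait = val_loss, epoch, 0
            # A real trainer would checkpoint the model weights here.
        else:
            wait += 1
            if wait >= patience:
                break                         # validation loss is rising: stop
    return best_epoch, best_loss

# Toy validation curve: decreases until epoch 30, then rises (over-fitting).
curve = lambda e: (e - 30) ** 2 / 900 + 0.5
best_epoch, best_loss = train_with_early_stopping(curve)
```

On this curve the loop stops shortly after epoch 30, the point where the validation loss turns upward, and reports that epoch as the stopping point.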

Testset Prediction Skill Evaluation
In this work, three evaluation metrics were used: the Pixel Accuracy (PA), the Temporal Correlation Coefficient (TCC), and the Anomaly Correlation Coefficient (ACC). PA = N_t / N, where N is the total number of grid points and N_t is the number of grid points whose predicted and observed values fall in the same level.
TCC and ACC are the Pearson correlation coefficients between Y and f(x), computed over the time series at each grid point (TCC) and over the spatial grid at each time (ACC), where i and j index space and time, respectively, and Y and f(x) represent the true values and the predictions.
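Under this reading of the metrics, PA is a fraction of matching pixels and TCC/ACC are Pearson correlations along the time and space axes, respectively. A NumPy sketch with illustrative array shapes:

```python
import numpy as np

def pixel_accuracy(pred_level, obs_level):
    """PA = N_t / N: fraction of grid points predicted at the observed level."""
    return np.mean(pred_level == obs_level)

def corr(a, b):
    """Pearson correlation between two 1-D series."""
    a, b = a - a.mean(), b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

def tcc(pred, obs):
    """Temporal correlation at each grid point; pred/obs have shape (T, G)."""
    return np.array([corr(pred[:, i], obs[:, i]) for i in range(pred.shape[1])])

def acc(pred, obs):
    """Anomaly (spatial) correlation at each time; pred/obs have shape (T, G)."""
    return np.array([corr(pred[j], obs[j]) for j in range(pred.shape[0])])

rng = np.random.default_rng(0)
obs = rng.normal(size=(13, 100))               # 13 test years, 100 grid points
pred = obs + 0.5 * rng.normal(size=obs.shape)  # imperfect synthetic forecast
```

TCC thus yields one value per grid point (mapped spatially in Figure 2), while ACC yields one value per forecast time (plotted per year in Figure 3).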
The PA skills of Att U-Net and the numerical models during the test period are shown in Figures 2a–2e. Att U-Net shows its superiority over CFSv2 and SEAS5 in most regions, especially in central China (Henan, Hubei, and Hunan Provinces), where PA reaches above 0.2 and in places exceeds 0.3. The mean PA of Att U-Net is 0.163, versus 0.138 for CFSv2 and 0.152 for SEAS5. In some places the PA of Att U-Net is inferior to the numerical models, but this distribution shows no obvious spatial bias.
The average TCC of Att U-Net is 0.038, higher than the −0.009 of CFSv2 (Figures 2f and 2g). In general, these two models show comparable TCC performance. Att U-Net performs better in the south of northeast China, central China, and the southern coastal areas, and is less effective than the numerical model in eastern Inner Mongolia, near the Bohai Sea, and in southern Fujian. As for SEAS5, its TCC outperforms the other two models in most regions.

Prediction Skills in Different Years
To better show the prediction skills in different years, we calculated PA and ACC for each test year separately. Figure 3a shows that the PA of Att U-Net is always higher than that of CFSv2, except in 2015 and 2017. In 2014, 2016, and 2020, the PA values of SEAS5 are slightly higher than those of Att U-Net. As for ACC, the comparison over the 13 test years shows that Att U-Net has higher values than CFSv2 and SEAS5 in 2012, 2013, 2017–2019, and 2021 (Figure 3b). None of the three climate prediction models is very stable, and it is worth noting that the ACC of Att U-Net in 2012 and 2013 reached 0.5.
We further give violin plots of PA and ACC over these 13 years, and of TCC over all grid points in the test years, to show their statistical distributions (Figure 3c). In general, the Att U-Net model outperforms CFSv2 on all indices. It also has a much better PA than SEAS5, while their probability distributions of ACC and TCC are similar.

Representative Case
To intuitively compare the predicted spatial features of Att U-Net and the numerical models, the cases of June 2013, July 2017, July 2018, and June 2021 are shown in Figure 4. These cases were mainly chosen with reference to the work of Yang et al. (2016), which clustered the summer rainfall anomaly patterns from 1951 to 2015. It can be found that the summer precipitation in eastern China usually displays four patterns. The selected cases include NCP (the northern China rainfall pattern, corresponding to Figures 4d and 4l), IRP (the intermediate rainfall pattern, corresponding to Figure 4p), and SCP (the South China rainfall pattern, corresponding to Figure 4h). Among them, the highest ACC for Att U-Net was in July 2018, at 0.570. In this case, the observed anomalies are negative in the south and positive in the north. The pattern predicted by Att U-Net is generally consistent with the observations; especially in the northeast, eastern Inner Mongolia, and central China, the distribution is almost the same as the observations, whereas the signs of CFSv2 and SEAS5 were almost uniformly negative. The second and third best-predicted cases are June 2021 and June 2013, with ACCs of 0.415 and 0.398, respectively. The Att U-Net predictions for these two months largely match the observations in the south, while the performance in the north is barely satisfactory; the patterns of the numerical models still show larger deviations. The lowest ACC for Att U-Net appeared in July 2017, at only 0.083. Nevertheless, the advantages of Att U-Net remain prominent, which also indicates that the ACC metric may not fully represent the prediction skill for spatial patterns.

Contribution of Features to Principal Components
Since the inputs of the deep neural network have already undergone feature extraction via PCA, the coefficients (Equation 9) and contributions (Equation 10) of the original features to the PCs are calculated to give a simple factor analysis (Figure 5). The largest loadings of PC1 are mainly distributed among the 500–1,000 hPa geopotential heights of the preceding-winter Northern Hemisphere, followed by the geopotential heights of the South Pole in the Eastern Hemisphere and the North Pole in the Western Hemisphere. As can be seen in Figure 5b, half of the variables contribute positively to PC2 and the other half negatively. Overall, the raw features that contributed most to the new variables were the prior soil moisture and air temperature, as well as the geopotential height of the Arctic area in the Western Hemisphere. These results may help us further filter the factors in future model improvements.
F = Σ_k u_k · ZX_k, (9) where F is the PC, ZX_k represents the k-th raw feature after normalization, and u_k is the corresponding loading coefficient.
where R is the contribution of a raw feature and c is the explained variance ratio.
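One plausible implementation of these loading and contribution calculations takes the contribution as the variance-ratio-weighted absolute loadings; the exact weighting of the paper's Equations 9–10 may differ, so the sketch below is an assumption for illustration.

```python
import numpy as np

def pc_loadings_and_contributions(X, n_pcs=2):
    """Loading coefficients of standardized features on the leading PCs,
    plus a variance-weighted contribution score per raw feature.
    (The weighting is one plausible reading of the paper's Eq. 10.)"""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardized features ZX
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    c = (s**2 / np.sum(s**2))[:n_pcs]            # explained-variance ratios
    u = Vt[:n_pcs].T                             # loadings: (features, PCs)
    R = np.abs(u) @ c                            # contribution of each feature
    return u, c, R

# Hypothetical example: 70 samples of 10 features sharing a common signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(70, 10)) + rng.normal(size=(70, 1))
u, c, R = pc_loadings_and_contributions(X)
```

Ranking the features by R then identifies the raw predictors, such as prior soil moisture or geopotential height, that dominate the retained PCs.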

Conclusions
In this study, we applied a DL model called Att U-Net to the prediction of summer precipitation levels in eastern China. We used three evaluation metrics (PA, TCC, and ACC) to compare the forecasting ability of this model with that of two advanced numerical models. To solve the small-sample training problem, seasonal prediction models usually need more complex variable-selection schemes (Fan et al., 2023) or transfer-learning strategies (Ham et al., 2019; Jin et al., 2022).
Here we suggest several solutions to the overfitting and small-sample training problems: changing the regression task into a classification task, embedding the AG modules, adopting PC analysis, and early stopping. At the same time, the advantages of the U-Net model, which can automatically capture spatial relationships and key information, are retained. The prediction skills of the original U-Net model without any of the aforementioned optimization strategies are given in Table S3 in Supporting Information S1; in that case, only the Root Mean Square Error is better than CFSv2, while TCC and ACC show no superiority.
The results prove that these strategies effectively improve prediction skill and provide new ideas for future research on seasonal precipitation forecast models based on DL networks. Many other advanced DL networks can be applied to seasonal prediction and are worth further attempts. The input variables, and their periods and regions, also deserve more investigation. Since we only consider seasonal signals in this work, adding inputs such as interannual and decadal signals might help obtain more accurate predictions.

• The Attention U-Net model outperforms the numerical model in predicting summer precipitation levels with a forecast lead of 1 month
• Converting a regression task into a classification task can eliminate the bad effects of noise and extremes
• The principal component analysis and the early-stopping strategy facilitate the model's generalization

Figure 1 .
Figure 1. The structure of the Attention U-Net model (a) and AGs (b). AGs are embedded in U-Net's skip connections; their inputs are the encoder features of the current layer and the decoder features of the next deeper layer. Ĝ is the attention factor.

Figure 2 .
Figure 2. Pixel Accuracy and TCC of Att U-Net, CFSv2, and SEAS5 predictions for summer in eastern China from 2009 to 2021 (test set). (a, f) are Att U-Net; (b, g) are CFSv2; (c, h) are SEAS5; and (d, e, i, j) are the differences between Att U-Net and CFSv2/SEAS5, with positive values indicating higher accuracy of the former.

Figure 3 .
Figure 3. The spatial Pixel Accuracy (PA) (a) and Anomaly Correlation Coefficient (ACC) (b) of Att U-Net, CFSv2, and SEAS5 predictions for summer in each year from 2009 to 2021 (test set); (c) average PA, TCC, and ACC across eastern China and different regions for summer from 2009 to 2021 (test set). The white dot denotes the median and the area denotes the amount of data.

Figure 4 .
Figure 4. Anomaly percentages, predicted and observed, in eastern China for June 2013, July 2017, July 2018, and June 2021. (a, e, i, m) are Att U-Net; (b, f, j, n) are CFSv2; (c, g, k, o) are SEAS5; and (d, h, l, p) are observations.

Figure 5 .
Figure 5. The loadings and contributions of the original features to the principal components (PCs) over the test set. (a, b) are the loadings of PC1 and PC2. (c) is the contribution of the original data to the PCs.