Summer Monsoon Rainfall Patterns and Predictability over Southeast China

This study advances the use of the Self-organizing Map (SOM) to identify the summer monsoon rainfall patterns over Southeast China (SEC), using 272 gauge records from May to August. Three distinct rain belts over the Huai River basin (HRB), the Lower Yangtze River basin (LYRB) and the South Coast region (SCR) are found. Their subseasonal variability strongly agrees with the northward progression of the East Asian Summer Monsoon (EASM) front in a stepwise fashion. We find that precipitation in the SCR and HRB rain belts exhibits significant changes in the mid-1990s, while the 1990s is the most active decade for the LYRB rain belt. Promising predictability of average daily rainfall over these three regions is obtained, with about 39% to 50% of the total variance explained by the circulation-informed regression models. Both leave-one-year-out cross-validation and blind prediction verify the regression performance. The western North Pacific Subtropical High phases, the mid-latitude blocking high anomaly over northeast China and upper-level divergence in SEC are found to best explain the variability of the rain belts. The newly proposed Russia-China wave patterns (western/central Russia → north of Tibetan Plateau → SEC) and the teleconnection between the El Niño-Southern Oscillation and the rain belts also offer additional predictability. Findings from this work may advance the understanding of the EASM rain belts, and offer insights into the source of bias for numerical simulations of daily summer monsoon rainfall in the region.


Introduction
Summer rainfall in Southeast China (SEC), where cities with the fastest-growing economies are mostly located, contributes a significant portion of the total annual rainfall (Stephan et al., 2018). Its complex spatiotemporal variability has long been recognized in the literature (Chang et al., 2000; Yang & Lau, 2004). Summer rainfall in eastern China has traditionally been categorized into three rainfall patterns (Liao et al., 1981; Liao & Zhao, 1992). Pattern 1 features excessive rainfall in north China, whereas Patterns 2 and 3 refer to intense rainfall in the Huai River basin (HRB) and the lower Yangtze River basin (LYRB), respectively. Since the late 1970s, an increased occurrence of Pattern 3 and suppressed occurrences of the other two have been observed, constituting the so-called "South Flood and North Drought" mode (Shi et al., 2009; Zhao & Feng, 2014). Another notable change occurred in the mid-1990s, when significantly enhanced rainfall occurred over south China (Kwon et al., 2005, 2007). Prominent interannual variations of summertime rainfall over SEC have also been documented. The Yangtze-Huai River basin forms a seesaw pattern with north and south China on the 2-4 year time scale (Tian & Yasunari, even before the early 21st century (Chang et al., 2000; Ding & Chan, 2005; Liao et al., 1981; Liao & Zhao, 1992; Wang & Lin, 2002). An update considering recent climate data is needed, preferably with state-of-the-art data mining techniques that have become available over the last two decades. Some recent studies continue to adopt the traditional three rainfall patterns derived from June-July-August (JJA) data (Gao et al., 2019; Shi et al., 2009; Zhao & Feng, 2014). These patterns, however, fail to capture either the quasi-stationary rain belt over south China during the pre-summer rainy season or the varying timing and duration of rainfall patterns (see Figure 1 in Shi et al., 2009).
In light of these limitations, the present study is motivated to derive distinct and reasonable rainfall patterns in SEC during the summer monsoon season [i.e., May-JJA (MJJA)].
Various studies have suggested that the behavior of the Asian anticyclones can heavily influence the summertime subtropical rain belt over SEC. They include the western North Pacific Subtropical High (WNPSH) (Chang et al., 2000; Cheng et al., 2019; Ren et al., 2013), two blocking highs over the Ural mountains and Okhotsk Sea, and the South Asian High (Wei et al., 2014, 2015). In addition to large-scale circulations, teleconnected climate systems or phenomena such as ENSO, Tibetan Plateau warming, Eurasian snow cover, and the Indian Summer Monsoon can also modulate summer climate in SEC (Wang, Bao, et al., 2008; Wei et al., 2014; Yang & Lau, 2004). Given the spatiotemporal variability of summer rainfall in SEC and the associated drivers mentioned above, our study addresses the following questions:
1. What are the distinct rainfall patterns over SEC and their variability on multiple time scales considering the current climate data and the nonlinearity of daily rainfall fields?
2. What are the underlying atmospheric dynamics and teleconnected oceanic signals associated with the variability of these patterns?
3. How much predictability could we gain from the updated rainfall patterns?
4. What useful dynamic drivers could we learn from the statistical model predictors?
In line with these research questions, this work consists of an analysis part and a modeling part. First, we employ the Self-organizing Map (SOM) to cluster the daily rainfall patterns in SEC during the summer monsoon season (i.e., MJJA) and analyze their subseasonal to decadal variability and relevant physical processes. This analysis technique represents an objective clustering of the main patterns of variability without constraints of linearity and orthogonality (Chu et al., 2012; Johnson et al., 2008; Kong et al., 2017; Oh & Ha, 2015). In contrast, previous studies on rainfall patterns often adopted empirical orthogonal function (EOF)-based analyses that suffer from those constraints (e.g., Mao et al., 2010; Stephan et al., 2018; Wu et al., 2009; Xiao et al., 2015). Considering that rainfall is inherently a local phenomenon, and that daily rainfall data are generally uncorrelated in space and highly skewed in distribution, SOM appears suitable for classifying patterns and may help decipher the collective drivers of rainfall variability. Second, we develop multiple linear regression (MLR) models to explore daily rainfall predictability over the regions clustered in the first part. The models assist in ascertaining useful meteorological signals based on correlation maps of the useful predictors. This study also aims to offer more insights into the source of biases embedded in numerical models through a more nuanced understanding of monsoonal dynamics.

Data
Daily rainfall gauge data for MJJA from 1979 to 2018 are provided by the China Meteorological Administration (CMA). The study area [108°-125°E, 15°-36°N] covers several important basins and areas in SEC, including the South Coast region (SCR), LYRB and HRB. 272 stations with almost full records of MJJA rainfall within the study period are selected. At the time of writing, rainfall data from 1979 to 2017 have gone through quality control by CMA. Missing values in the data (less than 0.5%) are replaced with the value from the closest station. The rainfall data in 2018 are used as unseen data for blind prediction to evaluate the prediction skill of the statistical models developed in this work. Daily meteorological variables used in the synoptic-scale diagnosis and statistical modeling are from the ERA-Interim reanalysis (Dee et al., 2011) with a 2.5° × 2.5° spatial resolution, including geopotential height at 850 hPa (Z850) and 200 hPa (Z200), sea surface temperature (SST), 2-meter temperature (T2m), vertically integrated water vapor transport (IVT) and total precipitation (PP). We calculate the anomaly of the above variables as the departure from the 5-day-moving-mean climatology based on 1979-2017 data. T2m&SST refers to a temperature field equal to T2m but with values over oceans replaced with SST. For simplicity, a positive Z850 anomaly is denoted as Z850A+, and a similar convention applies to other variables hereafter.

Water Resources Research, 10.1029/2019WR025515

SOM Clustering of Days With Homogeneous Rainfall Fields
SOM projects high-dimensional data onto a lower-dimensional grid while preserving its topological features and grouping similar patterns into clusters (Kohonen, 1998). In this study, we adopt SOM to cluster distinct rainfall patterns in SEC directly from the gauge data, which, to the best of our knowledge, has not been attempted in previous studies for this region. We treat each station as a dimension to categorize days with similar rainfall spatial patterns into a cluster (Lima et al., 2017). That is, the algorithm outputs clusters of days such that intra-cluster rainfall spatial patterns are similar, and inter-cluster rainfall spatial patterns are very different. The input matrix is $X = [x_1, \ldots, x_m, \ldots, x_M]$, where $x_m$ is the time series of daily rainfall at station $m$ with 4797 observations ($K = 4797$) in MJJA over 39 years, and $M$ is the total number of stations ($M = 272$). We implement SOM using the R package "kohonen" with Euclidean distance (Wehrens & Buydens, 2007). A detailed mathematical description of SOM can be found in Kohonen (1998).
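To make the clustering step concrete, the following is a minimal sketch of a 2×2 SOM written in plain NumPy. It is not the "kohonen" package used in the study: the learning-rate and neighbourhood schedules, the function names `train_som` and `assign_clusters`, and the synthetic gamma-distributed "rainfall" are all assumptions chosen for illustration. Rows of the input are days and columns are stations, so each day is a point in station space and distances are Euclidean.

```python
import numpy as np

def train_som(X, rows=2, cols=2, n_iter=2000, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM on X (n_days, n_stations); return node weights."""
    rng = np.random.default_rng(seed)
    n_nodes = rows * cols
    # Grid coordinates of each node, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen days.
    W = X[rng.choice(len(X), n_nodes, replace=False)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.01    # shrinking neighbourhood radius
        x = X[rng.integers(len(X))]             # one randomly drawn day
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # best-matching unit (Euclidean)
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)    # squared grid distance to the BMU
        h = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian neighbourhood weights
        W += lr * h[:, None] * (x - W)                # pull nodes towards the sample
    return W

def assign_clusters(X, W):
    """Assign each day to its nearest SOM node (cluster label 0..n_nodes-1)."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Synthetic stand-in for a (days x stations) rainfall matrix.
rng = np.random.default_rng(1)
X_demo = rng.gamma(shape=0.5, scale=8.0, size=(400, 30))
W_demo = train_som(X_demo)
labels_demo = assign_clusters(X_demo, W_demo)
```

Averaging the rows of the rainfall matrix that share a label would then give one composite rainfall field per cluster.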
To assess the performance of different node settings of SOM clustering, we employ the false discovery rate (FDR) test on distinguishability (Benjamini & Hochberg, 1995). This method is a typical field significance approach that has been applied in previous studies (Johnson, 2013). Following the test steps given by Johnson (2013) and Wilks (2006), for each station, the local hypotheses can be written as

$$H_0: \mu_i = \mu_j \quad \text{versus} \quad H_A: \mu_i \neq \mu_j,$$

where $\mu_i$ ($\mu_j$) is the rainfall average over days in cluster $i$ ($j$), with $i, j = 1, \ldots, N$ for $i \neq j$, and $N$ is the total number of clusters. The p-value is obtained from Student's t-test. Then the FDR criterion is given by

$$p_{\mathrm{FDR}} = \max_{m = 1, \ldots, M} \left\{ p_{(m)} : p_{(m)} \le q \, \frac{m}{M} \right\},$$

where $q = 0.05$ is the significance level for the FDR test in this study and $p_{(m)}$ is the $m$th smallest p-value among all stations. According to the FDR test, a local null hypothesis is rejected if its p-value is not greater than $p_{\mathrm{FDR}}$. If none of the local null hypotheses is rejected, then the cluster patterns are statistically indistinguishable (Wilks, 2006). In addition, the ratio of indistinguishable pairs is computed from the total number of local tests that fail to reject the null hypothesis. This ratio is utilized to choose the optimal cluster number ($N = 2, \ldots, 6$) and node structure given a fixed $N$.
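As an illustration of the FDR step, the sketch below takes the per-station p-values for one pair of clusters and computes the standard Benjamini-Hochberg threshold, i.e., the largest sorted p-value $p_{(m)}$ that does not exceed $q\,m/M$. This is assumed to be the criterion intended here; the function names and the ten example p-values are hypothetical. The per-station t-tests that produce the p-values are not shown.

```python
import numpy as np

def fdr_threshold(pvals, q=0.05):
    """Return p_FDR, or None if no sorted p-value satisfies p_(m) <= q * m / M."""
    p_sorted = np.sort(np.asarray(pvals, dtype=float))
    M = p_sorted.size
    ok = p_sorted <= q * np.arange(1, M + 1) / M
    return float(p_sorted[ok].max()) if ok.any() else None

def pass_rate(pvals, q=0.05):
    """Fraction of local tests (stations) with p-value <= p_FDR."""
    pvals = np.asarray(pvals, dtype=float)
    p_fdr = fdr_threshold(pvals, q)
    if p_fdr is None:
        return 0.0          # no local null hypothesis is rejected
    return float(np.mean(pvals <= p_fdr))

# Hypothetical p-values from ten stations for one cluster pair.
p_demo = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
p_fdr_demo = fdr_threshold(p_demo)
rate_demo = pass_rate(p_demo)
```

In this toy case only the two smallest p-values fall below their rank-dependent bounds (0.005 and 0.010), so the threshold is 0.008 and two of ten local tests pass.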

Change-Point Detection and Regional Grouping Based on SOM Clusters
We employ the Pruned Exact Linear Time (PELT) method, a change-point detection method (Killick et al., 2012), to detect any interdecadal change point(s) of the annual number of occurrences of each cluster. The PELT method is implemented using the R package "changepoint" (Killick & Eckley, 2014). A criterion of at least 10 annual observations (i.e., one decade) between two change points is set when using this method.
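For readers unfamiliar with this family of methods, the sketch below implements the unpruned optimal-partitioning recursion that PELT accelerates, with a minimum segment length of 10 observations. It is not the "changepoint" package: the squared-error (change-in-mean) cost, the penalty value and the function name `optimal_partition` are assumptions for illustration, and the series is synthetic.

```python
import numpy as np

def optimal_partition(y, penalty, min_len=10):
    """Penalised change-in-mean segmentation; returns start indices of new segments."""
    y = np.asarray(y, dtype=float)
    n = y.size
    s1 = np.concatenate(([0.0], np.cumsum(y)))        # prefix sums of y
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))   # prefix sums of y^2

    def seg_cost(a, b):
        """Sum of squared deviations from the mean over y[a:b]."""
        m = b - a
        return s2[b] - s2[a] - (s1[b] - s1[a]) ** 2 / m

    F = np.full(n + 1, np.inf)   # F[t]: best penalised cost of y[:t]
    F[0] = -penalty
    last = np.zeros(n + 1, dtype=int)
    for t in range(min_len, n + 1):
        for s in range(0, t - min_len + 1):
            if np.isinf(F[s]):
                continue         # y[:s] cannot be split into valid segments
            c = F[s] + seg_cost(s, t) + penalty
            if c < F[t]:
                F[t], last[t] = c, s
    # Backtrack through the stored segment starts.
    cps, t = [], n
    while last[t] > 0:
        cps.append(int(last[t]))
        t = last[t]
    return sorted(cps)

# Synthetic 39-value series with a mean shift after the 20th value.
y_demo = np.array([10.0] * 20 + [20.0] * 19) + 0.5 * (-1.0) ** np.arange(39)
cps_demo = optimal_partition(y_demo, penalty=50.0, min_len=10)
```

The double loop is quadratic in the series length, which is unproblematic for 39 annual values; the pruning in PELT matters mainly for long series.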
Based on the distinct spatial features of the average rainfall fields for each SOM cluster, we can further categorize the 272 gauge stations into regional groups based on their rainfall intensities during days in each cluster. First, a heavy precipitation day (HPD) is defined as a day with precipitation exceeding a certain threshold c. The SOM-based regional grouping is done by identifying the cluster (of MJJA days) that contains the majority of a station's HPDs. That is, if station m's HPDs mostly occur on days in cluster i, it is then grouped into a regional group i. We also conduct a sensitivity test to optimize the assignment of stations such that it is not sensitive to the choice of the threshold (c) for the HPD. With the above method, one can divide the SEC into several homogeneous rain belt zones based on the SOM clustering, which is an innovative application of SOM for rainfall studies presented in this work.
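The grouping rule can be sketched as follows, assuming a (days × stations) rainfall array and one SOM cluster label per day. The sketch assigns each station to the cluster in which it records the most HPDs; whether a background cluster should be excluded from the candidates is left to the `clusters` argument, which, like the function name and the toy data, is an assumption.

```python
import numpy as np

def group_stations(rain, labels, clusters, threshold):
    """Assign each station to the cluster holding most of its heavy precipitation days.

    rain: (n_days, n_stations) daily rainfall; labels: (n_days,) cluster ids.
    Stations with no HPD in any candidate cluster are labelled -1.
    """
    rain = np.asarray(rain, dtype=float)
    labels = np.asarray(labels)
    hpd = rain > threshold                               # boolean HPD mask
    # counts[k, m]: number of HPDs of station m on days in clusters[k]
    counts = np.stack([hpd[labels == c].sum(axis=0) for c in clusters])
    groups = np.asarray(clusters)[counts.argmax(axis=0)]
    return np.where(counts.sum(axis=0) > 0, groups, -1)

# Toy example: six days, two stations, three clusters.
rain_demo = np.array([[15, 0], [20, 2], [1, 12], [0, 3], [2, 30], [3, 25]], dtype=float)
labels_demo = np.array([0, 0, 1, 1, 2, 2])
groups_demo = group_stations(rain_demo, labels_demo, clusters=[0, 1, 2], threshold=10.0)
```

Re-running the function over a range of thresholds and counting how many stations change group is one simple way to carry out the sensitivity test described above.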

Predictive Models for Regional Average Daily Rainfall and Assessments
Prediction of summertime daily rainfall variability in SEC has long been a notorious challenge (Hsu et al., 2015; Huang et al., 2013), mainly due to the mixture of processes such as global warming (Zhao & Feng, 2014), long-term air-sea interactions, and short-term synoptic-to-mesoscale circulations (Stephan et al., 2018). Nevertheless, statistical modeling of daily rainfall helps identify informative signals for predicting SOM-derived rainfall patterns.
It is noteworthy that linear dimension reduction methods are still appropriate for environmental variables that are spatially correlated. Hence, we use EOF analysis on Z850A over 105°-125°E, 15°-40°N (i.e., the entire SEC covering HRB, LYRB and SCR) to extract local signals as candidate predictors to develop an MLR model for each homogeneous group based on the SOM clusters (Section 2.3). We also examine the time-lagged correlation (up to 7 days) between extracted EOF predictors and regional average daily rainfall. To this end, the 39-year Z850A data in the EOF analysis is from April 24 to August 31 in 1979-2017. Prior to regression, the EOFs are standardized by subtracting the mean and dividing by its standard deviation. The group-averaged rainfall is transformed using a cubic root transformation to get a more symmetric distribution appropriate for MLR.
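The predictor construction described above can be sketched with an SVD-based EOF decomposition. The sketch assumes a (days × grid points) anomaly matrix and omits area weighting for brevity; the function names are hypothetical and the data are random stand-ins. It also shows the standardization of the EOF time series, a lag-1 predictor stack, and the cubic root transformation of the rainfall target.

```python
import numpy as np

def eof_analysis(Z, n_modes=10):
    """EOFs of Z (n_days, n_grid): spatial patterns, time series, explained variance."""
    Zc = Z - Z.mean(axis=0)                      # remove the time mean at each grid point
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    eofs = Vt[:n_modes]                          # (n_modes, n_grid) spatial patterns
    pcs = Zc @ eofs.T                            # (n_days, n_modes) time series
    explained = s[:n_modes] ** 2 / np.sum(s ** 2)
    return eofs, pcs, explained

def standardize(pcs):
    """Subtract the mean and divide by the standard deviation, column by column."""
    return (pcs - pcs.mean(axis=0)) / pcs.std(axis=0, ddof=1)

def add_lag(pcs, lag=1):
    """Pair each day t with predictors from day t and from day t - lag."""
    return np.hstack([pcs[lag:], pcs[:-lag]])

rng = np.random.default_rng(0)
Z_demo = rng.normal(size=(200, 40))              # stand-in for Z850A
eofs_demo, pcs_demo, expl_demo = eof_analysis(Z_demo, n_modes=5)
pcs_std = standardize(pcs_demo)
X_lagged = add_lag(pcs_std, lag=1)               # lag-0 and lag-1 predictors
rain_demo = rng.gamma(0.5, 8.0, size=200)        # stand-in regional mean rainfall
y_demo = np.cbrt(rain_demo)                      # cubic root transformation
```

With real data the lagging would need to be done within each year so that a day is never paired with the last day of the previous season; starting the Z850A record a week before May 1 serves that purpose.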
We adopt forward stepwise selection combined with the Bayesian information criterion (BIC) to select the useful predictors using the R package "FWDselect" (Sestelo et al., 2016). To avoid overfitting while ensuring the best performance, for each region, we select the model with the fewest predictors that achieves 90% of the maximum BIC reduction from the simplest model (i.e., with one predictor only) to the "best model" with the lowest BIC. To avoid possible collinearity between time-lagged components of any EOF, only the one that leads to a higher coefficient of determination (i.e., R-squared) is retained.
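The selection rule can be illustrated with a small greedy search. This is a sketch under stated assumptions, not the "FWDselect" package: the BIC is the Gaussian least-squares form $n \ln(\mathrm{RSS}/n) + k \ln n$, and the function names and synthetic data are hypothetical. The second function applies the 90% rule to the path of BIC values.

```python
import numpy as np

def bic_ols(X, y):
    """Gaussian BIC of an ordinary least-squares fit with an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + A.shape[1] * np.log(n)

def forward_bic(X, y):
    """Greedy forward selection; returns [(selected columns, BIC), ...] by model size."""
    p = X.shape[1]
    selected, path = [], []
    while len(selected) < p:
        remaining = [j for j in range(p) if j not in selected]
        scores = [bic_ols(X[:, selected + [j]], y) for j in remaining]
        best = int(np.argmin(scores))            # candidate giving the lowest BIC
        selected.append(remaining[best])
        path.append((list(selected), scores[best]))
    return path

def pick_parsimonious(path, frac=0.9):
    """Smallest model reaching `frac` of the BIC drop from 1 predictor to the best model."""
    bics = np.array([b for _, b in path])
    target = bics[0] - frac * (bics[0] - bics.min())
    k = int(np.argmax(bics <= target))           # first model size meeting the target
    return path[k][0]

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 6))
y_demo = 3.0 * X_demo[:, 0] + 1.5 * X_demo[:, 2] + rng.normal(size=500)
path_demo = forward_bic(X_demo, y_demo)
chosen_demo = pick_parsimonious(path_demo)
```

In the synthetic example only columns 0 and 2 carry signal, so the two-predictor model already captures nearly all of the attainable BIC reduction and the rule stops there.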
Leave-one-year-out cross-validation (CV) is adopted to assess the robustness of the regression coefficients and the goodness-of-fit of the final models. More specifically, by using EOFs obtained from the 39-year Z850A, 38 out of the 39 years of summer rainfall observations are used to estimate the coefficients of the MLR models, while the remaining one year is used to validate the performance in each CV loop (Lee et al., 2008;Lu et al., 2016). In addition to the cross-validated R-squared and estimated coefficients obtained from the 39 CV loops, another four assessment metrics (Table 1) are used to evaluate the prediction skill of the model.
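A minimal sketch of the leave-one-year-out loop is given below. It assumes a predictor matrix, a transformed rainfall target and a year label for each day; the out-of-sample R-squared computed per held-out year is one plausible definition and may differ from the metric used in the study. The function name and synthetic data are hypothetical.

```python
import numpy as np

def loyo_cv(X, y, years):
    """Leave-one-year-out CV for an OLS model; returns predictions and R2 per year."""
    preds = np.empty_like(y, dtype=float)
    r2_by_year = {}
    for yr in np.unique(years):
        test = years == yr                               # days of the held-out year
        A_tr = np.column_stack([np.ones((~test).sum()), X[~test]])
        A_te = np.column_stack([np.ones(test.sum()), X[test]])
        beta, *_ = np.linalg.lstsq(A_tr, y[~test], rcond=None)   # fit on other years
        preds[test] = A_te @ beta
        ss_res = np.sum((y[test] - preds[test]) ** 2)
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        r2_by_year[int(yr)] = 1.0 - ss_res / ss_tot
    return preds, r2_by_year

# Synthetic example: 39 years x 123 MJJA days, three predictors.
rng = np.random.default_rng(0)
years_demo = np.repeat(np.arange(1979, 2018), 123)
X_demo = rng.normal(size=(years_demo.size, 3))
y_demo = X_demo @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.5, size=years_demo.size)
preds_demo, r2_demo = loyo_cv(X_demo, y_demo, years_demo)
mean_r2_demo = float(np.mean(list(r2_demo.values())))
```

Collecting the fitted coefficients from each loop in the same way would show how stable they are across the 39 folds.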
Finally, a blind prediction on the unseen MJJA rainfall in 2018 is performed to assess further the prediction skill of the final models (fitted by the data from 1979-2017). Another critical difference between the CV and the blind prediction here is that the predictors in the latter one are obtained by projecting Z850A data in 2018 onto the EOFs based on 1979-2017. Prediction interval (PI) at a 95% confidence level and the four assessment metrics in Table 1 are provided to evaluate the blind prediction.
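The distinguishing step of the blind prediction, projecting unseen anomalies onto EOFs fixed from the training period, can be sketched as follows together with an approximate prediction interval. The interval uses the normal quantile 1.96 in place of the exact Student's t quantile, which is an approximation that is reasonable only for large training samples; it applies on the transformed (cubic root) scale. All names and data are hypothetical.

```python
import numpy as np

def project_onto_eofs(Z_new, train_mean, eofs, pc_mean, pc_std):
    """Project new anomalies onto training-period EOFs, then standardize as in training."""
    pcs_new = (Z_new - train_mean) @ eofs.T
    return (pcs_new - pc_mean) / pc_std

def predict_with_interval(X_train, y_train, X_new, z=1.96):
    """OLS prediction with an approximate 95% prediction interval."""
    n = len(y_train)
    A = np.column_stack([np.ones(n), X_train])
    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    resid = y_train - A @ beta
    s2 = resid @ resid / (n - A.shape[1])                    # residual variance
    XtX_inv = np.linalg.inv(A.T @ A)
    lev = np.einsum("ij,jk,ik->i", A_new, XtX_inv, A_new)    # x0' (A'A)^-1 x0 per row
    half = z * np.sqrt(s2 * (1.0 + lev))
    y_hat = A_new @ beta
    return y_hat, y_hat - half, y_hat + half

rng = np.random.default_rng(0)
Z_train = rng.normal(size=(600, 20))     # stand-in for training-period anomalies
Z_new = rng.normal(size=(500, 20))       # stand-in for the unseen period
train_mean = Z_train.mean(axis=0)
_, _, Vt = np.linalg.svd(Z_train - train_mean, full_matrices=False)
eofs = Vt[:3]
pcs_train = (Z_train - train_mean) @ eofs.T
pc_mean, pc_std = pcs_train.mean(axis=0), pcs_train.std(axis=0, ddof=1)
X_train = (pcs_train - pc_mean) / pc_std
X_new = project_onto_eofs(Z_new, train_mean, eofs, pc_mean, pc_std)
coef = np.array([0.6, -0.4, 0.2])
y_train = X_train @ coef + rng.normal(scale=0.5, size=600)
y_new = X_new @ coef + rng.normal(scale=0.5, size=500)
y_hat, lo, hi = predict_with_interval(X_train, y_train, X_new)
coverage = float(np.mean((y_new >= lo) & (y_new <= hi)))     # ratio of obs within PI
```

The `coverage` value corresponds to the ratio of observations falling within the 95% prediction interval.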

SOM Clustering on the Rainfall Patterns in SEC
In this study, we use a 2×2 SOM to cluster the rainfall field. The number of neurons is user-defined, and we also tested other settings (e.g., 1×2, 1×3, 1×4, 1×5 and 2×3 SOM). The ratio of indistinguishable pairs is found to be proportional to the cluster number N (Figure S1). This is evident in the rainfall composites for N = 5 and N = 6 (Figure S2), in which the spatial overlap of the clustered rainfall fields is prominent, making the rainfall patterns less conspicuous. Despite a lower ratio of indistinguishable pairs for N = 2 and N = 3 than for N = 4, these settings fail to capture the rain band caused by the well-known Meiyu front in the LYRB.
As a result, we use 4 clusters (i.e., N = 4) to classify the rainfall patterns.
Compared to the number of clusters, the topological structure of SOM exhibits a weaker influence on the ratio of indistinguishable pairs and the resulting rainfall composites. For a thorough comparison, we apply the FDR test to assess two different topological structures for N = 4 (i.e., 2×2 and 1×4 SOM). The pass rates of local tests (p-value ≤ $p_{\mathrm{FDR}}$) for each cluster pair are tabulated in Table 2. All the cluster pairs for both topological structure settings are statistically distinguishable at the 0.05 significance level. The 2D structure (i.e., 2×2 SOM) leads to a slightly better clustering than the 1D structure (i.e., 1×4 SOM), with an average pass rate of ~85% over all pairs. Thus we choose 2×2 SOM to analyze the daily rainfall patterns in SEC during MJJA.

Table 1
The Assessment Metrics for CV and Blind Prediction
Ratio of observations within 95% prediction interval (PI): (# of $y_k^{obs}$ within PI) / K
cor: Pearson correlation between predicted values and observations

Table 2
The pass rates of local tests for six pairs of clusters in 2×2 and 1×4 SOM clustering.
Based on the results of the 2×2 SOM clustering, we composite the daily rainfall over SEC within each cluster to reveal their spatial structure (Figure 1). Daily rainfall composites of clusters one to three (C1-C3) depict southwest-to-northeast oriented rain belts at different latitudes of SEC. The heavy rain belt in C1 days is mainly situated between 30°N and 36°N and broadly covers HRB (Figure 1a). Most of the gauge stations within HRB have average rainfall intensities over 20 mm/day during the C1 days, suggesting that C1 likely corresponds to the HRB rain belt. Similarly, C2 and C3 correspond to the rain belts over LYRB and SCR, respectively (Figure 1b, c). In general, rainfall intensity over SEC decreases from lower to higher latitudes, and from the coast to further inland. As expected, the rainfall composite of C1 is weaker than those of C2 and C3, since less tropical moisture flux can reach HRB compared to SCR and LYRB (Lu & Hao, 2017). Likewise, the number of occurrences of the SCR rain belt (726 days) is slightly higher than those of the LYRB (610 days) and HRB rain belts (543 days). This is reasonable, as more persistent and frequent rainfall events generally occur in SCR compared with the other regions to the north. The last cluster (i.e., C4) occurs the most (2918 days) and likely represents the background mode (i.e., some-to-no rain condition) of the entire SEC (Figure 1d). The relatively frequent occurrence of C4 is consistent with the fact that heavy rainfall is, by nature, a rare event compared to the some-to-no rain condition.

Subseasonal to Interdecadal Variability of Rain Belts
The SOM clustering also reveals more nuance in the temporal variability of the rain belts. From the subseasonal climatology, one can see different onset timings of the three rain belts (Figure 2a). The SCR rain belt, in general, has the earliest onset amongst the three rain belts and prevails from mid-May to mid-June. It is closely followed by the prevailing LYRB rain belt after mid-June and then the HRB rain belt in July. Both the subseasonal onset timing of the three rain belts and their spatial rainfall patterns (Figures 1, 2) are mostly consistent with the northward progression of the subtropical fronts in a stepwise fashion driven by the EASM system (e.g., Ding & Chan, 2005). Thus, we conclude that C1 to C3 mainly represent the three stages of the subtropical front as it propagates stepwise from SCR to LYRB and lastly to HRB.
Moreover, if any cluster, say C1, occurs on more than 5 days within a 15-day moving window (continuity is not imposed), we define that window as an active period of the C1 rain belt and mark the 8th day of those 15 days. As a result, wet spells of all three rain belts are identified as persistently active periods of the clusters (Figure 2b). Note that a 15-day period can be active for more than one cluster (say C2 and C3), as long as the above criterion is met. To further compare the activeness among different clusters, the size of the colored points is proportional to the number of occurrences of the corresponding cluster within every 15 days. The wider the water-drop pattern is, the stronger the wet spell of the corresponding rain belt is.
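The active-period rule can be written as a moving-window count. The sketch below assumes a one-dimensional array of daily cluster labels for a single season and returns the zero-based index of the 8th day of every 15-day window in which the chosen cluster occurs on more than 5 days, along with the count that could scale the marker size. The function name and toy labels are hypothetical.

```python
import numpy as np

def active_centres(labels, cluster, window=15, min_days=5):
    """Centres (8th day) of windows in which `cluster` occurs on more than `min_days` days."""
    occ = (np.asarray(labels) == cluster).astype(int)
    # counts[i]: occurrences of the cluster on days i .. i + window - 1
    counts = np.convolve(occ, np.ones(window, dtype=int), mode="valid")
    centres = np.arange(counts.size) + window // 2      # zero-based index of the 8th day
    active = counts > min_days
    return centres[active], counts[active]

# Toy season of 30 days: cluster 1 occupies days 3..9, cluster 4 the rest.
labels_demo = np.full(30, 4)
labels_demo[3:10] = 1
centres_demo, counts_demo = active_centres(labels_demo, cluster=1)
```

With multi-year data the function would be applied year by year so that no window straddles two seasons.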
Before the early 1990s, all three wet spells generally appear to be inactive and short-lived. Notably, SCR wet spells tend to occur as early as early May. After the mid-1990s, SCR wet spells tend to be postponed to June with prolonged duration and occur almost every year, whereas their occurrences are rare before the mid-1990s. Interestingly, LYRB wet spells become more active and persistent in the 1990s and line up closely with the delayed SCR wet spells in June around the same time. This finding implies that the transition between the two wet spells tends to be more rapid and frequent during that decade. Likewise, HRB wet spells appear to exhibit an earlier onset after the mid-1990s than before.
The synchronized changes among the three wet spells suggest they are likely related. To further test and quantify our hypothesis on the interdecadal changes, we apply the PELT method to detect any change point(s) in annual occurrences for C1 to C3 (Figure 2c). The method locates a change point in the annual occurrences of C3 during 1993-1994, which confirms that SCR wet spells indeed undergo a delayed onset timing, increased occurrences and extended duration after the mid-1990s. Two change points (1990-1991 and 2002-2003) are detected in the annual occurrences of C2, between which lies the active decade for LYRB wet spells. On the other hand, HRB wet spells tend to be more active after 1997-1998, which is in line with the early onset mentioned above (Figure 2b). All these suggest a systematic shift of the rainfall regime over SEC during the 1990s, consistently reflected in the wet spells of three distinct rain belts.
Based on the methods in Section 2.3, we use HPDs defined by the threshold of 10 mm/day (~85th quantile) to further cluster stations into three spatially homogeneous groups (the HRB, LYRB and SCR groups) (Figure 2d). We perform a sensitivity test by changing the HPD threshold from 7 mm/day (~80th quantile) to 17 mm/day (~90th quantile) and comparing the grouping results with those under the threshold adopted in this work (i.e., 10 mm/day). At most two stations (less than 1% of the total stations) change groups under varying HPD thresholds (Table S1), supporting the robustness of our grouping strategy. The three regional groups have consistent patterns with the composite rain fields (cf. Figure 1 and Figure 2d), and such grouping of stations based on SOM will be used in Section 3.4 to explore the predictability of rainfall over SEC.

Diagnosis of the SOM Clusters
To diagnose the underlying dynamics associated with each SOM-derived rain belt, we examined the composite evolution of a selection of field variables including T2m&SSTA, Z850A, IVTA, PPA and Z200A from 6 days ahead (day -6) to 3 days after (day 3) each cluster (Figures 3-5, S5).

HRB Rain Belt
Some intriguing meteorological features are identified when the HRB rain belt prevails (Figure 3). Two significant Z850A+ centers over the central North Pacific and the western North Pacific are found at the developing stage, with the former emerging earlier than the latter. The anomalous high over the western North Pacific is likely associated with the western North Pacific Subtropical High (WNPSH) (Lee et al., 2013; Yun et al., 2015).
Owing to the intrinsic temporal variability of the North Pacific Subtropical High (NPSH) centered around 30°-40°N, 140°-160°W during boreal summer (Kawatani et al., 2008), we hypothesize that the Z850A+ over the central North Pacific (~45°N, 170°E-160°W; hereafter called CNPAC+) may stem from the scenario when NPSH extends to the north.
As the two anomalous highs keep strengthening at the developing stage, WNPSH gradually propagates westward towards the East Asian lands, which is the manifestation of the positive WNPSH phase (hereafter called WNPSH+) (Cheng et al., 2019). On day 0, strong PPA+ over HRB and light PPA+ over the western tropical Pacific, plus large-scale PPA- over the western North Pacific, constitute a tripole pattern of anomalous moisture reallocation. Meanwhile, a prominent meridional temperature gradient and convergent IVTA around HRB favor frontogenesis there. Furthermore, the highlighted temperature and pressure gradient can induce thermal winds that further boost southwesterly moisture transports (IVTA+) along the western periphery of WNPSH+ to sustain the subtropical rain belt over HRB (Chen, 2004; Ninomiya, 1984).
To explain the notable temperature gradient across the HRB region on day 0, WNPSH+ is likely responsible for introducing warm-and-moist southwesterlies and stronger solar radiation fluxes at the surface to the south of HRB (results not shown). On the other hand, a wave train consisting of a series of Z200A+ and Z200A- from central Russia (~60°N, 70°-90°E) to SEC is observed from day -3. As the wave train propagates southeastward on day 0, it triggers upper-level disturbances that can affect the circulations at lower levels. The wave train, in which Z200A- is to the north of the Tibetan Plateau, favors a lower-level divergence (Z850A+) that assists in driving cold advection to HRB and the evident temperature gradient there. Meanwhile, Z200A+ over HRB encourages upward motion in the air column by diverging air masses at upper levels and strengthens the convective activity within the subtropical front.
In addition, light but widely spread PPA+ over the eastern Tibetan Plateau may serve as a source of latent heat release to the atmosphere. According to Hoskins and Karoly's (1981) theory, diabatic heating in the atmosphere at mid-latitudes tends to be balanced by the cold advection from polar regions, which creates anomalous vorticity to the east of the source. This phenomenon possibly drives the trough, and hence the wave train, eastward in the next couple of days, and eventually contributes to the quasi-stationary fronts over HRB by introducing additional vorticity and cold air masses there.

Another interesting teleconnection is the large-scale surface warming in Canada and the continental U.S. from day -3 to 0, possibly due to clear-sky conditions induced by the anomalous high (Z850A+, Z200A+) nearby (Figure S3). This teleconnection may be associated with the Asia-North-American (ANA)-like pattern originating from the latent heat release over HRB, which stimulates a wave train from East Asia to the North Pacific and lastly to the North American continent (Zhu & Li, 2016). Interestingly, mild but significant SST cooling over the eastern equatorial Pacific Ocean gradually extends to the central Pacific Ocean after day 0, suggesting that the HRB rain belt may be associated with a developing La Niña (or decaying El Niño) event.

LYRB Rain Belt
Although the LYRB and HRB are close to each other, the composite maps over the days of the corresponding cluster possess different patterns. First, a deep trough (Z850A-, Z200A-) over the central North Pacific emerges on day -6 and becomes significant on day -3, in line with the development of WNPSH+ (Figure 4). The CNPAC-, in contrast with that in C1 days, implies the southward retreat of NPSH, which may slightly drive the WNPSH+ southward and retain the quasi-stationary rain belt over LYRB (Zhou & Yu, 2005). Interestingly, the WNPSH+ in this case further penetrates towards Indochina and the Indian peninsula on day 0, causing an even broader dry condition across the oceans. Meanwhile, the quasi-stationary LYRB rain belt stretches northeastward, reaching South Korea, southwest Japan and adjacent seas. Warm-and-moist southwesterlies at the northern flank of the WNPSH+ and cool-and-dry northeasterlies at the southern rim of a weak blocking high anomaly (Z850A+) originating from eastern Russia likely sustain the anomalous cyclone and frontal rains over LYRB (Wang, 1992). Similar to the scenario of the HRB rain belt, we observe an upper-level wave train (Z200A) from western Russia to SEC, disturbances of which may help build up an anomalous low at low levels propagating from the eastern Tibetan Plateau to LYRB.
One distinct teleconnection is the negative Pacific-North-American (PNA-) pattern recognized from a seesaw pattern of surface temperature and upper-level height anomalies over the U.S. (Leathers et al., 1991), namely Z200A-(+) and T2mA-(+) over the western (eastern) U.S. from day -6 to -3 (Figure S4). Again, latent heat release during the LYRB rain belt appears to generate an ANA-like wave train that may induce an anomalous low at upper levels and a continent-wide surface cooling in North America on day 0 (Figure S4). Linking all those signals, whether there is a causative loop (or a PNA-LYRB teleconnection) from the appearance of PNA-, then the LYRB rain belt and the ANA pattern, and lastly the decay of PNA- may be an interesting research question for future studies.

SCR Rain Belt
Regarding the composite maps for the SCR rain belt (Figure 5), relatively weak but significant Z850A- associated with strong PPA+ over the western North Pacific and SCR is evident on day -3. The reversed tripole pattern of PPA is shown with two dry conditions to the north of SCR and in the tropics, and wet conditions over SCR and the adjacent seas. This pattern prominently resembles the WNPSH- phase, when the WNPSH retreats to the east and brings wet conditions in SCR and adjacent seas and dry conditions elsewhere (Cheng et al., 2019). However, a weak Z850A over the Philippines on day 0 may also suggest the southward-positioned WNPSH+, as the WNPSH demonstrates a climatological northward migration from the Philippines to the western North Pacific since May (Chang et al., 2000). We thus argue that the SCR rain belt could result from WNPSH- or the southward-positioned WNPSH+.
Unlike the other two rain belts, an abrupt appearance of the anomalous cyclone locally over SCR from day -3 to 0 might be due to the effect of tropical cyclones in boreal summer (Ren et al., 2007). More importantly, there is an atmospheric blocking evident by the southward intrusion of Z200A+ encompassing central Russia and north of Tibetan Plateau, possibly associated with the upper-level Rossby waves.

Such a blocking likely leads to an anomalous anticyclone and large-scale warming in Eurasia. It may also be embedded in the plausible upper-level wave train from western Russia since day -6. The blocking may strengthen the vorticity to its east, and then enhance the upper-level divergence and lower-level convergence over SCR.
Apart from the signals over the land, we observe CNPAC- and heightened NPSH anomalies over the eastern North Pacific. The former might favor a southward shift of WNPSH similar to that discerned during the LYRB rain belt (cf. Figures 4, 5), while the latter might imply an eastward retreat of the NPSH and the occurrence of WNPSH- since day -3. Likewise, the ANA-like wave train (Z200A) from East Asia to North America is again striking, which likely triggers the statistically significant warming over eastern North America on day 3 (Figure S5). We also notice mild but significant warming over the tropical Pacific Ocean throughout the diagnostic period, which is nearly opposite to that during the HRB rain belt (Section 3.3.1). The potential association between developing El Niño (La Niña) and the SCR (HRB) rain belt is thus suggested and to be uncovered later.

Background Mode of SEC
Signals in the background mode of SEC are relatively inactive compared to the other three rain belts, as expected. We observe a weak but significant Z850A- over the western North Pacific propagating westward from day -3 to day 0 (Figure S6). In contrast with that during the SCR rain belt, moisture mainly precipitates over SCS and the western North Pacific as the anomalous cyclone hardly makes landfall. Meanwhile, suppressed rainfall over the entire SEC is prominent. This suppressed rainfall is attributable to the anomalous anticyclone (Z850A+) over the SEC since day 0, which introduces anomalous northeasterlies that weaken the background southwesterly monsoons. Furthermore, the lower-level anticyclone and upper-level cyclone over the region create a sinking motion over SEC, which further discourages cloud formation and leads to the background mode. The seesaw pattern of PPA between SCS and SEC reveals the plausible out-of-phase relationship between EASM and SCS summer monsoon. Besides, a weak but significant cooling over the western North Pacific after day 0 serves as a favorable condition for the transition to WNPSH+ (Cheng et al., 2019), which may subsequently lead to the HRB or LYRB rain belt as discussed above.

Predictability of Rainfall in the Three Rain Belt Regions and the Informative Meteorological Signals
Despite the prominent circulation and oceanic signals observed in the composite analysis, indices derived for these signals were not capable of explaining variations of rainfall in the MLR models (results not shown). As rainfall is inherently a local phenomenon, local Z850A signals are extracted by the EOF analysis to preserve as much local weather information as possible for predicting the regionally averaged daily rainfall over the three rain belts (Section 2.4). Thus, the three rain belts share the same set of candidate predictors. The first 10 EOFs explain 96.8% of the total variance, with their spatial patterns shown in Figure 6. The leading EOF explains up to 46.7% of the total variance and features a synoptic-scale pressure anomaly over the SEC. EOFs 2, 4, 6 and 10 display north-south wave patterns (with different wavenumbers) of anomalous mesoscale pressure bands. In contrast, EOFs 3 and 7 feature east-west wave patterns, while the remaining EOFs appear to feature small eddies. However, care must be taken when interpreting those EOFs, as some of the patterns may be artifacts of the orthogonality constraints.
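As an illustration of how EOFs and their explained-variance fractions can be obtained, the following minimal sketch applies a singular value decomposition to a synthetic anomaly matrix. The function name `eof_analysis`, the array sizes and the random data are assumptions made for illustration only; they are not the data or code used in this study, and area weighting of grid points is omitted.

```python
import numpy as np

def eof_analysis(anomalies, n_modes):
    """Return EOF patterns, PC time series and explained-variance fractions.

    anomalies: 2-D array (n_days, n_gridpoints), e.g. flattened Z850A maps.
    """
    # Remove the time mean at each grid point so the SVD acts on anomalies.
    centered = anomalies - anomalies.mean(axis=0)
    # Rows of vt are spatial patterns (EOFs); u * s gives the PC time series.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    pcs = u[:, :n_modes] * s[:n_modes]
    return vt[:n_modes], pcs, explained[:n_modes]

# Synthetic stand-in for daily anomaly maps (hypothetical sizes).
rng = np.random.default_rng(0)
fake_z850a = rng.standard_normal((500, 60))
eofs, pcs, explained = eof_analysis(fake_z850a, n_modes=10)
```

In this construction the PC time series are mutually uncorrelated, which is the property that makes them convenient candidate predictors for a linear regression.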
Based on the Pearson correlation, only EOFs at 0 or 1 day before the rainfall showed significant linear relationships with regional rainfall (results not shown), suggesting an immediate response of rainfall to the leading modes of local pressure anomalies. Therefore, 20 predictors (i.e., 10 EOFs at lead 0 and 1 day) are used in the full MLR models. Following the selection process described in Section 2.4, three sets of informative predictors are determined for the final MLR models. The R-squared values of the three final models are 0.39 (HRB), 0.47 (LYRB) and 0.50 (SCR), which agree well with the range of the cross-validated R-squared values.
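The construction of the 20 candidate predictors can be sketched as follows, assuming a single continuous season of daily PC values. The helper names `lagged_design` and `fit_ols`, the synthetic data and the plain least-squares fit are illustrative assumptions; the predictor selection step of Section 2.4 is not reproduced here.

```python
import numpy as np

def lagged_design(pcs):
    """Stack PCs at lead 0 and lead 1 day into one predictor matrix.

    pcs: array (n_days, n_modes) for one continuous season.
    Returns an array (n_days - 1, 2 * n_modes); the first day is dropped
    because it has no previous-day value.
    """
    same_day = pcs[1:]       # predictors at lead 0
    previous_day = pcs[:-1]  # predictors at lead 1 day
    return np.hstack([same_day, previous_day])

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients and R-squared."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    r2 = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return beta, r2

# Synthetic example: 123 days (one May-August season) and 10 modes.
rng = np.random.default_rng(1)
pcs = rng.standard_normal((123, 10))
X = lagged_design(pcs)
# Hypothetical rainfall depending on mode 1 today and mode 2 yesterday.
rain = 2.0 * X[:, 0] - 1.0 * X[:, 11] + 0.5 * rng.standard_normal(len(X))
beta, r2 = fit_ols(X, rain)
```

With multi-year data, the lagging would need to be applied within each season so that the last day of one summer is not paired with the first day of the next.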
Regarding the predictions of the remaining one year in the 39 CV loops, average values of predictive R-squared are 0.37, 0.45 and 0.48 for the final HRB, LYRB and SCR models, respectively (Table 3), suggesting that our models are not overfitted. Given the useful skill measured by additional metrics (Table S2) and the stable regression coefficients in the CV (Figure S7), the final MLR models are deemed stable and reliable.
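A leave-one-year-out loop of this kind can be sketched as below. The function name `leave_one_year_out_r2`, the synthetic predictors and the definition of predictive R-squared (relative to the mean of the held-out year) are assumptions for illustration and may differ from the exact procedure in Section 2.4.

```python
import numpy as np

def leave_one_year_out_r2(X, y, years):
    """Predictive R-squared for each held-out year under OLS with intercept."""
    scores = {}
    for held_out in np.unique(years):
        test = years == held_out
        train = ~test
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_test = np.column_stack([np.ones(test.sum()), X[test]])
        pred = A_test @ beta
        # Assumed definition: skill relative to the held-out year's own mean.
        sse = np.sum((y[test] - pred) ** 2)
        sst = np.sum((y[test] - y[test].mean()) ** 2)
        scores[int(held_out)] = 1.0 - sse / sst
    return scores

# Synthetic data: 39 hypothetical years x 123 days, 3 made-up predictors.
rng = np.random.default_rng(2)
years = np.repeat(np.arange(39), 123)
X = rng.standard_normal((years.size, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + rng.standard_normal(years.size)
scores = leave_one_year_out_r2(X, y, years)
mean_r2 = float(np.mean(list(scores.values())))
```

Comparing `mean_r2` with the in-sample R-squared of the model fitted to all years gives a simple check for overfitting: a large gap would indicate that the model does not generalize to unseen years.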
To further assess the performance of the MLR models, we perform a blind prediction of the regional daily rainfall in 2018 (Section 2.4). The results of the blind prediction are promising. The PIR values for all three models are very close to one, revealing that the 95% PI covers nearly all the observed values (Figure 7). Strong correlations (~0.7 in all the models) further confirm the consistency between the predicted and observed values. Also, the predictive R-squared values are 0.45, 0.50 and 0.41 for the HRB, LYRB and SCR rain belt regions, respectively. In terms of accuracy, the rainfall predicted by the SCR model has the smallest MSEP, which agrees with our expectation given its highest R-squared ascertained by the CV (Figure 7, Table 3). We should be aware that the results of blind prediction can be influenced by the randomness in the observations. Overall, assessments from both the CV and blind prediction suggest that the selected EOFs for homogeneous stations clustered by SOM can offer appreciable predictability of the strength of the three monsoon rain belts, among which the SCR model generally has the highest performance.
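The blind-prediction metrics can be computed along the following lines. This is a sketch under stated assumptions: the coverage measure is taken to be the share of observations falling inside the 95% prediction interval, which is our reading of the PIR rather than its formal definition in Section 2.4, and the function name and numbers are made up for illustration.

```python
import numpy as np

def blind_prediction_metrics(observed, predicted, lower, upper):
    """Summarize out-of-sample skill for one regional rainfall series."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    inside = (observed >= np.asarray(lower)) & (observed <= np.asarray(upper))
    errors = observed - predicted
    return {
        # Assumed definition: share of observations inside the interval.
        "coverage": float(inside.mean()),
        "msep": float(np.mean(errors ** 2)),
        "correlation": float(np.corrcoef(observed, predicted)[0, 1]),
        "r2_pred": float(1.0 - np.sum(errors ** 2)
                         / np.sum((observed - observed.mean()) ** 2)),
    }

# Tiny made-up example (values are not from the study).
obs = [2.0, 0.0, 5.0, 8.0, 1.0]
pred = [1.5, 1.0, 4.0, 6.0, 2.0]
half_width = 2.5  # hypothetical constant half-width of the 95% interval
metrics = blind_prediction_metrics(
    obs, pred,
    lower=[p - half_width for p in pred],
    upper=[p + half_width for p in pred],
)
```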
Based on the diagnosis of the three rain belts (Sections 3.3.1-3.3.3) as well as the signs of the regression coefficients for each predictor in the final MLR models (Table 3), we attempt to use those signals to infer the underlying physical processes. Regarding EOF 2, which explains a substantial variance (~22%), its correlation map features a large-scale Z850A dipole covering the western North Pacific and eastern China (Figure 8). Considering the negative sign of its regression coefficient in both the HRB and LYRB models, we argue that EOF 2 is useful because it captures the WNPSH+ signal that is essential for building up the HRB and LYRB rain belts and provides indispensable predictability for their intensities. Another large-scale Z850A+ encompasses HRB and northeast China, which is not entirely consistent with the composites (Figures 3, 4). The discrepancy reveals that EOF 2 alone is not sufficient to explain the full picture of the weather condition. Notably, EOF 2 appears to correlate with a wave train from western Russia to SEC, which largely coincides with the wave patterns in the composite maps of the LYRB rain belt (Figure 4).

Additionally, EOFs 4, 6, 8 and 10 are common predictors shared by all three final models (Table 3). All of them except EOF 8 exhibit significant correlations with certain pressure bands. Specifically, EOF 4 is associated with Z850A- over SEC and two Z850A+ fields centered in the Philippines and northeast China, respectively (Figure 9). The one in the Philippines may be related to the southward-positioned WNPSH+, as seen in the composites of the SCR rain belt (Figure 5). EOF 4 also exhibits significant correlations with southwesterly IVTA along the coast of SEC, northeasterly IVTA associated with a mid-latitude high pressure, and an upper-level divergence over SEC. These suggest that EOF 4 reflects favorable conditions for rain belt formation in the SEC, as is also evident from its positive regression coefficients in all three models. Further, we observe alternating Z200A fields from western Russia to the north of the Tibetan Plateau and lastly to SEC. Hence, we argue that EOF 4 contains information regarding the essential wave train phenomenon, as also found in the composite analysis (Sections 3.3.1-3.3.3).
EOF 6 is associated with a wavenumber-2 wave pattern in which two large-scale Z850A fields are in the Philippines and central Russia, plus two pressure bands in between (Figure 10). Considering the sign of EOF 6's regression coefficients in the models, we find that it might signal the upper-level trough (blocking high) over SEC during the SCR (the HRB and the LYRB) rain belt(s), which likely emanates from central Russia following the wave train. Similarly, EOF 10 exhibits correlations with a wavenumber-2.5 wave pattern from central Russia to the Philippines at both lower and upper levels (Figure 11). Considering all those EOFs (e.g., EOFs 2, 4, 6, 10) that significantly correlate with the wave train phenomena and the signs of their regression coefficients in the MLR models (Table 3), we propose that they can serve as indicators to classify rain belts under different wavenumbers of disturbance.

EOF 8 is another informative predictor that deserves attention. It positively correlates with WNPSH+, mid-latitude Z850A+ over eastern Russia and CNPAC+ (Figure 12). These circulation signals are particularly prominent in the composites during the HRB rain belt (Figure 3). More importantly, it significantly correlates with the basin-wide SST cooling associated with the easterly IVTA and 10-meter wind anomalies over the central and eastern Pacific Ocean. Given the positive and negative regression coefficients of EOF 8 in the HRB and SCR models, respectively (Table 3), we argue that EOF 8 can signal the ENSO-HRB-SCR teleconnection.
Although the remaining EOFs are somewhat difficult to interpret (Figures S8-12), useful signals contained in some of them are still noteworthy. For instance, EOF 3 is retained only by the HRB model. It is associated with a zonally oriented pressure dipole from the eastern Tibetan Plateau to the Yellow Sea accompanied by southerly IVTA over SEC (Figure S9), which appears to be only partly consistent with the composite weather patterns (Figure 3). Nevertheless, EOF 3 exhibits significant correlations with the wave train from western Russia, CNPAC+ and SST cooling in the central Pacific Ocean, all of which are highly consistent with the composite. These reveal EOF 3's potential association with both circulation and oceanic signals that offer predictability for the HRB rain belt.

Discussion
We have shown that the three rain belts all experience regime shifts in the characteristics of their wet spells (e.g., duration, number of occurrences and onset timing) around or after the mid-1990s. Similar periods that reveal changes in summer rainfall in the region have been documented in several studies (Day et al., 2018; Kwon et al., 2007; Wang et al., 2016; Wu et al., 2010). For instance, Wang et al. (2016) showed that precipitation over almost the entire SEC was much suppressed during 1979-1992, while the weather condition over South China and LYRB turned abruptly wet during 1993-1999. More precipitation over South China and HRB was then found since the 2000s. These are in general agreement with the interdecadal variability of the SOM-derived wet spells discovered in this study. In addition, Kwon et al. (2007) reckoned that the significantly enhanced rainfall over south China after the mid-1990s is associated with a weakened upper-level Asian westerly jet and increased typhoon activity. Recently, Chiang et al. (2017) proposed that the anomalous northward migration of the westerly jet would lead to a shorter duration of the Meiyu rain belt. Also, leading modes of both WNPSH and ENSO shift from 3-6-year to 2-3-year cycles since the mid-1990s (Cheng et al., 2019), which corresponds well with increased occurrences of the canonical El Niño and El Niño Modoki after the 1990s (Yeh et al., 2009). These studies offer insights into the plausible factors behind the interdecadal variability in the rain belts found in this work. The ultimate causes of the systematic changes in rainfall patterns, circulation and climate variability around the mid-1990s are yet to be identified.
Promising predictability with 39% to 50% explained variance in the mean rainfall of the three rain belts is achieved and cross-validated in this study, along with appreciable blind prediction skill on unseen rainfall data in 2018 (Section 3.4). Yet statistical prediction of summertime daily rainfall variability in SEC has long been a notorious challenge. Zhao and Feng (2014) and other early works attempted to predict the traditional three rainfall patterns using conceptual models built upon a decision tree considering the signs of a set of climate indices. Recently, Gao et al. (2019) predicted the three rainfall patterns using a multinomial logistic regression model and adopted machine learning methods to select useful predictors among 84 preceding winter climate indices. However, rainfall pattern prediction by those statistical models was limited to an annual basis by assuming that only one dominant rainfall pattern occurred each year. In other words, the northward propagation of the EASM rain belt from SCR to HRB on the subseasonal time scale was overlooked in their studies. In contrast, it is reasonably captured in the regression models for daily rainfall presented in this work. Moreover, one major issue of climate-index-based prediction is the strong collinearity between the indices, which would overestimate the predictability to some extent (Gao et al., 2019). Conversely, predictors (i.e., EOFs) in our MLR models are uncorrelated with each other, which makes the predictability of the SOM-derived rainfall patterns more convincing and reliable.
It is noteworthy that the MLR models presented in this work offer little lead-time forecasting capability, as they are based on the EOFs derived from local pressure fields at lead 1 and 0 day. Nevertheless, a recent study revealed substantial biases in the numerical hindcast simulations of heavy rainfall events (20 mm day⁻¹ or above) in SEC among all the 14 CMIP5 models (Huang et al., 2013). The multimodel ensemble of the 14 models inevitably fails to capture those heavy rainfall events in the SEC. In light of this, the MLR models presented in this work can serve to identify the related circulation and oceanic signals from the model predictors, which may contribute to an improved rainfall simulation in SEC by pinpointing the critical signals that are not well represented in numerical models.
Additionally, we argue that WNPSH+ (WNPSH- or southward-positioned WNPSH+) plays an indispensable role in predicting the HRB and LYRB (SCR) rain belts. Notably, WNPSH bears some similarities with the North Atlantic Subtropical High (NASH). NASH exhibits a zonal oscillation that can result in prominent rainfall anomalies over the eastern U.S. in summer (Davis et al., 1997). A recent study indicated that the southwestward extension of the western ridge of NASH consistently follows the intensification at its center (Li et al., 2012). Their finding is somewhat similar to our hypothesis that the northward extension (southward retreat) of NPSH identified from CNPAC+ (CNPAC-) may modulate the meridional position of the WNPSH, and subsequently retain the rain belt(s) over HRB (LYRB and SCR). Our results and the parallel discussion of NASH seem to support the hypothesis, although future numerical experiments and in-depth diagnoses are needed to verify it.
Interesting wave trains (western/central Russia → north of Tibetan Plateau → SEC), collectively termed the Russia-China (RC) patterns, are proposed to explain the development of the rain belts, and can offer additional predictability of the rain belt variability. In fact, extratropical teleconnection patterns have been recognized to influence the summer climate in East Asia significantly (Enomoto et al., 2003; Lu et al., 2002).
Recently, the Europe-China (EC) pattern was identified as a wave train originating from the North Atlantic and passing through eastern Europe and northwest India to East Asia, affecting July rainfall variability in northwest China (Chen & Huang, 2012). Because of the difference in the main route, the proposed RC patterns could, to the best of our knowledge, be a new summertime teleconnection pattern that affects the three distinct monsoon rain belts in SEC, although its mechanism warrants further study.
Last but not least, the proposed ENSO-HRB-SCR teleconnection largely agrees with a recent finding in which extreme rainfall days in south and north China (separated by 30°N) were associated with the rapidly decaying El Niño and developing La Niña, respectively (Li & Wang, 2018). The teleconnection may be explained by the combination mode dynamics (Stuecker et al., 2013, 2015; Timmermann et al., 2018). It illustrates that the transition from El Niño to La Niña in a near-annual cycle suppresses the low-level cyclonic activity over the central Pacific, and subsequently gives rise to an intensified WNPSH+. A similar WNPSH-ENSO coupling on the quasi-biennial time scale is also documented in an observational study (Cheng et al., 2019), which can modulate the summer rainfall pattern in East Asia. Combining findings from the existing literature and this study, we suggest that the ENSO-HRB-SCR teleconnection is likely the result of the WNPSH-ENSO interaction and could be useful for predicting the HRB and SCR rain belts.

Conclusions
Predictability of SEC summer monsoon rainfall has long been an intriguing research challenge in hydroclimate studies. Here, we adopt SOM as a tool to cluster the intricate spatiotemporal rainfall patterns in SEC during its summer monsoon season. More importantly, we aim to explore the predictability of the rainfall patterns on a daily time scale and retrieve useful signals that help improve the understanding of the underlying climate and circulation drivers. With a careful selection of the SOM setting and a thorough analysis of the derived clusters in different aspects, key findings of this study are summarized as follows.
1. Four distinct rainfall patterns are categorized by the 2×2 SOM and correspond to rain belts in HRB, LYRB and SCR, and the background rainfall mode of SEC. The subseasonal variability of the three rain belts agrees closely with the stepwise propagation of the EASM front in both space and time. Our analysis suggests that the SCR rain belt likely undergoes a regime shift with a delayed onset, increased occurrences and prolonged duration after the mid-1990s. The LYRB rain belt tends to be more active in the 1990s, while the HRB rain belt appears to be intensified with an earlier onset after the mid-1990s. All these suggest a systematic change in the characteristics of the monsoon rain belts in the 1990s. This particular signature is largely in phase with the interdecadal regime shift in the upper-level westerly jets, the zonal oscillation of WNPSH and ENSO documented in previous studies.
2. Promising predictability of the average daily rainfall in the three homogeneous groups of stations clustered by SOM is obtained with 39% to 50% variance explained by the MLR models, along with stable cross-validated performance and useful skill maintained in the blind prediction. In general, the assessments suggest that the SCR rain belt has the highest predictability. The encouraging performance of the regression models ascertains a set of useful EOFs and offers insights into the related atmospheric and oceanic signals for each rain belt.
3. Based on the diagnoses of the rain belts and informative predictors in the MLR models, WNPSH+ (WNPSH- or southward-positioned WNPSH+) plays a crucial role in establishing the HRB and LYRB (SCR) rain belts. We speculate that the northward (southward) shift of NPSH identified from the CNPAC+ (CNPAC-) signal may affect the meridional position of the WNPSH, and subsequently retain the rain belt(s) over HRB (LYRB and SCR). Also, the mid-latitude blocking high anomaly over northeast China and upper-level divergence in SEC assist in creating a favorable condition for the monsoon rain belt formation.
4. The newly proposed RC patterns that exhibit extratropical wave trains (western/central Russia → north of Tibetan Plateau → SEC) can offer additional predictability of the monsoon rain belts in SEC. The ANA-like pattern is identified during all three rain belts, and is likely responsible for the surface temperature anomalies over the North American continent and the plausible PNA-LYRB teleconnection. Lastly, the ENSO-HRB-SCR teleconnection is also proposed, in which the HRB (SCR) rain belt occurs when statistically significant cooling (warming) over the central to eastern Pacific prevails. It is likely the result of the WNPSH-ENSO interaction that could offer potential predictability of the variability of the HRB and SCR rain belts. Further diagnoses and numerical experiments to verify those teleconnections and wave activities are very much needed in the future.