Define East Asian Monsoon Annual Cycle via a Self‐Organizing Map‐Based Approach

Methods to define the East Asian monsoon usually involve ad hoc and diverse criteria. This study adopts a simple and coherent approach based on the self‐organizing map to systematically derive the monsoon annual cycle based on 850‐hPa wind fields. The derived onset dates in the warm season agree well with the literature despite lacking documentation for winter‐season stages. Linear regression suggests an inverse relationship between the length of the Spring (Meiyu) stage and that of the Pre‐Meiyu (Mid‐summer) stage, suggesting predictability of the Meiyu and Fall onsets. The Aleutian Low, the western North Pacific subtropical high, and the upper‐level westerlies play important roles in the Spring, Pre‐Meiyu, and Meiyu onsets. Observations reveal strong rain belt events form right after the monsoonal wind onsets in the warm season. The proposed approach to deriving monsoon stages and onsets would be useful for monsoon climate studies and operational monsoon diagnosis and forecasting.

The northwesterlies are generally the strongest from mid-January to mid-February and decay afterward in the late winter (Kim et al., 2013).
Despite a holistic picture of the monsoon annual cycle in East Asia, the exact onset dates and durations of the stages remain controversial owing to diverse monsoon definitions, data sources, and the intrinsic variability of the system (Luo et al., 2013; Martin et al., 2019; Tomita et al., 2011). Different criteria have been proposed for monsoon onsets, involving variables such as precipitation (Wang & LinHo, 2002; Xu et al., 2009), shear vorticity (Wang et al., 2008), equivalent potential temperature (Tomita et al., 2011), or a combination of them (LinHo et al., 2008; Luo et al., 2013). However, all these criteria-based methods were meticulously designed for one or a few monsoon stages only. For instance, LinHo et al. (2008) proposed four ad hoc criteria, with specific parameters for different variables, to define the onset date of the spring rain period in South China. Likewise, Xu et al. (2009) determined the Meiyu onset by applying complex rain band definitions. To the best of our knowledge, no method has been proposed to date that defines the monsoon annual cycle in a systematic and simple way. The simplicity and coherence of such a definition could be useful for both operational and research purposes. We therefore employ a neural-network-based algorithm, the self-organizing map (SOM), to classify the East Asian monsoon stages all year round. This method projects high-dimensional data onto a low-dimensional space while preserving the topological features (Kohonen, 2001), and it has shown great potential in classifying meteorological field patterns (Dai et al., 2020; Johnson, 2013; Kong et al., 2017; Nishiyama et al., 2007; Pan & Lu, 2020). The results of the SOM-based approach are then assessed against the monsoon onset dates documented in the literature.
In line with the above motivations, this study consists of two parts. First, we introduce a SOM-based methodology to classify the monsoon annual cycle in East Asia, derive the annual onset dates, and explore their predictability. Second, we explore the weather conditions before and after the onsets of the warm-season monsoon stages, given their close relevance to devastating rainfall and floods. The relationship between the derived monsoonal wind stages and the occurrence of rain belt events (defined in Section 2.3) is also presented.

Data
Meteorological fields are retrieved from the fifth generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis (ERA5) between 1979 and 2018 at 1° grid resolution (Copernicus Climate Change Service, 2017). Pressure-level fields are retrieved at 25-hPa intervals from 925 to 700 hPa (except for the 725-hPa level). The 850-hPa horizontal winds at 2.5° grid resolution from the National Centers for Environmental Prediction and the National Center for Atmospheric Research (NCEP/NCAR) reanalysis-1 (Kalnay et al., 1996) are also retrieved for comparison. We estimate the equivalent potential temperature $\theta_e$ (K) using the empirical equation by Bolton (1980), which can be found in the supporting information (SI) (Text S1).
The vertical mean of $\theta_e$ (K) in the lower troposphere is computed from
$$\langle \theta_e \rangle_{p_1}^{p_2} = \frac{\int_{p_1}^{p_2} \theta_e \, dp}{\int_{p_1}^{p_2} dp}, \tag{1}$$
where $p_1$ and $p_2$ refer to 700 and 925 hPa, respectively, and $\langle \cdot \rangle$ denotes the vertical mean. Likewise, the vertical mean of the horizontal advection of $\theta_e$ in the lower troposphere is $\langle -\mathbf{V} \cdot \nabla_h \theta_e \rangle_{700}^{925}$, where $\mathbf{V}$ (m s$^{-1}$) is the horizontal wind at the pressure level. The 200-hPa stationary wave is defined as the departure of the 200-hPa geopotential height from the zonal mean.
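The pressure-weighted vertical mean in Equation 1 can be approximated numerically on discrete pressure levels. The following is a minimal sketch that assumes trapezoidal integration; the function name `vertical_mean` and the uniform sample profile are illustrative and are not taken from the study.

```python
def vertical_mean(values, pressures_hpa):
    """Pressure-weighted vertical mean, integrated with the trapezoidal rule.

    values[i] is the field (e.g., theta_e in K) at pressures_hpa[i].
    The integration scheme is an assumption; the source does not state one.
    """
    if len(values) != len(pressures_hpa) or len(values) < 2:
        raise ValueError("need at least two levels with matching lengths")
    # Sort by pressure so that the layer thicknesses are positive.
    pairs = sorted(zip(pressures_hpa, values))
    numerator = 0.0
    for (p_lo, v_lo), (p_hi, v_hi) in zip(pairs, pairs[1:]):
        numerator += 0.5 * (v_lo + v_hi) * (p_hi - p_lo)
    thickness = pairs[-1][0] - pairs[0][0]
    return numerator / thickness


# Levels between 700 and 925 hPa at 25-hPa spacing, without the 725-hPa level.
levels = [700, 750, 775, 800, 825, 850, 875, 900, 925]
theta_e = [330.0] * len(levels)  # made-up uniform profile, for illustration only
mean_theta_e = vertical_mean(theta_e, levels)
```

Because the layer thicknesses enter the sum explicitly, the missing 725-hPa level needs no special handling: the 700-750 hPa layer simply receives twice the weight of a 25-hPa layer.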

The SOM Training and Testing
We trained the SOM using the 5-day-moving-mean daily climatology of 850-hPa zonal and meridional winds over the East Asian monsoon region (110°E-140°E, 20°N-45°N, following Wang & LinHo, 2002). The monsoon is traditionally defined by the seasonal reversal of near-surface winds (Krishnamurti et al., 2013), and the 850-hPa wind fields can best describe the prevailing weather patterns during both the summer and winter monsoon seasons. In contrast, the other key monsoonal variable, precipitation, is bounded below by zero and is thus poorly suited to depicting weather regimes in the dry season.
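The 5-day moving mean of a daily climatology can be sketched as follows. This example assumes a centered window that wraps around the year boundary, since a climatological calendar is periodic; the source does not state either choice, and `circular_moving_mean` is a hypothetical helper name.

```python
def circular_moving_mean(daily_values, window=5):
    """Centered moving mean over a periodic (calendar-year) series.

    Assumes an odd window and wrap-around at the ends of the series.
    """
    n = len(daily_values)
    if window % 2 == 0 or window > n:
        raise ValueError("window must be odd and no longer than the series")
    half = window // 2
    smoothed = []
    for i in range(n):
        # Python's modulo keeps negative indices inside the series.
        total = sum(daily_values[(i + k) % n] for k in range(-half, half + 1))
        smoothed.append(total / window)
    return smoothed


# A single spike is spread evenly over the five days centered on it.
example = circular_moving_mean([0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0])
```

In practice the same smoothing would be applied independently to the zonal and meridional wind at every grid point before the fields are passed to the SOM.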
Since we adopt the Euclidean distance in the SOM algorithm, the wind fields were weighted, prior to the training, by the square root of the cosine of latitude to correctly account for the difference in grid area across latitudes (Johnson et al., 2008). To assess the performance, we conduct the false discovery rate (FDR) test (Benjamini & Hochberg, 1995; Wilks, 2006) on the number of indistinguishable pairs in the clustering results from the 1 × 3 to 1 × 10 SOM configurations. In principle, the ratio of indistinguishable pairs is proportional to the number of clusters. We therefore adopt the 1 × 8 configuration to capture all the important monsoon stages mentioned in Section 1 while ensuring a low ratio of indistinguishable pairs (Figure S1). A detailed illustration of the FDR test is given in the SI (Text S2) and in our previous study (Dai et al., 2020).
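The reason for the square-root weighting is that the Euclidean distance squares the differences, so the squared distance between two weighted fields becomes a cos(latitude)-weighted sum, which is proportional to grid-cell area. A minimal sketch, with hypothetical function names and a toy two-row grid:

```python
import math


def latitude_weight(lat_deg):
    """Square root of the cosine of latitude (latitude in degrees)."""
    return math.sqrt(math.cos(math.radians(lat_deg)))


def weight_field(field, lats_deg):
    """Scale each latitude row of a 2-D field; field[i] lies at lats_deg[i]."""
    return [[latitude_weight(lat) * v for v in row]
            for row, lat in zip(field, lats_deg)]


def euclidean_distance(a, b):
    """Euclidean distance between two equally shaped 2-D fields."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


# Toy grid with rows at 0° and 60°N; the values are illustrative only.
lats = [0.0, 60.0]
ones = weight_field([[1.0, 1.0], [1.0, 1.0]], lats)
zeros = weight_field([[0.0, 0.0], [0.0, 0.0]], lats)
squared = euclidean_distance(ones, zeros) ** 2  # 2*cos(0°) + 2*cos(60°)
```

Here the row at 60°N contributes half as much to the squared distance as the row at the equator, mirroring the ratio of the grid-cell areas.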

Detection of Heavy and Persistent Rain Belt Events
Heavy and persistent rain belt events are identified to explore their relationship with the monsoonal wind stages. The grid-based threshold for heavy rainfall is obtained by spatially smoothing the 75th quantile of the accumulated precipitation on wet days (greater than 1 mm/day) using Gaussian kernel smoothing (Lu et al., 2015). Heavy rainfall events with spatial disconnection (larger than 2° in longitude or latitude) or temporal discontinuity (more than 1 day) are excluded. Further, events having a zonal scale of at least 10° in longitude and a temporal persistence of at least 3 days are classified as heavy and persistent rain belt events. An example of a detected rain belt event and the map of local thresholds are shown in Figure S2.
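The final classification step (zonal scale of at least 10° and persistence of at least 3 days) can be sketched as a run-length filter. This is a deliberate simplification: it assumes the daily zonal extent of the connected heavy-rain region has already been computed, and it requires strictly consecutive days, so the 2° connectivity and 1-day gap rules described above are not implemented. The function name and the sample series are hypothetical.

```python
def find_rain_belt_events(daily_extent_deg, min_extent=10.0, min_days=3):
    """Return (start_day, end_day) index pairs of qualifying events.

    daily_extent_deg[d] is the zonal extent (degrees of longitude) of the
    heavy-rain region on day d, or 0 when no heavy rain is present.
    """
    events = []
    start = None
    for day, extent in enumerate(daily_extent_deg):
        if extent >= min_extent:
            if start is None:
                start = day  # a new candidate event begins
        else:
            if start is not None and day - start >= min_days:
                events.append((start, day - 1))
            start = None
    # Close an event that is still running at the end of the series.
    if start is not None and len(daily_extent_deg) - start >= min_days:
        events.append((start, len(daily_extent_deg) - 1))
    return events


# Made-up extents: two qualifying events and one 2-day spell that is rejected.
extents = [0, 12, 15, 11, 0, 10, 10, 0, 20, 20, 20]
events = find_rain_belt_events(extents)
```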

Weather Patterns of the SOM-Derived Monsoon Stages
Eight distinct monsoonal wind stages are classified using the SOM approach (Section 2.2, Figure 1). Specifically, the clustered Patterns 1, 2, and 3 (Figures 1a-1c and 1i) exhibit anticyclonic circulations with strong northwesterlies to the north of 25°N-30°N and northeasterlies to the south. These wind regimes reveal the typical wintertime conditions documented in the literature (Jhun & Lee, 2004). In line with the circulations, the 330 K $\langle \theta_e \rangle_{700}^{925}$ contour line, which represents the nominal boundary between the tropical and polar air masses (Tomita et al., 2011), mainly resides at 20°N or further south, suggesting that the polar air masses encompass the entire East Asia in the three patterns (Figures 1a-1c). Notably, we identify Pattern 2 as the Mid-winter stage (Figure 1b) because it features the strongest northwesterlies to the north of 25°N (Kim et al., 2013), along with the weakest precipitation intensity and the coldest-and-driest air masses in the region, as the 320 K $\langle \theta_e \rangle_{700}^{925}$ contour line clears the southern boundary of the East Asian land. The stage before the Mid-winter (i.e., Pattern 1) is therefore termed the Early-winter stage (Figure 1a). Pattern 3 is identified as the Late-winter stage, as southwesterlies emerge south of 30°N and mark the decay of the winter monsoon (Figure 1c).
Following the winter season, Pattern 4 shows overall weakened northwesterlies in the region, with moderate southwesterlies and precipitation over South China (Figure 1d); we identify it as the Spring stage. As the summer monsoon approaches, Pattern 5 features prevailing southwesterlies and a well-developed rain belt in the southern part of the monsoon region (Figure 1e), which aligns well with the Pre-Meiyu weather patterns (e.g., Ding & Chan, 2005). Subsequently, Pattern 6 depicts the well-known Meiyu stage, in which the quasi-stationary rain belt propagates northward to the mid-lower Yangtze River basin, South Japan, and even the Korean Peninsula (Figure 1f). Meanwhile, the southwesterlies become the strongest, with tropical air masses (i.e., greater than 330 K in $\langle \theta_e \rangle_{700}^{925}$) spreading all over the region. During the Mid-summer stage (Pattern 7), the southwesterlies and the Meiyu rain belt are much suppressed, while rain bands emerge and persist over the Korean Peninsula, the South China coast, and the adjacent seas (Figure 1g). In the Fall stage (Pattern 8), the southwesterlies almost disappear, followed by developing northerlies and dry conditions (Figure 1h). This stage marks the transition of the weather regime from the summer to the winter monsoon.

Onset Dates of the Monsoon Stages
Given the climatological wind patterns from the annual cycle (Figure 1), we derive the annual onset dates of all the monsoon stages via the following steps. We first determine the three nearest monsoon stages for each day, referring to the identified monsoon climatological calendar (Figure 1i). We then assign each day to one of the three candidate stages based on the minimum Euclidean distance in the 5-day-moving-mean 850-hPa wind fields. After the assignment, each monsoon stage's onset date is defined as the starting day when the stage first persists for at least 5 consecutive days. Compared to the results using thresholds of 3 or 7 consecutive days, the onset timings based on the 5-day threshold (Table S2) are closest to those documented in the literature (Table 1). We also repeated the above procedures using the NCEP/NCAR reanalysis-1 for comparison. The results largely agree with those from the ERA5 reanalysis and are thus omitted here (Figure S3). A full table of annual onset dates is given in Table S1.
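The assignment and onset rules above can be sketched as two small functions. The sketch assumes each wind field has been flattened into a list of numbers and that the candidate stages for a day are supplied by the caller; the names `assign_stage` and `onset_day`, the toy patterns, and the toy stage sequence are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two flattened fields of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_stage(day_field, stage_patterns, candidate_stages):
    """Pick the candidate stage whose SOM pattern is closest to the day's field."""
    return min(candidate_stages,
               key=lambda s: euclidean(day_field, stage_patterns[s]))


def onset_day(stage_sequence, stage, min_run=5):
    """First day index from which `stage` persists for at least `min_run` days.

    Returns None when the stage never persists that long.
    """
    run = 0
    for day, assigned in enumerate(stage_sequence):
        run = run + 1 if assigned == stage else 0
        if run == min_run:
            return day - min_run + 1
    return None


# Toy patterns and a toy daily sequence; the values are illustrative only.
patterns = {4: [0.0, 0.0], 5: [1.0, 1.0], 6: [3.0, 3.0]}
chosen = assign_stage([0.9, 0.8], patterns, candidate_stages=[4, 5, 6])
sequence = [4, 4, 5, 4, 5, 5, 5, 5, 5, 5]
onset = onset_day(sequence, stage=5)
```

In the toy sequence, the isolated stage-5 day at index 2 does not count as an onset because the run is broken on the following day; the onset is the first day of the first 5-day run.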
Notably, the standard deviation of the onset dates reaches 2-3 weeks from the Mid-winter to the Pre-Meiyu stage, compared to less than 2 weeks from the Meiyu to the Fall stage (Table 1). This finding suggests a stronger interannual variability of the wintertime and springtime monsoon onsets. In particular, the standard deviation of the Meiyu stage's onset date is only 9 days. Considering that the Meiyu stage represents the peak season of southwesterlies and monsoon fronts (Figure 1f), strong land-sea and meridional thermal contrasts are required, and these are largely determined by periodic factors such as the seasonally varying insolation, which in turn may explain the relatively weak variability of the Meiyu onset timing.

Table 1 The Onset Dates of Monsoon Stages Derived in This Work and Previous Studies
Onset dates of the warm-season monsoon stages are well documented in the literature, and our results for the Spring, Pre-Meiyu, Meiyu, and Mid-summer stages agree most closely with those reviewed in Ding and Chan (2005) (Table 1). Notably, the monsoon onset dates given in most of the listed studies are within one standard deviation of the onset dates derived in this work. As those studies designed ad hoc criteria to ideally fit the weather regime during the monsoon stage(s) of concern, the consistency in the onset timings suggests that the SOM-based approach can be a simple and robust alternative for determining the monsoon stages. We notice that the climatological onset dates derived by Kong et al. (2017) using the SOM approach on precipitation data are about 2 weeks behind our results, which may be attributable to the use of only half-year data (i.e., April to September) in their study. As the Spring stage's onset can occur before April (Table 1), it is important to take the whole year into consideration when identifying monsoon stages. In addition, despite the lack of documented onset timings for the winter-season stages, we believe the results carry some credibility given the cross-validated summer monsoon onsets and the largely consistent wintertime wind patterns discussed in Section 3.1.

Predictability of Monsoon Stages' Onset Timing
The seasonal prediction of the monsoon onset, especially in the rainy season, is crucial but remains challenging. With the derived monsoon onset dates, we examine whether the earlier monsoon stages offer predictability for the subsequent stages. For this purpose, we explore the relationship between adjacent stages in terms of the onset time interval from one stage to another. The linear regression reveals that the onset time interval between the Spring and Pre-Meiyu ($T_{S-P}$) can explain 45% of the variance in that between the Pre-Meiyu and Meiyu ($T_{P-M}$), with a negative slope coefficient significant at the 0.01 level (Figure S4). That is, the longer the Spring stage (i.e., $T_{S-P}$), the shorter the Pre-Meiyu stage (i.e., $T_{P-M}$) is expected to be. Likewise, an inverse relationship with an $R^2$ of 0.51 is found between the onset time interval from Meiyu to Mid-summer ($T_{M-MS}$) and that from Mid-summer to Fall ($T_{MS-F}$). The two regression models demonstrate the predictability of the Meiyu and Fall stages' onset dates. For instance, once the onset dates of the Spring and Pre-Meiyu stages are observed, we can obtain the regressed onset time interval between the Pre-Meiyu and Meiyu stages using the regression equation (Figure S4a), and thereby the predicted onset date of the ensuing Meiyu stage. We note that other regression methods, such as polynomial and loess regression, provide no improvement despite their flexibility. Moreover, the inverse relationship between the onset time intervals of adjacent stages is not observed among the other pairs of stages.
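The prediction procedure described above amounts to an ordinary least-squares fit followed by adding the regressed interval to the latest observed onset date. The sketch below uses synthetic, exactly linear intervals chosen only to make the arithmetic easy to follow; they are not the values behind Figure S4, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long samples with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def predict_next_onset(current_onset_doy, previous_interval, slope, intercept):
    """Latest onset (day of year) plus the regressed interval to the next stage."""
    return current_onset_doy + intercept + slope * previous_interval


# Synthetic intervals in days (illustration only, not the study's data).
spring_to_premeiyu = [40.0, 50.0, 60.0, 70.0]
premeiyu_to_meiyu = [60.0, 55.0, 50.0, 45.0]
slope, intercept, r2 = fit_line(spring_to_premeiyu, premeiyu_to_meiyu)
predicted = predict_next_onset(120.0, 56.0, slope, intercept)
```

A negative slope means that a longer preceding interval shortens the regressed interval to the next onset, which is the inverse relationship reported in the text.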

Weather Regimes Associated With the Monsoon Stages' Onset in the Warm Season
It is important to understand how the monsoon stages initiate in the first place and which weather systems are at play. Given the critical hydrological impacts of the warm-season stages, we limit our diagnosis to the weather regimes before and after the Spring, Pre-Meiyu, Meiyu, Mid-summer, and Fall onsets. We compute the daily composites and average them every 5 days (i.e., by pentad) before and after the monsoon onset to highlight the prevailing weather with high-frequency noise removed. Here, pentad 0 denotes the period from the onset date (i.e., day 0) to 4 days after (i.e., day 4), pentad −1 refers to the five days immediately before day 0 (i.e., days −5 to −1), and a similar convention applies to the others. The anomaly fields in the composites represent the deviations from the mean fields averaged over the period from 15 days before to 9 days after the stage's onset day. For a better presentation of the signals, the sets of lead-lag pentads in the composites differ slightly across stages.
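The pentad convention and the anomaly baseline can be sketched for a single grid point as follows. The sketch assumes the baseline is the plain mean over lag days −15 to +9 inclusive, and it works on a dictionary keyed by lag day; the function names and the sample series are hypothetical.

```python
def pentad_index(lag_day):
    """Pentad 0 covers days 0..4, pentad -1 covers days -5..-1, and so on."""
    return lag_day // 5  # floor division also handles negative lags


def anomalies(values_by_lag, first_lag=-15, last_lag=9):
    """Deviation of each lag day's value from the mean over the baseline window."""
    baseline = [values_by_lag[lag] for lag in range(first_lag, last_lag + 1)]
    mean = sum(baseline) / len(baseline)
    return {lag: value - mean for lag, value in values_by_lag.items()}


def pentad_means(values_by_lag):
    """Average the lag-day values within each pentad."""
    groups = {}
    for lag, value in values_by_lag.items():
        groups.setdefault(pentad_index(lag), []).append(value)
    return {p: sum(v) / len(v) for p, v in sorted(groups.items())}


# Made-up composite at one grid point: the value equals the lag day itself.
series = {lag: float(lag) for lag in range(-15, 10)}
pentad_anomaly = pentad_means(anomalies(series))
```

With this convention the 25-day baseline window spans exactly five pentads, from pentad −3 to pentad 1.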

Spring (Mar/14 ± 16 Days) and Pre-Meiyu (May/04 ± 18 Days)
A strong Aleutian Low is observed concurrently with a weak western North Pacific subtropical high (WNPSH) at least 2 weeks before the Spring onset (Figure 2a). As the onset approaches, the Aleutian Low notably weakens while the WNPSH extends northward, manifesting an abrupt switch from a negative to a positive 850-hPa geopotential anomaly over the North Pacific and Japan (Figure 2b). It is likely that the complete reversal of the pressure dipole anomaly from pentad −3 to pentad 0 (Figure 2b) leads to enhanced southerlies in East Asia seen in both raw and anomaly fields (Figures 2c and 2d) [...] tropics into East Asia (Figures 2c and 2d). The instability generated by the warm-and-moist air at lower levels (Figure 2c) and the cold-and-dry air from the upper-level stationary trough (Figure 2e) contributes to spring rainfall over South China (Figures 2a and 2b).
Similar weather conditions progress into the Pre-Meiyu stage. The weakening of the Aleutian Low continues (Figure S5a) while the WNPSH expands further north, which is again associated with a reversal of the pressure dipole anomaly from pentad −3 to pentad 0 (Figure S5b). As the western ridge of the WNPSH hovers over the South China Sea and the Philippine Sea, a continental low-pressure anomaly develops along with prevailing southwesterlies (Figures S5b and S5c), favoring rainfall formation in South China and the adjacent seas (Figures S5a and S5b). The enhanced rainfall signals are in line with more frequent monsoon rain belt events in the south (20°N-30°N) right after the Spring and Pre-Meiyu onsets (Figures S6a and S6b). This finding suggests a close relationship between rain belt events and the onset of the monsoonal wind stages.
Changes in the upper-level weather conditions around the onsets are also notable. Specifically, the upper-level westerly jet upstream (downstream) of the Tibetan Plateau significantly strengthens (weakens) in pentad 0 of the Spring and Pre-Meiyu stages (Figures 2f and S5f). The anomalous upper-level jet possibly results from the seasonal shift of the westerlies, whereby the westerlies over the plateau intensify and subsequently favor the downslope convergence and the monsoon rain band during the early warm season (Chiang et al., 2019; Park et al., 2012).

Meiyu (Jun/12 ± 9 Days)
The preconditioning weather systems begin to differ as the Meiyu onset approaches. As the influence of the Aleutian Low on East Asia vanishes, the WNPSH's western ridge rapidly strengthens and stretches to the northwest (Figure 3a), resulting in an anomalous high and drought conditions over the Philippine Sea in pentad 0 (Figure 3b). From then on, the southwesterly monsoon becomes the strongest and continuously supplies warm-and-moist air to the rain fields confined to the northwest of the WNPSH (Figures 3b, 3c, and 3d). These observations reveal the typical weather regimes during the westward extension of the WNPSH (Cheng et al., 2019). The lower-level southwesterlies over East Asia assist the northward vorticity advection and favor ascending motions via the Sverdrup balance (Rodwell & Hoskins, 2001), eventually spurring the Meiyu/Baiu rain belts in the Yangtze River basin and Japan (Figures 3a and S6c). We argue that the westward extension of the WNPSH accounts for the onset timing of the Meiyu stage, as the WNPSH is found to influence the Meiyu/Baiu front formation (Chang et al., 2000; Ninomiya, 1984; Wang & LinHo, 2002). Aloft, the upper-level westerlies migrate northward to the northern edge of the Tibetan Plateau at 40°N (Figure 3e) and coincide with enhanced precipitation in the Yangtze River basin and South Japan during the Meiyu onset (Figures 3a and 3b). Given the rapid migration of the westerly jet right after the Meiyu onset (Figure 3f) and its dynamic controls on the East Asian summer monsoon widely reported by other studies (J. Chen & Bordoni, 2014; Park et al., 2012; Sampe & Xie, 2010), it is deemed another determinant of the onset timing of the derived Meiyu stage (Kong et al., 2017).

Mid-Summer (Jul/19 ± 13 Days) and Fall (Sep/03 ± 12 Days)
Recalling that the southwesterly monsoon starts to diminish from the Mid-summer stage (Figure 1g), we notice a sudden northward migration of the subtropical high in pentad 0 (Figure S7a), which manifests as a pressure dipole consisting of a high (low) pressure anomaly in the north (south) (Figure S7b). In this case, the pressure dipole weakens the southwesterlies over East Asia (Figures S7c and S7d). Moreover, the Meiyu rain belt vanishes while intense rainfall revives in the south from pentad 0 onward (Figures S7a and S7b). It is noteworthy that the Meiyu stage ends when the upper-level westerlies just clear the northern edge of the plateau (Figures S7e and S7f), suggesting a role of the westerlies in the demise of the Meiyu and the start of the Mid-summer (Kong et al., 2017).
As the Fall stage begins, the western ridge of the WNPSH becomes less organized and extends further inland over East Asia (Figure S8a). The preconditioning of the Siberian high, manifested as a high-pressure anomaly (Figure S8b) and anomalous northeasterlies (Figure S8d), becomes prominent over East Asia, causing the advection of dry-and-cool air from the northeast (Figures S8c and S8d) and less rainfall (Figure S8b). Meanwhile, the westerlies return southward to the Tibetan Plateau (Figures S8e and S8f), probably due to the southward retreat of the maximum solar radiation and a stronger meridional temperature gradient induced by the cold air advection from the northeast (Jhun & Lee, 2004). These signals mark the end of the summer monsoon season and the transition to the wintertime weather regime.
Additionally, the latitudinal position of rain belts in the Mid-summer is mainly confined to the south (20°N-25°N) (Figure S6d), which is likely to result from the influence of tropical cyclones at this time (Day et al., 2018; Pan & Lu, 2020). Rain belts then largely disappear in the Fall stage (Figure S6e) due to weaker moisture transport from the tropics. Further, the zonal mean precipitation intensity (Figures S6f-S6j) and the zonal extent (Figures S6k-S6o) of rain belts also undergo abrupt changes after monsoon onsets. These findings suggest that changes in the characteristics (e.g., location, extent, and intensity) of rain belt events closely follow the onsets of the monsoonal wind stages.

Discussion
In addition to the role of low-level circulations and westerly jets in monsoon onsets, the thermal forcing of the plateau was proposed to affect the East Asian summer monsoon (EASM) (Hsu & Liu, 2003; Yanai & Wu, 2006). However, our further analysis reveals no significant anomalies of the mid-tropospheric temperature over the plateau during the monsoon onsets in the warm season (results not shown), which implies that the thermal forcing may be a minor factor for the interannual variability of the summer monsoon onset.
Given the close relationship between the onsets of the monsoonal wind stages and rain belt occurrence, the wind-rain belt relationship may be qualitatively understood through the Sverdrup balance (Rodwell & Hoskins, 2001). The planetary vorticity advection caused by warm-and-moist southerlies at lower levels is approximately balanced by the stretching of air columns due to condensational heating, which feeds back to stronger convection and thereby frontal rains. This explanation may be further supported by the frontal rain analysis by Day et al. (2018) and the finding of J. Chen and Bordoni (2014) that the meridional stationary eddy velocity is key for temperature advection and the Meiyu rainfall.
A recent study demonstrated higher predictive skill of the GloSea5 hindcast ensembles when employing the 850-hPa zonal wind index to determine the onset of the South China Sea Summer Monsoon (Martin et al., 2019). Thus, applying the SOM approach to the 850-hPa wind fields to derive the monsoon annual cycle could be useful for operational monsoon forecasting. In addition, previous studies suggested that climate forcings, such as the El Niño-Southern Oscillation (Ding et al., 2020; Okada et al., 2017), sea surface temperature anomalies in the tropical Indian Ocean (He & Zhu, 2015), and greenhouse warming (Kitoh et al., 2013), could cause early or late onsets of the East Asian monsoon in past or future climates, which will be explored in our future work.

Conclusions
In the present study, we apply a SOM-based approach to the 850-hPa wind fields to derive eight East Asian monsoon stages throughout the year in a systematic and coherent manner. The derived onset dates of the warm-season stages are cross-validated with the literature. Further, the preceding monsoon stages could inform the arrival of the ensuing stage during the warm season. Specifically, the onset time interval from the Spring to Pre-Meiyu (Meiyu to Mid-summer) can explain 45% (51%) of the variance in that from the Pre-Meiyu to Meiyu (Mid-summer to Fall), with a negative slope coefficient based on linear regression. In other words, the shorter the Spring (Meiyu) stage, the longer the Pre-Meiyu (Mid-summer) stage is expected to be. This finding reveals the predictability of the Meiyu and Fall onsets.
The weather diagnoses further reveal the interplay between the weakening of the Aleutian Low, the northward extension of the WNPSH, and the behavior of the upper-level westerlies during the Spring and Pre-Meiyu onsets. In particular, the westward extension of the WNPSH and the northward migration of the westerlies are likely to determine the Meiyu onset. In contrast, the onsets of the Mid-summer and Fall stages coincide with a northwestward-positioned WNPSH and thereby a high-pressure anomaly over the East Asian continent, which induces the demise of the summer monsoon. Finally, the observation that heavy and persistent rain belt events form right after the wind stages' onsets suggests a close wind-rain belt relationship. The SOM-based approach to deriving the monsoon stages and onsets would be useful for monsoon climate research and for operational monsoon diagnosis and forecasting, and thereby for short-term planning of agricultural activities and flood control.

Data Availability Statement
The meteorological data are retrieved from the ERA5 reanalysis by the European Centre for Medium-Range Weather Forecasts (ECMWF) at https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5. The NCEP/NCAR Reanalysis 1 by the NOAA Physical Sciences Laboratory is retrieved from https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.html. The SOM training is executed with the R package "kohonen" (Wehrens, 2007).