A New Approach to Skillful Seasonal Prediction of Southeast Asia Tropical Cyclone Occurrence

Predicting the peak‐season (July–September) tropical cyclones (TCs) in Southeast Asia (SEA) several months ahead remains challenging, related to limited understanding and prediction of the dynamics affecting the variability of SEA TC activity. Here, we introduce a new statistical approach to sequentially identify mutually independent predictors for the occurrence frequency of peak‐season TCs in the South China Sea (SCS) and east of the Philippines (PHL). These predictors, which are identified from the preseason (April‐June) environmental fields, can capture the interannual variability of different clusters of peak‐season TCs, through a cross‐season effect on large‐scale environment that governs TC genesis and track. The physically oriented approach provides a skillful seasonal prediction in the 41‐year period (1979–2019), with r = 0.73 and 0.54 for SCS and PHL TC frequency, respectively. The lower performance for PHL TCs is likely related to the nonstationarity of the cross‐season TC‐environment relationship. We further develop the statistical approach to a hybrid method using the predictors derived from dynamical seasonal forecasts. The hybrid prediction shows a significant skill for both SCS and PHL TCs, for lead times up to four or 5 months ahead, related to the good performance of models for the sea surface temperatures and low‐level winds in the tropics. The statistical and hybrid predictions outperform the dynamical predictions, showing the potential for operational use.

The statistical and hybrid approaches are widely used to make basin-wide TC predictions with lead times up to several months, but they are usually less skillful for TCs at regional (subbasin and country-level) scales (e.g., Camargo & Sobel, 2005;Goh & Chan, 2010;Li et al., 2017;. Regional TC variability is affected by compound effects of multiple large-scale drivers and local environmental conditions. The modulation of regional TCs by large-scale drivers can be nonlinear and complicated, and it may not be well represented in the conventional correlation analysis. Large-scale driver indices such as the Nino3.4 index alone cannot represent such variability well. Instead of focusing on the well-known climate variability,  introduced a region-dependent seasonal forecast method, in which several predictors are identified in parallel for TC genesis in different sub-regions of the WNP. A similar method was also developed for the prediction of landfalling TC frequency on the Chinese coast (Tian & Fan, 2019). One caveat in these methods is that the predictors are not completely independent, and this could cause the prediction performance to be unrealistically inflated.
Another challenge to using statistical seasonal prediction methods is the strong seasonal variation in their performance. The typhoon season in the WNP is from June to November, but the interannual variability of TC activity varies greatly from month to month throughout the whole season (Feng et al., 2021). The statistical prediction systems, such as those operational in SEA (VNMHA and PAGASA contributing coauthors, personal communications), rely strongly on ENSO indices and have much lower skill in predicting the peak-season TC frequency than predicting the early-season (AMJ) or late-season (OND) frequency. This suggests that the complexity or nonlinearity of the ENSO effect Duy et al., 2019), including its coupling with the effects from other large-scale drivers, could not be well understood and included in these prediction systems. For example, the competing effect of the Indian Ocean SSTs in the summer of ENSO years (Xie et al., 2009(Xie et al., , 2016 could weaken or prohibit the simultaneous or lagged relationships between the eastern Pacific SSTs (the ENSO route) and SEA TCs (e.g., Lyon & Camargo, 2009;Wang & Chan, 2002;Wang, Li, et al., 2013). The study reported in this paper is motivated by the low performance of the existing seasonal prediction (including dynamical models, which will be further described in Section 4) for the frequency of peak-season TCs in SEA.

10.1029/2022JD036439 3 of 24
Recent studies suggest that the cross-season evolution of the large-scale circulation, for example, the Indo-Pacific Warm Pool Hadley Circulation (e.g., Guo et al., 2020;Guo & Tan, 2018) and the North Pacific Subtropical High , is significantly related to the interannual variability of basin-wide WNP TCs. These findings indicate that the preceding-season large-scale environment could be important sources of seasonal TC prediction. A few recent studies have attempted to establish the cross-season statistical relationships between regional TC variability (genesis and landfall frequency) and the preceding-season environmental conditions (Tian & Fan, 2019;. However, in these studies, physical interpretations of the preceding-season predictors, including independence of the predictors, remain largely uncertain. The study reported here aims to better understand the interannual variability and predictability of the occurrence frequency of peak-season (JAS) SEA TCs. The following research questions are central to this paper: (a) is the frequency of peak-season SEA TCs related to the preseason environment? (b) can we derive independent predictors from the preseason environment to build a statistical forecast? (c) can we provide physical interpretations of the predictors to elaborate the cross-season effects on SEA TCs?, and (d) what is the performance of a hybrid statistical-dynamical method in predicting the SEA TC frequency?
The paper is structured as follows. In Section 2, the data are first described. A sequential approach to identifying independent predictors for SEA TC frequency is then introduced. Section 3 starts with the description of the statistical predictors, followed by the interpretation of the predictors by linking them to the large-scale environment. The performance of the statistical prediction is evaluated. Finally, we develop a hybrid prediction method with the predictors extracted from dynamical seasonal forecasts. Conclusions and discussion are provided in Section 4.

Climate Reanalysis and TC Observations
Monthly environmental fields (e.g., SSTs, winds and relative humidity) in the period 1979-2019 from the ECMWF fifth generation climate reanalysis (ERA5) (Hersbach et al., 2020) are used to construct and interpret the predictors of the SEA TC frequency. The ERA5 data are also used to verify the predicted environmental fields from the dynamical models. ERA5 is produced using a 4D-Var data assimilation system which combines the model forecasts of the ECMWF Integrated Forecast System (IFS) with the inhomogeneous observations. SSTs in ERA5 are derived from the Hadley Center Ice and Sea Surface Temperature (HadISST2) data prior to 2007 and from OSTIA thereafter. The ERA5 high-resolution deterministic reanalysis (horizontal resolution of ∼31 km) is used for this study.
We focus on the occurrence frequency of peak-season (JAS) SEA TCs. The SEA is further divided into two subregions (Figure 1), the South China Sea (SCS; 100-120°E and 0-25°N) and east of the Philippines (PHL; 120°-140°E and 0°-28°N). The number of TCs developing in, and moving through, the two regions during the peak-season is counted and considered. TC tracks are obtained from the International Best Track Archive for Climate Stewardship (IBTrACS; Knapp et al., 2010Knapp et al., , 2018, which merges TC tracks from different meteorological centers. The data source of IBTrACS used in this paper is from the Japan Meteorological Agency (JMA). We evaluated the sensitivity of our method to different sources of Best Track data, for example, using the data from the China Meteorological administration (CMA), and the track data derived from ERA5 where TC identification is independent of Best Track (Feng et al., 2021;Hodges et al., 2017). Similar results were found, although the areas for building the predictors are slightly different. Here we only present the analysis with IBTrACS. We also use the climate indices (ENSO and PMM) from the NOAA Physical Sciences Laboratory to discuss the links between the TC predictors and climate variability.

Dynamical Models and Seasonal Hindcasts
Seasonal hindcasts from GCMs are used to construct the predictors in the hybrid prediction method. These dynamical models include the ECMWF fifth generation seasonal forecast system (SEAS5), and the UK Met Office Global Seasonal Forecasting System version 5 and 6 (GloSea5 and GloSea6). SEAS5 is configured with the IFS atmosphere model (cycle 43r1) at T319 spectral horizontal resolution (about 36 km) and 91 vertical model levels, the NEMO ocean model (version 3.0) with ORCA025 grid and 75 vertical levels, and the Louvain-la-Neuve sea ice model (version 2.0) (Johnson et al., 2019). In SEAS5, the atmosphere and land surface are initialized from ERA-Interim (Dee et al., 2011), and the ocean and sea ice are initialized from the ECMWF fifth generation ocean analysis system (Zuo et al., 2019). Initial condition perturbations and a stochastic physics scheme are used to create the forecast ensemble.
GloSea5, in scientific configuration Global Coupled 2.0 (GloSea5-GC2; MacLachlan et al., 2015;Williams et al., 2015), is based on the coupled Met Office Hadley Center Global Environment Model (version 3), in which the Met Office atmospheric Unified Model (UM) (version GA6) is coupled to the NEMO ocean model (version 3.4), the Joint U.K. Land Environment Simulator (version 6.0), and the Los Alamos Sea Ice Model (version 4.1). GloSea5 uses an N216 horizontal resolution (0.83° in longitude and 0.56° in latitude) with 85 vertical levels for the atmosphere, and the tripolar ORCA025 grid with 75 vertical levels for the ocean.
GloSea6 is the successor to GloSea5, with the same model resolution and ensemble size but a higher coupling frequency (increased from 3 hourly to hourly). GloSea6, in scientific configuration Global Coupled 3.2 (GloSea5-GC3.2), has the same model components as GloSea5 but with updated versions: UM in GA7.2, NEMO in version 3.6, land model in version 8.0 and sea ice model in version 5.1. These updates, ideally, have improved model physics and parameterizations. GloSea6 is currently the Met Office operational seasonal forecast system (Williams et al., 2018). In GloSea5 and GloSea6, the forecast ensemble is generated by perturbing atmospheric physics tendencies via the Stochastic Kinetic Energy Backscatter v2 (SKEB2; Bowler et al., 2009) scheme, with the same atmospheric initialization. Uncertainties in initial conditions are partially considered by having the ensembles initialized at different dates in each month. The atmosphere and land surface are initialized from ERA-Interim, while the ocean and sea ice are initialized from the Forecast Ocean Assimilation Model (FOAM) Ocean Analysis (Blockley et al., 2014).
Historic seasonal forecasts (hindcasts) spanning the 23-year period 1993-2015 are used to develop the hybrid prediction system. SEAS5 hindcasts have 25 members starting on the first of the month, while GloSea hindcasts have four start dates in each month (the first, ninth, seventeenth, and 25th) with seven ensemble members per start date. For the GloSea hindcasts, the nominal month M includes the members initialized on the first of the month M and all available members with start dates back to, and including, the ninth of the month M-1 (28 members in total). For the SEAS5 hindcasts, the nominal month M includes all the members initialized on the first of the month M (25 members in total). Each hindcast member covers 215 days after the initialization date. For the hybrid prediction, we use hindcasts initialized in April, March and February. In the discussion section, we also evaluated the dynamical prediction of peak-season TCs in the hindcasts initialized in June and May; TCs were identified from twelve-hourly atmospheric fields, using the method described in Befort et al. (2021) and Feng, Klingaman, Hodges, and Guo (2020).

Statistical Analysis
The Pearson correlation coefficient (denoted as r) is used to measure the correlation between the time series of two variables. The time series are detrended before calculating the correlation. A two-tailed t test with p value of 0.05 (or 0.10) is used to test significance, with a null hypothesis of a zero correlation. In other words, the statistically significant values (r) are at the 95% (or 90%) confidence level, unless stated otherwise.

Identification of TC Predictors
In our analysis, the predictand is the frequency of TCs developing in, and moving through, the SCS and PHL in the peak-season (Figure 1). The predictors are extracted from the preseason SSTs and low-level winds, which are critical for TC generation and development, over regions determined by correlation analysis. We also evaluated other environmental factors as potential predictors, such as relative humidity and vertical wind shear. However, these variables were found to have a less robust relationship with the TC numbers, presumably due to less persistence of the variables across seasons. Moreover, the dynamical seasonal forecasts are found not to predict these variables as well as SSTs and low-level winds in the tropics, limiting the skill of the hybrid predictors.
We introduce a sequential orthogonal identification process (SOIP) to identify the statistical predictors for the predictand (TC frequency). Through this approach, the predictors are designed to represent different temporal components of TC variability, which are statistically independent. The schematic workflow in this approach is shown in Figure 2. The main processes are summarized in the workflow shown in Figure 2. The individual steps in the SOIP approach are described as follows: 1. The relationship (correlation and regression) between the predictand (F 0 ) and preseason environment (i.e., SST and 850 hPa zonal winds) at each grid point (E 0 ) are calculated. The average environment over a continuous region where the p value of correlation is less than 0.1 is identified as the potential predictor. By doing this, several potential predictors (in the units of deg.C or m/s) can be found. Among these potential predictors, the one which has a physically meaningful relationship with the peak-season TC activity (i.e., formation, development and movement) is chosen as predictor 1 (P 1 ). The predictor is then rescaled to the standard deviation of TC frequency via multiplying σ TC /σ P , where σ TC is the interannual standard deviation of TC frequency and σ P is the standard deviation of predictor. The final predictor has the same units as the predictand (times/ season). Note that the shape of the region responding to the predictor can be irregular as it is essentially determined by the strength of the TC-environment relationship. 2. The effect of predictor 1 (P 1 ) is removed from the predictand (F 0 ) and preseason environment, to obtain their residuals (F 1 and E 1 ). To do this, the linear regressions of yearly predictand (F 0 ) and preseason environment in each grid (E 0 ) against the yearly predictor 1 (P 1 ) are first estimated; the values of predictand and environment determined by predictor 1 via the estimated regressions are then removed from the time series of the predictand and preseason environment at each grid point, to yield the residuals of predictand (F 1 ) and environment (E 1 ). This step is to ensure that in the following identification process the previously identified predictor has no statistical relation to the next predictors. In other words, the sequentially identified predictors are statistically unrelated. 3. The relationship between the predictand and environment residuals (F 1 and E 1 ) is further estimated. As in Step 1, the average of the environment over the regions, where the statistical relationship is significant at the 90% confidence level, are taken as the potential predictors. The potential predictor that has physical meaning to the peak-season TC activity is chosen and rescaled as the predictor 2 (P 2 ). 4. To identify another predictor (P i ), similarly, the linear effects of the previously identified predictors (P 1 , P 2 , …, P i−1 ) are removed sequentially from the predictand and environment to obtain the new residuals (F i−1 and E i−1 ); the new predictor (P i ) is then constructed from the residuals (F i−1 and E i−1 ). In principle, predictors are the regional averages of either SSTs or 850 hPa zonal winds, which have physically meaningful relationships with the predictand, and are statistically independent of each other (r = 0) because of the sequential removal process. 5. Finally, a multivariable prediction model is built to predict the regional TC frequency. The model is formed by combining regressions between the predictand (or predictand residuals) and predictors. In each of N prediction years, the predicted total frequency of regional TCs, denoted as , is expressed as: where β 0 is the climatic average of the TC frequency (a long-term average), and β i is the regression coefficient between the frequency and predictor (F i−1 and P i ). β i is estimated separately for individual predictors. For example, β 1 is calculated from the time series of F 0 and P 1 , and β 2 is calculated from the time series of residual F 1 and P 2 . For the statistical prediction, the β i value is estimated through a cross-validation procedure for each year. Block elimination is employed to minimize potential skill inflation from serial correlation. For each prediction year, β i is trained using N-5 years of the frequency and predictor (F i−1 and P i ) with a 5-year omitted block centered on the year of prediction. Then, the number of TCs in the prediction year is predicted using this estimated β i and the predictor (P i ) for the prediction year. This means that in the statistical prediction the β i value varies with years. In the cross-validation procedure, we also tested the sensitivity of prediction performance to the length of omitted years, by changing the length to 1, 3, and 7 years (this will be discussed in Section 3.3). For the hybrid prediction, the β i value for each predictor P i is constant during the forecast period because the time series of P i is directly extracted from the environment over the same region as the statistical predictor but from dynamical forecasts (i.e., no cross-validation involved).
Through the iteration of steps 1-4, a sequential set of predictors can be identified. In the rest of this paper, we only consider the first few predictors, which in total explain at least 50% of the variance of TC frequency. More predictors are needed if the percentage of the variance of TC frequency explained by predictors is set to be higher. We note that it becomes difficult to identify more than four predictors, as those areas of environment associated with TCs become smaller and sparser. We cannot rule out that the set of predictors identified by the SOIP approach could still be physically related via nonlinear and complicated processes.

Constructed Predictors
For the SCS TC frequency, to capture at least 50% of the variance over the 41-year period 1979-2019, three predictors are derived using the SOIP method from the preseason environmental fields in ERA5 ( Figure 3). Predictor 1 is constructed from the average of 850-hPa easterly wind anomalies over a region with westward tilt with latitude in the western-central tropical Pacific (Figure 3a). Predictor 2 is based on the average of warm SST anomalies over a region with eastward tilt with latitude in the central-eastern tropical Pacific ( Figure 3b). Predictor 3 is constructed from the average of cold SST anomalies over the Indian Ocean ( Figure 3c). The time series of the three predictors and the corresponding TC frequency are shown in Figure S1 in Supporting Information S1. Predictors 1-3 are significantly correlated with the year-to-year variability of the corresponding TC frequency, with r = 0.48, 0.53, and 0.60 (all significant at the 95% confidence level). Combining the TC frequency associated with individual predictors yields the total frequency. The total frequency constructed from the three predictors matches well the observed total frequency, with r = 0.80 (i.e., explaining ∼65% of the variance). The physical meanings of the predictors will be discussed in Section 3.2.
For the PHL TC frequency, four predictors are derived from the preseason environment ( Figure 4). Predictor 1 is constructed from the average of 850-hPa easterly wind anomalies over a region in the central tropical Pacific (Figure 4a). Predictor 2 is from the warm SST anomalies over a region tilted eastward with latitude in the north-eastern tropical Pacific (Figure 4b). Predictor 3 is based on the cold SST anomalies in the Arabian Sea (Figure 4c), and predictor 4 is based on the 850-hPa easterly wind anomalies over a region in the western tropical Pacific (Figure 4d). The four predictors are significantly correlated with the corresponding PHL TC frequency, with r = 0.37, 0.40, 0.43, and 0.49 (all significant at the 95% confidence level) ( Figure S2 in Supporting Information S1), respectively. The constructed total frequency explains ∼55% of the observed variance, with r = 0.74.
It is worth noticing that although the predictors of the SCS and PHL TCs are identified separately, they are not independent. Table S1 in Supporting Information S1 provides the cross correlations between the predictors for the two subregions. The SCS TC predictor 1 is significantly correlated with the PHL TC predictors 1 and 4, with r = 0.52 and 0.62, respectively. This is related to the fact that the SCS TC predictor 1 is defined by the low-level easterly winds over a region, which extends across the two regions used to define the PHL TC predictors 1 and 4 (Figures 3 and 4). The SCS TC predictor 2 is correlated with the PHL TC predictor 2, with r = 0.72. The SCS TC predictor 3 is significantly associated with the PHL TC predictors 1, 2, and 3, with r = 0.48, −0.33, and 0.52, respectively. This suggests that despite the different regions the individual components of SCS and PHL TCs share several common drivers. There are a few factors that can affect the identification of the predictors. First, there is uncertainty in observing the number of TCs affecting the two regions, especially for weak storms and when the stages of the storm lifecycle are weak (Knaff et al., 2010;Torn & Snyder, 2012;Velden et al., 2006). Using different sources of TC data (either produced by different meteorological agencies or tracked from climate reanalyses) helps to understand the uncertainty in the predictor identification. We found that when the TC tracks from ERA5 (Feng, Klingaman, Hodges, & Guo, 2020;Hodges et al., 2017) are used, the regions responsible for the PHL TC predictors are slightly larger than the regions when the IBTrACS data are used (not shown). This might be related to the longer life cycle of the storm identified in climate reanalyses (Hodges et al., 2017). Excluding weak storms and the weak stages of storms may help to address the uncertainty in the predictors. However, the reduced sample size (especially for the SCS TCs) and the late-genesis stage (Feng et al., 2021) may in turn cause the TC-predictor relationship less robust. Second, the predictors, and the regions responsible for the predictors, could vary with the analysis periods. For example, during the period 1979-2019, the PHL TC frequency is significantly related to the low-level zonal winds in the equatorial central Pacific (predictor 1). But, when considering the period 1993-2015 (analyzed in Section 3.4), this relationship becomes insignificant. This uncertainty could be related to the nonstationarity of the relationships between the preseason environment and TC activity.

Predictor-Environment Relationship
In this section, we interpret the predictors by examining the relationship between the predictors and the environmental conditions relevant to TC activity. We hypothesize that the predictors represent different clusters of TC genesis and track. TCs affecting the SCS and PHL regions include those storms that form within the region (local TCs) and others that form elsewhere but travel to this region (remote TCs). Figure 5 shows the climatology of genesis density and translation speed of the SCS and PHL TCs in the peak-season from IBTrACS. On average, there are about 7 TCs in the SCS in the peak-season, of which 61% are local TCs and 39% remote (Figure 5a). The remote TCs originate throughout a wide area in the western tropical Pacific (120°-180°E), and travel north-westward to the SCS. Associated with the easterly steering flow and the North Pacific Subtropical High to the north, most SCS TCs (both local and remote) move westward or north-westward toward Vietnam and southern China (Feng et al., 2019;Wang, Xiang, & Lee, 2013). On the other hand, some local TCs travel eastward toward the Philippines, related to a south-westerly steering flow (Yang et al., 2015). The eastward-moving storms are responsible for an overall small translation speed inside the region. There are on average 13 TCs affecting the PHL in the peak-season (Figure 5b), of which 58% are local storms, 38% form to the east of the region (145°E-160°W), and 4% form in the SCS and travel to the PHL.

SCS TC Predictors
For the SCS TCs, the relationship between predictor 1 and the environmental conditions is shown in Figure 6. In the preseason (Figures 6a-6d), positive anomalies of predictor 1 (anomalous easterly winds) indicate cold SST anomalies in the central-eastern tropical Pacific and warm SST anomalies in the subtropical Pacific and east of the Philippines. The SST changes are accompanied by an anomalous easterly wind in the lower troposphere and an anomalous cyclonic circulation in the upper troposphere. This indicates an enhanced Pacific Walker circulation related to an enhanced equatorial zonal wind-SST feedback (Cane et al., 1986). The zonal wind pattern leads to a low-level convergence and upper-level divergence in the SCS. The enhanced large-scale circulation is also associated with an increase in vertical wind shear (VWS) and a decrease in midlevel relative humidity (RH) in the central tropical Pacific. In contrast, the SCS region sees a VWS decrease and RH increase.
From the preseason to peak-season, the predictor 1-related environmental conditions strengthen (Figures 6e-6h).
In the tropical Pacific, the east-to-west SST gradients and low-level easterly wind anomalies become stronger in the peak-season, indicating a strengthening or westward shift of the summer Monsoon Trough (MT) (e.g., Basconcillo et al., 2021;Chen et al., 2019;Li et al., 2017;Teng et al., 2021). In short, with positive anomalies of predictor 1, the peak-season thermodynamic and ventilating conditions are evidently more favorable for TCs to form in the SCS and PHL regions. Additionally, with positive anomalies of the predictor 1, the easterly steering flow in the west Pacific significantly strengthens in the peak-season, and this steers remotely originated TCs to enter the SCS. The low-level relative vorticity (RVor) is another important factor for TC genesis by providing synoptic-scale disturbances (Molinari & Vollaro, 2013). The negative correlations between predictor 1 and RVor in the northern central Pacific (Figures 6d and 6h) suggest that the predictor 1-related TCs are less likely to form in the central-western tropical Pacific (150°-180°E).
The relationship between SCS TC predictor 2 and the environmental conditions is shown in Figure 7. The predictor is strongly associated with the preseason environment in the central tropical Pacific (Figures 7a-7d). The predictor 2-associated environment is depicted by warm SST and low-level westerly wind anomalies in the central tropical Pacific, suggesting enhanced zonal wind-SST feedback with a deeper oceanic thermocline and less upwelling (Cane et al., 1986). The warm SST and westerly wind anomalies could correspond to an eastward shift of the MT (Basconcillo et al., 2021;Chen et al., 2019;Molinari & Vollaro, 2013;Teng et al., 2021;Wu et al., 2012), and they are closely related to the PMM pattern (Chiang & Vimont, 2004). The correlation between predictor 2 and the preseason PMM index is r = 0.69. There is also a strong anticyclonic circulation anomaly in the upper troposphere in the north Pacific. The wind anomalies are associated with low-level convergence, upper-level divergence, VWS decrease, RH and RVor increase in the central tropical Pacific. Toward the peak-season, these environmental conditions largely persist (Figures 7e-7h). Moreover, in the peak-season, the negative effect on VWS and the positive effect on midlevel RH and low-level RVor of predictor 2 become stronger For the SCS TC predicator 3, the relationship with the large-scale environment is shown in Figure 8. The most distinctive feature in the preseason environment related to predictor 2 is the cold SST anomalies in the Indian Ocean (Figures 8a-8d). Related to this, the cross-equatorial southerly flow in the lower troposphere and the northerly flow in the upper troposphere over the Indian Ocean and west Pacific are significantly enhanced. This indicates intensified convection and a strengthened Hadley Circulation over the Indo-Pacific Warm Pool (Hoskins et al., 2020), which are strongly related to an increase in WNP TC activity Guo & Tan, 2018). This large-scale circulation change can significantly reduce VWS and increase midlevel RH in the SCS and PHL regions. These environmental conditions, which are favorable for TCs to form in and travel to SEA, decay from the preseason to peak-season (Figures 8e-8h). However, toward the peak-season, the related cross-equatorial overturning circulation in the Indo-Pacific Warm Pool shifts further east to SEA. This shift excites a strong low-level cyclonic circulation and high RVor to the north of the SCS, encouraging local TCs to form.
To support the above interpretations, we calculate the correlations between the predictors and spatial distributions of the SCS TCs. Figures 9a-9c shows the correlations with observed genesis density. (Note. This is only for the TCs affecting the SCS rather than all TCs in the WNP.) As expected, predictor 1 is significantly related to the TC genesis in the SCS and around the Philippines, and predictor 2 is related to the TCs that form in the open ocean (140°-180°E) and travel to the SCS. Predictor 3 has a less clear relation to any spatial clusters of genesis.
However, the relationship between predictors and TC track density is more evident and supportive to our conclusions (Figures 9d-f). Predictors 1 and 3 represent the two clusters of TCs in the SCS, which are teleconnected to the central-eastern Pacific and Indian Ocean SSTs, respectively. Predicator two explains well the remote TCs, which is associated with the genesis-related conditions in the central tropical Pacific and the steering flow to the north. Thus, we confirm that the three derived predictors represent three different genesis and track clusters of SCS TCs (e.g., Camargo et al., 2007;Chen et al., 2017;Kim & Seo, 2016;Teng et al., 2021).
One key question is how the predictors are associated with ENSO. We found that all three predictors are significantly correlated with the preseason ENSO. Their correlations with the preseason Nino3.4 index are r = −0.48, 0.46, and −0.34 (all significant at the 95% confidence level), respectively. As the predictors are statistically independent, the correlations with the Nino3.4 index indicate the diversity of the TC-ENSO relationship, possibly related to different types of ENSO (e.g., Modoki and mega-ENSO; Ashok et al., 2007;Kim et al., 2011;Wang, Li, et al., 2013;Wang, Liu, et al., 2013) and variations in the environment-ENSO teleconnections (e.g., Kim et al., 2010;Xie et al., 2009). We hypothesis that although the ENSO signal becomes weaker after the winter, the response of the remote environment in the following season could still be robust due to the lagged effect. Thus, we anticipate that the preseason ENSO (or the ENSO in the preceding winter) can affect the SCS TCs through different pathways, and a simple description of ENSO (as represented by the Nino3.4 index) cannot reveal the complexity of the ENSO impact. (Note. There is no significant correlation between the preseason Nino3.4 index and SCS TC frequency.) For the peak-season Nino3.4, the correlations with the three predictors are r = −0.79, 0.08, and 0.08, respectively. This suggests that the peak-season ENSO index holds a linear relationship with only a fraction of SCS TCs (the correlation between the peak-season Nino3.4 index and SCS TC frequency is r = −0.30, p = 0.06).

PHL TC Predictors
The large-scale environment related to the four predictors of the PHL TCs is shown in Figures S3-S6 in Supporting Information S1. The relationship between predictor 1 and the environment highlight the cross-season persistence of cold SST anomalies in the central-eastern tropical Pacific and the Indian Ocean ( Figure S3 in Supporting Information S1). These SST changes occur with a significant enhancement of low-level convergence and upper-level divergence east of the Philippines, suggesting a strengthening regional Hadley circulation (e.g., Guo et al., 2020). The low-level westerly wind anomalies from the Indian Ocean and easterly wind anomalies from the central Pacific bring humid air to the western tropical Pacific and reduce the VWS in the SCS and PHL. These conditions persist across the two seasons. The PHL TC predictor 1 is anticorrelated with the preseason (peak-season) ENSO index (Nino3.4), with r = −0.87 (−0.56). We also recall that the PHL TC predictor 1 is significantly associated with the SCS TC predictors 1 and 3 (r = 0.52 and 0.48, respectively), which represent the central-eastern Pacific and Indian Ocean SSTs separately. Thus, we anticipate that the PHL TC predictor 1 reveals two pathways of the ENSO effect on the PHL TCs, that is, the east pathway from the central Pacific and the west pathway from the Indian Ocean. Neither preseason nor peak-season ENSO index has significant linear correlations with other predictors of the PHL TCs.
The PHL TC predictor 2, associated with the SCS TC predictor 2 (r = 0.72), is positively correlated with the environmental conditions that are favorable for TCs to form in the central tropical Pacific, and favorable for them to travel further to the PHL ( Figure S4 in Supporting Information S1). These environmental conditions well resemble the PMM spatial pattern, which was found to significantly modulate the TC activity in the southeast sector of the basin (e.g., Gao et al., 2018;Zhang, Vecchi, Murakami, Delworth, et al., 2016). The PHL TC predictor 3, associated with the SCS TC predictor 3 (r = 0.52), reflects the favorable ventilation conditions in the southern part of the PHL, with the low-level convergence and upper-level divergence ( Figure S5 in Supporting Information S1). The PHL TC predictor 4, correlated with the SCS TC predictor 1 (r = 0.62), represents the cross-season persistence of environmental conditions that are favorable for local PHL TCs to form, that is, warm SSTs, lower VWS, cyclonic circulation anomaly and stronger low-level convergence ( Figure S6 in Supporting Information S1).
As with the SCS TCs, the PHL TCs can be decomposed into different clusters . Figure 10 shows that the four predictors can capture different spatial clusters of PHL TCs. Predictor 1 is positively correlated with the PHL TCs formed in the northwest sector of the WNP, and negatively correlated with TCs formed in the southeast sector, consistent with the TC-ENSO correlations (e.g., Feng, Klingaman, Hodges, & Guo, 2020). This reemphasizes the close relation of predictor 1 to ENSO. As expected, predictor 3 is related to the TCs that form in the southern part of the PHL and move north-westward. Predictor 2 represents the westward-moving TCs that originate to the east of the PHL, while predictor 4 tends to capture both the westward-moving TCs that generate in the PHL and the eastward-moving TCs that generate in the SCS.
We note that the predictors may not be mutually independent of the variability of the MT, which is defined as the contiguous region where monthly 850-hPa RVor is positive (e.g., Molinari & Vollaro, 2013). The MT is associated with rich synoptic-scale disturbances and favorable environmental conditions (e.g., large low-level vorticity, small VWS, warm SSTs and humid air), and is crucial for the WNP TCs; over 70% of them are born within the MT. The MT is a "TC genesis bed" (e.g., Wu et al., 2012, and references therein). The identified predictors could be associated with the year-to-year variability in the position and strength of the MT. For example, for positive anomalies of the PHL TC predictor 1, the enhanced west-to-east SST gradient and low-level easterly winds along the equatorial Pacific suggest a seesaw change in the MT: strengthening east of the Philippines and weakening to the east. This favors an increase in TC activity in the PHL (Figures 10a and 10e). We anticipate plausible links between the MT and other predictors for PHL and SCS TCs (not described here). All these are consistent with the previous findings that a strengthening or an eastward shift of the MT encourages more TCs to form in the south-eastern sector of the basin, and a weakening or westward-shift of the MT favors more TCs to form in the western sector ( et al., 2012). The merit of the TC predictors is that they are not only associated with the genesis conditions (e.g., the MT) but also related to the background steering winds in which TCs are embedded. Figure 11 shows the cross-validated prediction of the SCS TC frequency using the statistically derived predictors for the period 1979-2019. For the SCS, the TC frequency predicted by predictors 1-3 is significantly correlated with the observed frequency (i.e., total frequency, frequency residual 1 and frequency residual 2), with r = 0.37, Figure 10. As Figure 9, but for the Philippines (PHL) tropical cyclones (TCs). Climatology of PHL TC genesis is shown in Figure 5b.

Statistical Prediction Performance
0.48, and 0.56, respectively. In the cross-validation, the predicted total frequency using three predictors agrees well the observed total frequency, with r = 0.73.
The cross-validated prediction for the PHL TCs is shown in Figure 12. The frequency predicted by predictors 1 and 2 is not significantly correlated with the observed frequency, with r = 0.10 (p = 0.51) and 0.25 (p = 0.11), respectively. This contrasts with the significant TC-predictor correlations (r = 0.37 and 0.41; Figure  S2 in Supporting Information S1). The difference means that the two predictors, which resemble the ENSO and PMM signals, respectively, overfit the relationship with TCs. The disparity could be related to several reasons, including (but not limited to) the nonstationarity of the TC relationship with ENSO and PMM. This is especially the case for predictor 1, where ENSO could be dependent on, or modulated by, other climate modes on decadal to multidecadal timescales, leading to an inconsistent TC-ENSO teleconnection in different periods (e.g., Newman et al., 2016;Yeh et al., 2018;Wang et al., 2020;Zhao & Wang, 2016). In the cross-validation, the PHL TC predictor 1 has a better performance over 1979-1999 (r = 0.34, p = 0.13) than over 2000-2019 (r = 0.11, p = 0.64), confirming that the performance is not stationary with time. The prediction determined by predictors 3 and 4 performs well for the corresponding frequency, with r = 0.36 and 0.42 (both significant at the 95% confidence level), respectively. The predicted total frequency of PHL TCs using the four predictors has a significant skill, with r = 0.54. The removal of predictors 1 and 2 does not have a substantial effect on the total frequency prediction (Figure 12f), with r = 0.45, suggesting the prediction source is mainly from predictors 3 and 4. We also evaluated the sensitivity of the cross-validated prediction to different lengths of the omitted block (Table  S2 in Supporting Information S1). When the block length increases from 1 to 7 years, the predictor performance varies marginally within 0.71-0.73 for SCS TCs and within 0.50-0.59 for PHL TCs. Thus, the performance of predictors does not change substantially when the length of the omitted block changes in the cross-validated prediction.

Hybrid Prediction Performance
We further develop the statistical prediction method into a hybrid statistical-dynamical approach, based on the large-scale environment produced by the dynamical seasonal hindcasts from SEAS5, GloSea5, and GloSea6. This means that the hybrid predictors are "predicted" by the dynamical hindcasts. If the dynamical models perfectly predict the predictor-associated environmental conditions, the performance of the hybrid predictors will be the same as the anomaly correlation between the ERA5-based predictors and observed TC frequency, that is, the best potential performance. The hybrid prediction is directly verified against observed TC frequency, rather than through the cross-validation as in the statistical prediction. In the hybrid prediction, we use the ensemble mean of seasonal hindcasts to extract predictors. The hindcast period is 1993-2015, which is the available period of the C3S seasonal hindcasts. To evaluate the effect of lead time, we use the hindcasts initialized in April, March and February (i.e., 3-5-month lead time).
We first reidentified the statistical predictors for 1993-2015 using ERA5, as different periods may have different predictors (discussed above). The disparity between the two periods is expected to be due to the nonstationarity of the environment-TC relationship. For 1993-2015, we also identified three statistical predictors for the SCS TC frequency using the ERA5 environment ( Figure S7 in Supporting Information S1). The 1993-2015 predictors are built from the same environmental variables as the 1979-2019 predictors but averaged over slightly different areas. Contrasting with the four predictors for 1979-2019, three predictors of PHL TCs were identified for the shorter period ( Figure S8 in Supporting Information S1). The hybrid predictors for 1993-2015 were estimated by averaging the environmental variables in dynamical hindcasts over these "predefined" areas. Table 1 provides the skill of the statistical prediction, and the skill of the hybrid prediction based on SEAS5, GloSea5, and GloSea6 hindcasts, for the total TC frequency. For the SCS TC frequency, the statistical prediction (r = 0.80) performs slightly better than the hybrid prediction (r = 0.70-0.77) that uses the April hindcasts (3-month lead time). The hybrid prediction performs similarly when different dynamical models are used. When the March and February hindcasts (4-5-month lead time) are used, the skill of the hybrid prediction decreases to r = 0.42-0.48 (significant at the 95% confidence level) and r = 0.32-0.38 (not significant), respectively. For the PHL TCs, the hybrid prediction with the shortest lead time (the April hindcasts) performs best, with r = 0.51-0.56. This performance is similar to the statistical prediction (r = 0.57). As expected, when lead time increases, the hybrid prediction performance decreases. For the PHL TCs, the GloSea5-based hybrid prediction appears to perform better than that from the other two models. This is especially the case for 4-5-month lead time, for which GloSea5-based predictors have a significant skill while the SEAS5-based and GloSea6-based predictors have either no significant skill or a lower skill.
The performance of individual predictors in the hybrid prediction is also analyzed. Figures 13 and 14 demonstrate the predicted variability of TC frequency using SEAS5-based predictors. For the SCS TCs, three individual predictors with the shortest lead time (3-month lead time, the April hindcasts) have the best performance (r = 0.35, 0.52, and 0.63), which is close to the empirical prediction (r = 0.34, 0.59, and 0.67) ( Figure 13). When lead time increases to 5 months, the skill of individual predictors becomes insignificant at 95% confidence level, except predictor 3 (r = 0.51). We further find that all these predictors are predicted well in dynamical hindcasts, with r = 0.68-0.95 for 3-5-month lead times ( Figure S9 in Supporting Information S1). It is worth noticing the hybrid predictor 1, which has the lowest skill among all predictors ( Figure 13). This is due to the weak relationship between the predictor and TCs over this period, rather than the low predictability of the predictor.
For the PHL TCs, the hybrid predictor 1 has no skill (r < 0.20), regardless of lead time ( Figure 14). The lack of skill is related to the overfitting of predictor 1, which has no skill either in the cross-validated statistical prediction (r = −0.14). Although the statistical predictor 2 performs well (r = 0.48), the hybrid predictor 2 has no significant  skill (r < 0.25). This inconsistent performance is related to the low predictability of predictor 2 in dynamical hindcasts ( Figure S10 in Supporting Information S1). Thus, the low performance of the hybrid predictors 1 and 2 is likely related to the nonstationarity of the TC-predictor relationship and the low predictability of the environment associated with the predictors. Predictor 3 performs well in both statistical and hybrid predictions (r = 0.47-0.51). Such good performance of hybrid prediction is related to the high predictability of predictor 3 in models. The same can be applied to GloSea5-and GloSea6-based predictors, and the results remain similar (not shown). Figure 15 shows the skill of dynamical hindcasts in predicting the preseason SSTs and low-level zonal winds. In the April-initialized hindcasts, all three models accurately predict the SST anomalies in the central and eastern Pacific and in the Indian Ocean (r > 0.7), with a slightly better prediction in SEAS5. This is consistent with the generally high predictability of the predictors (Figures S9 and S10 in Supporting Information S1), which are extracted from the environment over these two basins. It is interesting to recall the relatively low predictability of the PHL TC predictor 2. Although the SST anomalies in the north-eastern Pacific are significantly predicted in hindcasts, the residuals of SSTs over this region after removing the effect of other predictors could be less accurately predicted. Additionally, the model performance for SSTs over this region degrades greatly when lead time increases (not shown). Thus, the hybrid prediction relies on the robustness/stationarity of the statistical predictor-TC relationship, and the capability of dynamical models to predict the tropical SSTs and low-level winds that form the hybrid predictors.

Conclusions and Discussion
Accurately predicting the SEA seasonal TC activity several months ahead can provide an early warning service that may significantly reduce TC-related impacts. The ENSO index tends to have an overall weak correlation with the peak-season SEA TCs, related to the complexity (or nonlinearity) of the ENSO teleconnections and the possible competing or compound effects with other climate variability. It is challenging for the approaches that are solely based on the ENSO indices to predict the peak-season SEA TCs. In this paper, we introduced a novel approach (SOIP) to predict the occurrence frequency of the peak-season TCs in two subregions of SEA: SCS and PHL. Independent predictors, which have proven to have cross-season effects on different clusters of the regional TCs, are sequentially constructed from the preseason (AMJ) large-scale environment. The predictors are drawn from the SSTs and low-level zonal winds in the tropics, which represent the atmospheric and oceanic conditions that are essential for the year-to-year variability in TC generation and track. The SOIP approach is conceptionally different from the traditional statistical methods, which rely on the known relationships (which are mostly linear) between climate variability (e.g., ENSO) and TC activity. Instead, the SOIP approach constructs the predictors that are tailored for the regional TCs. We showed that the predictors can well represent the interannual variability of SEA TCs that form either locally or remotely. In the 41-year period , the statistical approach based on ERA5 provides a skillful prediction for the observed frequency of SCS and PHL TCs, with r = 0.73 and 0.54, respectively. The relatively low performance for PHL TCs could be related to the nonstationarity of the predictors which resemble the ENSO and PMM signals.
We further developed the statistical approach into a hybrid prediction system. In the hybrid prediction, the predictors, which are predefined in ERA5, are predicted by three dynamical seasonal hindcasts (SEAS5, GloSea5, and GloSea6). The skill of the hybrid prediction is comparable to the statistical prediction when the April-initialized hindcasts are used (r = 0.7-0.8 and 0.5-0.6 for the SCS and PHL TC frequency, respectively). When using the March-initialized hindcasts, the skill of the hybrid prediction is reduced but still mostly significant (r = 0.4-0.5 and 0.3-0.5 for the SCS and PHL TCs, respectively; only not significant when using SEAS5 March hindcasts). The merit of the hybrid prediction relative to the statistical prediction is the longer lead times (4-5 months ahead). We further confirmed that the main sources of the hybrid prediction skill are from the robustness of the observed predictor-TC relationship, and the capability of dynamical models to predict the tropical SSTs and low-level winds.
The performance of seasonal hindcasts (SEAS5, GloSea5, and GloSea6) in directly predicting the SCS and PHL TCs is provided in Table 1. Over the period 1993-2015, only the GloSea5 June hindcasts show a significant skill in predicting the frequency of SCS TCs (r = 0.48), while none of the three models predict the PHL TC frequency. The new developed statistical and hybrid predictions in this paper outperform the dynamical predictions, indicating the potential for operational use. The low performance of the dynamical models for SEA TCs is consistent with previous studies (Befort et al., 2021;Feng, Klingaman, Hodges, & Guo, 2020). The biased teleconnections of climate variability, and the biased TC precursor systems in the tropics such as westward propagating tropical depressions, are among the possible reasons. Further detailed model evaluation is required to understand the model performance, which is beyond the scope of this article.
The SOIP approach in our paper shares a similar decomposing concept to the empirical orthogonal function (EOF) analysis, which is used to identify the basin-wide patterns of TC activity (e.g., Kim et al., 2005;Liu et al., 2021). Both can decompose the orthogonal variability of TCs. However, the SOIP is initially confined to the temporal variability that must be principally related to the environment, whereas the EOF analysis recognizes the geographical variability of TCs and during the analysis it does not consider the environmental conditions. We did the EOF analysis of the peak-season TC genesis and track density (both the SCS and PHL TCs). However, no robust relationships were found between the leading principal components of EOFs and the preseason environment. Although the SOIP approach is initially developed for the frequency of peak-season SEA TCs, it can be applied to other seasons, other regions and other TC metrics (e.g., intensity). In the future, it will be worth evaluating the strength and weaknesses of the SOIP approach in those applications against other statistical and dynamical methods, to form a synthesized seasonal forecast conditional on the performance of prediction methods.  (d-f) as (a-c), but for correlations of 850-hPa zonal winds. Only correlations that are significant at the 95% confidence level are shown. (Rayner et al., 2003) and https://psl.noaa.gov/data/timeseries/monthly/PMM/ (Chiang & Vimont, 2004) from the NOAA Physical Sciences Laboratory. IBTrACS TC data (West Pacific) are archived from https://www.ncei.noaa. gov/products/international-best-track-archive?name=ib-v4-access (Knapp et al., 2018;doi:10.25921/82ty-9e16).

Data Availability Statement
Computing and data storage facilities were provided by JASMIN (https://jasmin.ac.uk).