Sub‐Seasonal Predictability of North American Monsoon Precipitation

North American Monsoon (NAM) rainfall is a vital water resource in the United States Southwest, providing 60–80% of the region's annual precipitation. However, NAM rainfall is highly variable and water managers lack skillful guidance on summer rainfall that could help inform their management decisions and operations. Here we show that NAM season (June–October) precipitation can be forecasted by the European Centre for Medium‐Range Weather Forecast's model months ahead at catchment scales. This is possible by identifying the frequency of days with synoptic‐scale moisture advection into the NAM region, which greatly improves predictability over directly utilizing modeled precipitation. Other forecasting systems fail to provide useful guidance due to deficiencies in their data assimilation systems and biases in representing key synoptic features of the NAM including its teleconnections.

• Seasonal forecasting models do not have skill in simulating North American Monsoon (NAM) rainfall • We use a data driven weather type (WT) algorithm to identify synoptic situations associated with NAM rainfall on catchment scales • The frequency of monsoonal flow WTs days allows for the skillful forecast of NAM season rainfall starting in April Supporting Information: Supporting Information may be found in the online version of this article.
Predicting NAM precipitation on seasonal to sub-seasonal (S2S) time scales remains challenging. Historic observations show that NAM characteristics are modulated by Pacific sea surface temperature patterns (Castro et al., 2001). Observations also show that wet winters tend to be followed by dry monsoon seasons and vice versa in New Mexico and Arizona (Gutzler & Preston, 1997;Higgins et al., 1998). However, tree-ring reconstructions covering the last five centuries indicate that this link is weak, at best, and not stable over time (Griffin et al., 2013). Zhu et al. (2005) found no significant relationships between the antecedent soil moisture conditions during NAM onset and total precipitation. Additionally, S2S forecasting systems have no significant skill in predicting precipitation in the U.S. Southwest after two to 3 weeks of leadtime (Krishnamurti et al., 2002;Li & Robertson, 2015;Slater et al., 2019). The dominant processes that could contribute to S2S predictability of early season NAM precipitation are warm-season atmospheric teleconnection modes such as the West Pacific North America pattern or quasi-stationary Rossby wave trains (Castro et al., 2012;Ciancarelli et al., 2014). These teleconnections influence the positioning and seasonality of the monsoon high and thereby the amount of precipitation in the region (Carleton et al., 1990).
In many snowmelt-driven basins in the Western U.S., operational seasonal water supply forecasts decisions are mainly based upon snowpack and don't explicitly consider any information about monsoon precipitation due to its low predictability. Skillful forecasting products that intersect with decision points on seasonal and annual planning horizons are extremely valuable to water managers and are a research priority (Bureau of Reclamation, 2018Reclamation, , 2021Raff et al., 2013).
Here we present a forecasting framework that leverages the ability of current seasonal forecasting systems in simulating large-scale circulation patterns that are associated with monsoonal rainfall rather than utilizing erroneously modeled rainfall directly (Crochemore et al., 2016). The goal of this study is to establish a set of weather types (WTs) that can be used in a seasonal forecasting framework to investigate the potential to predict NAM precipitation variability. Therefore, we use a novel data-driven weather typing algorithm based on clustering (Prein & Mearns, 2021) that uses objective criteria to identify archetypal WTs. Specifically, we derive WTs that are optimized for NAM precipitation in catchments in Arizona and New Mexico, which include the Lower Colorado River Basin and Rio Grande Basin, respectively. The algorithm tests many combinations of atmospheric predictor variables, establishing the optimal set of WTs from two skill scores based on the precipitation average and standard deviation of the derived WTs. Similar approaches have been shown to provide novel insights into the origin of precipitation changes on climate time scales (Lehner et al., 2017;Prein et al., 2016). Further, we investigate the potential for skillful forecasts of NAM season rainfall by examining the ability of seasonal forecasts to predict the identified WTs, versus predicting precipitation directly.

Season and Region of Interest
We investigate June to October conditions, which incorporates the core monsoon season from July to September and helps to assess the onset and decay of the monsoonal high. Our focus region includes eight catchments in Arizona and six in New Mexico (Figure 1a). Those catchments were selected due to their importance for water management in these states. We derive WTs for each of these catchments based on the skill scores discussed below.

Observations and Reanalysis
We use the Parameter-elevation Regressions on Independent Slopes Model (PRISM, Daly et al., 1997) gridded daily precipitation data for calculating precipitation statistics. PRISM provides data for the 1982-2018 period on a 4 km horizontal grid over the conterminous United States.
For the WT classification, we use daily average atmospheric variables from ECMWF's Interim Reanalysis within the period 1982-2018 (Dee et al., 2011). The selection of predictor variables is limited by variables that are commonly stored by seasonal forecasting centers; we include the following 12 variables in the analysis: sea level pressure, 850 hPa zonal, meridional, and total wind speed, 850 hPa and 500 hPa moisture flux, water vapor mixing ratio, and air temperature, 500 hPa geopotential height, and 200 hPa wind speed. The same variables are also important on weather forecast timescales, as they related to dynamic forcing and thermodynamic instability.

Weather Typing Methodology
The WT algorithm was modified from the one presented in (Prein & Mearns, 2021). In short, it is a data-driven clustering algorithm that tests atmospheric predictor combinations to find optimal WTs based on objective skill scores associated with NAM precipitation. The details of the WTing algorithm are outlined below.
All days between June and October within the period 1982-2018 are considered in the clustering. We consider all 12 above-mentioned variables and up to combinations of three variables (220 possible combinations) as predictors for catchment precipitation. First, we calculate daily anomalies from the mean climate state of each predictor on a grid cell basis and apply Gaussian spatial smoothing (sigma = 0.5) to remove small-scale variability from the predictor variables. Next, we normalize the anomalies by their regional average mean and daily standard deviation. The normalized anomaly fields are then clustered using the results of a hierarchical (using Ward's linkage on a condensed distance matrix) cluster algorithm as initial seed for a k-means clustering (Romesburg, 2004). The combination of these two cluster algorithms showed high skill in a WT intercomparison study (Schiemann & Frei, 2010) and was successfully applied to capture precipitation characteristics over the US (Prein et al., 2016) and weather conditions in other regions (Comrie, 1996;Ekstrom et al., 2002).
For each catchment, we assess how these skill scores change if WTs from the remaining 13 basins are used as predictors of the selected catchment's NAM precipitation ( Figure S1 in Supporting Information S1). In Arizona, using the WTs from the two northernmost catchments (Lower Colorado-Lake Mead subregion-HUC1501; Little Colorado subregion-HUC1502) results in higher skill in characterizing precipitation in southern (upwind during NAM conditions) catchments. Based on these results we cluster the catchments in Arizona in two regions-Arizona West (AZ West containing HUC1501, HUC1503, HUC1507, HUC1810) and Arizona East (AZ East containing HUC1502, HUC1504, HUC1506, HUC1508). Similarly, we cluster the catchments in New Mexico into Northern catchments (NM North containing HUC1301, HUC140801, HUC130201, HUC130202) and southern catchments (NM South containing HUC130301 and HUC1306). The differences between Arizona and New Mexico WTs is not surprising since previous literature showed that spatial variability of monsoon precipitation differs east and west of the U.S. continental divide (Castro et al., 2012;Ciancarelli et al., 2014).
We derive clusters for each of the 14 target catchments ( Figure 1a) and test the sensitivity of the cluster domain size by adding 2°, 5°, and 10° around a rectangle that includes the catchment of interest to test sensitivities to the clustering region (Beck et al., 2016).
We use two skill scores to assess the quality of derived WTs dependent on the predictor variables and WT domain size. The first is the absolute average of WT centroid (cluster mean state) precipitation anomalies (PR anom ), which should be maximized-the precipitation in the derived WTs should be as different as possible from the climatological precipitation. The second skill score is the ratio of the intra-versus inter-cluster standard deviation of precipitation (IvI), which should also be maximized. Although these two scores are correlated ( Figure S2 in Supporting Information S1) combining them helps to improve the robustness of the derived WTs. This score favors WTs that have days with similar precipitation within WTs and whose precipitation is different between WTs. The combination of these two scores has been successfully used in previous WTing applications (Prein et al., 2016).
Based on these skill scores we select the top 10% of the tested predictor combinations and count how often each variable is included in a well-performing setting ( Figure S3 in Supporting Information S1). Water vapor mixing ratio at 850 hPa (Q850) is by far the most frequently used predictor in well-performing settings in all catchments and results in close to optimal performance as a single predictor. Performance differences between different WT domain sizes are small but adding 5° around each catchment performed best in Arizona catchments and adding 2° was best in New Mexico. WTs are derived for each basin based on ERA-Interim data. The WT centroids are then use to associate each day in the seasonal forecasts to the most similar WT centroid according to their Euclidean distance.

Seasonal Prediction Systems
We use seasonal forecasts from the North American Multi-Model Ensemble (NMME; Kirtman et al., 2014) and the Integrative Forecasting System (IFS, Version 5) seasonal forecasts from ECMWF from the Copernicus Climate Change Service. We included all NMME models that provide daily Q850 for their hindcast (retrospective forecasts) runs, which are: NCAR CESM1, UM-RSMAS CCSM4, and ECMWF IFS. We also include NASA-GMAO GEOS-5 and CCCMA CanCM4 but use Q650 and Q675 respectively instead due to missing data at the 850 hPa level. This should have little impact on the predictive skill of those models since moisture patterns are well correlated between the 850 hPa and 650 hPa level (absolute differences do not matter due to the use of daily anomalies in the WTing). All models have a horizontal grid spacing of one degree. NMME models run a ten-member ensemble forecast while ECMWF runs 25 members. All models are initialized on the first of each month and forecast 365 days except for NASA-GMAO GEOS-5 and ECMWF IFS, which forecast 273 and 215 days respectively. The common hindcast period for the NMME models is 1982-2010 while ECMWF IFS provides hindcasts for 1993-2016.
To associate seasonal forecasts days with one of the derived WTs we use the same pre-processing as described above (i.e., for the predictor variables we derive daily anomalies, apply spatial smoothing, and normalize the anomalies). Next, we regrid the ERA-Interim based WT centroids to the one-degree model grid and calculate Euclidean distances to all WTs for each forecast day. A forecast day is assigned to a WT according to the lowest Euclidean Distance. We use linearly detrended time series for calculating anomaly correlation coefficients.

Results
Monsoon season (June-October) precipitation contributes between 60% (AZ West) to 80% (NM South) to the annual rainfall in the four study areas (Figure 1b). The precipitation in this period typically comes from moisture surges from the Gulf of Mexico and tropical Pacific (Favors & Abatzoglou, 2013;Higgins et al., 1997) intercepted by dry periods. We aim to capture this variability with our WTing analysis.

Dominant Weather Types
Three weather patterns are sufficient to characterize the dominant effects of synoptic-scale variability on precipitation in the four target regions. The centroids of the three WTs in the AZ West region are shown in Figures 1d-1f. The first WT (monsoon) is associated with monsoonal moisture surges and characterized by tropical moisture advection from the Gulf of California and the eastern tropical Pacific further south into the target catchments resulting in a high water vapor mixing ratio at 850 hPa (Q850) and precipitable water anomalies. Monsoon WT days have a rapid onset in July, peak in August, and decay in September. This WT resembles the published definition of the NAM large-scale circulation and seasonal occurrence (Vera et al., 2006) and are often referred to as "moisture surges" (Pascale & Bordoni, 2016). Although only 19% of days within June to October are associated with this WT, they produce the majority of monsoon season precipitation (Figure 1c). Our definition of wet NAM days is more general than most published definition of synoptic patterns that cause wet surges. This is because we aim to used these patterns in a probabilistic predictive framework rather than understanding the drivers of wet surges in this region, which is already well studied (Higgins et al., 2004;Jiang & Lau, 2008;Schiffer & Nesbitt, 2012;Seastrand et al., 2015). Our wet WT definition includes the classical definition of wet surges but also allows capturing wet days with other synoptic-scale patterns (see Figure S13 in Supporting Information S1 and Movie S1 for an example). The second weather pattern (normal, Figure 1e) is associated with more zonal flow, a subtropical height that is weaker and centered on the coast of Texas, and climatologically average precipitable water anomalies. Days within this WT are 20%-40% dryer than average days and only contribute about one fifth of the precipitation of monsoonal flow days. The third WT (dry, Figures 1c and 1f) is associated with anomalously dry air advection, zonal flow, and extremely dry precipitation anomalies and resembles previously found dry patterns (Schiffer & Nesbitt, 2012). Dry WT days occur most frequently in June and October. Deriving WTs based on July-September conditions results in similar patterns (not shown), which indicates that these patterns are robust and inherent to the core monsoon season. An example of the occurrence of wet, normal, and dry WT days compared to daily average precipitation in the AZ West region for the dry 2009 and wet 2018 NAM season is shown in Figure S12 in Supporting Information S1.
The WTs for the other focus areas are shown in the supplement (Figures S4-S11 in Supporting Information S1). Each area features a monsoon, normal, and dry WT. The monsoonal WTs mainly differ in the size and location of the subtropical high-pressure system ( Figure S15 in Supporting Information S1 shows 500 hPa geopotential height anomalies). The high-pressure system is shifted further to the east and south particularly in the NM North and South region, which facilitates the advection of Gulf of Mexico moisture rather than moisture from the Pacific as in the Arizona catchments.
The pronounced relationship between WTs and precipitation anomalies in the target regions results in a significant (Pearson's r = 0.65) correlation between the seasonal average frequency of monsoon WT days and observed monsoon season mean precipitation in the AZ west region (Figures 2a and 2b). Similarly high correlations are found in the other three regions (Figure 2c). Also, a high frequency of dry WT days is significantly negatively correlated with monsoon seasonal average precipitation.

Seasonal Forecasts of Monsoon Precipitation in Hindcasts
There are large differences between the performance of the analyzed models in simulating the position and seasonal evolution of the monsoonal high pressure system that partly depend on the lead time of the forecast ( Figure S15 in Supporting Information S1). The ECMWF-IFS system has the best climatological representation of the monsoonal high, while the other models feature a too far northward extension and partly erroneous seasonal progression.
Also, the ability to simulate the seasonal evolution of Q850 over the AZ West WT domain strongly depends on the forecasting system and the forecast lead time (Figures 3a and 3b). National Center for Atmospheric Research's Community Earth System Model v1 (NCAR-CESM1) forecasts feature a dry bias in Q850 during the peak monsoon season. This is not necessarily a problem as long as this bias is systematic since we perform the WTing with Q850 anomalies. A more severe issue is the considerable initiation shock-visible in the June, July, and August forecasts-which typically has strong negative impacts on the forecast quality (Mulholland et al., 2015). Similar to the positioning of the monsoonal high, the ECMWF-IFS v5 model forecasts show a much better (e and f) Annual average detrended and normalized regional averaged observed (gray) and May forecast precipitation (orange). (g and h) Similar to (e and f) but for May forecast average detrended and normalized monsoon day frequency (orang) versus observed precipitation. (i-l) Heatmaps showing the Pearson correlation coefficient between the detrended and normalized observed basin average precipitation and forecasted monsoon WT frequency. Panels (i-l) show results from the AZ West, AZ East, NM North, and NM South watershed (from left to right). Each panel shows results from each modeling system (columns) and forecast start months (rows). Each month and model segment (see explanation on the bottom left) shows correlation coefficients for June-October (top two rows), June-August and September-October (second row from below), and each month (bottom row). Hatching indicates significant correlation coefficients (p < 0.1).
climatological evolution of the Q850 seasonal cycle and do not feature an initialization shock. The other forecasting systems (not shown) feature initialization shocks that are typically not as severe as the ones in the NCAR-CESM1 system.
The quality of simulating climatological Q850 conditions is reflected in the model's ability to capture the frequency and seasonality of monsoonal flow WT days (Figure 3c and 3d and Figure S16 in Supporting Information S1). NCAR-CESM1 has too many monsoon WT days in June and too few during the peak monsoon season. The impacts of the initialization shock are also visible in the forecasted monsoon WT frequencies. IFS has a much better simulation of the amount and seasonality of monsoon WT days (Figure 3d) and generally outperforms any of the other forecasting systems ( Figure S16 in Supporting Information S1).
Using the modeled precipitation for forecasting observed NAM season precipitation results in no significant skill-measured by the anomaly correlation coefficient (Figures 3e and 3f)-confirming results from previous studies (Slater et al., 2019). This is also true for using NAM season monsoon WT frequencies from the NCAR-CESM1 forecast (Figure 3g; and the other NMME models). However, using the May forecast from the IFS model allows us to skillfully predict monsoon season precipitation in the AZ West region (Figure 3h; Pearson's r = 0.66). Even the April forecast is skillful (Figure 3i). Predicting the precipitation in individual months is more challenging except for the July and August forecast, which are skillful in predicting precipitation during the first forecasting month. Generally, June to August precipitation is more predictable than September and October rainfall.
IFS has similar high skill in forecasting precipitation in the AZ East region (Figure 3j; except for the April forecast) compared to AZ West. Predictability is generally lower in New Mexico and skillfully forecasting NAM season precipitation in the NM North region becomes feasible in June (Figure 3k) while no significant predictability is found in the NM South region (Figure 3l).

Sources for Predictability
Here we investigate the potential sources of predictability that allow skillful sub-seasonal precipitation forecasts by the IFS model.
Compositing the top 25% of years (top 9 years out of 37) with the highest frequency of monsoonal flow WT days in June and July in the AZ West region show significant 500 hPa geopotential height anomalies that resemble a Rossby wave train with alternating high and low anomalies (Figure 4a). Using the top 10% of years results in similar patterns (not shown). We focus on June on July conditions since they have the highest predictability (see Figure 3). A similar pattern has been identified previously as driver of continental-scale precipitation variability in the U.S. (Castro et al., 2012;Ciancarelli et al., 2014). This pattern is associated with La Nina like tropical Pacific sea surface temperatures (Figure 4b), which is in line with some previous studies (Ciancarelli et al., 2014;Higgins & Shi, 2001) but in opposition to others (Grantz et al., 2007). The anomaly patterns change gradually when moving from the AZ-West to the NM-South region with the high anomaly over the CONUS moving eastward and weakening and the low anomalies in the Gulf of Alaska and near the Gulf of California becoming stronger ( Figure S17 in Supporting Information S1).
None of the forecasting systems is able to capture these anomaly patterns perfectly although we should expect some differences due to the different simulation periods. The June forecasts from the IFS model resemble the observed pattern best (Figure 4l) while the model's April forecasts miss the low anomaly in the Gulf of Alaska but still feature a correctly but too weak high anomaly over the central U.S. (Figure 4k). Nevertheless, IFS's April forecast better captures the observed pattern than most of the models forecast in June. for example, the GEOS-5 model has too strong and negative connections to the tropical Pacific ( Figure 4f) while CanCM4 has too strong positive connections (Figure 4j). Most models struggle with correctly simulating the positive height anomaly in the western U.S. and many models show fundamentally different patterns in their April forecasts compared to their June forecasts. Figure 4. Coefficient of variation of the 500 hPa geopotential height (ZG500) anomaly of the 25% years with the highest June and July monsoon weather type frequencies compared to the climatological average. Shown is data from ERA-Interim (a;1982-2018, the four included North American multi-model ensemble models (c-j; 1982-2010); and the ECMWF forecast (k-l; 1993-2016

Discussion and Conclusion
We assess the ability of seasonal forecasting systems to predict the North American Monsoon (NAM, June to October) precipitation in Arizona and New Mexico catchments that are important for water management. Due to significant biases in simulating precipitation in state-of-the-art seasonal forecasting systems (Crochemore et al., 2016), we developed a forecasting framework that uses hydrologically important synoptic-scale weather types (WTs) for catchment-scale precipitation predictions. We found that three WTs-a dry, normal, and wet (monsoonal flow) WT-are sufficient to characterize the interseasonal and interannual variability of precipitation in the U.S. NAM region.
We show that the observed seasonal average frequency of monsoonal flow WTs significantly correlates with catchment-scale seasonal average precipitation in all regions. Most importantly, we show that ECMWF's IFS forecasting system can skillfully predict NAM season rainfall at catchment scales with several months lead-time (i.e., the April forecast is skillful over the AZ West catchments) except for the NM South catchments while using the model's precipitation as a predictor does not provide predictive skill. The other evaluated forecasting systems do not have predictive skills partly due to large initialization shocks (Mulholland et al., 2015), and an erroneous simulation of the low-level moisture transport into the study regions. Additionally, ECMWF-IFS has a superior simulation of the location and seasonal evolution of the monsoonal high pressure ridge and outperforms the other models in simulating teleconnection patterns (Castro et al., 2012;Ciancarelli et al., 2014) that control the position of the monsoon high and the amount of regional precipitation. The superior performance of the ECMWF-IFS system is not dependent on the use of ERA5 reanalysis data, since using MERRA2 data (Gelaro et al., 2017) results in similar seasonal and inter-annual WT properties (not shown).
These results push the boundaries of seasonal predictability of regional precipitation (Slater et al., 2019) and offer novel opportunities for improved water resource management on S2S time scales in the U.S. Southwest. The presented WTing framework is flexible and could offer enhanced predictive skill also for sub-seasonal forecasts and in other drought-prone regions around the world. The presented results also show that many S2S systems have to advance their data assimilation systems and reduce biases in their climatological representation of regional phenomena such as monsoon circulations. Having multiple prediction systems with the quality of ECMWF-IFS would allow us to construct more skillful multi-model ensemble forecasts resulting in further improvements of predictive skill. Importantly, even skillful models such as ECMWF-IFS are not able to simulate the mesoscale processes that drive local precipitation. Future research should focus on the potential added value of using convection-permitting models (Prein et al., 2015) for S2S prediction, which could result in a step-improvement in our skill to simulate mesoscale weather phenomena and their extremes , snowpack dynamics (Rasmussen et al., 2011), land-atmosphere interactions , and could result in improved S2S predictive skills by better simulating teleconnections due to the dynamic interaction of convective scales with larger-scale processes (Weber & Mass, 2019).