Atmospheric River Sequences as Indicators of Hydrologic Hazard in Historical Reanalysis and GFDL SPEAR Future Climate Projections

When multiple atmospheric rivers (ARs) occur in rapid succession, the combined effect on the hydrologic system can lead to more flooding and damage than would be expected from the individual events. This temporally compounding risk is a source of growing concern for water managers in California. We present a novel moving average‐based definition of AR “sequences” that identifies the time periods of elevated hydrologic hazard that occur during and after consecutive AR events. This marks the first quantitative evaluation of when temporal compounding is contributing to AR flood risk. We also assess projected changes in sequence frequency, intensity, and duration in California using the Geophysical Fluid Dynamics Laboratory Seamless System for Prediction and EArth system Research (GFDL SPEAR) global coupled model. Sequence frequency increases over time and is fairly uniform across the state under both intermediate (SSP2–4.5) and very high (SSP5–8.5) emissions scenarios, with the largest changes occurring by the end of the century (+0.72 sequences/year in SSP2–4.5, +1.13 sequences/year in SSP5–8.5). Sequence intensity and duration both see increases in the medians and extreme values of their respective distributions relative to the historical baselines. In particular, “super‐sequence” events longer than 60 days are projected to occur 2–3x more frequently and to emerge in places that have never seen them in the historical record. In a world where California precipitation is becoming more variable, our definition of sequences will help identify when and where hydrologic impacts will be most extreme, which can in turn support better management of the state's highly variable water resources and inform future flood mitigation strategies.

Up to 85% of California's interannual precipitation variability can be attributed to ARs, long, filamentary corridors that carry moisture from the tropics to the extratropics (Dettinger, 2016).While there is no single agreedupon method for detecting ARs in the meteorological record, an event is generally identified and labeled as an AR based on a combination of its geometry, duration, and some metric of moisture content such integrated water vapor transport (IVT) (Rutz et al., 2019).A large body of research has tied ARs to extreme precipitation both globally (Liu et al., 2020;Waliser & Guan, 2017) and specifically in California and the western United States (US) (Ralph et al., 2019).ARs bring 30%-50% of California's precipitation and make up 60%-100% of the most extreme storms regionally (Lamjiri et al., 2017).AR landfalls are closely related with heavy precipitation events-ARs produce up to twice as much precipitation as other types of storms (Neiman et al., 2008) and are a major contributor to extreme precipitation events (Chen et al., 2018).Because ARs cause such significant precipitation, they are also strongly connected to hydrologic impacts.ARs have positive effects on California's hydrology as drought busters (Dettinger, 2013), ecosystem boosters (Albano et al., 2017), and sources of reservoir replenishment (Guan et al., 2010).However, ARs also contribute to significant flooding (Albano et al., 2020;Chen et al., 2019;Dettinger, 2011;Konrad & Dettinger, 2017;Leung & Qian, 2009;Ralph et al., 2006).This has led to ARs being labeled as a dual "boon and bane" for California communities and residents (Rhoades et al., 2020).
ARs, and their consequences, are projected to intensify in a future climate.Previous research shows that ARs will increase in frequency and in size along the western coast of the US (Gao et al., 2015;Massoud et al., 2019;Payne et al., 2020;Rhoades et al., 2020), although both the choice of climate model (Espinoza et al., 2018) and AR detection algorithm (O'Brien et al., 2022) affect the estimated magnitude of the trend.With these atmospheric changes come increased hydrologic volatility (Swain et al., 2016).While precipitation from smaller, non-AR storms is projected to decrease, AR events are likely to see increased precipitation, contributing a larger percentage to California's overall annual precipitation total (Dettinger, 2016;Gershunov et al., 2019).A larger portion of AR-driven precipitation is projected to occur during extreme storm events rather than the more moderate beneficial events.The most extreme ARs in California are projected to deliver precipitation at higher intensity rates in shorter amounts of time (Persad et al., 2020) and to generate precipitation in excess of Clausius-Clapeyron scaling estimates, with increases in some places as large as 40% (X.Huang et al., 2020).
However, when considering the hydrologic and economic consequences of present and future ARs, assessing individual events in isolation only reveals part of the picture.Compound events have been identified globally as an important area of study for our changing climate (Seneviratne et al., 2021).ARs and their impacts fit many of the compound event typologies defined by Zscheischler et al. (2020).They can be multivariate, for example, through simultaneous coastal and riverine flooding (Khouakhi & Villarini, 2016); spatially compounding, for example, through teleconnected climate modes (Ward et al., 2014); preconditioned, for example, through rain-on-snow flooding (Guan et al., 2016); or temporally compounding, for example, through the "back-to-back" AR events that are the focus of this study.ARs are uniquely suited to a temporally compounding perspective because of the association between successive cyclones and heavy precipitation (Moore et al., 2021) and the inherent lag between the end of the AR event and the time for the hydrologic system to fully recover (Fish et al., 2022).This time lag amplifies the impacts of back-to-back events and can lead to consequences beyond what a single AR could generate on its own.

Motivation and Scope
The first step toward understanding temporally compounding events is to define when they exist, that is, when two or more individual hazards occur close enough together in time and space to cause interacting and overlapping effects.Fish et al. (2019) created a metric to identify "AR families" that focused on differentiating the underlying synoptic conditions.Outside of the world of ARs, Baldwin et al. (2019) created a metric for temporally compounding heat waves.However, both are count-based methods, meaning they combine multiple individual events into a compound one when they occur within a certain time period, irrespective of event size.Additionally, each of these prior works has specific limitations: Fish et al. (2019) define all AR families to have a maximum duration of 5 days by construction, and Baldwin et al. (2019) assume that the first initiating heat wave will always be the largest event, which is not necessarily true for ARs.We propose sequences as a tool to capture the increased hydrologic hazard that occurs due to the cumulative effect of back-to-back ARs.Creating a new metric that identifies when multiple ARs are acting as one compound event improves our ability to characterize, quantify, and track changes in temporally compound extremes, which are a significant source of uncertainty for flood risk management in California.
Section 2 lists data sources and introduces the sequence algorithm.Section 3 shows that sequences capture time periods of elevated hydrologic hazard, as measured by runoff and soil moisture.In Section 4 we apply our definition of sequences to future climate simulation data and assess how sequences are anticipated to change under different emissions futures.Section 5 discusses some implications of the projected changes, and Section 6 concludes with a summary of key findings.

Methodology
This section first introduces the data used in this study.Atmospheric reanalysis data are used to identify AR sequences in the historical record.Hydrologic data such as streamflow and soil moisture are used to characterize the impacts of sequences, and global climate model projections are used to understand how sequences will change from present to future.We also include the results of a bias correction implementation to align the historic climate simulations with the reanalysis data.The final part of this section presents the sequence definition.We introduce the algorithm used to identify AR sequences, including both a description and a visual example, and provide both literature and physical support for the algorithm parameter values.

Reanalysis Data
The sequence definition is based on integrated water vapor transport (IVT) calculated from the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2; Gelaro et al., 2017).MERRA-2 was chosen for several reasons: it was designed with an explicit focus on the hydrologic cycle (Rienecker et al., 2011); it has been used extensively to study ARs, both in the US and globally (Lora et al., 2020;Mundhenk et al., 2018;Payne & Magnusdottir, 2015); and it is the reference data set for Tier 1 of the AR Tracking Model Intercomparison Project (ARTMIP; Shields et al. (2018); Rutz et al. (2019)).It has been shown to compare well against both satellite (Jackson et al., 2016) and in situ (Guan et al., 2018) estimates of IVT, and it is particularly good at capturing strong vapor transport (Dettinger et al., 2018;Sellars et al., 2017).MERRA-2, as well as its predecessor (Rienecker et al., 2011), have been used for many studies of AR-driven precipitation (Chen et al., 2018;Lamjiri et al., 2018), hydrologic impacts (Chen et al., 2019;Cordeira et al., 2018;Payne & Magnusdottir, 2016), and economic loss (Ralph et al., 2019) in California and the western US.Finally, it was the data set of choice for Fish et al. (2019Fish et al. ( , 2022) ) in their work on AR families.MERRA-2 covers the globe at a horizontal grid resolution of 0.5° × 0.625° (∼50 km × 50 km) and reports data from water year (WY) 1981-2021 at a three-hour time step.We randomly sampled one 3-hr IVT value from each 24-hr window to align with the temporal resolution of the climate model projections presented in Section 2.1.3.We only retained days in California's wet season (October-April) as we are focusing on events likely to cause negative hydrological impacts, which tend to be concentrated in that seasonal window.ARs were identified using the Rutz et al. (2014) detection algorithm, which defines an AR as a contiguous area with length ≥ 2,000 km and IVT ≥ 250 kg/m/s.AR events are defined only for the results related to Hydrologic Hazard (Section 3) and are not used for identifying sequences.

Hydrologic Data
Precipitation data was retrieved from the Parameter-elevation Regressions on Independent Slopes Model (PRISM) (PRISM Climate Group, 2023), an interpolated observational data set produced by the PRISM Climate Group at Oregon State University.PRISM daily precipitation records are available from 1981 to present at a 4 km resolution.
Streamflow data was retrieved from US Geological Survey (USGS) gages in California using the National Weather Information System (NWIS) (USGS, 2022).Following Konrad and Dettinger (2017), we kept only gages with records that covered at least 10 years of the 41-year MERRA-2 time window, and we added an additional criterion that at least 25% of days in each gage's record needed to have recorded data.Gaps shorter than a week were imputed with a moving average.This filtering process left a total of 567 streamflow gages for consideration.
We converted streamflow (ft 3 /s) to runoff (mm/day) by dividing each timeseries by the drainage area at each gage as calculated with the USGS StreamStats online tool (USGS, 2019).We used the fractional weighting method from Brakebill et al. (2011), which is also the method used by USGS for the computed runoff reported on their official WaterWatch web page, to create a map of daily runoff at the subbasin scale (Hydrologic Unit Code 8).
Soil moisture data was retrieved from the Western Land Data Assimilation System (WLDAS), a hybrid observational/simulated product produced by the National Aeronautics and Space Administration for long-term near-surface hydrological variables (Erlingis et al., 2021).WLDAS provides soil moisture data at a daily resolution on a 0.01° (∼1 km) grid with three vertical levels: 0-10 cm, 10-40 cm, and 40-100 cm.We aggregated the data to approximately match the 4 km resolution of the PRISM data and integrated the three vertical layers to produce the total equivalent height of water in the top meter of soil, measured in mm/m.

Climate Model Projections
To assess the change in sequence characteristics over different climate states in California, we analyzed IVT from the Seamless System for Prediction and EArth system Research (SPEAR) ensemble (Delworth et al., 2020).SPEAR is the next-generation climate model developed at the Geophysical Fluid Dynamics Laboratory (GFDL), National Oceanic and Atmospheric Administration (NOAA).It shares the same model configurations (i.e., coupled atmospheric, land, oceanic, and sea-ice models) with GFDL Climate Model version 4, which is a contributor to the Coupled Model Intercomparison Project Phase 6 (CMIP6), except with parameterizations optimized for seasonal-to-decadal prediction and long-term projection.SPEAR has moderately high (∼50 km) resolution in the atmosphere and land, which can explicitly simulate ARs and their statistics, while the 1-degree ocean model enables computational time savings.Importantly, SPEAR has the same resolution as MERRA-2 over land.Horizontal resolution has been shown to significantly impact AR identification (Jackson et al., 2016), so a common grid facilitates comparison between the two products.
There is no one "best" climate model for California, but out of over 30 of the models involved in CMIP5, GFDL-CM3 (a precursor to SPEAR) was one of the 10 models identified as most appropriate for water resources planning applications in California (California Department of Water Resources (DWR) & Climate Change Technical Advisory Group (CCTAG), 2015) and was used in California's Fourth Climate Change Assessment (Bedsworth et al., 2018).Models were selected based on a three-tier evaluation process that assessed model performance on global, regional, and California-specific metrics.In addition, despite its status as a newer climate model, SPEAR has already been extensively validated for ARs.Zhao (2020) showed that ARs calculated using AM4, the atmospheric component of SPEAR, for 1950-2014 matched well the frequency, length, width, latitude, IVT, and orientation of ARs in the European Centre for Medium-Range Weather Forecasts interim reanalysis product (ERA-Interim; Dee et al. (2011)).They also confirmed that SPEAR can replicate reasonably well the variability of AR frequency associated with large-scale circulation patterns, reporting pattern correlation coefficients during the Northern Hemisphere winter (November-March) of 0.68 for the El Niño Southern Oscillation (ENSO), 0.86 for the Northern and Southern Hemisphere annular modes, and 0.81 for the Pacific-North America teleconnection (Zhao, 2020).SPEAR has also been used to estimate that the climate change-driven increase in AR-day frequency over subtropical oceans will emerge from the noise of internal variability by 2060 (Tseng et al., 2022) and to make multiseasonal forecasts of AR frequency up to nine months in advance (Tseng et al., 2021).We are therefore confident in SPEAR's ability to resolve both local climate trends in California and global circulation patterns relevant to ARs.This paper uses a five-member subset of the larger thirty-member SPEAR ensemble and uses IVT and AR statistics as calculated by Tseng et al. (2022).Tseng et al. (2022) have demonstrated that the AR statistics derived from a single ensemble member are still representative of the entire large ensemble, provided the analysis period is sufficiently long (≥30 years), given the short memory of AR events from season to season.We examined simulations with historical forcing and with projected forcings following two different shared socioeconomic pathways (SSPs).SSP2-4.5 and SSP5-8.5 represent intermediate and very high greenhouse gas emissions scenarios, respectively (Arias et al., 2021).The historical simulation covers the period of 1921-2010 and the two warming simulations cover the period of 2011-2100.

Bias Correction
Both the historical and projected timeseries of IVT from SPEAR were bias-corrected such that the statistics of the historical SPEAR simulation matched the statistics of MERRA-2 data over the 30-year period of overlap (WY 10.1029/2023EF003536 5 of 16 1981-2010).Bias correction is warranted because of the focus on rare events in this paper; while GFDL SPEAR credibly simulates both the circulation patterns and climate trend in California, it underestimates the most extreme daily IVT values.Quantile mapping (QM) is a common choice for bias correction because of its flexibility to adjust both the location and scale of the target distribution (Maraun & Widmann, 2018); however, traditional QM is limited in its ability to accurately represent extreme events (Cannon et al., 2015;Dosio et al., 2012) and does not necessarily preserve the projected future climate signal (Hempel et al., 2013;Maraun et al., 2017).We therefore implemented scaled distribution mapping (SDM; Switanek et al., 2017), a parametric bias correction method that preserves the future climate signal and more accurately represents extreme events, to perform the bias correction.
We first empirically fit gamma-Pareto mixture models (Volosciuk et al., 2017) to IVT by decade based on the 30-year periods centered on each decade.For example, the distribution representing 2021-2030 was fit using data from 2011 to 2040, the distribution representing 2031-2040 was fit using data from 2021 to 2050, etc.The first and last decades (2011-2020 and 2091-2100) in the GFDL SPEAR projections were not included because we lacked data for the full 30-year intervals.We then calculated the SDM bias correction for each decade.The distributional comparison of observed versus bias-corrected IVT is included as Figure S1 in Supporting Information S1.The bias-corrected IVT timeseries produces similar sequence frequency (Figure S2 in Supporting Information S1), intensity (Figure S3 in Supporting Information S1), and duration (Figure S4 in Supporting Information S1) results as the MERRA-2 record when compared over the period of overlap (WY 1981(WY -2010)).

Sequence Identification Algorithm
Sequences are defined in a three-step process.We first calculate the 5-day centered moving average IVT (IVT 5 ).Next, we identify continuous intervals, which are time periods when IVT 5 exceeds the 30-year median IVT.Lastly, any continuous interval where  max[IVT5] ≥ 250 kg/m/s are labeled as a sequence.This process is demonstrated visually in Figure 1.

Definition Parameters
The sequence definition uses three new parameters: the width of the moving average window; the moving average value that starts and ends continuous intervals, referred to as the start-end threshold; and the peak moving average value that determines which continuous intervals are counted as sequences, referred to as the magnitude filter.We used 5 days as the width of the moving average window because Fish et al. (2019) found this to be a suitable time length to cluster families of AR events with similar synoptic characteristics.The choice to define sequences based on a moving average of IVT rather than the raw timeseries represents a departure from previous methods, but offers several key benefits: it (a) captures the interstitial time between closely linked successive AR events where precipitation may still be occurring, (b) adds an additional temporal buffer to the front and back tails of AR events to account for the lag between atmospheric forcings and hydrologic response, and (c) introduces a new measure of sequence intensity independent of duration.
We defined the start-end threshold as the 30-year (WY 1981(WY -2010) ) median IVT calculated at each MERRA-2/SPEAR grid cell.We allow this parameter to vary spatially to account for regional climatological variability, and we define the time window as 30 years to be consistent with the standard for climatological normals used in ENSO forecasting and other applications.
We defined the magnitude filter to be 250 kg/m/s, a common threshold value for AR identification across multiple detection algorithms (Rutz et al., 2019).Requiring the 5-day moving average to reach this value filters out smaller, loosely associated AR events and focuses attention on the periods that are more likely to cause temporally compounding hazard effects.It also negates the need to define a cutoff based on a minimum number of ARs per sequence.The sequence algorithm therefore remains independent of any one AR detection method.Different detection methods produce different numbers of ARs based on the same timeseries, so maintaining the independence between sequences and ARs eliminates a significant source of potential variability in our results (Rutz et al., 2019).
While the direction and magnitude of the results presented in this paper are robust to changes in these three parameters, the statistics of sequence frequency, intensity, and duration will vary based on the chosen parameter values.We consider the flexibility of the parameters to capture a range of events as a benefit of the proposed sequence definition.Further discussion and sensitivity analysis of the parameters are included in the Supplemental Information (Text S1 and Figure S5 in Supporting Information S1).

Results: Hydrologic Hazard
The objective of the sequence metric is to identify time windows of AR activity where temporal compounding is contributing to elevated hydrologic hazard over the MERRA-2 reanalysis period of WY 1981-2021.We examine the relationship between ARs, sequences, and hydrologic hazard by investigating three variables: precipitation, runoff, and soil moisture.While extreme precipitation is closely tied with AR activity, runoff and soil moisture are more related to the hydrologic system response (Albano et al., 2020).We therefore characterize hydrologic hazard using runoff and soil moisture, and we characterize elevated hazard as days with anomalously high values of these variables.We then examine when those days with elevated hydrologic hazard intersect with days labeled as ARs or sequences.
The four columns in Figure 2 represent all wet-season days, extreme precipitation days, extreme runoff days, and extreme soil moisture days, where extremes are measured as the 95th percentiles of the respective distributions for the given grid point (e.g., Moore et al., 2021;Payne & Magnusdottir, 2014).Figures 2a and 2b show the overall frequency of AR days (top) and sequence days (bottom) within California's wet season.AR days (Figure 2a) cover 8.8 ± 5.6% of the record statewide.Sequence days (Figure 2b) are slightly more frequent, especially in the northwest, occurring across 16.4 ± 11.7% of the record.Both maps exhibit a clear spatial pattern of higher frequencies in the northwest and lower frequencies in the southeast.Along California's North Coast, AR days represent up to 18% and sequence days represent up to 32% of all wet-season days.The lowest frequencies of AR days and sequence days occur on the lee side of the Sierra Nevada Mountains and around the Mojave Desert.This spatial pattern is consistent with previous research detailing the climatology of ARs (Payne & Magnusdottir, 2014) and precipitation (Lamjiri et al., 2018) in California.
The relationships between extreme precipitation and AR days/sequence days are shown in Figures 2c and 2d.The percentage of extreme precipitation days captured by ARs is 20%-30% in the low-precipitation regions, namely the leeward side of the Sierra Nevada mountains on the eastern border of the state and the southeastern desert region.It rises to 60%-70% in the wetter regions of the north-central coast and the Sacramento Valley.This observation agrees with existing literature on the relationship between ARs and extreme precipitation (e.g., Ralph et al., 2020).The percentage of extreme precipitation days captured by sequences is higher than that captured by AR days along the coast, particularly in the San Francisco Bay Area, and lower than that captured by AR days inland.
Extreme runoff (Figures 2e and 2f) and extreme soil moisture (Figures 2g and 2h) are more likely to occur during sequence days statewide.The regional pattern across both variables is an amplified version of the east-west dichotomy visible in Figure 2d.For the northwestern half of the state, where ARs tend to be the primary drivers of hydrologic risk (Albano et al., 2020;Barth et al., 2017), sequence days capture over 90% of extreme runoff days and over 70% of extreme soil moisture days.AR days capture about 55% and 35%, respectively.Sequence days capture less of the extreme runoff and soil moisture days in the desert southeast, where ARs contribute less to the overall water balance.Therefore the AR sequence metric is successfully capturing time periods of elevated hydrologic hazard in areas where ARs are the dominant precipitation-and flood-producing mechanism.

Results: Sequence Characteristics
The previous section confirmed that sequences can identify windows of elevated hydrologic hazard due to ARs, characterized by anomalously high runoff and soil moisture, in the historical record.We now use the sequence metric to consider potential temporal compounding risk in the future.We first quantify the historical frequency, intensity, and duration of sequences, then examine how those characteristics are projected to change in a future climate.

Historical Characteristics
In the California historical record, the statewide average annual sequence frequency is 1.5 ± 1.1 sequences per year from WY 1981-2010.However, there is significant regional variation around this number, as seen in Figure 3, for the reasons discussed in Section 3.There is also high interannual variation due to the large differences in seasonal water totals between wet and dry years; on average, about a third of all 1981-2010 water years in each grid cell have no sequences at all.(g-h) extreme soil moisture, where "extreme" is defined as values above the 95th percentile of the distribution.Values range from 0% to 100% as shown in the legend at the bottom.AR days and sequence days are calculated from MERRA-2 (Gelaro et al., 2017); precipitation is calculated from PRISM (PRISM Climate Group, 2023); runoff is calculated from the USGS NWIS (USGS, 2022); and soil moisture is calculated from WLDAS (Erlingis et al., 2021).Data processing steps are detailed in Section 2.1.
Sequence intensity is defined as the peak 24-hr IVT measured over the lifetime of the sequence event (measured in kg/m/s) and sequence duration is defined as the length of continuous sequence conditions (measured in days).Unlike frequency, which is a single summary number for each grid cell, intensity and duration are recorded individually for every record in the event catalog.Figures 3a and 3b show the regional distributions of sequence average intensity and average duration, respectively, from WY 1981-2010.Average sequence intensity is higher in the northwestern half of the state, matching the distribution of average AR intensity.Average sequence duration follows a different pattern; the longest averages occur in the eastern Sierra Nevada mountains, where very few sequences occur (as indicated by bubble size) and the effects of averaging are not as strong.Other than those locations, average sequence duration seems to be relatively constant across the state, and is actually lowest in the regions with the highest sequence frequency.The values in Figure 3 will be used as the "baseline" to discuss changes from present to future in the following subsections.

Projected Frequency Change
Figure 4 reveals that the projected frequency of AR sequences is increasing over time.Figures 4a and 4b show the additive change in sequence frequency by decade under SSP2-4.5 and SSP5-8.5, respectively.The lines are colored by the hydrologic region (4-digit Hydrologic Unit Code (HUC4)) shown in Figure 4c.Rather than averaging the results by hydrologic region, which would flatten variability and create unrealistically smooth results, we selected the SPEAR grid cells at the centroid of each hydrologic region and used those as representative examples to get an overview of differences and similarities in statewide behavior.
While absolute frequency varies regionally based on California's climatology, the additive change from historical to projected future is fairly consistent statewide, as seen by the clustering of the lines in Figures 4a and 4b; under SSP2-4.5,all hydrologic regions see annual frequency increases starting at a mean of +0.13 sequences/year in this decade and increasing to a mean of +0.72 sequences/year by the 2080s.SSP5-8.5 shows slightly more regional variation and ends with a higher average increase of +1.13 sequences/year.

Projected Intensity Change
We analyze the full distribution of sequence intensity through two lenses in Figure 5.We first "zoom in" on the full intensity distribution for one case study grid cell.Each boxplot in Figure 5a represents one of the bias-corrected SPEAR ensemble members for historical (yellow), SSP2-4.5 (orange), and SSP5-8.5 (red) forcings at this location.The boxplots representing future scenarios show data from WY 2061-2090.This 30-year period allows for analogous comparisons to the 30-year historical baseline of WY 1981-2010 and approximates an "end-of-century" prediction.Figure 5a results are specific to the grid cell chosen to represent hydrologic region 1804 (shown in Figure 5b), and Figure S6 in Supporting Information S1 replicates the results for representative locations within all hydrologic regions in California.
The medians of the different ensemble members in Figure 5a vary within a given forcing category.The overall ensemble medians, though, increase from the historical simulation to the projected future, and continue to increase with increasing emissions levels.This pattern holds true for the 75th percentile (upper end of the box) and the 95th percentile (upper end of the whisker); values are larger in the projected future than in the historical past, and larger under SSP5-8.5 than under SSP2-4.5.It also holds for values above the 95th percentiles (outlier points above the upper whiskers).There are more sequences where the peak 24-hr IVT exceeds 750 kg/m/s, and there are even some sequences that break new ground and exceed a peak 24-hr IVT of 1,000 kg/m/s.This is consistent with prior work that finds shifts in the "cutoff scale" for extreme events that allow events with previously rare or unheard-of magnitudes to appear in a future climate (Neelin et al., 2017).
We then "zoom out" to understand the relative change from historical to projected future across the entire state.We focus on the median, which measures distribution central tendency and serves as a good heuristic for the behavior of the bulk of the distribution, and the 95th percentile, which measures extreme behavior at the upper tail.Together, these two statistics create a picture of overall distribution behavior.For each statistic we calculate a multiplicative change factor between the historical scenario and one of the two future emissions scenarios, then convert the change factor to a percent change.Figure 5c illustrates how relative change metrics are derived for the median (in light blue) and the 95th percentile (in dark blue).The relative change statistics derived in Figure 5c are then repeated for every grid cell and for both future emissions scenarios to generate the density plots in Figures 5d  and 5e, which summarize results for all SPEAR grid cells in the state.

Projected Duration Change
Figure 6a shows duration distributions from each of the five SPEAR ensemble members, for the same case study grid cell representing hydrologic region 1804 identified in Figure 5b.Boxplots of sequence duration for representative cells from all California hydrologic regions are shown in Figure S7 in Supporting Information S1.Many of the trends identified for the distribution of sequence intensity in Figure 5a are repeated for sequence duration in Figure 6a.The medians and 75th percentiles increase from historical to future forcings, although duration seems to be less sensitive to the specific emissions scenario.While the 95th percentiles are a bit more noisy, they seem to trend upwards with increasing emissions as well.
The most distinct difference between the two emissions scenarios, though, is in the outliers past the 95th percentile.In Figure 6a, the longest sequences in the projected future exceed those from WY 1981-2010 by more than a month and approach total durations of up to four months.To better understand these extreme events, we define a new category of "super-sequence" events that are longer than 60 days.Across California, 73% of SPEAR grid cells had at least one super-sequence in any of the five ensemble members during WY 1981-2010, and only 39% of locations had multiple super-sequences in at least one ensemble member.The percentage of grid cells experiencing at least one super-sequence during WY 2061-2090 jumps to 87% for SSP2-4.5% and 99% for SSP5-8.5.Even more dramatically, the percentage of locations that experience multiple super-sequences in at least one ensemble member during WY 2061-2090 is 67% for SSP2-4.5% and 94% for SSP5-8.5, which is an extraordinary expansion of the historical spatial coverage.The increasing frequency and widening footprint of super-sequence occurrence has serious implications for water management in locations that have never before experienced temporal compounding of this severity and duration.
Super-sequences are also the dominant driver of the projected change in overall sequence frequency.Figure 7 compares the historical versus projected future annual frequency for sequences of different durations.Figures 7a  and 7b show that the frequency rates of sequences under 2 weeks long are essentially unchanged.Sequences that are 15-30 days long (Figures 7c and 7d) and 31-60 days long (Figures 7e and 7f) see moderate increases in frequency.In the rightmost column (Figures 7g and 7h), super-sequences longer than 60 days are slated to approximately double in frequency in the SSP2-4.5 scenario and triple in the SSP5-8.5 scenario.The longer sequences are, the greater their change in annual frequency.

Comparison of Sequence Changes and AR Changes
Both the existing body of work on future ARs (e.g., Espinoza et al., 2018;Ralph et al., 2020) and the sequence projections presented in this paper indicate that future climate conditions will preferentially increase extremes at the expense of more moderate events.The results presented in Section 4 show that sequences are projected to increase in frequency, intensity, and duration in a warming climate.The changes in sequence frequency observed in Section 4.2 match the projected increases in AR frequency and intensity that we discussed in Section 1.1; however, this linked relationship was not a foregone conclusion.The sequence metric captures both the severity of individual AR events and the second-order effects of temporal sequencing on hydrologic impacts.Not only will the ARs themselves increase in frequency and magnitude, but these new and larger storms will occur in areas that are more likely to be in a state of elevated hydrologic hazard, further amplifying their effects on watersheds, ecosystems, and communities.
One of the most concerning trends in our findings is the dramatic growth in the frequency and spatial extent of super-sequences from the historic baseline (WY 1981(WY -2010) ) to the end of the century (WY 2061(WY -2090)).We hypothesize that sequences morph into super-sequences when smaller sequence events grow so large that they overlap and combine, as observed by Baldwin et al. (2019) in the context of temporally compounding heat waves.We find support for this hypothesis in Figure 4.For many of the hydrologic regions sequence frequency continues to increase through the end of the century.However, hydrologic region 1801 (one of the wettest regions in California) shows a surprising decrease, starting in the 2070s under SSP2-4.5 and in the 2080s under SSP5-8.5.When individual sequences start to saturate the time window and merge together, it reduces the total number of sequences even as the number of sequence days continues to rise.Our hypothesis about the evolution of super-sequences is additionally supported by the additive decadal change in sequence days (Figure S8 in Supporting Information S1), which do not see the same decrease toward the end of the century.
We also highlight the longest super-sequences in the projected future, the events that have durations approaching the entire length of the 6-month wet season.Having a single AR last for multiple months would be unreasonable.However, there can be gaps of multiple days within sequences where there is no precipitation, and a sequence spanning multiple months merely indicates that the hydrologic environment never has time to revert to its baseline state between successive AR events.Super-sequences are not anticipated to occur every year; recall the statistic presented in Section 4.2 that on average a third of all water years in WY 1981-2010 within each GFDL SPEAR grid cell have zero sequences.The percentage of water years with super-sequences is 1.9 ± 1.9% under the historical baseline, 4.2 ± 3.6% under SSP2-4.5, and 10.5 ± 6.3% under SSP5-8.5.These values equate to super-sequence return periods of approximately 54, 24, and 10 years, respectively, averaged across the state.Previous research has shown that California's hydroclimate is projected to see more "precipitation whiplash" events that oscillate between wet and dry extremes (Swain et al., 2016(Swain et al., , 2018) ) and that the temporal distribution of ARs will sharpen and concentrate more events into a smaller portion of the wet season (Chen et al., 2019;Rhoades et al., 2020).A "megaflood" is plausible in the next century, comparable to or larger than the Great Flood of 1862 when it rained or snowed almost continuously for 2.5 months and the Central Valley remained inundated for several months afterward (X.Huang & Swain, 2022).The most extreme super-sequences are infrequent, but they are possible, and identifying them with the sequence algorithm highlights a previously overlooked source of risk in California.

Study Limitations
There are three main sources of uncertainty in estimates of future climate change: forcing uncertainty (arising from unknown future conditions external to the modeled climate system), model uncertainty (arising from the assumptions made to create simplified representations of reality), and internal variability (arising from the stochastic nature of the climate system) (Maraun & Widmann, 2018).We have considered forcing uncertainty through the quantitative and qualitative comparison of two SSPs, and we have considered internal climate variability by using five ensemble members from GFDL SPEAR.Because ARs are transient phenomena, accurately representing their uncertainty is tied to the representation of long-term climate variability modes, and previous research has shown that long-term climate variability can be well approximated by as little as a single ensemble member given a record of approximately 30 years (Tseng et al., 2022;H. Huang et al., 2021).However, because we are only presenting results from a single climate model, model imperfections and model uncertainty are not characterized in this work.To better understand how AR sequences and temporally compound risk will evolve in a future climate, multi-model evaluations are needed to explore the full stochastic range of potential outcomes.Future work could expand to include one or more of the climate models recommended by the California Department of Water Resources (DWR) and Climate Change Technical Advisory Group (CCTAG) (2020) for water planning applications in California.
Another potential limitation of the results presented in this paper is the 50 km resolution of the MERRA-2 grid.While this work shows a broad relationship between AR sequences, which are defined and identified based on atmospheric data alone, and on-the-ground hydrologic hazard, it is not a substitute for a high-resolution, process-based analysis that resolves the underlying physical phenomena and establishes a causal pathway.However, it does serve to motivate the need for future analysis related to the temporal compounding at local scales.This paper presents a compelling argument that temporally compound AR flood risk is worthy of further investigation in convection-permitting simulations that resolve the physical connections between AR sequences and hydrologic impacts.

Summary and Conclusions
Temporally compounding (back-to-back) ARs cause outsize impacts relative to their individual intensities.This paper introduces a definition of AR sequences as a new way to identify and measure when temporal compounding is contributing to hydrologic hazard and risk.The sequence definition utilizes a moving average of IVT at a given location.We identified all days with sequences in the historical record in California and showed that sequences capture the elevated hydrologic hazard that occurs during and after back-to-back AR events, especially in regions where ARs are the dominant contributors to total precipitation and flooding.
We created a catalog of sequence events for every 50 km × 50 km grid cell in California.Each sequence event has a recorded intensity (measured as peak 24-hr IVT, kg/m/s) and duration (measured as days of continuous sequence conditions).We used bias-corrected GFDL SPEAR climate simulations to compare the changes in intensity, duration, and frequency from historical to projected future, using both a middle-of-the-road greenhouse gas emissions scenario (SSP2-4.5)and a high-end emissions scenario (SSP5-8.5).Sequence frequency is projected to increase by decade, with the largest changes occurring toward the end of the century in SSP5-8.5.
We assessed changes in sequence intensity and duration based on future climate projections from GFDL SPEAR, presenting both an in-depth analysis of a specific hydrologic region and a broad statewide overview of key distributional statistics.Sequence intensity sees increases in the distribution medians and 95th percentiles under both SSP2-4.5 and SSP5-8.5, with a larger change recorded from historical to SSP2-4.5 than from SSP2-4.5 to SSP5-8.5.Sequence duration medians and 95th percentiles also increase almost everywhere in the state, and the change from SSP2-4.5 to SSP5-8.5 is similar to or larger than the change from historical to SSP2-4.5.Both sequence intensity and duration have notable increases in extreme values larger than the 95th percentile."Super-sequences" with durations longer than 60 days are projected to dramatically increase in frequency and spatial coverage, especially for SSP5-8.5.Overall we conclude that temporally compounding extremes are projected to increase in frequency and severity in the future, generally at the expense of more moderate events, which is consistent with previous work about projected changes in ARs.
Temporal compounding is a key concern for California's water managers (Fish et al., 2022) and a major contributor to some of the state's most severe water crises (X.Huang & Swain, 2022;Michaelis et al., 2022).However, no work to date has quantified the link between temporally compounding ARs and hydrologic impacts.This paper thus fills a gap in our understanding the effect of temporal compounding on the hydrologic consequences of ARs.This material is based upon work supported by both the Stanford Gabilan Graduate Fellowship and the National Science Foundation Graduate Research Fellowship under Grant 1000265549.Any opinion, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.K.-C.Tseng acknowledges funding from National Oceanic and Atmospheric Administration/Department of Commerce Grant NA18OAR4320123 and Ministry of Education, Taiwan Grant 111V1017-1.The contact author has declared that neither they nor their co-authors has any competing interests.Some of the computing for this project was performed on the Sherlock cluster.We thank Stanford University and the Stanford Research Computing Center for providing computational resources and support that contributed to these research results.We also thank the two anonymous reviewers and the Associate Editor for their constructive feedback, which has greatly improved the quality of this paper.

Figure 1 .
Figure 1.Example sequence timeseries.Visual description of the sequence definition for an example IVT timeseries.The original IVT record at a three-hour temporal resolution is shown in dark gray and the tick marks along the horizontal axis indicate weeks.The dashed gray horizontal line indicates the start-end threshold, defined as the 30-year median IVT (92 kg/m/s for this example location), and the solid gray line indicates the magnitude filter, defined as 250 kg/m/s.The three-step sequence identification process is as follows: 1. Calculate the five-day moving average, IVT 5 (black line); 2. Calculate continuous intervals of IVT 5 exceeding the 30-year median IVT (light and medium blue fill); and 3. Of the intervals identified in step 2, identify those where  max[IVT5] ≥ 250 kg/m/s (medium blue fill).The medium blue area is defined as a single sequence and the light blue areas are discarded.The resulting atmospheric river days and sequence days are shown along the bottom with dark blue bars and medium blue bars, respectively.

Figure 2 .
Figure 2. Hydrologic impacts of sequences, WY 1981-2021.Rows indicate what percentage of each column was labeled as an AR day (top) and or a sequence day (bottom).Panels (a-b) in the leftmost column represent all wet-season days.Moving from left to right, the next columns represent wet-season days with (c-d) extreme precipitation, (e-f) extreme runoff, and(g-h) extreme soil moisture, where "extreme" is defined as values above the 95th percentile of the distribution.Values range from 0% to 100% as shown in the legend at the bottom.AR days and sequence days are calculated from MERRA-2(Gelaro et al., 2017); precipitation is calculated from PRISM (PRISM Climate Group, 2023); runoff is calculated from the USGS NWIS (USGS, 2022); and soil moisture is calculated from WLDAS(Erlingis et al., 2021).Data processing steps are detailed in Section 2.1.

Figure 3 .
Figure 3. Historical sequence intensity and duration.Bubbles are colored by the mean values of (a) peak 24-hr IVT (kg/m/s) and (b) duration (days) at each SPEAR grid cell in California, averaged across all WY 1981-2010 sequences occurring in that grid cell.Bubble size represents average annual sequence frequency (sequences/year) as shown in the legend at the bottom.

Figure 4 .
Figure 4. Projected change in sequence annual frequency by decade.Timeseries plots show the additive change in sequence annual frequency by decade for emissions scenarios (a) SSP2-4.5 and (b) SSP5-8.5.The vertical axis measures absolute (additive) change in sequences/year relative to the baseline historical time period of WY 1981-2010, averaged across all five ensemble members.Each line represents one grid cell from each of California's hydrologic regions and are color-coded based on the legend at the bottom.(c) Map of California 4-digit Hydrologic Unit Codes (HUC4), indicating hydrologic regions, with MERRA-2/SPEAR grid cells outlined in gray.Representative grid cells (HUC4 regional centroids) are highlighted and marked with a dot.

Figure 5 .
Figure 5. Projected sequence intensity change.(a) Boxplots of the distributions of sequence peak 24-hr IVT from individual SPEAR ensemble members for the case study grid cell.Boxplots are separated by climate model forcing, where yellow represents historical (WY 1981-2010), orange represents SSP2-4.5 (WY 2061-2090), and red represents SSP5-8.5 (WY 2061-2090).The medians for individual members are marked with black lines and the overall ensemble medians are marked with colored lines.Ensemble medians are calculated by combining all five ensemble members into a single set and calculating the median value of that set.The box contains the interquartile range (IQR, 25th-75th data percentile) and the whiskers contain the 5th-95th data percentile.Values smaller than the 5th percentile and larger than the 95th percentile are plotted as black dots.(b) Spatial location of the example grid cell, which is the geographic centroid of hydrologic region 1804.(c) Illustrative figure showing how relative changes in the medians and 95th percentiles are calculated.For visual simplicity, only the averages of the historical and SSP2-4.5 ensemble members are shown and outlier points are not plotted.(d)-(e) Statewide summaries of all grid cell values are shown for the relative changes in the (d) median and (e) 95th percentile.

Figure 6 .
Figure 6.Projected sequence duration change.(a) Boxplots of duration distributions from individual ensemble members for the case study grid cell shown in Figure 5b.Boxplots are separated by climate model forcing, where light blue represents historical (WY 1981-2010), medium blue represents SSP2-4.5 (WY 2061-2090), and dark blue represents SSP5-8.5 (WY 2061-2090).The box contains the interquartile range (IQR, 25th-75th data percentile) and the whiskers contain the 5th-95th data percentile.Values smaller than the 5th percentile and larger than the 95th percentile are plotted as black dots.(b)-(c) Statewide summaries of all grid cell values are shown for the relative changes in the (b) median and (c) 95th percentile.

Figure 7 .
Figure 7. Sequence annual frequency as a function of duration.Each point represents data from one SPEAR grid cell in California.Plots show the annual sequence frequency in historical (WY 1981-2010) versus projected future (WY 2061-2090) simulations.Moving from left to right, the results are shown for sequences of specific durations, where the categories are (a-b) ≤14 days, (c-d) 15-30 days, (e-f) 31-60 days, and (g-h) >60 days.Rows represent emissions scenarios SSP2-4.5 (top) and SSP5-8.5 (bottom).