Recreating the California New Year's Flood Event of 1997 in a Regionally Refined Earth System Model

The 1997 New Year's flood event was the most costly in California's history. This compound extreme event was driven by a category 5 atmospheric river that led to widespread snowmelt. Extreme precipitation, snowmelt, and saturated soils produced heavy runoff causing widespread inundation in the Sacramento Valley. This study recreates the 1997 flood using the Regionally Refined Mesh capabilities of the Energy Exascale Earth System Model (RRM‐E3SM) under prescribed ocean conditions. Understanding the processes causing extreme events informs practical efforts to anticipate and prepare for such events in the future, and also provides a rich context to evaluate model skill in representing extremes. Three California‐focused RRM grids, with horizontal resolution refinement of 14 km down to 3.5 km, and six forecast lead times, 28 December 1996 at 00Z through 30 December 1996 at 12Z, are assessed for their ability to recreate the 1997 flood. Planetary to synoptic scale atmospheric circulations and integrated vapor transport are weakly influenced by horizontal resolution refinement over California. Topography and mesoscale circulations, such as the Sierra barrier jet, are better represented at finer horizontal resolutions resulting in better estimates of storm total precipitation and storm duration snowpack changes. Traditional time‐series and causal analysis frameworks are used to examine runoff sensitivities state‐wide and above major reservoirs. These frameworks show that horizontal resolution plays a more prominent role in shaping reservoir inflows, namely the magnitude and time‐series shape, than forecast lead time, 2‐to‐4 days prior to the 1997 flood onset.


10.1029/2023MS003793
2 of 27 inundated portions of both the Sacramento and San Joaquin valleys and portions of the present-day metropolitan area of Los Angeles.Because of its impact, this event has emerged as an important "design storm" for California water managers and led to the development of the colloquially termed "ARkStorm," which combines aspects of AR-induced flood events that occurred in 1969 and 1986 (Dettinger et al., 2012;Porter et al., 2011).The 1861/1862 flood event happened during a time in California's history when the population density and built infrastructure were at a much smaller scale than today.Since the 1860s, urbanization has resulted in the loss of floodplains in many communities that are vulnerable to flooding despite significant investments in constructing flood control infrastructure (Whipple & Viers, 2019;Whipple et al., 2017).In many low-lying regions throughout the Central Valley, aging levee systems and subsidence continue to expose populations and industries to flood impacts (Hanak & Lund, 2012).Sequences of heavy precipitation-producing storms, many of which were ARs, during the winters of 2017 and 2023 highlight the present susceptibility of California to major riverine flooding.Climate change may further exacerbate impacts felt by these storms (Corringham et al., 2022;Gershunov et al., 2019;X. Huang & Swain, 2022;Rhoades et al., 2021), particularly in the most underserved communities (Wing et al., 2022), highlighting the need for detailed analyses aimed at understanding how these storms drive compound extremes under historical and future climate conditions.
The most costly flood event ($1.6 billion) in California history was the New Year's flood event of 1997, hereafter "1997 flood" (Lott et al., 1997).Major flood losses occurred throughout the western United States, including losses of $500 million in Nevada and $125 million in Washington.The combination of flood area and severity across the western United States ranks the 1997 flood as the #2 superflood between 1950 and 2010 (Tarouilly et al., 2021).At least half a million people were displaced by the flooding and the majority of California counties (43/58) were declared disaster zones (Lott et al., 1997).
The 1997 flood was composed of three storms between 25 December 1996 and 2 January 1997 with inundation afterward (Galewsky & Sobel, 2005).Antecedent conditions played an important role in driving the impact of this event; earlier storms throughout late November and December built an abundant snowpack and elevated soil moisture throughout the Central Valley and the Sierra Nevada (Figure 1).Between 30 December 1996 and 3 January 1997, over 750 mm of precipitation fell in certain regions of northern California (e.g., 840 mm, or 33 in, at Bucks Lake in Plumas County, California; (Figure 1; https://www.cnrfc.noaa.gov/storm_summaries/ol.php?storm=jan1997).Heavy rainfall with snow above 3,000 m elevation began on 30 December 1996; the Central Sierra Snow Lab (CSSL; located at 2,100 m) reported 137 mm of rainfall on 30-31 December 1996 (Osterhuber & Schwartz, 2021).On New Year's Day of 1997, an extreme AR event made landfall (Figure 1).Maximum temperatures at 2,100 m elevation rose to 7°C and reached 3°C at 2,900 m on 1 January 1997 when 120 mm of rain fell at the CSSL (Heggli et al., 2022;Osterhuber & Schwartz, 2021).A ripe snowpack (meeting the 32% density threshold to produce terrestrial water input; Heggli et al., 2022), saturated soils, and extreme multi-day precipitation caused major rivers to reach flood stage, with several setting all-time peak flows (Figure 1; https://www.cnrfc.noaa.gov/storm_summaries/ol.php?-storm=jan1997).The December-January storms brought ∼1,200 mm of precipitation to California's 8-station index, making this the wettest 2-month period since records began in 1920.However, despite the wet start, the remainder of the water year was drier than normal.The lack of snow accumulation following the New Year's melt event led to below-normal peak snowpack but above-normal accumulated precipitation in lower and middle elevations, creating warm-snow drought conditions (Hatchett & McEvoy, 2018).With drier-than-normal conditions, reservoir levels remained below average at the end of April.The 1997 flood event thus represents an object lesson both for the study of extreme precipitation and runoff but also for reservoir and flood management in a variable climate.
A growing area of climate research is focused on understanding cascading, compound, and/or sequential hydrometeorological extreme events (AghaKouchak et al., 2020;Fish et al., 2019;Raymond et al., 2020).Simultaneously, the climate research community has sought to provide more credible and salient decision-relevant information to practitioners and management communities through iterative, co-produced research (Jagannathan et al., 2021;Lemos et al., 2018;Siirila-Woodburn et al., 2021).Examining historically significant, decision-relevant extreme events, through high-resolution climate model "storyline" recreations can both be useful for water resource managers (Bukovsky et al., 2023;Gutowski et al., 2020;Shepherd, 2019) and have also been frequently used in event attribution studies (M.F. Wehner et al., 2019).Storylines are physically based model recreations of impactful weather events, often chosen through iterative discussions between scientists and stakeholders, that are then simulated under plausible past and future climate scenarios.However, it is important to note that while such studies can provide information on the local dynamic and thermodynamic effects of climate change on extreme events, they do not provide information about the influence of large-scale circulation changes on the return probability of such events.Storyline event recreations also have practical model development implications.Climate models are mostly optimized around mean state performance for different hydrometeorological performance metrics (Fasullo, 2020), rather than extremes.This is especially true from the perspective of land-atmosphere interactions that drive compound extremes (La Follette et al., 2021).Storyline approaches can also help to convey information on model uncertainty, namely the role of structural and scenario uncertainty (Lehner et al., 2020), in a more understandable and decision-relevant way.Therefore, the recreation of the 1997 flood is a useful exercise in understanding the nature of extreme events and determining whether our cutting-edge modeling approaches are fit for purpose in simulating them.An additional benefit of storyline approaches is that the climate models used and the resultant climate research conducted become tailored toward greater practitioner relevance over time (Lemos et al., 2012).
Climate model capabilities to simulate extreme precipitation (e.g., annual seasonal maxima of daily and three-hourly precipitation) have not significantly changed from CMIP5 to CMIP6 both globally (M.Wehner et al., 2020) and within the United States (M.Wehner et al., 2021) at traditionally employed horizontal resolutions (∼200 km to ∼50 km).Certain single-model experiments with horizontal resolutions between ∼200 km to ∼25 km have shown extreme precipitation skill improvement at finer resolutions, although predominantly in midlatitude winters and over land (M.F. Wehner et al., 2014).Specific to AR events, climate model simulations run at horizontal resolutions of ≤150 km show better skill in representing AR lifecycle characteristics over ocean basins (Guan & Waliser, 2017).Regionally refined mesh Earth system model approaches over the last decade have shown that horizontal resolution influences simulation fidelity of synoptic-to-mesoscale trajectory of storm tracks and eddies (Liu et al., 2023;Rauscher & Ringler, 2014;Rauscher et al., 2013;Sakaguchi et al., 2016), although enhanced skill may converge at ∼25 km (Rhoades, Jones, O'Brien, et al., 2020).Resolution also influences the representation of topography, which in turn affects how coastal landfalling storms are orographically uplifted, the rain-snow partitioning of the storm's precipitation, and the build-up and evolution of mountain snowpack throughout the cool-season (Bambach et al., 2021;Rhoades et al., 2018;Xu et al., 2018).These AR-topography interactions, combined with the presence or absence of an extratropical cyclone, have a significant effect on AR-related impacts at landfall (Patricola et al., 2022;Rhoades, Jones, O'Brien, et al., 2020;Rhoades, Jones, Srivastava, et al., 2020).Similarly, land-surface cover and soil heterogeneity increase at finer resolutions, which can alter the surface-through-subsurface water and energy balance interactions of the hydrologic cycle (Maina et al., 2022;Xu et al., 2021).
In this study, we recreate the 1997 flood using the U.S. Department of Energy's flagship climate model, the Energy Exascale Earth System Model, and its regionally refined mesh capabilities (RRM-E3SM).We chose the 1997 flood because it is the flood of record most recently experienced by current water managers, was relatively well-monitored by a network of meteorological and hydrologic measurements, and occurred during a period in which atmospheric reanalysis products have higher skill (Hersbach et al., 2020;Uppala et al., 2005).This event also allows us to assess the relative contributions of E3SM horizontal resolution and forecast initialization time in shaping the fidelity of the flood event recreation.We pay particular attention to the interactions across the submodels of E3SM (e.g., atmospheric and land-surface) and their representation of key hydrometeorological variables before/during/after the event.This is the first time RRM-E3SM has been systematically used, across resolution and forecast lead time, to generate a storyline recreation of a western United States hydrometeorological extreme.Our scientific questions include: that RRM-E3SM performs comparably to uniform 0.25° (∼25 km) horizontal resolution simulations for water cycle-related processes and provides several improvements to uniform 1.00° (∼111 km) horizontal resolution simulations (Tang et al., 2019(Tang et al., , 2022)).These improvements include a better representation of winter-time precipitation rates and cloud properties over the contiguous United States, higher extratropical cyclone activity in the north Pacific, and snowpack accumulation rates and peak timing and volumes in the California Sierra Nevada compared with ∼111 km simulations.A detailed description of E3SMv2's atmospheric dynamical core, physics and dynamics, horizontal grids, vertical discretization, radiation, tracer transport schemes, and subgrid-scale parameterization choices (e.g., cloud microphysics scheme) can be found in Golaz et al. (2022).More specific findings related to RRM-E3SM are described in Tang et al. (2022), while Harrop et al. (2022) provides additional details on water cycle process fidelity in both the atmosphere and land-surface in E3SM at uniform horizontal resolutions of 1.00° versus 0.25° over the United States.
The RRM-E3SM meshes were produced using TempestRemap (Ullrich et al., 2016;Ullrich & Taylor, 2015); the topography was generated with the NCAR_Topo tool (Lauritzen et al., 2015) and smoothed for model stability purposes using the framework discussed in Zarzycki et al. (2015) and a coefficient of 3e −16 (c in Equation 1of Zarzycki et al., 2015).To better contextualize the effect of the topography smoothing coefficient in shaping the mean simulated hydroclimate (e.g., snowpack characteristics), Rhoades et al. (2016) showed that an RRM simulation run at a 0.25-degree horizontal resolution which only changes this coefficient from 4e −16 to 3e −16 resulted in a near doubling of average seasonal snow water equivalent (SWE) depths and more SWE interannual variability within the California Sierra Nevada, both of which more closely matched with best-available gridded SWE products.The refinement regions and topographic representation in the simulations over California for the three RRM-E3SM cases are shown in Figure 2. Hereafter, RRM-E3SM simulations with a maximum refinement resolution over California at 14 , 7, and 3.5 km will be referred to as, RRM-E3SM (14 km), RRM-E3SM (7 km), and RRM-E3SM (3.5 km), respectively.In all simulations, the E3SM default setting of 72 vertical levels is used.

Simulation and Initialization Strategy
The 1997 flood event forecast ensemble was produced for six different 8-day periods starting on 28 December 1996 at 00Z through 30 December 1996 at 12Z, initialized at 12-hr increments between those dates, using the "Betacast" framework described in Zarzycki et al. (2014) and the Atmosphere Model Intercomparison Project protocols (Gates et al., 1999).The land surface conditions are spun-up for 5 years prior to the first forecast, with a standalone simulation of the E3SM Land Surface Model (ELM) forced by the 6-hourly atmospheric data from the fifth generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis (ERA5; Copernicus Climate Change Service Climate Data Store (CDS), 2017).This ensures that antecedent land surface conditions (namely soil moisture content and mountain snowpack) are consistent with the actual 1997 flood event conditions on the day each RRM-E3SM forecast is started.Subsequent forecast cycles use the 12-hr land forecast from the previous cycle for initialization.This approach gives nearly identical results to spinning up each forecast cycle's land surface independently (not shown).
The atmospheric initial state is generated using high-order remap algorithms to take data from the ERA5 reanalyses and map them onto the corresponding RRM-E3SM grid.The pressure field is adjusted based on the technique in Trenberth et al. (1993) to account for differences in ERA5 and RRM-E3SM orography that may result in geostrophic imbalances.Observed ocean surface conditions (i.e., sea surface temperatures and sea ice extent) are also prescribed by interpolating NOAA Optimum Interpolation (OI) data (Reynolds et al., 2007) to the model grid.After initialization from ERA5, the RRM-E3SM atmosphere and land surface models are globally fully coupled and allowed to freely solve the governing equations that drive these systems while being periodically bounded by prescribed ocean conditions.
All RRM-E3SM simulations utilize the hydrostatic dynamical core in E3SM.Notably, the effective resolution is 4-5× the actual grid spacing (Klaver et al., 2020;Ullrich, 2014).Further, it has been shown that non-hydrostatic dynamical cores minimally influence midlatitude wintertime precipitation (slight drying) from resolutions of 36-to-4 km, even in idealized mountain environments (Liu et al., 2022;Q. Yang et al., 2017).With each 2× refinement in horizontal resolution, the RRM-E3SM dynamics and physics timestep and second-order viscosity diffusion strength at the model top were halved.For RRM-E3SM (14 km), the atmospheric dynamics and physics timesteps and diffusion strength were 40 and 600 s and 4e −4 , for RRM-E3SM (7 km) they were 20 and 300 s and 2e −4 , and for RRM-E3SM (3.5 km) they were 10 and 150 s and 1e −4 , respectively.The only additional differences across cases were the macrophysics-microphysics subgrid-scale parameterization substeps, set to 6 in RRM-E3SM (14 km) and RRM-E3SM (7 km) and 3 in RRM-E3SM (3.5 km).The macrophysics-microphysics substep choice is resolution dependent (due to changes in the total physics timestep and grid cell area) and a choice of, at least, 2 for prognostic microphysics schemes (as used in E3SMv2) ensures numerical stability and more realistically preserves cloud condensate (akin to large eddy simulations) as falling hydrometeors can more realistically change phase as they fall to the surface (Gettelman et al., 2015).

Atmospheric River Detection and Categorization
We used TempestExtremes (TE; namely the SpineARs and StitchBlobs algorithms) to detect the primary AR that made landfall during the 1997 flood on 1 January 1997 (Ullrich & Zarzycki, 2017;Zarzycki & Ullrich, 2017).AR detectors (ARDTs) are broadly categorized as either absolute or relative threshold approaches (O'Brien et al., 2022;Shields et al., 2023).Fixed threshold ARDTs use a target variable (e.g., integrated vapor transport) and a threshold of exceedance (e.g., ≥250 kg/m/s) to identify if an AR emerges from background atmospheric conditions.TE is a relative threshold ARDT and defines AR conditions using both a threshold of exceedance and, for example, geometric characteristics of the length and width of an AR plume.Our parameter settings for TE and the extensions made to TE to estimate AR landfalling characteristics, such as the AR category scale (Ralph et al., 2019), are important for estimating water resource impacts (e.g., AR-induced flood damages in Corringham et al., 2022) as discussed in more detail in Rhoades, Jones, O'Brien, et al. (2020)  We used reanalysis and gridded climate products to evaluate storm-total precipitation and pre-and post-event changes in SWE.Storm-total precipitation is evaluated against Pierce et al. (2021) which is an updated version of the Livneh product (Livneh et al., 2015), hereafter Livneh, and against the ERA5 reanalysis product, due to its use in providing initial conditions for the RRM-E3SM simulations.According to Pierce et al. (2021), the updated Livneh product better preserves extreme event precipitation totals by more systematically accounting for daily time adjustments in precipitation gauge data (i.e., rounding-related issues related to the time of day the station observation is taken).We also conducted a preliminary analysis comparing Livneh with other widely used gridded climate products, Newman et al. (2015) (Newman) and Daly et al. (2008) (Parameter-elevation Regressions on Independent Slopes Model, PRISM) as shown in Figure S1 in Supporting Information S1.Compared with the 52 precipitation gauge measurements, we found that Livneh was either a better estimate (compared with Newman) or was indistinguishable (compared with PRISM) in its representation of the 4-day precipitation totals produced during the 1997 flood.In order to estimate the return periods of the 4-day precipitation totals during the 1997 flood, we applied a non-stationary generalized extreme value (NS-GEV) analysis to the annual maximum of 4-day precipitation totals (Rx4day) in the Livneh product interpolated to the 52 gauge locations using the first-order conservative remapping (P.W.Jones, 1999).In the NS-GEV framework, we first apply the Mann-Kendall (MK) trend test (Mann, 1945) to the Rx4day data at each gauge location to determine if the data has a significant trend at the 5% level.If the Rx4day data at a location has a significant trend, we fit time as a covariate in the location and/or scale parameters of the GEV distribution fitted to the Rx4day data at that gauge location.The complete procedure is outlined in Srivastava et al. (2021).
We assess pre-and post-event changes in SWE against the Fang et al. (2022) western United States-wide snow reanalysis product (hereafter Margulis due to it being an updated version of Margulis et al., 2016).The Margulis reanalysis product has shown skill in estimating peak SWE in the California Sierra Nevada when compared with airborne LiDAR SWE measurements (e.g., 1 April mean SWE depth differences of −0.15 to +0.05 m across 2015-2021), which have essentially become the snow community standard for spatially complete estimates of snow depth and SWE in recent years (Painter et al., 2016;Stillinger et al., 2023).More recently, K. Yang et al. (2023) intercompared the Margulis product with four other widely used SWE reanalyses within the California Sierra Nevada and showed that it had the highest water year skill across several measures including, mean absolute bias, root mean squared error, and percent bias.Although it was found to be the best performing across several measures of skill, basin-wide SWE estimates were found to be ∼8% higher than the ASO LiDAR data sets (with even higher biases in alpine and forested locations), indicating a general positive SWE bias in the California Sierra Nevada.We also compare and contrast RRM-E3SM skill with a set of simulations produced with a more traditional and widely-used dynamical downscaling approach.These simulations were produced using the Weather Research and Forecasting (WRF) model run at 14 km resolution over California that is bounded laterally and at the model top with ERA5 (A.D. Jones et al., 2022).All gridded data that is intercompared has been 10.1029/2023MS003793 8 of 27 regridded from its native grid resolution to a regular latitude-longitude grid resolution of 14 km using bilinear interpolation provided by the Earth System Modeling Framework (ESMF) Offline Regridding Weight Generator (The NCAR Command Language (Version 6.6.2),2022).

Causal Inference
The complexity of Earth system interactions within the RRM-E3SM simulations and the large number of grid cells within the spatial domain of analysis makes it difficult to unambiguously disentangle the impact of resolution and forecast lead time on processes and interactions between hydrometeorological variables.Thus, in the present study, we use causal inference to gain insights into the interactions between atmospheric and land-surface variables on one hand, and total runoff on the other.To the best of our knowledge, this is the first application of this framework for this style of problem.Causal inference allows us to move beyond canonical correlation analysis while reducing the dimensionality of analysis to investigate interactions in the model.The goal of causal inference methods is to determine causal relationships between hydrometeorological variables by using concepts of statistical conditional independence on time series data.These methods are gaining popularity in the Earth and environmental sciences community (Ombadi et al., 2020;Runge, 2023;Runge et al., 2019;Sugihara et al., 2012) and offer a unique perspective to evaluate relationships.
We use the Peter-Clark (PC) algorithm (Spirtes & Glymour, 1991), a causal inference method that utilizes graph theory and graphical rules to recover causal relations from time series data.The PC algorithm starts with a fully connected graph where all variables are causally related to each other, then iteratively and systematically removes causal relations using conditional independence tests.One of the main advantages of the PC algorithm is its ability to reduce the number of variables in the conditioning set, thereby mitigating the "curse of dimensionality."We chose to use the PC algorithm because it provides good performance in hydrometeorological systems, especially in controlling the number of falsely detected causal links (Ombadi et al., 2020).For our conditional independence tests, we used information-theoretic conditional independence instead of partial correlation due to its ability to detect nonlinear relationships (Ombadi et al., 2021).
Our causal analysis considers direct and instantaneous causality between the time series of the four key hydrometeorological variables (i.e., integrated vapor transport [IVT], precipitation, SWE, and 10 cm soil moisture content) on total runoff volume for all grid cells within a specific spatial domain (e.g., California-wide or the mountainous headwaters of a surface reservoir).We have excluded time-lagged causal relations as our analysis is focused on spatial variability instead of temporal variability (mainly to increase the statistical sample size of our short-duration forecasts).Causality was assessed at a statistical significance level of 0.05.Notably, IVT and precipitation are treated as distinct causal factors on total runoff volume as Guan et al. (2016) shows that AR-induced ventilation above and condensation of water vapor onto snowpack, which IVT is a proxy for, during rain-on-snow events can be energetically larger drivers of snow loss than raindrops onto snowpack at temperatures close to the freezing level.Murphy (1993) provides terminology to discuss forecast verification qualities that both forecasters and users of forecasts find important.In this study, we will evaluate RRM-E3SM's representation of the California New Year's flood event of 1997 according to forecast quality (forecast correspondence to observations) and forecast value (forecast utility to decision makers).We use the effects of horizontal resolution and forecast lead time to assess forecast quality and value via measures of bias (the difference between forecast and observation), association (linear correlation between forecast and observation), sharpness (forecast capability in representing extremes), and through measures of value (e.g., reservoir inflow volumes).

Resolution Influence on Atmospheric Process Representation of the 1997 Flood
We first compare the influence of regional grid refinement over California by evaluating how the representation of the large-scale atmospheric circulations that shaped the landfalling AR on New Year's Day of 1997 differ according to the resolution of the regional refinement domain.Figure 3 compares the large-scale IVT fields and circulation patterns of ERA5 and the three grid refinement resolutions at the start of the major AR landfall on 1 January 1997.The RRM-E3SM values are six-member forecast averages.The RRM-E3SM simulations forecast the low-pressure center near the Pacific Northwest coastline further southwest than it is in ERA5 on this date (Figure S2 in Supporting Information S1).The simulations generally agree across resolutions on the spatial distribution of AR categories from the California Bay Area up through the Sacramento Valley (Figure 4 and Figure S3 in Supporting Information S1).Agreement is also found with ERA5 in the northern portions of California, particularly with regard to category 5 AR conditions (Figure S4 in Supporting Information S1); however, all RRM-E3SM simulations systematically produce AR categories that are too high in southern California.This appears to be due to a disagreement in the AR width and/or the centroid of the AR landfall location with ERA5, which occurs further South (as indicated by positive IVT anomaly from central to southern California in Figure 3) and due to uniformly higher wind speeds (Figure S4 in Supporting Information S1).Notably, ERA5 may under-represent AR activity in southern California compared to other reanalyses (Collow et al., 2022).
Although IVT is important from a forecasting perspective, particularly since it allows for longer forecast lead times than precipitation (Lavers et al., 2016), IVT is simply one metric indicating the potential for precipitation to occur, and its orientation with respect to terrain can suppress or enhance precipitation (Ricciotti & Cordeira, 2022).Therefore, we also evaluate how the precipitation potential across RRM-E3SM simulations is realized in the 1997 flood, particularly its association and sharpness.The forecast ensemble average storm total precipitation amounts are shown in Figure 5.This figure compares simulated precipitation values with reanalysis and gridded climate products as well as a conventionally used regional climate model (WRF, forced by ERA5) at the grid cells nearest to the 52 precipitation gauges used in NOAA's storm summary of the 1997 flood.Map plots of storm total precipitation are also provided in Supporting Information S1 (Figure S5).Refinement from 14 to 3.5 km in RRM-E3SM has an appreciable effect on the statistical distribution of storm total precipitation, including the mean, median, and maximum.RRM-E3SM (3.5 km) matches the distribution of storm total precipitation at the 52 precipitation gauge sites better than other data sets, including the Livneh product.RRM-E3SM (3.5 km) agreement (r = 0.73) in storm total precipitation holds across individual precipitation gauge sites as well (Figure S6 in Supporting Information S1), particularly precipitation gauges in the northern Sierra Nevada, which have the highest precipitation totals (e.g., Buck's Lake and La Porte).Note that the WRF simulations were conducted at 14 km resolution and do not represent an even comparison with RRM-E3SM (7 km) or RRM-E3SM (3.5 km).The superior skill of models, relative to statistical interpolation and extrapolation techniques utilized in gridded climate products, in representing mountain precipitation processes have been noted before (Lundquist et al., 2019).
In contrast to landfalling AR characteristics, we found storm total precipitation to be resolution-dependent.We hypothesize that this is likely a result of more realistic topographic representations of California's Coast Ranges and Sierra Nevada.In addition, we hypothesize that important mesoscale circulation features known to influence the spatiotemporal characteristics of precipitation in northern California are better resolved.One such feature is the Sierra Barrier Jet (SBJ), a classic terrain-parallel low-level jet.The SBJ results from the blocking, slowing, and subsequent counter-clockwise turning of low-level winds as they interact with the Sierra Nevada in a stable environment.The SBJ has a typical core of peak winds at ∼500m to 1 km (∼950-900 hPa) above the Central Valley with wind speeds ≥15 m/s (Neiman et al., 2010(Neiman et al., , 2013)).Air within ARs is often moist-neutral (Cobb et al., 2021).When ARs interact with stable air masses in the Central Valley a mesoscale high-pressure system is created that turns the AR's low-level winds and over time erodes the stable air mass as the AR and SBJ interact (Kingsmill et al., 2013).
The location and strength of the SBJ play an important role in driving California's precipitation maxima during AR events (Neiman et al., 2013).This precipitation maximum usually occurs northwest and upstream of the Sierra Nevada crest, typically around the Buck's Lake precipitation gauge (39.85°N, 121.24°W) in the headwaters of the Oroville Dam.To examine RRM-E3SM skill in representing the SBJ, we compare winds using analogous cross-sections and transect lines outlined in Hughes et al. (2012) that dissect the typical locations of the SBJ in California.Unfortunately, wind profiler observations could not be found for the 1997 event and we rely on ERA5 to assess the horizontal resolution sensitivity of RRM-E3SM, acknowledging that ERA5 may be too coarse and/ or overly constrained by data assimilation to well resolve the SBJ.
Figure 6 shows cross-sections of zonal and meridional winds for ERA5 and the RRM-E3SM simulations at the start of the AR landfall on 1 January 1997.Similarly to previous findings in Figure 4; Figure S4 in Supporting Information S1, wind speeds are generally stronger in RRM-E3SM cases compared with ERA5.However, the altitude, latitudinal, and longitudinal locations of the wind speed maximum do generally agree with ERA5.RRM-E3SM simulates the SBJ and locates its core between 950 and 900 hPa at around 40°N, 122°W.Resolution plays an important role in RRM-E3SM better matching with ERA5 on the location of the wind speed maximum both with altitude and latitudinally.Similarly, RRM-E3SM (3.5 km) shows higher wind speeds from 1000 to 900 hPa and more orographic uplift potential along the windward sides of both the Coast Ranges and the Sierra Nevada.This favors more orographic precipitation, as is shown in Figure 5.
To assess RRM-E3SM skill in representing the entire lifecycle of the SBJ, we now show vertical profiles of both meridional and zonal winds, from both a Sierra-parallel and Sierra-perpendicular perspective, compared with ERA5 (Figure 7).Prior to the onset of the flood event, on 31 December 1996, the RRM-E3SM simulations show the jet beginning to form at the right altitude relative to ERA5, but slightly stronger.On the first day of the flood event (1 January 1997), RRM-E3SM (3.5 km) best represents the altitude location (∼950-1000 hPa) and strength (20-25 m/s) of the SBJ.The jet altitude and latitudinal location and strength match with the findings of Neiman et al. (2013) for other couplets of AR-SBJ events identified using a combination of in situ measurements (including vertical wind profilers) and reanalysis products.The RRM-E3SM results also corroborate the conclusion made by Hughes et al. (2012) that approximately a six-km horizontal resolution is needed to properly represent the SBJ in model simulations.However, regardless of RRM-E3SM resolution, the SBJ becomes both weaker and/or lower in altitude relative to ERA5 on 3-4 January 1997.

Resolution Influence on Land-Surface Process Representation of the 1997 Flood
Although the 1997 flood was one of the most costly and damaging floods in northern California history, a non-stationary return period analysis of the Livneh product at the 52 gauge sites indicates that it was, at most, a 1-in-20-year event at a few gauge locations, based on 4-day precipitation total estimates over the 105-year record covering 1915-2019 (Figure 8).At 50% of gauge locations, the return period of the event was less than 6 years.This implies that the flooding was notable due to it being a compound extreme shaped by not only the precipitation provided by the sequence of storms, culminating in a category 5 AR landfall on 1 January 1997 but also antecedent land surface conditions that were primed for snowmelt and runoff generation.The importance of antecedent conditions and land surface feedbacks was shown by Ivancic and Shaw (2015) where only 36% of the 99th percentile discharge events occurred due to a 99th percentile precipitation event when evaluated CONUS-wide between 1950 and 2000.
To evaluate the role that antecedent and land surface conditions played in shaping the flood event, we now assess the change in SWE, or dSWE, for the category 5 AR storm duration (Figure 9).Analogously to the storm total precipitation analysis, we show storm duration dSWE across 50 SNOTEL sites throughout northern California, southern Oregon, and Nevada compared to the Margulis product.Map plots of storm duration dSWE are provided in Supporting Information S1 (Figure S7).Model resolution also plays an important role in the distributions of both positive and negative dSWE in the California Sierra Nevada.This is likely due to the influence of topographic resolution on the simulated freezing level and the rain-snow partitioning of the AR event, which in turn influences the land surface representation of the accumulation and ablation of the mountain snowpack at mid-to-high elevations.The 50 SNOTEL sites indicate that more negative dSWE occurred over the duration of the 1997 flood (−152 mm/−6 in).However, at higher elevations, positive dSWE also occurred (+102 mm/+4 in).In comparison, the Margulis product indicates that more positive dSWE occurred (up to +254 mm, or +10  inches, in certain locations).Although a general negative dSWE skew in the statistical distribution is shown for RRM-E3SM, with every 2× refinement in resolution over California the simulations more closely approximate the statistical distribution from the 50 SNOTEL location observations.The negative dSWE skew in RRM-E3SM simulations is partially owed to a systematic underrepresentation of elevation at SNOTEL sites (Figure S8 in Supporting Information S1) with an average elevation of 2150 m compared to 2039 m in RRM-E3SM (3.5 km).
Figure 10 shows the effects of resolution on the spatial representation of precipitation and runoff characteristics.The differences across each RRM-E3SM case are explicitly shown in Figure S9 in Supporting Information S1.Storm total precipitation is enhanced at finer horizontal resolutions, particularly along the Coast Range and crest of the Sierra Nevada, upwards of 250 mm in RRM-E3SM (3.5 km) relative to RRM-E3SM (14 km).However, a general dry (wet) bias across RRM-E3SM simulations is seen in northwestern California's Klamath Mountains (Sierra Nevada) when compared with the Livneh product (Figure S10 in Supporting Information S1).Notably, the Livneh product had a general dry bias compared with precipitation gauge measurements (Figure 5; Figure S6 in Supporting Information S1).This indicates that Sierra Nevada crest precipitation overestimates in RRM-E3SM may not be as severe as is shown in Figure S10 in Supporting Information S1, corroborates the findings of Lundquist et al. (2019), and would support the claims made about the underrepresentation of gridded climate products' AR-related precipitation in Lundquist et al. (2015).Model resolution also plays a key role in shaping both the rain-snow partitioning of precipitation and the efficiency at which water vapor becomes precipitation (Figure 10; Figure S9 in Supporting Information S1).Snowfall is enhanced by upwards of 20% in high-elevation regions of the California Sierra Nevada, particularly in the headwaters of the American River through the Kern River watersheds.Similarly, the precipitation efficiency (the amount of precipitation per unit of integrated water vapor) is enhanced by upwards of 20% throughout the Klamath Mountains, Coastal Ranges, and the Sierra Nevada in RRM-E3SM (3.5 km).The combination of enhanced and more efficient precipitation and alterations to rain-snow partitioning changes the signature of runoff efficiency (the total runoff amount per total precipitation amount).Runoff efficiency is generally enhanced by upwards of 60% at low-to mid-elevations in northern California in RRM-E3SM (3.5 km) compared to RRM-E3SM (14 km), whereas in the high-elevation southern Sierra Nevada, a decrease is simulated.The enhanced runoff efficiency in RRM-E3SM (3.5 km) is likely associated with more precipitation that is falling on wetter soils and, importantly, more snowmelt (as seen with more grid cells with runoff efficiencies at or exceeding 1).Conversely, runoff efficiencies decline in RRM-E3SM (3.5 km) where snowfall is enhanced, which agrees with SNOTEL sites that indicate that positive dSWE changes occurred during the 1997 flood (Figure 9).
Even without a calibrated hydrologic model, comparing simulated inflow to USACE estimated inflows provides context for how well the model captures the key hydrologic-focused land-atmosphere interactions.This is because, in order to properly estimate reservoir inflows in the context of the 1997 flood, it is necessary that the model properly forecast the AR translational speed, plume intensity, and landfall location; the antecedent land surface conditions (e.g., snowpack and soil moisture); and the land-atmosphere interactions during and after the storm.Furthermore, model evaluation should also be done in decision-relevant regions (e.g., watersheds) instead of arbitrary latitude-longitude boxes.Therefore, to evaluate the value of the RRM-E3SM forecasts, we investigate reservoir inflows from the headwaters of eight major reservoirs, which represent a third (13.3 million-acre feet) of California's surface reservoir storage (Figure 11).Reservoir inflows are computed as basin averages of total runoff provided by the land-surface model in RRM-E3SM.In the headwaters of the two largest reservoirs (Lakes Shasta and Oroville), all simulations overestimate inflows, and resolution systematically increases the volume of water flowing through the system.This may be due to several factors, including a lack of parameter calibration in the land surface model (e.g., soil characteristics) and/or antecedent soil moisture being too high.Unfortunately, we could not find estimates of soil moisture content, from either in situ or remote sensing sources, and were unable to evaluate soil moisture as we did precipitation and snowpack.We were also unable to find piezometer data recording groundwater height changes.
Although the magnitude of reservoir inflows is biased even in RRM-E3SM (3.5 km), the shape of the reservoir inflow time series improves at finer resolutions.Kling-Gupta Efficiency (KGE) comparisons between RRM-E3SM simulations and USACE estimated reservoir inflows show that RRM-E3SM (3.5 km) is either the best or second best-performing simulation in seven of the eight reservoir regions, save for Oroville (Table S1 in Supporting Information S1).Across RRM-E3SM simulations, Shasta and Oroville reservoir inflow estimates depend more strongly on antecedent conditions (i.e., reservoir inflows at the beginning of 30 December 1996).In contrast, Folsom and New Melones Reservoirs antecedent conditions seem to play a lesser role in model performance, with model drift in reservoir inflow estimates starting to occur one to 2 days after the forecasts have begun.Moving further south along the western slopes of the Sierra Nevada to Pine Flat and Terminus, RRM-E3SM (3.5 km) matches USACE estimated reservoir inflows remarkably well, regardless of antecedent condition issues.Finally, RRM-E3SM simulations in the headwaters of Success and Isabella reservoirs match neither the amplitude nor shape of the USACE estimated reservoir inflows, particularly Isabella.The lack of match between simulated and USACE estimated inflows is likely influenced by infrastructure and/or management decisions made above the reservoirs in these headwater regions, especially since RRM-E3SM simulations do not account for these factors.
To better contextualize RRM-E3SM runoff forecasts across resolution, we employ the PC causal inference algorithm with the conditional mutual information test (Ombadi et al., 2020;Spirtes & Glymour, 1991).The influential strength of four hydrometeorological variables (i.e., IVT, precipitation, SWE, and 10 cm soil moisture content) on total runoff (overland flow, interflow, and baseflow) across California and within its 10 major reservoir headwater regions is shown in Figure 12 and Figure S11 in Supporting Information S1.The higher the stacked bar, the more variance is explained in total runoff.Each of the four hydrometeorological variables contributes a value ranging between zero and one, with a maximum possible total of four across variables.Across California, our causal analysis framework agrees with our prior suggestions that resolution plays an important role in amplifying the strength that both soil moisture content and SWE play in total runoff magnitude.With that said, atmospheric conditions (IVT and precipitation) heavily influence the total runoff signal across California comprising 84%-94% of the total variance explained by the four chosen hydrometeorological variables (Figure S12 in Supporting Information S1).However, this causal relationship does change considerably from one reservoir headwater region to another (particularly in the central to southern Sierra Nevada).
Through this causal inference framework, we can also see that in certain reservoir headwater regions, resolution plays a systematic role in either adding more interactions between total runoff (more components contributing to each stacked bar) and all of the hydrometeorological variables (e.g., New Melones) or simplifying interactions to a single (e.g., Oroville) or fewer hydrometeorological variable(s) (e.g., Shasta).In other headwater regions, there is an insensitivity to resolution (e.g., Don Pedro and Isabella).In New Melones Lake, where runoff interaction diversity increases the most, IVT and SWE play no role in shaping runoff in RRM-E3SM (14 km) and RRM-E3SM (7 km), with a nearly 50/50 split between precipitation and soil moisture, whereas RRM-E3SM (3.5 km) shows a more equal interaction between all four hydrometeorological variables and runoff.Conversely, in Lakes Shasta and Oroville, three hydrometeorological variables play a key role in runoff forecasts in RRM-E3SM (14 km) and RRM-E3SM (7 km), yet precipitation becomes the dominant variable of influence in RRM-E3SM (3.5 km), 91% and 100%, respectively (Figure S12 in Supporting Information S1).Finally, both Lake Don Pedro and Isabella Lake have an insensitivity to resolution where precipitation and soil moisture content play comparable roles in shaping total runoff across RRM-E3SM simulations.
To summarize the resolution dependence of RRM-E3SM simulations found thus far, we use Taylor diagrams (Figure 13) to show that although large-scale meteorology is relatively insensitive to finer horizontal resolutions (14-3.5 km), even for landfalling AR characteristics (Figure 4), storm characteristics (e.g., storm total precipitation) and land-atmosphere interactions (e.g., storm duration dSWE) are sensitive to resolution.

Forecast Lead Time Influence on Atmospheric and Land-Surface Process Representation of the 1997 Flood
Dispersion in model results associated with forecast lead time is also shown.This will be the focus for the rest of our analysis, but to decrease the dimensionality of our analysis we focus on the best-performing simulation, RRM-E3SM (3.5 km).Both storm total precipitation and storm duration dSWE are weakly and not systematically sensitive to forecast lead time (Figure 14).The highest storm total precipitation and positive storm duration dSWE occurred in the forecast that was initialized on 29 December 1996 at 00Z.This finding is counter to our original hypothesis that forecast skill should increase as forecast lead time gets closer to 31 December 1996.This assumption was made because the 30 December 1996 at 12Z forecast has the least amount of time to drift from the conditions provided by ERA5 which could influence, for example, the AR intensity, landfall location, and translational speed.
Although forecast lead time does not appear to have a significant influence on storm total precipitation and storm duration dSWE over the period of 31 December 1996-4 January 1997, these metrics may mask temporal dependencies.To determine whether there are important diurnal and/or day-to-day differences across forecast lead times, Figure 15 shows both 6-hourly rates and cumulative 6-hourly totals for precipitation, dSWE, and runoff.The cumulative total precipitation estimated at the 52 precipitation gauge stations is well bracketed by the six RRM-E3SM (3.5 km) forecasts.Hourly rates in precipitation show that precipitation diverges most across the six forecasts on 3 January 1997 (or 4 to 6 days post initialization of the forecast).From the perspective of dSWE, evaluated across the 50 SNOTEL sites, the six forecasts generally have similar tendencies throughout the flood period, but also disagree most on 3-4 January 1997.Negative dSWE values, an indication of the magnitude of snow ablation caused by the AR, were highest on 3 January 1997 in both observations and forecasts.The forecast spread on 3 January 1997 was −2 mm/hour to −7 mm/hour, which was generally stronger than was observed at SNOTEL sites.Undoubtedly, the spread in precipitation and SWE across forecasts from 3 to 4 January 1997 influenced runoff rates and totals in the reservoir headwater regions (Figure S13 in Supporting Information S1).
Finally, we evaluate how RRM-E3SM (3.5 km) forecast lead time influences the causal strength and relationship between runoff and the four key hydrometeorological variables (i.e., IVT, precipitation, SWE, and 10 cm soil moisture content) over the period of 31 December 1996-4 January 1997.Interestingly, California-wide causal strength of the hydrometeorological variables on runoff generally is maintained across the six forecast lead times Figure 16.Atmospheric conditions (IVT and precipitation) dominate the runoff signal (74%-87% range across forecasts for the total variance explained for the four hydrometeorological variables chosen).The dominance of atmospheric conditions on runoff across forecasts holds in the headwaters of both Lakes Shasta and Oroville.However, akin to the resolution-focused results, antecedent conditions and land surface feedbacks play a larger role in shaping runoff in the reservoir headwater regions of the central to southern Sierra Nevada.For example, in the central and southern Sierra Nevada (New Melones Lake, Lake Don Pedro, and Isabella Lake) the role of antecedent and land surface conditions represents 46%-51%, 40%-51%, and 30%-51%, respectively, on the causal relationship with runoff.Again, these percentages represent the range across forecasts for the total variance explained for just the four hydrometeorological variables chosen.The comparative randomness of forecast lead time relative to resolution on the causal strength and relationship of hydrometeorological variables on total runoff is likely due to the difficulty of exactly recreating the category 5 AR event life cycle.ARs have complex spatiotemporal structures that are hard to predict at watershed scales, particularly the AR landfall location latitude; the sweeping comma-shaped nature, topographic orthogonality, and translational speed of the AR plume at landfall; and the precise precipitation magni tude and rain-snow partitioning over the storm duration.This combined with biases in the forecast land-surface initial conditions, most of which are not truly constrained by in situ observations (e.g., soil moisture probe data and groundwater table levels), could help to explain the randomness of forecast lead time on total runoff at individual reservoir regions.

Summary and Conclusions
We used a storyline approach to recreate California's flood of record, the New Year's flood of 1997, using a regionally refined Earth system modeling approach, RRM-E3SM.This is the first time RRM-E3SM has been used to systematically assess how both forecast lead time and model horizontal resolution influenced forecast skill for a key western United States hydrometeorological extreme event.Across formal measures of forecast quality and value, RRM-E3SM (3.5 km) had the highest skill in recreating the 1997 flood compared with lower-resolution versions of E3SM validated against in situ, reanalysis, and gridded climate products.
RRM-E3SM's ability to simulate the North Pacific large-scale circulation patterns prior to and during the 1997 flood was minimally influenced by the refinement of horizontal resolution over California.RRM-E3SM simulations largely agreed with ERA5 in northern California, particularly for extreme AR conditions.However, all RRM-E3SM simulations systematically produce excessively high AR categories in southern California owing to regionally elevated water vapor and stronger winds throughout California.Regional refinement resolution in E3SM is important to the representation of storm total precipitation and storm duration changes in SWE.At 3.5 km, RRM-E3SM best represents the statistical distributions of storm total precipitation at observation stations, with particular improvement in the precipitation maxima.We attribute this to improved representation of both topography and important mesoscale circulations in driving precipitation location and magnitude, notably the Sierra barrier jet.Enhanced snowfall at higher elevations and snowpack ablation at low-to-mid elevations are also better represented in RRM-E3SM (3.5 km).
Reservoir inflows represent the integrated watershed response resulting from interactions between atmospheric processes and the land surface.Simulated inflows exhibit mixed forecast skill across RRM-E3SM simulations as reservoir inflow time series magnitude and occasionally shape, were off across RRM-E3SM simulations.This is partly due to the integrated surface-through-subsurface hydrology being simulated with uncalibrated (or "out-of-the-box") parameter settings.As Golaz et al. (2022) notes, E3SMv2 land surface model structure and parameter settings are inherited from CESM2 (Danabasoglu et al., 2020) and evaluated using global mean-state skill measures (Collier et al., 2018).Using these parameter values shows how E3SM's default settings, often optimized for mean state skill, represent extreme runoff.Although uncalibrated, RRM-E3SM (3.5 km) generally matched the time series shape of reservoir inflows across five of the eight major reservoirs in California and had the highest or second-highest Kling-Gupta efficiency scores across seven of the eight reservoirs.However, a general high bias in reservoir inflows was found across RRM-E3SM simulations and agrees with the general high bias in California DJF mean streamflow identified in Tang et al. (2022).
In addition to not accounting for water management infrastructure in E3SM, there were difficulties in validating certain aspects of the 1997 flood.Although the antecedent conditions (e.g., soil moisture content and groundwater table levels) provided by the "Betacast" offline 5-year ELM spinup procedure shaped reservoir inflow estimates, more observationally constrained initial conditions were not available.Soil moisture content data (both in situ and remote sensing-based estimates) were impossible to find at sub-monthly timescales prior to the year 2000.Similarly, observations of groundwater states (e.g., piezometers and/or satellite-based estimates) were not publicly available.
Forecast lead time resulted in a random effect on the meteorological representation of the 1997 flood.We speculate this is because the forecast lead times chosen (2-to-4 days prior to the 1997 flood onset) were comfortably within the forecast predictability of large-scale synoptic events like ARs (Haiden et al., 2021).Results were therefore dependent on more chaotic spinup processes such as mesoscale features associated with the main precipitation shield and/or small-scale interactions of flow with orography.Although examining forecast skill beyond timescales of a week in E3SM is beyond the scope of this study, L 'Heureux et al. (2021) indicates precipitation forecast skill in Earth system model forecasts for California sharply drops beyond lead times of 8 days.Practically, the lack of meteorological sensitivity with forecast lead times between 2 to 4 days implies that if a water manager is interested in event evolution at a specific point, an ensemble forecast approach is advantageous.A 5-day lead time could provide water managers with sufficient time to decide whether to hold or to release water, also known as forecast-informed reservoir operations (Reid, 2015).However, supporting decision-making at longer timescales requires predictive skill at sub-seasonal to seasonal timescales (Sengupta et al., 2022) and the use of climate modes of variability that may allow the 2-week predictability limits of the atmosphere to be surmounted (DeFlorio et al., 2019;H. Huang et al., 2021;Zhou et al., 2021).
Overall, RRM-E3SM (3.5 km) forecast ensemble average skill in recreating the 1997 flood gives confidence in its utility to support resiliency planning for extreme events.With time, we expect a higher frequency of winter rain events that lead to flooding, particularly in mountains (Ombadi et al., 2023).At present, compound extreme events are impossible to detect in a seasonal forecast, emphasizing that a skillful longterm forecast remains critical to minimize certain negative impacts of extreme storms.To further the utility of these storyline simulations, we plan to investigate flood characteristics if a similar event occurred without anthropogenic climate change or took place again at different global warming levels.We hope such storyline recreations of the 1997 flood event, under past and future climate conditions, can support ongoing efforts in climate resiliency planning by providing perspective on how extreme events may change given projected warming.

Data Availability Statement
Analysis and model simulations were performed using the National Energy Research Scientific Computing Center (NERSC), specifically Cori-Haswell and Cori-KNL supercomputing facilities.ERA5 is publicly available at the Copernicus Climate Change Service (C3S) Climate Data Store (CDS) at https://cds.climate.copernicus.eu/#!/search?text=ERA5.The SSM/I data used in Figure 1 are produced by Remote Sensing Systems and sponsored by NASA.Data are available at www.remss.com/missions/ssmi.The Betacast source code is available at https://github.com/zarzycki/betacast.The RRM-E3SM simulation data and analysis scripts used for this study are provided via a NERSC Science Gateway -https://portal.nersc.gov/archive/home/a/arhoades/Shared/www/California_New_Years_Flood_1997.This study was primarily funded by the Director, Office of Science, Office of Biological and Environmental Research of the U.S. Department of Energy Regional and Global Model Analysis We would also like to acknowledge Smitha Buddhavarapu and Kripa Jagannathan for the considerable time and energy they provided in facilitating scientist-stakeholder discussions in the HyperFACETS project.The facilitated discussions played a key role in helping us to choose the 1997 flood event as a featured storyline within the Hyper-FACETS project.We also appreciate their constructive suggestions on our manuscript draft.We also thank the two anonymous reviewers and editor for their constructive comments and revision requests that elevated the impact and enhanced the clarity of our manuscript.

Figure 1 .
Figure 1.(a) Special Sensor Microwave Imager (SSM/I) integrated water vapor on 1 January 1997.(b) Tahoe City precipitation, snowfall, and snow depth from 1 December 1996-10 January 1997.(c) Examples of all-time peak daily flows set during the event on major river systems in California and Nevada.(d) Reservoir releases from Lake Oroville approached 4,530 cubic meters per second (160,000 cubic feet per second).(e) Flooding inundated the Sacramento Valley of California following heavy rainfall and snowmelt.Images (d, e) courtesy of the California Department of Water Resources.

Figure 2 .
Figure2.The Regionally Refined Mesh enabled Energy Exascale Earth System Model (RRM-E3SM) grids used to recreate the 1997 flood at horizontal resolutions of (a) 0.125° (∼14 km) (b) 0.063° (∼7 km) and (c) 0.031° (∼3.5 km) focused over California.Each RRM-E3SM case's topography is provided to the right of the grid refinement map.Note that ocean bathymetry is not represented in the RRM-E3SM simulations, but is included here for illustrative purposes.

Figure 5 .
Figure 5. (a) Storm total precipitation (31 December 1996-4 January 1997) from the Livneh product.Green dots highlight the locations of the 52 precipitation gauges used by NOAA to produce the 1997 flood event storm summary (https://www.cnrfc.noaa.gov/storm_summaries/ol.php?storm=jan1997).(b) Violin plots of reanalysis and model estimate storm total precipitation derived from the nearest grid cell to the 52 stations shown in (a).The mean is shown with a white dot, and white lines indicate the 25th, median, and 75th percentiles.The shape of each violin reflects the probability density function of the data.

Figure 6 .
Figure 6.Sierra-perpendicular (A-B) and Sierra-parallel (C-D) cross sections of meridional (v) and zonal (u) winds at the start of the 1997 flood event AR landfall (1 January 1997) for ERA5 and the six-forecast ensemble average estimates provided by RRM-E3SM.Contours depict the converse direction of wind speed.The longitudinal and latitudinal cross-section transect lines are shown on the right-most column sub-panel figures overlaid on California.In the case of Sierra-perpendicular (Sierra-parallel), positive values mean that winds are blowing from South to North (West to East).

Figure 7 .
Figure 7. Sierra-parallel (C-D) and Sierra-perpendicular (A-B) vertical profiles of zonal (u) and meridional (v) wind speeds at the latitudinal location of the jet maxima with altitude for ERA5 and the six-forecast ensemble average RRM-E3SM simulations.Latitudinal and longitudinal transects are depicted in the far right column of Figure 6a-6d) shows the vertical wind profiles at the intersection of the transects for the duration of the 1997 flood (31 December 1996 through 3 January 1997).

Figure 8 .
Figure 8.Return periods of the 4-day precipitation totals (Rx4day; 31 December 1996 through 3 January 1997) estimated using a non-stationary GEV framework on the Livneh product.To estimate the return period, the annual maxima of the Rx4day are interpolated to the precipitation gauge locations using first-order conservative remapping.The five stations shown (out of 52 total) are selected to indicate the minimum, 25th, 50th, 75th, and maximum Rx4day across the gauge locations.The left (right) y-axis provides Rx4day in English (metric) units.The horizontal and vertical dashed lines show the Rx4day and the corresponding return period in the Livneh product, as do the annotations in the bottom right.The x-axis (return period) is plotted on the log scale.

Figure 9 .
Figure 9. (a) Storm duration change in snow water equivalent, dSWE, (31 December 1996 through 4 January 1997) from the Margulis product.Black dots highlight the locations of the 50 SNOTEL stations within the vicinity of the 1997 flood.(b) Violin plots of reanalysis and model estimate storm duration dSWE derived from the nearest grid cell to the 50 stations shown in (a).The mean is shown with a white dot, and white lines indicate the 25th, median, and 75th percentiles.The shape of each violin reflects the probability density function of the data.

Figure 11 .
Figure 11.Forecast ensemble average reservoir inflow rates from each of the RRM-E3SM simulations across eight major reservoirs in California.The top figure shows the location of the eight reservoirs and the areal extent of the watersheds that feed into them (black outlines) overlaid onto Margulis product estimates of snow water equivalent, SWE, at the start of the 1997 flood.The black lines in the sub-panel plots represent the U.S. Army Corps of Engineers' estimated inflows into each reservoir.

Figure 12 .
Figure 12.Causal inference estimates for the magnitude of the impact of hydrometeorological variables on total runoff (overland flow, interflow, and baseflow).The four variables include integrated vapor transport (IVT), total precipitation (PRECT), snow water equivalent, and 10 cm soil moisture content (SOILWATER).The magnitude of the influence of each variable on total runoff (overland flow, interflow and baseflow) is represented by an individual component of a stacked bar chart.Each component has a range between 0 and 1. RRM-E3SM cases (designated by hatching) are stacked next to each other for each region assessed including California (Hydrologic Unit Code 18) and the headwater regions of the 10 major reservoirs in California (ordered by latitude from northernmost to southernmost).

Figure 13 .
Figure 13.Taylor diagrams representing all grid cells within the hydrologic unit code (HUC-2) California Region, region 18 in Seaber et al. (1987), for the forecast period of 31 December 1996 up to 4 January 1997.(a) Storm duration maximum integrated water vapor (IWV) compared to ERA5; (b) storm total precipitation compared to the Livneh product; and (c) storm duration change in snow water equivalent, dSWE, compared to the Margulis product.Each triangle represents one of the six RRM-E3SM forecasts initialized from 28 December 1996 at 00Z to 30 December 1996 at 12Z.Bold triangles represent the forecast ensemble average.Upward (downward) triangle orientation represents a positive (negative) bias compared to each reference dataset.Black radial lines provide general guidance for groupings of Pearson pattern correlation.The black and gray dashed azimuthal lines centered around REF indicate the root mean squared error and standard deviations from the reference dataset.

Figure 14 .
Figure 14.Same as Figures 5 and 9, but the violin plots now compare the initialization dates for each of the six RRM-E3SM (3.5 km) forecasts.Panels (a and b) show storm total precipitation and panels (c and d) storm duration change in snow water equivalent (dSWE).The six-forecast ensemble average (ensavg) is also provided.

Figure 16 .
Figure 16.Same as Figure 12, however, each stacked bar chart represents one of the six forecasts produced by RRM-E3SM (3.5 km) and conveys the strength of causal influence of four hydrometeorological variables, integrated vapor transport (IVT), total precipitation (PRECT), snow water equivalent, and 10 cm soil moisture (SOILWATER), on total runoff (overland flow, interflow, and baseflow).The forecast initialization date is indicated by different styles of hatching.